**A Detection and Tracking Algorithm for Resolvable Group with Structural and Formation Changes Using the Gibbs-GLMB Filter**

#### **Xinfeng Ru, Yudong Chi and Weifeng Liu \***

College of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; a840064210@hdu.edu.cn (X.R.); cyd0418@163.com (Y.C.)

**\*** Correspondence: liuwf@hdu.edu.cn

Received: 25 May 2020; Accepted: 12 June 2020; Published: 15 June 2020

**Abstract:** In the field of resolvable group target tracking, further study of the structure and formation of group targets helps to reduce tracking error. In this paper, we propose an algorithm to detect whether the structure or formation state of group targets changes. A Gibbs Generalized Labeled Multi-Bernoulli (GLMB) filter is used to obtain state estimates of the resolvable group targets. After obtaining the state estimates, we extract relevant information from the estimation data to judge whether the structure or formation state has changed. Finally, several experiments are carried out to verify the algorithm.

**Keywords:** target tracking; group targets; GLMB; structure; formation

#### **1. Introduction**

Target tracking technology appears in many fields. For example, it is an important part of self-driving cars, one of the current research focuses: while driving, the control system must detect and discriminate the vehicle's surroundings, and target tracking plays an important role in this task [1]. Target tracking is also used in national defense, such as tracking aircraft in the air, ships at sea, and vehicles on land; usually, such targets coordinate their movements in a certain way. Scholars have made many excellent achievements in the field of target tracking, among which research on multi-target tracking based on random finite sets (RFS) has been particularly fruitful, with applications in machine learning [2], computer vision [3], autonomous vehicles [4], sensor scheduling [5–12], and sensor networks [13–15] (in particular, a fast RFS-based distributed tracking algorithm for a sensor network is presented in [15]), as well as in track-before-detect, tracking of merged measurements, and target tracking [16–20].

Mahler was the first to apply RFS theory to target tracking [21–24], proposing the probability hypothesis density (PHD) filter [22,25], the cardinalized PHD (CPHD) filter [26,27], and the multi-target multi-Bernoulli (MeMBer) filter [28]. The MeMBer filter differs from the PHD and CPHD filters: it propagates the parameters of a multi-Bernoulli distribution that approximates the posterior multi-target density, whereas the others propagate moments and cardinality distributions. Subsequently, the CBMeMBer filter [29] was proposed to solve the cardinality-bias problem of the MeMBer filter. References [30,31] introduce labels into the RFS and propose the Generalized Labeled Multi-Bernoulli (GLMB) filter. Reference [30] gives an implementation of the GLMB, the *δ*-Generalized Labeled Multi-Bernoulli (*δ*-GLMB) filter, and a truncation method is used to refine the *δ*-GLMB in Reference [31]. In References [32,33], Vo et al. merged the prediction and update steps of the GLMB filter into a single step. This new implementation is known as the Gibbs GLMB; it greatly reduces the computational complexity of the GLMB filter and improves its efficiency.

Based on these filters, some work has been done on tracking hybrid targets, i.e., collections of multiple targets, group targets, and extended targets. The problem of tracking multi-measurement targets is addressed in [34]. Reference [35] considers group target tracking with an uncertain number of targets and groups. Reference [19] focuses on dynamic modeling and tracking estimation for multiple resolvable group targets, and Reference [36] considers group state estimation. Under the RFS framework, modeling and tracking estimation for multiple extended targets is contributed in Reference [37]. In Reference [33], we applied a Gibbs-GLMB filter to estimate the states of resolvable group targets and track them.

In this paper, based on the collaborative relationships between targets and the state information of each independent target, we propose a method to determine changes in the structure and formation of resolvable group targets. We refer to the dependency relationship between group targets as the structure, and to the shape formed by group targets with fixed spatial distances as the formation.

The content of this paper is arranged as follows: Section 2 introduces relevant background knowledge, including RFS, labeled RFS, and graph theory; Section 3 describes a state estimation method for resolvable group targets; Section 4 discusses how to determine the structure and formation of resolvable group targets in continuous and discrete environments; Section 5 presents two simulations that verify the proposed method. Finally, Section 6 summarizes the paper.

#### **2. Backgrounds**

#### *2.1. Labeled Random Finite Set (Labeled RFS)*

In [30], Vo combines the RFS with labels. An RFS is essentially a set with a random number of members, random member states, and no fixed ordering among members. A random set is used to represent the target states of resolvable group targets; it can be expressed as Equation (1) at time *k*.

$$X\_k = \{ \mathbf{x}\_{k,1}, \dots, \mathbf{x}\_{k,N\_k} \}. \tag{1}$$

In group target tracking, the situation of the target is very complex and changeable. For example, a certain target may suddenly disappear, or a new target may appear at a certain moment, or a target may be decomposed into multiple targets, and multiple targets may be combined into one target set. Therefore, Equation (1) can be expressed as Equation (2).

$$X\_k = \left[ \bigcup\_{x \in X\_{k-1}} S\_{k|k-1} \left( x \right) \right] \cup \left[ \bigcup\_{x \in X\_{k-1}} B\_{k|k-1} \left( x \right) \right] \cup \Gamma\_k, \tag{2}$$

where $S\_{k|k-1}(x)$, $B\_{k|k-1}(x)$, and $\Gamma\_k$ stand for the surviving targets, the spawned targets, and the birth targets, respectively. Similar to the state set, the measurement set of group targets is given in Equation (3).

$$Z\_k = \{ \mathbf{z}\_{k,1}, \dots, \mathbf{z}\_{k,M\_k} \}. \tag{3}$$

An RFS is improved to a labeled RFS by adding a unique label $\ell \in \mathbb{L} = \{\alpha\_i : i \in \mathbb{N}\}$ to each target $\mathbf{x}\_{k,i}$, where $\mathbb{N}$ is the set of positive integers. We use the constraint function of Equation (4) to guarantee the uniqueness of the labels.

$$\Delta(X) = \begin{cases} 1, |\mathcal{L}(X)| = |X| \\ 0, |\mathcal{L}(X)| \neq |X| \end{cases} \tag{4}$$
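The distinct-label indicator of Equation (4) can be sketched in a few lines of Python, assuming a hypothetical representation in which each labeled target is a `(state, label)` tuple (this representation is illustrative, not from the paper):

```python
# Distinct-label indicator Delta(X) of Equation (4): a labeled RFS is
# valid only when every target carries a unique label, i.e. |L(X)| = |X|.
# Hypothetical representation: each target is a (state, label) tuple.

def distinct_label_indicator(X):
    """Return 1 if all labels in X are unique, else 0."""
    labels = [label for (_state, label) in X]
    return 1 if len(set(labels)) == len(labels) else 0

# Two targets with distinct labels -> valid labeled RFS
X_ok = [((0.0, 1.0), ("k0", 1)), ((5.0, -1.0), ("k0", 2))]
# Duplicate label -> invalid
X_bad = [((0.0, 1.0), ("k0", 1)), ((5.0, -1.0), ("k0", 1))]

print(distinct_label_indicator(X_ok))   # -> 1
print(distinct_label_indicator(X_bad))  # -> 0
```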

Reference [30] not only proposes the labeled RFS but also gives the densities of an LMB RFS and a labeled Poisson RFS. The LMB RFS's density is described as:

$$\pi(\{(\mathbf{x}\_1, \ell\_1), \cdots, (\mathbf{x}\_n, \ell\_n)\}) = \delta\_n(|\{\ell\_1, \cdots, \ell\_n\}|) \prod\_{\zeta \in \Psi} \left(1 - r^{(\zeta)}\right) \prod\_{j=1}^{n} \frac{1\_{\alpha(\Psi)}(\ell\_j)\, r^{(\alpha^{-1}(\ell\_j))}\, p^{(\alpha^{-1}(\ell\_j))}(\mathbf{x}\_j)}{1 - r^{(\alpha^{-1}(\ell\_j))}}. \tag{5}$$

#### *2.2. Graph Theory*

Graph theory has a wide range of applications. In this paper, we use a directed graph to describe the structure of a group. We take the independent targets as the vertices *V* of the graph *G* and the relationships between targets as its edges *E*. We then describe this graph by an asymmetric adjacency matrix, which represents the structure of the resolvable group target. The matrix is shown in Equation (6).

$$A\_d = \begin{bmatrix} 0 & a(1,2) & \cdots & a(1,n) \\ a(2,1) & 0 & \cdots & a(2,n) \\ \vdots & \vdots & \ddots & \vdots \\ a(n,1) & a(n,2) & \cdots & 0 \end{bmatrix}, \tag{6}$$

where *a*(*i*, *j*) = 1 if target *i* is the parent node of target *j*; if target *i* is target *j*'s child node, or target *i* has no relationship with target *j*, then *a*(*i*, *j*) = 0.
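As a small sketch (not from the paper), the asymmetric adjacency matrix of Equation (6) can be built from a list of directed parent-to-child edges; the function name, the 0-based indexing, and the example group are illustrative assumptions:

```python
# Build the asymmetric adjacency matrix of Equation (6) from directed
# parent -> child edges, using the convention a(i, j) = 1 iff target i
# is the parent node of target j. Indices are 0-based here.

def adjacency_from_edges(n, edges):
    A = [[0] * n for _ in range(n)]
    for parent, child in edges:
        A[parent][child] = 1   # i is parent of j; A[child][parent] stays 0
    return A

# Hypothetical 4-target group: target 0 leads target 1, which in turn
# leads targets 2 and 3.
A = adjacency_from_edges(4, [(0, 1), (1, 2), (1, 3)])
for row in A:
    print(row)
# -> [0, 1, 0, 0]
#    [0, 0, 1, 1]
#    [0, 0, 0, 0]
#    [0, 0, 0, 0]
```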

#### *2.3. Graph Theory Model of Labeled RFS*

For the relationship between any two vertices $v\_i$ and $v\_j$, we can express it as Equation (7).

$$\pi\_{i,j} : (\mathbf{x}\_i, \mathbf{x}\_j) \to \{1, 0\}. \tag{7}$$

Since each vertex has a unique label, we can simplify Equation (7) to Equation (8).

$$e\_{i,j} : (l\_i, l\_j) \to \{1, 0\}. \tag{8}$$

Equation (8) shows that the structure of the group is encapsulated by the graph defined on the target labels.

### *2.4. Resolvable Group Tracking with Maneuver and the Efficient Implementation of the GLMB Filter*

Let the target state $\mathbf{x}\_{k,i}$ be given as follows:

$$\mathbf{x}\_{k,i} = [p\_{k, \mathbf{x}}(i), \dot{p}\_{k, \mathbf{x}}(i), p\_{k, \mathbf{y}}(i), \dot{p}\_{k, \mathbf{y}}(i)] \in \mathcal{X}\_k. \tag{9}$$

where *pk*,*x*(*i*) and *p*˙ *<sup>k</sup>*,*x*(*i*) are the position and velocity of target *i* on the *x*-axis, *pk*,*y*(*i*) and *p*˙ *<sup>k</sup>*,*y*(*i*) indicate the position and velocity of target *i* on the *y*-axis.

Suppose a target has a single parent node. We introduce the displacement vector *bk*(*l*, *i*) to describe this relationship, and the dynamic model of the resolvable group targets is given as follows:

$$\mathbf{x}\_{k+1,i} \quad = \, \, F\_{k,l}\mathbf{x}\_{k,l} + b\_k(l,i) + \Gamma\_{k,i}\omega\_{k,i} \tag{10}$$

where *bk*(*l*, *i*) contains the direction and distance information between the parent node *l* and the child node *i*.

Therefore, for group targets, the displacement vector *bk*(*l*, *i*) can be represented as:

$$b\_k(l,i) = \left[R\_k(l,i) \times \cos(\theta\_k(l,i) - \beta\_k(l,i)), 0, R\_k(l,i) \times \sin(\theta\_k(l,i) - \beta\_k(l,i)), 0\right]^T,\tag{11}$$

where *Rk*(*l*, *i*) denotes the designed distance between parent node *l* and child node *i*. *βk*(*l*, *i*) is the designed angle between nodes *l* and *i*. *θk*(*l*, *i*) is the motion direction for the parent target.
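As a sketch of Equation (11), the displacement vector can be computed directly from the designed range *R*, the designed angle *β*, and the parent heading *θ*; the function name and the numeric values below are illustrative, not taken from the paper:

```python
import math

# Displacement vector b_k(l, i) of Equation (11): the designed offset of
# child i relative to parent l, given the parent's heading theta, the
# designed range R, and the designed angle beta. The zero entries align
# with the state layout [p_x, pdot_x, p_y, pdot_y] of Equation (9).

def displacement_vector(R, theta, beta):
    return [R * math.cos(theta - beta), 0.0,
            R * math.sin(theta - beta), 0.0]

# Illustrative values: parent heading 45 degrees, designed range 100 m,
# designed angle 30 degrees.
b = displacement_vector(100.0, math.radians(45.0), math.radians(30.0))
print([round(v, 2) for v in b])  # -> [96.59, 0.0, 25.88, 0.0]
```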

Under a maneuvering motion model, we assume that the formation of the group is stable within a certain time interval, so Equation (11) can be rewritten as Equations (12)–(14).

$$b\_k(l,i) = c\_k(l,i)\mathbb{C}\_a(l,i)\mathbf{x}\_{k,l} \tag{12}$$

$$c\_k(l, i) = \frac{R(l, i)}{\sqrt{\dot{p}\_{k,x}^2(l) + \dot{p}\_{k,y}^2(l)}}\tag{13}$$

$$\mathbf{C}\_{\mathbf{a}}(l,i) = \begin{bmatrix} 0 & a\_{\beta,2}(l,i) & 0 & a\_{\beta,1}(l,i) \\ 0 & 0 & 0 & 0 \\ 0 & a\_{\beta,1}(l,i) & 0 & -a\_{\beta,2}(l,i) \\ 0 & 0 & 0 & 0 \end{bmatrix} \tag{14}$$

The derivation process has been described in detail in Reference [33].

Another point is the relation between the covariance and the adjacency matrix. To calculate the means and covariances, the parent vertex *l* must first be known, which depends on the adjacency matrix of Equation (6). In this paper, the adjacency matrix is defined on the label space and known a priori; that is, in the prediction stage, the adjacency matrix can be obtained and adopted according to the predicted labels. In contrast, if the adjacency matrix is unknown, it must be estimated from the predicted states. In general, the adjacency relation is based on the target states, i.e., the motion information. A detailed discussion can be found in Reference [36].

In the original GLMB filter [30,31], the prediction step and the update step each have their own truncation, which makes the computational complexity of the GLMB filter very high. To alleviate this problem, Vo et al. proposed the Gibbs GLMB in Reference [32]. The Gibbs GLMB algorithm combines the prediction and update steps of the GLMB filter so that there is only one truncation per iteration. The density of the joint prediction and update step at time *k* is:

$$\pi\_k(\mathbf{X}) \propto \Delta(\mathbf{X}) \sum\_{I,\xi,J,\theta} \omega\_{k-1}^{(I,\xi)}\, \omega\_k^{(I,\xi,J,\theta)}(Z\_k)\, \delta\_J(\mathcal{L}(\mathbf{X})) \left[ p\_k^{(\xi,\theta)}(\cdot\,|Z\_k) \right]^{\mathbf{X}} \tag{15}$$

where

$$\begin{aligned} \omega\_k^{(I,\xi,J,\theta)}(Z\_k) &= 1\_{\Theta\_k(J)}(\theta)\, [1-r\_{B,k}]^{B\_k - J}\, [r\_{B,k}]^{B\_k \cap J} \left[1-\bar{P}\_{S,k}^{(\xi)}\right]^{I-J} \left[\bar{P}\_{S,k}^{(\xi)}\right]^{I \cap J} \left[\bar{\Psi}\_{Z\_k,k}^{(\xi,\theta)}\right]^{J} \\ \bar{\Psi}\_{Z\_k,k}^{(\xi,\theta)}(l) &= \left\langle \Psi\_{Z\_k,k}^{(\theta(l))}(\cdot,l),\, p\_{k|k-1}^{(\xi)}(\cdot,l) \right\rangle \\ \bar{P}\_{S,k}^{(\xi)}(l) &= \left\langle P\_{S,k}(\cdot,l),\, p\_{k-1}^{(\xi)}(\cdot,l) \right\rangle \\ p\_k^{(\xi,\theta)}(x,l\,|Z\_k) &\propto \Psi\_{Z\_k,k}^{(\theta(l))}(x,l)\, p\_{k|k-1}^{(\xi)}(x,l) \\ p\_{k|k-1}^{(\xi)}(x,l) &= 1\_{B\_k}(l)\, p\_{B,k}(x,l) + 1\_{\mathcal{L}\_{k-1}}(l)\, \frac{\left\langle P\_{S,k}(\cdot,l) f\_{k|k-1}(x|\cdot,l),\, p\_{k-1}^{(\xi)}(\cdot,l) \right\rangle}{\bar{P}\_{S,k}^{(\xi)}(l)}, \end{aligned}$$

Truncation is performed by sampling $\{(I^{(i)}, \xi^{(i)}, J^{(i)}, \theta^{(i)})\}\_{i=1}^{H\_{k,max}}$ from some distribution $\pi$ via Gibbs sampling. It should be noted that in the predicted density $p\_{k|k-1}^{(\xi)}(\cdot,l)$ the collaboration noise $\omega\_{k,i}^{o}$ is adopted, instead of the process noise $\omega\_{k,i}$.
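The flavor of the Gibbs-sampling truncation can be conveyed with a toy sketch (an assumed simplification, not the exact sampler of Reference [32]): association vectors are drawn coordinate by coordinate, each target's conditional draw being proportional to a per-target weight, subject to no measurement being assigned twice; the weight matrix `eta` below is a hypothetical input, not the actual GLMB weights:

```python
import random

# Toy Gibbs sampler over association vectors theta, where theta[i] = 0
# means target i is missed and theta[i] = j > 0 assigns measurement j.
# Positive indices must be distinct across targets. Each conditional
# draw is proportional to a per-target weight eta[i][j].

def gibbs_associations(eta, n_iters, rng):
    n = len(eta)                    # number of targets
    theta = [0] * n                 # start from "all missed"
    samples = []
    for _ in range(n_iters):
        for i in range(n):
            # Measurements currently claimed by the other targets
            used = {theta[t] for t in range(n) if t != i and theta[t] > 0}
            choices = [j for j in range(len(eta[i])) if j == 0 or j not in used]
            weights = [eta[i][j] for j in choices]
            theta[i] = rng.choices(choices, weights=weights)[0]
        samples.append(tuple(theta))
    return samples

rng = random.Random(0)
eta = [[0.1, 0.8, 0.1],            # target 1 favours measurement 1
       [0.1, 0.1, 0.8]]            # target 2 favours measurement 2
samples = gibbs_associations(eta, 100, rng)
print(max(set(samples), key=samples.count))  # most frequent association
```

In the real filter, the retained high-weight samples are exactly the surviving components after truncation; low-weight components are simply never visited.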

#### **3. Analysis of the Structure and Formation of Discernible Group Targets**

Structures and formations exist in every aspect of our lives. For example, every bridge needs a suitable structure to carry its weight, and a wedding motorcade is organized into a pattern, that is, a formation. In the military field, structures and formations are even easier to spot: the design of structure and formation is a very important part of air, ground-vehicle, and naval forces, and it affects their combat effectiveness. This section describes in detail how to identify the structures and formations of resolvable group targets.

#### *3.1. The Structure and Formation of Resolvable Group Targets*

#### 3.1.1. Structure

In this paper, the structure describes the collaborative relationship between targets: we regard the collaboration between parent and child targets as the framework of the structure, while the parent and child targets themselves are its important nodes. A detailed introduction on how to extract the structure of group targets is given in Reference [35]. This paper focuses on how to determine whether the structure and formation of group targets have changed. We first discuss the group structure in detail, focusing on when the structure is considered unchanged and when it is considered changed. Since structure is the embodiment of collaboration, we can reason as follows: as long as the collaboration between group targets remains unchanged, the structure remains unchanged. For example, Figure 1 shows two independent subgroup targets, each with four subtargets. Although the distances between their subtargets and the included angles of their relationships differ, their cooperative relationships are the same, so their structures are the same.

**Figure 1.** Two group targets with the same structure.

#### 3.1.2. Formation

The formation and structure of group targets are very similar. First, both depend on the collaborative relationship between subtargets: if the cooperative relationship changes, the formation and structure will inevitably change. The difference is that formation imposes stricter requirements than structure. For example, in a military parade, keeping formation means that throughout the march the distance between any two soldiers remains almost constant and the geometric angles formed by any three soldiers remain almost the same. The same holds for group target formations, so we can regard the formation of a group target as a structure with fixed spatial distances and fixed geometric angles between subtargets. Taking Figure 1 as an example again, the two group targets in Figure 1a,b have the same structure, but the spatial distance between targets *x*<sup>1</sup> and *x*<sup>2</sup> in Figure 1a differs from that in Figure 1b, and the geometric angle formed by targets *x*1, *x*2, and *x*<sup>4</sup> in Figure 1a also differs considerably from the corresponding angle in Figure 1b. Therefore, although the structures of the two groups are consistent, their formations are not. In the judgment process, as long as there is a large difference in spatial distance or geometric angle, we can determine that the formations of the two groups are not consistent.

#### *3.2. The Determination of Distinguishable Group Target Structure and Formation*

In this paper, it is assumed that the parent–child relationships in the group target will not be reversed; that is, a parent node will always remain a parent node: it may die or separate, but it will not become the child of its original child node.

#### 3.2.1. Determination of Group Target Structure in Continuous State

In this section we introduce the determination of the group target structure in a continuous state. Based on the above description, the elements of the structure are the collaboration relationships, and the key to determining whether a collaboration relationship exists is the motion state of each subtarget and the spatial positions between them. If the structure of a group is to remain unchanged, the motion states of its members should be similar; if the motion state of a member differs greatly from that of the other members, it must have separated from the group, and the structure of the group changes accordingly. According to this principle, we can compare the elements one by one. First, in a discernible group target, all members keeping approximately the same velocity is one of the important factors for judging whether the parent–child cooperation relationships in the group change. We take the group target in Figure 2 as an example. At time *k*, the state of each target is *xk*,*i* and its velocity is *vk*,*i*; then the states of targets *x*1, *x*2, and *x*<sup>3</sup> at time *k* + 1 can be expressed as:

$$\begin{aligned} \mathbf{x}\_{k+1,1} &= \mathbf{x}\_{k,1} + \upsilon\_{k,1} \\ \mathbf{x}\_{k+1,2} &= \mathbf{x}\_{k,2} + \upsilon\_{k,2} \\ \mathbf{x}\_{k+1,3} &= \mathbf{x}\_{k,3} + \upsilon\_{k,3} \end{aligned} \tag{16}$$

Therefore, the position relationship L*k*(*i*, *j*) between *x*<sup>1</sup> and *x*2, *x*<sup>1</sup> and *x*3, and *x*<sup>2</sup> and *x*<sup>3</sup> at time *k* and *k* + 1 can be expressed as:

$$\begin{aligned} \mathcal{L}\_k(1,2) &= \mathbf{x}\_{k,2} - \mathbf{x}\_{k,1} \\ \mathcal{L}\_k(1,3) &= \mathbf{x}\_{k,3} - \mathbf{x}\_{k,1} \\ \mathcal{L}\_k(2,3) &= \mathbf{x}\_{k,3} - \mathbf{x}\_{k,2} \end{aligned} \tag{17}$$

$$\begin{aligned} \mathcal{L}\_{k+1}(1,2) &= \mathbf{x}\_{k+1,2} - \mathbf{x}\_{k+1,1} \\ \mathcal{L}\_{k+1}(1,3) &= \mathbf{x}\_{k+1,3} - \mathbf{x}\_{k+1,1} \\ \mathcal{L}\_{k+1}(2,3) &= \mathbf{x}\_{k+1,3} - \mathbf{x}\_{k+1,2} \end{aligned} \tag{18}$$

Substitute Equation (16) into Equation (18):

$$\begin{aligned} \mathcal{L}\_{k+1}(1,2) &= \mathbf{x}\_{k,2} - \mathbf{x}\_{k,1} + (\upsilon\_{k,2} - \upsilon\_{k,1}) \\ \mathcal{L}\_{k+1}(1,3) &= \mathbf{x}\_{k,3} - \mathbf{x}\_{k,1} + (\upsilon\_{k,3} - \upsilon\_{k,1}) \\ \mathcal{L}\_{k+1}(2,3) &= \mathbf{x}\_{k,3} - \mathbf{x}\_{k,2} + (\upsilon\_{k,3} - \upsilon\_{k,2}) \end{aligned} \tag{19}$$

If the formation of the group is guaranteed to remain unchanged, the following conditions must be satisfied:

$$
\mathcal{L}\_{k+1}(i,j) = \mathcal{L}\_k(i,j) \tag{20}
$$

That is:

$$\begin{aligned} \upsilon\_{k,2} - \upsilon\_{k,1} &= 0 \\ \upsilon\_{k,3} - \upsilon\_{k,1} &= 0 \\ \upsilon\_{k,3} - \upsilon\_{k,2} &= 0 \end{aligned} \tag{21}$$

Therefore, we can obtain the following conditions for the formation to remain stable:

$$
\upsilon\_{k,1} = \upsilon\_{k,2} = \upsilon\_{k,3} \tag{22}
$$

Therefore, we can conclude that, in the continuous state, whether the resolvable group targets maintain a stable structure and formation can be determined by judging whether the velocities of the subtargets in the group are consistent.

**Figure 2.** The trajectories of group targets under continuous conditions.

To quantify the degree of approximation, we use a speed threshold *δvel* and a speed-direction difference threshold *δdir* to constrain the velocity. The velocity deviation is cumulative: as time accumulates, even if the speeds remain approximately equal, the relative-position deviations of the group members will build up. Therefore, to make sure that the collaboration remains as originally designed, the parent–child node distance also needs to be tested; the distance threshold *δdis* is used as shown in Equation (23):

$$\left\| \begin{bmatrix} p\_x^1 \\ p\_y^1 \end{bmatrix} - \begin{bmatrix} p\_x^2 \\ p\_y^2 \end{bmatrix} \right\| < \delta\_{\text{dis}} \tag{23}$$

The constraint on the velocity difference is:

$$\left\| \begin{bmatrix} \dot{p}\_x^1 \\ \dot{p}\_y^1 \end{bmatrix} - \begin{bmatrix} \dot{p}\_x^2 \\ \dot{p}\_y^2 \end{bmatrix} \right\| < \delta\_{\text{vel}} \tag{24}$$

The constraint on the velocity-direction deviation *βvdif* is:

$$\beta\_{\text{vdif}} = \frac{\dot{p}\_y^1}{\dot{p}\_x^1} - \frac{\dot{p}\_y^2}{\dot{p}\_x^2} < \delta\_{\text{dir}} \tag{25}$$

Through the above three constraints, we can determine the structure state of the group target, and the three elements of the structure are shown in Figure 3.
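The three constraints of Equations (23)–(25) can be sketched as a single check on a pair of target states; the function name and the threshold values below are illustrative assumptions, not from the paper:

```python
import math

# Structure check of Equations (23)-(25): two targets keep their
# parent-child collaboration only if their distance, velocity difference,
# and velocity-direction difference all stay below the thresholds
# delta_dis, delta_vel, delta_dir (threshold values are illustrative).

def same_structure(x1, x2, delta_dis, delta_vel, delta_dir):
    """States follow the layout of Equation (9): [p_x, pdot_x, p_y, pdot_y]."""
    dist = math.hypot(x1[0] - x2[0], x1[2] - x2[2])    # Eq. (23)
    dvel = math.hypot(x1[1] - x2[1], x1[3] - x2[3])    # Eq. (24)
    ddir = abs(x1[3] / x1[1] - x2[3] / x2[1])          # Eq. (25)
    return dist < delta_dis and dvel < delta_vel and ddir < delta_dir

parent  = [0.0, 10.0, 0.0, 10.0]      # heading 45 degrees
child   = [70.0, 10.5, 70.0, 9.5]     # close, near-identical velocity
runaway = [70.0, -10.0, 70.0, 10.0]   # same spot, opposite x-velocity

print(same_structure(parent, child, 150.0, 2.0, 0.2))    # -> True
print(same_structure(parent, runaway, 150.0, 2.0, 0.2))  # -> False
```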

**Figure 3.** Schematic diagram of the velocity *v* in the group target, the distance *lij* between father and son targets, and the velocity deviation *βvdif*.

#### *3.3. Determination of Group Target Formation in a Continuous State*

The formation of the group target is a special case of its structure. Therefore, when analyzing the formation, we assume that the structure of the group target is unchanged. In other words, the speed of the group members need not be considered; we only need to judge whether the formation changes through the position states. In essence, this means determining whether the relative position of two targets in a cooperative relationship has changed between two consecutive moments. The first step is to obtain the position deviation matrix *AP* of all members of the group target; the acquisition method is shown in Equation (26).

$$A\_p = \begin{bmatrix} 0 & ||\eta\_{12}|| & \cdots & ||\eta\_{1n}|| \\ ||\eta\_{21}|| & 0 & \cdots & ||\eta\_{2n}|| \\ \vdots & \vdots & \ddots & \vdots \\ ||\eta\_{n1}|| & ||\eta\_{n2}|| & \cdots & 0 \end{bmatrix}\_{n \times n} \tag{26}$$

where *n* is the number of group members, and *ηij* is the deviation vector of the relative position vectors of two targets in the group between time *k* and time *k* − 1. As can be seen from Figure 4, the position deviation vector *ηij* can be obtained from the displacement vectors *rpos* between the two targets at time *k* and time *k* − 1. The calculation is as follows:

$$\eta\_{k,ij} = rpos\_{k,ij} - rpos\_{k-1,ij} \tag{27}$$

$$rpos\_{k,ij} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} (\mathbf{x}\_{k,i} - \mathbf{x}\_{k,j}) \tag{28}$$

After obtaining the position deviation matrix *AP*, we can obtain the formation change coefficient of the group target through Equation (29).

$$\varphi = \|A\_k A\_P\|\_2 \tag{29}$$

*Ak* is the adjacency matrix describing the group structure. Through *Ak*, the data of targets with a collaboration relationship are screened out of *Ap*. Based on observation, we set a formation-coefficient threshold *σdis*. If the coefficient *Φ* is greater than the threshold *σdis*, the formation of the group targets is considered to have changed; otherwise, it is considered unchanged. The relationship is shown in Equation (30).

$$\begin{cases} 1, & \Phi > \sigma\_{\text{dis}} \\ 0, & \Phi \le \sigma\_{\text{dis}} \end{cases} \tag{30}$$
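The formation check of Equations (26)–(30) can be sketched as follows. The adjacency screening is interpreted here as elementwise masking (an assumption, since only pairs with a collaboration relationship are of interest), and the function name, threshold, and positions are illustrative:

```python
import numpy as np

# Formation-change check of Equations (26)-(30): build the position-
# deviation matrix A_p, screen it with the adjacency matrix A_k
# (elementwise mask), and compare the matrix 2-norm to sigma_dis.

def formation_changed(pos_k, pos_km1, A_k, sigma_dis):
    n = len(pos_k)
    A_p = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # Eq. (27): deviation of the relative position vectors
                eta = (pos_k[i] - pos_k[j]) - (pos_km1[i] - pos_km1[j])
                A_p[i, j] = np.linalg.norm(eta)
    phi = np.linalg.norm(A_k * A_p, 2)     # Eq. (29), with masking
    return int(phi > sigma_dis)            # Eq. (30): 1 = changed

A_k = np.array([[0, 0], [1, 0]])           # target 1 is target 2's parent
prev  = [np.array([0.0, 0.0]),  np.array([100.0, 0.0])]
hold  = [np.array([10.0, 0.0]), np.array([110.0, 0.0])]  # rigid move
drift = [np.array([10.0, 0.0]), np.array([140.0, 0.0])]  # child drifts

print(formation_changed(hold, prev, A_k, 5.0))   # -> 0 (unchanged)
print(formation_changed(drift, prev, A_k, 5.0))  # -> 1 (changed)
```

A rigid translation of the whole group leaves every relative position vector intact, so *Φ* = 0; only relative drift between collaborating pairs raises the coefficient above the threshold.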

**Figure 4.** The schematic diagram of position deviation vector *ηk*,12.

#### **4. Simulations**

#### *4.1. Experiment 1*

#### 4.1.1. Configuration Parameters for Experiment 1

In this experiment, we used a Gibbs-GLMB filter to get the state estimation of group targets. There are three subgroups of group target, including 4, 4, and 6 members, respectively. Their structure is shown in Figure 5.

**Figure 5.** The structure of subgroup targets.

The initialized distance between any parent and child targets is 100 m. $\{(x\_1,\ell\_1),\cdots,(x\_4,\ell\_4)\}$ is subgroup 1, $\{(x\_5,\ell\_5),\cdots,(x\_8,\ell\_8)\}$ is subgroup 2, and $\{(x\_9,\ell\_9),\cdots,(x\_{14},\ell\_{14})\}$ is subgroup 3. The three subgroups are independent of each other. In the initial state, the adjacency matrices of the three subgroups are set as known in this paper and can be expressed as:

$$A\_1(\ell\_1, \ldots, \ell\_4) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \tag{31}$$

$$A\_2(\ell\_5, \ldots, \ell\_8) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{bmatrix} \tag{32}$$

$$A\_3(\ell\_9, \ldots, \ell\_{14}) = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \tag{33}$$

The monitoring scope of experiment 1 is [−*π*/2, *π*/2; 0 m, 3000 m]. The duration of the experiment is 100 s: subgroup 1 is born at *k* = 0 s; subgroup 2 and subgroup 3 are both born at *k* = 20 s; target *x*1, the head node of subgroup 1, dies at *k* = 30 s; and target *x*<sup>8</sup> in subgroup 2 and target *x*<sup>11</sup> in subgroup 3 die at *k* = 70 s. In terms of structure, subgroup 3 is decomposed into two subgroup targets at *k* = 30 s: target *x*<sup>11</sup> and its child target *x*<sup>14</sup> separate from subgroup 3 into subgroup 4. At this time, subgroup 4 only has targets *x*<sup>11</sup> and *x*14, and the dependency between them is the same as before. At *k* = 70 s, targets *x*2, *x*3, and *x*<sup>4</sup> of subgroup 1 are completely separated into three independent targets, and target *x*<sup>11</sup> of subgroup 4 becomes an independent target. The structure between *k* = 30 s and *k* = 70 s is shown in Figure 6, and the structure of the group at *k* > 70 s is shown in Figure 7. The covariance of the observation noise is *R* = diag[ 0.0012 100 ]. The covariance of the process noise is *Q* = diag[ 0.04 0.04 0.04 ]. The real trajectories of the targets are shown in Figure 8: the curves represent the trajectories, the circles the starting points, and the triangles the ending points.

**Figure 6.** The structure of subgroup targets during *k* = 30 and *k* = 70.

**Figure 7.** The structure of subgroup targets during *k* > 70.

**Figure 8.** The real trajectories of the group targets.

#### 4.1.2. The Result of the Experiment 1

In this experiment, when a change in the structure of the group target is detected, the estimated points of the group target and its structure at that time are immediately output. From the experimental configuration, the group structure changes significantly at *k* = 20, *k* = 30, and *k* = 70. At *k* = 20, ten targets are born. At *k* = 30, a target dies and subgroup 3 splits into two subgroups. At *k* = 70, there are both target deaths and changes in the target structure. Therefore, we focus on the detection results at these three moments. The following are the experimental result figures for these three moments. At *k* = 20, the positions of the targets and the structure of the group target are shown in Figures 9 and 10. In Figure 9, we can see that the Gibbs-GLMB filter tracks 13 points at this time; compared with *k* = 19, 9 new targets appear. Although the collaboration relationships between the targets have not changed compared with the previous moment, the overall structure of the group is bound to change as members are added, so the tracked targets are traced and their structure is output. At *k* = 30, the head node of a subgroup dies and one subgroup separates into two; this change is also detected successfully, as shown in Figures 11 and 12. At *k* = 70, the detection results are shown in Figures 13 and 14.

**Figure 9.** The state estimation of group targets at *k* = 20.

**Figure 10.** The structure estimation of group targets at *k* = 20.

**Figure 11.** The state estimation of group targets at *k* = 30.

**Figure 12.** The structure estimation of group targets at *k* = 30.

**Figure 13.** The state estimation of group targets at *k* = 70.

**Figure 14.** The structure estimation of group targets at *k* = 70.

The position state estimation of the Gibbs-GLMB filter, the OSPA distance, and the estimated number of targets are shown in Figures 15–17.

**Figure 15.** The state estimation by Gibbs Generalized Labeled Multi-Bernoulli (GLMB) filter.

The results show that the proposed method can effectively identify changes in the group target structure, but there are some problems, such as false alarms. Because the method relies on the Gibbs-GLMB state estimates, when a state estimation error occurs, a structure estimation error will also occur. In addition, a threshold set too large or too small will cause errors; the rational setting of the thresholds is one of the problems worth in-depth study.

**Figure 16.** The OSPA distance by Gibbs-GLMB filter.

**Figure 17.** The estimated number of targets by GLMB filter.

#### *4.2. Experiment 2*

#### 4.2.1. Configuration Parameters for Experiment 2

In this experiment, we selected only a single group target with four members, whose structure is shown in Figure 18:

**Figure 18.** The structure of the single group targets.

The distance between the parent and child targets in the initial state is 100 m, the same as in experiment 1. The adjacency matrix of the group $\{(x\_1,\ell\_1),\cdots,(x\_4,\ell\_4)\}$ is:

$$A\_1(\ell\_1, \dots, \ell\_4) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \tag{34}$$

The other parameters are basically the same as in experiment 1, except that there are no target births, deaths, or spawnings in this experiment. Because this experiment studies formation changes, we assume that the structure is stable and unchanged, which simplifies the modeling. In the model, we increase the distance between target *x*<sup>1</sup> and target *x*<sup>2</sup> in the group by 10 m per second between *k* = 30 s and *k* = 40 s. In other words, the formation of the group changes continuously during this period, and its track is shown in Figure 19:

**Figure 19.** The track of the single group targets.

#### 4.2.2. The Result of the Experiment 2

When a change in formation is detected, the current and previous positions of the group target are plotted immediately. The results show that all the preset change nodes can be detected successfully, and the detected positions from *k* = 30 to *k* = 41 are shown in Figure 20. The blue crosses represent the current target positions and the red circles represent the previous target positions.

**Figure 20.** The position of the target at *k* = 30 to *k* = 40.

The experiment was repeated ten times, and the detection accuracy of each run is shown in Table 1. From the data in Table 1, the average recognition accuracy over the ten experiments is 74.70%.


**Table 1.** The judgment precision of ten experiments and the average value of ten experiments.

#### **5. Conclusions**

In this paper, we analyze the structure and formation of group targets based on target position and velocity information, and propose a method to determine whether their state has changed. First, we use the Gibbs-GLMB filter to estimate the states of the group targets. Then, by analyzing and comparing the state estimates, we determine whether the structure or formation has changed. The structure decision is based mainly on the distance between targets, the velocity difference between two targets, and the angle between the velocity direction and the position vector. The formation decision is based mainly on the distance between targets and the offset angle of the position vector. Finally, experiments show that the proposed method can effectively identify structure and formation changes of group targets.
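The decision rules summarized above can be sketched as simple threshold tests on a parent-child pair. The function below is illustrative only: the threshold values and the exact combination of criteria are our assumptions, not the paper's parameters.

```python
import math

def structure_changed(p1, v1, p2, v2, d_max=150.0, dv_max=5.0, dang_max=0.3):
    """Illustrative structure-change test for a parent-child pair.

    p1, v1 / p2, v2 are 2-D position and velocity tuples of the parent
    and child. The thresholds d_max (m), dv_max (m/s), and dang_max (rad)
    are placeholder values, not those used in the paper.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(dx, dy)                      # distance between targets
    dv = math.hypot(v2[0] - v1[0], v2[1] - v1[1])  # velocity difference
    # angle between the child's velocity and the parent-to-child position vector
    dang = abs(math.atan2(v2[1], v2[0]) - math.atan2(dy, dx))
    return dist > d_max or dv > dv_max or dang > dang_max
```

A pair moving in lockstep 100 m apart would pass the test, while a pair drifting 500 m apart would trigger a structure change.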

**Author Contributions:** For conceptualization, W.L.; methodology, X.R.; software, Y.C.; validation, W.L. and X.R.; formal analysis, W.L. and X.R.; investigation, W.L.; writing—original draft preparation, W.L. and X.R.; writing—review and editing, W.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (61771177, 61333011) and the Natural Science Foundation for Young Scientists of Jiangsu Province (BK20160148).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Sea–Sky Line and Its Nearby Ships Detection Based on the Motion Attitude of Visible Light Sensors**

#### **Xiongfei Shan 1,2, Depeng Zhao 1,2, Mingyang Pan 1,2,\*, Deqiang Wang 1,2 and Lining Zhao 1,2**


Received: 10 August 2019; Accepted: 14 September 2019; Published: 16 September 2019

**Abstract:** In the maritime scene, visible light sensors installed on ships have difficulty accurately detecting the sea–sky line (SSL) and its nearby ships due to complex environments and six-degrees-of-freedom movement. To address this problem, this paper combines camera and inertial sensor data and proposes a novel maritime target detection algorithm based on camera motion attitude. The algorithm mainly includes three steps, namely, SSL estimation, SSL detection, and target saliency detection. Firstly, we constructed the camera motion attitude model by analyzing the camera's six-degrees-of-freedom motion at sea and estimated the candidate region (CR) of the SSL; then we applied the improved edge detection algorithm and the straight-line fitting algorithm to extract the optimal SSL in the CR. Finally, in the region of ship detection (ROSD), an improved visual saliency detection algorithm was applied to extract the target ships. In the experiment, we constructed an SSL and nearby-ship detection dataset that matches the camera's motion attitude data by shooting from a real ship, and verified the effectiveness of each model in the algorithm through comparative experiments. Experimental results show that, compared with other maritime target detection algorithms, the proposed algorithm achieves higher detection accuracy in the detection of the SSL and its nearby ships, and provides reliable technical support for the visual development of unmanned ships.

**Keywords:** SSL; six-degrees-of-freedom motion; motion attitude model; edge detection; straight-line fitting; visual saliency

#### **1. Introduction**

In recent years, with the continuous development of artificial intelligence (AI), big data, and communication technology, unmanned driving technology has made breakthrough achievements. Unmanned aerial vehicles (UAVs) have gradually entered the civil field from the military field, and unmanned ground vehicles (UGVs) are continually being tested on public roads around the world. The research on unmanned ships is also developing rapidly. Major research institutions at home and abroad are investing a large amount of manpower, material resources, and financial resources to carry out theoretical research, technology research, and development of large-tonnage unmanned merchant ships. The key technologies of unmanned ships mainly include situational awareness, intelligent decision-making, motion control, maritime communication, and shore-based remote control, and situational awareness is the premise of all other technologies. Advanced sensors are used to obtain the situation information around unmanned ships, provide basic data support for complex tasks such as intelligent decision-making and motion control, and ensure the autonomous operation safety of unmanned ships [1].

Currently, ships perceive the maritime environment mainly through two kinds of sensors, namely, radio detection and ranging (RADAR) and the automatic identification system (AIS). They transmit the target information to the electronic chart display and information system (ECDIS), which realizes a certain degree of intelligent analysis and decision-making. However, the maritime navigation environment is complex and variable. RADAR and AIS cannot directly reflect the spatial information of detection targets. The situational awareness cannot be established quickly, and mariners need to confirm the situation. At the same time, RADAR detection is sensitive to meteorological conditions and the shape, size, and material of the target. AIS cannot effectively detect small targets that are not equipped with it or do not have it turned on. Visible light sensors are intuitive, reliable, informative, and cost-effective [2]. With the continuous development of computer vision technology, visible light cameras as important situational awareness sensors are gradually being applied to unmanned ships, providing a reliable source of information for intelligent decision-making.

The main targets for maritime detection using cameras include ships, rigs, navigation aids, and icebergs. When maritime targets appear in the field of view of the camera, they must appear in the vicinity of the sea–sky line (SSL). As the distance between the camera and the target decreases, the target gradually enters the sea area. It can be seen that extracting the SSL and performing maritime target detection in its vicinity can greatly reduce the target detection range and reduce the complexity and calculation amount of the algorithm. However, a target near the SSL has a very small area in the image, usually only a few tens or hundreds of pixels, which is easily overwhelmed by the complex sea–sky background, resulting in missed detection or false detection [3]. Therefore, this paper proposes an algorithm based on the motion attitude model of a visible light camera for the SSL and its nearby ships.

#### **2. Related Work**

In general, maritime target detection technology mainly includes three steps, namely, SSL detection, background removal, and foreground segmentation. Based on the research status at home and abroad in recent years, this paper briefly reviews the three algorithms and proposes the main technical framework.

#### *2.1. SSL Detection*

The SSL is an important feature of maritime images, and there is much related research, which is mainly divided into two categories. The first category is based on the combination of edge detection and straight-line fitting. The image is processed by the edge detection operator, and then the high gradient edge pixels are straight-line fitted or projected. Liu [4] proposed an SSL detection algorithm based on inertial measurement and Hough transform fusion. The inertial data of the shipboard camera are used to estimate the position of the SSL in successive frames, then the Canny operator and Hough transform are used to realize SSL extraction in the detection region. Wang Bo et al. [5] proposed an SSL detection algorithm based on gradient saliency and region growth. The gradient saliency calculation effectively improves the characteristics of the SSL and suppresses the influence of complex sea conditions such as clouds and sea clutter. Kim et al. [6] proposed an algorithm for estimating the position of the SSL by camera pose and fitting it using random sample consensus (RANSAC). Fefilatyev et al. [7] used the combination of Gaussian distribution and Hough transform to select the optimal SSL from five candidate SSLs. Santhalia et al. [8] proposed a Sobel operator edge detection algorithm based on eight directions, which effectively eliminates edge noise and has small computational complexity and strong stability. The second category is based on the method of image segmentation, which extracts the upper part of the SSL by threshold processing or background modeling. Dai et al. [9] proposed an edge detection algorithm based on local Otsu segmentation, which solves the problem of poor global threshold segmentation. Zhang et al. [10] proposed an SSL extraction algorithm based on Discrete Cosine Transform (DCT) coefficients. The image is segmented into 8 × 8 non-overlapping blocks, and the DCT coefficients in each block are calculated to segment the sky and sea areas. Zeng et al. [11] extracted the contour edges using the improved Canny operator with surrounding texture suppression, and then used a Hough transform voting step to finely detect the horizontal or oblique SSL. Nasim et al. [12] proposed a K-means algorithm to segment the sea scene into clusters and extract the SSL by analyzing the image segments. The above algorithms have achieved good detection results in their respective experiments, but the first category cannot balance SSL edge extraction and wave edge suppression in the gradient-based edge extraction process, and the second category cannot reliably obtain the SSL, being limited by the image segmentation accuracy.

#### *2.2. Background Removal*

In the maritime scene, we usually segment the sea–sky background by simulating the color, texture, saturation, and other features, and subtract it from the original image. Kim et al. [13] used improved mean difference filtering to improve the target signal-to-noise ratio while processing infrared images, and averaged the sea–sky background to remove sea clutter interference. However, this method only worked well for structural clutter similar to the SSL, and had a poor effect on sea surface interference with strong light reflection. Zeng et al. [14] used the surrounding texture filter instead of the mean-shift filter to improve the mean-shift image segmentation algorithm, and controlled the filter parameters to perform fast region clustering to remove the sea–sky background. However, the texture filter parameters and clustering parameters needed to be manually set, which required certain prior knowledge. In addition, a technique based on the visual saliency model is gradually being applied to maritime target detection. It simulates human visual features through intelligent algorithms, suppresses the sea–sky background, and extracts visually salient regions in the image, that is, regions of human interest. Fang et al. [15] applied the theory of color space and wavelet transform to extract the low frequency, high frequency, hue, saturation, and brightness characteristics of the task water image. The visual attention operator was used to fuse various features, effectively overcoming background disturbances such as waves, wakes, and onshore buildings. Lou et al. [16] solved the small target detection problem in color images from the two aspects of stability and saliency. By multiplying the stability and saliency maps pixel-wise, the noise interference in the background was eliminated. Agrafiotis et al. [17] designed a maritime tracking system by combining a visual saliency model with a Gaussian mixture model (GMM) and used an adaptive online neural network tracker to further refine the tracking results. Liu et al. [18] achieved further enhancement of the visual saliency model through a two-scale detection scheme. On a larger scale, the sea surface was removed by the mean-shift filter. On a smaller scale, the target was coarsely extracted from the edges of the salient region, and then fine processing of the chroma component was used to select the output target. The above algorithms achieve background removal by reducing the noise of the sea–sky background and enhancing the salient features of the region of interest, but when there is strong cloud, wave, or ship wake disturbance in the sea–sky background, its saliency is the same as or higher than the target's, which causes the visual saliency algorithm to produce large errors. In addition to the above documents, Ebadi et al. [19] proposed a modified approximated robust PCA algorithm that can handle moving cameras and takes advantage of the block sparse structure of the pixels corresponding to moving objects.

#### *2.3. Foreground Segmentation*

After the image background is removed, we can apply morphological processing to obtain the maritime target. Westall et al. [20] applied improved morphological processing of close-minus-open (CMO) techniques to enhance target detection. Fefilatyev [21] used the Otsu algorithm to obtain global thresholds in the region above the SSL, and used the global threshold to segment the target vessel. Although features such as edges and contours are widely used in target ship detection, it is still difficult to achieve ideal results in complex sea–sky backgrounds with the above algorithms. Besides, Kumar et al. [22] and Selvi et al. [23] made full use of the target ship's color, texture, shape, and other information, and used the support vector machine to classify the target. Frost et al. [24] also applied prior knowledge of ship shape to the level set segmentation algorithm to improve ship detection results. Loomans et al. [25] integrated a multi-scale Histogram of Oriented Gradients (HOG) detector and a hierarchical Kanade-Lucas-Tomasi (KLT) feature point tracker to track ships in the port, and achieved better detection and tracking effects. The above algorithms are not based on background subtraction, but on manually set ship characteristics for target detection. With the continuous development of deep learning technology, the feature extraction algorithm based on convolutional neural networks is gradually dominating image classification, detection, segmentation, etc., and is gradually being applied to the field of maritime target detection. Ren et al. [26] proposed an improved Faster R-CNN system to detect small target ships in remote sensing images. Statistical algorithms were used to screen the appropriate anchors, and the detection techniques were greatly improved by using skip links and texture information. Yang et al. [27] designed a rotational density pyramid network model to extract the ship's direction while accurately detecting the target ship. Zhang et al. [28] proposed a scheme combining saliency detection and convolutional neural networks to accurately detect ships in remote sensing images of different poses, scales, and shapes. In addition to the above documents, Biondi [29] presented a complete procedure for the automatic estimation of maritime target motion parameters by evaluating the generated Kelvin waves detected in synthetic aperture radar (SAR) images. Graziano et al. [30] proposed a novel technique using X-band Synthetic Aperture Radar images provided by COSMO/SkyMed and TerraSAR-X for ship wake detection. Biondi et al. [31] proposed a new approach where the micro-motion of ships occupying thousands of pixels was estimated by processing the information given by sub-pixel tracking generated during the co-registration process of several re-synthesized time-domain and overlapped sub-apertures.

In summary, the above algorithms have achieved good application results in their respective research fields, but it is still difficult to achieve high detection accuracy for the SSL and its nearby ships in the complex sea–sky background. For the above problems, we propose the technical framework of this paper, which mainly includes two technologies, as shown in Figure 1. First, SSL detection. After the camera and the inertial sensor acquire the data synchronously, we pass the inertial data to the camera motion attitude model to obtain the image candidate region (CR) position, then cut the CR from the original image, and only perform edge detection and Hough transform in the CR to extract the optimal SSL. Finally, the CR with the optimal SSL is stitched back to the original image. Second, according to the optimal SSL position, we cut the region of ship detection (ROSD) of the image, and then only perform saliency detection and foreground segmentation on the ROSD, and finally stitch the detection result back to the original image.

**Figure 1.** Technical framework of this paper. Roll angle (γ) and pitch angle (β) represent inertial data, (*l, w*) represents the size of the original image. [LC\_CR, H\_CR] represents the candidate (CR) of the image, where LC\_CR represents the upper left corner of the CR, and H\_CR represents the height of the CR. [LC\_ROSD, H\_ROSD] represents the region of ship detection (ROSD) of the image, where LC\_ROSD represents the upper left corner of the ROSD, and H\_ROSD represents the height of the ROSD.

In the remainder of this paper, we describe the camera motion attitude model in Section 3, the SSL detection model in Section 4, and the visual saliency detection model in Section 5. We introduce the dataset used in the experiment and compare experiments with other algorithms in Section 6. Finally, in Section 7, we summarize and draw conclusions.

#### **3. Camera Six-Degrees-of-Freedom Motion Attitude Modeling**

In navigation, the ship sails along a great circle at sea; a tester with an eye height of *h* sees the farthest sea and the sky intersect in a circle, which is called the tester's visible horizon, that is, the SSL. In ship vision, we use cameras instead of human eyes for sea target detection and identification. Assuming that the camera is installed at a height *h* above sea level, the geometric relationship can be obtained considering the curvature of the earth and the difference of atmosphere refraction, as shown in Figure 2. The circle *MN* represents the SSL and the blue triangle represents the camera. Before using it, we finished camera calibration and distortion correction [32]. Therefore, in this analysis, we suppose the optical axis of the camera is parallel to the horizontal plane, which is called the initial state of the camera motion. The point *O* is the camera center, the point *K* is the projection of the point *O* onto the sea level, *r* represents the radius of the earth, δ represents the central angle of the arc, ε represents the difference of atmosphere refraction, which in navigation is taken as (1/13)δ, and the straight line *OM* represents the actual distance from the camera to the SSL instead of the arc *KM*, which is denoted by *De*. In the triangle ΔOKM, since both (δ/2) and (δ/2 − ε) are small angles, we can approximate cos(δ/2) ≈ 1 and sin(δ/2 − ε) ≈ δ/2 − ε, and *De* can be obtained by Equation (1). Since 1 nautical mile represents 1852 m in navigation, it can be inferred that *r* is 6,366,707 m. The position angle θ of the SSL in the camera can be obtained by Equation (2).

$$D\_e = \frac{h \cos(\delta/2)}{\sin(\delta/2 - \varepsilon)} \approx \sqrt{\frac{26\,r\,h}{11}} = 3871 \sqrt{h} \tag{1}$$

$$\theta = \pi/2 - \varepsilon - \angle KOM = \frac{11 D\_e}{13 r} = 0.0295 \sqrt{h} \tag{2}$$
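Equations (1) and (2) can be evaluated directly. A small sketch; the function names are ours, the constants 3871 and 0.0295 are taken from the text, *h* and *De* are in metres, and θ comes out in degrees:

```python
import math

def horizon_distance_m(h):
    """Distance from the camera to the SSL, Equation (1): De ≈ 3871·sqrt(h) metres."""
    return 3871.0 * math.sqrt(h)

def dip_angle_deg(h):
    """Position (dip) angle of the SSL, Equation (2): θ ≈ 0.0295·sqrt(h) degrees."""
    return 0.0295 * math.sqrt(h)
```

For the camera height *h* = 20 m used later in the paper, this gives De ≈ 17.3 km and θ ≈ 0.13°.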

**Figure 2.** Geometric relationship between the big circle and the camera position.

In order to simplify the projection relationship of the camera, we assume the sea level as a plane, while ignoring the relative motion of the camera and the ship, so that the camera coordinate system coincides with the ship's motion coordinate system. Next, we model the camera's six-degrees-of-freedom motion and the SSL position according to the coordinate system projection method [33].

#### *3.1. Influence of Camera Swaying, Surging, and Yawing Motions on the Position of the SSL*

Under the condition of maintaining the initial state, the height *h* of the camera remains unchanged when the camera only performs the swaying, surging, and yawing motions. It can be known from Equation (1) that θ is only related to *h*, so the camera swaying, surging, and yawing motions have no effect on the position of the SSL on the imaging plane.

#### *3.2. Influence of Camera Heaving and Pitching Motions on the Position of the SSL*

Under the condition of maintaining the initial state, we assume the sea level as a plane according to Figure 2, and obtain the geometric relationship as shown in Figure 3a. The triangle represents the camera. In the imaging plane of the camera, the line *js* represents the sky area, the line *sg* represents the sea area, the point *s* represents the projection point of the SSL, and the point *i* represents the center point, which is also taken as the origin (0, 0) of the image coordinate system. Assuming that the pitch angle of the camera is represented by β, the camera's vertical viewing angle is represented by 2α, and the longitudinal width of the imaging plane of the camera is *w*, the position *zs* of the SSL in the image can be obtained by:

$$z\_s = w \left( -\theta \right) / 2\alpha \tag{3}$$

**Figure 3.** Geometric relationship between the sea–sky line (SSL) position and the camera position. (**a**) Geometric relationship under the initial state. (**b**) Geometric relationship under the camera heaving motion.

#### 3.2.1. Influence of Camera Heaving Motion

Under the condition of maintaining the initial state, when the camera only performs the heaving motion, as shown in Figure 3b, it is assumed that the heaving height is *h'* and the point *O'* represents the new position of the camera center. According to Equation (1), the position angle θ' and the position *zh'* can be obtained by:

$$\theta' = 0.0295 \sqrt{h + h'} \tag{4}$$

$$z\_{h'} = w \left( -\theta' \right) / 2\alpha \tag{5}$$

#### 3.2.2. Influence of Camera Pitching Motion

Under the condition of maintaining the initial state, when the camera only performs the pitching motion, as shown in Figure 4, clockwise rotation of the pitch angle β is taken as positive and counterclockwise rotation as negative. Under clockwise rotation, when 0 < β < θ, the SSL is located below the center line of the imaging plane and gradually approaches it as β increases. When β = θ, the SSL lies on the center line of the imaging plane. When θ < β < θ + α, the SSL is located above the center line; as β increases, it gradually moves away from the center line toward the top of the image. When β > θ + α, the SSL is not in the imaging plane, and only the sea area can be seen in the image. Under counterclockwise rotation, when θ − α < β < 0, the SSL is located below the center line of the imaging plane, and as |β| increases, it gradually moves away from the center line toward the bottom of the image. When β < θ − α, the SSL is not in the imaging plane, and only the sky area can be seen in the image. According to the above analysis, the position of the SSL after the pitching motion can be obtained by:

$$z\_{\beta} = \begin{cases} w \left( \beta - \theta \right) / 2\alpha & \text{if } \theta - \alpha < \beta < \theta + \alpha \\ \text{invalid} & \text{else} \end{cases} \tag{6}$$
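Equation (6) is a small piecewise function. The sketch below returns `None` for the "invalid" branch and assumes all angles (β, θ, α) share one unit, e.g. degrees; the function name is ours:

```python
def ssl_position(beta, theta, alpha, w):
    """SSL row coordinate under pitch angle beta, Equation (6).

    beta: pitch angle, theta: dip angle, alpha: half the vertical view
    angle (all in the same unit); w: image height in pixels.
    Returns None when the SSL leaves the imaging plane.
    """
    if theta - alpha < beta < theta + alpha:
        return w * (beta - theta) / (2 * alpha)
    # Only sea (beta too large) or only sky (beta too small) is visible.
    return None
```

With the paper's later values (θ ≈ 0.132°, α = 3.7°, *w* = 964 pixels), a level camera (β = 0) places the SSL about 17 pixels below the image center line.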

**Figure 4.** Geometric relationship between the SSL position and camera pitching motion. (**a**) Clockwise rotation. (**b**) Counterclockwise rotation.

#### *3.3. Influence of Camera Rolling Motion on the Position of the SSL*

Under the condition of maintaining the initial state, when the camera only performs the rolling motion, as shown in Figure 5, it is assumed that the rolling angle γ is rotated clockwise (the counterclockwise case is analogous), *x'z'* is the new image coordinate system, and the SSL intersects the *z'* axis at *s'*. So, the SSL can be expressed by:

$$y = x \tan \gamma + z\_s / \cos \gamma \tag{7}$$

**Figure 5.** Geometric relationship between the SSL position and camera rolling motion.

Comprehensive analysis of the relationship between the camera's six-degrees-of-freedom motion and the SSL shows that when the camera performs the swaying, surging, and yawing motions, the SSL does not change in the image coordinate system. However, when the camera performs the heaving and pitching motions, the SSL translates up and down in the image coordinate system, and when the ship performs the rolling motion, the SSL rotates in the image coordinate system. Equations (1)–(7) can be combined to obtain the estimation equation of the SSL in the image coordinate system, as shown in Equation (8), where the range of β is θ − α < β < θ + α. It can be seen from Equation (8) that the height change Δ*h* generated by the camera's heaving motion has little influence on the position of the SSL in the image, and it is also much smaller than the installation height of the camera; so, Equation (8) can be simplified to obtain the final SSL estimation equation, as shown in Equation (9).

$$y = x \tan \gamma + \frac{w}{2\alpha} \left( -0.0295 \sqrt{h + \Delta h} \, (1/\cos \gamma) + \left( \beta - 0.0295 \sqrt{h} \right) \right) \tag{8}$$

$$y = x \tan \gamma + \frac{w}{2\alpha} \left( \beta - 0.0295 \sqrt{h} \, (1/\cos \gamma + 1) \right) \tag{9}$$
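Equation (9) gives the estimated SSL as a straight line in image coordinates. A sketch, assuming β and α are in degrees (so the ratio in the second term is unit-consistent with θ = 0.0295√h in degrees) while tan γ and cos γ take γ in radians; the function name is ours:

```python
import math

def estimate_ssl(x, beta_deg, gamma_deg, h, w, alpha_deg):
    """Estimated SSL line y(x) in image coordinates, Equation (9).

    beta_deg: pitch angle, gamma_deg: roll angle (degrees);
    h: camera installation height (m); w: image height (pixels);
    alpha_deg: half the camera's vertical view angle (degrees).
    """
    gamma = math.radians(gamma_deg)
    offset = w / (2 * alpha_deg) * (
        beta_deg - 0.0295 * math.sqrt(h) * (1 / math.cos(gamma) + 1)
    )
    return x * math.tan(gamma) + offset
```

With zero roll and pitch, the line is horizontal and sits below the image center line by an amount set by the camera height.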

#### **4. Edge Detection and Hough Transform Algorithm for the Detection of the SSL**

#### *4.1. Estimating the CR of the SSL*

In order to estimate the CR of the SSL, we need to change from the camera coordinate system to the image coordinate system; that is, the coordinate origin moves from the center point to the upper left corner. Then, we begin to explore the relationship between the pixel points on the image and the actual distance at sea. First, we find the position of the SSL on the image in the current coordinate system, as shown in Equation (10), where *zi* represents the pixel points on the image. Then, through Equations (1) and (10), we can obtain the relationship between *zi* and the actual distance at sea, as shown in Equation (11). Assuming *h* = 20 m, the camera parameters are *w* = 964 pixels and α = 3.7◦, and the relationship between *D* and *zi* can be obtained as shown in Figure 6, where horizontal and vertical coordinates represent *D* and *zi*, respectively. Since we only want to show the relationship between the SSL and the sea area, the value of the ordinate is from the center of the image to the bottom, so the range is [482, 964]. From Figure 6, it can be seen that the closer to the SSL, the larger the actual distance represented by each pixel. A distance of 2.55 nm or beyond from the camera can be represented by 30 pixels on the image.

**Figure 6.** Relationship between the pixel points on the image and the actual distance at sea.

Considering that the SSL is usually a straight line that runs through the entire image and generally has a certain angle of inclination, we use a rectangle to describe the SSL. The upper left corner and the height of the rectangle are represented by *LC* and *H*, respectively. In order to reduce the estimation error, we add a yellow area with a height of 30 pixels to the upper and lower sides of the rectangular area as the CR of the SSL. The parameter values can be obtained by Equation (12), as shown in Figure 7a.

$$z\_i = w \left(\alpha + \theta\right) / 2\alpha \tag{10}$$

$$D = \frac{h \cos \left( 2.36 \, (2 \alpha z\_i / w - \alpha) \right)}{\sin (2 \alpha z\_i / w - \alpha)} \tag{11}$$

$$\mathrm{CR} = \left[ \, LC - (0,\, 30), \; H + 60 \, \right] \tag{12}$$
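Under the reading of Equation (12) that the 30-pixel pad is added above and below the estimated rectangle (Figure 7a), the CR can be computed as follows; the tuple layout of *LC* as (column, row) and the function name are our assumptions:

```python
def candidate_region(lc, h_rect, pad=30):
    """Expand the estimated SSL rectangle into the CR, Equation (12).

    lc = (col, row) of the rectangle's upper-left corner, h_rect its
    height in pixels; the CR adds `pad` pixels above and below.
    """
    lc_cr = (lc[0], lc[1] - pad)   # move the top edge up by `pad` rows
    h_cr = h_rect + 2 * pad        # grow the height by `pad` on each side
    return lc_cr, h_cr
```

For example, a 20-pixel-high rectangle with its top at row 450 becomes an 80-pixel-high CR starting at row 420.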

**Figure 7.** Figures of each stage in the SSL extraction algorithm. In (**a**), the camera parameters are *h* = 20 m, α = 3.7◦, *l* = 1288 pixels, *w* = 964 pixels; the inertial sensor parameters are β = 0 ◦ , γ = 5.0◦ ; and the green dotted line represents the estimated SSL obtained by the camera motion attitude model. In (**b**), *N* = 16. In (**c**), the clustering parameter is 5. In (**d**), ω = 0.4.

#### *4.2. Edge Detection in the CR*

After acquiring the CR, it is only necessary to process the image edges in the region, which can effectively reduce the calculation amount of the image processing work. In this paper, a novel edge detection algorithm based on the local Otsu segmentation is designed in the CR. The specific algorithm is shown as follows:


3. Edge extraction. For the binary image of the CR, we check the position of the pixel mutation in the vertical direction line-by-line, and the position of the pixel mutation is the edge of the binary image, as shown in Equation (13), where *Iedge*(·) represents the edge image, *Ibm*(·) represents the binary image, *wi* represents the position along any line of pixels, and ⊗ represents the morphological XOR operation. The edge extraction effect is shown in Figure 7b.

$$I\_{edge}(w\_i, l) = I\_{bm}(w\_i, l) \otimes I\_{bm}(w\_{i+1}, l) \tag{13}$$
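Equation (13) amounts to XOR-ing vertically adjacent rows of the binarised CR: a pixel is marked as an edge where the binary value flips between one row and the next. A sketch with NumPy; the function name is ours:

```python
import numpy as np

def edge_from_binary(binary):
    """Edge extraction of Equation (13): XOR of each pixel with the pixel
    directly below it in the binarised CR marks where the value changes."""
    bm = binary.astype(bool)
    edge = np.zeros_like(bm)
    edge[:-1, :] = np.logical_xor(bm[:-1, :], bm[1:, :])  # rows w_i vs w_{i+1}
    return edge.astype(np.uint8)
```

A column that switches from sea (0) to sky (1) produces exactly one edge pixel at the transition.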

#### *4.3. Identifying the Optimal SSL with Improved Hough Transform*

The Hough transform is used to display the edge detection result in the accumulator. As shown in Figure 8, the horizontal coordinate represents the polar angle (θ) and the vertical coordinate represents the polar diameter (ρ), and each curve represents a point in the edge image. The brighter the point, the greater the number of curves (represented by τ) that pass through it, indicating that more points are collinear in the edge image. In this paper, we optimize the SSL extraction algorithm by combining the prediction results calculated by the Hough transform according to the SSL length with the measurement results provided by the inertial sensor according to the SSL angle. The specific algorithm is as follows:


$$J\_{\min} = \omega \left( \tau - (l \,/\, \cos \gamma) \right)^2 + (1 - \omega)(q - \gamma)^2 \tag{14}$$

$$\mathbf{x}^\* = \frac{\mathbf{x} - \min}{\max - \min} \tag{15}$$
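Equations (14) and (15) can be sketched as follows. For brevity the cost below is applied to raw (unnormalised) quantities, whereas the paper first normalises them with Equation (15); the function names and the candidate tuple layout are our assumptions:

```python
import math

def normalize(values):
    """Min-max normalization of Equation (15)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def best_ssl(candidates, l, gamma, omega=0.4):
    """Pick the candidate SSL minimising the cost of Equation (14).

    Each candidate is (tau, q): the accumulator count and polar angle of
    a candidate line. tau is compared with the expected SSL length
    l / cos(gamma), and q with the roll angle gamma from the inertial sensor.
    """
    g = math.radians(gamma)
    costs = [omega * (tau - l / math.cos(g)) ** 2 + (1 - omega) * (q - gamma) ** 2
             for tau, q in candidates]
    return min(range(len(candidates)), key=costs.__getitem__)
```

A candidate whose length matches l/cos γ and whose angle matches the sensed roll wins over a short or badly tilted one.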

Assuming *l* = 1288 pixels, γ = −5.0◦, and ω = 0.4, the relevant parameters of the cost function are shown in Table 2. From these, SSL-1 is the optimal SSL, and it is displayed in the original image as shown in Figure 7d.

**Figure 8.** Hough space of the edge detection image.


**Table 2.** Hough spatial parameters of the candidate SSLs ranked in the top five.



#### **5. Visual Saliency Detection in the ROSD of the SSL**

After obtaining the optimal SSL, we add 30 pixels to the rectangle where the optimal SSL is located, cut it, and define it as the ROSD. In the ROSD, the influence of clouds and sea clutter is small. The long-distance ship is mainly near the SSL, and the sea–sky background is relatively uniform and connected with the boundary part of the area. According to the characteristics of the ROSD, we use the fast minimum barrier distance (FMBD) [35] to measure the connectivity of the pixel and the region boundary. The algorithm operates directly on the original pixel, and does not have to acquire the superpixel of the image through the region abstraction [36–39], which improves the detection performance of the saliency map.

The FMBD algorithm mainly consists of three steps: obtaining the minimum barrier distance (MBD) map, computing the backgroundness cue, and post-processing. We use the same approach as FMBD in the first two steps, but make appropriate improvements in the post-processing step. The specific algorithm is as follows:

Firstly, we convert the color space of the ROSD from RGB to Lab to better simulate human visual perception. In each channel, we select a one-pixel-wide row or column along the upper, lower, left, and right boundaries of the ROSD as the seed set *S*. Then, the FMBD algorithm is used to calculate the path cost of each pixel in the ROSD to the set *S*, as shown in Equation (16), where π denotes a path from a pixel to the set *S* and π(*i*) its *i*-th point, I(·) represents the pixel value of a point, and the cost function β*I*(π) represents the difference between the highest and the lowest pixel value on the path. In this paper, we consider the four paths adjacent to each pixel.

$$\beta\_I(\pi) = \max\_{i \in [0,k]} I(\pi(i)) - \min\_{i \in [0,k]} I(\pi(i)) \tag{16}$$

We scan the ROSD area three times: a raster scan, an inverse raster scan, and a raster scan again. In each scan, half of the four neighborhoods of each pixel are used; for the raster scan, these are the upper and left neighboring pixels. The path minimization operation is shown in Equation (17), where P(*y*) represents the path currently assigned to pixel *y*, ⟨*y*, *x*⟩ represents the edge from pixel *y* to pixel *x*, and P(*y*)·⟨*y*, *x*⟩ represents the path to *x* obtained by appending this edge, with direction from *y* to *x*. Setting P*y*(*x*) = P(*y*)·⟨*y*, *x*⟩, we obtain Equation (18), where U(*y*) and L(*y*) are the maximum and minimum pixel values on the path P(*y*), respectively.

$$D(\mathbf{x}) = \min \left\{ \begin{array}{l} D(\mathbf{x}) \\ \beta\_I(P(\mathbf{y}) \cdot \langle \mathbf{y}, \mathbf{x} \rangle) \end{array} \right. \tag{17}$$

$$\beta\_I(P\_y(\mathbf{x})) = \max\{U(y), I(\mathbf{x})\} - \min\{L(y), I(\mathbf{x})\} \tag{18}$$

In summary, when a pixel lies in the region of a salient target, its value is close to the maximum pixel value on each path, and the cost function is relatively large; when a pixel lies in the background area, its value is close to the minimum pixel value on each path, and the cost function is relatively small. Thereby, the target area is brightened and the background area darkened, completing the target saliency detection.
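A minimal sketch of one scan pass of Equations (17) and (18) might look as follows. This is our own illustration, not the original implementation: the arrays `D`, `U`, and `L` hold the current barrier distance, path maximum, and path minimum per pixel, initialized so that the border pixels form the seed set *S*.

```python
import numpy as np

def mbd_raster_pass(img, D, U, L, reverse=False):
    """One raster (or inverse raster) pass of the MBD update.

    For each pixel x, try extending the path of its upper/left
    (or lower/right) neighbour y and keep the cheaper barrier cost:
    Eq. (17) takes the minimum, Eq. (18) gives the cost of the
    extended path P_y(x)."""
    h, w = img.shape
    ys = range(h - 1, -1, -1) if reverse else range(h)
    xs = range(w - 1, -1, -1) if reverse else range(w)
    nbrs = [(1, 0), (0, 1)] if reverse else [(-1, 0), (0, -1)]
    for r in ys:
        for c in xs:
            for dr, dc in nbrs:
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or np.isinf(D[rr, cc]):
                    continue  # neighbour outside image or not yet reached
                u = max(U[rr, cc], img[r, c])   # Eq. (18): new path maximum
                l = min(L[rr, cc], img[r, c])   # Eq. (18): new path minimum
                if u - l < D[r, c]:             # Eq. (17): keep the cheaper path
                    D[r, c], U[r, c], L[r, c] = u - l, u, l
    return D

# Tiny example: a bright pixel on a uniform background.
img = np.array([[3, 3, 3], [3, 9, 3], [3, 3, 3]], dtype=float)
D = np.full(img.shape, np.inf)
D[0, :] = D[-1, :] = D[:, 0] = D[:, -1] = 0.0   # border pixels = seed set S
U, L = img.copy(), img.copy()
mbd_raster_pass(img, D, U, L)                   # forward pass
mbd_raster_pass(img, D, U, L, reverse=True)     # backward pass
```

The bright center pixel ends with a large barrier distance (here 9 − 3 = 6) while the background border stays at zero, which is exactly the brightening/darkening behavior described above.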

Secondly, after obtaining the FMBD distance maps accumulated in the three color channels, we apply the backgroundness cue of the ROSD region to enhance the brightness of the saliency map. In the ROSD, the boundary of the image is the sea–sky background. According to this feature, we first select 10% of the area in the upper, lower, left, and right directions of the ROSD as the boundary parts, and then calculate the Mahalanobis distance of the color mean between all the pixels and the four boundary areas. Finally, the maximum of the four boundary distances is subtracted from their sum to obtain a boundary contrast map. Therefore, we can exclude the case where a boundary region may contain part of the foreground, as shown in Equation (19), where x̄ and **Q** represent the color mean and covariance of each boundary part, respectively.

$$\begin{cases} u\_k^{ij} = \sqrt{\left(\mathbf{x}\_k^{ij} - \overline{\mathbf{x}}\right) \mathbf{Q}^{-1} \left(\mathbf{x}\_k^{ij} - \overline{\mathbf{x}}\right)^T} \\ u^{ij} = \sum\_{k=1}^4 u\_k^{ij} - \max\_k u\_k^{ij} \end{cases} \tag{19}$$
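The backgroundness cue of Equation (19) can be sketched as follows, assuming the ROSD is given as a Lab image array. The helper name and the small covariance regularization term are our own additions, not part of the original method.

```python
import numpy as np

def backgroundness(lab, border_frac=0.1):
    """Boundary-contrast map of Eq. (19): Mahalanobis distance of each
    pixel's colour to the mean/covariance of the four 10% border strips,
    summed over the strips with the largest strip distance subtracted."""
    h, w, _ = lab.shape
    mh = max(1, int(round(h * border_frac)))
    mw = max(1, int(round(w * border_frac)))
    strips = [lab[:mh], lab[-mh:], lab[:, :mw], lab[:, -mw:]]
    dists = []
    for s in strips:
        px = s.reshape(-1, 3)
        mean = px.mean(axis=0)
        cov = np.cov(px, rowvar=False) + 1e-6 * np.eye(3)  # regularized
        inv = np.linalg.inv(cov)
        d = lab.reshape(-1, 3) - mean
        # Mahalanobis distance of every pixel to this strip (u_k in Eq. 19)
        dists.append(np.sqrt(np.einsum('ij,jk,ik->i', d, inv, d)).reshape(h, w))
    dists = np.stack(dists)                       # u_k, k = 1..4
    return dists.sum(axis=0) - dists.max(axis=0)  # Eq. (19): sum minus max
```

Subtracting the largest strip distance means a pixel is judged by its three most background-like boundaries, so one boundary strip accidentally containing foreground does not inflate the map.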

Finally, in the post-processing section, the three processing techniques of the original article do not suit ship detection near the SSL, so we make appropriate improvements. For the first technique, we replace the previous morphological filtering with morphological reconstruction with an opening operation. Specifically, we use the structural element *b* to erode the saliency map *F* n times to obtain the erosion map *F*′, then use *b* to dilate *F*′. Next, we take the pointwise minimum of the dilation map and the original map *F*, and iterate this process until *F*′ no longer changes. The result of our processing is given by Equation (20), where ⊕ and ⊖ represent the dilation and erosion operations in morphology, respectively. For the second technique, the original processing applies image enhancement to the middle of the image, but this easily ignores small targets around it, so this paper removes this technique directly. The third technique is consistent with the original article; the sigmoid function is used to increase the contrast between the target and the background region, as shown in Equation (21), where parameter *a* controls the contrast level between target and background.

$$\begin{cases} F' = F \ominus nb \\ O\_R^{(n)}(F) = \min\{(F' \oplus b),\, F\} \end{cases} \tag{20}$$

$$f(\mathbf{x}) = \frac{1}{1 + e^{-a(\mathbf{x} - 0.5)}} \tag{21}$$
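The improved post-processing of Equations (20) and (21) can be sketched with SciPy's grayscale morphology. This is a minimal sketch under our own assumptions: the function names are hypothetical, and the 3 × 3 square structuring element stands in for *b*, which the paper does not specify here.

```python
import numpy as np
from scipy import ndimage

def reconstruct_open(F, n=8, size=3):
    """Morphological reconstruction with opening (Eq. 20): erode the
    saliency map F n times, then iterate geodesic dilation clipped
    under F until the result no longer changes."""
    b = np.ones((size, size))
    Fp = F.copy()
    for _ in range(n):                 # F' = F eroded n times by b
        Fp = ndimage.grey_erosion(Fp, footprint=b)
    while True:                        # iterate min(F' dilated by b, F)
        nxt = np.minimum(ndimage.grey_dilation(Fp, footprint=b), F)
        if np.array_equal(nxt, Fp):
            return Fp
        Fp = nxt

def sigmoid_contrast(F, a=10.0):
    """Eq. (21): stretch the contrast between target and background."""
    return 1.0 / (1.0 + np.exp(-a * (F - 0.5)))
```

A large, solid target survives the n erosions and is restored to its exact shape by the reconstruction, while small bright speckles that vanish under the erosions stay suppressed, smoothing the sea–sky background.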

The saliency feature map obtained by the proposed algorithm has the following characteristics: the target part is highlighted, the background part is darkened, and the contrast is obvious. We select an appropriate threshold to binarize the saliency map, and use an area threshold to extract the final target ships, eliminating trivial small-area interference. The processing of target detection is shown in Figure 9.

**Figure 9.** A processing case for target detection. (**a**) is obtained by fusing the average values of the MBD maps of the three channels L, a, and b. In (**b**), each red box represents 10% of the image area in the four directions of up, down, left, and right, and ⊕ represents the average of the four images after adding. (**d**,**e**) represent post-processing, where *n* = 8 and *a* = 10. (**f**,**g**) represent foreground segmentation, where the binarization threshold is 5 times the average intensity of (**e**), and the area threshold is 100 pixels.

#### **6. Experimental Results and Discussion**

This paper conducts a real-ship experiment on the "YUKUN", the special teaching-practice ship of Dalian Maritime University, using an inertial sensor and a visible light camera for data acquisition. The inertial sensor is the MTi-G-700 MEMS inertial measurement system produced by Xsens, the Netherlands. The measurement range of the roll angle and pitch angle is [−180°, 180°], and the measurement accuracy is better than 0.1°. The camera is a Blackfly U3-13S2C/M-CS from Point Grey, Canada. The chip size is 4.8 × 3.6 mm, and the sensor resolution is 1288 × 964 pixels. The camera focal length during the experiment was 27.82 mm. All the experiments in this paper were run on a MacBook Pro with an Intel i5 processor and 8 GB of memory, and programmed in Python.

#### *6.1. Dataset and Evaluation Indicators*

#### 6.1.1. Dataset

For maritime target detection, there is currently no authoritative dataset for verifying the validity of the algorithm, and the few publicly available datasets do not include camera attitude data. Therefore, the images in our dataset were obtained by the Blackfly U3-13S2C/M-CS camera installed on the "YUKUN" ship. The image size is 1288 × 964 pixels. We used the inertial sensor and visible camera synchronization algorithm to obtain the camera motion attitude data. Detailed information on the experiment images is shown in Table 3.



#### 6.1.2. Evaluation Metrics

As can be seen from Section 4.1, each SSL can be represented by a rectangle, so we describe the SSL by Equation (22). The true values are obtained by manually marking SSLs in the images. In the experiment, to verify the camera motion attitude model, we calculate the differences between the estimated and actual values of *LC* and *H* to obtain the model accuracy. In evaluating the detection performance of the SSL, if the difference in *LC* is less than 5 pixels and the difference in *H* is less than 10 pixels, we consider the SSL correctly detected.

In evaluating the detection performance, we used the confusion matrix of the classification result to represent the detection results, namely, the true positives (*TP*), false positives (*FP*), true negatives (*TN*), and false negatives (*FN*). Precision and recall were obtained by Equation (23). Intersection over Union (IoU), the ratio of the intersection of the detection result and the true value to their union, was also used as an evaluation metric. When the IoU was greater than or equal to 0.5, the detection was marked as *TP*; when the IoU was less than 0.5, it was marked as *FP*.

$$L = [LC,\ H] \tag{22}$$

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN} \tag{23}$$
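These evaluation metrics can be sketched as follows, under the common axis-aligned box convention (x1, y1, x2, y2); the helper names are our own illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(tp, fp, fn):
    """Eq. (23): P = TP/(TP+FP), R = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# A detection counts as TP when its IoU with the ground truth is >= 0.5.
assert iou((0, 0, 10, 10), (0, 0, 10, 10)) == 1.0
```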

#### *6.2. Experimental Results and Discussion on SSL Detection*

The SSL extraction algorithm in this paper mainly includes three models, namely, camera motion attitude model, improved edge detection model, and improved Hough transform model. In order to verify the performance of each model separately, the following three experiments were designed. Some parameter settings in the experiment are shown in Table 4.

#### 1 Experiment 1—Verification of the camera motion attitude model



The camera motion attitude model uses the pitch angle β and roll angle γ provided by the inertial sensor to estimate the CR of the SSL in the image, which effectively narrows the detection range and is of great significance for the subsequent algorithms. In this experiment, we used the differences in *LC* and *H* between the SSL candidate region and the real region to describe the estimation accuracy.

Eight experimental results of the model accuracy are shown in Table 5. The total number of detected images is 2000. The results show that the *LC* estimation accuracy of the camera motion attitude model is 6–13 pixels, and the *H* estimation accuracy is 7–19 pixels. It can be seen from the experimental results that it is reasonable to estimate the rectangular area of the SSL using the camera motion attitude model and then extend the estimated rectangle by 30 pixels above and below as the CR of the SSL, which effectively ensures that the real SSL lies in the CR.


**Table 5.** Location estimated accuracy of camera motion attitude model.

#### 2 Experiment 2—Verification of the improved edge detection model

This experiment was carried out on sunny, glare, hazy, and occlusion conditions from the train set, comparing performance with the Canny operator and the deep learning-based holistically-nested edge detection (HED) algorithm [40]. To better illustrate the detection performance of the algorithms, one image was selected for each of the four conditions, as shown in Figure 10.

First of all, we used the data provided by the inertial sensor to obtain the CR of the SSL through the camera motion attitude model, as shown in Figure 10a–d, and then used the three algorithms to process the image separately. The Canny operator had the worst SSL detection performance, since its threshold is not adaptive. In sunny conditions, only part of the SSL could be detected, as shown in Figure 10(a1). In glare conditions or when the sea–sky background was hazy, the Canny operator failed to detect the edge of the SSL, as shown in Figure 10(b1,c1). When obstacles such as ships or islands blocked the SSL, the Canny operator detected the edge of the obstacle, which degraded the performance, as shown in Figure 10(d1). The HED algorithm achieved better performance under all conditions, but tended to over-detect: in addition to the SSL, sea clutter was detected in Figure 10(c2), and the edge of the obstruction was merged into the SSL in Figure 10(d2). The proposed algorithm achieved the best performance; since binarization is performed in adjacent small blocks, over-detection was effectively prevented while keeping the threshold adaptive. Although part of a glare spot was detected in Figure 10(a3), it did not affect the extraction of the SSL.

**Figure 10.** Edge detection results by the three methods. (**a**–**d**) Original images with CR. (**a1**–**d1**) Edge detection results by Canny (30, 150). (**a2**–**d2**) Edge detection results by HED. (**a3**–**d3**) Edge detection results by the proposed method.

#### 3 Experiment 3—Verification of the improved Hough transform model

This experiment mainly verified the effect of length and angle on the cost function at the SSL extraction stage. On the train set, we set ω to [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], found the minimum cost function for each value, and drew the corresponding SSL. According to the evaluation metrics of the SSL, the average precision (AP) of the SSL for each ω is shown in Figure 11. When ω = 0.4, the extracted SSL had the highest AP, reaching 99.5%; when only the length factor of the SSL was considered, the AP was the lowest at just 81.23%. Therefore, it can be concluded that the angle has a greater influence than the length in the cost function for SSL extraction.
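The ω grid search of this experiment can be sketched as follows. This is our own illustration: `evaluate_ap` is a hypothetical callback standing in for the (not shown) routine that scores the extracted SSLs against the manually marked ground truth for a given ω.

```python
def sweep_omega(evaluate_ap):
    """Grid-search the weight omega of the cost function (Eq. 14).

    evaluate_ap: hypothetical callback mapping omega -> average
    precision of SSL detection on the train set.
    Returns the best omega and the full score table."""
    grid = [round(0.1 * i, 1) for i in range(11)]   # 0.0, 0.1, ..., 1.0
    scores = {w: evaluate_ap(w) for w in grid}
    best = max(scores, key=scores.get)
    return best, scores

# Toy stand-in scorer peaking at omega = 0.4, mirroring the reported result.
best, scores = sweep_omega(lambda w: 1 - abs(w - 0.4))
```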

**Figure 11.** Relationship between the average precision (AP) of SSL detection and ω.

The above three experiments fully verified each model of the SSL detection. Using the proposed algorithm, we processed 600 images in the test set and compared the results with Fefilatyev's method [7] and Zhang's method [10]. The precision and recall rates are shown in Table 6. All three methods achieved good performance in SSL detection, but the proposed method was still better than the other two, with an average precision (AP) and average recall (AR) on the test set of 99.67% and 100%, respectively.


**Table 6.** Precision and recall scores for the three methods.

#### *6.3. Experimental Results of Ship Detection in the Train Set*

In this experiment, there were a total of 1050 images in the train set. To verify the performance of the proposed method, three other saliency detection algorithms, SR [41], RBD, and traditional FMBD, were used as comparison experiments. The parameters of the proposed method are listed in Table 7.

**Table 7.** Experiment parameters of saliency detection.


When evaluating the performance of ship detection, the binarization threshold *T* and the area threshold *S* impose requirements on pixel intensity and on the number of connected pixels, respectively. If *T* and *S* are too large, the target will be submerged in the background; if they are too small, false target interference will occur. Since the target pixel intensity significantly exceeds the average intensity of the saliency map, the value of *T* is expressed as a multiple of the average intensity *T̄* of the saliency map; the combinations of *T* and *S* are shown in Table 8.
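The foreground segmentation described above can be sketched as follows, assuming SciPy for connected-component labeling; the function and parameter names are our own illustration rather than the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def segment_ships(sal, t_mult=5.0, s_min=100):
    """Foreground segmentation sketch: binarize the saliency map at
    t_mult times its mean intensity (threshold T), then drop connected
    components smaller than s_min pixels (area threshold S)."""
    mask = sal > t_mult * sal.mean()
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    keep_labels = [i + 1 for i, s in enumerate(sizes) if s >= s_min]
    return np.isin(labels, keep_labels)
```

On a map with one 400-pixel bright region and one 4-pixel speckle, only the large region survives the area threshold, removing trivial small-area interference.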


According to the above threshold combination, we can draw the precision-recall graphs of the four detection methods, as shown in Figure 12. It can be seen that the proposed method is superior to the other three saliency detection methods.

**Figure 12.** Precision-recall curves of the four object detection methods.

Figure 13 shows the detection performance of the four methods on different images of the train set. Although the residual spectrum obtained by the SR method (Figure 13(a1–d1)) can detect the ship, it does not accurately indicate the position and shape of the target ship and is liable to cause false detections. The RBD method achieves better detection results when the number of targets in the image is small, as shown in Figure 13(a2,b2), but tends to miss detections when there are many targets, as in Figure 13(c2,d2). The traditional FMBD method (Figure 13(a3–d3)) can detect the salient targets well, but the contrast between target and background is not obvious enough, which is not conducive to subsequent target extraction. The proposed method (Figure 13(a4–d4)) can clearly distinguish target and background and accurately detect the shape and position of the target, giving the best detection performance.

**Figure 13.** Saliency detection results by the four methods. (**a**–**d**) Original images. (**a1**–**d1**) Saliency detection results by SR. (**a2**–**d2**) Saliency detection results by RBD. (**a3**–**d3**) Saliency detection results by FMBD. (**a4**–**d4**) Saliency detection results by the proposed method.

Figure 14 shows segmentation results of the target ships from the salient feature map, where *T* is 5 times the average intensity *T̄* and *S* is set to 100 pixels. It can be seen from Figure 14(a1–d1) that the SR method has the worst segmentation result, with many missed and false detections and poor target positioning accuracy. The RBD method is stronger than the SR method, but still misses some targets, as in Figure 14(c2,d2). The traditional FMBD method has a good segmentation result, but still has shortcomings in target positioning accuracy and missed detection, as in Figure 14(a3–c3). The proposed method accurately detects the targets in Figure 14(a4–c4), but fails to perform accurate segmentation when the target ship is partially occluded, as shown in Figure 14(d4). This is a direction we will focus on in the future.

**Figure 14.** Object segmentation results by the four methods. (**a1**–**d1**) Object segmentation results by the SR method. (**a2**–**d2**) Object segmentation results by the RBD method. (**a3**–**d3**) Object detection results by the FMBD method. (**a4**–**d4**) Object segmentation results by the proposed method.

#### *6.4. Experimental Results of Ship Detection in the Test Set*

In this experiment, we verified the proposed target detection method on the test set and compared it with Fefilatyev's and Zhang's methods. The precision and recall rates are shown in Table 9. Since Fefilatyev's method only detects ships above the SSL, both its AP and AR are relatively low. Zhang's method is relatively good, with an AP and AR of 59.21% and 73.25%. However, the proposed method achieved the best scores, with an AP and AR of 68.50% and 88.32%, respectively.



#### **7. Conclusions**

This paper proposes a novel maritime target detection algorithm based on the motion attitude of visible light camera. The camera was fixed on the "YUKUN" ship, and the camera's motion attitude data was acquired synchronously by the inertial sensor, so that the CR of the SSL on the image could be estimated. Then, the improved local Otsu algorithm was applied to the edge detection in the CR, and the Hough transform was improved to extract the optimal SSL. Finally, the improved FMBD algorithm was used to detect the target ships in the vicinity of the SSL. The experimental results show that the proposed algorithm has obvious advantages compared with the other maritime target detection algorithms. In the test set, the detection precision of the SSL reached 99.67%, effectively overcoming the complex maritime environment. The ship detection precision and recall rates were 68.50% and 88.32%, respectively, which improved the detection precision while avoiding the ship's missed detection.

The main contribution of this paper is the construction of a camera motion attitude model by analyzing the six-degrees-of-freedom motion of the camera at sea, combined with the maritime target detection algorithm, which narrowed the detection range and improved the detection accuracy. The edge detection algorithm was improved: the local Otsu algorithm was used for edge processing in the CR, which effectively overcame the complex maritime environment. The Hough transform algorithm was improved: the length and angle of the SSL were simultaneously considered in the cost function, which effectively improved the accuracy of SSL extraction. The ROSD was detected by the improved FMBD algorithm: in the post-processing part, morphological reconstruction with an opening operation was used to replace the previous processing method to smooth the sea–sky background, which effectively improved the saliency detection of target ships.

**Author Contributions:** Conceptualization, X.S. and M.P.; Methodology, X.S.; Software, X.S.; Validation, X.S.; Resources, M.P.; Data curation, L.Z.; Writing—original draft preparation, X.S.; Writing—review and editing, X.S., D.W. and L.Z.; Supervision, D.Z.; Funding acquisition, D.W.

**Funding:** This research was financially supported by National Natural Science Foundation of China (61772102) and the Fundamental Research Funds for the Central Universities (3132019400).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **AMARO—An On-Board Ship Detection and Real-Time Information System**

#### **Katharina Willburger 1,\*, Kurt Schwenk <sup>1</sup> and Jörg Brauchle <sup>2</sup>**


Received: 30 January 2020; Accepted: 24 February 2020; Published: 29 February 2020

**Abstract:** The monitoring of worldwide ship traffic is a field of high topicality. Activities like piracy, ocean dumping, and refugee transportation are in the news every day. The detection of ships in remotely sensed data from airplanes, drones, or spacecraft contributes to maritime situational awareness. However, the crucial factor is the up-to-dateness of the extracted information. With ground-based processing, the time between image acquisition and delivery of the extracted product data is in the range of several hours, mainly due to the time consumed by storing and transmission of the large image data. By processing and analyzing them on-board and transmitting the product data directly as ship position, heading, and velocity, the delay can be shortened to some minutes. Real-time connections via satellite telecommunication services allow small packets of information to be sent directly to the user without significant delay. The AMARO (Autonomous Real-Time Detection of Moving Maritime Objects) project at DLR is a feasibility study of an on-board ship detection system involving on-board processing and real-time communication. The operation of a prototype system was successfully demonstrated on an airborne platform in spring 2018. The on-ground user could be informed about detected vessels within minutes after sighting without a direct communication link. In this article, the scope, aim, and design of the AMARO system are described, and the results of the flight experiment are presented in detail.

**Keywords:** real-time communication; maritime situational awareness; ship detection; Iridium; on-board; image processing; flight campaign

#### **1. Introduction**

Nowadays, about 90% of the world's volume of cargo is seaborne [1]. An enormous amount of money depends on reliable transportation routes. However, safeguarding the seaways is not only essential for the carriage of goods, but especially for the integrity of humans' lives. Piracy, illegal fishery, ocean dumping, and refugee transportation are daily occurrences.

Due to these reasons, maritime surveillance is an important factor for government and private organizations. The European Maritime Safety Agency (EMSA), for example, has set up a vessel traffic monitoring and information system to be able to receive information on ships, ship movements, and hazardous cargoes [2]. General information around maritime domain awareness and how it is handled today can be found in [3,4].

One major issue of maritime surveillance is the vast expanse of the sea on the Earth's surface, which makes observation of ship traffic difficult [5]. The only method to globally get general reliable information about a ship's current position in near-real-time is by using satellite-based AIS (Automatic Identification System) information services [6]. AIS is a cooperative system, primarily intended for collision avoidance. Ships send out their identification, position, course, speed, and several other

traffic-related data. This data is then received by other ships and ground stations in close range. Nowadays, to be able to track ships globally in real-time, satellites are also used to receive AIS data [7]. However, based on AIS data only, the detection of illegal activities like water pollution, illegal fishing, or smuggling is limited.

To improve maritime domain awareness, Earth observation (EO) satellite data is a valuable source of information. Great efforts are made in researching the potential of vessel detection in optical and radar satellite images [8–10]. However, in most cases, these images are analyzed long after the data have been acquired [11]. To tackle this bottleneck, there is also promising progress in establishing near-real-time services on the ground, which today can provide information in, at best, the range of 15 min, measured from on-ground data reception [12,13]. However, the most significant time delay occurs between data acquisition on-board and data reception on-ground, since image data are comparatively huge and their downlink requires a direct contact to a ground station. This delay can amount to hours or even days [14].

A second drawback of EO satellites for time-critical applications is their inability to continuously monitor a defined region of interest. Satellites with a reasonable spatial resolution for ship detection orbit in LEO (low Earth orbit) with speeds of approximately 7 km/s over ground, and typically have a revisit cycle of several days [15].

Promising upcoming observation platforms are unmanned autonomous vehicles [16]. For instance, with their Remotely Piloted Aircraft Systems (RPAS), the European Maritime Safety Agency operates a number of services supporting maritime surveillance [17]. These vehicles are small, lightweight, and ready to take off within minutes [16]. However, their operational flight duration, and thus the range of their geographical applicability, is limited. High-altitude pseudo-satellites (HAPS) are the perfect fit for long-endurance wide-area monitoring tasks. Although there is still a significant portion of development left, major progress has been achieved during recent years. One of the most famous HAPS, the Airbus Zephyr S, can carry a payload of up to 20 kg; with nearly 26 days, it holds the world record for the longest uninterrupted flight [18]. However, if HAPS shall be flexibly and rapidly deployable, even in remote areas, they have to overcome a similar problem to that of satellites: Downlinking time-critical information as fast as possible and informing the user immediately without a direct link to a ground station.

To reduce the time between acquiring data with an Earth observation (EO) platform and delivering meaningful information to the user, the capability of real-time communication from the satellite to the ground is needed. One option is using satellite communication services like Iridium or Orbcomm [19]. These services are able to transfer data 24/7 nearly globally within a few minutes, but offer only restricted bandwidth, which is insufficient to send the raw sensor data continuously down to a ground station for on-ground processing. However, the product data that should reach the user within the shortest possible time typically comprises small information like position, heading, velocity, type, and status of the ship. With on-board processing, this information can be extracted directly after acquisition. Since its size amounts to only a fraction compared to the raw sensor data, it can be sent to the user via the mentioned satellite communication services.

A challenge for on-board data processing is that of the limited computer resources that are available on satellites or other autonomous platforms. Furthermore, the special hardware that is used for on-board systems often differs from the mature technology in on-ground data centers, which makes it unfeasible to simply let the on-ground algorithm run on-board. This problem is discussed widely in the literature. In [20], Yuan Yao et al. present a computing system for on-board vessel detection targeting micro- and nano-satellites. This ship detection system extracts image patches and position information from acquisitions using deep learning methods, with the goal of decreasing data size. The authors were able to reduce an image with a size of 90 MB to product data below 1 MB within 1.25 s on a Commercial Off-the-Shelf (COTS) NVIDIA Jetson TX2. In [21], Yu Ji-yang et al. proposed a real-time on-board ship detection method based on FPGA hardware. They used statistical analysis and shape information for extracting ships by marking their pixels. On an 8-bit image with 1024 × 1024 pixels, they were able to extract ships within 10 s with a precision and recall of over 90%.

Another question which seems to be disregarded so far is that of how a modern on-board computing information system should operate as a whole. With the on-board processing systems mentioned above, data are only analyzed and product data are sent to the ground. This approach is a static concept, not allowing user interaction. What we are targeting is an overall and more flexible system, where the user is able to order data, as is done in a web query. They should also be free to choose when and about what to be informed, and to be able to set automated alarms, which are pushed to them in the case of the occurrence of predefined events.

Within this paper, we present the results of a feasibility study of a comprehensive concept for a real-time on-board ship detection system for satellites and other kinds of unmanned flying vehicles. The study involves the development of a prototype system, called AMARO (Autonomous Real-Time Detection of Moving Maritime Objects), and its testing within an aircraft flight experiment campaign. The focus of the study was on how to design a flexible real-time ship detection system for on-board operation, how to realize it, and what performance, especially regarding real-time information capability, can be expected. The prototype system was designed and built using COTS hardware adequate for the aircraft test campaign. The prototype processes image data on-board and communicates the extracted information to the user immediately and without geographical bounds. The system provides product data like the position, heading, velocity, and shape of ships within minutes after sighting. Furthermore, this product data can be individually requested by the user via email on any smart device on the ground, independently of its locality. AMARO was tested in a flight experiment which took place in April 2018.

#### **2. Materials and Methods**

#### *2.1. Conceptualization*

The initial situation we assume involves an EO platform and a user on the ground who demands to be informed about ship-related events in the shortest possible time. Evaluating various use-case scenarios, the following requirements were identified:


Requirements 1 and 2 demand a bidirectional communication link, where users can interactively exchange custom-tailored information with the on-board system. Requirements 3 and 4 imply that the images are processed on-board and messages are linked via a satellite-based communication system, since direct links are ineligible due to their limited range. Requirement 5 suggests the usage of a database for storing and managing information.

Based on these deliberations, the concept of the AMARO on-board ship detection system was developed. It consists of one or more Earth observing platforms carrying a camera, a GNSS receiver, an on-board computer, and a modem for real-time communication. An AIS receiver can be mounted on board, and its signals can be synchronized with the image data. Ships in the observation area, which send no signal—possibly on purpose—can thus be identified.

On board, ships are detected from the image data by means of remote sensing algorithms. Product data like position, heading, velocity, type, and status of the ship are extracted. These data—some kilobytes in size—can be sent from the EO platform to the network of communication satellites, which forwards the message until it can be delivered. A small quicklook of the detected object can be included for visual inspection. At most, this procedure will take a few minutes.

In operation, sensor data are acquired continuously by the camera system. These data are immediately evaluated on-board the flying platform, and the product data are stored in a database on the satellite. The user shall be able to query this database by using real-time communication. Furthermore, the user shall be able to define events about which he/she is automatically informed.

Figure 1 shows an exemplary sequence of events involving automatically transmitted and manually requested information: A user is interested in ships that send no AIS signals. He/she therefore requests to be informed automatically if a corresponding event occurs. He/she will not be spammed with information about other detected objects that do not fulfill his/her requirements. As soon as a ship without AIS is detected, a message is sent automatically from the Earth observing platform to the user via the satellite communication network. Since the user is further interested in this ship, he/she requests details via a one-time order. Among others, these details can include a small image of the detected object in order to verify it by visual inspection.

(**a**) Images are taken by the Earth observing platform and processed autonomously on board.

(**b**) The user has requested automatic notifications whenever a ship is detected that does not send AIS (Automatic Identification System) information.

(**c**) User requests detailed information.

(**d**) User receives detailed information about the unknown ship.

**Figure 1.** An exemplary user story.

#### *2.2. Hardware Architecture*

The AMARO-Box and its contents were specially built for the airborne test campaign. The different hardware devices are therefore not necessarily suitable for an operation on another EO platform. An image of the box during assembly is shown in Figure 2. In the following, the components of the AMARO-Box are explained in detail. Since the camera, which was used in the experiment, and the corresponding image data are an essential part of the flight campaign, but not of the AMARO-Box, they are explained later on, in Section 2.4.

**Figure 2.** Autonomous Real-Time Detection of Moving Maritime Objects (AMARO)-Box with hardware components during assembly.

#### 2.2.1. Communication System

As mentioned before, the main goal of AMARO is generating and delivering the product information to the user as fast as possible. The user may be a crisis intervention center with a high-rate internet connection or a single person without any ground-based communication connection. Similarly, the carrier platform of the AMARO-Box may have no connection to ground-based communication facilities. Therefore, in order to facilitate permanent and locally independent communication, the use of satellite connections was considered mandatory. The following criteria for the real-time transmission of the product data were collected: Low latency, global coverage, easy to obtain, easy to maintain, easy to integrate, and easy to operate.

After careful deliberation over various options, we decided to use the Iridium Short Burst Data (SBD) service. Iridium is a satellite communication network consisting of 66 active satellites, which provides almost 100% global coverage around the clock. With SBD, Iridium offers a simple and efficient service for transmitting small data packages between equipment and centralized host computer systems [22], commonly used for asset tracking. Messages with a size of around 300 B can be exchanged between the on-board device and the on-ground user. For sending and receiving messages from the device, standard email is used. The email is sent to Iridium with the device's serial number as the subject. The message itself is attached to the email as a normal text file with the extension \*.sbd and can be of individual content. In the case of AMARO, this \*.sbd file contained a database query in SQL.
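The email-based message exchange described above can be sketched in a few lines. This is a minimal illustration in Python (the on-board software itself was written in C++); the gateway address and the device serial number used here are placeholders, not the real Iridium service endpoints:

```python
from email.message import EmailMessage

def build_sbd_email(serial: str, query: str) -> EmailMessage:
    """Compose an email carrying one SBD message: the device serial number
    goes into the subject line, and the payload (here an SQL query) is
    attached as a plain text file with the extension *.sbd."""
    msg = EmailMessage()
    msg["To"] = "sbd@gateway.example"  # placeholder, not the real gateway
    msg["Subject"] = serial
    msg.add_attachment(query, filename="message.sbd")
    return msg

mail = build_sbd_email("300234010000000",
                       "SELECT shipID FROM shape WHERE shipArea >= 50")
```

The resulting message could then be handed to any standard SMTP client; the payload stays well below the roughly 300 B SBD limit.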

The latency for data exchange is specified as less than one minute worldwide [23]. The size of the transceiver device is (31.5 mm × 29.6 mm × 8.1 mm w/h/d). The average power consumption is below 0.8 W.

One big advantage of the Iridium system is that the antenna needs no exact pointing alignment in a determined direction. On an aircraft, it is sufficient that the antenna points approximately towards the sky. This may not apply to other platforms. The satellites of the Iridium network fly in orbits of approximately 780 km height, and their signals are broadcast such that the regions of reception overlap on the Earth's surface. However, a loss of coverage is conceivable for platforms at higher altitudes.

The Iridium SBD modem and antenna are available commercial off the shelf; hence, the purchase is fast and uncomplicated. We decided to buy a MiChroBurst-Q modem from Wireless Innovation [24]. It houses an Iridium 9602 modem and comes development-ready with connection ports for power supply and data transfer via RS-232. The whole box is sized 110 mm × 35 mm × 85 mm w/h/d, which is comparable to a packet of cigarettes.

As an Iridium antenna, we bought an AeroAntenna AT2775-110 [25]. Since the antenna had to be specially mounted on the plane's roof, the owner required an aircraft-certified device and its installation by a specialist. The antenna is flat and streamlined to fulfill the aerodynamic requirements, as can be seen in Figure 3. It operates in the frequency band of (1595 ± 30) MHz and consumes approximately 10 W.


**Figure 3.** AeroAntenna AT2775-110 Iridium Antenna mounted on the airplane's roof.

The operation cost of the SBD service was a minor factor. During the time of operation, we paid around \$20 for a monthly data volume of around 12 kB. Depending on the size of the messages, this is equivalent to 40 to 120 messages.

#### 2.2.2. AIS Receiver

To be able to receive AIS navigation data from accordingly equipped vessels, an AIS receiver and antenna were installed on the airplane. The receiver we used, the AMTEC CYPHO-150, is a customized version of an off-the-shelf standard device whose primary use-case is installation on recreational boats, which do not need to send out AIS information. The AMTEC CYPHO is not specifically qualified for deployment on airplanes and is hence available at a fraction of the price of a dedicated device. However, it worked perfectly, without any trouble in installation or loss in performance.

This AIS receiver is capable of receiving AIS messages of classes A and B [26], sent out by commercial and private vessels, respectively. Receiving several other AIS formats is also possible, but was not in our interests.

It is lightweight, small in form factor (128 mm × 36 mm × 88 mm w/h/d), and has a power consumption below 1.50 W; hence, it was perfectly suited to be installed in the AMARO computing box [27].

The AIS receiver can be connected to the on-board computer via a serial or USB interface. We chose the latter, because it can also be used to power the device. To encode AIS messages, the AMTEC CYPHO-150 uses a serial text-based transmission protocol specified by the NMEA 0183 interface standard. Typically, AIS messages contain the Maritime Mobile Service Identity (MMSI) number, the call sign and name, the type, the length and beam, the cargo information, the position of the vessel, the Course Over Ground (COG), the Speed Over Ground (SOG), the heading, and the status of the ship. The AIS messages were parsed by our on-board software and then directly inserted into the database. Our parser was based on the libais library (see [28]) and modified to meet our needs.
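The core of such parsing is de-armoring the 6-bit ASCII payload of an NMEA `!AIVDM` sentence. The on-board parser was based on libais; the following is an independent minimal sketch (not the project's code) that extracts the message type and MMSI from a single-fragment position report:

```python
def decode_type_and_mmsi(sentence: str) -> tuple:
    """De-armor the 6-bit ASCII payload of a single-fragment !AIVDM
    sentence. Bits 0-5 hold the message type, bits 8-37 the MMSI."""
    payload = sentence.split(",")[5]
    bits = ""
    for ch in payload:
        value = ord(ch) - 48          # reverse the ASCII armoring
        if value > 40:
            value -= 8
        bits += format(value, "06b")  # six payload bits per character
    return int(bits[0:6], 2), int(bits[8:38], 2)

# A commonly cited example sentence (class A position report):
msg_type, mmsi = decode_type_and_mmsi(
    "!AIVDM,1,1,,B,177KQJ5000G?tO`K>RA1wUbN0TKH,0*5C")
```

A full parser would additionally verify the checksum, reassemble multi-fragment sentences, and decode the remaining fields, which is exactly what libais provides.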

As the AIS antenna, a standard PROCOM HX2 was used [29]. It is a flexible 1/4 *λ* helix antenna for the two AIS channels at the frequencies 161.975 and 162.025 MHz. It is around 150 mm long and was installed together with the camera in the downward-looking hole of the aircraft's fuselage. While implementing and testing the AIS subsystem, we had access to an AIS simulator. With its ability to generate fake AIS messages that could be received by our system, validation was greatly eased.

#### 2.2.3. On-Board Computer

The on-board computer is the core component of AMARO. It obtains the camera data, controls the Iridium transceiver and the AIS receiver, performs data analysis, and manages inner and outer communication. The requirements for the on-board computer were the following: It had to be small to fit in a 19 inch rack box together with the other components. It had to provide sufficient computing power for data processing. Moreover, it had to be physically and thermally robust for reliable operation on the aircraft (passenger cabin).

After studying the market, we decided to buy a 1.3 l slim standard personal computer (Shuttle DQ170), which is equipped with standard up-to-date desktop PC components. The computer is robust enough to handle 24/7 operation and up to 50 °C ambient air temperature. All interfaces needed for attaching the other devices are present. Equipped with modern desktop PC components, that is, an Intel Core i7-6700, 16 GB RAM, and a 512 GB SSD, the system may seem luxurious compared to today's or even future computing solutions deployable on-board high-altitude platform stations (HAPS) or satellites. Power usage, thermal output, space limitations, and radiation impact were insignificant for the demonstration of our prototype. Therefore, for the first proof of concept, we determined a restriction in this regard to be unnecessary. Nevertheless, since we are also involved in building a next-generation space-computing platform [30], we assume that it is possible to integrate the software on a future computer mounted on an autonomous carrier platform.

#### *2.3. Software Architecture*

The software is the most important and labor-intensive component of the AMARO system. Whereas most hardware components could be bought off the shelf, the software system has been developed from scratch. It is designed to be modular and flexible, such that it can be modified for a variety of scenarios and deployed on an arbitrary carrier platform.

#### 2.3.1. Software Requirements

The AMARO software system has to handle two main tasks: Data analysis and communication. The data analysis process shall extract useful information from the image data or other sources. The system shall be capable of processing as much data as possible to reach a high situational awareness. Due to the complexity of the ship detection algorithm and the high amount of image data, processing may be computationally intensive. The AMARO communication system has to be readily responsive, and the available bandwidth has to be used efficiently. Since there are only limited maintenance options at runtime and the system is deployed on-board a flying platform, it has to be absolutely reliable. Interruptions of operations are undesired, and in the case of an error, the system has to mitigate it and get back to operation with the least possible loss of information.

#### 2.3.2. Software Infrastructure

To enable fast and efficient software development, a standard x86-64 Linux desktop distribution was selected as the operating system. As the main programming language, C++14 was chosen. Based on these conventions, a lot of up-to-date software development tools and libraries are available to help minimize development costs. The effort to deploy the software is further minimized, since the development and runtime operating systems are identical.

We want to stress that the goal of the development was to build a software system that proves the concept of a real-time on-board ship detection system within an experimental flight. Nevertheless, as C++ and Linux are also used for future on-board systems, we trust that our software is, in principle, implementable on an on-board platform without fundamental changes. In fact, we have already ported essential parts of our software to an on-board computer within the project ScOSA (Scalable On-Board Computing for Space Avionics), which has the goal of developing a high-performance on-board platform for the deployment on satellites [31].

#### 2.3.3. Software Design

As mentioned in Section 2.3.1, the system has to be high-performance, responsive, and reliable. To meet all of these requirements, a service-based architecture was chosen. A top-level view of the service architecture is presented in Figure 4. Every task is carried out by a unique service that can operate independently of other services. As every service is its own Linux process, high responsiveness for the communication services and, at the same time, a high amount of computation time for the ship detection application can be provided. In case of an error, aborting a service has no direct effect on other processes, and the service can be restarted individually.

For inter-service communication and for data storage, we chose the file-based database SQLite [32]. SQLite can be easily integrated without the need for a dedicated database server. The (asynchronous) communication of the services is handled by the database engine itself. Furthermore, the validity of the database is warranted by SQLite in the case of a writing failure. The most important functional advantage of using an SQL database is the availability of the language SQL (Structured Query Language). The SQL programming language is the key enabling element used for the implementation of the user interaction with the system. In general, SQLite is not recommended for distributed systems (e.g., network file systems) and is not well suited for heavy simultaneous writing on one database file. However, for the experimental demonstration of our prototype system, all services were located on the same computer, and data were written simultaneously to one database file only sparsely. For a future operational system, the usage of a server-based database is strongly recommended.
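The file-based inter-service communication can be illustrated with a small sketch: one connection writes a row, and a second, independent connection (in the real system, a separate Linux process) picks it up on its next poll. This is a Python sketch for brevity; the actual services are C++ processes, and the column layout shown here is hypothetical:

```python
import os
import sqlite3
import tempfile

# One shared database file stands in for the inter-service channel.
db_path = os.path.join(tempfile.mkdtemp(), "amaro.db")

writer = sqlite3.connect(db_path)  # e.g., a detection service enqueuing a message
writer.execute("CREATE TABLE IF NOT EXISTS msgToSend "
               "(id INTEGER PRIMARY KEY, priority INTEGER, payload TEXT)")
writer.execute("INSERT INTO msgToSend (priority, payload) VALUES (?, ?)",
               (5, "ship detected without AIS"))
writer.commit()  # after the commit, other processes see the row

reader = sqlite3.connect(db_path)  # e.g., the messaging service, polling the table
row = reader.execute("SELECT payload FROM msgToSend").fetchone()
```

The database engine handles locking between the two connections, which is what makes this asynchronous hand-off safe without any explicit IPC code.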

**Figure 4.** Top level overview of the AMARO software architecture.

The following subsections describe the different independent services of the AMARO software.

#### 2.3.4. Service SBD Message

Our serviceSBD is a messaging service that allows other services to send and receive messages over the Iridium SBD. A service that wants to send a message adds it to the so-called *msgToSend* table of the database. When a sending slot is available, serviceSBD checks this table and tries to send the most prioritized message. Received messages are inserted into the *msgReceived* table. As the sending and receiving of messages is encapsulated in its own (Linux) process, it can be accomplished independently of other services. This guarantees the best usage of bandwidth and very good responsiveness.
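The dequeue step (pick the most prioritized pending message whenever a sending slot opens) might look as follows. This is a Python sketch of the pattern; the exact column names and the higher-number-is-more-urgent convention are assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE msgToSend (id INTEGER PRIMARY KEY, "
            "priority INTEGER, payload TEXT, sent INTEGER DEFAULT 0)")
con.executemany("INSERT INTO msgToSend (priority, payload) VALUES (?, ?)",
                [(3, "routine status"), (9, "ship without AIS"), (5, "chat")])

def next_message(con):
    """Return the most prioritized unsent payload and mark it as sent,
    mimicking what serviceSBD does when a sending slot is available."""
    row = con.execute("SELECT id, payload FROM msgToSend WHERE sent = 0 "
                      "ORDER BY priority DESC LIMIT 1").fetchone()
    if row is None:
        return None
    con.execute("UPDATE msgToSend SET sent = 1 WHERE id = ?", (row[0],))
    return row[1]
```

Repeated calls drain the table in priority order, so urgent detections go out before routine traffic.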

#### 2.3.5. Service Query

The serviceQuery is a query response service. A user on ground can send a one-time query over the Iridium SBD to the database. The serviceQuery tries to answer it and generates a response message.

In detail, a user can send a one-time query request via email to the on-board device using the following format:


```
5:2:asd.db SELECT shipID FROM shape WHERE shipArea >= 50
```
The query request is received by the serviceSBD and saved in the *msgReceived* table. The serviceQuery checks the *msgReceived* table periodically. If query requests have arrived, the most prioritized is executed, and the query request is moved from the *msgReceived* table to the *msgReceivedArchive* table. The result of the one-time query is then put into an SBD message and inserted into the *msgToSend* table.

With serviceQuery, the user can access all on-board databases. As a typical example, the user can request a list of objects which have a defined size and have been detected within a defined time interval.

#### 2.3.6. Service Push

The servicePush is a messaging service that sends automatic notifications if a predefined event occurs. Events can be added and deleted during operation. Examples for such events could be the detection of oil near a ship (ocean dumping), ships entering a restricted area, ships sending no AIS signals, etc.

In detail, an event is defined as an SQL query with timing information. The timing information contains a time window and a period specifying the time points of execution of the SQL query. All activated events are saved in the *push* table. Events can be added or deleted by modifying the *push* table.
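The timing semantics described above (a start/stop window plus a period defining the execution time points of the stored query) can be sketched as:

```python
from datetime import datetime, timedelta

def execution_times(start: datetime, stop: datetime, period_s: int):
    """Yield the time points at which a push event's SQL query is
    executed: every `period_s` seconds within the [start, stop] window."""
    t = start
    while t <= stop:
        yield t
        t += timedelta(seconds=period_s)

# Every 300 s within a half-hour window yields seven execution points.
points = list(execution_times(datetime(2018, 4, 12, 8, 36),
                              datetime(2018, 4, 12, 9, 6), 300))
```

At each of these time points, servicePush runs the stored SQL query and, if it succeeds, enqueues the result as an outgoing message.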

The following example shows how an event can be added to the *push* table via a query request:

```
5:3:system.db INSERT INTO PushTable
(Start, Stop, Periode_s, Priority, Category, Db, Query) VALUES
('2018-04-12 08:36:00', '2018-04-12 20:45:00', '300', '5', '107',
'asd_DB.db',
'SELECT shipID, course, speed FROM ships ORDER BY shipID DESC')
```
In plain words: within the time window, every 300 s, AMARO shall try to send information about the IDs, courses, and speeds of the latest detected ships. If the query is successful, servicePush generates a result message and inserts it into the *msgToSend* table.

#### 2.3.7. Service Ship Detection

The serviceShipDetect is responsible for data analysis. It receives the image data from the camera, analyzes them, and enters the results into a database table. Within the flight experiment, the image data are acquired with a frequency of 1 Hz (one acquisition per second) and are sent from the camera control computer to the AMARO system over ethernet. Since subsequent acquisitions will have overlapping content of around 90%, more than one observation will be made for one and the same object. For another mission with other conditions of image acquisition, these values may differ. The detected objects are examined and filtered out if they are too small or too big, or if one of the shape attributes does not match the defined constraints for being a ship. The considered shape attributes are: Size, perimeter, long axis, short axis, axes ratio, circularity, rectangularity, convexity, and solidity. More information about definitions and methods of calculation of these attributes can be found in [33].
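To make the shape-based filtering concrete, here is an illustrative computation of a few of the listed attributes. The exact definitions follow [33], and the thresholds in the check below are purely hypothetical:

```python
import math

def shape_attributes(area, perimeter, long_axis, short_axis):
    """Illustrative versions of three of the listed shape attributes."""
    return {
        "circularity": 4 * math.pi * area / perimeter ** 2,  # 1.0 for a disc
        "axes_ratio": long_axis / short_axis,                # elongation
        "rectangularity": area / (long_axis * short_axis),   # bounding-box fill
    }

def is_ship_like(attrs, min_ratio=2.0, max_circularity=0.8):
    """Hypothetical constraint check: ships are elongated, not circular."""
    return (attrs["axes_ratio"] >= min_ratio
            and attrs["circularity"] <= max_circularity)
```

A circular blob (circularity near 1, axes ratio near 1) is rejected, while an elongated hull-shaped object passes.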

If a ship-like object is detected in one acquisition, the following characteristics are extracted and stored in the database:


Two ship-like objects are considered "similar" when they have both appeared within a limited geographical range and a limited time range, and when both have similar shape attributes, as defined above. If, in two or more subsequent acquisitions, "similar" ship-like objects are detected, they are grouped together and treated as a possible ship. The single objects are marked as assigned in order to not check them again. If, in at least four subsequent acquisitions, "similar" ship-like objects are detected, they are treated confidently as ships, and the following characteristics are extracted additionally:


The object data can be directly accessed by the end-user via a query message (serviceQuery) or by defining an event (servicePush). In the current version, only the thermal channel was used. The computational steps involved are correction and normalization of the image data, water–land classification, connected component labelling [34], object analysis, and data comparison on the object's metadata. For further reading, see [35]. As the data analysis is relatively complex and a high amount of data has to be processed, the serviceShipDetect can be run up to eight times in parallel.
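The grouping criterion for "similar" ship-like objects described above can be sketched as a predicate. All field names and thresholds here are hypothetical; the real comparison works on the object's full metadata:

```python
from math import hypot

def similar(a, b, max_dist_m=500.0, max_dt_s=10.0, max_rel_area=0.3):
    """Two detections are 'similar' if they lie within a limited geographic
    and time range and have comparable shape attributes (area, here)."""
    if hypot(a["x_m"] - b["x_m"], a["y_m"] - b["y_m"]) > max_dist_m:
        return False
    if abs(a["t_s"] - b["t_s"]) > max_dt_s:
        return False
    rel = abs(a["area"] - b["area"]) / max(a["area"], b["area"])
    return rel <= max_rel_area
```

Chaining this predicate over subsequent acquisitions yields the groups that are promoted to "possible ship" after two hits and to "confident ship" after four.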

#### *2.4. MACS and Image Data*

Images were acquired using the instrument MACS (Modular Aerial Camera System), cf. [36,37]. A picture of the MACS camera system can be found in Figure 5. The MACS images were radiometrically calibrated and georeferenced, providing geographic coordinates, position accuracies, and absolute time for every image pixel. For the AMARO experiment, the system was equipped with a passive optical multi-sensor configuration to cover the human-visible (RGB), near-infrared (NIR), and thermal infrared (TIR) spectra, as summarized in Table 1, but eventually, only the TIR channel was transmitted to the AMARO-Box. The image rate can be up to four full frames per second simultaneously for all sensors, and was set to 1 Hz during the flight experiment.

Through a hole in the aircraft fuselage, the lenses have an unobstructed view downwards. An embedded desktop-class computer enables raw data recording, preprocessing, and immediate data forwarding. The MACS main computer is connected to the AMARO on-board computer through a Gigabit Ethernet link. Data of the selected image sensor are continuously fed as a byte stream. On this real-time stream, the object classification is executed in-memory, hence without any image storage. Additionally, a function runs on the AMARO computer to re-establish geographic coordinates: Depending on the aircraft position and altitude, the images are projected onto sea level. The elevation of this plane is derived from the SRTM (Shuttle Radar Topography Mission) database. Because the scenery is completely flat over the sea, an image-edge four-point projection is sufficient. For a given image pixel, i.e., one corresponding to a matched object, the function interpolates the edge coordinates and provides the geographic coordinates for that particular pixel.
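The four-point projection amounts to a bilinear interpolation between the four image-corner coordinates. A sketch, with hypothetical corner values:

```python
def pixel_to_geo(col, row, width, height, corners):
    """Interpolate geographic coordinates for an image pixel from the four
    image-corner coordinates (sufficient over a flat sea surface).

    corners = ((lon_tl, lat_tl), (lon_tr, lat_tr),
               (lon_bl, lat_bl), (lon_br, lat_br))
    """
    u = col / (width - 1)    # horizontal fraction, 0 at the left edge
    v = row / (height - 1)   # vertical fraction, 0 at the top edge
    tl, tr, bl, br = corners
    top = (tl[0] + u * (tr[0] - tl[0]), tl[1] + u * (tr[1] - tl[1]))
    bot = (bl[0] + u * (br[0] - bl[0]), bl[1] + u * (br[1] - bl[1]))
    return (top[0] + v * (bot[0] - top[0]), top[1] + v * (bot[1] - top[1]))
```

For example, the center pixel of an image whose corners span one degree in each direction maps to the midpoint of the footprint.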



(**a**) Top view of the MACS system (**b**) Illustration of the sensor head

**Figure 5.** Modular Aerial Camera System (MACS).

#### **3. Results**

#### *3.1. Experimental Flight*

The experimental flight was conducted on the 12th of April in 2018. The AMARO-Box, the antennas for Iridium and the AIS, and the MACS camera were installed into a small science aircraft, a Cessna 207T, provided by the Freie Universität Berlin. The flight started from the airfield Schönhagen, located 50 km south of Berlin, Germany, at 09:15 a.m. UTC, and ended at the same airfield at 03:21 p.m. UTC. From there, the route led over northern Germany to the mouth of the Elbe in Hamburg, where the actual experiment was conducted. The flight path is depicted in Figure 6.

In the time between 11:10 a.m. and 11:54 a.m. UTC, the main naval traffic route to enter the port of Hamburg was flown forward and backward (see Figure 7). This is called the experimental core time. Afterwards, the flight was interrupted to refuel the aircraft from 11:59 a.m. to 01:10 p.m. UTC. An overview of the different phases of the experimental flight is given in Table 2.


**Table 2.** Overview of the timing of the different phases of the experimental flight.

**Figure 6.** Flight path from the airfield Schönhagen to the North Sea and back.

On-board were the pilot and two scientists, one to supervise the AMARO-Box, the other to control the MACS camera and to support the pilot. The supervision of the AMARO-Box was actually not necessary, since it was designed to operate autonomously. However, to be on the safe side for the first in-flight test, we considered supervision to be beneficial in case of unforeseen misbehavior. For controlling purposes, we connected the AMARO-Box with an external terminal PC. On-ground, two more people assisted in installing the camera and the AMARO-Box in the airplane.

The actual experiment—the communication with the AMARO-Box—was then conducted by a scientist and a technical assistant on-ground. Equipped with a standard office notebook, they operated the experiment from the user's side in the airfield's restaurant, which provided a stable internet connection. We want to mention that these users could have resided anywhere on Earth and could have used any device, as long as an internet connection was present.

**Figure 7.** Mosaic of thermal images over the mouth of the Elbe during AMARO's experimental core time.

#### *3.2. Performance Communication*

#### 3.2.1. Iridium Signal Quality

During operation time, the signal strength of the connection to the Iridium satellite network was measured and logged in the database *signal.db*. An evaluation of the database revealed an excellent overall reception quality for the whole flight. However, from about 10:30 a.m. to 11:00 a.m., no messages were received or sent from the on-board AMARO-Box. In the evening, we received a notification from the Iridium SBD service informing us about unplanned intermittent outages which had taken place between 10:42 a.m. and 03:28 p.m. Since issues with the Iridium communication service also impact the AMARO performance in general, potential outages have to be taken into account when analyzing the performance. Nevertheless, it is worth noting that during the 15 months of using the Iridium SBD service, we received a total of five unplanned outage notifications, one of them just on the day of the experimental flight. The percentage distribution of the signal strength can be seen in Table 3. Figure 8a shows the distribution of the signal strength over time.

**Table 3.** Distribution of Iridium's signal strength over time in [%] measured on-board.


(**a**) Iridium SBD signal quality measured on-board. (**b**) Time between query and answer measured on-ground.

**Figure 8.** Iridium Short Burst Data (SBD) signal and response measurements.

#### 3.2.2. Message Exchange

The first part of the operating time was taken by the flight to the experimental site at the North Sea. During that time, several messages were exchanged to establish and check the connection and to set up push queries.

In total, 56 messages were sent from ground to AMARO, while 169 messages were received from AMARO by the on-ground operator. Of these, 13 and 34 messages, respectively, fall within the experimental core time.

Tables 4 and 5 show the amount and type of messages sent from ground to AMARO and vice-versa. The push queries contained information about start time, expiry time, and period, i.e., the time interval in which the query should be executed by AMARO. One-time queries were executed as soon as possible after reception by the AMARO-Box on board. The possibility to exchange chat messages between the on-board and on-ground operators was set up to facilitate communication during the flight. Empty downlink messages occurred due to technical reasons within the Iridium service, as described in ([38], Section 7.1.3).

Here, we give some examples for the message exchange during the experimental core time:


To all queries, a category is assigned. The answers are branded with the same category, such that the operator on-ground is able to match them with the corresponding queries.

#### 3.2.3. Query–Response Time Interval

Figure 8b shows the distribution of the time intervals between sending a query and receiving the corresponding answer during the experimental flight. It can be seen that these results coincide with the measurements of the SBD signal strength. Furthermore, Table 6 lists the number of message pairs (query/answer) with the time span between sending the query and receiving the answer. As the messages are sent and received by email, the time between the dispatch of the query and the arrival of the corresponding answer is measured with a temporal resolution of one minute. For time intervals larger than five minutes, the uplink time of the query and the downlink time of the corresponding answer were also analyzed. Note that the computation time is negligible, because the processing of queries involved only database accesses.


**Table 4.** Number of uplinked messages.

**Table 5.** Number of downlinked messages. In M1 messages, the answer fits into one single packet, while for >M1, it had to be split into several parts.


For 7 out of the 46 query messages, we received no answer at all, for various understandable reasons, e.g., due to an incorrect SQL syntax or a preceding delete-query where an answer is not expected. These messages are not taken into account hereafter. Apart from this, one message was answered with a delay of 68 min, where uplinking the query took 67 min and downlinking the answer took 1 min. As the query was sent during the refuel stop of the airplane, during which the AMARO system was deactivated, it is also not taken into account.

For 32 out of the remaining 38 messages, i.e., around 84%, the time span between query and answer was below five minutes, with an average of 1.87 min.

The response time for three messages (8%) was between 5 and 10 min, with an average of 6 min.

Another three messages (8%) were answered between 10 and 30 min. The average delay in this range was 20 min, with an average uplink delay of 18 min and an average downlink delay of 2 min. The most likely explanation for the high delays is the Iridium outage mentioned in Section 3.2.1, as the three queries in question were sent subsequently during the beginning of this time frame.

**Table 6.** Query–Response time: Time interval between sending a query and receiving the corresponding answer (email to email). Gray values were not taken into account for further analysis.


#### *3.3. Performance AIS*

During the operating time, 303,986 AIS messages were received by the system, with 275,144 AIS messages of types 1/2/3 and 7,660 of type 5. A further 13,082 unsupported messages were received. A detailed overview is presented in Table 7.


**Table 7.** Overview of AIS messages received on-board. For a detailed description, see [26].

In Figure 9, the aggregation over time of received AIS messages is displayed. AIS data were received during the complete operating time, except during the refuel stop, during which the AMARO-Box was not activated. All AIS messages were stored on-board in the AIS database, which was queried several times on the return flight. However, matching them with the results of the image processing is left for the next stage of development.

**Figure 9.** Aggregation of received AIS messages during the operating time.

#### *3.4. Performance Image Processing*

As mentioned in Section 2.3.7, image processing was carried out on the thermal channel only. We abstained from creating a mature algorithm in terms of state-of-the-art remote sensing and Earth observation, since our main focus was to demonstrate a prototype for a globally deployable real-time information system. Nevertheless, the algorithm performed quite well. Apart from this, our service also includes the possibility of downlinking a quicklook of the object, such that an operator can double-check the result by visual inspection. An example set of quicklook images is displayed in Figure 10.

Due to the limited communication bandwidth, the maximal data volume of a quicklook was very limited. With a combination of a small image size, reduction of the color depth to one-bit monochrome, the use of a standard run-length compression scheme, and splitting up of the images into several parts, it was possible to fit the images into one to three SBD messages, each with a size of around 300 bytes.
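A run-length scheme for a one-bit monochrome image can be as simple as the following sketch; it is a stand-in for the standard scheme used on-board, whose exact details are not given here:

```python
def rle_encode(bits):
    """Run-length encode a 1-bit image row as alternating run lengths,
    starting with the length of the initial run of zeros."""
    runs, current, count = [], 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs

def rle_decode(runs):
    """Invert rle_encode: expand alternating run lengths back to bits."""
    bits, value = [], 0
    for n in runs:
        bits.extend([value] * n)
        value ^= 1
    return bits
```

Because ship quicklooks are dominated by long uniform runs of sea pixels, such a scheme compresses them well enough to fit the roughly 300-byte SBD packets.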

During the whole experiment, 13,928 thermal images were acquired by the MACS sensor, while 13,607 images were processed by AMARO. Hence, 321 either got lost during transfer or were missed by AMARO because the processing channels were already busy. During the experimental core time, approximately 2570 thermal images were acquired, of which 25 were not processed. All results from the image processing thread were stored on-board in an SQLite database file. The results of the post-flight analysis are summarized in Table 8.


**Table 8.** Analysis of the results produced by the ship detection thread.

Since the algorithm was designed for objects surrounded by water, the results of the flight over land are not meaningful. The verification of the algorithm's performance is therefore done for the core time only. Of the 26 results that AMARO marked as ships, we could verify by visual inspection that 23 were truly ships. Of these, 13 were assigned one-to-one, i.e., AMARO detected one ship where we also see exactly one ship in the images. An example of the visual inspection of one ship observation is shown in Figure 11.

In three cases, AMARO detected two distinct ships in a time series of subsequent images where only one and the same ship was present. In one case, AMARO detected two ships where there were indeed two ships, but mixed up the results. Furthermore, AMARO re-detected three ships: these ships were overflown twice (while flying over the mouth of the Elbe forward and backward), and AMARO recognized them as one and the same object, which may or may not be desired, depending on the definition. If this effect is undesired, the time span for identifying "similar" objects could be narrowed further. No ships were missed by AMARO compared to the visual inspection. For a quick overview, these results are summarized in Table 9.
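The "similar object" association mentioned above can be sketched as a gate in space and time. The distance helper and both thresholds below are illustrative assumptions, not the published AMARO parameters.

```python
# Two detections are treated as the same ship if they lie within a
# given distance AND a given time span. Narrowing max_dt_s prevents
# re-detections on a return overflight from being merged into one object.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_same_object(det_a, det_b, max_dist_m=500.0, max_dt_s=3600.0):
    """Merge two detections only if they are close in both space and time."""
    dist = haversine_m(det_a["lat"], det_a["lon"], det_b["lat"], det_b["lon"])
    dt = abs(det_a["t"] - det_b["t"])
    return dist <= max_dist_m and dt <= max_dt_s
```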

Considering that the design of the algorithm was not our main focus, that development effort was kept comparatively low, and that only the thermal channel was used, the results are satisfactory. However, a thorough comparison with other ship detection algorithms would go beyond the scope of the present paper.

**Table 9.** Comparison of the results from AMARO with visual inspection.


**Figure 10.** Example of quicklook images of potential ship objects.

**Figure 11.** Ship detected by AMARO, 12th April 2018, 11:45 am UTC, Mouth of the Elbe, Hamburg, Germany: (**a**) RGB image (**b**) thermal image (**c**) quicklook which was sent to ground from AMARO.

#### **4. Discussion**

We developed a comprehensive prototype system called AMARO for future real-time ship detection on-board satellites and other Earth observing vehicles. It includes on-board image processing, real-time communication via a satellite network, and a user-driven message exchange. To test the concept, the AMARO-box was built as prototype hardware, and the system was tested within a flight campaign over the North Sea.

#### *4.1. Communication*

Special focus was put on the user-driven near-real-time information capability, facilitated by using a satellite communication service. It was successfully demonstrated within our flight campaign, in which the Iridium SBD service was used for message exchange. More than 84% of the user queries were answered in less than five minutes, with an average of less than two minutes.

For EO satellites, an information flow within minutes is not possible with the current approach of downloading the sensor data to ground stations and processing them on-ground. In contrast to conventional remote sensing missions, our system does not rely on any direct link to a ground station. By using satellite communication services, as demonstrated with AMARO, product information can be communicated to any device on the ground with a connection to the internet, independent of the locations of both the carrier platform and the user. The system is therefore flexibly deployable at varying monitoring sites and especially suitable for the surveillance of remote areas without ground connection, for example, over the open sea. Especially for micro- and nano-platforms, this can be a feasible approach for enabling real-time capability, as it can be used worldwide, 24/7, and no ground infrastructure is required. In addition, the operational costs are affordable, even for smaller missions.

Apart from this, with AMARO, users are not flooded with an unmanageable amount of data. They can control the flow of information by interactively exchanging messages with the on-board system. They can configure the automatic notification service during operation to get custom-tailored information about events of their interest. Finally, they can request further details by querying the on-board databases.

Being able to get information about ships within a few minutes after observation, as we demonstrated, is beneficial in various situations. For example, it can support maritime safety agencies in taking action against smuggling, illegal fishing, and sea pollution, or support sea rescue services.

Nevertheless, regarding the communication procedure, some aspects were deemed to be in need of improvement. As described in Section 2.2.1, the queries in the SQL language were recorded in text documents and sent to AMARO as email attachments. The AMARO system replied the same way. It turned out that this procedure was uncomfortable to handle, even for the experienced operator. Requests and their corresponding answers always started with the same ID for easier matching, but it was nevertheless difficult to keep track of which answers had already been received, which were wrong, and which were empty or missing altogether.
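The kind of query exchange described above can be illustrated with Python's `sqlite3` module against a hypothetical on-board detection table; the table layout, column names, and sample values are our assumptions and not the actual AMARO schema.

```python
# Minimal sketch: an operator's SQL query, sent as text, is executed
# against the on-board SQLite database of ship detections.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE detections (id INTEGER PRIMARY KEY, t TEXT, lat REAL, lon REAL)"
)
conn.execute(
    "INSERT INTO detections (t, lat, lon) VALUES ('2018-04-12T11:45Z', 53.9, 8.9)"
)

# The operator's request, received as plain text and executed on-board:
query = "SELECT id, t, lat, lon FROM detections WHERE t >= '2018-04-12T11:00Z'"
rows = conn.execute(query).fetchall()
# The result rows would then be serialized into one or more SBD messages
# and sent back over the same channel.
```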

One of our priorities regarding further development is therefore the design of a graphical user interface. In principle, the interface should handle user-defined requests to a database via the internet. In the upcoming stage of expansion, every authorized user should be able to retrieve the information of their personal interest via a web application, using the device of their choice (smartphone, tablet, laptop, etc.). Apart from this, the limited bandwidth of around 300 B per message was a bottleneck in the communication flow. Sophisticated programming and workarounds were necessary in order to transmit a reasonable amount of information; the quicklook images could only be sent as highly compressed binary shapes. However, we are confident that our approach will benefit from ongoing and future development, which will continuously allow higher transmission rates. For example, with their next-generation satellites launched in recent years, Iridium SBD can now transmit packages of around 2 kB in message size, compared to the previous 300 B. Furthermore, even more possibilities may evolve with globe-spanning satellite-borne internet systems, such as OneWeb or StarLink.

Regarding the deployment on EO satellites, further investigation is necessary to examine the potential of the existing real-time communication services in LEO orbits. Satellite communication networks are usually designed for operating services on the ground and, hence, provide continuous coverage within their operational area on the Earth's surface. Since EO satellites typically fly at altitudes of approximately 200 to 2000 km, coverage at that height may be rather discontinuous. In addition, depending on the relative orbits of the EO and communication satellites, a loss of connection may occur due to the amplified Doppler effect [39,40]. However, some on-board experiments have already been conducted and yielded promising results [41,42].

#### *4.2. Onboard Data Analysis*

Within AMARO, image data are processed directly on-board in order to extract the relevant and rather small-sized product data. In combination with using satellite communication, on-board data reduction is the prerequisite that enables real-time information.

Although the designed algorithm uses the thermal infrared channel only and is kept relatively simple overall, the results were competitive. More than 88% of the detected objects could be identified as ships, and no ships identified by eye were missed.

We want to mention that this ship detection algorithm was primarily developed to demonstrate the concept of a real-time on-board ship detection system in general. Only limited resources were available for its development and validation. For a future version of the system, however, we plan to cooperate with remote sensing experts to integrate a mature, validated, state-of-the-art ship detection and classification processor.

Currently, we are part of the project ScOSA (Scalable On-Board Computing for Space Avionics), which has the goal of developing a high-performance on-board computer for satellite platforms [31]. The ScOSA system consists of multiple hardware nodes, uses a distributed computing approach, and can be dynamically reconfigured during runtime to remove faulty nodes and shift applications to healthy ones. We contribute to this project by porting AMARO to the ScOSA platform in order to stress the overall system and demonstrate its computing capacity [30].

It was not part of our experiment to synchronize the signals from the AIS receiver with the results from the image processing. However, the fusion of AIS and image data would bring a significant benefit: in particular, ships that do not transmit AIS signals could thus be identified. In the scientific community, several ongoing projects are engaged in the fusion of AIS and image data [43]. Hence, we are establishing cooperation to draw on profound experience for the future improvement of our application. At this stage, we would like to mention that our system is not limited to optical data and AIS. Other signal sources, e.g., a SAR (Synthetic Aperture Radar) camera or a receiver for mobile phone signals, can be added without modifying the existing concept or the software structure.
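The AIS/image fusion idea mentioned above can be sketched as a simple gated nearest-neighbor association: each image detection is matched to the closest AIS report within a distance gate, and detections left without a match are candidate "dark" ships. The gate value and data layout are illustrative assumptions.

```python
# Sketch of AIS/image fusion: detections without a nearby AIS report
# are flagged as ships seen in imagery but sending no AIS signal.

def fuse(detections, ais_reports, gate_deg=0.01):
    """Return (matched, unmatched) detections using a simple lat/lon gate."""
    matched, unmatched = [], []
    for det in detections:
        best = None
        for ais in ais_reports:
            d = ((det["lat"] - ais["lat"]) ** 2
                 + (det["lon"] - ais["lon"]) ** 2) ** 0.5
            if d <= gate_deg and (best is None or d < best[0]):
                best = (d, ais)
        if best:
            matched.append((det, best[1]))  # ship confirmed by AIS
        else:
            unmatched.append(det)  # candidate "dark" ship
    return matched, unmatched
```

A production system would of course use a proper metric distance, account for the time offset between AIS report and image acquisition, and resolve ambiguous assignments jointly rather than greedily.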

#### *4.3. System Design*

Several publications exist about the individual subsystems, e.g., on-board image processing or real-time communication. Our investigation and development, however, aim at designing an operable system as a whole. We designed a comprehensive modular system for on-board data analysis and real-time information that detects vessels and sends the results to the interested user within minutes after sighting. Our system is not designed as a monolithic block, but is flexibly expandable and deployable. It is modeled similarly to modern internet search engines, consisting of a large database and several services that query and modify it. The software system is therefore easy to expand, adapt, and maintain. AMARO is not set up as a simple one-way processing chain (getting images, extracting information, sending results); rather, it is an autonomously working entity responsive to the user's needs.

#### *4.4. General Limitations of the System*

At present, the main benefit of the system is achieved by using optical image data. Therefore, the usability of the system heavily depends on the weather and lighting conditions. Operation at night is not supported, and during the day, heavy cloudiness can seriously limit the surveillance performance of the system. In the future, synthetic aperture radar sensors may noticeably enhance the surveillance usability of the system; for now, this option is not feasible due to the weight and energy consumption of available sensors and the high computing performance needed to process the data. Furthermore, with satellites, permanent surveillance of a specified region is not feasible, as geostationary satellites do not provide a reasonable image resolution. In such a scenario, however, we see the benefit of the system as an additional data source rather than as a single permanent surveillance solution.

#### *4.5. Expansion of Deployment*

A field to be investigated in more detail is that of possible flight platforms. High-altitude pseudo-satellites seem predestined for this, since they offer the possibility of continuously and autonomously monitoring an area of interest for a longer duration. With the DLR working on the development of a high-altitude platform [44] and commercial systems like the Airbus Zephyr [18] starting to become available, we think that suitable flight platforms may be a realistic option within the next five years.

Finally, we are planning to expand our system to be deployable for other time-critical Earth observing scenarios that would benefit from a rapid information system; for example, real-time monitoring of traffic, sea ice, or disasters.

**Author Contributions:** Conceptualization, K.W. and K.S.; methodology, K.W. and K.S.; software, K.W. and K.S.; validation, K.W., K.S., and J.B.; investigation, K.W. and K.S.; resources, K.W., K.S., and J.B.; data curation, K.W.; writing—original draft preparation, K.W., K.S., and J.B.; writing—review and editing, K.W., K.S., and J.B.; supervision, K.W. and K.S.; project administration, K.W. and K.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** We thank Christian Mietner for building the AMARO-box and providing technical support, Sebastian Pless for organizing the flight campaign, Daniel Hein for providing software support for the MACS camera system, and all persons at the DLR involved in the AMARO project.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:



#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).