1. Introduction
As deep learning technology advances, the capabilities and accuracy of object detection are steadily improving. This progress is being applied in the defense field, both in terrestrial applications and in coastal border and maritime surveillance. Maritime surveillance aims to monitor the activities of various objects at sea, detect abnormal behaviors, and raise alarms as necessary. To achieve this goal, maritime surveillance must be accompanied by object detection, which plays an essential role in securing maritime areas by locating ships and aircraft navigating maritime regions, tracking their movement paths, preparing for security threats, and enabling early warnings by recognizing illegal activities in advance [1,2,3].
The most common maritime surveillance method is radar-based surveillance, which covers vast maritime areas and enables wide-area coverage and the detection of distant objects. Santi et al. [4] demonstrated the feasibility of maritime surveillance using multitransmitter GNSS-based passive radar in the range-Doppler domain. Similarly, Nasso and Santi [5] proposed a method for detecting vessels over a wide area by fusing multiple bistatic channels and multiple frames received from a multitransmitter GNSS-based passive radar. Zhang et al. [6] proposed a low-cost maritime surveillance system using high-frequency surface wave radar and a hierarchical key-area perception model. However, radar-based surveillance struggles to detect nonmetallic or low-reflectivity objects, such as small fishing boats carrying six to eight passengers; furthermore, it provides little information about an object's shape or size [7]. Another maritime surveillance method employs acoustic sensors, which can identify the location and type of an object from its sound characteristics [8,9]. However, acoustic sensors are vulnerable to interference from wind, waves, and other background noise, and their accuracy is degraded by noise generated by the sensor itself.
Given these limitations, various studies have been conducted on vision-based maritime surveillance, which can accurately analyze a detected object's shape, size, color, and other attributes from the rich visual information and high-quality images acquired by vision sensors such as RGB or infrared (IR) cameras. Tran and Le [10] proposed a method to find dynamic and static objects using background subtraction and saliency detection, but it showed the limitation that small objects are often absorbed into the background and left undetected. Nita and Vandewal [11] proposed a method to detect large vessels, such as tankers or container ships, using Mask R-CNN (mask region-based convolutional neural network). However, it is difficult to extract detailed features for small fishing boats, and the detection performance drops sharply because the object's boundary cannot be accurately identified. Qi et al. [12] proposed an improved Faster R-CNN that uses image downscaling and scene narrowing to obtain more valuable object features and improve processing speed. While it improves the detection accuracy and processing speed for short-range ships near the harbor compared to the original Faster R-CNN, it does not consider low-resolution ships located near the horizon and is therefore unsuitable for surveillance near the horizon. Ma et al. [13] proposed a method of finding objects by extracting patches in areas where objects are likely to exist in IR images and applying nonlocal dissimilarity measures and multi-frame saliency measures. However, this method's detection performance degrades when large objects partially obscure small ones.
Similarly, Wang et al. [14] proposed a method of detecting maritime targets by segmenting objects with grayscale-distribution-curve shift binarization after enhancing the IR image through local peak singularity measurement. However, because detection relies on IR reflection, performance degrades for objects with poor reflectivity or with low reflection intensity due to distance from the sensor. Despite these endeavors in vision-based maritime surveillance, detection performance remains unsatisfactory for low-resolution objects, objects near the horizon, and small objects that are difficult to identify from a distance.
The detection of objects near the horizon, which can pose a serious threat to ships, crews, and port security, plays a crucial role in maritime surveillance, but due to its inherent difficulty, limited research has been conducted on this subject. Yang et al. [15] introduced an approach for detecting small objects in infrared images by utilizing the local heterogeneity property and the nonlocal self-correlation property to suppress highly bright background clutter caused by strong waves or sunlight. However, the detection performance diminished in scenarios with many objects or when larger objects partially obscured smaller ones. Su et al. [16] proposed a method to detect the horizon using a panoramic vision system and the Hough circle transformation, and to detect objects near the horizon using local region complexity. However, this method has a notable drawback: the horizon detection performance depends heavily on the preset threshold used to remove unnecessary boundaries, which in turn makes the object detection performance unstable. Yoneyama and Dake [17] proposed a method that uses a large-object detector to detect large vessels and then segments the horizon region into patches, within which a small-object detector identifies small vessels. However, when part of the horizon is obscured by a large vessel, the horizon becomes difficult to find, which causes the small-object detector to stop working and miss small objects.
Unidentified objects located near the horizon in maritime surveillance environments can be a potential threat; however, they are challenging to detect due to their small size and low resolution. To overcome this limitation, existing approaches first detect the horizon and then focus on detecting objects near it; however, their performance remains unsatisfactory. Thus, in this study, we propose a multifocal object detection associative network (MODAN) for more reliable and improved maritime surveillance. The proposed network first performs a horizon search to detect objects located near the horizon that occupy a very small area of the entire image. The horizon detector initially employs K-means-based color quantization to segment the input image into two regions, the ocean and the sky; noise is then eliminated through mathematical morphology, ensuring stable horizon detection even when parts of the horizon are obscured. Because the extracted horizon area is horizontally elongated, feeding it directly into an object detection network that expects square inputs distorts the objects' shapes and degrades detection performance. Thus, the region of interest (ROI) selector extracts the ROI, the area around the detected horizon, and segments it into square patches. Afterwards, the super-resolution CNN (SRCNN) is used to improve the resolution of the ROI images. Then, the far-field detector (FFD) searches for small objects near the horizon while the near-field detector (NFD) simultaneously searches for relatively large objects in the original image. Objects detected redundantly by the two independent detectors are merged using weighted box fusion to finally estimate the objects' bounding boxes. To verify the superiority of the proposed MODAN, comparative experiments were conducted with similar models.
The comparison results showed that MODAN achieved an average precision (AP) improvement of more than 7% over existing single detection models and of more than 10% over multifocal state-of-the-art detection models, with a false detection rate reduced by more than 59%, validating the stability and efficiency of the proposed model. The main contributions of this study can be summarized as follows:
− A horizon detector based on color quantization is proposed, ensuring stable horizon detection even when parts of the horizon are obscured. As a result, the rate of undetected objects located near the horizon is effectively reduced.
− An approach to enhance the resolution of segmented ROI images through SRCNN is proposed, improving object detection performance by restoring the shapes of small objects.
− A multifocal detector is proposed to detect both near-field large objects and far-field small objects near the horizon.
− Finally, weighted box fusion is introduced to improve detection performance by reliably removing duplicate detected objects from segmented ROI images.
2. Horizon Detection and ROI Selection
In the maritime environment, horizon detection provides essential information for navigation and exploration and plays a vital role in locating and identifying the destination for autonomous navigation of vehicles such as ships and aircraft [18,19]. In particular, objects of interest in maritime surveillance operations often appear near the horizon, making horizon detection an imperative step. Consequently, significant efforts have been devoted to detecting the horizon [16,17,20,21]. In this study, we propose a horizon detection and ROI selection structure similar to the existing methods, as depicted in Figure 1, to more reliably detect objects near the horizon.
To detect the horizon, the ocean and sky regions’ colors are first binarized using color quantization based on K-means, and then background and object noise are removed through mathematical morphology. Based on this, the detected horizon is rotated to align with the horizontal axis of the image. Then, an ROI is extracted evenly, ensuring the visibility of small objects within the ROI. This ROI is subsequently passed to the object detector through ROI segmentation for object detection.
2.1. Color Quantization
Color quantization is a method that converts an image's colors into a limited set of colors, mapping 24-bit colors to a fixed number of colors. This not only reduces the number of colors in the image, which reduces the amount of computation required for image processing, but also removes noise caused by clouds or waves by eliminating high-frequency components from the image and generating smooth and consistent colors [22].
In this study, traditional K-means clustering is employed for color quantization to cluster the color vectors in the image into two colors, ocean and sky; each pixel in the input image is a three-dimensional color vector (R, G, B). The proposed quantization method randomly extracts N sample vectors from the total color vectors and defines a set of centroid vectors, v_i (i = 1, 2), initialized as any two of the extracted sample vectors. Each centroid vector comes to represent the sky or ocean color, and based on the Euclidean distance to the other sample vectors, these vectors are divided into two groups. The set of color vectors clustered around the centroid v_i, S_i, can be calculated as follows:

S_i = { x_k : ||x_k − v_i||^2 ≤ ||x_k − v_j||^2, ∀j ∈ {1, 2} }

Here, x_k refers to the k-th color vector among the N sample vectors and v_i refers to the centroid vector that represents the sky or the sea. For each group, the mean of the clustered color vectors is selected as the new centroid vector:

v_i = (1 / |S_i|) Σ_{x_k ∈ S_i} x_k

Here, |S_i| is the number of color vectors belonging to that cluster. This process is repeated until the centroid vectors stabilize, so that the optimal centroid vectors distinctly differentiate the sky and sea colors. Clustering all color vectors using the optimal centroid vectors results in two colors, as shown in
Figure 2.
In this way, the quantization process divides the image into two regions, but some color vectors are incorrectly clustered, resulting in small holes or dots, as seen in
Figure 2b. These holes or dots can cause errors in the horizon extraction process. To address them, a closing operation (dilation followed by erosion) from mathematical morphology is performed to merge adjacent objects, reduce the gaps between objects, and remove noise or holes inside objects so that the horizon can be detected reliably.
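The two-color quantization described above can be sketched as a small NumPy routine. The function name `quantize_two_colors` and its sampling parameters are illustrative choices, not the paper's implementation:

```python
import numpy as np

def quantize_two_colors(image, n_samples=1000, n_iters=20, seed=0):
    """Cluster RGB pixels into two colors (e.g., sky vs. ocean) with K-means.

    image: (H, W, 3) array. Returns (labels, centroids), where labels is
    an (H, W) map with values in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    pixels = image.reshape(-1, 3).astype(float)
    # Randomly extract N sample vectors from the full set of color vectors.
    samples = pixels[rng.choice(len(pixels), size=n_samples, replace=True)]
    # Initialize the two centroids as any two of the extracted samples.
    centroids = samples[rng.choice(n_samples, size=2, replace=False)]
    for _ in range(n_iters):
        # Assign each sample to the nearest centroid (Euclidean distance).
        d = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its cluster.
        for i in range(2):
            if np.any(labels == i):
                centroids[i] = samples[labels == i].mean(axis=0)
    # Quantize the full image with the converged centroids.
    d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1).reshape(image.shape[:2]), centroids
```

On a synthetic image whose top half is bright and bottom half is dark, the routine assigns the two halves to different clusters, which is exactly the sky/ocean binarization used here.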
2.2. Mathematical Morphology
Mathematical morphology is a technique that extracts the desired parts of an image using a kernel of a predefined geometric shape. It extracts the elements necessary to represent the shape of objects, such as boundaries and skeletons, and is often used to enhance ambiguous object regions produced by image binarization through operations such as noise removal, hole filling, and connecting broken lines [23]. The basic operations are erosion, which removes object protrusions and background noise, and dilation, which removes internal noise such as holes in objects. Opening and closing are built from these operations. Among them, closing performs dilation followed by erosion on a binary image, filling the holes inside objects, refining object shapes, removing background noise, and smoothing the edges of the image. The enhanced binary image is obtained by removing both internal and background noise.

Here, B is a predefined geometric kernel. A ⊕ B denotes the dilation of the binary image A, A ⊕ B = { z : (B)_z ∩ A ≠ ∅ }, where (B)_z is the translation of B by z. This means that, as the kernel passes over the binarized image, the overlapping region is filled with 1 whenever the kernel touches the 1-filled region. This process expands the area of the object and removes noise such as irregularities at object boundaries or holes inside objects. Conversely, A ⊖ B denotes the erosion process, the opposite of dilation, in which the overlapping region is set to 0 unless the kernel completely overlaps the 1-filled region, i.e., A ⊖ B = { z : (B)_z ⊆ A }. This removes small background noise and also restores objects expanded by dilation to their original size. Thus, the noise disappears and the sea and sky regions are cleanly separated.
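A minimal NumPy sketch of the closing operation (dilation followed by erosion) on a binary image. The loop-based helpers are illustrative, assume a small odd-sized kernel, and zero-pad at the borders, which slightly erodes objects touching the image edge:

```python
import numpy as np

def dilate(A, B):
    """Binary dilation: a pixel becomes 1 if the kernel window hits any 1."""
    kh, kw = B.shape
    ph, pw = kh // 2, kw // 2
    P = np.pad(A, ((ph, ph), (pw, pw)))
    out = np.zeros_like(A)
    for y in range(A.shape[0]):
        for x in range(A.shape[1]):
            out[y, x] = np.any(P[y:y + kh, x:x + kw] & B)
    return out

def erode(A, B):
    """Binary erosion: a pixel stays 1 only if the kernel fits inside the 1s."""
    kh, kw = B.shape
    ph, pw = kh // 2, kw // 2
    P = np.pad(A, ((ph, ph), (pw, pw)))
    out = np.zeros_like(A)
    for y in range(A.shape[0]):
        for x in range(A.shape[1]):
            out[y, x] = np.all(P[y:y + kh, x:x + kw][B.astype(bool)])
    return out

def closing(A, B):
    """Closing = dilation then erosion: fills small holes, smooths edges."""
    return erode(dilate(A, B), B)
```

Closing a solid block that contains a one-pixel hole fills the hole while leaving the block's footprint unchanged, which is the behavior relied on before horizon extraction.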
2.3. Horizon Search and Rotation
In the separated binary image, the pixel’s position is moved from the top to the bottom to find the point where the pixel value changes from 0 to 1, and the horizontal line coordinates are detected. However, there may be cases in which the horizon is undetected due to large objects near the horizon or close to the sensor. Therefore, to reliably detect the horizon coordinates,
horizontal candidate coordinates are extracted at a regular interval from the left and right sides of the image center, respectively, and the average of these coordinates is calculated to finally estimate the
-coordinates of the left and right endpoints of the horizon,
and
.
Here, and refer to the -th -coordinate values of the coordinates that are candidates for the horizon from the left and right sides, respectively. As a result, the coordinates of the left and right endpoints of the horizon are and , respectively, where is the width of the image.
A portion of the area around the horizon, the ROI, should be separated from the original image using the estimated horizon coordinates, but if the horizon is tilted, the ROI cannot be separated evenly. In this case, to minimize losses in the object detection result and in matching the segmented ROI images back to the original, the angle θ of the tilted horizon depicted in
Figure 3 is found as follows:

θ = tan⁻¹((y_r − y_l) / W)

Assuming that the horizon in the image should be parallel to the x-axis, the right endpoint of the horizon, y_r, is set to the y-coordinate of the left endpoint, y_l. The image is then rotated by θ to make y_l and y_r equal, as follows, and the ROI around the horizon is extracted:

[x', y']ᵀ = R(θ)([x, y]ᵀ − [c_x, c_y]ᵀ) + [c_x, c_y]ᵀ,  R(θ) = [[cos θ, −sin θ], [sin θ, cos θ]]

Here, (c_x, c_y) is the center of rotation, and (x, y) and (x', y') are the pixel coordinates before and after rotation, respectively. The rotated image is obtained through this rotation matrix, making the horizon line parallel to the horizontal axis.
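The tilt estimation and leveling rotation can be illustrated as follows. The helper names are hypothetical, and the sign convention may need to be flipped depending on whether the y-axis grows upward or downward in the chosen image coordinate system:

```python
import numpy as np

def horizon_angle(y_left, y_right, width):
    """Tilt angle of the horizon from its two endpoint y-coordinates."""
    return np.arctan2(y_right - y_left, width)

def rotate_points(points, angle, center):
    """Rotate (x, y) points by -angle about `center` to level the horizon."""
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])
    # Translate to the rotation center, rotate, translate back.
    return (np.asarray(points) - center) @ R.T + center
```

Rotating the right endpoint of a tilted horizon about the left endpoint by the negative tilt angle brings both endpoints to the same height, which is the condition the ROI extraction step requires.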
2.4. ROI Selection and Segmentation
When extracting the ROI near the horizon from the rotated image I_r, if the extracted image is rectangular, the object's shape may be distorted, and detection performance may be degraded for an object detector that takes a square image as input. Thus, the image is split into square patches as follows so that the object detector can detect objects near the horizon:

P_i = I_r(y_h − h_a : y_h + h_b, (i − 1)S : iS),  i = 1, …, D

Here, D refers to the maximum integer that can divide the image's horizontal axis into ROIs of size S × S, where S = h_a + h_b; i indexes the patches from 1 to D; and h_a and h_b represent the ROI ranges above and below the horizon, respectively. Consequently, as shown in the example in
Figure 4, an ROI image divided into square patches near the horizon is obtained from the rotated image after the horizon is detected.
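A sketch of the ROI selection and segmentation step, assuming the horizon row `y_horizon` and the ranges above and below it are given; the function name and the clamping policy at the top image border are illustrative:

```python
import numpy as np

def segment_roi(rotated, y_horizon, h_above, h_below):
    """Split the strip around the horizon into square patches.

    The patch side is S = h_above + h_below; D = floor(W / S) patches are
    taken left to right, each covering rows [y_horizon - h_above,
    y_horizon + h_below) of the rotated image.
    """
    H, W = rotated.shape[:2]
    S = h_above + h_below
    D = W // S
    top = max(0, y_horizon - h_above)  # clamp at the top border
    strip = rotated[top:top + S]
    return [strip[:, i * S:(i + 1) * S] for i in range(D)]
```

For a 100 x 640 image with a horizon at row 50 and ranges of 32 pixels above and below, this yields ten square 64 x 64 patches, each suitable as input to a square-input detector.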
3. Multifocal Object Detection
Existing object detection networks have focused on relatively large objects, such as large ships, as they attempt to detect objects across the entire input image. Consequently, small ships near the horizon, far from the sensor, have low resolution and occupy a very small part of the entire image; such objects lack a distinct shape and are prone to misclassification or even missed detection. To address this limitation, a strategy is required to train object detectors on images extracted or transformed to focus on small objects. To this end, we propose a multifocal object detector that simultaneously searches for small objects near the horizon by means of the FFD and for relatively large objects by means of the NFD based on the original image, detecting all objects at sea after improving the resolution of the segmented ROI images through SRCNN. The block diagram is shown in
Figure 5.
The segmented ROI images input into the FFD are enlarged around the horizon, so objects appear larger than in the original image, but their resolution is reduced, making their shapes hard to interpret with a general object detector. Thus, SRCNN, a deep-learning-based image resolution enhancement network, is used to increase the resolution of the segmented ROI images while restoring object shapes so that small objects can be detected. However, both the FFD and the NFD can detect the same large object redundantly. Therefore, weighted box fusion (WBF) removes overlapping bounding boxes and finally estimates each object from the concatenation of the objects inferred independently by the NFD and FFD.
3.1. Resolution Enhancement
Objects located near the horizon are characterized by very low resolution because they are small enough to be indistinguishable from the naked eye and occupy only a small portion of the entire image, as shown in
Figure 6a. These small, low-resolution objects are frequently encountered while searching for distant objects. For this reason, research on image resolution enhancement has inevitably proceeded alongside distant object detection [24,25]. In this study, similar to the method proposed by Dong et al. [26], SRCNN is used to restore the low-resolution segmented ROI image to high resolution, making object shapes clear before the image is fed into the object detector to improve detection performance. SRCNN consists of three layers: patch extraction and representation, nonlinear mapping, and reconstruction. First, patches are extracted from the low-resolution segmented ROI image Y and fed into the first layer, which represents each patch as a high-dimensional vector. These vectors form a set of feature maps:

F_1(Y) = max(0, W_1 * Y + B_1)

Here, W_1 and B_1 refer to the filters and biases with learned weights, respectively, and * refers to the convolution operation. Since the feature map F_1(Y) produced by the first layer cannot by itself realize the nonlinear mapping that restores high-resolution images from low-resolution ones, the second layer maps F_1(Y) to another high-dimensional vector with added nonlinearity:

F_2(Y) = max(0, W_2 * F_1(Y) + B_2)

The high-resolution representations created through this process must then be aggregated to create the final high-resolution image. Thus, the final high-resolution image F(Y) is produced by convolving the high-resolution patches from the previous layer with a linear filter in the last layer:

F(Y) = W_3 * F_2(Y) + B_3
Figure 6 shows an example of the image resolution enhancement process. The vessels, which could not be confirmed in the original image, could be identified to some extent through the ROI selector, although they were blurred. After passing through SRCNN, the mosaic-like noise was significantly reduced, and it was confirmed that the resolution of the objects was greatly improved.
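For illustration, the three-layer SRCNN forward pass can be written directly in NumPy. The filter counts and kernel sizes below are toy values (the original network uses 64 and 32 filters with 9 x 9, 1 x 1, and 5 x 5 kernels), and the weights would come from training rather than random initialization:

```python
import numpy as np

def conv2d(x, w, b):
    """'Same' convolution: x is (C_in, H, W), w is (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for y in range(H):
            for xx in range(W):
                out[o, y, xx] = np.sum(xp[:, y:y + k, xx:xx + k] * w[o]) + b[o]
    return out

def srcnn_forward(y, weights):
    """Three stages: patch extraction, nonlinear mapping, reconstruction."""
    (w1, b1), (w2, b2), (w3, b3) = weights
    f1 = np.maximum(0.0, conv2d(y, w1, b1))   # F1 = max(0, W1*Y + B1)
    f2 = np.maximum(0.0, conv2d(f1, w2, b2))  # F2 = max(0, W2*F1 + B2)
    return conv2d(f2, w3, b3)                 # F  = W3*F2 + B3 (linear)
```

With 'same' convolutions the output retains the spatial size of the input ROI patch, so the enhanced patch can be passed straight to the detector.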
3.2. YOLO Detection Network
As deep learning technology advances, image recognition accuracy using CNNs is steadily increasing. Similarly, the performance of object detection, which determines the location and type of a specific object in an image, has also significantly improved. Initially, two-stage detectors such as the region-based convolutional neural network (R-CNN), Fast R-CNN, and Faster R-CNN emerged and demonstrated high accuracy. However, with the emergence of one-stage detectors such as You Only Look Once (YOLO) and the single-shot multibox detector, which focus on real-time object detection, attention has shifted toward these methods. To improve detection speed, one-stage detectors cast the bounding boxes and class probabilities in the image as a single regression problem, which increased inference speed and allowed the neural network to learn the entire process of estimating an object's type and location.
In maritime surveillance, it is important to quickly identify objects that pose a potential threat in order to develop security measures or take preemptive action. Therefore, the latest one-stage detector, YOLOv7, is adopted in this study. YOLOv7 exhibits faster and better detection performance by reducing the parameters to about 40-50% of previous networks [27]. YOLOv7 maintains the original gradient route and enables efficient learning through an extended efficient layer aggregation network (E-ELAN) block. It also reduces inference cost by efficiently decreasing parameters through re-parameterized convolution [27]. YOLO divides the input image into an S × S grid and predicts B preset bounding boxes per grid cell. If the actual and predicted bounding boxes overlap by at least a specific amount, the predicted bounding box is considered to contain an object. The probability that the predicted bounding box belongs to one of the classifiable object classes is then found as follows:

Pr(Class_i | Object) × Pr(Object) × IOU = Pr(Class_i) × IOU

Here, Pr(Object) refers to the probability that the predicted bounding box contains an object; IOU refers to the intersection over union, i.e., the overlapping region between the predicted bounding box and the actual bounding box; and Pr(Class_i | Object) refers to the probability that a specific object among the objects of interest is being classified. Finally, the bounding box with the highest class-specific confidence among the predicted B bounding boxes is selected as the bounding box for the object [28].
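The class-specific confidence computation can be sketched as follows; `best_box` and its tuple layout are illustrative helpers, not YOLOv7's actual output format:

```python
import numpy as np

def class_confidence(p_object, iou, p_class_given_object):
    """Class-specific confidence: Pr(Class_i|Object) * Pr(Object) * IOU."""
    return p_class_given_object * p_object * iou

def best_box(boxes):
    """Pick the predicted box with the highest class-specific confidence.

    boxes: list of (box, p_object, iou, class_probs) tuples, where
    class_probs holds Pr(Class_i|Object) for each class of interest.
    """
    scores = [p * i * np.max(c) for _, p, i, c in boxes]
    return boxes[int(np.argmax(scores))][0]
```

A box with a slightly lower objectness score can still win if its localization (IOU) is better, which is the point of folding IOU into the confidence.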
3.3. Weighted Box Fusion
Generally, an object detection model produces a bounding box, indicating the location of the object, and a confidence score, indicating the confidence regarding the detected object. Typically, for a single object, multiple bounding boxes with confidence scores ranging from low to high are generated. Thus, the final part of an object detection model includes removing overlapping bounding boxes, such as through non-maximum suppression (NMS). Similarly, overlapping bounding boxes are inevitably encountered when multiple sensors are used for object detection, necessitating the development of techniques with which to estimate the optimal bounding boxes.
The MODAN proposed in this study consists of the NFD, which detects large objects in the entire image, and the FFD, which finds small objects by enlarging the surroundings of the horizon, so the two independent detectors may estimate overlapping bounding boxes for the same object. In particular, as shown in
Figure 7, for large objects, the FFD's detection results based on the divided ROI images may appear as two different bounding boxes for the same object. In this scenario, because the same object is detected twice by different detection models or in different segmented images, each bounding box may possess a high confidence score; applying NMS then risks eliminating accurately detected bounding boxes, as shown in
Figure 7a. To overcome this, WBF is applied to improve prediction accuracy and obtain robust detection results by generalizing the predicted results through an ensemble of the doubly detected bounding boxes.
Given the concatenated detection results of the NFD and FFD, WBF sorts the detected bounding boxes by confidence score and then extracts the bounding boxes overlapping above a certain level by calculating the IOU between the first bounding box and all other bounding boxes. The IOU between the first bounding box b_1 and any other bounding box b_k is calculated as follows:

IOU = |b_1 ∩ b_k| / |b_1 ∪ b_k|

Here, b_1 ∩ b_k and b_1 ∪ b_k represent the intersection and union of bounding boxes b_1 and b_k, respectively. Then, a fused bounding box is generated whose location coordinates are the average of the location coordinates of the extracted overlapping bounding boxes.
The class probability of the fused bounding box is set to the highest class probability among the overlapping bounding boxes. After iterating over all bounding boxes, the optimal bounding boxes are selected through NMS [29].
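A simplified weighted-box-fusion sketch, assuming axis-aligned `[x1, y1, x2, y2]` boxes; the clustering and score-weighted averaging follow the idea described above rather than the exact reference implementation of Solovyev et al.:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def weighted_box_fusion(boxes, scores, iou_thr=0.55):
    """Fuse overlapping boxes from several detectors (simplified WBF).

    Unlike NMS, which keeps one box and discards the rest, each cluster of
    overlapping boxes is averaged, weighted by confidence, into one box.
    """
    order = np.argsort(scores)[::-1]
    clusters = []  # each cluster: list of (box, score)
    for i in order:
        b, s = np.asarray(boxes[i], float), scores[i]
        for c in clusters:
            fused = np.average([m[0] for m in c], axis=0,
                               weights=[m[1] for m in c])
            if iou(b, fused) > iou_thr:
                c.append((b, s))
                break
        else:
            clusters.append([(b, s)])
    fused_boxes, fused_scores = [], []
    for c in clusters:
        ws = [m[1] for m in c]
        fused_boxes.append(np.average([m[0] for m in c], axis=0, weights=ws))
        fused_scores.append(float(np.mean(ws)))
    return fused_boxes, fused_scores
```

Two near-duplicate detections of one ship are averaged into a single box, while a distant detection survives as its own cluster, so no correctly detected object is discarded outright.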
3.4. Proposed MODAN
Based on the multifocal object detector described above, the block diagram of the MODAN proposed in this study to detect both near-field and far-field objects located near the horizon is shown in
Figure 8. In summary, the proposed maritime object detection strategy enlarges objects near the horizon, which occupy a small portion of the entire image, by selecting and segmenting ROIs based on the detected horizon. Afterwards, the resolution is improved to convert the low-resolution images of small objects into high-quality images, and the multifocal object detector is used to reliably find objects at various focal points. Here, WBF is used to efficiently remove overlapping objects and further optimize the estimation of bounding boxes, improving detection performance. The proposed MODAN is summarized in Algorithm 1.
Algorithm 1. Proposed MODAN.
Require: Pretrained near-field detector NFD, pretrained far-field detector FFD, pretrained resolution enhancement weights W_1, W_2, W_3 and biases B_1, B_2, B_3
Input: Input image I, set of centroid vectors v_i, predefined geometric kernel B, width of input image W, maximum integer that can divide the image D, ROI range above the horizon h_a, ROI range below the horizon h_b
Output: Detected objects O
1: Q = Quantize(I, v_i) // Color quantization
2: A = (Q ⊕ B) ⊖ B // Mathematical morphology
3: Set y_l = 0, y_r = 0
4: for i = 1 to image height // Loop for horizon detection
5:   if (A(i − 1, left) == 0 && A(i, left) == 1) // Detect left side of horizon
6:     y_l ← i
7:   if (A(i − 1, right) == 0 && A(i, right) == 1) // Detect right side of horizon
8:     y_r ← i
9: y_l = mean of left candidates, y_r = mean of right candidates
10: Left endpoint of the horizon (0, y_l)
11: Right endpoint of the horizon (W − 1, y_r)
12: Angle of the tilted horizon θ = tan⁻¹((y_r − y_l)/W) // Calculate the angle of tilt
13: Rotated image I_r = R(θ)(I) // Rotate image to align horizon
14: ROI images P_1, …, P_D = Segment(I_r, h_a, h_b) // ROI selection and segmentation
15: for i = 1 to D // Loop for resolution enhancement
16:   F_1 = max(0, W_1 * P_i + B_1)
17:   F_2 = max(0, W_2 * F_1 + B_2)
18:   P_i ← W_3 * F_2 + B_3
19: for i = 1 to D // Loop for NFD and FFD
20:   O_f = FFD(P_i) // Find object in far-field
21: O_n = NFD(I) // Find object in near-field
22: O = WBF(O_f ∪ O_n) // Remove overlapping objects
4. Experimental Results
In maritime operations, detecting objects near the horizon is important for ensuring security and safety by locating potentially dangerous objects in nearby waters. However, objects near the horizon occupy a very small region of the entire image and are themselves small, making their detection with conventional object detectors challenging. Various methods have emerged that first detect the horizon and then define the surrounding area as the ROI, but their detection performance has been unsatisfactory because resolution is lost during object magnification, discarding some object information. Thus, in this study, we aimed to improve the detection of small objects near the horizon by first detecting the horizon; then enlarging the area around the horizon to include small objects that were not visible; and, finally, improving the resolution of the segmented, enlarged images.
The proposed MODAN was implemented using an NVIDIA RTX 3060 GPU and an Intel Core i7-12700 CPU. Real maritime surveillance performance was evaluated on the Singapore Maritime Dataset [30] to test the ship detection process. Network performance was evaluated using the AP, the area under the precision-recall curve, which illustrates the accuracy of the object detection network's predictions for the detected objects. Precision was calculated as TP/(TP + FP), the ratio of correctly detected instances to the total positive detections. Recall was calculated as TP/(TP + FN), the ratio of correctly detected instances to all instances that should have been detected. Here, TP, FP, and FN refer to true positives, false positives, and false negatives, respectively.
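The metrics above can be computed as follows; `average_precision` accumulates the running precision over recall increments, a common approximation of the area under the precision-recall curve:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(scores, matches, n_gt):
    """Area under the precision-recall curve for ranked detections.

    scores: confidence per detection; matches: 1 if that detection is a
    true positive, else 0; n_gt: number of ground-truth objects.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:
        tp += matches[i]
        fp += 1 - matches[i]
        recall = tp / n_gt
        precision = tp / (tp + fp)
        # Add a rectangle of height `precision` over the recall increment.
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

Because detections are ranked by confidence, a false positive late in the ranking lowers precision without adding recall, which is exactly how a high FP count drags down the AP reported in Table 1.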
The geometric kernel used in mathematical morphology was set to a matrix of ones whose size exhibited the best performance as the kernel size was varied, since the size of the input image and the average size of the objects are essential factors for performance. The size of the segmented ROI images was set to ensure that the vertical extent of small objects in the horizon region exceeded 15% of the patch, although this can vary with the size of the input image. Furthermore, the size of the input image applied to YOLOv7 after passing through the SRCNN was fixed so as to minimize the loss incurred by image resizing.
To evaluate the performance of the proposed MODAN, object detection performance comparison tests were conducted using state-of-the-art models with similar structures, and the results are summarized in
Table 1. As presented in the table, MODAN exhibited more than 7% higher AP than Wang (2022) [
27], the most common single detection model, thus verifying the validity of the proposed multifocal detection structure. In addition, MODAN, which was proposed together with the Yoneyama (2022) [
17] detection model with a multifocal structure, also detected objects that the existing single models would miss by segmenting and enlarging the horizon area, thereby achieving a higher TP than that of the single models. However, ROI segmentation can cause an object to appear in multiple segmented images, so it should be used in parallel with algorithms that remove overlapping bounding boxes, such as NMS or WBF. As a result, the Yoneyama (2022) [
17] model, which uses a multifocal structure, was able to find more objects than the existing single detection models, resulting in a high TP. However, even though NMS was used to remove duplicate bounding boxes, some correctly detected bounding boxes were also removed due to the characteristic of the NMS algorithm that only leaves the bounding boxes with the highest
. This led to an increase in false alarms, resulting in a much higher FP than other models, and at the same time, the detection performance was degraded, resulting in a significantly lower AP. The Solovyev (2021) [
29] model is a single detection model that uses WBF instead of NMS. While it has a slightly lower TP than the Wang (2022) [
27] model, which uses NMS, it also has the advantage of a lower FP. However, although it showed a lower false detection rate than MODAN, its overall AP was more than 7% lower than that of MODAN, confirming that it did not outperform MODAN. In conclusion, the proposed MODAN restored the shapes of low-resolution objects near the horizon through SRCNN-based resolution enhancement within the multifocal structure, and effectively removed overlapping bounding boxes using WBF instead of NMS. As a result, it reduced the false detection rate by more than 59% and improved the AP by more than 10% compared to the existing model with a multifocal structure. Furthermore, MODAN exhibited an AP increase of more than 7% over the existing single-detection models, verifying its improved detection performance.
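The contrast between NMS and WBF discussed above can be illustrated with a minimal NumPy sketch: NMS discards every overlapping box except the highest-confidence one, whereas WBF fuses overlapping boxes by confidence-weighted averaging. The IoU thresholds (0.5 and 0.55) are assumed values, and this simplified WBF omits details of the reference algorithm such as per-model score normalization.

```python
import numpy as np

def pair_iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, iou_thr=0.5):
    """Keep only the highest-confidence box in each overlapping group;
    every other overlapping box is discarded outright."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(int(i))
        order = [j for j in order if pair_iou(boxes[i], boxes[j]) < iou_thr]
    return keep

def fuse(boxes, scores):
    """Confidence-weighted average of a cluster of boxes."""
    w = scores / scores.sum()
    return (boxes * w[:, None]).sum(axis=0), scores.mean()

def wbf(boxes, scores, iou_thr=0.55):
    """Simplified weighted boxes fusion: greedily cluster boxes by IoU
    against the running fused box, then average each cluster, so
    overlapping detections contribute instead of being dropped."""
    clusters = []
    for i in np.argsort(scores)[::-1]:
        for cl in clusters:
            fused_box, _ = fuse(boxes[cl], scores[cl])
            if pair_iou(fused_box, boxes[i]) >= iou_thr:
                cl.append(int(i))
                break
        else:
            clusters.append([int(i)])
    return [fuse(boxes[cl], scores[cl]) for cl in clusters]
```

On two heavily overlapping detections of the same ship plus one distant ship, NMS keeps one box per ship, while WBF returns one fused box per ship whose coordinates blend both overlapping candidates; this blending is what avoids discarding correct detections outright.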
Moreover, examples of the ship detection results of the proposed MODAN and similar state-of-the-art models are shown in
Figure 9,
Figure 10,
Figure 11 and
Figure 12. In these figures, the yellow dashed line indicates the ROI, the green solid lines indicate correctly detected objects, the red solid lines indicate undetected objects, and the white solid lines indicate the ground truth. As shown in
Figure 9, a small ship, which is difficult to distinguish with the naked eye, is located next to a large ship near the horizon, denoted by the ROI. The existing single detection models of Wang (2022) [
27] and Solovyev (2021) [
29] failed to detect the small ship, while the multifocal detection models, Yoneyama (2022) [
17] and the proposed MODAN, were able to successfully detect it. In
Figure 10 and
Figure 11, similarly to the previous results, the multifocal detection model, Yoneyama (2022) [
17], was able to detect one additional ship among the two or three small ships that the single detection models missed. Nonetheless, the remaining ship was still undetected due to its low resolution, whereas MODAN detected it by increasing the resolution using SRCNN. In
Figure 12, some small ships overlap across the divided ROI images. In the Yoneyama (2022) [17] model, NMS is used to remove such duplicate detections, but owing to the characteristics of NMS, they were not handled correctly and some ships remained undetected. In contrast, MODAN detects these objects more reliably using WBF.
5. Conclusions
In maritime surveillance systems, detecting objects near the horizon plays a crucial role in ensuring the security of nearby waters by tracking the paths of moving objects at sea, detecting illegal activities, and preemptively countering or predicting potentially dangerous situations. Vision-based maritime surveillance, which is widely used for this purpose, can perform accurate analyses by identifying an object's shape, size, and color. However, its detection performance degrades for small, low-resolution objects located far from the sensor, as these objects' shapes are distorted. Thus, in this study, we proposed a horizon detector and an ROI selector as preprocessors to reliably find objects near the horizon, aiming to detect all objects at sea using the multifocal object detector. In particular, SRCNN was applied to the segmented ROI images to improve detection performance by increasing the resolution of small objects near the horizon. Comparative experiments between the proposed MODAN and similar state-of-the-art models showed that MODAN detected not only large ships at sea but also small ships near the horizon, achieving an AP more than 7% higher than that of the existing single detection models. Additionally, compared to the state-of-the-art multifocal-based detection model, MODAN improved the AP by more than 10% and reduced the false detection rate by more than 59%, demonstrating that it can detect small objects near the horizon more reliably and effectively in maritime surveillance.
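As a closing illustration of the resolution-enhancement stage referenced throughout, the original SRCNN layout is a 9-1-5 stack of convolutions with ReLU activations applied to a bicubic-upscaled input. The minimal NumPy sketch below shows only that layout; the filter counts (64/32) follow the original SRCNN paper, while the random weights and 16x16 single-channel input are illustrative assumptions, not the trained model used in MODAN.

```python
import numpy as np

def conv2d(x, w, b):
    """'Same' convolution of an (H, W, Cin) tensor with (k, k, Cin, Cout) weights."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return out

def srcnn(y, params):
    """SRCNN: patch extraction (9x9), nonlinear mapping (1x1),
    reconstruction (5x5); y is the bicubic-upscaled low-resolution image."""
    w1, b1, w2, b2, w3, b3 = params
    h = np.maximum(conv2d(y, w1, b1), 0)   # ReLU
    h = np.maximum(conv2d(h, w2, b2), 0)   # ReLU
    return conv2d(h, w3, b3)

# Randomly initialized weights for shape demonstration only.
rng = np.random.default_rng(0)
params = (rng.normal(0, 0.01, (9, 9, 1, 64)), np.zeros(64),
          rng.normal(0, 0.01, (1, 1, 64, 32)), np.zeros(32),
          rng.normal(0, 0.01, (5, 5, 32, 1)), np.zeros(1))
sr = srcnn(rng.random((16, 16, 1)), params)
```

Because the network operates on an already-upscaled input, the output retains the input's spatial size; only the detail is refined, which is what makes small horizon objects recoverable for the downstream detector.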