Article

Night-Time Vessel Detection Based on Enhanced Dense Nested Attention Network

Gao Zuo, Ji Zhou, Yizhen Meng, Tao Zhang and Zhiyong Long

1 School of Resources and Environment, University of Electronic Science and Technology of China, Chengdu 611731, China
2 College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(6), 1038; https://doi.org/10.3390/rs16061038
Submission received: 22 December 2023 / Revised: 4 March 2024 / Accepted: 9 March 2024 / Published: 15 March 2024

Abstract
Efficient night-time vessel detection is of significant importance for maritime traffic management, fishery activity monitoring, and environmental protection. With the advancement of object-detection approaches, night-time vessel detection has gradually shifted from traditional threshold segmentation to deep learning, which balances efficiency and accuracy. However, the restricted spatial resolution of night-time light (NTL) remote sensing data (e.g., VIIRS/DNB images) results in fewer discernible features and insufficient training performance when detecting vessels, which are small targets. To address this, we establish an Enhanced Dense Nested Attention Network (Enhanced DNA-net) to improve the detection of small vessel targets under low-light conditions. This approach effectively integrates the original VIIRS/DNB, spike median index (SMI), and spike height index (SHI) images to maintain deep-level features and enhance feature extraction. On this basis, we performed vessel detection with the Enhanced DNA-net using VIIRS/DNB images of the Japan Sea, the South China Sea, and the Java Sea. Notably, the VIIRS Boat Detection (VBD) observations and the Automatic Identification System (AIS) data were cross-matched to represent the actual status of the vessels (VBD-AIS). The results show that the proposed Enhanced DNA-net achieves significant improvements in the evaluation metrics (i.e., IOU, Pd, Fa, and MPD) compared to the original DNA-net, achieving 87.81%, 96.72%, 5.42 × 10−6, and 0.36 px, respectively. Meanwhile, we validated the detection performance of Enhanced DNA-net and strong VBD detection against VBD-AIS, showing that the Enhanced DNA-net achieves 1% better accuracy than strong VBD detection.

1. Introduction

Amid the dynamic evolution of maritime trade and fishery, accurate and large-scale detection of vessels is of significant importance for maritime traffic management, fishery activity monitoring, and environmental protection. Maritime traffic management relies heavily on the accurate detection of vessels to ensure safe navigation, prevent collisions, and enforce regulations. Furthermore, monitoring fishery activities during night-time can help in preventing illegal, unreported, and unregulated (IUU) fishing activities and ensuring sustainable fisheries management. Additionally, efficient vessel detection at night can contribute to environmental protection by enabling the monitoring of vessel emissions and their impact on air and water quality [1,2,3]. Notably, night-time vessel detection imposes high demands on both images and detection methods due to the challenges posed by low-light conditions, in contrast to daytime vessel detection, where vessels typically exhibit more discernible texture information.
Night-time vessel detection methodologies are generally based on observations from global navigation systems and satellite remote-sensing imagery. Global navigation systems onboard vessels, such as the Vessel Monitoring System (VMS) and Automatic Identification System (AIS), can provide accurate positioning information for vessels [4]. However, vessels without vessel-borne positioning devices may not be recorded by VMS/AIS [5]. In contrast, satellite remote sensing, characterized by its continuous spatial coverage, large-scale capabilities, and long-term observational scope, proves highly efficient in detecting vessels based on luminosity information. Nevertheless, there is still a deficiency in vessel-detection methods designed for low-light conditions that fully exploit remote sensing imagery. Night-time vessel detection primarily relies on observations obtained through synthetic aperture radar (SAR) and night-time light (NTL) glimmer sensors. SAR-based detection capitalizes on the intrinsic attributes of microwave scattering from target vessels, which gives it all-weather, day-and-night imaging capability [6]. However, SAR images are limited by low temporal resolution and high cost, making continuous observation of vessel activities challenging.
With the advancement of NTL glimmer remote sensing, studies have found that it can effectively capture human socio-economic activities, directly recording artificial light originating from human-driven nocturnal undertakings (e.g., urban areas, industrial complexes, and fishing vessels) [7,8,9]. In 2011, the Suomi National Polar-orbiting Partnership/Visible Infrared Imaging Radiometer Suite (NPP/VIIRS) with the Day/Night Band (DNB) was launched, making the quantitative study of light from vessels possible [10,11,12]. DNB images are now widely used in various fields (e.g., mapping urbanization processes, identifying disasters, and monitoring fisheries), demonstrating excellent accuracy and high application potential. Several studies have specifically employed VIIRS/DNB data for night-time vessel detection, highlighting its superior capabilities in this domain; however, research in this area remains relatively limited.
Currently, NTL satellite-based vessel-detection methods can be broadly classified into two types, namely, threshold-based methods [13,14,15] and machine learning approaches [16,17]. Threshold-based research for identifying illuminated vessels predominantly leverages the significant radiometric contrast between illuminated vessels and the oceanic backdrop. For instance, Elvidge et al. [13] conducted a detailed analysis of spike features within VIIRS/DNB data, identifying vessel detections that include strong, weak, and diffused signals while eliminating non-targeted noise from cloud-scattered light. Considering that such algorithms produce many false detections under full-moon conditions, Kim et al. [14] treated the moon phase as a crucial factor influencing the detection threshold, enabling detection through a single specific threshold after relative correction instead of multiple thresholds. Xue et al. [15] addressed interference from adjacent pixels and established a rational threshold by implementing a two-step threshold-detection algorithm based on radiometry equations and the diffusion characteristics of nightlight points. Nevertheless, the conventional reliance on manually crafted features within threshold-based methods poses challenges in ensuring robust detection accuracy, given the intricate nature of real-world scenarios and the sheer volume of data.
With the development of machine learning, several small-target-detection methods, such as DNA-net [18] and YOLO-ACN [19], have been proposed successively, and machine learning approaches are correspondingly being adopted in the field of vessel detection. For example, Song et al. [20] used HaarUp-HaarDown modules to retain smaller target features, replacing the feature pyramid network and the down-sampling layer of the backbone network. Nie et al. [21] proposed a CAA-YOLO model that incorporates a high-resolution feature layer to better use location information and shallow details, improving identification results on a marine infrared dataset. Dai et al. [22] proposed a pre-training task called random query patch detection for Unsupervised Pre-training of DETR (UP-DETR) to improve object detection, showing that Transformer networks with self-attention mechanisms (ViT) can overcome limitations in salient target detection. Zheng et al. [23] changed the encoder–decoder architecture by approaching semantic segmentation as a sequence-to-sequence prediction task, preserving global features while effectively maintaining the local features of the target. Although these studies show that machine learning approaches demonstrate exceptional proficiency in target detection, their application to NTL images remains relatively limited. Shao et al. [16] introduced an improved TASFF-YOLOv5 algorithm, leading to enhanced feature fusion and superior results on a vessel dataset they constructed. In a different approach, Tsuda et al. [17] first used a two-step training method for their machine learning model, designed to mitigate the generation of a large number of false positives and achieve continuous and frequent detections, especially after excluding various cloud and moon conditions. The aforementioned studies illustrate that deep learning methods can markedly enhance the efficiency of vessel detection, offering a new paradigm and direction for night-time vessel detection. Nevertheless, the diminutive size of vessels in remote sensing imagery poses a challenge, as vessel features tend to diminish with increasing depth of the convolutional layers in traditional deep learning methods. Furthermore, compared to daytime remote sensing imagery, NTL images tend to lack texture information, which makes night-time vessel detection based on a single input (i.e., an NTL image) difficult.
In this context, we introduce a new method for night-time vessel detection, namely, the Enhanced DNA-net. The contributions of this study are as follows: (1) This approach transforms the original DNA-net, designed for efficient detection of small targets, from a single-input model to a multi-input model, thereby extending its capabilities and contributing to more accurate vessel detection based on night-time VIIRS/DNB imagery, as demonstrated in Section 4.2. (2) We incorporate VBD and AIS data to extract authentic vessel labels and establish a labeled dataset of night-time vessel imagery as the ground truth for training and validation.

2. Study Area and Materials

2.1. Study Area

In this study, three nearshore areas with dense vessel activity (i.e., the Japan Sea, the South China Sea, and the Java Sea) were selected to obtain sufficient vessel samples, ensure spatial randomness, and avoid spatial overlap (Figure 1). The Japan Sea (124°24′06″–135°43′48″E, 29°10′25″–40°8′24″N) is a moderately productive sea that supports a wealth of living marine resources, and it covers the Tsushima Strait, one of the busiest sea lanes [24]. The South China Sea (106°16′23″–118°29′42″E, 9°42′36″–20°09′36″N) is one of the largest marginal seas in the western Pacific Ocean and supports the most intensive shelf fishing activities in the world, where IUU fishing happens frequently [25]. The Java Sea (104°55′16″–114°17′05″E, 1°24′49″–9°01′15″S) covers the Karimata Strait, which holds significant importance for maritime traffic management [26].

2.2. Datasets and Pre-Processing

2.2.1. VIIRS/DNB

VIIRS, a key component of the Suomi National Polar-orbiting Partnership (Suomi NPP) satellite’s sensor suite, holds a central role among the five onboard sensors. The Day/Night Band (DNB) featured in VIIRS benefits from a robust photoelectric amplification capability, which enables the precise acquisition of low-intensity light data. The selected VIIRS/DNB data, available at the Sensor Data Record (SDR) level, offer a spatial resolution of 750 m and cover the electromagnetic spectrum from 500 to 900 nm. This versatile instrument supports global-scale monitoring and facilitates daily data access. The daily SDR products for the year 2022 were downloaded through the Comprehensive Large Array-data Stewardship System (CLASS) operated by the National Oceanic and Atmospheric Administration (NOAA) [27]. Certain pre-processing steps are essential to prepare the DNB data for input into the network because of the low radiance values and the presence of white noise in the original images. The first step scales the DNB radiances by a factor of one billion (10⁹), converting the radiance units to nW/(cm²·sr). The second step smooths and equalizes noise levels across the swath using an adaptive Wiener filter with a filter size of 3 × 3.
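As a rough illustration, these two pre-processing steps might be implemented as follows (a minimal sketch assuming a NumPy radiance array; the function name is ours, not from the paper):

```python
import numpy as np
from scipy.signal import wiener

def preprocess_dnb(sdr_radiance: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of the DNB pre-processing described above."""
    scaled = sdr_radiance * 1e9            # step 1: scale radiances by 10^9
    return wiener(scaled, mysize=(3, 3))   # step 2: 3x3 adaptive Wiener filter
```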
Meanwhile, some VIIRS/DNB images contain lights from both land and sea, and Shao et al. [16] found that various land-based lights in NTL images often lead to false alarms in vessel detection. To avoid the influence of coastal lights on the VIIRS/DNB imagery, the GSHHG database was used to generate a sea–land mask (with a buffer radius of 1400 m, about two pixels on the image), which we applied to remove land disturbances for better recognition results [28].

2.2.2. Auxiliary Data

Within the realm of machine learning, a prevalent task is the training of neural networks with labeled data for classification, regression, or similar objectives. This approach, which entails training models to discern patterns, is commonly denoted supervised learning. In supervised learning, the accuracy and reliability of the labels assigned to the training data play a pivotal role in the learning process [29]. If the label data used for training are inaccurate, developing an effective predictive model becomes unfeasible. Therefore, we selected reliable auxiliary data, namely, the VBD and AIS products, to produce labels.
The VBD data are generated by the Earth Observation Group (EOG) and serve the purpose of vessel localization through the identification of bright spots within VIIRS/DNB images. These data are widely employed in fisheries management in various countries (e.g., Indonesia, the Philippines, and Malaysia) and used in many publications [30,31]. Notably, the latest VBD data become available only after a 45-day delay. The VBD data file encompasses essential information, including the geographic coordinates of pixel bright spots, radiance values, the time of satellite overpass, the satellite zenith angle (VZA), and various thresholds used to distinguish between fishing vessels and potential light sources. To decrease the detrimental impact of moonlight interference on VBD data, we applied a moon phase angle (MPA) condition, specifically MPA > 150°, as derived from the MPA field in the VIIRS Day Night Band SDR Ellipsoid Geolocation (GDNBO) data [32]. The VBD data used in this study are accessible for download in daily, weekly, and monthly vector formats from the VBD home page [33].
The AIS, exemplifying the application of cyber–physical systems (CPS) in intelligent maritime transportation, comprises a digital navigation system with components encompassing ground stations and vessel-borne devices [34]. AIS integrates both dynamic and static vessel information, including elements such as vessel position, speed, time, and vessel-specific static data (e.g., vessel name, identification code, vessel type, draught, country of origin, and destination). This amalgamation of details is broadcast via very high-frequency channels to nearby vessels and shore stations, enabling immediate access to real-time dynamic information concerning all vessels in the vicinity [35]. In this study, we obtained a corresponding dataset of AIS data for 2022 relating to the study area [36].
When compared to AIS data, VBD data rely on VIIRS/DNB images and demonstrate a superior capability to identify a significant number of vessels with tonnage below 30 GT. VBD pixel records carry quality flags (QF) of 1 (strong vessel detection), 2 (weak vessel detection), or 3 (blurry detection) for the corresponding periods within the study area. Strong vessel detections can readily be accepted as vessels, but it is crucial to perform cross-matching and label-matching procedures to validate whether the weak and blurry detections correspond to actual vessels.
Cross-matching: Cross-matching estimates the vessel location at the VIIRS imaging time by interpolating between adjacent AIS records [5]:
$$X_t = X_1 + \frac{X_2 - X_1}{T_2 - T_1}\,(T_t - T_1), \qquad Y_t = Y_1 + \frac{Y_2 - Y_1}{T_2 - T_1}\,(T_t - T_1),$$
where $T_t$ represents the satellite overpass time; $X_t$ and $Y_t$ are the interpolated coordinates of the vessel; $X_1$ and $Y_1$ are the vessel coordinates in the AIS record immediately before $T_t$; and $X_2$ and $Y_2$ are the vessel coordinates in the AIS record immediately after $T_t$.
With ($X_t$, $Y_t$) as the reference coordinate, a distance threshold $X_{thresh}$ is established. If VBD records are found within the threshold range of the predicted vessel position, those records are marked as “matched”. In this study, the matching threshold for AIS and VBD records is set to 1 km. Given the definition of the quality flags and the uncertainty of the detection results, VBD records with QF = 2 or QF = 3 are reclassified as strong vessel detections if the cross-matching is successful. For brevity in the following content, we refer to the cross-matched results as VBD-AIS.
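A minimal sketch of this cross-matching step is given below (function names are ours; the metric distance assumes projected coordinates, whereas raw lon/lat would require a geodesic distance):

```python
import numpy as np

def interpolate_position(t1, x1, y1, t2, x2, y2, tt):
    """Linearly interpolate the vessel position at overpass time tt
    from the AIS fixes immediately before (t1) and after (t2)."""
    w = (tt - t1) / (t2 - t1)
    return x1 + (x2 - x1) * w, y1 + (y2 - y1) * w

def is_matched(xt, yt, vbd_xy, thresh_m=1000.0):
    """True if any VBD detection in vbd_xy, an (n, 2) array, lies within
    1 km of the predicted position (xt, yt); matched QF = 2/3 records
    are then reclassified as strong vessel detections."""
    d = np.hypot(vbd_xy[:, 0] - xt, vbd_xy[:, 1] - yt)
    return bool((d <= thresh_m).any())
```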
Label-matching: The quality and scale of the dataset are of great significance for obtaining good training results. Considering that the longest vessel is 458 m [37], a vessel is smaller than one pixel on the VIIRS/DNB image. Therefore, accurate vessel positioning on the VIIRS/DNB image is crucial for the creation of sample labels. Accordingly, we developed a program that matches the cross-matched VBD-AIS results with DNB images for positional alignment, thereby completing the dataset-labeling process:
$$Loc_{DNB}(i,j) = 1, \quad \text{if} \quad \begin{cases} \left|Lat(i,k) - lat_t\right| \le \tfrac{1}{2}\left|Lat(i,k) - Lat(i-1,k)\right| \\ \left|Lon(k,j) - lon_t\right| \le \tfrac{1}{2}\left|Lon(k,j) - Lon(k,j-1)\right| \end{cases}$$
where $lat_t$ and $lon_t$ are the latitude and longitude of the target vessel; i and j are row and column indices, respectively; $Loc_{DNB}$ is an all-zero matrix with the same size as the DNB image; $Lat(i,k)$ is the closest value to $lat_t$ in the latitude matrix of the DNB image; and $Lon(k,j)$ is the closest value to $lon_t$ in the longitude matrix.
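A simplified sketch of this label generation is shown below (ours; it reduces the half-spacing condition above to a nearest-grid-cell assignment, which coincides with it on a regular grid):

```python
import numpy as np

def make_label(lat_grid, lon_grid, lat_t, lon_t):
    """Set to 1 the pixel of the all-zero Loc_DNB matrix whose grid
    latitude/longitude is nearest to the cross-matched vessel position."""
    loc = np.zeros_like(lat_grid, dtype=np.uint8)
    # squared degree distance is adequate for picking the nearest cell
    i, j = np.unravel_index(
        np.argmin((lat_grid - lat_t) ** 2 + (lon_grid - lon_t) ** 2),
        lat_grid.shape)
    loc[i, j] = 1
    return loc
```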

3. Methodology

DNA-net (Dense Nested Attention Network) is capable of addressing the loss of infrared small-target features in deep pooling layers, which is exactly the problem faced when using deep learning for night-time vessel detection [18]. However, it accepts only a single input and cannot process multiple inputs (sDNB, SMI, and SHI), so improvements are needed. Consequently, we add a new structure at the beginning of DNA-net so that the Enhanced DNA-net can process features from multiple inputs at a time.

3.1. Enhanced Dense Nested Attention Network

Most single-input networks are built upon a direct one-to-one mapping relationship, such as CNN and Fast R-CNN [38,39]. Existing studies on detecting night-time illuminated vessels using deep learning mostly process one input image at a time [40,41]. However, due to the minimal number of pixels occupied by the target vessels, the lack of prominent texture features, and the limited spectral information, feature extraction is challenging in a single-input scenario. Consequently, we propose a multi-input network structure to better integrate information from multiple input images, including pre-processed DNB (sDNB), spike median index (SMI), and spike height index (SHI) images. The structure of the network is shown in Figure 2: it is composed of a multiple-input part based on night-time remote sensing images and the Enhanced DNA-net, which performs feature extraction and fusion from the multiple inputs and produces the final classification prediction. Features from each single input are extracted by the Dense Nested Interactive Module (DNIM) [18], which contains a paradigm that facilitates feature interaction between feature nodes and a CSAM unit that enhances the features from each node. For each single input, the multi-layer features extracted by DNIM are up-sampled and concatenated, and then fused into multi-layer outputs through a 1 × 1 convolutional operation. These multiple-input features are processed by the input attention module and activated to output the classification.
The objective of night-time vessel detection is to locate the targets in $y \in \mathbb{R}^{m \times n}$, where m and n represent the row and column numbers of the samples, respectively. Generally, utilizing various attributes can improve the accuracy of small-object detection. Hence, the problem is transformed into the construction of a multiple-attribute classification model, which can be represented as follows:
$$y = E_\Theta(x_n), \quad n \in [N],$$
where $E_\Theta$ represents the Enhanced DNA-net, a well-fitted classification function parameterized by $\Theta$; the multiple attributes of the lighted vessel are expressed as $x_n \in \mathbb{R}^{m \times n}$; and $[N]$ denotes the sequence $\{1, 2, 3, \ldots, N\}$, indexing the attributes input to the Enhanced DNA-net, where N is set to 3.
The corresponding training samples $y^{(l)} \in \mathbb{R}^{m \times n}$ and $x_n^{(l)} \in \mathbb{R}^{m \times n}$ represent the l-th ground truth and input training data, respectively. Considering the difficulties of the deep learning approach in detecting night-time vessels, we employ the DNIM to extract high-level features while preserving representations of deep small targets:
$$y^{(l)} = D_\Theta\big(x_n^{(l)}\big), \quad n \in [N],$$
where $D_\Theta(\cdot)$ represents the DNIM module, parameterized by $\Theta$, that extracts the k-th layer feature $X_n^{(k)}$, and k represents the depth of DNIM.
In DNIM, each node is produced by concatenation or convolution first. $N^{i,j}$ denotes the output of node $(i, j)$, where i indicates the down-sampling level along the encoder and j indicates the convolution layer of the dense block along the plain skip pathway:
$$N^{i,j} = \begin{cases} H\big(N^{i-1,j}\big), & j = 0 \\ F\Big(\Big[\big[N^{i,k}\big]_{k=0}^{j-1},\ H\big(N^{i-1,j}\big),\ u\big(N^{i+1,j-1}\big)\Big]\Big), & j > 0 \end{cases}$$
where $F(\cdot)$ denotes a sequence of convolutional layers cascaded within the same convolution block, $H(\cdot)$ denotes a max-pooling operation with a stride of 2 after $F(\cdot)$, $u(\cdot)$ denotes an up-sampling layer, and $[\cdot,\cdot]$ denotes concatenation. Concretely, nodes at level j = 0 only receive features from the previous layer of the encoder, while nodes at level j > 0 receive features from the previous j nodes in the same skip pathway, the output of the previous node at level j in the sub-encoder, and the up-sampled output from the (i+1)-th skip pathway.
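For illustration, one DNIM node update under the rule above might look like the following sketch (names and channel handling are ours, not the authors’ implementation):

```python
import torch
import torch.nn.functional as F

def dnim_node(conv_block, same_path_feats, prev_level_feat, next_level_feat):
    """Compute N^{i,j} for j > 0: concatenate the previous outputs on the
    same pathway (a list of tensors), the max-pooled output H(.) from
    level i-1, and the up-sampled output u(.) from level i+1, then apply
    the convolution block F(.) passed in as conv_block."""
    pooled = F.max_pool2d(prev_level_feat, kernel_size=2)          # H(.)
    up = F.interpolate(next_level_feat, scale_factor=2,
                       mode="bilinear", align_corners=False)       # u(.)
    x = torch.cat(same_path_feats + [pooled, up], dim=1)           # [., .]
    return conv_block(x)
```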
Then, each node in DNIM is processed by the Channel and Spatial Attention Module (CSAM Unit in Figure 2), which can enhance the features of a small target. The CSAM is sequentially composed of a 1D Channel Attention Map $M_c \in \mathbb{R}^{C_i \times 1 \times 1}$ generated by the Channel Attention Module and a 2D Spatial Attention Map $M_s \in \mathbb{R}^{1 \times H_i \times W_i}$ generated by the Spatial Attention Module. The Channel Attention Module performs channel-wise attention in convolutional neural networks and can be represented as follows:
$$M_c(N) = \sigma\big(\mathrm{MLP}(P_{avg}(N)) + \mathrm{MLP}(P_{max}(N))\big),$$
$$N' = M_c(N) \otimes N,$$
where $\sigma$ denotes the sigmoid function; $\otimes$ denotes element-wise multiplication; $P_{avg}(\cdot)$ and $P_{max}(\cdot)$ denote average pooling and max-pooling operations, respectively; and $C_i$, $H_i$, $W_i$ are the number of channels, the height, and the width of node $N^{i,j}$, respectively. The shared network comprises a multi-layer perceptron with a single hidden layer. Notably, $M_c(N)$ is broadcast to the size $\mathbb{R}^{C_i \times H_i \times W_i}$.
Similarly, the Spatial Attention Module is computed as follows:
$$M_s(N') = \sigma\big(f^{7 \times 7}\big([P_{avg}(N');\ P_{max}(N')]\big)\big),$$
$$N'' = M_s(N') \otimes N',$$
where $\sigma$ is the sigmoid function, and $f^{7 \times 7}$ represents a convolution operation with a filter size of 7 × 7.
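The two attention maps above follow the CBAM formulation; a compact PyTorch sketch of the CSAM unit (layer sizes are our choices for illustration) is:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # shared MLP with a single hidden layer, applied to pooled vectors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))  # P_avg
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))   # P_max
        return torch.sigmoid(avg + mx)                            # M_c

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)  # f^{7x7}

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)                  # P_avg over channels
        mx, _ = torch.max(x, dim=1, keepdim=True)                 # P_max over channels
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s

class CSAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, n):
        n = self.ca(n) * n   # N'  = M_c(N)  ⊗ N (broadcast over H, W)
        n = self.sa(n) * n   # N'' = M_s(N') ⊗ N'
        return n
```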
After DNIM, we up-sample each feature layer from $N^{3,0}$, $N^{2,1}$, and $N^{1,2}$ to the same size as $N^{0,3}$ and concatenate them to generate globally robust feature maps, which differs from the construction of a traditional feature pyramid:
$$G_n = \Big[\,u\big(\big[X_n^{k}\big]_{k=1}^{I}\big),\ X_n^{0}\,\Big], \quad n \in [N],$$
$$F_n = \mathrm{Conv}_{1\times 1}(G_n), \quad n \in [N],$$
where $\mathrm{Conv}_{1\times 1}(\cdot)$ denotes the 1 × 1 convolution operation; $[\cdot,\cdot]$ denotes the concatenation operation; $G_n$ denotes the concatenated multi-layer feature maps, including both node $X_n^{0}$ and the up-sampled nodes $X_n^{k}$ from the decoder; and $F_n$ represents the feature maps extracted from an individual input. An overview of the individual inputs is given below.
The sDNB is described in Section 2.2.1. The SMI image can amplify the radiance differences between illuminated vessels and background pixels [13]; we utilize its ability to detect peaks as one of the network’s multiple inputs. The SMI image is created by subtracting the median-filtered image from the original image, where the median-filtered image is obtained by applying a 3 × 3 median filter to the original image.
The SHI image serves two purposes [13]. Firstly, it eliminates erroneous DNB detections caused by the impact of high-energy particles on the detector. Secondly, it classifies vessel detections identified as peaks into strong and weak categories. For each pixel on the radiance-stretched DNB image, the SHI compares the pixel’s radiance with the average radiance of its adjacent pixels along the horizontal row and the vertical column. The SHI value is defined as the lower of $SHI_{ver}$ and $SHI_{hor}$:
$$SHI_{ver} = 1 - \frac{DNB_{i-1,j} + DNB_{i+1,j}}{2\,DNB_{i,j}}, \qquad SHI_{hor} = 1 - \frac{DNB_{i,j-1} + DNB_{i,j+1}}{2\,DNB_{i,j}},$$
where $SHI_{ver}$ and $SHI_{hor}$ denote the vertical and horizontal SHI, respectively.
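A hedged sketch of the SMI and SHI computations described above (function names are ours; the SHI is assumed to operate on the radiance-stretched DNB array, and border pixels, which wrap here, would need special handling in practice):

```python
import numpy as np
from scipy.ndimage import median_filter

def spike_median_index(dnb):
    """SMI: original image minus its 3x3 median-filtered version,
    amplifying isolated bright spikes against the background."""
    return dnb - median_filter(dnb, size=3)

def spike_height_index(dnb, eps=1e-12):
    """SHI: one minus the mean of the two vertical (or horizontal)
    neighbours over twice the centre radiance, taking the lower of
    the two directions per pixel."""
    up, down = np.roll(dnb, 1, axis=0), np.roll(dnb, -1, axis=0)
    left, right = np.roll(dnb, 1, axis=1), np.roll(dnb, -1, axis=1)
    shi_ver = 1.0 - (up + down) / (2.0 * dnb + eps)
    shi_hor = 1.0 - (left + right) / (2.0 * dnb + eps)
    return np.minimum(shi_ver, shi_hor)
```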
The attention mechanism is used to enhance the model’s focus on important information and suppress the influence of unimportant information [42]. It dynamically adjusts the weights of different parts, allowing the model to concentrate more on the information that is relevant to the task. In the Enhanced DNA-net, the input attention mechanism is utilized to combine features from multiple inputs. Specifically, after extracting and concatenating features from multiple input images, we perform global average pooling and fully connected operations on these features to obtain weights for the input features. These weights are then reshaped to the same size as the multiple inputs and activated using the sigmoid function. Finally, we perform element-wise multiplication between the channel weights and the feature maps to obtain the global feature maps adjusted by the input attention mechanism.
$$F_{multi} = \mathrm{Concat}(F_n), \quad n \in [N],$$
$$F = F_{multi} \otimes P_{avg}(F_{multi}),$$
where $\otimes$ denotes element-wise multiplication; $\mathrm{Concat}(\cdot)$ denotes the concatenation operation; $F_n$, $n \in [N]$ denotes the extracted feature layers of sDNB, SMI, and SHI, respectively; and $P_{avg}(\cdot)$ denotes the global average pooling operation.
Hence, a complex many-to-one mapping relationship is constructed to implement multiple inputs. It takes into account information from the original data, features characterized by amplified background radiation differences, and features characterized by amplified radiation differences of adjacent vessel points. Enhanced DNA-net performs feature extraction separately from each input, resulting in corresponding input features. These features are then combined and weighted using an input attention mechanism. Finally, a sigmoid activation function is applied to obtain the predicted probability for each pixel on the image.
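A minimal sketch of this input attention fusion (pooling, fully connected layers, sigmoid, element-wise reweighting); the class name and layer sizes are ours, not the authors’:

```python
import torch
import torch.nn as nn

class InputAttentionFusion(nn.Module):
    def __init__(self, n_inputs=3, channels=64, reduction=4):
        super().__init__()
        total = n_inputs * channels  # channels of the concatenated F_multi
        self.fc = nn.Sequential(
            nn.Linear(total, total // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(total // reduction, total),
        )

    def forward(self, feats):
        """feats: list of per-input feature maps F_n, each (B, C, H, W)."""
        f_multi = torch.cat(feats, dim=1)            # Concat(F_n)
        w = f_multi.mean(dim=(2, 3))                 # global average pooling
        w = torch.sigmoid(self.fc(w))                # per-channel weights
        return f_multi * w[:, :, None, None]         # element-wise reweighting
```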

3.2. Loss Function

In the Enhanced DNA-net, the loss function consists of two components: a focal loss based on binary cross-entropy and a dice loss based on the dice coefficient. We combine them as the final loss for the binary classification task:
$$L_{total} = \lambda L_{dl} + (1 - \lambda) L_{fl},$$
where $\lambda$ is a weighting factor between 0 and 1, $L_{fl}$ is the loss for classifying the pixels where vessels are located, and $L_{dl}$ is the similarity loss between the predicted and true segmentation results.
Due to the sparsity of vessel targets in DNB images, they only occupy a very small portion of the image. Consequently, there is a severe imbalance between background and foreground classes. This imbalance poses training challenges for the network because a large number of negative samples result in a highly imbalanced ratio of positive to negative samples, leading to issues such as model degradation, slow training speed, and bias toward background categories. We employ focal loss to address the issues of class imbalance and difficulty imbalance between the positive and negative samples in night-time vessel detection [43]:
$$L_{fl} = \begin{cases} -\alpha\,(1 - p)^{\gamma} \log(p), & y = 1 \\ -(1 - \alpha)\,p^{\gamma} \log(1 - p), & y = 0 \end{cases}$$
where p is the probability predicted by the Enhanced DNA-net for the sample to be a vessel, α is used to assign different loss weights to positive and negative samples to address the class imbalance issue, and γ is used to assign different loss weights to easy and hard samples to address the difficulty imbalance issue. y represents the true label of the sample, where y = 1 indicates a positive sample and y = 0 indicates a negative sample.
At the same time, we use the dice loss to further mitigate these issues [44]:
$$L_{dl} = 1 - \frac{2\,P_{intersection}}{P_{union} + L_{smooth}},$$
where Pintersection is the number of pixels in both the predicted and true results that are in the positive class, Punion is the number of pixels in the predicted result that are in the positive class plus the number of pixels in the true result that are in the positive class, and Lsmooth is a small smoothing term to prevent division by zero.
By adjusting the value of λ, we perform a weighted average of dice loss and focal loss, which allows us to maintain the sensitivity of Ldl to small targets while leveraging the ability of Lfl to handle sample imbalance.
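A compact sketch of this combined loss, using the parameter values reported in Section 3.3 (α = 0.25, γ = 5, λ = 0.5); the function names are ours:

```python
import torch

def focal_loss(p, y, alpha=0.25, gamma=5.0, eps=1e-7):
    """L_fl: per-pixel focal loss; p is the predicted vessel probability,
    y the binary ground-truth label, both tensors of the same shape."""
    p = p.clamp(eps, 1.0 - eps)
    pos = -alpha * (1.0 - p) ** gamma * torch.log(p)        # y = 1 branch
    neg = -(1.0 - alpha) * p ** gamma * torch.log(1.0 - p)  # y = 0 branch
    return torch.where(y > 0.5, pos, neg).mean()

def dice_loss(p, y, smooth=1.0):
    """L_dl = 1 - 2 * P_intersection / (P_union + smooth term)."""
    inter = (p * y).sum()        # P_intersection
    union = p.sum() + y.sum()    # P_union
    return 1.0 - 2.0 * inter / (union + smooth)

def total_loss(p, y, lam=0.5):
    """L_total = lambda * L_dl + (1 - lambda) * L_fl."""
    return lam * dice_loss(p, y) + (1.0 - lam) * focal_loss(p, y)
```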

3.3. Training Parameter and Evaluation Metrics

The training of our network was conducted on a Windows 10 system with an Intel i7-8700 CPU and an NVIDIA GeForce RTX 2080 Ti graphics card. There are 430 raw images with a size of 256 × 256 pixels, of which 70% were used for training and the remaining 30% for testing. Data augmentation (rotation and mirroring) was performed to mitigate overfitting. The batch size was set to 16 for 750 epochs, and the decay and learning rate were set to 0.005 and 0.001, respectively. We selected the Enhanced DNA-net paradigm with ResNet-18 [45] as the segmentation backbone, setting the number of down-sampling layers to 3. Additionally, given that vessel targets in NTL images are much sparser than vessel targets in natural images, we set the focusing parameter γ and the balance parameter α to 5 and 0.25, respectively. Furthermore, we used λ = 0.5 in the experiments.
In segmentation tasks, commonly used pixel-level evaluation metrics include Intersection over Union (IOU), accuracy, and recall. These metrics primarily assess object shapes. However, similar to infrared small targets, night-time vessel targets generally lack distinct shapes and textures, and this study places more emphasis on the location of the targets. Therefore, we employed IOU, Pd, and Fa to assess the network’s shape-description and localization capabilities [18].
$$IOU = \frac{S_d \cap S_t}{S_d \cup S_t},$$
where $S_d \cap S_t$ is the intersection area between the predicted segmentation result and the ground truth label, while $S_d \cup S_t$ is their union area.
$$F_a = \frac{P_{false}}{P_{All}}, \qquad P_d = \frac{T_{correct}}{T_{All}},$$
where Fa and Pd represent the false-alarm rate and the probability of detection, respectively. $T_{correct}$ is the number of correctly predicted targets, and $T_{All}$ is the total number of targets. $P_{false}$ and $P_{All}$ are the number of falsely predicted pixels and the total number of image pixels, respectively. For Fa and Pd, thresholds are set to classify predictions as correct or erroneous: if the deviation of the target centroid exceeds the predetermined threshold $D_{thresh}$, the corresponding pixels are considered wrongly predicted. In this study, we set $D_{thresh}$ to 3.
Furthermore, in order to quantitatively evaluate the model’s vessel-localization capability, we introduce the mean positional deviation (MPD), which averages the distance between the centers of detected vessels and their ground-truth positions:
$$MPD = \frac{\sum \sqrt{(x_p - x_g)^2 + (y_p - y_g)^2}}{n},$$
where $(x_p, y_p)$ and $(x_g, y_g)$ represent the coordinates of the detected targets and their ground-truth positions, respectively, and n is the number of all detected targets.
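A minimal sketch of the MPD computation for matched detection/ground-truth pairs (the function name is ours):

```python
import numpy as np

def mean_positional_deviation(pred_xy, gt_xy):
    """MPD: mean Euclidean distance (in pixels) between matched detected
    centres pred_xy and ground-truth centres gt_xy, both (n, 2) arrays."""
    d = np.sqrt(((np.asarray(pred_xy) - np.asarray(gt_xy)) ** 2).sum(axis=1))
    return d.mean()
```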

4. Results

4.1. Night-Time Vessel-Detection Results

In this study, we use the index images together with the sDNB image as inputs to Enhanced DNA-net to improve feature extraction from the input perspective, addressing the current limitation of few extractable features. In this section, we present the overall night-time vessel-recognition results for our study area.
Figure 3 shows the raw DNB images and the corresponding detection results based on VBD-AIS and the Enhanced DNA-net in the Japan Sea (6 April 2022), the South China Sea (3 December 2022), and the Java Sea (9 June 2022). Notably, the positional deviation (defined below) of each detected vessel, i.e., the distance between the center of the detected vessel and its ground truth (VBD-AIS), is shown in the predicted images.
$$Deviation = \sqrt{(x_p - x_g)^2 + (y_p - y_g)^2},$$
where (xp, yp) and (xg, yg) represent the coordinates of the detected vessels and their ground truth positions, respectively.
As shown in Figure 3a, in area I of the Japan Sea, where the lighted vessels are densely distributed, essentially all lighted vessels on the DNB images can be identified by Enhanced DNA-net under clear-sky conditions, although there are some positional deviations in vessel identification (the mean positional deviation is 0.22 px). Where the distribution of vessels is relatively sparse (area II), the positional deviations are typically 1 px. Similarly, the mean positional deviations in the South China Sea and the Java Sea areas are 0.33 px and 0.57 px, respectively. For studies that aim to locate lighted vessel targets in the images, such a level of deviation is considered acceptable.
Moreover, despite cloud contamination at the observation times over the study area (Figure 3b,c), the detection of lighted vessels still shows relatively high accuracy. It is worth mentioning that the illuminated point on the raw DNB imagery of area I in Figure 3b (highlighted by a red circle), which is identified by VBD-AIS as a non-vessel target and is possibly a gas flare at sea, is also regarded as a non-vessel target by our method. This further confirms that our method can, to some extent, filter out illuminated targets at sea that are not vessels.

4.2. Comparison of Different Approaches

To better demonstrate the effectiveness of Enhanced DNA-net in accurately extracting illuminated vessels from NTL images, we designed the following three comparative experiments to examine the performance of the proposed Enhanced DNA-net.

4.2.1. Enhanced DNA-Net versus DNA-Net

The most significant improvement in Enhanced DNA-net compared to DNA-net is the design of a multi-input structure, which fully utilizes information from the original image and two index images. Additionally, the input attention mechanism we designed enhances the effective fusion of features from various inputs, leading to improved detection accuracy. To investigate the benefits of this structure and attention mechanism, we compared the results of the multi-input network with those of the single-input network.
As shown in Table 1, the multi-input model has the best IOU, Fa, Pd, and MPD values on our vessel dataset. Among the single-input models, sDNB performs best in Fa because it preserves positional information at the highest fidelity. Compared with sDNB, SMI performs better in Pd because of its ability to amplify the radiance difference between lighted vessels and background pixels, which has been confirmed in several related studies [13,46]. In contrast, SHI aims to differentiate adjacent pixels, which explains its poorer performance across the metrics. In conclusion, the increase in inputs contributes to an improvement in detection accuracy, and the input images complement each other in our Enhanced DNA-net, which can also be observed from the MPD metric. Compared to the single-input models, Enhanced DNA-net has the smallest overall positional deviation, at 0.36 px. DNA-net (SMI) similarly performs well in MPD due to its localization capability in comparison to the other single-input models.

4.2.2. Ablation Study

(1) Effect of CAM and SAM
The CAM (Channel Attention Module) and SAM (Spatial Attention Module) in DNA-net have been proven to enhance its feature-fusion capability to some extent [18]. To investigate their performance in Enhanced DNA-net, we compared our Enhanced DNA-net with three variants: without CSAM, without SAM, and without CAM. The results of the ablation study are shown in Table 2. When CSAM is removed, the Enhanced DNA-net suffers decreases of 1.72% and 2.23% in IOU and Pd and increases of 1.62 × 10−6 and 0.32 px in Fa and MPD, which demonstrates the significance of the Channel and Spatial Attention Modules. The SAM makes the larger contribution to the performance of our model, with IOU and Pd increasing by 0.85% and 0.58% and Fa and MPD decreasing by 0.99 × 10−6 and 0.25 px, because the Spatial Attention Module enables the network to focus more on local informative regions than the Channel Attention Module does.
(2) DNIM versus U-net
Although most networks used for semantic segmentation (such as U-net) can merge features from different levels, bottom-level features of small targets, like the illuminated vessels in this study, are easily lost during up-sampling and feature fusion [47]. In contrast, the DNIM feature-extraction module in Enhanced DNA-net has a densely nested structure, allowing the interaction of distinct features and preserving intricate details at the most granular levels, such as point targets [18]. We used a standard U-net network for comparison to evaluate the performance of DNIM. The comparative results are shown in Table 3: the IOU and Pd values of w/o DNIM (the network structure without the DNIM module, i.e., U-net) suffer decreases of 14.56% and 7.08%, while Fa and MPD suffer increases of 4.15 × 10−6 and 1.69 px on our vessel dataset.
To observe how DNIM maintains information about small vessel targets in deep feature layers, the processing is visualized. For each layer’s feature map in DNIM, all the feature layers are added and normalized to visualize the heat map in our single-input network (sDNB). To achieve better visualization in the sDNB (DNA-net) branch with the most abundant semantic information, $N^{3,0}$, $N^{2,1}$, and $N^{1,2}$ are up-sampled to the same size as $N^{0,3}$. As shown in Figure 4, in comparison to Enhanced DNA-net, small targets are lost in the deep-layer feature maps of Enhanced DNA-net without DNIM (U-Net).

4.2.3. VBD versus Enhanced DNA-Net

Generally, when examining seasonal trends, spatial distribution patterns, and the tracking and monitoring of night-time fishing vessels based only on VBD products, a typical choice is to select the bright targets with a QF of 1 from the VBD data [48]. However, exclusively extracting detections with QF = 1 from VBD inevitably results in some loss of accuracy. Therefore, we selected the DNB images of our test dataset to compare the results obtained from VBD products (QF = 1) with the vessel-recognition results of the Enhanced DNA-net proposed in this study.
Based on the DNB images of our test datasets, a total of 3114 lighted vessels were detected using the algorithm proposed in this paper, while the VBD data with QF = 1 identified a total of 3081 lighted vessels. Through cross-matching AIS with VBD, 3263 lighted vessels were recognized within the selected area. Compared with this number, our algorithm and the VBD products missed 149 and 182 vessels, respectively, corresponding to recognition accuracies of 95% and 94% (Table 4).
To better demonstrate that the proposed Enhanced DNA-net achieves accuracy equal to or slightly higher than VBD (QF = 1) in identifying lighted vessels from NTL imagery, we present the recognition results of Enhanced DNA-net and VBD (QF = 1) on the DNB image of the Java Sea on 2 November 2022 (Figure 5).

5. Discussion

The Enhanced DNA-net proposed in this study for detecting illuminated vessels at night uses real samples derived from cross-matched VBD-AIS for training. Because of a significant increase in false positives in VBD products during a full moon, the recognition accuracy decreases substantially under such conditions. Therefore, the Enhanced DNA-net is more suitable for clear or lightly clouded conditions around the new moon. VIIRS has 21 bands in addition to DNB, including thermal infrared bands that can reflect surface temperature changes. Consequently, combining the DNB band with the thermal infrared bands for correlation analysis may be a way to address this issue. Concretely, an empirical model for vessel detection under moonlight conditions has been established based on the comparison of DNB and 3.7 μm band (BT3.7) radiances [49]. However, future validation is needed to confirm that the illuminated targets detected are pixels with stable light sources, such as fishing vessels.
Due to the limitations of the current method and data, it is hard to determine the number of vessels within a single pixel in DNB images or the specific size of detected vessels. The Yangwang-1 (“Look Up 1”) satellite and the recently launched SDGSAT-1 satellite both carry glimmer imagers, with the spatial resolution raised to 38 m at nadir and 10 m in the panchromatic band, providing the possibility of more accurate identification of lighted vessels [50,51]. With improved vessel-detection algorithms, multiple vessels that were previously contained within a single pixel of the original DNB image could, in theory, be identified in the existing SDGSAT data.
However, we find that where bright spots are relatively concentrated on the DNB image, some non-vessel targets are identified as vessel targets by our method. On the dataset side, there are two possible reasons: firstly, these bright spots are vessels that the VBD product (QF = 1) cannot detect; secondly, there are vessels not recorded in the AIS system (weak VBD detections that cannot be matched with AIS records). However, it is also possible that these are simply false detections by the model. The solution is to combine more information sources for long-term continuous observation so that we can determine whether the luminous vessels in the DNB images were overlooked in the cross-matched VBD-AIS or whether they are false detections by the model.

6. Conclusions

Vessel detection from NTL images has mainly been based on threshold-based methods that segment the images and identify bright targets. Currently, deep learning methods, with their capability to learn target features automatically, hold great potential for night-time vessel detection. Nevertheless, the limited size of the illuminated targets may affect feature retention within deep learning methods due to pooling operations in deep layers. Consequently, in this study, we propose a new night-time vessel-detection method, namely, Enhanced DNA-net, which can efficiently preserve the features of small targets. The proposed Enhanced DNA-net avoids (i) the sharp decrease in vessel-detection accuracy in complex scenarios, thanks to the model’s architecture and the additional information from multiple inputs, which allows it to capture more nuanced features and patterns; (ii) the uncertainty of selecting thresholds manually; and (iii) the loss of lighted-vessel features as layers deepen in a plain deep learning network.
The Enhanced DNA-net method was tested in the Japan Sea, the South China Sea, and the Java Sea. Notably, due to the lack of ground truth datasets, we cross-matched VBD with AIS and established a vessel ground-truth label dataset for training and validation. The results show that the Enhanced DNA-net improves the accuracy of night-time vessel detection, achieving 87.81%, 96.72%, 5.42 × 10−6, and 0.36 px on IOU, Pd, Fa, and MPD, respectively, which further validates the effectiveness of our approach. With the development of remote sensing technology and low-light sensors, more deep learning methods with higher-spatial-resolution NTL imagery will be explored to improve small-target detection in future studies. This ongoing work seeks to advance the capabilities of our approach and contribute to a more comprehensive understanding of the potential applications of deep learning in night-time remote sensing. Moreover, it shows broad prospects for monitoring IUU activities and managing maritime traffic at night.

Author Contributions

G.Z. designed the research and performed the analysis; Y.M. wrote the draft; J.Z. gave important advice about the experiments; T.Z. and Z.L. reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Outstanding Youth Fund of Sichuan Province, China (grant number: 2023NSFSC1907), and in part by the Fundamental Research Funds for the Central Universities of China, the University of Electronic Science and Technology of China (grant number: ZYGX2019J069).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We would like to thank the providers of the datasets used in this study, including NOAA for the VIIRS/DNB and GSHHG data, the Earth Observation Group for the VBD data, and Global Fishing Watch for the AIS data. We also thank the editor and referees for their professional suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cabral, R.B.; Mayorga, J.; Clemence, M.; Lynham, J.; Koeshendrajana, S.; Muawanah, U.; Nugroho, D.; Anna, Z.; Mira; Ghofar, A.; et al. Rapid and lasting gains from solving illegal fishing. Nat. Ecol. Evol. 2018, 2, 650–658.
2. Chuaysi, B.; Kiattisin, S. Fishing vessels behavior identification for combating IUU fishing: Enable traceability at sea. Wirel. Pers. Commun. 2020, 115, 2971–2993.
3. Deja, A.; Ulewicz, R.; Kyrychenko, Y. Analysis and assessment of environmental threats in maritime transport. Transp. Res. Procedia 2021, 55, 1073–1080.
4. Li, X.; Xiao, Y.; Su, F.; Wu, W.; Zhou, L. AIS and VBD data fusion for marine fishing intensity mapping and analysis in the northern part of the South China Sea. Int. J. Geo Inf. 2021, 10, 277.
5. Hsu, F.-C.; Elvidge, C.D.; Baugh, K.; Zhizhin, M.; Ghosh, T.; Kroodsma, D.; Susanto, A.; Budy, W.; Riyanto, M.; Nurzeha, R.; et al. Cross-matching VIIRS boat detections with vessel monitoring system tracks in Indonesia. Remote Sens. 2019, 11, 995.
6. Ophoff, T.; Puttemans, S.; Kalogirou, V.; Robin, J.-P.; Goedemé, T. Vehicle and vessel detection on satellite imagery: A comparative study on single-shot detectors. Remote Sens. 2020, 12, 1217.
7. He, C.; Ma, Q.; Liu, Z.; Zhang, Q. Modeling the spatiotemporal dynamics of electric power consumption in mainland China using saturation-corrected DMSP/OLS nighttime stable light data. Int. J. Digit. Earth 2014, 7, 993–1014.
8. Huang, Q.; He, C.; Gao, B.; Yang, Y.; Liu, Z.; Zhao, Y.; Dou, Y. Detecting the 20 year city-size dynamics in China with a rank clock approach and DMSP/OLS nighttime data. Landsc. Urban Plan. 2015, 137, 138–148.
9. Levin, N.; Kyba, C.C.M.; Zhang, Q.; Sánchez de Miguel, A.; Román, M.O.; Li, X.; Portnov, B.A.; Molthan, A.L.; Jechow, A.; Miller, S.D.; et al. Remote sensing of night lights: A review and an outlook for the future. Remote Sens. Environ. 2020, 237, 111443.
10. Elvidge, C.D.; Baugh, K.E.; Zhizhin, M.; Hsu, F.-C. Why VIIRS data are superior to DMSP for mapping nighttime lights. Proc. Asia-Pac. Adv. Netw. 2013, 35, 62.
11. Miller, S.D.; Straka, W., III; Mills, S.P.; Elvidge, C.D.; Lee, T.F.; Solbrig, J.; Walther, A.; Heidinger, A.K.; Weiss, S.C. Illuminating the capabilities of the Suomi National Polar-orbiting Partnership (NPP) Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band. Remote Sens. 2013, 5, 6717–6766.
12. Tan, X.; Zhu, X.; Chen, J.; Chen, R. Modeling the direction and magnitude of angular effects in nighttime light remote sensing. Remote Sens. Environ. 2022, 269, 112834.
13. Elvidge, C.; Zhizhin, M.; Baugh, K.; Hsu, F.-C. Automatic boat identification system for VIIRS low light imaging data. Remote Sens. 2015, 7, 3020–3036.
14. Kim, E.; Kim, S.-W.; Jung, H.C.; Ryu, J.-H. Moon phase based threshold determination for VIIRS boat detection. Korean J. Remote Sens. 2021, 37, 69–84.
15. Xue, C.; Gao, C.; Hu, J.; Qiu, S.; Wang, Q. Automatic boat detection based on diffusion and radiation characterization of boat lights during night for VIIRS DNB imaging data. Opt. Express 2022, 30, 13024–13038.
16. Shao, J.; Yang, Q.; Luo, C.; Li, R.; Zhou, Y.; Zhang, F. Vessel detection from nighttime remote sensing imagery based on deep learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 12536–12544.
17. Tsuda, M.E.; Miller, N.A.; Saito, R.; Park, J.; Oozeki, Y. Automated VIIRS boat detection based on machine learning and its application to monitoring fisheries in the East China Sea. Remote Sens. 2023, 15, 2911.
18. Li, B.; Xiao, C.; Wang, L.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense nested attention network for infrared small target detection. IEEE Trans. Image Process. 2022, 32, 1745–1758.
19. Li, Y.; Li, S.; Du, H.; Chen, L.; Zhang, D.; Li, Y. YOLO-ACN: Focusing on small target and occluded object detection. IEEE Access 2020, 8, 227288–227303.
20. Song, Z.; Yang, J.; Zhang, D.; Wang, S.; Li, Z. Semi-supervised dim and small infrared ship detection network based on Haar wavelet. IEEE Access 2021, 9, 29686–29695.
21. Nie, Y.; Tao, Y.; Liu, W.; Li, J.; Guo, B. Deep learning method for ship detection in nighttime sensing images. Sens. Mater. 2022, 34, 4521–4538.
22. Dai, Z.; Cai, B.; Lin, Y.; Chen, J. UP-DETR: Unsupervised pre-training for object detection with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1601–1610.
23. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.S.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 6877–6886.
24. Yoo, S.; Park, J. Why is the southwest the most productive region of the East Sea/Sea of Japan? J. Mar. Syst. 2009, 78, 301–315.
25. Zhao, X.; Li, D.; Li, X.; Zhao, L.; Wu, C. Spatial and seasonal patterns of night-time lights in global ocean derived from VIIRS DNB images. Int. J. Remote Sens. 2018, 39, 8151–8181.
26. Apriansyah; Atmadipoera, A.S.; Nugroho, D.; Jaya, I.; Akhir, M.F. Simulated seasonal oceanographic changes and their implication for the small pelagic fisheries in the Java Sea, Indonesia. Mar. Environ. Res. 2023, 188, 106012.
27. VIIRS/DNB SDR Product. Available online: https://www.avl.class.noaa.gov/saa/products/ (accessed on 12 April 2023).
28. GSHHG—A Global Self-Consistent, Hierarchical, High-Resolution Geography Database. Available online: http://www.ngdc.noaa.gov/mgg/shorelines/gshhs.html (accessed on 8 April 2023).
29. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
30. Sarangi, R.K.; Nagendra Jaiganesh, S.N. VIIRS boat detection (VBD) product-based night time fishing vessels observation in the Arabian Sea and Bay of Bengal sub-regions. Geocarto Int. 2022, 37, 3504–3519.
31. Motomura, K.; Nagao, T. Fishing activity prediction from satellite boat detection data. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2870–2875.
32. Meng, Y.; Zhou, J.; Wang, Z.; Tang, W.; Ma, J.; Zhang, T.; Long, Z. Retrieval of nighttime aerosol optical depth by simultaneous consideration of artificial and natural light sources. Sci. Total Environ. 2023, 896, 166354.
33. VIIRS Boat Detection (VBD) Products. Available online: http://payneinstitute.mines.edu/eog/viirs-vessel-detection-vbd/ (accessed on 18 April 2023).
34. Balduzzi, M.; Pasta, A.; Wilhoit, K. A security evaluation of AIS automated identification system. In Proceedings of the 30th Annual Computer Security Applications Conference, New Orleans, LA, USA, 8–12 December 2014; ACM: New Orleans, LA, USA, 2014; pp. 436–445.
35. Lee, E.; Mokashi, A.J.; Moon, S.Y.; Kim, G. The maturity of automatic identification systems (AIS) and its implications for innovation. J. Mar. Sci. Eng. 2019, 7, 287.
36. Automatic Identification System (AIS). Available online: https://globalfishingwatch.org/map (accessed on 18 April 2023).
37. Sánchez, R.J.; Perrotti, D.E. Looking into the future: Big full containerships and their arrival to South American ports. Marit. Policy Manag. 2012, 39, 571–588.
38. Chua, L.O.; Roska, T. The CNN paradigm. IEEE Trans. Circuits Syst. I 1993, 40, 147–156.
39. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
40. Gallego, A.-J.; Pertusa, A.; Gil, P. Automatic ship classification from optical aerial images with convolutional neural networks. Remote Sens. 2018, 10, 511.
41. Zou, Z.; Shi, Z. Random access memories: A new paradigm for target detection in high resolution aerial remote sensing images. IEEE Trans. Image Process. 2018, 27, 1100–1111.
42. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2016, arXiv:1409.0473.
43. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
44. Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Jorge Cardoso, M. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10553, pp. 240–248.
45. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
46. Guo, G.; Fan, W.; Xue, J.; Zhang, S.; Zhang, H.; Tang, F.; Cheng, T. Identification for operating pelagic light-fishing vessels based on NPP/VIIRS low light imaging data. Trans. Chin. Soc. Agric. Eng. 2017, 33, 245–251.
47. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
48. Li, Y.; Song, L.; Zhao, S.; Zhao, D.; Wu, Y.; You, G.; Kong, Z.; Xi, X.; Yu, Z. Nighttime fishing vessel observation in Bohai Sea based on VIIRS fishing vessel detection product (VBD). Fish. Res. 2023, 258, 106539.
49. Yamaguchi, T.; Asanuma, I.; Park, J.G.; Mackin, K.J.; Mittleman, J. Estimation of vessel traffic density from Suomi NPP VIIRS Day/Night Band. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–5.
50. Zhu, X.; Tan, X.; Liao, M.; Liu, T.; Su, M.; Zhao, S.; Xu, Y.N.; Liu, X. Assessment of a new fine-resolution nighttime light imagery from the Yangwang-1 (“Look Up 1”) satellite. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
51. Zhao, Z.; Qiu, S.; Chen, F.; Chen, Y.; Qian, Y.; Cui, H.; Zhang, Y.; Khoramshahi, E.; Qiu, Y. Vessel detection with SDGSAT-1 nighttime light images. Remote Sens. 2023, 15, 4354.
Figure 1. Location of the study area (the yellow, red, and blue boxes represent the Japan Sea, South China Sea, and Java Sea, respectively).
Figure 2. Detailed implementation of the network structure.
Figure 3. Recognition results in the Japan Sea (a), South China Sea (b), and Java Sea (c).
Figure 4. Visualization map of Enhanced DNA-net and Enhanced DNA-net without DNIM (U-Net); the ground-truth positions of the vessels are marked with red dots.
Figure 5. Recognition results of Enhanced DNA-net and VBD (QF = 1). The missed vessels are highlighted by red circles.
Table 1. Overall detection results.

| Method | IOU (×10−2) | Pd (×10−2) | Fa (×10−6) | MPD (px) |
|---|---|---|---|---|
| Enhanced DNA-net | 87.81 | 96.72 | 5.42 | 0.36 |
| DNA-net (sDNB) | 86.37 | 92.06 | 5.11 | 1.56 |
| DNA-net (SMI) | 83.64 | 93.52 | 7.64 | 0.97 |
| DNA-net (SHI) | 80.84 | 89.33 | 9.28 | 2.38 |
Table 2. Performance comparison of the CSAM.

| Method | IOU (×10−2) | Pd (×10−2) | Fa (×10−6) | MPD (px) |
|---|---|---|---|---|
| Enhanced DNA-net w/o CSAM | 86.09 | 94.49 | 7.04 | 0.68 |
| Enhanced DNA-net w/o CAM | 86.78 | 94.75 | 6.78 | 0.50 |
| Enhanced DNA-net w/o SAM | 86.94 | 95.07 | 6.05 | 0.43 |
Table 3. Detection results with or without DNIM.

| Module | IOU (×10−2) | Pd (×10−2) | Fa (×10−6) | MPD (px) |
|---|---|---|---|---|
| Enhanced DNA-net | 87.81 | 96.72 | 5.42 | 0.36 |
| w/o DNIM (U-Net) | 73.25 | 89.64 | 9.57 | 2.05 |
Table 4. Identification accuracy of lighted vessels.

| Method | Identified Vessels | Vessels in VBD-AIS | Identification Accuracy |
|---|---|---|---|
| VBD (QF = 1) | 3081 | 3263 | 0.94 |
| Enhanced DNA-net | 3114 | 3263 | 0.95 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
