Article

Improved YOLOv8n for Lightweight Ship Detection

by Zhiguang Gao 1, Xiaoyan Yu 2, Xianwei Rong 2,* and Wenqi Wang 2
1 School of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China
2 School of Physics and Electronic Engineering, Harbin Normal University, Harbin 150025, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(10), 1774; https://doi.org/10.3390/jmse12101774
Submission received: 8 September 2024 / Revised: 30 September 2024 / Accepted: 1 October 2024 / Published: 6 October 2024
(This article belongs to the Section Ocean Engineering)

Abstract
Automatic ship detection is a crucial task within the domain of maritime transportation management. With the progressive success of convolutional neural networks (CNNs), a number of advanced CNN models have been presented to detect ships. Although these detection models have achieved marked performance, several undesired results may occur under complex maritime conditions, such as missed detections, false positives, and low detection accuracy. Moreover, existing detection models suffer from large numbers of parameters and heavy computational costs. To deal with these problems, we propose a lightweight ship detection model called DSSM–LightNet, based on an improved YOLOv8n. First, we introduce lightweight Dual Convolution (DualConv) into the model to lower both the number of parameters and the computational complexity. DualConv combines two types of convolution kernels, 3×3 and 1×1, and uses group convolution to effectively reduce computational costs while processing the same input feature-map channels. Second, we propose a Slim-neck structure in the neck network, which introduces GSConv and VoVGSCSP modules to construct an efficient feature-fusion layer. This fusion strategy helps the model better capture the features of targets of different sizes. Meanwhile, a spatially enhanced attention module (SEAM) is integrated with the Feature Pyramid Network (FPN) and the Slim-neck to achieve simple yet effective feature extraction, minimizing information loss during feature fusion. The Complete Intersection over Union (CIoU) loss may not accurately reflect the relative positional relationship between bounding boxes in some complex scenarios, whereas the minimum point distance IoU (MPDIoU) provides more accurate positional information in bounding-box regression by directly minimizing point distances and considering a comprehensive loss. We therefore replace the CIoU loss with MPDIoU to further enhance the detection precision of the proposed model. Comprehensive tests on the publicly accessible SeaShips dataset demonstrate that our model greatly exceeds other algorithms in detection accuracy and efficiency while retaining its lightweight nature.

1. Introduction

Ship detection is a crucial link in the field of navigation. With the rapid progress of the global shipping sector and the exploitation of marine resources, water transportation carries nearly 90% of the global trade volume [1]. As a result, the number of ships at sea is constantly increasing, and ship collision accidents are frequent; in extreme weather conditions, this problem may become even more serious. Thus, to ensure the safety of crew members, relevant departments have proposed stricter guidelines for the monitoring and control of marine traffic. Traditional ship-monitoring systems mainly rely on manual visual inspection and radar systems, but in addition to consuming labor and material resources, they have clear limitations. Manual visual inspection is constrained by observer experience and fatigue, a limited ability to recognize targets in complex sea conditions, and insufficient real-time performance. Although radar systems can cover a wide sea area, their detection accuracy often deviates considerably in severe weather such as rainstorms and typhoons [2]. Accordingly, effectively detecting ships in complicated circumstances is a major challenge in the field of computer vision that requires immediate attention.
With the rapid development of computer vision technology, object detection, as one of its core tasks, has shown broad application prospects in various fields. The DSSM–LightNet model proposed in this article is a lightweight and efficient object detection framework that demonstrates excellent performance in ship detection tasks. It not only has high accuracy but also has good real-time performance. It is worth noting that the design concept of the DSSM–LightNet model and the flexibility of its core components provide a solid foundation for the expansion of applications in other object detection fields. For example, in the field of drone detection, with the popularization and widespread application of drone technology, the demand for efficient and accurate drone detection is becoming increasingly urgent. Similarly, in the field of vehicle detection, facing complex traffic environments and huge data-processing requirements, the DSSM–LightNet model can also leverage its advantages to enhance the intelligence level of traffic monitoring.
In addition, with its powerful feature-learning and classification capabilities, deep learning has made significant advances in ship detection tasks. Learning-based vessel detection methods can generally be separated into two categories: one-stage and two-stage approaches. Two-stage approaches first generate candidate boxes in candidate regions and then locate and classify the objects within them. Typical two-stage models include RCNN [3] (region-based convolutional neural network), Fast-RCNN [4], Faster-RCNN [5], Mask-RCNN [6], etc. Such models usually have high detection accuracy and robustness, but they are unsuitable for real-time object recognition tasks because of their slow speed. In contrast, one-stage approaches directly predict the target position and category from the input image without generating candidate boxes. Typical one-stage models include the YOLO [7,8,9,10,11,12] (You Only Look Once) series, SSD [13] (Single Shot MultiBox Detector), and RetinaNet [14]. Such models can significantly increase the detection speed while maintaining accuracy, making them better suited to real-time detection at sea. Because of the complexity of visual image backgrounds and the variety of ship classes, existing methods based on convolutional neural networks (CNNs) either have low detection accuracy or require overly complex computation and are therefore not sufficiently lightweight. In addition, ships are sometimes missed or misidentified under complex ocean conditions. To address this, Shao et al. [15] suggested a framework for saliency-aware ship detection from images obtained from surveillance camera networks; a CNN was used to extract ship features in order to lower the missed detection rate, and global contrast-based saliency region detection was used for position correction. Wang et al. [16] suggested an improved ship detection approach founded on YOLOv3, improving detection and localization accuracy by combining multi-scale feature maps and regenerating them with a high-resolution network. Guo et al. [17] introduced a lightweight SAR ship detection model named LMSD–YOLO, which has good multi-scale adaptive ability and significantly improves the detection of small ships. Xing et al. [18] proposed a lightweight algorithm called YOLOv8–FAS, which integrates an efficient FasterNet module and an attention mechanism into the backbone, achieving lightweight yet improved feature extraction. The following is a summary of this study's contributions:
  • We present a lightweight ship detection model by combining a lightweight convolution called DualConv with C2f to obtain a lightweight C2f.
  • We propose the Slim-neck structure by combining VoVGSCSP and lightweight GSConv to extract feature information at different scales, achieving a better balance between computational cost and detection performance within the network of feature fusion.
  • We introduce a new IoU loss based on the minimum point distance (MPDIoU) to improve bounding-box regression accuracy and achieve faster convergence.
  • We leverage the SEAM attention module to improve object detection by addressing the issue of occlusion.
  • Our proposed model achieves remarkable performance improvements: a precision of 97.7%, a recall of 94.8%, an mAP@0.5 of 98.5%, and an mAP@0.5:0.95 of 78.2%, with respective increases of 1%, 0.4%, 0.7%, and 2.6%. Additionally, the model reduces the parameter count by 8.6% and the computational complexity by 12.3%, achieving a favorable balance between detection accuracy and lightweight design.
The remainder of this paper is structured as follows: Section 2 reviews advanced object detection algorithms, primarily focusing on the YOLO series. Section 3 describes the proposed DSSM–LightNet model in detail. Section 4 presents the experimental results and discussion. Section 5 summarizes the contributions of this work and future directions.

2. Related Works

2.1. The YOLO Series

Prior to YOLOv1’s release, the R-CNN series of detection algorithms dominated the area of object detection because of its high detection accuracy and the ability of the region proposal mechanism to reduce computational cost. However, because these methods could not achieve real-time detection, developing faster detection algorithms became a trend. In 2016, YOLOv1, a single-stage object detection technique, was introduced by Joseph Redmon et al. [7]; it offers a high detection speed and the ability to process video in real time by splitting the image into multiple grid cells and processing each cell individually to predict the bounding box and category of the target. Still, this algorithm suffers from inaccurate localization and is not effective at detecting small targets.
In 2017, Joseph Redmon proposed YOLOv2 [8], which introduced the anchor-box mechanism and applied batch normalization, making the network easier to converge during training. YOLOv3 [9] further improved the model’s performance by combining ResNet [19] (Residual Network) and FPN [20], enabling it to better capture features of different sizes. YOLOv4 [10], released in 2020, introduced a series of improvements based on YOLOv3, such as the CIoU loss function, PAN [21] (Path Aggregation Network), SAM (Spatial Attention Module), and Mosaic data augmentation. In the same year, Glenn Jocher et al. released YOLOv5, which utilizes the C3 (Bottleneck CSP with three convolutions) structure, allowing the model to more effectively capture multi-scale information about objects and improving its robustness. The primary improvement of YOLOv6 [11] is the introduction of the RepVGG (Re-parameterization Visual Geometry Group) architecture, which uses a multi-branch topology during training that is re-parameterized into a single branch at inference, resulting in improved inference speed and hardware friendliness. Alexey Bochkovskiy and colleagues proposed YOLOv7 [12] in 2022, which leverages stacked computational blocks and ELAN (Efficient Layer Aggregation Network) to enhance detection accuracy; by controlling the longest and shortest gradient paths, YOLOv7 enables deeper networks to learn and converge effectively. YOLOv8, released in 2023, utilizes the C2f (CSP Bottleneck with 2 convolutions) module to extract deep features and enriches the model’s gradient flow through more cross-layer branch connections, improving detection results. The SPPF (Spatial Pyramid Pooling Fast) module, located at the end of the backbone network, employs three max-pooling layers to handle features of different scales and enhance the network’s feature-abstraction capability.
In recent years, several scholars have developed various detection algorithms based on YOLOv8 that can detect desired targets with even greater accuracy. Lin et al. [22] proposed an improved network based on Faster R-CNN, which further boosts detection performance by utilizing squeeze-and-excitation mechanisms. Lan et al. [23] presented an improved ship detection algorithm based on YOLOv8n, which utilizes C2f_EMAN to integrate contextual information from different scales and incorporates Concatenate_FBiFPN within the framework; this modification better addresses issues related to feature propagation and data flow. Zhao et al. [24] suggested a YOLOv8 ship detection model enhanced with MobileViT and GSConv, which attains greater detection precision and is appropriate for edge-computing devices. Yi et al. [25] proposed an improved detection network called LAR–YOLOv8, which utilizes a dual-branch attention mechanism to design a bidirectional feature pyramid network for generating more discriminative features; LAR–YOLOv8 has been shown to achieve better detection results.

2.2. Attention Mechanism

The attention mechanism, inspired by human perception and attention, assigns different attention weights to different segments of the input data, allowing the model to focus primarily on specific portions of the input. The SE [26] (Squeeze-and-Excitation) attention mechanism, proposed by Hu et al. in 2018, improves the representational power of the model by adaptively weighting different channel feature maps. Hou et al. later introduced the CA [27] (Coordinate Attention) mechanism. The CA attention mechanism mainly focuses on the relationships among the channels of feature maps, aiming to capture the correlations between different channels to enhance the feature capabilities of the model, but it neglects spatial information. To address this issue, Woo et al. suggested the CBAM [28] (Convolutional Block Attention Module), which combines a CAM (Channel Attention Module) and a SAM (Spatial Attention Module). This allows the model to better attend to both channel information and spatial information across different channels, resulting in richer high-level features. In 2022, Yu et al. proposed an attention module called SEAM [29] and introduced repulsion loss to enhance the model’s ability to handle occlusion. The module is used to tackle object occlusion, particularly occluded faces, and SEAM achieves this goal by enhancing the features of the un-occluded facial regions.

2.3. Loss Function

The object-detection loss function is typically divided into two parts: a regression loss and a classification loss. Yu et al. proposed a loss function founded on IoU (Intersection over Union), whose idea is to divide the area of intersection between the predicted bounding box and the ground truth box by the area of their union. When there is no intersection between the ground truth box and the predicted box, the IoU becomes 0 and the IoU loss loses its significance. To cope with this situation, Rezatofighi et al. introduced the generalized IoU (GIoU) [30] loss in 2019, which introduces the concept of the smallest enclosing rectangle on top of IoU. However, when the predicted box is contained within the target box, GIoU cannot distinguish their relative positions and degenerates into IoU. Consequently, in 2020, Zheng et al. proposed the distance IoU (DIoU) loss [31], which introduces the Euclidean distance between the center points of the ground truth box and the predicted box, allowing DIoU to better reflect the accuracy of the predicted box. However, when the ground truth box encloses the predicted box and the center points of the two boxes coincide, the DIoU loss becomes ineffective. Zheng et al. subsequently recognized that DIoU only considers the center-point distance and the overlap area, without considering the aspect ratio. Thus, also in 2020, Zheng et al. proposed the CIoU loss function. CIoU [31] introduces an additional influencing factor on top of DIoU that takes into account the aspect ratios of the predicted box and the target box, enabling a more comprehensive evaluation of how well the predicted box matches the ground truth box. Zhang et al. (2022) considered the design of CIoU to be not entirely reasonable, so they proposed the Efficient IoU (EIoU) [32] loss by separating the aspect-ratio influencing factors of the ground truth box and the predicted box in the CIoU penalty term, calculating the width and height of the predicted box and the ground truth box separately. They also introduced the concept of focal loss, adjusting the weights to balance the losses between samples with good and poor regression quality, enabling the model to better handle samples of varying difficulty. However, the aforementioned loss functions do not distinguish irregular shapes well. Given the benefits and drawbacks of these loss functions, Ma et al. proposed MPDIoU [33] in 2023. To cater to the actual task of ship detection, we have introduced MPDIoU into our model. MPDIoU is an accurate and efficient bounding-box regression loss based on the minimum point distance. It incorporates all the pertinent factors considered in existing loss functions, including overlapping and non-overlapping regions, center-point distance, and width and height deviations. MPDIoU focuses directly on the corner-point distances of the bounding boxes rather than relying solely on their overlapping region. This allows MPDIoU to provide more accurate localization when dealing with complex shapes, making it more sensitive to irregular target shapes and achieving accurate and efficient bounding-box regression.

3. Proposed Model

3.1. Motivation

With the development of deep-learning technology, new algorithms and architectures continue to emerge, providing the possibility to propose a better model. The existing models may not be able to meet the requirements of high-demand scenarios due to high parameter count, low detection accuracy, and false positives. Therefore, it is possible to propose a new model aimed at reducing detection costs, improving detection efficiency, and contributing to the sustainable development of the shipping industry by optimizing ship design, reducing energy consumption, and other means. Figure 1 illustrates the motivation for improving the model.

3.2. Exploration Process

To address the challenges in maritime ship detection, including low accuracy, missed detections, false detections, and networks that are complex to deploy, we have modified the YOLOv8n network and propose a lightweight model named DSSM–LightNet. Figure 2 shows the exploration process of DSSM–LightNet, together with a detailed description of each component.

3.3. Improvement Areas

The ship detection model we propose is an upgraded version of YOLOv8. The traditional YOLOv8 model is divided into five variants based on the size and complexity of the network: n, s, m, l, and x. YOLOv8n offers advantages such as fast detection and inference speed, fewer parameters, and balanced performance. Using YOLOv8n as the baseline model and drawing inspiration from DualConv [34], Slim-neck [35], SEAM, and the MPDIoU loss function, we present a lightweight ship detection network named DSSM–LightNet. The structure of the DSSM–LightNet model is depicted in Figure 3. DSSM–LightNet improves detection accuracy and reduces both the computational complexity and the number of parameters, ultimately achieving real-time detection of naval vessels.

3.4. DualConv

To make the model more efficient, we introduce a lightweight and reliable convolution, DualConv, into the backbone network for feature extraction. This convolution effectively reduces parameters and computation while maintaining high-quality feature representation. By combining DualConv with C2f, we obtain a new structure called the C2f_Dual module. In DSSM–LightNet, we replace all the C2f structures in the YOLOv8n backbone network with C2f_Dual. Compared to the traditional C2f, C2f_Dual obtains richer and more diverse feature representations, enabling more comprehensive multi-scale information fusion. The structure of C2f_Dual is illustrated in Figure 4.
Figure 5 shows the various positions at which C2f_Dual can replace the C2f structures in the backbone network. When C2f_Dual is placed at the locations indicated in Figure 5d, the computational cost and number of parameters are minimized while the model accuracy is highest, resulting in the best overall performance. Referring to Table 1, one can see that, as the number of C2f_Dual modules rises, the model’s parameters and computational cost gradually decrease. The experimental results reveal that DualConv is a lightweight convolution and plays a lightweighting role in the model; a sketch of the underlying idea is given below.
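To make the DualConv idea concrete, the following PyTorch sketch shows a dual-branch convolution that sums a 3×3 group convolution and a 1×1 pointwise convolution over the same input channels, which is the mechanism described above. The class names, the grouping factor, and the way the block is dropped into a C2f-style bottleneck are illustrative assumptions rather than the authors' exact C2f_Dual implementation.

```python
import torch
import torch.nn as nn


class DualConv(nn.Module):
    """Dual-branch convolution: 3x3 group conv + 1x1 pointwise conv (sketch)."""

    def __init__(self, c1, c2, stride=1, g=2):
        super().__init__()
        self.gconv = nn.Conv2d(c1, c2, 3, stride, 1, groups=g, bias=False)  # 3x3 group convolution
        self.pwconv = nn.Conv2d(c1, c2, 1, stride, 0, bias=False)           # 1x1 pointwise convolution
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        # Both branches process the same input channels; their outputs are summed.
        return self.act(self.bn(self.gconv(x) + self.pwconv(x)))


class BottleneckDual(nn.Module):
    """Bottleneck variant whose second convolution is replaced by DualConv,
    analogous to how C2f_Dual modifies the bottlenecks inside C2f (illustrative)."""

    def __init__(self, c, shortcut=True, g=2):
        super().__init__()
        self.cv1 = nn.Sequential(nn.Conv2d(c, c, 3, 1, 1, bias=False),
                                 nn.BatchNorm2d(c), nn.SiLU())
        self.cv2 = DualConv(c, c, g=g)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y
```

With g groups, the 3×3 branch uses roughly 1/g of the parameters of a standard 3×3 convolution over the same channels, which is where the parameter and FLOP savings come from.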

3.5. Slim-Neck

To further lighten the suggested model while maintaining its detection accuracy, we propose a lightweight Slim-neck structure composed of GSConv and VoVGSCSP. The Slim-neck was originally applied to the visual system of self-driving vehicles. It integrates the lightweight ideas of GhostNet [36] and ShuffleNetv2 [37], which helps reduce the model size while keeping the capability of the backbone network intact. Standard convolution can produce excessive feature maps, resulting in computational redundancy, and GhostNet was proposed precisely to address this issue. To make the results of depthwise separable convolution as close as possible to those of standard convolution, the two can be combined, leading to a convolutional structure called GSConv. The idea of GSConv is to first use a conventional convolution to create a feature map A, and then apply a depthwise separable convolution to A to obtain a feature map B. Finally, the two parts are concatenated and passed through a channel shuffle to obtain the transformed feature map as the output. ShuffleNetv2 addresses the issue that group-wise computation separates the channel information of the input feature maps, limiting the interaction of information between channels. The channel shuffle reorganizes the channels of the feature maps processed by depthwise convolution, enabling information interaction between channels at a low computational cost.
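The following PyTorch sketch follows the GSConv recipe described above: a standard convolution produces feature map A with half the output channels, a depthwise convolution applied to A produces feature map B, and the two are concatenated and channel-shuffled. The kernel sizes and shuffle group count are illustrative choices rather than the exact Slim-neck reference implementation, and VoVGSCSP, which stacks GSConv-based bottlenecks in a cross-stage-partial arrangement, is omitted for brevity.

```python
import torch
import torch.nn as nn


def channel_shuffle(x, groups=2):
    """Reorganize channels so that information can flow between channel groups."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)


class GSConv(nn.Module):
    """Standard conv -> depthwise conv -> concat -> channel shuffle (sketch)."""

    def __init__(self, c1, c2, k=3, s=1):
        super().__init__()
        c_ = c2 // 2
        self.conv = nn.Sequential(
            nn.Conv2d(c1, c_, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())
        self.dwconv = nn.Sequential(
            nn.Conv2d(c_, c_, 5, 1, 2, groups=c_, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())

    def forward(self, x):
        a = self.conv(x)      # feature map A from the standard convolution
        b = self.dwconv(a)    # feature map B from the depthwise convolution on A
        return channel_shuffle(torch.cat((a, b), dim=1), groups=2)
```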

3.6. SEAM

SEAM, originally proposed in YOLO-FaceV2, is an attention mechanism that can effectively improve the detection of occluded objects. SEAM aims to achieve this by compensating for the response loss of occluded regions and enhancing the response of un-occluded regions. SEAM is implemented by combining depthwise separable convolution with residual connections. While depthwise separable convolution can learn the relative importance of individual channels, it disregards the relationships between channels. To address this issue, the output of the depthwise separable convolution is combined with a pointwise convolution, and two fully connected layers are then used to fuse the information from every channel, enhancing the information exchange between channels. The module compensates for the loss under occlusion by learning the relationship between occluded and un-occluded regions, thus achieving better detection of occluded objects. SEAM includes CSMMs (Channel and Spatial Mixing Modules) of three different sizes (patch = 6, patch = 7, patch = 8). The outputs of these modules are processed through average pooling to extract global features, followed by a channel-expansion operation; finally, the original features are multiplied by SEAM’s output features. CSMM exploits multi-scale features through different patch sizes and employs depthwise separable convolution to learn the correlation between spatial and channel information.
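As a rough illustration of the mechanism described above, the sketch below applies a depthwise-separable block with a residual connection, pools the result globally, mixes channel information with two fully connected layers, and uses the exponentiated output to reweight the original features. It is deliberately simplified: the original SEAM stacks three CSMMs with patch sizes 6, 7, and 8, whereas this version uses a single branch, and the class name and reduction ratio are our own assumptions.

```python
import torch
import torch.nn as nn


class SEAMLite(nn.Module):
    """Simplified SEAM-style reweighting (single branch, no patch embedding)."""

    def __init__(self, c, reduction=4):
        super().__init__()
        # depthwise + pointwise convolutions, used with a residual connection
        self.dw = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1, groups=c, bias=False),
            nn.GELU(), nn.BatchNorm2d(c),
            nn.Conv2d(c, c, 1, bias=False),
            nn.GELU(), nn.BatchNorm2d(c))
        # two fully connected layers mix information across channels
        self.fc = nn.Sequential(
            nn.Linear(c, c // reduction), nn.ReLU(inplace=True),
            nn.Linear(c // reduction, c))

    def forward(self, x):
        y = self.dw(x) + x              # residual depthwise-separable block
        w = y.mean(dim=(2, 3))          # global average pooling over H and W
        w = torch.exp(self.fc(w))       # channel mixing; exp-weighting as in SEAM
        return x * w[:, :, None, None]  # expand weights and rescale the input features
```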

3.7. MPDIoU

In the YOLOv8n network, the coordinate loss of the predicted box is calculated with the CIoU loss function, which is computed as follows:
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v
v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2
\alpha = \frac{v}{(1 - IoU) + v}
In the above equations, $w^{gt}$ and $h^{gt}$ denote the width and height of the ground truth bounding box, $w$ and $h$ denote the width and height of the predicted bounding box, $\rho^2(b, b^{gt})$ is the squared Euclidean distance between the centers of the ground truth box and the predicted box, $IoU$ is the Intersection over Union between the ground truth box and the predicted box, $c^2$ is the squared diagonal length of the minimum bounding rectangle that encloses both boxes, $\alpha$ is the weight parameter, and $v$ measures the consistency of the aspect ratios.
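For reference, the CIoU loss above can be computed as in the following sketch for axis-aligned boxes given in (x1, y1, x2, y2) format; it is a direct transcription of the equations, not the exact Ultralytics implementation.

```python
import math
import torch


def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for boxes of shape (N, 4) in (x1, y1, x2, y2) format (sketch)."""
    # intersection and union -> IoU
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # squared center distance rho^2 and squared diagonal c^2 of the enclosing box
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term v and its weight alpha
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```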
Bounding-box regression is extensively employed in object detection, as it enables precise framing of the target’s location, making it an indispensable part of object localization. However, most existing bounding-box regression loss functions encounter difficulties when the predicted and ground truth boxes have the same aspect ratio but different widths and heights. MPDIoU, a novel bounding-box regression loss based on the minimum point distance, thoroughly examines the geometric properties of horizontal rectangles, and we have introduced it into our work. MPDIoU streamlines the computation by minimizing the deviations of the predicted bounding box from the ground truth box at the top-left and bottom-right corners, thereby attaining efficient and precise bounding-box regression. MPDIoU is calculated as follows:
d_1^2 = \left( x_1^B - x_1^A \right)^2 + \left( y_1^B - y_1^A \right)^2
d_2^2 = \left( x_2^B - x_2^A \right)^2 + \left( y_2^B - y_2^A \right)^2
MPDIoU = \frac{A \cap B}{A \cup B} - \frac{d_1^2}{w^2 + h^2} - \frac{d_2^2}{w^2 + h^2}
In the above equations, for any two convex shapes A and B, $w$ and $h$ denote the width and height of the input image, respectively; $(x_1^A, y_1^A)$ and $(x_2^A, y_2^A)$ are the coordinates of the top-left and bottom-right corners of shape A, while $(x_1^B, y_1^B)$ and $(x_2^B, y_2^B)$ are the coordinates of the top-left and bottom-right corners of shape B.
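A minimal sketch of the MPDIoU loss defined above is given below, again for boxes in (x1, y1, x2, y2) format; here img_w and img_h are the input image width and height used to normalize the corner distances, and the function returns 1 − MPDIoU as the loss.

```python
import torch


def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """MPDIoU loss for boxes of shape (N, 4) in (x1, y1, x2, y2) format (sketch)."""
    # standard IoU between predicted and ground truth boxes
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # squared distances between matching top-left (d1) and bottom-right (d2) corners
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    norm = img_w ** 2 + img_h ** 2   # normalization by the image diagonal (squared)
    mpdiou = iou - d1 / norm - d2 / norm
    return 1 - mpdiou
```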
We ran tests on the SeaShips dataset to verify the effectiveness of various loss functions on the model, and Table 2 displays the outcomes. As is evident from comparing the detection precision for the various ship types, the MPDIoU loss function we introduced achieved the best results. Compared with the CIoU loss function of the original YOLOv8n model, MPDIoU achieved marked improvements in detection performance for several ship categories.

4. Experimental Results and Analysis

To validate the effectiveness of our newly designed network, we conduct extensive experiments on the public SeaShips [38] dataset. Through quantitative and qualitative comparisons of the proposed DSSM–LightNet with other networks, we demonstrate the model’s effectiveness.

4.1. Experimental Environment and Parameter Settings

We conducted our experiments on a system running Ubuntu 22.04.3, using PyCharm as the development environment, Python 3.9.18 as the programming language, and PyTorch 1.13.1 as the deep-learning framework. The CPU was a 13th-generation Intel(R) Core(TM) i5-13400F (16 threads) with a clock speed of 3.0 GHz, and the GPU was an NVIDIA GeForce RTX 4060 Ti; we used CUDA version 11.7. The hyperparameters used during training were as follows: the learning rate was initialized at 0.01, with a momentum of 0.937 and a weight decay of 0.0005. The input image size for all experiments was 640 × 640 pixels, the number of epochs was set to 100, and the batch size was 16. All other parameters were kept at the default values of the YOLOv8n model.
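For reproducibility, a training run with the hyperparameters listed above could be launched through the Ultralytics API roughly as follows; the model and dataset YAML file names are hypothetical placeholders, since the modified DSSM–LightNet architecture file is not distributed with this article.

```python
from ultralytics import YOLO

# Hypothetical configuration files; the hyperparameters mirror Section 4.1.
model = YOLO("dssm_lightnet.yaml")   # modified YOLOv8n architecture (assumed file name)
model.train(
    data="seaships.yaml",            # dataset config with train/val/test splits (assumed file name)
    epochs=100,
    batch=16,
    imgsz=640,
    lr0=0.01,                        # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
    device=0,                        # single NVIDIA GPU
)
```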

4.2. Evaluation Metrics

To clearly and objectively assess the algorithmic improvements to the model, we have chosen eight evaluation metrics: precision, recall, F1Score, mAP@0.5, mAP@0.5:0.95, inference time, params, and FLOPs. Among them, precision indicates the proportion of correctly identified positive samples among all samples predicted as positive, and it is a metric for evaluating the accuracy of model classification. Recall describes the proportion of actual positive samples that the model correctly predicts as positive, measuring the model’s capacity to recognize all positive instances. The F1Score is the harmonic mean of precision and recall; it considers both the model’s recall and its precision, making it a comprehensive evaluation metric for model performance. mAP (mean Average Precision) is a measure frequently employed in object detection to assess the accuracy of algorithms across different categories. Specifically, mAP@0.5 is the mean average precision at an IoU threshold of 0.5, while mAP@0.5:0.95 is the mean average precision averaged over IoU thresholds from 0.5 to 0.95, providing a more comprehensive evaluation of detection accuracy. Inference time refers to the time it takes for the model to produce an output after receiving an input, and it is a key metric for assessing the prediction speed of the model. Params reflects the number of parameters that need to be learned in the model, which is often related to the model’s complexity. FLOPs reflects the number of floating-point operations performed during a forward pass of the model, indicating its computational complexity. Together, these metrics measure the accuracy, efficiency, and computational complexity of the model, providing a thorough quantitative assessment of the effectiveness of the algorithmic improvements.
Precision = \frac{TP}{TP + FP}
Recall = \frac{TP}{TP + FN}
F1Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}
AP = \int_0^1 P(r)\, dr
mAP = \frac{1}{k} \sum_{i=1}^{k} AP_i
Among them, $TP$ denotes true positives (the number of positive samples correctly identified), $FP$ denotes false positives (the number of negative samples incorrectly reported as positive), and $FN$ denotes false negatives (the number of positive samples missed or incorrectly reported as negative). $P(r)$ is the precision–recall curve, $AP_i$ is the average precision for the $i$-th category of detection targets, and $k$ is the total number of categories.
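The metrics above map directly to code; the sketch below computes precision, recall, and F1 from counts and approximates AP by numerically integrating a sampled precision–recall curve with the trapezoidal rule. Detection frameworks typically use interpolated AP evaluated over IoU thresholds, so this is only a simplified illustration.

```python
import numpy as np


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_precision(precisions, recalls):
    """Approximate AP = integral of P(r) dr over a sampled precision-recall curve."""
    order = np.argsort(recalls)
    r = np.asarray(recalls, dtype=float)[order]
    p = np.asarray(precisions, dtype=float)[order]
    # trapezoidal integration of precision over recall
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2))


# mAP is then the mean AP over all k categories, e.g.:
# mAP = np.mean([average_precision(p_i, r_i) for p_i, r_i in per_class_curves])
```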

4.3. Dataset

The public SeaShips dataset is used in the experiments, which comprises a total of 7000 images, including six categories of ships: ore carriers, fishing boats, passenger ships, general cargo ships, bulk carriers, and container ships. In our experiments, we randomly divided these photos into training, validation, and test sets in the proportion of 3:1:1.
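A 3:1:1 random split such as the one used here can be reproduced with a few lines of Python; the directory layout, file extension, and fixed seed below are assumptions for illustration.

```python
import random
from pathlib import Path


def split_seaships(image_dir, seed=0):
    """Randomly split images into train/val/test sets with a 3:1:1 ratio (sketch)."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_val = int(n * 3 / 5), int(n * 1 / 5)
    return (images[:n_train],                    # 60% training
            images[n_train:n_train + n_val],     # 20% validation
            images[n_train + n_val:])            # 20% test
```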
All the images in the dataset come from a marine-monitoring system; Figure 6 visualizes the quantity of each ship type as well as the size and distribution of the bounding boxes. As shown in Figure 6a, the unequal distribution of the six categories of ships increases the difficulty of ship detection. Figure 6b shows the size and quantity of the target boxes in the dataset. The target box is a rectangular box drawn around the target object in the image, used to label the object’s position and size; through this graph, we can understand the size distribution of the target objects and the number of target objects in each image. In Figure 6c, the center points represent the approximate positions of the vessels in the pictures. Figure 6d shows the width-to-height ratios of the targets relative to the images, which reflect the ships’ shapes and design characteristics.

4.4. Visualization and Discussion of DSSM–LightNet

Figure 7 displays the variations in the loss and evaluation metrics of DSSM–LightNet on the training and validation sets throughout the training process. As shown in Figure 7a–c, after 100 epochs the training losses still show a slight downward trend, while the validation losses have flattened out. The decreasing trend of the validation losses is similar to that of the training losses, with no point at which the validation losses begin to rise while the training losses continue to decline. This indicates that the trained model fits well and has not overfitted. As indicated in Figure 7d, the values of the four metrics improve as the number of epochs increases. Although the precision still exhibits slight fluctuations after 100 epochs, the mAP and recall curves have flattened out.
To determine the effectiveness of DSSM–LightNet, we compared the precision, recall, and mAP for the six ship types obtained by YOLOv8n and DSSM–LightNet; the corresponding bar charts are shown in Figure 8. We also plotted precision–recall (P–R) curves and F1Score curves for each type of ship, displayed in Figure 9 and Figure 10, respectively.
As depicted in Figure 8, our model achieves improvements in precision, recall, and mAP for most types of ships. The improvement in precision reflects the model’s ability to accurately identify ships. The increase in recall indicates that the model covers ships in detection scenarios more comprehensively, thereby reducing missed detections. The improvement in mAP shows that the model maintains high accuracy and comprehensiveness across different ship types and detection conditions. This demonstrates our network’s capability in ship detection. Figure 9 shows the precision–recall (P–R) curves for the original YOLOv8n and DSSM–LightNet. The P–R curve evaluates the model’s performance under different thresholds, showing the relationship between precision and recall at different classification thresholds; the closer the P–R curve is to the upper-right corner, the more accurately positive samples are identified, indicating better model performance. As Figure 9b illustrates, compared with YOLOv8n, the P–R curve of the DSSM–LightNet model is nearer to the upper-right corner, indicating that our proposed model achieves stronger precision and recall. The F1Score is an indicator that comprehensively evaluates model performance, with values between 0 and 1, where a value closer to 1 represents better prediction performance. As shown in Figure 10, the F1Score values of DSSM–LightNet are higher than those of YOLOv8n, and the curves are more concentrated, suggesting that our proposed model obtains better overall results.

4.5. Comparison and Analysis of Performance

To further assess the capabilities of DSSM–LightNet, we compared its performance with that of other similar methods. The compared YOLO series includes YOLOv3-tiny [9], YOLOv5n, YOLOv6n [11], and YOLOv8n. To guarantee a fair comparison, the number of epochs was uniformly set to 100, and all models were tested on the SeaShips dataset. As demonstrated in Table 3, although the number of parameters in DSSM–LightNet is slightly higher than that of YOLOv5n, it has the lowest computational complexity and higher precision, recall, and mAP values than the other models. This shows that our proposed model performs the best among the YOLO series.
To demonstrate that the proposed model is sufficiently lightweight, we conducted comparative experiments between the proposed model and other lightweight models, including mainstream lightweight backbone networks (MobileNetv2 [39], ShuffleNetv1 [40], GhostNetv1 [36]) and recently proposed lightweight models (YOLOv8n-pest [41], LD-YOLO [42], YOLOv7-Ship [43]). The experimental results are shown in Table 4. Based on these results, the effectiveness of DSSM–LightNet in ship detection can be comprehensively analyzed.
The precision and recall of DSSM–LightNet are the highest among all models, indicating that the model effectively balances false positives and false negatives in ship detection. In addition, DSSM–LightNet demonstrates excellent lightweighting: it has 2.75 M parameters and a computational complexity of 7.1 GFLOPs, which is much lower than even the latest lightweight models (YOLOv8n-pest, LD-YOLO, YOLOv7-Ship). This indicates that DSSM–LightNet can achieve fast training and inference speeds, making it highly suitable for scenarios that require rapid response.
In essence, the DSSM–LightNet model introduced in this article demonstrates remarkable performance benefits for ship detection tasks, boasting high precision, exceptional efficiency, a lightweight design, and robust adaptability. It is ideally suited for scenarios necessitating rapid, real-time inspections while maintaining stringent accuracy standards.
Figure 11 presents the curves of precision, recall, and mAP for all the aforementioned models. It can be observed that after 100 epochs, our model’s precision, recall, and mAP curves are positioned at the top of the graph, with values slightly higher than those of the other models. At this point, all the curves flatten out and converge quickly, suggesting that training is in good condition and has reached optimal performance. We also compared DSSM–LightNet with other models from the past two years, and Table 5 displays the detailed comparison results. It is evident that our model performs exceptionally well in terms of precision, recall, and mAP. Moreover, our model has fewer parameters, a lower computational cost, and a lighter weight.
Figure 12 presents a visual comparison of DSSM–LightNet and other models in terms of FLOPs (G) and mAP@0.5:0.95, where the area of each bubble reflects the model’s total number of parameters. As seen in Figure 12, DSSM–LightNet attains a high level of detection precision at a relatively low computational cost. Compared with other models, DSSM–LightNet strikes a better balance between detection accuracy and computational complexity.
To observe the performance improvement after model modification more clearly and intuitively, we provide visual examples of ship detection using DSSM–LightNet and YOLOv8n, as illustrated in Figure 13 and Figure 14. As seen in Figure 13, the original YOLOv8n model exhibits problems such as missed detections and false positives. For example, in the second column of the first row, it mistakenly identifies a passenger ship as a fishing boat, and in the second column of the second row, it misclassifies a cargo ship as a fishing boat. In the second column of the fourth row, the model fails to detect the fishing vessel. Comparing the detection results in the second and third columns of Figure 13, it is clear that the enhanced algorithm effectively addresses issues such as missed detections and false positives.
According to Figure 14, by adopting the MPDIoU loss function, the improved algorithm achieves higher detection precision and better detection performance than the original network. Overall, the improved algorithm not only resolves the issues of false and missed detections but also enhances detection accuracy, demonstrating the effectiveness of the algorithm.

4.6. Ablation Study

To better validate the efficacy of each module in DSSM–LightNet, we conducted ablation experiments; the specific results are shown in Table 6 and Table 7. All experiments in Table 6 were conducted by adding individual modules to the YOLOv8n model, while in Table 7, N1 represents the experiment using the original YOLOv8n model, with all other parameters and environmental settings kept consistent. In Table 7, “√” indicates that the corresponding improvement is introduced. As observed in Table 7, the N1 model has the lowest mAP value, the highest number of parameters, and relatively average detection performance. After the introduction of the Slim-neck structure, the model’s precision, recall, F1Score, and mAP values increase significantly, with a certain degree of reduction in computational complexity and the number of parameters. When Slim-neck and DualConv are combined, the number of parameters and the computational complexity reach the lowest values among all experimental groups, demonstrating that the addition of Slim-neck and DualConv lightens the model. As seen in Table 6, SEAM’s performance when used alone is not ideal, but when combined with Slim-neck, recall improves significantly, reaching the maximum value among all experimental groups. Additionally, the use of MPDIoU maximizes the precision and mAP values, significantly improving network performance.

5. Conclusions

In this research, we present a lightweight ship detection model called DSSM–LightNet, based on an improved YOLOv8n. The proposed DSSM–LightNet introduces DualConv and Slim-neck into the original YOLOv8n, making it more lightweight and suitable for deployment on resource-limited mobile devices. Additionally, the MPDIoU loss function is adopted as the bounding-box regression loss of the algorithm, effectively improving detection accuracy. Numerous tests were conducted on the SeaShips dataset. According to the experimental findings, compared with representative lightweight models such as YOLOv3-tiny, YOLOv5n, and YOLOv6n, DSSM–LightNet achieves notable increases in recall and detection accuracy. The ablation study further demonstrates that the proposed improvements enhance the performance of DSSM–LightNet. With the DualConv module, Slim-neck module, SEAM, and MPDIoU loss function added, the DSSM–LightNet model achieves an F1 of 0.962, mAP@0.5 of 0.985, and mAP@0.5:0.95 of 0.782 on the SeaShips dataset, providing good detection results on various ship types while retaining its lightweight nature. Compared with other ship detection methods, our approach achieves an exceptional balance between model complexity and detection accuracy.
This study successfully developed and validated an efficient ship detection model that can monitor ship dynamics in real time and help prevent security incidents such as collisions and pirate attacks. It can also support route planning, improve port operation efficiency, and reduce shipping costs, and thus addresses a substantial market demand. The model has demonstrated excellent performance at both the theoretical and experimental levels, bringing a new technological advance to the field of maritime monitoring, and applying it to practical maritime-monitoring systems offers significant advantages. Although our method achieves excellent performance in ship detection, some limitations remain. For example:
(1) The ships in the dataset are all in horizontal positions, resulting in significant deficiencies in sample diversity and richness. The model learns features under this specific pose and ignores changes under other poses. Therefore, in future research, efforts should be made to collect ship images in various poses and environments as much as possible to improve the performance and reliability of the model.
(2) The marine environment is complex and changeable, and the model is only trained on a limited dataset, which may not fully cover various situations that may be encountered in practical applications (such as fog, rainstorm, typhoon, lighting, and other factors). We are also aware of this limitation in our study, but due to limitations in resources, time, and technology, we were unable to refine this experiment. Therefore, we have decided to collect as much marine environment and other adverse weather vessel datasets as possible for testing in future research.

Author Contributions

Conceptualization, Z.G. and X.Y.; methodology, Z.G.; software, X.Y.; validation, X.R. and W.W.; formal analysis, Z.G.; investigation, X.Y.; resources, X.R.; data curation, X.Y.; writing—original draft preparation, Z.G.; writing—review and editing, X.Y.; visualization, Z.G.; supervision, X.R.; project administration, Z.G.; funding acquisition, X.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Provincial Natural Science Foundation under Grant LH2022F038 and Cultivation Project of National Natural Science Foundation of Harbin Normal University under Grant XPPY202208.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zheng, Y.; Liu, P.; Qian, L.; Qin, S.; Liu, X.; Ma, Y.; Cheng, G. Recognition and Depth Estimation of Ships Based on Binocular Stereo Vision. J. Mar. Sci. Eng. 2022, 10, 1153. [Google Scholar] [CrossRef]
  2. Rawson, A.; Brito, M.; Sabeur, Z.; Tran-Thanh, L. A Machine Learning Approach for Monitoring Ship Safety in Extreme Weather Events. Saf. Sci. 2021, 141, 105336. [Google Scholar] [CrossRef]
  3. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  4. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  6. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  7. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  8. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  9. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  10. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  11. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  12. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  14. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef]
  15. Shao, Z.; Wang, L.; Wang, Z.; Du, W.; Wu, W. Saliency-Aware Convolution Neural Network for Ship Detection in Surveillance Video. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 781–794. [Google Scholar] [CrossRef]
  16. Wang, Q.; Shen, F.; Cheng, L.; Jiang, J.; He, G.; Sheng, W.; Jing, N.; Mao, Z. Ship Detection Based on Fused Features and Rebuilt YOLOv3 Networks in Optical Remote-Sensing Images. Int. J. Remote Sens. 2021, 42, 520–536. [Google Scholar] [CrossRef]
  17. Guo, Y.; Chen, S.; Zhan, R.; Wang, W.; Zhang, J. LMSD-YOLO: A Lightweight YOLO Algorithm for Multi-Scale SAR Ship Detection. Remote Sens. 2022, 14, 4801. [Google Scholar] [CrossRef]
  18. Xing, B.; Wang, W.; Qian, J.; Pan, C.; Le, Q. A Lightweight Model for Real-Time Monitoring of Ships. Electronics 2023, 12, 3804. [Google Scholar] [CrossRef]
  19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  20. Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  21. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  22. Lin, Z.; Ji, K.; Leng, X.; Kuang, G. Squeeze and Excitation Rank Faster R-CNN for Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 751–755. [Google Scholar] [CrossRef]
  23. Lan, K.; Jiang, X.; Ding, X.; Lin, H.; Chan, S. High-Efficiency and High-Precision Ship Detection Algorithm Based on Improved YOLOv8n. Mathematics 2024, 12, 1072. [Google Scholar] [CrossRef]
  24. Zhao, X.; Song, Y. Improved Ship Detection with YOLOv8 Enhanced with MobileViT and GSConv. Electronics 2023, 12, 4666. [Google Scholar] [CrossRef]
  25. Yi, H.; Liu, B.; Zhao, B.; Liu, E. Small Object Detection Algorithm Based on Improved YOLOv8 for Remote Sensing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1734–1747. [Google Scholar] [CrossRef]
  26. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  27. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
  28. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  29. Yu, Z.; Huang, H.; Chen, W.; Su, Y.; Liu, Y.; Wang, X. YOLO-FaceV2: A Scale and Occlusion Aware Face Detector. Pattern Recognit. 2024, 155, 110714. [Google Scholar] [CrossRef]
  30. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
  31. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000. [Google Scholar] [CrossRef]
  32. Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and Efficient IOU Loss for Accurate Bounding Box Regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  33. Ma, S.; Xu, Y. MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression. arXiv 2023, arXiv:2307.07662. [Google Scholar]
  34. Zhong, J.; Chen, J.; Mian, A. DualConv: Dual Convolutional Kernels for Lightweight Deep Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 9528–9535. [Google Scholar] [CrossRef]
  35. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-Neck by GSConv: A Lightweight-Design for Real-Time Detector Architectures. arXiv 2022, arXiv:2206.02424. [Google Scholar]
  36. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
  37. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
  38. Shao, Z.; Wu, W.; Wang, Z.; Du, W.; Li, C. SeaShips: A Large-Scale Precisely Annotated Dataset for Ship Detection. IEEE Trans. Multimed. 2018, 20, 2593–2604. [Google Scholar] [CrossRef]
  39. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  40. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
  41. Li, H.; Yuan, W.; Xia, Y.; Wang, Z.; He, J.; Wang, Q.; Zhang, S.; Li, L.; Yang, F.; Wang, B. YOLOv8n-WSE-Pest: A Lightweight Deep Learning Model Based on YOLOv8n for Pest Identification in Tea Gardens. Appl. Sci. 2024, 14, 8748. [Google Scholar] [CrossRef]
  42. Lin, Z.; Yun, B.; Zheng, Y. LD-YOLO: A Lightweight Dynamic Forest Fire and Smoke Detection Model with Dysample and Spatial Context Awareness Module. Forests 2024, 15, 1630. [Google Scholar] [CrossRef]
  43. Jiang, Z.; Su, L.; Sun, Y. YOLOv7-Ship: A Lightweight Algorithm for Ship Object Detection in Complex Marine Environments. J. Mar. Sci. Eng. 2024, 12, 190. [Google Scholar] [CrossRef]
  44. Cai, S.; Meng, H.; Wu, J. FE-YOLO: YOLO Ship Detection Algorithm Based on Feature Fusion and Feature Enhancement. J. Real-Time Image Process. 2024, 21, 61. [Google Scholar] [CrossRef]
  45. Qian, L.; Zheng, Y.; Cao, J.; Ma, Y.; Zhang, Y.; Liu, X. Lightweight Ship Target Detection Algorithm Based on Improved YOLOv5s. J. Real-Time Image Process. 2023, 21, 3. [Google Scholar] [CrossRef]
  46. Yu, N.; Fan, X.; Deng, T.; Mao, G. Ship Detection in Inland Rivers Based on Multi-Head Self-Attention. In Proceedings of the 2022 7th International Conference on Signal and Image Processing (ICSIP), Suzhou, China, 20–22 July 2022; pp. 295–299. [Google Scholar]
  47. Zhang, Y.; Chen, W.; Li, S.; Liu, H.; Hu, Q. YOLO-Ships: Lightweight Ship Object Detection Based on Feature Enhancement. J. Vis. Commun. Image Represent. 2024, 101, 104170. [Google Scholar] [CrossRef]
  48. Jiang, X.; Cai, J.; Wang, B. YOLOSeaShip: A Lightweight Model for Real-Time Ship Detection. Eur. J. Remote Sens. 2024, 57, 2307613. [Google Scholar] [CrossRef]
Figure 1. Motivation for improving the model.
Figure 2. The exploration process of DSSM–LightNet. A darker stripe indicates that the corresponding module performs better.
Figure 3. The architecture of the proposed DSSM–LightNet model; the red box marks the main improvements.
Figure 4. Comparison between C2f and C2f_Dual: (a) C2f; (b) C2f_Dual.
Figure 5. Performance comparison of C2f_Dual at different locations, where (a–d) denote the different positions at which C2f_Dual replaces the C2f module in the YOLOv8n backbone.
Figure 6. Distribution of the ship dataset. (a) Number of ships; (b) size and quantity of the bounding boxes; (c) positions of the center points relative to the entire image; (d) width-to-height ratios of the targets relative to the entire image.
Figure 7. Loss curves of DSSM–LightNet. (a) Box loss; (b) Cls loss; (c) DFL loss; (d) metrics.
Figure 8. Detection results for different types of ships. (a) Precision; (b) recall; (c) mAP@0.5; (d) mAP@0.5:0.95.
Figure 9. P–R curves for different types of ships.
Figure 10. F1-Score curves for different types of ships.
Figure 11. Comparison of DSSM–LightNet with other models. (a) Precision; (b) recall; (c) mAP@0.5; (d) mAP@0.5:0.95.
Figure 12. Comparison of the proposed DSSM–LightNet with other models. The area of each bubble indicates the total number of parameters; MobileNetv2, ShuffleNetv1, and GhostNetv1 are each used as the backbone of YOLOv8n.
Figure 13. Visualization results of DSSM–LightNet and YOLOv8n. (a) Original; (b) YOLOv8n; (c) DSSM–LightNet.
Figure 14. Visualization results of DSSM–LightNet and YOLOv8n for different types of ships. (a) Original; (b) YOLOv8n; (c) DSSM–LightNet.
Table 1. Comparing the performance of DualConv at different locations.

| Methods | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | Inference (ms) | Params (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| C2f_Dual-a | 0.973 | 0.938 | 0.980 | 0.756 | 1.1 | 3.01 | 8.0 |
| C2f_Dual-b | 0.971 | 0.934 | 0.973 | 0.756 | 1.1 | 2.99 | 7.9 |
| C2f_Dual-c | 0.967 | 0.943 | 0.975 | 0.750 | 1.1 | 2.95 | 7.7 |
| C2f_Dual-d | 0.962 | 0.946 | 0.981 | 0.758 | 1.1 | 2.85 | 7.7 |
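As a companion to Table 1, the following is a minimal PyTorch-style sketch of a DualConv block in the spirit of [34]: a 3×3 group convolution and a 1×1 pointwise convolution process the same input in parallel, and their outputs are summed. The class name, the default group count g = 2, and the omission of batch normalization and activation are illustrative assumptions, not the exact implementation used in DSSM–LightNet.

```python
import torch
import torch.nn as nn


class DualConv(nn.Module):
    """Minimal sketch of a DualConv block in the spirit of [34].

    A grouped 3x3 convolution and a pointwise 1x1 convolution process the
    same input feature map in parallel and their outputs are summed, which
    costs fewer parameters and FLOPs than a dense 3x3 convolution.
    Names and defaults here are illustrative assumptions.
    """

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1, g: int = 2):
        super().__init__()
        # Grouped 3x3 convolution: captures spatial context at reduced cost.
        self.group_conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                                    stride=stride, padding=1, groups=g, bias=False)
        # Pointwise 1x1 convolution: mixes information across all channels.
        self.point_conv = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                    stride=stride, padding=0, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.group_conv(x) + self.point_conv(x)


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)   # e.g., a backbone feature map
    y = DualConv(64, 128, g=2)(x)
    print(y.shape)                   # torch.Size([1, 128, 80, 80])
```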
Table 2. Comparison of different loss functions' detection accuracy for different ship types.

| Methods | Ore Carrier | Fishing Boat | Passenger Ship | General Cargo Ship | Bulk Cargo Carrier | Container Ship |
| --- | --- | --- | --- | --- | --- | --- |
| GIoU [30] | 0.980 | 0.966 | 0.954 | 0.978 | 0.982 | 0.990 |
| DIoU [31] | 0.983 | 0.965 | 0.975 | 0.982 | 0.977 | 0.991 |
| CIoU [31] | 0.982 | 0.958 | 0.965 | 0.989 | 0.983 | 0.991 |
| EIoU [32] | 0.981 | 0.959 | 0.971 | 0.985 | 0.977 | 0.991 |
| MPDIoU [33] | 0.988 | 0.969 | 0.964 | 0.988 | 0.984 | 0.992 |
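For reference, the sketch below illustrates one common formulation of the MPDIoU loss [33] compared in Table 2: the plain IoU term is penalized by the squared distances between the top-left and bottom-right corners of the predicted and ground-truth boxes, normalized by the squared size of the input image. The function name, the (x1, y1, x2, y2) box convention, and the mean reduction are assumptions for illustration; this is not the training code of DSSM–LightNet.

```python
import torch


def mpdiou_loss(pred: torch.Tensor, target: torch.Tensor,
                img_w: float, img_h: float, eps: float = 1e-7) -> torch.Tensor:
    """Sketch of an MPDIoU-style loss [33] for boxes in (x1, y1, x2, y2) format.

    MPDIoU = IoU - d1^2/(w^2 + h^2) - d2^2/(w^2 + h^2), where d1 and d2 are the
    distances between the top-left and bottom-right corners of the predicted and
    ground-truth boxes, and (w, h) is the input image size. Loss = 1 - MPDIoU.
    Tensor shapes: (N, 4).
    """
    # Intersection area
    inter_x1 = torch.max(pred[:, 0], target[:, 0])
    inter_y1 = torch.max(pred[:, 1], target[:, 1])
    inter_x2 = torch.min(pred[:, 2], target[:, 2])
    inter_y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (inter_x2 - inter_x1).clamp(0) * (inter_y2 - inter_y1).clamp(0)

    # Union area and plain IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared corner distances, normalized by the squared image diagonal
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    diag_sq = img_w ** 2 + img_h ** 2

    mpdiou = iou - d1 / diag_sq - d2 / diag_sq
    return (1.0 - mpdiou).mean()
```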
Table 3. Performance comparison between DSSM–LightNet and other YOLO models.

| Methods | Precision | Recall | F1-Score | mAP@0.5 | mAP@0.5:0.95 | Inference (ms) | Params (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3-tiny [9] | 0.968 | 0.930 | 0.949 | 0.978 | 0.752 | 1.4 | 12.1 | 18.9 |
| YOLOv5n | 0.963 | 0.931 | 0.947 | 0.972 | 0.739 | 0.9 | 2.50 | 7.1 |
| YOLOv6n [11] | 0.971 | 0.937 | 0.954 | 0.979 | 0.761 | 1.3 | 4.23 | 11.8 |
| YOLOv8n | 0.967 | 0.944 | 0.955 | 0.978 | 0.756 | 1.1 | 3.01 | 8.1 |
| Proposed | 0.977 | 0.948 | 0.962 | 0.985 | 0.782 | 1.1 | 2.75 | 7.1 |
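The F1-Score column reported in Tables 3–7 is, assuming the standard definition, the harmonic mean of precision and recall; taking the proposed model's row as a quick consistency check:

\[
F_1 \;=\; \frac{2\,P\,R}{P+R} \;=\; \frac{2 \times 0.977 \times 0.948}{0.977 + 0.948} \;\approx\; 0.962,
\]

which agrees with the value listed for DSSM–LightNet.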
Table 4. Performance comparison between DSSM–LightNet and other lightweight models.

| Methods | Precision | Recall | F1-Score | mAP@0.5 | mAP@0.5:0.95 | Inference (ms) | Params (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MobileNetv2 [39] | 0.963 | 0.944 | 0.953 | 0.977 | 0.760 | 2.3 | 3.76 | 10.1 |
| ShuffleNetv1 [40] | 0.951 | 0.926 | 0.938 | 0.968 | 0.712 | 1.5 | 3.48 | 6.5 |
| GhostNetv1 [36] | 0.958 | 0.915 | 0.936 | 0.967 | 0.729 | 1.9 | 5.36 | 9.4 |
| YOLOv8n-pest [41] | 0.951 | 0.942 | 0.946 | 0.980 | – | – | 3.13 | 8.4 |
| LD-YOLO [42] | 0.801 | 0.798 | 0.799 | 0.863 | 0.603 | – | 6.7 | 14.7 |
| YOLOv7-Ship [43] | 0.814 | 0.758 | 0.785 | 0.805 | 0.554 | – | 6.1 | 12.8 |
| Proposed | 0.977 | 0.948 | 0.962 | 0.985 | 0.782 | 1.1 | 2.75 | 7.1 |
Table 5. Performance comparison between the proposed model and SOTA on SeaShips.

| Methods | Precision | Recall | F1-Score | mAP@0.5 | mAP@0.5:0.95 | Params (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FE-YOLO [44] | – | – | – | 0.976 | 0.726 | – | – |
| MGS-YOLO [45] | 0.967 | 0.940 | 0.950 | 0.977 | – | – | 6.6 |
| MHSA-YOLO [46] | – | 0.940 | – | 0.976 | – | 7.83 | 17.5 |
| YOLO-Ships [47] | 0.902 | 0.892 | 0.897 | 0.935 | – | 4.15 | 8.72 |
| YOLOSeaShip [48] | 0.948 | 0.959 | 0.953 | 0.976 | – | 4.78 | 10.9 |
| Ours | 0.977 | 0.948 | 0.962 | 0.985 | 0.782 | 2.75 | 7.1 |
Table 6. Performance comparison of different modules for various ship types.

| Methods | mAP@0.5 (Ore Carrier) | mAP@0.5 (Fishing Boat) | mAP@0.5 (Passenger Ship) | mAP@0.5 (General Cargo Ship) | mAP@0.5 (Bulk Carrier) | mAP@0.5 (Container Ship) | mAP@0.5:0.95 | Precision | Recall | F1-Score | FLOPs (G) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n | 0.982 | 0.958 | 0.965 | 0.989 | 0.983 | 0.991 | 0.756 | 0.967 | 0.944 | 0.955 | 8.1 |
| +C2f_Dual | 0.984 | 0.962 | 0.981 | 0.989 | 0.983 | 0.990 | 0.758 | 0.962 | 0.946 | 0.954 | 7.7 |
| +Slim-neck | 0.981 | 0.964 | 0.983 | 0.985 | 0.983 | 0.991 | 0.768 | 0.973 | 0.950 | 0.961 | 7.3 |
| +SEAM | 0.977 | 0.956 | 0.954 | 0.983 | 0.982 | 0.988 | 0.758 | 0.953 | 0.946 | 0.949 | 8.3 |
| +MPDIoU | 0.988 | 0.969 | 0.964 | 0.988 | 0.984 | 0.992 | 0.778 | 0.969 | 0.942 | 0.955 | 8.1 |
Table 7. Comparison of ship-detection performance among different modules.

| Number | Slim-Neck | DualConv | SEAM | MPDIoU | Precision | Recall | F1-Score | mAP@0.5 | mAP@0.5:0.95 | Params (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N1 |  |  |  |  | 0.967 | 0.944 | 0.955 | 0.978 | 0.756 | 3.01 | 8.1 |
| N2 |  |  |  |  | 0.973 | 0.950 | 0.961 | 0.981 | 0.768 | 2.80 | 7.3 |
| N3 |  |  |  |  | 0.965 | 0.938 | 0.951 | 0.980 | 0.762 | 2.65 | 6.9 |
| N4 |  |  |  |  | 0.972 | 0.951 | 0.961 | 0.981 | 0.765 | 2.90 | 7.5 |
| N5 |  |  |  |  | 0.973 | 0.929 | 0.950 | 0.978 | 0.763 | 2.75 | 7.1 |
| N6 |  |  |  |  | 0.977 | 0.948 | 0.962 | 0.985 | 0.782 | 2.75 | 7.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
