Article

LWS-YOLOv7: A Lightweight Water-Surface Object-Detection Model

Navigation College, Dalian Maritime University, Dalian 116026, China
*
Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(6), 861; https://doi.org/10.3390/jmse12060861
Submission received: 17 April 2024 / Revised: 15 May 2024 / Accepted: 20 May 2024 / Published: 22 May 2024
(This article belongs to the Special Issue Unmanned Marine Vehicles: Perception, Planning, Control and Swarm)

Abstract

In inland waterways, there is a high density of various objects, with a predominance of small objects, which can easily affect navigation safety. To improve the navigation safety of inland ships, this paper proposes a new lightweight water-surface object-detection model named LWS-YOLOv7, which is based on the baseline model YOLOv7. Firstly, the localization loss function is improved by introducing the w-CIoU function, which reduces the model's sensitivity to position deviations of small objects and improves the allocation accuracy of positive and negative sample labels. Secondly, a new receptive-field amplification module named GSPPCSPC is proposed to reduce the model's parameters and enlarge its receptive field. Thirdly, a small-object feature-fusion layer, P2, is added to improve the recall rate of small objects. Finally, the LAMP pruning method is used to remove weights of lower importance, simplifying the parameters and computational complexity of the model and facilitating its deployment on shipborne devices. The experimental results demonstrate that, compared to the original YOLOv7 model, the mAP of LWS-YOLOv7 increased by 3.1%, the parameters decreased by 38.8%, and the GFLOPS decreased by 28.8%. Moreover, the model not only achieves better performance and higher speed for input images of different sizes, but can also be applied under different meteorological conditions.

1. Introduction

Compared to the maritime environment, the navigation conditions for inland ships are more complex, with higher ship density and more obstacles. Additionally, there are numerous small objects, which affects navigation safety [1]. At present, ships commonly use radar and AIS devices to perceive their navigation environment. However, radar has blind spots, and small objects have poor radar reflections. In addition, radar echoes are easily affected by weather or obstructed, resulting in a decrease in detection effectiveness. AIS provides higher navigation accuracy compared to radar, but it is a passive sensor, which can also cause safety issues when some ships turn off their AIS devices [2]. Therefore, simply using radar and AIS devices or visual observation by ship operators does not provide a comprehensive understanding of the navigation environment. In recent years, object-detection technology based on visible light images has made significant progress and has gradually been applied to the intelligent perception of ships’ navigation environments [3]. Compared to radar and remote-sensing images, visible light images have advantages such as high resolution, rich color information, and clear textures. They provide a more intuitive observation of the navigation environment [4]. Additionally, visible light image-acquisition devices are relatively affordable, stable, easy to install, and can serve as a complement to radar and AIS devices, enhancing navigation safety [5].
Traditional object-detection algorithms are divided into feature-based methods and segmentation-based methods. By 2010, the performance of traditional object-detection algorithms had reached a plateau. In 2012, AlexNet, the CNN constructed by Hinton and his research team, significantly outperformed SVM-based methods and won the ImageNet competition. Object-detection algorithms based on deep learning gradually became the main research direction in the object-detection field [6]. Researchers began to use large-scale convolutional neural networks with deeper architectures and more parameters to solve complex object-detection tasks. However, as the number of parameters and the model size increase, higher performance is required from the corresponding devices. Therefore, there is a need to develop a lightweight object-detection model that can be effectively deployed on ships to achieve intelligent perception.
The quality of a dataset has a very important impact on the performance of a model, so several researchers have produced and published water-surface object datasets. In 2018, Shao et al. [7] published SeaShips, a public ship dataset that covers six common ship types and was designed for training and evaluating ship object-detection algorithms. In 2019, Moosbauer et al. [8] published a benchmark based on the SMD (Singapore Maritime Dataset) and performed experiments with Faster R-CNN [9] and Mask R-CNN [10] to verify the applicability of the dataset in the maritime field. In 2020, Shin et al. [11] proposed a data expansion method that automatically enlarges a dataset; this method uses segmentation techniques to extract foreground objects from SMD images and combines them with new maritime backgrounds to obtain high-quality data. In 2021, Zhou et al. [12] published a water-surface object dataset named WSODD (Water Surface Object Detection Dataset); the scenes in the dataset include oceans, lakes, and rivers, under varying weather conditions such as sunny, foggy, and cloudy. They also proposed a new object detector named CRB-Net, which achieves better detection accuracy on water-surface objects. Cheng et al. [13] published the first floating garbage dataset, named FloW, and performed baseline experiments based on vision and radar, but with low accuracy.
These datasets provide a basis for training water-surface object-detection algorithms. However, SeaShips contains only ship objects and few object categories; the SMD is provided in video form, which is not suitable for the model training in this paper; and FloW is a floating-garbage dataset, also with few object categories. WSODD contains 14 object categories, covering ships as well as most other objects encountered during navigation, captured under different weather conditions. Since the model developed in this paper is intended to detect various objects during navigation to improve navigation safety, it was trained on WSODD.
This study aims to address the issues of high-density water-surface obstacles and ships, as well as the predominance of small objects, in the navigation of inland ships. For this purpose, a lightweight deep-learning object-detection algorithm named LWS-YOLOv7 was developed, using YOLOv7 as the baseline model and a water-surface object dataset named WSODD for training. According to the characteristics of water-surface scenes, small objects, and multi-scale objects encountered during navigation, several targeted improvements were made, balancing detection accuracy, inference speed, and computational cost. The main contributions of this research are as follows:
  • Normalized Wasserstein distance is combined with CIoU to form the proposed w-CIoU loss, which replaces the CIoU loss of the baseline model and improves the model’s recall rate and accuracy for small objects.
  • A new receptive field amplification module named GSPPCSPC is proposed to reduce the model’s parameters and enhance the multi-scale object detection capability of the model.
  • A P2 feature fusion layer is added to reduce the feature loss of small objects and improve the model’s detection performance for small objects.
  • The model is pruned, which significantly reduces the parameters and computational costs without sacrificing the overall detection accuracy and speed of the model, facilitating the deployment of the model on shipborne devices.

2. Related Work

Based on the detection process, deep-learning-based object-detection algorithms can be divided into one-stage and two-stage methods. Two-stage object-detection algorithms have been studied extensively [14]. With this approach, the first stage focuses on obtaining proposal boxes for the object locations; in the second stage, these proposal boxes are classified and the object positions are accurately determined. Compared to one-stage algorithms, these algorithms achieve higher detection accuracy at the expense of detection speed. Representative algorithms in this category include the R-CNN series. R-CNN [15] was the first application of deep learning in the field of object detection. It utilizes a CNN to extract features from generated candidate regions and then performs classification and bounding-box regression, significantly improving the performance of object-detection algorithms. Fast R-CNN [16] combines the SS (selective search) algorithm with R-CNN, improving both the detection speed and accuracy. Faster R-CNN replaces the SS algorithm with a region proposal network (RPN) to obtain feature maps for regions of interest (ROI), further improving the detection accuracy. One-stage algorithms skip the step of generating proposal boxes and directly provide the class probabilities and coordinates of objects in the image [17,18,19]. The advantage of one-stage algorithms is their faster detection speed, which meets the requirements of real-time detection, although this sacrifices some detection accuracy. Representative algorithms in this category include the YOLO series and SSD [20]. In order to achieve real-time object detection, Redmon et al. proposed YOLOv1 [21] in 2016, which abandoned the generation of candidate regions and significantly improved the detection speed. Through several iterations, the detection accuracy of the YOLO series has also been greatly improved, and it has been widely applied in various domains. The SSD algorithm draws on the idea of anchors in Faster R-CNN and sets prior boxes of different scales, giving it good multi-scale object-detection capabilities; however, its detection performance for small objects is relatively poor.
One-stage and two-stage object-detection algorithms each have their own advantages and disadvantages; the former are more suitable for real-time detection scenarios, while the latter suit scenarios that do not require real-time processing but demand higher detection accuracy. In the context of ship navigation environment perception, real-time performance is essential, so scholars mostly focus on one-stage object-detection algorithms. In 2021, Du et al. [22] developed a navigation-mark recognition algorithm based on YOLOv3, which provides accurate navigation-mark information for ships. The data-augmentation methods proposed in that article significantly enhance the algorithm’s performance, and subsequent versions of the YOLO algorithm also incorporate data-augmentation components. Chen et al. [23] improved the YOLOv3 algorithm by modifying the model’s backbone and pruning the model. They proposed a lightweight ship detector to solve the deployment problem of real-time ship detection using synthetic aperture radar, and employed knowledge distillation to compensate for the performance degradation caused by network pruning. In 2023, Gao et al. [24] proposed a lightweight infrared image-detection algorithm for small ships. They preprocessed the data using gamma transformation to make ship objects more salient, and then replaced the YOLOv5 backbone with the MobileNetV3 network to achieve a more lightweight model that can be easily deployed on shipborne devices. By improving the backbone network, this algorithm dramatically reduces the model’s parameters while maintaining excellent detection performance for infrared ship images.
Although deep-learning-based object-detection algorithms can provide good detection performance, various problems remain in water-surface object detection. In 2020, Zou [25] researched lightweight ship-detection algorithms, improving the SSD algorithm with the MobileNetV2 network and combining it with an improved Faster R-CNN to enhance detection efficiency. However, this method performs poorly on small objects and occasionally misses them, and its large number of parameters and slow detection speed make real-time detection impossible. In 2021, Han et al. [26] optimized YOLOv4 and proposed the ShipYOLO model. The model improved the backbone network by incorporating a parameter-reconstruction technique, enhancing both the model’s accuracy and inference speed; a new amplified receptive-field module named DSPP was designed, which improved the model’s ability to capture the spatial information of small-scale ships and its robustness to ship displacement; and a new feature pyramid structure, AtFPN, was proposed, which improved detection performance for objects of different scales and reduced the model’s parameters. Although this model improves performance and reduces parameters, it still requires high computational power, making it unsuitable for real-time detection on shipborne devices. Yang et al. [27] designed ship-classification experiments and verified that convolutional neural networks significantly outperform traditional machine learning methods in this task, but their model has a large number of parameters, resulting in slow detection speed and making real-time detection unachievable. Yu et al. [28] proposed an enhanced YOLOv7 for rapid detection of small objects on the water surface. First, a detection branch was designed to address the difficulty of recognizing targets of different sizes; second, an LVC module was added to pay more attention to small objects; in addition, the L-ELAN module was integrated to enhance computational efficiency; and lastly, the Wise-IoU loss was used to optimize the loss function. However, this method only improves the model’s modules and, compared to model-pruning techniques, achieves a less effective model simplification.
Although researchers have conducted extensive research on ship detection based on deep learning, the navigation environment of inland waters is complex, and two main detection problems remain. Firstly, there are numerous multi-scale objects, including a large number of small objects, which can affect navigation safety; it is therefore necessary to design an algorithm that improves ships’ multi-scale object-detection ability and solves the problem of false and missed detections of small objects. Secondly, shipborne devices have limited computing power, requiring a reduction in algorithm complexity and computing requirements. Existing research has not fully considered these two issues. Therefore, in this paper, YOLOv7 [29] is improved to enhance the detection of small objects, and model pruning is used to simplify the network and reduce computational requirements, ensuring accuracy and real-time detection while reducing the model’s parameters for deployment on shipborne devices.

3. Methodology

Compared to YOLOv5, YOLOv7 has four main modifications: the introduction of model reparameterization; the improvement of the training method of auxiliary heads; the proposal of a more efficient network architecture, ELAN; and the combination of YOLOv5’s cross grid search and YOLOX’s matching strategy for label allocation. With the above improvements, YOLOv7 outperformed all previous versions on official datasets, and its characteristics of high-precision detection and fast inference make it particularly suitable for real-time navigation environment detection tasks.
The model’s structure diagram of YOLOv7 is shown in Figure 1, which is divided into four parts, namely the input, backbone, neck, and output. Firstly, in the input section, the model performs preprocessing operations, such as data augmentation, on the dataset images, and feeds them into the backbone network. Secondly, the backbone network extracts features from the objects and fuses the extracted features with the neck network to obtain features of three sizes: large, medium, and small. Finally, the fused features are passed to the detection head, which generates the results.
Unlike previous versions of YOLO, the backbone network of YOLOv7 is mainly composed of the ELAN module, the MP module, and the SPPCSPC module. The ELAN module has two branches of different lengths. By controlling the shortest and longest gradient paths, it enables the network to learn more features while being more efficient and robust. The SPPCSPC module builds upon the structure of SPP (spatial pyramid pooling) and incorporates the concept of the CSP module. It uses max pooling to obtain features with different receptive fields, allowing for better adaptation to detection tasks with varying resolutions and improving the inference speed. The MP module combines convolutional networks with max pooling to achieve downsampling while minimizing feature loss. The neck of YOLOv7 adopts the same PAFPN structure as YOLOv5, and the output part introduces the RepConv module, which further improves the model’s inference speed.
YOLOv7 has set up three basic models with different depths and widths, namely YOLOv7-tiny, YOLOv7, and YOLOv7-W6, to cope with different usage scenarios [30]. Considering the requirements of accuracy, speed, and the devices’ computing power for ship navigation environment detection, we selected YOLOv7 as the baseline network in this study.

3.1. Introducing w-CIoU Loss Instead of CIoU Loss

The loss function of YOLOv7 can be divided into three categories: classification, localization, and confidence loss. The loss function formula is shown in Equation (1).
$L_{object,Loss} = L_{class,Loss} + L_{loc,Loss} + L_{con,Loss}$  (1)
Among them, $L_{class,Loss}$ is the classification loss used to correct the model’s ability to identify objects, $L_{loc,Loss}$ is the localization loss used to correct the model’s ability to locate objects, and $L_{con,Loss}$ is the confidence loss used to correct the reliability of the model’s detection results. The classification loss and confidence loss in YOLOv7 are both binary cross-entropy losses, while the localization loss is CIoU.
The quality of training samples has a significant impact on the final performance of the model; therefore, the allocation of positive and negative labels during the training process is crucial. However, the sensitivity of IoU to objects of different scales varies greatly. As shown in Figure 2, for the same positional deviation, large objects and small objects exhibit significantly different declines in IoU. Small objects show larger fluctuations in IoU, leading to inaccurate label assignments for positive and negative samples. Therefore, using CIoU as the localization loss in scenarios with numerous small objects, such as inland water navigation, will reduce the model’s recall rate and accuracy and result in missed detections of small objects.
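To make this sensitivity concrete, the following minimal sketch (with hypothetical box sizes chosen only to mirror Figure 2) computes the IoU drop caused by the same one-pixel diagonal deviation for a small and a large object:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A 6 x 6 "small" object and a 36 x 36 "large" object, each predicted with the
# same one-pixel diagonal deviation from its ground-truth box.
small_gt, small_pred = (0, 0, 6, 6), (1, 1, 7, 7)
large_gt, large_pred = (0, 0, 36, 36), (1, 1, 37, 37)
print(round(iou(small_gt, small_pred), 2))  # ~0.53: the small object loses nearly half its IoU
print(round(iou(large_gt, large_pred), 2))  # ~0.90: the large object is barely affected
```

The identical positional error therefore pushes the small object much closer to a typical positive/negative threshold, which is exactly the effect illustrated in Figure 2.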
Based on the issues posed by small objects in label assignment and localization accuracy, we drew inspiration from the normalized Wasserstein distance (NWD) [31] and combined it with CIoU to propose the w-CIoU loss. The goal was to reduce the sensitivity of the original loss function to positional deviations of small objects and improve the accuracy of positive and negative sample assignment for small objects. The composition and formulas for the w-CIoU loss function are as follows:
$L_{CIoU} = 1 - I_{IoU} + \dfrac{\rho^2(b, b^{gt})}{c^2} + \alpha v$  (2)
$v = \dfrac{4}{\pi^2}\left(\arctan\dfrac{w^{gt}}{h^{gt}} - \arctan\dfrac{w}{h}\right)^2$  (3)
$\alpha = \dfrac{v}{1 - I_{IoU} + v}$  (4)
$W_2^2(N_a, N_b) = \left\|\left(\left[cx_a, cy_a, \dfrac{w_a}{2}, \dfrac{h_a}{2}\right]^{T}, \left[cx_b, cy_b, \dfrac{w_b}{2}, \dfrac{h_b}{2}\right]^{T}\right)\right\|_2^2$  (5)
$NWD(N_a, N_b) = \exp\left(-\dfrac{\sqrt{W_2^2(N_a, N_b)}}{C}\right)$  (6)
$L_{NWD} = 1 - NWD(N_a, N_b)$  (7)
$L_{w\text{-}CIoU} = (1 - W) \times L_{NWD} + W \times L_{CIoU}$  (8)
Among these, $b$ represents the prediction box, $b^{gt}$ represents the ground-truth box, $\rho$ is the Euclidean distance between the center points of the two boxes, and $c$ represents the diagonal length of the minimum enclosing region that simultaneously contains both the predicted box and the true box. $\alpha$ is the equilibrium parameter and $v$ is the consistency coefficient. $W_2^2(N_a, N_b)$ is the Wasserstein distance, where $N_a$ and $N_b$ are the Gaussian distributions modeling the bounding boxes $\left(cx_a, cy_a, \frac{w_a}{2}, \frac{h_a}{2}\right)$ and $\left(cx_b, cy_b, \frac{w_b}{2}, \frac{h_b}{2}\right)$, and $W$ is the weight factor used to control the optimization proportion of the two loss terms so as to cope with objects of different sizes. In this paper, $W$ was set to 0.44.
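For reference, the following is a minimal PyTorch sketch of how Equations (2)–(8) can be combined into a single loss term. The function and variable names are our own illustration rather than the released implementation, and the constant C from Equation (6) is treated as a tunable, dataset-dependent hyperparameter:

```python
import math
import torch

def w_ciou_loss(pred, target, W=0.44, C=1.0, eps=1e-7):
    """Sketch of the w-CIoU loss; boxes are (cx, cy, w, h) tensors of shape (N, 4)."""
    # Corner coordinates of the predicted and ground-truth boxes.
    px1, py1 = pred[:, 0] - pred[:, 2] / 2, pred[:, 1] - pred[:, 3] / 2
    px2, py2 = pred[:, 0] + pred[:, 2] / 2, pred[:, 1] + pred[:, 3] / 2
    tx1, ty1 = target[:, 0] - target[:, 2] / 2, target[:, 1] - target[:, 3] / 2
    tx2, ty2 = target[:, 0] + target[:, 2] / 2, target[:, 1] + target[:, 3] / 2

    # IoU term.
    inter = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(0) * \
            (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(0)
    union = pred[:, 2] * pred[:, 3] + target[:, 2] * target[:, 3] - inter + eps
    iou = inter / union

    # CIoU penalties: center distance, enclosing-box diagonal, aspect-ratio consistency.
    rho2 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps
    v = (4 / math.pi ** 2) * (torch.atan(target[:, 2] / (target[:, 3] + eps)) -
                              torch.atan(pred[:, 2] / (pred[:, 3] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    l_ciou = 1 - iou + rho2 / c2 + alpha * v                      # Equation (2)

    # NWD term: Wasserstein distance between Gaussian box models, Equations (5)-(7).
    w2 = ((pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2 +
          ((pred[:, 2] - target[:, 2]) / 2) ** 2 + ((pred[:, 3] - target[:, 3]) / 2) ** 2)
    l_nwd = 1 - torch.exp(-torch.sqrt(w2 + eps) / C)

    # Weighted combination, Equation (8).
    return (1 - W) * l_nwd + W * l_ciou
```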
In order to verify the improvement brought by the w-CIoU function to the YOLOv7 model in small-object detection, two experiments were conducted on the WSODD dataset; the data were split into a training set and a test set at an 8:2 ratio before training. The model in experiment 1 was the original YOLOv7, and the model in experiment 2 was an improved YOLOv7 with w-CIoU as the localization loss. The hardware device used in the experiments was a Tesla V100 graphics card (Nvidia, Santa Clara, CA, USA), the operating system was Ubuntu 22.04, the deep-learning framework was PyTorch 1.10.1, and the CUDA version was 10.2. After 300 training iterations, the models were evaluated on the test set, and the experimental results are shown in Table 1. Three indicators, accuracy, recall rate, and mAP, are used here, where a higher value signifies better model performance. Compared to the CIoU function, the accuracy of the model with w-CIoU increased by 4.5%, the recall rate increased by 1.1%, and the mAP increased by 1.4%, which indicates that, with appropriate parameter settings, the w-CIoU loss function allows the model to effectively handle the localization and label-assignment issues posed by small objects, improving its performance in inland navigation scenes with numerous small objects.

3.2. Spatial Pyramid Pooling Module Based on GhostConv (GSPPCSPC)

The SPPNet (spatial pyramid pooling network) [32] was proposed by the research team led by Kaiming He in 2015. This module effectively avoids information loss or distortion caused by drop or warp processing and obtains fixed-length output vectors through pooling operations, allowing the network to retain rich spatial information and to adapt to different scales of object-detection and image-classification tasks. This improves the flexibility and generalization ability of the model, improves the speed of generating candidate boxes, and reduces the computational costs.
The SPPCSPC module in YOLOv7 builds upon the SPPNet module and incorporates ideas from the CSP module. It consists of two branches. In the first branch, the SPP structure is applied to obtain four different scales of receptive fields, which helps to distinguish objects of different scales within the module. In the second branch, the module performs ordinary convolution processing and is merged with the first branch to form a structure similar to the CSP module, improving the module’s performance. However, compared to the SPP module of YOLOv5, the SPPCSPC module involves many more convolution operations, which not only increases the model’s parameters but also raises the computational cost on the terminal device. Therefore, we propose a new receptive-field amplification module named GSPPCSPC, which combines the GhostConv and Depthwise (DW) convolutional networks with the SPPCSPC module.
GhostNet [33] is a lightweight neural network architecture designed specifically for mobile devices and embedded systems. It was proposed by a research team at Huawei’s Noah’s Ark Lab in 2020. The key idea of GhostNet is to utilize an efficient feature reuse strategy, significantly reducing computational resource consumption, which allows the model to maintain high accuracy while significantly reducing model’s complexity. GhostNet is particularly suitable for scenes such as mobile devices with limited computing resources or that require real-time processing. The DW (Depthwise) convolutional network consists of depthwise convolution and pointwise convolution. This network effectively separates the spatial filtering and cross-channel information mixing steps in standard convolution, thereby significantly reducing the computational complexity and model size while maintaining the model’s recognition performance, which is conducive to the model’s miniaturization and accelerated operation.
A diagram of the SPPCSPC module is shown in Figure 3, while the GSPPCSPC module is shown in Figure 4. Firstly, this module replaces the original 3 × 3 convolution with GhostConv convolution to reduce computational resource consumption. Secondly, a DW convolution module is introduced before the branches of the 5 × 5 pooling layer for spatial feature extraction and cross-channel information fusion, enhancing the feature-representation capability of the module. The GSPPCSPC module reduces the parameters and computational power requirements of the original module, and reduces the spatial feature loss, further improving the model’s performance.
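Figure 4 conveys the layout; the following is a rough PyTorch sketch of the idea, in which the module structure, channel widths, and pooling sizes are illustrative assumptions rather than the authors' released code. GhostConv produces half of its output channels with a cheap depthwise convolution, a depthwise convolution precedes the pooling branch, and the CSP-style two-branch structure of SPPCSPC is retained:

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Standard Conv-BN-SiLU block."""
    def __init__(self, c1, c2, k=1, s=1, g=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class GhostConv(nn.Module):
    """GhostConv: half the channels from a normal conv, half from a cheap depthwise conv."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        c_ = c2 // 2
        self.primary = Conv(c1, c_, k, s)
        self.cheap = Conv(c_, c_, 5, 1, g=c_)   # depthwise "ghost" features
    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class GSPPCSPC(nn.Module):
    """Sketch of an SPPCSPC variant with GhostConv and a depthwise conv before pooling."""
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c2 // 2
        self.cv1 = Conv(c1, c_, 1)
        self.dw = Conv(c_, c_, 3, g=c_)                  # depthwise spatial mixing
        self.gc1 = GhostConv(c_, c_, 3)                  # replaces an ordinary 3x3 conv
        self.pools = nn.ModuleList(nn.MaxPool2d(x, 1, x // 2) for x in k)
        self.gc2 = GhostConv(c_ * (len(k) + 1), c_, 1)
        self.cv2 = Conv(c1, c_, 1)                       # CSP shortcut branch
        self.out = Conv(2 * c_, c2, 1)
    def forward(self, x):
        y1 = self.gc1(self.dw(self.cv1(x)))
        y1 = self.gc2(torch.cat([y1] + [m(y1) for m in self.pools], dim=1))
        y2 = self.cv2(x)
        return self.out(torch.cat([y1, y2], dim=1))
```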

3.3. P2 Small-Object Feature-Fusion Network

The neck network is used for feature fusion, and the YOLOv7 network has three detection layers. For input images with a size of 640 × 640, the minimum detectable object size is 8 × 8 pixels. During ship navigation, distant ships and other small obstacles tend to have small sizes, resulting in a higher proportion of small objects. In addition, some objects appear against partially camouflaging backgrounds and occupy a low proportion of the image, which leads to low detection accuracy and a high false-alarm rate for YOLOv7 on such object-detection tasks. Therefore, we introduced a P2 layer with a scale of 160 × 160 into the feature-fusion network to detect small objects down to a scale of 4 × 4 pixels. In order not to significantly increase the parameters, no RepConv convolutional layer was added to this branch. The overall network structure of the improved model is shown in Figure 5.
By incorporating the P2 layer, the model can effectively detect small objects without significantly increasing the parameters. This enhancement improved the overall performance of the model in detecting small objects and objects against partially camouflaged backgrounds.
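As an illustration of what the added fusion step looks like, the following is a simplified sketch under our own naming; the actual P2 branch in Figure 5 reuses YOLOv7's ELAN and concatenation blocks. The stride-4 backbone feature is merged with the upsampled P3 neck feature and passed to a plain prediction convolution instead of a RepConv layer:

```python
import torch
import torch.nn as nn

class P2FusionBranch(nn.Module):
    """Simplified sketch of the extra P2 branch: fuse the backbone's stride-4 feature
    map (160 x 160 for a 640 x 640 input) with the upsampled P3 neck feature, then
    predict with a plain 1 x 1 convolution so that few extra parameters are added."""
    def __init__(self, c_backbone_p2, c_neck_p3, c_fused, c_out):
        super().__init__()
        self.reduce = nn.Conv2d(c_neck_p3, c_backbone_p2, 1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * c_backbone_p2, c_fused, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_fused),
            nn.SiLU(),
        )
        self.pred = nn.Conv2d(c_fused, c_out, 1)  # c_out = anchors x (4 box + 1 obj + classes)

    def forward(self, backbone_p2, neck_p3):
        x = torch.cat([backbone_p2, self.up(self.reduce(neck_p3))], dim=1)
        return self.pred(self.fuse(x))

# For a 640 x 640 input, the P3/P4/P5 grids are 80/40/20 cells; the added P2 grid is
# 160 x 160, so each cell covers 4 x 4 pixels and much smaller objects remain resolvable.
```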

3.4. Model Pruning

The YOLOv7 is composed of numerous neural network connections, but not all connections contribute equally to the model’s performance. By pruning the connections with smaller contributions to the results, the model can be compressed. Model pruning is an effective technique for model compression, allowing the network to be streamlined by identifying and removing redundant or less influential parameters or connection structures without significantly impacting or having a minimal impact on the model’s performance. In this study, the LAMP (layer-wise adaptive magnitude pruning) method [34] was selected to prune the YOLOv7 network model.
Research on network pruning has shown that amplitude-based network pruning, after carefully selecting layer-wise sparsity, can strike a balance between network performance and lightweight design, but there is no consensus on the selection method. The LAMP method proposes a new global pruning importance score that calculates the importance scores for weights to guide the network pruning. This method does not require any hyperparameter adjustment or heavy calculation, and it is superior to manual heuristics or extensive hyperparameter search methods in hierarchical sparsity selection. By applying the LAMP method to prune the YOLOv7 network model, we aimed to achieve a more compact and efficient model while maintaining the performance or minimizing the impact on the model’s performance.
As shown in Figure 6, the vectors $W_u$ and $W_v$ represent the weight tensors of different layers of the model, flattened to one dimension. The LAMP score is calculated by normalizing a squared weight by a sum of squared weights. Specifically, assume the model contains weight tensors denoted $W_1, \ldots, W_u$; each tensor is associated with a fully connected or convolutional layer of the model and can be two-dimensional or four-dimensional. In order to calculate the LAMP score uniformly for fully connected and convolutional layers, all weight tensors are flattened into one-dimensional vectors before calculation. During flattening, the entries are sorted in ascending order of weight magnitude, that is, $|W_u[u_a]| \leq |W_u[u_b]|$ holds whenever $u_a < u_b$, where $W_u[u_a]$ denotes the entry of $W_u$ mapped to index $u_a$.
At this point, the LAMP score of the $u_a$-th index of tensor $W_u$ is calculated as:
$\mathrm{score}(u_a; W_u) \equiv \dfrac{W_u[u_a]^2}{\sum_{u \geq u_a} W_u[u]^2}$  (9)
Once the LAMP scores of the weight vectors are calculated, it is possible to globally prune the connections with the smallest LAMP scores until the specified threshold condition is met. From the LAMP score calculation formula, it can be observed that there is at least one connection with a LAMP score of 1 in each layer. Therefore, at least one connection will be retained in each layer, which distinguishes it from the approach of solely pruning connections based on the weight magnitudes.
Compared to other pruning methods, the method based on the LAMP score has the following advantages. Firstly, it does not require any hyperparameters to be tuned, as only the pruning rate needs to be set. Secondly, the LAMP score of each connection can be obtained through elementary tensor operations, with minimal computational costs.
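The following is a minimal PyTorch sketch of this scoring and global thresholding scheme, written as an unstructured-pruning illustration under our own naming rather than the exact tooling used in this paper:

```python
import torch

def lamp_prune_masks(weight_tensors, prune_ratio):
    """Sketch of LAMP-style global pruning: score each weight by its square normalized
    by the sum of squares of all weights in the same layer that are at least as large,
    then remove the connections with the globally lowest scores."""
    scores = []
    for w in weight_tensors:
        flat_sq = w.detach().flatten() ** 2
        sorted_sq, order = torch.sort(flat_sq)                  # ascending by magnitude
        # tail_sums[i] = sum of squares of the i-th smallest weight and all larger ones
        tail_sums = torch.flip(torch.cumsum(torch.flip(sorted_sq, [0]), 0), [0])
        layer_scores = torch.empty_like(flat_sq)
        layer_scores[order] = sorted_sq / tail_sums             # score == 1 for the largest weight
        scores.append(layer_scores)

    all_scores = torch.cat(scores)
    k = int(prune_ratio * all_scores.numel())
    threshold = torch.kthvalue(all_scores, k).values if k > 0 else all_scores.min() - 1
    # Keep every weight whose LAMP score exceeds the global threshold.
    return [(s > threshold).reshape(w.shape) for s, w in zip(scores, weight_tensors)]

# Usage sketch: collect conv/linear weights from a model and derive binary masks.
# masks = lamp_prune_masks([m.weight for m in model.modules()
#                           if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))], 0.5)
```

Because the largest surviving weight in each layer always receives a score of 1, at least one connection per layer survives any global threshold, matching the property noted above.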

4. Experiments

4.1. Experimental Indicators

A total of four experimental indicators were used in this study, namely mAP@0.5, FPS, parameters, and GFLOPS. The indicator mAP@0.5 represents the mean average precision over all class labels at an intersection-over-union (IoU) threshold of 0.5, measuring the accuracy of the model’s object-detection performance. FPS denotes the number of frames processed per second by the model during inference. The parameters refer to the total number of parameters in the model. GFLOPS denotes the number of giga floating-point operations required for one inference pass, providing an estimate of the model’s computational complexity or workload. These indicators collectively assess the model’s accuracy, speed, size, and computational cost.
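As a brief illustration of how two of these indicators are typically obtained for a PyTorch model (a hedged sketch: `model` stands for any detection network loaded beforehand, timings depend on hardware and input size, and GFLOPS is usually read from a separate profiling tool):

```python
import time
import torch

def count_parameters(model):
    """Total number of learnable parameters (often reported in millions or MB)."""
    return sum(p.numel() for p in model.parameters())

@torch.no_grad()
def measure_fps(model, img_size=640, n_runs=100, device="cuda"):
    """Average frames per second for a single-image forward pass."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, img_size, img_size, device=device)
    for _ in range(10):                      # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_runs):
        model(x)
    torch.cuda.synchronize()
    return n_runs / (time.time() - start)
```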

4.2. Dataset

The dataset used in this study was the WSODD (Water Surface Object-Detection Dataset). The dataset consists of 7467 images with 14 object categories and 21,911 object instances. Prior to model training, the data were split into a training set and a test set at an 8:2 ratio. Some sample images from the dataset are shown in Figure 7.
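A minimal sketch of the 8:2 split performed before training is given below; the file layout and paths are hypothetical, and the WSODD release provides its own annotation format:

```python
import random
from pathlib import Path

def split_dataset(image_dir, train_ratio=0.8, seed=0):
    """Randomly split image files into training and test lists at an 8:2 ratio."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_ratio)
    return images[:n_train], images[n_train:]

# Example (hypothetical path): train_imgs, test_imgs = split_dataset("WSODD/images")
```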

4.3. Results and Discussion

The hardware device used in the experiments was a Tesla V100 graphics card, the operating system was Ubuntu 22.04, the deep-learning framework was PyTorch 1.10.1, and the CUDA version was 10.2. In order to verify the improvement brought by each optimization strategy, we conducted experiments on the dataset with input images of 640 × 640 size; the results are shown in Table 2.
The experimental results demonstrate that, compared to the original YOLOv7 model, using w-CIoU as the position loss function improved the accuracy of the model and greatly improved its detection speed. The GSPPCSPC module not only reduced the parameters and GFLOPS, but also improved the accuracy and detection speed of the model. Although the P2 feature-fusion layer increased the parameters and GFLOPS, it significantly improved the accuracy and detection speed of the model. Finally, after applying the three optimization strategies and the LAMP-based network pruning method, compared to the original YOLOv7 model, the LWS-YOLOv7 achieved a 3.1% improvement in mAP, with a 68% increase in FPS. Additionally, it reduced the parameters to 61.2% and GFLOPS to 71.2% of the original values.
To evaluate the comprehensive performance of the proposed model on images of different sizes, comparisons were made among YOLOv5, YOLOv7, and the model proposed in this paper on the same dataset. The results are presented in Table 3. The experiments demonstrated that LWS-YOLOv7 outperformed YOLOv5 and YOLOv7 in terms of the mAP and detection speed on images of different sizes. Additionally, it significantly reduced the parameters and GFLOPS, to about 43.8% and 38.2% of those of YOLOv5, respectively, making it more suitable for deployment on shipborne devices.
Furthermore, Figure 8 shows the detection results of the three models under different weather conditions. The first image was captured under cloudy conditions; in this scenario, YOLOv5 missed one boat and YOLOv7 missed one ball, while LWS-YOLOv7 successfully detected all objects. The second image was captured under sunny conditions; in this scenario, YOLOv5 detected one platform and one boat, and YOLOv7 detected only one boat, whereas LWS-YOLOv7 successfully detected one platform, one boat, and one buoy.
Finally, Figure 9 shows the detection results of the three models on objects in videos captured under foggy conditions. For the first scenario, YOLOv5 detected two ships and four boats, and YOLOv7 detected three ships and three boats, while LWS-YOLOv7 successfully detected all six objects mentioned above and even detected the boat at the farthest distance. For the second scenario, YOLOv5 detected four ships and YOLOv7 detected one boat and three ships, while LWS-YOLOv7 detected five ships and one boat.
The experimental results demonstrate that the LWS-YOLOv7 proposed in this paper has better water-surface object-detection performance under various meteorological conditions, particularly in inland ship navigation scenarios with numerous small objects, and can effectively solve the problem of the missed detection of small objects. Moreover, its model parameters and GFLOPS are lower, making it more suitable for ship-based detection scenarios.

5. Conclusions

In inland waterways, there is a high density of ships and various objects, with a predominance of small objects, which can easily affect navigation safety. Using computer vision methods to detect objects in visible light images can help ship operators better understand the navigation environment and improve navigation safety.
This paper proposes a lightweight water-surface object-detection model named LWS-YOLOv7 to improve the obstacle-detection capability of inland ships and enhance navigation safety. Firstly, the localization loss function of the model was optimized by combining normalized Wasserstein distance with CIoU, producing the w-CIoU function, which reduces the sensitivity to position deviations of small objects and improves the model’s recall rate. Secondly, the GSPPCSPC module was introduced to reduce the parameters and enlarge the receptive field, improving the multi-scale object-detection capability of the model. Thirdly, a feature-fusion layer P2 with a scale of 160 × 160 was added to detect objects larger than 4 × 4 pixels, thereby improving the accuracy and recall rate for small objects. Finally, the LAMP-based method was used to prune the model, significantly reducing its parameters and GFLOPS. The experimental results demonstrate that, compared to the original YOLOv7 model, LWS-YOLOv7 achieved a 3.1% improvement in mAP and a 68% increase in FPS, while reducing the parameters to 61.2% and the GFLOPS to 71.2% of the original values. Therefore, the LWS-YOLOv7 proposed in this paper has excellent detection performance for inland navigation scenarios with a high proportion of small objects. At the same time, it is suitable for various meteorological conditions and can be deployed on ships to perform real-time object-detection tasks.
Future research can focus on the following aspects: (1) continuing to collect water-surface object datasets under various scenarios to produce higher-quality datasets, and (2) combining knowledge-distillation technology to further simplify the network parameters and improve model performance.

Author Contributions

Conceptualization, Z.L.; methodology, Z.L., H.R., X.Y., D.W. and J.S.; software, Z.L.; validation, Z.L., H.R. and D.W.; data curation, Z.L.; writing—original draft preparation, Z.L. and D.W.; writing—review and editing, H.R., X.Y. and J.S.; visualization, Z.L.; supervision, H.R., X.Y., D.W. and J.S.; funding acquisition, H.R. and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Science Foundation of China (Grant No. 52071312), the Key Science and Technology Projects in Transportation Industry (Grant No. 2022-ZD3-035), the Applied Basic Research Program Project of Liaoning Province (Grant No. 2023JH2/101300144), the Guangxi Key Research and Development Plan (Grant No. GUIKE AB22080106), and the Dalian Science and Technology Innovation Fund Project (Grant No. 2022JJ12GX035).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed or generated in this study are available from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gan, L.; Yan, Z.; Zhang, L.; Liu, K.; Zheng, Y.; Zhou, C.; Shu, Y. Ship path planning based on safety potential field in inland rivers. Ocean Eng. 2022, 260, 111928. [Google Scholar] [CrossRef]
  2. Yang, S.; Wei, S.; Wei, L.; Shuai, W.; Yang, Z. Review of research on information fusion of shipborne radar and AIS. Ship Sci. Technol. 2021, 43, 167–171. [Google Scholar]
  3. Im, I.; Shin, D.; Jeong, J. Components for smart autonomous ship architecture based on intelligent information technology. Procedia Comput. Sci. 2018, 134, 91–98. [Google Scholar] [CrossRef]
  4. Ning, Y.; Zhao, L.; Zhang, C.; Yuan, Z. STD-Yolov5: A ship-type detection model based on improved Yolov5. Ships Offshore Struct. 2022, 19, 66–75. [Google Scholar] [CrossRef]
  5. Wang, T.; Zhao, Q.; Yang, C. Visual navigation and docking for a planar type AUV docking and charging system. Ocean Eng. 2021, 224, 108744. [Google Scholar] [CrossRef]
  6. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 84–90. [Google Scholar]
  7. Shao, Z.; Wu, W.; Wang, Z.; Du, W.; Li, C. Seaships: A large-scale precisely annotated dataset for ship detection. IEEE Trans. Multimed. 2018, 20, 2593–2604. [Google Scholar] [CrossRef]
  8. Moosbauer, S.; König, D.; Jäkel, J.; Teutsch, M. A benchmark for deep learning based object detection in maritime environments. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 916–925. [Google Scholar]
  9. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  10. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  11. Shin, H.-C.; Lee, K.-I.; Lee, C.-E. Data augmentation method of object detection for deep learning in maritime image. In Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing (Big Comp), Busan, Republic of Korea, 19–22 February 2020; pp. 463–466. [Google Scholar]
  12. Zhou, Z.; Sun, J.; Yu, J.; Liu, K.; Duan, J.; Chen, L.; Chen, C.L.P. An image-based benchmark dataset and a novel object detector for water surface object detection. Front. Neurorobot. 2021, 15, 723336. [Google Scholar] [CrossRef]
  13. Cheng, Y.; Zhu, J.; Jiang, M.; Fu, J.; Pang, C.; Wang, P.; Kris, S.; Olawale, O.; Yimin, L.; Dianbo, L.; et al. FloW: A dataset and benchmark for floating waste detection in inland waters. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10953–10962. [Google Scholar]
  14. Zhang, H.; Wang, Y.; Dayoub, F.; Sanderhauf, N. VarifocalNet: An IoU-aware dense object detector. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Snowmass, CO, USA, 20–25 June 2021; pp. 8510–8519. [Google Scholar]
  15. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic seg-mentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  16. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  17. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  18. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. Scaled-YOLOv4: Scaling cross stage partial network. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13024–13033. [Google Scholar]
  19. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787. [Google Scholar]
  20. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the 14th European Conference, (Computer Vision-ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
  21. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. arXiv 2016, arXiv:1506.02640v5. [Google Scholar]
  22. Du, Y.; Sun, S.; Qiu, S.; Li, S.; Pan, M.; Chen, C.-H. Intelligent recognition system based on contour accentuation for navigation marks. Wirel. Commun. Mob. Comput. 2021, 2021, 6631074. [Google Scholar] [CrossRef]
  23. Chen, S.; Zhan, R.; Wang, W.; Zhang, J. Learning slimming SAR ship object detector through network pruning and knowledge distillation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1267–1282. [Google Scholar] [CrossRef]
  24. Gao, Z.; Zhang, Y.; Wang, S. Lightweight Small Ship Detection Algorithm Combined with Infrared Characteristic Analysis for Autonomous Navigation. J. Mar. Sci. Eng. 2023, 11, 1114. [Google Scholar] [CrossRef]
  25. Zou, Y. Research on Ship Recognition and Tracking Technology Based on Convolutional Neural Network. Master’s Thesis, Dalian Maritime University, Dalian, China, 2020. [Google Scholar]
  26. Han, X.; Zhao, L.; Ning, Y.; Hu, J. ShipYolo: An enhanced model for ship detection. J. Adv. Transp. 2021, 2021, 1060182. [Google Scholar] [CrossRef]
  27. Yang, Y.; Ding, K.; Chen, Z. Ship classification based on convolutional neural networks. Ships Offshore Struct. 2021, 17, 2715–2721. [Google Scholar] [CrossRef]
  28. Yu, J.; Zheng, H.; Xie, L.; Zhang, L.; Yu, M.; Han, J. Enhanced YOLOv7 integrated with small target enhancement for rapid detection of objects on water surfaces. Front. Neurorobot. 2023, 17, 1315251. [Google Scholar] [CrossRef]
  29. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 20–22 June 2023; pp. 7464–7475. [Google Scholar]
  30. Qi, L.; Gao, J. Small Object Detection Based on Improved YOLOv7. Comput. Eng. 2023, 49, 41–48. [Google Scholar]
  31. Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.-S. Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2022, 190, 79–93. [Google Scholar] [CrossRef]
  32. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
  33. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
  34. Lee, J.; Park, S.; Mo, S.; Ahn, S.; Shin, J. Layer-adaptive sparsity for the magnitude-based pruning. arXiv 2020, arXiv:2010.07611. [Google Scholar]
Figure 1. YOLOv7 model’s structure diagram.
Figure 2. IoU sensitivity of objects at different scales. (Note that each grid cell denotes a pixel, box A denotes the ground-truth bounding box, and boxes B and C denote predicted bounding boxes with 1-pixel and 4-pixel diagonal deviations, respectively.)
Figure 3. Structure diagram of SPPCSPC module.
Figure 4. GSPPCSPC module structure diagram.
Figure 5. Improved network structure diagram.
Figure 6. LAMP score calculation process diagram.
Figure 7. Dataset display.
Figure 8. Detection results of objects in images using different models. (The labels with different colors in the figure represent different object classes; the colors are assigned randomly during the model’s detection process.)
Figure 9. Detection results of objects in videos using different models. (The labels with different colors in the figure represent different object classes; the colors are assigned randomly during the model’s detection process.)
Table 1. Comparison of effects of different loss functions.

Experiment | Loss Function | Accuracy (%) | Recall Rate (%) | mAP (%)
1          | CIoU          | 81.3         | 83.3            | 86.1
2          | w-CIoU        | 85.8         | 84.4            | 87.5
Table 2. The impact of different optimization strategies on the model.

Model                                                | mAP (%) | FPS (f/s) | Params (MB) | GFLOPS
YOLOv7                                               | 86.1    | 50        | 34.85       | 103.4
YOLOv7 + w-CIoU                                      | 87.5    | 100       | 34.85       | 103.4
YOLOv7 + GSPPCSPC                                    | 87.0    | 99        | 31.88       | 100.9
YOLOv7 + P2                                          | 88.4    | 94        | 43.73       | 105.6
YOLOv7 + w-CIoU + GSPPCSPC + P2                      | 88.8    | 54        | 40.75       | 103.1
YOLOv7 + w-CIoU + GSPPCSPC + P2 (LWS-YOLOv7, pruned) | 89.2    | 84        | 21.34       | 73.6
Table 3. Detection performance of different models on images of different sizes.

Model             | Image Size | mAP (%) | FPS (f/s) | Params (MB) | GFLOPS
YOLOv5            | 640 × 640  | 84.2    | 43.48     | 48.76       | 192.4
YOLOv7            | 640 × 640  | 86.1    | 50.01     | 34.85       | 103.4
LWS-YOLOv7 (ours) | 640 × 640  | 89.2    | 84.75     | 21.34       | 73.6
YOLOv5            | 512 × 512  | 84.1    | 42.86     | 48.76       | 192.4
YOLOv7            | 512 × 512  | 87.2    | 50.71     | 34.85       | 103.4
LWS-YOLOv7 (ours) | 512 × 512  | 88.2    | 85.69     | 21.34       | 73.6
YOLOv5            | 416 × 416  | 82.6    | 43.48     | 48.76       | 192.4
YOLOv7            | 416 × 416  | 86.1    | 50.68     | 34.85       | 103.4
LWS-YOLOv7 (ours) | 416 × 416  | 87.4    | 88.5      | 21.34       | 73.6
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
