Article

Developing a Dead Fish Recognition Model Based on an Improved YOLOv5s Model

1 College of Electrical and Mechanical Engineering, Hunan Agricultural University, Changsha 410125, China
2 Intelligent Agricultural Machinery Equipment Hunan Key Laboratory, Changsha 410125, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3463; https://doi.org/10.3390/app15073463
Submission received: 8 February 2025 / Revised: 13 March 2025 / Accepted: 19 March 2025 / Published: 21 March 2025

Abstract

To address the low detection accuracy and difficulty of dead fish identification caused by water surface reflections, low contrast between targets and the environment, uncertain distances between the detection source and targets, and blurring from water mist, this paper proposes a dead fish recognition model named YOLO-DWM based on an improved YOLOv5s model. To address the weak feature extraction capability of existing convolutional modules, a multi-scale convolutional module (DWMConv) based on depthwise separable convolution is designed, enhancing detection performance. To further improve accuracy, the EMA mechanism is embedded in the C3 module, enhancing its feature processing capability. Additionally, to reduce the model's parameters and FLOPs, a lightweight module called C3-Light is introduced, which replaces the Conv convolution in the C3 module with DWConv. A total of 670 images of dead fish in a fish farm were collected prior to the experiment and used to train the model. The experimental results demonstrate that, compared with Faster RCNN, SSD, YOLOv3-tiny, YOLOv5s, YOLOv6n, and YOLOv8n, the YOLO-DWM model exhibits superior performance, with mAP increases of 8.13%, 12.3%, 5.9%, 4.5%, 4.5%, and 2.8% and F1 score increases of 17.5%, 7.5%, 2.7%, 2.6%, 3.0%, and 0.4%, respectively.

1. Introduction

Intensive aquaculture is an important branch of fish farming. Influenced by factors such as temperature, water quality, and stocking density, the intensive farming of grass carp (‘dead fish’ in this paper refers specifically to grass carp) is associated with a high mortality rate due to disease. To identify dead grass carp more quickly and accurately and reduce labor costs, the use of computer vision technology to achieve high-quality aquaculture has become a new research focus.
Recently, object detection has developed continuously in the field of computer vision and has been widely applied in areas such as autonomous navigation [1], facial recognition [2,3], pedestrian detection [4,5], aquaculture [6,7], and fruit detection [8,9]. In the field of dead fish identification, Yu et al. [10] proposed an improved algorithm based on YOLOv4-v1 for identifying diseased and dead golden pomfret, achieving an average precision of 98.31%. This algorithm performs well in flat and complex environments, but its detection speed and miss rate could be improved compared with the more advanced YOLOv5 algorithm. Yang et al. [11] proposed a dead fish detection method called FFA-SSD (SSD with feature fusion and attention), based on single-shot multi-box detection, achieving a detection accuracy of 93.5%. FFA-SSD has better feature extraction capability and fewer parameters than VGG16 and ResNet50, but its superiority and usability are questionable when compared with mainstream YOLO algorithms. Zhao et al. [12] combined deformable convolution and the SE attention mechanism to improve the YOLOv4 model, which improved the detection accuracy of dead fish. This algorithm is lightweight, fast, and deployable and is suitable for accurate real-time detection of dead fish, but it is limited by its dataset, which can lead to false detections when fish swim or turn normally. Zhang et al. [13] proposed a dead fry detection model based on E-YOLOv4 and D-YOLOv4 and established a dead fish detection platform with practical application value. This method improves the performance of the YOLOv4 model, reduces the model size, withstands high levels of water surface interference, and uses additional artificial light to reduce the effects of water surface reflections. Zhang et al. [14] improved the YOLOv8 model to achieve real-time detection of dead fish in aquaculture using drones combined with artificial intelligence, with excellent results. The improvements enhance the identification of small targets, and drone surveillance is an efficient and accurate method, but drones cannot fly missions in harsh environments, and rain can blur images beyond accurate recognition. Fu et al. [15], addressing the problems of feature blurring, small targets, and fish movement in dead fish detection, proposed a dead fish detection transformer model, DF-DETR, to achieve high-precision real-time dead fish detection; however, adaptation to different aquaculture environments and lightweighting remain to be optimized in this model. Zheng et al. [16] proposed a dead fish detection model named DD-IYOLOv8; by adding a detection head and a mixed attention mechanism, the model's perception of dead fish and its accuracy were improved, though the detection of dense objects remains follow-up work for this model. Yin et al. [17] proposed a detection head integrating multi-scale features (DCNv4-Dyhead) and an EMA-SlideLoss loss function, combined with YOLOv9, to detect and count abnormal fish behavior in complex aquatic environments; however, the experimental datasets were derived from laboratory simulations, not real, complex aquaculture environments. Wang et al. [18] proposed a method capable of detecting and tracking fish exhibiting rollover behavior.
This was achieved by using YOLOv5s in combination with SiamRPN++ to detect small targets in complex scenes. However, the method has room for improvement, and multi-target tracking should be studied subsequently; moreover, the dataset contains only one kind of abnormal behavior and needs to be expanded. Zhao et al. [19] used the CLAHE image enhancement algorithm and introduced PConv into the YOLOv7 backbone network to improve the accuracy of the network model for dead fish detection. Wang et al. [20] proposed an enhanced YOLOv5s model to address the challenges posed by occlusion, motion blur, and the detection of small targets. The model incorporates an attention mechanism, a BiFPN feature fusion approach, and the lightweight upsampling operator Carafe; however, this experiment was also hindered by an inadequate dataset. Wang et al. [21] proposed an improved YOLOX-S model for the real-time detection of abnormal behavior in the yellow croaker, improving model performance by adding coordinate attention. However, this method increases the parameters and computation of the model, which must be reduced further to meet real-time requirements. In the area of dead fish identification on the water surface, owing to the requirements of dataset collection and detection tasks, research often involves small object detection. Feature fusion is one strategy for enhancing small object detection through improved feature representation. Cheng et al. [22] deeply fused multi-scale millimeter-wave radar point cloud data and visual information for unmanned surface vehicles, using the characteristics of radar data to improve the accuracy and robustness of surface target detection. Sun et al. [23] proposed a multi-feature fusion strategy using multi-scale and dilated convolution for fabric surface defects, incorporating a residual structure to capture multi-scale features and improve the model's detection rate. Peng et al. [24] proposed a local–global feature fusion method for small target detection with drones, aimed at improving the efficiency of cross-layer feature fusion and effectively extracting the shallow features necessary for small object detection. Xie et al. [25] proposed the TMFD model for fish detection in a variety of scenarios; its feature extraction capability was strengthened by adding the MCB module, which enhances the features of fish with different scales and postures, and the DBMFormer binary fusion attention mechanism. In subsequent work, the model still needs to be made more lightweight.
Currently, most detection algorithms for dead fish or floating debris on the water surface focus on small object detection and on enhancing attention to target features, and feature fusion is mostly approached at the network structure or algorithmic level. However, a single-size convolution kernel adapts poorly to dead fish at varying distances and often overlooks feature details at different scales, which significantly limits model performance, and there are very few reliable, targeted feature fusion modules. This paper proposes a multi-scale convolution module, DWMConv, based on depthwise separable convolution. Additionally, the EMA mechanism is embedded in C3. Finally, while maintaining accuracy, the C3-Light module is used to reduce the parameter count and computation. Various comparative experiments verify that this method can effectively improve the detection accuracy of dead fish.

2. Methods and Models

YOLO is a well-established object detection algorithm, and the YOLOv5 model was released by Ultralytics in 2020. The YOLOv5 model comprises several components: the input layer, backbone, neck, and output. Due to its smaller size, YOLOv5s is widely used in scenarios that require lightweight models. Based on YOLOv5s, this paper proposes the following improvements: (1) a new multi-scale convolution module to enhance the model's ability to detect target features; (2) the embedding of the EMA mechanism in C3, which can effectively enhance the perception of dead fish features; and (3) on this basis, to offset the increase in parameters and computation brought by the improvements, the C3-Light module, which lowers the model's parameters and computation while maintaining accuracy. The improved YOLOv5s model, named YOLO-DWM, is shown in Figure 1. DWMConv replaces the original Conv modules, the C3-EMA module is added to the neck, and the C3-Light module is introduced before the SPPF module in the backbone.
The model is evaluated using common metrics, including precision, recall, mAP0.5 (mean average precision at an IoU threshold of 0.5), Params (the number of model parameters), and FLOPs (floating-point operations). The formulas are as follows:
$$P = \frac{TP}{TP + FP} \times 100\%$$
$$R = \frac{TP}{TP + FN} \times 100\%$$
$$mAP = \frac{1}{N} \sum_{i=1}^{N} \int_0^1 P(R)\,dR$$
where $TP$ is the number of samples correctly identified by the model; $FP$ is the number of samples incorrectly identified by the model; $FN$ is the number of target samples the model failed to identify; and $N$ is the number of object classes.
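As an illustration, the following minimal Python sketch computes these metrics from raw detection counts. The function name and example counts are hypothetical; the F1 score reported in Tables 1 and 2 is included as the standard harmonic mean of precision and recall.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 (in %) from raw detection counts."""
    p = tp / (tp + fp) if tp + fp else 0.0       # P = TP / (TP + FP)
    r = tp / (tp + fn) if tp + fn else 0.0       # R = TP / (TP + FN)
    f1 = 2 * p * r / (p + r) if p + r else 0.0   # harmonic mean of P and R
    return {"P": 100 * p, "R": 100 * r, "F1": 100 * f1}

# Hypothetical counts: 94 correct detections, 6 false alarms, 22 missed fish.
print(detection_metrics(94, 6, 22))  # P = 94.0, R ≈ 81.0, F1 ≈ 87.0
```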

3. YOLO-DWM Model Algorithm

3.1. DWMConv Convolution Module

The YOLOv5s network performs substantial downsampling, which causes many feature details of the targets to be gradually lost. Dead fish on the water surface are strongly affected by lighting, and the targets often lie at some distance from the detection source, yet the original Conv module has a weak ability to extract features from targets of different scales. The goal is therefore to improve detection performance while minimizing the increase in the model's parameter count. To this end, a multi-scale convolution module based on depthwise separable convolution (DWMConv) is proposed, with its structure shown in Figure 2. First, the input features undergo initial feature extraction through a 3 × 3 convolution (Conv). They are then processed by three parallel convolutional branches for additional feature extraction: a 3 × 3 depthwise convolution (DWConv), a 3 × 3 convolution (Conv), and a 5 × 5 depthwise convolution (DWConv). The use of DWConv retains the spatial feature extraction capability of conventional convolution while reducing redundant computation. DWMConv also employs a residual structure. Finally, these features are fused, and after passing through the BN layer and SiLU activation function, a standard 1 × 1 convolution is applied as a pointwise convolution to adjust the number of output channels.
The specific calculation process for DWMConv is as follows:
$$Y' = Y_{in} * f_{3 \times 3}$$
$$Y'' = Y' + Y' * \hat{f}_{3 \times 3} + Y' * f_{3 \times 3} + Y' * \hat{f}_{5 \times 5}$$
$$Y_{out} = \mathrm{SiLU}(\gamma Y'' + \beta) * f_{1 \times 1}$$
where $Y_{in}$ is the input feature map, $Y'$ is the feature map after the initial convolution, $Y''$ is the feature map after the second processing stage, $Y_{out}$ is the final output feature map, $f$ is a standard convolution kernel, $\hat{f}$ is a depthwise convolution kernel, and $\gamma$ and $\beta$ are the learnable parameters of the BN layer.
DWMConv is a multi-scale depthwise separable convolution, where the 3 × 3 convolutional kernel is suitable for capturing fine details, while the 5 × 5 convolutional kernel is better at capturing broader contextual information. This multi-scale convolution can capture various patterns of the input features, enhancing the model’s feature representation capability. Additionally, the residual structure helps to address the vanishing gradient problem in deep networks and allows information to flow more easily through the network.
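To make the data flow concrete, below is a minimal PyTorch sketch of a DWMConv-style block following Figure 2 and the equations above. The exact channel widths, strides, and normalization placement of the authors' implementation are not specified in the text, so those details are assumptions.

```python
import torch
import torch.nn as nn

class DWMConv(nn.Module):
    """Sketch of a DWMConv-style block (Figure 2): a 3x3 stem, three parallel
    multi-scale branches plus a residual path, BN + SiLU, and a 1x1 pointwise
    convolution that sets the output channel count."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.stem = nn.Conv2d(c_in, c_in, 3, padding=1, bias=False)               # Y' = Y_in * f_3x3
        self.dw3 = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False)   # 3x3 DWConv branch
        self.conv3 = nn.Conv2d(c_in, c_in, 3, padding=1, bias=False)              # 3x3 Conv branch
        self.dw5 = nn.Conv2d(c_in, c_in, 5, padding=2, groups=c_in, bias=False)   # 5x5 DWConv branch
        self.bn = nn.BatchNorm2d(c_in)
        self.act = nn.SiLU()
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)  # pointwise channel adjustment

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = self.stem(x)
        y2 = y1 + self.dw3(y1) + self.conv3(y1) + self.dw5(y1)  # residual fusion of the branches
        return self.pw(self.act(self.bn(y2)))                    # SiLU(BN(Y'')) * f_1x1

# Shape check: DWMConv(64, 128)(torch.randn(1, 64, 80, 80)).shape == (1, 128, 80, 80)
```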
Depthwise separable convolution is a commonly used convolution operation in neural networks that primarily consists of two parts: depthwise convolution (DWConv) and pointwise convolution. The structure is shown in Figure 3. Depthwise convolution consists of processes 1 and 2, where each channel is convolved with a separate convolutional kernel. This means that each input channel generates a corresponding output channel to capture feature information of the target. Processes 3 and 4 represent pointwise convolution, which convolves all channels at each position, allowing for a linear combination of the feature maps generated by the depthwise convolution.
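For reference, a depthwise separable convolution can be sketched in PyTorch as a depthwise stage (groups equal to the channel count) followed by a 1 × 1 pointwise stage; this is a generic illustration of the pattern in Figure 3, not the paper's exact configuration.

```python
import torch.nn as nn

def depthwise_separable(c_in: int, c_out: int, k: int = 3) -> nn.Sequential:
    """Generic depthwise separable convolution: a per-channel depthwise stage
    (processes 1-2 in Figure 3) followed by a 1x1 pointwise stage that linearly
    combines the resulting channels (processes 3-4)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False),  # depthwise
        nn.Conv2d(c_in, c_out, 1, bias=False),                              # pointwise
    )
```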

3.2. C3-EMA Module

The EMA (efficient multi-scale attention) mechanism enhances the model's ability to process features by reorganizing the channel and batch dimensions and using cross-dimensional interaction to capture pixel-level relationships. The module encodes global information in parallel branches to recalibrate channel weights, enhancing feature expression, and itself has relatively few parameters. The structure of the EMA module is shown in Figure 4 [26].
In the feature grouping stage, EMA divides the given input feature map $X \in \mathbb{R}^{C \times H \times W}$ into $G$ sub-feature groups, denoted $X = [X_0, X_1, \ldots, X_{G-1}]$ with $X_i \in \mathbb{R}^{C/G \times H \times W}$, where $G \ll C$. Each sub-feature focuses on specific semantics, allowing the model to better learn information from different channels, and the learned attention weight descriptors are used to strengthen the feature representation of the region of interest in each sub-feature.
In cross-spatial learning, EMA fuses information across different spatial dimensions and handles both short-range and long-range dependencies, which is advantageous for detecting dense and small targets. Two tensors are introduced in EMA: one is the output of the 1 × 1 branch and the other is the output of the 3 × 3 branch. The global spatial information in the 1 × 1 branch output is then encoded using a 2D global average pooling operation, expressed by the following equation:
$$Z_c = \frac{1}{H \times W} \sum_{j}^{H} \sum_{i}^{W} x_c(i, j)$$
where $x_c$ is the feature map of channel $c$, $i$ is the vertical image coordinate, and $j$ is the horizontal image coordinate.
The C3 module is a lightweight convolutional neural network module consisting of three standard convolution blocks and a Bottleneck module, featuring two branch structures. One branch processes the feature maps through standard convolution blocks, while the other branch processes the feature maps through standard convolution and the Bottleneck module. The outputs of the two branches are then concatenated and passed through a standard convolution for output. The C3 module plays a key role in feature information extraction and processing. The EMA mechanism can automatically adjust the weights of information at different scales, increasing the model’s focus on the features of dead fish. Therefore, embedding the EMA module into the Bottleneck of the C3 module can further enhance the model’s ability to process feature information at different scales. The C3-EMA module is shown in Figure 5.
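A schematic PyTorch sketch of this arrangement is given below. It treats the EMA block itself as a supplied module (see ref. [26] for its internals) and omits the BN/SiLU wrappers that YOLOv5's Conv blocks normally carry, so it only illustrates where EMA sits inside the C3 structure; class and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class BottleneckEMA(nn.Module):
    """YOLOv5-style Bottleneck with an EMA attention block appended;
    the EMA module is passed in as-is (ref. [26] describes its internals)."""
    def __init__(self, c: int, ema: nn.Module):
        super().__init__()
        self.cv1 = nn.Conv2d(c, c, 1, bias=False)
        self.cv2 = nn.Conv2d(c, c, 3, padding=1, bias=False)
        self.ema = ema

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ema(self.cv2(self.cv1(x)))  # residual shortcut

class C3EMA(nn.Module):
    """C3-EMA sketch: one branch through n Bottleneck-EMA blocks, one plain
    branch, concatenated and fused by a final 1x1 convolution."""
    def __init__(self, c_in: int, c_out: int, ema_factory, n: int = 1):
        super().__init__()
        c_ = c_out // 2
        self.cv1 = nn.Conv2d(c_in, c_, 1, bias=False)
        self.cv2 = nn.Conv2d(c_in, c_, 1, bias=False)
        self.m = nn.Sequential(*(BottleneckEMA(c_, ema_factory(c_)) for _ in range(n)))
        self.cv3 = nn.Conv2d(2 * c_, c_out, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))

# nn.Identity() works as a shape-preserving stand-in for EMA when testing:
# C3EMA(128, 128, lambda c: nn.Identity())(torch.randn(1, 128, 40, 40))
```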

3.3. C3-Light Module

The aforementioned improvements have led to an increase in the computational effort. Replacing the Conv convolution in the C3-EMA module with DWConv convolution makes it possible to achieve a lightweight design while maintaining performance. The C3-Light module is shown in Figure 6.
The parameters and computational cost of a standard Conv are as follows:
$$D_K \times D_K \times M \times N, \quad D_K \times D_K \times M \times N \times D_F \times D_F$$
The parameters and computational cost of DWConv are as follows:
$$D_K \times D_K \times M, \quad D_K \times D_K \times M \times D_F \times D_F$$
where $D_K \times D_K$ is the convolution kernel size, $M$ is the number of input channels, $N$ is the number of output channels, and $D_F \times D_F$ is the size of the output feature map.
Evidently, the ratios of both the parameters and the computational cost of DWConv to those of Conv are $1/N$. This improvement effectively reduces the parameters and computation of the model.
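This ratio can be checked numerically; the sketch below uses illustrative channel sizes (64 in, 128 out), not values from the paper.

```python
import torch.nn as nn

# Illustrative sizes only: M = 64 input channels, N = 128 output channels, K = 3.
M, N, K = 64, 128, 3
conv = nn.Conv2d(M, N, K, padding=1, bias=False)           # standard convolution
dw = nn.Conv2d(M, M, K, padding=1, groups=M, bias=False)   # depthwise convolution

p_conv = sum(p.numel() for p in conv.parameters())  # K*K*M*N = 73,728
p_dw = sum(p.numel() for p in dw.parameters())      # K*K*M   = 576
print(p_dw / p_conv, 1 / N)  # both 0.0078125, i.e., the 1/N ratio
```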

4. Results and Analysis

4.1. Dataset and Experimental Platform

To better match the actual grass carp farm environment, we produced our own dataset for model training. The experimental data were collected at the Kaitian Fish Farm in Wangcheng District, Changsha City, Hunan Province, using an OPPO A96, a Huawei Nova 7 Plus, and a Yingshi CS-H8 camera. Images of dead fish were captured under varying lighting and weather conditions, perspectives, and scales, for a total of 670 images. The images were uniformly resized to 640 × 640 pixels and annotated with the LabelImg tool, and the dataset was split into training and testing sets at an 8:2 ratio. No image augmentation was used to expand the dataset, and all dead fish images are of grass carp.
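A split along these lines could be scripted as follows; the directory layout and file extensions are assumptions for illustration, not the authors' actual pipeline.

```python
import random
import shutil
from pathlib import Path

# Assumed layout: images/*.jpg with LabelImg YOLO-format labels/*.txt sharing
# the same stem; destination folders follow a common YOLOv5-style layout.
random.seed(0)
images = sorted(Path("images").glob("*.jpg"))
random.shuffle(images)
split = int(0.8 * len(images))  # 8:2 train/test ratio used in the paper

for subset, files in (("train", images[:split]), ("test", images[split:])):
    img_dir = Path("dataset") / subset / "images"
    lbl_dir = Path("dataset") / subset / "labels"
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, img_dir / img.name)
        label = Path("labels") / f"{img.stem}.txt"
        if label.exists():
            shutil.copy(label, lbl_dir / label.name)
```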
The experiments were run on Ubuntu 20.04 with a GeForce RTX 3090 (24 GB) GPU and an Intel(R) Xeon(R) Platinum 8362 CPU, using CUDA 11.8, Python 3.8, and PyTorch 2.0.0.

4.2. Model Training Results

The network input size for model training is 640 × 640 pixels, with a batch size of 32, using 12 parallel processes. The momentum is set to 0.937, the learning rate is 0.01, weight decay is 0.0005, and the training runs for 400 epochs.
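For illustration, these hyperparameters map onto a standard PyTorch SGD setup as sketched below; the model object is a placeholder rather than the authors' training code.

```python
import torch
import torch.nn as nn

# Placeholder standing in for the YOLO-DWM network.
model = nn.Conv2d(3, 16, 3)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,              # initial learning rate
    momentum=0.937,       # momentum reported in Section 4.2
    weight_decay=0.0005,  # weight decay reported in Section 4.2
)

EPOCHS = 400       # training epochs
BATCH_SIZE = 32    # images per batch
IMG_SIZE = 640     # network input resolution (640 x 640)
NUM_WORKERS = 12   # parallel data-loading processes
```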
Figure 7 shows a comparison of the training loss curves before and after the improvements to the YOLOv5s model. The YOLO-DWM model exhibits a faster decrease in the loss function compared to YOLOv5s, indicating quicker fitting of the model. When the loss function converges, it demonstrates a smaller loss value, suggesting that the predicted boxes generated by the improved model have less error compared to the ground truth boxes, leading to more accurate predictions. This indicates that the YOLO-DWM model converges faster, has a smaller loss value, and demonstrates better robustness compared to YOLOv5s.
Figure 8 compares the precision, recall, and mAP0.5 curves of the model before and after improvement. After 50 epochs, all three metrics of the improved YOLO-DWM model stabilize, and its P, R, and mAP curves remain above those of the YOLOv5s model throughout training. This indicates that the improved model learns more and better feature information during training, whereas the YOLOv5s curves keep oscillating, suggesting weaker and less stable feature extraction and insufficient learning of dead fish features. The experimental results show that the YOLO-DWM model adapts better to dead fish detection in complex environments and extracts dead fish features more effectively, which improves the model's performance on this dataset and verifies its feasibility and effectiveness in practical applications.

4.3. Ablation Experiment

The design starts with the YOLOv5s model, first incorporating the DWMConv module, then embedding the EMA within the C3 module, and finally adding the C3-Light module. This setup aims to verify the impact of each improvement on performance. The experimental results are shown in Table 1, where √ indicates that the module is used and × indicates that the module is not used.
It can be observed that adding the DWMConv module decreases precision by 0.5% while increasing recall by 2%; mAP0.5 and mAP0.5~0.95 improve by 2.2% and 2.9%, respectively, but the model's floating-point computation increases by 1.8 G and the parameter count by 10.1%. Next, embedding the EMA mechanism increases precision by 4.9% and decreases recall by 0.7%; mAP0.5 and mAP0.5~0.95 improve by 2.5% and 2.1%, while the floating-point computation decreases by 1.5 G and the parameter count by 11.1%. Finally, adding the C3-Light module yields precision and recall of 93.6% and 77.5%, respectively, with mAP0.5 and mAP0.5~0.95 reaching 87.5% and 49.8%. The model's computation decreases by 0.2 G compared with YOLOv5s, and the parameter count decreases by 9.69%. In summary, the sequential addition of the DWMConv, C3-EMA, and C3-Light modules effectively improves the model's performance.
Figure 9 shows the mAP0.5 curves during training. The mAP0.5 of the model improves greatly and the volatility disappears after adding the DWMConv convolution, indicating that the multi-scale feature fusion approach is effective. The mAP0.5 improves further after adding C3-EMA. After adding the C3-Light lightweight module, the curve largely overlaps with that of the C3-EMA variant, showing that C3-Light does not degrade the model's accuracy while reducing its size.

4.4. Visualization Result Analysis

A visual analysis of the detection results of the YOLOv5s model before and after improvement was conducted under five conditions: normal dead fish, folded fish, occlusion, large areas of dead fish, and distant small targets, as shown in Figure 10.
In the cases of normal and folded dead fish, although both models successfully identified the objects, the improved model still demonstrated higher accuracy than YOLOv5s. Under heavy occlusion, YOLOv5s not only had lower accuracy than the improved model but also produced false positives. In the scenario with a large area of dead fish, the yellow boxes indicate missed detections and the yellow arrows indicate false detections; YOLOv5s exhibited multiple false positives and missed detections. For distant small targets, YOLOv5s also missed detections, while the YOLO-DWM model successfully recognized all targets. The main reason is that the YOLOv5s model does not pay enough attention to the features of dead fish on the water's surface, making it difficult to distinguish the target from the complex aquatic environment. In the detection of large areas of dead fish and small targets, YOLOv5s inadequately utilizes features of different scales, resulting in the loss of some effective features. The YOLO-DWM model utilizes multi-scale depthwise separable convolutions and an EMA mechanism, effectively addressing the varying scales of dead fish features in scenes with small targets and large areas of dead fish and significantly reducing missed detections and false positives.

4.5. Comparison of Different Models

To further verify the robustness and superiority of the improved model, it was compared with several mainstream algorithmic models. The same parameters and conditions were used throughout the experiment, and the comparison results are shown in Table 2.
Comparative experiments were conducted between the YOLO-DWM model and Faster R-CNN, SSD, YOLOv3-tiny, YOLOv5s, YOLOv6n, and YOLOv8n. The comparison revealed that the improved YOLOv5s model demonstrated higher precision, recall, mAP0.5, and F1 score. The mAP0.5 improved by 8.13%, 12.3%, 5.9%, 4.5%, 4.5%, and 2.8%, while the F1 scores increased by 17.5%, 7.5%, 2.7%, 2.6%, 3.0%, and 0.4%. Therefore, it is evident that the YOLO-DWM model performs better in identifying dead fish. However, its Params and FLOPs are still higher than those of YOLOv6n and YOLOv8n. Thus, further research on lightweight optimization is necessary for embedded development and deployment.

5. Discussion

The method proposed in this paper mainly targets dead fish identification on the water surface of outdoor farms and has good application prospects for detecting floating objects on the water surface in complex environments. The mAP0.5 of the improved model reaches 87.5%, which is 4.5% higher than that of the original YOLOv5s model. However, false and missed detections remain, and a richer dataset is needed to improve training. Although high-resolution images can be obtained with mobile phones and cameras, some images suffer from reflections, blurred backgrounds, and overexposure, so designing a targeted image enhancement algorithm is also important to study. In addition, nighttime datasets are difficult to collect and of poor image quality, and the recognition of dead fish at night or under low illumination has received little study. Nighttime detection is a difficult problem in target detection, and research could draw inspiration from the Pe-YOLO proposal. In terms of model complexity, this paper does not perform extensive lightweighting; it mainly offsets the parameters and computation added by the model improvements. A good model needs to be both accurate and fast. The method proposed by Zhuang et al. [27] provides ideas for subsequent pruning algorithms. The lightweight dead fish detection model DM-YOLO proposed by Zhao et al. [12] replaces the YOLOv4 backbone network with MobileNetV3, but MobileNet can cause information loss during feature extraction. In future research, we can try pruning or replacing the backbone with a lightweight network to achieve a lighter model. Yang et al. [11] proposed the multi-scale feature fusion FFA-SSD algorithm based on SSD, which differs from the YOLO series of algorithms and is also an important direction in dead fish identification research.

6. Conclusions

In this paper, a dead fish recognition model based on multi-scale feature fusion is proposed. The model replaces the original Conv module with the DWMConv module, forms the C3-EMA module by embedding the EMA mechanism in C3, and forms the C3-Light module by replacing the standard Conv in the C3 module with DWConv. On the homemade dataset, the mAP0.5, precision, recall, and F1 score of YOLO-DWM are 87.5%, 93.6%, 77.5%, and 84.8%, respectively, with Params of 6.34 M and FLOPs of 15.7 G. The following conclusions are drawn from the experiments described in this paper:
(1) Model performance is improved by introducing DWMConv, a multi-scale feature fusion module based on depthwise separable convolution, and C3-EMA, which incorporates an efficient multi-scale attention mechanism. As seen in Table 1, all model metrics improve markedly; that is, approaching the problem from the aspect of multi-scale feature fusion can effectively improve the detection rate of dead fish on the water surface.
(2) The C3-Light module effectively reduces the model's parameters and computation with essentially unchanged accuracy: Params and FLOPs are reduced by 0.53 M and 0.5 G compared with the YOLOv5s+DWMConv+C3-EMA model.
(3) As shown by the comparative results in Table 2, compared with Faster R-CNN, SSD, YOLOv3-tiny, YOLOv5s, YOLOv6n, and YOLOv8n, the mAP0.5 improves by 8.13%, 12.3%, 5.9%, 4.5%, 4.5%, and 2.8%, respectively, while the F1 scores increase by 17.5%, 7.5%, 2.7%, 2.6%, 3.0%, and 0.4%. The improved algorithm outperforms these comparative algorithms.
The current study shows that this improved method is feasible for enhancing the accuracy of dead fish detection at the water surface. In future research, while continuing to improve detection accuracy, the focus will be on model lightweighting and nighttime dead fish identification for application in the real-time detection of dead fish in outdoor aquaculture farms.

Author Contributions

Conceptualization, B.L.; methodology, C.T. and X.X.; software, B.L. and J.W.; validation, B.L.; formal analysis, B.L.; investigation, B.L., X.X. and J.W.; resources, C.T.; data curation, B.L.; writing—original draft preparation, B.L.; writing—review and editing, B.L. and C.T.; visualization, B.L.; supervision, C.T.; project administration, C.T.; funding acquisition, C.T. and X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hunan Provincial Key R&D Program, Study on technology and equipment of engineering circulating water culture in self-floating assembled pond, grant number 2022NK2028; Hunan Provincial Natural Science Foundation Upper-level Programs, grant number 2024JJ5209; and the Scientific Research Project of the Hunan Provincial Education Department, grant number 22B0186. This work was supported by the China Scholarship Council (CSC) under grant no. 202308430207.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the results of this study are available from the corresponding author upon request; due to privacy, these data have not been made public.

Acknowledgments

Expressions of gratitude are extended to Kaitian New Agricultural Science and Technology Co. in Hunan Province, China, for their provision of the designated test site, and to Hu Haoyu and Li Mi for their invaluable contributions to the present paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rehman, T.U.; Mahmud, M.S.; Chang, Y.K.; Jin, J.; Shin, J. Current and future applications of statistical machine learning algorithms for agricultural machine vision systems. Comput. Electron. Agric. 2019, 156, 585–605.
  2. Liu, X.; Liu, M.; Li, N. Dual vision visual fusion improved YOLO-V7 intelligent elevator face recognition model. J. Opt. 2024, 1–13.
  3. Anjeana, N.; Anusudha, K. Real time face recognition system based on YOLO and InsightFace. Multimed. Tools Appl. 2023, 83, 31893–31910.
  4. Li, N.; Bai, X.; Shen, X.; Xin, P.; Tian, J.; Chai, T.; Wang, Z. Dense Pedestrian Detection Based on GR-YOLO. Sensors 2024, 24, 4747.
  5. Gong, L.; Wang, Y.; Huang, X.; Liang, J.; Fan, Y. An improved YOLO algorithm with multisensing for pedestrian detection. Signal Image Video Process. 2024, 18, 5893–5906.
  6. Issac, A.; Dutta, M.K.; Sarkar, B. Computer vision based method for quality and freshness check for fish from segmented gills. Comput. Electron. Agric. 2017, 139, 10–21.
  7. Chen, S.; Wang, Q.B.; He, X.L.; Zhang, X.; Li, D. An automatic method of fish length estimation using underwater stereo system based on LabVIEW. Comput. Electron. Agric. 2020, 173, 105419.
  8. Tang, Z.; Wu, Y.; Xu, X. The study of recognizing ripe strawberries based on the improved YOLOv7-Tiny model. Vis. Comput. 2024, 41, 3155–3171.
  9. Wang, Y.; Yan, G.; Meng, Q.; Yao, T.; Zhang, B. DSE-YOLO: Detail semantics enhancement YOLO for multi-stage strawberry detection. Comput. Electron. Agric. 2022, 198, 107057.
  10. Yu, G.; Luo, Y.; Wang, L. Recognition method of dead golden pomfrets based on improved YOLOv4. Fish. Mod. 2021, 48, 80–89.
  11. Yang, S.P.; Li, H.; Liu, J.J.; Fu, Z.M.; Zhang, R.; Jia, H.M. A Method for Detecting Dead Fish on Water Surfaces Based on Multi-scale Feature Fusion and Attention Mechanism. J. Zhengzhou Univ. Nat. Sci. Ed. 2024, 56, 32–38.
  12. Zhao, S.L.; Zhang, S.; Lu, J.; Wang, H.; Feng, Y.; Shi, C.; Li, D.; Zhao, R. A lightweight dead fish detection method based on deformable convolution and YOLOV4. Comput. Electron. Agric. 2022, 198, 107098.
  13. Zhang, P.; Zheng, J.; Gao, L.; Li, P.; Long, H.; Liu, H.; Li, D. A novel detection model and platform for dead juvenile fish from the perspective of multi-task. Multimed. Tools Appl. 2024, 83, 24961–24981.
  14. Zhang, H.; Tian, Z.; Liu, L.; Liang, H.; Feng, J.; Zeng, L. Real-time detection of dead fish for unmanned aquaculture by yolov8-based UAV. Aquaculture 2024, 595, 741551.
  15. Fu, T.; Feng, D.; Ma, P.; Hu, W.; Yang, X.; Li, S.; Zhou, C. DF-DETR: Dead fish-detection transformer in recirculating aquaculture system. Aquac. Int. 2025, 33, 43.
  16. Zheng, J.; Fu, Y.; Zhao, R.; Lu, J.; Liu, S. Dead Fish Detection Model Based on DD-IYOLOv8. Fishes 2024, 9, 356.
  17. Li, Y.; Hu, Z.; Zhang, Y.; Liu, J.; Tu, W.; Yu, H. DDEYOLOv9: Network for Detecting and Counting Abnormal Fish Behaviors in Complex Water Environments. Fishes 2024, 9, 242.
  18. Wang, H.; Zhang, S.; Zhao, S.; Wang, Q.; Li, D.; Zhao, R. Real-time detection and tracking of fish abnormal behavior based on improved YOLOV5 and SiamRPN++. Comput. Electron. Agric. 2022, 192, 106512.
  19. Rang, Z.; Hao, Y.W.; Li, S.L.; Song, Z.; Qing, Y.D. Detection and positioning system of dead fish in factory farming. China Agric. Inform. 2024, 36, 31–46.
  20. Wang, H.; Zhang, S.; Zhao, S.; Lu, J.; Wang, Y.; Li, D.; Zhao, R. Fast detection of cannibalism behavior of juvenile fish based on deep learning. Comput. Electron. Agric. 2022, 198, 107033.
  21. Wang, Z.; Zhang, X.; Su, Y.; Li, W.; Yin, X.; Li, Z.; Ying, Y.; Wang, J.; Wu, J.; Miao, F.; et al. Abnormal Behavior Monitoring Method of Larimichthys crocea in Recirculating Aquaculture System Based on Computer Vision. Sensors 2023, 23, 2835.
  22. Cheng, Y.W.; Xu, H.; Liu, Y.M. Robust small object detection on the water surface through fusion of camera and millimeter wave radar. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Online, 10–17 October 2021; IEEE Press: Piscataway, NJ, USA, 2021; pp. 15243–15252.
  23. Sun, D.; Wang, X. Fabric surface defect detection method based on multi-scale feature fusion neural network. J. Liaoning Norm. Univ. Nat. Sci. Ed. 2024, 47, 331–341.
  24. Peng, H.; Xie, H.; Liu, H.; Guan, X. LGFF-YOLO: Small object detection method of UAV images based on efficient local–global feature fusion. J. Real-Time Image Proc. 2024, 21, 167.
  25. Xie, Y.; Xiang, J.; Li, X.; Yang, C. An Intelligent Fishery Detection Method Based on Cross-Domain Image Feature Fusion. Fishes 2024, 9, 338.
  26. Ouyang, D.L.; He, S.; Zhang, G.Z.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-scale Attention Module with Cross-spatial Learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
  27. Zhuang, L.; Jiang, G.L.; Zhi, G.S.; Gao, H.; Shou, M.Y.; Chang, S.Z. Learning Efficient Convolutional Networks through Network Slimming. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2755–2763.
Figure 1. The network structure of YOLO-DWM.
Figure 2. The structure of the DWMConv module.
Figure 3. The structure of the depthwise separable convolution.
Figure 4. The structure of the EMA module.
Figure 5. The structure of the C3-EMA module.
Figure 6. The structure of the C3-Light module.
Figure 7. The training loss curves of the model before and after improvement.
Figure 8. The precision, recall, and mAP0.5 curves of the model before and after improvement.
Figure 9. mAP0.5 curve of the ablation experiment.
Figure 10. Visual comparison under different conditions.
Table 1. Results of ablation experiment.

DWMConv   C3-EMA   C3-Light   P/%    R/%    mAP0.5/%   mAP0.5~0.95/%   FLOPs/G   Params/10⁶
×         ×        ×          89.7   75.9   83.0       44.5            15.9      7.02
√         ×        ×          89.2   77.9   85.2       47.4            17.7      7.73
√         √        ×          94.1   77.2   87.7       49.5            16.2      6.87
√         √        √          93.6   77.5   87.5       49.8            15.7      6.34
Table 2. Experimental results of different models.

Module        P/%    R/%    mAP0.5/%   Params/10⁶   FLOPs/G   F1 Score/%
Faster RCNN   57.1   81.9   79.37      136.75       369.7     67.3
SSD           93.1   66.1   75.2       24.01        61.1      77.3
YOLOv3-tiny   90.9   74.9   81.6       9.52         44.9      82.1
YOLOv5s       89.7   75.9   83.0       7.02         15.9      82.2
YOLOv6n       92.0   73.6   83.0       4.16         11.5      81.8
YOLOv8n       93.2   77.2   84.7       2.68         6.8       84.4
YOLO-DWM      93.6   77.5   87.5       6.34         15.7      84.8