Article

Detection and Identification of Fish Skin Health Status Referring to Four Common Diseases Based on Improved YOLOv4 Model

Gangyi Yu, Junbo Zhang, Ao Chen and Rong Wan

1 College of Marine Science, Shanghai Ocean University, Shanghai 201306, China
2 National Offshore Fisheries Engineering Technology Research Center, Shanghai 201306, China
* Authors to whom correspondence should be addressed.
Fishes 2023, 8(4), 186; https://doi.org/10.3390/fishes8040186
Submission received: 20 February 2023 / Revised: 26 March 2023 / Accepted: 27 March 2023 / Published: 30 March 2023
(This article belongs to the Special Issue Fisheries and Aquaculture Engineering)

Abstract

A primary problem affecting the sustainable development of aquaculture is fish skin disease. To prevent outbreaks of fish diseases and to provide prompt treatment that avoids mass mortality, it is essential to detect and identify skin diseases immediately. In this study, a detection and identification model for fish skin disease is constructed on the basis of the YOLOv4 model, coupled with lightweight depthwise separable convolution and an optimized feature extraction network and activation function. The developed model is tested on four diseases (hemorrhagic septicemia, saprolegniasis, benedeniasis, and scuticociliatosis) and applied to monitor the health condition of fish skin in deep-sea cage culture. The results show that the proposed MobileNet v3-GELU-YOLOv4 model has an improved learning ability and a reduced number of model parameters. Compared with the original YOLOv4 model, its mAP and detection speed increased by 12.39% and 19.31 FPS, respectively. The advantages of the model are its intra-species classification capability, lightweight deployment, detection accuracy, and speed, which make it well suited to real-time monitoring of fish skin health in deep-sea aquaculture environments.

1. Introduction

Aquaculture is the fastest-growing sector of high-protein food production globally and is regarded as the most efficient and sustainable approach to meeting the ever-increasing demand for food, contributing to economic development and social stability worldwide [1,2,3]. In 2018, global aquaculture production reached 114.5 million tons, of which finfish, shellfish, and crustacean culture accounted for 82.1 million tons [4].
The supply of high-quality and green-labeled aquatic products and the welfare of farmed aquatic organisms are of great concern to politicians and policymakers, non-governmental organizations, the aquaculture industry, and consumers [2,5]. However, diseases are major contributors to the degradation of seafood quality and the reduction in fish welfare in intensive aquaculture; they can cause massive fish mortality and are known to affect economic and social development worldwide [6]. For many aquaculture species, such as Atlantic salmon Salmo salar and redbanded seabream Pagrus auriga, skin diseases are regarded as a primary problem affecting sustainable growth [6,7,8,9].
Prompt detection and identification of skin disease is therefore critical to the prevention of fish disease outbreaks, as timely treatment can avoid mass mortality of fish. Information on fish skin diseases in aquaculture can be obtained by human observation and by artificial intelligence (AI) technology [5,10,11]. Currently, many large-scale aquaculture cages are equipped with underwater cameras, but manual real-time assessment of fish skin health by viewing video is extremely costly and difficult to implement in cage farming. The utilization of AI technology, specifically image recognition based on target geometry and statistical features, is of great importance for the rapid detection and identification of fish skin diseases even when the fish are swimming fast.
The technique of image recognition has been widely applied in fisheries for species identification, biomass estimation, and behavioral analysis [12,13,14,15,16,17,18,19,20,21,22,23]. However, research into the detection and identification of fish skin disease is still in its infancy [5]. In the early days, machine learning techniques such as color segmentation and K-means clustering were applied to identify fish diseases [10,11,24]. Waleed et al. [25] then adopted AlexNet, a deep convolutional neural network (CNN), to improve the accuracy of fish disease detection (epizootic ulcerative syndrome, ichthyophthirius, and columnaris). However, there remains significant potential to improve detection accuracy and to achieve real-time skin disease detection while fish swim; an algorithm that combines high detection speed with high precision is required [19].
The YOLO (you only look once) model provides an efficient and accurate algorithm for the rapid detection and recognition of targets [26,27,28], as demonstrated by Hu et al. [28], who used the YOLOv4 model to detect uneaten feed pellets in aquaculture. In this study, an improved YOLOv4 model constructed through a coupled-algorithm approach is used to detect and identify four common fish diseases: hemorrhagic septicemia, saprolegniasis, benedeniasis, and scuticociliatosis. The system has been applied to real-time monitoring of fish skin health status in deep-sea cage culture, providing decision-making support for the early warning of aquaculture diseases.

2. Materials and Methods

2.1. Dataset

The image data used in this study are divided into three types: a training dataset, a test dataset, and an application dataset. As shown in Figure 1, the training and test data are sourced mainly from websites and from the Penghu semi-submersible deep-sea cage (located at Chimelong Island, Zhuhai, China) and are divided into the training and test datasets at a ratio of 9:1. The application dataset consists of images and videos of fish populations from the Yellow Sea Long Whale No.1 submersible deep-sea cage (located at Dashin Island, Yantai, China).
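For readers reproducing the split, the following is a minimal sketch of the 9:1 division described above; the directory name, file extension, and fixed seed are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a 9:1 train/test split over an image directory.
# Paths, extension, and seed are illustrative assumptions.
import random
from pathlib import Path

def split_dataset(image_dir: str, train_ratio: float = 0.9, seed: int = 42):
    """Shuffle image paths deterministically and split into train/test lists."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    cut = int(len(images) * train_ratio)
    return images[:cut], images[cut:]

train_files, test_files = split_dataset("data/fish_skin_images")
print(f"train: {len(train_files)}, test: {len(test_files)}")
```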

2.2. Detection and Identification Model Based on YOLOv4

Although various computer vision approaches are used for detection tasks [10,11,24], the YOLO model is one of the main approaches and is well proven in practice [28]. YOLO is a one-stage network that merges the two stages of R-CNN into one and can recognize the location and category of a target in a single pass. Whereas a two-stage network first determines the target location and then determines the target category, the YOLO model divides the image into a grid with preset candidate boxes and obtains the location box and category information of the target simultaneously by merging grid predictions. This greatly reduces the computation time of the target detection network and gives good real-time performance, making it suitable for the rapid detection of fish skin health status in aquaculture.
The YOLOv4 model [29] mainly consists of four components: the feature extraction network CSPDarknet, the enhanced feature extraction module SPP, the upsampling and feature fusion network PANet, and the prediction head (YOLO head). The SPP module and PANet together form a feature pyramid. The SPP module greatly increases the receptive field and separates out the most important contextual features. PANet performs feature extraction both bottom-up and top-down in the feature pyramid, preserving the features of different layers through repeated feature fusion. The YOLO head generates predictions for feature maps of different sizes.
To address the recognition difficulties caused by poor lighting and the indistinct characteristics of fish skin diseases in actual production, and to improve network computation speed and enable lightweight model deployment, this study develops a detection and identification model of fish skin health status based on the YOLOv4 network. By optimizing the feature extraction network and activation function and applying the lightweight depthwise separable convolution method, the proposed image-analysis method comprises a preprocessing stage, an improvement stage, and a training stage (Figure 2).

2.2.1. Data Preprocessing

Training a robust model typically requires a large amount of labeled data, while high-quality data are hard to collect and labeling complex scenes is often time-consuming and expensive [30,31]. The dataset used in this study contains 500 images and 5066 annotated samples. The position and posture of fish in water are highly random, and stretching images can distort fish body proportions. To increase the variability of the input images and improve the robustness of the trained model, we varied brightness, contrast, hue, saturation, and noise to alter the luminance distribution of the dataset, and applied random scaling, cropping, horizontal flipping, and rotation (−10° to 10°) to augment the data; processed images are shown in Figure 3. The original dataset of 500 images was expanded to 4550 images by this data augmentation method. To prevent image distortion caused by resizing during pre-training image input, the initial images were processed by cropping and padding to 416 × 416 pixels.
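The augmentation pipeline can be sketched with torchvision as follows; only the ±10° rotation range and the 416 × 416 input size come from the text, while the jitter magnitudes and noise level are assumed values. Note that in detection training the bounding-box annotations must be transformed consistently with the geometric operations, which this image-only sketch omits.

```python
# Sketch of the described augmentations with torchvision.
# Magnitudes are assumptions; rotation range and 416x416 size are from the text.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),       # luminance variation
    transforms.RandomHorizontalFlip(p=0.5),                 # horizontal flip
    transforms.RandomRotation(degrees=10),                  # -10 to 10 degrees
    transforms.RandomResizedCrop(416, scale=(0.8, 1.0)),    # random scale + crop
    transforms.ToTensor(),
    # Additive Gaussian noise on the tensor, clipped back to [0, 1]
    transforms.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0, 1)),
])
```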

2.2.2. Model Improvement

Feature Extraction Network

Feature extraction refers to the process of extracting meaningful information from the input data and transforming it into features suitable for a machine learning model to learn from and process. The extracted features should be descriptive and non-redundant, and the quality of feature extraction is crucial for subsequent network recognition and model generalization. The classic lightweight feature extraction networks are the MobileNet family. MobileNet v1 introduced depthwise separable convolution, splitting a standard 3 × 3 convolution into a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution to reduce the number of parameters while preserving feature extraction quality. MobileNet v2 is built from Inverted Resblocks, which use 1 × 1 convolution and 3 × 3 depthwise separable convolution for dimensionality expansion and feature extraction, and connect the input directly to the output through a residual edge structure. MobileNet v3 adds a lightweight channel-based attention model to enhance the feature extraction effect of the network, using a special bneck structure that combines the depthwise separable convolution of MobileNet v1 with the inverted residual structure of MobileNet v2.
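As a concrete illustration of the inverted residual structure described above, the following is a condensed PyTorch sketch; the expansion factor and the use of ReLU6 follow the standard MobileNet v2 design rather than the authors' exact configuration.

```python
# Condensed sketch of a MobileNet v2 inverted residual ("Inverted Resblock"):
# 1x1 expansion -> 3x3 depthwise -> 1x1 projection, with a residual edge
# when the shape is preserved. Channel counts are illustrative.
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand: int = 6):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),            # 1x1 expansion
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),               # 3x3 depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),           # 1x1 projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```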

Activation Function

Activation functions are a basic element of the neurons that make up a neural network, mimicking the response characteristics of biological neurons. A good activation function enhances the representation and learning ability of the network, allowing gradients to propagate more efficiently and avoiding the problems of exploding and vanishing gradients.
The rectified linear unit (ReLU) activation function is a piecewise linear function that sets all negative values to zero while leaving positive values unchanged; this unilateral inhibition gives the neurons of the network sparse activation, helping the model extract relevant features and fit the training data.
$$\mathrm{ReLU}(x) = \max(0, x)$$
The Mish activation function is a smooth, non-monotonic function that preserves small negative values, stabilizing the gradient flow of the network and avoiding the gradient saturation that causes a sharp drop in training speed. Mish is smoother than the ReLU activation function, allowing information to penetrate deeper into the neural network for better accuracy and generalization, but it is more computationally intensive.
$$\mathrm{Mish}(x) = x \tanh\left(\ln\left(1 + e^{x}\right)\right)$$
The GELU activation function introduces stochastic regularization into activation, making the output more consistent with a normal distribution; it has proven especially effective in the NLP domain, particularly in Transformer models.
$$\mathrm{GELU}(x) = x P(X \le x) = x \Phi(x) \approx 0.5 x \left(1 + \tanh\left[\sqrt{2/\pi}\left(x + 0.044715 x^{3}\right)\right]\right)$$
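The three activation functions can be written out directly with PyTorch primitives; the sketch below checks the tanh form of GELU against the built-in (the `approximate="tanh"` argument requires PyTorch 1.12 or later).

```python
# The three activation functions above, written out with PyTorch primitives.
import math
import torch
import torch.nn.functional as F

def relu(x):
    # max(0, x)
    return torch.clamp(x, min=0.0)

def mish(x):
    # x * tanh(ln(1 + e^x)) = x * tanh(softplus(x))
    return x * torch.tanh(F.softplus(x))

def gelu_tanh(x):
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + torch.tanh(c * (x + 0.044715 * x ** 3)))

x = torch.linspace(-3.0, 3.0, 7)
assert torch.allclose(gelu_tanh(x), F.gelu(x, approximate="tanh"), atol=1e-6)
```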

Depthwise Separable Convolution

The depthwise separable convolution block splits a standard convolution into a depthwise convolution and a pointwise convolution. Each input channel's feature map is convolved channel-by-channel with its own 3 × 3 kernel (channels and kernels in one-to-one correspondence), and the result is then convolved point-by-point with 1 × 1 kernels that combine the channel outputs into output feature maps, whose number grows with the number of output channels (Figure 4). If the number of input channels is X and the number of output channels is Y, a standard 3 × 3 convolution has the number of parameters shown in Equation (4), while a 3 × 3 depthwise separable convolution has the number shown in Equation (5).
$$X \times 3 \times 3 \times Y = 9XY$$
$$X \times 3 \times 3 + 1 \times 1 \times X \times Y = 9X + XY$$
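The two parameter counts are easy to verify numerically; below is a sketch with assumed channel counts X = 32 and Y = 64 (biases omitted, matching the equations).

```python
# Verifying the counts in Equations (4) and (5): a standard 3x3 convolution
# has 9XY weights, the depthwise separable version 9X + XY (no biases).
import torch.nn as nn

X, Y = 32, 64  # illustrative input/output channel counts

standard = nn.Conv2d(X, Y, kernel_size=3, padding=1, bias=False)

separable = nn.Sequential(
    nn.Conv2d(X, X, kernel_size=3, padding=1, groups=X, bias=False),  # depthwise: 9X
    nn.Conv2d(X, Y, kernel_size=1, bias=False),                       # pointwise: XY
)

count = lambda m: sum(p.numel() for p in m.parameters())
assert count(standard) == 9 * X * Y       # 18,432
assert count(separable) == 9 * X + X * Y  # 2,336
```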

2.2.3. Model Training

Model training is a core part of deep learning and transfer learning. In this study, we built YOLOv4 and its improved models on the PyTorch platform (hardware and software: Windows 10 OS, 16 GB RAM, Nvidia GeForce GTX 1060 (6 GB) graphics card), pre-trained the models on the publicly available VOC 2007 dataset, then froze most of the model parameters and fine-tuned on the augmented annotated images. The YOLOv4, MobileNet v1-YOLOv4, MobileNet v2-YOLOv4, MobileNet v3-YOLOv4, and MobileNet v3-GELU-YOLOv4 models were trained and tested on the four fish skin disease datasets, which were divided 8:1:1 into training, validation, and test datasets.
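A minimal sketch of the freeze-then-fine-tune step is given below, assuming the model exposes its feature extractor as a `backbone` attribute (a hypothetical name; the paper does not specify how its YOLOv4 implementation is organized).

```python
# Minimal sketch of freezing pretrained feature-extraction layers before
# fine-tuning on the fish skin disease data. `model.backbone` is a
# hypothetical attribute name used for illustration.
import torch

def freeze_backbone(model: torch.nn.Module, freeze: bool = True):
    """Freeze (or unfreeze) the pretrained feature-extraction layers."""
    for param in model.backbone.parameters():
        param.requires_grad = not freeze

# freeze_backbone(model)  # train only the detection head first
# optimizer = torch.optim.Adam(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-3)
# ...later: freeze_backbone(model, freeze=False) to fine-tune end to end
```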
To quantify the error between the model predictions and the annotated dataset, this study uses the sum of three loss functions, complete-IoU loss (CIoU loss), binary cross-entropy loss (BCELoss), and mean squared error loss (MSELoss), to evaluate the loss of the trained model [32,33].
CIoU loss is built on the IoU between the predicted and actual bounding boxes, where IoU measures the ratio of the intersection of two regions to their union.
$$IoU = \frac{\mathrm{area}\left(B_{p} \cap B_{gt}\right)}{\mathrm{area}\left(B_{p} \cup B_{gt}\right)}$$
where Bp is the predicted bounding box and Bgt is the actual bounding box. CIoU loss adds the aspect-ratio parameters v and α of the predicted and actual boxes to the IoU-based loss.
$$v = \frac{4}{\pi^{2}} \left( \arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w}{h} \right)^{2}$$
$$\alpha = \frac{v}{1 - IoU + v}$$
$$CIoU = IoU - \frac{\rho^{2}\left(B_{p}, B_{gt}\right)}{C^{2}} - \alpha v$$
$$L_{CIoU} = 1 - CIoU = 1 - IoU + \frac{\rho^{2}\left(B_{p}, B_{gt}\right)}{C^{2}} + \alpha v$$
where wgt and hgt are the width and height of the actual bounding box, w and h are the width and height of the predicted bounding box, ρ(Bp, Bgt) is the distance between the center points of the two boxes, and C is the diagonal length of the minimum enclosing rectangle covering both boxes.
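Assembled from the definitions above, a CIoU loss for corner-format boxes might look as follows; this sketch is written from the formulas, not from the authors' code.

```python
# CIoU loss sketch from the definitions above; pred and target are
# (..., 4) tensors of boxes in (x1, y1, x2, y2) format.
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    # Intersection over union
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=-1)
    area_p = (pred[..., 2:] - pred[..., :2]).clamp(min=0).prod(dim=-1)
    area_t = (target[..., 2:] - target[..., :2]).clamp(min=0).prod(dim=-1)
    iou = inter / (area_p + area_t - inter + eps)

    # Squared center distance rho^2 and enclosing-box diagonal C^2
    center_p = (pred[..., :2] + pred[..., 2:]) / 2
    center_t = (target[..., :2] + target[..., 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=-1)
    enclose = (torch.max(pred[..., 2:], target[..., 2:])
               - torch.min(pred[..., :2], target[..., :2]))
    c2 = (enclose ** 2).sum(dim=-1) + eps

    # Aspect-ratio consistency v and trade-off alpha
    wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps))
                              - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```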
MSELoss calculates the loss by computing the mean squared difference between the predicted and true values, and reflects the accuracy of the prediction.
$$L_{MSE} = \frac{1}{N} \sum_{c \in \mathrm{Classes}} \left( p(c) - \hat{p}(c) \right)^{2}$$
where N is the number of categories in the data, p̂(c) is the predicted output of the model for the target, and p(c) is the target label value.
BCELoss is a commonly used binary classification loss function that calculates the difference between the predicted and actual values.
$$L_{BCE} = -\left( y \log x + (1 - y) \log (1 - x) \right)$$
where x is the predicted output of the model and y is the label value.
Finally, the loss function for model training is as follows.
$$\mathrm{Loss} = L_{CIoU} + L_{MSE} + L_{BCE} = \lambda \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} (1 - CIoU) - \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \left[ c_{i} \log \hat{c}_{i} + (1 - c_{i}) \log (1 - \hat{c}_{i}) \right] + \lambda \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \left( c_{i} - \hat{c}_{i} \right)^{2} + \sum_{i=0}^{S^{2}} \sum_{c \in \mathrm{Classes}} \left( p_{i}(c) - \hat{p}_{i}(c) \right)^{2}$$
where λ is a penalty coefficient (both penalty coefficients equal 5), i indexes the grid cells and S² is the number of grid cells into which the input image is divided, j indexes the predicted bounding boxes and B is the number of boxes predicted per cell, and c is the category information.
Iterative training runs for 100 epochs. Each training round is validated on the validation dataset, and the model parameters and loss values are saved to the system file. The trained weight groups are imported separately into the different network models, and the performance metrics of the network models are analyzed by testing on the test dataset. The trained models perform monitoring and recognition tasks on hardware devices that support neural network inference or on Windows PCs configured with NVIDIA graphics cards (supporting the compute unified device architecture, CUDA).

2.2.4. Performance of Improved YOLOv4 Model

The performance of the improved YOLOv4 model, including its rapidity, accuracy, and robustness, is evaluated by four indices: precision, recall, mAP, and FPS [34], calculated as follows.
Precision refers to the proportion of samples predicted as a given category that actually belong to that category.
$$P = \frac{TP}{TP + FP}$$
where true positive (TP) is the number of correctly identified positive samples and false positive (FP) is the number of negative samples incorrectly identified as positive.
Recall refers to the proportion of correctly predicted samples out of all samples labeled with that category.
$$r = \frac{TP}{TP + FN}$$
where false negative (FN) is the number of positive samples that are incorrectly identified as negative.
AP assesses the accuracy of the predicted bounding boxes for a single category and reflects the detection performance for targets of that category. mAP, calculated from AP, is the average accuracy over all categories.
$$AP = \int_{0}^{1} P(r) \, dr$$
$$mAP = \frac{1}{N} \sum_{n=1}^{N} AP(n)$$
where n is the current category and N is the total number of categories.
FPS refers to the number of frames processed per second and is the main measure of the detection speed of the model.
$$FPS = \frac{1}{T}$$
where T is the time used to process one image.
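The four indices can be computed compactly, as sketched below; the sketch integrates P(r) with the trapezoidal rule and assumes per-class TP/FP/FN counts are already available from matching detections to ground truth at a fixed IoU threshold.

```python
# Compact sketch of the precision, recall, AP, and mAP definitions above.
import numpy as np

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from detection counts."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """Area under the precision-recall curve (trapezoidal rule)."""
    order = np.argsort(recalls)
    r, p = recalls[order], precisions[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

def mean_average_precision(ap_per_class: list[float]) -> float:
    """Mean of per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```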

3. Results and Discussion

3.1. Comparison of Different Loss Functions

The change in the loss values of the different models over the training epochs is shown in Figure 5. The models gradually converge after 50 epochs, and the YOLOv4 models with MobileNet feature extraction modules show clearly lower loss values after the 10th epoch than the original YOLOv4 model, indicating much improved learning efficiency. When the GELU activation function is adopted, the loss value decreases faster during training than for the other models, which means the MobileNet v3-GELU-YOLOv4 model has a further enhanced learning ability.

3.2. Comparison of Changes in the Number of Parameters

As shown in Table 1, the number of model parameters is reduced by the MobileNet feature extraction networks. The GELU activation function removes one parameter-storage step from the operation, and the number of parameters is further reduced by 64–67% after adopting depthwise separable convolution. The reduced parameter count makes the MobileNet v3-GELU-YOLOv4 model usable on hardware such as Raspberry Pi and FPGA devices for the detection and identification of fish skin diseases, providing a good basis for deploying lightweight networks.
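Parameter counts such as those in Table 1 can be reproduced for any PyTorch model with a short helper:

```python
# Count the learnable parameters of a model, as reported in Table 1.
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())
```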

3.3. Model Detection Performance and Precision

Figure 6 shows the performance of the different models in detecting hemorrhagic septicemia in snubnose pompano (Trachinotus blochii), where red boxes indicate that the detected fish are healthy and yellow boxes indicate the presence of skin disease. The MobileNet v3-GELU-YOLOv4 model improves the skin disease detection accuracy to 0.99 while correctly recognizing healthy fish, even when multiple detected targets overlap and occlude one another. In terms of model precision on the four fish skin diseases (Figure 7), the MobileNet v3-YOLOv4 and MobileNet v3-GELU-YOLOv4 models have higher precision than the other three models. In addition, the MobileNet v3-GELU-YOLOv4 model improved the recognition precision for healthy fish and fish with hemorrhagic septicemia by 0.26% and 0.15%, respectively. A possible reason is that the MobileNet v3-GELU-YOLOv4 model includes more BatchNorm structures and GELU activation functions in the lightweight feature extraction network, allowing the output of the convolutional layer to be readjusted to the data distribution based on its mean and variance [35,36,37]; this increases the separation of intra-species details in the feature extraction stage, enriches regional target features, and improves the accuracy and generalizability of classification, especially for healthy and diseased fish with small morphological differences. It is noteworthy that adding a channel attention mechanism module to capture the relationships between individual pixels of the feature map improves the model's ability to capture image context, thereby improving the reconstruction accuracy of regions with weak structure [38].
The models were validated by running them separately on a training server with an NVIDIA graphics card in a same-frame comparison test on real-time monitoring videos of fish in a deep-sea net cage, as shown in Figure 8. The skin health status of Korean rockfish (Sebastes schlegelii) is monitored, and no diseased fish are found, which is consistent with the actual situation. The YOLOv4 and MobileNet v1-YOLOv4 models each produced one false positive, while the MobileNet v3-GELU-YOLOv4 model accurately recognized 28 healthy fish, the highest number detected among the models. This indicates that the model can be applied to the real-time monitoring of fish skin health status in deep-sea cage culture. Underwater images frequently have low quality due to underwater illumination and imaging equipment, which decreases detection performance. Applying sonar devices to assist in the extraction of underwater information can significantly enhance detection accuracy, add relevant detail to the model input, and broaden the application scenarios of our proposed method, for example by combining ROI analysis and image classification models to detect underwater targets in low- and no-light conditions [39,40].
In terms of recall and average precision, the MobileNet v3-GELU-YOLOv4 model was the highest, with 98.65% (recall) and 99.64% (mAP), respectively (Table 2). A reasonable explanation is that the mAP of the model is usually effectively improved by the MobileNet feature extraction network [28,41,42], and the GELU activation function further improves the average precision of the network model [43]; Yu et al. [44] improved mAP by 2.72% after employing the GELU activation function in a vehicle and pedestrian target detection task. For detection speed, although the MobileNet v1-YOLOv4 model has the highest detection speed of 54.14 FPS, its average accuracy on the four fish skin diseases is relatively low due to its relatively simple structure. In practice, an FPS value greater than 30 meets the requirement of real-time underwater camera detection, and from the perspective of actual fishery production, precise detection and early warning of fish diseases are both top priorities [45]. Therefore, with the highest mAP of 99.64% and an FPS of 39.62, the MobileNet v3-GELU-YOLOv4 model is suitable for use in different types of aquaculture facilities, such as offshore aquaculture platforms, deep-water net cages, and factory farming ponds. Real-time detection of unhealthy farmed fish is possible without additional hardware by simply installing a clear underwater camera and linking it to a device with deep learning processing capacity. Because the model also has a small number of parameters (11,428,545), it is less constrained by hardware and can be widely used to help farmers identify and treat diseased fish in a timely manner, effectively improving production and quality and minimizing economic losses.
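A hedged sketch of such a monitoring loop is shown below, where `detect` stands in for the trained model's inference call and the stream URL is hypothetical.

```python
# Sketch of a real-time monitoring loop: read frames from an underwater
# camera (or RTSP stream), run the detector, and track FPS as 1/T.
# `detect` is a placeholder for the trained model's inference call.
import time
import cv2

def monitor(stream_url: str, detect):
    cap = cv2.VideoCapture(stream_url)          # camera index or RTSP URL
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        t0 = time.perf_counter()
        boxes = detect(frame)                   # healthy/diseased detections
        fps = 1.0 / (time.perf_counter() - t0)  # FPS = 1 / T
        print(f"{len(boxes)} fish detected at {fps:.1f} FPS")
    cap.release()
```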
In deep learning, existing detection and recognition models for fish disease are based on color classification, including the recognition of epizootic ulcerative syndrome (EUS), ichthyophthirius (Ich), and columnaris [25], and the detection of wounds and lice in Atlantic salmon [46]. Compared to color classification-based models, an object detection-based model such as the one proposed in this study can handle more than one detection target with different diseases in the same image, which considerably expands the potential application scenarios of fish disease detection; nevertheless, little research has been conducted on this [47], and there is significant room for improvement [46].

4. Conclusions

Immediate detection and identification of fish skin diseases is important in aquaculture to prevent disease outbreaks that can cause mass fish mortality. In this study, a fish disease detection and recognition system was developed using a coupled-algorithm approach based on the YOLOv4 model. The system targets common issues such as the overlapping of multiple detection targets and the small, indistinct features of fish diseases. Specifically, the system was designed to detect and recognize four common fish diseases, hemorrhagic septicemia, saprolegniasis, benedeniasis, and scuticociliatosis, and has been applied to real-time monitoring of the skin health status of fish in deep-sea cage culture.
Compared with the original YOLOv4 model, the improved MobileNet v3-GELU-YOLOv4 model reduces the parameter count by 52,934,556 and increases the mAP and detection speed by 12.39% and 19.31 FPS, respectively, providing a significant advantage for lightweight network deployment. Using an Nvidia GTX 1060 (6 GB) as the training and test hardware, the detection speed exceeds 30 FPS, which meets the criterion of real-time tracking detection. Owing to the limited availability of underwater images and videos of fish with skin diseases, some collected images may not be sufficiently clear, which may reduce the generalization performance of the model. Nevertheless, the proposed system exhibits good performance in terms of mAP, detection precision, number of parameters, and detection time.
In the future, it is necessary to construct large-scale, high-quality datasets of fish skin diseases to improve the generalization of the network and to incorporate underwater image processing algorithms for real-time recognition and warning of fish disease. In the long term, the development of AI-based fish disease detection and recognition models may represent a paradigm shift in the way real-time monitoring of fish diseases is conducted in aquaculture.

Author Contributions

Conceptualization and supervision, R.W. and J.Z.; Methodology, data curation, formal analysis, and visualization, G.Y. and A.C.; Writing—original draft preparation, G.Y. and J.Z.; Writing—review and editing, G.Y., R.W. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the National Key Research and Development Program of China (Grant No. 2019YFC0312104).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank Junyu Dong and his research group from Ocean University of China for their help with the methodology.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. FAO. Planning for Aquaculture Diversification: The Importance of Climate Change and Other Drivers; FAO Fisheries and Aquaculture Department: Rome, Italy, 2016. [Google Scholar]
  2. FAO. The State of World Fisheries and Aquaculture 2018; FAO Fisheries and Aquaculture Department: Rome, Italy, 2018. [Google Scholar]
  3. Yu, J.-K.; Li, Y.-H. Evolution of Marine Spatial Planning Policies for Mariculture in China: Overview, Experience and Prospects. Ocean. Coast. Manag. 2020, 196, 105293. [Google Scholar] [CrossRef]
  4. FAO. The State of World Fisheries and Aquaculture 2020; FAO Fisheries and Aquaculture Department: Rome, Italy, 2022. [Google Scholar]
  5. Sveen, L.; Timmerhaus, G.; Johansen, L.-H.; Ytteborg, E. Deep Neural Network Analysis—A Paradigm Shift for Histological Examination of Health and Welfare of Farmed Fish. Aquaculture 2021, 532, 736024. [Google Scholar] [CrossRef]
  6. Ina-Salwany, M.Y.; Al-saari, N.; Mohamad, A.; Mursidi, F.-A.; Mohd-Aris, A.; Amal, M.N.A.; Kasai, H.; Mino, S.; Sawabe, T.; Zamri-Saad, M. Vibriosis in Fish: A Review on Disease Development and Prevention. J. Aquat. Anim. Health 2019, 31, 3–22. [Google Scholar] [CrossRef] [PubMed]
  7. Defoirdt, T.; Sorgeloos, P.; Bossier, P. Alternatives to Antibiotics for the Control of Bacterial Disease in Aquaculture. Curr. Opin. Microbiol. 2011, 14, 251–258. [Google Scholar] [CrossRef]
  8. Labella, A.M.; Arahal, D.R.; Lucena, T.; Manchado, M.; Castro, D.; Borrego, J.J. Photobacterium Toruni Sp. Nov., a Bacterium Isolated from Diseased Farmed Fish. Int. J. Syst. Evol. Microbiol. 2017, 67, 4518–4525. [Google Scholar] [CrossRef] [PubMed]
  9. Kristoffersen, A.B.; Qviller, L.; Helgesen, K.O.; Vollset, K.W.; Viljugrein, H.; Jansen, P.A. Quantitative Risk Assessment of Salmon Louse-Induced Mortality of Seaward-Migrating Post-Smolt Atlantic Salmon. Epidemics 2018, 23, 19–33. [Google Scholar] [CrossRef] [PubMed]
  10. Park, J.-S.; Oh, M.-J.; Han, S. Fish Disease Diagnosis System Based on Image Processing of Pathogens’ Microscopic Images. In Proceedings of the 2007 Frontiers in the Convergence of Bioscience and Information Technologies, Jeju, Republic of Korea, 11–13 October 2007; pp. 878–883. [Google Scholar] [CrossRef]
  11. Malik, S.; Kumar, T.; Sahoo, A.K. Image Processing Techniques for Identification of Fish Disease. In Proceedings of the 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP), Singapore, 4–6 August 2017; pp. 55–59. [Google Scholar] [CrossRef]
  12. Li, X.; Shang, M.; Qin, H.; Chen, L. Fast Accurate Fish Detection and Recognition of Underwater Images with Fast R-CNN. In Proceedings of the OCEANS 2015—MTS/IEEE Washington, Washington, DC, USA, 19–22 October 2015; pp. 1–5. [Google Scholar] [CrossRef]
  13. Qin, H.; Li, X.; Liang, J.; Peng, Y.; Zhang, C. DeepFish: Accurate Underwater Live Fish Recognition with a Deep Architecture. Neurocomputing 2016, 187, 49–58. [Google Scholar] [CrossRef]
  14. Lu, H.; Li, Y.; Uemura, T.; Ge, Z.; Xu, X.; He, L.; Serikawa, S.; Kim, H. FDCNet: Filtering Deep Convolutional Network for Marine Organism Classification. Multimed. Tools Appl. 2018, 77, 21847–21860. [Google Scholar] [CrossRef]
  15. Christensen, J.H.; Mogensen, L.V.; Galeazzi, R.; Andersen, J.C. Detection, Localization and Classification of Fish and Fish Species in Poor Conditions Using Convolutional Neural Networks. In Proceedings of the 2018 IEEE/OES Autonomous Underwater Vehicle Workshop (AUV), Porto, Portugal, 6–9 November 2018; pp. 1–6. [Google Scholar] [CrossRef]
  16. Khalifa, N.E.M.; Taha, M.H.N.; Hassanien, A.E. Aquarium Family Fish Species Identification System Using Deep Neural Networks. In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2018, Cairo, Egypt, 3–5 September 2018; Hassanien, A.E., Tolba, M.F., Shaalan, K., Azar, A.T., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 347–356. [Google Scholar] [CrossRef]
  17. Sun, X.; Shi, J.; Liu, L.; Dong, J.; Plant, C.; Wang, X.; Zhou, H. Transferring Deep Knowledge for Object Recognition in Low-Quality Underwater Videos. Neurocomputing 2018, 275, 897–908. [Google Scholar] [CrossRef]
  18. Måløy, H.; Aamodt, A.; Misimi, E. A Spatio-Temporal Recurrent Network for Salmon Feeding Action Recognition from Underwater Videos in Aquaculture. Comput. Electron. Agric. 2019, 167, 105087. [Google Scholar] [CrossRef]
  19. Labao, A.B.; Naval, P.C. Cascaded Deep Network Systems with Linked Ensemble Components for Underwater Fish Detection in the Wild. Ecol. Inform. 2019, 52, 103–121. [Google Scholar] [CrossRef]
  20. Rauf, H.T.; Lali, M.I.U.; Zahoor, S.; Shah, S.Z.H.; Rehman, A.U.; Bukhari, S.A.C. Visual Features Based Automated Identification of Fish Species Using Deep Convolutional Neural Networks. Comput. Electron. Agric. 2019, 167, 105075. [Google Scholar] [CrossRef]
  21. Liawatimena, S.; Atmadja, W.; Abbas, B.S.; Trisetyarso, A.; Wibowo, A.; Barlian, E.; Hardanto, L.T.; Triany, N.A.; Faisal; Sulistiawan, J.; et al. Drones Computer Vision Using Deep Learning to Support Fishing Management in Indonesia. IOP Conf. Ser. Earth Environ. Sci. 2020, 426, 012155. [Google Scholar] [CrossRef]
  22. Zhang, S.; Yang, X.; Wang, Y.; Zhao, Z.; Liu, J.; Liu, Y.; Sun, C.; Zhou, C. Automatic Fish Population Counting by Machine Vision and a Hybrid Deep Neural Network Model. Animals 2020, 10, 364. [Google Scholar] [CrossRef]
  23. Yang, X.; Zhang, S.; Liu, J.; Gao, Q.; Dong, S.; Zhou, C. Deep Learning for Smart Fish Farming: Applications, Opportunities and Challenges. Rev. Aquac. 2021, 13, 66–90. [Google Scholar] [CrossRef]
  24. Ahmed, M.S.; Aurpa, T.T.; Azad, M.A.K. Fish Disease Detection Using Image Based Machine Learning Technique in Aquaculture. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 5170–5182. [Google Scholar] [CrossRef]
  25. Waleed, A.; Medhat, H.; Esmail, M.; Osama, K.; Samy, R.; Ghanim, T.M. Automatic Recognition of Fish Diseases in Fish Farms. In Proceedings of the 2019 14th International Conference on Computer Engineering and Systems (ICCES), Cairo, Egypt, 17–18 December 2019; pp. 201–206. [Google Scholar] [CrossRef]
  26. Sung, M.; Yu, S.-C.; Girdhar, Y. Vision Based Real-Time Fish Detection Using Convolutional Neural Network. In Proceedings of the OCEANS 2017—Aberdeen, Aberdeen, UK, 19–22 June 2017; pp. 1–6. [Google Scholar] [CrossRef]
  27. Cai, K.; Miao, X.; Wang, W.; Pang, H.; Liu, Y.; Song, J. A Modified YOLOv3 Model for Fish Detection Based on MobileNetv1 as Backbone. Aquac. Eng. 2020, 91, 102117. [Google Scholar] [CrossRef]
  28. Hu, X.; Liu, Y.; Zhao, Z.; Liu, J.; Yang, X.; Sun, C.; Chen, S.; Li, B.; Zhou, C. Real-Time Detection of Uneaten Feed Pellets in Underwater Images for Aquaculture Using an Improved YOLO-V4 Network. Comput. Electron. Agric. 2021, 185, 106135. [Google Scholar] [CrossRef]
  29. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  30. Liu, F.; Xu, X.; Qing, C.; Jin, J. Probability Matrix SVM+ Learning for Complex Action Recognition. In Proceedings of the Internet Multimedia Computing and Service, Qingdao, China, 23–25 August 2017; Springer: Singapore; pp. 403–410. [Google Scholar] [CrossRef]
  31. Meng, L.; Hirayama, T.; Oyanagi, S. Underwater-Drone with Panoramic Camera for Automatic Fish Recognition Based on Deep Learning. IEEE Access 2018, 6, 17880–17886. [Google Scholar] [CrossRef]
  32. Hu, J.; Zhao, D.; Zhang, Y.; Zhou, C.; Chen, W. Real-Time Nondestructive Fish Behavior Detecting in Mixed Polyculture System Using Deep-Learning and Low-Cost Devices. Expert Syst. Appl. 2021, 178, 115051. [Google Scholar] [CrossRef]
  33. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence 2020; Volume 34, pp. 12993–13000. [Google Scholar] [CrossRef]
  34. Huang, Z.; Sui, B.; Wen, J.; Jiang, G. An Intelligent Ship Image/Video Detection and Classification Method with Improved Regressive Deep Convolutional Neural Network. Complexity 2020, 2020, 1520872. [Google Scholar] [CrossRef]
  35. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning, Proceedings of Machine Learning Research, Lille, France, 1 June 2015; pp. 448–456. [Google Scholar]
  36. Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv 2016, arXiv:1606.08415. [Google Scholar]
  37. Nguyen, A.; Pham, K.; Ngo, D.; Ngo, T.; Pham, L. An Analysis of State-of-the-Art Activation Functions for Supervised Deep Neural Network. In Proceedings of the 2021 International Conference on System Science and Engineering (ICSSE), Ho Chi Minh City, Vietnam, 26–28 August 2021; pp. 215–220. [Google Scholar] [CrossRef]
  38. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  39. Połap, D.; Wawrzyniak, N.; Włodarczyk-Sielicka, M. Side-Scan Sonar Analysis Using ROI Analysis and Deep Neural Networks. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4206108. [Google Scholar] [CrossRef]
  40. Galusha, A.; Dale, J.; Keller, J.M.; Zare, A. Deep Convolutional Neural Network Target Classification for Underwater Synthetic Aperture Sonar Imagery. In Proceedings of the Detection and Sensing of Mines, Explosive Objects, and Obscured Targets XXIV., SPIE, Baltimore, MD, USA, 10 May 2019; Volume 11012, pp. 18–28. [Google Scholar] [CrossRef]
  41. Akgül, T.; Çalik, N.; Töreyın, B.U. Deep Learning-Based Fish Detection in Turbid Underwater Images. In Proceedings of the 2020 28th Signal Processing and Communications Applications Conference (SIU), Istanbul, Turkey, 5–7 October 2020; pp. 1–4. [Google Scholar] [CrossRef]
  42. Ayob, A.F.; Khairuddin, K.; Mustafah, Y.M.; Salisa, A.R.; Kadir, K. Analysis of Pruned Neural Networks (MobileNetV2-YOLO v2) for Underwater Object Detection. In Proceedings of the 11th National Technical Seminar on Unmanned System Technology 2019, Kuantan, Malaysia, 2–3 December 2019; Md Zain, Z., Ahmad, H., Pebrianti, D., Mustafa, M., Abdullah, N.R.H., Samad, R., Mat Noh, M., Eds.; Springer Nature: Singapore, 2021; pp. 87–98. [Google Scholar] [CrossRef]
  43. Zheng, Z.; Liang, G.; Luo, H.; Yin, H. Attention Assessment Based on Multi-View Classroom Behaviour Recognition. IET Comput. Vis. 2022; early view. [Google Scholar] [CrossRef]
  44. Yu, W.; Ren, P. Vehicle and Pedestrian Target Detection in Auto Driving Scene. J. Phys. Conf. Ser. 2021, 2132, 012013. [Google Scholar] [CrossRef]
  45. Li, D.; Li, X.; Wang, Q.; Hao, Y. Advanced Techniques for the Intelligent Diagnosis of Fish Diseases: A Review. Animals 2022, 12, 2938. [Google Scholar] [CrossRef]
  46. Gupta, A.; Bringsdal, E.; Knausgård, K.M.; Goodwin, M. Accurate Wound and Lice Detection in Atlantic Salmon Fish Using a Convolutional Neural Network. Fishes 2022, 7, 345. [Google Scholar] [CrossRef]
  47. Yasruddin, M.L.; Hakim Ismail, M.A.; Husin, Z.; Tan, W.K. Feasibility Study of Fish Disease Detection Using Computer Vision and Deep Convolutional Neural Network (DCNN) Algorithm. In Proceedings of the 2022 IEEE 18th International Colloquium on Signal Processing and Applications (CSPA), 12 May 2022; IEEE: Selangor, Malaysia, 2022; pp. 272–276. [Google Scholar] [CrossRef]
Figure 1. Study area.
Figure 2. Schematic diagram of the image-acquisition system, image preprocessing, model improvement, model training, and fish skin disease detection.
Figure 3. Data augmentation: (a) the original image; (b,c) data-augmented images.
Figure 4. Schematic diagram of depthwise separable convolution.
Figure 5. Training loss function plots: (a) first three epochs; (b) after three epochs.
Figure 6. Fish skin disease detection on snubnose pompano with hemorrhagic septicemia: (a) input; (b) YOLOv4; (c) MobileNet v1-YOLOv4; (d) MobileNet v2-YOLOv4; (e) MobileNet v3-YOLOv4; (f) MobileNet v3-GELU-YOLOv4.
Figure 7. Detection precision of four fish skin diseases.
Figure 8. Real-time detection of fish skin disease on Korean rockfish in deep-sea cages: (a) input; (b) YOLOv4; (c) MobileNet v1-YOLOv4; (d) MobileNet v2-YOLOv4; (e) MobileNet v3-YOLOv4; (f) MobileNet v3-GELU-YOLOv4.
Table 1. Parameter sizes of the different network models.

Feature Extraction Network    Activation Function   Depthwise Separable Convolution   Number of Parameters
CSPDarknet (original YOLOv4)  ReLU                  False                             64,363,101
MobileNet v1                  ReLU                  False                             40,952,893
MobileNet v2                  ReLU                  False                             39,062,013
MobileNet v3                  ReLU                  False                             39,989,933
MobileNet v1                  ReLU                  True                              12,692,029
MobileNet v2                  ReLU                  True                              10,801,149
MobileNet v3                  ReLU                  True                              11,729,069
MobileNet v3                  GELU                  True                              11,428,545
Table 2. Performance of the network models.

Models                     Feature Extraction Network   Activation Function   Precision/%   Recall/%   mAP/%   FPS
YOLOv4                     CSPDarknet                   ReLU                  95.83         85.39      87.25   20.31
MobileNet v1-YOLOv4        MobileNet v1                 ReLU                  98.42         86.45      94.28   54.14
MobileNet v2-YOLOv4        MobileNet v2                 ReLU                  98.79         94.92      95.47   47.36
MobileNet v3-YOLOv4        MobileNet v3                 ReLU                  98.90         94.83      95.70   38.31
MobileNet v3-GELU-YOLOv4   MobileNet v3                 GELU                  98.98         98.65      99.64   39.62
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

