1. Introduction
Due to accelerating changes in the global environment, the protection of various kinds of fish, especially deep-sea fish, has received more and more attention. To detect and recognize fish, underwater robots equipped with one or more cameras are commonly deployed to obtain images of the underwater environment. As a key research topic in underwater vision, underwater image processing differs greatly from in-air image processing because of color cast, low contrast, noise, and other degradations [1], which make most proven image processing methods for in-air images unsuitable underwater; studies on underwater image processing are therefore necessary and crucial.
The aqueous medium’s strong absorption and scattering of light severely degrade underwater images in both imaging distance and quality; contrast drops dramatically and blurriness increases, so it is hard to extract useful image information [2]. Image enhancement methods are adopted to clarify underwater images, and after years of research and development many enhancement methods have become more targeted and more effective. Since underwater images typically suffer from low contrast and uneven color, Iqbal et al. [3] designed an Unsupervised Color Correction Method (UCM) to address these problems: first, color equalization was applied to correct the color cast; second, the R channel of RGB images was strengthened while the B channel was weakened; finally, the contrast of every channel of the HSI version of the image was stretched. The advantages of UCM were demonstrated through comparison with existing methods such as Gray World (GW) and Histogram Equalization (HE). Ancuti et al. [
4] described an underwater image enhancement strategy based on fusion principles that derives both the inputs and the weight measures solely from the degraded version of the image. This strategy effectively improves the quality of underwater images and videos and brightens dark areas, so overall contrast is raised and edges become clearly visible. A hybrid CLAHE algorithm was proposed by Hitam et al. [5]; it applies contrast-limited adaptive histogram equalization (CLAHE) as the enhancement method and the Euclidean norm as the fusion method to process HSV and RGB images. Experimental results showed that this algorithm can significantly enhance image quality, improve contrast, and reduce noise and artifacts. To counter spectral degradation effects, Ghani and Isa [
6] designed a method that stretches and integrates the histograms of different components in the RGB and HSV spaces. The method achieved satisfactory results in both reducing the blue-green cast and enhancing the image, as demonstrated on 300 test images. Based on the principles of underwater imaging, Fengyun Cao et al. [7] proposed an adaptive weighted enhancement algorithm: white balance and contrast enhancement first produce two input images, illuminance and contrast are then combined to obtain three weight maps, and an adaptive weighted fusion of all weighted images finally yields the output. The algorithm improves the quality of underwater images while remaining real-time, which shows its application value. Huang et al. [
8] adopted an adaptive relative global histogram stretching method, consisting of contrast correction and color correction, to deal with low-quality shallow-water images; it produces shallow-water images of high quality with abundant information, normal color, and low noise. To solve the problems typical of low-quality underwater images, Jia et al. [9] proposed an enhancement method based on the HSI model and an improved Retinex. Both qualitative and quantitative analyses proved that the method improves the contrast of underwater images and highlights their details. A weighted-fusion-based multi-space transformation method for underwater image enhancement was proposed by Du et al. [
10]: Gamma correction, CLAHE, SSR (Single-Scale Retinex), and other traditional algorithms are applied to the corresponding channels of different color models, and the results are then summed with weights. Comparison with three existing methods, in both subjective visual terms and objective evaluation indicators, proved the method’s advantage in correcting color deviation and enhancing image quality. Dong et al. [11] designed a processing method for every channel of the RGB and LAB color spaces of underwater images: in RGB space, a coefficient based on the contrast between normal and abnormal channels is calculated to improve the abnormal channels; in LAB space, fused local histogram enhancement with an exposure cut-off strategy is applied to the L channel to enhance contrast, and a balancing method is used to weigh the difference between the A and B channels. This method produces higher-quality underwater images.
Underwater target image enhancement is the basis of underwater target detection and recognition. In-air target detection and recognition technology is already mature, whereas research on underwater target detection and recognition started relatively late. However, as countries pay increasing attention to ocean development, underwater target recognition technology has also made great progress. Wang Shilong et al. [12] designed a new underwater target recognition system that integrates boundary moment theory with an improved FCM clustering method; recognition experiments on four types of underwater objects proved the system’s effectiveness, stability, reliability, and real-time performance. Shi Min et al. [
13] used the wavelet-transformed underwater target noise signal as the input of a probabilistic neural network to classify targets. The probabilistic neural network did not require training and was very suitable for underwater target recognition. Shi Yang et al. [
14] proposed a recognition method for forward-looking sonar images, comprising image enhancement, image segmentation, and feature classification, with the particle swarm algorithm used to optimize the parameters of a least-squares support vector machine. After 110 rounds of training, the recognition accuracy on the test set stabilized. Zhu et al. [
15] proposed a real-time underwater target recognition method based on prior transformable template matching. The method extracts features from the sonar video sequence to construct a template, uses fast saliency detection to find the target area in the image, and finally applies an affine transformation to the template and computes the similarity between the template and the target area to obtain the normalized gradient feature. An underwater acoustic target recognition method based on multi-dimensional fusion features was put forward by Wang et al. [
16], which uses a Gaussian mixture model to optimize the structure of a deep neural network, reducing redundant features and improving recognition accuracy. Experiments on a dataset with underwater background noise and weak targets achieved an accuracy of 94.3% after 800 iterations. Li Yu et al. [
17] designed an autonomous underwater target recognition system. This system was based on the LeNet-5 network model, used a 13-layer network structure, and was trained in 100 iterations. The final recognition rate reached 99.18%. Chen et al. [
18] proposed an underwater target recognition model based on an improved YOLOv4 network. The model replaces upsampling with deconvolution, fuses depthwise separable convolutions, and preprocesses the training set with an improved mosaic augmentation method. Subjective and objective evaluations showed that the improved network effectively raises the accuracy of underwater target recognition while ensuring real-time performance, and it is no longer overly dependent on hardware performance. Wang et al. [
19] proposed an underwater target recognition method based on a convolutional neural network that optimized the two processes of feature extraction and target classification, and completed related underwater simulation experiments. Experimental results showed that the underwater target recognition accuracy was improved after optimization, which fully proved the effectiveness and innovation of the optimization method.
The method proposed in this paper for real-time fish detection and recognition contains two main parts: underwater image enhancement, and fish detection and recognition. The remainder of this paper is organized as follows. In
Section 2, the image enhancement method is presented; a weighted fusion algorithm is obtained by analyzing and improving several classic algorithms. The fish detection and recognition method is illustrated in
Section 3, the model is established by adding attention modules and lightweight improvements to the YOLOv5m model. Finally,
Section 4 introduces and analyzes the results of underwater simulation experiments.
2. Underwater Image Enhancement
The main problems in underwater image processing include insufficient detail, low contrast, and color distortion. Many image enhancement algorithms deal with these problems effectively. The Gray World algorithm [20] aims to weaken or completely remove the color deviation caused by water absorption; it is based on the assumption that the average pixel values of the R, G, and B channels of an underwater image are approximately equal. Underwater images processed by the traditional Gray World algorithm are greatly improved with respect to the blue-green cast but acquire a certain degree of red artifacts. The CLAHE algorithm [21] not only avoids the block effect of the AHE algorithm but also requires little computation; details of underwater images processed in HSV space with CLAHE are more prominent, while noise remains low. Gamma correction [22] improves image contrast by applying a nonlinear operation to the gamma curve, and variable parameters can dynamically adjust image brightness so that information is extracted more easily. However, when processing images in batches, different images need different parameters, and changing the parameters for every image is clearly unrealistic. Retinex theory [23] aims to minimize the influence of the illumination component on the appearance of the image and to retain the essential attributes (reflectance) of the object to the greatest extent.
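As a concrete illustration of the Gray World assumption described above, a minimal implementation might look like the following sketch (NumPy, float RGB images in [0, 1]; not the exact implementation used in this paper):

```python
import numpy as np

def gray_world(img):
    """Classic Gray World white balance: scale each channel so its
    mean matches the mean gray level of the whole image (a sketch;
    img is a float RGB array in [0, 1] with shape (H, W, 3))."""
    means = img.reshape(-1, 3).mean(axis=0)   # per-channel means
    gray = means.mean()                       # target gray level
    gains = gray / (means + 1e-6)             # per-channel gains
    return np.clip(img * gains, 0.0, 1.0)
```

On a blue-green-cast image, the low R-channel mean yields a gain above 1 for red and gains below 1 for green and blue, which is exactly the mechanism that can over-correct into red artifacts.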
The underwater image enhancement algorithm proposed in this paper is based on traditional classic algorithms and improvements are implemented to meet fish detection and recognition needs.
An improved Gray World algorithm is adopted to acquire better color correction by compensating the red channel [24]: the R channel of the image is compensated first, and the gain of each channel is then calculated according to the traditional algorithm. Following [24], the compensation can be written as

Irc(x) = Ir(x) + α·(Ḡ − R̄)·(1 − Ir(x))·Ig(x)

where Irc(x) represents the pixel value of the compensated R channel, Ir(x) and Ig(x) represent the pixel values of the R and G components of the original image, R̄ and Ḡ are their mean values, and α is a variable parameter that should be dynamically adjusted according to the degree of color deviation of the underwater image.
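A sketch of this red channel compensation (following the general form of the compensation in [24]; the default `alpha` and the formula details are assumptions to be checked against the original reference):

```python
import numpy as np

def compensate_red(img, alpha=1.0):
    """Compensate the attenuated R channel using the better-preserved
    G channel before applying Gray World (illustrative sketch; img is
    a float RGB array in [0, 1], alpha controls the strength)."""
    r, g = img[..., 0], img[..., 1]
    r_mean, g_mean = r.mean(), g.mean()
    # boost red most where it is weak and where green carries signal
    r_comp = r + alpha * (g_mean - r_mean) * (1.0 - r) * g
    out = img.copy()
    out[..., 0] = np.clip(r_comp, 0.0, 1.0)
    return out
```

The `(1 - r)` factor limits the boost in pixels where red is already strong, which helps avoid the red artifacts produced by the plain Gray World algorithm.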
Adaptive Gamma transformation is adopted to solve parameter adjustment issues while processing images in batches. This method is implemented through adaptive parameters, that is, each image determines its own Gamma transformation parameters through its own brightness. First of all, the underwater image is converted to LAB space. After channel separation, the L channel is separately Gamma transformed, and then the channels are merged and converted back to RGB space. This method significantly improves the contrast and clarity of the image while ensuring that the image color is not distorted.
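The paper does not give the exact parameter rule, but a common way to make the gamma value adaptive is to derive it from the image's own mean brightness, for example (a sketch applied to a normalized L channel; the color-space conversion steps are omitted):

```python
import numpy as np

def adaptive_gamma(l_channel):
    """Adaptive Gamma correction of the L channel (illustrative): the
    gamma value is derived from the image's own mean brightness, so a
    bright image (mean > 0.5) gets gamma > 1 and is darkened, while a
    dark image gets gamma < 1 and is brightened. l_channel is a float
    array in [0, 1]."""
    mean = l_channel.mean()
    gamma = np.log(0.5) / np.log(mean + 1e-6)  # maps mean toward 0.5
    return np.power(l_channel, gamma)
```

This choice of gamma pulls the mean brightness toward the mid-gray level 0.5, so each image in a batch determines its own parameter without manual tuning.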
In order to enhance the details of the underwater image and ensure the information content, this article first decomposes the underwater image into a lighting layer and a detail layer based on the Retinex theory and then enhances the detail layer based on the Sigmoid function. In order to ensure that the color of the color image is not affected by this algorithm, the original image needs to be converted to HSV space, and only the V component is decomposed and enhanced.
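The decomposition and Sigmoid-based boost can be sketched as follows (NumPy; a Gaussian blur stands in for the estimated lighting layer, and `sigma` and `gain` are illustrative values, not the paper's tuned parameters):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur implemented with 1-D convolutions."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img, radius, mode='edge')
    rows = np.apply_along_axis(lambda m: np.convolve(m, k, 'valid'), 1, pad)
    return np.apply_along_axis(lambda m: np.convolve(m, k, 'valid'), 0, rows)

def enhance_detail(v, sigma=5.0, gain=5.0):
    """Retinex-style decomposition of the V component (a sketch): the
    blurred image approximates the lighting layer, the residual is the
    detail layer, and an odd sigmoid-shaped gain boosts the details
    before recombination."""
    lighting = gaussian_blur(v, sigma)
    detail = v - lighting
    boosted = 2.0 / (1.0 + np.exp(-gain * detail)) - 1.0  # odd sigmoid in (-1, 1)
    return np.clip(lighting + boosted, 0.0, 1.0)
```

For small detail values the sigmoid behaves like multiplication by `gain / 2`, amplifying fine structure, while its saturation keeps strong edges from being over-boosted.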
Image fusion can perfectly integrate the characteristics of multiple images into the output image, and make the output image more informative and detailed. Proper image fusion can increase the stability and reliability of the image enhancement system. In view of the characteristics of underwater images and based on the above analysis and improvement of these classic enhancement methods, this paper proposes an underwater image enhancement algorithm based on weighted fusion. The workflow is shown in
Figure 1, and the contents in the dotted boxes are the algorithm proposed or improved by this paper.
The steps of the underwater image enhancement algorithm in this paper are as follows:
- (1)
The Gray World algorithm with red channel compensation is used to process the original underwater image, which solves the problem that underwater images usually appear blue-green due to water absorption, bringing the image closer to its real colors.
- (2)
Convert the color-corrected image from RGB space to HSV space and LAB space, respectively, and separate the three channels of the HSV image and LAB image.
- (3)
The CLAHE algorithm is used to process the V component to enhance the contrast of the underwater image, and a color enhancement algorithm is used to process the S component to enhance color saturation. The original H component is combined with the processed S and V components and converted back to RGB space to obtain the first output image I1. The color enhancement algorithm maps the input saturation component to the output through two variable parameters that control the degree of color enhancement; their best values can be determined experimentally.
- (4)
The brightness of the image after color correction is often too high and needs to be reduced. Therefore, the L component is processed through Adaptive Gamma Correction and merged with the original A and B components, and the LAB image is converted back to RGB space to obtain the second output image I2.
- (5)
The V component is processed using the detail enhancement algorithm based on the Sigmoid function to enrich the details of the image. The original H and S components are merged with the enhanced V component, and the HSV image is then converted back to RGB space to obtain the third output image I3.
- (6)
The three images I1, I2, and I3 are weighted and fused to achieve complementary advantages and obtain underwater images of higher quality that are more in line with research needs. The weighted fusion output image Iout is expressed as

Iout = p·I1 + q·I2 + r·I3

where the weighting coefficients p, q, and r are determined experimentally.
- (7)
The multi-scale detail enhancement method based on Gaussian difference [
25] is used to improve the clarity and solve the blur problem caused by weighted fusion. The expression of the multi-scale detail enhancement method based on Gaussian difference is as follows:
D1 = I − G1 ∗ I,  D2 = G1 ∗ I − G2 ∗ I,  D3 = G2 ∗ I − G3 ∗ I

Iout = I + (1 − w1·sgn(D1))·D1 + w2·D2 + w3·D3

where I is the input image, G1, G2, and G3 represent Gaussian kernels with standard deviations of 1.0, 2.0, and 4.0, respectively, and the weights w1, w2, and w3 take the values 0.5, 0.5, and 0.25, respectively. This method first performs Gaussian filtering at three scales, then obtains different levels of detail information by subtracting the filtered results from the original image and from each other, and finally integrates the detail information back into the image to improve its clarity.
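Under the same assumptions (float images in [0, 1]), the multi-scale Gaussian-difference boost can be sketched as follows (the sign-weighting of the finest band follows a common formulation of this method and should be checked against [25]):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur implemented with 1-D convolutions."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img, radius, mode='edge')
    rows = np.apply_along_axis(lambda m: np.convolve(m, k, 'valid'), 1, pad)
    return np.apply_along_axis(lambda m: np.convolve(m, k, 'valid'), 0, rows)

def multiscale_detail(img, w1=0.5, w2=0.5, w3=0.25):
    """Multi-scale detail enhancement via difference of Gaussians
    (a sketch of the method in [25]): three blurred versions at
    sigma = 1, 2, 4 yield three detail bands that are weighted and
    added back to the image."""
    b1, b2, b3 = (gaussian_blur(img, s) for s in (1.0, 2.0, 4.0))
    d1, d2, d3 = img - b1, b1 - b2, b2 - b3
    # the finest band is sign-weighted to avoid over-boosting bright edges
    out = img + (1 - w1 * np.sign(d1)) * d1 + w2 * d2 + w3 * d3
    return np.clip(out, 0.0, 1.0)
```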
Comparative experiments were conducted to prove the effectiveness of the image enhancement algorithm proposed in this paper, and experimental results are shown in
Figure 2 and
Table 1.
In
Figure 2, images of the first column are original underwater images, images of the second column are underwater images processed by the GW algorithm, images of the third column are underwater images processed by the CLAHE algorithm, images of the fourth column are underwater images processed by MSRCR algorithm, and images of the fifth column are underwater images processed by the image enhancement method proposed in this paper. It can be seen that the clarity and contrast of the underwater images processed by the GW algorithm are slightly improved; however, while correcting the blue-green bias of underwater images, it causes severe red artifacts or even significant color distortion in some images. The CLAHE algorithm does not correct the color of underwater images but significantly improves the contrast. The MSRCR algorithm can noticeably enhance the details, clarity, and contrast of underwater images and correct colors to some extent, but it may lead to color distortion. The underwater image enhancement method proposed in this paper can not only effectively improve the quality of underwater images in terms of details, clarity, and contrast but also decrease color distortion and make underwater images closer to real scenarios.
OI is short for Original Images in
Table 1. As can be seen from
Table 1, information entropy values of underwater images processed by the GW algorithm are all smaller than other algorithms but part of UIQM and UCIQE values are larger due to color distortion. Overall, the GW algorithm has less improvement in the quality of underwater images. Information entropy values of underwater images processed by the CLAHE algorithm have increased significantly, UIQM and UCIQE values have also increased to a certain extent, and PSNR values are larger since they have no color correction effect, so the CLAHE algorithm can effectively improve most underwater images’ contrast and clarity. Information entropy, UIQM and UCIQE values of underwater images processed by the MSRCR algorithm are significantly increased, but PSNR values are generally small, which means this algorithm can effectively improve the overall quality of underwater images but may cause distortion problems. Information entropy, UIQM and UCIQE values of underwater images processed by the method of this paper are almost the largest or close to the maximum compared with the other three algorithms. PSNR values are also within an acceptable range. Hence, the method of this article is relatively comprehensive and can effectively enrich images’ information, and increase contrast and clarity while ensuring that processed underwater images’ colors are close to real scenes.
3. Fish Detection and Recognition
With enhanced underwater images obtained from the above section, fish detection and recognition methods can be presented in the following.
3.1. Underwater Target Detection and Recognition Models
Due to the complex underwater environment, and various types and shapes of fishes, traditional underwater target detection and recognition methods have disadvantages of low accuracy and poor versatility. With the rapid rise of deep learning in various fields, it has also been widely used in underwater target detection and recognition. Compared with traditional methods, deep learning methods have higher accuracy and robustness. Among all kinds of deep-learning-based models, a highly representative series of YOLO (You Only Look Once) algorithms [
26,
27,
28,
29,
30] have shown advantages in performing target detection and classification jointly. The overall performance of YOLOv5m, in both detection speed and accuracy, has been proved in many fields, making it suitable for application scenarios requiring high real-time performance and low computation cost.
In addition, since it is difficult to photograph underwater scenes, especially different fish, only a small-sample data set of underwater images is available. Because of the wide variety of fish, training a shallow convolutional neural network on a small-sample data set leads to incomplete learning of target features, i.e., under-fitting, while training a deep convolutional neural network on a small-sample data set leads to over-fitting. To avoid these problems and improve the detection and recognition of underwater targets, this article trains the underwater target detection and recognition model from the pre-trained backbone network of YOLOv5m and integrates transfer learning, image enhancement, and data augmentation methods, as shown in
Figure 3.
Firstly, an appropriate deep convolutional neural network is selected and trained once on a large-sample data set to obtain a pre-trained network model. Secondly, data augmentation is used to expand the underwater target images into the underwater image data set, which is then pre-processed with the underwater image enhancement algorithm proposed in this article. Finally, the parameters of the feature extraction network in the pre-trained model are frozen, the pre-trained network is trained a second time on the enhanced underwater image data set, the parameters of the detection and classification networks are updated, and the final underwater target detection and recognition network is generated.
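The freezing step can be illustrated with the following framework-agnostic sketch (in YOLOv5 code bases, parameter names typically look like `model.<layer>.<...>` and the backbone occupies roughly the first ten sub-modules, but the exact count here is an assumption):

```python
class Param:
    """Minimal stand-in for a framework parameter object (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.requires_grad = True

def freeze_backbone(params, n_backbone_layers=10):
    """Freeze the feature-extraction layers for the second training
    stage by clearing requires_grad on backbone parameters, so only
    the detection and classification heads are updated. Returns the
    number of parameters frozen."""
    frozen = 0
    for p in params:
        # parameter names are assumed to look like "model.3.conv.weight"
        layer = int(p.name.split('.')[1])
        if layer < n_backbone_layers:
            p.requires_grad = False
            frozen += 1
    return frozen
```

In a real training loop, the optimizer is then built only from the parameters that still require gradients.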
3.2. Model Evaluation
For the task of detecting and identifying fish, the trained model must accurately identify fish and determine their location and extent, and on that basis it should maintain the recognition rate while minimizing the probability of false or missed identification. To demonstrate the performance of the basic YOLOv5m model, a simple comparative experiment is carried out among the SSD (Single-Shot Detector), YOLOv4, YOLOv5s, and YOLOv5m models on the self-built underwater image data set, based on transfer learning and image enhancement. All models are trained and tested under the same experimental environment.
According to results shown in
Table 2 the underwater target detection and recognition model trained through YOLOv5m based on transfer learning and image enhancement has the best overall performance. Therefore, this article will adopt YOLOv5m as the basic model to conduct subsequent research on underwater target detection and recognition.
3.3. Model Improvements
Although the original YOLOv5m model achieves proper results in underwater target detection and recognition, there is still room for improvement. On the one hand, an attention mechanism can be introduced to further improve target feature extraction and reduce the false-positive and false-negative rates. On the other hand, the convolutional layer structure can be improved to make the network more lightweight and to reduce model complexity and computing costs.
3.3.1. Improvement of Feature Extraction Layer Based on Attention Mechanism
Since some fish can be very small and easily blend into the background, this article introduces an attention mechanism into the YOLOv5m model to strengthen the feature extraction layer’s ability to learn target features and thus achieve better detection results.
Introducing an attention mechanism tells the model where to put its “attention” so that it does not focus on the image background or other irrelevant parts. Commonly used attention mechanisms include SENet [
31], CBAM [
32], ECA [
33] and CA [
34] with features as follows.
- (1)
SENet puts “attention” on channel information and models the inherent relationships of each channel to make each channel more sensitive to target features.
- (2)
CBAM not only takes into account channel information but also takes spatial information into account; therefore, the introduction of the CBAM attention mechanism can obtain richer feature information.
- (3)
The ECA module does not involve dimensionality reduction operations, ensuring that the underlying features will not be lost, and the information interaction operation effectively reduces the calculation.
- (4)
CA module encodes channel information along the horizontal and vertical directions of the coordinate axis. The CA module can be added to any feature map in the network model to process it.
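To make the channel-attention idea concrete, the forward pass of a squeeze-and-excitation block can be written in a few lines (a NumPy sketch with externally supplied projection weights; a real SENet learns `w1` and `w2` during training):

```python
import numpy as np

def se_block(x, w1, w2):
    """Forward pass of a squeeze-and-excitation block (illustrative).
    x has shape (C, H, W); w1 has shape (C//r, C) and w2 has shape
    (C, C//r) for reduction ratio r. Squeeze: global average pool;
    excitation: two FC layers with ReLU and sigmoid; scale: channel-
    wise reweighting of the feature map."""
    z = x.mean(axis=(1, 2))                # squeeze -> (C,)
    s = np.maximum(w1 @ z, 0.0)            # FC + ReLU -> (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # FC + sigmoid -> (C,)
    return x * s[:, None, None]            # channel re-weighting
```

Because the sigmoid outputs lie in (0, 1), the block can only attenuate channels relative to one another, which is how it makes informative channels stand out.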
Usually, the attention mechanism is added to the backbone of the YOLOv5m network, because this part performs feature extraction for the entire network and contains all the feature information of the detected fish. Adding an attention mechanism re-weights the importance of this information and suppresses useless information, thereby improving the network’s feature extraction capability and detection accuracy. There are accordingly two ways to add the attention module to the backbone: one is to place the module at the end of the backbone; the other is to integrate the attention mechanism into each C3 module to obtain a C3-attention module.
To improve the original network model, an attention module is integrated at different positions and with different structures to explore its effectiveness in detection and recognition. A total of five cases are discussed in this paper, that is, YOLOv5m+SE(pos1), YOLOv5m+SE(pos2), YOLOv5m+ECA, YOLOv5m+CBAM, YOLOv5m+CA. The positions of the backbone to add ECA, CBAM, and CA are the same as SE(pos2). The training data set uses the image-enhanced underwater image data set. The settings of hyperparameters are shown in
Table 3. The improved models are trained for 300 rounds, respectively.
The comparison results are shown below in
Section 3.3.2. Although the size of the model trained by YOLOv5m with the SENet module added to each C3 module is reduced, its detection speed is also significantly reduced and its detection accuracy is not noticeably improved; the mean average precision is only 0.8% higher than that of the original YOLOv5m model. The model obtained by adding the SENet module at the end of the backbone differs little from the original YOLOv5m model in size, computing cost, and detection speed, but its precision, recall, and mean average precision increase by 4%, 4.5%, and 5.4%, respectively. Detection accuracy is thus significantly improved, and the numbers of false and missed detections are effectively reduced. Based on these experimental results, adding the attention module at the end of the backbone is more suitable for underwater fish detection and recognition tasks.
After determining the location to add the attention module, the very attention module should be selected to achieve better performance. This article adds SENet, CBAM, ECA and CA at the end of the YOLOv5m model’s backbone, respectively, and then trains the network model.
Table 4 compares the training results and the performance of the models on the test set to determine which attention mechanism is most suitable for underwater fish detection and recognition. Although the YOLOv5m model with the SENet module shows a large improvement in detection accuracy over the original YOLOv5m model, its overall performance is worse than the models with the other modules; the size and parameter quantity of the YOLOv5m model with the ECA module are basically unchanged from the original model, but its detection accuracy is inferior to the models with the CBAM and CA modules; compared with the CA module, the model with the CBAM module has slightly lower precision, but both its recall and mean average precision are higher, and it also has advantages in size, parameter quantity, and detection speed. In summary, the underwater target detection and recognition model trained with the CBAM module added to YOLOv5m performs best overall: it effectively improves precision and reduces the probability of false and missed detections while keeping the model uncomplicated and the detection speed basically unchanged compared with the original YOLOv5m model.
In order to show the effects of adding a CBAM module in identifying small targets or targets that are easily confused with the background, simulation experiments are conducted here with the original YOLOv5m model and the CBAM-YOLOv5m model to execute target detection and recognition on the same test set. As shown in
Figure 4 with a typical part of the results, after adding the CBAM attention mechanism, small targets that cannot be detected by the original YOLOv5m model are detected and accurately identified by the CBAM-YOLOv5m model in different scenarios.
3.3.2. Lightweight Improvements of CBAM-YOLOv5m Model
To further reduce the model size, parameter quantity, and computation burden of CBAM-YOLOv5m while maintaining precision, a lightweight improvement strategy is implemented in this section: lightweight convolution modules replace standard convolution modules. Commonly used lightweight convolution modules include DSC (Depth-wise Separable Convolution) [
35], GhostConv [
36] and GSConv [
37], etc. Since convolution modules and C3 modules are almost distributed throughout the entire network model, and each module has a large number of parameters, a lightweight design may improve the whole network efficiently.
- (1)
Replacement location selections
The backbone part performs feature extraction; dense convolution operations retain the connections between channels, while sparse convolution operations ignore them. Therefore, although replacing modules in the backbone would reduce the model size and computing cost, the loss of semantic features caused by changes in the number of feature map channels would reduce precision. The feature maps of the neck part are small with many channels and are generally no longer compressed or expanded, so lightweight processing of the convolution and C3 modules of the neck can reduce the model size and computational cost without lowering precision.
- (2)
Choice of lightweight modules
The DSC module may reduce both the detection and recognition accuracy and the detection speed of the model, so it is not considered. The GhostConv and C3Ghost modules effectively reduce the number of parameters and the model size but require more computation, while the GSConv and VoVGSCSP modules reduce parameters and size less effectively but occupy fewer computing resources. Weighing the advantages and disadvantages of each lightweight module, this article selects, in the neck part, the C3Ghost module to replace the parameter-heavy C3 modules, effectively reducing the model size and parameter quantity, and the GSConv module to replace the convolution modules, ensuring that the computing cost does not grow too high. The lightweight CBAM-YOLOv5m network structure is shown in
Figure 5.
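For reference, the two adopted modules can be sketched in PyTorch roughly as follows. This is a simplified sketch following the usual GhostNet and GSConv formulations; the kernel sizes and the SiLU activation are assumptions, not necessarily the exact configuration used in this article.

```python
import torch
import torch.nn as nn

def conv_bn_act(c1, c2, k=1, s=1, g=1):
    # convolution + batch norm + SiLU, the basic YOLOv5-style block
    return nn.Sequential(
        nn.Conv2d(c1, c2, k, s, k // 2, groups=g, bias=False),
        nn.BatchNorm2d(c2),
        nn.SiLU(),
    )

class GhostConv(nn.Module):
    """Half the outputs from a normal conv, the other half from a cheap
    depthwise conv applied to that result, then concatenated."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        c_ = c2 // 2
        self.primary = conv_bn_act(c1, c_, k, s)
        self.cheap = conv_bn_act(c_, c_, 5, 1, g=c_)  # depthwise "ghost" branch

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class GSConv(nn.Module):
    """Standard conv to half the channels, a depthwise conv on the result,
    concatenation, then a channel shuffle to mix the two halves."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        c_ = c2 // 2
        self.dense = conv_bn_act(c1, c_, k, s)
        self.dw = conv_bn_act(c_, c_, 5, 1, g=c_)

    def forward(self, x):
        y = self.dense(x)
        y = torch.cat([y, self.dw(y)], dim=1)
        b, c, h, w = y.shape  # channel shuffle: interleave the two halves
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(1, 64, 40, 40)
print(GhostConv(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
print(GSConv(64, 128)(x).shape)     # torch.Size([1, 128, 40, 40])
```

In both modules only half of the output channels come from a full dense convolution, which is where the parameter savings originate; GSConv's final channel shuffle restores some cross-channel information flow.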
Experiments are then conducted to verify the above analysis of the replacement locations and the choice of lightweight modules.
- (1)
Changes in parameter quantity
Each module’s parameter quantity of the neck structure before and after lightweight improvement is shown in
Table 5. After the lightweight improvement, the parameter quantity of each module in the neck structure is significantly reduced, which effectively reduces the size of the entire model.
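The source of this reduction can be verified with a quick parameter count (a rough sketch; the channel numbers here are illustrative, not the exact ones in Table 5). A standard 3×3 convolution between 256-channel feature maps costs 256 · 256 · 9 = 589,824 weights, while a Ghost-style pair, a 3×3 convolution producing half the channels plus a cheap 5×5 depthwise convolution generating the other half, needs roughly half of that:

```python
import torch.nn as nn

def n_params(m):
    # total number of learnable weights in a module
    return sum(p.numel() for p in m.parameters())

c1, c2 = 256, 256
plain = nn.Conv2d(c1, c2, 3, padding=1, bias=False)

# Ghost-style replacement (parameter container only, not a runnable module):
# a primary 3x3 conv for half the output channels, plus a cheap 5x5
# depthwise conv that generates the other half from the primary result.
ghost = nn.ModuleList([
    nn.Conv2d(c1, c2 // 2, 3, padding=1, bias=False),
    nn.Conv2d(c2 // 2, c2 // 2, 5, padding=2, groups=c2 // 2, bias=False),
])

print(n_params(plain))  # 589824
print(n_params(ghost))  # 298112, roughly half
```

The depthwise branch adds only 128 · 25 = 3,200 weights, so the pair retains about 51% of the original parameter count at the same output width.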
- (2)
Changes in performance
In this paper, under the same experimental conditions, the lightweight model was trained for 300 epochs on a self-built underwater image data set, and the training results were compared with those of the CBAM-YOLOv5m model obtained in the previous section, as shown in
Figure 6. It can be seen that the lightweight improvement has almost no impact on the convergence speed and precision of the CBAM-YOLOv5m model: the change trends of the total loss, recall, and mean average precision are almost the same. This illustrates that the lightweight CBAM-YOLOv5m model designed in this article sacrifices almost nothing in training speed or detection accuracy.
- (3)
Performance comparison of different lightweight methods
In addition, to further demonstrate the advantage of the proposed method, the CBAM-YOLOv5m model, the lightweight model designed here, and models improved by selecting different replacement positions or other modules are compared on the same test set. Experimental results are shown in
Table 6, where serial number 1 is the proposed CBAM-YOLOv5m model and serial number 7 is the lightweight improved model adopted in this article.
Comparing model No. 7 with model No. 1, the lightweight method proposed in this article reduces the size and computing cost of the CBAM-YOLOv5m model by 25% and 20%, respectively, and the detection speed is also improved. Not only does the precision not decline, but the recall and mean average precision increase by 1.1% and 0.7%, respectively. Comparing model No. 7 with model No. 2, although the size and computing cost of model No. 2 are significantly lower, its precision is very low and its detection speed is greatly reduced. Comparing model No. 7 with models No. 3, No. 4, No. 5, and No. 6, the improved model in this article has clear advantages in model size, detection speed, and precision; although its computing cost is slightly higher than that of models No. 5 and No. 6, the difference is small.
- (4)
Fish detection and recognition simulation experiment
Underwater fish detection and recognition simulation experiments are conducted on the same test set using the model in this article and the original YOLOv5m model; part of the detection and recognition results are shown in
Figure 7. From the first and fifth sets of images in
Figure 7, it can be seen that the lightweight model in this article has better detection and recognition capabilities for small fish, and the number of missed fish detections is reduced. It can also be seen from the second set of images that the confidence of the fish detected by the improved model is higher than that of the original model. Additionally, according to the third and fourth sets of images, the number of false detections by the improved model is reduced.
Detection and recognition results of the two models for each type of fish in the test set are also compiled, and the statistics are shown in
Figure 8. It can be seen from
Figure 8 that the improved model has better detection and recognition indicators for the various fish, with only the precision for Zebrasoma scopas decreasing slightly. In the original model, the recall and average precision for Zebrasoma scopas, Naso annulatus, Zanclus cornutus, and Chaetodon kleinii are all low, mainly because the data sets contain small targets, targets that are easily confused with the background, or multiple objects in one image. The improved model's detection and recognition capabilities for these targets are significantly improved.