1. Introduction
Object counting refers to estimating the number of objects in a region of interest, which provides quantitative information to guide subsequent decisions [
1,
2]. It has been applied in the fields of crowd counting [
3], plant counting [
4], and vehicle counting [
5]. Shrimp fry counting is a basic operation for biomass estimation in aquaculture. Accurate counts are used to assess the production and reproductive capacity of mature shrimp, to evaluate the survival rate of the shrimp fry in each tank, to control breeding density, and to guide the management of transportation and sales [
6]. At present, most shrimp fry counting is performed manually, which is time-consuming, laborious, and inaccurate. Manual handling can also easily injure the shrimp fry and affect their normal growth. Therefore, an automated shrimp fry counting method with high accuracy and efficiency is needed.
With the rapid development of artificial intelligence technology, the field of smart aquaculture has emerged, which aims to improve the yield and efficiency of aquaculture through computer vision and deep learning [
7,
8]. Shrimp fry counting, as a research direction of smart aquaculture [
9], is widely favored by researchers and producers for its high efficiency, low cost, and easy operation. With such a method embedded in a terminal device (e.g., a mobile phone), fishermen do not need to know its specific details; they only need to take an image of the shrimp fry to obtain the count automatically. At the same time, our model can also provide more accurate counting results for factory farming.
The existing methods for shrimp fry counting can be divided into two main types: detection-based methods and regression-based methods. Detection-based shrimp fry counting has benefited from strong development in the field of object detection. Zhang [
10] used a lightweight model (LIGHT-YOLOv4) to reduce model complexity. In their experiment, the backbone of YOLOv4 was replaced with MobileNetV3 [
11]. Although the accuracy was reduced by 2%, the model size was reduced to one-sixth of that of the original YOLOv4 model, so the model can be deployed effectively on terminal devices. Feng [
12] addressed the problem of overlapping and adhering fish fry in water by proposing a lightweight deep learning object detection and counting method (YOLOv4-Tiny) with an added attention mechanism (CBAM), which allowed edge computing devices to perform automatic counting with high accuracy. Zhang [
13] proposed a dynamic fish fry counting method to compensate for a shortcoming of the current methods, which are all implemented in static scenarios. They regarded fish fry counting as a multi-object tracking problem based on tracking by detection, combined YOLOv5 with SORT, and improved the SORT algorithm with multi-matching and trajectory recovery; the final tracking accuracy reached 82.6%. The recently proposed YOLOv7 [
14] and YOLOv8 [
15] achieve high accuracy and running speed in object detection, which also provides a reliable solution for object counting. However, small objects such as shrimp fry occupy so few pixels in the image that missed detections are inevitable, resulting in counting errors. Regression-based shrimp fry counting methods, in contrast, use a density map as the training label and integrate (sum) the predicted density map to obtain the number of objects, which can better predict the number of objects in the image. Hu [
16] proposed a counting model for shrimp larvae that draws on the density map estimation used in crowd counting and adds a multi-scale module. The results showed that the accuracy of counting more than 1000 shrimp fry reached 98.72%. Zhang [
17] used a generative adversarial network (CycleGAN) to synthesize the dataset in a way that avoids heavy manual labeling and proposed a shrimp egg counting network (SECNet) for the counting process, with a final accuracy of 99.2%. Li [
18] proposed a fish fry counting method (MSENet) for portable counting devices. The counting datasets NCAUF and NCAUF-ex were constructed to verify the generalization performance of the network, and the final MAE of the model reached 3.33. Hou [
19] improved the multicolumn convolutional neural network (MCNN) for residual bait counting, and experiments showed that the improved MCNN was able to calculate the amount of residual bait efficiently. Liu [
20] proposed ShrimpSeed_Net for shrimp seed counting, which was based on the improved CSRNet and was successfully implemented in smartphones with an accuracy of 95.53%.
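As a minimal sketch of the density-map idea behind the regression-based methods above (not the implementation of any cited work), the following NumPy code builds a density map from point annotations and recovers the count by summing it. The function names, the kernel size, sigma, and the sample points are assumptions chosen for illustration; the 768 × 576 image size matches the dataset described in this paper.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a (size x size) Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def density_map(points, height, width, size=15, sigma=4.0):
    """Place one unit-mass Gaussian at each annotated integer (x, y) point.

    `size` must be odd. Kernels clipped by the image border are renormalized
    so that every annotated object still contributes exactly 1 to the total.
    """
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    dmap = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        # Window of the kernel that falls inside the image.
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        x0, x1 = max(0, x - half), min(width, x + half + 1)
        patch = kernel[y0 - (y - half):y1 - (y - half),
                       x0 - (x - half):x1 - (x - half)]
        dmap[y0:y1, x0:x1] += patch / patch.sum()
    return dmap


# Hypothetical annotations: three shrimp fry, two of them at image corners.
points = [(10, 20), (0, 0), (767, 575)]
dmap = density_map(points, height=576, width=768)
count = float(dmap.sum())  # integrating the density map gives the count
print(round(count))  # 3
```

In a trained network the density map is predicted rather than built from annotations, but the count is obtained in the same way, by summing the map.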
As research has deepened, many emerging structures have brought significant improvements in model performance. Multi-scale structures integrate feature maps at different scales so that the network can learn global features while improving its ability to learn local information. In smart aquaculture, past studies have also incorporated multi-scale structures into their models to improve performance. Zhang [
21] analyzed fish feeding behavior, using MobileNetV3 as the backbone and improving the channel attention module based on multi-scale information fusion. They fused the multi-scale feature map with the original image through down-sampling, which effectively enhanced the attention to small targets and yielded high classification accuracy for feeding intensity. Yu [
22] designed a multi-scale attention mechanism to improve the accuracy of fish counting, using convolutional layers with different kernel sizes to obtain receptive fields at different scales in parallel. Wang [
23] used U-Net [
24] as the backbone to construct the Multi-scale with Dilated convolution and Offset Attention U-Net (MDOAU-Net), which used multi-scale feature fusion blocks to extract the features of the original input; the method effectively promoted the fusion of different feature maps, and the experimental results demonstrated its superior performance compared to seven existing methods. In addition, attention mechanisms allow a model to focus on the important parts of the image. Li [
25] designed a Synergistical Attention Module (SAM), which allowed channel affinity extraction while preserving spatial details, and embedded the module into a Synergistical Attention Perception Network (SAPNet) for the semantic segmentation of remote sensing images, so that the network enriched the inference clues through the required spatial and channel details. The experiment verified the efficiency of the SAM. In order to solve the problem of fish counting in high-density scenarios, Chen [
26] added an attention network to the model, which included a nonlinear batch-normalized residual block, a convolutional layer, and two parallel independent convolutional layers. Yu [
27] proposed a deep learning network model based on a multi-module and attention mechanism (MAN) to determine farmed fish counts. It included a feature extraction module, an attention module, and a density estimation module. The experiments showed that the method based on an MAN could promote the exploration of correlations in dense fish counting.
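The two ideas discussed above, multi-scale feature fusion and channel attention, can be illustrated with a small NumPy sketch. This is a generic illustration under stated assumptions, not the architecture of any cited work or of the SFCNet: the scales are produced by average pooling rather than by convolutions with different kernel sizes, the attention gate follows a squeeze-and-excitation style design, and the weights `w1` and `w2` are random stand-ins for learned parameters. All function names are hypothetical.

```python
import numpy as np


def avg_pool(feat: np.ndarray, k: int) -> np.ndarray:
    """Average-pool a (C, H, W) map with a k x k window and stride k.

    H and W must be divisible by k.
    """
    c, h, w = feat.shape
    return feat.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


def upsample(feat: np.ndarray, k: int) -> np.ndarray:
    """Nearest-neighbour upsampling by a factor k along H and W."""
    return feat.repeat(k, axis=1).repeat(k, axis=2)


def multi_scale_fuse(feat: np.ndarray, scales=(1, 2, 4)) -> np.ndarray:
    """Concatenate the input with coarser-scale copies along the channel axis."""
    branches = [upsample(avg_pool(feat, k), k) for k in scales]
    return np.concatenate(branches, axis=0)


def channel_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Reweight each channel by a gate in (0, 1) computed from global statistics."""
    squeezed = feat.mean(axis=(1, 2))            # global average pool, shape (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)      # bottleneck layer with ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate, shape (C,)
    return feat * gate[:, None, None]


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))      # toy feature map: 8 channels, 16 x 16
fused = multi_scale_fuse(feat)               # 3 scales x 8 channels = 24 channels
w1 = rng.standard_normal((6, 24)) * 0.1      # random stand-in for learned weights
w2 = rng.standard_normal((24, 6)) * 0.1
out = channel_attention(fused, w1, w2)
print(fused.shape, out.shape)  # (24, 16, 16) (24, 16, 16)
```

The coarse branches summarize larger neighbourhoods (global context), the scale-1 branch keeps local detail, and the gate lets the model emphasize whichever channels are most informative.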
In this paper, a shrimp fry counting model based on a fully convolutional neural network (SFCNet) is proposed. The model adopts a regression-based method and can accurately count shrimp fry in breeding tanks; its counting performance is the best when compared with four other traditional CNN counting networks. The main contributions of this paper are as follows:
A shrimp fry dataset was collected and labeled. It contains 556 images, of which 390 were used as the training set, 63 as the validation set, and 103 as the test set. The image resolution is 768 × 576;
A shrimp fry counting network based on multi-scale attention fusion (SFCNet) is proposed, which uses VGG-16 as the frontend to accept images and uses a multi-scale structure and attention mechanism in the backend to improve the global modeling and local information extraction ability of the model. Finally, it outputs a density map with the same size as the original image;
Our SFCNet achieved the best performance (MAE: 3.96, RMSE: 4.68) compared with the baseline models.
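For reference, the two error metrics quoted above can be computed as in the following sketch. It assumes the standard definitions of mean absolute error and root mean squared error over per-image counts, which this section does not spell out; the sample counts are made up for illustration and are not results from the paper.

```python
import numpy as np


def mae(pred_counts, true_counts) -> float:
    """Mean absolute error between predicted and ground-truth counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(pred - true)))


def rmse(pred_counts, true_counts) -> float:
    """Root mean squared error; penalizes large per-image errors more than MAE."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


# Hypothetical per-image counts (not data from the paper).
predicted = [102, 98, 205]
ground_truth = [100, 100, 200]
print(mae(predicted, ground_truth))   # 3.0
print(rmse(predicted, ground_truth))  # sqrt(11), about 3.317
```

RMSE is never smaller than MAE on the same data, so a gap between the two indicates how unevenly the errors are spread across images.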
The remainder of this paper is organized as follows:
Section 2 describes our main steps from image acquisition to model construction, together with details of the model training process;
Section 3 lists the main results of our experiments;
Section 4 discusses the potential limitations of our current work and directions for future study;
Section 5 summarizes our work.
4. Discussion
Compared with time-consuming and labor-intensive manual counting, a shrimp fry counting network built with deep learning provides a more effective means of evaluating shrimp fry growth status, estimating adult shrimp yield, and managing transportation in aquaculture. Most previous studies on counting tasks use detection-based methods. Their limitations are that objects that are too small or that occlude each other lead to missed detections, while other objects that resemble the counting targets cause false detections. The regression-based counting method can effectively deal with these problems by modeling the image globally and integrating the output density map to obtain the count, even when shrimp fry are occluded by each other or are very small. To analyze our model objectively, we also discuss the limitations of the current work and directions for future study.
4.1. Potential Limitations of Current Work
Although our SFCNet has a lower counting error than other traditional CNN models, the proposed method has three aspects that need improvement: (1) our method targets the shrimp fry stage and cannot yet count shrimp in different environments or at different breeding periods; (2) compared with the traditional CNN models, the computational resources and inference time of the SFCNet are slightly increased, although the increase is controllable and acceptable in practical applications. For example, on the shrimp fry counting task, CSRNet requires 12.7 MB with an average inference time of 46 ms on the test set, whereas the SFCNet requires 44.1 MB and 90 ms; (3) since the number of shrimp fry per image in the dataset we constructed was mainly in the hundreds, the SFCNet shows good counting accuracy in low-density scenarios. However, we are aware that in actual production environments the density of shrimp fry can vary greatly, especially in high-density farming, and the model's performance in such scenarios has not yet been comprehensively evaluated.
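To put the figures in point (2) in perspective, the short calculation below derives the relative overhead and the implied throughput. It uses only the numbers quoted above plus the 103-image test set size; the dictionary keys and variable names are hypothetical, and the per-second throughput assumes images are processed one at a time at the reported average latency.

```python
# Figures reported in Section 4.1 (resource figure in MB, average inference time in ms).
models = {
    "CSRNet": {"resource_mb": 12.7, "infer_ms": 46.0},
    "SFCNet": {"resource_mb": 44.1, "infer_ms": 90.0},
}
test_images = 103  # size of the test set

# Relative overhead of the SFCNet with respect to CSRNet.
resource_ratio = models["SFCNet"]["resource_mb"] / models["CSRNet"]["resource_mb"]
time_ratio = models["SFCNet"]["infer_ms"] / models["CSRNet"]["infer_ms"]

# Images per second and total time for the test set, assuming sequential inference.
throughput = {name: 1000.0 / m["infer_ms"] for name, m in models.items()}
total_seconds = {name: test_images * m["infer_ms"] / 1000.0 for name, m in models.items()}

print(f"resource ratio: {resource_ratio:.2f}x, time ratio: {time_ratio:.2f}x")
for name in models:
    print(f"{name}: {throughput[name]:.1f} images/s, {total_seconds[name]:.2f} s for the test set")
```

Under these assumptions the SFCNet uses roughly 3.5 times the resource figure and about twice the inference time of CSRNet, which still corresponds to about 11 images per second.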
4.2. Future Study
In view of the limitations analyzed in Section 4.1, our future studies will focus on the following aspects: (1) to monitor the quantity of different species of shrimp fry in different breeding environments, the model will be extended to different species and environments to enhance its robustness and applicability; (2) we will continue exploring ways to reduce the model's computing resources and inference time while improving its counting performance, such as adjusting the network structure and hyperparameter settings, to make the model more lightweight; (3) to evaluate the counting performance of the SFCNet in high-density shrimp fry scenarios comprehensively, we plan to add more high-density shrimp fry image data in future studies and optimize and adjust the model accordingly. By expanding the scope and diversity of the dataset, we can more accurately simulate the complexity of the actual farming environment, allowing for a more comprehensive assessment of the model’s generalization ability and counting accuracy.