Article

A Multi-Step Image Pre-Enhancement Strategy for Fish Feeding Behavior Analysis Using EfficientNet

Key Laboratory of Fisheries Information, Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Hucheng Ring Road 999, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 5099; https://doi.org/10.3390/app14125099
Submission received: 13 April 2024 / Revised: 5 June 2024 / Accepted: 9 June 2024 / Published: 12 June 2024
(This article belongs to the Section Marine Science and Engineering)

Abstract

To enhance the accuracy of lightweight CNN classification models in analyzing fish feeding behavior, this paper addresses the image quality issues caused by external environmental factors and lighting conditions, such as low contrast and uneven illumination, by proposing a Multi-Step Image Pre-Enhancement Strategy (MIPS). This strategy comprises three critical steps. First, images undergo preliminary processing with the Multi-Scale Retinex with Color Restoration (MSRCR) algorithm, which effectively reduces the impact of water surface reflections and enhances the visual effect of the images. Second, the Multi-Metric-Driven Contrast Limited Adaptive Histogram Equalization (mdc) technique further improves image contrast, adjusting local contrast levels to bring out detail in low-contrast regions. Finally, Unsharp Masking (USM) sharpens the images, emphasizing their edges to increase the clarity of image details and thereby significantly improving overall image quality. Experimental results on a custom dataset confirm that this pre-enhancement strategy significantly boosts the accuracy of various CNN-based classification models, particularly lightweight ones, and drastically reduces the time required for model training compared with advanced ResNet models. This research provides an effective technical route for improving the accuracy and efficiency of image-based analyses of fish feeding behavior in complex environments.

1. Introduction

In the aquaculture industry, ensuring the precise feeding of bait is crucial [1,2,3,4] as it directly impacts the healthy growth of fish stocks. Traditional feeding methods mainly include manual feeding and timed, quantitative mechanical feeding. Manual feeding relies on the experience of aquaculture personnel and observations of fish stocks to determine the quantity of feed needed. However, this method is dependent on individual subjective judgment, requires a high level of experience, and is labor-intensive, making it difficult to promote for use in large-scale, industrialized farming. Another method involves the use of timed feeders to dispense feed, which overlooks the variations in the feeding patterns of fish stocks, potentially leading to inappropriate feeding quantities and timings that contradict the growth patterns of the fish. This may result in the bait not being fully utilized, leading to water pollution [5]. Poor water quality treatment can have a detrimental effect on the growth of fish and may even lead to mass mortality. Therefore, improving the precision and adaptability of feeding becomes an important challenge in the management of aquaculture.
In recent years, the development of automated feeding systems for fish stocks based on intelligent technologies and sensors [6] has emerged as a new focus. These systems are capable of dynamically adjusting feeding quantities and timings [7] based on the needs of the fish stocks, which aids in enhancing aquaculture efficiency, reducing resource wastage, and ensuring the healthy growth of fish stocks. It is anticipated that these systems will gradually replace experience-based manual feeding decisions in the future.
The spatial characteristics [8,9] of fish stocks undergo significant changes during the feeding process, making fish feeding behavior an important indicator of fish appetite. Among various behavior analysis methods, computer vision [10] is an efficient, non-contact detection [11,12] technology which has been widely applied in the analysis of fish behavior.
Traditional methods primarily quantify the intensity of fish feeding activities by extracting color and texture features from feeding images. Numerous studies utilize image segmentation [13] or object detection techniques to extract fish coordinates and calculate their swimming speeds [14] and aggregation levels for the analysis of fish feeding behavior per unit time. These methods can rapidly capture the behavioral characteristics of fish. However, considering the complexity of lighting conditions in industrialized farming and the randomness of fish behavior changes [15], conventional methods based on manually designed features struggle to accurately extract the characteristics of fish feeding behavior.
Since AlexNet's [16] victory in the 2012 ImageNet competition, convolutional neural networks (CNNs) have garnered widespread attention. Unlike traditional feature extraction methods, CNNs do not require manually designed features, and in complex tasks they demonstrate higher robustness and accuracy than conventional methods. In recent years, an increasing number of deep learning models have been applied in aquaculture. Zhou et al. [17] proposed a method for analyzing fish feeding activities using LeNet-5, which combines a CNN and computer vision, categorizing fish feeding intensity into four states with a classification accuracy of 90% and effectively assessing fish feeding behavior. Hu et al. [18] employed deep learning technologies, including an improved R(2 + 1)D convolutional neural network, to identify the size of water ripples produced during fish feeding and thereby decide whether to continue feeding; they developed a computer vision-based intelligent fish feeding system that not only recognizes the dynamics of water ripples but also integrates data from water quality sensors, achieving an accuracy of 93.2%. Zheng et al. [19] utilized a Spatio-Temporal Attention Network (STAN) to analyze the feeding behavior of Pompano fish stocks. By innovatively combining spatial images [20] with optical flow images [21,22,23] for video analysis, their study applied the STAN model to extract intuitive and perceptual features to determine the feeding or non-feeding states of Pompano fish stocks, incorporating a hierarchical convolutional network (HCN) to extract multi-scale spatial features; experimental validation showed that the STAN model achieved a test accuracy of 97.97%. ResNet (Residual Network [24]), another widely used deep learning model, has also shown its effectiveness in fish feeding behavior classification tasks. By utilizing residual connections to prevent the vanishing gradient problem, ResNet models can achieve commendable accuracy; however, they typically possess a large number of parameters, necessitating more computational resources and training time, which makes them less suitable for resource-constrained small devices such as smartphones.
In practice, the quality of fish feeding behavior images is affected by factors such as the fish pond environment and lighting conditions, leading to issues like low contrast and uneven illumination that leave foreground features less prominent. Additionally, because individual fish occupy a relatively small area in the images, CNN models can easily be disrupted by background features during feature extraction. This makes it challenging for traditional CNN models to focus on fish behavior [25,26,27,28,29], reducing classification accuracy.
To tackle this issue, our research introduces a multi-step image pre-enhancement strategy that sequentially employs the Multi-Scale Retinex with Color Restoration (MSRCR), Multi-Metric-Driven Contrast Limited Adaptive Histogram Equalization (mdc), and Unsharp Masking (USM) techniques for image pre-enhancement. This approach is designed to eliminate problems related to water reflections, low contrast, and unclear image detail features. Following this, the lightweight [30,31,32] EfficientNet model is utilized for classifying fish feeding behaviors. The test results demonstrate that the proposed Multi-Step Image Pre-Enhancement Strategy (MIPS) module significantly enhances the accuracy of various CNN models in classification tasks. Furthermore, compared to advanced ResNet [33] models, it markedly reduces the training time of the model.
The key contributions can be summarized as the following three points:
(1) This study proposes a multi-step image pre-enhancement strategy by integrating existing techniques such as color space conversion, Multi-Scale Retinex with Color Restoration (MSRCR), Contrast Limited Adaptive Histogram Equalization (CLAHE), and Unsharp Masking (USM). This integration significantly improves the quality of underwater fish feeding behavior images.
(2) The main improvements include two parts: First, an innovative multi-parameter-driven CLAHE method is devised, which involves the detailed processing of only the luminance channel in the LAB color space. By carefully optimizing the clipLimit and tileGridSize parameters, this method balances local detail enhancement and overall visual coherence. Second, an adaptive image enhancement adjustment mechanism is proposed, based on comprehensive evaluation metrics, including Learned Perceptual Image Patch Similarity (LPIPS), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM). This mechanism dynamically adjusts the degree of image enhancement, ensuring that the results meet both subjective visual perception and objective quality standards.
(3) The present study presents a comprehensive system architecture for classifying the feeding intensity of fish. By applying a multi-step image pre-enhancement strategy to optimized models from the EfficientNet and ResNet series, a significant improvement in the accuracy of classifying fish feeding behavior intensity was achieved, validating the effectiveness and practical value of this strategy.

2. Methodology

2.1. Overview

The methodological framework proposed in this paper, as illustrated in Figure 1, primarily consists of three modules: data collection and processing, image pre-enhancement, and classification based on lightweight CNN models.
The data collection and processing module involves first acquiring videos of fish feeding behavior in a production environment, followed by frame selection according to specific rules to form an image dataset. Within this module, data augmentation techniques such as random cropping and color enhancement are employed to increase the diversity of image samples and the model’s adaptability to geometric transformations and illumination changes. This, in turn, enhances the model’s accuracy and generalization ability in recognizing fish feeding behavior in complex natural environments. The image pre-enhancement module includes three main steps: (1) MSRCR, which is primarily used to address issues of water surface reflections in images [34,35]; (2) the Multi-Metric-Driven Contrast Limited Adaptive Histogram Equalization (mdc) technique, which aims to improve the contrast between fish, bait, and the background in the images; and (3) Unsharp Masking (USM) sharpening [36], which is employed to enhance the details within the images.

2.2. Data Collection

The experiment was conducted within a real aquaculture system at the Hai’an aquaculture base, which houses approximately 500 Koi fish, each weighing between 1200 and 1500 g. Throughout this study, the aquaculture pond environment was consistently maintained with dissolved oxygen levels at 7.1 ± 1.0 mg/L and a water temperature within the range of 12–15 °C. Prior to the commencement of the experiment, the fish were trained to feed at a fixed location to facilitate subsequent data collection. Feeding was carried out twice daily, at 9:30 a.m. and 5:30 p.m., with each feeding amount set to 1.5–2.0% of the total weight of the fish stock.
The image collection system ensures optimal shooting results through the rational layout of light sources and cameras and the use of wide-angle lenses, as illustrated in Figure 2. First, the light source is located directly above the breeding area at a height of approximately 1.2 m, where it evenly illuminates the entire water surface and the fish, ensuring the clarity and detail of the images. Second, the camera is mounted vertically above the breeding pool, also at a height of approximately 1.2 m, which places it about 1 m above the water surface. This height and angle cover the entire breeding pool, ensuring that all fish activities are captured without dead corners or blind spots and enabling a comprehensive observation and analysis of fish behavior.
The video data collected were encoded in MP4 format, featuring a frame rate of 30 frames per second and a resolution of 1280 × 720 pixels. Image processing was performed using Python, with the PyTorch library utilized for constructing neural networks. The captured videos were segmented into 452 sub-videos, each containing 300 frames. Video frames were extracted from the collected video data at intervals of 100 frames to construct a dataset of fish stock images. Subsequently, the collected dataset of fish stock images was categorized and labeled into four classes based on the fish’s feeding behaviors and the amount of feed dispensed.
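To make the frame-selection step concrete, the sketch below shows one way the 100-frame sampling could be implemented with OpenCV; the file paths, naming scheme, and helper name are illustrative assumptions rather than the authors' original code.

```python
# Hypothetical sketch of the frame-extraction step: sample one frame every
# 100 frames from each sub-video and save it as a JPEG for the image dataset.
import os
import cv2

def extract_frames(video_path: str, out_dir: str, interval: int = 100) -> int:
    """Save every `interval`-th frame of `video_path` into `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            stem = os.path.splitext(os.path.basename(video_path))[0]
            cv2.imwrite(os.path.join(out_dir, f"{stem}_{idx:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```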
As shown in Table 1, the behaviors of the fish vary significantly under different feeding states. At the initial stage of feeding, the fish exhibit a strong desire to feed, competing for bait while causing disturbances such as splashes and noise. Approximately 1–2 min later, as the bait begins to diminish, a portion of the fish ceases feeding, and the overall feeding intensity of the stock [37,38] decreases to a moderate level. After another minute, with a further reduction in bait, only a few fish continue to show feeding interest, and the school begins to disperse further. Around 4 min into the process, almost all fish lose their desire to feed and start to disperse, swimming slowly along the bottom. Images representing different categories are illustrated in Figure 3.
To enhance the diversity of the data samples and thereby improve model performance, several preprocessing operations are performed on the images of the fish stock. Initially, the images are subjected to random aspect ratio cropping, where they are randomly cropped into various sizes and aspect ratios. Under this premise, these images are then randomly rotated by changing their rotation angles and centers. Subsequently, the rotated images are trimmed to a uniform size. Finally, the images of the fish stock undergo random alterations in brightness, contrast, saturation, and hue.
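These augmentations map naturally onto standard torchvision transforms; the composition below is a minimal sketch, with all parameter values (crop scale, rotation range, jitter strengths) chosen for illustration rather than taken from the original experiments.

```python
# Illustrative torchvision pipeline for the augmentations described above:
# random aspect-ratio cropping, random rotation, trimming to a uniform size,
# and random brightness/contrast/saturation/hue changes.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0), ratio=(0.75, 1.33)),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),
    transforms.ToTensor(),
])
```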

2.3. Multi-Step Image Pre-Enhancement Strategy

As illustrated in Figure 4, the multi-step image pre-enhancement strategy systematically improves the quality of underwater photography. The original image shows a school of fish captured under natural lighting in an underwater environment, affected by surface reflection and an uneven distribution of light. After processing with the MSRCR algorithm, the impact of surface reflection is reduced and, through dynamic range compression and color restoration, the visual effect and color fidelity are enhanced. The subsequent mdc step optimizes local contrast, particularly improving the discernibility of details in low-contrast regions. Finally, Unsharp Masking improves local contrast and edge clarity, especially along the boundary between the fish and their background, thereby enhancing the visual resolution. The entire pre-enhancement process not only significantly improves image quality but also provides clear, enhanced visual information for deeper image analysis and fish behavior recognition without sacrificing color authenticity.

2.3.1. Multi-Scale Retinex with Color Restoration (MSRCR)

In response to the challenges posed by lighting conditions on the fidelity of images capturing piscine feeding behaviors, particularly the pronounced problem of specular reflections on water surfaces, this study advocates the employment of the Multi-Scale Retinex with Color Restoration (MSRCR) algorithm as a remedial measure. The MSRCR algorithm facilitates the amelioration of water surface reflection issues by executing dynamic range compression, augmenting edge definition, and reinstating authentic coloration. Through these mechanisms, MSRCR significantly diminishes the discrepancies induced by uneven luminance, thereby ensuring the preservation of image consistency and verisimilitude across a spectrum of lighting conditions.
The MSRCR algorithm processes images by combining filters of multiple scales (or sizes) to better address variations in illumination across different scales. The algorithm includes the following steps:
  • Multi-Scale Decomposition: The original image undergoes decomposition into multiple scales, typically facilitated by Gaussian filters. Each scale is associated with a Gaussian filter of distinct size, designed to capture illumination details pertinent to that specific scale.
  • Single-Scale Retinex (SSR) Processing: The Retinex algorithm is individually applied to each scale. This phase involves the computation of the logarithmic domain discrepancy between the original image and its filtered counterpart, effectively highlighting the luminance contrast while mitigating illumination inconsistencies.
  • Combination of Processed Results: The SSR outputs from all scales are combined as a weighted sum, as given in Equation (1) below; with equal weights $w_i = 1/N$, this reduces to a simple arithmetic mean and yields the final MSRCR output.
  • Color Restoration: Given that MSRCR processing may inadvertently alter the color balance of an image, a color restoration function is implemented to preserve the authenticity of the original hues. This ensures that the resultant image maintains a natural appearance, notwithstanding the algorithmic modifications to its luminance and contrast.
The formula for MSRCR can be expressed as follows:
$$R(x, y) = \sum_{i=1}^{N} w_i \cdot \log\left(\frac{I(x, y)}{F_i(x, y)}\right) \tag{1}$$
In Equation (1), $R(x, y)$ represents the luminance at pixel point $(x, y)$ in the Retinex output image, $I(x, y)$ denotes the luminance of the corresponding pixel in the original image, $F_i(x, y)$ refers to the filtered image at the $i$-th scale, $w_i$ is the weight corresponding to that scale, and $N$ signifies the total number of scales.
The MSRCR algorithm is particularly suited for images captured under uneven lighting conditions, such as scenarios where shadowed areas are overly dark or illuminated regions are excessively bright. By adjusting the local contrast of the image, MSRCR is capable of rendering the details in these areas more visible and discernible. Furthermore, this algorithm excels in maintaining or enhancing the natural colors of the image, resulting in colors that appear more vivid and authentic.
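For illustration, a compact NumPy/OpenCV rendering of Equation (1) with the color restoration step is given below. The scale sigmas, the alpha and beta constants, and the final min-max stretch are common defaults from the Retinex literature, not values reported in this paper.

```python
# Minimal MSRCR sketch (assumed defaults): multi-scale log-ratio Retinex
# followed by a color restoration factor and a min-max stretch to 8 bits.
import cv2
import numpy as np

def msrcr(bgr, sigmas=(15, 80, 250), alpha=125.0, beta=46.0):
    img = bgr.astype(np.float64) + 1.0                  # avoid log(0)
    # Multi-scale Retinex: equally weighted sum of single-scale log-ratios
    retinex = np.zeros_like(img)
    for sigma in sigmas:
        blur = cv2.GaussianBlur(img, (0, 0), sigma)     # F_i(x, y)
        retinex += (np.log(img) - np.log(blur)) / len(sigmas)
    # Color restoration: weight each channel by its share of total intensity
    color = beta * (np.log(alpha * img) - np.log(img.sum(axis=2, keepdims=True)))
    out = retinex * color
    out = (out - out.min()) / (out.max() - out.min()) * 255.0
    return out.astype(np.uint8)
```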

2.3.2. Multi-Metric Driven CLAHE and Unsharp Masking

In aquatic environments, illumination is often uneven, leading to areas within images that are either too dark or too bright, with indistinct details. Contrast Limited Adaptive Histogram Equalization (CLAHE) [39,40,41] enhances the clarity of these regions by improving the contrast within local areas. Unlike conventional histogram equalization, CLAHE limits the extent of contrast enhancement to prevent issues of excessive enhancement, resulting in more natural contrast improvements and clearer details, particularly in areas of low contrast. Underwater environments typically exhibit low contrast, especially in turbid waters or situations marred by suspended particulates causing poor visibility. CLAHE effectively enhances local contrast, rendering the boundaries of fish, food particles, and the surrounding environment more discernible.
The application of CLAHE in this study fundamentally achieves the fine-tuning of contrast through a series of sequential operational steps, reflecting its practical operability as an image enhancement tool. Its key steps can be summarized as follows:
  • Histogram Clipping: If the value of a histogram bin exceeds a predefined contrast limit, the excess is uniformly distributed across other bins. This step prevents any single pixel value from becoming overly prominent, thereby avoiding unnatural image effects.
  • Histogram Equalization: The clipped histogram is equalized using the Cumulative Distribution Function (CDF). The equalization formula is typically as follows:
$$S_k = \sum_{i=0}^{k} P_i \tag{2}$$
In Equation (2), $S_k$ represents the pixel value after equalization, and $P_i$ denotes the probability of pixel level $i$ in the original histogram. This step redistributes pixel values, making the brightness distribution of the image more uniform and enhancing the overall contrast while keeping the result natural, so that no single pixel value becomes overly prominent.
As illustrated in Figure 5, the specific algorithmic design process highlights two primary adjustable parameters within CLAHE: clipLimit and tileGridSize.
clipLimit: This parameter controls the threshold for histogram clipping. It determines the extent of contrast enhancement within each small block. If clipLimit is set lower, the contrast enhancement will be milder, which will help reduce noise but may not sufficiently enhance the image contrast. If clipLimit is set higher, the contrast enhancement will be stronger, which could introduce more noise or create unnatural image effects.
tileGridSize: This parameter determines how many small blocks (tiles) the image is divided into for individual histogram equalization; it is typically specified as two values (width and height). Smaller tile sizes provide more detailed local enhancement but may produce visible artifacts at tile borders, whereas larger tile sizes yield less pronounced local enhancement but a more uniform overall effect.
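A single application of CLAHE to the luminance channel, with these two parameters exposed, can be written as follows; the values 2.0 and (8, 8) are OpenCV's customary defaults, used here only to illustrate the trade-offs discussed above.

```python
# Basic CLAHE on the L channel of the LAB color space, leaving color intact.
import cv2

def clahe_lab(bgr, clip_limit=2.0, tile_grid_size=(8, 8)):
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)
    l_eq = clahe.apply(l)                       # equalize luminance only
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```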
To ensure that image enhancement enhances detail while maintaining color authenticity and overall visual harmony, this study improves upon the traditional CLAHE algorithm by proposing the Multi-Metric-Driven Contrast Limited Adaptive Histogram Equalization (mdc) algorithm. This includes an integrated image quality assessment mechanism to optimize the enhancement process, ensuring that clarity is improved without compromising color fidelity. As demonstrated in Figure 5, the enhancement workflow involves the conversion to the LAB color space, independent processing of the luminance channel, and quality control based on PSNR and SSIM indices, thus achieving a balance between visual effect and technical standards.
The main process is as follows:
  • Conversion from BGR to LAB Color Space: The LAB color space separates luminance and color information, allowing for operations to be carried out on luminance alone, thus avoiding impacts on color to maintain color fidelity.
  • Independent Processing of the L Channel: This involves separating the L channel from the LAB image, performing subsequent processing only on the L channel, and applying the CLAHE algorithm to the L channel. This approach focuses on enhancing the luminance details of the image, which is crucial for improving image quality.
  • Initialization of clipLimit and tileGridSize: Setting these parameters serves as the basis for beginning CLAHE processing, which can be adjusted based on outcomes to achieve desired effects.
  • CLAHE Application: Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to the L channel using the initial parameters set in the previous step. This stage is the core of the entire process; by limiting contrast enhancement, noise can be reduced, and over-enhancement can be prevented.
  • Image Quality Assessment: The quality of the enhanced image is evaluated using PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index) as metrics for image quality. PSNR measures the similarity between the reconstructed image and the original image, while SSIM assesses the structural information, luminance, and contrast of the image. Through the evaluation with PSNR and SSIM, the quality of the image can be quantified to ensure that the visual improvements of the enhanced image are effective. Setting these conditions as quality thresholds ensures that further processing is only undertaken when the image reaches a certain quality standard, thereby avoiding unnecessary computations and potential quality degradation.
By setting the initial clipLimit and tileGridSize parameters, the algorithm controls the intensity of local contrast enhancement while enhancing the image contrast, preventing over-processing and noise generation. The independent processing of the luminance channel maintains color fidelity, avoiding color distortion. The quality of the enhanced image is evaluated using metrics such as PSNR and SSIM, ensuring that improvements in image quality are made without sacrificing detail. Overall, this processing workflow aims to enhance the visibility and detail of images, especially under uneven lighting conditions, while maintaining the naturalness of color and reducing noise. This approach provides more accurate and authentic visual information for various applications.
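One plausible realization of this metric-driven adjustment is sketched below: CLAHE is tried from the strongest to the mildest clipLimit until the enhanced luminance channel passes the PSNR and SSIM gates. The thresholds and the candidate parameter schedule are assumptions, as the paper does not publish its exact settings.

```python
# Hypothetical multi-metric-driven CLAHE (mdc) loop using scikit-image metrics.
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def mdc_enhance(bgr, psnr_min=25.0, ssim_min=0.85,
                clip_limits=(4.0, 3.0, 2.0, 1.5, 1.0), tile=(8, 8)):
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    for clip in clip_limits:                         # strongest to mildest
        clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tile)
        l_eq = clahe.apply(l)
        if (peak_signal_noise_ratio(l, l_eq) >= psnr_min and
                structural_similarity(l, l_eq) >= ssim_min):
            break                                    # quality gates satisfied
    # If no setting passes, the mildest enhancement is kept as a fallback.
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```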

2.3.3. Image Sharpening

Image sharpening is applied to accentuate the edges and textural details of underwater images, particularly in deep water or low-visibility conditions. By enhancing contrast in local regions, sharpening makes the boundaries between fish contours and the background more distinct and the fine details easier to analyze and understand, which in turn improves the accuracy of model classification.
To address the issue of low-contrast images lacking clarity and detail, especially in dark or overexposed areas, the Unsharp Masking (USM) sharpening algorithm is introduced on top of the CLAHE algorithm based on the LAB color space. Unsharp Masking (USM) is a sharpening technique, the core idea of which involves subtracting a blurred (low-pass filtered) version of the original image and then adding the result back to the original to enhance the edges and details. The sharpening process can be represented as
$$g(x, y) = f(x, y) + k \cdot \left(f(x, y) - f_{\mathrm{blur}}(x, y)\right) \tag{3}$$
wherein $g(x, y)$ corresponds to the sharpened image, $f(x, y)$ represents the original image, $f_{\mathrm{blur}}(x, y)$ is the image after blurring, and $k$ signifies the enhancement coefficient. The specific algorithm design process is illustrated in Figure 5.
The specific operations are as follows:
  • Unsharp Masking: This is a common image sharpening technique used to enhance the visual clarity of an image. In this step, if the quality of the image, after CLAHE processing (based on PSNR and SSIM indices), is considered to be sufficiently good, Unsharp Masking is employed to further enhance the image’s details and edges. This sharpening is achieved by amplifying the high-frequency details within the image, which can make the image appear crisper; however, overuse may lead to noise or an over-sharpened effect in the image.
  • LPIPS < 0.4: Learned Perceptual Image Patch Similarity (LPIPS) is an advanced metric used to evaluate the perceptual similarity of images. This metric takes into account the characteristics of the human visual system to assess the perceptual similarity between the enhanced and original images. An LPIPS value of less than 0.4 indicates that the enhanced image is visually close enough to the original image, and the perceptual quality is deemed acceptable. This step ensures that the image enhancement does not deviate excessively, thus losing the perceptual characteristics and texture of the original image.
  • LAB to BGR conversion: After completing image sharpening and quality assessment, the image is converted from the LAB color space back to the BGR color space. This step is intended to translate the results of image processing into a more universally applicable color format for display and use across various devices and applications.
Through this series of processing steps, the adaptive enhancement of the image brightness and contrast is achieved while maintaining the authenticity of colors and minimizing the impact of noise. While enhancing the details and sharpness, the LPIPS ensures the perceptual quality of the post-processed image, preventing over-processing.
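The sharpening and perceptual gate can be sketched as follows; the Gaussian sigma, the gain k = 1.5, and the use of the 'alex' backbone of the lpips package are illustrative assumptions, with only the LPIPS < 0.4 threshold taken from the text above.

```python
# Unsharp Masking per Equation (3) plus the assumed LPIPS acceptance check.
import cv2
import numpy as np
import torch
import lpips

def unsharp_mask(img, sigma=3.0, k=1.5):
    f = img.astype(np.float32)
    f_blur = cv2.GaussianBlur(f, (0, 0), sigma)
    # g = f + k * (f - f_blur), clipped back to the valid 8-bit range
    return np.clip(f + k * (f - f_blur), 0, 255).astype(np.uint8)

_loss_fn = lpips.LPIPS(net='alex')    # learned perceptual similarity network

def lpips_ok(original_bgr, enhanced_bgr, threshold=0.4):
    def to_tensor(x):                 # HWC uint8 BGR -> NCHW float RGB in [-1, 1]
        rgb = cv2.cvtColor(x, cv2.COLOR_BGR2RGB).astype(np.float32) / 127.5 - 1.0
        return torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        d = _loss_fn(to_tensor(original_bgr), to_tensor(enhanced_bgr)).item()
    return d < threshold              # accept only perceptually close results
```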

2.4. EfficientNet

The EfficientNet model [42], developed by the Google AI team, is an innovative convolutional neural network architecture that established a new paradigm in scalable network design. Its key innovation is a systematic method of network scaling known as compound scaling. This method diverges from traditional scaling practices, which typically enhance only the depth or width of a network: EfficientNet employs fixed scaling coefficients to extend the network's depth, width, and input image resolution in a balanced manner.
In this research, the base model EfficientNet-B0 from the EfficientNet series was selected for analysis. The compound scaling method enables the generation of a family of models, from EfficientNet-B0 to B7, each offering a set of capabilities that achieve a harmonious balance between computational efficiency and predictive accuracy.
EfficientNet-B0 has been optimized under predefined resource constraints through an automated neural architecture search (NAS) process, aimed at finding the optimal balance between efficiency and accuracy.
The core structure of EfficientNet-B0 is the Mobile Inverted Bottleneck Convolution (MBConv) module, which incorporates the attention mechanism from the Squeeze-and-Excitation Network (SENet [43]). Upon its introduction, SENet achieved the highest accuracy on the ImageNet dataset, highlighting its effectiveness. The MBConv [44] module, also a product of Neural Architecture Search (NAS), bears similarity to the Depthwise Separable Convolution in structure.
These innovations underscore the effectiveness of the EfficientNet architecture in creating models that not only push the boundaries of accuracy in image classification tasks but also do so with considerations for efficiency that make them practical for a wide range of computing environments.
Within the MBConv module, as shown in Figure 6, a 1 × 1 pointwise convolution is first executed to adjust the output channel dimensions according to the expansion ratio. This is followed by a k × k depthwise convolution. If the squeeze-and-excitation operation (SE module) is required, it is implemented subsequent to the depthwise convolution. The final part of the module is another 1 × 1 pointwise convolution that restores the original channel dimensions. Additionally, the MBConv module integrates dropout connectivity and input skip connections, which have effectively shortened training times and enhanced the overall performance of the model.
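A simplified PyTorch rendering of this block is given below for orientation; the BatchNorm/SiLU placement and the squeeze ratio follow common EfficientNet reference implementations and are not taken from this paper.

```python
# Sketch of an MBConv block: 1x1 expansion -> k x k depthwise convolution ->
# squeeze-and-excitation -> 1x1 projection, with a skip connection when the
# input and output shapes match.
import torch.nn as nn

class MBConv(nn.Module):
    def __init__(self, c_in, c_out, k=3, stride=1, expand=4, se_ratio=0.25):
        super().__init__()
        c_mid = c_in * expand
        self.use_skip = stride == 1 and c_in == c_out
        self.expand = nn.Sequential(              # 1x1 pointwise expansion
            nn.Conv2d(c_in, c_mid, 1, bias=False), nn.BatchNorm2d(c_mid), nn.SiLU())
        self.depthwise = nn.Sequential(           # k x k depthwise convolution
            nn.Conv2d(c_mid, c_mid, k, stride, k // 2, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU())
        c_se = max(1, int(c_in * se_ratio))
        self.se = nn.Sequential(                  # SE attention over channels
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_mid, c_se, 1), nn.SiLU(),
            nn.Conv2d(c_se, c_mid, 1), nn.Sigmoid())
        self.project = nn.Sequential(             # 1x1 pointwise projection
            nn.Conv2d(c_mid, c_out, 1, bias=False), nn.BatchNorm2d(c_out))

    def forward(self, x):
        y = self.depthwise(self.expand(x))
        y = y * self.se(y)                        # channel-wise re-weighting
        y = self.project(y)
        return x + y if self.use_skip else y
```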
The architecture of EfficientNet-B0 comprises 16 Mobile Inverted Bottleneck Convolution (MBConv) modules, 2 convolutional layers, 1 global average pooling layer, and 1 classification layer. Figure 7 displays these components, with different colors representing different stages within the network to facilitate the distinction and understanding of each part's function and organization. Specifically, black layers represent the initial convolutional layer and the first MBConv layer, responsible for initial feature extraction and down-sampling of the input image; blue layers denote the MBConv layers with a 3 × 3 kernel, focusing on efficient feature extraction through depthwise separable convolutions; and red layers indicate the MBConv layers with a 5 × 5 kernel, providing a larger receptive field to capture more complex patterns and features in the image.
Across multiple standard image recognition benchmarks, EfficientNet has demonstrated significant performance, achieving state-of-the-art accuracy on the ImageNet dataset while maintaining lower computational complexity and fewer parameters. This efficiency makes EfficientNet particularly advantageous for resource-constrained small devices, enabling high-performance operation within limited computational resources.
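In practice, adapting the pretrained backbone to the four feeding-intensity classes (none, weak, medium, and strong) reduces to replacing the classification head. The snippet below is a standard fine-tuning recipe using the torchvision weights cited in Section 3.1; the head replacement is our assumption rather than the authors' published code.

```python
# Load ImageNet-pretrained EfficientNet-B0 and repurpose it for 4 classes.
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

model = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
in_features = model.classifier[1].in_features    # 1280 for the B0 variant
model.classifier[1] = nn.Linear(in_features, 4)  # none / weak / medium / strong
```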

3. Experiments and Discussion

3.1. Training Details

To evaluate the classification results of MIPS-EfficientNet, extensive experiments were conducted on a custom fish feeding behavior dataset, comparing the MIPS-EfficientNet algorithm with other advanced algorithms such as ResNet-18, ResNet-50, and ResNeXt50. The dataset was divided into training, validation, and testing sets in a 7:2:1 ratio. Each model was trained on the training set and tuned using the validation set; finally, the accuracy (acc) and prediction time were compared on the testing set.
For training the classification networks, the input image size was set to 224 × 224 × 3 (RGB). This paper adopted the widely used Stochastic Gradient Descent (SGD) optimizer, which has shown commendable performance across many deep learning tasks, despite potentially requiring meticulous hyperparameter tuning. The hyperparameter settings were as follows: maximum iterations, max_epoch = 100; initial learning rate, lr0 = 0.004; decay factor, decay_factor = 0.1; momentum = 0.65; weight decay rate, weight_decay = 0.001; and log interval, log_interval = 10. The learning rate adjustment milestones were set at milestones = [25, 35]: when training reaches the 25th and 35th epochs, the learning rate is multiplied by the decay factor. To accelerate training, pre-trained weights of ResNet-18, ResNet-50, ResNeXt50, and EfficientNet, trained on the ImageNet dataset, were used. These pre-trained weights can be obtained from https://github.com/pytorch/vision/tree/main/torchvision/models (accessed on 15 August 2023).
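Expressed in PyTorch, the stated configuration corresponds to the setup below; all numerical values come from the hyperparameters listed above, while the variable names and the `model` object (the EfficientNet-B0 sketched in Section 2.4) are ours.

```python
# SGD with the stated hyperparameters and step decay at epochs 25 and 35.
import torch

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.004, momentum=0.65, weight_decay=0.001)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[25, 35], gamma=0.1)   # decay_factor = 0.1

for epoch in range(100):                         # max_epoch = 100
    # ... one training pass over the training set ...
    scheduler.step()                             # adjust lr at the milestones
```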
It is well known that different hyperparameter settings can significantly impact the performance of deep learning models. After a series of experiments in this study, the aforementioned hyperparameters yielded models that were satisfactory for the classification of fish feeding behavior.
The experiments were conducted on a Windows system by utilizing the PyTorch framework, Python version 3.8, and CUDA version 11.7. An NVIDIA 2080 Ti GPU was used in the experiments.

3.2. Evaluation Metrics

In classification tasks, commonly used evaluation metrics include accuracy, precision, recall, and F1 score. Among these, accuracy is a key indicator of the overall performance of an algorithm, reflecting the proportion of samples correctly classified to the total number of samples. When dealing with datasets with imbalanced classes, precision and recall become more critical tools for performance assessment. Precision refers to the proportion of samples correctly predicted as positive out of the total samples predicted as positive, while recall is the proportion of samples correctly predicted as positive out of the total actual positive samples. The F1 score is the harmonic mean of precision and recall, providing a comprehensive evaluation of these two metrics. These indicators collectively assist us in thoroughly understanding and evaluating the performance of classification models. The definitions of these metrics are as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{4}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{6}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{7}$$
where TP represents the number of true positive samples, TN denotes the number of true negative samples, FP stands for the number of false positive samples, and FN refers to the number of false negative samples.
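These four quantities can be computed directly from the model's predictions, for example with scikit-learn as sketched below; the macro averaging over the four classes is an assumption about how the per-class scores were aggregated.

```python
# Computing Equations (4)-(7) with scikit-learn on illustrative labels.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0, 1, 2, 3, 3, 2]      # ground-truth feeding-intensity classes
y_pred = [0, 1, 2, 3, 2, 2]      # model predictions

acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average='macro', zero_division=0)
rec  = recall_score(y_true, y_pred, average='macro', zero_division=0)
f1   = f1_score(y_true, y_pred, average='macro', zero_division=0)
```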

3.3. Performance of Different Training Strategies

When analyzing and comparing the impact of different optimizers (AdamW and SGD) combined with different loss functions (Focal Loss and Cross Entropy Loss) on neural network performance, two main perspectives are considered: first, the effect of each optimizer-loss combination on the model's fitting ability on the training set, and second, its effect on the model's generalization ability on the validation set. The detailed experimental outcomes are presented in Table 2.
Initially, when considering the use of the AdamW optimizer, its combination with Focal Loss shows only moderate performance on the training set (79.85% training accuracy) and limited generalization capability on the validation set (79.10% validation accuracy). This may indicate that the model, under this combination, struggles to effectively learn key features of the data, or that Focal Loss is not suited for this dataset. Particularly in datasets where categories are relatively balanced, using Focal Loss might not be the best choice. However, when AdamW is combined with Cross Entropy Loss, there is a significant improvement in training accuracy to 99.21%, with validation accuracy reaching 91.04%. This suggests that Cross Entropy Loss is more appropriate for model learning in this scenario. Nonetheless, higher training accuracy with a comparatively lower validation accuracy could imply an overfitting issue, where the model may have learned specific features of the training data too well, lacking generalization capability on unseen data.
On the other hand, when using the SGD optimizer, its combination with Focal Loss performs exceptionally well on the training set (99.74% training accuracy) and also maintains high accuracy on the validation set (90.30%). This indicates that the combination of SGD with Focal Loss can effectively handle issues of class imbalance while maintaining good generalization ability. Even more notably, the combination of SGD with Cross Entropy Loss performs the best, achieving 100% training accuracy and validation accuracy of up to 97%. This combination not only achieves a perfect fit on the training set but also exhibits excellent generalization ability on the validation set, suggesting that for this specific neural network and dataset, SGD combined with Cross Entropy Loss is the most effective combination.
These results underscore the importance of choosing the right optimizer and loss function suitable for the specific dataset and problem. While AdamW is a newer optimizer that may offer advantages in handling sparse data, in some cases, the traditional SGD optimizer, when combined with the appropriate loss function, may yield better results. This indicates that even advanced optimizers and loss functions need to be adjusted and optimized for specific applications. Moreover, it also shows that the choice of optimizer and loss function can significantly affect the model’s learning capability and generalization performance across different training and validation settings.

3.4. A Comparison with Other CNNs on the Testing Set

To evaluate the performance of the MIPS-EfficientNet algorithm, we conducted a series of experiments on the testing set, comparing it with other well-known convolutional neural networks (CNNs) including EfficientNet, ResNet-18, ResNet-50, and ResNeXt50. The comparison focused on several key metrics: test accuracy (Test-Acc), recall, precision, and F1 score.
As shown in Table 3, Multi-Step Image Pre-Enhancement Strategy (MIPS) optimization demonstrably enhanced the performance metrics of both EfficientNet and the ResNet series of models. Specifically, the MIPS-enhanced EfficientNet model showed significant improvements in precision, recall, and F1 score, with the F1 score markedly increasing from 80.68% to 93.78%. The F1 score, being the harmonic mean of precision and recall, more comprehensively reflects the model's performance in handling positive classes, such as target categories of interest. This substantial increase suggests that the original design of EfficientNet, while prioritizing efficiency, may have compromised accuracy to some extent. Although EfficientNet boasts rapid response and shorter training periods, its performance was initially inferior to that of the ResNet models before MIPS optimization.
From Table 3, it can be seen that there is no significant difference in training time between MIPS-EfficientNet and EfficientNet, both requiring 1.595 h. This is because MIPS is an image pre-enhancement strategy whose preprocessing steps are completed before model training and therefore do not affect the training time. Compared with the other models, however, MIPS-EfficientNet holds a clear advantage: it matches the training time of the original EfficientNet while showing marked improvements in test accuracy, recall, precision, and F1 score, demonstrating that the MIPS strategy enhances performance at no cost to training time.
After the enhancement with MIPS technology, the EfficientNet model not only improved significantly in performance, but in certain cases, its F1 score even surpassed that of some of the ResNet models. This not only validates the effectiveness of the MIPS optimization technique but also highlights the potential of the EfficientNet model when augmented with appropriate pre-processing strategies. Additionally, the ResNet models also exhibited performance improvements following MIPS pre-processing, further verifying the effectiveness of the MIPS pre-enhancement strategy and underscoring the general applicability and practicality of MIPS technology across multiple deep learning frameworks. These findings provide valuable insights for model selection in practical applications, especially when there is a need to balance limited computational resources with high performance, positioning MIPS pre-processing as a key strategy for balancing these demands.

3.5. Accuracy across Different Categories

To delve deeper into the limitations of the original EfficientNet model in the classification task of fish feeding behavior images, this study compared the confusion matrices of the unoptimized EfficientNet model and the MIPS-enhanced MIPS-EfficientNet model. Figure 8a displays the classification efficacy of the original EfficientNet model, which accurately recognizes most categories. However, it exhibits a higher rate of misclassification, particularly in the category corresponding to subtle feeding behavior, likely due to the model’s over-sensitivity to background features in the images.
Moreover, as revealed in Figure 8b, the MIPS-EfficientNet model, after the introduction of MIPS, shows improvements primarily in the recognition of the fourth category. The performance for the other categories remains stable or shows only minor changes. This indicates that the MIPS enhancement strategy effectively optimizes the model’s accuracy in classifying fish feeding behavior, especially for subtle behavioral distinctions. The MIPS approach, by optimizing the model’s feature extraction and representation capabilities, diminishes the interference of background noise on classification decisions. Consequently, it enhances the model’s ability to recognize fish behavior against complex backgrounds while improving classification accuracy. These findings affirm the importance and efficacy of the MIPS module in enhancing the performance of image classification tasks in complex underwater environments.
The MIPS-EfficientNet model demonstrated a significant improvement in accuracy, reaching 97.00%, substantially higher than the EfficientNet model's 88.81%. Accuracy represents the overall proportion of correct predictions, indicating that MIPS-EfficientNet is more precise in its overall predictions. MIPS-EfficientNet also improved in recall, achieving 93.56% compared with EfficientNet's 82.35%; recall measures the model's ability to correctly identify positive instances, so MIPS-EfficientNet performs better at correctly identifying instances of each category. In terms of precision, MIPS-EfficientNet likewise surpassed EfficientNet, with scores of 94.10% and 87.27%, respectively; higher precision indicates fewer false positives (misreports). Finally, MIPS-EfficientNet significantly improved its F1 score, reaching 93.78% against EfficientNet's 80.68%; as the harmonic mean of precision and recall, this indicates more balanced and superior overall performance. In summary, MIPS-EfficientNet improves on all primary performance metrics, reducing misjudgments (higher precision), raising the identification rate of positive instances (higher recall), and achieving a better balance between the two (higher F1 score). Since the network architecture and training procedure are unchanged, these improvements are attributable to the MIPS pre-enhancement of the input images, making MIPS-EfficientNet the preferable choice for practical application.
To delve deeper into the aforementioned phenomena, this study further employed t-distributed Stochastic Neighbor Embedding (t-SNE [45]) to visualize the output of the model's final convolutional layer. As depicted in Figure 9, the impact of the MIPS pre-enhancement on the distribution of the feature space is clearly observable.
Figure 9 provides a t-SNE comparative visualization of feature vectors extracted by the EfficientNet and MIPS-EfficientNet models. In these representations, samples from various categories are encoded in different colors to denote their class: none (yellow), weak (blue), medium (green), and strong (red). This color-coding facilitates the observation of the models’ capabilities to cluster similar features within a three-dimensional feature space. Notably, MIPS-EfficientNet demonstrates a more distinct clustering of points compared to EfficientNet, indicating enhanced capability in distinguishing between categories.
Within the feature space, the distances between categories, represented by unique colors, are pronounced, indicating the model’s effective discriminative ability to differentiate between sample categories. The spatial separation between categories, particularly the discernible division between the strong and medium classes, signifies the superior discriminative power of the MIPS-EfficientNet model. Such a clear demarcation of classes is crucial for precise classification tasks, especially in datasets characterized by complexity or a high degree of overlap.
Furthermore, it can be inferred that the MIPS-EfficientNet model places greater emphasis on the internal consistency of categories and the distinctions between them during the learning process. This is evidenced by the cohesiveness of the category clusters and their separation. Similarly, it can be anticipated that this attribute may lead to a lower classification error rate, as the model has understood the differences between categories on a more granular level.
Taking these observations into account, it can be posited that the MIPS-EfficientNet model may surpass traditional EfficientNet models in specific applications such as the recognition of fish behaviors. The t-SNE visualization in Figure 9 illustrates the potential of MIPS-EfficientNet in handling complex datasets for refined classification tasks, and its efficiency in feature extraction warrants further exploration and application.

4. Conclusions

This paper presents an innovative image pre-enhancement strategy designed to optimize the precision and efficiency of fish feeding behavior classification. Initially, a multi-step image pre-enhancement protocol is adopted, encompassing three main processing phases: Multi-Scale Retinex with Color Restoration (MSRCR), Multi-Metric Driven Contrast Limited Adaptive Histogram Equalization (mdc), and Unsharp Masking. This suite of pre-enhancement actions effectively rectifies common image quality issues, such as surface reflections, low contrast, and indistinct detail features. Subsequent to image pre-enhancement, a lightweight EfficientNet neural network model is further employed to accomplish the classification of fish feeding behaviors.
This study involves integrating the Multi-Step Image Pre-Enhancement Strategy (MIPS) into the EfficientNet model and comparing its performance against the baseline, unprocessed EfficientNet model. The analysis focuses on the post-training accuracy (acc) and the time required for prediction. Moreover, to assess the universality of the MIPS strategy and its impact on model performance, it is also applied to other leading deep learning frameworks, such as ResNet-18, ResNet-50, and ResNeXt50. By comparing the accuracy and prediction time of these frameworks after applying MIPS, this research aimed to comprehensively evaluate the potential of the pre-enhancement strategy in improving the efficiency and effectiveness of various deep learning models.
The test results indicate that the proposed Multi-Step Image Pre-enhancement (MIPS) module significantly enhances the performance of various convolutional neural network (CNN) models in terms of classification accuracy. Particularly in comparison with the current advanced ResNet series models, this approach not only improves the classification accuracy but also substantially reduces the training times of the models. These achievements demonstrate that the method introduced in this paper is both effective and efficient in processing fish feeding behavior classification tasks.
To validate the efficiency of the algorithm, we evaluated its practical application in the fish feeding process during experiments. By comparing it with other well-known convolutional neural networks (CNNs), we found that the MIPS-enhanced models performed exceptionally well on several key metrics. Below are some practical application examples.
In terms of cost reduction, by integrating the algorithm to optimize the feeding process, feed waste was reduced from 25% to 10%, saving the fish farm approximately 500 kg of feed per month, which is equivalent to monthly savings of USD 1500 (assuming a feed cost of USD 3 per kilogram). Regarding feeding efficiency, the experiment revealed that manual feeding took about 2 h per session, while the automated feeding system, after implementing the algorithm, completed the process in 1.5 h, increasing feeding efficiency by 25%, allowing staff to focus on other critical tasks. In terms of productivity improvement, an experiment monitoring two ponds showed that the pond using the algorithm had a 10% higher fish growth rate and a 15% improvement in the feed conversion ratio (FCR), resulting in an average weight increase of 200 g per fish, enhancing market value and profitability. Regarding the environmental impact, the optimized feeding process reduced the occurrence of feed waste sinking to the bottom of the pond, thus reducing water pollution and the need for frequent pond cleaning, benefiting the environment and lowering maintenance costs for the fish farm. These practical benefits demonstrate the real-world feasibility and effectiveness of the algorithm in improving the fish feeding process, proving its potential as a valuable tool in modern fish farming operations.
In this study, while the proposed Multi-Step Image Pre-Enhancement (MIPS) module combined with the lightweight EfficientNet network achieved significant outcomes in classifying fish feeding behaviors, particularly in enhancing classification accuracy and reducing the training time compared to ResNet series models, a critical limitation was observed. Specifically, the MIPS, as a multi-step image pre-enhancement strategy, incurs longer processing times during the prediction phase. This implies that despite a significant reduction in the overall training time and an improvement in classification accuracy, the prediction speed did not exhibit a competitive advantage.
To address this issue, future research directions may focus on implementing the MIPS steps themselves through convolutional neural network (CNN) models. Such an approach is anticipated to further optimize the overall performance of the model, especially in reducing the time required for the prediction stage. Through such optimization, it is expected to achieve faster prediction speeds while maintaining or even improving the classification accuracy, thereby providing a more efficient solution for practical applications.
Although the proposed method (MIPS-EfficientNet) shows significant improvements over traditional convolutional neural networks (CNNs), it has only been compared with a single class of CNNs. However, it is noteworthy that Vision Transformer-based models currently achieve the highest performance on the ImageNet benchmark, as indicated by the rankings on Papers With Code. It should be noted that the dataset used in this study is relatively small. Vision Transformer models typically perform better on large-scale datasets but may not perform as well on small datasets compared to traditional CNNs. Therefore, despite the superior performance of Vision Transformers on some large-scale datasets, future work should include a comparison of MIPS-EfficientNet with Vision Transformer models on small datasets to comprehensively evaluate their performance.

Author Contributions

Conceptualization, G.F. and M.C.; methodology, X.K.; software, X.K.; validation, G.F. and X.K.; formal analysis, G.F. and X.K.; resources, G.F.; data curation, G.F. and X.K.; writing—original draft preparation, G.F. and X.K.; writing—review and editing, G.F. and X.K.; supervision, G.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shanghai Science and Technology Innovation Action Planning, No. 20dz1203800, and Jiang Su Modern Agricultural Industry Key Technology Innovation Planning, No. CX (20) 2028.

Data Availability Statement

The experimental data related to this paper can be requested from the authors by email if any researcher is in need of the dataset (email: [email protected]).

Acknowledgments

The authors are very grateful to Wang Yaohui of Nantong Longyang Aquatic Co. in Jiangsu Province, China for providing us with the experimental data and the data collection site.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. An, D.; Hao, J.; Wei, Y.; Wang, Y.; Yu, X. Application of Computer Vision in Fish Intelligent Feeding System—A Review. Aquac. Res. 2021, 52, 423–437. [Google Scholar] [CrossRef]
  2. Peixoto, S.; Soares, R.; Allen Davis, D. An Acoustic Based Approach to Evaluate the Effect of Different Diet Lengths on Feeding Behavior of Litopenaeus vannamei. Aquac. Eng. 2020, 91, 102114. [Google Scholar] [CrossRef]
  3. Atoum, Y.; Srivastava, S.; Liu, X. Automatic Feeding Control for Dense Aquaculture Fish Tanks. IEEE Signal Process. Lett. 2015, 22, 1089–1093. [Google Scholar] [CrossRef]
  4. Chang, C.M.; Fang, W.; Jao, R.C.; Shyu, C.Z.; Liao, I.C. Development of an Intelligent Feeding Controller for Indoor Intensive Culturing of Eel. Aquac. Eng. 2005, 32, 343–353. [Google Scholar] [CrossRef]
  5. Wisnu, R.P.; Karuniasa, M.; Moersidik, S.S. The Impact of Fish Feed on Water Quality in Lake Cilala, Bogor Regency, West Java. IOP Conf. Ser. Earth Environ. Sci. 2021, 716, 012023. [Google Scholar] [CrossRef]
  6. Zhao, S.; Zhu, M.; Ding, W.; Zhao, S.; Gu, J. Feed Requirement Determination of Grass Carp (Ctenopharyngodon idella) Using a Hybrid Method of Bioenergetics Factorial Model and Fuzzy Logic Control Technology under Outdoor Pond Culturing Systems. Aquaculture 2020, 521, 734970. [Google Scholar] [CrossRef]
  7. Wang, Y.; Yu, X.; Liu, J.; An, D.; Wei, Y. Dynamic Feeding Method for Aquaculture Fish Using Multi-Task Neural Network. Aquaculture 2022, 551, 737913. [Google Scholar] [CrossRef]
  8. Wei, D.; Bao, E.; Wen, Y.; Zhu, S.; Ye, Z.; Zhao, J. Behavioral Spatial-Temporal Characteristics-Based Appetite Assessment for Fish School in Recirculating Aquaculture Systems. Aquaculture 2021, 545, 737215. [Google Scholar] [CrossRef]
  9. Wei, D.; Ji, B.; Li, H.; Zhu, S.; Ye, Z.; Zhao, J. Modified Kinetic Energy Feature-Based Graph Convolutional Network for Fish Appetite Grading Using Time-Limited Data in Aquaculture. Front. Mar. Sci. 2022, 9, 1021688. [Google Scholar] [CrossRef]
  10. Zhou, C.; Zhang, B.; Lin, K.; Xu, D.; Chen, C.; Yang, X.; Sun, C. Near-Infrared Imaging to Quantify the Feeding Behavior of Fish in Aquaculture. Comput. Electron. Agric. 2017, 135, 233–241. [Google Scholar] [CrossRef]
  11. Miyazono, T.; Saitoh, T. Fish species recognition based on CNN using annotated image. In IT Convergence and Security 2017; Kim, K.J., Kim, H., Baek, N., Eds.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2018; Volume 449, pp. 156–163. ISBN 978-981-10-6450-0. [Google Scholar]
  12. Tamou, A.B.; Benzinou, A.; Nasreddine, K.; Ballihi, L. Underwater live fish recognition by deep learning. In Image and Signal Processing; Mansouri, A., El Moataz, A., Nouboud, F., Mammass, D., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 10884, pp. 275–283. ISBN 978-3-319-94210-0. [Google Scholar]
  13. Yang, L.; Chen, Y.; Shen, T.; Li, D. An FSFS-Net Method for Occluded and Aggregated Fish Segmentation from Fish School Feeding Images. Appl. Sci. 2023, 13, 6235. [Google Scholar] [CrossRef]
  14. Castro-Santos, T. Optimal Swim Speeds for Traversing Velocity Barriers: An Analysis of Volitional High-Speed Swimming Behavior of Migratory Fishes. J. Exp. Biol. 2005, 208, 421–432. [Google Scholar] [CrossRef] [PubMed]
  15. Zhou, C.; Lin, K.; Xu, D.; Chen, L.; Guo, Q.; Sun, C.; Yang, X. Near Infrared Computer Vision and Neuro-Fuzzy Model-Based Feeding Decision System for Fish in Aquaculture. Comput. Electron. Agric. 2018, 146, 114–124. [Google Scholar] [CrossRef]
  16. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  17. Zhou, C.; Xu, D.; Chen, L.; Zhang, S.; Sun, C.; Yang, X.; Wang, Y. Evaluation of Fish Feeding Intensity in Aquaculture Using a Convolutional Neural Network and Machine Vision. Aquaculture 2019, 507, 457–465. [Google Scholar] [CrossRef]
  18. Hu, W.-C.; Chen, L.-B.; Huang, B.-K.; Lin, H.-M. A Computer Vision-Based Intelligent Fish Feeding System Using Deep Learning Techniques for Aquaculture. IEEE Sens. J. 2022, 22, 7185–7194. [Google Scholar] [CrossRef]
  19. Zheng, K.; Yang, R.; Li, R.; Guo, P.; Yang, L.; Qin, H. A Spatiotemporal Attention Network-Based Analysis of Golden Pompano School Feeding Behavior in an Aquaculture Vessel. Comput. Electron. Agric. 2023, 205, 107610. [Google Scholar] [CrossRef]
  20. Zheng, K.; Yang, R.; Li, R.; Yang, L.; Qin, H.; Li, Z. A dual stream hierarchical transformer for starvation grading of golden pomfret in marine aquaculture. Front. Mar. Sci. 2022, 9, 1039898. [Google Scholar] [CrossRef]
  21. Zheng, K.; Yang, R.; Li, R.; Yang, L.; Qin, H.; Sun, M. A Deep Transformer Model-Based Analysis of Fish School Starvation Degree in Marine Farming Vessels. In Proceedings of the 2022 4th International Conference on Control and Robotics (ICCR), Guangzhou, China, 2–4 December 2022; IEEE: New York, NY, USA, 2022; pp. 40–46. [Google Scholar]
  22. Ye, Z.Y.; Zhao, J.; Han, Z.Y.; Zhu, S.M.; Li, J.P.; Lu, H.D.; Ruan, Y.J. Behavioral Characteristics and Statistics-Based Imaging Techniques in the Assessment and Optimization of Tilapia Feeding in a Recirculating Aquaculture System. Trans. ASABE 2016, 59, 345–355. [Google Scholar] [CrossRef]
  23. Wang, G.; Muhammad, A.; Liu, C.; Du, L.; Li, D. Automatic Recognition of Fish Behavior with a Fusion of RGB and Optical Flow Data Based on Deep Learning. Animals 2021, 11, 2774. [Google Scholar] [CrossRef]
  24. Shang, H.; Yu, Y.; Song, W. Underwater Fish Image Classification Algorithm Based on Improved ResNet-RS Model. In Proceedings of the Jiangsu Annual Conference on Automation (JACA 2023), Changzhou, China, 10–12 November 2023; Institution of Engineering and Technology: Changzhou, China, 2023; pp. 106–110. [Google Scholar] [CrossRef]
  25. Zhao, J.; Bao, W.; Zhang, F.; Zhu, S.; Liu, Y.; Lu, H.; Shen, M.; Ye, Z. Modified Motion Influence Map and Recurrent Neural Network-Based Monitoring of the Local Unusual Behaviors for Fish School in Intensive Aquaculture. Aquaculture 2018, 493, 165–175. [Google Scholar] [CrossRef]
  26. Zhang, B.; Xie, F.; Han, F. Fish Population Status Detection Based on Deep Learning System. In Proceedings of the 2019 IEEE International Conference on Mechatronics and Automation (ICMA), Tianjin, China, 4–7 August 2019; IEEE: New York, NY, USA, 2019; pp. 81–85. [Google Scholar]
  27. Qi, R.; Liu, H.; Liu, S. Effects of Different Culture Densities on the Acoustic Characteristics of Micropterus Salmoide Feeding. Fishes 2023, 8, 126. [Google Scholar] [CrossRef]
  28. Wang, Y.; Yu, X.; An, D.; Wei, Y. Underwater Image Enhancement and Marine Snow Removal for Fishery Based on Integrated Dual-Channel Neural Network. Comput. Electron. Agric. 2021, 186, 106182. [Google Scholar] [CrossRef]
  29. Yang, P.; Liu, Q.Y.; Li, Z. A High-Precision Classification Method for Fish Feeding Behavior Analysis Based on Improved RepVGG. Preprints 2023, 2023091041. [Google Scholar] [CrossRef]
  30. Zhang, L.; Wang, J.; Li, B.; Liu, Y.; Zhang, H.; Duan, Q. A MobileNetV2-SENet-Based Method for Identifying Fish School Feeding Behavior. Aquac. Eng. 2022, 99, 102288. [Google Scholar] [CrossRef]
  31. Yang, L. A Dual Attention Network Based on efficientNet-B2 for Short-Term Fish School Feeding Behavior Analysis in Aquaculture. Comput. Electron. Agric. 2021, 187, 106316. [Google Scholar] [CrossRef]
  32. Zhang, Y.; Xu, C.; Du, R.; Kong, Q.; Li, D.; Liu, C. MSIF-MobileNetV3: An Improved MobileNetV3 Based on Multi-Scale Information Fusion for Fish Feeding Behavior Analysis. Aquac. Eng. 2023, 102, 102338. [Google Scholar] [CrossRef]
  33. Zhao, H.; Wu, J.; Liu, L.; Qu, B.; Yin, J.; Yu, H.; Jiang, Z.; Zhou, C. A Real-Time Feeding Decision Method Based on Density Estimation of Farmed Fish. Front. Mar. Sci. 2024, 11, 1358209. [Google Scholar] [CrossRef]
  34. Zhang, S.; Zhu, M.; Meng, K. An Automated Multi-Scale Retinex for Dim Image Enhancement. In Proceedings of the 2022 IEEE 2nd International Conference on Power, Electronics and Computer Applications (ICPECA), Shenyang, China, 21–23 January 2022; IEEE: New York, NY, USA, 2022; pp. 647–651. [Google Scholar]
  35. Jobson, D.J.; Rahman, Z.; Woodell, G.A. A Multiscale Retinex for Bridging the Gap between Color Images and the Human Observation of Scenes. IEEE Trans. Image Process. 1997, 6, 965–976. [Google Scholar] [CrossRef] [PubMed]
  36. Kansal, S.; Purwar, S.; Tripathi, R.K. Image Contrast Enhancement Using Unsharp Masking and Histogram Equalization. Multimed. Tools Appl. 2018, 77, 26919–26938. [Google Scholar] [CrossRef]
  37. Banerjee, S.; Alvey, L.; Brown, P.; Yue, S.; Li, L.; Scheirer, W.J. An Assistive Computer Vision Tool to Automatically Detect Changes in Fish Behavior in Response to Ambient Odor. Sci. Rep. 2021, 11, 1002. [Google Scholar] [CrossRef]
  38. Ubina, N.; Cheng, S.-C.; Chang, C.-C.; Chen, H.-Y. Evaluating Fish Feeding Intensity in Aquaculture with Convolutional Neural Networks. Aquac. Eng. 2021, 94, 102178. [Google Scholar] [CrossRef]
  39. Reza, A.M. Realization of the Contrast Limited Adaptive Histogram Equalization (CLAHE) for Real-Time Image Enhancement. J. VLSI Signal Process. Syst. Signal Image Video Technol. 2004, 38, 35–44. [Google Scholar] [CrossRef]
  40. Koonsanit, K.; Thongvigitmanee, S.; Pongnapang, N.; Thajchayapong, P. Image Enhancement on Digital X-Ray Images Using N-CLAHE. In Proceedings of the 2017 10th Biomedical Engineering International Conference (BMEiCON), Hokkaido, Japan, 31 August–2 September 2017; IEEE: New York, NY, USA, 2017; pp. 1–4. [Google Scholar]
  41. Sonali; Sahu, S.; Singh, A.K.; Ghrera, S.P.; Elhoseny, M. An Approach for De-Noising and Contrast Enhancement of Retinal Fundus Image Using CLAHE. Opt. Laser Technol. 2019, 110, 87–98. [Google Scholar] [CrossRef]
  42. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Int. Conf. Mach. Learn. PMLR 2019, 97, 6105–6114. [Google Scholar]
  43. Zhang, L.; Liu, Z.; Zheng, Y.; Li, B. Feeding Intensity Identification Method for Pond Fish School Using Dual-Label and MobileViT-SENet. Biosyst. Eng. 2024, 241, 113–128. [Google Scholar] [CrossRef]
  44. Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
  45. Leticio, G.R.; Kawai, V.S.; Valem, L.P.; Pedronette, D.C.G.; Torres, R.D.S. Manifold Information through Neighbor Embedding Projection for Image Retrieval. Pattern Recognit. Lett. 2024, 183, 17–25. [Google Scholar] [CrossRef]
Figure 1. The overall process of the fish school feeding behavior identification method.
Figure 2. The structure of the image acquisition system.
Figure 3. Classification chart of different feeding behavior intensities. (a) Strong Feeding: At the initial stage, the fish actively compete for bait, accompanied by splashes and noise, exhibiting a strong desire for feeding. (b) Medium Feeding: As the bait begins to diminish, a portion of the fish reduces their feeding activities, resulting in a decrease in overall feeding intensity to a moderate level. (c) Weak Feeding: With further reduction in bait, only a few fish continue to show interest in feeding, and the school begins to disperse further, indicating sparse feeding behavior. (d) None Feeding: After approximately 4 min, almost all fish lose their desire to feed and start to disperse, swimming slowly along the bottom, signaling the end of the feeding process.
Figure 4. Results of all steps of multi-step image pre-enhancement strategy.
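For readers who wish to reproduce the first enhancement step, the following is a minimal sketch of a standard MSRCR implementation in the spirit of Jobson et al. [35], using OpenCV. The scale and gain parameters shown are the commonly cited defaults from the Retinex literature, not necessarily the exact values tuned in this study, and the input filename is hypothetical.

```python
import cv2
import numpy as np

def msrcr(img, sigmas=(15, 80, 250), alpha=125.0, beta=46.0,
          gain=192.0, offset=-30.0):
    """Multi-Scale Retinex with Color Restoration (illustrative parameters)."""
    img = img.astype(np.float64) + 1.0  # avoid log(0)
    # Multi-scale retinex: mean log-ratio between the image and
    # Gaussian-blurred versions of itself at several scales.
    msr = np.zeros_like(img)
    for sigma in sigmas:
        msr += np.log(img) - np.log(cv2.GaussianBlur(img, (0, 0), sigma))
    msr /= len(sigmas)
    # Color restoration factor, weighting each channel by its share
    # of the total intensity.
    crf = beta * (np.log(alpha * img) - np.log(img.sum(axis=2, keepdims=True)))
    out = gain * msr * crf + offset
    return np.clip(out, 0, 255).astype(np.uint8)

enhanced = msrcr(cv2.imread("frame.jpg"))  # hypothetical input frame
```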
Figure 5. Flowchart of Multi-Metric-Driven CLAHE and Unsharp Masking.
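The second and third steps can be sketched with OpenCV's built-in CLAHE and a classic unsharp mask. Note that this simplified version uses fixed clip-limit and sharpening parameters, whereas the multi-metric-driven variant in this paper selects them adaptively from image-quality metrics.

```python
import cv2

def clahe_unsharp(img_bgr, clip_limit=2.0, tile=(8, 8),
                  amount=1.5, sigma=3.0):
    # CLAHE on the luminance channel only, to boost local contrast
    # without shifting colors.
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile).apply(l)
    out = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
    # Unsharp masking: add back the difference between the image and a
    # blurred copy, which emphasizes edges.
    blurred = cv2.GaussianBlur(out, (0, 0), sigma)
    return cv2.addWeighted(out, 1.0 + amount, blurred, -amount, 0)
```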
Figure 6. SE module.
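The structure in Figure 6 can be expressed compactly in code; the following is a minimal PyTorch sketch of the squeeze-and-excitation operation as used inside EfficientNet's MBConv blocks (the reduction ratio of 4 is an assumed, typical value).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: learn per-channel attention weights."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global average pool
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.SiLU(),                        # EfficientNet's Swish activation
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                     # per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                    # excite: rescale each channel
```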
Figure 7. EfficientNet module.
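In practice, the backbone can be instantiated from torchvision and adapted to the four feeding-intensity classes by swapping the classifier head. The sketch below assumes the B0 variant for illustration, since the specific variant is not restated here.

```python
import torch.nn as nn
from torchvision import models

# EfficientNet (B0 assumed for illustration) with its head replaced
# for the four feeding-intensity classes.
model = models.efficientnet_b0(
    weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
num_features = model.classifier[1].in_features  # classifier = [Dropout, Linear]
model.classifier[1] = nn.Linear(num_features, 4)
```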
Figure 8. A comparison of confusion matrices. The horizontal axis represents the predicted classification labels, while the vertical axis corresponds to the actual classification labels. (a) presents the classification results of fish feeding behavior images obtained using the EfficientNet network model, visualized in the form of a confusion matrix. (b) displays the classification results for fish feeding behavior images obtained using the MIPS-EfficientNet network model, also visualized through a confusion matrix.
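Confusion matrices like those in Figure 8 can be produced from the test-set predictions with scikit-learn; the sketch below uses synthetic labels purely as placeholders for the model's outputs, and the class ordering is assumed.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, 200)                    # placeholder ground truth
y_pred = np.where(rng.random(200) < 0.9,
                  y_true, rng.integers(0, 4, 200))  # placeholder predictions

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(
    cm, display_labels=["Strong", "Medium", "Weak", "None"]).plot()
plt.show()
```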
Figure 9. Feature visualization using the t-SNE method. (a) shows the classification results of the EfficientNet network model on images of fish feeding behavior, visualized in the form of t-SNE. (b) shows the classification results of the MIPS-EfficientNet network model on images of fish feeding behavior, also visualized through t-SNE.
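The t-SNE plots in Figure 9 embed the network's penultimate-layer features into two dimensions. A minimal scikit-learn sketch follows, with random features standing in for the extracted ones and 1280 assumed as the feature width (the typical EfficientNet-B0 value).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.rand(400, 1280)      # stand-in for extracted features
labels = np.random.randint(0, 4, 400)     # stand-in class labels

embedded = TSNE(n_components=2, perplexity=30, init="pca",
                random_state=0).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=8, cmap="tab10")
plt.show()
```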
Table 1. Formal definition and evaluation criteria for feeding intensity.

| Stage | Time | Feeding Intensity Definition | Behavioral Characteristics | Evaluation Criteria |
|---|---|---|---|---|
| High Intensity | 1–2 min after feeding | Most active chasing and ingesting behavior | Significant splashing and noise on the surface; dense chasing of food by the fish | Rapid decrease in feed; frequent surface activity |
| Medium Intensity | 2–3 min after feeding | Feeding behavior begins to slow but remains active | Fish start to disperse; less noise and splashing on the surface, but still noticeable chasing of food | Slower consumption of feed; some fish still actively feeding |
| Low Intensity | 3–4 min after feeding | Only a few fish show feeding behavior as feed decreases further | Most fish start to disperse; a few active fish occasionally chase remaining feed | Continued observation of a few feeding fish |
| Inactive | More than 4 min after feeding | Most fish no longer actively chase feed, showing low feeding activity | Fish swim slowly in the water, occasionally staying at the bottom, with almost no response to feed | High feed residue; slow and scattered fish behavior |
Table 2. MIPS-EfficientNet training strategies.

| Optimizer | Loss Function | Train Acc (%) | Val Acc (%) |
|---|---|---|---|
| AdamW | FL (Focal Loss) | 79.85 | 79.10 |
| AdamW | CE (Cross Entropy) | 99.21 | 91.04 |
| SGD | FL | 99.74 | 90.30 |
| SGD | CE | 100.00 | 97.00 |
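The best combination in Table 2 (SGD with cross entropy) is straightforward to configure in PyTorch, and the focal loss it was compared against can be written in a few lines. The learning rate, momentum, and focal-loss gamma below are assumed values, as the exact hyperparameters are not restated here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(1280, 4)  # stand-in for the MIPS-EfficientNet classifier

# Best-performing setup from Table 2: SGD + cross entropy
# (lr and momentum are assumed values).
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss, the alternative compared in Table 2."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                  # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()
```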
Table 3. Comparison with other CNNs.

| Model | Test-Acc (%) | Recall (%) | Precision (%) | F1 Score (%) | Training Time (h) |
|---|---|---|---|---|---|
| EfficientNet | 88.81 | 82.35 | 87.27 | 80.68 | 1.595 |
| MIPS-EfficientNet | 97.00 | 93.56 | 94.10 | 93.78 | 1.595 |
| ResNet-18 | 96.27 | 94.35 | 94.92 | 94.44 | 2.150 |
| MIPS-ResNet-18 | 98.50 | 97.54 | 97.00 | 97.13 | 2.150 |
| ResNet-50 | 95.50 | 85.29 | 88.79 | 86.27 | 2.365 |
| MIPS-ResNet-50 | 97.80 | 93.37 | 94.96 | 94.03 | 2.365 |
| ResNeXt50 | 93.28 | 90.70 | 91.38 | 90.46 | 2.291 |
| MIPS-ResNeXt50 | 94.78 | 92.28 | 92.14 | 92.19 | 2.291 |
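The Recall, Precision, and F1 columns in Table 3 are class-averaged scores; assuming macro averaging (the exact averaging scheme is not restated here), they can be computed from the test predictions as in the sketch below, again with placeholder labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, 200)                    # placeholder labels
y_pred = np.where(rng.random(200) < 0.9,
                  y_true, rng.integers(0, 4, 200))  # placeholder predictions

acc = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")
print(f"Acc {acc:.2%}  Recall {recall:.2%}  "
      f"Precision {precision:.2%}  F1 {f1:.2%}")
```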