Article

Research on In Situ Observation Method of Plankton Based on Convolutional Neural Network

by Chengzhi Yuan, Zhongjie He, Chunlin Ning, Weimin Wang, Jinkai Zhao, Guozheng Yuan and Chao Li

1 Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
2 First Institute of Oceanography, Ministry of Natural Resources, Qingdao 266061, China
3 Key Laboratory of Marine Science and Numerical Modeling, Ministry of Natural Resources, Qingdao 266061, China
4 Shandong Key Laboratory of Marine Science and Numerical Modeling, Qingdao 266061, China
5 Laboratory for Regional Oceanography and Numerical Modeling, Qingdao Marine Science and Technology Center, Qingdao 266237, China
6 College of Oceanography and Space Informatics, China University of Petroleum, Qingdao 266580, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(10), 1702; https://doi.org/10.3390/jmse12101702
Submission received: 23 August 2024 / Revised: 14 September 2024 / Accepted: 16 September 2024 / Published: 26 September 2024
(This article belongs to the Section Ocean Engineering)

Abstract

The marine ecosystem is one of the most extensive and abundant ecosystems on Earth. Marine plankton is an important component of it, and its abundance, number of species, and dominant species are regarded as important monitoring indicators. To address the low accuracy and high complexity of convolutional neural networks in plankton identification, this study proposes a lightweight plankton image identification algorithm based on an improved MobileNetV2. Firstly, the layer structure of the feature extraction network is redesigned to balance the depth and width of the network and reduce the number of model parameters; secondly, the lightweight coordinate attention (CA) mechanism is introduced to strengthen the attention to and extraction of key areas; in addition, the structure of the network classifier is optimized to improve the utilization efficiency of the model parameters. The results show that the model achieves 95.46% accuracy and a 94.48% F1 score on 12 categories of plankton images. Compared with the original MobileNetV2, the number of parameters and the computational cost are reduced by 72.47% and 52.09%, respectively, and the inference time for a single image is 6.15 ms. The model thus achieves accurate in situ identification of plankton while remaining lightweight. Combined with time and depth information, the abundance of each plankton category can be obtained, which is of great significance for marine ecological environment monitoring and prediction.

1. Introduction

The marine ecosystem is one of the largest and richest ecosystems on Earth, extending from the coastline to the deep sea, occupying about 70% of the Earth’s surface, and harboring rich biodiversity and complex ecological processes. Marine plankton are the primary and secondary producers in the ocean, with a complex composition and vast numbers, and they are crucial to maintaining the balance of marine material circulation, energy flow, and the marine ecological environment [1,2,3]. In marine ecological environment monitoring, the abundance, species composition, and distribution of plankton are important indicators. Collecting and analyzing plankton data in real time is therefore particularly important for gaining a deep understanding of the relationship between plankton abundance and distribution and changes in the surrounding environment [4,5]. The comprehensive analysis of such data can provide key evidence to help predict potential marine and climate change effects, formulate sustainable resource management strategies, and promote the adoption of necessary environmental protection and restoration measures [6,7,8].
Therefore, the study of plankton has received widespread attention, and the identification of plankton is a prerequisite for studying and monitoring changes in plankton ecology, with clear scientific and economic value [9]. In the early days, in order to understand the temporal and spatial distribution of plankton, samples were usually collected by trawling and water sampling and then classified and counted manually [10]. This process is time-consuming, labor-intensive, and costly, which hindered the monitoring and study of plankton at larger temporal and spatial scales. In recent years, with the development of imaging technology, underwater image sensors such as the Video Plankton Recorder (VPR), Shadow Image Particle Profile Evaluation Recorder, Zooplankton Visualization System, Scripps Plankton Camera, and In Situ Fish Plankton Imaging System can continuously and automatically capture images of individual plankton [11,12,13,14,15]. Since these imaging systems usually generate a large amount of image data, taxonomists need to label the images manually, resulting in slow identification of plankton species and a huge workload. At the same time, because of the wide variety of plankton species, researchers may have inadequate knowledge and experience, which can lead to misjudgments and human error during species identification [16,17].
Over the last decade, numerous studies on automatic plankton image recognition have emerged globally [18,19,20,21]. Before the emergence of deep learning, plankton image recognition mainly used traditional machine-learning methods. Tang et al. [22] proposed a novel pattern recognition system that images large numbers of plankton with a towed VPR and classifies them using an improved learning vector quantization network classifier, realizing fully automatic plankton recognition for the first time. Hu and Davis [23] used the co-occurrence matrix (COM) as a feature and a support vector machine (SVM) as a classifier, reducing the classification error rate by combining the COM and SVM to automatically recognize plankton images. Zheng et al. [24] proposed an automatic plankton image classification system based on multi-kernel learning (MKL) combined with multi-view features; by combining different types of features and feeding them to multiple classifiers to better exploit the feature information, a higher classification accuracy was achieved. These machine-learning-based recognition methods extract target features manually and then process them with classifiers. The design of such hand-crafted features depends on past experience and does not exploit big data; in addition, the number of parameters in hand-crafted features is small, which limits the achievable recognition accuracy. At present, owing to the rapid advancement of computer technology and improvements in hardware integration, deep-learning technology has rapidly emerged and has made significant breakthroughs in various fields [25,26]. Li and Cui [27] proposed a method for classifying plankton images using a deep residual network, which improved the accuracy and real-time performance of plankton recognition. In order to overcome the problem of class imbalance, Lee et al. [28] proposed a fine-grained classification method for a large plankton database using a convolutional neural network (CNN) based on transfer learning. Nandini et al. [29] used CNN models with VGG16 and ResNet as backbone networks and deployed them on the Heroku platform to classify different plankton species and record newly added classes, so that new data could be fed back into the model to help identify new plankton species, achieving an accuracy of 83.7%. Most existing zooplankton image recognition methods only use imagers to capture images underwater and then export the images or transmit them via cables or communication systems to onshore computers or servers for identification and analysis, which cannot meet the needs of real-time observation. In order to realize the real-time in situ monitoring of plankton distribution, lightweight algorithms have gradually gained attention; for example, Guo et al. [30] combined deep-learning methods with digital in-line holography and applied the lightweight network ShuffleNet to classify marine plankton holograms. However, the acquisition and reconstruction of holographic images is complicated and places high demands on light sources, recording media, and reconstruction algorithms. Microscopic imaging technology can directly observe the microstructure of plankton without complex recording and reconstruction processes and can quickly acquire large amounts of high-quality plankton image data; this immediacy is crucial for studying the ecological behaviors, distribution patterns, and population dynamics of plankton [31,32,33].
In view of this, this study makes improvements in two respects: reducing the number of model parameters and the computational cost, and improving the accuracy of plankton recognition in real scenarios. A lightweight recognition method for plankton microscopic images based on MobileNetV2 is proposed. First, the depth and width of the network are balanced by redesigning the layer structure of the feature extraction network, which reduces the number of parameters and the amount of computation, lightens the computational burden, and accelerates training and inference. Second, the CA mechanism is introduced to dynamically adjust the feature weights between different channels and promote the deep interaction of global information within the image, enabling the model to focus more on the key feature regions of plankton and significantly improving feature discrimination and extraction accuracy. Finally, in line with the recognition task, the tail classifier of the network is improved to enhance the utilization efficiency of the network parameters; ultimately, accurate identification of plankton at the in situ end is achieved while keeping the model lightweight. Combined with the hydrological and spatio-temporal information observed by other sensors, the abundance of each plankton category can then be obtained. This study not only demonstrates, for the first time, the potential of high-precision classification of plankton micrographs in real-time environments, but also provides a new technological path for the long-term in situ monitoring of plankton. By integrating the in situ observation algorithm presented here, imaging instruments will be able to monitor plankton automatically and continuously, which is of great significance for marine ecological environment monitoring and for ocean and climate prediction.

2. Materials and Methods

2.1. Materials

2.1.1. Dataset Description

The dataset used is derived from a scientific expedition conducted by the research vessel “Xiangyanghong 01” in 2018 in typical waters of the southern Pacific Ocean. During this expedition, data were collected by horizontally towing a VPR at two stations (129.05° W, 9.01° S and 134.17° W, 11.57° S). The VPR (model 6V-ESPB-SO03VPR) was equipped with a 1392 × 1040 pixel color CCD camera and operated at a fixed frame rate of 20 Hz. The video was preprocessed using the JpegLS_Adeck.exe software, version 2017, by adjusting key parameters such as Threshold, Sobel, and Std Dev to optimize image quality and extract the Region of Interest (ROI). In order to ensure the accuracy and reliability of the data, a total of 12 categories of images were obtained through manual classification and screening, including zooplankton such as copepods, radiolarians, medusae, foraminifera, chaetognaths, tunicates, and krill; phytoplankton such as diatoms, chaetoceros, and ceratium; and marine snow and zooplankton faeces, which are aggregates formed from tiny dead and living organisms. Selected images of the 12 categories in the dataset are shown in Figure 1.

2.1.2. Data Preprocessing

Dataset augmentation was performed on the plankton images because the sample data were unevenly distributed and some categories contained relatively few images, which could trigger overfitting during model training. A variety of augmentations were applied using Python scripts (version 3.8.10), including random rotation, translation along different axes, and cropping. These operations not only effectively increase the number of samples in the dataset, but also simulate the different angles and positions that may occur during actual shooting, improving the generalization ability and robustness of the model, and finally yield a plankton dataset with abundant samples and a balanced distribution. The plankton dataset was augmented from 2728 original images to 6674 images, and each category was randomly divided into training, validation, and test sets at a ratio of 8:1:1. The training and validation sets were used to train the neural network and select the optimal model parameters, and the test set was used to evaluate the performance of the final network model. The detailed composition of the dataset is shown in Table 1.
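As an illustration of the kind of offline augmentation described above, the following Python sketch applies random rotation, translation, and cropping with torchvision and writes the augmented copies to disk. The directory layout, file names, and transform parameters are illustrative assumptions, not the exact script used in this study.

```python
# Minimal sketch of offline augmentation for one plankton category.
# Paths and parameter values are hypothetical placeholders.
from pathlib import Path
from PIL import Image
import torchvision.transforms as T

augment = T.Compose([
    T.RandomRotation(degrees=30),                     # random rotation
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation along both axes
    T.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # random cropping and rescaling
])

src_dir = Path("plankton/original/copepod")
dst_dir = Path("plankton/augmented/copepod")
dst_dir.mkdir(parents=True, exist_ok=True)

for img_path in src_dir.glob("*.jpg"):
    img = Image.open(img_path).convert("RGB")
    for k in range(2):                                # two augmented copies per original
        augment(img).save(dst_dir / f"{img_path.stem}_aug{k}.jpg")
```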

2.2. Methods

2.2.1. MobileNetV2 Model Structure

The MobileNetV2 model is a lightweight convolutional neural network for mobile and embedded devices proposed by the Google team in 2018 [34]. Its structural parameters are shown in Table 2. The MobileNetV2 network consists of 17 bottleneck layers, and its core is depthwise separable convolution, comprising depthwise convolution and pointwise convolution. In deep-learning algorithms, gradient problems such as gradient vanishing, gradient explosion, and degradation become more prominent as the depth of the network increases. Compared with MobileNetV1 [35], the inverted residual block and linear bottleneck structure in MobileNetV2 can effectively mitigate these gradient-related problems and help ensure stability and convergence during network training. The inverted residual block adopts the depthwise separable convolution operation: for the input feature matrix, a 1 × 1 convolution kernel is first used to expand the channel dimension, the result is then convolved by a depthwise convolution with a 3 × 3 kernel, and finally the dimension is reduced again by a 1 × 1 convolution kernel, which enhances gradient propagation and significantly reduces memory consumption during inference. In addition, the ReLU6 activation function is used instead of the original ReLU function, and a linear activation is used after the last 1 × 1 convolution. Moreover, when the input and output feature matrices have the same shape and the stride is 1, a shortcut connection is established, ultimately forming a linear bottleneck structure that is narrow at both ends and wide in the middle. This helps retain feature diversity, enhances the expressive ability of the network, and makes the model more robust under low-precision computation. The model also introduces two hyperparameters, the width coefficient α and the resolution coefficient β, to proportionally reduce the number of channels in each layer and the resolution of the input image, thereby reducing the computation and the number of parameters and enabling faster inference on limited hardware.
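For reference, the sketch below shows a generic MobileNetV2-style inverted residual block in PyTorch, following the public design described above (1 × 1 expansion, 3 × 3 depthwise convolution, 1 × 1 linear projection, and a shortcut when the stride is 1 and the channel counts match); it is a minimal illustration rather than the exact implementation used in this study.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted residual block (minimal sketch)."""
    def __init__(self, in_ch: int, out_ch: int, stride: int, expand_ratio: int):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise convolution expands the channel dimension
            layers += [nn.Conv2d(in_ch, hidden, 1, bias=False),
                       nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True)]
        layers += [
            # 3x3 depthwise convolution (groups == channels)
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 linear projection back down (no activation -> linear bottleneck)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_shortcut else out
```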

2.2.2. Attention Module

The attention mechanism is a strategy for optimizing the processing capability of a model under limited computational resources: by weighting different parts of the input data, the model can focus on key regions in the input image, reduce its attention to other information, and even filter out irrelevant information. This mechanism captures and expresses the effective features in the data more efficiently, which significantly improves learning efficiency and accuracy during task processing. Among the current mainstream attention mechanisms, the squeeze-and-excitation (SE) module [36] fuses global contextual information by dynamically adjusting the weight of each channel, a strategy that greatly enhances the model’s ability to recognize the importance of features across channels. However, the SE module mainly focuses on attention in the channel dimension, i.e., how much each channel contributes to the overall output, and fails to capture attention information in the spatial dimension. The convolutional block attention module (CBAM) [37] effectively combines channel attention and spatial attention, and the combination of convolutional operations and attention mechanisms enhances the network’s attention to image features in both the spatial and channel dimensions. However, it has limitations in capturing correlations between distant spatial locations and does not explicitly model the dependencies between coordinates. To overcome these limitations, more advanced attention mechanisms such as the coordinate attention (CA) module [38] have been proposed, which not only considers the information interaction between channels, but also emphasizes the dependencies between coordinates to capture finer spatial features. This enables the model to better understand the spatial structure of the image and enhances the attention to and extraction of key regions, bringing a more significant performance improvement. The implementation process of the CA attention module is shown in Figure 2.
In order to obtain attention along the width and height of the image and encode precise location information, global average pooling is split into two operations; i.e., the input is average-pooled along two different directions with pooling kernels of size (H, 1) and (1, W), yielding two feature vectors $z_c^h(h)$ and $z_c^w(w)$ that encode the attention information of the two directions, as given below.
$$ z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i) $$
$$ z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w) $$
where $x_c(h, i)$ and $x_c(j, w)$ represent the input of the $c$th channel along the horizontal and vertical directions, respectively, and $H$ and $W$ represent the height and width of the input feature map, respectively.
Then, the feature maps in the two directions, each containing a global receptive field, are concatenated and passed through a shared 1 × 1 convolution that compresses the channel dimension to C/r. After batch normalization and a non-linear activation, an intermediate feature map $f$ with shape $C/r \times 1 \times (H + W)$ is obtained. The specific formula is shown below:
$$ f = \delta\left( F_1\left( \left[ z^h, z^w \right] \right) \right) $$
where $z^h$ and $z^w$ are the feature vectors in the horizontal and vertical directions, $F_1$ is the 1 × 1 convolution used for dimensionality reduction, and $\delta$ is the non-linear activation function.
Subsequently, the intermediate feature map is split along the spatial dimension into $f^h$ and $f^w$, and a 1 × 1 convolution is applied to each sub-feature map to increase the number of channels from C/r back to C so as to match the original feature map. Sigmoid activation is then performed to generate the attention weight vectors $g^h$ and $g^w$. Finally, the original feature map is reweighted by these vectors to obtain the output feature map $y_c(i, j)$ carrying attention weights in both the width and height directions. The specific formulas are as follows:
$$ g^h = \sigma\left( F_h\left( f^h \right) \right) $$
$$ g^w = \sigma\left( F_w\left( f^w \right) \right) $$
$$ y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j) $$
where $F_h$ and $F_w$ are the 1 × 1 convolutions that restore the channel dimension, $\sigma$ is the sigmoid activation, $\times$ denotes element-wise multiplication, $x_c(i, j)$ is the original input feature map, $y_c(i, j)$ is the output feature map, and $g_c^h(i)$ and $g_c^w(j)$ are the attention weight vectors.
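The following PyTorch sketch summarizes the CA module along the lines of the equations above (directional pooling, shared 1 × 1 convolution with batch normalization and a non-linear activation, splitting, channel restoration, sigmoid, and reweighting). The reduction ratio r = 32 is an assumption taken from the original CA design, not a value reported in this study.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention (CA) module, minimal sketch of the equations above."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> N x C x H x 1
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> N x C x 1 x W
        self.conv1 = nn.Conv2d(channels, mid, 1)       # shared 1x1 conv (dimension reduction F1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU6(inplace=True)               # non-linear activation (delta)
        self.conv_h = nn.Conv2d(mid, channels, 1)        # F_h: restore channels along height branch
        self.conv_w = nn.Conv2d(mid, channels, 1)        # F_w: restore channels along width branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        zh = self.pool_h(x)                               # N x C x H x 1
        zw = self.pool_w(x).permute(0, 1, 3, 2)           # N x C x W x 1
        f = self.act(self.bn1(self.conv1(torch.cat([zh, zw], dim=2))))
        fh, fw = torch.split(f, [h, w], dim=2)            # split back into the two directions
        gh = torch.sigmoid(self.conv_h(fh))               # N x C x H x 1
        gw = torch.sigmoid(self.conv_w(fw.permute(0, 1, 3, 2)))  # N x C x 1 x W
        return x * gh * gw                                # y_c(i, j) = x_c(i, j) * g_h * g_w
```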

2.2.3. Feature Classification Module

The MobileNetV2 network can quickly capture the features of the target in the image with its efficient backbone structure. As a key component of the network, the classifier is responsible for converting the feature map extracted by the feature extraction module into a specific classification result. At the end of the network, it processes the feature map through operations such as global average pooling and 1 × 1 linear convolution to obtain the recognition result. In practical applications, simply adjusting the number of neurons in the last layer of the classifier to match the number of target classes is straightforward, but it is likely to prevent the network from learning the feature representation that best matches the target task and to reduce the utilization efficiency of the network parameters. For the recognition task in this article, the classifier was therefore redesigned to improve the network’s recognition accuracy for plankton. The new classifier consists of two convolutional layers, a global pooling layer, and an output layer; its specific parameters are shown in Table 3. To avoid losing too many useful features during compression and to reduce the complexity of subsequent calculations, a 1 × 1 convolution kernel first compresses the number of feature channels, retaining 3/5 of the original channels. Then, to increase the correlation between the feature map and the plankton categories and to avoid the information loss that a large feature map would cause in the subsequent global pooling operation, a 3 × 3 convolution is applied to the compressed feature map; this step further enhances the expressiveness of the feature map and brings it closer to the classification target. After these two convolutional layers, a global pooling layer reduces the size of the feature map and extracts global information. Finally, an output layer (linear layer) converts the globally pooled feature map into the plankton recognition result.
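A minimal PyTorch sketch of the redesigned classifier head in Table 3 is given below. The layer sizes follow the table (320 → 192 channels, a 7 × 7 → 4 × 4 spatial reduction, global pooling, and a linear layer to 12 classes), while the stride and padding of the 3 × 3 convolution are assumptions chosen so that the spatial sizes in the table work out.

```python
import torch.nn as nn

# Sketch of the improved classifier head (Table 3): 7x7x320 -> 12 classes.
classifier = nn.Sequential(
    nn.Conv2d(320, 192, kernel_size=1), nn.ReLU6(inplace=True),                     # 7x7x320 -> 7x7x192 (keep 3/5 of channels)
    nn.Conv2d(192, 64, kernel_size=3, stride=2, padding=1), nn.ReLU6(inplace=True), # 7x7x192 -> 4x4x64 (assumed stride/padding)
    nn.AdaptiveAvgPool2d(1),                                                        # 4x4x64 -> 1x1x64 (global average pooling)
    nn.Flatten(),
    nn.Linear(64, 12),                                                              # 12 plankton categories
)
```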

2.2.4. Overall Network Introduction

The proposed classification model is based on MobileNetV2. Through the reconstruction of the feature extraction backbone, the introduction of the coordinate attention module, and the improvement of the network classifier, a recognition algorithm suited to plankton images is designed; the overall architecture is shown in Figure 3. First, in a deep convolutional neural network, the shallow layers have small receptive fields and extract general features such as texture and edges. Since the feature extraction backbone is built entirely from inverted residual blocks, an overly deep architecture may cause these basic features to gradually attenuate, which negatively affects the recognition of small targets or plankton with relatively simple features, and also reduces the computational efficiency and real-time performance of the model. Therefore, to improve efficiency and reduce computational cost, this paper redesigns the layer structure of the feature extraction network to balance the depth and width of the network, ensuring that the network can better strengthen and utilize the underlying basic features while maintaining sufficient depth to capture high-level semantic features. Secondly, to effectively capture and express the useful features in the data and to enhance the attention to and extraction of key regions, the CA attention mechanism is introduced into the inverted residual module. Finally, to learn the feature representation that best matches the target task and further improve the utilization efficiency of the network parameters, the network classifier was redesigned for the plankton identification task, ensuring that the network makes full use of the rich information obtained from the feature extraction layers to accurately identify different types of plankton.

3. Results

3.1. Training Methods

3.1.1. Network Training

The experimental environment in this study is the Ubuntu 18.04 operating system; the graphics processing unit (GPU) is an Nvidia GeForce RTX 2080 Ti (11 GB) (NVIDIA, Santa Clara, CA, USA), and the system memory is 40 GB. The experiments use the PyTorch deep-learning framework, with software versions torch 1.8.1, Python 3.8.10, and CUDA 11.1.
Before model training, in order to increase the diversity of the data and improve the robustness and generalization ability of the model, a series of preprocessing operations are applied to the training set images: random cropping to 224 × 224 pixels, random scaling, random horizontal flipping, and standardization. Training uses the five-fold cross-validation method, in which the dataset is randomly divided and the experiment is repeated five times, with the average taken as the experimental result. The number of training epochs is 200, the batch size is set to 16, and the Adam optimizer [39] is used with a cross-entropy loss function. In order to make the model converge better to the optimal solution and reduce the risk of overfitting, a fixed-step decay learning rate adjustment strategy [40] is adopted: the initial learning rate is set to 0.001, the decay factor is set to 0.5, and the learning rate is adjusted every five epochs.
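A condensed sketch of this training setup is shown below; the data loaders, model definition, and the five-fold splitting are assumed to be handled elsewhere, and the normalization statistics are an assumption (ImageNet values) rather than values reported in the paper.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T
from torch.optim.lr_scheduler import StepLR

# Preprocessing for the training set: random crop/scale, horizontal flip, normalization.
train_transform = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # assumed statistics
])

def train(model, train_loader, device, epochs=200):
    """One fold of training with the hyperparameters described in the text."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = StepLR(optimizer, step_size=5, gamma=0.5)  # fixed-step decay every 5 epochs
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:   # batch size 16 is set in the DataLoader
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```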

3.1.2. Evaluation Indicators

In the plankton identification problem, a multi-class classification task, this study uses multiple performance metrics to evaluate the proposed model [41,42]. These include accuracy, precision, recall, F1 score, number of parameters, computational cost, and inference time. To handle the multiple plankton categories, the precision, recall, and F1 score are calculated as macro-averages over all categories. The calculation formulas are as follows:
$$ \mathrm{Accuracy} = \frac{\sum_{n=1}^{N} TP_n}{M} $$
$$ \mathrm{Precision} = \frac{1}{N} \sum_{n=1}^{N} \frac{TP_n}{TP_n + FP_n} $$
$$ \mathrm{Recall} = \frac{1}{N} \sum_{n=1}^{N} \frac{TP_n}{TP_n + FN_n} $$
$$ \mathrm{F1\text{-}score} = \frac{1}{N} \sum_{n=1}^{N} \frac{2 \times \mathrm{Precision}_n \times \mathrm{Recall}_n}{\mathrm{Precision}_n + \mathrm{Recall}_n} $$
Here, $M$ is the total number of samples, and each class is evaluated in a one-vs-rest manner (this class versus all others). Given the $N$ 2 × 2 confusion matrices $C_n$, each matrix contains $TP_n$ (true positive), $FP_n$ (false positive), $FN_n$ (false negative), and $TN_n$ (true negative) for class $n \in \{1, 2, \dots, N\}$: $TP_n$ is the number of samples that are actually positive and predicted as positive, $FP_n$ is the number of samples that are actually negative but predicted as positive, $FN_n$ is the number of samples that are actually positive but predicted as negative, and $TN_n$ is the number of samples that are actually negative and predicted as negative.
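Given an N × N confusion matrix, the macro-averaged metrics above can be computed as in the following sketch; this is a generic implementation of the formulas, not code from the original study.

```python
import numpy as np

def macro_metrics(conf_mat: np.ndarray):
    """Overall accuracy and macro-averaged precision, recall, and F1 score
    from an N x N confusion matrix (rows = true class, columns = predicted class)."""
    tp = np.diag(conf_mat).astype(float)
    fp = conf_mat.sum(axis=0) - tp            # predicted as class n but actually another class
    fn = conf_mat.sum(axis=1) - tp            # actually class n but predicted as another class
    precision = tp / np.maximum(tp + fp, 1e-12)
    recall = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / conf_mat.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()
```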

3.2. Model Performance Evaluation

3.2.1. Ablation Experiment

The proposed network model is based on the MobileNetV2 base model: it reconstructs the feature extraction backbone, introduces the coordinate attention module, and improves the network classifier. These three improvement measures are evaluated step by step in the ablation experiment, with the following configurations:
A: Use the original MobileNetV2;
B: Reconstruct the feature extraction backbone network of MobileNetV2;
C: Reconstruct the feature extraction backbone network of MobileNetV2 and introduce the coordinate attention module;
D: Reconstruct the feature extraction backbone network of MobileNetV2, introduce the coordinate attention module and improve the network classifier.
The results of the ablation experiment are shown in Table 4. It can be seen that, by adjusting the feature extraction backbone of the MobileNetV2 model, the size and computational complexity of the model are significantly reduced, while the accuracy and F1 score decrease slightly, the accuracy from 94.40% to 92.74% and the F1 score from 92.05% to 90.89%. Balancing recognition accuracy against the lightweight properties of the model, the CA attention module is then introduced into the inverted residual units of the MobileNetV2 model, and the accuracy increases from 92.74% to 93.80% while the F1 score increases from 90.89% to 92.14%. The reason is that the CA attention mechanism helps the network capture long-range dependencies and retain precise location information, which effectively improves the model’s attention to the location of plankton and its perception of key features. Since there is a large gap between the number of target classes and the dimension of the feature map output by the backbone, the network classifier was then improved on top of the backbone reconstruction and the CA module. The accuracy of the improved model increases from 93.80% to 95.46%, and the F1 score increases from 92.14% to 94.48%, while the number of parameters and the computational cost are also slightly reduced; the improved model thus achieves a marked gain in recognition performance. Finally, after applying all the improvement measures, compared with the original network, the number of parameters and the computational cost of the model are reduced by 72.47% and 52.09%, respectively, the accuracy is increased by 1.06 percentage points, the F1 score is increased by 2.43 percentage points, and the inference time for a single plankton image is only 6.15 ms.

3.2.2. Identification Result Analysis

In order to evaluate the recognition effectiveness of the proposed model on the plankton dataset, the recognition results are analyzed using several statistical metrics, including precision, average precision, recall, F1 score, PR curves, and the confusion matrix.
First, the recognition effectiveness of the proposed model and the original MobileNetV2 model on the various types of plankton is compared; the results are shown in Table 5. According to these results, the overall recognition precision, recall, and F1 score of the improved MobileNetV2 model are 94.07%, 94.99%, and 94.48%, respectively, which are higher than the overall precision of 93.96%, recall of 91.95%, and F1 score of 92.05% achieved by MobileNetV2. In addition, the F1 scores of the proposed model exceed 90% for most categories, and, if the plankton categories with fewer training images (such as chaetognatha and ceratium) are excluded, the precision, recall, and F1 score of every category are above 90%.
The average precision (AP) is the area under the precision–recall curve and is usually used to measure the precision of a model at different recall rates; generally, the better the model, the larger the area under the PR curve (i.e., the AP value). As can be seen from Figure 4, the area under the PR curve of the improved model is noticeably larger, with an overall AP of 97.86%, compared with 97.07% for the original MobileNetV2 model. This shows that the proposed model achieves a better balance between recognition precision and recall and confirms its performance improvement.
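For reference, the per-class AP values behind such PR curves can be computed in a one-vs-rest manner as in the sketch below, assuming y_true holds integer class labels and y_score holds the model’s per-class softmax scores; this is a generic illustration using scikit-learn, not the authors’ evaluation code.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Mean of per-class APs, with each class scored one-vs-rest.
    y_true: (n_samples,) integer labels; y_score: (n_samples, n_classes) scores."""
    n_classes = y_score.shape[1]
    aps = [average_precision_score((y_true == c).astype(int), y_score[:, c])
           for c in range(n_classes)]
    return float(np.mean(aps))
```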
The confusion matrix is an important tool for evaluating the performance of classification models. Figure 5 shows the performance of the proposed model in the plankton image recognition task: the values on the main diagonal are the numbers of correctly identified samples, and the remaining entries reflect misidentifications. It can be seen from the confusion matrix that most plankton images are accurately identified by the model. Recognition is poorest for tunicata, which is mainly misidentified as chaetognatha. Analysis of the misclassified images reveals that significant defects in image quality are a core reason for misclassification; in addition, the diversity of shooting angles results in incomplete images of the captured plankton, making it difficult for the model to accurately capture their features, and the two types of plankton are also somewhat similar, as shown in Figure 5b. Furthermore, the F1 scores of individual plankton categories (such as chaetognatha, krill, and ceratium) are relatively poor, mainly because the image data for these three categories are relatively scarce, making it difficult for the model to fully learn their features during training and thus affecting their recognition performance.

3.2.3. Contrast Experiment

In order to further verify the performance of the proposed model in identifying plankton, this study used the mainstream models ResNet34, EfficientNetb0, ShuffleNet0.5, ShuffleNet1.0, and MobileViT_xxs as comparison models and conducted comparative experiments on the plankton test set. In all experiments, the other conditions and hyperparameter settings were kept consistent. The specific results are shown in Table 6.
As can be seen from Table 6, the improved MobileNetV2 model proposed in this study shows excellent performance in the plankton recognition task. The accuracy of the model reaches 95.46%, which is 4.99 and 2.27 percentage points higher than the accuracy of ShuffleNet0.5 and ShuffleNet1.0, respectively, with F1 score gains of 6.48 and 2.22 percentage points. In terms of the number of parameters and model size, the method also has clear advantages: compared with ShuffleNet0.5 and MobileViT_xxs, the number of parameters of this model is only 45.10% and 48.46% of theirs, respectively, which greatly reduces the consumption of computing resources, and the small model size also makes it more suitable for deployment on edge computing devices. The computation time is generally related to the number of model parameters (Params) and the number of floating-point operations (MFLOPs). For example, EfficientNetb0 has fewer floating-point operations than ResNet34, yet its inference time is longer; this may be due to trade-offs made in EfficientNetb0’s architecture to optimize computational efficiency, which, in some cases, can increase computation time. In addition, the recognition time for a single plankton image is only 6.15 ms, which fully meets the needs of fast, real-time plankton recognition.
In summary, although the proposed model is slightly inferior to ResNet34 in the average recognition time for a single plankton image, it has significant advantages in terms of parameter count, model size, and recognition accuracy. Therefore, compared with other mainstream models, the proposed model achieves a good balance between recognition accuracy, parameter count, and operating efficiency, and is suitable for deployment on edge computing devices for plankton recognition.

3.2.4. Application

In the spring of 2018, aboard the Xiangyanghong 01 research vessel in typical waters of the southern Pacific Ocean, a VPR was towed to obtain data along a profile, and the above model was used to analyze the resulting data. This tow generated a total of 42,532 individual plankton images. After identifying each image with the model, results with a predicted top-1 category probability greater than 0.8 were retained to ensure accuracy, leaving 22,882 individual plankton images for subsequent analysis. The developed lightweight algorithm has been deployed on the Huawei Ascend Atlas 200I DK A2 development board (Huawei, Shenzhen, China). First, the hardware environment is prepared: drivers, firmware, and dependencies are installed, the CANN software (version 6.2.RC2) is installed, and the environment variables are configured. Then, the algorithm is converted into an ONNX intermediate model, which is further converted into an om inference model suitable for the development board using the ATC conversion tool, allowing image classification to run on the board. The Atlas 200I DK A2 development board provides a four-core CPU at 1.0 GHz, 8 TOPS of AI compute, 4 GB of memory, and a power consumption of 24 W. Combining the time information corresponding to each image with the observation data of a CTD (Sea Bird SBE 49), the distribution of two types of plankton, copepods and foraminifera, along the towed profile was plotted. Time was divided into equally spaced periods (with a step of 0.0001 days), the number of plankton and the average depth within each period were counted, and each period was converted into one data point for clearer visualization. Figure 6 shows the spatial distribution of copepods and foraminifera drawn from the model predictions. As can be seen from the figure, copepods and foraminifera are mainly distributed at depths of 20 to 80 m; abundance is lower in water deeper than 80 m, and copepods are more concentrated in the vertical direction. The proposed model greatly simplifies the statistical analysis of plankton distribution: through efficient data processing, the abundance and distribution of each category can be displayed more intuitively and accurately. This provides a faster, more effective, and more convenient research method for oceanographers and biologists and is of great significance for the monitoring and protection of the marine ecological environment.
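The sketch below illustrates two of the steps described above: exporting a trained PyTorch model to ONNX (prior to the ATC conversion) and screening predictions by top-1 probability before binning them into 0.0001-day time periods. The model used here (a torchvision MobileNetV2 with 12 output classes), file names, column names, and DataFrame layout are illustrative assumptions; only the probability threshold and bin width come from the text.

```python
import numpy as np
import pandas as pd
import torch
from torchvision.models import mobilenet_v2

# 1) Export the network to ONNX; the .onnx file is then converted to an om model
#    with the ATC tool on the Atlas 200I DK A2 board. A stock MobileNetV2 stands in
#    for the improved network here.
model = mobilenet_v2(num_classes=12)
model.eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "plankton_mobilenetv2.onnx",
                  input_names=["image"], output_names=["logits"], opset_version=11)

# 2) Keep predictions with top-1 probability > 0.8 and bin counts by time (yearday).
#    predictions.csv is a hypothetical file with columns: yearday, depth_m, label, prob.
df = pd.read_csv("predictions.csv")
df = df[df["prob"] > 0.8].copy()
bins = np.arange(df["yearday"].min(), df["yearday"].max() + 1e-4, 1e-4)  # 0.0001-day bins
df["bin"] = pd.cut(df["yearday"], bins)
profile = (df[df["label"] == "copepod"]
           .groupby("bin", observed=True)
           .agg(count=("label", "size"), mean_depth=("depth_m", "mean")))
print(profile.head())
```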

4. Discussion

In view of the large parameter counts, insufficient real-time performance, and limited accuracy of current convolutional neural networks in plankton identification, this study proposes a lightweight plankton image recognition algorithm based on an improved MobileNetV2. The algorithm takes 12 types of plankton images as the research object and constructs a plankton dataset containing 6674 images through manual screening combined with data augmentation. Based on the MobileNetV2 model, three key improvement measures were taken: reconstruction of the feature extraction backbone, introduction of the coordinate attention module, and improvement of the network classifier, and the effectiveness of these measures was verified through ablation experiments. The experimental results show that, compared with mainstream models such as MobileNetV2, EfficientNetb0, and ShuffleNet0.5, the proposed model achieves better recognition performance on the plankton test set, with an accuracy of 95.46% and an F1 score of 94.48%. Corresponding advantages are also reflected in the model size, and the inference time for a single plankton image is only 6.15 ms, which is suitable for deployment on edge computing devices. Taking into account the number of model parameters, recognition accuracy, and speed, the proposed method offers the best overall performance and realizes the accurate, fast, and efficient recognition of plankton. In addition, by further combining the hydrological information observed by the depth sensor, the abundance of each type of plankton can be obtained.
Although the proposed model achieves a significant performance improvement in the plankton identification task, there is still room for improvement. First, the dataset covers only 12 categories of images and will be further expanded in the future to include a wider and more diverse array of plankton species. Secondly, modern enhancement techniques such as synthetic data generation and generative adversarial networks will be further integrated, and more images from different angles will be added as training data to improve the model’s robustness and generalization ability. Finally, since ocean observation equipment often faces strict power and computing resource constraints, the model structure and parameters can be further optimized to improve the operational efficiency and stability of the algorithm. In general, this study not only optimizes the model structure and computational complexity, making the network lightweight and improving recognition efficiency, but also realizes real-time, high-precision plankton identification by integrating hydrological information. Deploying the proposed model on the embedded platform of an in situ imager will help realize long-term, fixed-point, real-time stereoscopic observation of marine plankton, providing strong support for ecological environment monitoring and marine scientific research.

Author Contributions

Conceptualization, C.Y., Z.H. and C.N.; methodology, C.Y., J.Z. and C.N.; software, C.Y.; validation, C.Y.; formal analysis, C.N. and W.W.; investigation, C.Y. and J.Z.; resources, C.N., W.W. and J.Z.; data curation, G.Y., C.N. and C.L.; writing—original draft, C.Y., C.N. and G.Y.; writing—review and editing, Z.H., C.N. and W.W.; visualization, C.Y. and W.W.; supervision, C.N., C.L. and Z.H.; project administration, C.N.; funding acquisition, C.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (funding number 2022YFC3104301) and by the Laoshan Laboratory (funding number LSKJ202201601).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aoki, I.; Komatsu, T.; Hwang, K. Prediction of response of zooplankton biomass to climatic and oceanic changes. Ecol. Model. 1999, 120, 261–270. [Google Scholar] [CrossRef]
  2. Hays, G.C.; Richardson, A.J.; Robinson, C. Climate change and marine plankton. Trends Ecol. Evol. 2005, 20, 337–344. [Google Scholar] [CrossRef] [PubMed]
  3. Moreno, A.R.; Martiny, A.C. Ecological stoichiometry of ocean plankton. Annu. Rev. Mar. Sci. 2018, 10, 43–69. [Google Scholar] [CrossRef]
  4. Racault, M.F.; Platt, T.; Sathyendranath, S.; Ağirbaş, E.; Martinez Vicente, V.; Brewin, R. Plankton indicators and ocean observing systems: Support to the marine ecosystem state assessment. J. Plankton Res. 2014, 36, 621–629. [Google Scholar] [CrossRef]
  5. Pastore, V.P.; Zimmerman, T.G.; Biswas, S.K.; Bianco, S. Annotation-free learning of plankton for classification and anomaly detection. Sci. Rep. 2020, 10, 12142. [Google Scholar] [CrossRef]
  6. Palumbi, S.R.; Sandifer, P.A.; Allan, J.D.; Beck, M.W.; Fautin, D.G.; Fogarty, M.J.; Halpern, B.S.; Incze, L.S.; Leong, J.-A.; Norse, E.; et al. Managing for ocean biodiversity to sustain marine ecosystem services. Front. Ecol. Environ. 2009, 7, 204–211. [Google Scholar] [CrossRef]
  7. Prairie, J.C.; Sutherland, K.R.; Nickols, K.J.; Kaltenberg, A.M. Biophysical interactions in the plankton: A cross-scale review. Limnol. Oceanogr. Fluids Environ. 2012, 2, 121–145. [Google Scholar] [CrossRef]
  8. Chivers, W.J.; Walne, A.W.; Hays, G.C. Mismatch between marine plankton range movements and the velocity of climate change. Nat. Commun. 2017, 8, 14434. [Google Scholar] [CrossRef]
  9. Benfield, M.; Grosjean, P.; Culverhouse, P.; Irigolen, X.; Sieracki, M.; Lopez-Urrutia, A.; Dam, H.; Hu, Q.; Davis, C.; Hanson, A.; et al. RAPID: Research on automated plankton identification. Oceanography 2007, 20, 172–187. [Google Scholar] [CrossRef]
  10. Davis, C.S.; Thwaites, F.T.; Gallager, S.M.; Hu, Q. A three-axis fast-tow digital Video Plankton Recorder for rapid surveys of plankton taxa and hydrography. Limnol. Oceanogr. Methods 2005, 3, 59–74. [Google Scholar] [CrossRef]
  11. Sun, H.; Hendry, D.C.; Player, M.A.; Watson, J. In situ underwater electronic holographic camera for studies of plankton. IEEE J. Ocean. Eng. 2007, 32, 373–382. [Google Scholar]
  12. Dahms, H.U.; Hwang, J.S. Perspectives of underwater optics in biological oceanography and plankton ecology studies. J. Mar. Sci. Technol. 2010, 18, 14. [Google Scholar] [CrossRef]
  13. Corgnati, L.; Marini, S.; Mazzei, L.; Ottaviani, E.; Aliani, S.; Conversi, A.; Griffa, A. Looking inside the ocean: Toward an autonomous imaging system for monitoring gelatinous zooplankton. Sensors 2016, 16, 2124. [Google Scholar] [CrossRef] [PubMed]
  14. Colas, F.; Tardivel, M.; Perchoc, J.; Lunven, M.; Forest, B.; Guyader, G.; Danielou, M.; Le Mestre, S.; Bourriau, P.; Antajan, E.; et al. The ZooCAM, a new in-flow imaging system for fast onboard counting, sizing and classification of fish eggs and metazooplankton. Prog. Oceanogr. 2018, 166, 54–65. [Google Scholar] [CrossRef]
  15. Jiang, Z.; Liu, J.; Zhu, X.; Chen, Y.; Chen, Q.; Chen, J. Quantitative comparison of phytoplankton community sampled using net and water collection methods in the southern Yellow Sea. Reg. Stud. Mar. Sci. 2020, 35, 101250. [Google Scholar] [CrossRef]
  16. Li, X.; Liao, R.; Zhou, J.; Leung, P.T.; Yan, M.; Ma, H. Classification of morphologically similar algae and cyanobacteria using Mueller matrix imaging and convolutional neural networks. Appl. Opt. 2017, 56, 6520–6530. [Google Scholar] [CrossRef]
  17. Walcutt, N.L.; Knörlein, B.; Cetinić, I.; Ljubesic, Z.; Bosak, S.; Sgouros, T.; Montalbano, A.L.; Neeley, A.; Menden-Deuer, S.; Omand, M.M. Assessment of holographic microscopy for quantifying marine particle size and concentration. Limnol. Oceanogr. Methods 2020, 18, 516–530. [Google Scholar] [CrossRef]
  18. Orenstein, E.C.; Beijbom, O.; Peacock, E.E.; Sosik, H.M. WHOI-Plankton—A Large Scale Fine Grained Visual Recognition Benchmark Dataset for Plankton Classification. arXiv 2015, arXiv:1510.00745. [Google Scholar]
  19. Dai, J.; Wang, R.; Zheng, H.; Ji, G.; Qiao, X. ZooplanktoNet: Deep convolutional network for zooplankton classification. In Proceedings of the OCEANS 2016—Shanghai, Shanghai, China, 10–13 April 2016; pp. 1–6. [Google Scholar]
  20. Py, O.; Hong, H.; Zhongzhi, S. Plankton classification with deep convolutional neural networks. In Proceedings of the 2016 IEEE Information Technology, Networking, Electronic and Automation Control Conference, Chongqing, China, 20–22 May 2016; pp. 132–136. [Google Scholar]
  21. Ellen, J.S.; Graff, C.A.; Ohman, M.D. Improving plankton image classification using context metadata. Limnol. Oceanogr. Methods 2019, 17, 439–461. [Google Scholar] [CrossRef]
  22. Tang, X.; Stewart, W.K.; Huang, H.; Gallager, S.M.; Davis, C.S.; Vincent, L.; Marra, M. Automatic plankton image recognition. Artif. Intell. Rev. 1998, 12, 177–199. [Google Scholar] [CrossRef]
  23. Hu, Q.; Davis, C. Automatic plankton image recognition with co-occurrence matrices and support vector machine. Mar. Ecol. Prog. Ser. 2005, 295, 21–31. [Google Scholar] [CrossRef]
  24. Zheng, H.; Wang, R.; Yu, Z.; Wang, N.; Gu, Z.; Zheng, B. Automatic plankton image classification combining multiple view features via multiple kernel learning. BMC Bioinform. 2017, 18, 570. [Google Scholar] [CrossRef] [PubMed]
  25. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  26. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
  27. Li, X.; Cui, Z. Deep residual networks for plankton classification. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; pp. 1–4. [Google Scholar]
  28. Lee, H.; Park, M.; Kim, J. Plankton classification on imbalanced large scale database via convolutional neural networks with transfer learning. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3713–3717. [Google Scholar]
  29. Nandini, T.S.; Swethaa, S.; Bolem, S.; Dharani, G.; Thangarasu, S. Real-time classification of plankton species using convolutional neural networks. In Proceedings of the OCEANS 2022—Chennai, Chennai, India, 21–24 February 2022; pp. 1–5. [Google Scholar]
  30. Guo, B.; Nyman, L.; Nayak, A.R.; Milmore, D.; McFarland, M.; Twardowski, M.S.; Sullivan, J.M.; Hong, J. Automated plankton classification from holographic imagery with deep convolutional neural networks. Limnol. Oceanogr. Methods 2021, 19, 21–36. [Google Scholar] [CrossRef]
  31. Pfitsch, D.W.; Malkiel, E.; Takagi, M.; Ronzhes, Y.; King, S.; Sheng, J.; Katz, J. Analysis of in-situ microscopic organism behavior in data acquired using a free-drifting submersible holographic imaging system. In Proceedings of the OCEANS 2007, Vancouver, BC, Canada, 29 September–4 October 2007; pp. 1–8. [Google Scholar]
  32. Bochdansky, A.B.; Jericho, M.H.; Herndl, G.J. Development and deployment of a point-source digital inline holographic microscope for the study of plankton and particles to a depth of 6000 m. Limnol. Oceanogr. Methods 2013, 11, 28–40. [Google Scholar] [CrossRef]
  33. Talapatra, S.; Hong, J.; McFarland, M.; Nayak, A.R.; Zhang, C.; Katz, J.; Sullivan, J.; Twardowski, M.; Rines, J.; Donaghay, P. Characterization of biophysical interactions in the water column using in situ digital holography. Mar. Ecol. Prog. Ser. 2013, 473, 29–51. [Google Scholar] [CrossRef]
  34. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  35. Howard, A.G. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  36. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  37. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  38. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
  39. Kingma, D.P. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  40. Loshchilov, I.; Hutter, F. Sgdr: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar]
  41. Goutte, C.; Gaussier, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In Proceedings of the European Conference on Information Retrieval, Santiago de Compostela, Spain, 21–23 March 2005; pp. 345–359. [Google Scholar]
  42. Lumini, A.; Nanni, L. Deep learning and transfer learning features for plankton classification. Ecol. Inform. 2019, 51, 33–43. [Google Scholar] [CrossRef]
Figure 1. Sample images in 12 categories. 1: copepod; 2: radiolaria; 3: medusa; 4: foraminifera; 5: chaetognatha; 6: tunicata; 7: snow; 8: faeces; 9: diatom; 10: chaetoceros; 11: krill; 12: ceratium.
Figure 2. CA attention module structure. C, number of channels; H, height; W, width; Avg Pool, average pooling; Conv2d, 2-dimensional convolution function.
Figure 3. Overall architecture of the improved MobileNetV2 model. Conv2d, 2-dimensional convolutional layer; Avg pool, average pooling; Linear, linear transformation layer; Dwise, depthwise convolution.
Figure 4. PR curves for various categories: (a) the PR curve of the improved model; (b) the PR curve of the original MobileNetV2 model.
Figure 5. (a) Confusion matrix result. (b) Analysis of misclassified images (using cyst-like plankton as an example).
Figure 6. Spatial distribution map of plankton. The horizontal coordinate represents time, expressed as the cumulative number of days since the beginning of the year (yearday). The vertical coordinate represents depth in meters; the depth range shown is from 0 m to −140 m. Each scatter point represents the number of plankton at a specific time and depth, and the color and size of the point indicate the relative magnitude of that number.
Table 1. Dataset details.
No. | Category | Original Quantity | Enhanced Quantity
1 | copepod | 608 | 1216
2 | radiolaria | 504 | 1044
3 | medusa | 108 | 276
4 | foraminifera | 320 | 816
5 | chaetognatha | 53 | 244
6 | tunicata | 201 | 458
7 | snow | 228 | 648
8 | faeces | 243 | 660
9 | diatom | 238 | 516
10 | chaetoceros | 137 | 338
11 | krill | 42 | 222
12 | ceratium | 46 | 236
Table 2. Structural parameters of the MobileNetV2 model. Conv2d is the convolution operation; Bottleneck is the inverted residual bottleneck layer; t is the expansion factor; c is the number of output channels; n is the number of repetitions of the bottleneck; s is the stride; and k is the number of classes.

Input | Operator | t | c | n | s
224 × 224 × 3 | Conv2d | - | 32 | 1 | 2
112 × 112 × 32 | Bottleneck | 1 | 16 | 1 | 1
112 × 112 × 16 | Bottleneck | 6 | 24 | 2 | 2
56 × 56 × 24 | Bottleneck | 6 | 32 | 3 | 2
28 × 28 × 32 | Bottleneck | 6 | 64 | 4 | 2
28 × 28 × 64 | Bottleneck | 6 | 96 | 3 | 1
14 × 14 × 96 | Bottleneck | 6 | 160 | 3 | 2
7 × 7 × 160 | Bottleneck | 6 | 320 | 1 | 1
7 × 7 × 320 | Conv2d 1 × 1 | - | 1280 | 1 | 1
7 × 7 × 1280 | Avgpool 7 × 7 | - | - | 1 | -
1 × 1 × 1280 | Conv2d 1 × 1 | - | k | - | -
Table 3. Structure of the improved classifier. Conv2d, 2-dimensional convolution; Relu6, activation function; Avg pool, average pooling; Linear, linear transformation layer.

Input | Operator | Output
7 × 7 × 320 | Conv2d 1 × 1, Relu6 | 7 × 7 × 192
7 × 7 × 192 | Conv2d 3 × 3, Relu6 | 4 × 4 × 64
4 × 4 × 64 | Avg pool | 1 × 1 × 64
1 × 1 × 64 | Linear | 1 × 1 × 12
Table 4. Ablation experiment results. Params stands for the number of parameters in the model, corresponding to its spatial complexity; MFLOPs stands for the number of floating-point operations, used to measure computational complexity; Time is the inference time required to process a single sample; Acc, Pre, Rec, and F1 stand for accuracy, precision, recall, and F1 score, respectively.

Model | Params | Size/M | MFLOPs | Time/ms | Acc/% | Pre/% | Rec/% | F1/%
A | 2,239,244 | 8.54 | 326.29 | 5.78 | 94.40 | 93.96 | 91.95 | 92.05
B | 589,836 | 2.25 | 158.22 | 2.73 | 92.74 | 91.37 | 91.12 | 90.89
C | 656,862 | 2.51 | 161.74 | 6.06 | 93.80 | 92.49 | 92.28 | 92.14
D | 616,414 | 2.35 | 156.36 | 6.15 | 95.46 | 94.07 | 94.99 | 94.48
Table 5. Comparison of evaluation index results for each category.

Class | Improved Model Pre/% | Rec/% | F1/% | MobileNetV2 Pre/% | Rec/% | F1/%
01 copepod | 99.14 | 95.04 | 97.05 | 97.52 | 97.52 | 97.52
02 radiolaria | 97.06 | 95.19 | 96.12 | 93.58 | 98.08 | 95.77
03 medusa | 92.86 | 96.30 | 94.55 | 100.00 | 88.89 | 94.12
04 foraminifera | 96.30 | 96.30 | 96.30 | 98.68 | 92.59 | 95.54
05 chaetognatha | 85.19 | 95.83 | 90.20 | 94.12 | 66.67 | 78.05
06 tunicata | 91.30 | 93.33 | 92.31 | 97.67 | 93.33 | 95.45
07 snow | 94.03 | 98.44 | 96.18 | 94.12 | 100.00 | 96.97
08 faeces | 96.92 | 95.45 | 96.18 | 96.88 | 93.94 | 95.38
09 diatom | 94.23 | 96.08 | 95.15 | 100.00 | 94.12 | 96.97
10 chaetoceros | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
11 krill | 90.91 | 90.91 | 90.91 | 55.00 | 100.00 | 70.97
12 ceratium | 90.91 | 86.96 | 88.89 | 100.00 | 78.26 | 87.80
macro avg | 94.07 | 94.99 | 94.48 | 93.96 | 91.95 | 92.05
Table 6. Comparison of recognition results of different models.

Model | Params | MFLOPs | Time/ms | Acc/% | Pre/% | Rec/% | F1/%
ResNet34 | 21,797,672 | 3678.74 | 4.40 | 94.70 | 93.98 | 93.26 | 93.58
EfficientNetb0 | 5,288,548 | 28.29 | 11.06 | 94.25 | 94.99 | 92.75 | 93.67
ShuffleNet0.5 | 1,366,792 | 44.57 | 6.79 | 90.47 | 88.21 | 88.00 | 88.00
ShuffleNet1.0 | 2,278,604 | 152.71 | 6.94 | 93.19 | 92.01 | 92.72 | 92.26
MobileViT_xxs | 1,272,024 | 273.67 | 9.26 | 94.40 | 93.26 | 94.90 | 93.81
Improved model | 616,414 | 156.36 | 6.15 | 95.46 | 94.07 | 94.99 | 94.48

