Article

Feature Extraction and Recognition of Chinese Mitten Crab Carapace Based on Improved MobileNetV2

Key Laboratory of Fisheries Information, Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Hucheng Ring Road 999, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 4982; https://doi.org/10.3390/app14124982
Submission received: 18 April 2024 / Revised: 24 May 2024 / Accepted: 4 June 2024 / Published: 7 June 2024
(This article belongs to the Special Issue Engineering of Smart Agriculture—2nd Edition)

Abstract

The Chinese mitten crab (Eriocheir sinensis), a species unique to Chinese aquaculture, holds significant economic value in the seafood market. In response to increasing concerns about the quality and safety of Chinese mitten crab products, high traceability costs, and the difficulty consumers face in verifying the authenticity of individual crabs, this study proposes a lightweight individual recognition model for Chinese mitten crab carapace images based on an improved MobileNetV2. The method first utilizes a lightweight backbone network, MobileNetV2, combined with a coordinate attention mechanism to extract features of the Chinese mitten crab carapace, enhancing the ability to recognize critical morphological features of the crab shell while keeping the model lightweight. The model is then trained with the ArcFace loss function, which effectively extracts generalized features of the carapace images. Finally, authenticity is verified by calculating the similarity between two input images of Chinese mitten crab carapaces. Experimental results show that the model, combined with the coordinate attention mechanism and ArcFace, achieves a high accuracy rate of 98.56% on the Chinese mitten crab image dataset, surpassing ShuffleFaceNet, MobileFaceNet, and VarGFaceNet by 13.63, 11.10, and 6.55 percentage points, respectively. Moreover, it requires an average of only 1.7 milliseconds per image pair for verification. While remaining lightweight, the model offers high efficiency and accuracy, providing an effective technical solution for enhancing the traceability of Chinese mitten crab products and combating counterfeit goods.

1. Introduction

The Chinese mitten crab (Eriocheir sinensis), commonly known as the hairy crab or river crab, is renowned for its unique flavor and rich nutritional value, making it one of China’s principal aquaculture products and a favorite among global culinary enthusiasts [1]. According to the “China Fisheries Statistical Yearbook” (2023) [2], China’s 2022 cultivation output of Chinese mitten crabs reached 815,318 tons, with a total value of 50 billion RMB, underscoring its significant role in freshwater fisheries. The market for Chinese mitten crabs is extremely broad, ranging from upscale restaurants to family dining tables and from online marketplaces to offline markets. The more than one hundred registered brands and trademarks further demonstrate the crab’s significant status in the aquaculture market. As market demand increases, issues related to the quality and safety of Chinese mitten crab products are also rising, such as the misuse of drugs during breeding and the practice of temporarily altering breeding environments to counterfeit branded products [3,4]. This not only poses a threat to consumer food safety but also impedes efforts towards supply chain transparency and brand value establishment.
To enhance the traceability of Chinese mitten crab products and combat counterfeit goods, various solutions have been proposed. For example, anti-counterfeit labels printed with barcodes or QR codes are attached to the crab’s claws, allowing consumers to scan these identifiers with mobile devices to access product information [5]. This method incurs high traceability costs and has poor environmental sustainability. Moreover, the anti-counterfeit tags are physically separate from the Chinese mitten crabs themselves, making it easy for unscrupulous merchants to recycle or counterfeit these identifiers. Studies show that the morphological growth of Chinese mitten crabs is influenced by genetic factors and the surrounding environment, with their carapaces exhibiting unique features such as grooves, protrusions, and textures [6,7]. Similar to human fingerprints, the characteristics of the Chinese mitten crab’s shell are unique and non-replicable [8], and short-term changes in the environment do not alter these morphological features [9]. Therefore, recognition of the carapace has become an important means of distinguishing different Chinese mitten crabs [10]. Weipeng T. and others were the first to use the SURF and FLANN algorithms to extract and match feature points on crab carapaces, performing individual matching verification based on these features and demonstrating the variability of carapace features among individual crabs; however, their method is susceptible to interference from uneven lighting and noise [11]. The individual recognition of Chinese mitten crabs therefore requires a more accurate, reliable, efficient, and convenient method.
Given the powerful feature extraction capabilities of deep learning, image recognition based on deep learning has been widely applied in studies on humans and certain animals and plants, such as facial recognition [12,13], animal face recognition (e.g., pigs and sheep) [14,15,16], and plant disease recognition [17,18]. Some researchers have also used deep learning for the individual recognition of Chinese mitten crab carapace features. Yuying F. and others utilized a pyramid convolutional network to extract carapace image features of Chinese mitten crabs, combining it with a Multilayer Perceptron (MLP) to classify and recognize 100 individual crabs and achieving an accuracy rate of 98.88% [19]. Although this method achieved a high classification accuracy by enhancing the model’s ability to extract image features, it cannot recognize Chinese mitten crabs not included in the training samples; the addition of new individuals requires retraining the entire network. Guozhong S. and others improved the ResNet101 residual network to extract the features of Chinese mitten crabs, using these features to calculate their similarity with authentic images for verification against a registered database, achieving individual crab traceability with an accuracy rate of 92.1% [20]. This method introduced a new technical approach to recognition technology. However, due to its large model size and computational demands, it requires substantial computational resources and training time, making it less suitable for resource-constrained environments.
Considering the issues mentioned above, as well as the potential applications and scalability of Chinese mitten crab traceability, this paper proposes a lightweight individual recognition method based on the image features of Chinese mitten crabs. This method aims to enhance the traceability of Chinese mitten crab products and combat counterfeit products.
The main contributions of this study are summarized as follows:
  • A Chinese mitten crab image recognition dataset is constructed, containing data for 122 individual Chinese mitten crabs and 64,050 annotated images.
  • A method for recognizing Chinese mitten crab carapace features based on MobileNetV2 is proposed, utilizing a lightweight model structure to effectively improve the accuracy and efficiency of crab recognition.
  • A coordinate attention mechanism is introduced into the network, enhancing the model’s ability to extract detailed features of the crab carapace.
  • The Additive Angular Margin Loss (ArcFace) is introduced to train the model, enhancing the intra-class compactness and inter-class variability of the extracted features.
The rest of the paper is organized as follows. Section 2 describes the data collection process, data augmentation, carapace recognition process, and the methods. Section 3 presents the experimental setup and analyses used to evaluate the proposed model, including the model training results, ablation experiments, quantitative comparisons with other advanced algorithms, and generalization testing of model feature extraction. Finally, the discussion and conclusions are presented in Section 4 and Section 5, respectively.

2. Materials and Methods

This section describes the materials and methods used in the study, including data collection, data augmentation, the methodological process, the configuration of the feature extraction network, and the specific application of attention mechanisms and loss functions.

2.1. Materials

This subsection provides a detailed description of the data collection methods and the dataset used for training the model. It explains how images of Chinese mitten crabs were collected, as well as the enhancement and partitioning of the dataset.

2.1.1. Image Acquisition

To perform image recognition using deep learning models, training the model is essential, and the collection of datasets serves as a critical foundation. Due to the scarcity of publicly available datasets on the research topic, we constructed our dataset. This study collected images from 122 artificially bred Chinese mitten crabs at the Aquatic Animal Germplasm Testing Station in Pudong New District, Shanghai (121°39′56.503″ E, 31°0′48.366″ N), numbering each crab starting from zero. An example of the image collection environment and equipment is shown in Figure 1. An MV-CA050-12UC industrial camera (Hikvision, Hangzhou, China) was mounted on a stand and positioned at a vertical height of 0.4 m, centered on the carapace area of the Chinese mitten crab, with the collected images having a resolution of 2048 × 2048 pixels.

2.1.2. Data Augmentation and Partitioning

Deep convolutional neural networks excel in various computer vision tasks, including image recognition. Training these network models typically requires a large number of training images to enhance the model’s generalization ability. However, due to the high economic value of Chinese mitten crabs, obtaining a large amount of data is often challenging. To overcome this issue and enhance the recognition capability, generalizability, and robustness of the model, we augmented the existing dataset. Initially, all original sample images were augmented by scaling to different sizes, resulting in a dataset of 610 images. Then, the data from the first augmentation were further expanded by changing their brightness, converting them to grayscale, and applying random translation, rotation, affine transformations, salt-and-pepper noise, and random cropping and padding, yielding a total of 64,050 images. The sample division is shown in Table 1, where 58,800 images from 112 samples were divided into training and validation image pairs in an 8:2 ratio. To verify the model’s recognition ability, the validation image pairs were randomly matched following the Labeled Faces in the Wild (LFW) format [21], with 3000 matching and 3000 non-matching pairs. The remaining 5250 images from 10 samples served as a generalization test set, used to evaluate the model’s ability to distinguish unknown samples.
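To make the second-stage augmentation concrete, the sketch below assembles the listed operations with torchvision transforms. The specific magnitudes (brightness range, rotation angle, padding, noise amount) are illustrative assumptions, as the paper does not report them, and the salt-and-pepper helper is a hypothetical implementation.

```python
import torch
from torchvision import transforms

def salt_and_pepper(img: torch.Tensor, amount: float = 0.02) -> torch.Tensor:
    """Set a random fraction of pixels to black or white (salt-and-pepper noise)."""
    noise = torch.rand(img.shape[-2:])
    img = img.clone()
    img[..., noise < amount / 2] = 0.0        # "pepper" pixels
    img[..., noise > 1.0 - amount / 2] = 1.0  # "salt" pixels
    return img

# Second-stage augmentation pipeline (applied to PIL images of the carapace).
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4),                        # brightness change
    transforms.RandomGrayscale(p=0.2),                             # grayscale conversion
    transforms.RandomAffine(degrees=30,                            # rotation
                            translate=(0.1, 0.1),                  # random translation
                            shear=10),                             # affine transformation
    transforms.RandomCrop(2048, padding=64, pad_if_needed=True),   # random crop and pad
    transforms.ToTensor(),
    transforms.Lambda(salt_and_pepper),                            # salt-and-pepper noise
])
```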

2.2. Overall Process Flow of the Proposed Method

This study proposes a method for extracting features from the carapace of river crabs and verifying and recognizing them, as illustrated in the flowchart in Figure 2.

2.3. Methods

This subsection introduces the technological advancements integrated into the MobileNetV2 architecture, focusing on modifications such as the coordinate attention mechanism and the ArcFace loss function. These enhancements are crucial for improving the accuracy and efficiency of the model in extracting and recognizing distinct features from crab carapace images.

2.3.1. Feature Extraction Network

MobileNetV2, proposed by Sandler et al., is a neural network model renowned for its lightweight design [22]. The model emphasizes efficiency and incorporates innovative features such as depthwise separable convolutions, inverted residual blocks, and linear bottleneck structures. These designs enable MobileNetV2 to maintain a high level of accuracy while achieving exceptional efficiency, making it widely applicable to various image recognition tasks [23,24,25]. Therefore, this study employed MobileNetV2 as the foundational architecture for the feature extraction network.
Additionally, in the Chinese mitten crab image traceability task, as the number of crabs is not fixed, to enhance the model’s generalization ability in recognizing crab features, this study modified the output layer of the MobileNetV2 network. Specifically, we replaced the traditional classification output layer with a new fully connected layer designed to output 128-dimensional feature vectors.
This fixed-dimension feature output method provides a standardized and information-rich input for subsequent feature similarity calculations, not only enhancing the model’s ability to handle unseen samples but also facilitating the model’s deployment in real-world application scenarios. The model structure is illustrated in Figure 3.
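As a minimal sketch of this modification, assuming the torchvision implementation of MobileNetV2, the classification head can be swapped for a 128-dimensional embedding layer as follows (the coordinate attention modules of Section 2.3.2 are omitted here):

```python
import torch
import torch.nn as nn
from torchvision import models

# Standard MobileNetV2 backbone, randomly initialized.
model = models.mobilenet_v2()

# Replace the 1000-way classification head with a 128-dimensional embedding layer.
model.classifier = nn.Sequential(
    nn.Dropout(0.2),
    nn.Linear(model.last_channel, 128),  # model.last_channel == 1280
)

# A forward pass now yields one 128-dim feature vector per image.
embeddings = model(torch.randn(4, 3, 112, 112))
print(embeddings.shape)  # torch.Size([4, 128])
```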

2.3.2. Coordinate Attention Mechanism

In computer vision, attention mechanisms are inspired by the human brain’s focus on the detailed characteristics of areas of interest, learning target details layer by layer while suppressing irrelevant information, thus significantly highlighting the features of the target area. Attention mechanism modules have been proven to enhance performance in computer vision tasks and are widely used in fields such as image classification, object detection, and image segmentation [26].
To improve the feature extraction capability and focus on the important morphological features of the Chinese mitten crab traceability model, this study introduces the Coordinate Attention mechanism (CA) [27]. The CA refines the attention allocation across the spatial dimensions of the input feature map, enhancing the recognition of key morphological features on the crab carapace, such as grooves, protrusions, and textures. The structure of the CA is shown in Figure 4.
The CA module adopts an effective method to capture channel relationships and positional information, achieving the further suppression of background noise and focusing on the key information of the crab carapace, thereby outputting refined features that more accurately represent the essence of the crab carapace. It decomposes any feature map in the convolutional layer into two different directions for feature encoding, thereby acquiring long-range dependencies in one spatial direction and precise positional information in the other. This method of directly embedding positional information into channel attention can be complementarily applied to the input feature map, thus enhancing the target representation in areas of interest. The specific steps are as follows:
First, the crab carapace feature map is pooled along the horizontal direction (pooling kernel $(H, 1)$) and the vertical direction (pooling kernel $(1, W)$) to obtain the positional information of the input feature map along the $x$ axis and $y$ axis. The average pooling in the horizontal direction can be expressed as follows:

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i)$$

where $z_c^h(h)$ represents the average of the features along the width $W$ at a specific height $h$. Similarly, the average pooling in the vertical direction can be expressed as follows:

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)$$

where $z_c^w(w)$ represents the average of the features along the height $H$ at a specific width $w$. These two operations yield feature descriptors that capture information along the horizontal and vertical directions, respectively, with each independently summarizing the statistical information of the entire feature map.

Secondly, the feature maps from the horizontal and vertical pooling results are concatenated and fed into a $1 \times 1$ convolution $F_1$ to obtain the intermediate attention feature map:

$$f = \delta\left(F_1\left(\left[z^h, z^w\right]\right)\right), \qquad f \in \mathbb{R}^{C/r \times (H + W)}$$

where $[\cdot, \cdot]$ represents the concatenation operation along the spatial dimension, $r$ is the channel reduction ratio, and $\delta$ is a nonlinear activation function. The resulting feature map $f$ is then split back into a horizontal component $f^h$ and a vertical component $f^w$, each of which is passed through a $1 \times 1$ convolution ($F_h$ or $F_w$) and a sigmoid function $\sigma$ to produce the attention weights:

$$g^h = \sigma\left(F_h\left(f^h\right)\right), \qquad g^w = \sigma\left(F_w\left(f^w\right)\right)$$

In the equation, $g^h$ and $g^w$ represent the attention weights along the horizontal and vertical directions, respectively.

Finally, the input feature map is multiplied by the two directional attention weights to output the final features:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$
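For readers following the equations above, below is a PyTorch sketch of the CA module in the spirit of the reference implementation of Hou et al. [27]; the reduction ratio r and the h-swish choice for δ are configuration assumptions, not values reported in this paper.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention block (after Hou et al. [27])."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1): average over width  -> z^h
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W): average over height -> z^w
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)   # shared 1x1 conv F1
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()                              # nonlinearity delta
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)  # F_h
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)  # F_w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.size()
        x_h = self.pool_h(x)                       # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)   # (B, C, W, 1)
        f = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        f_h, f_w = torch.split(f, [h, w], dim=2)   # split back into the two directions
        g_h = torch.sigmoid(self.conv_h(f_h))                      # (B, C, H, 1)
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * g_h * g_w  # y_c(i, j) = x_c(i, j) * g^h_c(i) * g^w_c(j)
```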

2.3.3. Loss Function

The loss function plays a crucial role in model training, serving as a key factor in ensuring effective learning and accurate predictions by the model. Traditional classification loss functions utilize the Softmax loss function [28], which primarily maps output features to the probability range (0, 1) and classifies these features based on the probabilities. Although it ensures the separability of classes in recognition tasks, it lacks constraints on intra-class and inter-class distances. This limitation makes it less suitable for tasks requiring fine-grained individual identification, such as those distinguishing individual Chinese mitten crabs.
To address this issue, this paper employs the ArcFace loss function [29], which has proven effective at enhancing feature separability in facial recognition tasks, a challenge analogous to identifying individual Chinese mitten crabs from subtle differences in carapace features. ArcFace encourages feature vectors of the same class to cluster more closely in angular space while maintaining larger angular distances between classes, thereby improving the discriminability of the features. The specific expression is shown as follows:
$$L_{\mathrm{ArcFace}} = -\frac{1}{N} \sum_{i=1}^{N} \ln \frac{e^{s \cos\left(\theta_{y_i} + m\right)}}{e^{s \cos\left(\theta_{y_i} + m\right)} + \sum_{j=1,\, j \neq y_i}^{C} e^{s \cos \theta_j}}$$

In the equation, $N$ is the number of samples in the training batch, $C$ is the number of classes, $y_i$ is the true label of sample $i$, $\theta_{y_i}$ is the angle between the feature vector of sample $i$ and the weight vector of its corresponding class $y_i$, $s$ is the scaling factor used to control the radius of the feature space, and $m$ is the angular margin added to increase the separation between feature vectors of different classes, thus enhancing the model’s discriminative ability.
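A compact sketch of the ArcFace logit computation is given below; the scale s = 64 and margin m = 0.5 are common defaults from Deng et al. [29] and are assumptions here, as this paper does not report its settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """ArcFace logits (after Deng et al. [29]); feed the output into cross-entropy."""
    def __init__(self, embedding_dim: int = 128, num_classes: int = 112,
                 s: float = 64.0, m: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine of the angle between L2-normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cosine.size(1)).bool()
        # Add the angular margin m only to the target-class angle.
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return logits * self.s

# Usage: loss = F.cross_entropy(head(embeddings, labels), labels)
```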

2.3.4. Similarity Calculation

After obtaining the image features using the model, the similarity between two crab carapace images can be determined using cosine similarity. Suppose the features of two crab carapace images extracted by the feature extraction network are represented as a tuple $(X_1, X_2)$. The similarity is calculated as shown in the following equation:

$$S(X_1, X_2) = \frac{\sum_{i=1}^{d} X_{1i} \times X_{2i}}{\sqrt{\sum_{i=1}^{d} X_{1i}^2} \times \sqrt{\sum_{i=1}^{d} X_{2i}^2}}$$

In the equation, $d$ represents the dimension of the feature vectors.
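As a small usage example, the cosine similarity of two extracted feature vectors can be computed directly in PyTorch; the function name and the random stand-in vectors are illustrative only.

```python
import torch
import torch.nn.functional as F

def carapace_similarity(x1: torch.Tensor, x2: torch.Tensor) -> float:
    """Cosine similarity between two 128-dim carapace feature vectors."""
    return F.cosine_similarity(x1.unsqueeze(0), x2.unsqueeze(0)).item()

# Example with random stand-in features:
x1, x2 = torch.randn(128), torch.randn(128)
print(carapace_similarity(x1, x2))
```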

2.3.5. Evaluation Metrics

To quantitatively analyze the effectiveness of the model, we used Accuracy to evaluate the model’s performance [30]. The equation is as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
In the equation, True Positive (TP) refers to the number of image pairs correctly identified as the same crab, True Negative (TN) refers to the number of image pairs correctly identified as different crabs, False Positive (FP) refers to the number of image pairs incorrectly identified as the same crab, and False Negative (FN) refers to the number of image pairs incorrectly identified as different crabs.
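As a worked example of how this metric is applied under the pair-verification protocol, the helper below computes Accuracy from pair similarities; the 0.90 decision threshold anticipates the test setting described in Section 3.1, and the function itself is a hypothetical sketch.

```python
import numpy as np

def pair_accuracy(similarities: np.ndarray, is_same: np.ndarray,
                  threshold: float = 0.90) -> float:
    """Verification accuracy over image pairs.

    similarities: cosine similarity of each pair;
    is_same: ground-truth boolean (True if both images show the same crab).
    """
    predicted_same = similarities >= threshold
    tp = np.sum(predicted_same & is_same)    # same crab, predicted same
    tn = np.sum(~predicted_same & ~is_same)  # different crabs, predicted different
    return float((tp + tn) / len(similarities))
```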

3. Experiment and Analysis

This section reports on the setup, execution, and results of the experiments conducted to test the developed model. It includes detailed analyses of network training outcomes, ablation studies to assess component impacts, comparisons with other algorithms, and tests of the model’s generalization capabilities.

3.1. Experimental Configuration

The experiments in this paper were conducted on a Windows 10 system using the PyTorch 1.9.1 deep learning framework and Python 3.8, with an NVIDIA RTX 3080 Ti graphics card (NVIDIA, Santa Clara, CA, USA) with 12 GB of VRAM running CUDA 11.1 for high-performance GPU computing, and an AMD Ryzen 7 5800X3D 8-core processor (AMD, Santa Clara, CA, USA).
During the training phase, each RGB crab carapace image in the dataset was resized to 112 × 112 pixels, and the pixel values were normalized to the range of [−1, 1]. This normalization helps to maintain a consistent distribution in the training data, accelerates model convergence, and prevents the problem of gradient vanishing to some extent. All feature embedding dimensions were set to 128. The overall experimental process used SGD as the training optimizer, with the batch size set to 128, and the model was trained for 5000 iterations.
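The reported settings translate into the following sketch; the learning rate, momentum, and weight decay values are assumptions, since the paper specifies only the optimizer (SGD), batch size (128), iteration count, input size, and normalization.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Preprocessing as reported: resize to 112 x 112, scale pixel values to [-1, 1].
preprocess = transforms.Compose([
    transforms.Resize((112, 112)),
    transforms.ToTensor(),                                # -> [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # -> [-1, 1]
])

# Backbone with a 128-dim embedding head (see Section 2.3.1).
model = models.mobilenet_v2()
model.classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(model.last_channel, 128))

# SGD as reported; lr, momentum, and weight decay are assumed values.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
batch_size = 128
num_iterations = 5000
```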
In the testing phase, ten-fold cross-validation was used to evaluate the performance of the algorithm. With the similarity threshold set at 0.90, the test was repeated across the ten folds, and the results were averaged. This method allows us to more thoroughly probe the model’s generalization ability, thereby ensuring the robustness and reliability of our findings.

3.2. Network Training Results

The training set was input into the improved MobileNetV2 network for training, and the results are shown in Figure 5 and Figure 6.
Figure 5 exhibits a typical pattern of a rapid initial decline followed by gradual stabilization. The sharp reduction in loss during the initial phase (0–1000 iterations) reflects the early learning stage of the model, where the optimizer effectively reduces the high error rate by significantly adjusting the model parameters. This is followed by a period of fluctuation (1000–2000 iterations), indicating that the model is refining its parameters and learning is still ongoing, although the changes are no longer as drastic. As the training progresses beyond 2000 iterations, we observed a gradual decrease in the rate of loss reduction, which approached zero, and a plateau formed and persisted for the remainder of the training process (3000–5000 iterations). This plateau indicates that the model reached a convergence point where additional training provided only a minimal improvement in loss metrics, suggesting that the learning capacity of the current model was maximized.
Figure 6 shows the trend of accuracy changes during the model training process. In contrast to the loss values, the accuracy rapidly increases in the early stages of training, reflecting a significant enhancement in the model’s ability to differentiate between classes. After 2000 iterations, the growth in model accuracy slows and stabilizes, reaching a high level and fluctuating within a narrow range, indicating that the model’s ability to recognize the training data has become saturated.
Combining the analyses of Figure 5 and Figure 6, we can conclude that the proposed model exhibits good learning capabilities and stability during the training period, laying the foundation for its potential application in traceability tasks of Chinese mitten crab images.

3.3. Ablation Experiment

To assess the specific impact of different components on model performance, this study conducted a series of ablation experiments. These experiments aimed to verify the contributions of the coordinate attention mechanism and the choice of loss function to the recognition accuracy of Chinese mitten crabs; the results are shown in Table 2.
In the baseline model MobileNetV2, which only outputs feature vectors, we observed a recognition accuracy of 69.96%, setting the benchmark for subsequent experiments. After introducing the coordinate attention mechanism, the model’s accuracy improved to 75.72%. This significant increase demonstrates the effectiveness of the CA mechanism in enhancing the model’s ability to recognize the morphological features of river crabs. This suggests that the CA mechanism, by finely allocating attention across spatial dimensions, enables the model to focus more on the key morphological features of the crab carapace, thus improving recognition precision.
Furthermore, simply integrating the ArcFace loss function into MobileNetV2 resulted in a further substantial increase in accuracy, reaching 97.71%. The ArcFace loss function optimizes the feature space by introducing angular margins, promoting intra-class compactness and inter-class distinguishability and significantly enhancing the model’s discriminative ability. This result underscores the importance of angular margins in class discrimination and their contribution to enhancing feature differentiation.
Ultimately, combining the strengths of the coordinate attention mechanism and the ArcFace loss function in one model achieved the highest accuracy, at 98.56%. This further proves the complementary nature of CA and ArcFace; their combination not only enhances feature extraction capabilities but also optimizes the separation between categories. This integrated approach provides an effective strategy for achieving the high-accuracy recognition of individual Chinese mitten crabs.

3.4. Comparison of Different Algorithms

To comprehensively assess the efficacy and practicality of the proposed model, this paper conducted comparative experiments with several other advanced lightweight facial recognition models on 6000 validation image pairs. The selected comparison models include widely recognized industry benchmarks such as ShuffleFaceNet [31], MobileFaceNet [32], and VarGFaceNet [33], which demonstrated a good baseline performance in the field of facial recognition. The key metrics that were examined were model size, test time, and accuracy, aiming to provide a comprehensive performance evaluation. The results are shown in Table 3.
An analysis of Table 3 shows that each model exhibits a different trade-off between efficiency and accuracy while maintaining a lightweight structure. ShuffleFaceNet, although smaller in size (10.3 M), has a longer processing time (10.57 s) and an accuracy rate of 84.93%, indicating limitations in its recognition accuracy. MobileFaceNet has the smallest model size (4.0 M) and the shortest processing time (8.98 s), with an accuracy rate of 87.46%, demonstrating high efficiency and reasonable accuracy. The VarGFaceNet model is the largest (24.3 M) and has the longest processing time (33.81 s), but its accuracy rate of 92.01% indicates that it sacrifices efficiency to achieve higher accuracy.
Compared to these models, the model proposed in this paper achieves the best balance between model size (11.0 M) and accuracy (98.56%), reaching the highest accuracy while maintaining a lower processing time (10.21 s), with an average verification time of only 1.7 milliseconds per image pair. Although slightly larger than MobileFaceNet, the accuracy is significantly improved. Compared to VarGFaceNet, our model significantly reduces the model size while still achieving the highest recognition accuracy. These experimental results emphasize the effectiveness of the strategies proposed in this study, especially in the verification of Chinese mitten crab carapace recognition while maintaining a lightweight structure.

3.5. Model Feature Extraction Generalization Test

For the generalization test set of 5250 crab carapace images, 128-dimensional feature vectors were extracted using the improved MobileNetV2 after training. Then, T-SNE manifold learning [34] was used to reduce the dimensionality of these vectors and visualize the distribution of the extracted crab carapace features. The legend in Figure 7 indicates the Chinese mitten crab numbers, with different colors representing different individual crabs. As shown in Figure 7, the carapace features extracted from 5250 images of 10 different crabs cluster into 10 categories, with compact intra-class distributions and clear inter-class separation, demonstrating that the features extracted by the trained, improved MobileNetV2 possess excellent recognition and generalization capabilities.
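A visualization along these lines can be reproduced with scikit-learn’s t-SNE; in the sketch below, the feature array and crab IDs are random placeholders standing in for the trained model’s outputs, and the perplexity value is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholders: in the experiment these would be the 5250 x 128 embeddings
# produced by the trained network and the corresponding crab IDs (10 classes).
features = np.random.randn(5250, 128).astype(np.float32)
crab_ids = np.repeat(np.arange(10), 525)

# Project the 128-dim features to 2D and color points by individual crab.
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(coords[:, 0], coords[:, 1], c=crab_ids, cmap="tab10", s=4)
plt.colorbar(label="Crab ID")
plt.title("t-SNE projection of carapace features")
plt.show()
```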

4. Discussion

In this study, we propose and validate a lightweight Chinese mitten crab image verification model based on an improved MobileNetV2 and ArcFace loss function. By incorporating the Coordinate Attention (CA) mechanism and ArcFace loss function, the model not only maintains its light weight and processing speed but also improves in terms of accuracy and generalizability, which was not achieved in previous studies.
Our results support the initial hypothesis that attention mechanisms and angular margin losses can significantly enhance recognition accuracy. This is similar to the findings of Hu et al. [35], who discovered that SE blocks could significantly improve network performance by recalibrating the feature responses of channels, thereby enhancing the model’s representational power. Meanwhile, the ArcFace loss was proven by Deng et al. to effectively improve inter-class separability, offering better differentiation among highly similar individuals [29].
The design of the model in this study considers the needs of practical applications; it has an optimized computational efficiency and model size, making it suitable for resource-limited devices, which is particularly important in actual aquaculture scenarios. Moreover, the traceability method provided by this study supports the sustainable development of the aquaculture industry by enhancing the accuracy and efficiency of traceability, helping to establish a more transparent supply chain.
However, there are still some challenges in the field of Chinese mitten crab carapace recognition. Due to the lack of publicly available Chinese mitten crab carapace datasets, the dataset used in this experiment needs further expansion. Future research may involve applying the model to larger and more diverse datasets to further enhance the model’s generalization ability. Additionally, we aim to explore more efficient and accurate algorithms, making further improvements in the accuracy of Chinese mitten crab recognition, an important direction for future research.

5. Conclusions

The primary objective of this study is to address the difficulty in tracing the origin of Chinese mitten crabs and to assist in detecting counterfeit or inferior Chinese mitten crab products. In this study, a lightweight Chinese mitten crab image recognition model was successfully developed, based on an improved MobileNetV2 architecture and the ArcFace loss function. The model extracts 128-dimensional features from each pair of Chinese mitten crab carapace images and determines whether they belong to the same Chinese mitten crab based on the cosine similarity between the features. The introduction of a coordinate attention mechanism significantly enhanced the feature extraction capability of the model, thereby improving its recognition accuracy. Furthermore, incorporating the ArcFace loss function greatly sensitized the model to the differences between individual crabs. Ultimately, the model achieved a recognition accuracy of 98.56% on the Chinese mitten crab image verification dataset, with an average verification time of only 1.7 milliseconds per image pair, demonstrating its potential for real-time traceability in Chinese mitten crab image applications. The proposed method can serve as an empirical basis for subsequent tasks related to the individual identification of Chinese mitten crabs and contribute to improving the traceability of Chinese mitten crab products.

Author Contributions

Conceptualization, M.C. and G.F.; methodology, N.P.; software, N.P.; validation, M.C. and G.F.; formal analysis, M.C. and N.P.; resources, M.C.; data curation, N.P.; writing—original draft preparation, N.P.; writing—review and editing, M.C., N.P. and G.F.; supervision, M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Research and Development Planning in Key Areas of Guangdong Province, No. 2021B0202070001.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the significant investment of time and money required for data acquisition.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, X.; Zheng, R.-X.; Zhang, D.-M.; Lei, X.-Y.; Wang, S.; Wan, J.-W.; Liu, H.-J.; Chen, Y.-K.; Zhao, Y.-L.; Wang, G.-Q. Effects of different stocking densities on growth performance, nutritional quality and economic benefit of juvenile female Chinese mitten crab (Eriocheir sinensis) in rice-crab culture systems. Aquaculture 2022, 553, 738111. [Google Scholar] [CrossRef]
  2. China Fisheries Statistical Yearbook 2023; China Agricultural Press: Beijing, China, 2023.
  3. Song, X.; Huang, F.; Liang, F.; Hu, J.; Lu, B.; Cai, C. Identification of Geographical Origin and Culture Water Body of Chinese Mitten Handed Crab Eriocheir sinensis Based on Mineral Element Fingerprints. Chin. J. Fish. 2024, 37, 9. [Google Scholar]
  4. Liu, J.-J.; Liu, D.-Y.; Li, D.-R.; Wang, H.; Zhao, Y.; Sun, X.-H. The prevalence, virulence, and antibiotic resistance of Vibrio parahemolyticus in aquatic products and their breeding environment in Shanghai. Food Ferment. Ind. 2023, 49, 250–257. [Google Scholar]
  5. Ma, M.T.; Sun, S.C.; Li, L.W.; Chen, C.; Yang, X. Design and implementation of trusted traceability system for agricultural products origin based on NB-IoT. J. Agric. Sci. Technol. 2019, 21, 58–67. [Google Scholar]
  6. Zhou, L.; Gao, J.; Yang, Y.; Nie, Z.; Liu, K.; Xu, G. Genetic Diversity and Population Structure Analysis of Chinese Mitten Crab (Eriocheir sinensis) in the Yangtze and Liaohe Rivers. Fishes 2023, 8, 253. [Google Scholar] [CrossRef]
  7. Zheng, C.; Jiang, T.; Luo, R.; Chen, X.; Liu, H.; Yang, J. Geometric morphometric analysis of the Chinese mitten crab Eriocheir sinensis: A potential approach for geographical origin authentication. N. Am. J. Fish. Manag. 2021, 41, 891–903. [Google Scholar] [CrossRef]
  8. Zhang, B.L.; Lian, D.; Ren, H.L. Research on anti-counterfeiting system of Chinese mitten crab based on crab shell image. Sci. Fish. Farming 2014, 2, 77–78. [Google Scholar]
  9. Xue, J.; Liu, H.; Jiang, T.; Chen, X.; Yang, J. Shape variation in the carapace of Chinese mitten crabs (Eriocheir sinensis H. Milne Edwards, 1853) in Yangcheng Lake during the year-long culture period. Eur. Zool. J. 2022, 89, 217–228. [Google Scholar] [CrossRef]
  10. Xu, Y.; Xue, J.; Liu, H.; Jiang, T.; Chen, X.; Yang, J. Identification of “Bathed” Chinese Mitten Crabs (Eriocheir sinensis) Using Geometric Morphological Analysis of the Carapace. Fishes 2023, 9, 6. [Google Scholar] [CrossRef]
  11. Tai, W.-P.; Li, H.; Zhang, B.-L.; Wang, C. Research on the Feature Recognition and Algorithm of the Carapace of Eriocheir sinensis. Period. Ocean. Univ. China 2021, 51, 138–146. [Google Scholar]
  12. Minaee, S.; Abdolrashidi, A.; Su, H.; Bennamoun, M.; Zhang, D. Biometrics recognition using deep learning: A survey. Artif. Intell. Rev. 2023, 56, 8647–8695. [Google Scholar] [CrossRef]
  13. Shukla, R.K.; Tiwari, A.K. Masked face recognition using mobilenet v2 with transfer learning. Comput. Syst. Sci. Eng. 2023, 45, 293–309. [Google Scholar] [CrossRef]
  14. Ahmad, M.; Abbas, S.; Fatima, A.; Issa, G.F.; Ghazal, T.M.; Khan, M.A. Deep transfer learning-based animal face identification model empowered with vision-based hybrid approach. Appl. Sci. 2023, 13, 1178. [Google Scholar] [CrossRef]
  15. Kim, J.H.; Poulose, A.; Colaco, S.J.; Neethirajan, S.; Han, D.S. Enhancing animal welfare with interaction recognition: A deep dive into pig interaction using xception architecture and SSPD-PIR method. Agriculture 2023, 13, 1522. [Google Scholar] [CrossRef]
  16. Wan, Z.; Tian, F.; Zhang, C. Sheep face recognition model based on deep learning and bilinear feature fusion. Animals 2023, 13, 1957. [Google Scholar] [CrossRef] [PubMed]
  17. Shoaib, M.; Shah, B.; Ei-Sappagh, S.; Ali, A.; Ullah, A.; Alenezi, F.; Gechev, T.; Hussain, T.; Ali, F. An advanced deep learning models-based plant disease detection: A review of recent research. Front. Plant Sci. 2023, 14, 1158933. [Google Scholar] [CrossRef] [PubMed]
  18. Liu, J.; Wang, X. Early recognition of tomato gray leaf spot disease based on MobileNetv2-YOLOv3 model. Plant Methods 2020, 16, 1–16. [Google Scholar] [CrossRef] [PubMed]
  19. Feng, Y.; Yang, X.; Xu, D.; Luo, N.; Chen, F.; Sun, C. Recognition method of individual images of river crabs based on transfer learning and pyramid convolutional network. Fish. Mod. 2022, 49, 52–60. [Google Scholar]
  20. Shi, G.-Z.; Chen, M.; Zhang, C.-Y. Accurate traceability system of crab based on improved deep residual network. Chin. J. Liq. Cryst. Disp. 2019, 34, 1202–1209. [Google Scholar]
  21. Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database forstudying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, Marseille, France, 16 September 2008. [Google Scholar]
  22. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
  23. Hu, L.; Ge, Q. Automatic facial expression recognition based on MobileNetV2 in Real-time. J. Phys. Conf. Ser. 2020, 1549, 022136. [Google Scholar] [CrossRef]
  24. Gulzar, Y. Fruit image classification model based on MobileNetV2 with deep transfer learning technique. Sustainability 2023, 15, 1906. [Google Scholar] [CrossRef]
  25. Qiao, Y.; Liu, H.; Meng, Z.; Chen, J.; Ma, L. Method for the automatic recognition of cropland headland images based on deep learning. Int. J. Agric. Biol. Eng. 2023, 16, 216–224. [Google Scholar] [CrossRef]
  26. Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
  27. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar] [CrossRef]
  28. Liang, X.; Wang, X.; Lei, Z.; Liao, S.; Li, S.Z. Soft-margin softmax for deep classification. In Proceedings of the International Conference on Neural Information Processing, Guangzhou, China, 14–18 November 2017; pp. 413–421. [Google Scholar]
  29. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4690–4699. [Google Scholar]
  30. Valero-Carreras, D.; Alcaraz, J.; Landete, M. Comparing two SVM models through different metrics based on the confusion matrix. Comput. Oper. Res. 2023, 152, 106131. [Google Scholar] [CrossRef]
  31. Martinez-Diaz, Y.; Luevano, L.S.; Mendez-Vazquez, H.; Nicolas-Diaz, M.; Chang, L.; Gonzalez-Mendoza, M. Shufflefacenet: A lightweight face architecture for efficient and highly-accurate face recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 2721–2728. [Google Scholar]
  32. Chen, S.; Liu, Y.; Gao, X.; Han, Z. Mobilefacenets: Efficient cnns for accurate real-time face verification on mobile devices. In Proceedings of the Biometric Recognition: 13th Chinese Conference, CCBR 2018, Urumqi, China, 11–12 August 2018; pp. 428–438. [Google Scholar]
  33. Yan, M.; Zhao, M.; Xu, Z.; Zhang, Q.; Wang, G.; Su, Z. Vargfacenet: An efficient variable group convolutional neural network for lightweight face recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
  34. Roman-Rangel, E.; Marchand-Maillet, S. Inductive t-SNE via deep learning to visualize multi-label images. Eng. Appl. Artif. Intell. 2019, 81, 336–345. [Google Scholar] [CrossRef]
  35. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
Figure 1. Data collection site and image acquisition equipment. Different colors represent different districts in Shanghai.
Figure 2. Overall research process flowchart.
Figure 3. Improved MobileNetV2 network structure diagram.
Figure 4. Schematic diagram of the Coordinate Attention mechanism.
Figure 5. Loss variation chart.
Figure 6. Training accuracy variation chart.
Figure 7. Distribution of crab carapace image features.
Table 1. Dataset division.

| Dataset | Original Sample Quantity | Quantity after Data Augmentation | Dataset Size |
| Train Dataset | 112 | 47,040 | 47,040 |
| Validation Image Pairs | 112 | 11,760 | 6000 |
| Generalization Test Dataset | 10 | 5250 | 5250 |
Table 2. Ablation experiment results.

| Model | Accuracy (%) |
| Baseline | 69.96 |
| + CA | 75.72 |
| + ArcFace | 97.71 |
| + CA + ArcFace | 98.56 |
Table 3. Comparison results of different models.

| Model | Model Size (M) | Time (s) | Accuracy (%) |
| ShuffleFaceNet | 10.3 | 10.57 | 84.93 |
| MobileFaceNet | 4.0 | 8.98 | 87.46 |
| VarGFaceNet | 24.3 | 33.81 | 92.01 |
| Ours | 11.0 | 10.21 | 98.56 |