1. Introduction
The important role cephalopods (Mollusca: Cephalopoda) play in many marine ecosystems has been widely acknowledged [1]. Cephalopods prey on numerous organisms and are in turn preyed upon by many predators [2,3,4,5]. In particular, cephalopods are one of the main food sources for large marine predators such as whales [6], dolphins [7], and sharks [8]. Consequently, cephalopods occupy the middle of the marine trophic pyramid and play a significant role in the marine food chain and nutrient structure [9]. Furthermore, cephalopods are economically significant marine animals, owing to their short life cycles (typically one year), rapid growth, and abundant stocks [10]. In recent decades, the global cephalopod fishery industry and the production of edible cephalopods have grown rapidly. Research on cephalopods supports the sustainable utilization of this resource and will also increase the number of cephalopod species available for future commercial development.
The majority of our understanding of cephalopods comes from analyzing the stomach contents of their predators. The identification of cephalopods in stomach contents typically depends on beaks, since most soft tissue is digested while the beaks can resist digestion for as long as several months [11,12]. As the main feeding organ of cephalopods, the beaks are located in the buccal mass and are divided into an upper beak and a lower beak [13,14,15,16]. The beak is one of the hard tissues of cephalopods, with a stable structure and strong resistance to corrosion [14]. In recent years, beaks have been extensively utilized for the identification of cephalopod populations [13,17] and the classification of species [18]. Consequently, considerable research effort has been devoted to improving feature extraction and recognition methods for beaks.
In the field of computer vision, shallow features are basic image attributes extracted directly from image data; common examples include edge, texture, morphological, and color features. The morphological features of the beak are a useful tool for detecting inter- and intra-species differences in cephalopods, as well as for species identification [10]. Hence, most research on beak classification has centered on refining methods for extracting morphological features, focusing primarily on the calibration of feature points and the extraction of feature parameters [19]. With the development of artificial intelligence, edge detection has been applied as a basic computer-vision image-processing method in beak recognition. He Q H et al. [20] extracted beak contours with the Canny algorithm to assist in the calibration of feature points and the extraction of feature parameters, addressing the time-consuming and labor-intensive nature of manual measurements. Wang B Y et al. [21] proposed an improved edge detection method to extract the morphological outer contour of the beak, which effectively suppresses signal noise and improves the accuracy of target selection while preserving contour integrity within the error tolerance. The algorithms in the above studies, each extracting a single shallow feature of the beak, are effective for beak image classification and offer high interpretability, good performance on small sample sizes, and low computational requirements. However, these traditional methods typically require manually designed region-of-interest features and feature extraction operators, which cannot fully capture the subtle differences between beaks and are therefore sensitive to changes in scale and morphology.
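As an illustration of the edge-based approaches described above, the following is a minimal sketch of contour extraction with the Canny detector in OpenCV; the file name, blur kernel, and thresholds are illustrative assumptions, not values from the cited studies.

```python
import cv2

img = cv2.imread("beak.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
blurred = cv2.GaussianBlur(img, (5, 5), 0)           # suppress noise before edge detection
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

# Keep the longest edge chain, assumed here to be the beak's outer contour.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
outline = max(contours, key=lambda c: cv2.arcLength(c, False))
```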
The convolutional neural network (CNN) is the most prominent deep learning architecture, in which multiple layers are trained jointly. In recent years, deep learning has been broadly applied across domains [22], since it autonomously extracts image features for image recognition [23,24]. Deep features are high-level representations learned from raw image data by deep models; they help computers better understand and utilize complex real-world data. Tan H Y et al. [25] extracted shallow and deep features from beaks and classified them using eight machine learning methods, concluding that deep features were preferable to shallow ones for beak classification. Their approach nevertheless has several limitations, including class imbalance, a small sample size, a single beak view, and a limited number of morphological features among the morphological shape descriptors (MSDs). CNN-based deep learning methods have led to significant breakthroughs in various fields, as they can extract complex target features and reduce the errors arising from human-defined features. However, a CNN requires a large amount of labeled data and a long training time to fully learn and represent the complex features within image data, and its performance may be limited when the amount of beak data is insufficient. Shallow feature algorithms and deep feature methods therefore have complementary advantages and disadvantages and differ in how they represent features.
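For concreteness, the following is a minimal sketch of how a fixed-length global deep feature can be extracted with a pretrained CNN; ResNet50 from torchvision is used here as one plausible backbone, and the image path is a hypothetical placeholder.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()  # drop the ImageNet classifier; keep the pooled 2048-d features
model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("beak.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    deep_feat = model(img).squeeze(0)  # shape: (2048,)
```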
Based on the preceding analysis, we have reason to believe that traditional algorithms are beneficial for extracting shallow features for beak classification. However, the information captured by a single feature descriptor is relatively limited, and the required features may not be extracted sufficiently. At the same time, deep features contain semantic information, but with the limited number of beak samples available, a deep descriptor may not capture all the necessary details. We therefore sought to improve accuracy by describing multiple features of each image so that the feature information becomes complementary. This study proposes, for the first time in beak research, a recognition method that fuses global deep features with local shallow features. The study covered four cephalopod species, namely Dosidicus gigas (D. gigas), Illex argentinus (I. argentinus), Eucleoteuthis luminosa (E. luminosa), and Ommastrephes bartramii (O. bartramii), for which images of upper and lower beaks were obtained. First, the histogram of oriented gradients (HOG) and local binary pattern (LBP) feature descriptors were employed to derive morphological and texture features from the beak images. Meanwhile, we selected the optimal CNN model for deep feature extraction from among VGG16 [26], InceptionV3 [27], and the ResNet series [28]. Next, each of the two types of local shallow features was fused separately with the global deep features to highlight feature details, and a support vector machine (SVM) classifier was utilized for classification. This method will facilitate the development of beak recognition and provide a new, feasible strategy for future cephalopod biodiversity studies.
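The two shallow descriptors can be computed with standard tools; the following is a minimal sketch using scikit-image, where the parameter values are illustrative assumptions and `gray` stands for a grayscale beak image already cropped to the region of interest.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern

def shallow_features(gray):
    """Return (HOG vector, LBP histogram) for a 2-D grayscale image."""
    # HOG: histograms of gradient orientations over small cells capture morphology.
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(32, 32),
                  cells_per_block=(2, 2), block_norm='L2-Hys')
    # Circular LBP with radius R and P sampling points captures local texture;
    # the 'uniform' variant gives a compact histogram of P + 2 bins.
    R, P = 1, 8
    codes = local_binary_pattern(gray, P, R, method='uniform')
    lbp_vec, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
    return hog_vec, lbp_vec
```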
4. Discussion
4.1. HOG and LBP as the Two Local Shallow Feature Descriptors
Morphological features and texture features are important shallow features in image classification; if efficient morphological and textural features can be extracted, beak classification benefits. The HOG descriptor changes the cell size C to alter the range of its local operations. Comparing classification experiments across three C values, the lower beak achieved its highest accuracy of 60.50% at C = 32, whereas the upper beak achieved its highest accuracy of 70.25% at C = 64. The morphological features obtained at C = 32 were used in the feature fusion experiments. LBP was adopted to extract texture features: the texture of the beak was extracted with an enhanced circular LBP descriptor, where R and P denote the neighborhood radius and the number of sampling points, respectively. Three different combinations of R and P values were evaluated, and the results demonstrate that a small neighborhood range is more appropriate for expressing the detailed features of beak images.
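The effect of these parameters can be inspected directly; the following sketch varies the HOG cell size and the LBP neighborhood settings (the specific value triples are assumptions, and `gray` is again a grayscale beak image).

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern

# HOG: a larger cell size C means coarser spatial binning and a shorter vector.
for C in (16, 32, 64):
    vec = hog(gray, orientations=9, pixels_per_cell=(C, C), cells_per_block=(2, 2))
    print(f"C={C}: HOG vector length {vec.size}")

# LBP: (R, P) set the neighborhood radius and the number of sampling points.
for R, P in [(1, 8), (2, 16), (3, 24)]:
    codes = local_binary_pattern(gray, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
    print(f"R={R}, P={P}: {hist.size}-bin texture histogram")
```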
According to the results of the shallow feature experiments, morphological features were more effective than texture features in distinguishing the beaks of the four cephalopod species. The significant difference in the dimensionality of the features retrieved by HOG and LBP reflects the HOG descriptor's advantage in extracting high-dimensional morphological features, which can convey and characterize fine variations in the beak. The morphological specificity of beaks [40] makes them superior in cephalopod biometrics, and the detailed variation within the two-dimensional morphology of the beak is extremely rich; morphological features can therefore be accurately extracted by analyzing image pairs from different perspectives [41]. The beak profile characterization factors vary somewhat among squid species, but all include several important factors, such as upper hood length (UHL), upper crest length (UCL), lower hood length (LHL), and lower crest length (LCL), which indirectly provide a basis for identifying cephalopod species from beak feature factors [10,42,43]. To meet predatory needs and changes across different growth periods, the pigmentation of the beak also changes [43,44,45]; in addition, pigmentation differs between male and female individuals [46]. These factors increase the difficulty of extracting discriminative texture features from similar beaks. The feature fusion experiments show that HOG+CNN improves classification accuracy, so we can infer that morphological features are more suitable for distinguishing beaks.
4.2. CNN Models for Extracting Global Deep Features
VGG16, InceptionV3, and the ResNet series were used to extract deep features of the beaks for performance comparison. Most models showed a significant decrease in loss at the start of training, indicating a suitable learning rate and effective gradient descent. After a certain stage of learning, the loss changed less markedly than at the beginning, and the loss curve stabilized. Four evaluation metrics were used to assess the models; all models extracted features effectively and performed well in classification. VGG16 builds a deep network by stacking 16 weight layers, which is simple to understand and implement; however, it contains a huge number of parameters, resulting in significant computational cost for training and inference. According to the loss curves and evaluation metrics, VGG16 is easy to train on beaks but performs poorly on the upper beak classification of D. gigas. InceptionV3 improves image classification performance by introducing multi-scale feature extraction and parallel operations, yet it performs poorly on the beaks of D. gigas and I. argentinus. ResNet18 and ResNet34 are both equipped with skip connections and fewer fully connected layers, so they have fewer parameters and converge faster during training; nevertheless, ResNet18 performed the worst in the upper and lower beak classification of D. gigas. ResNet50 has a deeper network structure that acquires more complex and abstract feature representations, achieving the highest classification accuracy on both the upper and lower beaks. ResNet101 is less accurate than ResNet50, since the beak dataset was too small to train ResNet101 effectively.
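One way to make the training-cost comparison above concrete is to count each backbone's parameters; a sketch using the torchvision constructors follows (the printed counts are a rough proxy for training and inference cost, not for accuracy).

```python
import torchvision.models as models

candidates = {
    "VGG16": models.vgg16(),
    "InceptionV3": models.inception_v3(),
    "ResNet18": models.resnet18(),
    "ResNet34": models.resnet34(),
    "ResNet50": models.resnet50(),
    "ResNet101": models.resnet101(),
}
for name, net in candidates.items():
    n_params = sum(p.numel() for p in net.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```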
4.3. Advantages of Feature Fusion
The approach of fusing global deep features with local shallow features was employed in beak classification for two key reasons. First, beaks exhibit subtle interclass variations alongside large intraclass variations: morphology, size, pigmentation, age, and growth environment can all differ within the same cephalopod species, making it challenging to classify specimens based on subtle regional differences. The information contained in the fused features can therefore complement each other to produce a more robust feature representation, combining the interpretable feature design of shallow descriptors with the learning capacity and generalization performance of deep learning to improve accuracy in practical applications. Second, a major strength of deep learning is feature learning, i.e., the automatic extraction of features from raw data, with features at higher levels of the hierarchy formed by composing lower-level features [36]. However, beak samples of some species are extremely challenging to acquire and constitute very small samples, which limits the accurate identification of beaks using deep learning alone. Using local shallow features as an important complement to the global deep features can effectively aid beak recognition. The results show that fusing deep and shallow features represents the detailed features of the four beaks better than single deep or shallow features alone; in particular, the HOG+ResNet50 model most accurately captures the distinctions between the beaks.
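A minimal sketch of the fusion step itself follows, under the assumption that each feature type is standardized before concatenation so that no single descriptor dominates by scale; the array names carry over from the earlier sketches, with one row per beak image assumed.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# deep_feats: (n_samples, 2048) deep vectors; hog_feats: (n_samples, d_hog) HOG vectors
deep_scaled = StandardScaler().fit_transform(deep_feats)
hog_scaled = StandardScaler().fit_transform(hog_feats)
fused = np.hstack([deep_scaled, hog_scaled])  # one fused feature row per beak image
```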
4.4. Using a Multi-Class SVM Classifier for Beak Classification
The recognition results achieved by the fused features for beaks of the same family but distinct genera were analyzed using a multi-class SVM classifier. Approximately 800 images of each kind of beak were used for training in the feature extraction process; with such limited training samples, both overfitting and underfitting are possible. SVM can perform nonlinear classification on small samples, enhancing classifier performance by mapping the data into a high-dimensional feature space with kernel functions. Moreover, SVM is insensitive to a small number of outliers or noisy data points and thus handles interference effectively. During the experiments, the regularization parameter C was adjusted to balance the fitting ability and generalization ability of the model. In summary, SVM offers superior generalization ability, robustness, and controlled complexity, and can effectively solve classification problems on small sample datasets.
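The classification stage can be sketched as follows with scikit-learn, selecting the regularization parameter C by cross-validation; the grid values and the `fused`/`labels` arrays are assumptions carried over from the sketches above.

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    fused, labels, test_size=0.2, stratify=labels, random_state=0)

search = GridSearchCV(
    SVC(kernel='rbf'),                    # RBF kernel maps data to a high-dimensional space
    param_grid={'C': [0.1, 1, 10, 100]},  # C trades off fit against generalization
    cv=5)
search.fit(X_train, y_train)              # SVC handles multi-class via one-vs-one internally
print("best C:", search.best_params_["C"])
print("held-out accuracy:", search.score(X_test, y_test))
```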
5. Conclusions
This study proposes an effective method for beak identification that fuses global deep features with local shallow features and uses a multi-class SVM for automatic classification. In the two shallow feature experiments, tuning the parameters yielded the best results, and the HOG descriptor outperformed the LBP descriptor in feature extraction. In the CNN model experiments, ResNet50 performed best, achieving an upper beak accuracy of 89.38% and a lower beak accuracy of 90.50%. In the feature fusion experiments, both sets of fusion models performed well: ResNet50+HOG achieved the highest recognition accuracy, with 91.88% and 93.63% for the upper and lower beaks, respectively, while ResNet50+LBP achieved 91.50% and 92.63% on the upper and lower beak test datasets, respectively. It was also demonstrated that the beak dataset can be classified automatically and effectively using the multi-class SVM classifier.
This study verifies the complementarity and distinctiveness of different feature types in the beak recognition task through several performance analyses, and the comparative analysis of different feature fusions shows that fused features can be used to analyze the biodiversity of cephalopod beaks. Extracting HOG features, LBP features, and deep features, and combining two feature types, is conducive to the analysis of beaks, enriching the toolbox for studying cephalopod biology and advancing the field. The combination of feature fusion with SVM-based recognition demonstrates robust performance, which not only promotes the automation of beak recognition but also fosters interdisciplinary collaboration by bridging deep learning, machine learning, and biological studies. This approach therefore provides an efficient and innovative research method applicable not only to cephalopods but also to various other biological domains. Because this study classified beaks from lower-resolution images with relatively complex backgrounds, the accuracy achieved suggests the method is applicable in practice; higher-quality images should allow it to solve classification problems even more accurately in the future. Much remains to be achieved in cephalopod classification: future research will continue to explore more informative shallow and deep features of the beak and to improve automatic classification tools toward the ultimate goal of real-time image processing.