Article

Bilinear Attention Network for Image-Based Fine-Grained Recognition of Oil Tea (Camellia oleifera Abel.) Cultivars

Xueyan Zhu, Yue Yu, Yili Zheng, Shuchai Su and Fengjun Chen
1 School of Technology, Beijing Forestry University, Beijing 100083, China
2 Beijing Laboratory of Urban and Rural Ecological Environment, Beijing Forestry University, Beijing 100083, China
3 Key Laboratory of Silviculture and Conservation, Ministry of Education, Beijing Forestry University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Agronomy 2022, 12(8), 1846; https://doi.org/10.3390/agronomy12081846
Submission received: 27 June 2022 / Revised: 28 July 2022 / Accepted: 30 July 2022 / Published: 4 August 2022

Abstract
Oil tea (Camellia oleifera Abel.) is a high-quality woody oil crop unique to China with high economic value and ecological benefits. One problem in oil tea production and research is the widespread confusion surrounding oil tea cultivar nomenclature. The purpose of this study was to automatically recognize oil tea cultivars using a bilinear attention network. To this end, we explored this possibility for five cultivars commonly grown in China: Ganshi 83-4, Changlin 53, Changlin 3, Ganshi 84-8, and Gan 447. We combined a bilinear EfficientNet-B0 network with the convolutional block attention module (CBAM) to build the BA-EfficientNet model, which can automatically and accurately recognize oil tea cultivars. In addition, the InceptionV3, VGG16, and ResNet50 algorithms were compared with the proposed BA-EfficientNet. The comparative results show that BA-EfficientNet accurately recognizes oil tea cultivars in the test set, with an overall accuracy of 91.59% and a kappa coefficient of 0.89. Compared with InceptionV3, VGG16, and ResNet50, the BA-EfficientNet algorithm has clear advantages in most of the evaluation indicators used in the experiment. In addition, ablation experiments were designed to quantitatively evaluate the specific effects of the bilinear network and the CBAM module on the cultivar recognition results. The results demonstrate that BA-EfficientNet is useful for solving the problem of recognizing oil tea cultivars under natural conditions and offers a new approach to applying deep learning methods to oil tea cultivar recognition in the field.

1. Introduction

Oil tea (Camellia oleifera Abel.), which is widely grown in Hunan, Jiangxi, and Guangxi, China, is one of the four largest woody oil sources in the nation [1]. Many cultivars of oil tea have been bred in China, and the quality of different cultivars is uneven [2,3]. Therefore, research on the recognition of oil tea cultivars is needed to promote cultivar optimization and improvement. Conventionally, oil tea cultivar recognition is conducted in the field by harvesters and forestry experts based on experience, which is inefficient, time-consuming, labor-intensive, and vulnerable to subjective factors [4,5]. An automatic and accurate cultivar recognition method is urgently needed in the oil tea industry, and computer vision technology offers a promising way to recognize oil tea cultivars automatically and accurately.
During the past ten years, a number of researchers have tried to apply molecular marker technology [6,7] and computer vision technology [8,9] to the cultivar recognition of forest plants. For example, using single nucleotide polymorphism markers, Kim et al. identified four tomato cultivars with a recognition accuracy of 80.0%~93.6% [10]. However, owing to the need for fruit destruction, molecular marker technology has not been widely adopted [11]. Therefore, nondestructive cultivar recognition technology has become a hot topic in the field of cultivar recognition.
Hyperspectral imaging technology can extract the spectral and spatial information of fruits non-destructively; therefore, it is widely used in research on fruit cultivar recognition [12,13,14]. For example, after obtaining hyperspectral imaging data of lychee within the range of 600~1000 nm, Liu et al. used support vector machine based multivariate classification to recognize three lychee cultivars with an accuracy of 87.81% [15]. Li et al. applied hyperspectral imaging technology to collect images of four soybean cultivars within the spectral range of 866.4~1701.0 nm, and a one-dimensional convolutional neural network recognized the cultivars with an accuracy higher than 98% [16]. However, cultivar recognition methods based on hyperspectral imaging technology are costly [17].
With the development of deep learning, the combination of computer vision and deep learning technology has gradually been applied in the field of cultivar recognition [18,19,20,21]. Osako et al. proposed a litchi fruit cultivar recognition method based on VGG16, with an accuracy of 98.33% [22]. Barré et al. designed LeafNet, a plant cultivar recognition system, and tested it on the LeafSnap dataset, achieving an accuracy of 86.3% [23]. Taner et al. built a Lprtnr1 model to recognize 17 hazelnut cultivars with an accuracy of 98.63% [24]. In addition, Franczyk et al. built a ResNet model to recognize five typical grape cultivars with an accuracy of 99% [25]. These studies show that deep learning can learn feature representations on its own and achieve good fruit cultivar recognition performance [26]. At the same time, most of the objects studied above have obvious differences in their phenotypic characteristics (e.g., texture, shape, and color). The recognition of oil tea cultivars is more difficult because the phenotypic traits of different oil tea cultivars differ only minimally, making them hard to recognize accurately with traditional image recognition models such as VGG, Inception, ResNet, and EfficientNet. In this study, we therefore treat the cultivar recognition of oil tea as a fine-grained image recognition problem.
To solve the problem of fine-grained image recognition, researchers have carried out studies under both strong and weak supervision [27,28,29]. Among them, weakly supervised learning has become a research hotspot owing to its simple operation and because it does not require a large number of manually labeled image data [30]. For example, to recognize the disease degree of wheat stripe rust, Mi et al. embedded a CBAM module into a densely connected network to design the C-DenseNet model [31]. After testing, the C-DenseNet model achieved an accuracy of 97.99% for grading six types of stripe rust. To distinguish numerous species of flowers, Pang et al. proposed a bilinear pyramid network based on the VGG16 network and tested it on the Oxford 102 category flower dataset, obtaining an accuracy of 92.1% [32]. These studies show that attention modules and bilinear networks have good recognition capabilities for fine-grained images. However, few studies have applied bilinear networks and attention mechanisms to the cultivar recognition of oil tea.
In this paper, we applied a bilinear attention EfficientNet-B0 model to achieve accurate cultivar recognition of oil tea. The main contributions of this research are summarized as follows: (1) studying oil tea cultivar recognition with a bilinear attention network; (2) proposing the BA-EfficientNet algorithm to recognize oil tea cultivars in the natural environment; (3) evaluating the proposed BA-EfficientNet algorithm and comparing it with previous work; and (4) visually analyzing the cultivar recognition results of BA-EfficientNet using Gradient-weighted Class Activation Mapping (Grad-CAM).

2. Materials and Methods

2.1. Study Area

The study area was located on land owned by Jiangxi Deyiyuan Ecological Agriculture Technology Development Co., Ltd. (29.18° N, 116.93° E) in Shangrao City, Jiangxi Province, China (Figure 1). The study area is dominated by mountains and hills, with a mild climate, abundant rainfall, an average annual temperature of 17.6 °C, and an annual precipitation of 1600 mm. The 3.34 km² planting area contains more than 20 high-quality cultivars, including Ganshi 83-4, Changlin 53, and Changlin 3, with a row spacing of 4 m and a plant spacing of 3 m.

2.2. Data Collection and Dataset Construction

The high-yield oil tea cultivars Ganshi 83-4, Changlin 53, Changlin 3, Ganshi 84-8, and Gan 447 were selected as the research objects, and plots with similar planting times were selected for image collection. From 6 September to 20 October 2020, we used an iPhone XR to collect images of oil tea under natural conditions. The collection times were 6:00–8:00, 11:00–13:00, and 15:00–17:00 every day. A total of 4640 high-quality images of the typical oil tea cultivars, covering different weather conditions such as sunny, cloudy, and rainy days, were obtained and stored in 24-bit true-color Joint Photographic Experts Group (JPEG) format [33]. We divided the 4640 images into a training set, validation set, and test set at a ratio of 6:2:2 (2784 for training, 928 for validation, and 928 for testing). The training and validation sets were then expanded by adjusting the brightness and adding noise. The brightness adjustment includes brightening and dimming, which simulate images under different lighting conditions. The random noises include Gaussian and impulse noise (also called salt-and-pepper noise), which simulate different environmental noises. After data expansion, the training and validation sets were both four times their original size, containing 11,136 and 3712 images, respectively.
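The augmentation step can be illustrated with a short sketch. The following Python/OpenCV code is not the authors' implementation but a minimal example of the operations described above (brightness increase and decrease, Gaussian noise, and salt-and-pepper noise); the brightness factors and noise levels are assumed values.

```python
# Illustrative sketch of the described augmentation: brightness shifts plus
# Gaussian or salt-and-pepper noise, giving several variants per source image.
import cv2
import numpy as np

def adjust_brightness(img, factor):
    """Brighten (factor > 1) or dim (factor < 1) an 8-bit BGR image."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def add_gaussian_noise(img, sigma=10.0):
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def add_salt_pepper_noise(img, amount=0.01):
    out = img.copy()
    mask = np.random.rand(*img.shape[:2])
    out[mask < amount / 2] = 0        # pepper
    out[mask > 1 - amount / 2] = 255  # salt
    return out

def expand_image(path):
    """Return the original image plus three augmented variants (4x expansion)."""
    img = cv2.imread(path)
    return [img,
            adjust_brightness(img, 1.3),   # bright (factor is an assumed value)
            adjust_brightness(img, 0.7),   # dim (factor is an assumed value)
            add_gaussian_noise(img)]       # or add_salt_pepper_noise(img)
```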

2.3. Fine-Grained Oil Tea Cultivar Recognition Model

The aim of this study was to develop a fully automated method for oil tea cultivar recognition. The overall flow of the proposed method is shown in Figure 2. The BA-EfficientNet model is mainly composed of two EfficientNet-B0 networks, two CBAM modules, and bilinear pooling, which are used for feature extraction, feature selection, and feature fusion of an oil tea image, respectively.

2.3.1. EfficientNet-B0 Network

The EfficientNet-B0 network uses compound scaling coefficients to automatically adjust the depth, width, and resolution of the model, and is characterized by a small number of parameters and high recognition accuracy [34]. The structure of EfficientNet-B0 is shown in Figure 2. The input of the EfficientNet-B0 network is an RGB three-channel oil tea image with a pixel resolution of 224 × 224, and the network includes 16 Mobile Inverted Bottleneck Convolution (MBConv) modules, 2 convolution layers, 1 global average pooling layer, and 1 classification layer. The structure of the MBConv module is shown in Figure 3. MBConv uses drop connect instead of traditional dropout, which helps prevent the model from overfitting [35]. A sketch of this block is given below.
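For readers unfamiliar with the MBConv block, the following tf.keras sketch outlines its main stages (1 × 1 expansion, depthwise convolution, squeeze-and-excitation, 1 × 1 projection, and a drop-connect residual). It is an illustrative approximation based on [34,35], not the authors' code; the expansion factor, SE ratio, and drop rate are assumed defaults.

```python
# Rough tf.keras sketch of one MBConv block (Figure 3). Hyperparameters
# (expand_ratio=6, SE ratio 0.25, drop_rate=0.2) follow [34,35] and are assumptions here.
from tensorflow.keras import layers

def mbconv(x, out_channels, kernel_size=3, strides=1, expand_ratio=6, drop_rate=0.2):
    in_channels = x.shape[-1]
    h = layers.Conv2D(in_channels * expand_ratio, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)                        # 1x1 expansion
    h = layers.DepthwiseConv2D(kernel_size, strides=strides,
                               padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)                        # depthwise convolution
    se = layers.GlobalAveragePooling2D()(h)                  # squeeze-and-excitation
    se = layers.Dense(int(in_channels * 0.25), activation="swish")(se)
    se = layers.Dense(in_channels * expand_ratio, activation="sigmoid")(se)
    h = h * layers.Reshape((1, 1, in_channels * expand_ratio))(se)
    h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)                       # linear projection
    if strides == 1 and in_channels == out_channels:
        # drop connect: randomly drop the whole residual branch per sample
        h = layers.Dropout(drop_rate, noise_shape=(None, 1, 1, 1))(h)
        h = layers.Add()([x, h])
    return h
```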
EfficientNet-B0 scaling attempts to expand the network length ($L_i$), width ($C_i$), and resolution ($H_i$, $W_i$) without changing the operators $F_i$ predefined in the baseline network, and restricts all layers to be scaled uniformly at a constant ratio to reduce the design space. The target can be expressed as the following optimization problem, which maximizes model accuracy under a given resource constraint:
$$
\begin{aligned}
\max_{d,\,w,\,r}\quad & \mathrm{Accuracy}\big(N(d,w,r)\big)\\
\text{s.t.}\quad & N(d,w,r)=\bigodot_{i=1,\dots,s}\hat{F}_i^{\,d\cdot\hat{L}_i}\Big(X_{\langle r\cdot\hat{H}_i,\; r\cdot\hat{W}_i,\; w\cdot\hat{C}_i\rangle}\Big)\\
& \mathrm{Memory}(N)\le tar\_memory\\
& \mathrm{Flops}(N)\le tar\_flops
\end{aligned}
$$
where d, w, and r are the depth, width, and resolution coefficients of the scaling network, respectively. In addition, $\hat{F}_i$, $\hat{L}_i$, and $\hat{C}_i$ are the predefined network structure, predefined layers, and predefined channels, respectively, and $\hat{H}_i$ and $\hat{W}_i$ are the predefined resolutions. Moreover, $\langle \hat{H}_i, \hat{W}_i, \hat{C}_i \rangle$ represents the shape of the input tensor X of layer i, and Memory(N) and Flops(N) are the parameters and floating-point operations of the network, respectively. Finally, tar_memory and tar_flops are the thresholds on the parameters and floating-point operations, respectively.
In EfficientNet-B0, the compound coefficient ϕ is used to uniformly scale the network depth, width, and resolution, balancing the three dimensions to obtain better accuracy and efficiency. It can be defined as follows:
$$
\begin{aligned}
& d=\alpha^{\phi},\quad w=\beta^{\phi},\quad r=\gamma^{\phi}\\
\text{s.t.}\quad & \alpha\cdot\beta^{2}\cdot\gamma^{2}\approx 2,\quad \alpha\ge 1,\ \beta\ge 1,\ \gamma\ge 1
\end{aligned}
$$
where ϕ is a user-specified coefficient chosen according to the available resources. Intuitively, α, β, and γ are resource allocation coefficients that determine how these resources are distributed to the depth, width, and resolution, respectively. An illustrative computation is given below.
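As a concrete illustration of the compound scaling rule, the following snippet computes the depth, width, and resolution multipliers for a given ϕ using the default coefficients α = 1.2, β = 1.1, γ = 1.15 reported in [34]; these values come from the EfficientNet paper and are not re-derived in this study.

```python
# Minimal sketch of compound scaling; alpha, beta, gamma are the defaults from [34].
alpha, beta, gamma = 1.2, 1.1, 1.15

def compound_scale(phi):
    d = alpha ** phi   # depth multiplier
    w = beta ** phi    # width multiplier
    r = gamma ** phi   # resolution multiplier
    return d, w, r

print(compound_scale(0))  # (1.0, 1.0, 1.0)  -> EfficientNet-B0 baseline
print(compound_scale(1))  # (1.2, 1.1, 1.15) -> roughly EfficientNet-B1
# FLOPs grow approximately by (alpha * beta**2 * gamma**2) ** phi ≈ 2 ** phi
```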

2.3.2. CBAM Module

Under natural conditions, different cultivars of oil tea are highly similar in characteristics such as color, shape, and size, which makes it difficult to achieve satisfactory results using traditional image recognition models. The CBAM module can conduct attention operations in the channel and spatial dimensions, while considering the importance of the pixels in the different channels and the importance of pixels in various positions within the same channel [36]. In fact, embedding the CBAM module in EfficientNet-B0 contributes only a modest amount of computational overhead [37]. To improve the ability of EfficientNet-B0 to recognize oil tea cultivars, the CBAM module was added to EfficientNet-B0 to guide the model to focus more attention on the key information that determines the oil tea cultivars during the recognition process. The structure of the CBAM module is shown in Figure 4.
As shown in Figure 4, the input feature F first passes through global maximum pooling and global average pooling to obtain two different channel descriptors. The two channel descriptors are then processed by a shared two-layer multilayer perceptron (MLP), yielding two feature maps. Finally, the two feature maps are added and passed through the sigmoid function to obtain the channel attention weight Mc. The channel attention weight Mc and the refined feature F′ are calculated as follows:
$$
M_c(F)=\mathrm{Sigmoid}\big(\mathrm{MLP}(\mathrm{MaxPool}(F))+\mathrm{MLP}(\mathrm{AvgPool}(F))\big)
$$
$$
F'=M_c(F)\times F
$$
As shown in Figure 4, after the refined feature F′ is input to the spatial attention module, it is first subjected to channel-wise maximum pooling and average pooling to obtain two different spatial descriptors, which are concatenated along the channel dimension. A 7 × 7 convolution then processes the concatenated map, and the spatial attention weight Ms is obtained through the sigmoid activation function. The spatial attention weight Ms and the refined feature F″ are calculated as follows:
$$
M_s(F')=\mathrm{Sigmoid}\big(f^{7\times 7}\big([\mathrm{MaxPool}(F');\,\mathrm{AvgPool}(F')]\big)\big)
$$
$$
F''=M_s(F')\times F'
$$
The CBAM attention network combines the channel attention module and the spatial attention module in series, so that the network can attend to channel information and spatial information at the same time, which leads to a more stable performance improvement [38]. A minimal implementation sketch is given below.
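The channel and spatial attention operations described above can be sketched in tf.keras as follows. This is a minimal illustration of the equations above rather than the authors' implementation; the channel reduction ratio of 16 follows the original CBAM paper [36] and is an assumption here.

```python
# Minimal tf.keras sketch of CBAM; the reduction ratio of 16 follows [36]
# and is an assumption, not a value reported in this study.
import tensorflow as tf
from tensorflow.keras import layers

def cbam_block(feature, reduction=16):
    channels = feature.shape[-1]

    # ----- channel attention: shared MLP over max- and average-pooled descriptors -----
    shared_dense_1 = layers.Dense(channels // reduction, activation="relu")
    shared_dense_2 = layers.Dense(channels)
    max_desc = shared_dense_2(shared_dense_1(layers.GlobalMaxPooling2D()(feature)))
    avg_desc = shared_dense_2(shared_dense_1(layers.GlobalAveragePooling2D()(feature)))
    mc = tf.sigmoid(max_desc + avg_desc)                      # Mc(F)
    refined = feature * layers.Reshape((1, 1, channels))(mc)  # F' = Mc(F) x F

    # ----- spatial attention: 7x7 conv over channel-wise max and mean maps -----
    max_map = tf.reduce_max(refined, axis=-1, keepdims=True)
    avg_map = tf.reduce_mean(refined, axis=-1, keepdims=True)
    concat = layers.Concatenate(axis=-1)([max_map, avg_map])
    ms = layers.Conv2D(1, kernel_size=7, padding="same",
                       activation="sigmoid")(concat)          # Ms(F')
    return refined * ms                                        # F'' = Ms(F') x F'
```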

2.3.3. Bilinear Pooling

As shown in Figure 2, the feature maps fA(L) and fB(L) of the oil tea image extracted by EfficientNet-B0-A and EfficientNet-B0-B are merged through bilinear pooling to obtain the bilinear fusion feature vector. The specific calculation process is as follows:
$$
\begin{aligned}
B(L,f_A,f_B) &= f_A^{T}(L)\,f_B(L)\\
x &= \sum_{L} B(L,f_A,f_B)\\
y &= \operatorname{sign}(x)\sqrt{|x|}\\
z &= \frac{y}{\lVert y\rVert_{2}}
\end{aligned}
$$
where fA(L) and fB(L) are the features at position L extracted by EfficientNet-B0-A and EfficientNet-B0-B, respectively. Here, B(L, fA, fB) is the bilinear matrix at position L, x is the bilinear feature summed over all positions and flattened into a one-dimensional vector, y is the bilinear fusion feature vector after the signed square-root operation, and z is the normalized bilinear fusion feature vector. The cultivar recognition result is obtained by feeding the normalized bilinear fusion feature vector into the Softmax classifier.
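Putting the pieces together, a rough tf.keras sketch of a BA-EfficientNet-style model is given below: two EfficientNet-B0 backbones, a CBAM block on each feature map (the cbam_block sketch above), bilinear pooling with signed square root and L2 normalization, and a Softmax classifier for the five cultivars. It assumes a recent TensorFlow release in which tf.keras.applications.EfficientNetB0 is available (the authors used TensorFlow 1.9.0), so it should be read as an approximation of the architecture in Figure 2, not as the original code.

```python
# Sketch of the bilinear fusion and classifier head, assuming the cbam_block defined above.
import tensorflow as tf
from tensorflow.keras import layers, Model

def bilinear_pool(fa, fb):
    """Outer product at each location, summed over locations, then signed
    square root and L2 normalization (see the equations above)."""
    _, h, w, ca = fa.shape
    cb = fb.shape[-1]
    fa = tf.reshape(fa, [-1, h * w, ca])
    fb = tf.reshape(fb, [-1, h * w, cb])
    x = tf.einsum("bla,blc->bac", fa, fb)          # sum over positions L of fA^T(L) fB(L)
    x = tf.reshape(x, [-1, ca * cb])
    y = tf.sign(x) * tf.sqrt(tf.abs(x) + 1e-12)    # signed square root
    return tf.math.l2_normalize(y, axis=-1)        # z

inputs = layers.Input(shape=(224, 224, 3))
backbone_a = tf.keras.applications.EfficientNetB0(include_top=False, weights=None)
backbone_b = tf.keras.applications.EfficientNetB0(include_top=False, weights=None)
# note: when saving weights by layer name, the second backbone's layers may need renaming

fa = cbam_block(backbone_a(inputs))                # EfficientNet-B0-A + CBAM
fb = cbam_block(backbone_b(inputs))                # EfficientNet-B0-B + CBAM
outputs = layers.Dense(5, activation="softmax")(bilinear_pool(fa, fb))
model = Model(inputs, outputs)                     # BA-EfficientNet-style sketch
```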

3. Results

3.1. Experiment Setup

The oil tea cultivar recognition algorithm BA-EfficientNet was implemented in the open-source TensorFlow framework and trained and tested under Ubuntu 16.04. The hardware configuration was as follows: a 3.7-GHz Intel(R) Core(TM) i7-8700K CPU, 16 GB of memory, a 4 TB hard disk, and two NVIDIA GeForce GTX 1080Ti cards with 11 GB of memory each. The software configuration was Ubuntu 16.04, Python 3.6.8, TensorFlow 1.9.0, and OpenCV 3.4.0. During training, the batch size was set to 16 and the learning rate to 0.001. Figure 5 shows the loss curves of BA-EfficientNet on the training and validation datasets.
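A training-configuration sketch matching the stated settings is shown below; the optimizer, loss, and number of epochs are assumptions, since only the batch size and learning rate are reported here.

```python
# Sketch of the training configuration; only the batch size (16) and learning rate (0.001)
# come from the paper. The optimizer, loss, epoch count, and the train_dataset/val_dataset
# objects are placeholders assumed for illustration.
import tensorflow as tf

model.compile(  # `model` is the BA-EfficientNet-style sketch built in Section 2.3.3
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(
    train_dataset,                 # 11,136 augmented training images, batched to 16
    validation_data=val_dataset,   # 3,712 augmented validation images
    epochs=100,                    # assumed value
)
```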

3.2. Cultivar Recognition Results and Analysis

Figure 6 shows the confusion matrix of the BA-EfficientNet model for the recognition results of the 928 oil tea images in the test set. In the confusion matrix, the sum of all elements is the total number of test images, the diagonal elements represent the number of correctly recognized images of each cultivar, and the sum of the elements in each row is the number of test images of the corresponding cultivar.
It can be seen from the confusion matrix that misrecognition mainly occurs among Ganshi 83-4, Changlin 53, Ganshi 84-8, and Gan 447. The analysis revealed that these errors occurred mainly because the fruits of Ganshi 83-4, Changlin 53, Ganshi 84-8, and Gan 447 exhibit minimal differences in color, shape, and texture. In addition, inconsistent light intensity and cluttered field backgrounds also interfered with the recognition of oil tea cultivars. The same factors affect the judgment of human experts, so these misrecognitions are tolerable to some extent.
To quantitatively evaluate the cultivar recognition results of the BA-EfficientNet model, the accuracy (Acc), precision (P), recall (R), F1-score, overall accuracy (OA), and kappa coefficient (Kc) commonly used in the cultivar recognition field were selected as evaluation indicators [39,40,41]. Table 1 shows the quantitative evaluation results of the proposed BA-EfficientNet model for the five cultivars of oil tea in the test set.
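For clarity, the following snippet shows how the per-cultivar indicators in Table 1 follow from the TP/TN/FP/FN counts (using the Ganshi 83-4 row as an example) and how the overall accuracy and kappa coefficient can be computed from a confusion matrix; the kappa formula shown is the standard Cohen's kappa.

```python
# How the per-cultivar indicators follow from TP/TN/FP/FN (Ganshi 83-4 row of Table 1).
import numpy as np

tp, tn, fp, fn = 148, 749, 5, 26
acc = (tp + tn) / (tp + tn + fp + fn)               # 0.9666
precision = tp / (tp + fp)                          # 0.9673
recall = tp / (tp + fn)                             # 0.8506
f1 = 2 * precision * recall / (precision + recall)  # 0.9052

def overall_accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                   # observed agreement (OA)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    return po, (po - pe) / (1 - pe)
```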
As shown in Table 1, BA-EfficientNet achieves an overall accuracy of 91.59% for the recognition of Ganshi 83-4, Changlin 53, Changlin 3, Ganshi 84-8, and Gan 447. In general, the BA-EfficientNet model recognizes oil tea cultivars well under natural conditions. However, its recognition performance differed noticeably across cultivars. For Ganshi 84-8, the accuracy, precision, recall, and F1-score of the BA-EfficientNet model were 95.80%, 90.18%, 86.47%, and 88.29%, respectively, which are 3.88%, 7.96%, 13.53%, and 10.77% lower than the corresponding results for Changlin 3, and 0.86%, 2.49%, 0.95%, and 1.68% lower than those for Changlin 53.

4. Discussion

4.1. Comparison and Analysis of Cultivar Recognition Results

To further verify the cultivar recognition results of the method proposed in this paper, the BA-EfficientNet model was compared with InceptionV3 [42], VGG16 [43], and ResNet50 [44], which are widely used in the field of cultivar recognition. The recognition results of each method on the test set images are presented as confusion matrices in Figure 7.
As shown in Figure 7, when InceptionV3, VGG16, and ResNet50 were used to recognize the oil tea cultivars in the test set, misrecognition also mainly occurred among Ganshi 83-4, Changlin 53, Ganshi 84-8, and Gan 447. Nevertheless, the BA-EfficientNet model made far fewer misrecognitions than the other models. Moreover, BA-EfficientNet made no errors on Changlin 3, the cultivar that the other models also misidentified only rarely. From the confusion matrices of the different models, it can be seen that BA-EfficientNet rarely misidentifies the cultivars that the other models struggle to recognize accurately, and it makes no errors on the cultivars that the other models already recognize well.
Table 2 compares the accuracy, precision, recall, F1-score, overall accuracy, and kappa coefficient of the four oil tea cultivar recognition models. It can be seen from Table 2 that the BA-EfficientNet model outperforms the other three models in all evaluation indexes. The overall accuracy and kappa coefficient of InceptionV3 are relatively low, at 76.83% and 0.70, respectively. The overall accuracy and kappa coefficient of the VGG16 model are slightly better, at 78.02% and 0.72. Compared with the previous two models, ResNet50 achieves a higher overall accuracy and kappa coefficient, reaching 83.94% and 0.80. ResNet50 also obtained better results than VGG16 in studies on chrysanthemum cultivar recognition and rapeseed image classification [45,46], which may be related to its greater number of hidden layers and its skip-connection structure [47]. However, InceptionV3, VGG16, and ResNet50 all failed to recognize Ganshi 83-4, Changlin 53, Ganshi 84-8, and Gan 447 accurately.
The overall accuracy and kappa coefficient of the BA-EfficientNet model for recognizing oil tea cultivars were 91.59% and 0.89, respectively, both better than those of the other methods in the comparison experiments. Moreover, BA-EfficientNet also recognized the four difficult cultivars, Ganshi 83-4, Changlin 53, Ganshi 84-8, and Gan 447, accurately. In summary, among the four compared models, BA-EfficientNet obtains the highest overall accuracy and kappa coefficient.

4.2. Ablation Experiments

To quantitatively evaluate the specific effects of the bilinear network and the CBAM module on the oil tea cultivar recognition results, a set of ablation experiments was designed; the results are shown in Table 3. Here, the EfficientNet-B0 model adopts no improvement strategy, the EfficientNet-CBAM model adopts only the CBAM attention strategy, the Bilinear EfficientNet model adopts only the bilinear network strategy, and BA-EfficientNet adopts both the CBAM attention strategy and the bilinear network strategy.
As shown in Table 3, when no improvement strategy was used, the overall accuracy and kappa coefficient of EfficientNet-B0 for recognizing oil tea cultivars were 86.75% and 0.83, respectively. With the CBAM attention strategy, the overall accuracy and kappa coefficient of EfficientNet-CBAM reached 88.36% and 0.85, respectively, showing that the CBAM module improves the oil tea cultivar recognition ability of the EfficientNet model; this is in line with a similar study [48]. With the bilinear network strategy, the overall accuracy and kappa coefficient of Bilinear EfficientNet reached 88.90% and 0.86, respectively, showing that building a bilinear network also improves the recognition ability of the EfficientNet model, which is consistent with the conclusions of Prakash et al. [49]. The ablation results therefore show that both the CBAM attention strategy and the bilinear network strategy improve the model's ability to recognize oil tea cultivars. Moreover, when the two strategies were used jointly, the overall accuracy and kappa coefficient of the BA-EfficientNet model reached 91.59% and 0.89, outperforming the results obtained with either strategy alone. This is also in line with a similar study by Wang et al. [50].

4.3. Visual Analysis of Oil Tea Cultivar Recognition

To better understand the behavior of the proposed BA-EfficientNet model in oil tea cultivar recognition, the Grad-CAM visualization algorithm was used to produce visual explanations of its cultivar recognition results [51]. Randomly selected images of the different oil tea cultivars were tested, and the heat map for each cultivar is shown in Figure 8.
According to the heat maps of BA-EfficientNet generated by the Grad-CAM visualization algorithm, when recognizing oil tea cultivars the BA-EfficientNet model pays most of its attention to the image region where the oil tea fruit is located and pays little attention to the leaves and background. According to the experience of human experts, the fruit region is the key to recognizing oil tea cultivars. We therefore conclude that the BA-EfficientNet model can focus on and fully exploit the fruit regions that play a key role in the recognition of oil tea cultivars. The Grad-CAM visualization results help explain why the BA-EfficientNet model achieves higher recognition accuracy for oil tea cultivars than the other models.
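For reference, a minimal Grad-CAM sketch for a tf.keras model is shown below. It follows the standard formulation in [51] rather than reproducing the authors' code; the name of the convolutional layer to visualize is a user-supplied assumption.

```python
# Minimal Grad-CAM sketch [51]; "last_conv_layer_name" is whatever convolutional
# layer one chooses to visualize and is an assumption here.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Return a heat map highlighting the regions that drive the predicted cultivar."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))             # global-average-pooled gradients
    cam = tf.nn.relu(tf.reduce_sum(
        conv_out * weights[:, tf.newaxis, tf.newaxis, :], axis=-1))
    cam = cam[0] / (tf.reduce_max(cam) + 1e-12)              # normalize to [0, 1]
    return cam.numpy()
```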

5. Conclusions

In this study, an automatic oil tea cultivar recognition model, BA-EfficientNet, based on a bilinear attention EfficientNet-B0 network, was proposed. After testing, the overall accuracy and kappa coefficient of BA-EfficientNet for oil tea cultivar recognition were 91.59% and 0.89, respectively. For individual cultivars, the F1-scores of the BA-EfficientNet model for Ganshi 83-4, Changlin 53, Changlin 3, Ganshi 84-8, and Gan 447 were 90.52%, 89.97%, 99.06%, 88.29%, and 90.84%, respectively. The recognition results on the test set show that the model can recognize oil tea cultivars well under natural conditions.
Compared with InceptionV3, VGG16, ResNet50, EfficientNet-B0, and EfficientNet-CBAM on the test set, BA-EfficientNet shows clear advantages in terms of accuracy, kappa coefficient, and F1-score, indicating that it is a reliable algorithm for recognizing oil tea cultivars. The Grad-CAM visualization results for the BA-EfficientNet model also show that it can focus on and fully exploit the fruit regions that play a key role in the recognition of oil tea cultivars. In future work, we will improve BA-EfficientNet and design a lightweight oil tea cultivar recognition algorithm that can be deployed on mobile terminals.

Author Contributions

Conceptualization, X.Z.; methodology, X.Z.; software, X.Z.; validation, X.Z. and Y.Y.; formal analysis, X.Z.; investigation, Y.Y.; data curation, Y.Y. and S.S.; writing—original draft preparation, X.Z.; writing—review and editing, F.C. and Y.Z.; visualization, Y.Y.; supervision, F.C.; project administration, F.C.; funding acquisition, F.C. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (grant number 2019YFD1002401) and the Fundamental Research Funds for the Central Universities (grant number 2021ZY74).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Deng, Q.; Li, J.; Gao, C.; Cheng, J.; Deng, X.; Jiang, D.; Li, L.; Yan, P. New perspective for evaluating the main Camellia oleifera cultivars in China. Sci. Rep. 2020, 10, 20676.
2. Liu, C.; Chen, L.; Tang, W.; Peng, S.; Li, M.; Deng, N.; Chen, Y. Predicting potential distribution and evaluating suitable soil condition of oil tea Camellia in China. Forests 2018, 9, 487.
3. Zhang, F.; Li, Z.; Zhou, J.; Gu, Y.; Tan, X. Comparative study on fruit development and oil synthesis in two cultivars of Camellia oleifera. BMC Plant Biol. 2021, 21, 348.
4. Wen, Y.; Su, S.; Ma, L.; Yang, S.; Wang, Y.; Wang, X. Effects of canopy microclimate on fruit yield and quality of Camellia oleifera. Sci. Hortic. 2018, 235, 132–141.
5. Zeng, W.; Endo, Y. Effects of cultivars and geography in China on the lipid characteristics of Camellia oleifera seeds. J. Oleo Sci. 2019, 68, 1051–1061.
6. Cheng, J.; Jiang, D.; Cheng, H.; Zhou, X.; Fang, Y.; Zhang, X.; Xiao, X.; Deng, X.; Li, L. Determination of Camellia oleifera Abel. germplasm resources of genetic diversity in China using ISSR markers. Not. Bot. Horti Agrobot. Cluj-Napoca 2018, 46, 501–508.
7. Chao, W.; Tang, C.; Zhang, J.; Yu, L.; Yoichi, H. Development of a stable SCAR marker for rapid identification of Ganoderma lucidum Hunong 5 cultivar using DNA pooling method and inter-simple sequence repeat markers. J. Integr. Agric. 2018, 17, 130–138.
8. Ding, P.; Zhou, H.; Shang, J.; Zou, X.; Wang, M. Object detection via flexible anchor generation. Int. J. Pattern Recogn. 2021, 35, 2155012.
9. Tang, Y.; Cheng, Z.; Miao, A.; Zhuang, J.; Hou, C.; He, Y.; Chu, X.; Luo, S. Evaluation of cultivar identification performance using feature expressions and classification algorithms on optical images of sweet corn seeds. Agronomy 2020, 10, 1268.
10. Kim, M.; Jung, J.; Shim, E.; Chung, S.; Park, Y.; Lee, G.; Sim, S. Genome-wide SNP discovery and core marker sets for DNA barcoding and variety identification in commercial tomato cultivars. Sci. Hortic. 2021, 276, 109734.
11. Chen, Y.; Wang, B.; Chen, J.; Wang, X.; Wang, R.; Peng, S.; Chen, L.; Ma, L.; Luo, J. Identification of rubisco rbcL and rbcS in Camellia oleifera and their potential as molecular markers for selection of high tea oil cultivars. Front. Plant Sci. 2015, 6, 189.
12. Calzone, A.; Cotrozzi, L.; Lorenzini, G.; Nali, C.; Pellegrini, E. Hyperspectral detection and monitoring of salt stress in pomegranate cultivars. Agronomy 2021, 11, 1038.
13. Naeem, S.; Ali, A.; Chesneau, C.; Tahir, M.H.; Jamal, F.; Sherwani, R.A.K.; Ul Hassan, M. The classification of medicinal plant leaves based on multispectral and texture feature using machine learning approach. Agronomy 2021, 11, 263.
14. Zhu, S.; Zhang, J.; Chao, M.; Xu, X.; Song, P.; Zhang, J.; Huang, Z. A rapid and highly efficient method for the identification of soybean seed varieties: Hyperspectral images combined with transfer learning. Molecules 2020, 25, 152.
15. Liu, D.; Wang, L.; Sun, D.; Zeng, X.; Qu, J.; Ma, J. Lychee variety discrimination by hyperspectral imaging coupled with multivariate classification. Food Anal. Methods 2014, 7, 1848–1857.
16. Li, H.; Zhang, L.; Sun, H.; Rao, Z.; Ji, H. Identification of soybean varieties based on hyperspectral imaging technology and one-dimensional convolutional neural network. J. Food Process Eng. 2021, 44, e13767.
17. Zhang, J.; Dai, L.; Cheng, F. Corn seed variety classification based on hyperspectral reflectance imaging and deep convolutional neural network. Food Meas. 2021, 15, 484–494.
18. Liu, Y.; Su, J.; Shen, L.; Lu, N.; Fang, Y.; Liu, F.; Song, Y.; Su, B. Development of a mobile application for identification of grapevine (Vitis vinifera L.) cultivars via deep learning. Int. J. Agric. Biol. Eng. 2021, 14, 172–179.
19. Kaya, A.; Keceli, A.; Catal, C.; Yalic, H.; Temucin, H.; Tekinerdogan, B. Analysis of transfer learning for deep neural network based plant classification models. Comput. Electron. Agric. 2019, 158, 20–29.
20. Zhang, C.; Zhao, Y.; Yan, T.; Bai, X.; Xiao, Q.; Gao, P.; Li, M.; Huang, W.; Bao, Y.; He, Y.; et al. Application of near-infrared hyperspectral imaging for variety identification of coated maize kernels with deep learning. Infrared Phys. Technol. 2020, 111, 103550.
21. Abbaspour-Gilandeh, Y.; Molaee, A.; Sabzi, S.; Nabipur, N.; Shamshirband, S.; Mosavi, A. A combined method of image processing and artificial neural network for the identification of 13 Iranian rice cultivars. Agronomy 2020, 10, 117.
22. Osako, Y.; Yamane, H.; Lin, S.; Chen, P.; Tao, R. Cultivar discrimination of litchi fruit images using deep learning. Sci. Hortic. 2020, 269, 109360.
23. Barré, P.; Stöver, B.; Müller, K.; Steinhage, V. LeafNet: A computer vision system for automatic plant species identification. Ecol. Inform. 2017, 40, 50–56.
24. Taner, A.; Öztekin, Y.B.; Duran, H. Performance analysis of deep learning CNN models for variety classification in hazelnut. Sustainability 2021, 13, 6527.
25. Franczyk, B.; Hernes, M.; Kozierkiewicz, A.; Kozina, A.; Pietranik, M.; Roemer, I.; Schieck, M. Deep learning for grape variety recognition. Procedia Comput. Sci. 2020, 176, 1211–1220.
26. Nasiri, A.; Taheri-Garavand, A.; Fanourakis, D.; Zhang, Y.; Nikoloudakis, N. Automated grapevine cultivar identification via leaf imaging and deep convolutional neural networks: A proof-of-concept study employing primary Iranian varieties. Plants 2021, 10, 1628.
27. Yang, G.; He, Y.; Yang, Y.; Xu, B. Fine-grained image classification for crop disease based on attention mechanism. Front. Plant Sci. 2020, 11, 600854.
28. Zhang, C.; Li, T.; Zhang, W. The detection of impurity content in machine-picked seed cotton based on image processing and improved YOLO V4. Agronomy 2022, 12, 66.
29. Su, W.-H.; Zhang, J.; Yang, C.; Page, R.; Szinyei, T.; Hirsch, C.D.; Steffenson, B.J. Automatic evaluation of wheat resistance to fusarium head blight using dual Mask-RCNN deep learning frameworks in computer vision. Remote Sens. 2021, 13, 26.
30. Kumar, M.; Gupta, S.; Gao, X.; Singh, A. Plant species recognition using morphological features and adaptive boosting methodology. IEEE Access 2019, 7, 163912–163918.
31. Mi, Z.; Zhang, X.; Su, J.; Han, D.; Su, B. Wheat stripe rust grading by deep learning with attention mechanism and images from mobile devices. Front. Plant Sci. 2020, 11, 558126.
32. Pang, C.; Wang, W.; Lan, R.; Shi, Z.; Luo, X. Bilinear pyramid network for flower species categorization. Multimed. Tools Appl. 2021, 80, 215–225.
33. Rzanny, M.; Mäder, P.; Deggelmann, A.; Chen, M.; Wäldchen, J. Flowers, leaves or both? How to obtain suitable images for automated plant identification. Plant Methods 2019, 15, 77.
34. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv 2019, arXiv:1905.11946.
35. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
36. Woo, S.; Park, J.; Lee, J.; Kweon, I. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
37. Liu, S.; Zhong, X.; Shih, F. Joint learning for pneumonia classification and segmentation on medical images. Int. J. Pattern Recogn. 2021, 35, 2157003.
38. Men, H.; Yuan, H.; Shi, Y.; Liu, M.; Wang, Q.; Liu, J. A residual network with attention module for hyperspectral information of recognition to trace the origin of rice. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2021, 263, 120155.
39. Duong, L.; Nguyen, P.; Sipio, C.; Ruscio, D. Automated fruit recognition using EfficientNet and MixNet. Comput. Electron. Agric. 2020, 171, 105326.
40. Liu, J.; Wang, M.; Bao, L.; Li, X. EfficientNet based recognition of maize diseases by leaf image classification. J. Phys. Conf. Ser. 2020, 1693, 012148.
41. Zhang, P.; Yang, L.; Li, D. EfficientNet-B4-Ranger: A novel method for greenhouse cucumber disease recognition under natural complex environment. Comput. Electron. Agric. 2020, 176, 105652.
42. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
43. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
44. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
45. Liu, Z.; Wang, J.; Tian, Y.; Dai, S. Deep learning for image-based large-flowered chrysanthemum cultivar recognition. Plant Methods 2019, 15, 146.
46. Mirzazadeh, A.; Azizi, A.; Abbaspour-Gilandeh, Y.; Hernández-Hernández, J.L.; Hernández-Hernández, M.; Gallardo-Bernal, I. A novel technique for classifying bird damage to rapeseed plants based on a deep learning algorithm. Agronomy 2021, 11, 2364.
47. Azizi, A.; Gilandeh, Y.A.; Mesri-Gundoshmian, T.; Saleh-Bigdeli, A.A.; Moghaddam, H.A. Classification of soil aggregates: A novel approach based on deep learning. Soil Tillage Res. 2020, 199, 104586.
48. Zhu, X.; Zhang, X.; Sun, Z.; Zheng, Y.; Su, S.; Chen, F. Identification of oil tea (Camellia oleifera C.Abel) cultivars using EfficientNet-B4 CNN model with attention mechanism. Forests 2022, 13, 1.
49. Prakash, A.; Prakasam, P. An intelligent fruits classification in precision agriculture using bilinear pooling convolutional neural networks. Vis. Comput. 2022.
50. Wang, P.; Liu, J.; Xu, L.; Huang, P.; Luo, X.; Hu, Y.; Kang, Z. Classification of Amanita species based on bilinear networks with attention mechanism. Agriculture 2021, 11, 393.
51. Selvaraju, R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020, 128, 336–359.
Figure 1. The location of study area and typical oil tea images: (a) the location of the study area, (b) the real view of the study area, (c) the typical image of Ganshi 83-4, (d) the typical image of Changlin 53, (e) the typical image of Changlin 3, (f) the typical image of Ganshi 84-8, and (g) the typical image of Gan 447.
Figure 2. Model flowchart of the proposed BA-EfficientNet model. Conv represents convolution, MBConv represents mobile inverted bottleneck convolution, fA(L) represents the features of position L extracted by EfficientNet-B0-A, fB(L) represents the features of position L extracted by EfficientNet-B0-B, Softmax represents Softmax classifier.
Figure 3. Mobile inverted bottleneck convolution module. Conv represents convolution, DWConv represents depthwise convolution, Swish represents swish activate function, Sigmoid represents sigmoid activate function.
Figure 4. Convolutional block attention module.
Figure 5. The training loss curve of BA-EfficientNet.
Figure 6. Confusion matrix over test set cultivar recognition results.
Figure 7. Confusion matrix over test set of different cultivar recognition methods: (a) the confusion matrix of InceptionV3, (b) the confusion matrix of VGG16, (c) the confusion matrix of ResNet50, and (d) the confusion matrix of BA-EfficientNet.
Figure 8. Attention heat maps of BA-EfficientNet for oil tea cultivar recognition: (a) the original image of Ganshi 83-4, (b) the original image of Changlin 53, (c) the original image of Changlin 3, (d) the original image of Ganshi 84-8, (e) the original image of Gan 447, (f) the heat-map image of Ganshi 83-4 created by the Grad-CAM, (g) the heat-map image of Changlin 53 created by the Grad-CAM, (h) the heat-map image of Changlin 3 created by the Grad-CAM, (i) the heat-map image of Ganshi 84-8 created by the Grad-CAM, (j) the heat-map image of Gan 447 created by the Grad-CAM, (k) the heat map of Ganshi 83-4 after fusion with the original image, (l) the heat map of Changlin 53 after fusion with the original image, (m) the heat map of Changlin 3 after fusion with the original image, (n) the heat map of Ganshi 84-8 after fusion with the original image, (o) the heat map of Gan 447 after fusion with the original image. Note: the warm colors suggest that the region more strongly contributes to the cultivar recognition.
Table 1. Cultivar recognition results of oil tea by BA-EfficientNet over the test set.

| Cultivar | TP | TN | FP | FN | Acc (%) | P (%) | R (%) | F1-score (%) | OA (%) | Kc |
| Ganshi 83-4 | 148 | 749 | 5 | 26 | 96.66 | 96.73 | 85.06 | 90.52 | 91.59 | 0.89 |
| Changlin 53 | 139 | 758 | 11 | 20 | 96.66 | 92.67 | 87.42 | 89.97 | | |
| Changlin 3 | 158 | 767 | 3 | 0 | 99.68 | 98.14 | 100 | 99.06 | | |
| Ganshi 84-8 | 147 | 742 | 16 | 23 | 95.80 | 90.18 | 86.47 | 88.29 | | |
| Gan 447 | 258 | 618 | 43 | 9 | 94.40 | 85.71 | 96.63 | 90.84 | | |
Table 2. Comparing different oil tea cultivar recognition methods over the test set.

| Model | Cultivar | TP | TN | FP | FN | Acc (%) | P (%) | R (%) | F1-score (%) | OA (%) | Kc |
| InceptionV3 | Ganshi 83-4 | 103 | 731 | 23 | 71 | 89.87 | 81.75 | 59.20 | 68.67 | 76.83 | 0.70 |
| | Changlin 53 | 96 | 738 | 31 | 63 | 89.87 | 75.59 | 60.38 | 67.13 | | |
| | Changlin 3 | 152 | 763 | 7 | 6 | 98.60 | 95.60 | 96.20 | 95.90 | | |
| | Ganshi 84-8 | 113 | 696 | 62 | 57 | 87.18 | 64.57 | 66.47 | 65.51 | | |
| | Gan 447 | 249 | 569 | 92 | 18 | 88.15 | 73.02 | 93.26 | 81.91 | | |
| VGG16 | Ganshi 83-4 | 85 | 747 | 7 | 89 | 89.66 | 92.39 | 48.85 | 63.91 | 78.02 | 0.72 |
| | Changlin 53 | 101 | 759 | 10 | 58 | 92.67 | 90.99 | 63.52 | 74.81 | | |
| | Changlin 3 | 155 | 755 | 15 | 3 | 98.06 | 91.18 | 98.10 | 94.51 | | |
| | Ganshi 84-8 | 151 | 647 | 111 | 19 | 85.99 | 57.63 | 88.82 | 69.90 | | |
| | Gan 447 | 232 | 600 | 61 | 35 | 89.66 | 79.18 | 86.89 | 82.86 | | |
| ResNet50 | Ganshi 83-4 | 119 | 738 | 16 | 55 | 92.35 | 88.15 | 68.39 | 77.02 | 83.94 | 0.80 |
| | Changlin 53 | 98 | 763 | 6 | 61 | 92.78 | 94.23 | 61.64 | 74.53 | | |
| | Changlin 3 | 155 | 764 | 6 | 3 | 99.03 | 96.27 | 98.10 | 97.18 | | |
| | Ganshi 84-8 | 150 | 688 | 70 | 20 | 90.30 | 68.18 | 88.24 | 76.92 | | |
| | Gan 447 | 257 | 610 | 51 | 10 | 93.43 | 83.44 | 96.25 | 89.39 | | |
| BA-EfficientNet | Ganshi 83-4 | 148 | 749 | 5 | 26 | 96.66 | 96.73 | 85.06 | 90.52 | 91.59 | 0.89 |
| | Changlin 53 | 139 | 758 | 11 | 20 | 96.66 | 92.67 | 87.42 | 89.97 | | |
| | Changlin 3 | 158 | 767 | 3 | 0 | 99.68 | 98.14 | 100 | 99.06 | | |
| | Ganshi 84-8 | 147 | 742 | 16 | 23 | 95.80 | 90.18 | 86.47 | 88.29 | | |
| | Gan 447 | 258 | 618 | 43 | 9 | 94.40 | 85.71 | 96.63 | 90.84 | | |
Table 3. The experimental results of ablation experiments.

| Model | Cultivar | TP | TN | FP | FN | Acc (%) | P (%) | R (%) | F1-score (%) | OA (%) | Kc |
| EfficientNet-B0 | Ganshi 83-4 | 133 | 743 | 11 | 41 | 94.40 | 92.36 | 76.44 | 83.65 | 86.75 | 0.83 |
| | Changlin 53 | 132 | 753 | 16 | 27 | 95.37 | 89.19 | 83.02 | 85.99 | | |
| | Changlin 3 | 158 | 759 | 11 | 0 | 98.81 | 93.49 | 100 | 96.64 | | |
| | Ganshi 84-8 | 151 | 707 | 51 | 19 | 92.46 | 74.75 | 88.82 | 81.18 | | |
| | Gan 447 | 231 | 627 | 34 | 36 | 92.46 | 87.17 | 86.52 | 86.84 | | |
| EfficientNet-CBAM | Ganshi 83-4 | 136 | 750 | 4 | 38 | 95.47 | 97.14 | 78.16 | 86.62 | 88.36 | 0.85 |
| | Changlin 53 | 126 | 765 | 4 | 33 | 96.01 | 96.92 | 79.25 | 87.20 | | |
| | Changlin 3 | 158 | 762 | 8 | 0 | 99.14 | 95.18 | 100 | 97.53 | | |
| | Ganshi 84-8 | 144 | 732 | 26 | 26 | 94.40 | 84.71 | 84.71 | 84.71 | | |
| | Gan 447 | 256 | 595 | 66 | 11 | 91.70 | 79.50 | 95.88 | 86.93 | | |
| Bilinear EfficientNet | Ganshi 83-4 | 138 | 745 | 9 | 36 | 95.15 | 93.88 | 79.31 | 85.98 | 88.90 | 0.86 |
| | Changlin 53 | 136 | 751 | 18 | 23 | 95.58 | 88.31 | 85.53 | 86.90 | | |
| | Changlin 3 | 158 | 765 | 5 | 0 | 99.46 | 96.93 | 100 | 98.44 | | |
| | Ganshi 84-8 | 153 | 728 | 30 | 17 | 94.94 | 83.61 | 90 | 86.69 | | |
| | Gan 447 | 240 | 620 | 41 | 27 | 92.67 | 85.41 | 89.89 | 87.59 | | |
| BA-EfficientNet | Ganshi 83-4 | 148 | 749 | 5 | 26 | 96.66 | 96.73 | 85.06 | 90.52 | 91.59 | 0.89 |
| | Changlin 53 | 139 | 758 | 11 | 20 | 96.66 | 92.67 | 87.42 | 89.97 | | |
| | Changlin 3 | 158 | 767 | 3 | 0 | 99.68 | 98.14 | 100 | 99.06 | | |
| | Ganshi 84-8 | 147 | 742 | 16 | 23 | 95.80 | 90.18 | 86.47 | 88.29 | | |
| | Gan 447 | 258 | 618 | 43 | 9 | 94.40 | 85.71 | 96.63 | 90.84 | | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
