1. Introduction
Maize (Zea mays L.) is an annual herb in the Poaceae family and one of the world's most important food crops, grown in the United States, China, Brazil, and other countries. Maize is also an important source of feed for animal husbandry, as well as one of the most important raw materials in various industries [1,2]. Variety purity is an important criterion in seed quality testing. Because maize seeds of different varieties are morphologically and visually similar, even experts struggle to distinguish them with the naked eye, so identification requires a significant amount of labor and time.
In recent years, computer-vision technology has been widely adopted in agriculture. Yang Hang [3] applied the Wilks' lambda stepwise discriminant method for band selection and established a discriminant model; the average recognition accuracy for five types of maize seed was 91.6%, except for GAOYOU115, which reached 87%. Cheng Hong [4] trained a Support Vector Machine (SVM) on maize seed images, reaching a recognition accuracy of 92.3%. Yang Shuqin [5] used a BP artificial neural network for training and recognition, and the overall identification rate was 93% for four maize varieties. Moges T.G. [6] proposed a hybrid of Convolutional Neural Network (CNN) and HOG features trained with an SVM classifier, achieving a recognition rate of 99%.
Deep learning is a crucial branch of machine learning. Compared with conventional machine learning, deep learning reduces the incompleteness of manually extracted features and can automatically extract complex features from the input data with greater objectivity [7,8,9]. Tu K. [10] used the VGG16 network to verify the authenticity of the maize variety “JINGKE 968”, and the best detection accuracy exceeded 99%. Zhou Q. [11] proposed a CNN-based method that reshapes pixel spectral images; for six varieties of common maize seeds, the test accuracy on germ and non-germ surfaces was 93.33% and 95.56%, respectively, while for six varieties of sweet maize seeds it was 97.78% and 98.15%, respectively. Kurtulmus F. [12] applied multiple deep-learning methods to identify four varieties of sunflower seeds, with GoogLeNet achieving the highest classification accuracy (95%).
With the development of transfer learning [13] and CNNs, and in contrast to networks with huge numbers of parameters such as VGG16 (Simonyan and Zisserman, 2015), lightweight deep neural networks such as DenseNet [14], NASNet [15], MobileNetV2 [16], SqueezeNet [17], and Xception [18] are receiving increasing attention because they are easier to deploy on mobile terminals.
To classify crop diseases, Moyazzoma R. [19] used the MobileNetV2 network and obtained a validation accuracy of 90.38%, applying the method in agriculture to help farmers classify diseases of their crops. Khan E. [20] proposed a deep-learning-based technique, optimized with the whale optimization algorithm, to classify six citrus diseases that severely impact the yield and quality of citrus fruits; the SqueezeNet model outperformed MobileNetV2, achieving an accuracy of 96%. Feng Xiao [21] used MobileNetV2 to model maize seed images for variety identification; the recognition accuracy of double-sided (germ and non-germ surface) modeling was 99.83%, better than that of single-sided modeling. Elfatimi E. [22] classified a bean leaf dataset using the MobileNetV2 model and achieved an average classification accuracy of over 97% on the training set and over 92% on the test set. Jaithavil D. [23] used multiple CNNs to classify rice seed varieties; the overall accuracies of VGG16, InceptionV3, and MobileNetV2 were 80.00%, 83.33%, and 83.33%, respectively. Zhang Z. [24] proposed a rice disease identification system based on a lightweight MobileNetV2; compared with the original model, memory usage was reduced by 74%, the number of floating-point operations was reduced by 49%, the number of parameters was reduced by 50%, and the accuracy of rice disease recognition was improved by 0.16%, reaching 90.84%. Hamid Y. [25] used MobileNetV2 to classify 14 different categories of seeds, achieving accuracies of 98% and 95% on the training and test sets, respectively.
In recent years, attention mechanisms such as SENet [26], CBAM [27], and ECA [28] have attracted extensive research and are added to CNNs to enhance their learning ability. Wang S.H. [29] proposed a novel VGG-Style Base Network (VSBN) as a backbone, introduced a Convolutional Block Attention Module (CBAM), and achieved sensitivity, precision, and F1-score all above 95%. Jia L. [30] proposed an improved MobileNetV3 model that can perform near-real-time detection on mobile terminals with an accuracy of 97%. Shahi T.B. [31] combined CBAM with MobileNetV2, and the results showed that the combination outperformed other methods, with fewer trainable parameters and higher classification accuracy. Zhu X. [32] established EfficientNet-B4-CBAM to identify Camellia oleifera cultivars; the overall accuracy of the model on the test dataset reached 97.02%, with a kappa coefficient of 0.96, significantly higher than that of the other methods in the comparison experiments. Wang Meihua [33] proposed a new parallel mixed attention module, I_CBAM, which achieved better performance on the fine-grained classification of pests and diseases.
Yonis Gulzar [34] used transfer learning with CNNs to classify 14 kinds of seeds with 99% accuracy, but the common seeds used in that paper differ greatly in appearance, so the method is less effective for identifying maize seeds with similar characteristics. Yahya [35] used transfer learning with VGG19 to identify haploid and polyploid maize seeds, with all metrics greater than 93%; however, VGG19 has a large number of parameters and a long training time, and the resulting model is unsuitable for mobile deployment. Aqib Ali [36] first used a machine-learning algorithm to obtain nine optimized features from collected maize images and then built the model with a random forest; Zhou [11] combined hyperspectral maize images with a CNN to identify different varieties of maize seeds; and Cao Weishi [37] combined DWT and BP neural networks to identify the purity of maize seeds with good results. In contrast, our proposed method eliminates the complex hyperspectral image acquisition steps and significantly reduces research costs. Khalied [38] proposed a model based on the MobileNetV2 architecture that could identify eight date species with 99% accuracy, demonstrating the strength of CNNs in image recognition; the dates in that study were visually more distinct, and good recognition results were obtained by fine-tuning MobileNetV2 with a fully connected layer, but the resulting model is still large.
In this study, a new mixed attention module, I_CBAM, was constructed, and with MobileNetV2 as the baseline model, a more effective CNN, I_CBAM_MobileNetV2, was proposed to achieve adaptive refinement of feature channels and spatial locations. This helps to solve the problem of mutual interference between the two types of attention when CBAM is cascade-connected. Based on a self-built dataset containing 3938 images of 11 maize seed varieties, comparative experiments were conducted between I_CBAM_MobileNetV2, MobileNetV2, MobileNetV3, DenseNet121, Xception, ResNet50, and E-AlexNet. The results highlighted that the proposed I_CBAM_MobileNetV2 significantly outperforms the other methods and provides technical support for the automatic identification and non-destructive detection of maize seed varieties.
2. Materials and Methods
2.1. Maize Seed Image Collection
A total of 11 maize varieties were selected for this experiment, including AOYU 116, ZHENGDAN 958, XJH, JINGNIAN 1, KENUO 58, TIEYAN, DENGHAI 605, LIYUAN, JINYU 118, YUNYU, and BT506 (provided by the Shandong Academy of Agricultural Sciences). A photographing platform was built in the laboratory under natural lighting conditions, and a Canon EOS 80D digital camera positioned vertically at a distance of 50 cm was used to capture images of multiple maize seeds against a black flannel background; the maize seeds were arranged as shown in Figure 1.
2.2. Image Processing
Single-seed images were used for maize seed identification in this experiment, so individual maize seed images needed to be segmented from the multi-grain images, as demonstrated in Figure 2.
To outline the maize seeds, the OpenCV contour algorithm was used. The image segmentation method is as follows:
(1) The multi-grain image is subjected to binarization, bilateral filtering, erosion and dilation, and edge-particle removal operations.
(2) The maize seeds are outlined using the OpenCV contour algorithm.
(3) The preceding operations are repeated for each original multi-grain seed image, and the segmented single-seed images form an 11-class maize seed image dataset.
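A minimal sketch of this segmentation pipeline, written in Python with OpenCV, is given below; the threshold method, kernel size, and minimum contour area are illustrative assumptions rather than the exact settings used in this study.

```python
import cv2
import numpy as np

def segment_seeds(image_path, min_area=500):
    """Segment single maize seeds from a multi-grain image (illustrative sketch)."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Bilateral filtering smooths noise while preserving seed edges
    smoothed = cv2.bilateralFilter(gray, 9, 75, 75)

    # Binarization (Otsu); seeds are brighter than the black flannel background
    _, binary = cv2.threshold(smoothed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Erosion followed by dilation removes small specks and fills gaps
    kernel = np.ones((5, 5), np.uint8)
    binary = cv2.erode(binary, kernel, iterations=1)
    binary = cv2.dilate(binary, kernel, iterations=1)

    # Outline the seeds and crop one sub-image per sufficiently large contour
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    crops = []
    for c in contours:
        if cv2.contourArea(c) < min_area:  # discard edge particles and debris
            continue
        x, y, w, h = cv2.boundingRect(c)
        crops.append(img[y:y + h, x:x + w])
    return crops
```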
A double-sided mixed dataset of maize germ-surface and non-germ-surface images was established based on previous research [21]. The original dataset was divided into training, validation, and test sets at a ratio of 7:2:1; the dataset of maize seeds (double-sided mixture) is summarized in Table 1. To address the insufficient generalization ability caused by the small sample size, the training set was augmented by rotation, height shift, width shift, shear, zoom, and horizontal flip, as sketched below. Figure 3 depicts a sample of the augmented images.
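A minimal sketch of this augmentation with the Keras ImageDataGenerator follows; the specific parameter values (rotation range, shift fractions, and so on) and the directory path are illustrative assumptions, not the exact settings used in this study.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation is applied to the training set only; the values shown here are
# placeholders, not the exact settings used in this study.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=30,        # random rotation
    height_shift_range=0.1,   # vertical shift
    width_shift_range=0.1,    # horizontal shift
    shear_range=0.1,          # shear transformation
    zoom_range=0.1,           # random zoom
    horizontal_flip=True,     # horizontal flip
)

train_generator = train_datagen.flow_from_directory(
    "dataset/train",          # hypothetical path: one sub-folder per variety
    target_size=(224, 224),   # MobileNetV2 default input size
    batch_size=32,
    class_mode="categorical",
)
```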
2.3. Improved Attention Mechanism CBAM
CBAM is a lightweight attention module that can be easily embedded into existing popular convolutional neural network structures with negligible extra computation. To rescale the original features, it uses two pooling methods, max pooling and average pooling, and generates weights from two dimensions (channel and spatial). The detailed structure of the CBAM module is represented in Figure 4.
The CBAM module first takes the input feature map F, which is weighted by the channel attention to give F1; F1 is then weighted by the spatial attention to give the output feature map F2, forming a “cascade connection” structure. The process is formulated as:

$$F_1 = M_c(F) \otimes F$$
$$F_2 = M_s(F_1) \otimes F_1$$

where:
$M_c(F)$ is the channel attention weight computed from F;
$M_s(F_1)$ denotes the spatial attention weight computed from F1;
⊗ denotes element-wise multiplication of the feature map by the attention weights.
Whether channel attention is applied before spatial attention (channel → spatial, i.e., CBAM) or spatial attention is applied before channel attention (spatial → channel, i.e., reverse CBAM, R_CBAM), the attention weights computed later are generated from feature maps that have already been modified by the earlier attention. To some extent, the input of the later attention module is therefore influenced by the earlier one, which creates interference and makes the model unstable.
In this experiment, the original “cascade connection” was replaced with a “parallel connection” to improve the serial attention structure of CBAM. In this way, both attention modules learn directly from the original input feature map, and the order of spatial and channel attention no longer matters, leading to an improved CBAM (I_CBAM). The detailed structure of I_CBAM is represented in Figure 5.
The I_CBAM process is formulated as:

$$F_2 = M_c(F) \otimes M_s(F) \otimes F$$

where $M_s(F)$ indicates the spatial attention weight computed directly from F.
Channel attention is computed as follows:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big)$$

Spatial attention is calculated as follows:

$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)])\big) = \sigma\big(f^{7\times 7}([F^s_{avg};\ F^s_{max}])\big)$$

where:
σ is the sigmoid function;
$W_0$ and $W_1$ denote the MLP weights;
$F^c_{avg}$ and $F^s_{avg}$ are the average-pooled features;
$F^c_{max}$ and $F^s_{max}$ denote the max-pooled features;
$f^{7\times 7}$ is a convolution operation with a filter size of 7 × 7.
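The following Keras sketch illustrates these channel and spatial attention computations and their parallel combination in I_CBAM; the reduction ratio and the exact way the two branches are fused are assumptions made for illustration rather than the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(x, reduction=8):
    """Channel attention: a shared MLP over average- and max-pooled descriptors."""
    channels = int(x.shape[-1])
    w0 = layers.Dense(channels // reduction, activation="relu")  # shared MLP, layer W0
    w1 = layers.Dense(channels)                                   # shared MLP, layer W1
    avg = w1(w0(tf.reduce_mean(x, axis=[1, 2])))   # F_avg^c through the MLP
    mx = w1(w0(tf.reduce_max(x, axis=[1, 2])))     # F_max^c through the MLP
    weights = tf.sigmoid(avg + mx)                 # shape (batch, channels)
    return tf.reshape(weights, (-1, 1, 1, channels))  # broadcast over H and W

def spatial_attention(x):
    """Spatial attention: a 7x7 convolution over concatenated channel-wise maps."""
    avg = tf.reduce_mean(x, axis=-1, keepdims=True)   # F_avg^s
    mx = tf.reduce_max(x, axis=-1, keepdims=True)     # F_max^s
    concat = tf.concat([avg, mx], axis=-1)
    return layers.Conv2D(1, 7, padding="same", activation="sigmoid")(concat)

def i_cbam(x):
    """Parallel I_CBAM: both attention branches see the original feature map x."""
    mc = channel_attention(x)   # computed from x, not from a pre-weighted map
    ms = spatial_attention(x)   # also computed from x
    return x * mc * ms          # assumed fusion: apply both weights to x
```

Note that, for brevity, new layers are created on every call; in a real model, each block would hold its own layer instances (for example, by subclassing tf.keras.layers.Layer).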
2.4. I_CBAM_MobileNetV2 Architecture
The Google team proposed the MobileNetV2 network in 2018 [16]. Its accuracy is higher than that of MobileNetV1, while the model is smaller; MobileNetV2 incorporates inverted residuals and linear bottlenecks, significantly improving the model's accuracy and efficiency.
Maize seed images of different varieties are difficult to distinguish clearly, as the kernel colors are very similar. The representation power of the MobileNetV2 network is strengthened by adding the I_CBAM module, which enables adaptive feature refinement of the images while emphasizing both local and global information. To accurately identify various kinds of maize seed, a new deep-learning model, I_CBAM_MobileNetV2, was created by fusing MobileNetV2 with the improved CBAM module (I_CBAM). The network topology of the I_CBAM_MobileNetV2 model is depicted in Figure 6.
In contrast to approaches that simply append the attention module at the end of the entire convolutional neural network before transfer learning, in this experiment each Inverted_res_block was enhanced with the improved CBAM so that every block extracts more precise features; this contributed to the model's precision for fine-grained recognition.
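A simplified sketch of one such attention-enhanced block is shown below, reusing the i_cbam function from the previous sketch; the placement of the attention inside the block and the expansion factor are illustrative assumptions, not the authors' exact architecture (see Figure 6 for that).

```python
import tensorflow as tf
from tensorflow.keras import layers

def inverted_res_block_with_icbam(x, filters, stride=1, expansion=6):
    """MobileNetV2-style inverted residual block with I_CBAM inserted (illustrative)."""
    in_channels = int(x.shape[-1])
    shortcut = x

    # 1x1 expansion convolution
    h = layers.Conv2D(expansion * in_channels, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)

    # 3x3 depthwise convolution
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)

    # Parallel attention refinement (i_cbam defined in the previous sketch)
    h = i_cbam(h)

    # 1x1 linear projection (no activation: linear bottleneck)
    h = layers.Conv2D(filters, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)

    # Inverted residual connection when input and output shapes match
    if stride == 1 and in_channels == filters:
        h = layers.Add()([shortcut, h])
    return h
```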
2.5. Model Training Environment Configuration
This experiment is based on the TensorFlow platform, using the Keras deep learning framework, and the model was built in Jupyter Notebook. The specific parameter configuration is listed in Table 2.
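As a minimal illustration of how such a configuration is wired together in Keras, the snippet below compiles and trains the model; the optimizer, learning rate, and epoch count are placeholders rather than the values in Table 2, and model, train_generator, and val_generator are assumed to be the network and data generators built in the preceding sections.

```python
from tensorflow.keras.optimizers import Adam

# Placeholder hyper-parameters; the actual values are those listed in Table 2.
model.compile(
    optimizer=Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

history = model.fit(
    train_generator,              # augmented training data (Section 2.2)
    validation_data=val_generator,
    epochs=50,
)
```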
2.6. Model Performance Evaluation Metrics
In this experiment, accuracy, precision, recall, F1-score, kappa coefficient, and confusion matrix are used as the model performance evaluation metrics:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{Kappa} = \frac{N \sum_{i=1}^{n} a_{ii} - \sum_{i=1}^{n} a_{i+} a_{+i}}{N^{2} - \sum_{i=1}^{n} a_{i+} a_{+i}}$$

where:
TP is the number of samples that are positive and predicted to be positive;
FP is the number of samples that are negative but predicted to be positive;
FN denotes the number of samples that are positive but predicted to be negative;
TN is the number of samples that are negative and predicted to be negative;
N indicates the total number of observations in the confusion matrix;
n is the number of varieties;
$a_{ii}$ represents the element in the i-th row and i-th column of the confusion matrix;
$a_{i+}$ represents the sum of the elements in the i-th row of the confusion matrix;
$a_{+i}$ represents the sum of the elements in the i-th column of the confusion matrix.
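As a concrete illustration, the sketch below computes these metrics from an n × n confusion matrix with NumPy; the function and variable names are placeholders introduced here for illustration.

```python
import numpy as np

def metrics_from_confusion_matrix(cm):
    """Compute accuracy, kappa, and per-class precision/recall/F1 from an n x n
    confusion matrix cm, where cm[i, j] counts samples of true class i predicted as j."""
    cm = np.asarray(cm, dtype=float)
    N = cm.sum()                    # total number of observations
    diag_sum = np.trace(cm)         # sum of a_ii
    row_sums = cm.sum(axis=1)       # a_i+ (per true class)
    col_sums = cm.sum(axis=0)       # a_+i (per predicted class)

    accuracy = diag_sum / N
    p_e = np.sum(row_sums * col_sums) / (N ** 2)   # chance agreement
    kappa = (accuracy - p_e) / (1 - p_e)

    # Per-class precision, recall, and F1-score
    precision = np.diag(cm) / np.clip(col_sums, 1e-12, None)
    recall = np.diag(cm) / np.clip(row_sums, 1e-12, None)
    f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
    return accuracy, kappa, precision, recall, f1
```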
5. Conclusions
In this experiment, 11 maize seed varieties were identified using deep-learning-based computer vision technology, and the maize seed variety identification model I_CBAM_MobileNetV2 was proposed. The overall accuracy and kappa coefficient of I_CBAM_MobileNetV2 were 98.21% and 0.9802, respectively.
Compared with the other CNN models in the comparison experiments, I_CBAM_MobileNetV2 shows clear advantages across the evaluation metrics. Meanwhile, the I_CBAM module properly guided the network to focus on the target area. The results of the Grad-CAM visual analysis experiments demonstrate that, when identifying maize seed varieties, I_CBAM_MobileNetV2 not only accurately locates the key information of the seeds and expands the activation area, but also fully expresses the information of the activation area.
This experiment provides an advanced technical scheme for maize seed variety identification and lays the groundwork for future non-destructive variety identification. In the future, the maize seed variety dataset can be expanded, and images of maize seeds with more complex backgrounds can be collected, to provide sufficient data for the maize seed variety identification algorithm. The inference speed of I_CBAM_MobileNetV2 can eventually be optimized, and some blocks can be removed from the model to reduce its size for deployment on mobile devices.