1. Introduction
China is the world’s fourth-largest producer of sunflower seeds, with the world’s sixth-largest planting area [1]. Sunflower seeds are an important oil crop, and their kernels are rich in high-quality protein, vitamins, unsaturated fatty acids, potassium and phosphorus, with a fat content of 30–45% and sometimes even more than 60%. Research on new protein foods has shown that the types and content of high-quality protein in sunflower seeds are higher than those of other cereals, making them an important source of the plant protein needed by the human body. However, sunflower seeds are prone to abnormalities such as breakage and deformities during planting, harvesting, and storage, which can be attributed to environmental factors and pest infestations. Therefore, there is an urgent need to establish a rapid, accurate, and non-destructive testing method to identify the quality of sunflower seeds.
The existing traditional inspection methods are mainly manual sensory evaluation methods [2], machine sorting [3] and X-ray [4]. However, manual inspection is subjective and prone to visual fatigue, misdetection and omission, affecting inspection efficiency. The machine sorting technology can detect the external characteristics of sunflower seeds, but it cannot detect their internal characteristics. X-ray methods can detect the internal quality of sunflower seeds, but they require professional operating skills and may cause radiation hazards to the operator due to the high energy. Therefore, research on the rapid, accurate, and non-destructive detection of sunflower seed quality is very important for achieving good sunflower seed quality and safety monitoring, as well as the healthy development of the sunflower seed industry.
Terahertz (THz) waves are located between the microwave and infrared light frequencies, in the band of 0.1–10 THz [5]. They have unique properties, including low photon energy, non-contact detection and a strong ability to penetrate a variety of materials. They can be used to detect internal defects in different objects, and they also have a strong communication transmission capability [6,7]. These properties have made it possible to apply THz technology in the fields of agricultural product safety testing [8], drug and biomedicine [9], grain storage quality testing, and communications [10]. THz technology enables the non-destructive detection and diagnosis of biomolecules, as well as qualitative and quantitative analyses of the composition of substances. THz spectroscopy provides information about the physical, chemical and molecular structure of the target sample, and is suitable for the study and characterisation of fingerprint properties of substances. THz imaging is based on THz spectroscopy: each pixel of the image obtained by scanning the target sample carries a piece of spectral information [11]. The THz image of the sample is generated by simple data processing and analysis of the signals transmitted through and reflected from the sample under test.
In recent years, THz waves have facilitated significant breakthroughs in the quality inspection of sunflower seeds, thanks to their advantages of non-destructive penetration and fast imaging. Sun et al. [12] selected insect-eaten, defective and intact sunflower seeds as samples and obtained their THz images. The authors modelled and predicted the fullness of these seeds with a coefficient of determination and root mean square error of prediction (RMSEP) of 0.91 and 4%, respectively. Lei et al. [13] developed a dual autoencoder (AE)–generative adversarial net (GAN) spectral dehulling semi-supervised model and, for the first time, evaluated the distribution of energy and moisture inside sunflower seed shells using non-destructive THz time-domain imaging systems. Yuan et al. [14] dealt with the problem of sample imbalance within the spectral information of sunflower seed imperfections collected by the THz time-domain spectral detection technique. The authors expanded the dataset of imperfections using the SMOTE algorithm, and achieved an improved detection accuracy of 92.59% using a recognition model based on the least-squares support vector machine. Lei et al. [15] acquired sunflower seed projections at different angles on a rotating stage with a custom-made THz time-domain imaging system, and reconstructed three-dimensional (3D) sunflower seed images using the inverse Radon transform. The two-dimensional (2D) and 3D fullness of sunflower seeds was calculated from the volume and area ratios of the whole seeds to seeds. However, as the THz imaging technique is affected by system and environmental noise during image acquisition, it may suffer from problems such as low resolution, unclear key image information and vague edge information, which affect the final sunflower seed detection and identification results.
To detect sunflower seed quality more effectively, this paper introduces the efficient multi-scale attention (EMA) mechanism into a lightweight network, which maintains high detection accuracy while suppressing interference from redundant information. In addition, the MV2 structure is modified to use the exponential linear unit (ELU) activation function, which accelerates network convergence. The resulting non-destructive sunflower seed quality detection model, MobileViT-E, improves detection accuracy by extracting detailed features that can discriminate subtle differences between seeds.
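As a brief illustration of the ELU activation mentioned above, the following minimal sketch implements its standard definition: ELU(x) = x for x > 0 and α(exp(x) − 1) otherwise. The value α = 1.0 is an assumption (a common default); the source does not state the value used in MobileViT-E, and the function names `elu` and `elu_grad` are illustrative only.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit; alpha = 1.0 is an assumed default, not from the paper."""
    if x > 0:
        return x
    # math.expm1(x) evaluates exp(x) - 1 accurately for small |x|
    return alpha * math.expm1(x)


def elu_grad(x: float, alpha: float = 1.0) -> float:
    """Derivative of ELU: 1 for x > 0, alpha * exp(x) otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 2.0):
        print(f"x={x:+.1f}  elu={elu(x):+.4f}  grad={elu_grad(x):.4f}")
```

For negative inputs the gradient stays positive rather than dropping to zero as it does with ReLU, which is the usual argument for why ELU can help gradients propagate and speed up convergence.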
3. Results and Discussion
3.1. Model Training
In this experiment, the dataset consisted of THz images of normal, broken and deformed sunflower seeds in different frequency domains. The whole dataset contained 2340 THz transmission images of sunflower seeds, which were divided into training and test sets in a ratio of 8:2. The batch size was set to 32, training ran for 60 epochs, and the input image size was 256 × 256 pixels. To ensure an objective comparison of model performance, the AdamW optimizer was used for all models, the initial learning rate was set to 5 × 10⁻⁴, and cross-entropy was used as the loss function. The classification models were constructed and trained using the PyTorch deep learning framework. The hardware used for model training and testing consisted of an Intel(R) Core(TM) i5-13400F (2.50 GHz) processor and an NVIDIA GeForce RTX 3060 graphics card, running the Windows 11 operating system. In terms of software configuration, Python 3.8 and CUDA 11.8 were used to accelerate the computations. Images and their corresponding labels from the training and test sets were fed to the network for training and testing the image classification task, respectively.
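The split and schedule described above can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: the random (non-stratified) shuffle, the seed, the choice to keep the final partial batch, and the names `TrainConfig` and `split_indices` are all assumptions, since the source does not state them.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrainConfig:
    # Values taken from the text
    num_images: int = 2340
    train_fraction: float = 0.8   # 8:2 train/test split
    batch_size: int = 32
    epochs: int = 60
    image_size: int = 256
    learning_rate: float = 5e-4
    seed: int = 0                 # assumption: no seed is given in the paper


def split_indices(cfg: TrainConfig) -> Tuple[List[int], List[int]]:
    """Shuffle image indices and split them into train and test lists."""
    indices = list(range(cfg.num_images))
    random.Random(cfg.seed).shuffle(indices)
    n_train = round(cfg.num_images * cfg.train_fraction)
    return indices[:n_train], indices[n_train:]


cfg = TrainConfig()
train_idx, test_idx = split_indices(cfg)

# Assumes the last, smaller batch of each epoch is kept rather than dropped.
steps_per_epoch = math.ceil(len(train_idx) / cfg.batch_size)
total_steps = steps_per_epoch * cfg.epochs

if __name__ == "__main__":
    print(len(train_idx), len(test_idx), steps_per_epoch, total_steps)
```

Under these assumptions the split yields 1872 training and 468 test images, with 59 optimizer steps per epoch.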
3.2. Comparison Experiments
To further demonstrate the performance of the MobileViT-E model for sunflower seed quality detection, it was compared with several classical deep learning network models, namely ResNet-50, EfficientNet, MobileOne and MobileViT.
Table 1 shows the training results: the MobileViT-E model demonstrated clear advantages in the classification of sunflower seeds. It achieved the highest accuracy, precision, recall and F1-score values of 96.30%, 96.35%, 96.30% and 96.31%, respectively. In comparison with the ResNet-50, EfficientNet, MobileOne and MobileViT models, the accuracy of the proposed model was higher by 4.85, 3, 7.84 and 1.86 percentage points, respectively. These results indicated that the proposed model was more capable of processing complex image features and could accurately identify nuances and details in sunflower seed features.
In comparison, the MobileViT model achieved the second-highest classification performance, with accuracy, precision, recall and F1-score values of 94.44%, 94.64%, 94.44% and 94.49%, respectively. The results of the EfficientNet model were similar to those of the MobileViT model, with accuracy, precision, recall and F1-score values of 93.3%, 94.1%, 93.3% and 93.2%, respectively. However, the MobileOne model exhibited relatively lower performance, with accuracy, precision, recall and F1-score values of 88.46%, 89.08%, 88.46% and 88.22%, respectively. These results might indicate the limitations in its image feature extraction.
Meanwhile, to show the prediction performance of the MobileViT-E model more clearly for different categories of sunflower seeds, the model with the highest accuracy on the test set was selected, and its confusion matrix is plotted in Figure 9. The proposed model had the best prediction performance on the deformed grains, with an accuracy of 97.17%: five samples were incorrectly predicted as normal grains, and two samples were incorrectly predicted as broken grains. The second-best prediction was on normal sunflower seeds, with an accuracy of 96.46%: one sample was incorrectly predicted as a deformed grain, and six samples were incorrectly predicted as broken grains. The prediction performance for broken grains was the worst, with an accuracy of 95.31%: there were 12 errors, with 10 samples predicted as normal grains and two samples predicted as deformed grains. This might be caused by noise and other disturbances in the images, which led to unclear features in some of the images and affected the classification performance of the model. Overall, the MobileViT-E model utilised the Transformer architecture to effectively capture global and local features when processing the sunflower seed images, achieving accurate classification of normal, broken and deformed grains.
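The per-class figures quoted above can be checked with a small sketch. The off-diagonal counts come from the text; the diagonal counts (191, 244 and 240) are inferred so that each row reproduces the quoted per-class accuracy, which is an assumption rather than data read from Figure 9. The sketch also assumes that rows are true classes and that precision, recall and F1-score are support-weighted averages; the source does not state the averaging scheme. The reconstructed total of 701 samples does not equal 20% of the 2340 images (468), so the matrix should be read as illustrative.

```python
from typing import Dict, List

CLASSES = ["normal", "broken", "deformed"]

# Rows = true class, columns = predicted class (same order as CLASSES).
# Diagonal entries are inferred from the quoted per-class accuracies.
CONFUSION: List[List[int]] = [
    [191, 6, 1],    # normal:    6 -> broken, 1 -> deformed
    [10, 244, 2],   # broken:   10 -> normal, 2 -> deformed
    [5, 2, 240],    # deformed:  5 -> normal, 2 -> broken
]


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1-score."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precision = recall = f1 = 0.0
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                         # true samples of class k
        predicted = sum(cm[r][k] for r in range(n))  # samples predicted as k
        p_k = tp / predicted if predicted else 0.0
        r_k = tp / support if support else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support / total
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


per_class_recall = {c: CONFUSION[i][i] / sum(CONFUSION[i])
                    for i, c in enumerate(CLASSES)}
metrics = weighted_metrics(CONFUSION)

if __name__ == "__main__":
    for name, value in per_class_recall.items():
        print(f"{name:9s} {100 * value:.2f}%")
    for name, value in metrics.items():
        print(f"{name:9s} {100 * value:.2f}%")
```

What the text calls per-class accuracy corresponds here to per-class recall. Under the stated assumptions, the weighted metrics agree with the reported 96.30%, 96.35%, 96.30% and 96.31% to within about 0.01 percentage points, and weighted recall equals overall accuracy by construction.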
3.3. Ablation Experiments
A series of ablation experiments was carried out to quantify the contribution of each module to the model performance. The baseline MobileViT model, the MobileViT + EMA and MobileViT + ELU variants, and the full MobileViT-E model were compared.
Table 2 shows the performance comparison, where “√” indicates that the module was used in this network, and “×” indicates that the module was not used.
Table 2 shows that including both the EMA and ELU modules substantially improved the classification accuracy, raising it from 94.44% for the baseline model to 96.30%, while the precision, recall and F1-score improved by 1.71, 1.86 and 1.82 percentage points, respectively. When only the ELU module was used, the accuracy, precision, recall and F1-score values of the model showed minor improvements of 0.15, 0.16, 0.15 and 0.05 percentage points, respectively. When only the EMA module was introduced, the accuracy of the model improved by 1.29 percentage points compared to the baseline model, and the precision, recall and F1-score values improved by 1.12, 1.29 and 1.22 percentage points, respectively. Each single-module variant therefore improved on the baseline, but neither matched the MobileViT-E model. These results showed that the MobileViT-E model was the most effective in classifying normal, broken and deformed grains of sunflower seeds, and could achieve accurate and non-destructive detection of sunflower seed quality.
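The ablation arithmetic can be made explicit with a short sketch built only from the figures quoted in the text. Reading the remainder as an interaction between EMA and ELU is an informal interpretation, not a claim made by the source, and it ignores run-to-run variance, which is not reported. The names `ablation_summary`, `sum_of_single` and `remainder` are illustrative.

```python
from typing import Dict

METRICS = ("accuracy", "precision", "recall", "f1")

# Reported values in percent, in the order given by METRICS.
BASELINE = (94.44, 94.64, 94.44, 94.49)   # MobileViT
FULL = (96.30, 96.35, 96.30, 96.31)       # MobileViT-E (EMA + ELU)

# Gains over the baseline quoted in the text, in percentage points.
ELU_ONLY_GAIN = (0.15, 0.16, 0.15, 0.05)
EMA_ONLY_GAIN = (1.29, 1.12, 1.29, 1.22)


def ablation_summary() -> Dict[str, Dict[str, float]]:
    """Combined gain, sum of single-module gains, and their difference."""
    summary = {}
    for i, name in enumerate(METRICS):
        combined = FULL[i] - BASELINE[i]
        single_sum = ELU_ONLY_GAIN[i] + EMA_ONLY_GAIN[i]
        summary[name] = {
            "combined": round(combined, 2),
            "sum_of_single": round(single_sum, 2),
            # Gain not explained by adding the two single-module gains
            "remainder": round(combined - single_sum, 2),
        }
    return summary


summary = ablation_summary()

if __name__ == "__main__":
    for name, row in summary.items():
        print(name, row)
```

For accuracy, the combined gain of 1.86 percentage points exceeds the 1.44 obtained by adding the ELU-only and EMA-only gains, leaving a remainder of 0.42 percentage points.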
The proposed MobileViT-E model combined the feature extraction capabilities of deep learning with the self-attention mechanism of Transformers, and demonstrated clear advantages in sunflower seed quality detection. Compared to four other image classification models, namely ResNet-50, EfficientNet, MobileOne and MobileViT, the proposed model achieved higher accuracy in detecting normal, broken and deformed sunflower seeds in THz images. Additionally, the ablation experiments verified that the EMA mechanism and the ELU activation function effectively improved the detection capability. However, the samples were affected by the internal noise of the system during the imaging process. This interference reduced the image details and caused the loss of image edge information. Although the model achieved a recognition accuracy of 96.30% on the sunflower seed dataset, its detection precision still required improvement. This was especially evident for broken seeds, with an accuracy of only 95.31%. Therefore, future improvements should focus on the construction of network models with higher detection accuracy. Meanwhile, the defects covered in this study were limited to broken and deformed grains of sunflower seeds. In the future, the type and number of sunflower seed samples can be expanded to construct a wider dataset, and the algorithm can be applied to the detection of other agricultural features to improve the generalisation of the model.
4. Conclusions
In this study, we proposed MobileViT-E, a non-destructive inspection model for sunflower seed quality. It was based on the MobileViT model and used the Transformer architecture to extract the multi-scale features of sunflower seed images. It acquired and analysed the subtle features of the images through the self-attention mechanism and global feature extraction. The EMA mechanism was introduced in the MobileViT block to further improve its performance. This optimised the model’s attention on the basic image features and reduced the interference from irrelevant information while retaining the necessary information from each channel to reduce the computational cost. Additionally, the ELU activation function was used in the MV2 structure to avoid the vanishing gradient problem, speed up network training, and improve the model’s generalisation ability. The experimental results showed that the MobileViT-E model improved the recognition accuracy by 4.85, 3, 7.84 and 1.86 percentage points compared with the ResNet-50, EfficientNet, MobileOne and MobileViT models, respectively. Thus, the proposed model could significantly improve the sunflower seed quality detection accuracy. However, the current study was mainly limited to the laboratory environment, and there is a need to further verify the model’s stability and generalisation ability in real complex environments. In the future, the MobileViT-E model can be improved and optimised in various ways. For example, a pre-trained model can be used for transfer learning to accelerate the model convergence and improve the detection accuracy. In addition, a richer and more extensive sample database can be constructed to achieve an efficient and accurate quantitative and qualitative analysis of the sunflower seed quality.
The proposed model is expected not only to support the standardisation of agricultural product safety testing, but also to contribute to the application of machine learning and other advanced analytical techniques in the field of food quality control.