1. Introduction
Wheat is one of the three staple foods for humans, and its quality affects national food security, agricultural production, and people’s living standards [1]. Kernels that are damaged but still have some use value are called unsound kernels; they include injured, spotted, broken, sprouted, and moldy kernels. The proportion of unsound kernels is an important indicator of wheat quality. Therefore, rapid, accurate, and non-destructive detection of unsound kernels during wheat circulation is of great significance for assessing wheat quality.
Currently, there are five mainstream technologies for detecting unsound kernels. Below, we review representative studies for each technology and summarize their current shortcomings.
The first technology is manual detection, in which inspectors pick out and classify the sampled kernels one by one based on visual inspection and experience [2]. This approach is time-consuming and labor-intensive, and its accuracy is low because it depends on subjective judgment.
The second technology uses the spectral information of unsound kernels for detection. In [3], spectral features, image features, and fused spectral and image features were extracted from hyperspectral images of unsound wheat kernels and used as the input of a support vector machine (SVM) [4]. The results showed that the fused spectral and image information was the most conducive to classification. In [5], hyperspectral images and high-resolution images of unsound wheat kernels were registered and fused as the input of a VGG [6] network model. The fused images achieved a higher average precision than either the hyperspectral images or the high-resolution images alone. Although these methods extract more information from the images, the equipment for collecting spectral information is expensive.
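To make "fused spectral and image features" concrete, the following is a minimal sketch of feature-level fusion, assuming fusion means standardizing each feature group and concatenating the vectors per kernel. The function names and all feature values are hypothetical, not taken from [3]. The fused vectors would then be passed to a classifier such as the SVM used in [3]; that step is omitted here.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column of a list-of-lists feature matrix to zero mean, unit variance."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]  # guard constant columns
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def fuse_features(spectral, image):
    """Hypothetical feature-level fusion: standardize each group separately, then concatenate."""
    spectral_z = zscore_columns(spectral)
    image_z = zscore_columns(image)
    return [s + i for s, i in zip(spectral_z, image_z)]

# Hypothetical data for 4 kernels: reflectance in 3 bands, and 2 image features (area in px, mean hue)
spectral = [[0.42, 0.51, 0.60], [0.40, 0.49, 0.58], [0.30, 0.35, 0.41], [0.29, 0.33, 0.40]]
image = [[1200, 35.0], [1180, 36.0], [950, 28.0], [930, 27.5]]
fused = fuse_features(spectral, image)
print(len(fused[0]))  # 5 features per kernel
```

Standardizing each group before concatenation keeps large-valued image features (such as area in pixels) from dominating the small reflectance values in any distance- or margin-based classifier.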
The third technique is based on acoustic information. Pearson [7] used impact acoustics to detect insect-damaged kernels: the kernels were dropped onto a steel plate, and the acoustic signals generated when they hit the plate were recorded. Linear discriminant analysis (LDA) [8] was used for feature extraction and classification, and the recognition accuracy was 84.4%. However, methods based on acoustic information depend heavily on the sound propagation medium and are strongly affected by noise, which results in low detection accuracy.
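As an illustration of the LDA step in [7], here is a minimal two-class Fisher LDA sketch in plain Python. The two acoustic features (peak amplitude and spectral centroid) and all sample values are hypothetical and are not Pearson's data; the sketch only shows how a projection direction and a decision threshold are derived.

```python
def mean_vec(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Within-class scatter entries (sxx, sxy, syy) of centered 2-D samples."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def project(w, p):
    """Project a 2-D feature vector onto direction w."""
    return w[0] * p[0] + w[1] * p[1]

def fisher_lda_2d(class0, class1):
    """Two-class Fisher LDA: w = S_w^{-1} (m1 - m0), threshold at the midpoint of projected means."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    a0, b0, d0 = scatter(class0, m0)
    a1, b1, d1 = scatter(class1, m1)
    a, b, d = a0 + a1, b0 + b1, d0 + d1          # pooled S_w = [[a, b], [b, d]]
    det = a * d - b * b
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    w = ((d * diff[0] - b * diff[1]) / det,      # closed-form 2x2 inverse applied to diff
         (-b * diff[0] + a * diff[1]) / det)
    threshold = (project(w, m0) + project(w, m1)) / 2
    return w, threshold

# Hypothetical impact-acoustic features: (peak amplitude, spectral centroid in kHz)
sound = [(0.80, 5.1), (0.85, 5.3), (0.78, 4.9), (0.83, 5.2)]
damaged = [(0.55, 3.9), (0.60, 4.2), (0.52, 3.8), (0.58, 4.0)]
w, threshold = fisher_lda_2d(sound, damaged)

def predict(p):
    """Class 1 (damaged) projects above the threshold because w.(m1 - m0) > 0 when S_w is positive definite."""
    return "damaged" if project(w, p) > threshold else "sound"

print(predict((0.57, 4.0)))  # damaged
```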
The fourth technique uses traditional machine learning methods. First, features of the unsound wheat kernels are extracted manually; then classifiers such as SVM, often combined with dimensionality reduction by principal component analysis (PCA) [9], are used for classification. However, manually designed features are complex to construct and often inaccurate. Furthermore, this technique cannot effectively segment adhering kernels.
The fifth technique uses deep learning methods for detection. Many researchers [10,11,12] have proposed various optimized algorithms for classification. However, these studies focused on simple classification tasks and did not go further to explore instance segmentation algorithms. Furthermore, the kernels in their datasets were manually arranged in advance, which makes it difficult to satisfy practical application requirements. At the same time, such classification approaches often rely on traditional segmentation methods to separate the kernels first. For example, Shatadal [13] separated touching grains by erosion and dilation and then filled the cavity regions. Siriwongkul [14] located separation end points along the kernel edges to draw dividing lines. However, traditional segmentation does not perform well. Moreover, most semantic segmentation methods fail for wheat kernels that are densely distributed and highly adhered, because the algorithm recognizes different parts of a single kernel separately during classification, which leads to multiple recognition results for one target. Taking the well-known U-Net algorithm as an example, its segmentation results are shown in Figure 1: semantic segmentation produces poor results even for wheat kernels with a low degree of adhesion.
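The erosion and dilation idea attributed to Shatadal [13] above can be illustrated with a simplified sketch: on a synthetic binary mask, two kernels joined by a thin bridge form a single connected region, and a 3x3 erosion followed by a dilation (a morphological opening) cuts the bridge. This is not Shatadal's full algorithm (cavity filling is omitted), and the mask is hypothetical. It also hints at the weakness of such methods: a contact region wider than the structuring element would survive the erosion.

```python
from collections import deque

OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]  # 3x3 structuring element

def erode(img):
    """Binary erosion: a pixel stays 1 only if its whole 3x3 neighborhood is 1 (outside counts as 0)."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                     for dr, dc in OFFSETS)) for c in range(w)] for r in range(h)]

def dilate(img):
    """Binary dilation: a pixel becomes 1 if any pixel in its 3x3 neighborhood is 1."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                     for dr, dc in OFFSETS)) for c in range(w)] for r in range(h)]

def count_components(img):
    """Count 4-connected foreground regions with breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Synthetic mask: two 5x5 "kernels" joined by a one-pixel bridge at row 3, column 6
mask = [[0] * 13 for _ in range(7)]
for r in range(1, 6):
    for c in list(range(1, 6)) + list(range(7, 12)):
        mask[r][c] = 1
mask[3][6] = 1

print(count_components(mask))                 # 1: the adhering kernels look like one object
print(count_components(dilate(erode(mask))))  # 2: erosion cut the bridge, dilation restored size
```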
Mask RCNN is widely used in various fields because it performs well on large and sparsely distributed objects, but its accuracy is relatively low in the fine-grained segmentation of small objects, such as densely distributed unsound kernels. One reason is that wheat grains share similar characteristics, such as an oval shape and a yellowish-white seed coat, which makes the feature extraction network less effective at distinguishing them.
To address the problems above, this paper proposes an instance segmentation algorithm for unsound wheat kernels based on an improved mask RCNN. On the one hand, it solves the problem of adhesion between densely distributed kernels and achieves fast, accurate, and non-destructive detection, laying the foundation for wheat grading. On the other hand, it achieves higher accuracy than the original mask RCNN, which may offer ideas for applying mask RCNN to fine-grained segmentation. The rest of this paper is organized as follows: Section 2 describes the dataset acquisition process, the improved mask RCNN network, and the evaluation metrics for object segmentation; Section 3 evaluates the model; Section 4 discusses the results; and Section 5 provides a summary and a future outlook.
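The evaluation metrics for object segmentation are defined in Section 2 and are not reproduced here. Assuming they follow the common COCO-style convention for instance segmentation, every metric builds on the mask intersection over union (IoU) between a predicted kernel mask and a ground-truth kernel mask. The sketch below computes mask IoU on pixel sets and counts true positives, false positives, and false negatives at a single IoU threshold. The helper names, the greedy matching, and the assumption that predictions arrive already sorted by confidence are illustrative simplifications, not the paper's evaluation code.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 0.0

def match_at_threshold(pred_masks, gt_masks, thr=0.5):
    """Greedy matching (predictions assumed sorted by confidence); IoU >= thr counts as a true positive."""
    unmatched = list(range(len(gt_masks)))
    tp = 0
    for pred in pred_masks:
        best, best_iou = None, thr
        for g in unmatched:
            iou = mask_iou(pred, gt_masks[g])
            if iou >= best_iou:
                best, best_iou = g, iou
        if best is not None:
            unmatched.remove(best)  # each ground-truth kernel can be matched only once
            tp += 1
    return tp, len(pred_masks) - tp, len(unmatched)  # (TP, FP, FN)

def square(r0, c0, size):
    """Hypothetical square kernel mask as a pixel set."""
    return {(r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size)}

gt = [square(0, 0, 4), square(0, 5, 4)]                       # two adjacent kernels
merged = {(r, c) for r in range(4) for c in range(9)}         # one mask covering both kernels
pred = [square(0, 1, 4), merged]
print(match_at_threshold(pred, gt))  # (1, 1, 1)
```

In the example, the shifted prediction overlaps its kernel with IoU 0.6 and is accepted, while the prediction that merges two adhering kernels reaches only 16/36 against either kernel, so it counts as a false positive and leaves one kernel missed.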
5. Conclusions
In this paper, we propose a model for detecting unsound wheat kernels based on instance segmentation that can recognize them accurately and quickly. Compared with the original mask RCNN and mask scoring RCNN, our model improves accuracy by 28% and 19%, respectively. Because instance segmentation predicts a pixel-level mask for each detected kernel, it can overcome the grain adhesion problem that traditional segmentation cannot solve. The key conclusions are as follows:
- (1) This model can solve the problem of multi-grain adhesion among densely distributed wheat kernels. If a classification network is used instead, adhering kernels must first be separated with traditional image segmentation methods such as concave-point segmentation or the watershed algorithm, which are not accurate.
- (2) We improve the mask RCNN network by exploiting the circularity of the kernels and the differences between edge features and low-level features such as color and texture, making it more effective for multi-target and fine-grained recognition. Mask RCNN is widely used for foreground/background segmentation, single-class segmentation, and segmentation with only a few categories, and it performs well in these settings; however, its results are worse for fine-grained, multi-target segmentation. Our model can overcome these problems well.
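This section does not restate how circularity is defined, so the sketch below assumes the common definition C = 4πA/P², where A is the area and P the perimeter of a kernel outline; C equals 1 for a circle and decreases for elongated or irregular shapes. The outlines are synthetic polygons, and the function names are hypothetical. In practice, A and P would be measured on each predicted mask contour.

```python
import math

def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as ordered (x, y) vertices."""
    n = len(points)
    area2 = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter

def circularity(points):
    """Assumed circularity measure 4*pi*A / P**2: 1.0 for a circle, smaller for elongated outlines."""
    area, perimeter = polygon_area_perimeter(points)
    return 4.0 * math.pi * area / perimeter ** 2

def ellipse(a, b, n=360):
    """Polygonal approximation of an ellipse with semi-axes a and b."""
    return [(a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n)) for k in range(n)]

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(round(circularity(unit_square), 3))     # 0.785 (= pi / 4)
print(round(circularity(ellipse(1, 1)), 3))   # 1.0 (circle)
print(round(circularity(ellipse(2, 1)), 2))   # 0.84 (oval, roughly kernel-like outline)
```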
However, our algorithm still has some drawbacks. First, the recognition precision still needs to be improved. Second, after recognition, the number and mass of the unsound kernels need further statistical analysis. Therefore, a regression model between kernel area and mass needs to be constructed to calculate the mass fraction of unsound kernels and, ultimately, to evaluate the wheat grade. Finally, when we tried to improve the detection accuracy for perfect kernels, enlarging the dataset did not yield the expected improvement, so the reasons for this phenomenon require further analysis.
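A minimal sketch of the proposed area-to-mass step follows, under loudly stated assumptions: kernel mass is modeled as a linear function of the pixel area of its predicted mask, the calibration pairs are invented, and the function names are hypothetical. A real model might need a non-linear form, since mass relates to volume rather than to projected area.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def unsound_mass_fraction(kernels, slope, intercept):
    """kernels: list of (mask_area_px, is_unsound). Returns estimated unsound mass / total mass."""
    masses = [(slope * area + intercept, unsound) for area, unsound in kernels]
    total = sum(m for m, _ in masses)
    return sum(m for m, unsound in masses if unsound) / total

# Hypothetical calibration data: mask area in pixels vs. weighed kernel mass in mg
areas = [800, 1000, 1200, 1400]
masses_mg = [28, 35, 42, 49]
slope, intercept = fit_line(areas, masses_mg)

# Hypothetical detections from one image: (mask area in px, predicted unsound?)
kernels = [(1000, False), (1200, False), (800, True)]
print(round(unsound_mass_fraction(kernels, slope, intercept), 3))  # 0.267
```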