Article

Study on Rice Origin and Quality Identification Based on Fluorescence Spectral Features

1
School of Physics, Changchun University of Science and Technology, Changchun 130022, China
2
Jilin Academy of Agricultural Sciences (Northeast Agricultural Research Center of China), Changchun 130000, China
*
Authors to whom correspondence should be addressed.
Agriculture 2024, 14(10), 1763; https://doi.org/10.3390/agriculture14101763
Submission received: 10 September 2024 / Revised: 1 October 2024 / Accepted: 4 October 2024 / Published: 6 October 2024
(This article belongs to the Section Digital Agriculture)

Abstract

The origin of agricultural products significantly influences their quality and safety. Fluorescence spectroscopy was used to analyse Japonica rice 830, grown in different areas of Jilin Province, by examining rice seed, brown rice, and rice flour from 12 origins. Fluorescence spectra were pre-processed through normalisation and smoothing to remove noise. These processed spectra were input into decision trees, support vector machines (SVMs), K-nearest neighbour (KNN), and neural network models for classification. The analysis revealed that the combined four models achieved an average classification accuracy of 98.05% with a computation time of 180 s, while the reduced-scale models improved accuracy to 98.36% and reduced computation time to 11.25 s. Additionally, prediction models using standard rice starch content values across different states achieved R² values over 0.8. This method provides a rapid, precise approach for assessing rice quality and origin, demonstrating significant potential for application in rice analysis.

1. Introduction

Rice is a crucial food source in many countries owing to its rich nutritional value and adaptability, and it is a major crop grown across various regions and climates [1]. Natural environmental conditions, such as altitude, soil type, and precipitation, differ significantly among origins. These differences directly affect the growth cycle, nutrient content, and quality characteristics of local rice. Jilin Province is an important food production base in northeast China, and its geographical location and environmental conditions have largely shaped the quality characteristics of rice from its various origins. The province lies in a typical mid-to-high latitude, continental climate region [2]. The geographic location and environmental characteristics of each origin provide unique conditions for the quality of local rice and play a key role in determining its nutritional value, taste, and adaptability [3]. Against this background, the spectroscopic detection and identification of rice from different origins in Jilin Province is particularly important. In-depth study of the spectral characteristics of rice from different origins allows the best-quality rice to be screened out, which can promote the modernisation and intelligent development of rice production in Jilin Province, help to improve the quality and yield of rice, and support the sustainable development of the local agricultural economy [4].
Spectroscopic techniques offer great potential for tracing the origin of rice. They provide extensive chemical information about rice samples, including data on composition and structure. These data can be used to build a spectral database of rice origin characteristics and can be combined with technologies such as geographic information systems (GIS) to achieve traceability management of rice from different origins. Because spectroscopic detection is rapid and non-destructive, samples can be tested without being damaged, which greatly improves testing efficiency and sample utilisation. Spectroscopic techniques are also highly sensitive to trace components and can detect trace compounds in rice, which helps to distinguish rice of different origins and provides rich data for establishing complex rice origin traceability models. Combining spectroscopic techniques with chemometrics, statistics, and other methods can further improve the accuracy and reliability of traceability.
Spectral data fusion combines spectral information from different sources, processes the data using chemometric methods, and constructs detection models at different fusion levels so that the sample information is reflected more comprehensively. According to the fusion level, spectral data fusion techniques are categorised into three strategies: data-level fusion, feature-level fusion, and decision-level fusion [5]. Fluorescence spectral data are usually obtained with excitation light sources of different wavelengths, each of which causes the sample to emit a characteristic fluorescence spectrum. Direct fusion avoids the loss of information that may occur during data processing, preserves the completeness and accuracy of the original data, and improves the understanding and analysis of the sample [6]. Fluorescence spectroscopy is widely applied in biomedicine [7,8], environmental monitoring [9,10,11], agriculture [12], and other fields, where it helps researchers to understand the nature and characteristics of samples and thus supports scientific research and practical applications.
The application of spectroscopic techniques to rice origin traceability has made remarkable progress. Studies in China and abroad have shown that spectroscopic techniques combined with chemometric methods can effectively distinguish the characteristics of rice from different origins. In recent years, researchers have established spectral fingerprint libraries of rice origin [13,14] by analysing the spectral data of rice leaves [15,16], soil [17], and water [18], and have used different spectral techniques, such as near-infrared spectroscopy (NIRS) [19,20], Raman spectroscopy [21], fluorescence spectroscopy [22], and hyperspectral techniques [23], to carry out traceability research on rice origin.
These studies provide technical support for the geographic traceability of rice origin and new ideas for rice quality and safety control and brand protection. Meanwhile, machine learning and artificial intelligence methods have been combined with spectroscopic technology to further improve the accuracy and efficiency of rice origin traceability, which provides important technical support for guaranteeing food security and agricultural product quality.
As shown in Figure 1, fluorescence spectroscopy was used in this paper to examine rice in three different states from twelve different origins in Jilin Province. The three states were first analysed individually using three spectral modelling analysis methods. The spectra of the three states were then fused for spectral modelling analysis. Finally, the standard values of rice constituent content for the different origins were used to predict the constituent contents of rice from those origins. This provides a good basis for the quality and origin identification of rice and represents a fast and accurate analysis method with broad application prospects.

2. Materials and Methods

2.1. Sample Sources

The samples for this experiment were selected from Japonica rice 830 of the Rice Research Institute of the Jilin Academy of Agricultural Sciences in 2023 and originated from twelve cities in Jilin Province: Songyuan (SY), Changyi District of Jilin City (CY), Hunchun (HC), Gongzhuling (GZL), Tao’er River (TEH), Qianguo (QG), Yushu (YS), Da’an (DA), Huinan (HN), Meihekou (MHK), Yanji (YJ), and Zhenlai (ZL). These origins include the major rice-producing areas in the northern, central, and southern parts of Jilin Province. The fluorescence spectra of rice seeds, brown rice, and rice flour were detected and analysed separately. The location distribution of the 12 different origins of rice is shown in Figure 2, and their geographic location information is shown in Table 1.
Before the fluorescence spectra were collected, the samples underwent manual handling and screening. The samples were first cleaned so that their surfaces were free of dust and impurities that could affect the accuracy of data collection. Manual screening was then conducted to select full, well-proportioned rice seeds of moderate size, free of pests, moulds, and mechanical damage. The selected rice seed samples from each origin were sealed in food-grade plastic bags, labelled with their origin, and stored in a refrigerator at 4 °C to prevent spoilage. The samples were removed and kept at room temperature for 3 h before their fluorescence spectra were collected.
The brown rice was obtained by hulling the rice seeds with a huller. After hulling, any deteriorated or defective brown rice was sieved out, and brown rice with intact, undamaged grains was selected for the subsequent experiments. Rice flour was prepared with a grinder, with the grinding time set to 40 s. This duration was chosen so that the brown rice was ground as finely as the grinder allowed, avoiding errors caused by inconsistent particle sizes, while also avoiding the excessive heat generated by a longer grinding time. After grinding, the brown rice flour was passed through a standard split sample sieve and a 100-mesh sieve (the standard 100-mesh sieve is specified by the GB/T 6003.1-2012 [24] standard) and then placed into a mill bottle for preservation. Before the experiments, each type of brown rice was sampled and milled more than three times, and the brown rice flour of the same origin was kept in the same sample bag. The three-state rice samples are shown in Figure 3.
The main components of rice include starch and protein, which show different vibrational spectral information owing to differences in their chemical composition, content, and structure. To study the differences in the content of rice components among origins, chemical reagents for spectroscopic analysis were purchased from Shanghai Aladdin Biochemical Science and Technology Co., Ltd. (China), including riboflavin (CAS No. 83-88-5), alkaline lignin (CAS No. 8068-05-1), zein (CAS No. 9010-05-1), and amylopectin from maize (CAS No. 9037-22-3).

2.2. Spectral Acquisition Test System

To measure the spectral information of the rice and chemical reagents accurately, the experimental equipment comprised an MTO-Laser light source with an excitation wavelength of 405 nm (power of 50 mW, operating current of 60 mA); an Optosky ATP2400 fibre optic spectrometer purchased from Aopu Tiancheng Photoelectric Co., Ltd., Xiamen, China (detection range of 350–800 nm, a slit of 50 nm, and a resolution of 1.5 nm); and an optical fibre (Shanghai Wenyi Optoelectronics Co., Ltd., China, model UV600-1.0, core diameter of 600 µm, and light transmission range of 200–1100 nm). The instrument connections and the spectral signal acquisition system are shown in Figure 3. The sample was placed in the sample tank, the light source struck the sample at 45°, and the emitted light was collected at a 45° reflection angle by the fibre optic probe. The probe was connected to the spectrometer, which in turn was connected to the computer. Figure 4 is a schematic diagram of the fluorescence spectroscopy experimental system.
In the experiments, the sample tank was filled with rice samples so that the light source directly illuminated the sample surface when detecting rice seeds and brown rice and so that the sample thickness was consistent when measuring rice flour and chemical reagents. The surface area of the cuvette was about 6 cm², but the number of rice grains detected could not be guaranteed to be the same each time because of differences in grain size. The surface of the cuvette held about 50 rice seeds; the light spot was approximately elliptical, with an area of about 5 mm², and covered about 5–7 rice grains. After the instrument system was switched on, the light source was warmed up until the spectrometer was in a stable working condition. The number of spectral averages was set to 1, the integration time was set to 20 s, and the sampling interval was 2 ms. During acquisition, the incident angle and reflection angle were both 45° with the sample surface flat. The data were collected with the software that accompanied the Optosky fibre optic spectrometer. Using the high-speed scanning function of the software, the measurement point was changed during detection by uniformly fine-tuning the displacement platform to move the sample. After invalid spectra were rejected, 1000 spectra were saved for each group.

2.3. Machine Learning Algorithms

2.3.1. Decision Tree

The decision tree is a powerful machine learning algorithm commonly used in regression and classification problems. In the ensemble form described here, prediction and classification are achieved by building multiple decision trees and combining them into a “forest” [25]. Each decision tree is trained on randomly selected samples and features, and this randomness helps to reduce the risk of overfitting and improves the generalisation ability of the model. When making a prediction, the model combines the results of all the decision trees to obtain the final prediction. One advantage of this approach is that it can handle many input variables without much data preprocessing. Another is that, because each decision tree is trained independently of the others, the trees can be trained and evaluated in parallel, which speeds up the computation [26].
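The two ingredients mentioned above, choosing a split inside one tree and combining the votes of several trees, can be illustrated with a minimal sketch. It is not the authors' implementation: the Gini impurity criterion, the function names, and the toy intensity values are assumptions made for illustration, and only the origin abbreviations SY and HC come from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    best = (None, gini(labels))
    n = len(values)
    # Every distinct value except the largest is a candidate threshold.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


def forest_vote(tree_predictions):
    """Combine the predictions of several trees by majority vote."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical fluorescence intensities at a single wavelength for two origins.
intensity = [0.10, 0.15, 0.20, 0.70, 0.80, 0.90]
origin = ["SY", "SY", "SY", "HC", "HC", "HC"]
threshold, impurity = best_split(intensity, origin)
winner = forest_vote(["SY", "HC", "SY"])
```

A full tree would apply `best_split` recursively over many wavelengths, and a forest would train each tree on a random subset of samples and features before voting.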

2.3.2. Support Vector Machines

The support vector machine (SVM) is a supervised classification method based on statistical learning theory. It is a generalised linear classifier that performs binary classification of the data; its decision boundary is the maximum-margin hyperplane solved from the learning samples, and classification is achieved by searching for the hyperplanes that separate the various classes of samples [27]. The implementation of an SVM is mainly divided into two steps: the first is to select the kernel function, and the second is to test the kernel function and select the optimal parameters. The kernel function maps the data from the original space to a feature space and, depending on the kernel chosen, can take a variety of forms, which gives SVMs the ability to handle both linear and nonlinear classification. The kernel can be viewed as a mapping of nonlinear data to a high-dimensional feature space that also provides a computational shortcut, because it allows linear algorithms to be used in that space without computing the mapping explicitly.
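As a rough illustration of how a kernel enters the decision rule, the sketch below evaluates a binary SVM decision function of the form f(x) = Σ cᵢ K(xᵢ, x) + b with a Gaussian (RBF) kernel. The choice of the RBF kernel, the support vectors, the coefficients, and the bias are all assumptions for illustration; the paper does not state which kernel or parameters were used.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Decision value f(x); the predicted class is the sign of f(x).

    Each dual coefficient stands for alpha_i * y_i of one support vector.
    """
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias


# Hypothetical trained model with one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
dual_coefs = [1.0, -1.0]
bias = 0.0

near_first = svm_decision((0.1, 0.1), support_vectors, dual_coefs, bias)
near_second = svm_decision((1.9, 2.0), support_vectors, dual_coefs, bias)
```

Only kernel values between the query point and the support vectors are needed, which is the computational shortcut referred to above. A twelve-origin problem would require a multi-class scheme, such as one-versus-one, built on top of binary classifiers like this one.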

2.3.3. K-Nearest Neighbour

The K-nearest neighbour (KNN) is a classical machine learning algorithm for classification and regression tasks. Its core idea is instance-based learning, i.e., prediction by comparison with nearest neighbours. In a classification problem, when given a new unlabelled data point, the KNN algorithm determines the K-nearest neighbours in the training set based on distance. It then votes on the labels of these neighbours and uses the most frequently occurring category as a prediction [28].
The key steps in the KNN algorithm include determining the number of neighbours K, calculating the distance between the new data point and each data point in the training set (usually using a distance metric such as Euclidean distance or Manhattan distance), finding the K nearest neighbours, and making predictions. Choosing a smaller value of K may result in overfitting the model and choosing a larger value of K may result in underfitting the model. In addition, the KNN algorithm has a higher computational complexity when dealing with large datasets and high-dimensional feature spaces because it needs to compute the distance between the new data point and all the training data points.
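The steps listed above (choose K, compute distances, find the K nearest neighbours, vote) can be sketched in a few lines. This is a minimal illustration using the Euclidean distance; the function name, the two-dimensional feature vectors, and the value K = 3 are assumptions, and only the origin abbreviations SY and HC come from the paper.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training point.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature spectra summaries for two origins.
train_x = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
train_y = ["SY", "SY", "SY", "HC", "HC", "HC"]

label_a = knn_predict(train_x, train_y, (0.20, 0.20), k=3)
label_b = knn_predict(train_x, train_y, (0.70, 0.90), k=3)
```

Every prediction scans the whole training set, which is why the cost grows with both the number of samples and the number of wavelengths.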

2.3.4. Neural Networks

A neural network (NN) is a computational model consisting of multiple neurons that mimics the structure and function of the human nervous system. It automatically discovers patterns and regularities in data by learning large amounts of data to carry out tasks such as classification, regression, and clustering. A neural network consists of an input layer, a hidden layer, and an output layer, where each neuron is connected to each neuron in the next layer, and the strength of the connections is regulated by weights. As data are propagated through the network, each neuron weights and sums the inputs and performs a nonlinear transformation through an activation function, which is then passed to the next layer of neurons. The connection weights are adjusted by a back-propagation algorithm to make the network’s predictions as close as possible to the true labels, and a gradient descent optimiser is used to minimise the loss function to enable model training. Neural networks have powerful expressive capabilities and are suitable for processing various types of data, including images, text, and speech [29].
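The forward pass described above (weighted sum, nonlinear activation, next layer) can be sketched as follows. The layer sizes, the ReLU and softmax activations, and the weight values are assumptions chosen to keep the arithmetic easy to follow; the paper does not report the network architecture, and training by back-propagation is not shown.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one output neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Nonlinear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Convert output-layer scores into class probabilities."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w1, b1, w2, b2):
    """Input layer -> hidden layer (ReLU) -> output layer (softmax)."""
    hidden = relu(dense(x, w1, b1))
    return softmax(dense(hidden, w2, b2))


# Hypothetical weights for a 2-input, 2-hidden, 2-class network.
w1, b1 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
w2, b2 = [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]
probs = forward([1.0, 0.0], w1, b1, w2, b2)
```

During training, the weights in `w1` and `w2` would be adjusted by gradient descent so that `probs` moves towards the true origin label.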

2.4. Principal Component Analysis

Since the fluorescence spectral data of rice from different origins have high dimensionality, principal component analysis (PCA) was used to reduce the dimensionality of the spectra. PCA is a commonly used multivariate statistical analysis method that transforms high-dimensional data into low-dimensional data through a linear transformation while retaining the main information in the data. Its core idea is to find the most important features, or principal components, of the data so that the data can be reduced in dimension and simplified. By calculating the covariance matrix of the data and its eigenvalues and eigenvectors, the principal components can be determined, i.e., the directions along which the data have the highest variance. These principal components are linear combinations of the original variables that preserve as much of the information in the original data as possible, thus achieving dimensionality reduction [30]. PCA can reveal patterns, structures, and correlations in the data, which helps in understanding its intrinsic characteristics; at the same time, the reduced dimensionality simplifies data analysis and facilitates subsequent modelling and visualisation. In this study, several independent variables were obtained to replace the original variables while reflecting as much information as possible, and the full dataset was decomposed into a loading matrix and a score matrix. The first three principal components of the score matrix (PC1, PC2, PC3) were projected into a three-dimensional coordinate system, and the pattern points were classified and differentiated according to their distribution in that coordinate system.
Since PCA can reduce the redundant information of data and highlight the main features of the data, it has a wide range of applications in the fields of data mining, pattern recognition, image processing, and so on.
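The covariance, eigenvector, and score/loading steps described in this section can be written compactly as below. The notation is an assumption introduced here, not taken from the paper: X is the mean-centred data matrix with n spectra as rows and p wavelengths as columns, P holds the loadings, T holds the scores, and η_k is the contribution rate of the k-th component.

```latex
% Covariance matrix of the mean-centred spectra X (n x p)
C = \frac{1}{n-1} X^{\top} X

% Principal directions: eigenvectors p_k of C, ordered by eigenvalue
C\,p_k = \lambda_k\,p_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

% Scores from the first m loadings P = [p_1, \dots, p_m]; E is the residual
T = X P, \qquad X = T P^{\top} + E

% Contribution rate of component k and cumulative rate of the first m components
\eta_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}, \qquad
\eta_{1:m} = \sum_{k=1}^{m} \eta_k
```

Plotting the first three columns of T against each other gives the kind of three-dimensional score plot described above.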

2.5. Gaussian Process Regression

Regression analysis is a statistical method used to study the relationship between independent variables (explanatory variables) and dependent variables (response variables). In regression analysis, we try to find a mathematical model that describes how the independent variable affects the dependent variable to make predictions or inferences. Gaussian process regression (GPR) is a probability-based nonparametric regression method mainly used for prediction and uncertainty estimation [31]. Unlike traditional regression models, GPR does not assume a specific functional form of the data but rather describes the distribution of functions in the input space through Gaussian processes. To train the GPR model, the covariance matrix is constructed using the training data and the observed data are assumed to be affected by Gaussian noise that is independently and identically distributed. By maximising the log-likelihood function, the model parameters can be optimised. For prediction, the Gaussian distribution of the predicted values, including the predicted mean and uncertainty range, can be obtained by calculating the covariance between the new input points and the training data.
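For reference, the standard textbook form of the quantities described above is sketched below. The symbols are assumptions introduced here: a zero-mean prior with kernel k, training inputs X with targets y, noise variance σ_n², kernel matrix K = k(X, X), and k_* = k(X, x_*) for a new input x_*. The paper does not state which kernel or mean function was used.

```latex
% Model: latent function with a Gaussian process prior and i.i.d. Gaussian noise
y_i = f(x_i) + \varepsilon_i, \qquad
f \sim \mathcal{GP}\bigl(0,\, k(x, x')\bigr), \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma_n^2)

% Log marginal likelihood maximised over the kernel parameters and \sigma_n
\log p(y \mid X) = -\tfrac{1}{2}\, y^{\top} (K + \sigma_n^2 I)^{-1} y
                   - \tfrac{1}{2} \log \bigl| K + \sigma_n^2 I \bigr|
                   - \tfrac{n}{2} \log 2\pi

% Predictive mean and variance at a new input x_*
\mu_* = k_*^{\top} (K + \sigma_n^2 I)^{-1} y

\sigma_*^2 = k(x_*, x_*) - k_*^{\top} (K + \sigma_n^2 I)^{-1} k_*
```

The predictive variance σ_*² is what supplies the uncertainty range mentioned in the text.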
Starch content is one of the important indicators of rice quality, which directly affects the processing quality and market value of rice. With the Gaussian process regression model, we can build a prediction model using the relevant data characteristics of each origin. This model can help to understand the relationship between starch content and various potential influencing factors, thus providing a scientific basis and predictive power for agricultural production decisions.

2.6. Data Processing

Common pre-processing methods for spectral information include data smoothing, standard normal variate (SNV) transformation, and multiplicative scatter correction (MSC). These methods can eliminate the light scattering effects caused by the physical properties of the sample (e.g., hardness and particle size). Derivative methods are also commonly used for spectral data pre-processing; they suppress spectral baseline drift and reveal more detail in the spectral data. In addition, irrelevant information in the image can be eliminated through image preprocessing, improving the quality of the data and simplifying follow-up work.
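A minimal sketch of three of the operations named in this section and in the abstract (smoothing, normalisation, and the standard normal variate transform) is given below. The moving-average smoother, the window size, the min-max form of normalisation, and the toy spectrum are assumptions; the paper does not specify the exact filters or parameters.

```python
import statistics


def moving_average(spectrum, window=5):
    """Smooth with a centred moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)
        hi = min(len(spectrum), i + half + 1)
        smoothed.append(sum(spectrum[lo:hi]) / (hi - lo))
    return smoothed


def min_max_normalise(spectrum):
    """Rescale intensities to the range [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0] * len(spectrum)
    return [(v - lo) / (hi - lo) for v in spectrum]


def snv(spectrum):
    """Standard normal variate: centre each spectrum and scale by its own std."""
    mean = statistics.mean(spectrum)
    std = statistics.stdev(spectrum)
    return [(v - mean) / std for v in spectrum]


# Hypothetical raw spectrum with one noisy spike.
raw = [1.0, 2.0, 9.0, 2.0, 1.0]
smoothed = moving_average(raw, window=3)
normalised = min_max_normalise(smoothed)
scaled = snv(raw)
```

SNV is applied per spectrum, so it corrects each measurement for its own scatter-related offset and scale without reference to the other samples.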
When using spectral data for the quantitative analysis of sample physicochemical values, it is often necessary to evaluate the performance of the model with the help of statistical parameters [32]. In this paper, three statistical parameters were chosen: the coefficient of determination (R²), the root mean square error of calibration (RMSEC), and the root mean square error of prediction (RMSEP).
The closer the R² of the model is to 1, the more of the dependent variable is explained by the independent variables used to build the model, and the better the fit. Smaller values of RMSEC and RMSEP are better: a smaller RMSEC indicates that the model fits the calibration set well and that the overall deviation from the model is small, while a smaller RMSEP indicates a smaller error between the actual and predicted values of the samples and thus a stronger predictive ability. The closer RMSEC and RMSEP are to each other, the more stable the model is.
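The three statistics discussed above can be computed as in the sketch below, using the common definitions R² = 1 − SS_res/SS_tot and RMSE = sqrt(mean squared error). These definitions are an assumption, since the paper's own equations are not reproduced here, and some chemometrics texts use a degrees-of-freedom correction in the RMSEC denominator. The starch values are made up for illustration.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical starch contents (%) for a calibration set and a prediction set.
calibration_actual = [70.0, 72.0, 74.0, 76.0]
calibration_predicted = [71.0, 72.0, 73.0, 76.0]
prediction_actual = [71.0, 75.0]
prediction_predicted = [72.0, 74.0]

r2 = r_squared(calibration_actual, calibration_predicted)
rmsec = rmse(calibration_actual, calibration_predicted)    # RMSE on calibration data
rmsep = rmse(prediction_actual, prediction_predicted)      # RMSE on held-out data
```

RMSEC and RMSEP use the same formula; they differ only in whether the samples were used to build the model or held out for prediction.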

3. Results

3.1. Spectral Data

Before spectral data collection, we ensured that the experimental conditions were stable and standardised. The influence of the surrounding environment was excluded, the light source was stable, the instrument was in proper working condition, and the rice samples were properly handled and loaded. The experimental system was used to collect the spectra of rice in three different states, and 1000 spectra were measured for each origin and averaged to give the representative spectrum of that origin. Abnormal data were removed, and the spectra were denoised and smoothed to eliminate the influence of error on the subsequent calculation and analysis. The results are shown in Figure 5, with the waveband truncated to the range of 433–800 nm.
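The averaging and band-truncation steps described in this paragraph can be sketched as follows. The function names and the toy wavelength grid are assumptions; the 433–800 nm limits are the ones quoted in the text, and the real spectra contain far more points than shown here.

```python
def mean_spectrum(spectra):
    """Average a list of equal-length spectra point by point."""
    count = len(spectra)
    return [sum(column) / count for column in zip(*spectra)]


def crop_band(wavelengths, intensities, low=433.0, high=800.0):
    """Keep only the points whose wavelength lies in [low, high] nm."""
    kept = [(w, i) for w, i in zip(wavelengths, intensities) if low <= w <= high]
    return [w for w, _ in kept], [i for _, i in kept]


# Hypothetical wavelength grid (nm) and two repeated measurements of one origin.
wavelengths = [400.0, 433.0, 600.0, 800.0, 850.0]
spectra = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0, 7.0],
]

representative = mean_spectrum(spectra)
band_wavelengths, band_intensities = crop_band(wavelengths, representative)
```

In the study, the same averaging would run over the 1000 spectra saved for each origin rather than over two.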
The spectra show that rice from different origins is highly similar in its overall spectral profile while still displaying significant differences in some specific wavelength ranges. These differences suggest that there may be subtle differences in the structure and content of some chemical components in rice from different origins. After pre-processing and feature extraction, the fluorescence spectral feature information of rice was obtained. The main components of rice, such as starch and protein, exhibit different vibrational spectral information owing to differences in their chemical composition, content, and structure. The characteristic peaks in the fluorescence spectra of the rice samples are clearly visible throughout processing, from dehulling to milling, with obvious fluorescence peaks in the wavelength ranges of 475–525 nm, 550–600 nm, and 650–690 nm. The spectra of the pure compounds of some of the main components of rice, truncated to the wavelength range of 450–750 nm, are shown in Figure 6.
A comparative analysis of Figure 5 and Figure 6 shows that, for rice seeds, brown rice, and rice flour, the characteristic peak in the 475–525 nm band is the strongest in the spectrum. Combined with the spectra of the pure compounds, this shows that the maize protein content accounts for a much higher percentage of the total protein content in this range, alongside small amounts of riboflavin and alkaline lignin, and that these contributions are not affected by the state of the rice. In addition, the results of spectral fingerprinting align with previous studies confirming the presence of reduced nicotinamide adenine dinucleotide phosphate (NADPH) in rice in this wavelength range [22]. In the spectral range of 550–600 nm, the fluorescence peak intensity of rice seeds is high, whereas that of brown rice and rice flour is relatively weak, indicating that some nutrients are not located on the hulls; when the rice seeds are dehulled or milled, these nutrients are exposed and thus captured by the spectra. Combined with the pure-compound spectra, it can be seen that the content of riboflavin is higher than that of maize protein in this wavelength range, including in brown rice. In the 650–690 nm range, the fluorescence spectra of rice seeds show almost no characteristic peaks, from which it can be inferred that the husks of the rice seeds contain no nutrients that respond in this band, whereas brown rice and rice flour have obvious characteristic peaks. The peaks of the rice flour are stronger than those of the brown rice, and the pure-compound spectra indicate that the peaks in this band are mainly those of riboflavin. According to these peaks, rice flour contains more riboflavin than brown rice.
The different content of components in different bands provides a strong spectral basis for distinguishing rice from different sources.

3.2. Classification Results of Different States of Rice according to Different Machine Learning Algorithms

The comparison of the spectral data shows that the spectra differ slightly with the origin of the rice, reflecting differences in component content, so that rice of different origins can be distinguished; combining the spectra with machine learning algorithms gives a more intuitive classification. Table 2, Table 3 and Table 4 show the results for the three states of rice, namely rice seeds, brown rice, and rice flour, respectively, each classified using the four machine learning algorithms: decision tree, SVM, KNN, and neural network. To ensure the reliability of the reported accuracies, each classification algorithm was run five times for rice in each state and the accuracies were averaged. Averaging helps to mitigate the impact of any anomalies that might arise from a single computation and thus provides a more robust representation of model performance. The following results were obtained.
Table 5 shows the classification accuracies of the three states of rice samples according to the four machine learning algorithms. The comparison shows that the classification accuracy of the rice seeds is the lowest and that, for all four algorithms, accuracy increases as the rice is hulled and then milled. For rice flour, the SVM and neural network algorithms reach accuracies of 99.7% and 99.8%, respectively, which already distinguish the origins of the rice well. However, the classification results still differ markedly between states; for example, the decision tree achieves an accuracy of only 69.64% for rice seeds but 95.2% for rice flour. This may be due to the distribution of nutrients in the rice. The hulls of the rice seeds contain fewer nutrients, so the differences between origins are not clearly distinguished by spectral detection, which results in the lowest classification accuracy; after hulling and milling, the nutrients are exposed and can be detected more clearly, which improves the classification accuracy. In terms of time, the average computation time for rice in different states varies slightly with the characteristics of each model, but overall it remains between 30 and 60 s, which places these models in the fast category.

3.3. Traceability of Origin Based on the Spectral Fusion Data of Rice through a 405 nm Excitation Light Source

3.3.1. Integration

Because the classification accuracy varies with the state of the rice, classifying rice origin from a single state was less reliable. We therefore fused the spectral data of rice in the different states and repeated the classification, in order to minimise the differences in classification accuracy caused by measuring different states and to make the origin classification results of this study more convincing. The fluorescence spectra were collected using a 405 nm excitation light source, and different samples emitted different fluorescence spectra. The direct fusion method was used to avoid the possible loss of information during data processing, to preserve the completeness and accuracy of the original data, and to improve the understanding and analysis of the samples. The classification results of the fused data according to the different machine learning algorithms are shown in Table 6.
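One plausible reading of direct fusion is that the seed, brown rice, and flour spectra belonging to the same sample index are concatenated into a single feature vector. The sketch below illustrates that reading only; the pairing of spectra across states, the function names, and the toy values are assumptions. The figure of 4758 feature points quoted later for the fused data is divisible by three (3 × 1586), which is at least consistent with this interpretation, although the source does not spell out the procedure.

```python
def fuse_sample(seed, brown, flour):
    """Concatenate the three state spectra of one sample into one feature vector."""
    return list(seed) + list(brown) + list(flour)


def fuse_dataset(seed_set, brown_set, flour_set):
    """Fuse three equally sized sets of spectra row by row."""
    if not (len(seed_set) == len(brown_set) == len(flour_set)):
        raise ValueError("all three states need the same number of spectra")
    return [
        fuse_sample(seed, brown, flour)
        for seed, brown, flour in zip(seed_set, brown_set, flour_set)
    ]


# Hypothetical two-point spectra for two samples in each state.
seed_set = [[0.1, 0.2], [0.3, 0.4]]
brown_set = [[0.5, 0.6], [0.7, 0.8]]
flour_set = [[0.9, 1.0], [1.1, 1.2]]

fused = fuse_dataset(seed_set, brown_set, flour_set)
```

Because no features are discarded, the fused vectors are three times as long as the single-state spectra, which is what makes the later dimensionality reduction attractive.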
The classification results show that fusion improved the accuracy of the decision tree relative to every single state, whereas the accuracies of the other three algorithms are slightly lower than those obtained for rice flour alone. Compared with rice seeds and brown rice, however, the accuracies of all four algorithms improved, and combining the information from the different rice states renders the overall origin classification more representative and convincing. The drawback is that the large amount of fused data greatly extends the computation time, so the classification process no longer has the advantage of speed. The dimensionality of the fused data should therefore be reduced by extracting a subset of feature points that preserves classification accuracy while shortening the computation time. To achieve this, principal component analysis was applied to the fused data.
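The direct fusion step can be illustrated with a minimal sketch. The text does not state whether the spectra of the three states were joined feature-wise or stacked sample-wise; the sketch below assumes feature-wise concatenation, which is one common reading of direct (low-level) fusion and is consistent with 4758 being divisible by three (3 × 1586), although that does not prove it. The function name `fuse_spectra` and the toy intensities are hypothetical.

```python
def fuse_spectra(seed, brown, flour):
    """Concatenate the spectra of the three rice states sample by sample.

    Each argument is a list of spectra (one list of intensities per sample).
    Assumption: row i of every list belongs to the same fused sample.
    """
    if not (len(seed) == len(brown) == len(flour)):
        raise ValueError("each state must provide the same number of spectra")
    # No weighting or transformation is applied, so no information is discarded.
    return [s + b + f for s, b, f in zip(seed, brown, flour)]


# Toy data: 2 samples per state, 3 intensity values per spectrum (hypothetical).
seed_spectra = [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]]
brown_spectra = [[0.4, 0.5, 0.6], [0.5, 0.6, 0.7]]
flour_spectra = [[0.7, 0.8, 0.9], [0.8, 0.9, 1.0]]

fused = fuse_spectra(seed_spectra, brown_spectra, flour_spectra)
print(len(fused), len(fused[0]))  # number of samples, feature points per sample
```

Under this assumption the number of samples is unchanged while the number of feature points triples, which is why the fused models take longer to compute than the single-state models.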

3.3.2. Feature Wavelength Selection Based on Principal Component Analysis (PCA)

The 12,000 fused fluorescence spectra, each with 4758 feature points, were subjected to principal component analysis, and six principal components were extracted. Their contribution rates were 38.0%, 35.2%, 12.2%, 9.3%, 3.2%, and 1.0% for the first to sixth principal components, respectively, giving a cumulative contribution rate of 98.9%. This indicates that most of the variance in the data is explained by these six principal components and that the PCA reflects the overall information of the samples. Figure 7 shows the distribution of the rice samples in the space of the first three principal components extracted through PCA.
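The cumulative contribution rate can be checked directly from the six reported values. The short sketch below uses only the percentages quoted in the text; the helper `components_needed` and the 95% threshold in the usage line are illustrative assumptions, not part of the study.

```python
from itertools import accumulate

# Contribution rates (%) of the six principal components, as reported in the text.
contribution = [38.0, 35.2, 12.2, 9.3, 3.2, 1.0]

# Running total, rounded to one decimal place to match the reported precision.
cumulative = [round(value, 1) for value in accumulate(contribution)]


def components_needed(threshold):
    """Return how many leading components reach the given cumulative rate (%)."""
    for count, total in enumerate(cumulative, start=1):
        if total >= threshold:
            return count
    raise ValueError("threshold exceeds the variance captured by six components")


print(cumulative)
print(components_needed(95.0))
```

The running total makes it easy to see that the first two components already carry about three quarters of the variance, while the last two add only a few percent.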
Figure 8 shows the loading plot of the six principal components, with the extracted feature points marked. A total of 50 representative feature points that reflect the overall information were selected from the 4758 feature points, illustrating that the pattern recognition method has an efficient feature extraction and data analysis capability and can reveal hidden information in the fingerprint data. However, judging from the individual loading vectors, the feature bands are complex and the feature dimension is still large, and the features obtained through PCA cannot by themselves provide accurate origin recognition for rice of different origins; they must be combined with appropriate classification algorithms.
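One way to pick feature points from a loading plot is to rank each wavelength index by the size of its loadings. The text does not state the selection rule that was used, so the scoring below (absolute loading weighted by each component's contribution rate) is an assumption made purely for illustration, as are the function name `select_feature_points` and the toy loadings.

```python
def select_feature_points(loadings, weights, k):
    """Return the indices of the k feature points with the largest weighted loadings.

    loadings: one list of loadings per principal component (equal lengths).
    weights:  one weight per component, e.g. its contribution rate (assumed rule).
    """
    n_features = len(loadings[0])
    scores = [
        sum(w * abs(component[i]) for w, component in zip(weights, loadings))
        for i in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # ascending order, i.e. along the wavelength axis


# Toy example: 2 components, 5 feature points (hypothetical values).
toy_loadings = [
    [0.1, -0.9, 0.2, 0.0, 0.3],
    [0.8, 0.1, -0.1, 0.0, 0.2],
]
toy_weights = [0.6, 0.4]

selected = select_feature_points(toy_loadings, toy_weights, k=2)
print(selected)  # → [0, 1]
```

In the setting of this study the same call would take six loading vectors of length 4758 and `k=50`; the selected indices would then be used to slice the fused spectra before classification.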
The 50 extracted feature points were likewise combined with the four classification algorithms to establish classification models for origin traceability, and the results are shown in Table 7. The average computation time falls from between 89 and 328 s for the fused data (Table 6) to between 8 and 15 s, while the accuracy of every algorithm is slightly higher than with the full fused data.
After all the models had been computed, the classification accuracies for the individual rice states were compared with those obtained after fusion and after PCA-based feature point extraction; the comparison is given in Table 8.
In the rice seed state, modelling the spectral data with the four classification algorithms gave the lowest accuracy. When the seeds were hulled to give brown rice, the accuracy generally improved, suggesting that the hulls carry relatively few nutrients at their surface and that, once the hulls are removed, the nutrients at the surface of the brown rice can be detected. The rice flour state gave the highest accuracy, because polishing the brown rice exposes the internal nutrients more fully, so the nutrient contents of rice from different origins can be distinguished more clearly. However, because the classification accuracy varies greatly between rice states, no single state fully characterises how well the origins can be distinguished. Fusing the data of the three states acts as a compromise between these extremes, and the resulting accuracy is a more persuasive measure of origin differentiation. Its drawbacks are that the fused data set is very large and that modelling and analysis are particularly time-consuming, which does not match the speed advantage of spectral detection technology. Principal component analysis was therefore applied to reduce the dimensionality of the fused data, and feature points were extracted from the loading plot of the six principal components. The 50 extracted feature points retain most of the main information of the sample spectra, and classification models built on them achieve a higher accuracy than the models built on the full fused data in far less time, demonstrating that the analytical model has application value for the rapid identification of rice origin.
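The trade-off between the fused and reduced models can be summarised by averaging over the four algorithms. The sketch below copies the "Average Value" and "Average Calculation Time" rows of Tables 6 and 7; the variable and function names are illustrative only.

```python
from statistics import mean

# (average accuracy in %, average calculation time in s), copied from Tables 6 and 7.
FUSED = {
    "Decision Tree": (96.88, 140),
    "SVM": (98.68, 328),
    "KNN": (97.92, 89),
    "Neural Network": (98.74, 164),
}
REDUCED = {
    "Decision Tree": (97.60, 10),
    "SVM": (98.80, 15),
    "KNN": (98.10, 8),
    "Neural Network": (98.94, 12),
}


def summarise(results):
    """Return the mean accuracy (%) and mean calculation time (s) over all models."""
    accuracies = [accuracy for accuracy, _ in results.values()]
    times = [seconds for _, seconds in results.values()]
    return mean(accuracies), mean(times)


fused_accuracy, fused_time = summarise(FUSED)
reduced_accuracy, reduced_time = summarise(REDUCED)
speed_up = fused_time / reduced_time

print(f"fused:   {fused_accuracy:.3f}% in {fused_time:.2f} s")
print(f"reduced: {reduced_accuracy:.3f}% in {reduced_time:.2f} s")
print(f"speed-up: about {speed_up:.0f}x")
```

These averages correspond to the figures quoted in the abstract (98.05% in 180 s for the fused models, 98.36% in 11.25 s for the reduced models), a roughly sixteen-fold reduction in computation time.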

3.3.3. Prediction of Rice Starch Content Based on Linear Regression Model for Different Origins in Jilin Province

According to the GB/T 27404-2008 standard [33], rice from ten different origins was tested physically and chemically, and 100 standard values of starch content were obtained for each origin. A Gaussian process regression model was used to predict the content, with 80% of the data forming the training set and 20% forming the test set. The plots of the regression model are shown in Figure 9 and Figure 10, and the regression prediction results based on the standard values of amylose content in rice from different origins are shown in Table 9.
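For readers unfamiliar with Gaussian process regression, the following is a minimal, dependency-free sketch of the posterior mean prediction. The kernel, its length scale, the noise level, and the toy data are all assumptions; the text does not report the configuration that was actually used, and a real analysis would normally rely on an established library rather than this hand-written solver.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gpr_fit(train_x, train_y, noise=1e-6, length_scale=1.0):
    """Precompute alpha = (K + noise * I)^-1 (y - mean) for later predictions."""
    mean_y = sum(train_y) / len(train_y)
    k = [[rbf_kernel(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    alpha = solve_linear_system(k, [y - mean_y for y in train_y])
    return {"x": train_x, "alpha": alpha, "mean": mean_y, "ls": length_scale}


def gpr_predict(model, query):
    """Posterior mean at one query point: mean + k(query, X) . alpha."""
    weights = (rbf_kernel(query, x, model["ls"]) for x in model["x"])
    return model["mean"] + sum(w * a for w, a in zip(weights, model["alpha"]))


# Synthetic one-feature data, not measurements from the study.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [70.0, 72.0, 71.0, 74.0]
model = gpr_fit(train_x, train_y)
print(gpr_predict(model, [1.5]))
```

With an 80/20 split, `gpr_fit` would be called on the training portion only and `gpr_predict` on each held-out sample, so that the test error reflects samples the model has not seen.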
Table 9 shows that the R² of the model is greater than 0.8 in all three states, which demonstrates that the independent variables used to establish the model explain a large share of the variation in the dependent variable and that the model fits well. The RMSEC and RMSEP are small, indicating that the errors between the actual and predicted values are small for both the calibration set and the prediction set and that the predictive ability of the model is strong. The RMSEC and RMSEP are also close to each other, which indicates that the model generalises from the calibration set to unseen samples and can predict the starch content of rice from different origins well.
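The evaluation metrics can be written out explicitly. RMSEC and RMSEP are the same root-mean-square error computed on the calibration (training) set and on the prediction (test) set, respectively, and R² is one minus the ratio of the residual to the total sum of squares. The sketch below uses these standard definitions with made-up numbers; it does not reproduce the values in Table 9.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error; RMSEC on calibration data, RMSEP on test data."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference values and model predictions.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]

print(rmse(reference, predicted))
print(r_squared(reference, predicted))
```

A large gap between RMSEC and RMSEP would point to overfitting, which is why the closeness of the two values is used as evidence of a stable model.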

4. Conclusions and Prospects

4.1. Conclusions

In this paper, Japonica rice 830 planted in the same year in different areas of Jilin Province was selected as the sample. Fluorescence spectroscopy was used to measure rice in three different states from 12 origins in Jilin Province, and the pre-processed spectral data were used as inputs to decision tree, SVM, KNN, and neural network models. The accuracy of the models increased as the decision-making capability of the algorithms improved. Because the accuracies obtained for the three rice states differed widely, the data of the three states were fused to reduce these differences. Although the accuracy improved, the model computation time increased greatly, so the dimensionality of the fused data was reduced using principal component analysis; the first six principal components had a cumulative contribution rate of 98.9%, and 50 feature points were extracted from them for modelling and analysis, which greatly reduced the computation time while achieving a higher accuracy. Finally, using the standard values of rice starch content from different origins, a Gaussian process regression model was used to predict the starch content of rice in different states and from different origins. The R² of the model was greater than 0.8 and the RMSEC and RMSEP were small, which shows that the model is stable. The approach represents a new, fast, and accurate analysis method for the identification of rice quality and origin and has broad application prospects.
Taking these factors together, the application of spectroscopic technology in the field of rice origin traceability is promising, and it will provide more scientific and accurate technical support for quality control and origin traceability in the rice industry chain.

4.2. Prospects

Models used in this study, such as decision tree, support vector machine (SVM), K-nearest neighbour (KNN), and neural networks, performed well in rice spectral detection and quality prediction but still have some limitations. The diversity and quality of data are crucial for model stability and accuracy. Insufficiently comprehensive training data or a lack of representativeness in the data from certain sources can limit the predictive effectiveness of the model. Therefore, variables such as different growing conditions, soil types, and climatic factors should be considered during the data collection phase to improve the robustness of the model.
In the future, the application of spectral technology to rice quality identification and origin tracing is promising. As machine learning and deep learning develop, more advanced algorithms such as convolutional neural networks (CNNs) and ensemble learning methods are expected to further improve the accuracy and adaptability of the models. Meanwhile, advances in sensor technology make real-time, online spectral detection possible, providing a more accurate and efficient solution for quality control in the rice industry chain.

Author Contributions

Conceptualization, C.L. (Chunyu Liu); methodology, Z.L.; formal analysis, C.L. (Changming Li); investigation, X.M. and Z.M.; data curation, Y.Z.; data collection, Y.Q.; writing—original draft preparation, Y.Q.; supervision, Y.T. and X.T.; final revision of the manuscript: Y.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (2023YFD1701604, 2023YFD1701600), the Jilin Province Science and Technology Development Plan Project (20230101187JC), the Scientific Research Project of the Jilin Provincial Department of Education (JJKH20241646KJ), and the Jilin Science and Technology Development Programme Project (20240304191SF).

Data Availability Statement

The original contributions presented in this study are included in the article, and further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no competing interests.

References

  1. Miao, X.X.; Miao, Y.; Liu, Y.; Tao, S.H.; Zheng, H.B.; Wang, J.M.; Wang, W.Q.; Tang, Q.Y. Measurement of nitrogen content in rice plant using near-infrared spectroscopy combined with different PLS algorithms. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2023, 284, 121733.
  2. Hua, Y.J. Analysis of the role mechanism of rice industrial policy on sustainable agricultural development. North. Rice 2024, 54, 73–75.
  3. Xu, C.C.; Ji, L.; Chen, Z.D.; Fang, F.P. Analysis of China’s rice industry situation in 2023 and outlook for 2024. China Rice 2024, 30, 1–4.
  4. Xu, H.; Liu, S. Research on agri-environmental technology efficiency: Take Jilin Province in China as an example. Heliyon 2024, 10, e25879.
  5. Azcarate, S.M.; Ríos-Reina, R.; Amigo, J.M.; Goicoechea, H.C. Data handling in data fusion: Methodologies and applications. Trac-Trends Anal. Chem. 2021, 143, 116355.
  6. Sikorska, E.; Igor, K.; Marek, S. Fluorescence spectroscopy and imaging instruments for food quality evaluation. In Evaluation Technologies for Food Quality; Woodhead Publishing: Sawston, UK, 2019; pp. 491–533.
  7. Garry, V.P.; Natalya, D.P.; Ekaterina, N.G.; Nikolay, N.P.; Michael, M.G. Autofluorescence spectroscopy in photodynamic therapy for skin rejuvenation: A theranostic approach in aesthetic medicine. Photodiagnosis Photodyn. Ther. 2024, 45, 103948.
  8. Aamir, S.; Gottfried, K.; Martin, K.; Erwin, G.; Martin, P.; Michael, E. Emerging applications of fluorescence spectroscopy in the medical microbiology field. J. Transl. Med. 2009, 7, 99.
  9. Duan, Z.; Li, Y.; Wang, X.; Wang, J.L.; Mikkel, B.; Zhao, G.Y.; Sune, S. Drone-based fluorescence lidar systems for vegetation and marine environment monitoring, EPJ Web of Conferences. EDP Sci. 2020, 237, 07013.
  10. Shen, J.; Deng, S.B.; Wu, J. Identifying pollution sources in surface water using a fluorescence fingerprint technique in an analytical chemistry laboratory experiment for advanced undergraduates. J. Chem. Educ. 2021, 99, 932–940.
  11. Li, Y.Y.; Hu, J.; Li, C.H.; Hou, X.D. Magnetic Covalent Organic Framework for Efficient Solid-Phase Extraction of Uranium for on-Site Determination by Portable X-ray Fluorescence Spectrometry. Anal. Chem. 2024, 96, 5757–5762.
  12. Tatiana, A.M.; Ruslan, M.S.; Alexander, V.S.; Maxim, E.A.; Dmitriy, E.B.; Vasily, N.L.; Pavel, A.S.; Mikhail, Y.G.; Sergey, M.P.; Narek, O.C.; et al. Using fluorescence spectroscopy to detect rot in fruit and vegetable crops. Appl. Sci. 2022, 12, 3391.
  13. Lapcharoensuk, R.; Moul, C. Geographical origin identification of Khao Dawk Mali 105 rice using a combination of FT-NIR spectroscopy and machine learning algorithm. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2024, 318, 124480.
  14. Wang, B.S.; Lu, A.; Yu, L. A multi-kernel channel attention combined with convolutional neural network to identify spectral information for tracing the origins of rice samples. Anal. Methods 2023, 15, 179–186.
  15. Yu, Y.; Yu, H.Y.; Li, X.K.; Zhang, L.; Sui, Y.Y. Prediction of Potassium Content in Rice Leaves Based on Spectral Features and Random Forest. Agronomy 2023, 13, 2337.
  16. Székely, Á.; Szalóki, T.; Jancsó, M.; Pauk, J.; Lantos, C. Temporal Changes of Leaf Spectral Properties and Rapid Chlorophyll-a Fluorescence under Natural Cold Stress in Rice Seedlings. Plants 2023, 12, 2415.
  17. Li, P.; Yan, Y.; Li, C.; Tang, W.G.; Xiong, Z.Q.; Tian, Y.H.; Zhou, K.; Yi, Z.X.; Zheng, Z.Y.; Rang, Z.W.; et al. Response of rice growth to soil microorganisms and soil properties in different soil type. Agron. J. 2023, 115, 197–207.
  18. Cristina, M.; Jelena, M.; Eleonora, M.; Roumiana, T.; Paolo, O. Analysing the water spectral pattern by near-infrared spectroscopy and chemometrics as a dynamic multidimensional biomarker in preservation: Rice germ storage monitoring. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 265, 120396.
  19. Huang, F.P.; Peng, Y.M.; Li, L.H.; Ye, S.T.; Hong, S.Y. Near-infrared spectroscopy combined with machine learning methods for distinguishment of the storage years of rice. Infrared Phys. Technol. 2023, 133, 104835.
  20. Yang, S.; Wang, Z.M.; Zhang, H.Q.; Song, W.L. Rice Variety Classification Based on Optimized Near-Infrared Spectral Classification Mode. Rice Sci. 2024, 31, 6–9.
  21. Pezzotti, G.; Zhu, W.; Chikaguchi, H.; Marin, E.; Masumura, T.; Sato, Y.I.; Nakazaki, T. Raman spectroscopic analysis of polysaccharides in popular Japanese rice cultivars. Food Chem. 2021, 354, 129434.
  22. Li, C.M.; Tan, Y.; Liu, C.Y.; Guo, W.J. Rice Origin Tracing Technology Based on Fluorescence Spectroscopy and Stoichiometry. Sensors 2024, 24, 2994.
  23. Shi, Y.; Liu, M.; Sun, A.; Liu, J.J.; Men, H. A data fusion method of electronic nose and hyperspectral to identify the origin of rice. Sens. Actuators A Phys. 2021, 332, 113184.
  24. GB/T 6003.1-2012; Determination of Tensile Properties of Film Materials Part 1: Methods. China Standard Press: Beijing, China, 2012.
  25. Bollwein, F.; Dahmen, M.; Westphal, S. A branch & bound algorithm to determine optimal cross-splits for decision tree induction. Ann. Math. Artif. Intell. 2020, 88, 291–311.
  26. Magana-Mora, A.; Bajic, V.B. OmniGA: Optimized omnivariate decision trees for generalizable classification models. Sci. Rep. 2017, 7, 3898.
  27. Valkenborg, D.; Rousseau, A.J.; Geubbelmans, M.; Burzykowski, T. Support vector machines. Am. J. Orthod. Dentofac. Orthop. 2023, 164, 754–757.
  28. Jiang, S.Y.; Pang, G.S.; Wu, M.L.; Kuang, L.M. An improved K-nearest-neighbor algorithm for text categorization. Expert Syst. Appl. 2012, 39, 1503–1509.
  29. Basri, K.N.; Yazid, F.; Zain, M.N.M.; Yusof, Z.M.; Rani, R.A.; Zoolfakar, A.S. Artificial neural network and convolutional neural network for prediction of dental caries. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2024, 312, 124063.
  30. Gewers, F.L.; Ferreira, G.R.; Arruda, H.F.D.; Silva, F.N.; Comin, C.H.; Amancio, D.R.; Costa, L.D.F. Principal component analysis: A natural approach to data exploration. ACM Comput. Surv. (CSUR) 2021, 54, 1–34.
  31. Wang, J. An intuitive tutorial to Gaussian processes regression. Comput. Sci. Eng. 2023, 25, 4–11.
  32. Stufflebeam, D. Evaluation models. New Dir. Eval. 2001, 2001, 7–98.
  33. GB/T 27404-2008; Method for the Determination of Microbiological Limits in Traditional Foods. China Standard Press: Beijing, China, 2008.
Figure 1. Method flow chart.
Figure 2. The geographic locations of the 12 origins of rice.
Figure 3. Map of rice samples in different states from 12 origins (from top to bottom: rice seeds, brown rice, and rice flour).
Figure 4. Schematic diagram of fluorescence spectroscopy detection experiment system.
Figure 5. Spectra of different forms of rice under a 405 nm excitation light source: (a) rice seeds, (b) brown rice, and (c) rice flour.
Figure 6. Analytically pure fluorescence spectra of riboflavin, basic lignin, zeatin, and branched-chain starch at an excitation wavelength of 405 nm.
Figure 7. Distribution of rice features of the first three components after PCA feature extraction.
Figure 8. Load weighting plot for the first six principal components (the symbols in the figure mark the extracted feature points).
Figure 9. Validation of actual values and predicted values: (a) rice seeds, (b) brown rice, and (c) rice flour.
Figure 10. Plot of test predicted values versus actual values: (a) rice seeds, (b) brown rice, and (c) rice flour.
Table 1. Table of information on the location of the 12 different origins.

| Origin | Latitude and Longitude |
|---|---|
| Songyuan (SY) | 45°08′31″ N, 124°49′31″ E |
| Changyi District, Jilin (CY) | 43°52′48.3″ N, 126°33′53.7″ E |
| Huichun (HC) | 42°51′44.96″ N, 130°21′56.77″ E |
| Gongzhuling (GZL) | 43°30′16.85″ N, 124°49′22.08″ E |
| Tao’er River (TEH) | 45°20′8.12″ N, 122°47′10.86″ E |
| Qian Guo (QG) | 44°17′–45°28′ N, 123°35′–125°18′ E |
| Yushu (YS) | 44°50′17.2″ N, 126°31′37.7″ E |
| Da’an (DA) | 45°30′15.9″ N, 124°17′8.1″ E |
| Huinan (HN) | 42°16′19″–42°49′15″ N, 125°58′49″–126°44′39″ E |
| Meihekou (MHK) | 42°32′19.43″ N, 125°42′43.56″ E |
| Yanji (YJ) | 42°53′27.85″ N, 129°30′32.76″ E |
| Zhenlai (ZL) | 45°50′53.70″ N, 123°11′59.53″ E |

Table 2. Table of five classifications and the average results of rice seeds according to different machine learning algorithms.

| Number of Tests | Decision Tree | SVM | KNN | Neural Network |
|---|---|---|---|---|
| 1 | 70.10% | 95.20% | 92.70% | 98.00% |
| 2 | 69.50% | 95.20% | 92.40% | 98.10% |
| 3 | 69.70% | 95.20% | 92.40% | 98.00% |
| 4 | 69.90% | 95.20% | 92.40% | 98.20% |
| 5 | 69.00% | 95.40% | 92.40% | 98.20% |
| Average Value | 69.64% | 95.24% | 92.46% | 98.10% |
| Average Calculation Time | 46 s | 54 s | 39 s | 48 s |

Table 3. Table of five classifications and the average results of brown rice according to different machine learning algorithms.

| Number of Tests | Decision Tree | SVM | KNN | Neural Network |
|---|---|---|---|---|
| 1 | 86.10% | 94.50% | 95.60% | 98.40% |
| 2 | 85.70% | 94.50% | 95.50% | 98.40% |
| 3 | 85.70% | 94.50% | 95.50% | 98.30% |
| 4 | 85.80% | 94.70% | 95.70% | 98.50% |
| 5 | 86.30% | 94.70% | 95.70% | 98.40% |
| Average Value | 85.92% | 94.54% | 95.60% | 98.40% |
| Average Calculation Time | 45 s | 56 s | 38 s | 47 s |

Table 4. Table of five classifications and the average results of rice flour according to different machine learning algorithms.

| Number of Tests | Decision Tree | SVM | KNN | Neural Network |
|---|---|---|---|---|
| 1 | 95.00% | 99.70% | 98.70% | 99.80% |
| 2 | 95.10% | 99.70% | 98.70% | 99.80% |
| 3 | 94.80% | 99.70% | 98.80% | 99.80% |
| 4 | 95.10% | 99.70% | 98.80% | 99.80% |
| 5 | 95.10% | 99.70% | 98.80% | 99.80% |
| Average Value | 95.02% | 99.70% | 98.76% | 99.80% |
| Average Calculation Time | 42 s | 54 s | 36 s | 47 s |

Table 5. Classification accuracy of rice in different states according to different machine learning algorithms.

| Rice State | Decision Tree | SVM | KNN | Neural Network |
|---|---|---|---|---|
| Rice Seeds | 69.64% | 95.24% | 92.46% | 98.10% |
| Brown Rice | 85.92% | 94.54% | 95.60% | 98.40% |
| Rice Flour | 95.02% | 99.70% | 98.76% | 99.80% |

Table 6. Classification results of fused rice data in different states according to different machine learning algorithms.

| Number of Tests | Decision Tree | SVM | KNN | Neural Network |
|---|---|---|---|---|
| 1 | 96.70% | 98.70% | 98.00% | 98.80% |
| 2 | 97.10% | 98.70% | 97.90% | 98.70% |
| 3 | 96.80% | 98.60% | 97.90% | 98.70% |
| 4 | 96.80% | 98.70% | 97.90% | 98.70% |
| 5 | 97.00% | 98.70% | 97.90% | 98.80% |
| Average Value | 96.88% | 98.68% | 97.92% | 98.74% |
| Average Calculation Time | 140 s | 328 s | 89 s | 164 s |

Table 7. Table of classification results of rice varieties according to different machine learning algorithms after extracting feature points by principal component analysis.

| Number of Tests | Decision Tree | SVM | KNN | Neural Network |
|---|---|---|---|---|
| 1 | 97.60% | 98.80% | 98.10% | 98.90% |
| 2 | 97.60% | 98.80% | 98.10% | 99.00% |
| 3 | 97.60% | 98.80% | 98.10% | 99.00% |
| 4 | 97.60% | 98.80% | 98.10% | 98.90% |
| 5 | 97.60% | 98.80% | 98.10% | 98.90% |
| Average Value | 97.60% | 98.80% | 98.10% | 98.94% |
| Average Calculation Time | 10 s | 15 s | 8 s | 12 s |

Table 8. Classification accuracy of rice according to different machine learning algorithms.

| Data Set | Decision Tree | SVM | KNN | Neural Network |
|---|---|---|---|---|
| Rice Seeds | 69.64% | 95.24% | 92.46% | 98.10% |
| Brown Rice | 85.92% | 94.54% | 95.60% | 98.40% |
| Rice Flour | 95.02% | 99.70% | 98.76% | 99.80% |
| Fusion | 96.88% | 98.68% | 97.92% | 98.74% |
| Dimensionality Reduction | 97.60% | 98.80% | 98.10% | 98.94% |

Table 9. Prediction results based on regression of standardised values of amylose content in rice of different origins (80% of data comprised the training set and 20% comprised the test set).

| Metric | Rice Seeds | Brown Rice | Rice Flour |
|---|---|---|---|
| RMSEC | 0.50874 | 0.44905 | 0.44906 |
| RMSEP | 0.49803 | 0.43711 | 0.38316 |
| R² | 0.82 | 0.85 | 0.89 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Qiu, Y.; Tan, Y.; Zhou, Y.; Li, Z.; Miao, Z.; Li, C.; Mei, X.; Liu, C.; Teng, X. Study on Rice Origin and Quality Identification Based on Fluorescence Spectral Features. Agriculture 2024, 14, 1763. https://doi.org/10.3390/agriculture14101763

