1. Introduction
Traceability technology for determining the origin of agricultural products provides a technical and theoretical basis for tracing and verifying geographical indication products, as well as regional or specialty products that have gained a reputation. The methodologies and conceptual frameworks underpinning this research have been developed, and a technical system is gradually being established. The feasibility of near-infrared (NIR) spectroscopy for tracing the origin of agricultural products has been widely studied, both domestically and abroad. For example, Son et al. explored its use in rice [1]. Sweet potatoes were studied in 2022 [2], followed by apples [3], edible oils [4], camellia oils [5], mung beans [6], and coffee [7]. Qian et al. used near-infrared spectroscopy to collect spectral data from rice of five origins, including Wuchang and Jiamusi in Heilongjiang Province. The data were then subjected to partial least squares (PLS) analysis to establish an origin discrimination model. The accuracies in discriminating the origin of rice were 95.83%, 100.00%, and 95.83%, respectively, with an overall discrimination accuracy exceeding 95.00%. The study thus traced the origin of rice using NIR spectral analysis [8]. In a study by Liu et al., near-infrared spectroscopy was used to collect spectral data from red Fuji apples sourced from three geographical locations (Xinjiang, Shanxi, and Shandong). Before analysis, these data underwent preprocessing steps, including normalization and multiplicative scatter correction (MSC). A model for origin discrimination was developed using principal component analysis (PCA) with the K-nearest neighbor (KNN) method. The accuracies of discriminating the origin of red Fuji apples reached 97.30% and 92.30%, respectively, effectively achieving origin traceability of red Fuji apples [9]. Xia et al. assessed shiitake from different origins using near-infrared spectroscopy. They established discriminant models for the provinces of Jilin, Hubei, and Fujian by preprocessing the raw spectra and applying PLS discriminant analysis. The discrimination accuracies of the model were 96.70%, 95.60%, and 100.00% for Jilin, Hubei, and Fujian Provinces, respectively. This method provides a novel approach for authenticating the origin of shiitake [10].
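The studies above report one accuracy per origin alongside an overall accuracy. As a minimal sketch of how such figures are usually computed from true and predicted labels, the following uses made-up labels; the function name and the toy data are assumptions for illustration and do not reproduce any cited result.

```python
from collections import defaultdict

def per_origin_accuracy(true_labels, predicted_labels):
    """Return ({origin: accuracy}, overall_accuracy) for paired label lists."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    per_origin = {origin: correct[origin] / total[origin] for origin in total}
    overall = sum(correct.values()) / len(true_labels)
    return per_origin, overall

# Hypothetical labels: four samples per origin, one Hubei sample misclassified.
truth = ["Jilin"] * 4 + ["Hubei"] * 4
preds = ["Jilin"] * 4 + ["Hubei", "Hubei", "Hubei", "Jilin"]
per_origin, overall = per_origin_accuracy(truth, preds)
```

Reporting both figures matters because a high overall accuracy can hide one poorly discriminated origin.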
In a study of palm sugar using near-infrared spectroscopy, Rismiwandira et al. developed a discriminative model for the adulteration of palm sugar by combining PCA and PLS regression analysis after five preprocessing methods (First Derivatives, Vector Normalization, Standard Normal Variate (SNV), MSC, and Baseline). The accuracy of discriminating palm sugar adulteration was 91.00% [11]. Zhuang et al. collected raw spectral data from 60 green tea samples using near-infrared spectroscopy and applied Smoothing, Derivatives, Vector Normalization (VN), and PCA to improve the performance of the discriminant model, which was constructed using a Support Vector Machine (SVM). The discrimination accuracy for the green tea samples was 96.11%. The combination of near-infrared spectroscopy with appropriate chemometrics is thus an effective method for determining the origin of green tea [12]. NIRS relies on the selective absorption of near-infrared light by specific functional groups (such as C–H, O–H, and N–H) in substances. Different geographical locations and growth conditions can affect the distribution of these functional groups, producing characteristic spectral features. Analyzing these spectra reveals the chemical compositions specific to different regions, which forms the theoretical basis of origin tracing by near-infrared spectroscopy. It is therefore possible to distinguish geographical origins for different research objects and purposes using NIR spectroscopy. However, this technology is affected by numerous factors. (a) Equipment quality: If the light source of the NIR equipment is unstable, the intensity and wavelength of the emitted near-infrared light may fluctuate, making it difficult to accurately measure how the sample absorbs and reflects light. (b) Calibration issues: When a chemical substance is analyzed, wavelength calibration errors can shift the position of characteristic absorption peaks, and the model may fail to identify them correctly. (c) Environmental interference (e.g., humidity, temperature): When the humidity is high, the surfaces of optical components (lenses, mirrors, etc.) in NIR devices may adsorb water vapor, shifting the focusing position of light passing through the lens, changing the interaction between the light and the sample, and thereby affecting the collection of spectral data. A rise in temperature may also cause inaccurate focusing of light, affecting the accuracy of spectral measurements and ultimately reducing the accuracy of the model.
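Two of the preprocessing methods named above, SNV and MSC, are simple enough to sketch directly. The following is a minimal illustration in plain Python, assuming each spectrum is a list of absorbance values on a common wavelength grid and that MSC uses the mean spectrum as its reference; the function names are illustrative and are not taken from any of the cited studies.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(x - m) / s for x in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n_points = len(spectra[0])
    ref = [mean(s[i] for s in spectra) for i in range(n_points)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s = a + b * ref, then remove offset a and slope b.
        b = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        a = s_mean - b * ref_mean
        corrected.append([(x - a) / b for x in s])
    return corrected

# Toy example: the second spectrum is a scaled and shifted copy of the first,
# mimicking a pure scatter effect, so MSC should map both onto the same curve.
base = [0.2, 0.5, 0.9, 0.4]
spectra = [base, [2 * x + 1 for x in base]]
corrected = msc(spectra)
z = snv(base)
```

SNV works on each spectrum independently, whereas MSC needs a reference spectrum, so an MSC reference estimated on calibration samples would normally be reused for new samples.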
Meanwhile, NIR spectral analysis models have broad application prospects in agricultural product origin tracing, but they also face challenges such as overfitting, limited robustness, and variability under different conditions. (a) Overfitting is a common problem in machine learning. In NIR spectral analysis, it may be caused by various factors (insufficient training data, excessive or poorly tuned model parameters, etc.). (b) Model robustness may be affected by sensitivity to differences in sample type, composition, and structure, which may decrease the predictive accuracy of the model. (c) Conditions such as growing seasons or environmental factors vary, and the resulting changes in the growth rate and quality of the samples affect the accuracy of the NIR spectral analysis model. The model needs to adapt to these changes to maintain high prediction accuracy under different conditions.
In a study by Qian et al., NIR diffuse reflectance spectroscopy was used to investigate the origin traceability of rice samples from the Jiansanjiang and Wuchang regions of China. The origin traceability model achieved discrimination accuracies of 100.00% and 98.00% for Jiansanjiang and Wuchang rice, respectively [13]. Similarly, Firmani et al. developed a discriminant model for black tea of different origins based on NIR spectra, using Partial Least Squares Discriminant Analysis (PLS-DA) to differentiate black tea from Darjeeling from that of other regions with an accuracy of 98.57%. The findings of that study indicate a robust correlation between the accuracy of origin discrimination models and geographical origin [14]. Zhang et al. used near-infrared spectroscopy to distinguish six tea categories: oolong, black, white, green, dark, and yellow. Except for yellow tea, the spectral characteristics of the tea types varied considerably in their three-dimensional spatial distribution. This variation may be attributed to differences in processing techniques and tea tree varieties [15]. Yu et al. used near-infrared spectroscopy to gather spectral data from Citri reticulata pericarpium samples aged 5, 10, 15, 20, and 25 years. After applying various spectral preprocessing techniques, they constructed a discriminant model using Fisher’s linear discriminant (FLD) analysis to differentiate samples of different years. The discrimination accuracies were 90.00%, 100.00%, 90.00%, 100.00%, and 100.00%, respectively, indicating that sample age affects the results of a traceability model based on near-infrared spectroscopy [16]. Previous studies have thus shown that agricultural products analyzed using near-infrared spectroscopy vary in chemical composition and physical properties owing to factors such as origin, variety, year, and processing technology. In those studies, appropriate models were established from the characteristic information obtained by near-infrared spectroscopy, and the impact of these factors on origin discrimination was analyzed. However, there is a paucity of research on how the distance range of the sample origins, the quantity of samples collected, and the shelf life of the samples affect origin discrimination. These variables are therefore a pivotal aspect of research on traceability technology for agricultural products, and their impact on the origin traceability model should be evaluated.
In this study, we used near-infrared spectroscopy (NIRS) to discriminate between mung bean origins. Mung bean samples were collected from four sources: Dorbod Mongol Autonomous County and Tailai County of Heilongjiang Province, Baicheng City of Jilin Province, and Sishui County of Shandong Province. The raw spectra of the samples were preprocessed using a range of methods. An NIR discrimination model for mung bean samples of different origins was then established using PCA combined with the KNN method to confirm the feasibility of tracing mung bean origin with NIR spectroscopy. Furthermore, the impact of sample quantity, traceability range, and shelf life on the resulting origin traceability discriminant models was analyzed. Analyzing the influence of these factors on the mung bean origin discrimination model offers a theoretical foundation for establishing a technical standard system for the traceability of diverse agricultural products, including high-value-added ones.
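The PCA combined with KNN approach described above can be sketched in a few dozen lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names, the power-iteration PCA, the choice of two components and k = 3, and the tiny four-point "spectra" are all hypothetical.

```python
import math
from collections import Counter

def center(rows):
    """Subtract the column means; return the centred rows and the means."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mu[j] for j in range(d)] for r in rows], mu

def covariance(centered):
    """Sample covariance matrix of already centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_components(cov, k, iters=500):
    """Leading k eigenvectors and eigenvalues by power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    comps, eigvals = [], []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # fixed, deterministic start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        eigvals.append(lam)
        for i in range(d):  # deflate: remove the component just found
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return comps, eigvals

def project(rows, mu, comps):
    """PC scores of raw rows, using the training means and components."""
    return [[sum((r[j] - mu[j]) * c[j] for j in range(len(mu))) for c in comps]
            for r in rows]

def knn_predict(train_scores, train_labels, query, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(zip(train_scores, train_labels),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical preprocessed spectra (four wavelengths) from two made-up origins.
train = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.9, 3.1, 3.9],
         [3.0, 1.0, 4.0, 2.0], [3.1, 0.9, 4.2, 2.1], [2.9, 1.1, 3.9, 1.9]]
labels = ["origin_A"] * 3 + ["origin_B"] * 3

centered, mu = center(train)
comps, eigvals = top_components(covariance(centered), k=2)
train_scores = project(train, mu, comps)
queries = project([[1.05, 2.0, 3.0, 4.1], [3.0, 1.0, 4.1, 2.0]], mu, comps)
predicted = [knn_predict(train_scores, labels, q, k=3) for q in queries]
```

New samples are projected with the training means and components, so that no information from the test set leaks into the model.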
4. Conclusions
In this study, spectra of mung bean samples of different origins were acquired using NIRS. The objective was to analyze, by means of principal components (PCs), the near-infrared spectral data of mung beans of different origins after the spectra had undergone different preprocessing methods. The cumulative variance contribution rate of the first three PCs was 98.16%, so they represented most of the near-infrared spectral information present in the mung beans of different origins. Furthermore, the accuracy of the mung bean origin discrimination model established by combining PCA with the KNN method was 98.25%. This demonstrates the feasibility of using NIRS with the KNN method to discriminate mung beans of different origins. Nevertheless, the results are affected by several variables, including the sample quantity, sample traceability scale, and shelf life.
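For reference, the cumulative variance contribution rate quoted above is conventionally defined from the eigenvalues of the covariance matrix of the preprocessed spectra. The notation below is assumed here rather than taken from the original: $\lambda_i$ is the $i$-th largest eigenvalue and $p$ is the number of spectral variables.

```latex
\[
\mathrm{CVCR}(k) \;=\; \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{p} \lambda_j} \times 100\%
\]
```

Under this definition, the reported 98.16% corresponds to $\mathrm{CVCR}(3)$, that is, the share of total spectral variance retained by the first three PCs.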
The impacts of sample quantity, traceability scale, and shelf life on the NIR traceability discrimination results for mung beans of different origins were investigated. (a) The origin discrimination model was more accurate with a larger sample quantity (600) than with a smaller one (200). The accuracy of origin discrimination can therefore be improved by using the largest feasible number of samples. (b) The origin traceability discrimination model was more accurate at the larger traceability scale (Tailai-Sishui) than at the smaller traceability scales (Tailai-Baicheng, Tailai-Dorbod Mongol Autonomous County). (c) The origin discrimination model based on multiple shelf lives (90–180–270–360 d) was more accurate than models based on a single shelf life (90, 180, 270, or 360 d). These findings show that shelf life is a crucial factor in developing an effective origin discrimination model. The model can effectively discriminate the origin of mung beans and provides novel insights and research avenues for the study of the traceability of agricultural products.
Although this study investigated the influence of the traceability scale, shelf life, and other factors on the discrimination results for mung beans of different geographical origins, accurate discrimination remains challenging when the traceability scale is small. Origin, climate, soil, water, and other factors can differ greatly among samples [35,36]. The annual average temperatures and soil types of the three geographical origins in Northeast China (Jilin, Tailai, and Dorbod Mongol Autonomous County) are similar (4.5–6.0 °C in 2022–2023; caltunica nigritesoil, meadow soil, and sand soil), whereas the annual average temperature in Shandong (Sishui) was 13.4 °C (2022–2023) and the soil types there are brown loam and brown soil. This is the main reason for the small differences in near-infrared feature information among the small-scale geographical origins in Northeast China. Kaoru et al. used high-resolution inductively coupled plasma mass spectrometry to determine the mineral element contents in rice from a large area (Japan, Thailand, the United States, and China). The discrimination accuracy of the origin traceability model based on these origins was 97.0% [37]. Shi et al. used mineral element fingerprint technology to determine the mineral element contents in rice from small areas (Songjiang, Jinshan, and Chongming), and the discrimination accuracy of the origin traceability model based on these origins was 92.1% [38]. These results, with higher accuracy at the larger scale than at the smaller scale, are consistent with the conclusions of this study. Therefore, samples of different agricultural products from additional regions are needed to further examine how the sample traceability scale affects origin identification.
Furthermore, this study showed that the overall correct discrimination rate for mung beans of different origins is related to the shelf lives of the samples used to build the discrimination model. Over time, components such as the moisture and protein in a sample may change slowly [39,40], which affects the collected near-infrared spectral data and the discrimination results of the established model. Even at the same shelf life, the storage conditions of samples of different origins (such as humidity, temperature, and light) may affect the model’s discrimination performance differently. Meanwhile, because natural and anthropogenic environments differ, the ease with which origin can be traced and distinguished may vary among samples of different origins and among agricultural products. Consequently, the impact of shelf life must be considered when developing future technical standard systems for the traceability of agricultural products. Sample collection for research on the traceability of agricultural products should aim to include numerous samples with different shelf lives to enhance the stability and accuracy of the discriminative model.