Fluorescence Spectroscopy and a Convolutional Neural Network for High-Accuracy Japanese Green Tea Origin Identification

Akiyama, Rikuto; Suzuki, Kana; Llave, Yvan; Matsumoto, Takashi

doi:10.3390/agriengineering7040095

Open Access Technical Note

Department of Food Science and Technology, Tokyo University of Marine Science and Technology, 4-5-7 Konan, Minato-ku, Tokyo 108-8477, Japan

AgriEngineering 2025, 7(4), 95; https://doi.org/10.3390/agriengineering7040095

Submission received: 28 January 2025 / Revised: 17 March 2025 / Accepted: 18 March 2025 / Published: 1 April 2025

(This article belongs to the Special Issue The Future of Artificial Intelligence in Agriculture)

Abstract

This study aims to develop a system combining fluorescence spectroscopy and machine learning through a convolutional neural network (CNN) to identify the origins of various Japanese green teas (Sayama tea, Kakegawa tea, Yame tea, and Chiran tea). Although food origin labeling is important for ensuring consumer quality and safety, accurate identification remains a priority for the food industry due to the emergence of problems with false origin labeling. In this study, image data of the fluorescent fingerprints of green teas were collected using fluorescence spectroscopy and analyzed using a CNN model implemented in Python (ver. 3.13.2), TensorFlow (ver. 2.18.0), and Keras (ver. 3.9). The fluorescence of each sample was measured in the range of 250 to 550 nm, highlighting the differences in chemical composition that reflect each region. Using these data, a CNN suitable for image recognition successfully identified the origins of the teas with an average accuracy of 92.83% in 10 trials. For Chiran tea and Yame tea, precision and recall rates of over 95% were achieved, showing clear differences from other regions. In contrast, the classification of Kakegawa and Sayama teas proved challenging due to their similar fluorescence patterns in the 300–350 nm spectral range, corresponding to catechins and polyphenolic compounds. These similarities are presumed to reflect the comparable growing conditions and processing methods characteristic of the two regions. This study shows the potential of this system in food origin identification, suggesting applications in preventing origin fraud and quality control. Future research will aim to extend the system to other regions and foods, enhance data preprocessing to improve accuracy, and develop a versatile identification system.

Keywords:

machine learning; convolutional neural network (CNN); fluorescence spectroscopy; fluorescence fingerprinting; food origin identification; food variety identification; green tea

1. Introduction

Food origin labeling is an important factor for consumers when assessing the quality and safety of products. In Japan, the origin labeling of ingredients in processed foods has been mandatory since April 2022, and interest in the accuracy of origin labeling has been increasing [1]. This trend is not limited to Japan but is observed globally. Food-origin labeling is required in the European Union (EU) under EU Law [2] and in the United States under the Code of Federal Regulations [3]. These regulations have enhanced transparency and reliability in the food industry and enabled consumers to make informed choices. However, despite the widespread adoption of country-of-origin labeling, origin fraud remains a notable issue. This includes practices such as labeling lower-value products with the origin of high-value ones, which undermines consumer trust and poses safety risks. Therefore, reliable methods to verify the authenticity of food origins and support accurate labeling are urgently required.

In recent years, green tea (leaf tea) production in Japan has been facing issues. The main factors are a decline in consumption, a decline in producers due to aging, and competition between domestic products and imported tea. In particular, consumer preferences have shifted to bottled tea, and the demand for traditional green tea has declined. Under these circumstances, identification of the origin of green tea plays an important role in protecting regional brands and maintaining market competitiveness.

Identifying origins and ensuring food quality are critical issues in the food industry from the perspectives of preventing food fraud, maintaining quality, and ensuring consumer trust. Determining the geographic origin of food is essential for protecting regional brands and ensuring transparency in import and export transactions. Many studies have been conducted in this field.

Various studies have been conducted to identify the geographical origins of food such as olive oil, honey, coffee beans, white wines, and rice [4,5,6,7,8]. The methods used to distinguish these origins include front-face fluorescence, ultraviolet (UV-VIS), and fluorescence excitation–emission matrix (EEM) spectroscopy.

Several studies have explored effective methods to address food fraud and prevent adulteration, such as with olive oils, cumin powder, pomegranate juice, herbs and spices, and skim milk powder [9,10,11,12,13,14]. To achieve this, spectroscopy methods such as three-dimensional fluorescence (3D-EEM), front-face fluorescence, Raman spectroscopy, near-infrared spectroscopy, and UV-VIS were used.

Various machine learning methods have been developed and combined with other methods to identify the geographic origins of food, including terahertz spectroscopy, a fluorescence sensor array using principal component analysis and random forest (RF), Raman spectroscopy, UV-VIS, an absorption-transmission fluorescence cold-field emission matrix, and image processing [15,16,17,18,19,20,21,22,23].

Several studies have focused on tea. In a study using hyperspectral imaging technology, Chen et al. [24] presented a method to identify the origin of Pu-erh tea, and Zou et al. [25] proposed a method to classify Chinese tea varieties and detect adulteration with other varieties. Bi et al. [26] developed a sensor array to identify phenols and ketones, suggesting its possible application for identifying the geographic origin of Pu-erh tea and detecting adulteration. Wei et al. [27] used mid-infrared spectroscopy to identify tea varieties and predict their aging periods. Fang et al. [28] demonstrated a method for identifying adulteration in Wuyi rock tea using EEM. Luo et al. [29] developed a method for predicting the content of catechins and other components with high accuracy by combining multivariate regression analysis and support vector machine (SVM) regression using NIR. Hou et al. [30] demonstrated a method to identify the geographic origin of Chinese Longjing tea by building a machine learning model using 1H NMR and RF algorithms.

Based on the findings of previous studies, this study aims to provide new academic and practical contributions to the literature. In previous studies, there have been limited applications of fluorescence spectroscopy to green tea, and few studies have attempted to analyze data with high accuracy using machine learning techniques. In this study, a novel approach that uses fluorescence fingerprint data obtained through fluorescence spectroscopy was introduced to identify the origin of green tea produced in Japan. Additionally, this study aims to extract origin-specific information from fluorescence fingerprint data with high accuracy using a convolutional neural network (CNN), a type of deep machine learning. CNNs have an excellent ability to automatically learn the features of complex patterns and are expected to have a higher classification accuracy than that of conventional machine learning algorithms. By applying a CNN to green tea origin identification, this study seeks to enhance the accuracy of origin identification. Furthermore, this study proposes a highly practical identification method for preventing food fraud and ensuring quality assurance.

2. Materials and Methods

2.1. Target Foods

Samples were collected from four major green tea-producing regions in Japan: Sayama (Saitama Prefecture), Yame (Fukuoka Prefecture), Chiran (Kagoshima Prefecture), and Kakegawa (Shizuoka Prefecture). Figure 1 shows the geographic distribution of the green tea-producing areas covered in this study. These regions were selected because they are well-known tea-producing areas. Three green tea products were selected from each region, resulting in 12 products. To ensure diversity within each region, the products differed in manufacturer, farm, or expiration date, and local producers were selected so that the samples reflected regional characteristics. Because the samples were branded products with recognized regional characteristics, they were assumed to represent the fluorescence properties of each region, and no exploratory analysis was performed to compare the fluorescence properties within and between regions. The samples were leaf teas that require extraction with hot water through a tea strainer, not the powdered type that is dissolved in hot water and consumed as-is. After the samples were ground and homogenized in a mill, fluorescence fingerprints were measured using a fluorescence spectrophotometer. For each product, 100 fluorescence fingerprints were collected, yielding a dataset of 1200 fingerprints (300 per region). This dataset was used to identify the differences in chemical and physical properties between the production regions using machine learning techniques.

2.2. Analysis Method

A fluorescence spectrophotometer (F-7100; Hitachi High-Tech Science, Ibaraki, Japan) was used to measure the fluorescence of each sample across excitation and emission wavelengths in the range of 250–550 nm (including UV and part of the visible spectrum). To ensure accuracy and reliability, the spectrophotometer was maintained and inspected by the manufacturer, including calibration once a year, and the zero point was adjusted before each measurement. Repeated measurements were taken for each sample to evaluate reproducibility, and the average value was used for further analysis. These procedures minimized instrument variation and ensured the consistency of the data used for machine learning.

Before measurement, the green tea samples were homogenized by grinding for approximately 1 min (10 s, six times) using a mill (DR MILLS, DM-7452, Guangzhou, Guangdong, China) to obtain uniform fluorescence properties. The ground samples were stored at room temperature (approximately 25 °C) in an airtight container to minimize exposure to environmental factors known to alter fluorescence properties, such as humidity and oxidation. This controlled environment ensured the consistency and reproducibility of the fluorescence measurements. Changes in temperature and humidity can affect the chemical composition of tea, particularly catechins and chlorophyll, leading to changes in fluorescence intensity. To maintain these controlled conditions and ensure accurate data collection, the samples were placed in a fluorescence spectrophotometer immediately before measurement.

2.3. Development of Machine Learning Model

2.3.1. Data Acquisition and Preprocessing

The fluorescence fingerprint data of the green tea samples were measured using a spectrofluorometer. Both the excitation and emission wavelength ranges were set to 250–550 nm, with an excitation sampling interval of 10.0 nm and an emission sampling interval of 5.0 nm. The photomultiplier tube voltage was set to 680 V. The spectra were acquired under these measurement conditions, and a 3D fluorescence matrix (FD3 file) was created for each sample. The FD3 files were converted into TXT format using FL Solution 4.2 and subsequently into two-dimensional (2D) images (PNG format) using Python (ver. 3.13.2). Scattered light was removed by applying predefined parameters, with the primary and secondary light removal widths set to 30 nm. The fluorescence intensity values of all the samples were normalized to a consistent scale by extracting the maximum and minimum intensity values (2281 and 241, respectively) and mapping them to grayscale brightness levels. Each 2D image was then cropped to a square (360 pixels vertical × 360 pixels horizontal) using a Python program to remove the scales common to all images and unnecessary white space. The 2D images were given four-digit filenames (0001–1200), and the class (origin) corresponding to each filename was recorded in a CSV file. These 2D images and the CSV file were used as input data for the machine-learning model.
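The arithmetic behind this preprocessing can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' script: the function names are hypothetical, the scatter rule (masking points within the removal width of the lines em = ex and em = 2 × ex) is one common reading of "primary and secondary light removal", and the direction of the grayscale mapping (higher intensity is brighter) is assumed.

```python
# Illustrative sketch of the preprocessing arithmetic; not the authors' script.

EX_START, EX_STOP, EX_STEP = 250, 550, 10   # excitation grid in nm (from the text)
EM_START, EM_STOP, EM_STEP = 250, 550, 5    # emission grid in nm (from the text)
I_MIN, I_MAX = 241, 2281                    # global intensity scale (from the text)
SCATTER_WIDTH = 30                          # removal width in nm (from the text)


def wavelength_grid(start, stop, step):
    """Return the inclusive list of sampled wavelengths in nm."""
    return list(range(start, stop + 1, step))


def is_scatter(ex_nm, em_nm, width=SCATTER_WIDTH):
    """Assumed rule: flag points within `width` nm of the first-order
    (em = ex) or second-order (em = 2 * ex) scattering line."""
    return abs(em_nm - ex_nm) <= width or abs(em_nm - 2 * ex_nm) <= width


def to_gray(intensity, lo=I_MIN, hi=I_MAX):
    """Linearly map an intensity to an 8-bit gray level, clipping
    values that fall outside the [lo, hi] scale."""
    scaled = (intensity - lo) / (hi - lo)
    scaled = min(1.0, max(0.0, scaled))
    return round(scaled * 255)


excitation = wavelength_grid(EX_START, EX_STOP, EX_STEP)
emission = wavelength_grid(EM_START, EM_STOP, EM_STEP)
print(len(excitation), "excitation x", len(emission), "emission points")
print("gray levels at min and max intensity:", to_gray(I_MIN), to_gray(I_MAX))
print("scatter at ex=300, em=310:", is_scatter(300, 310))
```

With the stated sampling intervals, each fingerprint is a 31 × 61 grid of intensities before it is rendered as an image. Using one fixed intensity scale for every sample keeps the gray levels comparable across images.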

2.3.2. Creation and Division of the Dataset

The preprocessed fluorescence fingerprint image data were labeled for supervised learning. In the accompanying CSV file, labels for the image data from each production area were assigned numerical values: ‘0’ for Kakegawa tea, ‘1’ for Sayama tea, ‘2’ for Chiran tea, and ‘3’ for Yame tea. One-hot encoding was applied to represent the classes as binary vectors so that the numerical labels did not imply any ordering among the origins and the targets matched the categorical cross-entropy loss used for training.
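As a minimal sketch of this labeling scheme, the snippet below turns the integer labels into one-hot vectors. The label values come from the text; the names `LABELS` and `one_hot` are hypothetical.

```python
# Label assignment taken from the text; helper names are illustrative.
LABELS = {"Kakegawa": 0, "Sayama": 1, "Chiran": 2, "Yame": 3}


def one_hot(label, num_classes=len(LABELS)):
    """Return a binary vector with a single 1 at position `label`."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec


for origin, label in LABELS.items():
    print(f"{origin:9s} {label} {one_hot(label)}")
```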

The data were split into training and test sets in a ratio of 80:20 using stratified sampling. The training set was further split into training and validation subsets in an 80:20 ratio so that the validation data were used solely for model evaluation during training and remained independent of the test data. To ensure a robust evaluation and balanced data distribution, stratified k-fold cross-validation was conducted by dividing the training data into five subsets (folds). Validation accuracy was adopted as the primary metric for evaluating the model trained on each fold and for weighting its contribution to the final ensemble prediction.
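The split sizes implied by this procedure can be checked with a short standard-library sketch. It assumes 300 images per origin (three products × 100 fingerprints) and reads the 80:20 training/validation split as each of the five folds serving once as the validation subset. The helper names and the random seed are hypothetical, and the authors' actual tooling is not stated in the text.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, held_out_fraction, rng):
    """Return (rest_idx, held_out_idx) with the class ratio preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rest, held_out = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_held = round(len(members) * held_out_fraction)
        held_out.extend(members[:n_held])
        rest.extend(members[n_held:])
    return rest, held_out


def stratified_folds(indices, labels, k, rng):
    """Deal each class's indices round-robin into k folds."""
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


rng = random.Random(0)                               # arbitrary seed
labels = [c for c in range(4) for _ in range(300)]   # 4 origins x 300 images
train_val_idx, test_idx = stratified_split(labels, 0.20, rng)
folds = stratified_folds(train_val_idx, labels, 5, rng)

print("train+validation:", len(train_val_idx), "test:", len(test_idx))
print("fold sizes:", [len(f) for f in folds])
print("classes in fold 0:", dict(Counter(labels[i] for i in folds[0])))
```

Under these assumptions, the test set holds 240 images (60 per origin), and each fold holds 192 images (48 per origin), so every model is trained on 768 images and validated on 192.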

2.3.3. Model Construction

The CNN was implemented using Python with the TensorFlow and Keras libraries. The model architecture is described as follows:

Convolutional layer: In the first convolutional layer, 16 filters (3 × 3 kernel size) were used and the ReLU activation function was applied to extract local patterns in the image. Furthermore, L2 regularization was applied to penalize the weight magnitude, suppressing overfitting and improving model stability. The value of the L2 coefficient was fixed based on the results of prior trials to reduce computational cost and ensure consistency in the model performance.
Pooling layer: A 2 × 2 max pooling was performed to shrink the feature map and retain important information. This reduced the computational load and suppressed overfitting.
Additional convolutional and pooling layers: Convolutional layers with 32 and 64 filters were combined with subsequent pooling layers to extract higher-order features. L2 regularization was also applied to these convolutional layers.
Fully connected layer: After the feature maps were converted into one dimension using a flattening layer, a fully connected layer with 256 units was used for the final classification. L2 regularization was also applied to the fully connected layer.
Dropout layer: To reduce overfitting and improve robustness further, a dropout layer was added, randomly disabling 30% of the output of the fully connected layer.
Output layer: A SoftMax activation function was employed in the output layer to calculate the probabilities of each class, predicting the origin of the sample based on the highest probability.
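To make the layer stack above concrete, the following sketch traces the feature-map size and counts the trainable parameters. It relies on several assumptions that the text does not state: a single-channel (grayscale) 360 × 360 input, unpadded stride-1 convolutions, 3 × 3 kernels in all three convolutional layers (the text gives the kernel size only for the first), a 2 × 2 pooling layer after each convolution, and four output units. Different padding or kernel choices would change the numbers.

```python
def conv_out(size, kernel=3):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Spatial size after non-overlapping max pooling."""
    return size // window


def conv_params(in_channels, out_channels, kernel=3):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return (n_in + 1) * n_out


size, channels = 360, 1            # assumed grayscale input
total = 0
for filters in (16, 32, 64):       # filter counts from the text
    total += conv_params(channels, filters)
    size = pool_out(conv_out(size))
    channels = filters
    print(f"after conv({filters}) + pool: {size} x {size} x {channels}")

flat = size * size * channels
total += dense_params(flat, 256) + dense_params(256, 4)
print("flattened features:", flat)
print("trainable parameters:", total)
```

Under these assumptions, almost all of the parameters sit in the 256-unit fully connected layer, which is consistent with the text applying both L2 regularization and dropout at that stage.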

2.3.4. Training and Optimization

An Adam optimizer was used to train the model. Adam adapts the step size for each parameter using running estimates of the first and second moments of the gradients, thereby enabling efficient and effective optimization. Categorical cross-entropy was selected as the loss function because it is appropriate for multiclass classification problems. The batch size was set to 32. Early stopping was introduced to prevent overfitting and improve learning efficiency. Validation accuracy was used as the monitoring indicator because the dataset had a constant number of samples per class, and the model accuracy was of primary importance. The patience parameter was set to 10, meaning that training stopped if no improvement in validation accuracy was observed within 10 epochs. The maximum number of epochs was set to 100, so training proceeded until no further improvement in the validation accuracy was detected or this limit was reached. The model was configured to restore the best weights (those with the highest validation accuracy) at the end of the training. Cross-validation was employed during training to enhance the stability of the model.
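The early-stopping rule described here can be illustrated without a deep learning library. The function below is a simplified stand-in for the usual callback logic, not the authors' code, and the validation curve is hypothetical, chosen only to show how a patience of 10 relates the stopping epoch to the epoch whose weights are restored.

```python
def early_stopping(val_accuracies, patience=10, max_epochs=100):
    """Return (stopped_epoch, best_epoch), both 1-indexed.

    Training halts once `patience` consecutive epochs fail to beat the best
    validation accuracy seen so far; the weights from `best_epoch` would
    then be restored.
    """
    best_acc, best_epoch, wait = float("-inf"), 0, 0
    stopped_epoch = 0
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        stopped_epoch = epoch
        if acc > best_acc:
            best_acc, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return stopped_epoch, best_epoch


# Hypothetical curve: steady improvement for 25 epochs, then a plateau.
history = [0.50 + 0.018 * e for e in range(1, 26)] + [0.90] * 40
stopped, best = early_stopping(history)
print(f"stopped at epoch {stopped}, best weights from epoch {best}")
```

In this toy run, training stops 10 epochs after the last improvement, so the restored weights come from 10 epochs before the stopping point.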

2.3.5. Model Performance Evaluation

Precision: The percentage of green tea samples predicted to originate from a particular region that actually originated in that region.

(True Positive)/(True Positive + False Positive)

Recall: The percentage of green tea samples that actually originated from a particular region and were correctly predicted to originate from that region. This index was used to evaluate how well the model detected the actual positive class (a specific region).

(True Positive)/(True Positive + False Negative)

F1 Score: This represents the harmonic mean of the recall and precision and is an index used to evaluate the overall balance of the model. This index decreases when recall is high but precision is low or when precision is high but recall is low; therefore, it is suitable for measuring the performance of a balanced model.

2 × (Precision × Recall)/(Precision + Recall)
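These three indices can be computed per class directly from a confusion matrix, as in the sketch below. The matrix is a toy example with made-up counts for three classes, not data from this study, and the function name is hypothetical.

```python
def per_class_metrics(cm, cls):
    """Precision, recall, and F1 for one class of a confusion matrix
    whose rows are true labels and whose columns are predicted labels."""
    tp = cm[cls][cls]
    fp = sum(cm[row][cls] for row in range(len(cm))) - tp
    fn = sum(cm[cls]) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy counts for illustration only (rows: true class, columns: predicted).
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]

for cls in range(len(toy_cm)):
    p, r, f1 = per_class_metrics(toy_cm, cls)
    print(f"class {cls}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In the toy matrix, class 0 loses recall because two of its samples are predicted as class 1, while class 1 loses precision for the same reason. This is the kind of asymmetry reported between the Sayama and Kakegawa teas in the results.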

In this study, ensemble learning was performed based on the prediction results of each model obtained from the five folds selected to balance the training and validation data. For the final evaluation, a weighted average method was used, with the validation accuracy serving as the primary metric for assigning weights to predictions from each fold. Specifically, the weight of each fold was calculated based on the validation accuracy of the models obtained for each fold, emphasizing the predictions from models with higher validation accuracy. The accuracy of the ensemble predictions of the test data was used as an evaluation metric for the model.
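One way to realize such a weighted average is sketched below. The text says only that the weights were based on each fold's validation accuracy, so normalizing the accuracies to sum to one is an assumption, and the probabilities and accuracies in the example are made up.

```python
ORIGINS = ["Kakegawa", "Sayama", "Chiran", "Yame"]  # label order from the text


def ensemble_predict(fold_probs, fold_val_acc):
    """Combine per-fold class probabilities for one sample.

    Assumed weighting: each fold's weight is its validation accuracy
    divided by the sum of all folds' validation accuracies.
    """
    total = sum(fold_val_acc)
    weights = [acc / total for acc in fold_val_acc]
    n_classes = len(fold_probs[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, fold_probs))
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=combined.__getitem__)
    return combined, predicted


# Hypothetical outputs of two fold models for a single test image.
fold_probs = [
    [0.60, 0.40, 0.00, 0.00],   # model from fold A
    [0.30, 0.70, 0.00, 0.00],   # model from fold B
]
fold_val_acc = [0.90, 0.95]

combined, predicted = ensemble_predict(fold_probs, fold_val_acc)
print([round(p, 3) for p in combined], "->", ORIGINS[predicted])
```

Because the weights sum to one, the combined vector remains a probability distribution, and the predicted origin is the class with the highest combined probability.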

2.3.6. Prediction for Unknown Data

For unknown samples, the trained model predicted the probability of each origin and assigned each sample to the origin with the highest probability. This approach allowed the model to identify the origins of previously unclassified samples, highlighting its practical applicability for origin identification.

3. Results

Results of Green Tea Origin Identification

Table 1 lists the accuracy and loss on the test data for the 10 trials in which the weighted average ensemble method based on the validation accuracy was applied. The average accuracy was 92.83%, confirming the high prediction accuracy of the origin. Moreover, the cross-validation showed that the model performance was stable, with a standard deviation of 0.01329 in accuracy. The validation accuracy between the folds was also consistent, confirming the robustness of the method. Furthermore, the average loss on the test data using the ensemble method was 0.8487, revealing that there was room for improvement in the quality of the prediction. However, the standard deviation of the loss was small, at 0.01376, confirming that the prediction performance of the model between trials was generally stable. Analysis of the confusion matrix based on the test data revealed the numerical values of the classification performance evaluation index and misclassification tendency for each origin (Table 2 and Table 3).

The precision, recall, and F1 scores of the Chiran tea and Yame tea were >95%, and classification was performed with high accuracy.

For the Kakegawa and Sayama teas, the recall rate for the Kakegawa tea and the precision rate for the Sayama tea were >93%, which were lower than those of the Chiran and Yame teas; however, the classification performance of the models was good.

In contrast, the precision rate for the Kakegawa tea was approximately 85% and the F1 score was approximately 87%, while the recall rate for the Sayama tea was approximately 81% and the F1 score was approximately 88%.

These indices were <90%, and the classification performances of the models for the Kakegawa and Sayama teas were inferior to those for the Chiran and Yame teas.

Regarding the misclassification tendency, the Sayama tea was most frequently misclassified as Kakegawa tea, at 10.3 times per trial. The Kakegawa tea was misclassified as Sayama tea at 3.4 times per trial. The Chiran and Yame teas were misclassified as having other origins less frequently than the Kakegawa and Sayama teas, with the Yame tea being most frequently misclassified as Sayama tea (2.0 times per trial). The Chiran tea was never misclassified as having another origin in any of the 10 trials.

In addition, in the analysis of the learning curve, as the number of epochs increased, the training loss and validation loss decreased while taking close values (the training and validation accuracies increased while taking close values) and the performance of the model improved. The learning curve for each fold in trial 1 is shown in Figure 2. Generally, overfitting begins when the number of epochs exceeds a certain value. In this study, early stopping was introduced, which made it possible to detect signs of overfitting and stop learning at the optimal number of epochs. The number of epochs in the model training was an average of 35 epochs, a maximum of 51 epochs, and a minimum of 20 epochs, and the computational load required for model convergence was approximately 1.5 times higher (Table 4). Although the number of epochs for each training run varied slightly, the validation accuracy was within a certain range, with an average of 0.9530 (rounded to four decimal places) and a standard deviation of 0.02041 (rounded to five decimal places). As the patience (waiting period) was set to 10, it was confirmed that the optimal weights were obtained in an average of 25 epochs. Moreover, the validation accuracy increased with the number of training epochs for the model (Figure 3).

Furthermore, analysis of the fluorescent fingerprint images revealed similarities and differences in the fluorescence characteristics of each production area (Figure 4). The Chiran and Yame teas showed clear fluorescence characteristics in some products and could be visually distinguished. However, the fluorescent properties of the Kakegawa and Sayama teas were very similar, making it difficult to visually distinguish them. These visual distinctions were quantitatively evaluated using statistical indicators, such as the mean fluorescence intensity and standard deviation within the region of interest for each tea origin. To further verify this distinction, confusion matrix analysis was employed to provide indicators such as precision, recall, and F1 score, and the degree of clarity of the distinction between the tea origins was quantitatively demonstrated. It was also confirmed that the fluorescence properties differ between products, even within the same origin and within the same product.

The fluorescent components in green tea include catechins and polyphenols. In the fluorescent fingerprint images of green tea, the fluorescence intensity increased in the ranges corresponding to the excitation and emission wavelengths of these substances, which was revealed to be the cause of the differences and similarities in the fluorescent properties of the green tea. A correlation was found between these fluorescent compounds and the classification results; the Chiran and Yame teas showed unique fluorescence intensity patterns associated with high concentrations of catechins and tea polyphenols [31,32]. These relationships suggest that highlighting these regions of interest during preprocessing or integrating chemical composition data into the model could enhance the classification accuracy and improve the model performance. However, the fluorescent components and wavelengths of green tea shown in Table 5 are data from green tea extracts; therefore, differences from this experiment, which used ground green tea, should be noted.

These results indicate that a combination of fluorescence spectroscopy and machine learning is an effective method for identifying the origins of green tea.

4. Discussion

In this study, the origins of green teas in Japan were identified by combining fluorescence fingerprint data with a CNN. The identification accuracy on the test data exceeded 92% (average of 10 trials), indicating a high accuracy. This indicates that the fluorescence fingerprint method is a promising approach for identifying the origins of food and reaffirms the effectiveness of fluorescence spectroscopy, consistent with the results of previous studies that identified the geographic origins of food [7,8]. Notably, the accuracy achieved was comparable with that of the studies by Yang et al. [15], who identified the geographic origins of coffee using terahertz spectroscopy, and Hou et al. [30], who identified the geographic origins of green tea using 1H NMR spectroscopy. Compared with those previous studies, this study is original in that it applied a CNN to identify the origins of green tea using fluorescence spectroscopy.

The loss of test data (0.8487) indicates acceptable performance; however, further improvement in the model’s predictive ability is necessary. The accuracy of identification varied depending on the place of origin, with low accuracy for the Sayama and Kakegawa teas. This was probably because the peak patterns of the fluorescence fingerprints of these teas were similar, providing only limited features for distinguishing their place of origin. In contrast, the Yame and Chiran teas, which had distinctive peak patterns, achieved high accuracy. These results suggest that the fluorescent properties of green tea are attributable to catechins and other fluorescent components.

Furthermore, previous studies have suggested that accuracy can be improved by integrating fluorescence fingerprinting and multi-component analysis. Lia et al. [9] used 3D fluorescence spectroscopy to detect adulterated olive oil, and Xie et al. [33] used front-face fluorescence spectroscopy to detect adulterated roasted Arabica coffee. The significance of this study is that it applies analytical methods presented to pursue identification based on the unique chemical composition of green tea.

Furthermore, research by Müller-Maatsch et al. [14] on the use of portable multisensor devices and Yang et al. [20] on the use of VIS-NIR spectroscopy suggests the possibility of improving classification accuracy by combining multiple analytical methods. Extending this study with a multimodal approach that combines fluorescence and NIR spectroscopy is therefore expected to improve the robustness of the classification.

Data preprocessing played an important role, as normalizing the fluorescence intensity reduced the noise and variability. At the same time, normalization may have attenuated subtle intensity differences between origins with similar fingerprints. This tradeoff highlights the need for optimized preprocessing techniques that balance noise reduction and the preservation of distinctive spectral features. To improve the classification of Sayama and Kakegawa teas, it would be beneficial to explore preprocessing methods that preserve changes in fluorescence intensity and account for environmental variations. Furthermore, feature extraction methods such as principal component analysis (PCA) and deep feature fusion may be utilized to highlight subtle spectral differences. In addition, advanced spectral analysis methods such as spectral unmixing could separate overlapping fluorescence signals and improve classification accuracy.

The use of CNNs enables pattern recognition of image data, achieving higher accuracy than that with traditional methods. Hu et al. [23] reported that CNNs demonstrated analytical accuracy that exceeded that of other machine-learning algorithms (SVM and RF). This study takes advantage of this to establish origin identification from the fluorescent fingerprint data of green tea, further expanding the scope of application of CNNs compared with that in previous studies.

The effect of the number of epochs on the accuracy of the model was also examined. The improvement in accuracy with the increase in the number of epochs suggests that the training had not yet fully converged. In future training, systematic criteria, such as early stopping and monitoring of validation accuracy, will be used to refine the settings. Expanding the dataset to include additional regions and food categories is expected to further increase the versatility of this approach. While this study focused primarily on selecting samples based on their origin, the potential impact of temporal variables, such as seasonal variation, on fluorescence properties was also recognized. Seasonal variations in temperature, sun exposure, and rainfall can affect the chemical compositions of tea leaves, including those of catechins and polyphenolic compounds. Future studies should address these factors to improve the robustness of the model.

This study provides a new approach for preventing origin fraud and quality control in the food manufacturing industry. The findings of this study can be implemented by integrating the developed model into the quality control processes of food manufacturing facilities to address food-origin fraud. Continuous collection of fluorescence fingerprint data during the raw material-receiving process can rapidly verify the origin of food in real time. Furthermore, creating a fluorescence fingerprint database shared between companies and industries can provide a robust mechanism for preventing origin fraud. Future efforts should focus on improving this model using advanced preprocessing techniques and incorporating chemical composition data. These advances are expected to improve origin identification and quality control in the food industry, thereby contributing to increased consumer confidence. Additionally, seasonal variation was not explicitly considered, and this needs to be addressed to minimize variation and increase the representativeness of the samples.

5. Conclusions

This study demonstrated that a method combining the image data of fluorescence fingerprints from fluorescence spectroscopy and machine learning through a CNN is effective for identifying the origin of Japanese green tea. The model achieved an identification accuracy of 92.83%, highlighting the possibility of a new method for food traceability and prevention of origin fraud. Specifically, Chiran tea and Yame tea achieved precision and recall rates of over 95%, showing clear differences from the other regions. However, distinguishing samples with similar fluorescence characteristics, especially Sayama and Kakegawa teas, remains challenging. The precision for Kakegawa tea and the recall for Sayama tea were approximately 85% and 81%, respectively, indicating room for improvement. To address these issues, data preprocessing methods and feature extraction techniques must be further improved.

Future research should expand this dataset to include green teas from other regions. The versatility of this model can be further enhanced by incorporating data on the variation in fluorescence characteristics due to the harvest time and processing methods. In addition, combining the fluorescence fingerprinting method with other analytical methods, such as component analysis and spectral data, is expected to improve the accuracy of the model by providing a more comprehensive assessment of food properties.

Exploring alternative architectures, such as ResNet and EfficientNet, and leveraging transfer learning are important for optimizing model performance and reducing reliance on large datasets. These efforts aim to improve the reliability and practicality of this method, making it a valuable tool for preventing origin fraud and enhancing quality control in the food manufacturing industry.

With these advances, the proposed method can contribute to increasing consumer trust by ensuring the integrity of origin labeling and supporting robust quality assurance practices. This study therefore provides a basis for the broader application of fluorescence spectroscopy and machine learning in food science and industry. Efforts should focus on further improving accuracy and robustness to meet various application requirements, and on targeting additional production regions to improve the versatility of the identification model. Because this method relies on the chemical composition of foods, it is likely to be applicable to other products, and its potential use in food traceability and quality control should be explored.

Author Contributions

Conceptualization, T.M.; methodology, R.A. and K.S.; software, R.A., K.S. and T.M.; validation, T.M.; data curation, R.A. and K.S.; writing—original draft preparation, R.A.; review, Y.L.; writing—review and editing, T.M.; project administration, T.M.; funding acquisition, T.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by JSPS KAKENHI Grant Number JP22K02156: https://www.jsps.go.jp/j-grantsinaid/16_rule/rule.html (accessed on 5 February 2025).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ministry of Agriculture, Forestry and Fisheries. About the System for Labeling the Origin of Ingredients in Processed Foods. (In Japanese). Available online: https://www.maff.go.jp/j/syouan/hyoji/gengen_hyoji.html (accessed on 25 December 2024).
2. European Union Laws. Available online: https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:32018R0775 (accessed on 25 December 2024).
3. Code of Federal Regulations. Available online: https://www.ecfr.gov/current/title-19/chapter-I/part-134 (accessed on 25 December 2024).
4. Al Riza, D.F.; Kondo, N.; Rotich, V.K.; Perone, C.; Giametta, F. Cultivar and Geographical Origin Authentication of Italian Extra Virgin Olive Oil Using Front-Face Fluorescence Spectroscopy and Chemometrics. Food Control 2021, 121, 107604.
5. Ruoff, K.; Luginbühl, W.; Künzli, R.; Bogdanov, S.; Bosset, J.O.; von der Ohe, K.; von der Ohe, W.; Amadò, R. Authentication of the botanical and geographical origin of honey by front-face fluorescence spectroscopy. J. Agric. Food Chem. 2006, 54, 6858–6866. Available online: https://pubs.acs.org/doi/abs/10.1021/jf060697t (accessed on 5 February 2025).
6. Quan, N.M.; Phung, H.M.; Uyen, L.; Dat, L.Q.; Ngoc, L.G.; Hoang, N.M.; Tu, T.K.M.; Dung, N.H.; Ai, C.T.D.; Trinh, D.N.T. Species and geographical origin authenticity of green coffee beans using UV–VIS spectroscopy and PLS–DA prediction model. Food Chem. Adv. 2023, 2, 100281.
7. Suciu, R.-C.; Zarbo, L.; Guyon, F.; Magdas, D.A. Application of fluorescence spectroscopy using classical right angle technique in white wines classification. Sci. Rep. 2019, 9, 18250. Available online: https://www.nature.com/articles/s41598-019-54697-8 (accessed on 5 February 2025).
8. Hu, L.; Zhang, Y.; Ju, Y.; Meng, X.; Yin, C. Rapid identification of rice geographical origin and adulteration by excitation-emission matrix fluorescence spectroscopy combined with chemometrics based on fluorescence probe. Food Control 2023, 146, 109547.
9. Lia, F.; Formosa, J.P.; Zammit-Mangion, M.; Farrugia, C. The first identification of the uniqueness and authentication of Maltese extra virgin olive oil using 3D-fluorescence spectroscopy coupled with multi-way data analysis. Foods 2020, 9, 498.
10. Tan, J.; Liu, J.-Y.; Su, H.; Yang, X.-H.; Li, H.-F. Detection of adulteration of cumin powder by front-face synchronous fluorescence spectroscopy: The influence of the natural variation of adulterants. Food Control 2024, 158, 110228.
11. Gao, X.; Fan, D.; Li, W.; Zhang, X.; Ye, Z.; Meng, Y.; Liu, T.C.-Y. Rapid quantification of the adulteration of pomegranate juices by Raman spectroscopy and chemometrics. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2023, 302, 23014.
12. Vafakhah, M.; Asadollahi-Baboli, M.; Hassaninejad-Darzi, S.K. Raman spectroscopy and chemometrics for rice quality control and fraud detection. J. Consum. Prot. Food Saf. 2023, 18, 403–413. Available online: https://link.springer.com/article/10.1007/s00003-023-01435-y (accessed on 5 February 2025).
13. Kucharska-Ambrożej, K.; Karpinska, J. The application of spectroscopic techniques in combination with chemometrics for detection adulteration of some herbs and spices. Microchem. J. 2020, 153, 10427.
14. Müller-Maatsch, J.; Alewijn, M.; Wijtten, M.; Weesepoel, Y. Detecting fraudulent additions in skimmed milk powder using a portable, hyphenated, optical multi-sensor approach in combination with one-class classification. Food Control 2021, 121, 107744.
15. Yang, S.; Li, C.; Mei, Y.; Liu, W.; Liu, R.; Chen, W.; Han, D.; Xu, K. Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined with Machine Learning Methods. Front. Nutr. 2021, 8, 680627.
16. Mo, Y.; Xu, J.; Zhou, H.; Zhao, Y.; Chen, K.; Zhang, J.; Deng, L.; Zhang, S. A machine learning-assisted fluorescent sensor array utilizing silver nanoclusters for coffee discrimination. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2024, 322, 24760.
17. Berghian-Grosan, C.; Magdas, D.A. Application of Raman spectroscopy and machine learning algorithms for fruit distillates discrimination. Sci. Rep. 2020, 10, 21152. Available online: https://www.nature.com/articles/s41598-020-78159-8 (accessed on 5 February 2025).
18. Gu, H.-W.; Zhou, H.-H.; Lv, Y.; Wu, Q.; Pan, Y.; Peng, Z.-X.; Zhang, X.-H.; Yin, X.-L. Geographical origin identification of Chinese red wines using ultraviolet-visible spectroscopy coupled with machine learning techniques. J. Food Compos. Anal. 2023, 119, 105265.
19. Ranaweera, R.K.; Gilmore, A.M.; Capone, D.L.; Bastian, S.E.; Jeffery, D.W. Spectrofluorometric analysis combined with machine learning for geographical and varietal authentication, and prediction of phenolic compound concentrations in red wine. Food Chem. 2021, 361, 130149.
20. Yang, Q.; Tian, S.; Xu, H. Identification of the geographic origin of peaches by VIS-NIR spectroscopy, fluorescence spectroscopy and image processing technology. J. Food Compos. Anal. 2022, 114, 104843.
21. Chen, A.-Q.; Wu, H.-L.; Wang, T.; Wang, X.-Z.; Sun, H.-B.; Yu, R.-Q. Intelligent analysis of excitation-emission matrix fluorescence fingerprint to identify and quantify adulteration in camellia oil based on machine learning. Talanta 2023, 251, 123733.
22. Wu, M.; Li, M.; Fan, B.; Sun, Y.; Tong, L.; Wang, F.; Li, L. A rapid and low-cost method for detection of nine kinds of vegetable oil adulteration based on 3-D fluorescence spectroscopy. LWT 2023, 188, 115419.
23. Hu, Y.; Wei, C.; Wang, X.; Wang, W.; Jiao, Y. Using three-dimensional fluorescence spectroscopy and machine learning for rapid detection of adulteration in camellia oil. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2025, 329, 125524.
24. Chen, M.; Guo, W.; Yi, X.; Jiang, Q.; Hu, X.; Peng, J.; Tian, J. Hyperspectral imaging combined with convolutional neural network for Pu’er ripe tea origin recognition. J. Food Compos. Anal. 2025, 139, 107093.
25. Zou, Z.; Wu, Q.; Long, T.; Zou, B.; Zhou, M.; Wang, Y.; Liu, B.; Luo, J.; Yin, S.; Zhao, Y.; et al. Classification and adulteration of mengding mountain green tea varieties based on fluorescence hyperspectral image method. J. Food Compos. Anal. 2023, 117, 105141.
26. Bi, Z.; Zhen, L.; Li, J.; Jin, P.; Gu, Y.; Ma, Y.; Song, D.; Yang, X.; Li, Y.; Huang, H. A novel sensor array with ability to respectively identify phenol and ketone for the precise discrimination of origins of raw Pu-erh and its counterfeiting. Food Res. Int. 2025, 199, 115371.
27. Wei, L.; Hu, O.; Chen, H.; Yang, T.; Fan, Y.; Xu, L.; Zhang, L.; Lan, W.; She, Y.; Fu, H. Variety identification and age prediction of Pu-erh tea using graphene oxide and porphyrin complex based mid-infrared spectroscopy coupled with chemometrics. Microchem. J. 2020, 158, 105.
28. Fang, H.; Wang, T.; Chen, L.; Wang, X.-Z.; Wu, H.-L.; Chen, Y.; Yu, R.-Q. Rapid authenticity identification of high-quality Wuyi Rock tea by multidimensional fluorescence spectroscopy coupled with chemometrics. J. Food Compos. Anal. 2024, 135, 106632.
29. Luo, W.; Li, W.; Liu, S.; Li, Q.; Huang, H.; Zhang, H. Measurement of four main catechins content in green tea based on visible and near-infrared spectroscopy using optimized machine learning algorithm. J. Food Compos. Anal. 2025, 138, 106990.
30. Hou, Z.; Jin, Y.; Gu, Z.; Zhang, R.; Su, Z.; Liu, S. 1H NMR spectroscopy combined with machine-learning algorithm for origin recognition of Chinese famous green tea Longjing tea. Foods 2024, 13, 2702.
31. Hu, X.-C.; Yu, H.; Deng, Y.; Chen, Y.; Zhang, X.-H.; Gu, H.-W.; Yin, X.-L. Rapid authentication of green tea grade by excitation-emission matrix fluorescence spectroscopy coupled with multi-way chemometric methods. Eur. Food Res. Technol. 2023, 249, 767–775.
32. Yamazaki, K.; Murakami, T.; Okada, N.; Terai, H.; Miyase, T.; Sano, M. Fluorescence characteristics of Pu-erh Tea. Nippon Shokuhin Kagaku Kogaku Kaishi 2013, 60, 87–95. (In Japanese)
33. Xie, J.-Y.; Tan, J. Front-face synchronous fluorescence spectroscopy: A rapid and non-destructive authentication method for Arabica coffee adulterated with maize and soybean flours. J. Consum. Prot. Food Saf. 2022, 17, 209–219. Available online: https://link.springer.com/article/10.1007/s00003-022-01396-8 (accessed on 5 February 2025).

Figure 1. Geographical distribution of Japanese green tea-producing areas covered in this study on a map of Japan.

Figure 2. Learning curves showing training and validation loss trends during model training (Trial Number 1).

Figure 3. Validation accuracy as a function of training epochs, showing steady improvement over time.

Figure 4. Representative examples of the preprocessed fluorescence fingerprint images (cropped). * From the left, three rows each are Kakegawa tea, Sayama tea, Chiran tea, and Yame tea. The same products are in the same rows. The X-axis shows excitation, with a wavelength range of 250–550 nm, and the Y-axis shows emission, with a wavelength range of 250–550 nm.

Table 1. Accuracy and loss of test data by the weighted average ensemble method based on the accuracy of each fold of 10 trials.

Trial Number	Accuracy	Loss
1	0.9375	0.8438
2	0.9375	0.8289
3	0.9125	0.8561
4	0.9417	0.8287
5	0.9125	0.8690
6	0.9042	0.8640
7	0.9333	0.8581
8	0.9375	0.8454
9	0.9333	0.8525
10	0.9333	0.8404
Mean	0.9283	0.8487
Standard Deviation	0.01329	0.01376

Standard deviations were rounded at the sixth decimal place and are reported to five decimal places. All other values were rounded at the fifth decimal place and are reported to four decimal places.
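
As a check on the summary rows of Table 1, the per-trial values can be re-aggregated with the Python standard library. The sketch below uses the sample standard deviation (n − 1 denominator), which agrees with the tabulated values to within rounding of the four-decimal inputs; that this is the definition the authors used is an inference from the numbers, not something stated in the table. The helper name `summarize` is hypothetical.

```python
import statistics

# Per-trial test accuracy and loss, copied from Table 1.
accuracy = [0.9375, 0.9375, 0.9125, 0.9417, 0.9125,
            0.9042, 0.9333, 0.9375, 0.9333, 0.9333]
loss = [0.8438, 0.8289, 0.8561, 0.8287, 0.8690,
        0.8640, 0.8581, 0.8454, 0.8525, 0.8404]


def summarize(values):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


acc_mean, acc_sd = summarize(accuracy)
loss_mean, loss_sd = summarize(loss)
print(f"accuracy: mean={acc_mean:.4f}, sd={acc_sd:.5f}")
print(f"loss:     mean={loss_mean:.4f}, sd={loss_sd:.5f}")
```

Because the inputs are already rounded to four decimal places, the last digit of the recomputed standard deviations can differ slightly from the published figures.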

Table 2. Classification performance evaluation index values for each production area (average of 10 trials).

Type of Tea	Precision	Recall
Kakegawa	0.8477	0.9383
Sayama	0.9322	0.8117
Chiran	0.9539	1.0000
Yame	0.9932	0.9633

Values were rounded at the fifth decimal place and are reported to four decimal places.

Table 3. Confusion matrix.

True origin (rows) / Predicted origin (columns)	Kakegawa	Sayama	Chiran	Yame
Kakegawa	56.3	3.4	0.0	0.3
Sayama	10.3	48.7	0.9	0.1
Chiran	0.0	0.0	60.0	0.0
Yame	0.0	0.2	2.0	57.8

Average of 10 trials.
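
The relationship between Table 3 and the metrics in Tables 1 and 2 can be sketched by computing precision, recall, and overall accuracy directly from the averaged confusion matrix. The function names below are hypothetical. Recall and overall accuracy computed this way match the tabulated values, whereas precision for some classes differs in the third decimal place (for example, about 0.845 rather than 0.8477 for Kakegawa); a plausible explanation, which is an assumption here, is that Table 2 averages per-trial precision values rather than deriving precision from the averaged matrix.

```python
labels = ["Kakegawa", "Sayama", "Chiran", "Yame"]

# Averaged confusion matrix from Table 3: rows are true origin,
# columns are predicted origin.
cm = [
    [56.3, 3.4, 0.0, 0.3],
    [10.3, 48.7, 0.9, 0.1],
    [0.0, 0.0, 60.0, 0.0],
    [0.0, 0.2, 2.0, 57.8],
]


def per_class_metrics(matrix):
    """Return a list of (precision, recall) tuples, one per class."""
    n = len(matrix)
    metrics = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        metrics.append((precision, recall))
    return metrics


def overall_accuracy(matrix):
    """Return the fraction of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


for name, (precision, recall) in zip(labels, per_class_metrics(cm)):
    print(f"{name:9s} precision={precision:.4f} recall={recall:.4f}")
print(f"accuracy={overall_accuracy(cm):.4f}")
```

The off-diagonal entries make the main source of error visible: on average, 10.3 Sayama samples per trial are predicted as Kakegawa, which lowers both Sayama recall and Kakegawa precision.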

Table 4. Number of epochs per fold for each trial.

Trial Number	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5
1	29	20	47	40	34
2	39	36	23	44	35
3	36	30	21	37	51
4	33	28	41	40	41
5	43	25	28	28	32
6	33	35	24	35	28
7	53	46	22	36	48
8	33	21	40	49	31
9	23	42	26	45	32
10	42	31	33	39	42

Table 5. Fluorescent components in green tea [31,32].

Component	Excitation Wavelength (nm)	Emission Wavelength (nm)
Catechins (EC, EGC, ECG, EGCG)	250–295	300–390
Tea polyphenols, flavonoids	200–400	300–500
Fulvic acid	255–500	380–580
Humic acid	250–500	350–600

The wavelengths of catechin, fulvic acid, and humic acid were obtained from the EEM spectra.
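
To illustrate how the ranges in Table 5 can be used when interpreting a fluorescence fingerprint, the following sketch lists the components whose excitation and emission ranges both contain a given wavelength pair. The dictionary simply transcribes Table 5; the function name `candidate_components` and the example wavelength pairs are hypothetical. Overlapping ranges mean a single point often maps to several candidates, so the result is a list of possibilities rather than an assignment.

```python
# Excitation and emission ranges (nm) transcribed from Table 5.
COMPONENTS = {
    "Catechins (EC, EGC, ECG, EGCG)": ((250, 295), (300, 390)),
    "Tea polyphenols, flavonoids": ((200, 400), (300, 500)),
    "Fulvic acid": ((255, 500), (380, 580)),
    "Humic acid": ((250, 500), (350, 600)),
}


def candidate_components(excitation_nm, emission_nm):
    """Return components whose Table 5 ranges contain the given point."""
    matches = []
    for name, ((ex_lo, ex_hi), (em_lo, em_hi)) in COMPONENTS.items():
        if ex_lo <= excitation_nm <= ex_hi and em_lo <= emission_nm <= em_hi:
            matches.append(name)
    return matches


# Example points (illustrative choices, not measured peaks from this study).
print(candidate_components(280, 320))
print(candidate_components(400, 450))
```

For an emission wavelength in the 300–350 nm region with excitation near 280 nm, only the catechin and polyphenol ranges apply, which is consistent with the attribution of the Kakegawa and Sayama similarity to those compounds.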

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Akiyama, R.; Suzuki, K.; Llave, Y.; Matsumoto, T. Fluorescence Spectroscopy and a Convolutional Neural Network for High-Accuracy Japanese Green Tea Origin Identification. AgriEngineering 2025, 7, 95. https://doi.org/10.3390/agriengineering7040095
