Next Article in Journal
Red Dragon Fruit Peels: Effect of Two Species Ratio and Particle Size on Fibre Quality and Its Application in Reduced-Fat Alpaca-Based Sausages
Previous Article in Journal
Effects of Edible Organic Acid Soaking on Color, Protein Physicochemical, and Digestion Characteristics of Ready-to-Eat Shrimp upon Processing and Sterilization
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Terahertz Spectroscopic Identification of Roast Degree and Variety of Coffee Beans

1
School of Food and Drug, Shenzhen Polytechnic University, No. 2190, Liuxian Road, Shenzhen 518055, China
2
College of Food Science and Technology, Hebei Agricultural University, Baoding 071001, China
3
College of Food and Bioengineering, Henan University of Science and Technology, Luoyang 471003, China
*
Authors to whom correspondence should be addressed.
Foods 2024, 13(3), 389; https://doi.org/10.3390/foods13030389
Submission received: 21 November 2023 / Revised: 6 January 2024 / Accepted: 19 January 2024 / Published: 24 January 2024
(This article belongs to the Section Food Engineering and Technology)

Abstract

:
In this study, terahertz time-domain spectroscopy (THz-TDS) was proposed to identify coffee of three different varieties and three different roasting degrees of one variety. Principal component analysis (PCA) was applied to extract features from frequency-domain spectral data, and the extracted features were used for classification prediction through linear discrimination (LD), support vector machine (SVM), naive Bayes (NB), and k-nearest neighbors (KNN). The classification effect and misclassification of the model were analyzed via confusion matrix. The coffee varieties, namely Catimor, Typica 1, and Typica 2, under the condition of shallow drying were used for comparative tests. The LD classification model combined with PCA had the best effect of dimension reduction classification, while the speed and accuracy reached 20 ms and 100%, respectively. The LD model was found with the highest speed (25 ms) and accuracy (100%) by comparing the classification results of Typica 1 for three different roasting degrees. The coffee bean quality detection method based on THz-TDS combined with a modeling analysis method had a higher accuracy, faster speed, and simpler operation, and it is expected to become an effective detection method in coffee identification.

Graphical Abstract

1. Introduction

Coffee is one of the most popular beverages in the world. It has been consumed for hundreds of years and has become an important part of cultural traditions and social life. In the United States, 85% of adults consume an average of 135 mg of caffeine per day, which is equivalent to about 1.5 standard cups of coffee, and coffee is the main source of caffeine intake for adults [1]. In terms of its economic value, coffee is one of the most valuable agricultural products, which has been exported by the third world and developing countries [2]. It is planted and cultivated in the equatorial zone; cherries and beans are the fruits and seeds of a coffee plant [3]. The final quality of green coffee beans depends on the coffee plant cultivar, geography, weather conditions, and infrastructure available. Wet and dry post-harvest processes may affect the final taste of the beverage; however, the final flavor and aroma are mainly influenced by roasting [4]. To obtain a popular beverage, green coffee beans are roasted at temperatures ranging from 176.7 to 232.2 °C with roasting times ranging from 10 to 20 min, producing light to dark roasts [5]. In general, light-roast coffee beans are suitable for pour-over coffee and dark-roast coffee beans are good for espresso. Since the applications of coffee beans with different roast levels are becoming more important, the precise level of roasting has become an indicator of roasted coffee bean applications.
In the coffee industry, variety identification is generally used for two purposes: for a consistent supply of coffee beans and for the identification of a new variety. The development of single-origin coffee is an important strategy to maintain coffee quality, grade, and a high cupping score [6]. Roasting level identification is used to evaluate whether the original roasting level is maintained from a small batch to a large batch of production. Capillary electrophoresis-mass spectrometry [7], GC/MS [6], high-throughput metabolomics, DNA RFLP fingerprinting [8], hyperspectral imaging [9], Near-Infrared Spectrometer [10], etc., were used to identify coffee varieties. The variety identification was carried out through the separation and identification of a chemical compound in green coffee beans or coffee metabolites for capillary electrophoresis-mass spectrometry and GC/MS. DNA RFLP fingerprinting is a biological detection method, including DNA extraction, PCR amplification, subcloning, and sequencing steps. High-throughput metabolomics is chemical analysis combined with biological information analysis. Hyperspectral imaging and near-infrared spectrometer are spectrum detection methods. The accuracy of chemical analysis combined with biological analysis methods is very high, but the detection process is high-cost, time-consuming, and environmentally unfriendly. For spectrum detection methods, the optical signal is converted into data and then analyzed in combination with mathematical models. The accuracy of spectrum detection methods is not as good as chemical and biological detection methods, but simple pretreatment, time efficiency, and environmental friendliness are advantages of the spectrum detection method. At present, the identification of roasting level uses capillary electrophoresis-mass spectrometry [7], color values, coffee pour-over, etc. The color value method is most commonly used, but it cannot distinguish between light and medium roasting or medium and dark roasting, among different coffee varieties. Sensory evaluation is a very important method to evaluate the quality of coffee. For a given coffee bean, if the taste results of the pour-over method are correlated with the results of an equipment and there are effective associations between sensory evaluation and instrumental data, we can ensure the controllability of coffee bean quality further.
Terahertz (electromagnetic waves at frequencies of 0.1 to 10 THz) detection technology has been utilized in areas such as agricultural breeding [11], medicine [12], biology [13], chemistry [14], and foods [15]. The main advantages of THz spectroscopic detection are that it is nondestructive and environmentally friendly, which enables real-time characterization. Absorption spectra in the THz range provide molecular data in rotational and vibrational modes, which can be used to identify chemical compounds. Furthermore, both frequency-domain and time-domain information are related to the physical structure and chemical composition of a sample [16].
THz radiation does not readily penetrate through metals or water like a polar liquid, which makes it one of the most useful applications of THz imaging via the detection of moisture or humidity in materials [17]. However, with the exception of the measured water itself, THz imaging requires the drying of samples to remove interferences from the water signal [18,19]. Some samples contain hydrogen bonds or bioactive macromolecules, which have characteristic absorption peaks in the Terahertz band [20], making it useful for qualitative and quantitative detection [16]. But many samples do not have characteristic absorption peaks, so the analysis of these samples must depend on mathematical models. Suitable mathematical models are established first, and then the reliability of the developed models is confirmed via model training and verification [21,22,23]. Terahertz spectroscopy and SVM (support vector machine) were used to identify foods or biological tissue samples. The results showed that the model provided a fast and accurate method for food detection and origin tracing research in the terahertz band with no obvious absorption peak [24]. In this study, coffee samples from different origins were analyzed, while no specific coffee varieties were given. In fact, there is a huge difference in the quality of coffee beans of different varieties. The most common types of coffee beans on the market are two types: Arabica and Robusta. Arabica coffee beans are suitable for making specialty coffee due to their strong aroma, soft taste, and high acidity. Robusta beans are suitable for making instant coffee and espresso due to their stronger aroma, coarser taste, and greater caffeine content. Typica is the native variety of Arabica, while Catimor is a cross between Arabica and Robusta. It would be more useful if terahertz technology could identify coffee beans from the same origin with different varieties, or from the same variety with different roasting levels.
The objectives of this study were to use the terahertz spectrum to detect coffee beans and to establish different mathematical models based on the detection data—including linear discriminant (LD), support vector machine (SVM), naive Bayes (NB), and k-nearest neighbor (KNN)—which were then used to obtain the highest accuracy model through model training and comparison so as to accurately identify different varieties of coffee beans with the same and different roasting levels for the same coffee variety.

2. Material and Methods

2.1. Experimental Materials and Procedures

The three coffee bean types tested were Catimor (Sun-dried, Puer, Yunnan Province, China), Typica 1 (Sun-dried, Puer, Yunnan Province, China), and Typica 2 (Natural process, Sumatera, Pulau, Indonesia), Figure S1. The three coffee beans were purchased from Box Coffee Co., Ltd., (Shenzhen, China). The FUJIROYAL (FUJIKOUKI Co., Ltd., Kumamoto, Japan) roaster was used to roast the coffee beans for light-, medium-, and dark-roast coffee. The weight of each roasted coffee bean was 200 g, and the initial temperature of roasting was 170 °C. Roast temperature during the roasting process, color value, and sensory flavor description are shown in Table 1. Color values of the coffee bean surface and powdered coffee were tested using a coffee roast analyzer (SYNCFO CRA-01, Jianxin Technology Co., Ltd., Xinbei, Taiwan). Six coffee bean samples, including light-roast Catimor, light-roast Typica 2, light-roast Typica 1, medium-roast Typica 1, dark-roast Typica 1, and green Typica 1 were tested in this experiment.
Coffee powder was placed in a weighing bottle and put in an oven (DHG-9123A, Shanghai Jinghong Experimental Equipment Co., Ltd., Shanghai, China) at 50 °C for 2 h. Next, it was removed and placed in a brown dryer for cooling. Approximately 150 mg of dried coffee powder was pulverized with a mortar and pestle and then placed in a compression mold (QYL5t, Guangyao Machinery Factory, Jiaxing, China). A pressure of 25 MPa was applied for 3 min to form a 13 mm diameter disk of 100% coffee powder. Then, 125 tablets were obtained for each coffee sample; so, each coffee sample was measured 125 times. The data of 100 repetitions were used to build mathematical models and the data of another 25 were used to verify the models.

2.2. THz-TDS System Introduction

The experimental spectra in this research were obtained using a CCT-1800 Terahertz time-domain spectroscopy (THz-TDS) system (China Communication Technology, Shenzhen, China). The typical transmission TDS set-up is depicted in Figure 1.
The femtosecond laser light was separated by a beam-splitter into two beams: an excitation beam and a probe beam. Terahertz radiation was generated by optical excitation of the THz emitter. The probe beam was optically delayed by a variable delay stage and collimated onto the THz receiver. The emitted terahertz pulse was collected, collimated, and then focused by an off-axis parabolic mirror (OPM1) onto the sample under testing. The terahertz pulses emitted from the sample were then collected and focused using another off-axis parabolic mirror (OPM2) on the surface of the THz receiver [25,26,27]. A waveform comprising the terahertz signal as a function of time was reconstructed by varying the optical time delay.
The spectrometer data acquisition system is designed to detect low frequency and low noise THz signal. The THz emitter is modulated using a pulsed DC bias of 80 V and a modulation frequency of 10 kHz is synchronized with an analogue lock-in amplifier. A current-to-voltage preamplifier is connected to the THz receiver to amplify the current of the THz signal and transmits the signal to the lock-in amplifier to boost the signal-to-noise ratio. Then, a high frequency 24-bit ADC is used to store the THz signal. The spectral range of the TDS system is 0.06–4.5 THz and the spectral resolution is better than 10 GHz. The peak dynamic range is greater than 80 dB. The system is no recalibration needed before or during running.

2.3. THz-TDS Detection

The sample was placed in a special test fixture, which was then directly attached to the THz-TDS. The section of the sample to be measured was fixed at the focus-point of the transmitted light, and the cavity was sealed after the sample was covered with a lid. The terahertz emission signal of the test sample was then obtained. During the experiment, the sealed sample chamber was continuously filled with nitrogen gas so that the relative humidity was less than 5% and the temperature was maintained at approximately 25 °C.
The reference signal obtained via passing nitrogen (without sample placement) was Eref(ω) and the sample signal obtained by measuring the sample with a thickness of d was Esam(ω). The spectral response function H(ω) of the sample is expressed by Equation (1) [16,21,28,29].
H ω = E s a m ( ω ) E r e f ( ω ) = 4 n 1 + n e a d / 2 + j π ω ( n 1 ) d / c = A ω e j φ ( ω )
where A(ω) is the amplitude ratio of sample to reference; and φ(ω) is the chromatographic difference between the sample and reference. According to the model of optical parameters of the material extraction [30], the refractive index n(ω) and the absorbance a(ω) of the sample are determined using Equations (2) and (3), respectively.
n ω = φ ω c / ω d + 1
a ω = 2 l n 4 n ( ω ) A ( ω ) n ω + 1 2
where d is the thickness of the sample, ω is the frequency of the radiation, and c is the speed of light in a vacuum.

2.4. Dimension Reduction and Visualization

The detection method of coffee quality based on THz-TDS combined with model analysis methods in this study was a new application idea. In order to judge the feasibility of the scheme effectively, it was necessary to reduce the dimension of data for better visualization. The principal component analysis method (PCA) is a simple and effective method of dimension reduction [11]. The data of frequency-domain, time-domain, and absorption were used for the analysis of PCA. First, the response function of each coffee sample was established. Second, LD, SVM, NB, and KNN models in a full-band were established. The accuracy and training time of the model were determined by averaging the results 10 times. The accuracy of the full-band linear discrimination model was the highest. Then, 1–20 principal components were extracted, and the relationship between training time and accuracy was studied. The number of principal components was determined according to the relationship between the number of principal components and the accuracy, as well as the efficiency of the model training time.

2.5. Modeling and Model Evaluation

Linear discriminant (LD) is a classical supervised data reduction and classification method to find linear combinations of features that can best distinguish different classes [31,32,33]. In this study, the data reduced by PCA algorithm were input into the LD model, and the average difference between classes was maximized and the variance within classes was minimized. Finally, the classification decision was made via the threshold method. This means that LD clusters data of the same type together, while data of different types are far apart.
Support vector machine (SVM) is a commonly used classifier that essentially searches for the hyperplane with the largest interval in the feature space [34]. In this study, the reduced dimension data of PCA were used as the input of the support vector machine, while the interval maximization learning strategy was used to solve the convex quadratic programming problem. The type parameter of kernel function was selected as quadratic kernel function and the regularization parameter was set to 100, according to the amount of training data.
The naive Bayes (NB) method is a classification method based on Bayes’ theorem and the assumption of conditional independence of features [35,36]. In this study, NB learnt the joint probability distribution based on training data after PCA dimension reduction; then calculated the corresponding posterior probability of each category; and finally selected the category with the largest posterior probability as the prediction result by adjusting the smoothing parameter. According to the size of the dataset and the sparsity of the features, the smoothing parameter was set to 1.0.
K-nearest neighbor (KNN) performs classification by measuring the distance between different feature values [37]. In this study, KNN was calculated using the distance between data after PCA dimension reduction as an indicator of dissimilarity between objects, avoiding the matching problem between objects. Then, the K closest samples were selected as neighbors and classified according to the categories of the neighbors. The algorithm decision distance metric used Euclidean distance and the number of neighbors was selected as 6. Meanwhile, decisions were made based on the dominant categories of multiple objects rather than a single object category.

2.6. Data Analysis

The experimental data were analyzed using the statistical software statistic product and service solutions (SPSS, Inc., Chicago, IL, USA) and MATLAB (9.13.0 (R2022b), MathWorks, Natick, MA, USA), and analyses of variance (ANOVAs) were conducted via the ANOVA procedure. All measurements were carried out in triplicate. Mean values were considered significantly different when p ≤ 0.05.

3. Results and Discussion

3.1. THz-TDS Spectral Analysis

Time-domain spectrograms of six coffee bean samples are shown in Figure 2. There were significant differences in the amplitude and phase of the terahertz time-domain spectral signal between the samples and the reference. The Typica 1 coffee bean with different roasting degrees had the same amplitude. The amplitude of Typica 1 green bean was lower, while that of light Catimor and light Typica 2 was the lowest. In terms of phase, compared to the reference, the time-domain signal of Typica 2 coffee bean had a time delay at 4.3 mm and the time-domain signal of other coffee beans had a time delay at 3.9 mm.
The terahertz permeable domain spectrum of the sample was obtained via Fast Fourier Transform (FFT) of the time-domain spectral signal. From Figure 2b, it can be seen that the effective spectral frequency-domain region of the terahertz signal is within 0.0–2.0 THz. The spectrum curves of the tested six coffee beans had the same trend, while the transmitted energy at different frequency points was different. Unlike many complex organisms, coffee beans had no distinct absorption peaks [21,24,38]; in the range of 0.0 to 2.0 THz, there were 436 frequency points. While the high-dimensional Terahertz spectral features bring abundant information; some information with a weak or even irrelevant correlation with sample quality affects the modeling effect. Therefore, principal component analysis (PCA) was first applied in the dimension reduction of terahertz spectral data of different coffee varieties and roasting levels, which were then used for the qualitative identification of these parameters.

3.2. Dimension Reduction and Visualization Using PCA

The high-dimensional characteristics of THz-TDS spectrum and the data correction at adjacent frequencies have adverse effects on the model’s performance. To gain insight into the similarity of three coffee bean varieties, the characteristic differences of the same variety under different roasting degrees were statistically analyzed via PCA.
The 3D-PCA is the sample distribution under three principal components obtained via PCA in the frequency-domain of six samples in Figure 3. It can be seen that samples of the same category form clustered clusters, and the clusters of samples of different categories overlap in some areas, indicating that the six samples cannot be completely distinguished using the three principal components. Therefore, PCA dimension reduction analysis was constructed for different types of coffee beans and different roasting degrees of the same coffee beans, which can effectively improve the accuracy of the model and reduce the dimension of the data.

3.2.1. Coffee Bean Varieties

The differences in the THz-TDS spectra of different varieties of coffee bean were the basis for model identification. PCA was used to reduce the sample data to two dimensions and visualization [24]. The distributions of light-roast Typica 1, Typica 2, and Catimor were seen under PC1 and PC2 (Figure 4). Different varieties of coffee beans were obviously aggregated into different clusters when the THz-TDS spectral data were reduced to two dimensions. The data cluster of Catimor and Typica 1 partially crossed and coincided, while the cluster of Typica 2 was distributed independently, which was consistent with the terahertz spectral images of the three varieties. The percentage of data covered by PC1 and PC2 was 96.41%. However, different varieties of coffee beans cannot be completely distinguished by only two principal components. More principal components should be used to input the model for prediction.

3.2.2. Coffee Bean Roasting Degrees

The composition of the same coffee bean gradually changes during roasting. At the initial stage, water evaporation dominates, while carbohydrates and amino acids start to degrade at a later stage of roasting, resulting in changes in color, taste, and nutritional composition. Figure 5 shows the distribution of green bean, light-roast, medium-roast, and dark-roast Typica 1 under PC1 and PC2. The green bean and light-roast Typica 1 were gathered separately in their respective clusters. The distribution of medium-roast and dark-roast Typica 1 overlapped. This might be due to the slow change in terahertz-spectrum-sensitive components in coffee beans from medium-roast to dark-roast. Incomplete and non-uniform roasting might cause confusion in feature distribution. The percent of data explained by PC1 and PC2 was 94.10%. The explanation of different roasting levels also required more principal components.

3.3. Results of Modeling and Analysis

3.3.1. Coffee Bean Varieties Prediction Result

Coffee beans with a different roasting degree and different variety were divided into two groups to establish mathematical models. First, three kinds of light roast coffee beans were distinguished via a mathematical model. The response function of each coffee sample was established. Then, LD, SVM, NB, and KNN models in a full-band were established. The accuracy and training time of the model were determined by averaging 10 replicates. The optimal number of principal components was found via PCA. Second, green coffee beans, and light-roast, medium-roast and dark-roast Typica 1 were distinguished via a mathematical model. The working steps were the same as before. Third, six coffee samples were put together to distinguish between them, following the same working procedure as before.
The THz-TDS spectra of Typica 1, Catimor, and Typica 2 (100 per species) with a light roast were used as features for model training and analysis. Independently collected test set response spectra (25 per species) were used to evaluate the classifier. The feature data were input into LD, SVM, NB, and KNN, while the cross-validation accuracy and model training time of the training set were under 5-fold cross-validation (Table 2). All the results shown in the table were the average of 10 independent tests. The LD had the highest cross-validation accuracy with 100%, while the KNN had the lowest time loss of 0.73 s. The time loss of LD and SVM was smaller than that of KNN; however, the accuracy was higher. When classifying independent test sets, LD had both the highest accuracy of 100% and the lowest time loss of 0.034 s. This also showed that there was a linear relationship between the terahertz spectral data of the three kinds of coffee under light roasting.
Through PCA dimension reduction, the correlation between adjacent bands was eliminated and a simpler and faster model was constructed, which was suitable for industrial real-time detection [39]. The test set prediction accuracy of the LD model reached 98%, while the prediction time stabilized at 15 ms under the first five principal components (Figure 6). Such fast detection was able to save 50% of the time compared to the full-band modeling. The first 20 principal components can be selected to pursue high-precision detection. Obviously, the increase in the number of principal components effectively improved the prediction accuracy of the model; at the same time, the prediction time of the model also increased.

3.3.2. Coffee Bean Roasting Degrees Prediction Result

The THz-TDS of green-bean, light-, medium- and dark-roast Typica 1 (100 per species) were used as features for model training and analysis. Independently collected test set response spectra (25 per species) were used to evaluate the classifier. The LD, SVM, NB, and KNN model were used as classifiers to evaluate the roasting degree of Typica 1. The results of 5-fold cross-validation and prediction are shown in Table 3. The LD had the highest cross-validation accuracy of 100%, while the KNN had the lowest time loss of 1.08 s. The time loss of LD and SVM was smaller than that of KNN; however, the accuracy was higher. When classifying independent test sets, LD had both the highest accuracy of 100% and the lowest time loss of 0.054 s.
In order to improve the model robustness and further improve the classification efficiency, feature dimension reduction was performed through PCA. Figure 7 shows the time loss and accuracy of the LD model on the test set when the number of principal components increases. As the number of principal components (the model input features) increased, the LD model prediction time and accuracy gradually increased. After reaching 28 principal components, the prediction accuracy stabilized at 100%, while the prediction time fluctuated around 25 ms. Compared with full-band modeling, the efficiency was increased by 53.7%. In this study, the prediction accuracy reached 100% because there were significant differences between different types of data on a high-dimensional level, and the features of the terahertz spectrum were effectively extracted in modeling analysis and important information was retained. At the same time, the test task was carried out under laboratory conditions, which can effectively reduce the noise and uncertainty of data and provided a great reference value for industrial field application.

3.3.3. Comprehensive Detection of Coffee Beans

All six samples were used for classification. The PCA and LD classifiers were also used to identify the samples; their variety and degree of roasting could not be determined. The classification result of 150 samples (25 per species) under 30 principal components was 95.3%. The confusion matrix of the test set classification is shown in Figure 8. Four samples of medium-roast Typica 1 were predicted to be dark-roast Typica 1; one sample of light-roast Typica 1 was predicted to be medium-roast Typica 1; and two samples of dark-roast Typica 1 were predicted to be medium-roast Typica 1. Parts of the samples were not uniformly roasted; so, the components that were sensitive to the terahertz spectrum changed slightly during the roasting process, causing the model to misidentify. Therefore, different types of coffee beans can still be fully identified.

3.4. Discussion

Terahertz detection combined with a mathematical model was used to identify coffee bean variety and roasting degree in this study. According to the results of the data analysis, the PCA method combined with the linear model can used to distinguish six coffee samples. For samples of three different varieties, Catimor and Typica 1 are both from Yunnan, China, while Typica 1 and Typica 2 are native varieties of Arabica. This means that this identification method has the potential to distinguish different varieties of the same origin, as well as the same variety of coffee beans from different origins. Typica 1 coffee beans with different roasting degrees can also be clearly distinguished, which provides method support for distinguishing different roasting degrees of the same variety or different roasting degrees of different varieties. In recent years, there are fewer and fewer areas suitable for the growth of Arabica coffee beans due to climate and other factors. However, the global coffee market has shown an increasing demand for high-quality coffee beans. As mentioned above, researchers are also trying to use different methods to identify the variety, origin, and roasting degree of coffee beans [6,7,8,9,10]. Each method has its own advantages and disadvantages, but, on the whole, a convenient, environmentally friendly, high-accuracy identification method is the most valuable application. Terahertz detection is such a method.

4. Conclusions

Although there was no obvious characteristic absorption peak in the terahertz spectrum of coffee beans, there were many data detected using terahertz, including time-domain, frequency-domain, absorption, refractive index, etc. Too much information was not conducive to the establishment of an effective mathematical model because there was a correlation between some data. Dimensionality reduction of data processing can improve the effectiveness of data processing. Meanwhile, too low a dimension can make it difficult to distinguish coffee samples. Among the four models, LD had the highest accuracy, indicating that there was a linear relationship between the terahertz spectra of coffee samples. Therefore, the LD combined with PCA was an effective way to distinguish coffee samples. If more coffee beans, e.g., with a different geographical origin, different treatment method, different harvest year, etc., are to be considered to build the database, multi-category models will be established according to different requirements. Additionally, the model can be trained sufficiently to reduce its error and improve its accuracy. Such strategies have application value for the identification of coffee bean varieties and the commercial classification of coffee beans for evaluating coffee quality during competition activities.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/foods13030389/s1, Figure S1: Photos of six coffee beans were used in this paper.

Author Contributions

Conceptualization, L.H. and X.D.; methodology, B.L. and B.C.; software, M.L.; validation, L.H., B.L. and X.D.; writing-original draft preparation, L.H.; writing-review and editing, B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 31972207 and 32172352) and the Innovation Team of Guangdong Education Department (2021KCXTD069).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article or Supplementary Materials.

Acknowledgments

The authors thank the device support by Shenzhen Institude of Terahertz Technology and Innovation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Van Dam, R.M.; Campion, E.W.; Hu, F.B. Coffee, Caffeine, and Health. N. Engl. J. Med. 2020, 383, 369–378. [Google Scholar] [CrossRef]
  2. Crozier, T.W.M.; Stalmach, A.; Lean, M.E.J.; Crozier, A. Espresso Coffees, Caffeine and Chlorogenic Acid Intake: Potential Health Implications. Food Funct. 2012, 3, 30–33. [Google Scholar] [CrossRef]
  3. Crozier, T.W.M.; Stalmach, A.; Lean, M.E.J.; Crozier, A. Exploring the Impacts of Postharvest Processing on the Microbiota and Metabolite Profiles During Green Coffee Bean Production. Appl. Environ. Microbiol. 2017, 83, e02398-16. [Google Scholar] [CrossRef]
  4. Pereira, L.L.; Guarçoni, R.C.; Pinheiro, P.F.; Osório, V.M.; Pinheiro, C.A.; Moreira, T.R.; Ten Caten, C.S. New Propositions About Coffee Wet Processing: Chemical and Sensory Perspectives. Food Chem. 2020, 310, 125943. [Google Scholar] [CrossRef] [PubMed]
  5. Duangjai, A.; Saokaew, S.; Goh, B.-H.; Phisalprapa, P. Shifting of Physicochemical and Biological Characteristics of Coffee Roasting under Ultrasound-Assisted Extraction. Front. Nutr. 2021, 8, 724591. [Google Scholar] [CrossRef]
  6. Putri, S.P.; Irifune, T.; Yusianto; Fukusaki, E. GC/MS Based Metabolite Profiling of Indonesian Specialty Coffee from Different Species and Geographical Origin. Metabolomics 2019, 15, 126. [Google Scholar] [CrossRef] [PubMed]
  7. Pérez-Míguez, R.; Sánchez-López, E.; Plaza, M.; Marina, M.L.; Castro-Puyana, M. Capillary Electrophoresis-mass Spectrometry Metabolic Fingerprinting of Green and Roasted Coffee. J. Chromatogr. A 2019, 1605, 360353. [Google Scholar] [CrossRef] [PubMed]
  8. Mannino, G.; Kunz, R.; Maffei, M.E. Discrimination of Green Coffee (Coffea arabica and Coffea canephora) of Different Geographical Origin Based on Antioxidant Activity, High-Throughput Metabolomics, and DNA RFLP Fingerprinting. Antioxidants 2023, 12, 1135. [Google Scholar] [CrossRef] [PubMed]
  9. Zhang, C.; Liu, F.; He, Y. Identification of Coffee Bean Varieties Using Hyperspectral Imaging: Influence of Preprocessing Methods and Pixel-wise Spectra Analysis. Sci. Rep. 2018, 8, 2166. [Google Scholar] [CrossRef]
  10. Phuangsaijai, N.; Theanjumpol, P.; Kittiwachana, S. Performance Optimization of a Developed Near-Infrared Spectrometer Using Calibration Transfer with a Variety of Transfer Samples for Geographical Origin Identification of Coffee Beans. Molecules 2022, 27, 8202. [Google Scholar] [CrossRef]
  11. Zhang, X.; Wang, Y.; Zhou, Z.; Zhang, Y.; Wang, X. Detection Method for Tomato Leaf Mildew Based on Hyperspectral Fusion Terahertz Technology. Foods 2023, 12, 535. [Google Scholar] [CrossRef] [PubMed]
  12. Han, C.; Qu, F.; Wang, X.; Zhai, X.; Li, J.; Yu, K.; Zhao, Y. Terahertz Spectroscopy and Imaging Techniques for Herbal Medicinal Plants Detection: A Comprehensive Review. Crit. Rev. Anal. Chem. 2023, 1–15. [Google Scholar] [CrossRef] [PubMed]
  13. Tang, M.; Zhang, M.; Fu, Y.; Chen, L.; Li, D.; Zhang, H.; Yang, Z.; Wang, C.; Xiu, P.; Wilksch, J.J.; et al. Terahertz Label-Free Detection of Nicotine-Induced Neural Cell Changes and the Underlying Mechanisms. Biosens. Bioelectron. 2023, 241, 115697. [Google Scholar] [CrossRef]
  14. Hoshina, H.; Iwasaki, Y.; Katahira, E.; Okamoto, M.; Otani, C. Structure and Dynamics of Bound Water in Poly (Ethylene-Vinylalcohol) Copolymers Studied by Terahertz Spectroscopy. Polymer 2018, 148, 49–60. [Google Scholar] [CrossRef]
  15. Li, Q.; Lei, T.; Sun, D.W. Analysis and Detection Using Novel Terahertz Spectroscopy Technique in Dietary Carbohydrate-Related Research: Principles and Application Advances. Crit. Rev. Food Sci. Nutr. 2023, 63, 1793–1805. [Google Scholar] [CrossRef]
  16. Huang, L.L.; Li, C.; Li, B.; Liu, M.L.; Lian, M.M.; Yang, S.Z. Studies on Qualitative and Quantitative Detection of Trehalose Purity by Terahertz Spectroscopy. Food Sci. Nutr. 2020, 8, 1828–1836. [Google Scholar] [CrossRef]
  17. Federici, J.F. Review of Moisture a5nd Liquid Detection and Mapping Using Terahertz Imaging. J. Infrared Millim. Terahertz Waves 2012, 33, 97–126. [Google Scholar] [CrossRef]
  18. Baek, S.H.; Lim, H.B.; Chun, H.S. Detection of Melamine in Foods Using Terahertz Time-Domain Spectroscopy. J. Agric. Food Chem. 2014, 62, 5403–5407. [Google Scholar] [CrossRef]
  19. Qu, F.F.; Pan, Y.; Lin, L.; Cai, C.Y.; Dong, T.; He, Y.; Nie, P.C. Experimental and Theoretical Study on Terahertz Absorption Characteristics and Spectral De-Noising of Three Plant Growth Regulators. J. Infrared Millim. Terahertz Waves 2018, 39, 1015–1027. [Google Scholar] [CrossRef]
  20. Nakajima, S.; Horiuchi, S.; Ikehata, A.; Ogawa, Y. Determination of Starch Crystallinity with the Fourier-Transform Terahertz Spectrometer. Carbohydr. Polym. 2021, 262, 117928. [Google Scholar] [CrossRef]
  21. Ge, H.Y.; Jiang, Y.Y.; Xu, Z.H.; Lian, F.Y.; Zhang, Y.; Xia, S.H. Identification of Wheat Quality Using Thz Spectrum. Opt. Express 2014, 22, 12533–12544. [Google Scholar] [CrossRef]
  22. Liu, H.S.; Zhang, Z.W.; Zhang, X.; Yang, Y.P.; Zhang, Z.Y.; Liu, X.Y.; Wang, F.; Han, Y.D.; Zhang, C.L. Dimensionality Reduction for Identification of Hepatic Tumor Samples Based on Terahertz Time-Domain Spectroscopy. IEEE Trans. Terahertz Sci. Technol. 2018, 8, 271–277. [Google Scholar] [CrossRef]
  23. Zhang, X.; Lu, S.H.; Liao, Y.; Zhang, Z.Y. Simultaneous Determination of Amino Acid Mixtures in Cereal by Using Terahertz Time Domain Spectroscopy and Chemometrics. Chemom. Intell. Lab. Syst. 2017, 164, 8–15. [Google Scholar] [CrossRef]
  24. Yang, S.; Li, C.; Mei, Y.; Liu, W.; Liu, R.; Chen, W.; Han, D.; Xu, K. Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined with Machine Learning Methods. Front. Nutr. 2021, 8, 680627. [Google Scholar] [CrossRef] [PubMed]
  25. Shen, Y.C. Terahertz Pulsed Spectroscopy and Imaging for Pharmaceutical Applications: A Review. Int. J. Pharm. 2011, 417, 48–60. [Google Scholar] [CrossRef] [PubMed]
  26. Shen, Y.C.; Upadhya, P.C.; Linfield, E.H.; Davies, A.G. Vibrational Spectra of Nucleosides Studied Using Terahertz Time-Domain Spectroscopy. Vib. Spectrosc. 2004, 35, 111–114. [Google Scholar] [CrossRef]
  27. Zeitler, J.A.; Taday, P.F.; Newnham, D.A.; Pepper, M.; Gordon, K.C.; Rades, T. Terahertz Pulsed Spectroscopy and Imaging in the Pharmaceutical Setting—A Review. J. Pharm. Pharmacol. 2007, 59, 209–223. [Google Scholar] [CrossRef] [PubMed]
  28. Gowen, A.A.; O’Sullivan, C.; O’Donnell, C.P. Terahertz Time Domain Spectroscopy and Imaging: Emerging Techniques for Food Process Monitoring and Quality Control. Trends Food Sci. Technol. 2012, 25, 40–46. [Google Scholar] [CrossRef]
  29. Zheng, C.; Cai, S.; Li, Q.; Li, C.; Li, X. A Collaborative Classification Algorithm with Multi-View Terahertz Spectra. Results Phys. 2022, 42, 106023. [Google Scholar] [CrossRef]
  30. Walther, M.; Fischer, B.; Schall, M.; Helm, H.; Jepsen, P.U. Far-infrared vibrational spectra of all-trans, 9-cis and 13-cis retinal measured by THz time-domain spectroscopy. Chem. Phys. Lett. 2000, 332, 389–395. [Google Scholar] [CrossRef]
  31. Yamaguchi, S.; Fukushi, Y.; Kubota, O.; Itsuji, T.; Ouchi, T.; Yamamoto, S. Brain Tumor Imaging of Rat Fresh Tissue Using Terahertz Spectroscopy. Sci. Rep. 2016, 6, 30124. [Google Scholar] [CrossRef]
  32. Afsah-Hejri, L.; Akbari, E.; Toudeshki, A.; Homayouni, T.; Alizadeh, A.; Ehsani, R. Terahertz Spectroscopy and Imaging: A Review on Agricultural Applications. Comput. Electron. Agric. 2020, 177, 105628. [Google Scholar] [CrossRef]
  33. Tharwat, A.; Gaber, T.; Ibrahim, A.; Hassanien, A.E. Linear Discriminant Analysis: A Detailed Tutorial. AI Commun. 2017, 30, 169–190. [Google Scholar] [CrossRef]
  34. Li, Q.; Lei, T.; Cheng, Y.; Wei, X.; Sun, D.-W. Predicting Wheat Gluten Concentrations in Potato Starch Using Gpr and Svm Models Built by Terahertz Time-Domain Spectroscopy. Food Chem. 2024, 432, 137235. [Google Scholar] [CrossRef] [PubMed]
  35. Siuly; Yin, X.X.; Hadjiloucas, S.; Zhang, Y.C. Classification of Thz Pulse Signals Using Two-Dimensional Cross-Correlation Feature Extraction and Non-Linear Classifiers. Comput. Methods Programs Biomed. 2016, 127, 64–82. [Google Scholar] [CrossRef] [PubMed]
  36. Koch, I.; Naito, K.; Tanaka, H. Kernel Naive Bayes Discrimination for High-Dimensional Pattern Recognition. Aust. N. Z. J. Stat. 2019, 61, 401–428. [Google Scholar] [CrossRef]
  37. Li, B.; Zhang, D.P.; Shen, Y. Study on Terahertz Spectrum Analysis and Recognition Modeling of Common Agricultural Diseases. Spectrochim. Acta Part A-Mol. Biomol. Spectrosc. 2020, 243, 118820. [Google Scholar] [CrossRef]
  38. Liu, J.J.; Li, Z.; Hu, F.R.; Chen, T.; Zhu, A.J. A Thz Spectroscopy Nondestructive Identification Method for Transgenic Cotton Seed Based on Ga-Svm. Opt. Quantum Electron. 2015, 47, 313–322. [Google Scholar] [CrossRef]
  39. Bilge, G. Investigating the Effects of Geographical Origin, Roasting Degree, Particle Size and Brewing Method on the Physicochemical and Spectral Properties of Arabica Coffee by Pca Analysis. J. Food Sci. Technol.-Mysore 2020, 57, 3345–3354. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram and picture of transmission-type THz-TDS system. (a) Schematic diagram; and (b) actual picture.
Figure 1. Schematic diagram and picture of transmission-type THz-TDS system. (a) Schematic diagram; and (b) actual picture.
Foods 13 00389 g001
Figure 2. Time-domain spectra (a) and frequency-domain spectra (b) of six coffee beans with different roasting levels and varieties.
Figure 2. Time-domain spectra (a) and frequency-domain spectra (b) of six coffee beans with different roasting levels and varieties.
Foods 13 00389 g002
Figure 3. Distribution of six coffee samples under PC1, PC2, and PC3.
Figure 3. Distribution of six coffee samples under PC1, PC2, and PC3.
Foods 13 00389 g003
Figure 4. Distribution of light-roast Typica 1, Catimor, and Typica 2 under PC1 and PC2.
Figure 4. Distribution of light-roast Typica 1, Catimor, and Typica 2 under PC1 and PC2.
Foods 13 00389 g004
Figure 5. Distribution of green-bean, light-roast, medium-roast, and dark-roast Typica 1 under PC1 and PC2.
Figure 5. Distribution of green-bean, light-roast, medium-roast, and dark-roast Typica 1 under PC1 and PC2.
Foods 13 00389 g005
Figure 6. The relationship between testing time and accuracy of the coffee varieties’ classification as the number of PCA principal components increases.
Figure 6. The relationship between testing time and accuracy of the coffee varieties’ classification as the number of PCA principal components increases.
Foods 13 00389 g006
Figure 7. The relationship between the testing time and accuracy of the roasting degree classification as the number of PCA principal components increases.
Figure 7. The relationship between the testing time and accuracy of the roasting degree classification as the number of PCA principal components increases.
Foods 13 00389 g007
Figure 8. Confusion matrix of the classification of six samples using PCA combined with the LD model.
Figure 8. Confusion matrix of the classification of six samples using PCA combined with the LD model.
Foods 13 00389 g008
Table 1. Roasting conditions of different coffee beans and cupping test result.
Table 1. Roasting conditions of different coffee beans and cupping test result.
RoastingSurface ColorPowder ColorFirst Burst Temperature (°C)First Burst TimeEnd
Temperature (°C)
End TimeMain Flavor Description
CatimorLight68.8 ± 1.6 c103.0 ± 0.3 a1937′50″2008′23″Tropical fruit, fermented
Typica 1Green--------------
Typica 1Light84.4 ± 1.9 a101.8 ± 0.3 b1907′40″2008′25″Dried fruit, slightly ripe fruit, woody
Typica 1Medium65.8 ± 1.5 c79.6 ± 0.2 d1907′45″2109′34″Citric acid, grape,
Typica 1Dark47.5 ± 1.2 d57.0 ± 0.2 e1907′38″22010′29″Roasted nuts, smoky
Typica 2Light77.9 ± 1.4 b92.4 ± 0.2 c1847′50″2009′36″Barley tea, vegetative, peanut
“--” means no value due to no roasting of green coffee bean; values in the same column not sharing the same superscript are significantly different (p < 0.05).
Table 2. Cross validation and prediction results of the four models for different varieties of coffee.
Table 2. Cross validation and prediction results of the four models for different varieties of coffee.
ModelCross-ValidationPrediction
AccuracyTime (s)AccuracyTime (s)
LD100%1.10100%0.034
SVM100%1.0198.7%0.032
NB99.7%20.77100%0.402
KNN100%0.7394.7%0.029
Table 3. Cross validation and prediction results of the four models for different roasting degrees.
Table 3. Cross validation and prediction results of the four models for different roasting degrees.
ModelCross-ValidationPrediction
AccuracyTime (s)AccuracyTime (s)
LD100%4.31100%0.054
SVM99.0%3.0996%0.052
NB94.8%30.8991%0.658
KNN96.5%1.0890% 0.029
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, L.; Liu, M.; Li, B.; Chitrakar, B.; Duan, X. Terahertz Spectroscopic Identification of Roast Degree and Variety of Coffee Beans. Foods 2024, 13, 389. https://doi.org/10.3390/foods13030389

AMA Style

Huang L, Liu M, Li B, Chitrakar B, Duan X. Terahertz Spectroscopic Identification of Roast Degree and Variety of Coffee Beans. Foods. 2024; 13(3):389. https://doi.org/10.3390/foods13030389

Chicago/Turabian Style

Huang, Luelue, Miaoling Liu, Bin Li, Bimal Chitrakar, and Xu Duan. 2024. "Terahertz Spectroscopic Identification of Roast Degree and Variety of Coffee Beans" Foods 13, no. 3: 389. https://doi.org/10.3390/foods13030389

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop