3.2.1. Canonical Discriminant Analysis
Canonical discriminant analysis, a multivariate statistical method, was performed to identify the most discriminative peak variables for arranging the red pepper powder samples in a lower-dimensional space by maximizing the distances between the origin groups. To ensure the robustness of these statistical processes, the homogeneity of the variance of each variable must be considered [31]. Therefore, to select suitable variables for the statistical analysis, a variance homogeneity test was conducted first. Of the 18 peaks tested, eight peak variables, namely P5, P7, P8, P10, P12, P14, P16, and P17, had equal variance (p > 0.05), while the remaining 10 peaks did not (p < 0.05) (Table S1).
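The variable screening described above can be sketched as follows. The text does not name the homogeneity test that was used, so Levene's test (`scipy.stats.levene`) is an assumption here, and the peak names and values are synthetic placeholders rather than data from Table S1.

```python
import numpy as np
from scipy.stats import levene

def equal_variance_peaks(groups_by_peak, alpha=0.05):
    """Run a variance homogeneity test per peak; keep peaks with p > alpha."""
    p_values = {}
    for peak, groups in groups_by_peak.items():
        # Assumption: Levene's test (median-centered by default in SciPy).
        _, p = levene(*groups)
        p_values[peak] = p
    kept = [peak for peak, p in p_values.items() if p > alpha]
    return p_values, kept

# Synthetic illustration: three origin groups per peak.
base = np.linspace(-1.0, 1.0, 20)
data = {
    "P_equal": [base, base + 5.0, base + 10.0],    # same spread, shifted means
    "P_unequal": [base, 5.0 * base, 10.0 * base],  # very different spreads
}
p_values, kept = equal_variance_peaks(data)
print(kept)
```

Only the peaks returned in `kept` would be passed on to the parametric tests in this sketch.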
An ANOVA test was then performed to determine which second-derivative FT-IR peak variables differed meaningfully among the Korean, Chinese, and Vietnamese red pepper powder groups. The ANOVA test uses the F test to verify the equality of the group means of each variable, and thus determines whether the means of three or more groups differ [31]. Since the ANOVA test is a parametric test, only the eight peaks with equal variance identified in the preceding test of homogeneity of variance were considered [31]. All eight peak variables exhibited significant differences among the origins (p < 0.001) with large F values (Table 2). As can be seen in Table 2, a smaller Wilks’ lambda value (i.e., a larger F value) implies a higher significance in the discriminant analysis.
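The inverse relationship between Wilks' lambda and the F value for a single variable can be illustrated with a minimal sketch. For one variable, lambda is the ratio of the within-group to the total sum of squares, so F = ((1 − Λ)/Λ)·((N − k)/(k − 1)). The function name and the toy numbers below are illustrative assumptions, not values from Table 2.

```python
import numpy as np

def anova_f_and_wilks(groups):
    """One-way ANOVA F value and univariate Wilks' lambda for one variable."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    n_total, k = all_values.size, len(groups)
    grand_mean = all_values.mean()
    # Between-group and within-group sums of squares.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f_value = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    wilks_lambda = ss_within / (ss_between + ss_within)
    return f_value, wilks_lambda

# Toy data for three groups (not measured values).
f_value, wilks_lambda = anova_f_and_wilks([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, wilks_lambda)  # F = 13.0, lambda = 0.1875
```

Because lambda shrinks as the between-group sum of squares grows, ranking variables by small lambda or by large F gives the same order when the group sizes are fixed.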
These eight significant variables were used in the canonical discriminant analysis to establish the discriminant functions. Two canonical discriminant functions were derived for identifying the red pepper powder samples from different origins; Functions 1 and 2 accounted for 65.2% and 34.8% of the total variance, respectively, and thus together for 100%. The separation between the red pepper powder samples of different geographical origins in the discriminant space was investigated by plotting the discriminant function scores as a scatter plot. The score plot showed good separation among the samples from the three origins (Figure 2), suggesting that the variables used to derive the discriminant functions provided sufficient information to identify the geographical origins of red pepper powders. The Korean and Vietnamese samples were completely distinguishable from each other, while the Chinese samples were relatively widely scattered between the Korean and Vietnamese samples. This may be attributed to the diversity of the Chinese samples, reflecting the characteristics of China’s large geographical area.
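How three groups yield two canonical functions whose variance shares sum to 100% can be sketched with the standard eigen-decomposition of the within-group and between-group scatter matrices. This is a generic NumPy illustration on synthetic data; only the group sizes (50, 23, 13) come from the text, and the statistical software used in the study is not assumed.

```python
import numpy as np

def canonical_discriminant(X, y):
    """Return the variance share of each canonical function and the scores."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    Sw = np.zeros((p, p))  # within-group scatter
    Sb = np.zeros((p, p))  # between-group scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - grand_mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    # At most (number of groups - 1) functions carry discriminating power.
    order = np.argsort(eigvals)[::-1][: len(classes) - 1]
    eigvals, W = eigvals[order], eigvecs[:, order]
    proportion = eigvals / eigvals.sum()
    scores = (X - grand_mean) @ W
    return proportion, scores

# Synthetic data: 4 variables, group sizes as in the study, made-up means.
rng = np.random.default_rng(0)
sizes = [50, 23, 13]
means = [[0, 0, 0, 0], [3, 1, 0, 2], [1, 4, 2, 0]]
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(n, 4)) for m, n in zip(means, sizes)])
y = np.repeat(["Korean", "Chinese", "Vietnamese"], sizes)
proportion, scores = canonical_discriminant(X, y)
print(proportion)  # two shares that sum to 1
```

The two columns of `scores` are what a score plot such as Figure 2 would display, one point per sample.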
To verify and examine the predictive discrimination capability of the established canonical discriminant functions, we reclassified the red pepper powder samples used in the multivariate statistical analysis according to their geographical origins. Table 3 indicates that the canonical discriminant functions correctly classified all 86 red pepper powder samples (50 Korean, 23 Chinese, and 13 Vietnamese) according to their geographical origins (100% of the original group cases were correctly classified), while only one Chinese sample was incorrectly classified in the cross-validation (98.8% of the cross-validated group cases were correctly classified). These results were similar to the discrimination results for the origins of 62 Asian red pepper powder samples (36 Korean, 17 Chinese, and 9 Vietnamese) obtained using ¹H NMR spectroscopy [23]. This result is notable considering that, in the ¹H NMR analysis, various metabolite components, even those with minor contents, could be used as individual indicators. Previous studies that analyzed the mineral elements [25] and sensor characteristics [24] of red peppers from other regions in China using various multivariate statistical analysis methods covered different regional scopes; nevertheless, the present results confirm that the second-derivative FT-IR method can be sufficiently utilized to discriminate the origins of red pepper powders. In addition, similar discrimination abilities have been reported for the origins of other foods, such as olive oil and honey, using the FT-IR technique [17,18]. Overall, these results indicate that second-derivative FT-IR spectroscopy combined with canonical discriminant analysis has the potential to discriminate Asian red pepper powders according to their geographical origins.
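The difference between reclassifying the original samples and cross-validating them can be sketched as below. The sketch assumes leave-one-out cross-validation and a linear discriminant rule with a pooled covariance matrix and equal priors; the text does not specify these details, and the data are synthetic (only the group sizes follow the study).

```python
import numpy as np

def lda_predict(X_train, y_train, X_new):
    """Assign each new sample to the group with the nearest Mahalanobis centroid."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    resid = X_train - means[np.searchsorted(classes, y_train)]
    pooled = resid.T @ resid / (len(X_train) - len(classes))
    inv = np.linalg.inv(pooled)
    diff = X_new[:, None, :] - means[None, :, :]
    d2 = np.einsum("nkp,pq,nkq->nk", diff, inv, diff)
    return classes[np.argmin(d2, axis=1)]

def resubstitution_accuracy(X, y):
    """Classify the same samples that were used to fit the rule."""
    return float(np.mean(lda_predict(X, y, X) == y))

def leave_one_out_accuracy(X, y):
    """Refit without sample i, then classify sample i, for every i."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        hits += int(lda_predict(X[mask], y[mask], X[i : i + 1])[0] == y[i])
    return hits / len(X)

# Synthetic, well-separated groups with the study's group sizes.
rng = np.random.default_rng(1)
sizes = [50, 23, 13]
means = [[0, 0, 0, 0], [6, 2, 0, 4], [2, 8, 4, 0]]
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(n, 4)) for m, n in zip(means, sizes)])
y = np.repeat(["Korean", "Chinese", "Vietnamese"], sizes)
resub_acc = resubstitution_accuracy(X, y)
loo_acc = leave_one_out_accuracy(X, y)
print(resub_acc, loo_acc)
```

With 86 samples, a single cross-validation error corresponds to 85/86, or about 98.8%, which matches the cross-validated rate reported above.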
3.2.2. Discriminant Indicator Variables
It was confirmed that Asian red pepper powders could be effectively discriminated according to their geographical origins by canonical discriminant analysis of the signals obtained from the second-derivative FT-IR spectra. This protocol can also be applied to the discrimination of new red pepper powder samples through statistical processes. If several indicator variables suitable for discriminating the origin of red pepper samples are selected and appropriate ranges are set for them, rapid and facile discrimination of the geographical origins of new red pepper powder samples is possible without the need for a specific statistical program or process.
The Pearson coefficients are summarized in the structure matrix table (Table 4), which shows the correlation of each variable with each canonical discriminant function [47,48,49]. This table reveals that P12 and P17 are the most significant variables in discriminant Functions 1 and 2, respectively (with correlations of −0.475 and 0.714). P14 and P8 also show high significance in both functions.
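A structure matrix of this kind can be sketched as the correlation between each variable and each discriminant score. The sketch below assumes the pooled within-group convention that is common for structure matrices, which the text does not confirm. The `weights` matrix is a hypothetical stand-in for fitted discriminant coefficients, and the data are synthetic.

```python
import numpy as np

def structure_matrix(X, y, scores):
    """Pooled within-group Pearson correlations of variables with scores."""
    Xc = np.asarray(X, dtype=float).copy()
    Sc = np.asarray(scores, dtype=float).copy()
    for c in np.unique(y):
        m = y == c
        Xc[m] -= Xc[m].mean(axis=0)  # remove group means from the variables
        Sc[m] -= Sc[m].mean(axis=0)  # remove group means from the scores
    cross = Xc.T @ Sc
    norm = np.sqrt((Xc ** 2).sum(axis=0))[:, None] * np.sqrt((Sc ** 2).sum(axis=0))[None, :]
    return cross / norm

# Synthetic data: 3 variables, 3 groups of 10 samples.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = np.repeat(["A", "B", "C"], 10)
# Hypothetical coefficients: function 1 is variable 0, function 2 is variable 1.
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
scores = X @ weights
structure = structure_matrix(X, y, scores)
print(np.round(structure, 3))
```

Rows correspond to variables and columns to discriminant functions, so the largest absolute entry in a column marks the variable most strongly associated with that function.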
These four peak variables (P8, P12, P14, and P17) were also found to have high significance in the mean difference, with an F value of 60 or more in the one-way ANOVA test (Table 2). The distributions of these four variables across the geographical origin groups were compared as box plots (Figure 3), confirming that P12 and P17 were the most effective variables for discriminating the Korean and Vietnamese samples, respectively, from those of other geographical origins. Additionally, the distribution characteristics of P8, P12, and P14 were similar, whereas those of P17 were different. This was also confirmed by the Pearson correlation matrix, which shows the correlations among variables (Table S2).
Considering their correlation with the discriminant functions, mean difference, and difference in distribution values, P8 and P14, along with the most significant variables P12 and P17, were selected as indicator variables for discriminating the origins of Asian red pepper powder samples. To discriminate the geographical origins using the specific indicator variables, they must have ranges differentiated according to the origins.
For the Korean red pepper samples, the distribution values of P8 and P12 were smaller than those of the others. These signals can be attributed to C–H stretching vibrations, which are derived from various metabolites containing alkyl groups and are likely influenced largely by the hydrocarbon chains of fatty acids. Because the fatty acid content is relatively higher in the seeds than in the peel of red pepper [50], it can be estimated that the Korean red pepper powder samples contain relatively fewer seeds than the Chinese or Vietnamese samples. Moreover, the P17 signal, attributed to the C–O stretching vibration arising mainly from the fructosyl unit [45], was observed to be small in the Vietnamese samples. This implies that the Vietnamese red pepper powders had a relatively lower fructose content than the Korean and Chinese powders, which was also confirmed in previous NMR experiments (Figure S1) [23]. For the Chinese red pepper powder samples, all four variables exhibited relatively higher means than the others. However, owing to the diversity of the Chinese samples, the ranges of all the indicator variables overlapped significantly with those for the other origins; hence, establishing independent variable ranges for the Chinese samples was not possible.
Based on these observations, the ranges of the discriminant indicator variables that could discriminate between Korean and Vietnamese red pepper powder samples were set as presented in Table 5.
The range of each discriminant variable was set based on its maximum or minimum value, or by considering values between the minimum and maximum based on the relative distribution characteristics of the variable [27,37,38]. For example, in the case of the P8 variable, because the Korean red pepper powders had the lowest distribution, their range was set below the maximum value for the Korean samples. In contrast, the Vietnamese samples had a relatively high distribution, and thus their range was set above the minimum value for the Vietnamese samples. It is worth noting that if the values of each variable, obtained from the analysis of more samples, sufficiently satisfied a normal distribution, the ranges could also be established using a statistical technique.
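The P8 example can be sketched as a pair of one-sided ranges. This is one possible reading of the rule, in which the bound is taken directly at the observed group maximum or minimum. The helper names and the numeric values are hypothetical and do not reproduce Table 5.

```python
from math import inf

def range_below_group_max(values):
    """Range for a group with a low distribution: everything up to its maximum."""
    return (-inf, max(values))

def range_above_group_min(values):
    """Range for a group with a high distribution: everything from its minimum."""
    return (min(values), inf)

def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high

# Hypothetical P8 values for two origin groups (not measured data).
korean_p8 = [0.10, 0.12, 0.15]
vietnamese_p8 = [0.21, 0.24, 0.30]
korean_range = range_below_group_max(korean_p8)
vietnamese_range = range_above_group_min(vietnamese_p8)
print(korean_range, vietnamese_range)
```

The two ranges discriminate only if they do not overlap, that is, if the Korean maximum lies below the Vietnamese minimum for that variable.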
To confirm the suitability of the selected indicator variables and their range settings, we reclassified the red pepper powder samples used in the multivariate statistical analysis, based on their geographical origins. A sample was attributed to a specific origin only if the values of all the indicator variables for the sample were within the discriminant ranges for that origin; the results are summarized in Table 6. When the ranges of the indicator variables for the Korean red pepper powder samples were applied, all 50 Korean samples were identified as “Korean,” and the remaining 36 samples (23 Chinese and 13 Vietnamese) were all classified as “not Korean.” When applying the ranges of the indicator variables for the Vietnamese red pepper powder samples to the 36 “not Korean” samples, all 13 Vietnamese samples were identified as “Vietnamese” and the remaining 23 Chinese samples were identified as “not Vietnamese.” Changing the order of applying the indicator variable ranges for the Korean and Vietnamese samples produced the same results, indicating that the two sets of ranges were well separated.
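The reclassification rule described above, in which every indicator variable must fall within the range for an origin and the origins are tested in sequence, can be sketched as follows. The ranges in `RANGES` are hypothetical placeholders and do not reproduce Table 5. Which variables apply to which origin is also an assumption.

```python
from math import inf

# Hypothetical (low, high) ranges per origin and indicator variable.
RANGES = {
    "Korean": {"P8": (-inf, 0.15), "P12": (-inf, 0.20)},
    "Vietnamese": {"P8": (0.18, inf), "P17": (-inf, 0.05)},
}

def within_all(sample, ranges):
    """True only if every indicator variable lies inside its range."""
    return all(low <= sample[var] <= high for var, (low, high) in ranges.items())

def classify(sample, order=("Korean", "Vietnamese")):
    """Test the origins in the given order; fall through if none matches."""
    for origin in order:
        if within_all(sample, RANGES[origin]):
            return origin
    return "neither"

# Made-up samples for illustration.
samples = [
    {"P8": 0.12, "P12": 0.18, "P17": 0.09},  # fits the Korean ranges
    {"P8": 0.25, "P12": 0.30, "P17": 0.03},  # fits the Vietnamese ranges
    {"P8": 0.22, "P12": 0.35, "P17": 0.10},  # fits neither set of ranges
]
forward = [classify(s) for s in samples]
reverse = [classify(s, order=("Vietnamese", "Korean")) for s in samples]
print(forward, reverse)
```

When the two sets of ranges are well separated, no sample satisfies both, so the order of testing does not change the outcome.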
Setting ranges for discriminant indicator variables aids in determining the authenticity of food based on the content of intrinsic ingredients (such as metabolites and minerals) [28,37,38,39]. However, it is not easy to apply this method to discriminate between the origins of the same food. Therefore, it is meaningful that the geographical origins were correctly classified by setting several discriminant indicators and their ranges. Recently, FT-IR spectroscopy combined with statistical analysis has been actively applied to determine the authenticity, adulteration, and geographical origins of various foods. If the discriminant indicator variables and their ranges are set suitably, such results can be used more effectively and practically.