*3.2. Metabolite Identification*

To screen the compounds explaining the overall grouping patterns observed in the PCA score plots, differential entities among the bee pollen samples (*p* < 0.05 in ANOVA) were subjected to compound identification. In total, 54 compounds were identified, including 15 flavonol and flavone glycosides, 3 catechins, 11 amino acids, 8 organic acids, 4 fatty acids, 4 nucleotides and their derivatives, 2 aldehydes, and 7 other compounds (Table S1). Among them, four compounds were detected exclusively in the CBP samples, and in all of them, namely ECG, L-theanine, gallic acid (GA), and kaempferol. In the clustering heatmap of the identified compounds, all CBP samples clustered into a single clade, whereas the non-CBP samples formed a separate clade (Figure 2).
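One reading of "detected exclusively in all CBP samples" is that a compound is present in every CBP sample and absent from every non-CBP sample. The following is a minimal sketch of that check, not the authors' procedure. The function name, the sample IDs, and the compound labels are hypothetical, and the detection sets are invented for illustration.

```python
def exclusive_to_group(detected, group, others):
    """Return compounds found in every sample of `group` and in no sample of `others`.

    detected: dict mapping a sample ID to the set of compound names detected in it.
    group:    non-empty list of sample IDs for the group of interest (e.g. CBP).
    others:   list of sample IDs for all remaining samples (e.g. non-CBP).
    """
    # Present in all samples of the group of interest.
    in_all_group = set.intersection(*(detected[s] for s in group))
    # Present in at least one sample outside the group.
    in_any_other = set().union(*(detected[s] for s in others))
    return in_all_group - in_any_other


# Invented detection table (hypothetical sample IDs and compound labels).
detected = {
    "CBP_1": {"cmpd_1", "cmpd_2", "cmpd_3"},
    "CBP_2": {"cmpd_1", "cmpd_2"},
    "nonCBP_1": {"cmpd_2", "cmpd_4"},
    "nonCBP_2": {"cmpd_4"},
}

result = exclusive_to_group(detected, ["CBP_1", "CBP_2"], ["nonCBP_1", "nonCBP_2"])
print(result)
```

Here `cmpd_2` is dropped because it also appears in a non-CBP sample, and `cmpd_3` is dropped because it is missing from one CBP sample. In practice, "detected" would have to be defined by a signal threshold, which this sketch leaves to the caller.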

*3.3. Marker Compound Selection of CBP*

To pick out the most discriminating compounds between the CBP and non-CBP samples, univariate and multivariate analyses were conducted based on the relative abundance levels of the 54 identified compounds (Table S2). In total, 16 compounds with a fold change (FC) > 1.5 differed significantly (*p* < 0.05) between the CBP and non-CBP samples (Table S1). A reliable OPLS-DA model was established (*R*<sup>2</sup>*Y* = 0.873 and *Q*<sup>2</sup> = 0.845; *R*<sup>2</sup> intercept = 0.0902 and *Q*<sup>2</sup> intercept = −0.4112 in a permutation test with 200 permutations), and the resulting score plots supported a clear separation between the CBP and non-CBP samples (Figure 1B), consistent with the grouping patterns in the PCA score plots (Figure 1A). Further filtering with VIP values > 1.0 in the OPLS-DA resulted in a final selection of two compounds, ECG and L-theanine, which had the highest VIP values (Figure 1D). Moreover, ECG and L-theanine lay farthest from the origin in the loading plots (Figure 1C) and hence had the highest discriminatory power. Taken together, ECG and L-theanine could be used as marker compounds to distinguish CBP from non-CBP samples.
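The filtering logic described above (FC > 1.5, *p* < 0.05, VIP > 1.0) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline. It assumes that *p* values and VIP scores have already been computed elsewhere, and that FC is the ratio of mean relative abundance in CBP to that in non-CBP samples, a direction the text does not state. All names and numbers below are hypothetical.

```python
from statistics import mean


def fold_change(cbp, non_cbp):
    """Ratio of mean relative abundance, CBP over non-CBP (assumed direction)."""
    denom = mean(non_cbp)
    if denom == 0:
        # Compound absent from all non-CBP samples: treat FC as unbounded.
        return float("inf")
    return mean(cbp) / denom


def select_markers(compounds, fc_min=1.5, p_max=0.05, vip_min=1.0):
    """Keep compounds passing all three thresholds; p and VIP are supplied precomputed."""
    selected = []
    for c in compounds:
        fc = fold_change(c["cbp"], c["non_cbp"])
        if fc > fc_min and c["p"] < p_max and c["vip"] > vip_min:
            selected.append(c["name"])
    return selected


# Invented example values (not taken from the study).
compounds = [
    {"name": "compound_A", "cbp": [8.0, 9.0, 10.0], "non_cbp": [0.0, 0.0, 0.0],
     "p": 0.001, "vip": 2.1},
    {"name": "compound_B", "cbp": [4.0, 5.0], "non_cbp": [2.0, 2.0],
     "p": 0.01, "vip": 0.8},   # passes FC and p, fails VIP
    {"name": "compound_C", "cbp": [3.0, 3.0], "non_cbp": [2.9, 3.1],
     "p": 0.60, "vip": 1.2},   # fails FC and p
]

markers = select_markers(compounds)
print(markers)
```

A compound that is undetected in every non-CBP sample has a zero denominator, so the sketch treats its FC as infinite rather than raising an error. How such cases were handled in the actual analysis is not stated in the text.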
