**Front-Face Fluorescence Spectroscopy and Chemometrics for Quality Control of Cold-Pressed Rapeseed Oil During Storage**

**Ewa Sikorska 1,\*, Krzysztof Wójcicki 1, Wojciech Kozak 1, Anna Gliszczy ´nska-Swigło ´ 1, Igor Khmelinskii 2, Tomasz Górecki 3, Francesco Caponio 4, Vito M. Paradiso 4, Carmine Summo <sup>4</sup> and Antonella Pasqualone <sup>4</sup>**


Received: 11 November 2019; Accepted: 5 December 2019; Published: 10 December 2019

**Abstract:** The aim of this study was to test the usability of fluorescence spectroscopy to evaluate the stability of cold-pressed rapeseed oil during storage. Freshly-pressed rapeseed oil was stored in colorless and green glass bottles exposed to light, and in darkness for a period of 6 months. The quality deterioration of oils was evaluated on the basis of several chemical parameters (peroxide value, acid value, K232 and K270, polar compounds, tocopherols, carotenoids, pheophytins, oxygen concentration) and fluorescence. Parallel factor analysis (PARAFAC) of oil excitation-emission matrices revealed the presence of four fluorophores that showed different evolution throughout the storage period. The fluorescence study provided direct information about tocopherol and pheophytin degradation and revealed formation of a new fluorescent product. Principal component analysis (PCA) performed on analytical and fluorescence data showed that oxidation was more advanced in samples exposed to light due to the photo-induced processes; only a very minor effect of the bottle color was observed. Multiple linear regression (MLR) and partial least squares regression (PLSR) on the PARAFAC scores revealed a quantitative relationship between fluorescence and some of the chemical parameters.

**Keywords:** fluorescence; rapeseed oil; multiway analysis; parallel factor analysis (PARAFAC); multivariate regression

#### **1. Introduction**

The popularity of cold-pressed vegetable oils has increased in recent years due to the tendency of consumers to choose less-processed, healthy foods [1]. The most popular is olive oil with its well established health-promoting properties. Rapeseed oil is a valuable alternative to olive oil. It has a favorable composition of unsaturated fatty acids, from the nutritional point of view; the relation between linoleic (ω-6) and α-linolenic (ω-3) acids in this oil is 2:1. Moreover, it is a rich source of tocopherols and phytosterols [2–4].

The quality of oil is affected by oxidation processes that may lead to the loss of nutritional value, deterioration of sensory properties, and formation of toxic products. Oil oxidation proceeds via autoor photo-oxidation processes, which involve, respectively, triplet or singlet oxygen. Autooxidation of oils involves radical forms of acylglycerols. In photosensitized oxidation, chlorophyll acts as a photosensitizer for the formation of singlet oxygen 1O2, which reacts directly with double bonds of fatty acids. The hydroperoxides formed during oxidation decompose to produce off-flavor compounds [5]. The rate of oxidation depends on a variety of factors, including chemical composition of oil, temperature, exposure to light, and presence of oxygen.

Rapeseed oil has lower oxidative stability as compared to olive oil [6,7]. Several studies have been performed to assess quality degradation of rapeseed oil during photo- and autooxidation and storage [8–14]. It was found that the overall quality during storage was influenced by the type of container material (plastics, glass), storage conditions (light, temperature, oxygen availability), and time [13].

Glass is the most popular material used for bottling oils. It is one of the most inert and easy to clean materials. However, colorless glass transmits radiation in the entire visible range, leading to photooxidation of oils and reduction of their shelf-life. Colored glass bottles are used to prevent or limit photooxidation. Green glass bottles are often used to protect oil from light in the 300–500 nm wavelength range [15].

A variety of methods have been used to study oil oxidation and deterioration during shelflife [15–17]. In addition to traditional techniques, spectroscopy coupled with chemometrics was found useful in assessing various aspects of oil quality. Among the spectroscopic techniques, fluorescence provides two main advantages—sensitivity and selectivity, enabling evaluation of minor oil components. Several studies have reported successful applications of oil autofluorescence in the analysis of oil oxidation [18–27]. Various measurement techniques were used in these studies, including measurements of excitation spectra [18], emission spectra [22,25,26], synchronous fluorescence spectra [19–21], and excitation-emission matrices [23,24,27]. This last technique provides the most comprehensive characteristics of fluorescent systems, because excitation-emission matrices are determined by both absorption and emission properties of a sample. Most of the fluorescence studies of oils were conducted using conventional spectrofluorimeters. Recently, a fluorometer with Light Emitting Diode (LED) flashlight excitation and a smartphone was developed for edible oil authentication using a hue-based fluorescence method [28].

The aim of the present study was to evaluate quality changes occurring in cold-pressed rapeseed oil during storage for 6 months under different conditions by means of chemical parameters and fluorescence spectroscopy. Parallel factor analysis (PARAFAC) was applied to decompose the excitation-emission matrices. Principal component analysis (PCA) allowed visualization of the relations between differently stored oil samples and the variables describing their properties. The relations between chemical changes and oil fluorescence were studied quantitatively by means of multiple linear regression (MLR) and partial least squares regression (PLSR) performed on PARAFAC scores.

#### **2. Materials and Methods**

#### *2.1. Samples and Storage Conditions*

Rapeseed oil freshly pressed in a local oil mill was used in the study. A volume of 200 cm<sup>3</sup> of oil was placed into 250 cm<sup>3</sup> glass bottles, either colorless or green. The transmission spectra of colorless and green bottles are presented in Figure 1.

**Figure 1.** The transmission spectra of colorless (blue line) and green (green line) glass bottles.

The bottling procedure was the same as used in the oil mills. The samples were stored for a period of 6 months (from July to December). The storage conditions were similar to those used in the distribution and marketing of oils. Oil samples were divided into three groups and kept in appropriate conditions: (1) oils in colorless glass bottles, stored without light, marked RD; (2) oils in colorless glass bottles exposed to light, marked RL; (3) oils in green glass bottles, exposed to light, marked RG. The samples were exposed to diffused daylight and additionally for 12 h per day to artificial fluorescent cool white, 4000 K light (Osram light bulb) and illuminance of about 500 lx. Storage under diffused light simulated the conditions of a supermarket shelf. Storage in the dark simulated the warehouse storage. All bottles were stored at ambient room temperatures (18–25 ◦C). The oils samples were analyzed at the time of packaging and periodically during the 6-month period. Each month, one bottle corresponding to each of the storage conditions was opened and analyzed.

A total number of 18 samples was used for analysis; a single sample stored for 4 months in a colorless bottle exposed to light was excluded from analysis due to a leaking bottle cap.

#### *2.2. Determination of Chemical Parameters*

Oxygen concentrations in the bottle headspace and in the oil were determined using a commercially available OxySense 325i system. This instrument enabled non-invasive measurement of the oxygen content through the bottle wall, based on the effect of oxygen on the fluorescence lifetime of optically-excited Tris (4,7-diphenyl-1,10-phenanthroline) ruthenium chloride complex immobilized in a highly stable polymer in the form of an OxyDot oxygen indicator. The system consists of an oxygen concentration analyzer and an EasAlign pen with a built-in temperature sensor and OxyDot indicator. The OxyDot indicators were placed on the internal bottle wall in the headspace and in the oil phase using a transparent adhesive. The calibration of the sensors was performed using pure nitrogen (5.0 purity) and ambient air as standards.

Peroxide values expressed in milliequivalents of active oxygen/kg (meq O2/kg of oil) were determined by the iodometric method, according to International Organization for Standardization (ISO) 3960:2007 standard [29].

Acid values were determined according to the standard ISO method 660:2009 [30].

The conjugated dienes and trienes were measured for samples diluted in n-hexane, using a Genesys UV-VIS spectrophotometer (Milton Roy, Houston, USA) and expressed as specific absorption coefficients at 232 (K232) and 270 nm (K270), according to the standard ISO method 3656:2011 [31].

Polar compounds (PCs) were separated from the oil samples by silica gel column chromatography, according to the Association of Official Analytical Chemists International (AOAC) method no. 982.27 [32]. After elution of the non-polar components with 150 mL of petroleum ether-diethyl ether (87:13, *v*/*v*), the PCs were recovered with 150 mL of diethyl ether. After removing the diethyl ether,

the PCs were recovered in tetrahydrofuran (THF) and subjected to high performance size exclusion chromatography (HPSEC) analysis to determine the single classes of substances constituting them. The chromatographic system consisted of a series 200 Perkin–Elmer (Beaconsfield, UK) pump, a 50 μL injector loop (Perkin–Elmer, Beaconsfield, UK), a PL-gel guard column (Polymer Laboratories, Church Stretton, UK) of 5 cm length × 7.5 mm inner diameter, and a series of two PL-gel columns (Polymer Laboratories, Church Stretton, UK) of 30 cm length × 7.5 mm inner diameter each. The columns were packed with highly cross-linked styrene divinylbenzene copolymers with a particle diameter of 5 μm and pore diameters of 500 Å. The detector was a differential refractometer (series 200A, Perkin–Elmer, Beaconsfield, UK). The elution solvent used was THF for high performance liquid chromatography (HPLC) at a flow rate of 1.0 mL/min. The identification and quantification of individual peaks was carried out, as described in previous paper [33]. Three replicates were analyzed per sample.

Total tocopherol content was determined by using a HPLC method. The details of the method are described in Reference [34]. The analysis was carried out using a Waters 600 high-performance liquid chromatograph. Chromatographic separation was achieved at room temperature, using a Symmetry C18 column (150 mm × 3.9 mm, 5 μm), fitted with a μBondapak C18 cartridge guard column (all from Waters, Milford, MA, USA). A mobile phase composed of 50% acetonitrile and 50% methanol was used with a flow rate of 1 mL/min. Samples of oil were weighed (0.0800 g) and dissolved in 1 mL of 2-propanol. Vortex-mixed samples were directly injected into the HPLC column without any additional sample treatment. The injection volume was 20 μL. The eluate was detected using a Waters 474 scanning fluorescence detector, set for emission at 325 and excitation at 295 nm. The emission slit width was 10 nm, fluorometer gain 100, and attenuation 1. Tocopherols were identified by comparing their retention times with those of the corresponding standards. Additionally, a Waters 996 photodiode array detector was used to identify the compounds by their absorption spectra.

Carotenoids were determined using the spectrophotometric method. The concentration of total carotenoids was determined by measuring the absorption of oil dissolved in n-hexane at 450 nm. The molar absorption coefficient at 450 nm for carotenoids: ε = 138730 [L/cm × mol] was used for the calculation of carotenoid concentration [35]. Genesys 2 UV-VIS spectrophotometer (Milton Roy, Houston, USA) was used in these measurements.

Pheophytins content in fresh samples (expressed as mg of pheophytin as per kg of oil) was determined according to official methods and recommended practices of the American Oil Chemists' Society [36]. The absorbance of the oil sample was read at 630, 670, and 710 nm. In samples exposed to light, low concentration of pheophytins prevented their determination by means of the absorption method. Therefore, relative pheophytins content in all of the samples was determined based on the fluorescence intensity measured at λexc = 670 nm and λem = 680 nm, performed using a Fluorolog 3-11 spectrofluorometer (Spex–Jobin Yvon, Palaiseau, France). The relative pheophytins content in all samples studied was expressed as a percentage for further analysis.

#### *2.3. Fluorescence Measurements*

The fluorescence spectra were obtained using a Fluorolog 3-11 spectrofluorometer (Spex–Jobin Yvon). The excitation-emission matrices (EEMs) were collected by measuring the emission spectra in the 260–700 nm range with the excitation in the 250–500 nm range, with a 10 nm interval. The excitation and emission slit widths were 3 nm. The acquisition step and the integration time were maintained at 1 nm and 0.1 s, respectively. A reference photodiode detector was used to compensate for the source intensity fluctuations. The individual spectra were corrected for the wavelength-dependent response of the system. Front-face geometry was used for undiluted samples in a 10 mm fused-silica cuvette.

#### *2.4. Data Analysis*

The analysis of covariance (ANCOVA) was used to compare the chemical data for the samples stored in different conditions. Pearson correlation coefficients were calculated to test the correlations between the individual analytical parameters.

A parallel factor analysis (PARAFAC) was used to decompose excitation-emission matrices into contributions from individual fluorescent components. In PARAFAC, each component consists of one score vector and two loading vectors. The loading matrices contain the spectral excitation and emission profiles of fluorescent components, and the score matrix contains information about the relative contribution of each component to every sample EEMs included into the model [37].

A three-way array of data with the dimensions of 18 × 431 × 26 (number of samples × number of emission wavelengths × number of excitation wavelengths) was used for the PARAFAC. Rayleigh signals in EEMs were removed by replacing them with missing values. Non-negativity constraints on all three modes were applied. The optimal number of components in PARAFAC models was determined based on the core consistency diagnostic (CORCONDIA) and analysis of explained variance [37].

Principal component analysis (PCA) was performed on the X matrix, which contains chemical parameters and fluorescence score data obtained from the PARAFAC. The X data were scaled prior to analysis.

Multiple linear regression (MLR) and partial least squares regression (PLSR) were used to model the relation between the score data obtained from PARAFAC (X) and chemical data (Y). Full cross-validation was applied to all of the regression models. The regression models were evaluated using the adjusted *R*<sup>2</sup> and the root mean-square error of cross-validation (RMSECV) parameter.

The data analysis was carried out using Solo v. 5.0.1 (Eigenvector Research Inc., Manson, WA, USA) and Unscrambler 9.0 (CAMO, Oslo, Norway) software.

#### **3. Results and Discussion**

#### *3.1. Evolution of Chemical Parameters of Cold-Pressed Rapeseed Oil During Storage*

Freshly-pressed rapeseed oil samples were stored in the darkness and exposed to light in colorless and green glass bottles. As shown in Figure 1, colorless glass transmits radiation throughout the visible spectrum in the characteristic absorption range for both carotenoids and pheophytins. Green glass transmission is dependent on the spectral range and it is very low, below 500 nm, and higher in the long-wavelength range where long-wavelength absorption of pheophytins occurs.

The changes in the chemical parameters of the cold-pressed rapeseed oil stored for 6 months in the respective conditions are presented in Figure 2.

Table 1 reports the *p*-values of the analysis of covariance (ANCOVA) of the chemical data obtained for the oil samples stored for 6 months. The effect of time and storage conditions, as well as the results of the first-order interactions on chemical parameters, are presented. The results indicate that the storage time and conditions affected most of the studied oil quality parameters. The statistically significant effects of the storage conditions (*p* < 0.05) were observed for the time evolution of the following parameters: headspace oxygen, peroxide value (PV), K270, polymerized triacylglycerols (TAGPs), diacylglycerols (DAG), tocopherols, and pheophytins.

The oxygen concentration in the bottle headspace and in the oil phase was measured in a non-invasive way using a fluorescence sensor. The headspace oxygen content was 20.9% in the freshly-bottled oil samples. A much lower concentration of 0.0007% was detected in the oil phase. The headspace oxygen content decreased during storage at rates depending on the storage conditions (*p* = 0.014), Figure 2a,b. The samples stored in colorless bottles exposed to light had the fastest drop in the headspace oxygen. The oil stored in green bottles revealed a similar decrease in the headspace oxygen concentration. The minimum headspace oxygen concentration, very close to 0.5%, was recorded in these samples when exposed to light in the third month of storage, with some increase observed at longer storage times. The smallest decrease in the headspace oxygen content was observed in oil stored in darkness; the lowest concentration of oxygen, at about 5%, was detected in these samples from the second to the fifth month of the experiment. Similar to the samples exposed to light, higher concentrations of the headspace oxygen were recorded at longer storage times. This minor increase in the oxygen concentrations in the final period of the experiment may result from the oxygen diffusion

into the bottle or liberation of oxygen in the radical recombination reactions at the termination step of the oxidation process.

**Figure 2.** The changes of chemical parameters in the cold-pressed rapeseed oil during the 6 months of storage: (**a**) Oxygen concentration in the bottle headspace; (**b**) oxygen concentration in the oil; (**c**) peroxide value (PV); (**d**) specific absorption at 232 nm, K232; (**e**) specific absorption at 270 nm, K270; (**f**) content of the oxidized triacylglycerols (Ox-TAG); (**g**) content of polymerized triacylglycerols (TAGPs); (**h**) acid value (AV); (**i**) content of diacylglycerols (DAG); (**j**) total tocopherols; (**k**) carotenoids; (**l**) pheophytins.


**Table 1.** Results (*p*-values) of the analysis of covariance (ANCOVA) for the chemical parameters evaluated during the shelf-life of the rapeseed oil.

Peroxide value (PV); acid value (AV); K232, K270—absorption coefficients at 232 and 270 nm, respectively; oxidised triacylglycerols (Ox-TAG), polymerized triacylglycerols (TAGPs), diacylglycerols (DAG).

Very low concentrations of oxygen dissolved in oil were recorded for the samples stored in darkness, starting from the second month of the experiment, as shown in Figure 2b.

The peroxide value (PV) evaluates the presence of the primary oxidation products of lipids (hydroperoxides). For a sample of fresh cold-pressed rapeseed oil, the PV was 1.8 meq O2/kg and satisfied the requirements of the respective standards for cold-pressed oils [38]. The PV value changes during storage were significantly affected by the storage conditions (*p* = 0.034), as shown in Figure 2c. During the first month, the increase of the PV was much faster in the samples exposed to light (both in colorless and green bottles), as compared to the samples stored in darkness. The peroxide formation rates slowed down in the samples exposed to light after the first month. However, the PV increased almost linearly to the end of the experiment in the samples protected from light. The PV of 15 meq O2/kg, allowed by the Codex Standard [38] for virgin oils, was exceeded in the samples stored in darkness after the third month.

The specific absorption coefficient at 232 nm (K232) measures the concentration of conjugated dienes, another group of primary oxidation products. The K232 value was lower in samples stored in darkness only after the first month (*p* = 0.089), as shown in Figure 2d. From the second month on, the concentration of the conjugated dienes was higher in the samples stored in darkness as compared to the respective samples exposed to light. However, there was no statistically significant effect of the storage conditions on the evolution of this parameter. The lipid hydroperoxides formed during autooxidation are conjugated dienes, whereas photooxidation leads to the formation of both conjugated and nonconjugated dienes [5].

The specific absorption coefficient at 270 nm (K270), quantifying the conjugated trienes and the secondary oxidation products (carbonyl compounds), was significantly affected by the storage conditions (*p* = 0.03), as shown in Figure 2e. Its increase was more pronounced and similar in the samples stored in light, independently to the bottle glass color, as compared to those stored in darkness. These observations are in agreement with our previous study; we observed higher increases of K270 for the olive oil samples stored under light as compared to those protected from light [39].

The observed changes in the PV, K232, and K270 parameters indicate that the oxidation was more advanced in the samples stored under light, thus the primary oxidation may have evolved into the secondary. These findings were further verified by the analysis of the evolution of the polar compounds. The substances that are typical oxidation (oxidized triacylglycerols (Ox-TAG), TAGP) and hydrolysis (DAG) products were quantified using the HPSEC method. The analysis of these compounds has already been successfully used for the estimation of the real extent of the oxidative and hydrolytic degradation of various edible oils and fats [40,41].

The evolution of the oxidation products (Ox-TAG and TAGP) is presented in Figure 2f,g. The oxidation product concentration was growing during the storage in all of the storage conditions. The formation of Ox-TAG, Figure 2f, (*p* = 0.609) was not significantly affected by the storage conditions. The Ox-TAG denomination comprises all of the oxidative products deriving from triacylglycerols, which can undergo further polymerization and degradation reactions. It was proposed that these substances could indicate the level of primary oxidation of oils [42].

TAGPs were initially formed at higher rates in the samples exposed to light (*p* = 0.039); however, after 4 months, similar levels of these compounds were observed in all samples, as shown in Figure 2g. The storage conditions significantly affected the evolution of these compounds. TAGPs were proposed as an index of the secondary oxidative degradation, because of their high stability and low volatility [42].

Hydrolytic degradation of oil during storage was evaluated on the basis of the acid value (AV) and DAG content. The AV increased in the samples stored under light throughout the entire duration of the experiment (*p* = 0.543), as shown in Figure 2h. Some fluctuations of the AV were recorded in darkness, although similar values were measured at the beginning and at the end of the storage period. The formation of DAG was affected by the storage conditions (*p* = 0.040), as shown in Figure 2i. Initially, it was higher in the samples exposed to light, while similar levels were recorded in all of the samples at the end of the experiment.

The progress of oil oxidation also depends on the presence of minor substances. Thus, the concentrations of tocopherols, carotenoids, and pheophytins were measured both in the fresh oil and during storage. Fresh oil was characterized by a relatively high content of vitamin E (total tocopherols) of 684 mg/kg. The tocopherol content decreased with the rates, depending on the storage conditions (*p* = 0.022), as shown in Figure 2j. The fastest decay was noted in the oil stored in colorless bottles. The degradation of tocopherols was the slowest in the samples stored in darkness. The lowest concentration of tocopherols after 6 months was recorded in the green glass bottle.

In fresh oil the content of carotenoid pigments was 6.0 mg/L. In contrast to tocopherols, the effect of the storage conditions on the decay of carotenoids during the experiment was insignificant (*p* = 0.160), as shown in Figure 2k.

The evolution of the pheophytins during the storage is of great importance due to their role of a photosensitizer in the photooxidation. In fresh oil, the content of pheophytin pigments of 1.7 mg/kg (pheophytin a) was rather low. The decay of pheophytins was markedly affected by the storage conditions (*p* = 0.001), as shown in Figure 2l. The effect of the bottle glass color was also clearly noticeable. The decay of pheophytins was the most advanced in the samples stored under light in the colorless bottles, as these pigments were partially protected from the photodegradation in the green glass bottles.

Based on the present results, we conclude that the oxidation of rapeseed oil is affected by light exposure mainly during the first few months of storage. Although the studied oil contained only minor amounts of pheophytins, even these speeded up its photooxidation. The green color of bottle glass slowed down the degradation of pheophytins and tocopherols at the initial period of the experiment, having a less pronounced effect on the oxidation during the entire period of storage. Moreover, it seems that oxygen headspace concentration was an important factor that limited the degree and rates of oxidation. It should be noted that due to the high rate of photooxidation, the study in the initial storage period, e.g., up to several days, should provide better insight into the kinetics of this process in colorless and green bottles.

#### *3.2. Evolution of Fluorescence of Rapeseed Oil during Storage*

The fluorescence methods provide two main advantages—high sensitivity and selectivity. Thus, fluorescence enables direct monitoring of minor components of oils, namely tocopherols, pheophytins, and oxidation products. Figure 3 presents the EEMs of fresh rapeseed oil and samples stored for 6 months, exposed to light in colorless and green bottles and stored in darkness. The oils with

different oxidation degrees may by discriminated by the intensity of bands ascribed to the particular minor constituents.

**Figure 3.** Total fluorescence spectra of freshly-pressed rapeseed oil, R0, (**a**) stored for 6 months in darkness, RD6 (**b**), and exposed to light in green, RG6, (**c**) and colorless, RL6, (**d**) bottles. The same intensity scale.

Marked differences in the shape and intensity of the fluorescence bands were observed between fresh and stored samples. The most intense fluorescent bands in the short- and long-wavelength regions in the fresh samples correspond respectively to tocopherols (λexc/λem ca. 300/331 nm) and pheophytins (400/680 nm) [43]. The emission of phenolic compounds may also be observed at the short-wavelength side of this band. The intensity of these fluorescence bands decreased considerably in the stored samples.

The most pronounced differences between the fresh and stored oil spectra were observed in the intermediate spectral zone. Namely, a broad fluorescence band in the intermediate region (λexc/λem ca. 320/400 nm) appeared during storage. The chemical compounds associated with the emission observed in oils in this range have not been unambiguously identified so far. However, in several studies it was suggested that this emission belongs to oxidation products [22,23,44]. Similarly, we attribute this emission to oxidation products, such as degradation products of pheophytins and/or polar compounds formed in the subsequent oxidation reactions.

The EEMs of all of the oil samples were investigated by means of the PARAFAC method with the objective to resolve the fluorescence landscapes into individual contributions of fluorescent compounds. Based on core consistency and a visual inspection of both the residuals and the loadings, an optimal PARAFAC model was estimated with four components (explained variance 99.1%, core consistency value 70).

Figure 4 presents the excitation and emission loadings of the four fluorescent components and the respective scores.

**Figure 4.** Results of parallel factor analysis (PARAFAC) of total fluorescence spectra of fresh and stored oil samples: Excitation (**a**) and emission (**b**) profiles. Scores vs storage time: scores on component 1 (**c**), scores on component 2 (**d**), scores on component 3 (**e**), scores on component 4 (**f**).

The first component appeared at 300/328 nm in excitation/emission and corresponds to tocopherols [43]. The second PARAFAC component appears at 370/677 nm, with a narrow emission band. This component may be ascribed to pheophytins [43]. The third PARAFAC component appears at 320/420 nm in excitation/emission, the fourth at 360/530 nm. The origin of component 3 may be ascribed to compounds formed during oil oxidation. The origin of component 4 was less obvious. The loading of an emission profile of this component, besides the main band, with the maximum at about 530 nm, exhibits additional broad emission with low intensity on the short-wavelength side. This may indicate that emission of some fluorescent components was not fully resolved, and therefore this component corresponds to the more than one chemical compounds.

We showed the contributions of each of the four PARAFAC components in Figure 4c–f. The score values obtained in the PARAFAC decomposition were plotted against the time to explore the progression of fluorescent components throughout the storage. The systematic variations of the score values, corresponding to the respective components, were observed. Thus, the decay of tocopherol (component 1) and pheophytin (component 2) in time was markedly affected by the storage conditions. The effect of bottle color was also noticeable. The decay of component 2 (pheophytin) was the most advanced in samples stored under light in colorless bottles. These pigments were partially protected from the photodegradation in the green bottles. The decay of tocopherol emission was influenced by light and bottle color, particularly in the first months of storage.

The appearance of the oxidation product fluorescence (component 3) was observed for samples stored in light after one month. The contribution of component 4 was affected by storage in a less systematic way. This component was presented in fresh oil and was slightly stronger in samples stored under light and weaker in samples stored in darkness, as compared to fresh oil.

#### *3.3. Correlations between Chemical and Fluorescence Data*

Next, we used principal component analysis (PCA) to visualize the relationship between differently stored oil samples and the variables describing their properties.

Figure 5 shows the results of the PCA for the analytical parameters and the contributions of the fluorescent components. The distribution of samples of rapeseed oil depended systematically on storage time and conditions, as shown in Figure 5a. Samples exposed to light and protected from light clearly follow different paths during the oxidation. The first two principal components (PC1 and PC2) described 74% of the total data variance.

**Figure 5.** Results of principal component analysis (PCA) of chemical and fluorescent properties of oil samples: scores (**a**), correlation loadings (**b**). Fresh oil—R0; oils in colorless glass bottles, stored without light—RD; oils in colorless glass bottles exposed to light—RL; oils in green glass bottles, exposed to light—RG. Fl1, Fl2, Fl3, Fl4—fluorescent components obtained using PARAFAC. For the abbreviation of chemical parameters see Table 1.

The first principal component that explained 53% of the total data variance was linked to the time of storage. Samples spread along the PC1 axis, from positive to negative values, according to the storage time. The PC1 was positively correlated (correlation loadings ≥ 0.70) with the contents of tocopherols, carotenoids, and pheophytins, the oxygen concentration in the headspace and oil, and the first (tocopherols) and second (pheophytins) fluorescent components of the PARAFAC decomposition, as shown in Figure 5b. The PC1 was negatively correlated with the K270, contents of polar compounds (TAGP, Ox-TAG, and DAG), and PARAFAC components 3. Thus, the progress of oxidation was accompanied by oxygen uptake and the decay of the minor components, tocopherols, carotenoids, and pheophytins, with simultaneous formation of the oxidation products.

The second principal component (explaining 21% of the total data variance) accounted for both storage time and the variability, due to the storage conditions. The PC2 was positively correlated with the PV and K232 and negatively correlated with PARAFAC fluorescent component 4. The oxidative changes in samples of rapeseed oil during storage were clearly affected by storage conditions. Namely, the samples stored in darkness were characterized by higher concentrations of primary oxidation products, as evidenced by the PV, and K232.

The significant correlations, evident from PCA analysis and calculated Pearson correlation coefficients (*r*), occurred between the analytical parameters and the contributions of the fluorescent components obtained in PARAFAC. The contribution of the first fluorescent component, identified as tocopherol, was correlated positively with tocopherols, determined by HPLC (*r* = 0.83), carotenoids (*r* = 0.90), pheophytins (*r* = 0.63), and oxygen concentration in oil (*r* = 0.60) and negatively with the formation of Ox-TAG (*r* = −0.77), TAGP (*r* = −0.72), DAG (*r* = −0.63), and the K232 (*r* = −0.63) K270 (*r* = −0.77). The decrease of the contribution of the second fluorescent component, identified as pheophytin, was correlated positively with the degradation of tocopherols (*r* = 0.72) and pheophytins (*r* = 0.98) and negatively with the increase in K270 (*r* = −0.68) and DAG (*r* = −0.62). The contribution of the third fluorescent component, which increased during storage, was positively correlated with K270 (*r* = 0.72) and DAG (*r* = −0.64) and negatively with pheophytins (*r* = −0.91) and oxygen in the headspace (*r* = −0.64)). In contrast, the fourth fluorescent component was only negatively correlated with pheophytins (*r* = −0.68).

The decays of the first and second fluorescent components were significantly correlated (*r* = 0.69). The degradation of the first component was correlated with the formation of the third component (*r* = −0.58), while the second component was negatively correlated with the third (*r* = −0.85) and fourth (*r* = −0.59) components. The positive correlation existed also between the third and fourth components (*r* = 0.81).

It should be noted that, among the discussed correlations between fluorescent components and analytical parameters, only those between component 1 and tocopherols and component 2 and pheophytins may be considered as direct. Other correlations were rather indirect and were a consequence of correlations between tocopherols and pheophytins content and respective analytical parameters.

In order to quantitatively model the relationship between the overall fluorescence characteristics (PARAFAC scores) and the chemical parameters, regression analysis was performed using MLR and PLSR methods. We tested the regression models for all of the studied parameters; however, we only present and discuss those models that had acceptable quality (*R*<sup>2</sup> cal > 0.7).

Table 2 presents the results of multivariate calibrations. The best calibration models were obtained for the prediction of pheophytin and total tocopherol content, using both MLR and PLSR on PARAFAC fluorescence scores.


**Table 2.** Results of multiple linear regression (MLR) and partial least squares regression (PLSR) of chemical parameters and fluorescent components extracted using PARAFAC.

<sup>1</sup> For the abbreviation of chemical parameters see Table 1. Latent variables (LVs) of PLSR models; *R*2cal—determination coefficient of calibration; *R*2cv—determination coefficient of validation; root mean-square error of cross-validation (RMSECV);— relative error (RE).

Satisfactory relations were also found between TAGP and fluorescence; however, the prediction error was quite high for this class of substances. The regression analysis did not give satisfactory results for Ox-TAG. A significant relationship was established for K232, although these models had the highest relative errors of prediction.

Similarly to the observed correlations between individual fluorescent components and analytical parameters, quantitative relationships are also presented in Table 2, which may be considered as direct only for tocopherols and pheophytins. For other components, such as carotenoids, Ox-TAG, and TAGP, these relationships should be considered as indirect, and may be explained by the previously discussed correlations between various compounds involved in the oxidation processes.

The presented models confirmed that there is a quantitative relationship between fluorescence changes and some of chemical changes occurring during oxidation for the studied samples set. However, due to the indirect nature of these relationships, the respective models cannot be treated as universal models that enable valid determination of analytical parameters in other systems.

#### **4. Conclusions**

This study was aimed at investigating the potential of fluorescence in combination with chemometric methods for monitoring the rapeseed oil oxidation during storage in different conditions.

The results of chemical analyses revealed that the light exposure affected oxidation of rapeseed oil mainly in the initial period of storage. The minor amounts of pheophytins in oil studied speeded up its photooxidation. The green color of the glass bottle slowed down the degradation of pheophytins and tocopherols during the first few months and gave a less pronounced protective effect on the formation of oxidation products during the whole period of storage.

The PARAFAC of the front-face fluorescence excitation-emission matrices of rapeseed oil uniquely separated four fluorescent components, which had different evolution dynamics throughout the storage period, and revealed oxidative changes occurring in oil during storage. The fluorescence data were quantitatively related to some conventional chemical parameters describing the oxidation status of oil.

The present results show that fluorescence excitation-emission matrices associated with PARAFAC decomposition could be used for the direct monitoring of the oxidative degradation of rapeseed oils during storage.

**Author Contributions:** Conceptualization, E.S., K.W., W.K., A.G-S., I.K., T.G., F.C., V.M.P., C.S. and A.P.; data ´ curation, E.S., K.W., A.G.-S., V.M.P., C.S. and A.P.; formal analysis, I.K. and T.G.; funding acquisition, E.S.; ´ investigation, E.S., K.W., W.K., A.G.-S., V.M.P., C.S. and A.P.; methodology, E.S., K.W., W.K., A.G.- ´ S., I.K., T.G., F.C., ´ V.M.P., C.S. and A.P.; project administration, E.S.; supervision, E.S.; validation, E.S., K.W., W.K., A.G.-S., I.K., T.G., ´ F.C., V.M.P., C.S. and A.P.; visualization, E.S.; writing—original draft, E.S.; writing—review and editing, E.S., K.W., W.K., A.G.-S., I.K., T.G., F.C., V.M.P., C.S. and A.P. ´

**Funding:** This research was funded by the Ministry of Science and Higher Education, Poland, grant number NN312428239. The APC was funded by Pozna ´n University of Economics and Business.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **The Honey Volatile Code: A Collective Study and Extended Version**

#### **Ioannis K. Karabagias \*, Vassilios K. Karabagias and Anastasia V. Badeka**

Laboratory of Food Chemistry, Department of Chemistry, University of Ioannina, 45110 Ioannina, Greece; vkarambagias@gmail.com (V.K.K.); abadeka@uoi.gr (A.V.B.)

**\*** Correspondence: ikaraba@cc.uoi.gr or ioanniskarabagias@gmail.com; Tel.: +30-697-828-6866

Received: 23 September 2019; Accepted: 14 October 2019; Published: 17 October 2019

**Abstract:** Background: The present study comprises the second part of a new theory related to honey authentication based on the implementation of the honey code and the use of chemometrics. Methods: One hundred and fifty-one honey samples of seven different botanical origins (chestnut, citrus, clover, eucalyptus, fir, pine, and thyme) and from five different countries (Egypt, Greece, Morocco, Portugal, and Spain) were subjected to analysis of mass spectrometry (GC-MS) in combination with headspace solid-phase microextraction (HS-SPME). Results: Results showed that 94 volatile compounds were identified and then semi-quantified. The most dominant classes of compounds were acids, alcohols, aldehydes, esters, ethers, phenolic volatiles, terpenoids, norisoprenoids, and hydrocarbons. The application of classification and dimension reduction statistical techniques to semi-quantified data of volatiles showed that honey samples could be distinguished effectively according to both botanical origin and the honey code (*p* < 0.05), with the use of hexanoic acid ethyl ester, heptanoic acid ethyl ester, octanoic acid ethyl ester, nonanoic acid ethyl ester, decanoic acid ethyl ester, dodecanoic acid ethyl ester, tetradecanoic acid ethyl ester, hexadecanoic acid ethyl ester, octanal, nonanal, decanal, lilac aldehyde C (isomer III), lilac aldehyde D (isomer IV), benzeneacetaldehyde, *alpha*-isophorone, 4-ketoisophorone, 2-hydroxyisophorone, geranyl acetone, 6-methyl-5-hepten-2-one, 1-(2-furanyl)-ethanone, octanol, decanol, nonanoic acid, pentanoic acid, 5-methyl-2-phenyl-hexenal, benzeneacetonitrile, nonane, and 5-methyl-4-nonene. Conclusions: New amendments in honey authentication and data handling procedures based on hierarchical classification strategies (HCSs) are exhaustively documented in the present study, supporting and flourishing the state of the art.

**Keywords:** honey variety; honey code; HS-SPME/GC-MS; data handling; data bank; chemometrics

#### **1. Introduction**

The high consumer demand for authentic products along with the pressure on the market with products of low quality, distributed by cheap producing countries, creates the need for a multi-optional handling of natural-based products. A typical example of such products comprises honey—the sweet viscous solution obtained through the action of honeybees (*Apis mellifera*). The main types of honey include nectar and honeydew honeys. Nectar honeys are produced via the collection of the nectar of flowers by the honeybees.

On the other hand, honeydew honeys are characterized by the presence of secretions of plant-sucking insects (Hemiptera) living in the parts of the plants or conifer trees [1]. Given the historical meaning and symbolism of honey through the welfare of many civilizations [2], the latter has been subjected to exhaustive research. Apart from the basic components which are sugars and moisture, there are plenty of micro-constituents including minerals, phenolic compounds, organic acids, proteins, free amino acids, vitamins and volatile compounds and traces of lipid acids that have attracted researchers [3–7].

Among the aforementioned micro-constituents, volatile compounds are considered among the key parameters of honey sensory attributes. These contribute to the aroma providing the consumers with the emotional feelings related to regular consumption. It has been reported in the literature that volatile compounds of honey number in the hundreds, including esters, ethers, alcohols, acids, aldehydes, hydrocarbons, ketones, terpenes, norisoprenoids, carotenoid derivatives, furan and pyran derivatives, phenolic volatiles, benzene derivatives, quinones and other biomolecules originating from plants or bacteria metabolism with potential applications. The presence and quantity of these volatile compounds depends on the botanical and geographical origin of the honey [6–8].

The application of instrumental techniques has greatly favored the identification of the volatile compounds of honey. Numerous studies have been published using hydrodistillation, liquid–liquid extraction, simultaneous steam distillation extraction or Likens–Nickerson simultaneous distillation extraction micro-simultaneous steam distillation–solvent extraction, and ultrasonic solvent extraction for this purpose [7]. Some key volatile compounds that have been reported in the literature include benzene derivatives and phenolic volatiles for the case of strawberry tree honey [9]. Nonanal and *cis*-linalool oxide [2-[(*2S,5R*)-5-ethenyl-5-methyloxolan-2-yl]propan-2-ol] in combination with benzene derivatives and phenolic volatiles for Italian and Greek chestnut honeys [6,10]. Benzaldehyde, benzeneacetaldehyde and phenylethylalcohol were reported to be some characteristic volatile compounds of Spanish citrus and honeydew honeys [11,12]. The key volatile compounds of pine, fir, citrus, thyme, honeydew, and flower honeys harvested in Italy, Spain, Turkey, Greece, Morocco, and Brazil include benzaldehyde, benzeneacetaldehyde, octanal, nonanal, decanal, and different isomers of lilac aldehyde [3,4,6,8,13–16]. Norisoprenoids such as isophorone and 4-ketoisophorone (2,6,6-trimethyl-2-cyclohexene-1,4-dione) have been previously reported to serve as volatile markers of the floral origin of Sardinian strawberry tree and Indian saffron honeys [5,9]. *Alpha*-pinene, terpinolene, 2-phenylacetate and numerous other volatile compounds were considered as markers of the provenance of Argentinean honeys [17].

Based on the aforementioned, the objectives of the present study, which is collective in nature, were: (a) to classify clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys from different countries (Egypt, Greece, Morocco, Spain, and Portugal) according to botanical origin based on the use of specific volatile compounds in combination with chemometrics and (b) classify honey samples according to the honey code—that is, a combination of the grammatical sequences of the different honey types used in the study by using the first letter of each honey type nomenclature. To the best of our knowledge, over the last 10 years, this is only the second study in the literature that implements, among others, a hierarchical classification strategy (HCS) for honey authentication [10], and the novelty of the study herein is highlighted by the use of a large number of different types of honey samples harvested in different parts of the world. Therefore, the whole procedure is more complicated and exhaustive "crash tests" are provided with the use of a multivariate analysis of variance (MANOVA), linear discriminant analysis (LDA), *k*-nearest neighbors (*k*-NN), and factor analysis (FA).

#### **2. Materials and Methods**

#### *2.1. Honey Samples*

One hundred and fifty-one honey samples (*n* = 151) were collected between the years 2011 and 2018 from Egypt, Greece, Morocco, Spain, and Portugal. Honey samples from Greece were obtained from Attiki Bee Culturing Co. Alex.Pittas S.A. (Athens, Greece); honey samples from Egypt, Spain and Morocco were obtained from local shops; honey samples from Portugal were obtained from APISMAIA (Povoa de Varzim, Portugal) The honey samples were subjected to volatile compound analysis according to botanical origin as clover (*Trifolium alexandrinum*), citrus (*Citrus* spp.), chestnut (*Castanea sativa*), eucalyptus (*Eucalyptus* spp.), fir (*Abies cephalonica*), Pine (*Pinus* spp.) and thyme (*Thymus capitatus* L. and *Thymus* spp.), which was confirmed by melissopalynological analysis [16]. In particular, clover honeys (*n* = 8) originated from Egypt; citrus honeys originated from Egypt (*n* = 7), Spain (*n* = 8), Morocco (*n* = 6), and Greece (*n* = 10); chestnut honeys originated from Greece (*n* = 1) and Portugal (*n* = 3); eucalyptus honeys (*n* = 4) originated from Portugal; fir honeys (*n* = 31) originated from Greece; pine honeys (*n* = 39) originated from Greece; thyme honeys (*n* = 42) originated from Egypt (*n* = 7), Greece (*n* = 12), Morocco (*n* = 6), and Spain (*n* = 10). Honey samples were shipped to the laboratory and maintained firstly at room temperature in paper boxes for sampling which was started at once. Sampling and analysis followed the sequence of honey type harvesting through the aforementioned years. The paper boxes were kept away from UV light. The quantity of honey samples left was stored at 4 ± 1 ◦C.

#### *2.2. Honey Code Development*

The honey code was used to construct the group of objects that would be subjected to statistical analysis using the first letter of each honey type. Clover, citrus and chestnut honeys (*n* = 42) were represented by CCC; eucalyptus honeys (*n* = 4) by E; fir honeys (*n* = 31) by F; pine honeys (*n* = 39) by P, and thyme honeys (*n* = 42) by T. The main purpose of this hierarchical procedure was to test the classification ability of honey samples from different countries based on the use of specific volatile compounds and chemometrics according to honey type lettering (use of the first letter) [10].

#### *2.3. Analysis of Gas Chromatography–Mass Spectrometry in Combination with Headspace Solid-Phase Microextraction (HS-SPME*/*GC–MS)*

The experimental strategy for the isolation and semi-quantification of honey volatile compounds along with HS-SPME/GC–MS equipment and analysis conditions are given in details in previous work [10]. The mass spectral library used for the identification of volatile compounds was Wiley 7 (2005) of the National Institute of Standards and Technology (NIST). Only the volatile compounds that had ≥80% similarity with those of the Wiley MS library were tentatively identified using the GC–MS spectra. Data were expressed as concentration (Canalyte, mg/kg) based on the ratio of peak areas of the isolated volatile compounds to that of the internal standard (benzophenone) multiplied by the final concentration of the internal standard, assuming a response factor (RF) equal to 1 for all the compounds. An additional method of identification was considered and included the calculation of Kovats indices using a mixture of *n*-alkanes (C8–C20) which was supplied by Supelco (Bellefonte, PA, USA). The standard mixture was dissolved in *n*-hexane. The retention time of the standards was determined according to the temperature-programmed run used in the analysis of honey samples. MS and Kovats indices data were compared to those found in the Wiley MS library. A solvent delay of 5 min was inserted in the program, in order to avoid the elution of ethanol, in which the internal standard was dissolved. Each sample was analyzed in duplicate and the results were averaged.

#### *2.4. Statistical Analysis*

Multivariate analysis of variance (MANOVA), linear discriminant analysis (LDA), *k*-nearest neighbors (*k*-NN), and factor analysis (FA) were applied to the semi-quantitative data of volatile compounds. The first part of the statistical analysis included the botanical origin differentiation of clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys based on the use of the significant volatiles (*p* < 0.05) which served as the independent variables. The botanical origin served as the grouping variable consisted of 7 groups (clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys). In the second part of the statistical analysis, honey samples were grouped according to the honey code. The expectation was to investigate whether classification results could be improved.

For the MANOVA analysis, the indices of the multivariate hypothesis such as Pillai's Trace, Wilks' Lambda, Hotelling's Trace, and Roy's Largest Root were computed to determine whether there was a multivariate effect of volatile compounds (*p* < 0.05) on the botanical origin or the honey code of honey. The size of the effect was further evaluated by consideration of partial eta squared values (η2). It should also be stressed that the lower the value of Wilks' Lambda, the higher the differences between groups of objects.

Considering only the significant volatiles, LDA was then applied to classify honey samples according to group membership based on the use of original and cross-validation methods. LDA provides linear discriminant functions originating from the combinations of the significant variables (all independent variables are entered together/simultaneously in the analysis) multiplied by the standardized canonical discriminant function coefficients plus a constant, characteristic for each discriminant function [18]. Moreover, the tolerance test was also computed in the analysis. Tolerance may be defined as the proportion of a variable's variance not accounted for by other independent variables in the created discriminant function. It practically shows that a variable with very low tolerance contributes little information to the predictive model and may cause computational problems [19].

For the *k*-nearest neighbors analysis, the botanical origin of samples or the honey code served as the target parameter, while the significant volatiles (*p* < 0.05) served as the features. The number of the *k*-nearest neighbors was set by default equal to the higher number provided by the SPSS software—that is, 3–5. The classification ability of the constructed model was estimated by the application of training and holdout partitions. In the training sample, 70% of the cases were randomly assigned to partitions, while the rest of the cases were assigned to the holdout sample. The distances of an unknown object from all the members of the training set were calculated using the Euclidean distance in multi-dimensional space. The *k*-smallest distances between the unknown object and the training set sample were then identified. K is normally a small odd number, and the unknown variable is allocated to the class with the majority of these *k* distances [10]. For the performance of feature selection among the population of significant variables, *k*-NN analysis was run again using only the specified predictors that built the model (usually 3–5) in the previous step, in order to reduce the number of predictors to those having the lower error rate and *k* distance.

Afterwards, a data mining technique such as factor analysis (FA) was applied to the significant parameters in order to explain the total variance of the constructed model in the multidimensional space. At the same time, FA provides a reduction in the variables, in order to gain the most pronounced ones with the higher correlation and communalities of independent latent variables. The communalities indicate the common variance shared by factors with specific variables. A higher communality indicates that a larger amount of the variance in the variable has been extracted by the factor solution. For the most effective data collection during FA, communalities should be ≥0.4. The extraction method was principal component analysis (PCA). The rotation method used was Varimax with Keiser Normalization. The Varimax rotation is used in statistical analysis to simplify the expression of a particular sub-space in terms of just a few major items each. The actual coordinate system (practically constant) is the orthogonal basis that is being rotated to align with these coordinates. The sub-space can be defined with either PCA or FA. Varimax maximizes the sum of the variances of the squared loadings (squared correlations between variables and factors [20].

The accuracy and strength of factor analysis was supported further by the Kaiser–Meyer–Olkin (KMO) test, which comprises a measure of how well suited the data is for factor analysis. The acceptable value considered was that of KMO ≥ 0.50. In addition, the effectiveness and suitability of factor analysis was explored using Bartlett's test of sphericity. This test highlights the hypothesis that the correlation matrix is an identity matrix, which would indicate that the variables incorporated into the model are unrelated and therefore unsuitable for structure detection. Small probability values (*p* < 0.05) indicate that a factor analysis may be useful with data treatment [20]. Statistical analysis was run using the SPSS (version 20.0, IBM, Armonk, NY, USA) statistics software.

#### **3. Results and Discussion**

#### *3.1. Volatile Compounds of Clover, Citrus, Chestnut, Eucalyptus, Fir, Pine, and Thyme Honeys*

A considerable number of volatile compounds were putatively identified and semi-quantified. In total, 94 volatile compounds of different classes were isolated. Table 1 lists the volatile compounds according to retention time and their class. The majority of volatile compounds (62 volatiles) varied significantly (*p* < 0.05) according to the botanical origin of honey.

Figure 1 shows a typical chromatogram of clover honey from Egypt, indicating with numbers some selected key volatile compounds. In supplementary material (Figures S1–S6), typical chromatograms of citrus, chestnut, eucalyptus, fir, pine, and thyme honeys from the investigated regions are given.

**Figure 1.** A typical gas chromatogram of clover honey (no. 3) from Egypt indicating selected key volatile compounds. 1: 2-methyl-Butanal. 2: 3-methyl-Butanal. 3: Heptane. 4: 3-methyl-1-Butanol. 5: 2-methyl-1-Butanol. 6: Furfural. 7: Octanal. 8: Nonanal. 9: Decanoic acid ethyl ester. IS: Internal standard.

Clover honeys showed higher amounts (mg/kg) of 2-methylbutanal, 3-methylbutanal, 3-methyl-1-butanol, and 2-methyl-1-butanol compared to the other honey types. The aforementioned compounds comprise isoleucine- and leucine-derived volatiles and are found in a wide range of foods such as honey, beer, cheese, coffee, chicken, fish, chocolate, olive oil, and tea. These volatile compounds are associated with a fruity and malty flavor [21,22].

Citrus honeys were characterized by the higher amounts (mg/kg) of lilac aldehyde C (isomer III), dill ether, methylanthranilate and herboxide (isomer II). These compounds have been reported previously to dominate the volatile profile of citrus honeys among other honey types [3,4].

Chestnut honeys showed the highest amounts (mg/kg) of benzaldehyde, benzeneacetaldehyde, 1-octene, and furfural. The latter volatile compound may serve as an indicator of heat resistance/treatment of chestnut honeys.

Eucalyptus honeys were characterized by the presence of heptane and *beta*-damascenone, since these compounds were found in higher amounts (mg/kg). The presence of hydrocarbons in honey is a typical phenomenon, whereas *beta*-damascenone is a cyclic carotenoid derivative and possesses a rose odor [22].

Fir honeys had higher amounts (mg/kg) of nonanal, decanal, hexanoic acid ethyl ester, heptanoic acid ethyl ester, octanoic acid ethyl ester, nonanoic acid ethyl ester, decanoic acid ethyl ester, dodecanoic acid ethyl ester, tetradecanoic acid ethyl ester, nonane, 5-methyl-4-nonene, 6-methyl-5-hepten-2-one, geranyl acetone, 1-2-(furanyl)-ethanone, *alpha*-isophorone, 4-ketoisophorone, and 2-hydroxyisophorone. The volatile compounds 6-methyl-5-hepten-2-one and geranyl acetone comprise open-chain carotenoid-derived volatiles, whereas the isophorone related compounds belong to the class of norisoprenoids, resulting from the degradation of terpenoids [22]. Geranyl acetone and 6-methyl-5-hepten-2-one possess a strong floral, green and fruit-like odor. In a previous study, these compounds were characterized as exocrine products (cephalic secretions) of cleptoparasitic bees (Holcopasites) [23].


*Foods* **2019**, *8*, 508


**Table 1.** *Cont.*


**Table 1.** *Cont.* RT: Retention time. RIexp: Experimental retention index values using standard hydrocarbons naturally present in honey. ni: not identified. The were

 treated as zeros for chemometrics and not as missing values. *F*: Values of the F distribution. *p*: probability.

Pine honeys had higher amounts (mg/kg) of acetic acid, octane, *alpha*-pinene and *beta*-thujone. It is remarkable that *beta*-thujone was identified only in pine honeys, comprising a key volatile marker, even though the *p*-value (*p* = 0.053) was slightly higher than the level of confidence considered in the study. This finding is in agreement with a previous study on the botanical differentiation of citrus, fir, pine, and thyme honeys [8]. *Beta*-thujone has a menthol odor [24].

In the case of thyme honeys, 1-octen-3-ol, thymol methyl ether, carvacrol methyl ether, octyl ether, 4,7,7-trimethylbyciclo (3.3.0)-octan-2-one, and eugenol were found in higher amounts (mg/kg) compared to the other honey types. Ethers give an ether-like odor, whereas 1-octen-3-ol provides a floral and grassy odor. Furthermore, 1-Octen-3-ol is referred to as "mushroom alcohol", given the fact that is the main flavor component of mushrooms [25]. Eugenol is a phenylpropanoid volatile that has a pleasant, spicy, and clove-like odor [26].

*3.2. Classification of Clover, Citrus, Chestnut, Eucalyptus, Fir, Pine, and Thyme Honeys According to Botanical Origin and the Honey Code Using Volatile Compounds in Combination with Chemometrics*

3.2.1. Part I. Classification of Honeys According to Botanical Origin

#### MANOVA and LDA

The four qualitative criteria of the multivariate hypothesis, namely Pillai's Trace = 5.573 (*F* = 8.698, df = 540, *p* = 0.000, η*<sup>2</sup>* = 0.929), Wilks' Lambda = 0.000 (*F* = 22.251, df = 540, *p* = 0.000, η*<sup>2</sup>* = 0.972), Hotelling's Trace = 1034.279 (*F* = 102.151, df = 540, *p* = 0.000, η*<sup>2</sup>* = 0.994), and Roy's Largest Root = 789.550 (*F* = 526.367, df = 90, *p* = 0.000, η*<sup>2</sup>* = 0.999) showed that there was a statistically significant effect of the botanical origin of honey on the semi-quantitative data of volatile compounds.

More specifically, 62 of the 94 volatile compounds were found to be significant (*p* < 0.05) for the botanical origin differentiation of honeys. Afterwards, these volatiles were subjected to LDA. The minimum tolerance level of the analysis was set at 0.001. Results showed that 4-terpineol, borneol, *para*-cymene, carvacrol methyl ether, thymoquinone, thymol, and eugenol did not pass the tolerance test. Therefore, these volatile compounds were excluded (SPSS program) a priori from the discriminant analysis. Therefore, 56 volatile compounds were subjected to LDA.

Results showed that six canonical discriminant functions were formed: Wilks' Lambda = 0.000, X2 = 1624.974, df = 336, *p* = 0.000 for the first function; Wilks' Lambda = 0.000, X<sup>2</sup> = 1049.108, df = 275, *p* = 0.000 for the second function; Wilks' Lambda = 0.006, X<sup>2</sup> = 609.334, df = 216, *p* = 0.000 for the third function; Wilks' Lambda = 0.032, X2 = 406.682, df = 159, *p* = 0.000 for the fourth function; Wilks' Lambda = 0.122, X<sup>2</sup> = 249.608, df = 104, *p* = 0.000 for the fifth function; and Wilks' Lambda = 0.409, X<sup>2</sup> = 106,005, df = 51, *p* = 0.000 for the sixth function.

The first discriminant function recorded the higher eigenvalue (127.977) and a canonical correlation of 0.996, accounting for 71.5% of total variance. The second discriminant function recorded a much lower eigenvalue (39.902) and a canonical correlation of 0.988, accounting for 22.3% of total variance. The third discriminant function recorded a much lower eigenvalue (4.530) and a canonical correlation of 0.905, accounting for 2.5% of total variance. Similarly, the fourth discriminant function recorded an even lower eigenvalue (2.764) and a canonical correlation of 0.857, accounting for 1.5% of total variance. Moreover, the fifth discriminant function had a lower eigenvalue (2.360) and a canonical correlation of 0.838, accounting for 1.3% of total variance. Finally, the sixth discriminant function had the lowest eigenvalue (1.446) and a canonical correlation of 0.769, accounting for 0.8% of total variance. All six discriminant functions accounted for 100% of total variance.

Figure 2 shows the differentiation of honeys according to botanical origin based on the use of 56 volatile compounds and LDA. The group centroid values which represent the unstandardized canonical discriminant functions evaluated at group means are also plotted. Each centroid gives information about the coordinates (discriminant functions) of the group means in the polyparametric space. The abscissa is the first discriminant function, and the ordinate is the second. So, the respective group centroid values were as follows: (−6.609, −0.891), (−6.081, −1.951), (−12.106, 39.255), (−5.664, 13.355), (21.632, 0.974), (−5.265, −1.395), and (−4.712, −2.268) for clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys.

**Figure 2.** Classification of the 151 honey samples according to botanical origin based on the 56 volatile compounds and LDA. LDA: linear discriminant analysis.

The classification rate was 95.4% using the original and 81.5% using the cross-validation method. Supplementary Table S1 shows the significant variables (markers of botanical origin of clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys) that were ordered by absolute size of correlation within function. The higher the absolute value of correlation, the best discrimination the variable provides within the discriminant function. The most pronounced markers of the botanical origin of honey are marked with an asterisk. In particular, octanoic acid ethyl ester, nonanoic acid ethyl ester, decanoic acid ethyl ester, dodecanoic acid ethyl ester, decanal, nonanal, 5-methyl-4-nonene, hexanoic acid ethyl ester, heptanoic acid ethyl ester, 4-ketoisophorone, octanol, tetradecanoic acid ethyl ester, geranyl acetone, 6-methyl-5-hepten-2-one, 1-(2-furanyl)-ethanone, *alpha*-isophorone, and decanol contributed to discriminant function 1, whereas hexadecanoic acid ethyl ester contributed to discriminant function 2. It should be remembered that the first two discriminant functions explained 93.8% of total variance.

The botanical origin classification rate of honeys, based on the original method, followed the sequence: clover (87.5%), citrus (90.3%), chestnut (100%), eucalyptus (100%), fir (100%), pine (100%), and thyme (91.4%). In total, 12.5% of clover honeys (one sample) were allocated to pine honeys. In total, 9.7% of citrus honeys (three samples) were allocated to pine honeys. Finally, 8.6% of thyme honeys (three samples) were allocated to pine honeys.

On the contrary, the botanical origin classification rate of honeys, based on the cross-validation method followed the sequence: clover (62.5%), citrus (80.6%), chestnut (66.7%), eucalyptus (75%), fir (100%), pine (94.9%), and thyme (57.1%). In total, 25% of clover honeys (two samples) were allocated to pine honeys, whereas 12.5% of samples (one sample) were allocated to thyme honeys. In total, 19.4% of citrus honeys (six samples) were allocated to pine honeys. In total, 33.3% of chestnut honeys (one sample) were allocated to fir honeys. In total, 25% of eucalyptus honeys (one sample) were allocated to chestnut honeys. In total, 5.2% of pine honeys (two samples) were allocated in equal proportions

(2.6%) to clover and chestnut honeys, respectively. Finally, 42.8% of thyme honeys were allocated to clover (11.4%) (four samples), citrus (2.9%) (one sample), chestnut (11.4%) (four samples), and pine (17.1%) (six samples) honeys, respectively (Table 2).


**Table 2.** Discriminatory power of the LDA model based on the significant volatile compounds according to the botanical origin of honey.

<sup>a</sup> 95.4% of grouped cases using the original method were correctly classified. <sup>b</sup> Cross validation was performed only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases rather than that particular case. <sup>c</sup> 81.5% of cross-validated grouped cases were correctly classified.

It should be stressed that these results were accepted, given the fact that cross validation is a more pessimistic method of classification of a group of objects, since in cross validation, each case is classified by the functions derived from all cases rather than that particular case. The errors obtained in both classification techniques (original and cross-validation) reveal important findings regarding honey authentication. These errors show the contribution of numerous plants in the produced honeys, even in cases of honeydew honeys. Honeydew honeys are harvested after nectar honeys. Therefore, the contribution of flowering plants in honeydew honeys is quite common.

In addition, present findings may be related to beekeeper practices during the harvesting of different honey types. However, the classification results obtained support previous studies in the literature that focus on honey authentication using volatile compound analysis and chemometrics [3–5,8,11]. The results obtained in the present study, which is collective in nature, show a clear differentiation of honeydew vs. nectar honeys.

#### *K*-Nearest Neighbors

For the *k*-NN analysis, the number of samples was randomly assigned to training and holdout partitions. The training sample consisted of 110 honey samples (72.8%), while the holdout sample consisted of 41 samples (27.2%). All cases (151 honey samples) (100%) were used in the statistical analysis, comprising a valid procedure.

The overall classification rate was 77.3% for the training and 87.8% for the holdout sample and was satisfactory in both cases. The botanical origin classification rate of honey types for the training sample followed the sequence: clover (75%), citrus (70.8%), chestnut (0%), eucalyptus (0%), fir (95.8%), pine (92.3%), and thyme (58.3%). However, the zero classification rates of chestnut and eucalyptus honeys are attributed to the limited honey samples, since the majority of them were assigned to the holdout sample. Of the eight clover honey samples subjected to training analysis, six were assigned to clover and two to thyme honeys. Similarly, of the 24 citrus honey samples, 17 samples were assigned to citrus, one to clover, four to pine, and two to thyme honeys, respectively. One chestnut honey sample was assigned to eucalyptus honeys. Of the 24 fir honey samples, 23 were assigned to fir and only one to pine honeys. A similar trend was obtained for pine honeys—in which, 25 samples were assigned to pine and only one to fir honeys, respectively. Finally, of the 24 thyme honey samples, 14 were assigned to thyme, six to pine, three to clover, and one to citrus honeys, respectively.

Regarding the holdout sample classification rate, this was higher than the training sample by 10.5%. The botanical origin classification rate of honey types for the holdout sample followed the sequence: citrus (85.7%), chestnut (0%), eucalyptus (100%), fir (100%), pine (100%), and thyme (81.8%). The clover honeys were assigned previously to training sample. Therefore, no classification results were obtained. Of the seven citrus honey samples subjected to holdout analysis, six were assigned to citrus and one to pine honeys. Of the two chestnut honeys, one was assigned to citrus and one to eucalyptus honeys, respectively. Finally, of the 11 thyme honey samples, nine samples were assigned to thyme and two to pine, respectively (Table 3).


**Table 3.** Classification of clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys according to botanical origin using the 56 volatile compounds and *K*-Nearest Neighbors (*k*-NN) analysis.

Among the 55 significant volatile compounds (predictors), the most effective predictors (*k*-nearest neighbors) that built the model were acetic acid ethyl ester, formic acid ethyl ester, and 2-methylbutanal. Based on this observation, the *k*-NN analysis was run again by performing feature selection in order to investigate whether the classification results could be improved. The selected features were the aforementioned volatile compounds.

For the specified *k*-nearest neighbors analysis, the sample was divided again to training and holdout partitions. The training sample consisted of 100 honey samples (66.2%), whereas the holdout sample consisted of 51 (33.8%). Both partitions explained 100% of the procedure, indicating that all cases were valid. The analysis stopped when the absolute ratio was less than or equal to the minimum change, which was inserted by default equal to 0.01. The overall classification rate was 83% for the training and 74.5% for the holdout sample. The individual botanical classification rate was

differentiated given the fact that the sample assignment to training and holdout partitions was also differentiated (Table 4).


**Table 4.** Classification of clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys according to botanical origin using the 10 volatile compounds and *k*-NN.

Considering the total standard error of the forced predictors, but also that of the individual predictors, the model built with *k* = 3 (nearest neighbors) was more applicable. The 10 predictors were the three specified predictors, followed by furfural, lilac aldehyde C, benzaldehyde, nonanol, *para*-cymene, 5-methyl-4-nonene, and nonane. The total error rate of the three specified features was 0.74. The respective individual error rate for the seven predictors left was: 0.39, 0.27, 0.25, 0.21, 0.18, 0.17, and 0.17.

On the contrary, the total error rate of the model built with *k* = 4 nearest neighbors in relation to the three specified features was 0.75. The individual error rate for the seven predictors left was: 0.41, 0.28, 0.25, 0.23, 0.21, 0.20, and 0.20 for furfural, lilac aldehyde C, phenylethylalcohol, 5-methyl-4-nonene, 1-octen-3-ol, and nonane, respectively. The selection of the predictors during *k*-NN analysis with *k* = 3 and *k* = 4 nearest neighbors according to the botanical origin of honey is shown in Figure 3.

#### FA

During factor analysis, the 56 significant volatile compounds were reduced to 16 principal components (PCs) based on the rule of an eigenvalue greater than one. The rotated component matrix is given in Supplementary Table S2. The total variance explained was 80.524% (ca. 80.52%).

The fitness of data for factor analysis was estimated by the Kaiser–Meyer–Olkin (KMO) test, which comprises a measure of the effective performance of factor analysis, to a set of data, indicating the sampling adequacy. The acceptable value was considered that of KMO ≥ 0.50. The suitability and applicability of factor analysis was further evaluated using Bartlett's test of sphericity. This test highlights the hypothesis that the correlation matrix is an identity matrix, which would indicate that the variables incorporated into the model are unrelated and therefore unsuitable for structure detection. Small probability values (*p* < 0.05) indicate that a factor analysis may be useful with data treatment [20]. The value of the KMO test was 0.636. Furthermore, Bartlett's test of sphericity had the following qualitative values: X<sup>2</sup> = 10,587.564, df = 1540, and *p* = 0.000.

**Figure 3.** Predictor selection during *k*-NN analysis with *k* = 3 and *k* = 4 nearest neighbors. 1: Clover honeys. 2: Citrus honeys. 3: Chestnut honeys. 4: Eucalyptus honeys. 5: Fir honeys. 6: Pine honeys. 7: Thyme honeys. Forced predictor: Acetic acid ethyl ester, formic acid ethyl ester, 2-methylbutanal. VAR00010: Furfural. VAR00081: Lilac aldehyde C. VAR00016: Benzaldehyde. VAR00043: Nonanol. VAR00035: *para*-Cymene. VAR00023: 5-methyl-4-Nonene. VAR00014: Nonane (Model with *k* = 3). VAR00010: Furfural. VAR00081: Lilac aldehyde C. VAR00051: Phenylethylalcohol. VAR00023: 5-methyl-4-Nonene. VAR00026: 1-Octen-3-ol. VAR00059: Decanol.VAR00014: Nonane (Model with *k* = 4).

The factors that best explained the rotated component matrix were the following volatile compounds: Decanol (PC1, 10.454% of total variance), undecanoic acid ethyl ester (PC2, 10.380% of total variance), *para*-cymene (PC3, 7.768% of total variance), 2-hydroxyisophorone (PC4, 7.699% of total variance), nonane (PC5, 5.509% of total variance), dill ether (PC6, 5.058% of total variance), lilac aldehyde C (PC7, 4.412% of total variance), lilac aldehyde D (PC8, 4.352% of total variance), acetic acid ethyl ester (PC9, 3.918% of total variance), 5-methyl-2-phenylhexenal (PC10, 3.905% of total variance), decane (PC11, 3.596% of total variance), 1-(2-furanyl)-ethanone (PC12, 3.563% of total variance), benzeneacetonitrile (PC13, 3.240% of total variance), nonanol (PC14, 2.400% of total variance), 2-methylbutanal (PC15, 2.156% of total variance), and hexanoic acid ethyl ester (PC16, 2.113% of total variance).

3.2.2. Part II. Classification of Honeys According to the Honey Code (CCC-E-F-P-T)

#### MANOVA and LDA

As in the case of the botanical origin differentiation of honeys, the four qualitative criteria of the multivariate hypothesis had the following values: Pillai's Trace = 3.770 (*F* = 10.640, df = 364, *p* = 0.000, η*<sup>2</sup>* = 0.943), Wilks' Lambda = 0.000 (*F* = 23.673, df = 364, *p* = 0.000, η*<sup>2</sup>* = 0.974), Hotelling's Trace = 431.692 (*F* = *64.635*, df = 364, *p* = 0.000, η*2* = 0.991), and Roy's Largest Root = 361.350 (*F* = 234.282, df = 91, *p* = 0.000, η*<sup>2</sup>* = 0.997) showed that there was a statistically significant effect of the honey code on the semi-quantitative data of volatile compounds. More specifically, 65 of the 94 volatile compounds were found to be significant (*p* < 0.05) for the classification of honeys according to the honey code. Then, these volatiles were subjected to LDA. Results showed that 4-terpineol, (1R)-1,7,7-trimethyl-bicyclo-(2.2.1)-heptan-2-one, carvacrol methyl ether, thymoquinone, thymol, and eugenol did not pass the tolerance test. Therefore, these volatile compounds were excluded from the discriminant analysis. In that sense, 59 volatile compounds contributed to the structure matrix as shown in Supplementary Table S3.

Results showed that four canonical discriminant functions were formed: Wilks' Lambda = 0.000, X2 = 1009.601, df = 236, *p* = 0.000 for the first function; Wilks' Lambda = 0.014, X2 = 502.352, df = 174, *p* = *0.000* for the second function; Wilks' Lambda = 0.090, X<sup>2</sup> = 284.776, df = 114, *p* = *0.000* for the third function; and Wilks' Lambda = 0.369, X<sup>2</sup> = 117.597, df = 56, *p* = 0.000 for the fourth function. The first discriminant function recorded the higher eigenvalue (72.605) and a canonical correlation of 0.993, accounting for 87.7% of total variance. The second discriminant function recorded a much lower eigenvalue (5.321) and a canonical correlation of 0.917, accounting for 6.4% of total variance. The third discriminant function recorded a lower eigenvalue (3.124) and a canonical correlation of 0.87o, accounting for 3.8% of total variance. Similarly, the fourth discriminant function recorded the lowest eigenvalue (1.709) and a canonical correlation of 0.794, accounting for 2.1% of total variance. All four discriminant functions accounted for 100% of total variance.

In Figure 4, the differentiation of the 151 honey samples according to the honey code is shown. The group centroid values are as follows: (−4.968, −1.828), (−2.691, −4.647), (16.454, −0.115), (−4.012, −0.914), (−3.834, 3.844) for honeys encoded with CCC, E, F, P, and T, respectively.

**Figure 4.** Classification of the 151 honey samples according to the honey code based on 57 volatile compounds and LDA.

The classification rate was 96.0% using the original and 86.1% using the cross-validation method. The classification rate of honeys according to the honey code, based on the original method, followed the sequence: CCC (88.1%), E (100%), F (100%), P (100%), and T (97.1%). In total, 12.9% of CCC honeys were allocated to the E group (4.8%, two samples) and to the P group (7.1%, three samples). In total, 2.9% (three samples) of T honeys were allocated to the P group.

For the cross-validation method, the classification rate of honeys followed the sequence: CCC (81%), E (75%), F (93.5%), P (92.3%), and T (80%). In total, 4.8% (two samples) of CCC honeys were allocated to the E group; 2.4% (one sample) were allocated to the F group, and 11.9% of samples (five samples) were allocated to the P group. In total, 25% (one sample) of E honeys were allocated to CCC honeys. In total, 3.2% of F honeys were allocated in equal proportions to the E (one sample) and P (one sample) groups, respectively. The same holds for P honeys—in which, two samples (5.2% of the total population) were allocated to the E (one sample, 2,6%) and F (one sample, 2,6%) groups, respectively. Finally, 14.3% of T honeys were allocated to P honeys (five samples), and 5.7% (two samples) to E honeys (Table 5).

As can be observed, the classification of honeys according to the honey code based on original and cross validation methods provided higher prediction rates compared to the differentiation of honeys according to botanical origin. In Table 3, the key volatile compounds for the discrimination of honeys according to the honey code are marked with an asterisk.


**Table 5.** Discriminatory power of the LDA model based on the significant volatile compound according to the honey code.

<sup>a</sup> 96.7% of grouped cases using the original method were correctly classified. <sup>b</sup> Cross validation was performed only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases rather than that particular case. <sup>c</sup> 88.1% of cross validated grouped cases were correctly classified.

#### *K*-Nearest Neighbors

The training sample consisted of 111 honey samples that represented 73.5% of the total sample population. Similarly, the holdout sample consisted of 40 samples that represented 26.5% of the sample population. All cases (151 honey samples) (100%) were used in the statistical analysis, comprising a valid procedure. The scale features (predictors) (57 statistically significant volatile compounds) were normalized during analysis.

The overall classification rate was 81.1% for the training and 80% for the holdout sample and was satisfactory in both cases. The classification rate of honey types according to the honey code followed the sequence: CCC (86.7%), E (25%), F (96.2%), P (96.4%), and T (47.8%). Of the 30 CCC samples subjected to training analysis, one sample was assigned to the E group and three samples to the P group. Similarly, of the four E honey samples, three honey samples were assigned to the CCC group and only one sample to the E group. Of the 26 F honeys, 25 samples were assigned to the F group and only one sample to the P group. Of the 28 P honeys, 27 samples were assigned to the P group and only one sample to the F group. Finally, of the 23 T honey samples, 11 were assigned to T, six to CCC, and six to P, respectively. For the training sample, the honey code had the following hierarchy: CCC > E, F, P > T, CCC > T, and was in general applicable.

The classification rates for the holdout sample were improved for the F, P, and T honey groups. The classification rate of honey samples according to the honey code for the holdout sample followed the sequence: CCC (83.3%), F (100%), P (100%), and T (50%). As can be observed, the hierarchy in honey lettering was maintained by 3/5 cases (F, P > T and CCC > T). Of the 12 CCC honey samples assigned to the holdout sample, 10 were assigned to the CCC group and two to the P group. F and P honey samples were perfectly assigned to the respective groups. Finally, of the 12 T honey samples, seven were assigned to T group, two to the CCC, and four to the P group (Table 6).


**Table 6.** Classification of clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys according to the honey code using the 57 volatile compounds and *k*-NN analysis.

Among the 57 significant volatile compounds (predictors), the most effective predictors (*k*-nearest neighbors) that built the model were formic acid ethyl ester, acetic acid ethyl ester, and heptane. Therefore, the *k*-NN analysis was repeated by performing feature selection in order to investigate whether the classification results could be improved. The selected features were the aforementioned volatile compounds.

For the specified *k*-nearest neighbors, the sample was divided again to the training and holdout partitions. The training sample consisted of 110 honey samples (72.8%), whereas the rest of the honey samples represented the holdout sample (27.2%). Similar to the botanical origin differentiation of honeys, the analysis stopped when the absolute ratio was less than or equal to the minimum change which was inserted by default equal to 0.01.

The overall classification rate was 89.1% for the training and 63.4% for the holdout sample. The classification rate of the training sample was improved, whereas that of the holdout sample was decreased. However, discrepancies among the two methods followed (simple *k*-NN and *k*-NN analysis with feature selection) may be attributed to the number of the predictors assigned to the model and the random dividing of the sample to training and holdout partitions in relation to sample size.

The classification rate of honey samples according to the honey code for the training sample followed the sequence: CCC (87.5%), E (0%), F (100%), P (93.1%), and T (88.5%). These results show that the honey code was satisfactorily applicable given the hierarchy followed: CCC > E, F > P > T and CCC > T. The classification rate of honey samples according to the honey code for the holdout sample followed the sequence: CCC (50%), E (0%), F (72.7%), P (80%), and T (55.6%). Similarly, the honey code was satisfactorily applicable given the hierarchy followed: CCC > E, F, P > T, and CCC > T. The classification results along with each sample assignment are given in Table 7.

The model was built with *k* = 3, *k* = 4, and *k* = 5 nearest neighbors, which were automatically selected. Similar to the botanical origin differentiation of honeys, the forced predictors' error was considered for the selection of the best model. The 10 predictors that were obtained during the *k*-NN analysis with *k* = 3 neighbors were the three forced predictors (heptane, formic acid ethyl ester, acetic acid ethyl ester) followed by dodecanoic acid ethyl ester, benzaldehyde, nonanal, *alpha*-terpinene, geranyl acetone, 5-methyl-4-nonene, and lilac aldehyde B (isomer II). The total error rate of the three specified features was 0.6818. The respective individual error rate for the seven aforementioned predictors was: 0.30, 0.2727, 0.2455, 0.1818, 0.1616, 0.1455, 0.1364, and 0.1364.

On the other hand, the total error rate of the model built with *k* = 4 nearest neighbors in relation to the three specified features was 0.6636. The individual error rate for the six predictors left was: 0.3000, 0.1909, 0.1636, 0.1364, 0.1273, and 0.1273, for furfural, lilac aldehyde C (isomer III), decanoic acid, lilac aldehyde D (isomer IV), pentanoic acid, and nonane.

Finally, the total error rate of the model built with *k* = 5 nearest neighbors in relation to the three specified features was 0.6545. The individual error rate for the seven predictors left was: 0.2909, 0.20, 0.1364, 0.1545, 0.1182, 0.1091, and 0.1091, for furfural, lilac aldehyde C (isomer III), benzeneacetaldehyde, octanoic acid ethyl ester, nonanoic acid ethyl ester, nonanoic acid, and pentanoic acid. The selection of the predictors during *k*-NN analysis with *k* = 3, *k* = 4, and *k* = 5 nearest neighbors according to the honey code is shown in Figure 5.



#### FA

The 59 significant volatile compounds were reduced to 15 principal components (PCs) based on the rule of an eigenvalue greater than one. The rotated component matrix is given in Supplementary Table S4. The total variance explained was 81.547% (ca. 81.55%). As can be observed, the total variance explained of samples according to the honey code was higher than those that were grouped according to botanical origin. In addition, the KMO test had a higher value: KMO = 0.615. Furthermore, Bartlett's test of sphericity had the following qualitative values: X2 = 12,376.576, df = 1596, *p* = 0.000. The factors that best explained the rotated component matrix were the following volatile compounds: *Para*-cymene (PC1, 12.207% of total variance), undecanoic acid ethyl ester (PC2, 10.520% of total variance), decanol (PC3, 10.03% of total variance), 2-hydroxyisophorone (PC4, 7.788% of total variance), decanoic acid (PC5, 7.386% of total variance), dill ether (PC6, 5.078% of total variance), 1-(2-furanyl)-ethanone (PC7, 4.506% of total variance), undecane (PC8, 4.326% of total variance), lilac aldehyde B (PC9, 3.254% of total variance), dodecanoic acid (PC10, 2.989% of total variance), lilac aldehyde D (PC11, 2.986% of total variance), benzeneacetonitrile (PC12, 2.904% of total variance), decanoic acid ethyl ester (PC13, 2.821% of total variance), 4,7,7-trimethylbyciclo (3.3.0)-octan-2-one (PC14, 2.819% of total variance), and dodecanoic acid ethyl ester (PC15, 1.960% of total variance).

**Figure 5.** Predictor selection during *k*-NN analysis with *k* = 3, *k* = 4, and *k* = 5 nearest neighbors. 1: CCC. 2: E. 3: F. 4: P. 5: T. Forced predictor: Heptane, formic acid ethyl ester, acetic acid ethyl ester. VAR00070: Dodecanoic acid ethyl ester. VAR00016: Benzaldehyde. VAR00037: Nonanal. VAR00032: *Alpha*-terpinene. VAR00082: Geranyl acetone. VAR00023: 5-Methyl-4-Nonene. VAR00039: Lilac aldehyde B (isomer II). (Model with *k* = 3); VAR00010: Furfural. VAR00081: Lilac aldehyde C (isomer III). VAR00063: Decanoic acid. VAR00083: Lilac aldehyde D (isomer IV). VAR00011: Pentanoic acid. VAR00014: Nonane (Model with *k* = 4); VAR00010: Furfural. VAR00081: Lilac aldehyde C (isomer III). VAR00020: Benzeneacetaldehyde. VAR00038: Octanoic acid ethyl ester. VAR00060: Nonanoic acid ethyl ester. VAR00047: Nonanoic acid. VAR00011: Pentanoic acid (Model with *k* = 5).

#### **4. Conclusions**

Results of the present study showed that specific volatile compounds in combination with polyparametric statistical techniques such as MANOVA, LDA, *k*-NN and FA, may provide exhaustive information about the botanical origin of honey. Even though honey samples were harvested in different parts of the world, the classification of honeys according to botanical origin was, in general, very satisfactory. At the same time, the application of the honey code to the collective set of data and the use of the aforementioned statistical techniques resulted in a higher classification rate of the honey samples. The use of hierarchical classification strategies (HCSs) may expand the state of the art and flourish the complicated topic of "Honey authentication" by highlighting with numbers the distinction/differentiation rates, for instance, of monofloral or blends of nectar or honeydew honeys with specific and intense flavors.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2304-8158/8/10/508/s1, Table S1: Standardized canonical discriminant function coefficients of the discrimination model—structure matrix of the LDA model according to the botanical origin of honey; Table S2: Rotated component matrix of volatile compounds used for the botanical origin differentiation of clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys; Table S3: Standardized canonical discriminant function coefficients of the discrimination model—structure matrix of the LDA model according to the honey code; Table S4: Rotated component matrix of volatile compounds used for the distinction of clover, citrus, chestnut, eucalyptus, fir, pine, and thyme honeys according to the honey code; Figure S1: A typical gas chromatogram of citrus honey (no. 4) from Spain indicating selected key volatile compounds. 10: Herboxide (isomer II). 11: Lilac aldehyde C (isomer III). 12: Dill ether. IS: internal standard; Figure S2: A typical gas chromatogram of chestnut honey (no. 2) from Portugal indicating selected key volatile compounds. 13: Octane. 14: 5-methyl-2-Furaldehyde. 15: Benzaldehyde. 16: Benzeneacetaldehyde. IS: internal standard; Figure S3: A typical gas chromatogram of eucalyptus honey (no. 1) from Portugal indicating selected key volatile compounds. 17: Heptane. 18: beta-Damascenone IS: internal standard; Figure S4: A typical gas chromatogram of fir honey (no. 6) from Greece indicating selected key volatile compounds. 19: Nonane. 20: 1-(2-furanyl)-Ethanone. 21: 6-methyl-5-Hepten-2-one. 22: 5-methyl-4-Nonene. 23: Hexanoic acid ethyl ester. 24: Heptanoic acid ethyl ester. 25: Nonanal. 26: *alpha*-Isophorone. 27: 4-Ketoisophorone. 28: 2-Hydroxyisophorone. 29: Octanoic acid ethyl ester. 30: Decane. 31: Nonanoic acid ethyl ester. 32: Geranyl acetone. IS: internal standard; Figure S5: A typical gas chromatogram of pine honey (no. 2) from Greece indicating selected key volatile compounds. 33: Acetic acid. 34: Octane. 35: *alpha*-Pinene. 36: *beta*-Thujone. IS: internal standard; Figure S6: A typical gas chromatogram of thyme pine (no. 6) from Egypt indicating selected key volatile compounds. 37: *alpha*-Pinene. 38: I-Octen-3-ol. 39: *alpha*-Terpinene. 40: *para*-Cymene. 41: Camphor. 42: Carvacrol methyl ether. 43: Thymoquinone. 44: 4,7,7-trimethylbyciclo(3.3.0)-Octan-2-one. IS: internal standard.

**Author Contributions:** Conceptualization, I.K.K.; methodology, I.K.K. and A.V.B.; software, A.V.B. and I.K.K. validation, I.K.K.; formal analysis, I.K.K. and V.K.K.; investigation, I.K.K. and V.K.K.; resources, A.V.B.; data curation, I.K.K. and V.K.K.; writing—original draft preparation, I.K.K.; writing—review and editing, I.K.K.; visualization, I.K.K.; supervision, I.K.K..; project administration, I.K.K.; funding acquisition, A.V.B.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors are grateful to Attiki Bee Culturing Co., Alex. Pittas S.A, Protomagias 9, Kryoneri 14568, Athens, Greece, and professional beekeepers from the studied regions for the donation of honey samples. Miguel Maia (APISMAIA) is greatly acknowledged for the collection of Portuguese honey samples. Michael G. Kontominas is greatly acknowledged for the collection of honey samples from Egypt, Morocco and Spain.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article*
