### *2.7. NIR-Chemometric Analyses*

Chemometric techniques were applied to calibrate the main physicochemical parameters of honey. The spectral data of the calibration set samples were analyzed by principal component analysis (PCA) [41]. Anomalous spectra were detected using the Mahalanobis distance (H-statistic): spectra with an H-value greater than 3 were considered not to belong to the population, and the equations were not used to make predictions for them. The modified partial least squares (MPLS) regression method was used to obtain the NIR equations. Partial least squares (PLS) regression is similar to principal component regression (PCR), but it uses both the reference data and the spectral information to form the factors used for fitting. Samples that showed high residuals when predicted were eliminated from the set using the T ≥ 2.5 criterion. The statistical parameters of the calibration were therefore obtained for each component after removing samples for spectral (H criterion) or chemical (T criterion) reasons.

To optimize the multivariate regression equations, spectral scattering effects were addressed with several mathematical treatments: multiplicative scatter correction (MSC), standard normal variate (SNV), detrend (DT), and SNV-DT [42]. Derivative treatments were described with a four-digit nomenclature (e.g., 1,4,4,1), in which the first digit is the order of the derivative, the second is the gap over which the derivative is calculated, the third is the number of data points in a running average or smoothing, and the fourth is the second smoothing.
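As an illustration of the scatter corrections and the spectral outlier check described above, the following is a minimal NumPy sketch, not the WinISI implementation. The function names, the second-degree detrending polynomial, and the standardization of H (squared Mahalanobis distance in PCA score space divided by the number of components) are assumptions made for this example; the text does not specify them.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def detrend(spectra, degree=2):
    """Subtract a least-squares polynomial baseline from each spectrum.

    degree=2 is an assumption; the text does not state the polynomial order.
    """
    x = np.linspace(-1.0, 1.0, spectra.shape[1])  # scaled axis for conditioning
    basis = np.vander(x, degree + 1)              # shape (n_points, degree + 1)
    coefs, *_ = np.linalg.lstsq(basis, spectra.T, rcond=None)
    return spectra - (basis @ coefs).T


def h_statistic(spectra, n_components):
    """Squared Mahalanobis distance in PCA score space, divided by n_components.

    Dividing by the number of components is an assumed standardization;
    it makes the average H over the set close to 1.
    """
    centered = spectra - spectra.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    score_var = scores.var(axis=0, ddof=1)
    d2 = (scores ** 2 / score_var).sum(axis=1)
    return d2 / n_components


def flag_spectral_outliers(spectra, n_components, h_limit=3.0):
    """Boolean mask of spectra whose H exceeds the limit (3 in the text)."""
    return h_statistic(spectra, n_components) > h_limit
```

Under these assumptions, SNV-DT corresponds to `detrend(snv(spectra))`, and spectra flagged by `flag_spectral_outliers` would be the ones excluded for spectral reasons.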

Cross-validation is recommended to select the optimal number of factors and to avoid overfitting [43]. For cross-validation, the calibration set is divided into several groups, and each group is validated using a calibration developed on the remaining groups. The resulting validation errors are combined into a root mean square error of cross-validation (RMSECV). This statistic is considered the best single estimate of the prediction capability of the equations [44]. In all cases, cross-validation was performed by splitting the population into eight groups.
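The cross-validation procedure can be sketched as follows. This example uses a plain single-response PLS (NIPALS) written in NumPy as a stand-in, not the modified PLS of WinISI, and it assigns samples to the eight groups in an interleaved fashion; both choices, along with all function names, are assumptions for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Fit single-response PLS by NIPALS; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # residuals, deflated factor by factor
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr                # weight: covariance of spectra with y
        w /= np.linalg.norm(w)
        t = Xr @ w                   # factor scores
        tt = t @ t
        p = Xr.T @ t / tt            # spectral loading
        c = yr @ t / tt              # regression of y on the scores
        Xr = Xr - np.outer(t, p)
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def rmsecv(X, y, n_factors, n_groups=8):
    """Predict each group from a model built on the other groups; pool the errors."""
    groups = np.arange(len(y)) % n_groups  # interleaved assignment (assumption)
    residuals = np.empty(len(y))
    for g in range(n_groups):
        held_out = groups == g
        model = pls1_fit(X[~held_out], y[~held_out], n_factors)
        residuals[held_out] = y[held_out] - pls1_predict(model, X[held_out])
    return np.sqrt(np.mean(residuals ** 2))


def select_n_factors(X, y, max_factors, n_groups=8):
    """Return the factor count with the lowest RMSECV and the full error list."""
    errors = [rmsecv(X, y, k, n_groups) for k in range(1, max_factors + 1)]
    return int(np.argmin(errors)) + 1, errors
```

Picking the strict minimum of RMSECV is the simplest rule; in practice a more parsimonious choice (the smallest number of factors whose error is close to the minimum) may be preferred to limit overfitting.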

The performance of the models was determined by the squared correlation coefficient between predicted and measured values in cross-validation and by the ratio of the standard deviation (SD) of the data set to the SECV. This ratio, the RPD (ratio of performance to deviation), should be larger than 2 for a good calibration; an RPD below 1.5 indicates poor predictions, and the model should not be used for further prediction [44]. The statistics used to select the best equation for each physicochemical parameter were the highest RSQ (multiple correlation coefficient) and the lowest SECV (standard error of cross-validation) [25]. The software used for chemometric analysis was WinISI II version 1.50 (Infrasoft International, LLC, Silver Spring, MD, USA).
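The performance statistics can be computed from the measured values and their cross-validated predictions, as in the hedged sketch below. The function names are hypothetical, the SD is taken as the sample standard deviation, and the cross-validation error is computed as a root mean square without bias correction; these are assumptions, since the text does not give the exact formulas. The label "intermediate" for RPD values between 1.5 and 2 is also introduced here only for illustration.

```python
import numpy as np


def calibration_statistics(measured, cv_predicted):
    """RSQ, cross-validation error, and RPD from cross-validated predictions."""
    measured = np.asarray(measured, dtype=float)
    cv_predicted = np.asarray(cv_predicted, dtype=float)
    rsq = np.corrcoef(measured, cv_predicted)[0, 1] ** 2
    secv = np.sqrt(np.mean((measured - cv_predicted) ** 2))  # no bias correction
    rpd = measured.std(ddof=1) / secv                        # SD / SECV
    return {"RSQ": rsq, "SECV": secv, "RPD": rpd}


def rate_rpd(rpd):
    """Apply the thresholds quoted in the text: > 2 good, < 1.5 poor."""
    if rpd > 2.0:
        return "good"
    if rpd < 1.5:
        return "poor"
    return "intermediate"  # label assumed; the text leaves this range unnamed
```

For example, reference values of 1 to 5 predicted with alternating errors of ±0.1 have an SD of about 1.58 and an error of 0.1, giving an RPD of about 15.8, well above the threshold of 2.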

### *2.8. Linear Discriminant Analysis*

Linear discriminant analysis (LDA) is a supervised classification technique that uses known class membership for the analysis. In this case, the known variable was the honey type determined by palynological analysis. Based on the pollen profile, six honey groups were characterized: heather, chestnut, eucalyptus, blackberry, honeydew, and multifloral. LDA was applied to the collected reference data set (physicochemical and botanical data) to find linear combinations of the variables that separate these groups. LDA can also be regarded as a dimensionality reduction method that determines a lower-dimensional hyperplane onto which the points are projected from the higher-dimensional space [10]. A linear function of the variables is sought that maximizes the ratio of between-class variance to within-class variance. STATGRAPHICS Centurion XVI software (Statpoint Technologies, Inc., The Plains, VA, USA) was used for data treatment.
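The variance-ratio criterion can be made concrete with a short NumPy sketch of Fisher's discriminant axes. This is an illustrative implementation under stated assumptions, not the STATGRAPHICS procedure: the function name is hypothetical, and the within-class scatter matrix is assumed to be positive definite (more samples than variables, no collinear variables).

```python
import numpy as np


def fisher_lda_axes(X, labels, n_axes):
    """Directions maximizing between-class over within-class scatter.

    Solves S_b a = lambda * S_w a by whitening with the Cholesky factor of S_w.
    Returns (axes, ratios), with axes as columns, sorted by decreasing ratio.
    """
    n_vars = X.shape[1]
    grand_mean = X.mean(axis=0)
    S_w = np.zeros((n_vars, n_vars))  # within-class scatter
    S_b = np.zeros((n_vars, n_vars))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        S_w += (Xc - class_mean).T @ (Xc - class_mean)
        d = (class_mean - grand_mean)[:, None]
        S_b += len(Xc) * (d @ d.T)
    L = np.linalg.cholesky(S_w)       # requires S_w positive definite
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ S_b @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_axes]
    axes = L_inv.T @ evecs[:, order]  # map back to the original variables
    return axes, evals[order]
```

Projecting the data with `X @ axes` gives the discriminant scores. With six honey groups, at most five axes carry class information, because the between-class scatter has rank at most one less than the number of groups.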
