#### *2.3. Spectral Data Acquisition*

Spectral data acquisition was performed both on whole cocoa beans and on de-husked cocoa bean powder. About 100 g of randomly chosen whole cocoa beans from each sample were scanned with a portable instrument (PoliSPEC-NIR, ITPhotonics, Breganze, Italy) and with the benchtop instrument (FOSS DS−2500 scanning monochromator FossNIR-System, Hillerød, Denmark). Both NIR data acquisitions were performed in reflectance mode, with the following parameters:


#### *2.4. Chemical Analyses*

Unless otherwise specified, analyses were performed according to official methods of analysis (AOAC, 2016). All chemical analyses were performed in triplicate on peeled and ground cocoa beans.

#### 2.4.1. Dry Matter

Dry matter was calculated by subtracting from 100% the moisture content, which was measured gravimetrically according to AOAC method 931.04 [21]. To this end, approximately 2 g of powder sample was dried at 101–103 °C to constant weight in a forced-air electric oven (UF55 Plus, Memmert, Schwabach, Germany). After drying, the samples were immediately covered with glass lids to avoid exposure to ambient air and stored in desiccators for one hour to equilibrate to ambient temperature [27]. The moisture content was expressed as the average percentage (%) weight loss of three independent samples.
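The calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical and the weights are invented example values.

```python
def moisture_percent(wet_weight_g, dry_weight_g):
    """Moisture content (%) from the weight lost on oven drying."""
    if wet_weight_g <= 0:
        raise ValueError("wet weight must be positive")
    return 100.0 * (wet_weight_g - dry_weight_g) / wet_weight_g


def dry_matter_percent(replicates):
    """Dry matter (%) as 100 minus the mean moisture of the replicates.

    `replicates` is a list of (wet_weight_g, dry_weight_g) pairs.
    """
    moistures = [moisture_percent(w, d) for w, d in replicates]
    mean_moisture = sum(moistures) / len(moistures)
    return 100.0 - mean_moisture


# Hypothetical triplicate weighings of roughly 2 g of powder.
example = [(2.000, 1.890), (2.010, 1.900), (1.995, 1.886)]
print(round(dry_matter_percent(example), 2))
```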

#### 2.4.2. Ash

For ash determination, the sample was charred on a plate and placed in a muffle furnace (Gefran Model 1200; Gefran Spa, Brescia, Italy) at 550 °C (AOAC 972.15A). Ash content was expressed as weight percentage (%).

#### 2.4.3. Fat Content

The fat content was measured by extraction with petroleum ether [21] in a TE-188 Soxhlet lipid extractor (model SOXTEC 255 Tecator-Foss Analytical, Hillerød, Denmark) with the following parameters: 60 min boiling, 50 min washing, 15 min drying. Fat content was expressed as weight percentage (%).

#### 2.4.4. Total Protein Content

Protein was determined by the Kjeldahl method, as described in AOAC 2016 (method 970.22), using a Kjeltec 2300 unit (Foss Analytical). The protein content was calculated from the concentration of total nitrogen by applying a conversion factor of 6.25.

#### 2.4.5. Total Phenolic Content

The total phenolic content was determined according to the Folin–Ciocalteu colorimetric method [28]. Samples were defatted using the Soxhlet method (AOAC 963.15). Defatted powder (0.05 g) was added to 10 mL of a methanol-water (70:30 *v*/*v*) mixture at room temperature and stirred for 45 min. After centrifugation, 0.1 mL of solution was mixed with 3 mL of distilled water and 0.5 mL Folin–Ciocalteu reagent. The mixture was left to stand for 3 min, after which 1 mL of aqueous Na2CO3 (200 g L<sup>−1</sup>) was added. The mixture was allowed to stand for 20 min at 40 °C and the total polyphenols were determined by spectrophotometry at 765 nm (Cary 60 UV-Vis spectrophotometer, Agilent Technologies, Santa Clara, CA, USA). The standard curve was prepared using 0, 50, 100, 150, 200 and 250 mg L<sup>−1</sup> solutions of gallic acid in methanol. Total phenol values were expressed as gallic acid equivalents (mg g<sup>−1</sup> of dry fat-free mass) [29]. The analyses were performed in triplicate.
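As an illustration of how a gallic acid standard curve can be turned into a result per gram of defatted powder, the sketch below fits a straight line by least squares and converts an absorbance reading. It rests on several assumptions not stated in the text: standards and samples are taken through the identical assay, no additional dilution or moisture correction is applied, and the absorbance readings are hypothetical values invented for the example.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tpc_mg_gae_per_g(absorbance, slope, intercept,
                     extract_volume_l=0.010, sample_mass_g=0.05):
    """Gallic acid equivalents per gram of defatted powder.

    The extract concentration (mg GAE per L) is read off the standard
    curve, then scaled by extract volume (10 mL) and sample mass (0.05 g).
    """
    concentration_mg_per_l = (absorbance - intercept) / slope
    return concentration_mg_per_l * extract_volume_l / sample_mass_g


# Standard concentrations from the text (mg/L); absorbances are hypothetical.
standards_mg_per_l = [0, 50, 100, 150, 200, 250]
absorbance_765nm = [0.00, 0.11, 0.22, 0.33, 0.44, 0.55]

slope, intercept = fit_line(standards_mg_per_l, absorbance_765nm)
print(tpc_mg_gae_per_g(0.44, slope, intercept))
```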

#### 2.4.6. Fermentation Index

Fermentation index (FI) reflects the color change within the bean cotyledons during fermentation. This change is due to the decreasing anthocyanin content as beans progress through fermentation [30]. A 50 mg sample of previously prepared cocoa powder was weighed and mixed with 5 mL MeOH:HCl (97:3 *v*/*v*). Samples were extracted at 4 °C for 16–18 h, centrifuged for 5 min at 3500× *g*, and the clear supernatant was collected. Absorbance of the supernatant was read at 460 nm and 530 nm using a UV-VIS spectrophotometer (Cary 60 UV-Vis, Agilent Technologies, Santa Clara, CA, USA). All the measurements were performed in triplicate.

The FI was obtained by calculating the ratio of the absorbance at 460 nm to that at 530 nm (FI = A460/A530). Beans with values greater than 1 are considered well-fermented, while those with values less than 1 are considered under-fermented [31,32]. However, it must be noted that this criterion applies to the Forastero variety and, with some precautions, to the Trinitario variety (which can contain both purple and white beans). Criollo beans do not contain anthocyanin pigments; therefore, FI cannot be used to describe the fermentation level of this variety. In our study, both Trinitario and Forastero beans were used, but white beans were always absent.
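The ratio and its interpretation can be written as a short sketch. The function names are hypothetical, and the "borderline" label for a value of exactly 1 is an assumption, since the text classifies only values above or below 1.

```python
def fermentation_index(a460, a530):
    """FI = A460 / A530, from absorbances of the MeOH:HCl extract."""
    if a530 <= 0:
        raise ValueError("absorbance at 530 nm must be positive")
    return a460 / a530


def fermentation_class(fi):
    """Interpretation used in the text for Forastero and Trinitario beans."""
    if fi > 1:
        return "well-fermented"
    if fi < 1:
        return "under-fermented"
    return "borderline"  # assumption: FI == 1 is not classified in the text


# Hypothetical absorbance readings.
fi = fermentation_index(0.62, 0.48)
print(round(fi, 2), fermentation_class(fi))
```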

#### 2.4.7. pH and Titratable Acidity

Cocoa powder (5 g) was mixed with 100 mL hot water (100 °C), stirred, and allowed to stand for 30 min. After 30 min, when the suspension had cooled to 25 °C, it was centrifuged for 10 min at 5000 rpm and vacuum filtered through Whatman No. 4 filter paper according to AOAC 2006 methods 970.21 (pH) and 942.15 (potentiometric titration) [21] and AOAC Section 42.104 (16th Ed. 1995) [21,32–35].

The pH of the filtered solution was measured with a pH-meter model PC 80 + DHS (XS Instruments, Carpi, Italy) and then 25 mL aliquots of the same solution were titrated to pH 8.1 with 0.05 M NaOH. All data were measured in triplicate. Titratable acidity results are expressed as mmol NaOH/100 g powder [34] or % acetic acid [21].
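One way to convert a titration volume into the units reported here is sketched below. It assumes that the 25 mL aliquot is representative of the whole 100 mL extract of 5 g powder and that no blank correction is applied; neither assumption is stated in the text, and the function names and the titration volume are hypothetical.

```python
ACETIC_ACID_G_PER_MOL = 60.05  # molar mass of acetic acid


def titratable_acidity_mmol_per_100g(naoh_volume_ml, naoh_molarity=0.05,
                                     aliquot_ml=25.0, extract_ml=100.0,
                                     sample_mass_g=5.0):
    """Titratable acidity in mmol NaOH per 100 g of powder."""
    mmol_in_aliquot = naoh_volume_ml * naoh_molarity      # mL * mol/L = mmol
    mmol_in_extract = mmol_in_aliquot * extract_ml / aliquot_ml
    return mmol_in_extract * 100.0 / sample_mass_g


def as_percent_acetic_acid(mmol_per_100g):
    """Same acidity expressed as g acetic acid per 100 g powder (%)."""
    return mmol_per_100g * ACETIC_ACID_G_PER_MOL / 1000.0


# Hypothetical titration volume of 4.3 mL NaOH to reach pH 8.1.
ta = titratable_acidity_mmol_per_100g(4.3)
print(round(ta, 2), round(as_percent_acetic_acid(ta), 2))
```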

It is important to note that this procedure was not intended to quantify the actual pH of the cocoa bean itself, but rather to measure the acidity derived from bean acids diffusing into water; it is useful for comparing the pH of solutions produced by different beans [30].

#### *2.5. Wavelength Selection and Chemometric Analyses*

Spectral chemometric analyses were performed first on selected wavelengths and then on the full spectra collected. Wavelength selection was carried out through interval partial least squares (iPLS) [36] and principal component regression (PCR) [37], using R software version 3.2.5 (R Core Team, Auckland, New Zealand, 2016) and WinISI software (Infrasoft International, Port Matilda, PA, USA), respectively. In particular, iPLS was applied in forward mode, in which the full spectrum is subdivided into 30 intervals that are successively included in the analysis: the first step calculates 30 models (one for each interval), which are tested using cross-validation, and the interval that provides the lowest root-mean-square error of cross-validation is selected as the most informative. The selected intervals were determined for each parameter investigated and used for the subsequent modelling. PCR is based on identifying the principal factors of variance among the spectral absorbance data through principal component analysis [38]. Wavelength selection was performed on the spectra acquired with the FOSS DS-2500 on cocoa powder.
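The first forward step of iPLS described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the R procedure used in the study: the PLS model is restricted to a single latent variable to keep the code short, folds are assigned by a seeded shuffle, the number of variables is assumed to be at least the number of intervals, and all function names are hypothetical. Later forward steps, which add further intervals to the best one, are not shown.

```python
import math
import random


def pls1_one_component(X, y):
    """Fit a one-component PLS1 model and return a predict(row) function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
    q = sum(t[i] * yc[i] for i in range(n)) / (sum(v * v for v in t) or 1.0)

    def predict(row):
        score = sum((row[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict


def rmsecv(X, y, k=5, seed=0):
    """Root-mean-square error of k-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for fold in (idx[f::k] for f in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = pls1_one_component([X[i] for i in train],
                                     [y[i] for i in train])
        sq_err += sum((predict(X[i]) - y[i]) ** 2 for i in fold)
    return math.sqrt(sq_err / len(y))


def split_intervals(n_variables, n_intervals=30):
    """Contiguous, near-equal blocks of column indices."""
    bounds = [i * n_variables // n_intervals for i in range(n_intervals + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(n_intervals)]


def best_interval(X, y, n_intervals=30, k=5):
    """One model per interval; return (RMSECV, columns) of the best one."""
    scores = []
    for cols in split_intervals(len(X[0]), n_intervals):
        sub = [[row[j] for j in cols] for row in X]
        scores.append((rmsecv(sub, y, k), cols))
    return min(scores, key=lambda s: s[0])
```

Calling `best_interval(X, y)` with `X` as a list of spectra (one row per sample) and `y` as the reference values returns the interval with the lowest RMSECV together with its column indices.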

The second approach used the full spectrum together with mathematical treatments, as reported by several authors [39–41] for chemical prediction in foods. These treatments include multiplicative scatter correction (MSC), used to correct for light scattering in reflectance spectroscopy, spectra normalization using the standard normal variate (SNV), and first or second derivatives, which are often used to remove baseline offset and slope from the spectrum [42]. This approach was applied to spectra acquired with both instruments, on both whole and ground cocoa beans.
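For readers unfamiliar with these scatter corrections, the sketch below shows textbook forms of SNV and MSC in plain Python. It is not the WinISI implementation, which may differ in detail (for example, detrending is not included), and the function names are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum
    by its own mean and sample standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in spectrum) / (n - 1))
    return [(a - mean) / sd for a in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is regressed on the mean spectrum m (x ~ a + b * m)
    and corrected as (x - a) / b.
    """
    n_wl = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_wl)]
    ref_mean = sum(ref) / n_wl
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_wl
        slope = sum((r - ref_mean) * (a - s_mean)
                    for r, a in zip(ref, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(a - offset) / slope for a in s])
    return corrected
```

SNV treats each spectrum independently, whereas MSC depends on the reference (mean) spectrum of the set, so MSC-corrected spectra change if samples are added or removed.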

The calibration models were developed using modified PLS (MPLS) regression on the selected wavelengths and on the full spectra, whereas PCR was applied to the full spectra (WinISI software, Infrasoft International, Port Matilda, PA, USA). Prediction equations were validated using 5-fold cross-validation. Samples with a predicted value that differed by more than 2.5 SD from the reference value (T-statistics) were considered outliers and removed from the dataset. Several combinations of scatter corrections (NONE, no correction; SNV\_DT, standard normal variate and detrending; MSC, multiplicative scatter correction) and derivative mathematical treatments (0,0,1,1; 1,4,4,1; 2,5,5,1; where the first digit is the order of the derivative, the second is the gap over which the derivative is calculated, the third is the number of data points in the first smoothing and the fourth is the number of data points in the second smoothing) were tested. The performance of the prediction models was evaluated based on the standard error of calibration (SEC), the standard error of cross-validation (SECV), the coefficient of determination of cross-validation (R2cv) and the ratio of performance to deviation of cross-validation (RPDcv), calculated as the ratio between SD and SECV [43]. Predictions were considered excellent when R<sup>2</sup> was greater than 0.91, good when R<sup>2</sup> ranged from 0.82 to 0.90, approximate when R<sup>2</sup> was between 0.66 and 0.81, and poor when R<sup>2</sup> was less than 0.66 [44]. Prediction models with RPD greater than 2.5 were considered adequate for analytical purposes [45], whereas prediction models with RPD smaller than 1.5 were considered unsatisfactory [44].
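The evaluation criteria above can be expressed as a small helper. This is a sketch with hypothetical function names. Two choices are assumptions: the class boundaries are treated as inclusive lower bounds (so a value reported as 0.91 counts as excellent), and RPD values between 1.5 and 2.5, which the text does not name, are labeled "intermediate".

```python
def rpd(sd_reference, secv):
    """Ratio of performance to deviation: SD of reference data / SECV."""
    return sd_reference / secv


def classify_r2(r2):
    """Prediction quality classes based on the coefficient of determination."""
    if r2 >= 0.91:
        return "excellent"
    if r2 >= 0.82:
        return "good"
    if r2 >= 0.66:
        return "approximate"
    return "poor"


def classify_rpd(value):
    """RPD interpretation; 'intermediate' is a label introduced here."""
    if value > 2.5:
        return "adequate for analytical purposes"
    if value < 1.5:
        return "unsatisfactory"
    return "intermediate"


# Hypothetical SD and SECV for one trait.
value = rpd(sd_reference=1.20, secv=0.40)
print(round(value, 2), classify_rpd(value), classify_r2(0.88))
```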

#### **3. Results and Discussion**

#### *3.1. Chemical Properties*

Shell content was on average 13.25% (Table 1), with minimum and maximum values (11.13% and 18.34%, respectively) in line with those reported in the literature (12–20%) [46,47]. Although the shell protects the nib from mold and insect infestation, the shell content should be as low as possible (10–14%) because it has very little commercial value for the cocoa processor: it is removed during cocoa bean processing and mainly constitutes a waste material [48].


**Table 1.** Descriptive statistics of cocoa beans: SD = standard deviation; CV = coefficient of variation; TPC = total phenolic content; TA = titratable acidity; FI = fermentation index; DM = dry matter.

Dry matter was on average 94.51%, with a minimum of 93.30%. These values correspond to an average moisture content of 5.49% and maximum of 6.70%, which are mainly below the optimal commercial levels of 6.5–8.0% as reported in CAOBISCO/ECA/FCC [19] but are in line with data found in the literature [49]. Moisture is a parameter that depends on storage conditions: since storage conditions of the studied samples varied, this may have affected the final moisture levels.

The average ash content of 2.99% found in our samples was in line with data reported in the literature [48,50]. With regard to fat, which is the most abundant macronutrient in cocoa beans, only one sample presented a value below 40 g/100 g (i.e., 36.96 g/100 g), while the average fat content was 44.72 g/100 g. These data are in line with other studies [21,50]. African cocoa beans generally have a higher fat content than American beans [16], but this was not observable in our set of samples. However, according to the literature, the fat content can vary greatly, from about 40 g/100 g to 57–58 g/100 g, depending on factors such as genotype, plant age, growing practices, fermentation, drying processes and environmental conditions [51,52].

FI is one of the most used parameters for determining the degree of fermentation of cocoa beans as an indirect measure of the anthocyanin content [29,35]. In our case study, 22 out of 56 samples had a FI slightly below 1, with a minimum value of 0.57, which would indicate a low fermentation degree. The maximum value was 2.24 and the average was 1.29. The coefficient of variation for this parameter was particularly high (38.43%). Since the FI is an indirect measurement of anthocyanin content, the high dispersion of data might be due to factors other than solely the fermentation degree. It has been reported that different hybrids or genotypes have different pigments and that phenolic compounds are quantitatively affected by cocoa growth conditions (microclimate and position of pods on the tree) [29].

The TPC in the dried fat-free mass of our samples exhibited a wide variation, ranging from 32.58 to 98.04 mg/g dry defatted powder. In fermented beans, TPC should be approximately 5% of the dried fat-free mass, and values above 10% are considered a sign of poor fermentation [53]. The average value of TPC in our samples was 56.42 mg/g dry defatted powder (equal to 5.6%), which would indicate well-fermented beans. Moreover, few samples showed values close to 10%. Overall, the values are in line with those reported in Anyidoho, et al. [54] and Djikeng, et al. [55].

In dried cocoa beans, a high degree of acidity is usually associated with a pH of 5.0 or less [19]. Some studies report that beans of higher pH (5.5–5.8) are considered unfermented, with a low fermentation index, and result in chocolates with high astringency [32], while beans of lower pH (4.75–5.19) are considered well-fermented. Other studies report that a pH of 5–6 is considered good for flavor development, and that cocoa beans with pH below 4.5 are not accepted by cocoa bean processors because they show low levels of flavor precursors and high levels of acid-derived products [35]. The pH can still be considered a good indicator of fermentation, as higher pH correlates with a lower fermentation degree [16], and an "international acceptable range" of 5.00–5.55 for dried cocoa beans [56] can be considered a valid reference. In our case study, cocoa beans had an average pH of 5.58 with a minimum of 4.84. This describes a situation of well-fermented samples.

The titratable acidity value is often associated with the beans' pH. The present results confirm an overall good fermentation of the samples with an average titratable acidity of 17.19 mmol NaOH/100 g powder, in line with data reported in the literature [57,58].

Overall, this set of samples included many variation factors (e.g., genetic variety, crop, fermentation and drying conditions, transport, and storage) giving rise to high coefficients of variation in most of the studied parameters [59].

#### *3.2. Spectral Characteristics of Cocoa Samples*

Figure 1 shows representative average NIR spectra of cocoa bean samples obtained with the FOSS DS 2500. The spectra show high similarity with spectra found in the literature [21,27,50,51]. Since cocoa beans contain about 50% fat (Table 1), absorption spectra are dominated by signals derived from C=O and CH2 groups [49]. The absorptions around 1930 nm are caused by the second overtone vibration of ester C=O and O–H asymmetric stretching [49,60]. Caporaso, et al. [61] reported that the wavelength of 1919 nm has been attributed to the C=O stretching second overtone in the carbonyl groups (–CO2H or CONH), but this absorption band is very close to 1923 nm, which is assigned to the O–H group of water, and therefore it might be influenced by this group.

**Figure 1.** NIR spectra (mean) of whole (gray) and ground (black) cocoa beans acquired with benchtop spectrometer (FOSS DS 2500) and NIR spectra (mean) of whole (yellow) and ground (green) cocoa beans acquired with portable NIR spectrometer (PoliSPEC-NIR).

The combination vibrations of CH2 stretch and CH2 deformation appear around 2320 nm. Moreover, the absorption at 1744 nm has been previously assigned to C–H stretch first overtone (CH2) of lipids, and the CH2 group also absorbs at 1725 nm, due to the C–H stretch first overtone [61]. Similar wavelength values (i.e., 1750 nm and 1730 nm), associated with first overtones of symmetric and anti-symmetric C–H stretch vibration (CH2-groups), are reported by Krahmer et al. [49].

Fat content is also related to the absorption bands visible around 1200 nm, as reported by Hayati et al. [27]. The authors also argued that the bands in the wavelength regions of 1460–1490 nm and 1920–1980 nm are most likely related to moisture content (O–H bonds). However, absorbances around 1450 nm have been attributed to carbonyl groups (e.g., ketones and aldehydes) as well as O–H polymeric groups, which can be due to complex carbohydrates, and the region between 1400 nm and 1440 nm has also been attributed to aliphatic alcohols and phenols [61].

Absorbance around 1490 nm has been attributed in the literature to several possible chemical bond vibrations, including N–H stretch first overtone and O–H stretch first overtone, thus indicating amides or compounds such as cellulose [61]. Accordingly, Krahmer et al. [49] reported that first overtones of intermolecular H-bridges and stretch vibrations of amidic NH-groups can be observed in the region of 1400 to 1500 nm and the corresponding combination of two amides can be found around 2130 nm.

Barbin et al. [50] associated the broad peaks around 1190, 1460 and 1950 nm with O–H, C–H, N–H stretch first and second overtones and combination bands that can be attributed to water absorption and protein changes.

Peaks around 1215 nm are visible and are associated with –CH=CH second overtone [23] and even C–H stretching second overtone (–CH3 or –CH2) of carbohydrates is associated with this wavelength [61].

The absorbance at 2057 nm indicates an N–H stretch/amide 1st combination band, which has been attributed to protein, while the peaks at 2145 and 2313 nm have been tentatively attributed to C–H deformation and C–H deformation and C–H bend second overtones respectively, both indicating lipids [61].

#### *3.3. Calibration Models for Cocoa Beans Quality*

Variable selection is generally applied in multivariate analysis to extract the most informative region and remove redundant information. However, among the approaches tested in this study, lower prediction performance was observed for PCR than for MPLS, as also observed in the study of Xie et al. [37]. In detail, in the present study, PCR showed poor prediction performance for all traits investigated (see Supplementary Table S2).

Comparing the prediction performance of MPLS on full spectra and on iPLS-selected spectra, the best prediction was achieved using the full spectra for seven of the eight parameters (see Supplementary Table S3). The iPLS wavelength selection performed better for fat prediction (R2cv of 0.86 and RPD of 2.88), although this did not differ substantially from the prediction obtained using the whole spectrum (900–1680 nm; R2cv = 0.83 and RPD = 2.43).

The results of prediction performance for the benchtop (NIR FOSS DS 2500) and the portable (PoliSPEC-NIR) spectrometers are presented in Tables 2 and 3, which describe data obtained from whole cocoa beans and peeled-ground cocoa beans, respectively.


**Table 2.** Fitting statistics of prediction models for whole cocoa traits developed using cross-validation results for benchtop (NIR FOSS DS 2500) and portable (PoliSPEC-NIR) NIR-spectrometers.

NONE = no correction; SNV\_DET = SNV and detrend; MSC = multiplicative scatter correction; SD = standard deviation of reference data selected; SEcal = standard error in calibration; R2cal = coefficient of determination of calibration; SEcv = standard error in cross-validation; R2cv = coefficient of determination of cross-validation. TPC = total phenolic compound; TA = titratable acidity; FI = fermentation index; DM = dry matter.


**Table 3.** Fitting statistics of prediction models for ground cocoa traits developed using cross-validation results for benchtop (NIR FOSS DS 2500) and portable (PoliSPEC-NIR) NIR-spectrometers.

NONE = no correction; SNV\_DET = SNV and detrend; MSC = multiplicative scatter correction; SD = standard deviation of reference data selected; SEcal = standard error in calibration; R2cal = coefficient of determination of calibration; SEcv = standard error in cross-validation; R2cv = coefficient of determination of cross-validation. TPC = total phenolic compound; TA = titratable acidity; FI = fermentation index; DM = dry matter.

Generally, most cocoa studies have been performed on ground cocoa to reduce the effects of the physical sample properties on spectra collection [24]. Indeed, for both NIRS devices, the best prediction performances were observed on ground samples, probably due to the enhanced homogeneity of the samples, which have a more uniform particle size and form a more compact powder, both of which affect the scattering of light.

In this study, spectral corrections by mathematical treatments that remove irrelevant information, such as noise and background, were evaluated. In particular, SNV and MSC were used as pre-processing methods to remove the influence of solid particle size and surface scattering; moreover, these methods are widely recognized as the best mathematical treatments in the equation models developed for whole cocoa. The SNV\_DT and MSC treatments improved the prediction accuracy for some quality parameters of both whole and ground cocoa bean samples, while for other parameters raw spectra gave the best results. This was in line with Barbin et al. [50], who found no considerable improvement of the predictive ability when comparing different pre-processing methods with the original raw data. Indeed, Barbin et al. [50] stated that since the complexity of the models was similar to that obtained with the original data, it is feasible to use the raw spectra to build prediction models for both whole beans and ground cocoa samples.

Moreover, to evaluate the performance of technologies on the market, the whole spectrum was used to develop the prediction equations, although some researchers suggest that selection of spectral intervals could lead to higher prediction performances [35,62]. All the predictions obtained on whole bean samples can be considered approximate to poor [44], with the highest capability achieved for DM (R2cv = 0.72; RPDcv = 1.86) with the benchtop and for pH (R2cv = 0.70; RPDcv = 1.83) with the portable device (Table 2). In general, the lower prediction capability in whole cocoa beans compared to ground samples has also been confirmed in the study of Hernández-Hernández et al. [63], in which the poor performance of chemical predictions was attributed to the shell, which reflects the incident light and hinders its interaction with internal constituents. Although predictions on whole cocoa beans were not adequate for quantitative purposes, they could represent a fast approach for food business operators to sort cocoa beans towards a specific transformation according to high or low value. Moreover, at germplasm banks and in breeding programs, a rapid whole cocoa analysis reduces the time required for shell removal (usually carried out by hand in the laboratory), suggesting that NIRS devices are capable of identifying functional genotypes to improve qualitative aspects in cocoa products [63].

Excellent performance was obtained in ground cocoa for protein content (R2cv = 0.91; RPDcv = 3.40) and very good prediction was achieved for DM (R2cv = 0.90; RPDcv = 3.20), ash (R2cv = 0.89; RPDcv = 2.98), pH (R2cv = 0.88; RPDcv = 2.96) and TA (R2cv = 0.86; RPDcv = 2.70) using the NIR FOSS DS 2500 spectrometer (850–2500 nm) (Table 3). The PoliSPEC-NIR spectrometer (900–1680 nm) had the best predicting performances for fat content (R2cv = 0.82; RPDcv = 2.34) in ground samples; whereas, for the other traits, the portable device showed lower performances compared to the benchtop (Table 3).

To investigate further whether the divergences between the devices might depend on the different spectral range used, an additional prediction equation was developed for the benchtop using the same spectral range as the portable tool (900–1680 nm, every 2 nm). Compared with the performance obtained using the whole spectrum, better prediction performance was observed for ash (R2cv = 0.90; RPDcv = 3.20), protein (R2cv = 0.93; RPDcv = 3.84), DM (R2cv = 0.94; RPDcv = 4.16), and lipids (R2cv = 0.83; RPDcv = 2.43). Although the TPC remained unpredictable, an improvement was observed in the new prediction equation (R2cv = 0.46; RPDcv = 1.37). Although a good predictive capability was maintained, lower prediction performance was observed for TA (R2cv = 0.85; RPDcv = 2.60) and pH (R2cv = 0.82; RPDcv = 2.34).
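Restricting a benchtop spectrum to the portable range can be sketched as below. The approach shown, which keeps existing data points that fall on a 2 nm grid, is an assumption: the text does not say whether points were selected, averaged, or interpolated. The 0.5 nm benchtop grid from 850 to 2500 nm used in the example is hypothetical, as are the function name and the absorbance values.

```python
def restrict_range(wavelengths, absorbances,
                   start=900.0, stop=1680.0, step=2.0):
    """Keep only points within [start, stop] that lie on the step grid."""
    kept_wl, kept_abs = [], []
    for wl, a in zip(wavelengths, absorbances):
        if not (start <= wl <= stop):
            continue
        position = (wl - start) / step
        if abs(position - round(position)) < 1e-9:  # on the 2 nm grid
            kept_wl.append(wl)
            kept_abs.append(a)
    return kept_wl, kept_abs


# Hypothetical benchtop grid: 850-2500 nm in 0.5 nm steps, flat spectrum.
grid = [850 + 0.5 * i for i in range(3301)]
spectrum = [0.5] * len(grid)
wl, ab = restrict_range(grid, spectrum)
print(len(wl), wl[0], wl[-1])
```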

Thus, to understand the origin of the performance divergences between devices, the component loadings were computed for each tool to assess and compare the interactions between wavelengths and functional groups (Figure 2). The loading plots help clarify which wavelengths are most informative for the variability of a specific trait, showing the range that contributes most to the model. Overall, a strong similarity between portable and benchtop devices was observed for the chemical parameters that were directly quantified.

In particular, although the same ranges and performance of prediction (R2cv = 0.83) were obtained in both devices for lipid loading plot, the highest loadings were observed in the spectral region between the 1212 and 1232 nm and 1368 and 1398 nm for portable and benchtop, respectively.

Such an association between those ranges and lipid variability has been confirmed by [64] in cereal food products. Similar patterns for the protein loading plot were observed between the two devices; however, the high loadings observed between 1200 and 1400 nm were related to the C–H second overtone and the N–H stretching first overtone of protein, respectively [65,66]. Moreover, a high loading was observed around 1100 nm exclusively for the benchtop device; this is probably due to the higher sensitivity of the device, which is reflected in the best prediction performance (R2cv = 0.93), the range 1100–1400 nm being considered an essential spectral region for protein quantification [67].

A comparable loading plot was also observed for pH in which the highest trait variability was explained by the 910 [68] and 1398 nm for both devices. Divergences in titratable acidity loading patterns were found; however, the most informative wavelengths (930–950; 1106; 1390–1400 nm) are related to the second combination region of the carboxylic acids [69]. The loading plot of DM showed notable peaks between 1200–1224 and 1373–1394 nm, mainly related to the water [69].

**Figure 2.** Loadings for the first principal component of fat, protein, pH, titratable acidity (TA), total phenolic compound (TPC), fermentation index (FI), dry matter (DM), and ash for NIR spectra of the ground cocoa samples for DS 2500 (blue line) and PoliSPEC NIR (green line).

Ash, being inorganic matter, cannot be directly detected by NIRS; its amount is measured indirectly through its association with organic bonds. Thus, the highest loadings observed for ash, at 1200 nm and 1376 nm for the benchtop and at 1396 nm for the portable device, account for other organic components. In contrast, the loading plots observed for TPC and FI were not strictly related to a specific spectral range, probably due to the lower variability in the samples considered. In general, the performance divergences between the two NIRS devices could be explained by the difference in the detectors; in detail, the semiconductors included in the portable (PoliSPEC-NIR) and benchtop (NIR FOSS DS 2500) devices are indium gallium arsenide (InGaAs) and silicon and lead sulfide, respectively, which affect the spectral response and the prediction capability (Lin, et al. [70]).

In our study, the accuracy of prediction for both FI and TPC was not satisfactory for either instrument, on either whole or ground cocoa bean samples. The influence of variable fermentation degrees of cocoa samples can be crucial in the prediction of FI and TPC, which are strictly related to the fermentation level of cocoa beans. Sunoj, Igathinathane and Visvanathan [32] showed how factors such as pod storage duration (before the fermentation process) and fermentation time had a significant effect on the fermentation index, which was seen to increase as these two parameters increased. The authors argued that these parameters are indirectly affected by the samples' chemical composition; thus, the accuracy of prediction models is generally lower than that reported for major components. The reason might lie in the fact that our samples included only commercial cocoa beans, which were supposed to be well-fermented, although with some natural variations, thus reducing the variability for the TPC and FI. Moreover, there could have been a negative influence of lipid absorbances in the models for TPC: fat has been indicated as a disturbance factor, as beans with higher relative fat content have lower non-fat solids, where polyphenols are concentrated [61].

Although the FI was not correctly predicted by the constructed models, the estimations in ground samples of parameters related to correct fermentation, such as pH and TA, were approximate and good with the portable and benchtop devices, respectively, in line with previous results [25,49]. This method could provide a rapid and low-cost multiparametric analysis for cocoa evaluation. Portable instruments are usually less expensive than benchtop solutions (about a fifth of the cost) [71], and the cost of analyses is mainly related to the development and upgrade of calibration curves. Moreover, compared to wet analyses, spectrometric methods drastically reduce the cost of each analytical determination as the number of examined samples increases.

The presented prediction models might be the basis for an overall cocoa bean quality evaluation based on NIR spectra. However, although the presented parameters are good indicators of cocoa bean quality, a grading classification of cocoa beans was beyond the scope of the present work, as it would require the investigation of other indicators, also related to the sensorial profile of the beans, as reported in previous studies on cocoa quality indexes (CQI) [72,73].

#### **4. Conclusions**

The results of this paper demonstrated that portable and benchtop NIRS devices coupled with chemometric methods could be adopted for the chemical evaluation of commercial cocoa beans. Prediction performance is affected by the presence of the shell and by the particle size of the cocoa bean samples. The current study has demonstrated that NIR, as a nondestructive analytical method, can be considered a rapid and reliable alternative to traditional methods for quantifying lipids, protein, pH, titratable acidity, dry matter and ash in ground cocoa beans.

The NIRS benchtop instrument provided better quantification performance than the portable device, considering both the whole (800–2500 nm) and the reduced spectrum (900–1680 nm). Variable selection through iPLS or PCR did not improve prediction models compared to full spectra analyses. The benchtop instrument showed excellent prediction capability for DM (R2cv = 0.94), protein (R2cv = 0.93) and ash (R2cv = 0.90), whereas lipids (R2cv = 0.83), TA (R2cv = 0.86) and pH (R2cv = 0.88) were well predicted on ground beans considering wavelengths between 900–1680 nm. These results indicate that models developed for benchtop devices are applicable to cocoa quality control as an excellent alternative to conventional methods.

On the other hand, the NIRS portable device showed lower but still valuable prediction performance compared with the benchtop spectrometer. The predictions obtained with the handheld device represent an appealing strategy for food business operators to apply in the field, to control and check the product in every phase of trade and transportation, and also to segregate whole cocoa beans targeted to a specific transformation in different supply chains.

Based on these results, further studies including a wider variability of fermentation phases, cocoa bean varieties and origins as well as additional production steps of the cocoa supply chain could be investigated to support the fair-trade cocoa sector.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/xxx/s1, Table S1: Cocoa origin of different commercial lots analysed in this study; Table S2: Fitting statistics of prediction models for ground cocoa traits developed using full spectra and principal component regression (PCR) and cross-validation results for benchtop (NIR FOSS DS 2500); Table S3: Fitting statistics of prediction models for ground cocoa traits developed using selected wavelengths through the interval PLS (iPLS) and cross-validation results for benchtop (NIR FOSS DS 2500).

**Author Contributions:** Conceptualization, P.C., M.F., K.D. and L.F.; methodology, D.V.d.W., K.D., M.M.; formal analysis, M.F., S.C. and M.M.; resources, P.C. and L.F.; data curation, M.F., S.C. and M.M.; writing—original draft preparation, M.F. and S.C.; writing—review and editing, S.C., D.V.d.W., K.D., L.F. and P.C.; supervision, P.C. and L.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** All related data and methods are presented in this paper. Additional inquiries should be addressed to the corresponding author.

**Acknowledgments:** The authors gratefully acknowledge Roberta Marcuzzi for the valuable help in performing the analyses.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


