### **1. Introduction**

Within the last two decades, biomass materials such as lignocellulosic residues have received increasing attention from the scientific community because of their potential for producing biofuels, as well as new value-added compounds and biomaterials, within a "biorefinery" approach.

Olive trees are native to the Mediterranean region, but their cultivation has spread globally during the past two decades owing to the health benefits attributed to olive oil consumption. Olive trees are currently cultivated in more than 40 countries, and the total dedicated surface was about 10.8 million ha in 2017 [1]. Olive tree pruning (OTP) takes place immediately after fruit harvesting and is an essential operation performed every two years. In the Mediterranean region, the residual biomass yield from olive tree pruning ranges from 1 to 5 and from 4 to 11 oven-dry t ha<sup>−1</sup> for Spanish and Italian orchards, respectively [2]. Older branches are cut down, gathered into the center of each row of trees, and further treated. This agricultural residue must be promptly removed from the fields; otherwise, there is a risk of plant diseases. Currently, there are two disposal options for this pruning biomass: grinding it and scattering the chips over the field, or burning it directly. Direct burning is the most common disposal method, even though it exposes the field to the risk of uncontrollable fire and the associated economic cost; some initiatives are being carried out to develop collection techniques [3]. It has been proposed to use this waste as a raw material for a broad range of products [4,5], including energy, biofuels such as bioethanol [6–9], antioxidant compounds [10,11], oligosaccharides [12], and others [13].

For any application of olive tree pruning residue, it is essential to know its chemical composition in order to establish the most suitable conversion process. Moreover, the ratio of the main structural components (cellulose, hemicellulose, and lignin), as well as of other minor components, is crucial for determining the potential market applications of a particular biomass. Current methods for characterizing the chemical composition of biomass feedstocks are labor-intensive and time-consuming and thus do not meet industrial requirements for rapid measurements. Moreover, they are often based on wet-laboratory analysis, during which the integrity and structure of the material are destroyed [14]. By contrast, non-destructive spectroscopic techniques provide analytical information without damaging the sample and, in most cases, require no sample treatment. Vibrational techniques such as IR spectroscopy are amongst the non-destructive spectroscopic techniques that have attracted most interest in recent years [15,16]. Additionally, infrared spectroscopy techniques are fast, accurate, and low-cost analytical methods with high potential for biomass composition analysis [17,18]. In this context, diffuse reflectance near infrared spectroscopy (NIR) has been suggested as a rapid and non-destructive method to replace reference methods for determining the lignocellulosic components of feedstocks and their capacity for bioethanol production [19]. In fact, near-infrared spectroscopic scanning coupled with multivariate calibration methods has been used to characterize different herbaceous feedstocks [20], switchgrass [21], pine [22], yellow-poplar [23], willow [24], *Miscanthus* [25], and bamboo [26]. NIR spectroscopy has also been applied to olive residues to analyze solid fuels for heat and power generation [27].
However, to date, there is no literature on the application of NIR to determine the chemical composition of olive tree pruning biomass for liquid biofuels such as bioethanol or other valuable products.

This work attempts to demonstrate the effectiveness of near infrared diffuse reflectance spectroscopy (NIR) as a rapid, non-destructive alternative to wet chemical analysis for determining structural carbohydrates, lignin, and ash in olive tree pruning biomass.

### **2. Materials and Methods**

### *2.1. Raw Material*

Olive tree pruning was collected from different locations in Andalusia (Spain) after fruit harvesting. A total of 79 samples were gathered, 64 of which were used to build the calibration models and 15 for external validation. Samples were collected manually using pruning shears and consisted of shoots less than 3 cm in diameter, with approximately 70% leaves and 30% thin stems by weight. All samples were air-dried at room temperature to an equilibrium moisture content of about 10% and milled to 2 mm particle size using a centrifugal mill (Retsch ZM200, Retsch GmbH, Haan, Germany). One sub-sample of milled olive tree pruning was used for chemical composition analysis (cellulose, hemicellulose, total lignin, extractives, and ash) and another for NIR spectroscopy.

### *2.2. Wet Chemical Composition Analysis*

The chemical composition of the olive tree pruning was determined according to the National Renewable Energy Laboratory (NREL) procedures for biomass composition analysis [28]. First, the extractives content was determined with a Dionex Accelerated Solvent Extractor System (ASE 200), using water and ethanol as solvents. After extraction, cellulose and hemicellulose contents were determined from the monomeric sugars released by a two-step acid hydrolysis that fractionates the fiber: a first step with 72% (w/w) sulphuric acid at 30 °C for 60 min, followed by a second step in which the reaction mixture was diluted to 4% sulphuric acid and autoclaved at 121 °C for 1 h. Sugar concentrations were determined by high-performance liquid chromatography (HPLC) on a Waters 2695 liquid chromatograph with a refractive index detector, using a CARBOSep CHO-682 LEAD column (Transgenomic, Omaha, NE, USA) operated at 75 °C with Milli-Q water (Millipore Corporation, Billerica, MA, USA) as the mobile phase (0.5 mL/min). An anhydro correction was applied to the monomeric sugar results to calculate the corresponding polysaccharides. The factors used to convert sugar monomers to anhydromonomers were 0.90 for glucose to glucan, galactose to galactan, and mannose to mannan, and 0.88 for xylose to xylan and arabinose to arabinan. Hemicellulose was calculated as the sum of xylan, arabinan, galactan, and mannan. Total lignin was calculated as the sum of acid-soluble and acid-insoluble lignin. All measurements were done in triplicate, and results are expressed as percentages on an oven-dry weight basis. Results from the wet chemical analysis were used for calibration and validation of the NIR method.
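The anhydro correction and summations described above amount to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, glucan is assumed to represent cellulose (the text does not state this explicitly), and the input values are invented rather than taken from this study.

```python
# Anhydro correction factors stated in Section 2.2 (hexoses 0.90, pentoses 0.88)
ANHYDRO_FACTOR = {
    "glucose": 0.90,
    "galactose": 0.90,
    "mannose": 0.90,
    "xylose": 0.88,
    "arabinose": 0.88,
}
HEMICELLULOSE_SUGARS = ("xylose", "arabinose", "galactose", "mannose")


def polysaccharides(monomers):
    """Convert monomeric sugars (% oven-dry biomass) into cellulose and hemicellulose.

    Assumption: cellulose is taken as glucan, i.e. all glucose is attributed to cellulose.
    """
    polymers = {sugar: amount * ANHYDRO_FACTOR[sugar] for sugar, amount in monomers.items()}
    cellulose = polymers["glucose"]
    hemicellulose = sum(polymers[sugar] for sugar in HEMICELLULOSE_SUGARS)
    return cellulose, hemicellulose


def total_lignin(acid_insoluble, acid_soluble):
    """Total lignin is the sum of acid-insoluble and acid-soluble lignin."""
    return acid_insoluble + acid_soluble


# Hypothetical triplicate-averaged monomer contents (% w/w, oven-dry basis)
sugars = {"glucose": 20.0, "xylose": 10.0, "arabinose": 2.0, "galactose": 1.0, "mannose": 1.0}
cellulose, hemicellulose = polysaccharides(sugars)
print(f"cellulose {cellulose:.2f} %, hemicellulose {hemicellulose:.2f} %")  # cellulose 18.00 %, hemicellulose 12.36 %
print(f"total lignin {total_lignin(15.0, 2.5):.2f} %")  # total lignin 17.50 %
```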

### *2.3. NIR Spectroscopy*

A total of 79 whole-biomass samples, ground to 2 mm particle size, were analyzed. The samples were dried in an oven at 40 °C for 48 h before testing. NIR spectra were measured in spinning Petri dishes using a Perkin Elmer Spectrum One NTS NIR spectrometer (Perkin Elmer Inc., Beaconsfield, UK) with a diffuse reflectance accessory.

Each spectrum was obtained by averaging 70 scans; six spectra were measured per sample, and their mean spectrum was used to build the predictive models. The spectral range selected for analysis was 10,000 to 4000 cm<sup>−1</sup>, with a spectral resolution of 8 cm<sup>−1</sup>.
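Averaging replicate spectra is straightforward. The sketch below assumes the replicates are reflectance values R between 0 and 1 that are converted to apparent absorbance, log10(1/R), before averaging; the text does not state in which units, or in which order, the six replicates were averaged, and the function names are hypothetical.

```python
import numpy as np


def to_absorbance(reflectance):
    """Apparent absorbance A = log10(1/R) for reflectance R in (0, 1]."""
    r = np.asarray(reflectance, dtype=float)
    if np.any(r <= 0):
        raise ValueError("reflectance values must be positive")
    return np.log10(1.0 / r)


def mean_sample_spectrum(replicate_reflectances):
    """Convert each replicate (one row per replicate) to absorbance, then average the rows."""
    absorbances = to_absorbance(replicate_reflectances)
    return absorbances.mean(axis=0)


# Six hypothetical replicate spectra of one sample, three wavenumber points each
rng = np.random.default_rng(0)
replicates = np.clip(0.4 + 0.01 * rng.standard_normal((6, 3)), 1e-6, 1.0)
print(mean_sample_spectrum(replicates))
```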

### *2.4. Development of NIR Calibration*

The spectral and wet chemical data were processed with Spectrum Quant+ software, version 4.51 (Perkin Elmer Inc., Billerica, MA, USA) for chemometric analysis, which includes variable reduction by principal component analysis (PCA) and multiple regression. A partial least squares (PLS) multivariate calibration model was developed. To improve the correlation between the spectra and the concentration data, several spectral preprocessing techniques were applied. Savitzky–Golay smoothing [29] was used to reduce spectral noise. First and second derivatives were applied to remove additive and sloped baseline drifts caused by light-scattering effects, and normalization methods such as multiplicative signal correction (MSC) and standard normal variate (SNV) were used to remove the multiplicative effects produced by differences in sample particle size. Derivatives can also be combined with SNV or MSC to obtain better results. All of these pretreatments were tested.
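The preprocessing chain can be sketched with NumPy alone. This is an illustrative reimplementation, not the Spectrum Quant+ code: the five-point window follows Section 3.2.2, while the second-order polynomial, the "valid" edge handling (which trims two points at each end), the sample (n − 1) standard deviation in SNV, and the use of the mean spectrum as MSC reference are assumptions. Derivatives are taken per data point; divide by the wavenumber spacing raised to the derivative order to express them per cm<sup>−1</sup>.

```python
import math

import numpy as np


def savgol_coefficients(window, polyorder, deriv):
    """Savitzky-Golay weights from a least-squares polynomial fit over the window."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    vander = np.vander(offsets, polyorder + 1, increasing=True)  # columns x^0 ... x^p
    # Row `deriv` of the pseudo-inverse gives the fitted coefficient of x^deriv at the centre
    return np.linalg.pinv(vander)[deriv] * math.factorial(deriv)


def savgol_derivative(spectra, window=5, polyorder=2, deriv=1):
    """S-G derivative of each spectrum (row), per data point; edges trimmed ('valid')."""
    weights = savgol_coefficients(window, polyorder, deriv)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    return np.array([np.correlate(row, weights, mode="valid") for row in spectra])


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum by its own mean and SD."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative signal correction against a reference (default: mean spectrum)."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)  # row ≈ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected


# Example: first derivative followed by SNV on hypothetical spectra
spectra = np.random.default_rng(1).random((4, 50))
pretreated = snv(savgol_derivative(spectra, window=5, polyorder=2, deriv=1))
print(pretreated.shape)  # (4, 46)
```

Computing the weights from a pseudo-inverse keeps the sketch dependency-free; `scipy.signal.savgol_filter` offers the same filter with edge handling if SciPy is available.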

### *2.5. Analysis and Validation of Calibration Models*

PLS regression builds a linear model from the relationship between the spectral data matrix and the analyte concentration matrix. The calibration results were assessed with statistical parameters that describe how well the calibration fits the data and how well it will predict external samples. The root mean square error of calibration (RMSEC) was used to evaluate the calibration; it measures the standard deviation of the residuals (differences between observed and predicted values) of the regression equation and thus the precision of the fit between the data and the calibration model.
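For reference, the error statistics used in this section are commonly defined as follows. These are the usual textbook forms; the exact formulas implemented in Spectrum Quant+ are not given in the text, and some packages divide the RMSEC sum by n<sub>c</sub> − k − 1, where k is the number of PLS factors. Here y<sub>i</sub> is the reference (wet chemical) value, ŷ<sub>i</sub> the NIR prediction, ŷ<sub>(−i)</sub> the prediction of sample i by a model built without it, n<sub>c</sub> = 64 the number of calibration samples, and n<sub>p</sub> = 15 the number of external validation samples.

$$
\mathrm{RMSEC}=\sqrt{\frac{1}{n_c}\sum_{i=1}^{n_c}\left(\hat{y}_i-y_i\right)^2},\qquad
\mathrm{RMSECV}=\sqrt{\frac{1}{n_c}\sum_{i=1}^{n_c}\left(\hat{y}_{(-i)}-y_i\right)^2},\qquad
\mathrm{RMSEP}=\sqrt{\frac{1}{n_p}\sum_{j=1}^{n_p}\left(\hat{y}_j-y_j\right)^2}
$$

$$
R^2=1-\frac{\sum_i\left(\hat{y}_i-y_i\right)^2}{\sum_i\left(y_i-\bar{y}\right)^2}
$$

R<sup>2</sup><sub>cal</sub>, R<sup>2</sup><sub>cv</sub>, and R<sup>2</sup><sub>ext</sub> follow by summing over the calibration fit, the leave-one-out predictions, or the external set, respectively; some software instead reports the squared Pearson correlation between predicted and measured values.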

The models were validated by both full cross-validation (leave-one-out) and independent (external) validation. Leave-one-out (LOO) cross-validation was used to determine the optimal number of principal components and to compare the predictive ability of different calibration models. In full cross-validation of a dataset of n samples, one sample is left out, a PLS model is built from the remaining n − 1 samples and used to predict the left-out sample, and the procedure is repeated until every sample has been left out once. The full cross-validation results were expressed as the root mean square error of cross-validation (RMSECV) and the cross-validation coefficient of determination (R<sup>2</sup><sub>cv</sub>). In addition, 15 samples were selected randomly from the entire set for external validation of the best calibration models chosen by the cross-validation criterion. The external validation results are expressed as the root mean square error of prediction (RMSEP) and the external validation coefficient of determination (R<sup>2</sup><sub>ext</sub>).
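Leave-one-out cross-validation of a PLS1 model can be sketched in NumPy as follows. This is an illustrative NIPALS implementation written for this explanation, not the algorithm inside Spectrum Quant+; the function names are hypothetical, and mean-centering of X and y is assumed. Applying `pls1_predict` to the 15 held-out samples would give the predictions from which RMSEP and R<sup>2</sup><sub>ext</sub> are computed.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS; returns (x_mean, y_mean, regression vector)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # residuals, deflated after each component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)  # weight vector
        t = Xr @ w              # scores
        tt = t @ t
        p = Xr.T @ t / tt       # X loadings
        c = yr @ t / tt         # y loading
        Xr = Xr - np.outer(t, p)
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


def loo_rmsecv(X, y, n_components):
    """Predict each sample with a model built on the other n - 1; return RMSECV, R2cv."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        preds[i] = pls1_predict(model, X[i:i + 1])[0]
    press = np.sum((preds - y) ** 2)
    rmsecv = np.sqrt(press / n)
    r2cv = 1.0 - press / np.sum((y - y.mean()) ** 2)
    return rmsecv, r2cv, preds


# Hypothetical data: 64 "spectra" with 30 variables and a noisy linear property
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 30))
y = X[:, :3] @ np.array([1.0, -0.5, 0.8]) + 0.1 * rng.normal(size=64)
for k in (1, 3, 5):
    rmsecv, r2cv, _ = loo_rmsecv(X, y, k)
    print(f"{k} factors: RMSECV = {rmsecv:.3f}, R2cv = {r2cv:.3f}")
```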

A key factor in determining the quality of a model is the optimum number of principal components (PCs), or PLS factors, included in it. Too many PCs lead to overfitting, whereas too few produce an inaccurate (underfitted) model. Several criteria exist for selecting the appropriate number of PCs: the variation of the eigenvalues with the number of PCs [30], the minimum RMSECV, or an optimal RMSECV determined with a significance F-test according to the Haaland and Thomas criterion [31]. The latter criterion was chosen in this work.
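As it is usually summarized, the Haaland and Thomas criterion [31] compares the prediction error sum of squares (PRESS) of each candidate model with that of the model with minimum PRESS. The formulation below is a sketch in terms of the cross-validated predictions; the 0.75 probability cut-off (α = 0.25) and the (n, n) degrees of freedom follow the common description of the criterion, and the settings actually used by Spectrum Quant+ are not reported here.

$$
\mathrm{PRESS}(h)=\sum_{i=1}^{n}\left(\hat{y}^{(h)}_{(-i)}-y_i\right)^2,\qquad
\mathrm{RMSECV}(h)=\sqrt{\mathrm{PRESS}(h)/n}
$$

$$
h^{*}=\arg\min_{h}\,\mathrm{PRESS}(h),\qquad
F(h)=\frac{\mathrm{PRESS}(h)}{\mathrm{PRESS}(h^{*})}
$$

$$
h_{\mathrm{opt}}=\min\left\{\,h\le h^{*} \;:\; P\!\left(F_{n,n}\le F(h)\right)<0.75\,\right\}
$$

Because F(h*) = 1 always satisfies the condition, the rule returns the smallest model whose PRESS is not significantly larger than the minimum, which guards against adding factors that only fit noise.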

### **3. Results and Discussion**

### *3.1. Wet Chemical Composition*

The chemical composition of olive tree pruning biomass is presented in Table 1. Results include the mean composition, standard deviation (SD), coefficient of variation (CV), and concentration range.

In OTP biomass, cellulose content ranges from 8.6 to 19.8% (w/w). Hemicellulose, lignin, and ash contents of the OTP samples range from 9.4 to 16.3%, 15.0 to 19.9%, and 5.1 to 10.0% (w/w), respectively. Notably, this lignocellulosic residue has an extractive content close to 40% (range 32.5–46.7%, w/w). The coefficient of variation (CV) expresses the SD as a percentage of the mean and is often preferred to the SD as a measure of data dispersion, since the SD tends to increase in proportion to concentration. Note the low coefficient of variation for lignin and extractives. Compositional variance must be considered in order to achieve robust models, because calibrations built on a narrow concentration range tend to have limited predictive power.
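The dispersion statistics reported in Table 1 can be computed as below. This is a minimal sketch using Python's `statistics` module; the use of the sample standard deviation (n − 1 denominator) is an assumption, and the values are hypothetical, chosen only to show how two components with the same SD can differ widely in CV.

```python
from statistics import mean, stdev


def describe(values):
    """Mean, sample SD, coefficient of variation (%) and range of one component."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    return {"mean": m, "sd": sd, "cv_percent": 100.0 * sd / m, "range": (min(values), max(values))}


# Hypothetical values (% w/w): same SD, very different means
low_level = [4.0, 6.0, 8.0]      # SD 2, CV about 33 %
high_level = [38.0, 40.0, 42.0]  # SD 2, CV 5 %
for name, values in (("low", low_level), ("high", high_level)):
    stats = describe(values)
    print(name, round(stats["sd"], 2), round(stats["cv_percent"], 1))
```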

The extractive fraction was larger than previously reported for olive tree pruning, where extractive contents ranged from 23.3% (w/w) [32] to 31.4% (w/w) [33]. The compositional variability was attributed mainly to the heterogeneity of the residue (changing proportions of small branches and leaves). The high proportion of extractives could be related to a higher content of leaves in the original samples, since a high extractive content in olive leaves (38.8% dry weight) has been reported by other authors [34].

The summative mass closure is 88.3%; the remaining 11.7% may be attributed to other minor compounds such as acetyl groups (from the hemicellulose fraction), crude protein, and other unanalyzed components such as pectins.


**Table 1.** Summary of chemical composition of calibration set. All data (% w/w) are on a moisture-free basis.

### *3.2. NIR Calibration*

### 3.2.1. Selection of Wavenumber Regions

NIR radiation covers the wavelength range between 750 and 2500 nm (approximately 13,300 to 4000 cm<sup>−1</sup> in wavenumber units). This range contains the first, second, and third overtones and the combination bands of the fundamental vibrations. Because NIR spectral information is highly redundant, selecting wavenumber regions can improve the robustness of multivariate calibration models if the right choices are made. Derivative filters were explored as a tool for enhancing spectral resolution. A set of reflectance spectra (expressed in absorbance units vs. wavenumber) of olive tree pruning samples is shown in Figure 1A. The bands between 10,000 and 7500 cm<sup>−1</sup> have been associated with third overtones, and their low intensity and excessive noise make them less suitable for calibration (He and Hu, 2013). The greatest variability between spectra is observed in the range 7500–4200 cm<sup>−1</sup>, as the first-derivative spectra in Figure 1B show. Three zones are of particular interest: 7100 to 6900 cm<sup>−1</sup>, 6000 to 5600 cm<sup>−1</sup> (associated with first-overtone bands of lignin and extractives), and 5500 to 4000 cm<sup>−1</sup>. The last region includes the main water band at approximately 5200 cm<sup>−1</sup> and combination bands associated with stretching vibrations of the CH, CH<sub>2</sub>, and CH<sub>3</sub> bonds of carbohydrates, although spectral noise in this range is high. The second-derivative spectra (Figure 1C) show two well-resolved bands associated with phenolic compounds, at 6900 and 5980 cm<sup>−1</sup>. However, care should be taken when choosing the wavenumber range in order to avoid a loss of information. A reduced wavenumber region (7500–5500 cm<sup>−1</sup>) was also selected to build calibration models and compare the results with those obtained from the full spectral range.
This range mainly includes the first-overtone bands corresponding to OH stretching vibrations associated with carbohydrates and phenolic groups, and CH stretching vibrations of both aliphatic and aromatic bonds [35]. The reduced spectral range excludes the principal water band, which extends between 5200 and 5000 cm<sup>−1</sup> and consists of multiple overlapping bands whose apparent position shifts from one spectrum to another.
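Converting between wavenumber and wavelength (λ in nm = 10<sup>7</sup>/ν̃ in cm<sup>−1</sup>) and restricting a spectral matrix to the 7500–5500 cm<sup>−1</sup> window are both short operations. In the sketch below, the 2 cm<sup>−1</sup> data spacing and the function names are hypothetical; the actual sampling interval of the instrument is not stated in the text.

```python
import numpy as np


def wavenumber_to_nm(wavenumber_cm):
    """Wavelength in nm from wavenumber in cm^-1 (lambda = 1e7 / wavenumber)."""
    return 1e7 / np.asarray(wavenumber_cm, dtype=float)


def select_region(wavenumbers, spectra, low=5500.0, high=7500.0):
    """Keep the columns (variables) whose wavenumber lies in [low, high] cm^-1."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], np.asarray(spectra, dtype=float)[:, mask]


grid = np.arange(10000.0, 3999.0, -2.0)  # hypothetical 2 cm^-1 spacing, 10,000 to 4000
spectra = np.zeros((3, grid.size))       # placeholder spectra, one row per sample
sub_grid, sub_spectra = select_region(grid, spectra)
print(sub_spectra.shape)                             # (3, 1001)
print(wavenumber_to_nm([7500.0, 5500.0]).round(1))   # [1333.3 1818.2]
```

The reduced window therefore corresponds to roughly 1333–1818 nm.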

**Figure 1.** Near infrared spectra of olive tree pruning samples: (**A**) raw spectra; (**B**) first derivative spectra; and (**C**) second derivative spectra.

### 3.2.2. NIR Calibration Development

PLS regression was applied to the near infrared spectra using different spectral preprocessing methods and spectral ranges. The best prediction models for the chemical components of OTP using the full spectrum (10,000–4000 cm<sup>−1</sup>) and the selected spectral region (7500–5500 cm<sup>−1</sup>) are presented in Table 2. The results show how calibration quality varies with the wavenumber range and the spectral mathematical treatment. In all cases, Savitzky–Golay derivatives with a five-point smoothing window were essential to achieve good correlations. MSC and SNV were compared as normalization methods (either alone or combined with a first-derivative filter), and the two methods gave similar results.

For cellulose and extractives content, the models based on the restricted 7500–5500 cm<sup>−1</sup> range gave the best results, using the first derivative combined with SNV as pretreatment in both cases. The calibration coefficients of determination (R<sup>2</sup><sub>cal</sub>) indicate a good linear fit (0.95 for cellulose and 0.91 for extractives), and lower prediction errors (RMSECV) were found for the restricted-range models: 0.94 for cellulose and 1.75 for extractives. The cross-validation coefficients of determination (R<sup>2</sup><sub>cv</sub>) were also best for the restricted-range models (0.80 for cellulose and 0.65 for extractives).

For hemicellulose and lignin content, the models based on the restricted 7500–5500 cm<sup>−1</sup> range gave results similar to those obtained with the full spectral range, indicating that reducing the wavenumber range does not improve the calibration. For hemicellulose, the goodness-of-fit statistics were also equivalent (0.72 and 0.73), as were the validation parameters: RMSECV of 0.88 and R<sup>2</sup><sub>cv</sub> of 0.72 for the reduced range, and 0.89 and 0.71, respectively, for the full spectrum.

For total lignin, although the R<sup>2</sup><sub>cal</sub> value (0.56) indicates some degree of fit, correlation was relatively low in all models studied, and prediction errors increased rapidly as more principal components were introduced into the model. The data pretreatment in both wavenumber ranges (full and restricted) was the second derivative. Good fits are achieved with a low number of PCs, but prediction quality becomes noticeably worse and RMSECV values are higher than for the other calibration models.

Organic nitrogenous compounds present in this agricultural residue, if insufficiently removed during extraction, can interfere with the wet chemical characterization method used [36]. Their interference with acid-insoluble lignin values has been explained by Maillard reactions. Indeed, with the wet chemical method used, during the second hydrolysis step (121 °C, 1 h), the fraction of organic nitrogen compounds not removed during extraction could react with the sugars released from cellulose and hemicellulose, forming insoluble products that are recovered with the acid-insoluble residue and could therefore cause an overestimation of lignin.

For ash content, the prediction models for OTP developed with the full spectral range showed the best correlation (R<sup>2</sup><sub>cal</sub> of 0.96). Their RMSECV (0.52) clearly indicates better predictive power than that of the reduced-range models (0.74).


**Table 2.** Results of PLS1 calibration and prediction models developed for chemical composition of olive pruning biomass (% w/w).

The predictive statistics in Table 2 are poor for some components, for several reasons. First, the variability of the biomass composition data, expressed as the standard deviation or the coefficient of variation, was particularly low for hemicellulose and lignin. Nevertheless, the calibration and prediction results obtained are similar to those reported by other authors for biomass feedstocks such as yellow poplar [23], *Miscanthus* [25], or a mixture of biomasses [21]. The relatively poor calibration and external validation statistics for lignin have also been described for other complex biomass such as *Eucalyptus* [37].

Figure 2 shows the predictive ability for the five parameters in cross-validation and in the analysis of external samples.

**Figure 2.** Near infrared (NIR)-predicted vs. measured plot of cellulose (**a**), hemicellulose (**b**), lignin (**c**), ash (**d**) and extractives (**e**) content in olive tree pruning: (-) cross validation and (-) external validation.

### 3.2.3. External Validation of Model

The 15 samples not included in the calibration model were used for external validation. These samples were chosen so that their mean and standard deviation were very similar to those of the calibration samples. Figure 2 also shows the prediction results for cellulose, hemicellulose, lignin, ash, and extractives for these 15 OTP samples, obtained with the best model for each parameter in Table 2. The predictive ability of the models for these external samples is defined in this work by the root mean square error of prediction (RMSEP), a measure of the variability of the difference between the predicted and reference values for a set of validation samples. The RMSEP values are shown in Table 2; for the full-range models, they are 1.75 for cellulose, 0.95 for hemicellulose, 2.25 for extractives, 1.38 for lignin, and 0.51 for ash. These RMSEP values are comparable to the cross-validation values, although for cellulose the RMSEP is about 60% larger than the RMSECV. For the models built on the reduced spectral range, the RMSEP values are 2.33 for extractives, 1.06 for cellulose, 0.84 for hemicellulose, 1.00 for lignin, and 0.80 for ash. Nevertheless, the external validation coefficients of determination (R<sup>2</sup><sub>ext</sub>) are higher than the cross-validation coefficients.

In summary, the statistics collected give an indication of the predictive capacity of the models. The RMSEC values indicate a good fit, but the RMSECV values reveal only a modest predictive ability. Other criteria are also available for this purpose: a threshold of 0.5 for R<sup>2</sup><sub>cv</sub> in LOO cross-validation has been proposed, with values above 0.5 indicating good predictive capacity [38], although the usefulness of the LOO procedure as a measure of predictive ability has been questioned [39]. In any case, all calibration models in this work exceed 0.5 except the lignin model. This poor behavior for lignin is not easy to explain. On the one hand, the high extractive content of this biomass may interfere with the spectral information associated with lignin, an interference only partly resolved by applying the second derivative. On the other hand, the total lignin value comprises both acid-insoluble and acid-soluble lignin, and the latter measurement may carry errors associated with the use of an absorptivity coefficient not specifically determined for OTP. Moreover, acid-insoluble lignin could be overestimated by the wet chemical analysis due to condensation reactions involving the relatively high protein content of the OTP samples (around 8%, w/w) [40].

For ash content measurement, previous studies have shown that it is possible to determine inorganic compounds using NIR spectroscopy [41]. Simple inorganic constituents do not absorb directly in the NIR region, so ash content is determined indirectly through its correlation with NIR-absorbing organic compounds. The influence of inorganic elements on the NIR bands of organic compounds is expected to occur over the full spectrum, not only in the 7500–5500 cm<sup>−1</sup> range, because monatomic ions can be associated with C–H, N–H, and O–H bonds, whose absorption bands span the whole NIR spectrum.

Finally, comparing the coefficients of determination for the two validation methods (internal and external) reveals a better fit for the external sample set; however, given the relatively small number of samples available, cross-validation is justified as the reference for evaluating the calibration models.
