1. Introduction
Biomass is an important carbon-neutral, renewable bio-resource that is widely available throughout the world. It mainly consists of three polymers: cellulose, hemicellulose, and lignin, whose composition varies based on the type of biomass [
1]. Hardwood and herbaceous biomass contain approximately 43–47% and 33–38% cellulose, 25–35% and 26–32% hemicellulose, and 16–24% and 17–19% lignin, respectively [
2]. This composition of biomass can be converted into useful energy through various processes, such as combustion, gasification, torrefaction, or fermentation, making it a suitable alternative to fossil fuels. However, its low energy density, high moisture content, and high oxygen–carbon ratio make it challenging to store, transport, and utilize effectively. Therefore, a deep understanding of biomass properties is necessary to design the best thermal conversion methods [
3,
4,
5]. At present, biomass is used mainly by the residential (cooking and heating) and industrial (combined heat and power) sectors through direct combustion, which negatively impacts health, the economy, energy, and the environment [6]. Research on bio-based energy technologies, such as clean cooking stoves, gasifiers, biogas, bio-char, bio-briquettes, and pellets, has yielded strong results in laboratory settings. However, due to inadequate and unreliable knowledge regarding the properties of biomass fuel, the overall efficiency and performance of these technologies remain merely satisfactory. Additionally, various operation and maintenance challenges persist. Trading biomass based on volume and weight rather than its actual energy properties is still common. Therefore, the rapid, reliable, and non-destructive assessment of biomass properties is of utmost importance for identifying the actual energy potential and for proper technical and monetary management and utilization [5].
Biomass can be assessed for energy usage by evaluating its higher heating value (HHV) and ultimate analysis. The HHV is an important and standard indicator of the energy content of biomass [7]. It is measured with a bomb calorimeter, a method that is destructive in nature [8]. The ultimate analysis provides information on the elemental composition of biomass in terms of wt.% of C, H, N, S, and O. The heating value of the biomass is directly correlated with its C, H, and O composition [9]. Biomass with higher C and H and/or O and H contents and lower N and S contents is recommended for energy usage as it improves the HHV of the biomass [9,10].
Biomass is a good absorber of NIR radiation in the range of 3595 to 12,489 cm−1. NIR radiation predominantly interacts with the bonds of non-symmetrical molecules containing C, O, H, and N [11,12], which makes near-infrared spectroscopy (NIRS) combined with chemometrics suitable for assessing the energy-related properties of biomass, including the HHV and ultimate analysis parameters such as C, H, N, S, and O [13]. Several previous studies have utilized NIRS to develop models for rapid and accurate measurement of various biomass properties for energy usage. For instance, Posom et al. [14] developed a reliable online method for measuring the HHV of sugarcane using NIRS. Phuphaphud et al. [15] developed spectroscopic models using visible and shortwave NIR to predict and classify the energy content of growing cane stalks for breeding programs. Huang et al. [10] developed a prediction model for the HHV as well as the elemental composition (C, H, and N) of straw using NIRS. Posom et al. predicted the HHV [3] and elemental composition (C, H, N, O, and S) [16] of ground bamboo using NIRS. Skvaril et al. [17] reviewed the application of NIRS in biomass energy conversion processes. Zhang et al. [18] studied the fast analysis of the HHV and elemental composition of sorghum biomass using NIRS. Xue et al. [19] studied the use of an online NIRS system for the measurement of crop straw fuel properties. These studies demonstrate the potential of NIRS to provide rapid, reliable, and non-destructive alternatives to traditional destructive thermal analysis techniques for characterizing biomass for energy usage.
NIRS, combined with a broad range of wavelengths and suitable chemometric models, offers extensive applications in various fields, such as food quality control, agriculture, biofuels, and drug analysis [
13]. NIRS has been successfully employed for on-line, at-line, off-line, and in-line analysis, using instruments from different NIR ranges. For instance, in-line fiber-optic NIR spectra (300–1160 nm) have been utilized to classify durian pulp samples based on their dry matter content and soluble solids content [
20]. FT-NIRS (800–2500 nm) has enabled rapid measurement of macronutrients such as nitrogen, phosphorus, and potassium in durian leaves, aiding in the production of high-quality durian fruits through optimal fertilization practices [
21]. FT-NIR (700–2500 nm) has been employed to predict total phenolics and antioxidants in hulled and naked oats of different genotypes [
22]. Vis-NIR (570–1031 nm) and Mid-NIR (860–1760 nm) spectroscopy have been utilized for starch content prediction in cassava [
23]. The Micro-NIR portable spectrometer (900–1676 nm) has been found to have applications in the classification and quantification of crude oils and fuels [
24]. Additionally, a portable NIR analyzer (1300–2600 nm) has been used for rapid confirmation of the presence of illicit drugs, such as cocaine [
25]. NIRS provides better spectral reproducibility and a higher signal-to-noise ratio than complementary analytical techniques such as Raman and IR spectroscopy, and the signal-to-noise ratio is one of the most important parameters in quantitative calibration [
26]. The better penetration depth in samples, minimal or no sample preparation, shorter acquisition times, and wide range of application in diverse fields highlight the multidisciplinary nature of NIRS. In contrast, the presence of a strong water absorption band in the NIR region limits the applicability of NIRS for samples with a high water content. In such cases, Raman spectroscopy can be a suitable alternative as it is relatively unaffected by water interference and can effectively analyze aqueous solutions and biological samples without significant water-related issues [
27,
28]. However, it is important to note that Raman scattering is inherently a weak phenomenon, often requiring longer acquisition times and being more sensitive to sample fluorescence [
26]. In addition, the cost of instrumentation for Raman and IR spectroscopy is higher compared to NIRS. These factors showcase the acceptance of NIRS as a rapid, reliable, and non-destructive method, resulting in energy, environmental, cost, and time savings.
Despite NIRS being a rapid, reliable, and non-destructive analytical method, individual calibration models based on spectral data and each reference parameter must be developed for the NIR-based assessment of biomass properties. This procedure might be time-consuming and costly; however, in the long term, it will be beneficial for rapid and reliable evaluation procedures to assess biomass properties for their different applications.
In this study, a built-in code in MATLAB-R2020b was used to develop PLSR calibration models using spectral data from ten different biomass varieties (including five fast-growing tree varieties and five agricultural residue varieties); reference data obtained from a bomb calorimeter for HHV (J/g); a CHNS/O elemental analyzer for wt.% of C, N, H, S, and O; and a thermogravimetric analyzer for wt.% ash content. The main objectives of this research are:
To develop PLSR models using no preprocessing, traditional preprocessing, multi-preprocessing 5-range and 3-range methods, GA, and SPA for assessing biomass properties for energy usage by employing NIRS.
To compare the performance of the PLSR models based on R²C, RMSEC, R²P, RMSEP, RPD, and bias.
To select the better performing PLSR-based model for each parameter and establish it as a reliable and non-destructive alternative method for rapidly assessing biomass properties for energy usage.
The research outcomes of this study have practical applications in real life. The developed model offers a rapid, reliable, and non-destructive alternative to traditional laboratory methods for assessing biomass properties. This benefits biomass traders in determining a fair price based on actual energy properties, rather than relying solely on volume or weight. Industries relying on biomass for energy can optimize system efficiency and cost-effectiveness through informed feedstock selection. The model is applicable for process monitoring and quality control in biomass-based energy production facilities. This facilitates real-time adjustments by engineers and operators, ensuring consistent and efficient energy production. Policymakers, energy companies, and researchers can utilize these findings for the proper identification, management, and utilization of bio-resources to meet future energy demands. Moreover, the research outcomes pave the way for NIR-based research in various fields to adopt or enhance similar approaches.
2. Materials and Methods
Figure 1 shows the overall research methodology for the evaluation of the HHV and ultimate analysis parameters of ground biomass for energy usage using NIRS combined with PLSR.
2.1. Sample Preparation
The biomass samples were collected from the Terai low flatland and mid-hill regions of Nepal, with altitudes ranging from 86 to 1940 m above sea level. The study included five fast-growing species: (1) Alnus nepalensis, (2) Pinus roxburghii, (3) Bambusa vulgaris, (4) Bombax ceiba, and (5) Eucalyptus camaldulensis. Also included were five agricultural residues: (1) Zea mays (cob), (2) Zea mays (shell), (3) Zea mays (stover), (4) Oryza sativa, and (5) Saccharum officinarum. Alnus nepalensis and Pinus roxburghii were collected from the mid-hill region; Bombax ceiba, Eucalyptus camaldulensis, and Saccharum officinarum were collected from the Terai region; and Zea mays (cob, shell, stover), Bambusa vulgaris, and Oryza sativa were collected from both the Terai and the mid-hill regions of Nepal.
During preparation, all collected samples except for Oryza sativa were manually chopped into smaller pieces, i.e., less than 30 mm × 15 mm (refer to Figure 2a); dried in the open sun; and stored in airtight aluminum bags to maintain their biomass properties by preventing the exchange of air and moisture during transport to the Near-Infrared Spectroscopy Research Center for Agricultural Product and Food at the School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Thailand. The samples were ground using a multi-functional high-speed disintegrator (WF-04, Thai grinder, Thailand). The particle size of the ground biomass was evaluated at the Scientific and Technological Research Equipment Center (STREC) at Chulalongkorn University, Bangkok, Thailand, using a Mastersizer 3000 (MAL1099267, Hydro MV). Figure 3 shows the representative particle size distribution of the ground biomass used in this research, ranging from 0.01 to 3080 µm. The ground samples were stored in airtight plastic ziplock bags before and during the experiment.
2.2. Spectral Data Collection
As shown in Figure 2c,d, the ground biomass samples were placed in a glass vial (20 mm diameter and 48 mm height) and scanned using an FT-NIR spectrometer (MPA, Bruker, Ettlingen, Germany) in transflectance mode at a controlled temperature of 25 ± 2 °C. The spectrometer operated at a resolution of 16 cm−1, averaging 32 scans for both the background and the sample, and logged absorbance data as log(1/R) within the wavenumber range of 3595 to 12,489 cm−1, where R is the diffuse reflectance detected from the ground biomass sample. Prior to scanning, a background scan was performed on a gold plate. The primary purpose of performing a background scan before every new ground sample was to compensate for instrumental drift and ambient environmental influences, such as temperature, light, and relative humidity, on the measurement setup [12].
All the ground samples were scanned twice without changing their positions, with no NIR leakage occurring during scanning. The average absorbance at each wavenumber was taken as the spectroscopic data of each sample for model development. Figure 4a shows the raw spectra of the ten different ground biomasses within the wavenumber range from 3594.87 to 12,489.48 cm−1, which were used to evaluate the HHV and ultimate analysis parameters.
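The conversion to absorbance and the averaging of the two scans can be illustrated with a minimal sketch. The function names and the reflectance readings below are hypothetical and serve only to show the arithmetic of log(1/R) followed by a point-wise mean; they are not the instrument software's own routine.

```python
import math

def absorbance(reflectance):
    """Convert diffuse reflectance R (0 < R <= 1) to absorbance log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]

def average_scans(scan_a, scan_b):
    """Average two absorbance spectra recorded on the same wavenumber axis."""
    if len(scan_a) != len(scan_b):
        raise ValueError("scans must share the same wavenumber axis")
    return [(a + b) / 2.0 for a, b in zip(scan_a, scan_b)]

# Hypothetical reflectance readings at three wavenumbers for one sample
scan_1 = absorbance([0.10, 0.50, 1.00])
scan_2 = absorbance([0.10, 0.25, 1.00])
mean_spectrum = average_scans(scan_1, scan_2)
```

Averaging in absorbance units, as here, matches the description of averaging the logged absorbance values rather than the raw reflectance.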
2.3. Reference Analysis
Due to the complex nature of NIR absorbance data, they must be correlated with reference values obtained using a standard laboratory method [29]. Thus, the reference data, which include HHV, C, H, N, S, and O, were measured after the samples had been scanned with the FT-NIR spectrometer.
2.3.1. Higher Heating Value (HHV)
The HHV of the ground biomass was measured using the isoperibol method with an automatic bomb calorimeter (IKA C 200, Staufen, Baden-Württemberg, Germany). Before the start of the experiment, the bomb calorimeter was calibrated with two tablets of benzoic acid (IKA C 723), each with a total weight of 1.0092 g and a gross calorific value of 26,462 J/g. To verify the calibration, the test was repeated with a single tablet of benzoic acid and the results were compared. A cotton thread (IKA C 170.4) with a gross calorific value of 50 J/cotton twist was used for ignition in the bomb to measure the HHV of the ground sample. To ensure that the space in the bomb was saturated with water vapor throughout the entire experiment period, 2 mL of aqua pro (IKA C5003.1) was added to 1 L of water and poured into the bomb calorimeter vessel [14]. The HHV (J/g) of each ground sample was measured in duplicate, and the average value was used as reference data for model development. A quantity of 0.5 ± 0.2 g of ground sample was weighed using an electronic balance (Mettler Toledo JS1203C) with a resolution of 0.0001 g. Including preparation, the total experimental time to measure the HHV of each sample was approximately 40 min.
2.3.2. Ultimate Analysis
The ultimate analysis includes quantification of the wt.% of C, H, N, S, and O on a dry basis in the ground biomass to determine the major elemental composition. The wt.% of C, H, N, and S in the ground sample were measured using the CHNS/O analyzer (Thermo Scientific™ FLASH 2000, Waltham, MA, USA). The wt.% of ash content in the ground biomass was measured using the thermogravimetric analyzer (TG 209 F3 Tarsus, Netzsch, Bavaria, Germany). The wt.% of O on a dry basis in the ground biomass sample was calculated by difference [30]:
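Assuming that all quantities are expressed as wt.% on a dry basis and that the difference is taken from 100%, the oxygen calculation takes the familiar by-difference form. The expression below is written under that assumption:

```latex
\begin{equation}
\mathrm{O}\;(\mathrm{wt.\%}) = 100 - \left(\mathrm{C} + \mathrm{H} + \mathrm{N} + \mathrm{S} + \mathrm{Ash}\right)
\end{equation}
```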
2.3.3. Outlier and Standard Error of Laboratory
Outliers in all the measured reference data were identified using the following equation, where X_i is the measured value of sample i, and X̄ and SD are the average and standard deviation of the measured values of all samples, respectively:
If Equation (2) is satisfied for any sample i, the sample is considered an outlier and is excluded from the total dataset for model development [31].
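An outlier rule of this kind can be sketched in a few lines. The cut-off of three standard deviations used below is an assumption (a common choice for this type of criterion), not a value stated here, and the function name and HHV readings are hypothetical.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return indices i where |X_i - mean| / SD >= threshold.

    threshold=3.0 is an assumed cut-off, not a value taken from the text.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        return []
    return [i for i, x in enumerate(values) if abs(x - mean) / sd >= threshold]

# Hypothetical HHV readings (J/g): 19 typical samples and one implausible value
hhv = [18000.0 + 10.0 * k for k in range(19)] + [30000.0]
outlier_idx = find_outliers(hhv)
kept = [x for i, x in enumerate(hhv) if i not in outlier_idx]
```

Because the mean and SD are computed with the suspect value included, a single extreme point inflates the SD; this is why such rules are usually applied to reasonably large datasets.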
Similarly, the standard error of laboratory (SEL), which describes the precision of the reference method, was calculated for the bomb calorimeter and the CHNS/O elemental analyzer using the following equation, where y_1 and y_2 are the replicate measurements of each sample's reference value and N_T is the total number of samples:
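For duplicate measurements, one commonly used form of the SEL is the pooled standard deviation of the replicate differences. The expression below is offered on that assumption; some NIRS studies omit the factor of 2 in the denominator, so the exact form used in this study may differ:

```latex
\begin{equation}
\mathrm{SEL} = \sqrt{\frac{\sum_{i=1}^{N_T} \left(y_{1,i} - y_{2,i}\right)^{2}}{2N_T}}
\end{equation}
```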
2.4. Spectral Preprocessing
Spectral preprocessing is one of the most important components of NIR calibration. Ten different varieties of ground biomass samples were scanned to collect spectral data, and their physical, chemical, and biological properties may vary from sample to sample. Although the raw spectra of all the biomass samples appear similar, instrumental errors, variations in light scattering during sample scanning, and a large number of redundant and interfering variables can introduce unwanted and harmful signals into the spectrum (refer to Figure 4a). Improving spectral features requires removing noise, addressing overlapping peaks and baseline shifts, handling collinearity within the spectral data, and enabling easy data interpretation for calibration [32,33]; NIR spectral preprocessing is therefore necessary before model development.
To date, models have generally been developed by applying a traditional preprocessing approach (refer to Figure 4b) to the entire available wavenumber range for the prediction and evaluation of the respective samples. However, the pretreatment of raw spectra by applying different preprocessing techniques to distinct sections of the wavenumber range has not been explored. It is hypothesized that a multi-preprocessing approach, i.e., a technique that divides the entire spectrum into sections and applies different spectral preprocessing methods to them based on random pairs, will improve the assessment of biomass properties for energy usage using NIRS. Based on this hypothesis, this study introduces a novel multi-preprocessing approach, the 5-range and 3-range methods (refer to Figure 4c,d), to improve the assessment of biomass properties using NIRS. The outcomes of the multi-preprocessing technique with PLSR are expected to serve as a milestone in the research and development of NIRS, benefiting NIRS-related research in diverse fields by allowing existing models to be upgraded and used effectively in various applications.
Therefore, in this study, the raw spectrum was subjected to two distinct pretreatment approaches. The first approach adhered to the traditional methodology, entailing the application of a single spectral preprocessing method to the entire wavenumber range (3595 to 12,489 cm−1). Meanwhile, the second approach introduced a novel and innovative multi-preprocessing technique, whereby the entire wavenumber range was partitioned into multiple sections and underwent pretreatment using a comprehensive combination of various preprocessing methods. For the traditional approach, ten different types of spectrum pretreatment methods were used for the calibration models. These included (1) first derivative (segment = 5 and gap = 5), (2) second derivative (segment = 5 and gap = 5), (3) constant offset, (4) SNV, (5) MSC, (6) vector normalization, (7) min-max normalization, (8) mean centering, (9) first derivative (segment = 5 and gap = 5) + vector normalization, and (10) first derivative (segment = 5 and gap = 5) + MSC.
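Several of the traditional pretreatments listed above can be sketched compactly. The definitions below are assumptions based on common chemometric conventions (for example, constant offset as subtraction of the spectrum minimum, and the gap-segment derivative as a moving-average smooth followed by a difference); they are not necessarily identical to the routines used in this study, and all function names are hypothetical.

```python
import statistics

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(a - mean) / sd for a in spectrum]

def constant_offset(spectrum):
    """Shift one spectrum so that its minimum absorbance becomes zero (assumed definition)."""
    lowest = min(spectrum)
    return [a - lowest for a in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_var = len(spectra[0])
    ref = [statistics.mean([s[j] for s in spectra]) for j in range(n_var)]
    ref_mean = statistics.mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = statistics.mean(s)
        # Least-squares fit s = intercept + slope * ref, then invert it
        slope = sum((r - ref_mean) * (a - s_mean) for r, a in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(a - intercept) / slope for a in s])
    return corrected

def gap_segment_first_derivative(spectrum, segment=5, gap=5):
    """Smooth with a moving average of `segment` points, then difference points `gap` apart."""
    smoothed = [statistics.mean(spectrum[i:i + segment])
                for i in range(len(spectrum) - segment + 1)]
    return [smoothed[i + gap] - smoothed[i] for i in range(len(smoothed) - gap)]

# Hypothetical six-point spectrum and a scaled, shifted copy of it
base = [0.2, 0.5, 0.9, 0.4, 0.3, 0.8]
scattered = [2.0 * a + 1.0 for a in base]
snv_base = snv(base)
msc_pair = msc([base, scattered])
ramp_derivative = gap_segment_first_derivative([float(i) for i in range(20)])
```

In this toy case MSC maps both spectra onto the same corrected curve, which illustrates why it is used to remove additive and multiplicative scatter effects.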
For the multi-preprocessing approach, the entire wavenumber range was divided into different sections and pretreated with various pretreatment combination sets obtained from seven different preprocessing methods, as indicated by the following markings: 0 = empty (all the absorbance values = 0), 1 = raw spectra, 2 = SNV, 3 = MSC, 4 = first derivative (5,5), 5 = second derivative (5,5), and 6 = constant offset.
For the multi-preprocessing 5-range method (refer to Figure 4c), the following procedures were adopted:
- (1)
Equally dividing the wavenumber range into five sections: 3625.72–5392.30 cm−1, 5400.02–7166.59 cm−1, 7174.31–8940.89 cm−1, 8948.60–10,715 cm−1, and 10,722.9–12,489.48 cm−1. However, since the 1154 variables in the wavenumber range from 3594.87 to 12,489.48 cm−1 cannot be divided equally by 5, the last four independent variables, i.e., those below 3625.72 cm−1, were excluded from the total dataset, resulting in 1150 out of 1154 variables being considered for model development.
- (2)
Generating all possible combinations of multi-preprocessing sets from 0 to 6.
- (3)
Selecting the most effective multi-preprocessing combination by evaluating different numbers of random pairs to develop the PLSR-based model.
Similarly, for the multi-preprocessing 3-range method (refer to Figure 4d), the following procedures were adopted:
- (1)
Dividing the entire wavenumber range into three sections: 3594.87–5492.59 cm−1, 5500.30–7498.31 cm−1, and 7506.02–12,489.48 cm−1.
- (2)
Generating all possible combinations of multi-preprocessing sets from 0 to 6.
- (3)
Selecting the most effective multi-preprocessing combination by evaluating different numbers of random pairs to develop the PLSR-based model.
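The section-wise procedure above can be sketched as follows. This is a minimal illustration, assuming that "all possible combinations" means every assignment of the seven codes (0 to 6) to each section; only codes 0, 1, 2, and 6 are implemented here, the names are hypothetical, and the step of choosing the best combination by cross-validation is left out.

```python
from itertools import product
import statistics

CODES = {0: "empty", 1: "raw", 2: "SNV", 3: "MSC",
         4: "first derivative", 5: "second derivative", 6: "constant offset"}

def split_equal(n_variables, n_sections):
    """Return (start, stop) index pairs for equal sections, skipping the remainder."""
    size = n_variables // n_sections
    skipped = n_variables % n_sections  # 1154 % 5 == 4 variables left out
    return [(skipped + k * size, skipped + (k + 1) * size) for k in range(n_sections)]

def apply_code(code, segment):
    """Apply one preprocessing code to one section of a spectrum."""
    if code == 0:
        return [0.0] * len(segment)
    if code == 1:
        return list(segment)
    if code == 2:
        mean, sd = statistics.mean(segment), statistics.stdev(segment)
        return [(a - mean) / sd for a in segment]
    if code == 6:
        lowest = min(segment)  # assumed definition of constant offset
        return [a - lowest for a in segment]
    raise NotImplementedError(f"code {code} ({CODES[code]}) is left out of this sketch")

def multi_preprocess(spectrum, sections, combo):
    """Concatenate the sections after treating each with its own code."""
    treated = []
    for (start, stop), code in zip(sections, combo):
        treated.extend(apply_code(code, spectrum[start:stop]))
    return treated

sections_5 = split_equal(1154, 5)
all_combos_5 = list(product(range(7), repeat=5))  # 7**5 candidate sets
all_combos_3 = list(product(range(7), repeat=3))  # 7**3 candidate sets

# Hypothetical spectrum with 1154 variables, treated with the set (2, 0, 1, 0, 6)
spectrum = [0.5 + 0.001 * (i % 7) for i in range(1154)]
treated = multi_preprocess(spectrum, sections_5, (2, 0, 1, 0, 6))
```

With 16,807 candidate sets for five sections, evaluating a random subset rather than every set is a natural way to keep the search tractable.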
Figure 4c,d shows the spectrum of the ground biomass obtained from the multi-preprocessing method with the 5-range and 3-range methods, respectively. In Figure 4c, the raw spectrum was pretreated with the preprocessing combination set of 3, 0, 1, 0, and 1, i.e., MSC from 3625.72–5392.30 cm−1, empty from 5400.02–7166.59 cm−1, raw spectra from 7174.31–8940.89 cm−1, empty from 8948.60–10,715 cm−1, and raw spectra from 10,722.9–12,489.48 cm−1. Similarly, in Figure 4d, the raw spectrum was pretreated with the preprocessing combination set of 4, 4, and 6, i.e., second derivative from 3594.87–5492.59 cm−1, first derivative from 5500.30–7498.31 cm−1, and constant offset from 7506.02–12,489.48 cm−1. The best combination set for multi-preprocessing is determined by the optimum LVs obtained from full cross-validation.
MATLAB-R2020b (MathWorks, Natick, MA, USA) built-in code was used to select the optimal combination set of multi-preprocessing methods for developing a PLSR calibration model.
2.5. Model Development
The accuracy of the model is one of the major concerns of NIRS. Accuracy can be improved by using different spectral pretreatments and appropriate data analysis methods. Various research articles related to NIRS modeling have concluded that PLSR is one of the most effective and commonly used quantitative analysis techniques [14,34,35,36]. Therefore, this study proposes PLSR-based models, which can handle highly collinear spectroscopic data [37], for the assessment of ground biomass properties. The following models were developed to meet the study objectives: (1) full wavenumber range–PLSR with no preprocessing and traditional preprocessing techniques, (2) multi-preprocessing PLSR 3-range method, (3) multi-preprocessing PLSR 5-range method, (4) GA-PLSR, and (5) SPA-PLSR.
To develop PLSR models using different methods, the total data obtained after removing outliers were manually divided into an 80% calibration set and a 20% validation set, as illustrated in Figure 1. The total dataset consists of ten different varieties of biomass, comprising five fast-growing trees and five agricultural residues. Therefore, it is crucial to stratify the total dataset to ensure that both the calibration and validation sets encompass representative samples, covering the entire range of variation within the overall sample population. Allocating 80% of the total dataset as the calibration set, which includes the maximum and minimum reference values, ensures proportional representation of all biomass varieties in the model development process. This approach reduces bias, facilitates effective learning of underlying patterns and relationships, and helps prevent issues such as overfitting or underfitting when generating a regression model [33]. The calibration set was first subjected to full cross-validation to select the optimal number of LVs. This number ensures the smallest possible standard error for data analysis: too few LVs lead to underfitting, and too many LVs lead to overfitting. If several numbers of LVs showed similar model performance, the smallest number of LVs was selected for model development [38]. The PLSR models for assessing biomass properties for energy usage were created using in-house code in MATLAB-R2020b (MathWorks, Natick, MA, USA).
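The selection of LVs by full (leave-one-out) cross-validation can be illustrated with a small, dependency-free sketch of single-response PLSR fitted by the NIPALS algorithm. This is not the in-house MATLAB code: the function names, the toy data, and the choice of Python are all assumptions made for illustration only.

```python
import math

def pls1_fit(X, y, n_lv):
    """Fit a single-response PLSR model by NIPALS on mean-centred data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_lv):
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y for a further LV to explain
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        p_load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next LV
        Xc = [[Xc[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        comps.append((w, p_load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict one sample by repeating the deflation steps on its centred spectrum."""
    x_mean, y_mean, comps = model
    r = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p_load, q in comps:
        t = sum(a * b for a, b in zip(r, w))
        y_hat += q * t
        r = [a - t * b for a, b in zip(r, p_load)]
    return y_hat

def loo_rmsecv(X, y, n_lv):
    """Root mean square error of leave-one-out cross-validation."""
    sq = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
        sq += (y[i] - pls1_predict(model, X[i])) ** 2
    return math.sqrt(sq / len(X))

def select_lvs(X, y, max_lv):
    """Return the LV count with the lowest RMSECV (the smallest count wins exact ties)."""
    scores = {a: loo_rmsecv(X, y, a) for a in range(1, max_lv + 1)}
    return min(scores, key=scores.get), scores

# Hypothetical calibration data: 8 samples, 3 variables, exactly linear response
X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 0.2], [4.0, 3.0, 2.5],
     [5.0, 6.0, 1.0], [6.0, 5.0, 3.1], [7.0, 8.0, 0.7], [8.0, 7.0, 2.2]]
y = [2.0 * a - b + 0.5 * c + 10.0 for a, b, c in X]
best_lv, cv_scores = select_lvs(X, y, max_lv=3)
prediction = pls1_predict(pls1_fit(X, y, best_lv), [2.5, 3.5, 1.2])
```

On real spectra, several LV counts usually give nearly equal errors, so a tolerance-based rule that prefers the smaller count would be closer to the selection described above than the strict minimum used in this sketch.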
GA and SPA are wavelength selection methods that select the highly influential wavenumbers from the spectra. When combined with PLSR, they have been shown to provide better performance than PLSR with the full wavenumber range alone, while helping to avoid overfitting [39,40,41]. SPA selects the variables with minimum collinearity and assesses them based on the value of the root mean square error obtained from the validation set. In SPA, uninformative variables are eliminated until the model’s performance no longer increases [42]. GA selects variables with a minimum amount of redundant information, starting with one variable and adding a new one to the loop in each iteration, thereby maximizing its fitness. The model developed with GA-PLSR shows the lowest prediction error as it maximizes the fitness and covariance between the spectral and reference data [43,44]. In GA-PLSR and SPA-PLSR, the new calibration dataset was processed through full cross-validation to select the optimum LVs, which were then used for PLSR model development.
The accuracy of the NIR model should be compared with that of the reference method. Therefore, the performance of the model was determined in terms of R²C, RMSEC, R²P, RMSEP, RPD, and bias [45]. These parameters can be calculated as follows, where y_i is the measured value, ŷ_i is the predicted value, the subscript i indicates the sample number, ȳ is the mean of the measured values, N_T is the number of samples, SD is the standard deviation of the measured values of the validation set, and n is the number of samples in the validation set:
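These statistics can be sketched with their most common textbook definitions. The forms below are assumptions: R² as one minus the ratio of residual to total sum of squares, RMSE with the number of samples in the denominator, bias as the mean of predicted minus measured values, and RPD as SD divided by RMSEP. The sign convention for bias and the denominators may differ from those used in this study, and the data are hypothetical.

```python
import math
import statistics

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.mean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, predicted):
    """Root mean square error (RMSEC or RMSEP, depending on the set supplied)."""
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    return math.sqrt(ss_res / len(measured))

def bias(measured, predicted):
    """Mean of (predicted - measured); the sign convention is an assumption."""
    return sum(p - y for y, p in zip(measured, predicted)) / len(measured)

def rpd(measured, predicted):
    """Ratio of the SD of the measured values to the RMSE of prediction."""
    return statistics.stdev(measured) / rmse(measured, predicted)

# Hypothetical validation set
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2_p = r_squared(measured, predicted)
rmse_p = rmse(measured, predicted)
bias_p = bias(measured, predicted)
rpd_p = rpd(measured, predicted)
```

Applying the same functions to the calibration set gives R²C and RMSEC, while the validation set gives R²P, RMSEP, RPD, and bias.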
The better model was selected based on the tradeoff between the highest R²C, R²P, and RPD and the lowest RMSEC, RMSEP, and bias. In this study, the performance results, namely the R² and RPD values, were interpreted based on the recommendations of Williams et al. (2019) [46] and Zornoza et al. (2008) [47], respectively.
As per the recommendations of Williams et al. (2019), R² values up to 0.25 are not usable for NIRS calibration; 0.26–0.49 indicates poor calibration, and reasons for this should be researched; 0.50–0.64 is considered okay for rough screening; 0.66–0.81 is okay for rough screening and some other appropriate calibrations; 0.83–0.90 is usable with caution for most applications, including research; 0.92–0.96 is usable in most applications, including quality assurance; and 0.98+ is excellent and can be used in any application [46]. Similarly, according to Zornoza et al. (2008), an RPD value of less than 2 is considered insufficient for applications; RPD between 2 and 2.5 makes approximate quantitative predictions possible; RPD values between 2.5 and 3 are considered good for prediction; and RPD greater than 3 indicates an excellent prediction [47].
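The RPD guideline above amounts to a simple lookup, sketched below. The function name is hypothetical, and the handling of the exact boundary values (2, 2.5, and 3), which the guideline does not specify, is an assumption.

```python
def interpret_rpd(rpd):
    """Map an RPD value to the categories attributed to Zornoza et al. (2008).

    Boundary values are assigned to the lower category, except 2.0 itself,
    which is treated as the start of the 2 to 2.5 band (an assumption).
    """
    if rpd < 2.0:
        return "insufficient for applications"
    if rpd <= 2.5:
        return "approximate quantitative prediction possible"
    if rpd <= 3.0:
        return "good prediction"
    return "excellent prediction"

# RPD reported for the GA-PLSR model of HHV in this study
hhv_rating = interpret_rpd(4.89)
```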
3. Results and Discussion
3.1. Comparison of Near-Infrared Spectra of Ground Biomass with Pure Cellulose and Lignin
The energy potential and conversion efficiency of fast-growing trees and agricultural residues can be influenced by the composition of lignocellulosic matter [48]. Figure 5 compares the near-infrared spectra of pure cellulose and pure lignin with the average absorbance spectra of 90 samples of fast-growing trees and 110 samples of agricultural residues. The figure reveals that the vibration band between approximately 5181 and 6150 cm−1 corresponds to the lignin band (with low absorbance for cellulose), while the range between approximately 6150 and 6800 cm−1 corresponds to the cellulose band (with low absorbance for lignin) [34]. Notably, the spectra of fast-growing trees and agricultural residues exhibit distinct peaks resembling those of both pure cellulose and pure lignin at approximately 4019 cm−1, 4405 cm−1, 4762 cm−1, 5181 cm−1, and 6897 cm−1. This resemblance provides strong evidence that the ground biomass of fast-growing trees and agricultural residues contains cellulose and lignin.
The peak at 4019 cm−1 results from the combination of C-H stretching and C-C stretching in cellulose, whereas the peak at 4405 cm−1 corresponds to the combination of O-H stretching and C-O stretching in cellulose. The peak at 4762 cm−1 corresponds to the combination of O-H bending and C-O stretching in polysaccharides. The peak at 5181 cm−1 corresponds to the combination of O-H stretching and HOH bending in polysaccharides. The peak at 6897 cm−1 corresponds to the first overtone of the fundamental O-H stretching band in water and starch [49].
The cellulose and lignin contents of biomass are critical factors in determining its HHV. Biomass with a higher lignin content and lower cellulose content exhibits an improved HHV, and vice versa [50]. This finding confirms the suitability of the selected fast-growing tree and agricultural residue varieties for various applications that rely on lignocellulosic matter. These applications include biomass trading for direct combustion, biomass pellet production, the paper and pulp industry, biomass-based construction and building material, and bioenergy and biofuel production, among others.
3.2. Higher Heating Value and Ultimate Analysis in Ground Biomass
Figure 6 displays a histogram of the HHV and ultimate analysis values used in the calibration set and validation set for the development of various PLSR-based models. The calibration set is represented by the blue color, while the validation set is represented by the red color. Equation (2) was employed to calculate the outliers, and any data points that satisfied the defined relation were excluded from the model development process.
The normality of all the reference datasets used for model development, i.e., HHV (J/g) and wt.% of C, N, H, and O on a dry basis, was analyzed using SPSS 16.0. A histogram analysis revealed that all the datasets exhibit an approximately bell-shaped distribution, with the data points clustered around the mean value. Additionally, the calculated standard deviations of these datasets (refer to Table 1) were found to be low, further indicating that the data points are closely packed around the mean.
Furthermore, a one-sample Kolmogorov–Smirnov test was performed using SPSS 16.0 to calculate the p-values for HHV (J/g) and wt.% of C, N, H, and O, resulting in values of 0.704, 0.060, 0.368, 0.565, and 0.119, respectively. Since all obtained p-values are greater than the significance level of 0.05, the hypothesis of normality is not rejected, and the reference data used for modelling are considered to be normally distributed.
The findings regarding the normal distribution, low standard deviations, and the concentration of data points around the mean value support the validity and reliability of the model developed in this research.
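A one-sample Kolmogorov–Smirnov check against a normal distribution can be sketched as follows. This is an illustrative approximation, not a reproduction of the SPSS routine: it assumes the normal parameters are estimated from the sample itself and uses the asymptotic Kolmogorov series for the p-value, which tends to be conservative when the parameters are estimated. The function name and data are hypothetical.

```python
import math
import statistics

def ks_normality(values):
    """Return (D, p) for a one-sample K-S test against a fitted normal distribution."""
    n = len(values)
    dist = statistics.NormalDist(statistics.mean(values), statistics.stdev(values))
    d = 0.0
    for i, v in enumerate(sorted(values)):
        cdf = dist.cdf(v)
        # Largest gap between the empirical and the fitted cumulative distribution
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = math.sqrt(n) * d
    # Asymptotic Kolmogorov distribution, truncated after 100 terms
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))

# Hypothetical data: evenly spaced normal quantiles versus a strongly skewed sample
near_normal = [statistics.NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [1.0] * 40 + [100.0] * 10
d_normal, p_normal = ks_normality(near_normal)
d_skewed, p_skewed = ks_normality(skewed)
```

A p-value above 0.05 means that normality is not rejected at the 5% level; it does not prove that the data are normal.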
Table 2 presents the average HHV, ultimate analysis parameters (C, N, H and O), and ash content of different fast-growing trees and agricultural residues that were included as reference data for developing the model. The HHV is measured in J/g, and the ultimate analysis parameters and ash content are expressed as wt.% on a dry basis. In the case of the biomass samples analyzed using the CHNS/O analyzer (Thermo Scientific FLASH 2000), no sulfur content was detected. This could be attributed to the typically low levels of sulfur present in biomass, which may fall below the lower limit of detection of the analyzer. Therefore, for the purpose of this study, the sulfur content in the ground biomass is assumed to be zero and has not been considered in the model development. The wt.% of O was then calculated using Equation (1).
As per previous research, the HHV of biomass is positively correlated with C and H contents, while it is negatively correlated with O and N contents [51]. Table 1 indicates that fast-growing trees have higher average values of HHV, C, and H contents and lower O and N contents compared to agricultural residues. These results are consistent with the correlation observed between the measured data of the HHV and elemental composition of ground biomass.
Table 1 shows the statistical summary data for the HHV (J/g) and ultimate analysis parameters, i.e., wt.% of C, N, H, and O on a dry basis, used in the calibration set and validation set for model development.
3.2.1. Higher Heating Value
Out of the 200 samples, 4 were identified as outliers and were removed from the total data set to develop PLSR-based models for evaluating the HHV. The SEL for the bomb calorimeter used to evaluate HHV was calculated to be 255.7708 J/g.
Table 3 displays the optimal results of the various PLSR-based models using the full wavenumber range (3594.87–12,489.5 cm⁻¹) to evaluate the HHV of the ground biomass from the fast-growing trees and agricultural residues.
Figure 7a shows the scatter plot of measured versus predicted HHV values for the calibration and validation sets using GA-PLSR. The GA-PLSR model with 14 LVs, first-derivative spectral pretreatment, and 692 important wavenumbers yielded the best performance, with an R²C of 0.9505, RMSEC of 188.0117 J/g, R²P of 0.9574, RMSEP of 170.3282 J/g, RPD of 4.89, and bias of −21.9648 J/g. The model included a sufficient number of homogeneous samples from both fast-growing trees and agricultural residues and covered a wider HHV range, resulting in higher R²C, R²P, and RPD values and lower RMSEC and RMSEP values than the other models. Compared with the full-PLSR model, the GA improved the PLSR model accuracy by 8.5069%. Similarly, the multi-preprocessing 5-range method improved the accuracy of the full-PLSR model by 4.1839%. According to Williams et al. (2019) [46] and Zornoza et al. (2008) [47], the GA-PLSR model for evaluating HHV gives excellent prediction and is acceptable for most applications, including quality assurance.
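The performance statistics quoted above can be computed from measured and predicted values as in the following sketch. The definitions are assumptions, since conventions vary: R² is taken as the coefficient of determination, bias as the mean of predicted minus measured, and RPD as the standard deviation of the measured values divided by the RMSE. The function name is hypothetical.

```python
import numpy as np


def regression_metrics(measured, predicted):
    """Return R2, RMSE, bias, and RPD for one calibration or validation set."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = predicted - measured

    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    bias = float(np.mean(residuals))  # sign convention assumed here
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((measured - measured.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # RPD: spread of the reference values relative to the prediction error.
    rpd = float(np.std(measured, ddof=1)) / rmse
    return {"r2": r2, "rmse": rmse, "bias": bias, "rpd": rpd}


# Tiny synthetic example; not the study's data.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]))
```

Applied to the calibration set this yields R²C and RMSEC, and applied to the validation set it yields R²P, RMSEP, RPD, and bias.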
Figure 8 shows the average absorbance values obtained after preprocessing with the first derivative, highlighting the 692 wavenumbers selected by the GA (marked in red), which lie within the full spectral range of 3594.87–12,489.5 cm⁻¹. The figure highlights important peaks in the following ranges: 4003.73–4111.73 cm⁻¹, 4366.3–4451.16 cm⁻¹, 5091.45–5114.59 cm⁻¹, and 5130.02–5292.02 cm⁻¹, which may significantly influence the model performance.
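For readers who work in wavelength rather than wavenumber, the ranges above can be converted with λ (nm) = 10⁷ / ν̃ (cm⁻¹). The helper below is an illustrative sketch; its name is hypothetical.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1.0e7 / wavenumber_cm1


# Ranges quoted in the text (cm^-1): full range, then the four peak ranges.
ranges_cm1 = [
    (3594.87, 12489.5),
    (4003.73, 4111.73),
    (4366.3, 4451.16),
    (5091.45, 5114.59),
    (5130.02, 5292.02),
]
for low, high in ranges_cm1:
    # A higher wavenumber corresponds to a shorter wavelength.
    short_nm = wavenumber_to_wavelength_nm(high)
    long_nm = wavenumber_to_wavelength_nm(low)
    print(f"{low}-{high} cm^-1  ->  {short_nm:.1f}-{long_nm:.1f} nm")
```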
In the range of 4003.73–4111.73 cm⁻¹, the reference wavenumber 4019 cm⁻¹ represents the combination of C-H stretching and C-C stretching in cellulose. Similarly, the range of 4366.3–4451.16 cm⁻¹ includes the reference wavenumber 4405 cm⁻¹, which corresponds to the combination of O-H and C-O stretching in cellulose. Polysaccharides are characterized by the combination of O-H stretching and HOH bending, which is represented by the reference wavenumber 5102 cm⁻¹ in the range of 5091.45–5114.59 cm⁻¹. Additionally, the range of 5130.02–5292.02 cm⁻¹ includes the reference wavenumber 5200 cm⁻¹, which corresponds to the combination of O-H stretching and HOH deformation of O-H molecular water [49]. Lignocellulosic biomass derives its primary energy from cellulose, hemicellulose, and lignin [52,53]. As can be seen in Figure 8, the important peaks with vibration bonds of C-H, C-C, O-H, and C-O stretching and HOH deformation of O-H molecular water correspond to the structure of cellulose and lignin. Therefore, they are likely to have the greatest influence on the assessment of the HHV of ground fast-growing trees and agricultural residues. This finding is in line with previous studies by Sirisomboon et al. [54] and Lestander et al. [55], in which the authors reported that vibration bonds of C-H, C-C, and O-H stretching contribute significantly to the HHV of bamboo and biofuels, respectively. Additionally, Zhang et al. [18] reported that the vibration bond of C-H stretching in an aromatic CH₃ structure can be used to assess the HHV of sorghum biomass. Posom et al. [5] indicated in their study that the vibration of C-H stretching highly influences the prediction of the HHV of Leucaena leucocephala pellets.
3.2.2. Ultimate Analysis
The sulfur content in the ground biomass samples of fast-growing trees and agricultural residues was not detected using the CHNS/O analyzer (Thermo Scientific™ FLASH 2000). This may be because the S content in the biomass is too low to be detected [56]. Therefore, PLSR-based models for the wt.% of S were not developed in this study.
wt.% of C
The SEL for the CHNS/O elemental analyzer used to evaluate the wt.% of C content in ground biomass was calculated as 1.6936 wt.%.
Table 3 shows the overall optimum results of the PLSR-based models for the evaluation of wt.% of C content in the ground biomass within the full wavenumber range of 3594.87–12,489.48 cm⁻¹. Out of the 120 samples, 11 were identified as outliers and removed from the dataset before model development. The model developed through GA-PLSR with first-derivative spectral preprocessing (gap = 5 and segment = 5) and 9 LVs provided the best results, with an R²C of 0.7851, RMSEC of 0.9753 wt.%, R²P of 0.7217, RMSEP of 0.9740 wt.%, RPD of 1.93, and bias of 0.1877 wt.%. Compared with full-PLSR, the GA-PLSR method improved the model accuracy by 8.5069%. Similarly, the multi-preprocessing 5-range method improved the PLSR model by 8.1842%. The scatter plot of the GA-PLSR method for the wt.% of C content in ground biomass is shown in Figure 7b. According to the recommendation by Williams et al. (2019) [46], the PLSR model with the GA method is usable for rough screening and some other appropriate calibrations, based on the obtained R² value. Similarly, considering the RPD value, as suggested by Zornoza et al. (2008) [47], the model is acceptable for the prediction of wt.% of C content in the ground biomass.
Figure 9 shows the average absorbance values obtained after preprocessing with the first derivative, highlighting the 50 wavenumbers selected by the GA (marked in red), which lie within the full spectral range of 3594.87–12,489.5 cm⁻¹. The high positive peaks marked in red at specific wavenumbers indicate the functional group, spectra-structure, and material type that might be significant in the assessment of wt.% of C. In Figure 9, significant peaks can be noticed at 3650, 4019, 4405, 4878, and 7042 cm⁻¹.
The peak at 3650 cm⁻¹ corresponds to the O-H functional group, the spectral structure of the fundamental stretching vibrational absorption band of O-H (-CH₂-OH), and the material type of primary alcohols. The peak at 4019 cm⁻¹ corresponds to the functional group of C-H/C-C, the spectral structure of the C-H stretching and C-C stretching combination, and the material type of cellulose. The positive peak at 4405 cm⁻¹ is associated with the functional group O-H/C-H, the spectral structure of the O-H stretching and C-O stretching combination, and the material type of cellulose. The positive peak at 4878 cm⁻¹ is associated with the functional group N-H/C-N/N-H (amide II and amide III), the spectral structure of the N-H in-plane bend, C-N stretching, and N-H in-plane bend combination, and the material type of amides/proteins. The peak at 7042 cm⁻¹ corresponds to aromatic O-H, with the spectral structure of the O-H first overtone of the fundamental stretching band and the material type of hydrocarbons [49]. Lignin contains a high carbon content [57]. According to Zhang et al. [19], vibration bands related to C-H stretching, CH₂, C-H aromatics, O-H stretching, and HOH deformation are essential for predicting the C content of sorghum biomass. Similarly, Posom and Sirisomboon [58] found that N-H stretching, N-H deformation, C-N stretching, O-H stretching, and C-O stretching of starch significantly contribute to the model development of C content in bamboo. The average absorbance plot for wt.% of C shows peaks at 3650, 4019, 4405, 4878, and 7042 cm⁻¹, which complement the vibration bands reported in previous studies and also the spectra of pure lignin and pure cellulose. While these observed vibration bands at different peaks may have a significant impact on the overall performance of the model, this study suggests that FT-NIRS may not provide sufficiently high-resolution spectra to create an accurate prediction model for wt.% of C.
wt.% of H
The SEL for the CHNS/O elemental analyzer used to evaluate wt.% of H content in ground biomass was calculated as 0.3206 wt.%. The optimal results of the different PLSR-based models for the evaluation of wt.% of H within the full wavenumber range are presented in Table 3. Before modeling, outliers in the reference values were identified, and 27 of the 120 samples were detected as outliers. Therefore, 93 ground biomass samples were used for the model development. The best model was obtained with the wavenumber selection method GA-PLSR within the range of 3594.87–12,489.48 cm⁻¹, using SNV spectral preprocessing. This model produced an R²C of 0.8814, RMSEC of 0.1041 wt.%, R²P of 0.7678, RMSEP of 0.1434 wt.%, RPD of 2.14, and bias of −0.0356 wt.%. The GA-PLSR model exhibits a minimal improvement in model accuracy of 0.0092% compared to the full-PLSR model.
Figure 7c shows the scatter plot of measured versus predicted wt.% of H content in the ground biomass obtained using GA-PLSR. According to Williams et al. (2019) [46], based on the R² value, the model can be used for rough screening and some other appropriate calibrations. To improve the performance of the model, it is recommended to include additional representative biomass samples with a high concentration and wide range of wt.% of H content that are uniformly and representatively distributed in both the calibration and validation sets and are obtained from both fast-growing trees and agricultural residue varieties.
Figure 10 shows the average absorbance spectrum pretreated with SNV, with red marks highlighting the important wavenumbers obtained using the GA. The important peaks selected at 4019, 4608, 5155, 6897, and 8163 cm⁻¹ may have a significant influence on the performance of the model for the evaluation of wt.% of H content in the ground biomass samples. The peak at 4019 cm⁻¹ is associated with the functional group of C-H/C-C and the spectral structure of the C-H stretching and C-C stretching combination, with the material type of cellulose. The peak at 4608 cm⁻¹ is associated with the combination of C-H stretching and C-H deformation in alkenes. Similarly, the peak at 5155 cm⁻¹ corresponds to a combination of O-H stretching and HOH bending in water. The peak at 6897 cm⁻¹ corresponds to the spectral structure of O-H, arising from the first overtone of the fundamental stretching band, with the material type of starch/polymeric alcohol. The peak at 8163 cm⁻¹ is associated with the second overtone of the C-H fundamental stretching band and the material type of hydrocarbons [49]. The selected peaks mostly fall within a range similar to that reported by Posom and Sirisomboon [58]. This finding supports the results of the current study, indicating that these selected peaks are likely to have a significant influence on the performance of the models.
wt.% of O
Based on the assumption that the sulfur content in biomass is zero, as its wt.% is too low to be detected by the instrument, the wt.% of O in biomass is calculated using Equation (1). The optimal results of the PLSR-based models for predicting the wt.% of O content in the ground biomass are shown in Table 3. Before modelling, outliers in the reference values were identified, and 21 of the 120 samples were detected as outliers. Therefore, 99 ground biomass samples were used for the model development. The best result was obtained from the multi-preprocessing PLSR 5-range method with a spectral preprocessing combination set of 3, 2, 4, 6, and 0, i.e., MSC, SNV, first derivative, constant offset, and empty, respectively, applied to five equal sections of the range 3625.72–12,489.48 cm⁻¹.
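The multi-preprocessing 5-range idea described above (split the spectrum into five equal sections and pretreat each section with its own method) can be sketched as follows. This is an illustration, not the authors' implementation. The code numbers 0, 2, 3, 4, 5, and 6 follow the mapping stated in the text, and code 1 is omitted because it is not defined in this section. Treating "empty" as no preprocessing, computing MSC and SNV statistics within each section, using a simple finite-difference derivative, and defining constant offset as subtraction of each spectrum's minimum are all assumptions of this sketch.

```python
import numpy as np


def identity(x):
    """Code 0, 'empty': assumed to mean no preprocessing."""
    return x.copy()


def snv(x):
    """Code 2: standard normal variate, applied row-wise."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)


def msc(x):
    """Code 3: multiplicative scatter correction against the mean spectrum."""
    reference = x.mean(axis=0)
    out = np.empty_like(x, dtype=float)
    for i, row in enumerate(x):
        slope, intercept = np.polyfit(reference, row, deg=1)
        out[i] = (row - intercept) / slope
    return out


def first_derivative(x):
    """Code 4: finite-difference first derivative along the spectral axis."""
    return np.gradient(x, axis=1)


def second_derivative(x):
    """Code 5: finite-difference second derivative."""
    return np.gradient(np.gradient(x, axis=1), axis=1)


def constant_offset(x):
    """Code 6: subtract each spectrum's minimum (assumed definition)."""
    return x - x.min(axis=1, keepdims=True)


PREPROCESSING = {0: identity, 2: snv, 3: msc, 4: first_derivative,
                 5: second_derivative, 6: constant_offset}


def multi_preprocess(spectra, codes):
    """Split columns into len(codes) near-equal sections; pretreat each one."""
    spectra = np.asarray(spectra, dtype=float)
    sections = np.array_split(np.arange(spectra.shape[1]), len(codes))
    parts = [PREPROCESSING[code](spectra[:, idx])
             for code, idx in zip(codes, sections)]
    return np.hstack(parts)


# Synthetic spectra (12 samples x 100 points); NOT the study's data.
rng = np.random.default_rng(42)
base = 2.0 + np.sin(np.linspace(0.0, 6.0, 100))
scale = rng.uniform(0.8, 1.2, size=(12, 1))
offset = rng.uniform(-0.1, 0.1, size=(12, 1))
spectra = scale * base + offset + rng.normal(0.0, 0.01, size=(12, 100))

# Combination reported for wt.% of O: MSC, SNV, 1st derivative, offset, empty.
processed = multi_preprocess(spectra, codes=(3, 2, 4, 6, 0))
print(processed.shape)
```

The processed matrix keeps the original shape, so it can be passed to any PLSR routine in place of the raw spectra. The order of the sections relative to ascending or descending wavenumber is not specified here and would need to match the instrument's data layout.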
Figure 7d shows the scatter plot for the measured and predicted wt.% of O. With 12 LVs, the best performing model for evaluating the wt.% of O content in the ground biomass produced an R²C of 0.6674, RMSEC of 1.4461 wt.%, R²P of 0.6289, RMSEP of 1.5275 wt.%, RPD of 1.7147, and a bias of −0.4456 wt.%. Compared with full-PLSR, the multi-preprocessing 5-range method improved the model accuracy by 4.0085%. Based on Williams et al. (2019) [46] and Zornoza et al. (2008) [47], the model with the multi-preprocessing PLSR 5-range method is usable only for rough screening. Therefore, to improve the performance of the model, the inclusion of a larger number of representative samples spanning a wide range of oxygen contents is recommended. This will enable the model to capture the variability in oxygen levels across different biomass compositions. Additionally, minimizing instrumental errors through proper calibration and maintenance of the CHNS/O analyzer and thermogravimetric analyzer is essential. Exploring alternative methods for measuring the ash content in the biomass could also contribute to improving the accuracy of wt.% of O predictions.
Figure 11 shows the regression coefficient plot for wt.% of O content in the ground biomass, obtained from the multi-preprocessing PLSR 5-range method. Significant peaks were observed at wavenumbers 3650, 5155, 5675, 5952, 6330, and 7042 cm⁻¹. The peak at 3650 cm⁻¹ corresponds to the O-H functional group typically found in primary alcohols. Similarly, the peak at 5155 cm⁻¹ represents a combination of O-H stretching and HOH bending in water. The negative peak at 5675 cm⁻¹ and the positive peak at 5952 cm⁻¹ are associated with the spectra-structure of the first overtone of the fundamental stretching band of C-H, with hydrocarbons, methylene, and aromatic hydrocarbons as the material types, respectively. The peak at 6330 cm⁻¹ corresponds to the functional group of the O-H combination band observed in alcohols, such as R-C-O-H. The peak at 7042 cm⁻¹ corresponds to the first overtone of the fundamental stretching band of O-H, which is typically present in hydrocarbons and aromatic compounds [49]. A previous study by Posom and Sirisomboon [58] showed peaks at similar wavenumbers with vibration bands of C-H aromatic, O-H stretching of alcohol, O-H stretching, and HOH bending of water, which supports the findings of this study. Hence, these vibration bands may have a significant influence on the development of the model for the assessment of wt.% of O in ground biomass.
wt.% of N
The SEL of the CHNS/O elemental analyzer for evaluating the wt.% of N content in ground biomass was calculated as 0.0761 wt.%.
Table 3 shows the optimal outcomes of the PLSR-based models for predicting the wt.% of N content in ground biomass. Out of the 120 samples, 25 were identified as outliers and removed from the dataset before model development. The best prediction result for the wt.% of N in ground biomass was obtained using the multi-preprocessing PLSR 5-range method with a spectral preprocessing combination set of 4, 4, 5, 3, and 4, i.e., first derivative, first derivative, second derivative, MSC, and first derivative, respectively, applied to five equally divided sections of the range 3625.72–12,489.48 cm⁻¹.
Figure 7e shows the scatter plot of the measured versus predicted wt.% of N content in the ground sample using the multi-preprocessing PLSR 5-range method. The best performance for evaluating wt.% of N content in the ground biomass resulted in an R²C of 0.8682, RMSEC of 0.0675 wt.%, R²P of 0.8410, RMSEP of 0.0973 wt.%, RPD of 2.65, and bias of −0.0309 wt.%. Compared with full-PLSR, the multi-preprocessing 5-range method improved the model accuracy by 3.7587%. According to Williams et al. (2019) [46], the model is suitable for most applications, including research. Based on the recommendation of Zornoza et al. (2008) [47], the prediction of wt.% of N content from the multi-preprocessing PLSR 5-range method, with an RPD value of 2.65, is considered good.
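The R²-based interpretations used throughout this section can be expressed as a small lookup. The thresholds below are recalled from the commonly cited Williams guideline table and are not stated in this section, so they should be treated as assumptions and checked against [46]. The published bands leave small gaps (for example, between 0.64 and 0.66), which this sketch closes by using lower bounds only. The function name is hypothetical.

```python
def williams_interpretation(r2):
    """Map an R2 value to a guideline interpretation (assumed thresholds)."""
    bands = [
        (0.98, "usable in any application"),
        (0.92, "usable in most applications, including quality assurance"),
        (0.83, "usable with caution for most applications, including research"),
        (0.66, "usable for screening and some other approximate calibrations"),
        (0.50, "usable for rough screening"),
        (0.26, "poor correlation"),
    ]
    for lower_bound, label in bands:
        if r2 >= lower_bound:
            return label
    return "not usable in NIRS calibration"


# R2P values reported in this section for the best model of each property.
reported_r2p = {"HHV": 0.9574, "C": 0.7217, "H": 0.7678, "O": 0.6289, "N": 0.8410}
for name, r2p in reported_r2p.items():
    print(f"{name}: R2P = {r2p:.4f} -> {williams_interpretation(r2p)}")
```

Under these assumed thresholds, the lookup gives the same categories as the text: HHV and N fall in the bands for most applications, C and H in the screening band, and O in the rough-screening band.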
Figure 12 shows the regression coefficient plot for wt.% of N content in the ground biomass obtained from the multi-preprocessing PLSR 5-range method. The figure displays numerous positive and negative peaks of varying height. The high peaks at 4019, 4307, 4673, 5200, 5952, 6711, and 12,453 cm⁻¹ might significantly contribute to the evaluation of wt.% of N content. The negative peak at 4019 cm⁻¹ might correspond to a C-H stretching and C-C stretching combination, with cellulose as the material type. The positive peaks at 4307 cm⁻¹, 4673 cm⁻¹, 5200 cm⁻¹, and 5952 cm⁻¹ might be associated with the structure of a C-H stretching and CH₂ deformation combination (material: polysaccharides), C-H stretching and C=O stretching combination and C-H deformation combination (material: lipids), O-H stretching and HOH deformation combination (material: O-H molecular water), and C-H (first overtone of fundamental stretching band) and aromatic C-H (material: hydrocarbons, aromatic), respectively. The peak at 6711 cm⁻¹ might be associated with O-H (first overtone of fundamental stretching band), with starch/polymeric alcohol as the material type. Peaks of a similar nature were noticed in the range between 11,500 and 12,500 cm⁻¹, for which 12,453 cm⁻¹ is taken as a reference; it might correspond to the spectral structure of a C-H combination, with aliphatic hydrocarbons as the material type [49]. The selected regression coefficient peaks are similar to those in the study performed by Posom and Sirisomboon [58], with vibration bands of C-H stretching, C-C stretching, O-H stretching, and HOH deformation combination. This supports the findings of our study and suggests that these peaks are likely to have a vital influence on the performance of the model.
3.3. Comparison with Previous Work
Although various studies have been conducted on the development of models for evaluating HHV and ultimate analysis parameters using NIRS with a similar wavenumber range and reference mean value combined with chemometrics, no research or reports have been published to date on the application of NIRS and spectral multi-preprocessing techniques for fast-growing trees and agricultural residues of Nepalese biomass, which encompasses ten different biomass varieties.
In a previous study, Nakawajana et al. [59] evaluated the HHV of ground cassava rhizome using PLSR and achieved an R² of 0.90. Similarly, Nakawajana et al. [34], Posom et al. [3], Zhang et al. [18], and Posom et al. [5] developed PLSR models for rice husk, ground bamboo, sorghum biomass, and Leucaena leucocephala pellets, with R² values of 0.79, 0.92, 0.96, and 0.96, respectively. All of these studies scanned the biomass in diffuse reflectance mode. By contrast, the GA-PLSR model in this study, which used NIRS scanning of biomass in transflectance mode, achieved an R²P of 0.9574 for HHV, which matches or exceeds the values reported in those studies.
The PLSR-based models developed from the multi-preprocessing 5-range method for ultimate analysis showed better performance in evaluating oxygen content than the PLS model developed by Posom and Sirisomboon [58] for bamboo, which had an R²P of 0.52 for oxygen. However, the results of this study for the evaluation of C, N, and H contents were lower than those of Posom and Sirisomboon [58], who reported R²P values of 0.80 for C, 0.85 for H, and 0.97 for N for bamboo. Similarly, the models developed by Zhang et al. [18] for sorghum biomass, with R²P values of 0.96 for wt.% of C, 0.87 for wt.% of H, 0.86 for wt.% of N, and 0.83 for wt.% of O, and by Huang et al. [10] for straw, with R²P values of 0.97 for wt.% of C, 0.77 for wt.% of H, and 0.87 for wt.% of N, showed better results than the PLSR-based models in this study. Nhuchhen [60] predicted the ultimate analysis parameters of torrefied biomass from proximate analysis, resulting in R² values of 0.83 for wt.% of C, 0.70 for wt.% of H, and 0.84 for wt.% of O. The proposed model in this study showed better performance for H and O, but the performance for C content in the ground biomass could be improved.
In general, having a sufficient number of homogeneous biomass samples with a wider range of reference values and low SEL from the bomb calorimeter and CHNS/O elemental analyzer could have played a catalytic role in achieving a higher model performance when evaluating HHV and N. However, the lower model performance for evaluating C, O, and H content may be due to a lower number of relevant variables or to the selected variables in the calibration set not having a strong correlation with C, O, and H content in biomass. To enhance the model performance for evaluating C, O, and H content, the number of representative samples with a high concentration of C, O, and H should be increased, and possible contamination during sample preparation should be avoided. Additionally, the ambient environment of the laboratory should be properly controlled, and possible NIR radiation leakage during sample scanning should be rechecked. Outliers should be addressed properly, instrumental and analysis errors should be monitored carefully, and alternative modeling techniques could be considered for accurate evaluation.
Based on a comparison with previous studies, this research provides strong evidence that the model’s performance can be enhanced by conducting NIRS scanning of ground biomass in transflectance mode rather than diffuse reflectance mode, and by applying a spectral multi-preprocessing technique. To update the model for robust application, the number of ground biomass samples must be increased and validated using unknown samples.
4. Conclusions
PLSR-based models were developed and compared using NIRS in transflectance mode to evaluate the HHV and the ultimate analysis parameters, i.e., wt.% of C, H, N, and O, of ground biomass in order to assess biomass properties for energy usage. The model with the optimum performance was selected based on R²C, RMSEC, R²P, RMSEP, RPD, and bias. The models for HHV (J/g) and wt.% of N are suitable for most applications, including research, while the models for wt.% of C, wt.% of O, and wt.% of H were only fair and usable for rough screening. The performance of the fair models could be improved by incorporating more representative samples collected from various geographical locations in Nepal, thereby covering a wider statistical range of the reference values.
This study showed that the multi-preprocessing 5-range method, a novel approach to spectral preprocessing for PLSR model development, improved model accuracy compared to the traditional method of preprocessing NIR spectra across the entire wavenumber range with a single process. Therefore, this research provides a foundation in NIRS, indicating that preprocessing different sections of the wavenumber range with different preprocessing techniques could enhance model accuracy. The recommended models can serve as a reliable and non-destructive alternative method for rapidly assessing biomass properties for energy usage when employing NIRS. However, to create a robust model, it is necessary to expand the model with data from various samples and validate it with unknown samples. Adopting these models could significantly reduce the economic gap between biomass traders for energy usage and other applications. Furthermore, the research outcomes could guide academic and research institutions, policymaking think tanks, and energy companies in planning for the proper identification, management, and utilization of bio-resources to meet future energy demand and supply. The research outcomes also generate possibilities for NIR-based research to adopt or apply similar approaches.