#### 2.4.1. Input Features

A first key step in constructing a reliable regression model is to properly identify the input features (or independent variables) [30]. One of the most popular methods for identifying input features is univariate correlation regression, in which features that correlate strongly with the output (e.g., in terms of a high correlation coefficient) are fed into the estimation model [31,32]. The primary drawback of this method is that features with weak but genuine underlying correlations with the output may be excluded from the model, which tends to decrease the modeling accuracy. Beker et al. [33] argued that all features with either explicit (strong) or implicit (weak) correlations with the output variables should be included in a machine learning model in order to attain high modeling accuracy. In this regard, we assigned as model inputs those features that have been demonstrated empirically to exert potential effects on the adsorption amount, and that are less expensive and more rapid to measure experimentally than the adsorption isotherm itself. Section 3 will discuss the effect of including these "less significant" features on the model accuracy.

In this study, the adsorption isotherm is represented with a series of discrete (adsorption amount versus equilibrium pressure) data points (Figure 3). The estimation of the adsorption isotherm is thus, in fact, transformed into the estimation of adsorption amounts at given equilibrium pressures. In this way, the equilibrium pressure is an essential input variable for constructing the estimation model. An alternative approach would be to represent the isotherm with an adsorption model (e.g., a Langmuir-type model) and then correlate the adsorption model parameters (e.g., Langmuir volume and Langmuir pressure) with certain input features. However, in our preliminary evaluation this alternative failed to accurately reproduce the adsorption isotherms, probably because the Langmuir pressure correlates only weakly with input features such as coal properties and experimental conditions, as mentioned earlier.
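For concreteness, the rejected Langmuir-type alternative amounts to fitting two parameters (Langmuir volume and Langmuir pressure) to the discrete isotherm points. A minimal sketch using SciPy's `curve_fit` is given below; the pressure and adsorption values are synthetic placeholders, not data from this study.

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(p, v_l, p_l):
    """Langmuir isotherm: adsorption amount at equilibrium pressure p,
    with Langmuir volume v_l and Langmuir pressure p_l."""
    return v_l * p / (p_l + p)

# Synthetic isotherm: 8 (pressure, adsorption) points, matching the
# eight-point layout used in this study (values are illustrative only)
pressures = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0])  # MPa
amounts = langmuir(pressures, 25.0, 2.0)                         # noise-free

# Recover the two Langmuir parameters from the discrete points
(v_l, p_l), _ = curve_fit(langmuir, pressures, amounts, p0=[20.0, 1.0])
```

With real data, the fitted `p_l` would then need to be regressed against coal properties and experimental conditions, which is the step that failed in the preliminary evaluation.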

**Figure 3.** An example of an experimentally measured adsorption isotherm represented with discrete equilibrium points. P<sub>i</sub> and Y<sub>i</sub> represent the equilibrium pressure and the corresponding adsorption amount for the *i*th equilibrium point.

For the coal samples used in this study, the coal properties that exhibit strong control on methane adsorption capacity are ash (Figure 4a) and fixed carbon (Figure 4b) (with R<sup>2</sup> ≥ 0.6), which are therefore assigned as input features. The vitrinite reflectance (Figure 4c) exhibits a generally linear positive effect on the adsorption, although the correlation is relatively loose (R<sup>2</sup> = 0.36); it is also included in the input feature bank. Other factors, including inherent and equilibrium moisture, vitrinite content, and experimental temperature, show weak correlations (with R<sup>2</sup> ≤ 0.1) with the adsorption capacity (Figure 4d through 4g), but are also included in the model construction given the numerous reports of their potential effects on the adsorption isotherm (e.g., [1,4,15–18]).
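The univariate screening underlying Figure 4 amounts to fitting a simple linear model for each feature and reporting its R<sup>2</sup> against the adsorption capacity. A minimal sketch of such a screening function (using made-up values, not the study's data) could be:

```python
import numpy as np

def univariate_r2(x, y):
    """R^2 of a simple linear fit y ~ a*x + b, used to screen one
    candidate feature x against the output y at a time."""
    a, b = np.polyfit(x, y, 1)              # least-squares slope and intercept
    resid = y - (a * x + b)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Perfectly linear feature -> R^2 of 1
x_strong = np.arange(10.0)
r2_strong = univariate_r2(x_strong, 2.0 * x_strong + 1.0)

# Weakly related feature -> small R^2
x_weak = np.array([0.0, 1.0, 2.0, 3.0])
r2_weak = univariate_r2(x_weak, np.array([1.0, 0.0, 1.0, 0.0]))
```

As the section notes, thresholding on such R<sup>2</sup> values alone would discard the weakly correlated features, which is why they are retained here despite low scores.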

As mentioned earlier, our goal is to develop an estimation model based on data that are less expensive and less time-consuming to obtain, so that adsorption isotherms can be rapidly estimated with reasonable accuracy. Therefore, other coal properties that may exert potential influence on methane adsorption isotherms, such as micro-pore surface area/volume [34–36] and surface functional groups [13,37] of coals, are not considered, because such information requires experimental endeavors that inevitably bring in additional expenses. Moreover, experimental determination of the pore characteristics is rather complicated, relying on techniques such as gas (N<sub>2</sub>/CO<sub>2</sub>) adsorption [38], focused ion beam scanning electron microscopy (FIB-SEM) [39], and small-angle neutron scattering (SANS) [40], which require specialized experimental apparatus and may be even more time-consuming than the measurement of adsorption isotherms.

In summary, the input features for constructing the estimation model for the adsorption isotherm are: coal properties (ash, fixed carbon, inherent moisture, vitrinite, and vitrinite reflectance) and experimental conditions (equilibrium pressure, equilibrium moisture, and temperature).

#### 2.4.2. Determination of Optimal GBDT Hyperparameters

Prior to conducting GBDT regressions, the whole database, comprising 165 samples and their adsorption isotherms, was randomly divided into three sub-sets, namely the training (99 samples, 60%), validation (33 samples, 20%), and testing (33 samples, 20%) sets (Figure 5). The training set was used for training the GBDT network, while the validation set was used for monitoring the performance and for determining the optimal model parameters (addressed in the following paragraph). The testing set was assumed to be "unseen" during the model construction process and was used for testing the generalization capability of the constructed regression model. It should be noted again that each adsorption isotherm is represented with eight (adsorption amount versus equilibrium pressure) discrete data points; thus, the training, validation, and testing sets are, in effect, composed of 99 × 8 = 792, 33 × 8 = 264, and 33 × 8 = 264 data points, respectively (Figure 5).
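Because each sample contributes eight data points, the split is performed at the sample (isotherm) level rather than the data-point level, so that all eight points of one coal end up in the same subset. A minimal sketch of such a split (seed chosen arbitrarily for reproducibility) is:

```python
import numpy as np

rng = np.random.default_rng(42)          # arbitrary fixed seed

n_isotherms, n_points = 165, 8           # 165 samples, 8 equilibrium points each
order = rng.permutation(n_isotherms)     # shuffle sample indices, not data points

train_ids = order[:99]                   # 60% -> 99 isotherms = 792 data points
valid_ids = order[99:132]                # 20% -> 33 isotherms = 264 data points
test_ids = order[132:]                   # 20% -> 33 isotherms = 264 data points
```

The per-point feature matrices are then assembled by expanding each selected isotherm into its eight (pressure, adsorption amount) rows.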

**Figure 4.** Correlation analysis of input features with the adsorption capacity represented with Langmuir volume. (**a**) ash (a.r.); (**b**) fixed carbon (d.a.f.); (**c**) vitrinite reflectance; (**d**) inherent moisture (a.r.); (**e**) equilibrium moisture; (**f**) vitrinite (m.m.f) and (**g**) temperature.

**Figure 5.** Illustration of the database structure and division of the database into the training, validation, and testing sets. P—equilibrium pressure; A—ash; FC—fixed carbon; V—vitrinite; Romax—vitrinite reflectance; IM—inherent moisture; EM—equilibrium moisture; T—temperature; Y—adsorption amount. Subscript *j* denotes the *j*th sample; superscript *i* on "P" and "Y" denotes the *i*th equilibrium point on the adsorption isotherm.

The empirical results from [41,42] demonstrate that the accuracy and generalization capability of the GBDT can be significantly influenced by three parameters, namely the number of estimators, the shrinkage factor, and the maximum tree depth. As such, these parameters should be optimized in order to ensure the accuracy and robustness of the GBDT. In this study, the optimal values for the three parameters were determined through the exhaustive grid search method [43]. That is, all possible combinations of the parameter values were run sequentially, and the optimal parameterization was determined to be the one that results in the lowest root mean squared error (RMSE) for the validation set. Previous studies [41,42] suggested that a satisfactory performance of GBDT can be obtained with relatively small shrinkage factors (<0.1) and low-level tree complexity (tree depth < 6). As such, the shrinkage factor was varied from 0.005 to 0.105 with a step of 0.01, and the maximum tree depth was varied from two to seven with a step of one. The number of estimators is a problem-dependent hyperparameter, which was set to vary from 500 to 5000 with a step of 500.
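The exhaustive grid search described above can be sketched with scikit-learn's `GradientBoostingRegressor` (the paper does not state which GBDT implementation was used, so this is an assumption). The data here are random placeholders with the study's shapes, and the grid is a coarsened subset of the stated ranges to keep the sketch fast.

```python
import itertools
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Placeholder data: 8 input features per point, 792 training and
# 264 validation data points (values are random, for illustration only)
rng = np.random.default_rng(0)
X_train, y_train = rng.random((792, 8)), rng.random(792)
X_valid, y_valid = rng.random((264, 8)), rng.random(264)

# Coarsened subset of the Section 2.4.2 grid
grid = itertools.product(
    [500],           # number of estimators (paper: 500 to 5000, step 500)
    [0.005, 0.105],  # shrinkage factor (paper: 0.005 to 0.105, step 0.01)
    [2, 7],          # maximum tree depth (paper: 2 to 7, step 1)
)

best_rmse, best_params = float("inf"), None
for n_est, shrink, depth in grid:
    model = GradientBoostingRegressor(n_estimators=n_est, learning_rate=shrink,
                                      max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_valid, model.predict(X_valid)) ** 0.5
    if rmse < best_rmse:                 # keep the lowest validation RMSE
        best_rmse, best_params = rmse, (n_est, shrink, depth)
```

With the full grid, this loop simply runs all 11 × 6 × 10 combinations instead of the four shown here.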

#### 2.4.3. Evaluation Metrics

The performance of the GBDT estimation was quantitatively evaluated through four metrics, namely average absolute error (AAE), average relative error (ARE), root mean square error (RMSE), and the coefficient of determination (R<sup>2</sup>). The definitions for these metrics are:

$$\text{AAE} = \frac{1}{N} \sum\_{i=1}^{N} |y\_i - f\_i| \tag{5}$$

$$\text{ARE} = \frac{1}{N} \sum\_{i=1}^{N} \left| \frac{y\_i - f\_i}{y\_i} \right| \tag{6}$$

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum\_{i=1}^{N} \left( y\_i - f\_i \right)^2} \tag{7}$$

$$R^2 = 1 - \frac{\sum\_{i=1}^{N} (y\_i - f\_i)^2}{\sum\_{i=1}^{N} (y\_i - \overline{y})^2} \tag{8}$$

where *y* and *f* are the measured and estimated adsorption amounts, respectively; $\overline{y}$ is the mean of the measured adsorption amounts; and *N* is the number of data points.
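The four metrics are straightforward to implement; a compact sketch following their standard definitions (the square root in RMSE, and R<sup>2</sup> as one minus the ratio of residual to total sum of squares) is:

```python
import numpy as np

def evaluate(y, f):
    """AAE, ARE, RMSE and R^2 between measured y and estimated f."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    aae = np.mean(np.abs(y - f))                     # average absolute error
    are = np.mean(np.abs((y - f) / y))               # average relative error
    rmse = np.sqrt(np.mean((y - f) ** 2))            # root mean square error
    r2 = 1.0 - np.sum((y - f) ** 2) / np.sum((y - y.mean()) ** 2)
    return aae, are, rmse, r2

# Small worked example
aae, are, rmse, r2 = evaluate([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Note that ARE is undefined when a measured value is zero, which is not an issue here since adsorption amounts at the recorded equilibrium pressures are positive.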

#### *2.5. Comparison with BP-ANN and SVM*

The BP-ANN and SVM are powerful supervised machine learning algorithms that have been successfully applied to nonlinear regression problems in a variety of fields [32,44–46]. One of the most popular versions of the BP-ANN is the multilayer perceptron network (MLPN), which comprises one input layer, one or more hidden layers, and one output layer. The training of an MLPN is, in essence, an iterative process of updating the weights and biases of the nodes using the back-propagation algorithm in order to minimize an error function. The basic philosophy behind the SVM is to convert the nonlinear regression problem in the original space into a linear approximation in a higher-dimensional feature space by minimizing a regularized loss function. Mathematical details on the BP-ANN and SVM have been extensively addressed previously (see, e.g., [24,39]) and are therefore not repeated in this paper.

The LIBSVM package [47] and the neural network module implemented in MATLAB (v2019) were used for conducting the SVM and BP-ANN regressions, respectively. The data points and input variables are identical to those in the GBDT regression. A BP-ANN with three layers (one input, one hidden, and one output layer) has proven capable of approximating any continuous function to arbitrary accuracy [32], and was therefore adopted in this study. It should be noted that (i) the performance of a BP-ANN can be significantly influenced by the number of neurons in the hidden layer [44] and (ii) for an SVM with a radial basis function kernel (RBF, the kernel most frequently used for regression), the regression accuracy is associated with the regularization and error-goal parameters [48,49]. In order to attain a fair comparison, parameters that may affect the BP-ANN and SVM performance were tuned and optimized using the exhaustive grid search, in a similar manner to the GBDT. Table 2 shows the optimal key model parameters for the BP-ANN and SVM.
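For readers without LIBSVM or MATLAB, the SVM tuning step can be approximated with scikit-learn's `SVR`, which exposes the same RBF kernel with a regularization parameter `C` and an error-goal parameter `epsilon`. The data and the grid values below are placeholders, not those of Table 2.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVR

# Placeholder data with the study's shapes (random, for illustration only)
rng = np.random.default_rng(0)
X_train, y_train = rng.random((792, 8)), rng.random(792)
X_valid, y_valid = rng.random((264, 8)), rng.random(264)

# Hypothetical grid over the RBF-SVM regularization (C) and error goal (epsilon)
best_rmse, best_params = float("inf"), None
for params in ParameterGrid({"C": [1.0, 10.0, 100.0], "epsilon": [0.01, 0.1]}):
    model = SVR(kernel="rbf", **params).fit(X_train, y_train)
    rmse = float(np.sqrt(np.mean((model.predict(X_valid) - y_valid) ** 2)))
    if rmse < best_rmse:                 # keep the lowest validation RMSE
        best_rmse, best_params = rmse, params
```

The BP-ANN tuning proceeds analogously, looping over the number of hidden-layer neurons and keeping the configuration with the lowest validation RMSE.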


**Table 2.** Modeling parameters for the back-propagation artificial neural network (BP-ANN) and support vector machine (SVM).
