#### *2.3. Data Description and Preprocessing*

Conventional log data of DR, GR, DT, and RHOB and the corresponding actual (laboratory-measured) TOC values collected from the Barnett shale formation were used to train the four AI models considered in this study. Before training, all data were pre-processed to remove unrealistic values and outliers. After pre-processing, 838 data points of the different well logs and their corresponding actual TOC values were found to be valid for model building. Training the TSK-FIS, M-FIS, FNN, and SVM models with 545, 545, 587, and 671 of these data points, respectively, was found to optimize their performance in predicting the TOC. The number of training data points was selected through the optimization process discussed later in this paper.
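The criteria used to flag unrealistic values and outliers are not stated here, so the following is only a minimal sketch of one common approach. It assumes that non-positive readings (including null markers such as -999.25) are unrealistic and that outliers are values outside Tukey's fences (1.5 times the interquartile range). The function names, the thresholds, and the sample values are all hypothetical and are not taken from this study.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return Tukey's fences (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean(rows, columns, k=1.5):
    """Drop rows with non-positive readings, then rows outside Tukey's fences.

    Both rules are illustrative assumptions, not the criteria of the study.
    """
    physical = [r for r in rows if all(r[c] > 0 for c in columns)]
    bounds = {c: iqr_bounds([r[c] for r in physical], k) for c in columns}
    return [
        r for r in physical
        if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)
    ]


# Made-up bulk density readings (g/cm3), for illustration only.
raw_rows = [
    {"RHOB": 2.40}, {"RHOB": 2.45}, {"RHOB": 2.50}, {"RHOB": 2.55},
    {"RHOB": 2.60}, {"RHOB": -999.25}, {"RHOB": 9.00},
]
valid_rows = clean(raw_rows, ["RHOB"])
print(len(raw_rows), "raw points ->", len(valid_rows), "valid points")
```

In this toy example the null marker is removed by the positivity check and the reading of 9.00 by the upper fence, leaving the five plausible densities.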

Table 1 compares the statistical features of the data used to train the four AI models developed in this study. These statistical parameters must be taken into account when the AI models are applied to estimate the TOC from new data. In this study, before testing and validating the developed AI models, the statistical parameters of the testing and validation data were determined to ensure that these data fall within the range of the training data summarized in Table 1.
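The range check described above can be sketched as follows. This is a minimal illustration that assumes each data point is stored as a dictionary keyed by log name and that "within range" means between the minimum and maximum of the training data for every log. The helper names and the numerical values are hypothetical.

```python
def feature_ranges(rows):
    """Return {log_name: (min, max)} over a list of dictionary records."""
    names = rows[0].keys()
    return {
        name: (min(r[name] for r in rows), max(r[name] for r in rows))
        for name in names
    }


def out_of_range(train_rows, new_rows):
    """List (row_index, log_name, value) for each new value outside the training range."""
    ranges = feature_ranges(train_rows)
    issues = []
    for i, row in enumerate(new_rows):
        for name, (low, high) in ranges.items():
            if not low <= row[name] <= high:
                issues.append((i, name, row[name]))
    return issues


# Made-up log readings, for illustration only.
train_rows = [
    {"DR": 10.0, "GR": 80.0, "DT": 70.0, "RHOB": 2.40},
    {"DR": 200.0, "GR": 150.0, "DT": 95.0, "RHOB": 2.65},
]
test_rows = [
    {"DR": 50.0, "GR": 100.0, "DT": 80.0, "RHOB": 2.50},
    {"DR": 50.0, "GR": 100.0, "DT": 80.0, "RHOB": 2.80},
]
issues = out_of_range(train_rows, test_rows)
print(issues)
```

An empty list means every new data point lies inside the training range; any reported entry marks a point for which the models would be extrapolating.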


**Table 1.** Statistical features of the data used to train the Takagi-Sugeno-Kang fuzzy inference system (TSK-FIS), Mamdani fuzzy inference system (M-FIS), functional neural network (FNN), and support vector machine (SVM) models.

The relative importance of the selected training well logs for predicting the TOC values was then studied. Figure 2 compares the relative importance of the different conventional well logs used to train the four AI models with respect to the laboratory-measured TOC values. As indicated in Figure 2, for the data used to train all four AI models, TOC is strongly dependent on RHOB, while it is moderately related to DR, DT, and GR.
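How the relative importance was computed is not specified in this section. One common choice, assumed here purely for illustration, is the Pearson correlation coefficient between each well log and the measured TOC, with logs ranked by the absolute value of the coefficient. The function names and the sample values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_logs(logs, toc):
    """Return [(log_name, r)] sorted by |r|, strongest relationship first."""
    scores = {name: pearson(values, toc) for name, values in logs.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up values, for illustration only (not data from this study).
toc = [1.0, 2.0, 3.0, 4.0, 5.0]
logs = {
    "GR": [80.0, 120.0, 90.0, 140.0, 130.0],
    "RHOB": [2.65, 2.60, 2.55, 2.50, 2.45],
}
ranking = rank_logs(logs, toc)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by the absolute value keeps a strong negative relationship, such as the one between bulk density and TOC in this toy data, at the top of the list.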

**Figure 2.** The relative importance of the data used to train (**a**) Takagi-Sugeno-Kang fuzzy inference system (TSK-FIS), (**b**) Mamdani fuzzy inference system (M-FIS), (**c**) functional neural network (FNN), and (**d**) support vector machine (SVM) models.

#### *2.4. AI Model Development*

Four AI models, namely TSK-FIS, M-FIS, FNN, and SVM, were developed in this study to estimate the TOC using the conventional well logs of DR, DT, GR, and RHOB. These four well logs were selected based on their relative importance to the core-measured TOC, as discussed earlier and shown in Figure 2. The selection is also consistent with their published relationships with TOC. For example, DR is believed to be affected by the presence of kerogen in the source rock [41]; DT decreases with the increase in the TOC [42]; several studies have confirmed that GR could significantly enhance TOC prediction [41,43], although others consider this relationship controversial [44,45]; and RHOB decreases as the kerogen content, and hence the organic matter in the formation, increases [7]. For these reasons, the four conventional well logs of DR, DT, GR, and RHOB were used to develop the TOC models in this study.

All AI models were optimized for their design parameters and the training-to-testing data ratio. Table 2 summarizes the optimized design parameters of the AI models.
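As a quick arithmetic check, the training counts reported in Section 2.3 (545, 545, 587, and 671 out of 838 valid data points) can be converted into training fractions. The sketch below assumes that the points not used for training are held out from the same 838-point set; the variable names are illustrative.

```python
total_points = 838
training_points = {"TSK-FIS": 545, "M-FIS": 545, "FNN": 587, "SVM": 671}

# Fraction of the valid data used for training, and the number of points held out.
split = {}
for model, n_train in training_points.items():
    fraction = n_train / total_points
    n_held_out = total_points - n_train
    split[model] = (fraction, n_held_out)
    print(f"{model}: {n_train}/{total_points} = {fraction:.1%} training, "
          f"{n_held_out} points held out")
```

The counts correspond to training fractions of roughly 65% for both fuzzy models, 70% for the FNN, and 80% for the SVM.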


**Table 2.** The optimum design parameters for TSK-FIS, M-FIS, FNN, and SVM models to estimate the TOC.

#### *2.5. Evaluation Criterion*

The predictability of the developed AI models in estimating the TOC for the training, testing, and validation data sets was evaluated based on the absolute average percentage error (AAPE; Equation (8)), the correlation coefficient (R; Equation (9)), the coefficient of determination (Equation (10)), and a visual check of the actual and predicted TOC.

$$AAPE = \frac{1}{N} \sum_{i=1}^{N} \left( \left| \frac{(RF_a)_i - (RF_m)_i}{(RF_a)_i} \right| \times 100 \right) \tag{8}$$

$$R = \frac{\sum_{i=1}^{N} \left[ \left( (RF_a)_i - \overline{RF_a} \right) \times \left( (RF_m)_i - \overline{RF_m} \right) \right]}{\sqrt{\sum_{i=1}^{N} \left[ (RF_a)_i - \overline{RF_a} \right]^2 \sum_{i=1}^{N} \left[ (RF_m)_i - \overline{RF_m} \right]^2}} \tag{9}$$

$$R^2 = \left[ \frac{\sum_{i=1}^{N} \left[ \left( (RF_a)_i - \overline{RF_a} \right) \times \left( (RF_m)_i - \overline{RF_m} \right) \right]}{\sqrt{\sum_{i=1}^{N} \left[ (RF_a)_i - \overline{RF_a} \right]^2 \sum_{i=1}^{N} \left[ (RF_m)_i - \overline{RF_m} \right]^2}} \right]^2 \tag{10}$$

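where, in Equations (8) to (10), N is the number of data points, RF denotes the evaluated quantity (the TOC in this study), the subscripts a and m denote its actual and estimated values, respectively, and an overbar denotes the mean.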
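The three criteria can be sketched in a few lines of Python. This is a minimal illustration of Equations (8) to (10), with the AAPE normalized by the actual value; the function names and the sample TOC values are hypothetical and are not results from this study.

```python
import math


def aape(actual, predicted):
    """Absolute average percentage error, Equation (8)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - m) / a) for a, m in zip(actual, predicted))
    return total * 100.0 / len(actual)


def correlation_coefficient(actual, predicted):
    """Correlation coefficient R, Equation (9)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_m = sum(predicted) / n
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_m = sum((m - mean_m) ** 2 for m in predicted)
    return cov / math.sqrt(var_a * var_m)


def coefficient_of_determination(actual, predicted):
    """Coefficient of determination, Equation (10): the square of R."""
    return correlation_coefficient(actual, predicted) ** 2


# Made-up TOC values (wt%), for illustration only.
toc_actual = [2.0, 4.0, 5.0, 8.0]
toc_predicted = [2.2, 3.6, 5.5, 8.0]

print(f"AAPE = {aape(toc_actual, toc_predicted):.2f}%")
print(f"R    = {correlation_coefficient(toc_actual, toc_predicted):.3f}")
print(f"R^2  = {coefficient_of_determination(toc_actual, toc_predicted):.3f}")
```

Because Equation (8) divides by the actual value, the AAPE is undefined for samples with zero measured TOC; such points would need to be excluded or handled separately.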

#### *2.6. Application Examples to Barnett and Devonian Shale*

The predictability of the four AI models considered in this study was evaluated using data from two different depositional environments. The first formation is the Mississippian Barnett shale, which was earlier considered by the United States Energy Information Administration to be the main hydrocarbon source rock in the FWB [3,46]. In 2011, the proven reserves of this formation were more than 31 trillion cubic feet (TCF), with a cumulative gas production of 8.0 TCF. Several studies, such as Pollastro et al. [46], Romero-Sarmiento et al. [47], and Thomas [48], reported general geologic information about the Barnett shale. The second formation is the Devonian shale in the WCSB, which is an organic-rich source rock in the Devonian conventional hydrocarbon system [49]. The oil and gas in place in this formation are 61.7 billion barrels and 443 TCF, respectively. According to recent production data, this shale is rich in liquids [50].

### **3. Results and Discussion**
