Decision Tree-Based Ensemble Model for Predicting National Greenhouse Gas Emissions in Saudi Arabia

Rahman, Muhammad Muhitur; Shafiullah, Md; Alam, Md Shafiul; Rahman, Mohammad Shahedur; Alsanad, Mohammed Ahmed; Islam, Mohammed Monirul; Islam, Md Kamrul; Rahman, Syed Masiur

doi:10.3390/app13063832


by Muhammad Muhitur Rahman ¹,*, Md Shafiullah ²,*, Md Shafiul Alam ³, Mohammad Shahedur Rahman ⁴, Mohammed Ahmed Alsanad ⁵, Mohammed Monirul Islam ⁶, Md Kamrul Islam ¹ and Syed Masiur Rahman ³

¹ Department of Civil and Environmental Engineering, College of Engineering, King Faisal University, Al-Ahsa 31982, Saudi Arabia

² Interdisciplinary Research Center for Renewable Energy and Power Systems, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia

³ Applied Research Center for Environment & Marine Studies, King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia

⁴ Civil Engineering Department, College of Engineering, Imam Mohammad Ibn Saud Islamic University, Riyadh 13318, Saudi Arabia

⁵ Department of Environment and Agricultural Natural Resources, College of Agricultural and Food Sciences, King Faisal University, Al-Ahsa 31982, Saudi Arabia

⁶ Department of Biomedical Sciences, College of Clinical Pharmacy, King Faisal University, Al-Ahsa 31982, Saudi Arabia

* Authors to whom correspondence should be addressed.

Appl. Sci. 2023, 13(6), 3832; https://doi.org/10.3390/app13063832

Submission received: 27 February 2023 / Revised: 12 March 2023 / Accepted: 15 March 2023 / Published: 17 March 2023

(This article belongs to the Special Issue Computing and Artificial Intelligence for Visual Data Analysis II)


Abstract

Greenhouse gas (GHG) emissions must be precisely estimated in order to predict climate change and achieve environmental sustainability in a country. GHG emissions are estimated using empirical models, but this is difficult since it requires a wide variety of data and specific national or regional parameters. In contrast, artificial intelligence (AI)-based methods for estimating GHG emissions are gaining popularity. While progress is evident in this field abroad, the application of an AI model to predict greenhouse gas emissions in Saudi Arabia is in its early stages. This study applied decision trees (DT) and their ensembles to model national GHG emissions. Three AI models, namely bagged decision tree, boosted decision tree, and gradient boosted decision tree, were investigated. Results of the DT models were compared with the feed forward neural network model. In this study, population, energy consumption, gross domestic product (GDP), urbanization, per capita income (PCI), foreign direct investment (FDI), and GHG emission information from 1970 to 2021 were used to construct a suitable dataset to train and validate the model. The developed model was used to predict Saudi Arabia’s national GHG emissions up to the year 2040. The results indicated that the bagged decision tree has the highest coefficient of determination (R²) performance on the testing dataset, with a value of 0.90. The same method also has the lowest root mean square error (0.84 GtCO₂e) and mean absolute percentage error (0.29 GtCO₂e), suggesting that it exhibited the best performance. The model predicted that GHG emissions in 2040 will range between 852 and 867 million tons of CO₂ equivalent. In addition, Shapley analysis showed that the importance of input parameters can be ranked as urbanization rate, GDP, PCI, energy consumption, population, and FDI. 
The findings of this study will aid decision makers in understanding the complex relationships between the numerous drivers and the significance of diverse socioeconomic factors in defining national GHG inventories. The findings will enhance the tracking of national GHG emissions and facilitate the concentration of appropriate activities to mitigate climate change.

Keywords: machine learning; bagged decision tree; boosted decision tree; gradient boosted decision tree; greenhouse gas emission

1. Introduction

Sunlight that reaches Earth is mostly deflected into outer space. Part of it, however, is absorbed by the atmosphere, leading to the “greenhouse effect” and an increase in global temperatures. Airborne substances known as greenhouse gases (GHG) are responsible for the trapping of heat near the Earth’s surface [1,2,3]. Although carbon dioxide (CO₂) accounts for the majority of global greenhouse gas emissions (about 73%), other greenhouse gases such as methane (CH₄), nitrous oxide (N₂O), and fluorinated gases also contribute significantly (19%, 5%, and 3%, respectively) [4]. Carbon dioxide is released into the atmosphere via the combustion of fossil fuels (coal, oil, and gas), solid waste, trees, and other organic matter, as well as by specific chemical processes (e.g., manufacturing industries) [5,6]. Global emissions of major GHG, such as carbon dioxide, methane, nitrous oxide, and others are increasing steadily [7]. As a result of human activity, GHG emissions are on the rise, leading to a major climatic catastrophe on Earth [8].

Previous research has found that population, economic development, gross domestic product, electricity consumption, industrialization, soil disturbance and erosion, urbanization, and other factors influence GHG emissions [9,10,11]. Across Africa, Europe, Asia, North America, and the Middle East, per capita GDP was reported to be more responsible for an increase in CO₂ emissions than the overall population or energy intensity [12]. Rapid GDP growth and population growth have caused Middle Eastern countries to increase their energy consumption. A previous study on 58 countries revealed a positive correlation between GDP and CO₂ emissions [13]. Another study in G7 countries, however, disagreed regarding such a relationship [11]. As of 2014, the Middle East’s energy demand comprised 5% of the world’s energy demand, nearly quadrupling from 1970s levels [9]. As a result, residential energy demand rose from 4 percent of overall consumption in 1971 to 24 percent in 2010 [14]. In 2014, major GCC countries accounted for over 77% of the Middle East’s total energy consumption [9]. Emissions of greenhouse gases from the Arab world reached roughly 0.5 GtCO2eq in 2010, with 60% of that coming from Iran, Kuwait, Saudi Arabia, and the United Arab Emirates [9]. Additionally, the electricity production industry in these four nations accounts for roughly 76% of all CO₂ emissions. Mezghani and Haddad [15] discovered that GDP levels had a positive effect on electricity use and CO₂ emissions. Al Khathlan et al. [16] reported a positive relationship between per capita income and emissions. However, they also reported that switching the economy to gas from oil would decrease GHG emissions.

Carbon dioxide emissions in the Middle East indicated that total energy consumption, GDP, FDI net inflows, and total commerce are significant long-term contributors to CO₂ emissions [17,18]. In addition, it has been reported that these variables exhibit a positive, bidirectional, long- and short-term causal relationship [19,20]. Since FDI net inflows raise CO₂ emissions, it is crucial to study the conditions for foreign investment to improve environmental protection and boost technology transfer through foreign enterprises to mitigate environmental degradation. Baloch et al. [10] analyzed the relationship between financial instability and CO₂ emissions in Saudi Arabia’s economy for the period 1971–2016. Financial instability was reported to have had a negligible effect on Saudi Arabia’s CO₂ emissions. However, energy usage has a negative influence on environmental quality since it generates a substantial quantity of CO₂ emissions. Both oil-based and non-oil-based GDPs contribute to an enormous quantity of CO₂ emissions. All these fundamental variables are reported to have bidirectional causation. A recent study from Malaysia reported associations between CO₂ release, economic growth, renewable energy, urbanization, and agriculture [21]. Furthermore, this analysis also demonstrates that the relationship between CO₂ release and economic growth follows the environmental Kuznets curve (EKC). This research reveals that, despite a rise in CO₂ emissions and economic development, CO₂ emissions eventually decline after a certain level of growth is reached. Solarin et al. [22] studied the impact of biomass energy use on CO₂ emissions in 80 developed and developing nations. The findings of the study indicate that a rise in GDP, urbanization rate, and population increases CO₂ emissions. In wealthy nations, financial growth, foreign direct investment, and openness lower CO₂ emissions, whereas the converse is true for poor nations. 
It was also reported in that study that the 1997 international accord reached by industrialized nations in the Japanese city of Kyoto, dubbed the “Kyoto Protocol” has a negative impact on CO₂ emissions and that an environmental Kuznets curve (EKC) exists. Using hierarchical regression modeling approaches, Mendonça et al. [23] studied the effect of clean energy, population, and GDP on CO₂ emissions for fifty major economies from 1990 to 2015. The authors found that economic and population growth are associated with a rise in CO₂ emissions. Additionally, GHG emissions are inversely proportional to consumption reduction, energy efficiency, and investment in clean technology [23].

A number of studies have used a variety of machine-learning techniques to make projections about greenhouse gas emissions. Kalra et al. [24] used linear regression, DT, random forest (RF), and artificial neural networks (ANN) to model the correlation between global temperature and the atmospheric levels of nitrous oxide, methane, and carbon dioxide over 65 years. Based on the mean square error, the authors found that ANN was superior to the other algorithms. Three distinct ML algorithms were employed by Magazzino et al. [25] to study the correlation between solar and wind power generation, coal production, GDP per capita, and GHG emissions in China, the USA, and India. Based on the results of the ML simulations, it is clear that a complete transition from fossil to renewable energy sources is necessary to successfully reduce GHG emissions. Javanmard and Ghaderi [26] used data from 1990 to 2018 to train nine algorithms (ANN, AR, ARIMA, SARIMA, SARIMAX, RF, SVR, KNN, and LSTM) that forecasted GHG emissions in Iran to 2028. The authors found that an optimization model led to a 13–32% improvement in detection performance. To estimate carbon dioxide emissions from Group 20 countries, Mardini et al. [27] employed an adaptive neuro-fuzzy inference method and an ANN network to build multi-stage prediction models with clustered data. The growth of the economy and the use of energy were two factors taken into account by the authors. With regard to estimating future CO₂ emissions, both the neuro-fuzzy inference system and the ANN network performed better than multiple linear regression.

Using annual data from 1970 to 2017, Magazzino and Mele [28] built a novel ML method called causal direction from dependence (D2C) to examine the interplay among the release of CO₂, energy usage, and GDP in Russia. The authors discovered evidence for causal relationships among the three factors. Ahmed et al. [29] used three ML algorithms (support vector machine, ANN, and long short-term memory (LSTM)) to determine how much of an effect energy consumption has on GHG emissions in China, India, the United States of America, and Russia. The results showed a decline in future GHG emissions for the USA and Russia but an increase for China and India. In another study using the LSTM model, it was shown that energy usage, financial sector development, GDP, population, and clean energy all had significant impacts on China and India’s greenhouse gas emissions [30]. Qin and Gong [31] used ML models, including decision trees and random forests, to analyze what drives China’s CO₂ output. The authors noted that GDP, government spending, and overseas investment can all have an impact on CO₂ output. The feature importance of GDP, general budget revenue, and foreign investment, as calculated by random forest, was 0.45, 0.12, and 0.08, respectively. Li et al. [32] used the k-nearest neighbors (KNN) model, ensemble models, and ANN to look at the connection between CO₂ emissions and China’s economic growth, industry trends, urban development, R&D expenditure, actual usage of foreign investment, and the growth rate of energy consumption from 2000 to 2018. Compared to the other models employed in that research, the KNN model made the most accurate predictions.

Using a data set spanning from 1990 to 2018, Bakay and Ağbulut [33] used deep learning (DL), support vector machine (SVM), and ANN methods to predict greenhouse gas emissions from Turkey’s power generation industry. With R² values between 0.861 and 0.998, the authors judged all the models to be adequate for estimating greenhouse gas emissions. Ağbulut [34], in a separate study conducted in Turkey, used DL, SVM, and ANN to successfully forecast GHG emissions from the transportation sector. To better estimate soil GHG emissions from an agricultural area, Hamrani et al. [35] analyzed three types of ML regression models, namely classical regression, shallow learning, and DL. The authors verified that the LSTM model had the highest R² and the lowest root mean square error (RMSE) of all the ML models tested. The cyclic and periodic fluctuations in CO₂ fluxes were well replicated by the classical regression models (RF, SVM, and LASSO), but the peak values of N₂O fluxes were not reliably predicted. They found that shallow ML was the most responsive to hyperparameter adjustment, but it was also the least successful at estimating GHG fluxes compared to the other ML models they studied.

Kumari and Singh [36] employed six models to forecast India’s greenhouse gas emissions: an ARIMA model, a SARIMAX model, a Holt–Winters model, a linear regression model, an RF model, and a DL-based long short-term memory (LSTM) model. They found that LSTM, SARIMAX, and Holt–Winters (H-W) were the three most reliable models for projecting India’s GHG emissions across the indicators used. Amarpuri et al. [37] also looked into GHG emissions in India, but they used a DL hybrid model involving a convolution neural network and long short-term memory network (CNN-LSTM) to make their predictions. To anticipate the total CO₂ emissions from rice crops in India from 2019 to 2025, Singh et al. [38] examined and contrasted four prediction models: RF, SARIMAX, H-W, and SVR. Compared to SARIMAX and RF, the MAPE and MSE values for the H-W and the SVR were found to be lower. These findings suggest that H-W and SVR are reliable models for estimating paddy crops’ cumulative CO₂ emissions.

As was noted in the above discussion, a range of machine learning strategies have been applied in several different research projects to generate forecasts regarding the emissions of greenhouse gases. However, decision tree-based machine learning requires more focus. While DT-based ML algorithms have been used for energy management in electric city buses [39], internal temperature monitoring in a greenhouse [40], variable selection and its application to gas-liquid two-phase CO₂ flow measurements [41], identifying the effects of the built environment’s characteristics on driving distance [42], predicting soil drainage classes [43], developing a CO₂ emission benchmark [44], GHG emission reduction in a shared micro-mobility system [45], and water table modeling of shallow groundwater [46], its use in GHG emission prediction requires more attention. In Saudi Arabia in particular, DT-based ML has been applied for tracking CO₂ fronts in carbonate reservoirs [47]; finding the relationship between disaggregated energy use, economic growth, and environmental quality [48]; forecasting energy production [49]; groundwater mapping [50]; air quality index creation [51]; and the prediction of power in a wind farm [52]. However, to the best of the authors’ knowledge, DT-based ML has not been used to predict GHG emissions in Saudi Arabia. Here lies this paper’s primary contribution.

The remainder of the article is organized as follows. Section 2 presents the method, covering model development, performance statistics, and the forecasting strategy. Section 3 presents the results of the GHG emission projection models. Section 4 concludes the paper.

2. Materials and Methods

2.1. Data Source

This research used emission data from the PRIMAP-hist dataset [53]. The PRIMAP-hist dataset combines several existing data sources to cover every country and Kyoto gas from 1750 to 2019, including all UNFCCC (United Nations Framework Convention on Climate Change) members and the vast majority of non-UNFCCC regions. The data resolve the primary IPCC (Intergovernmental Panel on Climate Change) 2006 categories. For CO₂, CH₄, and N₂O, subsector statistics are available for energy, industrial processes, and agriculture. The PRIMAP-hist dataset compiles both country-reported and third-party data; in this paper, only country-reported data were used for Saudi Arabia. The PRIMAP-hist GHG emission dataset can be obtained from [53]. The population data were taken from the World Bank repository, where country-wise population data are available from 1970 to 2021 [54]. However, since the GHG data are only available up to 2019, the population data up to the same year were used to calculate country-wise yearly per capita GHG emissions. In addition to population data, data on energy consumption, GDP, urbanization, per capita income (PCI), and foreign direct investment (FDI) were collected and processed from the World Bank repository [54]. To illustrate the trends in the data on Saudi Arabia, a scatter plot is provided in Figure 1. Most of the variables show an upward trend from 1970 to 2021. A correlation matrix plot is provided in Figure 2 to show the correlations among the variables. It is evident from the figure that GHG emissions are highly correlated with population, energy, and GDP and moderately correlated with the other parameters.

2.2. Model Development

This study implements a supervised learning technique, the decision tree (DT), to predict GHG emissions in Saudi Arabia. A DT is built on the binary tree concept: the algorithm recursively divides the dataset into smaller subsets at each node, and the outcome is given at a leaf node [55]. The trained model takes the form of a prediction tree, so the approach is sometimes referred to as tree-structured classification/regression (Figure 3). A DT is made up of three types of nodes: the root node, internal nodes, and leaf nodes. The root node is first divided into internal nodes. The internal nodes reflect the data attributes and decision rules of the model, while the leaf nodes represent the outcome of the decision. The DT model can be defined as [52]:

b_{m} = \begin{cases} 1, & \text{if } x \in X_{m} \text{ (}x \text{ reaches a specific node } m\text{)} \\ 0, & \text{otherwise} \end{cases}

(1)

where m is a node and X_{m} is the subset of x reaching the node m.

2.2.1. Bagged Decision Tree

Overfitting is common in individual decision trees. Bootstrap aggregation, often known as bagging, is an ensemble technique that lessens the impact of overfitting and enhances generalization. There are many strategies to protect a model from overfitting, including good data collection, a test–train split, k-fold cross validation, feature selection, regularization, and pruning algorithms. In this study, ‘good data’ were ensured by collecting high-quality data, cleaning and processing the data, and verifying the data via a statistical summary. A bagged decision tree can be further protected from overfitting by increasing the number of trees, increasing the number of samples per leaf, and limiting the tree depth. Bootstrapping produces multiple subsets from the training data by sampling with replacement, i.e., by drawing data points at random from the training set so that the same point may be selected more than once. A separate tree is then constructed on each bootstrapped sample, and the bagged prediction can be written as [49]:

f_{bg}(x) = \frac{1}{B_{g}} \sum_{b = 1}^{B_{g}} F^{*b}(x)

(2)

where B_{g} is the total number of bootstrap samples and F^{*b}(x) is the forecast of the regression model fitted to the b-th bootstrap sample.

The generalized structure of the bagged decision tree is shown in Figure 4. The bagged decision tree offers two key benefits, including lower model variance and the ability to train numerous trees at once.
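The averaging in Equation (2) can be illustrated with a minimal Python sketch. This is not the MATLAB implementation used in the study: the one-split "stump" learner, the function names `fit_stump` and `fit_bagged`, and the toy data are assumptions chosen only to keep the example short.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a stump) on 1-D inputs."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave data on both sides
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # no valid split: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Train n_trees stumps on bootstrap samples and average them (Eq. 2)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # f_bg(x) = (1 / B_g) * sum_b F*b(x)
    return lambda x: mean(tree(x) for tree in trees)


# Toy data (made up): a step from 1.0 to 3.0.
toy_x = list(range(10))
toy_y = [1.0] * 5 + [3.0] * 5
bagged = fit_bagged(toy_x, toy_y)
print(bagged(0), bagged(9))
```

Because each tree sees a different bootstrap sample, the individual predictions differ slightly, and averaging them is what lowers the variance of the ensemble. The trees are also independent of one another, which is why they can be trained in parallel.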

2.2.2. Boosted Decision Tree

Each tree in a boosting system depends on the trees that came before it. The algorithm learns by fitting each new tree to the residuals of the preceding trees. As a result, boosting tends to increase accuracy while carrying a slight risk of reduced coverage in a decision tree ensemble. The configuration steps for the boosted decision tree are given below:

Step 1: Adding the boosted decision tree component to the pipeline;

Step 2: Selecting the training model option (single parameter or parameter range);

Step 3: Assigning the maximum number of leaves per tree;

Step 4: Assigning the maximum number of samples per leaf node;

Step 5: Providing a learning rate between 0 and 1;

Step 6: Defining the total number of decision trees;

Step 7: Training the model;

Step 8: Submitting the pipeline.

2.2.3. Gradient Boosted Decision Tree

Machine learning’s gradient-boosted decision trees take a model’s forecasting accuracy and gradually boost it through a series of incremental learning stages. To minimize the loss function, the parameters, values, or biases used to determine the target value are updated with each iteration of the decision tree. Boosting is a method of rapidly increasing the forecast accuracy of a model to an optimal level; the gradient is the incremental modification performed at each stage. The steps for developing this model are as follows:

Step 1: Determining the average value of the output target variables;

Step 2: Calculating the residual of each observation by subtracting the average value from the actual target value;

Step 3: Constructing a decision tree;

Step 4: Predicting the target label using all the trees in the ensemble;

Step 5: Carrying out a new residual calculation by subtracting the predicted value of Step 4 from the actual value;

Step 6: Repeating Steps 3–5 until the number of iterations meets the number given by the hyper-parameter;

Step 7: Final prediction using the trained model.
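The steps above can be sketched in Python for a squared-error loss. This is only an illustration under stated assumptions, not the study's MATLAB model: the weak learner is a one-split stump, and the names `fit_stump` and `fit_gradient_boosting`, the learning rate, and the number of trees are hypothetical choices.

```python
from statistics import mean


def fit_stump(xs, targets):
    """Fit a one-split regression tree (a stump) on 1-D inputs."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        if not left or not right:
            continue
        ml, mr = mean(left), mean(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        m = mean(targets)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def fit_gradient_boosting(xs, ys, n_trees=100, learning_rate=0.1):
    base = mean(ys)                       # Step 1: average of the targets
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):              # Step 6: repeat for n_trees rounds
        # Steps 2 and 5: residual = actual value - current prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)   # Step 3: fit a tree to the residuals
        trees.append(tree)
        # Step 4: update the ensemble prediction with a damped correction.
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def predict(x):                       # Step 7: final prediction
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict


# Toy data (made up): a step from 1.0 to 3.0.
toy_x = list(range(10))
toy_y = [1.0] * 5 + [3.0] * 5
gbdt = fit_gradient_boosting(toy_x, toy_y)
print(gbdt(0), gbdt(9))
```

For squared-error loss the negative gradient equals the residual, which is why fitting each tree to the residuals amounts to a gradient step. The learning rate shrinks each correction, so more trees are needed but each one has less chance of overfitting.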

The model was developed using the MATLAB program. Among the 52 data points (i.e., 1970 to 2021), 35 sampling years were used to train the model and 17 sampling years were used to test the developed model. A double exponential smoothing technique was used to predict the data until 2040. Figure 5 depicts the conceptual structure for the proposed AI model for predicting Saudi Arabia’s GHG emissions.
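The text does not give the details of the double exponential smoothing used, so the following Python sketch assumes the common Holt linear-trend form, with a level and a trend component. The function name `holt_forecast`, the smoothing constants `alpha` and `beta`, and the initialization from the first two observations are all assumptions made for illustration.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.5):
    """Double exponential smoothing (Holt's linear trend) forecast.

    series  : observed values in time order (at least two)
    horizon : number of future steps to forecast
    alpha   : smoothing constant for the level, in (0, 1]
    beta    : smoothing constant for the trend, in (0, 1]
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Made-up yearly values for illustration only (not the study's data).
history = [10.0, 12.0, 14.0, 16.0, 18.0]
print(holt_forecast(history, horizon=3))
```

With annual data ending in 2021, a forecast to 2040 would correspond to `horizon=19`. In practice, `alpha` and `beta` would be tuned, for example by minimizing the one-step-ahead error on the historical series.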

2.3. Performance Evaluation Metrics

To assess the prediction effectiveness of the models, a variety of performance metrics were employed: mean absolute percentage error (MAPE), root mean square error (RMSE), R-squared (R²), Willmott’s Index of Agreement (WIA), and root mean standard deviation ratio (RSR). The MAPE quantifies the average size of the forecasting errors without taking into account their direction. It evaluates precision for continuous variables. As defined in Equation (3), it is expressed in the units of the target variable rather than as a percentage.

MAPE = \frac{\sum_{i = 1}^{N} |y_{i} - x_{i}|}{N}

(3)

where y_{i} is the prediction, x_{i} is the actual value, and N is the number of samples.

The root mean square error (RMSE) is a popular statistic for measuring the differences between a model’s predicted values and the observed values. It is calculated as the square root of the mean of the squared differences between the predicted and observed values in a given sample.

RMSE = \sqrt{\frac{\sum_{i = 1}^{N} (x_{i} - y_{i})^{2}}{N}}

(4)

R² is a statistical measure of the proportion of the variance of a dependent variable that is explained by the independent variables in a regression model. It is given by the equation below:

R^{2} = 1 - \frac{\text{Sum of squared residuals (SSR)}}{\text{Total sum of squares (SST)}} = 1 - \frac{\sum (x_{i} - y_{i})^{2}}{\sum (x_{i} - \bar{x})^{2}}

(5)

where \bar{x} is the mean of the actual values.

Like R², the Willmott’s Index of Agreement (WIA) value suggests the correlation between the predicted and the actual outputs. The WIA can be expressed as:

WIA = 1 - \frac{\sum |y_{i} - x_{i}|^{2}}{\sum (|y_{i} - x_{i}| + |x_{i} - \bar{x}|)^{2}}

(6)

The root mean standard deviation ratio (RSR) is the ratio of the RMSE between the simulated and observed values to the standard deviation of the data. It is given by the following equation:

RSR = \frac{RMSE}{STDEV_{obs}}

(7)

where STDEV_{obs} is the standard deviation of the observed data. RSR ranges from 0 to a large positive number. The lower the RSR, the lower the RMSE and the better the model simulation performance.
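As a worked illustration, Equations (3)–(5) and (7) can be computed with a few lines of Python. The function names are hypothetical, the sample numbers are made up, and the use of the population standard deviation (division by N) in the RSR is an assumption, since the text does not say which form was used.

```python
from math import sqrt


def mape_as_defined(actual, predicted):
    """Equation (3): mean of |y_i - x_i|, in the units of the target."""
    return sum(abs(y - x) for x, y in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Equation (4): square root of the mean squared difference."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Equation (5): 1 - SSR / SST."""
    mean_x = sum(actual) / len(actual)
    ss_res = sum((x - y) ** 2 for x, y in zip(actual, predicted))
    ss_tot = sum((x - mean_x) ** 2 for x in actual)
    return 1 - ss_res / ss_tot


def rsr(actual, predicted):
    """Equation (7): RMSE divided by the standard deviation of the observations.

    Assumption: population standard deviation (divide by N).
    """
    mean_x = sum(actual) / len(actual)
    std_obs = sqrt(sum((x - mean_x) ** 2 for x in actual) / len(actual))
    return rmse(actual, predicted) / std_obs


# Made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.0, 2.5, 4.0]
print(mape_as_defined(observed, estimated))
print(rmse(observed, estimated))
print(r_squared(observed, estimated))
print(rsr(observed, estimated))
```

Under the population standard deviation assumption, RSR² = 1 − R², so the two metrics rank models identically on a given dataset. With a sample standard deviation (division by N − 1), the relation holds only approximately.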

2.4. Shapley Analysis

Shapley analysis is a technique used to determine the contribution of each feature in a machine learning model to the outcome of the prediction. This analysis is based on the Shapley value concept. The values are used to fairly attribute a player’s contribution to the final outcome of a game. In a cooperative game in which a group of players creates value by cooperating, Shapley values represent the marginal contribution of each player to the final outcome if the total payoff of the game can be measured. To calculate the Shapley value for a feature in a machine learning model, all possible combinations of features and their corresponding prediction outcomes are considered. For each feature, the difference between the prediction outcome when the feature is included and when it is omitted from the model is computed. The value for a feature is then the mean difference in prediction outcome across all possible feature combinations that include the feature [56].
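The cooperative-game idea can be made concrete with a small Python sketch that computes exact Shapley values by averaging each player's marginal contribution over all orderings of the players, which is equivalent to the usual weighted average over coalitions. The function `shapley_values`, the toy payoff function, and its numbers are hypothetical and unrelated to the study's results. For a real machine learning model, the payoff of a feature subset also requires a rule for "omitting" features, which this sketch does not cover.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values.

    players : list of player (feature) names
    value   : function mapping a frozenset of players to a payoff
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining this coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_payoff(coalition):
    """Made-up payoff: two main effects plus one interaction term."""
    payoff = 0.0
    if "gdp" in coalition:
        payoff += 10.0
    if "urbanization" in coalition:
        payoff += 4.0
    if "gdp" in coalition and "urbanization" in coalition:
        payoff += 6.0  # interaction is shared between the two players
    return payoff


features = ["gdp", "urbanization", "fdi"]
print(shapley_values(features, toy_payoff))
```

In this toy game the interaction payoff is split equally between the two features that create it, and a feature that never changes the payoff receives zero. The values also sum to the payoff of the full coalition, which is the property that makes Shapley attributions "fair". The exact computation grows factorially with the number of players, so practical tools rely on approximations.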

3. Results and Discussion

3.1. Model Fitting Results

The findings of applying different decision tree machine learning algorithms, such as the bagged decision tree, boosted decision tree, and gradient boosted decision tree to forecast the emission of greenhouse gases in Saudi Arabia are provided in this section. In addition, the results obtained from the DT models are compared with the results of the fundamental machine learning model, namely the feedforward neural network, to emphasize their efficacy. Detailed information about the FFNN can be found in [57,58,59,60]. The hyper-parameters of the models are shown in Table 1.

Figure 6 shows the development of the three decision tree models. The scatter plots of the training and test datasets for the bagged decision tree model can be seen in Figure 6a,b. As shown, 35 samples were used to train the bagged decision tree model, while 17 samples were used to test the performance of the developed model. The performance of the bagged decision tree model at predicting the GHG emissions of Saudi Arabia is visualized by plotting the actual GHG emissions against the GHG emissions estimated by the bagged DT (Figure 6c). For most of the data points, the actual and predicted values are very close, which indicates that the proposed algorithm gives robust predictions. Data points 8 and 12 show relatively small prediction errors, and data points 10 and 15 show relatively large prediction errors.

As with the bagged DT, 35 samples were used to train the boosted DT model and 17 samples were used to test it, as shown in the scatter plots (Figure 6a,b). The emission prediction performance of the boosted DT is visualized in Figure 6c. For some data points, the predicted and actual GHG emissions are very close. For example, at data point 8, the emissions predicted by the boosted DT model match the actual emissions. The highest prediction errors are seen for data points 1, 10, and 14. In addition to the bagged and boosted DT models, the GHG emissions were also predicted with the gradient-boosted DT. The scatter plots of the training and test data points for the gradient-boosted DT model are shown in Figure 6a,b. The performance of the developed gradient-boosted DT model is visualized by plotting the actual emissions and the estimated emissions in tons of CO₂ equivalent (Figure 6c).

The prediction results from the machine learning algorithms are presented and the forecast accuracy is assessed using different indices such as R², RMSE, MAPE, RSR, and WIA. Table 2 shows the statistical performance measures of three decision tree models for both training and testing datasets. The bagged decision tree shows the best R² performance with a value of 0.901 for the testing dataset. The same algorithm also shows the lowest RMSE of 0.84 and MAPE of 0.29, indicating the best performance. Even when compared to FFNN, bagged decision tree shows better performance. Similar observations have been made by other researchers who used DT-based AI models. Using data from four Chinese megacities between 2003 and 2017, Zhang et al. [61] applied the XGBoost model to identify factors influencing CO₂ emissions. With a root mean square error of only 0.036, the developers of the XGBoost model emphasize its superior applicability and accuracy when estimating the carbon emissions of growing megacities. CO₂ emissions continue to be primarily driven by population, land area, and GDP, albeit under the combined influence of several other factors. Predictions made using XGBoost were also proven to be highly accurate by Li and Sun [62]. The method was employed by the authors to make projections about CO₂ emissions in Chinese cities. Greenhouse gas generation (CO₂, CH₄) intensity in a specific region of the cryolithozone was forecasted using XGBoost by Timofeev et al. [63] utilizing data from a different geographic network of multifunctional monitoring stations, with high accuracy over the medium to long term.

Similarly, Romeiko et al. [64] used a gradient-boosting regression tree to assess the effects of spatial life cycle on climate change and eutrophication. They used the production of corn in the Midwest as an example. The findings indicated that the gradient-boosting regression tree model that was created with monthly meteorological data provided the highest level of accuracy in terms of forecasting the life-cycle impact of global warming as well as life-cycle eutrophication. In a similar vein, Cui et al. [65] employed a gradient-boosting tree prediction model optimized using the enhanced whale algorithm to successfully forecast China’s carbon emission levels from 2020 to 2035.

3.2. Future Prediction of GHG Emission by Developed AI Models and Mitigation Measures

We used a linear regression model to project the input variables from 2022 to 2040 and then used the projected inputs in the decision tree models to predict the GHG emissions for the years 2022 to 2040. As shown in Figure 7, an upward trend in GHG emissions is observed with all models, including the linear regression, bagged DT, boosted DT, and gradient boosted DT models. The linear regression model predicts the highest GHG emissions in the year 2040, at approximately 904.97 million tons of CO₂ equivalent. Among the DT models, the gradient-boosted DT predicts the lowest GHG emissions in Saudi Arabia for the year 2040. The emissions projected for Saudi Arabia in 2040 by the decision tree models and the FFNN are summarized in Table 3.
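The projection of an input variable by linear regression can be sketched as an ordinary least-squares trend fitted against the year. This Python sketch is only an illustration: the text does not state how the regression was specified, so the single-predictor form, the function names `fit_linear_trend` and `project`, and the sample values are assumptions.

```python
def fit_linear_trend(years, values):
    """Ordinary least-squares fit of values = intercept + slope * year."""
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
    sxx = sum((t - mean_t) ** 2 for t in years)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def project(years, values, future_years):
    """Extrapolate the fitted trend to the requested future years."""
    slope, intercept = fit_linear_trend(years, values)
    return [intercept + slope * t for t in future_years]


# Made-up values for one input variable (illustration only).
hist_years = [2018, 2019, 2020, 2021]
hist_values = [30.0, 31.0, 32.0, 33.0]
print(project(hist_years, hist_values, list(range(2022, 2041))))
```

Each input variable would be projected separately in this way, and the projected rows would then be passed to the trained tree models. A straight-line extrapolation assumes that past growth continues unchanged, which is a strong assumption over a horizon of nearly two decades.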

According to the DT-based AI models, GHG emissions will increase by 27–30% by 2040 compared with 2021. Reducing greenhouse gas emissions significantly will require a wide range of mitigation strategies in addition to those already in use. To meet its net-zero emissions goals, the Kingdom must formulate a comprehensive strategy to reduce carbon emissions across all its major economic sectors, including the energy, transportation, and industrial sectors. The capacity to cut pollution through carbon reduction and supply planning will inevitably be stretched to its limits by technological, economic, and welfare program considerations, among other factors. Alongside these cuts, renewable energy generation can help the Kingdom get closer to the goal of net-zero carbon emissions. However, a full transition away from fossil fuels is not anticipated for the Kingdom in all main sectors due to the intermittent nature and quasi-distribution pattern of green energy supplies. Future emission scenarios may instead benefit from research into carbon capture and storage (CCS); carbon capture, utilization, and storage (CCUS); reforestation; and emissions trading. The successes of CCS and CCUS, however, are expected to be limited in the near future. Afforestation will help reduce greenhouse gas emissions while also providing environmental benefits; however, it will be hampered by hydrological and ecological concerns as well as a high water-consumption footprint. As a result, the path to achieving a zero carbon footprint by 2040 is anticipated to be an iterative one, adapting to the development of innovations, substantial changes in the energy security spectrum, carbon sink creation, and offset programs via emissions trading.

3.3. Feature Importance

Once the Shapley value is computed for each feature in a machine learning model, the information can be used to determine the importance of each feature. The feature with the highest Shapley value is deemed the most significant because it contributes the most to the prediction outcome. Shapley values can also be used to plot feature importance graphs, which provide a visual representation of the importance of a feature in a model [66]. Based on the Shapley analysis of the bagged decision trees, the importance of the features (input variables or predictors) was ranked as follows: urbanization rate, GDP, PCI, energy consumption, population, and FDI (Figure 8).
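To make the idea of Shapley-based feature importance concrete, the sketch below computes exact Shapley values for a single prediction of a small toy model by enumerating all feature coalitions. It is an illustration under stated assumptions, not the procedure used in this study: features outside a coalition are replaced by a fixed baseline value, the toy model and all names are hypothetical, and exact enumeration is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for the prediction model(x).

    Features outside a coalition take their baseline value (an assumed
    convention; other value functions are possible).
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical model with one interaction term (illustration only)."""
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * c


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Rank feature indices by absolute contribution, largest first.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)

if __name__ == "__main__":
    print("Shapley values:", phi)
    print("Ranking (feature indices):", ranking)
```

The contributions sum to the difference between the prediction for x and the prediction for the baseline, and sorting them by magnitude gives a ranking of the same kind as the one reported in Figure 8.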

4. Conclusions

Efforts to meet the Kingdom’s environmental sustainability targets are complicated by the rising trend of greenhouse gas emissions in Saudi Arabia. Therefore, the Kingdom must determine what drives trends in GHG emissions across all relevant sectors and devise policies to reduce those emissions without slowing the country’s progress. In light of this constraint, this study used machine learning to make predictions about future GHG emissions by analyzing the effects of a subset of socioeconomic factors (population, energy, GDP, urbanization, per capita income, and FDI).

The GHG emission prediction results confirmed the superiority of the bagged decision tree model over both the boosted decision tree model and the gradient boosted decision tree model. The bagged decision tree model also performed better than the feed forward neural network model. For the testing dataset, the bagged decision tree achieved the highest R² (0.901). The same algorithm achieved the best performance as measured by the RMSE (0.84 GtCO₂e) and MAPE (0.29 GtCO₂e). Shapley analysis ranked the input parameters, in descending order of importance, as urbanization rate, GDP, PCI, energy consumption, population, and FDI. The model can therefore be used by the relevant authorities to create scenarios employing alternative policy measures to reduce national GHG emissions. While all the machine learning models performed adequately, their results could be improved with more input variables and longer time series data. Furthermore, this study can be expanded to model GHG emissions from different sectors of the country by investigating other machine learning approaches, such as support vector regression (SVR) and deep learning models.

This research will help policymakers to understand the interplay between various drivers and the role that various socioeconomic factors play in determining GHG inventories at the national level. The results will be used to better monitor national GHG emissions and prioritize efforts to reduce the effects of climate change. However, researchers should take the lead by conducting rigorous quantitative assessments of multiple climate change adaptation options to inform policymakers.

Author Contributions

Conceptualization, S.M.R., M.S. and M.M.R.; methodology, M.S.A.; software, M.S. and M.S.A.; formal analysis, M.S. and M.S.A.; resources, M.K.I. and M.M.I.; data curation, M.S.A. and M.M.R.; writing—original draft preparation, M.M.R., M.S.R. and M.S.; writing—review and editing, M.M.R., M.S.R., M.S., S.M.R. and M.A.A.; supervision, S.M.R. and M.M.R.; project administration, M.S. and M.M.R.; funding acquisition, M.M.R. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deputyship for Research and Innovation at the Ministry of Education, Saudi Arabia, for funding this research work (project number INST043).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, M.M.R. (mrahman@kfu.edu.sa), upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Fleming, J.R. Joseph Fourier, the ‘greenhouse effect’, and the quest for a universal theory of terrestrial temperatures. Endeavour 1999, 23, 72–75.
2. Liu, J.; Dumitrescu, C.E. Flame development analysis in a diesel optical engine converted to spark ignition natural gas operation. Appl. Energy 2018, 230, 1205–1217.
3. Liu, J.; Dumitrescu, C.E. 3D CFD simulation of a CI engine converted to SI natural gas operation using the G-equation. Fuel 2018, 232, 833–844.
4. Olivier, J.G.; Schure, K.M.; Peters, J.A.H.W. Trends in global CO₂ and total greenhouse gas emissions. PBL Neth. Environ. Assess. Agency 2017, 5, 1–11.
5. Liu, J.; Li, Y.; Zhang, C.; Liu, Z. The effect of high altitude environment on diesel engine performance: Comparison of engine operations in Hangzhou, Kunming and Lhasa cities. Chemosphere 2022, 309, 136621.
6. Liu, J.; Wang, B.; Meng, Z.; Liu, Z. An examination of performance deterioration indicators of diesel engines on the plateau. Energy 2023, 262, 125587.
7. Climate Watch, Global Historical Emissions, 2022. Available online: http://cait.wri.org (accessed on 21 January 2023).
8. Cetin, M.; Ecevit, E.; Yucel, A.G. The impact of economic growth, energy consumption, trade openness, and financial development on carbon emissions: Empirical evidence from Turkey. Environ. Sci. Pollut. Res. 2018, 25, 36589–36603.
9. Bayomi, N.; Fernandez, J.E. Towards sustainable energy trends in the Middle East: A study of four major emitters. Energies 2019, 12, 1615.
10. Baloch, M.A.; Danish; Meng, F.; Zhang, J.; Xu, Z. Financial instability and CO₂ emissions: The case of Saudi Arabia. Environ. Sci. Pollut. Res. 2018, 25, 26030–26045.
11. Ari, I.; Şentürk, H. The relationship between GDP and methane emissions from solid waste: A panel data analysis for the G7. Sustain. Prod. Consum. 2020, 23, 282–290.
12. Pani, R.; Mukhopadhyay, U. Variance analysis of global CO₂ emission–A management accounting approach for decomposition study. Energy 2011, 36, 486–499.
13. Saidi, K.; Hammami, S. The impact of CO₂ emissions and economic growth on energy consumption in 58 countries. Energy Rep. 2015, 1, 62–70.
14. Farzaneh, H.; McLellan, B.; Ishihara, K.N. Toward a CO₂ zero emissions energy system in the Middle East region. Int. J. Green Energy 2014, 13, 682–694.
15. Mezghani, I.; Ben Haddad, H. Energy consumption and economic growth: An empirical study of the electricity consumption in Saudi Arabia. Renew. Sustain. Energy Rev. 2017, 75, 145–156.
16. Alkhathlan, K.; Javid, M. Energy consumption, carbon emissions and economic growth in Saudi Arabia: An aggregate and disaggregate analysis. Energy Policy 2013, 62, 1525–1532.
17. Rahman, M.M.; Hasan, A.; Shafiullah; Rahman, M.S.; Arifuzzaman; Islam, K.; Islam, M.M.; Rahman, S.M. A critical, temporal analysis of Saudi Arabia’s initiatives for greenhouse gas emissions reduction in the energy sector. Sustainability 2022, 14, 12651.
18. Rahman, M.M.; Rahman, S.M.; Shafiullah, M.; Hasan, M.A.; Gazder, U.; Al Mamun, A.; Mansoor, U.; Kashifi, M.T.; Reshi, O.; Arifuzzaman, M.; et al. Energy demand of the road transport sector of Saudi Arabia–application of a causality-based machine learning model to ensure sustainable environment. Sustainability 2022, 14, 16064.
19. Rahman, M.M.; Rahman, M.S.; Chowdhury, S.R.; Elhaj, A.; Razzak, S.A.; Abu Shoaib, S.; Islam, K.; Islam, M.M.; Rushd, S.; Rahman, S.M. Greenhouse gas emissions in the industrial processes and product use sector of Saudi Arabia—An emerging challenge. Sustainability 2022, 14, 7388.
20. Rahman, M.; Rahman, S.; Rahman, M.; Hasan, A.; Shoaib, S.; Rushd, S. Greenhouse gas emissions from solid waste management in Saudi Arabia—Analysis of growth dynamics and mitigation opportunities. Appl. Sci. 2021, 11, 1737.
21. Ridzuan, N.H.A.M.; Marwan, N.F.; Khalid, N.; Ali, M.H.; Tseng, M.-L. Effects of agriculture, renewable energy, and economic growth on carbon dioxide emissions: Evidence of the environmental Kuznets curve. Resour. Conserv. Recycl. 2020, 160, 104879.
22. Solarin, S.A.; Al-Mulali, U.; Gan, G.G.G.; Shahbaz, M. The impact of biomass energy consumption on pollution: Evidence from 80 developed and developing countries. Environ. Sci. Pollut. Res. 2018, 25, 22641–22657.
23. De Souza Mendonça, A.K.; Barni, G.D.A.C.; Moro, M.F.; Bornia, A.C.; Kupek, E.; Fernandes, L. Hierarchical modeling of the 50 largest economies to verify the impact of GDP, population and renewable energy generation in CO₂ emissions. Sustain. Prod. Consum. 2020, 22, 58–67.
24. Kalra, S.; Lamba, R.; Sharma, M. Machine learning based analysis for relation between global temperature and concentrations of greenhouse gases. J. Inf. Optim. Sci. 2020, 41, 73–84.
25. Magazzino, C.; Mele, M.; Schneider, N. A machine learning approach on the relationship among solar and wind energy production, coal consumption, GDP, and CO₂ emissions. Renew. Energy 2020, 167, 99–115.
26. Javanmard, M.E.; Ghaderi, S. A hybrid model with applying machine learning algorithms and optimization model to forecast greenhouse gas emissions with energy market data. Sustain. Cities Soc. 2022, 82, 103886.
27. Mardani, A.; Liao, H.; Nilashi, M.; Alrasheedi, M.; Cavallaro, F. A multi-stage method to predict carbon dioxide emissions using dimensionality reduction, clustering, and machine learning techniques. J. Clean. Prod. 2020, 275, 122942.
28. Magazzino, C.; Mele, M. A new machine learning algorithm to explore the CO₂ emissions-energy use-economic growth trilemma. Ann. Oper. Res. 2022, 1–19.
29. Ahmed, M.; Shuai, C. Analysis of energy consumption and greenhouse gas emissions trend in China, India, the USA, and Russia. Int. J. Environ. Sci. Technol. 2022, 1–16.
30. Ahmed, M.; Shuai, C.; Ahmed, M. Influencing factors of carbon emissions and their trends in China and India: A machine learning method. Environ. Sci. Pollut. Res. 2022, 29, 48424–48437.
31. Qin, J.; Gong, N. The estimation of the carbon dioxide emission and driving factors in China based on machine learning methods. Sustain. Prod. Consum. 2022, 33, 218–229.
32. Li, S.; Siu, Y.W.; Zhao, G. Driving factors of CO₂ emissions: Further study based on machine learning. Front. Environ. Sci. 2021, 9, 721517.
33. Bakay, M.S.; Ağbulut, Ü. Electricity production based forecasting of greenhouse gas emissions in Turkey with deep learning, support vector machine and artificial neural network algorithms. J. Clean. Prod. 2020, 285, 125324.
34. Ağbulut, Ü. Forecasting of transportation-related energy demand and CO₂ emissions in Turkey with different machine learning algorithms. Sustain. Prod. Consum. 2022, 29, 141–157.
35. Hamrani, A.; Akbarzadeh, A.; Madramootoo, C.A. Machine learning for predicting greenhouse gas emissions from agricultural soils. Sci. Total. Environ. 2020, 741, 140338.
36. Kumari, S.; Singh, S.K. Machine learning-based time series models for effective CO₂ emission prediction in India. Environ. Sci. Pollut. Res. 2022, 1–16.
37. Amarpuri, L.; Yadav, N.; Kumar, G.; Agrawal, S. Prediction of CO₂ emissions using deep learning hybrid approach: A Case Study in Indian Context. In Proceedings of the 2019 Twelfth International Conference on Contemporary Computing (IC3), Noida, India, 8–10 August 2019; pp. 1–6.
38. Singh, P.K.; Pandey, A.K.; Ahuja, S.; Kiran, R. Multiple forecasting approach: A prediction of CO₂ emission from the paddy crop in India. Environ. Sci. Pollut. Res. 2022, 29, 25461–25472.
39. Shi, J.; Xu, B.; Zhou, X.; Hou, J. A cloud-based energy management strategy for hybrid electric city bus considering real-time passenger load prediction. J. Energy Storage 2022, 45, 103749.
40. Cai, W.; Wei, R.; Xu, L.; Ding, X. A method for modelling greenhouse temperature using gradient boost decision tree. Inf. Process. Agric. 2022, 9, 343–354.
41. Sun, C.; Wang, L.; Yan, Y.; Zhang, W.; Shao, D. A novel heterogeneous ensemble approach to variable selection for gas-liquid two-phase CO₂ flow metering. Int. J. Greenh. Gas Control. 2021, 110, 103418.
42. Ding, C.; Cao, X.; Næss, P. Applying gradient boosting decision trees to examine non-linear effects of the built environment on driving distance in Oslo. Transp. Res. Part A Policy Pract. 2018, 110, 107–117.
43. Beucher, A.; Møller, A.; Greve, M. Artificial neural networks and decision tree classification for predicting soil drainage classes in Denmark. Geoderma 2019, 352, 351–359.
44. Jeong, K.; Hong, T.; Kim, J. Development of a CO₂ emission benchmark for achieving the national CO₂ emission reduction target by 2030. Energy Build. 2018, 158, 86–94.
45. Peng, H.; Nishiyama, Y.; Sezaki, K. Estimation of Greenhouse Gas Emission Reduction from Shared Micromobility System. In Proceedings of the 2021 IEEE Green Energy and Smart Systems Conference (IGESSC), Long Beach, CA, USA, 1 November 2021; pp. 3–5.
46. Koch, J.; Gotfredsen, J.; Schneider, R.; Troldborg, L.; Stisen, S.; Henriksen, H.J. High resolution water table modeling of the shallow groundwater using a knowledge-guided gradient boosting decision tree model. Front. Water 2021, 3, 1–2.
47. Abu Alsaud, S.; Katterbauer, K.; Alqasim, A.; Marsala, A. A Decision Tree Framework for Tracking CO₂ Fronts in Carbonate Reservoirs from Deep Measurements Data. In Proceedings of the International Petroleum Technology Conference, Riyadh, Saudi Arabia, 21 February 2022.
48. Kahia, M.; Moulahi, T.; Mahfoudhi, S.; Boubaker, S.; Omri, A. A machine learning process for examining the linkage among disaggregated energy consumption, economic growth, and environmental degradation. Resour. Policy 2022, 79, 103104.
49. Alaraj, M.; Kumar, A.; Alsaidan, I.; Rizwan, M.; Jamil, M. Energy production forecasting from Solar photovoltaic plants based on meteorological parameters for Qassim region, Saudi Arabia. IEEE Access 2021, 9, 83241–83251.
50. Mallick, J.; Almesfer, M.K.; Alsubih, M.; Talukdar, S.; Ahmed, M.; Ben Kahla, N. Developing a new method for future groundwater potentiality mapping under climate change in Bisha watershed, Saudi Arabia. Geocarto Int. 2022, 37, 1–33.
51. Al-Najjar, D.; Al-Najjar, H.; Al-Rousan, N.; Assous, H.F. Developing machine learning techniques to investigate the impact of air quality indices on tadawul exchange index. Complex 2022, 2022, 1–12.
52. Barashid, K.; Munshi, A.; Alhindi, A. Wind farm power prediction considering layout and wake effect: Case study of Saudi Arabia. Energies 2023, 16, 938.
53. Gütschow, J.; Jeffery, M.L.; Gieseke, R.; Gebel, R.; Stevens, D.; Krapp, M.; Rocha, M. The PRIMAP-hist national historical emissions time series. Earth Syst. Sci. Data 2016, 8, 571–603.
54. The World Bank. World Development Indicators DataBank. 2022. Available online: https://databank.worldbank.org/source/world-development-indicators (accessed on 12 March 2023).
55. Kadavi, P.R.; Lee, C.-W.; Lee, S. Landslide-susceptibility mapping in Gangwon-do, South Korea, using logistic regression and decision tree models. Environ. Earth Sci. 2019, 78, 116.
56. Shapley, L. A Value for N-Person Games. Contributions to the Theory of Games. Ann. Math. Stud. 1953, 2, 307–317.
57. Shahriar, M.S.; Shafiullah; Rana, J.; Ali, A.; Ahmed, A.; Rahman, S.M. Neurogenetic approach for real-time damping of low-frequency oscillations in electric networks. Comput. Electr. Eng. 2020, 83, 106600.
58. Shafiullah, M.; Abido, M.A.; Al-Mohammed, A.H. Chapter 3-Artificial intelligence techniques. In Power System Fault Diagnosis; Elsevier: Amsterdam, The Netherlands, 2022; pp. 69–100.
59. Shafiullah, M.; AlShumayri, K.A.; Alam, M.S. Machine learning tools for active distribution grid fault diagnosis. Adv. Eng. Softw. 2022, 173, 103279.
60. Jamal, A.; Reza, I.; Shafiullah, M. Modeling retroreflectivity degradation of traffic signs using artificial neural networks. IATSS Res. 2022, 46, 499–514.
61. Zhang, J.; Zhang, H.; Wang, R.; Zhang, M.; Huang, Y.; Hu, J.; Peng, J. Measuring the critical influence factors for predicting carbon dioxide emissions of expanding megacities by XGBoost. Atmos 2022, 13, 599.
62. Li, Y.; Sun, Y. Modeling and predicting city-level CO₂ emissions using open access data and machine learning. Environ. Sci. Pollut. Res. 2021, 28, 19260–19271.
63. Timofeev, A.V.; Piirainen, V.Y.; Bazhin, V.Y.; Titov, A.B. Operational analysis and medium-term forecasting of the greenhouse gas generation intensity in the cryolithozone. Atmos 2021, 12, 1466.
64. Romeiko, X.X.; Guo, Z.; Pang, Y. Comparison of Support Vector Machine and Gradient Boosting Regression Tree for Predicting Spatially Explicit Life Cycle Global Warming and Eutrophication Impacts: A case study in corn production. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3277–3284.
65. Cui, X.; Shaojun, E.; Niu, D.; Chen, B.; Feng, J. Forecasting of carbon emission in China based on gradient boosting decision tree optimized by modified whale optimization algorithm. Sustainability 2021, 13, 12302.
66. Štrumbelj, E.; Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 2013, 41, 647–665.

Figure 1. Scatterplot of full dataset for Saudi Arabia (Note: 1e7 is equal to 1 × 10⁷).

Figure 2. Correlation matrix plot of the dataset.

Figure 3. Decision tree model.

Figure 4. Bagged decision tree.

Figure 5. Structure of the proposed tree-based AI model.

Figure 6. DT model output comparison: (a) Scatter plot for the training dataset; (b) scatter plot for the testing dataset; and (c) estimated vs. DT models predicted GHG emissions.

Figure 7. Predicted GHG emissions up to 2040 employing various decision tree models.

Figure 8. Shapley explanation of the predictors.

Table 1. Machine learning model hyper-parameters.

Model	Hyper-Parameters
Feed forward neural network (FFNN)	3 layers (input, hidden, and output), tan sigmoid activation function, resilient backpropagation training algorithm, and 3 nodes in the hidden layer.
Bagged decision tree	Number of trees = 1000; learning rate = 0.001
Boosted decision tree	Number of trees = 5000; learning rate = 0.001
Gradient boosted decision tree	Number of learning cycles = 100; number of bins = 50; boosting method = least squares

Table 2. Statistical performance measures for employed decision tree models.

Model	Stage	RMSE (GtCO₂e)	MAPE (GtCO₂e)	RSR	WIA	R²
Bagged decision tree	Training	0.87	0.31	0.454	0.95	0.898
Bagged decision tree	Testing	0.84	0.29	0.438	0.95	0.901
Boosted decision tree	Training	0.98	0.33	0.508	0.94	0.873
Boosted decision tree	Testing	0.94	0.33	0.489	0.94	0.878
Gradient boosted decision tree	Training	1.01	0.34	0.527	0.93	0.864
Gradient boosted decision tree	Testing	1.00	0.35	0.523	0.93	0.862
Feedforward neural network (FFNN)	Training	1.18	0.43	0.612	0.91	0.878
Feedforward neural network (FFNN)	Testing	1.13	0.39	0.588	0.91	0.879

Table 3. DT-based AI model projected GHG emissions in 2040 for Saudi Arabia.

Model	Projected GHG emissions in 2040 (million tons CO₂eq.)
Linear regression	904.97
Bagged decision tree	867.36
Boosted decision tree	873.62
Gradient boosted decision tree	852.29
Feed forward neural network	815.79

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

