Article

Determinants of Yearly CO2 Emission Fluctuations: A Machine Learning Perspective to Unveil Dynamics

1 Department of Advanced Convergence, Handong Global University, Pohang 37554, Republic of Korea
2 School of Global Entrepreneurship and Information Communication Technology, Handong Global University, Pohang 37554, Republic of Korea
3 School of Management and Economics, Convergence, Handong Global University, Pohang 37554, Republic of Korea
4 Department of Global Development and Entrepreneurship, Convergence, Handong Global University, Pohang 37554, Republic of Korea
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(10), 4242; https://doi.org/10.3390/su16104242
Submission received: 12 April 2024 / Revised: 13 May 2024 / Accepted: 15 May 2024 / Published: 17 May 2024

Abstract

In order to understand the dynamics in climate change, inform policy decisions and prompt timely action to mitigate its impact, this study provides a comprehensive analysis of the short-term trend of the year-on-year CO2 emission changes across ten countries, considering a broad range of factors including socioeconomic factors, CO2-related industry, and education. This study uniquely goes beyond the common country-based analysis, offering a broader understanding of the interconnected impact of CO2 emissions across countries. Our preliminary regression analysis, using the ten most significant features, could only explain 66% of the variations in the target. To capture the emissions trend variation, we categorized countries by the change in CO2 emission volatility (high, moderate, low with upward or downward trends), assessed using standard deviation. We employed machine learning techniques, including feature importance analysis, Partial Dependence Plots (PDPs), sensitivity analysis, and Pearson and Canonical correlation analyses, to identify influential factors driving these short-term changes. The Decision Tree Classifier was the most accurate model, with an accuracy of 96%. It revealed population size, CO2 emissions from coal, the three-year average change in CO2 emissions, GDP, CO2 emissions from oil, education level (incomplete primary), and contribution to temperature rise as the most significant predictors, in order of importance. Furthermore, this study estimates the likelihood of a country transitioning to a higher emission category. Our findings provide valuable insights into the temporal dynamics of factors influencing CO2 emissions changes, contributing to the global efforts to address climate change.

1. Introduction

Greenhouse gases (GHGs) are one of the main drivers of natural disaster risks. Combined with socioeconomic conditions, governance, and conflict, these complex and dynamic phenomena cause severe damage [1]. Extensive research has been conducted on natural disaster risks, related topics, and their possible implications, resulting in the establishment of several indicators suitable for explaining and quantifying their significance and possible impact [2,3,4]. While this research has illuminated various aspects of disaster risk, the current frameworks designed to reduce its impacts are often designed for long-term horizons [3], which is a major constraint given the capricious nature of these hazards and their escalating repercussions on human lives. Furthermore, even when those frameworks are implemented, the peril persists, thereby increasing the vulnerability of the nations categorized as least developed [5] and limiting their ability to improve their positions. Indeed, considering the World Risk Index (WRI) and its subcomponents, a country, whether developed or not, is likely to remain in its position of vulnerability and susceptibility over five consecutive years. Moreover, the least developed countries have a one percent probability of improving their position, and only after 5 years [6]. Among GHGs, carbon dioxide (CO2) receives particular attention due to its high rate of production from human activities and its negative environmental impacts, such as air pollution and temperature rise. This situation is alarming, since the least developed countries, which pollute and emit less, are more exposed to the disasters induced by CO2 emissions than developed countries, which pollute and emit the most [7].
Thanks to technological advances, several studies have provided accurate forecasting and projections of CO2 emissions, as well as deep insight into the interplay of other components, such as the political, geographical, economic, environmental, and societal components in their production, thus enhancing our understanding on the subject, which supports current frameworks such as the Paris Agreement and other decarbonization pathways. Understanding the factors contributing to CO2 emissions, whether they are direct (like burning fossil fuels) or indirect (like deforestation), can be relatively straightforward; however, explaining the changes in CO2 emissions over a period is a more complex task because such a process involves not only understanding the factors contributing to emissions, but also understanding their dynamics over time. Such a process requires a deep understanding of a wide range of fields, including technological, economic, and policy changes, as well as changes in energy use, land use, and population growth. Moreover, the relationship between these factors and CO2 emissions can be non-linear and involve complex feedback loops. For example, economic growth might lead to increased energy use and CO2 emissions, but it could also drive technological innovation that reduces emissions [8,9]. This subject is even more complex considering the possible implication of decarbonization on the economy of nations which, in the majority, are sustained by high CO2 emitters such as coal, oil, and cement. The limited or absent clear responses to explain the change in CO2 emissions over time and on a global scale, coupled with the urgent need to provide a more inclusive response to address this threat, have prompted this research, which aims to answer these questions:
  • Over time, how have economic factors, CO2-related industries, educational levels and population dynamics interacted to influence the short-term change in the trend of CO2 emissions across diverse countries with diverse characteristics, with respect to the factors mentioned?
  • What insight in terms of the identification and quantification of the temporal dynamics and the influence of these factors can machine learning techniques highlight to deepen one’s understanding of this change over time on a global scale?
This study pioneers a holistic approach to understanding global CO2 emissions, addressing a critical gap in existing research. Traditional studies often limit their focus to individual countries and a narrow set of factors, which impedes a comprehensive global understanding. Our research broadens this perspective by analyzing an extensive set of factors across a diverse range of countries. This approach illuminates the complex dynamics driving the changes in CO2 emissions, providing crucial insights for formulating effective global responses to this pressing environmental issue. This research stands out for its use of a unique dataset that amalgamates data from 10 different countries, each with distinct characteristics. This allows us to acknowledge both the direct and indirect factors that have been identified by experts and researchers as potential influencers of changes in CO2 emissions. In addition, what sets our study apart is the application of advanced machine learning and statistical techniques to understand the temporal dependency and the contribution of these factors regarding changes in CO2 emissions. This approach enables us to provide a clear, quantifiable understanding of how these factors interplay and contribute to changes in CO2 emissions on a global scale, replacing the blurred comprehension that currently exists. This study also contributes to the field by demonstrating the generalizability of these results, given the diverse backgrounds of the selected countries. This not only enhances the applicability of our findings but also paves the way for future research in this area. The remainder of the paper is organized as follows: Section 2 presents the materials and methods, followed by the Results and Discussion sections. The last section presents the conclusions.

2. Materials and Methods

2.1. Data Collection

A dataset comprising twenty-six distinct features, spanning the period from 1960 to 2022, was meticulously compiled from two esteemed sources: ourworldindata.org and the World Bank Open Data. The objective of this analysis was to yield a more generalizable outcome; hence, the selection of countries was based on a diverse range of factors including economic level, population dynamics, geographical location, education level, and their respective contributions to CO2 emissions. The countries under consideration were the United States, United Kingdom, South Korea, China, India, France, Brazil, Democratic Republic of the Congo, Nigeria, and South Africa (refer to Appendix A for further details). The dataset, organized on an annual basis, encompasses five categories of features: population dynamics (2 features), economic factors (3 features), education (7 features), CO2-related industry (5 features), and CO2-related emissions and temperature (7 features). To handle the missing values in the dataset, the iterative imputation technique from the Fancy Impute library in Python was employed (for more information, refer to Appendix B). This approach ensures the integrity and completeness of the data, thereby enhancing the reliability of the subsequent analysis.

2.2. Data Preparation

Python 3.10 from the latest Anaconda distribution was used. The data were prepared for supervised machine learning techniques (both regression and classification). The target variable, the absolute change in CO2 emissions, follows a different trend in each country (Appendix D); this variation had to be mitigated because of its potential negative impact on the performance of the algorithms. To capture the short-term trend of the target, countries were grouped based on the percentile of volatility of the 3-year mean value of the target. This data-driven approach supports the generalizability of the findings and helped overcome the limitation of the regression technique, which suffers from the extreme variability of the target.
Figure 1 explains the grouping process.
In order to capture the unique characteristics and conditions of each country, preserve the temporal order, avoid data leakage, and provide a realistic evaluation and understanding of the target's temporal dynamics, we first grouped the data by country. We then defined a rolling period of 3 years and computed the mean value for each country over this period, which served as the new target. Using the same grouping by country, the standard deviation was determined for each of them. The percentiles (25% as the moderate-low threshold and 75% as the moderate-high threshold) were used to define low, moderate, and high volatility in the 3-year mean values for each country. This stage categorized each country's variation using a data-driven approach that can be adapted to other datasets. Because the standard deviation is always non-negative while the variations sometimes take negative values, it was necessary to specify the direction of the variation as either negative or positive. Thus, the sign of the new target was checked and assigned to the level of volatility defined by the thresholds. This process resulted in six classes of volatility: High positive and negative, Moderate positive and negative, and Low positive and negative. In this way, both the volatility and its direction are defined, which makes the prediction easier to interpret. The outcome of this process is presented in Section 3.
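The grouping steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. Two points are assumptions, because the text leaves them open: volatility is taken as the rolling standard deviation of the yearly changes over the same 3-year window, and the 25th and 75th percentile thresholds are pooled over all countries. The function names `rolling` and `label_volatility` are hypothetical.

```python
from statistics import mean, stdev, quantiles

WINDOW = 3  # rolling period in years, as described in the text


def rolling(values, func, window=WINDOW):
    """Apply func to each trailing window; the first window-1 years are skipped."""
    return [func(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


def label_volatility(changes_by_country):
    """Map {country: [yearly absolute CO2 change, ...]} to {country: [class, ...]}."""
    # New target: 3-year rolling mean of the yearly change, per country.
    targets = {c: rolling(v, mean) for c, v in changes_by_country.items()}
    # Assumption: volatility is the rolling standard deviation over the same
    # window, computed separately for each country.
    vols = {c: rolling(v, stdev) for c, v in changes_by_country.items()}
    # Assumption: percentile thresholds are pooled over all countries and years.
    pooled = [s for series in vols.values() for s in series]
    q25, _, q75 = quantiles(pooled, n=4, method="inclusive")

    labels = {}
    for country in changes_by_country:
        labels[country] = []
        for target, vol in zip(targets[country], vols[country]):
            level = "Low" if vol <= q25 else "High" if vol >= q75 else "Moderate"
            sign = "negative" if target < 0 else "positive"  # direction of the new target
            labels[country].append(f"{level} {sign}")
    return labels
```

Each country receives one label per year from the third year onward, so a country can change class over time when its rolling volatility crosses a threshold.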

2.3. Machine Learning Algorithms

For the regression approach, the performance of the following regressors was compared: Linear Regression [10], Ridge Regression [11], Bagging Regressor [12], Random Forest Regressor [13], Gradient Boosting Regressor [14], XGBoost Regressor [15], AdaBoost Regressor [16] and KNeighbors Regressor [17]. Concerning classification, the performance of the following classifiers was compared: Logistic Regression (LogReg) [18], Decision Tree (DT) [19], Random Forest Classifier (RF) [20], XGBoost classifier (XGB) [21], Multi-Layer Perceptron classifier (MLP) [13], Bagging (BC) [22], AdaBoost (ABC) [23], Gradient Boosting (GB) [24], Support Vector (SVC) [25], and Gaussian Naïve Bayes (GNB) [26].

2.4. Metrics

Two rounds of evaluation were considered in the two approaches: the first consisting of the selection of the best performing model and the second of the final evaluation of the best performing model. To achieve this, the following metrics were considered:
  • For the selection of the best performing algorithm:
      Regression: cross-validation score [27], mean squared error [28], residuals [29], R-squared [30].
      Classification: cross-validation score [27], accuracy score, confusion matrix [31], and classification report [32].
  • For the final evaluation:
      Regression: mean squared error, residuals, R-squared.
      Classification: accuracy score, Matthews correlation coefficient [33], confusion matrix, and classification report.
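To make the classification metrics concrete, the sketch below derives the accuracy and the multiclass Matthews correlation coefficient directly from a confusion matrix. It is an illustration using the standard library only. The function names are hypothetical and the example matrix is invented, not taken from the study.

```python
from math import sqrt


def accuracy(cm):
    """Share of correctly classified samples (diagonal over total)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def matthews_corrcoef(cm):
    """Multiclass MCC from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                           # total number of samples
    c = sum(cm[k][k] for k in range(n))                       # correctly classified samples
    t = [sum(cm[k]) for k in range(n)]                        # true count per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return numerator / denominator if denominator else 0.0


# Invented two-class example: 3 correct and 1 wrong prediction per class.
example = [[3, 1], [1, 3]]
print(accuracy(example), matthews_corrcoef(example))
```

Unlike accuracy, the coefficient uses the full row and column totals, which is why it is often preferred when classes are imbalanced.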

2.5. Explainable Machine Learning Techniques

To instill confidence in the prediction, the following explainable techniques were considered.

2.5.1. Partial Dependence Plots (PDPs)

A PDP shows the marginal effect that one or more features have on the predicted outcome of a machine learning model. It can show whether the relationship between a feature and the target is linear, monotonic, or more complex. It is an important technique because it has a causal interpretation with respect to the model, which means that it explains how the model arrives at the outcome of a prediction [34,35].
It is defined as:
$$\hat{f}_s(x_s) = \frac{1}{n} \sum_{i=1}^{n} f\left(x_s, x_c^{(i)}\right) \tag{1}$$
where
$x_s$ are the features for which the PDP is to be plotted,
$x_c$ are the other features used in the machine learning model $f$,
$x_c^{(i)}$ are the actual values, for instance $i$, of the features in which we are not interested, and
$n$ is the number of instances in the dataset.
This analysis was performed using the PartialDependenceDisplay class from the sklearn library (sklearn.inspection). The outcome of this analysis is presented in Section 3.
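The definition above can be evaluated by hand for a single feature, as in the following sketch. It does not use sklearn. The function `partial_dependence` and the linear `toy_model` are hypothetical and serve only to show the averaging step.

```python
def partial_dependence(model, rows, feature, grid):
    """For each grid value, force column `feature` to it and average the model output."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # fix x_s; the other columns keep x_c^(i)
            total += model(modified)
        curve.append(total / len(rows))  # (1/n) * sum over the n instances
    return curve


def toy_model(x):
    """Hypothetical model f(x) = 2*x0 + x1, used only for illustration."""
    return 2 * x[0] + x[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0])
print(curve)
```

For this linear toy model the curve is a straight line with slope 2, shifted by the mean of the second feature. The averaging treats the features as independent, which is the limitation that motivates the sensitivity analysis.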

2.5.2. Sensitivity Analysis

Sensitivity analysis is a useful technique for understanding the impact of changes in the input features on the model outcome. By doing so, it provides insight into the most important features by quantifying the uncertainty in the model’s output [36]. It is often used to measure the correlation between changes in an input variable and the resulting changes in the output variable, and aims to study how the uncertainty in the output can be allocated to different sources of uncertainty in the inputs [37]. This process can be represented as follows:
Consider a model as a function $g: \mathbb{R}^N \to \mathbb{R}^M$, where $N$ is the number of input variables and $M$ the number of output variables. The input variables are represented as a vector $x = [x_1, x_2, \ldots, x_N]$, and the output variables are represented as a vector $y = [y_1, y_2, \ldots, y_M]$. The model maps the input variables to the output variables, that is, $y = g(x)$.
For a given input variable $x_i$, the sensitivity $S_i$ of the output variable $y_j$ with respect to $x_i$ can be calculated as follows:
$$S_i = \frac{\partial y_j}{\partial x_i} \cdot \frac{x_i}{y_j} \tag{2}$$
Formula (2) represents the relative change in $y_j$ for a relative change in $x_i$.
This analysis was performed using the saltelli sampler from the SALib package to generate samples for the defined problem, and the sobol analyzer from the same package to obtain the first- and total-order sensitivity indices. The result of this process is provided in Section 3.
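As an illustration of Formula (2) only, the relative sensitivity can be approximated numerically with a central finite difference. This minimal sketch is not the Sobol procedure used in the study. The function `relative_sensitivity`, the step size `rel_step`, and the toy model `g` are assumptions introduced for the example.

```python
def relative_sensitivity(g, x, i, j=0, rel_step=1e-6):
    """Approximate S_i = (dy_j/dx_i) * (x_i / y_j) at the point x."""
    x = list(x)
    h = rel_step * (abs(x[i]) or 1.0)  # step scaled to the size of x_i
    up, down = x[:], x[:]
    up[i] += h
    down[i] -= h
    dy_dx = (g(up)[j] - g(down)[j]) / (2 * h)  # central difference
    return dy_dx * x[i] / g(x)[j]


def g(x):
    """Hypothetical model with one output: y = x0**2 * x1."""
    return [x[0] ** 2 * x[1]]


point = [3.0, 4.0]
print(relative_sensitivity(g, point, i=0), relative_sensitivity(g, point, i=1))
```

For this toy model a 1% change in the first input changes the output by about 2%, and a 1% change in the second input changes it by about 1%, which is the reading of Formula (2) as a relative change.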

2.5.3. Feature Analysis

A correlation analysis, specifically the Pearson correlation between features and the canonical correlation [38] among groups of features, was used to evaluate the interplay of the features over time and to complement the analysis based on PDPs and sensitivity analysis.
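For reference, the Pearson coefficient between each feature and a target can be computed as in the sketch below, which uses only the standard library. The helper names and the toy series are hypothetical. The canonical correlation between groups of features is not covered here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_with_target(features, target):
    """Return {feature name: Pearson r with the target}."""
    return {name: pearson(values, target) for name, values in features.items()}


# Invented series: one feature rises with the target, the other falls.
features = {"rising": [1, 2, 3, 4], "falling": [4, 3, 2, 1]}
print(correlation_with_target(features, target=[2, 4, 6, 8]))
```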

2.6. Research Design

The process followed in this research is represented in Figure 2.
Once the data were collected from the different sources for the considered countries, they were merged into a single dataset. Missing values were imputed using iterative imputation, and the data were then grouped by country before applying a temporal split, with the training set running from 31 December 1960 to 31 December 2009 and the test set from 31 December 2010 to 31 December 2022. Features were scaled using the Standard Scaler from the sklearn library. The Pearson and Canonical correlation analyses were then carried out to study the interplay of features. The first modeling approach used regression to predict the 3-year average of the absolute change in CO2 emissions, a variable that could capture the short-term trend of the target. Iteratively, baseline modeling and then grouping by volatility were considered to improve performance, since the other techniques did not improve it. The poor performance resulting from this approach led us to consider the classification technique. To achieve this, the process explained in Figure 1 was applied to the new target, and, to improve the performance of the classifiers, class balancing techniques such as the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN) and class weights were considered. After this stage, the explainable AI (XAI) analysis was performed, and the results, including those from the correlation analysis, were analyzed and interpreted.
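The temporal split and scaling steps can be sketched as follows, using only the standard library. Fitting the scaler on the training years alone is an assumption made here to avoid leaking test information; the text does not state how the scaler was fitted. The record layout and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def temporal_split(rows, last_train_year=2009):
    """Split yearly rows so that every training year precedes every test year."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test


def fit_scaler(values):
    """Return a function that standardizes with the mean and std of `values` only."""
    mu, sigma = mean(values), pstdev(values)  # population std, as a standard scaler uses

    def transform(v):
        return (v - mu) / sigma if sigma else 0.0

    return transform


# Invented records for one country and one feature.
rows = [{"year": y, "gdp": g} for y, g in [(2008, 1.0), (2009, 3.0), (2010, 5.0), (2011, 7.0)]]
train, test = temporal_split(rows)
scale = fit_scaler([r["gdp"] for r in train])  # assumption: fitted on the training years only
test_scaled = [scale(r["gdp"]) for r in test]
print(test_scaled)
```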

3. Results

3.1. Regression Analysis

In the process of selecting the best regressor, the Gradient Boosting Regressor algorithm provided the best score (Figure 3, Appendix C.1 and Appendix C.2). To improve its performance, dimensionality reduction was applied using PCA and feature selection techniques.
This generally poor performance led us to consider grouping the target based on each country's volatility, to mitigate the differences between countries, which tend to reduce the ability of the model to capture hidden patterns during training. With the Gradient Boosting Regressor as the best model, PCA (using n_components of 0.98) and a selection of the best 10 features using Recursive Feature Elimination (RFE) [39] with the best model were applied. These best 10 features are: CO2 from cement, CO2 from gas, CO2 from coal, share of cumulative CO2 emissions, population—education: incomplete primary, population (number), GDP per capita, change in GDP, annual CO2 emissions growth (%) and absolute CO2 change. The summary of the performance of the best model based on the above-mentioned techniques is presented in Figure 4.
This approach provided two groups: high volatility (China and United States) and low volatility (the remaining countries). When applying the same process of model selection, an improvement was observed for countries in the low category with the AdaBoost Regressor (R-squared of 70%), despite its poor generalization to the test set (cross validation score: −0.89).
Overall, regression did not appear to be suitable, since it was not able to explain the target. This limitation justifies the need to find alternatives, among which is the transformation of the target into classes following the process presented in Figure 2.

3.2. Classification Analysis

Grouping Target in Classes

The process explained in Figure 2 yielded six imbalanced classes, presented in Table 1. A better understanding of this classification is provided in Figure 5, which depicts the temporal dynamics of the 3-year mean value for each country.
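One common way to compensate for such imbalance is to weight each class inversely to its frequency. The sketch below shows the usual "balanced" heuristic, n_samples / (n_classes × class_count). Whether this exact weighting was used in the study is not stated; the function name and the label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Invented label counts: the rarer class receives the larger weight.
labels = ["Low positive"] * 6 + ["High negative"] * 2
print(balanced_class_weights(labels))
```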
The High (positive and negative) category represents the group of highly polluting countries, despite efforts to reduce the level of CO2 emissions over time. The Moderate (positive and negative) category is the group of emitters whose level of CO2 emissions is substantial but still tolerable compared to the previous group, and the Low (positive and negative) category represents those countries whose emissions are relatively low compared to the others. Over time, countries belonging to a given category remained in it but fluctuated in both positive and negative directions (Table 2), except India, which moved from the moderate to the high volatility group in the early 2000s (Figure 6).
Following the different thresholds considered in the process of transformation, a country could find itself in a different category over time. The general occurrence of classes over time is presented in Figure 7.
Figure 8 demonstrates that the positive trend occurred almost every year, making their number higher than the negatives in each category. This trend confirms the general increase in CO2 emissions worldwide despite efforts to reduce it. With this categorization complete, the performance of the classifiers applied to these data is provided in Table 3.
It appears that the Decision Tree model performed better compared to the other classifiers. Its flexibility and capacity to analyze data regardless of prior consideration regarding the distribution of the input data makes this model robust enough to capture hidden trends. Even after a search of the best parameters, the confusion matrix, as well as the classification report of the DT, present errors in the prediction of 1 instance out of 19 from the class of Moderate positive (providing a recall of 0.95, a precision of 1.00 and an F1 score of 0.97), which is predicted as Low positive, 5 instances out of 33 from the class of Moderate negative (having a recall of 0.85, a precision of 0.97 and an F1 score of 0.97), predicted as High negative, and 1 instance out of 16 from the class of High positive (displaying a recall of 0.94, a precision of 1.00 and an F1 score of 0.97), predicted as Moderate negative.
Based on this result, and despite the small misclassification, the model could capture the general short-term trend (3-year average) of the target. The features contributing to this prediction are used to understand their interplay and contribution over time on the target.

3.3. Feature Analysis

3.3.1. Feature Importance Analysis

Using the decision tree algorithm after a grid search of the best parameters, this analysis reveals that eight features contribute to the prediction (Figure 9).
From the present group of features, two features from the economic group (change in GDP and GDP per capita), one from the population group (population number), one from the education group (population—education: incomplete primary), one from CO2-related industry (CO2 from oil), and the remaining three from CO2-related activity (GHG by world region, share of cumulative CO2 emissions and the mean value of 3 years) contributed to the prediction. A summary of the PDPs (Table 4) explains their contributions.
Evaluating each country’s trend over time for the features considered (Figure 10) provides a better understanding of the overall dynamics.
In all the categories, the increase or decrease in the 3-year average determines the sign (positive or negative) of the category (Figure 10j). Coupled with the fluctuation in the 3-year average, the dynamics of the population number influence the categorization of a country. High emitters have a large population compared to others (Figure 10a). Over time, countries with large variations in the change in GDP tend to emit less than those with small variations (Figure 10f). The trend of the GDP per capita (Figure 10g) coupled with the GDP (Figure 10l) suggests that they cannot clearly explain the trend in CO2 emissions, since some rich countries emit less than others. It also appears that the level of education (Figure 10d,i), more specifically early access to education, could potentially explain the target. When it comes to the CO2-related industry (Figure 10b,k), the countries that emit the most (Figure 10e,h) appear to be the wealthiest (Figure 10l) and have lower variation in GDP (Figure 10f) compared to the other group of countries. Furthermore, in the group of countries considered, the wealthier a country is, the higher the number of children that have early access to education (Figure 10d,i), a result that demonstrates the complexity of the possible dynamics. This analysis depicts the existing but complex interaction among features, which, once understood, could deepen our understanding of the present dynamics. To capture possible dependencies between features, which could not be captured by the feature importance analysis or by PDPs, which assume independence among features, the sensitivity analysis was applied.

3.3.2. Sensitivity Analysis

Using 1000 samples with bounds between −5 and 9, it appears that seven variables, slightly different from those identified by the feature importance analysis, have an impact on the performance of the model (Figure 11).
By capturing possible interactions among features, this analysis identified and quantified four key groups that contribute to the short-term dynamics of year-on-year changes in CO2 emissions. These groups are population, CO2-related industries (including coal and oil), economic activity assessed through the GDP, the contribution to temperature rise, associated CO2 emissions, and early access to education assessed by the population—education: incomplete primary. These features can be divided into two sets: those having a direct impact (population, CO2 from coal, CO2 from oil, mean-3y), and those which explain it indirectly (GDP, contribution to temperature rise and population—education: incomplete primary).

3.3.3. Correlation Analysis

To deepen the understanding of the interaction of features, the Pearson and Canonical correlations were used. Figure 12 provides the correlation table of the features in the dataset.
In comparing the groups of features, there is a strong positive correlation between the variables in the CO2 emissions and CO2-related industry groups and the groups related to the level of education, economic features, and population number, but not with the population growth rate or GDP per capita. In more detail, there is a strong positive correlation (≥0.50) between the annual CO2 emissions and GDP (0.84), population number (0.55), population—education: lower primary (0.50), population—education: lower secondary (0.71), population—education: upper secondary (0.82), population—education: post secondary (0.86), contribution to temperature rise (0.86), share of cumulative CO2 emissions (0.71), cumulative CO2 emissions (0.85), CO2 from coal (0.92), CO2 from oil (0.84), CO2 from gas (0.73), CO2 from cement (0.77), GHG emissions by world region (0.98); there is also a negative correlation between this same variable and the population growth rate (−0.37). This result is consistent with existing studies on the interplay of the considered variables with the emission of CO2. For instance, rich countries are high polluters, and the concentration of population is one reason behind CO2 emissions fluctuations. Also, education plays an important role in understanding and developing ways to mitigate CO2 emissions [8]. While such observations seem straightforward, this is not the case for the absolute change in CO2 emissions, which is the target variable. Indeed, only population—education: primary (0.53/0.63), population—education: lower secondary (0.6/0.72), CO2 from coal (0.57/0.70) and CO2 from cement (0.56/0.69) have a strong positive correlation with the target. In the short term, we can observe an average increase of 1.26% in the correlation coefficient between CO2-related features and the target, as well as for the education features and population number, in comparison to what it was with the absolute change in CO2 emissions.
The canonical correlation makes it possible to visualize the direction of this correlation. When it comes to the corresponding categories, however, the correlation remains strong but becomes negative. This suggests that as the value increases, the target is more likely to be in the High positive category. To deepen the understanding of the interplay of features already provided by the sensitivity analysis, the canonical correlation analysis allows us, through plotting, to visualize the direction of these features over time. To this end, features were first assembled into their groups, and the groups were then compared in the following order: population with CO2 industry, population with CO2-related emissions, population with education, population with economy, CO2 industry with CO2-related emissions, CO2 industry with education, CO2-related emissions with education, CO2 industry with economy, CO2-related emissions with economy, and education with economy. Figure 13 presents the result of this analysis.
Two trends emerge from this analysis. Over time, there is a linear trend in the group of CO2-related emissions and industry with education (Figure 10g and Figure 13f) and economy (Figure 13h,i), population with education (Figure 13c), and education with economy (Figure 13j), and a nonlinear one for population with CO2-related emissions and industry (Figure 13a,b) and population with economy (Figure 13d). This result suggests a strong positive correlation between the groups of features having a significant impact on the fluctuations of CO2 emissions over time. In the group of countries considered, regardless of the decreasing trend in the population growth rate of some countries, this analysis reveals that population dynamics affect each country's economy differently. As the economy grows, CO2 emissions tend to increase, which influences the short-term variations of CO2 emissions (Figure 13h,i). Early access to education displays a linear trend with economic growth (Figure 13j), CO2 emissions (Figure 13i) and population dynamics (Figure 13d), suggesting that the more a country emits, the wealthier it becomes and the more able it is to implement laws to support education. The level of education increases with the growth of the economy (Figure 13j) and CO2 emissions (Figure 13g), suggesting that the wealthier the country, the easier access to education becomes over time. The contribution to temperature rise does not solely depend on the emissions of CO2. However, as with early access to education, this indicator is meaningful in explaining the target of this analysis. These trends could help anticipate future dynamics in the monitoring of CO2 emissions, supporting the implementation of adequate policies to tackle this threat while maintaining a healthy economy.

3.4. Discussion

The significant improvement in model performance from an R-squared of 67% to an accuracy of 96% after applying the proposed data transformation technique indicates the effectiveness of this approach. Indeed, the increase in accuracy demonstrates that the transformation technique was successful in capturing the underlying patterns in the data. This suggests that CO2 emissions changes are not just a function of the factors considered but also their volatility and direction. By categorizing the data into six classes of volatility (high positive and negative, moderate positive and negative, and low positive and negative), the model can capture more nuanced information. This approach recognizes that the rate and direction of change can be just as important as the magnitude of the change itself.
The use of percentiles to define thresholds for volatility is a data-driven approach that adapts to the specific characteristics of the dataset. This method is more flexible and potentially more accurate than using arbitrary or fixed thresholds.
Furthermore, the technique is designed to be adaptable to other datasets, enhancing its generalizability. This is a significant contribution, as it means the approach could be applied to other countries or variables, extending its usefulness beyond this specific study.
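As a rough illustration of the percentile-based categorization discussed above, the following sketch labels each observation by volatility level and direction. It is not the implementation used in this study: the function name `categorize_volatility`, the 33rd and 66th percentile cut points, the choice of volatility measure, and the sample volatility values are all assumptions made for illustration. The trend values echo the 3-year mean changes quoted in this section.

```python
from statistics import quantiles


def categorize_volatility(volatility, trend, low_pct=33, high_pct=66):
    """Label each observation as '<level> <direction>'.

    volatility: non-negative volatility measure per observation
                (for example a rolling standard deviation; assumed here).
    trend:      signed change per observation (for example a 3-year mean change).
    low_pct, high_pct: percentile cut points (hypothetical defaults).
    """
    if len(volatility) != len(trend):
        raise ValueError("volatility and trend must have the same length")
    # quantiles(n=100) returns the 99 cut points P1..P99 of the data.
    cuts = quantiles(volatility, n=100, method="inclusive")
    low_cut, high_cut = cuts[low_pct - 1], cuts[high_pct - 1]
    labels = []
    for vol, change in zip(volatility, trend):
        if vol > high_cut:
            level = "High"
        elif vol > low_cut:
            level = "Moderate"
        else:
            level = "Low"
        direction = "positive" if change >= 0 else "negative"
        labels.append(f"{level} {direction}")
    return labels


# Volatility values are made up; both lists are in million tonnes.
volatility = [2.0, 3.5, 9.0, 11.0, 60.0, 95.0]
trend = [-3.0, 4.8, -10.7, 10.6, -73.0, 107.0]
labels = categorize_volatility(volatility, trend)
```

Because the cut points are computed from the data passed in, the same function can be reused on another dataset without choosing fixed thresholds by hand, which is the adaptability argued for above.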
Some statistics of the categories provided in Table 5, coupled with the sensitivity analysis results, suggest that:
  • High negative: Countries in this group have a high average population and GDP, a high CO2 emission from both coal and oil. Despite their high GDP and CO2 emissions, these countries have seen a decrease in CO2 emissions over time. However, they also have a high contribution to temperature rise. These countries’ high contribution to global warming is due to their extensive use of fossil fuels for energy production, industrial processes, and transportation. Despite recent decreases in emissions, the cumulative effect of their past and present emissions continues to drive global temperature rise.
  • High positive: Countries in this group also have a high average population, a slightly lower GDP compared to the High negative group, and they have seen an increase in CO2 emissions over time. Surprisingly, they have a lower contribution to temperature rise compared to the High negative group. This could be due to their past contribution.
  • Low negative: Countries in this group have a lower average population and GDP compared to the High groups. They have lower CO2 emissions from both coal and oil and have seen a decrease in CO2 emissions over time. Finally, they contribute less to temperature rise compared to the High groups.
  • Low positive: Countries in this group have a similar average population to the Low negative group. They have a lower GDP compared to the Low negative group but they have seen an increase in CO2 emissions over time. They also contribute less to temperature rise compared to the High groups.
  • Moderate negative: Countries in this group have a moderate average population and GDP. They have moderate CO2 emissions from both coal and oil, and they have seen a decrease in CO2 emissions over time. They contributed moderately to temperature rise compared to the High groups.
  • Moderate positive: Countries in this group have a similar average population to the Moderate negative group. They have a lower GDP compared to the Moderate negative group and they have seen an increase in CO2 emissions over time. They have a lower contribution to temperature rise compared to the Moderate negative group.
A comparison of the two categories, High negative and High positive, suggests that:
  • Population (number): The average population is higher in the High positive group (approximately 807 million) compared to the High negative group (approximately 481 million). This suggests that countries with larger populations tend to have increasing CO2 emissions.
  • CO2 emissions from Coal: Both groups have high CO2 emissions from coal, but the High positive group has slightly higher emissions on average (approximately 1.76 billion tonnes) compared to the High negative group (approximately 1.45 billion tonnes).
  • CO2 emissions from Oil: The High negative group has higher CO2 emissions from oil (approximately 1.58 billion tonnes) compared to the High positive group (approximately 885 million tonnes).
  • 3-Year Mean Change in CO2 emissions (Mean-3y): The High negative group shows a decrease in CO2 emissions over time (average change of −73 million tonnes), while the High positive group shows an increase (average change of 107 million tonnes).
  • GDP: The GDP is higher on average in the High negative group (approximately 9.99 trillion USD) compared to the High positive group (approximately 4.27 trillion USD). This suggests that wealthier countries tend to have decreasing CO2 emissions.
  • Population with Incomplete Primary Education: The High positive group has a higher average population with incomplete primary education (approximately 43.2 million) compared to the High negative group (approximately 20.4 million).
  • Contribution to Temperature Rise: The High negative group has a higher average contribution to temperature rise (0.173) compared to the High positive group (0.099).
The High negative group tends to have wealthier countries with larger CO2 emissions from oil and a larger contribution to temperature rise. On the other hand, the High positive group tends to have countries with larger populations, higher CO2 emissions from coal, and a larger population with incomplete primary education.
Considering the Low negative and Low positive groups:
  • Population: The average population is approximately the same in both groups (approximately 67 million). This suggests that population size does not significantly differentiate these two groups.
  • CO2 emissions from Coal: The Low negative group has slightly higher CO2 emissions from coal on average (approximately 104 million tonnes) compared to the Low positive group (approximately 82 million tonnes).
  • 3-Year Mean Change in CO2 emissions (Mean-3y): The Low negative group shows a decrease in CO2 emissions over time (average change of −3 million tonnes), while the Low positive group shows an increase (average change of 4.8 million tonnes).
  • GDP: The GDP is slightly higher on average in the Low negative group (approximately 171 billion USD) compared to the Low positive group (approximately 138 billion USD).
  • CO2 emissions from Oil: The Low negative group has slightly higher CO2 emissions from oil (approximately 24 million tonnes) compared to the Low positive group (approximately 23 million tonnes).
  • Population with Incomplete Primary Education: The Low positive group has a slightly higher average population with incomplete primary education (approximately 3.94 million) compared to the Low negative group (approximately 3.90 million).
  • Contribution to Temperature Rise: The Low negative group has a slightly higher average contribution to temperature rise (0.0102) compared to the Low positive group (0.0085).
The Low negative group tends to have slightly higher CO2 emissions from coal and oil, a higher GDP, and a higher contribution to temperature rise, but shows a decrease in CO2 emissions over time. On the other hand, the Low positive group tends to have a slightly larger population with incomplete primary education and shows an increase in CO2 emissions over time.
Finally, for the Moderate negative and Moderate positive groups:
  • Population: The average population is slightly higher in the Moderate positive group (approximately 79 million) compared to the Moderate negative group (approximately 72 million).
  • CO2 emissions from Coal: The Moderate positive group has slightly higher CO2 emissions from coal on average (approximately 124 million tonnes) compared to the Moderate negative group (approximately 115 million tonnes).
  • 3-Year Mean Change in CO2 emissions (Mean-3y): The Moderate negative group shows a decrease in CO2 emissions over time (average change of −10.7 million tonnes), while the Moderate positive group shows an increase (average change of 10.6 million tonnes).
  • GDP: The GDP is significantly higher on average in the Moderate negative group (approximately 1.96 trillion USD) compared to the Moderate positive group (approximately 979 billion USD).
  • CO2 Emissions from Oil: The Moderate negative group has higher CO2 emissions from oil (approximately 212 million tonnes) compared to the Moderate positive group (approximately 171 million tonnes).
  • Population with Incomplete Primary Education: The Moderate positive group has a higher average population with incomplete primary education (approximately 5.82 million) compared to the Moderate negative group (approximately 2.10 million).
  • Contribution to Temperature Rise: The Moderate negative group has a higher average contribution to temperature rise (0.029) compared to the Moderate positive group (0.0225).
The Moderate negative group tends to have higher GDP, higher CO2 emissions from oil, and a higher contribution to temperature rise, but shows a decrease in CO2 emissions over time. On the other hand, the Moderate positive group tends to have a larger population, higher CO2 emissions from coal, and a larger population with incomplete primary education, but shows an increase in CO2 emissions over time.
Countries with higher populations and GDPs tend to have higher CO2 emissions and contribute more to temperature rise. However, some of these countries have seen a decrease in CO2 emissions over time, suggesting that they may be taking steps to mitigate their impact on climate change. Countries with lower and moderate populations and GDPs show a diverse range of CO2 emissions and contributions to temperature rise, and some of these countries are effectively managing their CO2 emissions while others are still facing challenges.
To provide a rough estimate of the shift from one category to another, we can consider the average values of the key variables for each category. For instance, the difference in average population between the High and Low categories is approximately 400 million. Therefore, an increase in population by this amount could potentially cause a shift from Low to High, or vice versa. For the CO2 emissions from coal, the average difference between the High and Low categories is approximately 1.3 billion tonnes. Therefore, an increase in CO2 emissions from coal by this amount could potentially cause a shift from Low to High, or vice versa. For the 3-Year Mean Change in CO2 Emissions (Mean-3y), the average difference between the High and Low categories is approximately 80 million tonnes. Therefore, a change in the 3-year mean change in CO2 emissions of this amount could potentially cause a shift from negative to positive, or vice versa. The difference in average GDP between the High and Low categories is approximately 9 trillion USD, suggesting that an increase in GDP by this amount could potentially cause a shift from Low to High, or vice versa. For the Population with Incomplete Primary Education, the average difference between the High and Low categories is approximately 20 million, meaning that an increase in this population by this amount could potentially cause a shift from Low to High, or vice versa. Finally, the difference in average contribution to temperature rise between the High and Low categories is approximately 0.07; thus, an increase in the contribution to temperature rise by this amount could potentially cause a shift from Low to High, or vice versa. These estimates provide a rough idea of the magnitude of change in each variable that could potentially cause a shift from one category to another. However, it is important to note that these are just estimates, and the actual thresholds might be different due to the complex interactions among the variables.
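The rough shift estimates above amount to subtracting group averages. The sketch below repeats that arithmetic for three variables, using the High negative and Low negative averages quoted in the comparison lists of this section. The dictionary layout, the key names, the helper `average_gap`, and the choice of these two groups are illustrative assumptions, so the results need not coincide exactly with the rounded figures in the text.

```python
# Group averages as quoted in the comparison lists of this section.
high_negative = {"population": 481e6, "co2_coal_tonnes": 1.45e9, "gdp_usd": 9.99e12}
low_negative = {"population": 67e6, "co2_coal_tonnes": 104e6, "gdp_usd": 171e9}


def average_gap(group_a, group_b):
    """Return group_a minus group_b for every variable the two groups share."""
    return {name: group_a[name] - group_b[name] for name in group_a if name in group_b}


gaps = average_gap(high_negative, low_negative)
for name, gap in gaps.items():
    # Three significant digits are enough for an order-of-magnitude reading.
    print(f"{name}: {gap:.3g}")
```

Such gaps only indicate an order of magnitude. As noted above, the classifier's actual decision thresholds depend on interactions among the variables.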
Putting together the results of the feature importance analysis, PDPs, and correlation analysis, this study highlights the complexity of explaining the short-term trend in CO2 emissions on a global scale. Indeed, it appears that, whatever the country, the number of its inhabitants is the most important signal of future CO2 emissions and, thus, of their change over time. This is explained by the human impact on the immediate environment in terms of construction, deforestation, etc. [40]. Fossil fuels remain a threat to the environment. This study demonstrates that particular attention should be paid to coal and oil production, since they can, on their own and in a very short time, negatively impact the environment. This matter is quite complex because these two are strongly correlated with the wealth of countries, making it critical to find alternatives [41]. Indirectly, early access to education, like the monitoring of the temperature rise, appears to be among the game changers in this matter, suggesting a rapid possibility of improvement if properly used.

4. Conclusions

This study analyzed the short-term variations in CO2 emissions across ten countries. By employing machine learning techniques on a unique dataset, we identified the key factors influencing these variations. Population growth, particularly population size, and the coal industry emerged as strong contributors. Early access to education and contribution to temperature rise, while less impactful, warrant further investigation. These results shed light on critical factors and their contribution to the year-on-year change in CO2 emissions over time, and could potentially contribute to the implementation of policies that address education and the environment by promoting investment in early childhood education (the analysis suggests a link between early education and economic growth, potentially leading to higher CO2 emissions later). By investing in early education, countries might be able to foster more sustainable education focused on environmental awareness, population, and the economy, promoting family planning and economic incentives while considering the complex relationship between population growth and the economy suggested in this analysis. When a country reaches a certain level of population and pollution, policies that encourage smaller families, coupled with economic incentives, could help manage population growth while maintaining economic stability. The granularity of the data and the group of features considered represent the major limitations of this study. Indeed, a yearly analysis does not capture monthly or weekly trends in the data, which could provide more insight into the interpretation of the results. In addition, the features considered are not the only ones that explain the change in CO2 emissions. Further analysis will consider using finer-grained data and adding other groups of features, which could potentially explain the target over different horizons (mid- or long-term analysis).

Author Contributions

Conceptualization, H.C. and C.M.M.; methodology, H.C.; software, C.M.M.; validation, H.C., Y.-S.K. and C.M.M.; formal analysis, C.M.M.; investigation, C.M.M.; resources, H.C.; data curation, C.M.M.; writing—original draft preparation, C.M.M.; writing—review and editing, C.M.M. and S.J.; visualization, C.M.M. and S.J.; supervision, H.C.; project administration, H.C.; funding acquisition, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

All authors agreed with the content and gave explicit consent to submit.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors express special appreciation to Hyebong Choi for his invaluable advice and support.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Features Description and Rationale

| No. | Variables | Data Source | Unit | Description | Rationale | Rationale Source | Period Considered |
|---|---|---|---|---|---|---|---|
| 1 | Year-on-year change in CO2 emissions | [42] | Tonnes | Absolute annual change in carbon dioxide emissions | Target of the analysis | (Global Carbon Budget, 2023) | 1960 to 2022 |
| 2 | Annual greenhouse gas emissions by world region | [43] | Tonnes | Emissions, cumulative emissions and the global mean surface temperature response by country, gas (CO2, CH4, N2O or GHG) and source emissions (fossil, land use) | Regional greenhouse gas emission will certainly affect the neighboring countries’ level of CO2 emissions | [44] | 1960 to 2021 |
| 3 | Annual temperature anomalies | [45] | Celsius | The deviation of a specific month’s average surface temperature | Fluctuations of the temperature can be informative regarding the change in CO2 emissions | [46] | 1960 to 2023 |
| 4 | Annual emissions of carbon dioxide (CO₂) from flaring | [47] | Tonnes | Annual emissions of carbon dioxide (CO₂) from flaring based on territorial emissions (excluding traded goods and international aviation) | The amount of excess oil or gas burned during their production can explain changes in CO2 emissions | [48] | 1960 to 2022 |
| 5 | Annual emissions of carbon dioxide (CO₂) from cement | [47] | Tonnes | Annual emissions of carbon dioxide (CO₂) from cement based on territorial emissions (excluding traded goods and international aviation) | The production of concrete is an important source of CO2 emissions | [49] | 1960 to 2022 |
| 6 | Annual emissions of carbon dioxide (CO₂) from gas | [47] | Tonnes | Annual emissions of carbon dioxide (CO₂) from gas based on territorial emissions (excluding traded goods and international aviation) | The production of gas releases a significant amount of CO2 | [8,48] | 1960 to 2022 |
| 7 | Annual emissions of carbon dioxide (CO₂) from oil | [47] | Tonnes | Annual emissions of carbon dioxide (CO₂) from oil based on territorial emissions (excluding traded goods and international aviation) | The production of oil is directly linked to CO2 emissions, thus affecting its change | [8,48] | 1960 to 2022 |
| 8 | Annual emissions of carbon dioxide (CO₂) from coal | [47] | Tonnes | Annual emissions of carbon dioxide (CO₂) from coal based on territorial emissions (excluding traded goods and international aviation) | The production of coal represents a major source of CO2 emissions | [8] | 1960 to 2022 |
| 9 | Cumulative CO2 emissions | [50] | Tonnes | Sum of CO2 emissions produced from fossil fuels and industry | The total amount of CO2 emissions accumulated during a period can significantly affect the change in CO2 emissions in a yearly period | [51] | 1960 to 2022 |
| 10 | Annual CO2 emissions growth | [50] | Percentage | Annual percentage growth of total emissions of CO2 excluding land use usage | CO2 emissions growth is an important indicator to explain changes in CO2 emissions | [8] | 1960 to 2022 |
| 11 | Share of Cumulative CO2 emissions | [52] | Tonnes | Cumulative CO2 emissions measured as a percentage of global total cumulative emissions of CO2 | Understanding which country contributes the most is an important indicator of change in CO2 emissions. | [53] | 1960 to 2022 |
| 12 | Contribution to the global mean surface temperature rise | [54] | Celsius | Each country’s contribution to global surface mean temperature rise from cumulative CO2, CH4, N2O | This factor can indirectly explain variations of CO2 emissions | [55] | 1960 to 2021 |
| 13 | Population growth rate | [56] | Percentage | Average exponential growth of the population over a given period | The increased concentration in population generally results in many activities like urbanization, deforestation, etc., which have the potential to influence the level of CO2 emissions | [8,57] | 1960 to 2021 |
| 14 | Population (number) | [58] | Number | Population by country | Idem | [8,57] | 1960 to 2022 |
| 15 | Population with no education | [59] | Number | Educational attainment | Education plays a significant role in reducing the vulnerability of a society, and can increase awareness of pollution | [60] | 1960 to 2022 |
| 16 | Population with primary education | [59] | Number | Educational attainment | Idem | Idem | 1960 to 2022 |
| 17 | Population with incomplete primary education | [59] | Number | Educational attainment | Idem | Idem | 1960 to 2022 |
| 18 | Population with Secondary education | [59] | Number | Educational attainment | Idem | Idem | 1960 to 2022 |
| 19 | Population with upper secondary education | [58] | Number | Educational attainment | Idem | Idem | 1960 to 2022 |
| 20 | Population with lower secondary education | [59] | Number | Educational attainment | Idem | Idem | 1960 to 2022 |
| 21 | Population under 15 | [59] | Number | Educational attainment | Idem | Idem | 1960 to 2022 |
| 22 | Gross Domestic Product (GDP) | [61] | US dollar | Sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products | There is a certain correlation between the prosperity of a country and its level of CO2 emissions. | | 1960 to 2022 |
| 23 | Gross Domestic Product per capita | [62] | US dollar | GDP divided by midyear population | Idem | [12,63] | 1960 to 2022 |
| 24 | Change in GDP | [64] | Percentage | Annual percentage growth of GDP at market prices based on constant local currency. | Idem | [12,63] | 1960 to 2022 |

Appendix B. Statistic Description of the Dataset

| Statistic | Annual CO2 Emissions | GHG Emissions by World Region | Temperature Anomaly | CO2 from Flaring | CO2 from Cement | CO2 from Gas | CO2 from Oil | CO2 from Coal | Cumulative CO2 Emissions |
|---|---|---|---|---|---|---|---|---|---|
| count | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 |
| mean | 1,180,717,000.00 | 1,804,513,000.00 | 0.29 | 7,744,875.00 | 39,100,780.00 | 155,176,800.00 | 377,323,700.00 | 590,266,200.00 | 49,326,940,000.00 |
| std | 2,067,422,000.00 | 2,450,693,000.00 | 0.34 | 13,920,800.00 | 120,124,900.00 | 346,134,300.00 | 645,444,900.00 | 1,255,671,000.00 | 86,020,780,000.00 |
| min | 1,647,474.00 | 51,732,420.00 | −0.31 | 0.00 | 50,771.00 | 0.00 | 688,832.00 | 0.00 | 33,411,650.00 |
| 25% | 119,760,600.00 | 428,377,400.00 | −0.02 | 0.00 | 3,499,390.00 | 493,724.80 | 34,982,420.00 | 21,795,300.00 | 2,533,510,000.00 |
| 50% | 394,471,300.00 | 669,718,000.00 | 0.25 | 2,121,456.00 | 8,233,956.00 | 19,968,290.00 | 165,657,100.00 | 135,809,200.00 | 13,698,860,000.00 |
| 75% | 634,479,800.00 | 2,143,402,000.00 | 0.58 | 6,483,802.00 | 24,980,980.00 | 90,109,130.00 | 268,966,300.00 | 416,902,800.00 | 51,740,210,000.00 |
| max | 11,396,780,000.00 | 13,710,640,000.00 | 0.93 | 88,436,970.00 | 858,232,600.00 | 1,743,539,000.00 | 2,642,692,000.00 | 8,250,736,000.00 | 426,914,600,000.00 |

| Statistic | Share of Cumulative CO2 Emissions | Contribution to Temperature Rise | Population—Education: Post Secondary | Population—Education: Upper Secondary | Population—Education: Lower Secondary | Population—Education: Primary | Population—Education: Incomplete Primary | Population—Education: No Education | Population—Education: under 15 | Population (Number) |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 |
| mean | 5.33 | 0.05 | 16,722,340.00 | 36,436,890.00 | 43,712,180.00 | 32,234,490.00 | 14,956,620.00 | 42,991,600.00 | 82,316,990.00 | 277,365,400.00 |
| std | 9.27 | 0.06 | 27,933,400.00 | 53,308,720.00 | 97,421,520.00 | 52,721,110.00 | 22,983,330.00 | 79,366,320.00 | 115,380,700.00 | 395,267,600.00 |
| min | 0.01 | 0.00 | 36,100.00 | 68,500.00 | 430,700.00 | 0.00 | 0.00 | 0.00 | 6,612,300.00 | 15,276,560.00 |
| 25% | 0.27 | 0.01 | 1,326,700.00 | 4,598,100.00 | 5,399,125.00 | 3,810,300.00 | 444,000.00 | 2,542,500.00 | 11,825,100.00 | 50,089,850.00 |
| 50% | 1.16 | 0.02 | 5,783,600.00 | 13,641,200.00 | 10,568,000.00 | 10,126,800.00 | 3,799,500.00 | 5,732,150.00 | 25,066,800.00 | 66,412,130.00 |
| 75% | 4.75 | 0.05 | 15,051,400.00 | 39,150,200.00 | 28,268,300.00 | 27,610,400.00 | 19,467,000.00 | 30,892,300.00 | 61,191,000.00 | 250,691,100.00 |
| max | 38.78 | 0.28 | 154,720,400.00 | 250,631,200.00 | 537,276,300.00 | 200,622,500.00 | 82,623,900.00 | 292,338,700.00 | 380,274,300.00 | 1,425,894,000.00 |

| Statistic | Population Growth Rate | GDP | GDP Per Capita | Change in GDP | Annual CO2 Emission Growth (%) | Absolute CO2 Change |
|---|---|---|---|---|---|---|
| count | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 | 630.00 |
| mean | 1.57 | 2,098,799,000,000.00 | 12,938.54 | 4.02 | 3.43 | 26,837,800.00 |
| std | 0.98 | 3,839,755,000,000.00 | 15,577.38 | 4.81 | 10.74 | 103,135,300.00 |
| min | −0.39 | −71,767,060,000.00 | −6173.54 | −27.27 | −48.33 | −547,516,900.00 |
| 25% | 0.68 | 166,345,200,000.00 | 1367.95 | 1.74 | −1.09 | −961,013.50 |
| 50% | 1.37 | 693,487,400,000.00 | 5221.58 | 3.85 | 3.07 | 5,909,344.00 |
| 75% | 2.45 | 1,841,303,000,000.00 | 23,704.80 | 6.73 | 6.84 | 24,604,410.00 |
| max | 5.92 | 20,529,460,000,000.00 | 83,951.61 | 25.01 | 82.62 | 911,781,900.00 |

Appendix C. Regression Results Summary

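| Models | Residuals (Baseline) | Residuals (PCA) | Residuals (FS) | Mean Squared Error (Baseline) | Mean Squared Error (PCA) | Mean Squared Error (FS) | R-Squared (Baseline) | R-Squared (PCA) | R-Squared (FS) | Mean Cross Validation Score (Baseline) | Mean Cross Validation Score (PCA) | Mean Cross Validation Score (FS) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear regression | −59,327,819.08 | - | - | 3.103195 × 10^20 | - | - | −1.144 | - | - | −253.80 | - | - |
| Ridge Regression | −40,263,611.43 | - | - | 1.699994 × 10^20 | - | - | −0.17 | - | - | −29.06 | - | - |
| Random Forest Regressor | −24,435,046.47 | - | - | 59,685,956 × 10^8 | - | - | 0.58 | - | - | −0.89 | - | - |
| Bagging Regressor | −22,162,025.93 | - | - | 59,739,629 × 10^8 | - | - | 0.58 | - | - | −40.29 | - | - |
| Gradient Boosting Regressor | −14,687,525.64 | −20,216,617.8 | −18,239,234.8 | 50,060,367 × 10^8 | 609,461 × 10^6 | 4,886,957 × 10^5 | 0.65 | 0.57 | 0.66 | −1.88 | −0.71 | −0.71 |
| XGBoost Regressor | −28,289,899.56 | - | - | 79,866,357 × 10^8 | - | - | 0.44 | - | - | −53.54 | - | - |
| KNeighbors Regressor | −34,832,710.33 | - | - | 1.110808 × 10^20 | - | - | 0.23 | - | - | −0.32 | - | - |
| Adaboost Regressor | −23,440,275.88 | - | - | 64,210,504 × 10^8 | - | - | 0.55 | - | - | −1.24 | - | - |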
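For readers interpreting the table, negative R-squared values are possible under the standard definition, R-squared = 1 − SS_res/SS_tot. They arise whenever a model's squared error exceeds that of simply predicting the mean of the target. The following minimal sketch, with made-up numbers and hypothetical function names, illustrates this. It does not reproduce the models or data of this study.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up target values (arbitrary units) and two sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
good_pred = [1.5, 2.0, 2.5, 4.0]   # close to the targets
poor_pred = [4.0, 1.0, 5.0, 0.0]   # worse than predicting the mean (2.5)

r2_good = r_squared(y_true, good_pred)
r2_poor = r_squared(y_true, poor_pred)
mse_good = mean_squared_error(y_true, good_pred)
mse_poor = mean_squared_error(y_true, poor_pred)
```

In this toy case, the poor predictions give a negative R-squared because their residual sum of squares is larger than the total sum of squares around the mean.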

Appendix C.1. Grouping by Volatility by Targets: High Volatility

| Models | Residuals | Mean Squared Error | R-Squared | Mean Cross Validation Score |
|---|---|---|---|---|
| Linear regression | −59,327,819.08 | 3.103195 × 10^20 | −1.144 | −253.80 |
| Ridge Regression | −40,263,611.43 | 1.699994 × 10^20 | −0.17 | −29.18 |
| Random Forest Regressor | −28,879,206.47 | 7,216,968 × 10^8 | 0.50 | −0.85 |
| Bagging Regressor | −28,411,174.11 | 6,158,571 × 10^8 | 0.57 | −3.34 |
| Gradient Boosting Regressor | −17,772,056.50 | 4,940,930 × 10^8 | 0.66 | −1.55 |
| XGBoost Regressor | −28,289,899.56 | 7,986,635 × 10^8 | 0.44 | −0.94 |
| KNeighbors Regressor | −29,801,474.76 | 1.1108049 × 10^20 | 0.23 | −0.32 |
| Adaboost Regressor | −7,268,139.56 | 64,677,311 × 10^8 | 0.53 | −0.60 |

Appendix C.2. Grouping by Volatility by Targets: Low Volatility

| Models | Residuals | Mean Squared Error | R-Squared | Mean Cross Validation Score |
|---|---|---|---|---|
| Linear regression | −59,327,819.08 | 3.103195 × 10^20 | −1.14 | −253.80 |
| Ridge Regression | −40,263,611.43 | 1.699994 × 10^20 | −0.17 | −29.18 |
| Random Forest Regressor | −28,455,959.39 | 6,675,874 × 10^7 | 0.54 | −0.89 |
| Bagging Regressor | −297,537 × 10^2 | 244,154 × 10^9 | 0.61 | −1.35 |
| Gradient Boosting Regressor | −17,282,934.1 | 491,648 × 10^8 | 0.66 | −1.99 |
| XGBoost Regressor | −28,289,899.56 | 7,986,635 × 10^8 | 0.44 | −0.94 |
| KNeighbors Regressor | −34,832,710.33 | 1.1108049 × 10^20 | 0.23 | −0.32 |
| Adaboost Regressor | −7,268,139.56 | 4,388,766 × 10^8 | 0.70 | −0.97 |

Appendix D. Year-on-Year Change in CO2 Emissions for the Considered Countries

Figure A1. Year-on-Year Change in CO2 Emissions for the Considered Countries.

References

  1. Patel, S.S.; McCaul, B.; Cáceres, G.; Peters, L.E.R.; Patel, R.B.; Clark-Ginsberg, A. Delivering the promise of the Sendai Framework for Disaster Risk Reduction in fragile and conflict-affected contexts (FCAC): A case study of the NGO GOAL’s response to the Syria conflict. Prog. Disaster Sci. 2021, 10, 100172.
  2. Garschagen, M.; Doshi, D.; Reith, J.; Hagenlocher, M. Global patterns of disaster and climate risk—An analysis of the consistency of leading index-based assessments and their results. Clim. Chang. 2021, 169, 11.
  3. Kim, B.J.; Jeong, S.; Chung, J.-B. Research trends in vulnerability studies from 2000 to 2019: Findings from a bibliometric analysis. Int. J. Disaster Risk Reduct. 2021, 56, 102141.
  4. Shi, P.; Ye, T.; Wang, Y.; Zhou, T.; Xu, W.; Du, J.; Wang, J.; Li, N.; Huang, C.; Liu, L.; et al. Disaster Risk Science: A Geographical Perspective and a Research Framework. Int. J. Disaster Risk Sci. 2020, 11, 426–440.
  5. Bloice, L.; Burnett, S. Barriers to knowledge sharing in third sector social care: A case study. J. Knowl. Manag. 2016, 20, 125–145.
  6. Mukendi, C.M.; Choi, H. Temporal Analysis of World Disaster Risk: A Machine Learning Approach to Cluster Dynamics. In Proceedings of the 2023 14th International Conference on Information and Communication Technology Convergence (ICTC), IEEE, Jeju Island, Republic of Korea, 11–13 October 2023; pp. 973–978.
  7. IHME, Global Burden of Disease Study. Deaths That Are from All Causes Attributed to Air Pollution per 100,000 People, in Both Sexes Aged Age-Standardized. 2019. Available online: https://ourworldindata.org/air-pollution (accessed on 15 December 2023).
  8. Li, S.; Siu, Y.W.; Zhao, G. Driving Factors of CO2 Emissions: Further Study Based on Machine Learning. Front. Environ. Sci. 2021, 9, 721517.
  9. Venditti, B. Here’s How CO2 Emissions Have Changed since 1900. In Proceedings of the World Economic Forum, El Sheikh, Egypt, 22 November 2022; Available online: https://www.weforum.org/agenda/2022/11/visualizing-changes-carbon-dioxide-emissions-since-1900/ (accessed on 15 December 2023).
  10. James, G.; Witten, D.; Hastie, T.; Tibshirani, R.; Taylor, J. Linear Regression. In An Introduction to Statistical Learning; Springer Texts in Statistics; Springer International Publishing: Cham, Switzerland, 2023; pp. 69–134.
  11. Özkale, M.R.; Altuner, H. Bootstrap confidence interval of ridge regression in linear regression model: A comparative study via a simulation study. Commun. Stat. Theory Methods 2023, 52, 7405–7441.
  12. Pérez-Rodríguez, J.; Fernández-Navarro, F.; Ashley, T. Estimating ensemble weights for bagging regressors based on the mean–variance portfolio framework. Expert Syst. Appl. 2023, 229, 120462.
  13. Ghunimat, D.; Alzoubi, A.E.; Alzboon, A.; Hanandeh, S. Prediction of concrete compressive strength with GGBFS and fly ash using multilayer perceptron algorithm, random forest regression and k-nearest neighbor regression. Asian J. Civ. Eng. 2023, 24, 169–177.
  14. Cai, J.; Xu, K.; Zhu, Y.; Hu, F.; Li, L. Prediction and analysis of net ecosystem carbon exchange based on gradient boosting regression and random forest. Appl. Energy 2020, 262, 114566.
  15. Zhao, W.P.; Li, J.; Zhao, J.; Zhao, D.; Lu, J.; Wang, X. XGB Model: Research on Evaporation Duct Height Prediction Based on XGBoost Algorithm. Radioengineering 2020, 29, 81–93.
  16. Wei, H. AdaBoost Regression Predicts the Ranking of College Students Using the Super Star Learning APP. In Proceedings of the 2023 IEEE International Conference on Electrical, Automation and Computer Engineering (ICEACE), Changchun, China, 29–31 December 2023; pp. 355–362.
  17. Yao, B. Walmart Sales Prediction Based on Decision Tree, Random Forest, and K Neighbors Regressor. Highlights Bus. Econ. Manag. 2023, 5, 330–335.
  18. Boateng, E.Y.; Abaye, D.A. A Review of the Logistic Regression Model with Emphasis on Medical Research. J. Data Anal. Inf. Process. 2019, 7, 190–207.
  19. Charbuty, B.; Abdulazeez, A. Classification Based on Decision Tree Algorithm for Machine Learning. J. Appl. Sci. Technol. Trends 2021, 2, 20–28.
  20. Shaik, A.B.; Srinivasan, S. A Brief Survey on Random Forest Ensembles in Classification Model. In International Conference on Innovative Computing and Communications; Bhattacharyya, S., Hassanien, A.E., Gupta, D., Khanna, A., Pan, I., Eds.; Lecture Notes in Networks and Systems; Springer: Singapore, 2019; Volume 56, pp. 253–260. ISBN 9789811323539.
  21. Abdurrahman, G.; Sintawati, M. Implementation of xgboost for classification of parkinson’s disease. J. Phys. Conf. Ser. 2020, 1538, 012024.
  22. Chandramouli, A.; Hyma, V.R.; Tanmayi, P.S.; Santoshi, T.G.; Priyanka, B. Diabetes prediction using Hybrid Bagging Classifier. Entertain. Comput. 2023, 47, 100593.
  23. Hao, L.; Huang, G. An improved AdaBoost algorithm for identification of lung cancer based on electronic nose. Heliyon 2023, 9, e13633.
  24. Gezici, B.; Tarhan, A.K. Explainable AI for Software Defect Prediction with Gradient Boosting Classifier. In Proceedings of the 2022 7th International Conference on Computer Science and Engineering (UBMK), Diyarbakir, Turkey, 14–16 September 2022; pp. 1–6.
  25. Alam, S.; Sonbhadra, S.K.; Agarwal, S.; Nagabhushan, P. One-class support vector classifiers: A survey. Knowl. Based Syst. 2020, 196, 105754.
  26. Naiem, S.; Khedr, A.E.; Idrees, A.M.; Marie, M.I. Enhancing the Efficiency of Gaussian Naïve Bayes Machine Learning Classifier in the Detection of DDOS in Cloud Computing. IEEE Access 2023, 11, 124597–124608.
  27. Raschka, S. Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning. arXiv 2018, arXiv:1811.12808.
  28. Hodson, T.O.; Over, T.M.; Foks, S.S. Mean Squared Error, Deconstructed. J. Adv. Model. Earth Syst. 2021, 13, e2021MS002681.
  29. Ma, Y.; Xie, Z.; Chen, S.; Qiao, F.; Li, Z. Real-time detection of abnormal driving behavior based on long short-term memory network and regression residuals. Transp. Res. Part C Emerg. Technol. 2023, 146, 103983.
  30. Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623.
  31. Heydarian, M.; Doyle, T.E.; Samavi, R. MLCM: Multi-Label Confusion Matrix. IEEE Access 2022, 10, 19083–19095.
  32. Kharwal, A.M.N. Classification Report in Machine Learning. Available online: https://www.mendeley.com/catalogue/bb23c245-6fe2-37d1-a8ba-4041334de8c9/ (accessed on 15 December 2023).
  33. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
  34. Molnar, C. Interpretable Machine Learning: Partial Dependence Plot. Available online: https://christophm.github.io/interpretable-ml-book/pdp.html (accessed on 15 December 2023).
  35. Molnar, C.; Freiesleben, T.; König, G.; Herbinger, J.; Reisinger, T.; Casalicchio, G.; Wright, M.N.; Bischl, B. Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process. In Explainable Artificial Intelligence; Longo, L., Ed.; Communications in Computer and Information Science; Springer Nature: Cham, Switzerland, 2023; Volume 1901, pp. 456–479. ISBN 978-3-031-44063-2. [Google Scholar] [CrossRef]
  36. Kong, G.; Hu, S.; Yang, Q. Uncertainty method and sensitivity analysis for assessment of energy consumption of underground metro station. Sustain. Cities Soc. 2023, 92, 104504. [Google Scholar] [CrossRef]
  37. Iooss, B.; Saltelli, A. Introduction to Sensitivity Analysis. In Handbook of Uncertainty Quantification; Ghanem, R., Higdon, D., Owhadi, H., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 1–20. ISBN 978-3-319-11259-6. [Google Scholar] [CrossRef]
  38. Akour, I.; Rahamneh, A.A.; Al Kurdi, B.; Alhamad, A.; Al-Makhariz, I.; Alshurideh, M.; Al-Hawary, S. Using the Canonical Correlation Analysis Method to Study Students’ Levels in Face-to-Face and Online Education in Jordan. Inf. Sci. Lett. 2023, 12, 901–910. [Google Scholar] [CrossRef]
  39. Yin, Y.; Jang-Jaccard, J.; Xu, W.; Singh, A.; Zhu, J.; Sabrina, F.; Kwak, J. IGRF-RFE: A hybrid feature selection method for MLP-based network intrusion detection on UNSW-NB15 dataset. J. Big Data 2023, 10, 15. [Google Scholar] [CrossRef]
  40. Zhai, J.; Kong, F. The Impact of Multi-Dimensional Urbanization on CO2 Emissions: Empirical Evidence from Jiangsu, China, at the County Level. Sustainability 2024, 16, 3005. [Google Scholar] [CrossRef]
  41. Ngcobo, R.; De Wet, M.C. The Impact of Financial Development and Economic Growth on Renewable Energy Supply in South Africa. Sustainability 2024, 16, 2533. [Google Scholar] [CrossRef]
  42. Global Carbon Budget. Year-On-Year Change in CO₂ Emissions—GCB. 2023. Available online: https://ourworldindata.org/grapher/absolute-change-co2 (accessed on 15 December 2023).
  43. Jones, M.W.; Peters, G.P.; Gasser, T.; Andrew, R.M.; Schwingshackl, C.; Gütschow, J.; Houghton, R.A.; Friedlingstein, P.; Pongratz, J.; et al. Annual Greenhouse Gas Emissions by World Region [Dataset]. National Contributions to Climate Change [Original Data]. 2023. Available online: https://ourworldindata.org/grapher/ghg-emissions-by-world-region (accessed on 16 December 2023).
  44. Wei, T.; Wu, J.; Chen, S. Keeping Track of Greenhouse Gas Emission Reduction Progress and Targets in 167 Cities Worldwide. Front. Sustain. Cities 2021, 3, 696381. [Google Scholar] [CrossRef]
  45. Copernicus Climate Change Service. ‘Annual Temperature Anomalies’ [Dataset]. Copernicus Climate Change Service, ‘ERA5 Monthly Averaged Data on Single Levels from 1940 to Present 2’ [Original Data]. 2024. Available online: https://ourworldindata.org/grapher/annual-temperature-anomalies (accessed on 15 December 2023).
  46. NASA’s Scientific Visualization Studio. Global Temperature Anomalies from 1880 to 2019. Scientific Visualization Studio. Available online: https://svs.gsfc.nasa.gov/4787#section_credits (accessed on 15 December 2023).
  47. Global Carbon Budget. ‘Other Industry—GCB’ [Dataset]. Global Carbon Project, ‘Global Carbon Budget’ [Original Data]. 2023. Available online: https://ourworldindata.org/grapher/co2-by-source (accessed on 15 December 2023).
  48. Molteni, M.; Walker, G.; Parmar, D.; Sutton, M.; Licence, P.; Woodward, S. Can “Electric Flare Stacks” Reduce CO2 Emissions? A Case Study with Nonthermal Plasma. Ind. Eng. Chem. Res. 2023, 62, 19649–19657. [Google Scholar] [CrossRef]
  49. Concrete needs to lose its colossal carbon footprint. Nature 2021, 597, 593–594. [CrossRef] [PubMed]
  50. Global Carbon Budget. ‘Cumulative CO2 emissions—GCB’ [Dataset]. Global Carbon Project, ‘Global Carbon Budget’ [Original data]. 2023. Available online: https://ourworldindata.org/grapher/cumulative-co-emissions (accessed on 15 December 2023).
  51. Liu, Z.; Deng, Z.; Davis, S.J.; Giron, C.; Ciais, P. Monitoring global carbon emissions in 2021. Nat. Rev. Earth Environ. 2022, 3, 217–219. [Google Scholar] [CrossRef] [PubMed]
  52. Global Carbon Budget. ‘Share of Global Cumulative CO2 Emissions—GCB’ [Dataset]. Global Carbon Project, ‘Global Carbon Budget’ [Original Data]. 2023. Available online: https://ourworldindata.org/grapher/share-of-cumulative-co2 (accessed on 15 December 2023).
  53. Gillett, N.P. Warming proportional to cumulative carbon emissions not explained by heat and carbon sharing mixing processes. Nat. Commun. 2023, 14, 6466. [Google Scholar] [CrossRef]
  54. Jones, M.W.; Peters, G.P.; Gasser, T.; Andrew, R.M.; Schwingshackl, C.; Gütschow, J.; Houghton, R.A.; Friedlingstein, P.; Pongratz, J.; et al. ‘Contribution to Global Mean Surface Temperature Rise’ [Dataset]. ‘National Contributions to Climate Change’ [Original Data]. 2023. Available online: https://ourworldindata.org/grapher/contribution-temp-rise-degrees (accessed on 15 December 2023).
  55. Ritchie, H.; Rosado, P.; Roser, M. Data Page: Global Warming: Contributions to the Change in Global Mean Surface Temperature. Available online: https://ourworldindata.org/grapher/contributions-global-temp-change (accessed on 15 December 2023).
  56. World Population Prospects. ‘Growth Rate—Sex: All—Age: All—Variant: Estimates’ [Dataset]. UN. 2023. Available online: https://ourworldindata.org/grapher/population-growth-rates (accessed on 15 December 2023).
  57. The Connections Between Population and Climate Change Info Brief. Washington, 2024. Available online: https://populationconnection.org/resources/population-and-climate/ (accessed on 15 December 2023).
  58. Gapminder—Population v7 (2022), Gapminder—Systema Globalis (2022), HYDE (2017), and United Nations—World Population Prospects (2022), ‘Population (Future Projections)’ [Dataset]. Gapminder, ‘Population v7’; Gapminder, ‘Systema Globalis’; PBL Netherlands Environmental Assessment Agency, ‘HYDE 3.2’; United Nations, ‘World Population Prospects’ [Original Data]. 2023. Available online: https://ourworldindata.org/grapher/population-long-run-with-projections (accessed on 15 December 2023).
  59. Wittgenstein Centre. ‘No Education’ [Dataset]. Wittgenstein Centre (2018) [Original Data]. 2023. Available online: https://ourworldindata.org/grapher/world-population-level-education (accessed on 15 December 2023).
  60. Tang, M.M.; Xu, D.; Lan, Q. How does education affect urban carbon emission efficiency under the strategy of scientific and technological innovation? Front. Environ. Sci. 2023, 11, 1137570. [Google Scholar] [CrossRef]
  61. World Bank. GDP (Constant 2015 US$). 2023. Available online: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD (accessed on 15 December 2023).
  62. World Bank. GDP per Capita (Constant 2015 US$). 2023. Available online: https://data.worldbank.org/indicator/NY.GDP.PCAP.KD (accessed on 15 December 2023).
  63. Vigna, L.; Friedrich, J. Global per Capita Emissions Explained—Through 9 Charts. Available online: https://www.weforum.org/agenda/2023/05/global-per-capita-emissions-explained-charts/ (accessed on 15 December 2023).
  64. World Bank; OECD. ‘GDP’ [Dataset]. 2023. Available online: https://ourworldindata.org/grapher/co2-gdp-growth (accessed on 15 December 2023).
Figure 1. Category grouping process.
Figure 2. Research design.
Figure 3. Comparison of r-squared by model.
Figure 4. Comparison of r-squared by feature approaches.
Figure 5. Comparison of r-squared by volatility group.
Figure 6. Temporal dynamics of the target by country.
Figure 7. Dynamics of class over time.
Figure 8. Classes in the dataset.
Figure 9. Importance of features in the decision tree model.
Figure 10. Temporal plot of contributing features.
Figure 11. Sensitivity analysis result.
Figure 12. Correlation table.
Figure 13. Canonical correlation: coefficient plot.
Table 1. Range of values by classes.

| Category Names | Min Values | Max Values | Range | Countries |
|---|---|---|---|---|
| High positive | 3.244667 × 10^3 | 6.769703 × 10^8 | 676,967,055.333 | United States, China, India |
| High negative | −1.906534 × 10^8 | −1.370680 × 10^5 | 190,516,332 | United States, China, India |
| Moderate positive | 4.88107 × 10^5 | 3.741389 × 10^7 | 36,925,079.3 | United Kingdom, France, South Korea, Brazil, India |
| Moderate negative | −3.209079 × 10^7 | −4.587947 × 10^5 | 31,631,995.3 | United Kingdom, France, South Korea, Brazil, India |
| Low positive | 2.683000 × 10^3 | 2.632321 × 10^7 | 26,320,527 | South Africa, Nigeria, Democratic Republic of the Congo |
| Low negative | −2.045842 × 10^7 | −8.609333 × 10^3 | 20,449,810.667 | South Africa, Nigeria, Democratic Republic of the Congo |
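Table 2. Summary of classifiers’ performance.

| Model | MCC | AUC | Precision | Recall | F1 Score | Mean CV |
|---|---|---|---|---|---|---|
| Logreg | 0.67 | 0.73 | 0.76 | 0.73 | 0.68 | 0.74 |
| DT | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 | 0.93 |
| RF | 0.87 | 0.89 | 0.91 | 0.89 | 0.89 | 0.86 |
| XGB | 0.83 | 0.85 | 0.89 | 0.85 | 0.84 | 0.91 |
| GB | 0.84 | 0.86 | 0.89 | 0.86 | 0.85 | 0.85 |
| SVC | 0.58 | 0.64 | 0.70 | 0.64 | 0.58 | 0.73 |
| MLP | 0.60 | 0.66 | 0.74 | 0.66 | 0.64 | 0.75 |
| GNB | 0.50 | 0.58 | 0.66 | 0.58 | 0.55 | 0.75 |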
Table 3. Confusion matrix and classification report.

| Class | High negative | High positive | Low negative | Low positive | Moderate negative | Moderate positive | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|---|---|---|---|---|
| High negative | 12 | 0 | 0 | 0 | 0 | 0 | 0.71 | 1.00 | 0.83 | 12 |
| High positive | 0 | 27 | 0 | 0 | 0 | 0 | 1.00 | 1.00 | 1.00 | 27 |
| Low negative | 0 | 0 | 15 | 0 | 1 | 0 | 1.00 | 0.94 | 0.97 | 16 |
| Low positive | 0 | 0 | 0 | 23 | 0 | 0 | 0.96 | 1.00 | 0.98 | 23 |
| Moderate negative | 5 | 0 | 0 | 0 | 28 | 0 | 0.97 | 0.85 | 0.90 | 33 |
| Moderate positive | 0 | 0 | 0 | 1 | 0 | 18 | 1.00 | 0.95 | 0.97 | 19 |
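The per-class precision, recall, F1-score, and support in Table 3 can be recomputed directly from the confusion-matrix counts. The following is a minimal sketch, not the authors' code: it assumes that rows are actual classes and columns are predicted classes (the common scikit-learn layout), and the helper name `per_class_report` is hypothetical. Under that assumption, the computed values round to the figures reported in the table.

```python
# Confusion-matrix counts copied from Table 3.
# Assumption: rows = actual class, columns = predicted class.
labels = ["High negative", "High positive", "Low negative",
          "Low positive", "Moderate negative", "Moderate positive"]
matrix = [
    [12, 0, 0, 0, 0, 0],
    [0, 27, 0, 0, 0, 0],
    [0, 0, 15, 0, 1, 0],
    [0, 0, 0, 23, 0, 0],
    [5, 0, 0, 0, 28, 0],
    [0, 0, 0, 1, 0, 18],
]


def per_class_report(cm):
    """Return a list of (precision, recall, f1, support), one tuple per class."""
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]                                   # correctly predicted as class k
        predicted_k = sum(cm[i][k] for i in range(n))   # column sum: all predicted as k
        actual_k = sum(cm[k])                           # row sum: all truly k (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1, actual_k))
    return rows


report = per_class_report(matrix)
for name, (p, r, f1, support) in zip(labels, report):
    print(f"{name:18s} precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
```

For example, 17 samples were predicted as "High negative" (12 correct plus 5 that were actually "Moderate negative"), which gives a precision of 12/17 ≈ 0.71, while every actual "High negative" sample was recovered, giving a recall of 1.00.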
Table 4. Summary of PDPs analysis.
Class | Increase | Decrease
High negative
  • Population number
3-year average
High positive
  • CO2 from coal
  • Population number
  • 3-year average
---
Low negative---
  • Population number
  • GDP
  • 3-year average
Low positive
  • 3-year average
  • CO2 from oil
  • Population education: Incomplete Primary
  • Population number
  • Change in GDP
Moderate negative
  • GDP
  • Population number
  • 3-year average
Moderate positive
  • CO2 from oil
  • Population education: Incomplete Primary
  • Change in GDP
  • 3-year average
  • CO2 from coal
  • Population number
Table 5. Summary of mean values by features in each category.

| Features | High Negative | High Positive | Moderate Negative | Moderate Positive | Low Negative | Low Positive |
|---|---|---|---|---|---|---|
| Population (number) | 4.81 × 10^8 | 8.07 × 10^8 | 7.18 × 10^7 | 7.91 × 10^7 | 6.74 × 10^7 | 6.74 × 10^7 |
| CO2 from coal | 1.45 × 10^9 | 1.76 × 10^9 | 1.15 × 10^8 | 1.24 × 10^8 | 1.04 × 10^8 | 8.21 × 10^7 |
| Mean-3y | −7.31 × 10^7 | 1.07 × 10^8 | −1.07 × 10^7 | 1.06 × 10^7 | −3.02 × 10^6 | 4.76 × 10^6 |
| GDP | 9.99 × 10^12 | 4.27 × 10^12 | 1.96 × 10^12 | 9.79 × 10^11 | 1.72 × 10^11 | 1.38 × 10^11 |
| CO2 from oil | 1.58 × 10^9 | 8.85 × 10^8 | 2.12 × 10^8 | 1.71 × 10^8 | 2.43 × 10^7 | 2.31 × 10^7 |
| Population—Education: Incomplete Primary | 2.04 × 10^7 | 4.32 × 10^7 | 2.10 × 10^6 | 5.82 × 10^6 | 3.90 × 10^6 | 3.94 × 10^6 |
| Contribution to temperature rise | 0.173045 | 0.098758 | 0.029083 | 0.022507 | 0.010212 | 0.008529 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mukendi, C.M.; Choi, H.; Jung, S.; Kim, Y.-S. Determinants of Yearly CO2 Emission Fluctuations: A Machine Learning Perspective to Unveil Dynamics. Sustainability 2024, 16, 4242. https://doi.org/10.3390/su16104242

AMA Style

Mukendi CM, Choi H, Jung S, Kim Y-S. Determinants of Yearly CO2 Emission Fluctuations: A Machine Learning Perspective to Unveil Dynamics. Sustainability. 2024; 16(10):4242. https://doi.org/10.3390/su16104242

Chicago/Turabian Style

Mukendi, Christian Mulomba, Hyebong Choi, Suhui Jung, and Yun-Seon Kim. 2024. "Determinants of Yearly CO2 Emission Fluctuations: A Machine Learning Perspective to Unveil Dynamics" Sustainability 16, no. 10: 4242. https://doi.org/10.3390/su16104242
