Measuring the Impact of COVID-19 Vaccination Rates on Carbon Emissions Using LightGBM Model: Evidence from the EU Region

Yue, Xinran; Li, Yan

doi:10.3390/systems12080284


by Xinran Yue and Yan Li *

Business School, Shandong University, Weihai 264209, China

* Author to whom correspondence should be addressed.

Systems 2024, 12(8), 284; https://doi.org/10.3390/systems12080284

Submission received: 16 June 2024 / Revised: 24 July 2024 / Accepted: 2 August 2024 / Published: 4 August 2024


Abstract

COVID-19 vaccination status has become a significant factor influencing carbon emissions in recent years. This paper explores the relationship between vaccination programs and CO₂ emissions to provide scientific support for future emergency management. The study uses daily carbon emissions data from six sectors and daily vaccination program data for the European Union. It compares the accuracy of various machine learning models while incorporating 11 economic control variables, and it quantitatively decomposes the contribution of each variable to carbon emissions during the pandemic using SHAP values. The findings indicate that the LightGBM model predicts carbon emissions more accurately than the other models compared. Furthermore, COVID-19-related variables, such as daily vaccination volumes and cumulative vaccination totals, are identified as significant factors affecting carbon emissions.

Keywords:

daily carbon emissions; vaccination; machine learning; LightGBM model; economics

1. Introduction

With global industrial and economic development, carbon emissions have become a critical focal point for balancing the global economy and the environment. The United Nations Framework Convention on Climate Change (UNFCCC) recognizes climate change as an important global policy issue. To ensure environmental sustainability and achieve Sustainable Development Goal 13 (SDG 13), which calls for “urgent action to address climate change and its impacts,” a more accurate method of estimating carbon emissions is essential.

The COVID-19 pandemic outbreak in 2020 had a profound impact on the global economy, leading to numerous adverse consequences, including a decline in gross domestic product (GDP), an increase in unemployment, and unnatural stock price volatility [1]. Vaccination is considered the primary means of preventing COVID-19, and its effectiveness has been confirmed by studies [2]. Vaccination programs have been found to significantly reduce economic policy uncertainty (EPU) [3], promote sustainable economic development [4], and help to stabilize international trade channels [5].

The COVID-19 pandemic has significantly impacted global daily carbon emissions, resulting in notable differences compared to pre-pandemic levels. During the pandemic, disruptions in global supply chains and the shift to remote work have led to a short-term reduction in global carbon emissions [6,7,8]. However, some researchers estimate that the industrial processes and transportation involved in vaccine production and distribution may partially offset these reductions by increasing carbon emissions [9]. Additionally, the pandemic affects carbon emissions through various other pathways that are difficult to measure directly. Therefore, accurately predicting carbon emission fluctuations during the pandemic is crucial. This knowledge will lay the foundation for effectively forecasting carbon emissions during similar future public health emergencies.

The pandemic’s impact on carbon emissions is multifaceted, and the changes within the pandemic itself are challenging to quantify directly. During the mid-stages of the outbreak, countries invested heavily in developing vaccines and implementing vaccination programs. Vaccination data are a critical variable for quantifying the pandemic’s impact. This study utilizes vaccination program data as a measure of the pandemic’s actual status. To accurately assess the impact of vaccination programs on carbon emissions, it is essential to use daily data and control for other variables, such as stock indices. Moreover, the pandemic’s effects are typically nonlinear, and machine learning methods are well suited for handling complex data, providing more accurate predictions by uncovering hidden information. Therefore, this study, based on a thorough literature review, selects the EU as the research region and employs high-precision carbon emission and vaccination data. The study uses the LightGBM algorithm and 14 other machine learning models to simulate carbon emissions, comparing and validating the models with daily vaccination data from 2021 to 2023.

The contributions of this paper are (1) evaluating models using high-precision daily data; (2) comparing multiple machine learning models to assess how accurately vaccination data predict daily carbon emissions; and (3) evaluating the importance of each variable’s contribution using the SHAP model.

The rest of this article is organized as follows. Section 2 reviews previous studies on the subject. Section 3 presents the main methods used, including LightGBM, SHAP, and the model comparison approach. Section 4 describes the selection of variables and the data sources. Section 5 presents the empirical analysis, and Section 6 reports and discusses the results. Section 7 concludes by summarizing the empirical evidence and offering policy recommendations.

2. Literature Review

2.1. Factors Affecting Daily Carbon Emissions

Research on daily carbon emissions is currently limited, making it crucial to identify suitable independent and control variables to establish an accurate model. While current research suggests that, in the absence of policy guidance, economic growth leads to an increase in carbon emissions [10], existing studies have primarily focused on the influence of financial and economic factors on carbon prices [11].

However, there is a notable gap in the literature regarding direct research on carbon emissions. Some studies have identified that GDP per capita, foreign direct investment, urbanization, and energy intensity significantly increase carbon dioxide emissions [12]. Conversely, green finance has been shown to positively impact carbon emission reduction [13]. Inefficient carbon markets, on the other hand, negatively affect low-carbon development [14].

Given that carbon allowances are closely related to carbon emissions, it is essential to investigate the specific factors influencing carbon allowances to verify their impact on carbon emissions. Previous studies indicate that economic development indicators, traditional energy sources, clean energy sources, and carbon price substitutes all affect carbon futures prices [15]. Notably, Brent crude oil, heating oil, and gasoline are net transmitters of information to the market and significantly impact carbon prices [16]. Additionally, economic development, fossil energy, clean energy, and various stock indices in Europe positively influence carbon prices [15,17,18]. Although renewable energy stock returns do not directly drive carbon price returns, their volatility affects carbon prices [19]. Coal and crude oil have long-term effects on carbon futures prices, while natural gas and the Euro Stoxx 50 index have short-term effects [20]. During COVID-19, market volatility in carbon prices declined and spilled over to clean energy indices such as BIST Sustainability [21]. Macroeconomic variables such as the T-bill rate affect carbon prices, although carbon prices are only weakly sensitive to these macro variables [22]. Furthermore, Yahşi, Çanakoğlu, and Ağralı argue that the S&P Clean Energy Index is the most influential variable in explaining changes in carbon prices, followed by the DAX and coal prices [23].

The factors discussed above are summarized in Table 1.

In addition to these findings, Ren and Zheng conducted the first study on the effect of vaccination on the price of carbon emissions [4]. Carbon emissions are more directly related to climate change than carbon market prices. Therefore, daily carbon emissions and vaccination are chosen as the dependent and independent variables in this paper.

2.2. Carbon Emission Projection Methods

Quantitative research typically falls into two categories: traditional econometric studies and those utilizing machine learning methods. Econometric studies are well suited for addressing linear relationships and have advantages in the theoretical foundations and explanatory power of modeling social network relationships [24]. However, for more complex nonlinear data relationships, machine learning models are often more appropriate, offering better fitting and predictive performance. Interpretable machine learning has emerged as an important research direction in recent years. Commonly used machine learning models include linear regression (LR), the k-nearest neighbors (KNN) regressor, support vector regression (SVR), ridge, lasso, elastic net, Bayesian ridge, the MLP regressor, decision tree, extra tree, extreme gradient boosting (XGBoost), random forest, AdaBoost, and gradient boosting.

Specifically, linear regression is a basic linear model that makes predictions by fitting a linear relationship between the input features and the target variable [25]. The KNN regressor predicts using an average or weighted average of the target values of the k nearest neighbors [26]. SVR is the regression version of support vector machines and fits a linear or nonlinear function by maximizing the margin around the data points [27]. Ridge, lasso, and elastic net are regularized linear models that control model complexity by adding penalty terms to avoid overfitting [28,29,30]. Bayesian ridge is a Bayesian linear regression method that uses ideas from Bayesian statistics to estimate model parameters [31]. The MLP regressor (multi-layer perceptron) is an artificial neural network model that learns complex nonlinear relationships through multiple connected layers of neurons [32,33]. Decision tree and extra tree are tree-structured models that make predictions by following the branches of the tree [34,35]. XGBoost, random forest, AdaBoost, and gradient boosting are ensemble learning models that improve overall performance by combining the predictions of multiple weak classifiers or regressors [36,37,38,39].

In addition to these 14 models, this study introduces the light gradient boosting machine (LightGBM) model, which has demonstrated high accuracy in assessing environmental risks [40]. The LightGBM model addresses shortcomings of the XGBoost model by using gradient-based one-side sampling (GOSS), exclusive feature bundling (EFB), and a leaf-wise (best-first) tree growth algorithm [41].

This paper will study the relationship between the number of vaccinations and carbon emissions, primarily using the LightGBM model, and compare the results with those of the remaining 14 models.

3. Methodology

3.1. LightGBM

LightGBM is a novel ensemble tree-based machine learning algorithm based on gradient boosting decision trees (GBDTs) [41], with each new training iteration optimizing the results of the previous one. LightGBM addresses the efficiency and scalability issues encountered when using GBDTs with high-dimensional features and large datasets. By employing a leaf-wise growth strategy in its decision tree training, LightGBM is particularly well suited for handling large-scale, high-dimensional data. It enables fast training and prediction while maintaining high accuracy and low resource consumption. It has several salient features:

(1): Leaf-wise tree growth strategy: It selects the leaves that will produce the greatest loss reduction.
(2): Histogram-based splitting algorithm: This method discretizes continuous features and constructs histograms to create split points.
(3): Innovative techniques: LightGBM incorporates gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). GOSS is a sampling method that down-samples instances based on their gradients, while EFB is a near-lossless method that reduces the number of effective features.

These features are implemented in the Python LightGBM library, enhancing its performance and efficiency.

The objective function of the LightGBM model is given by the following equation:

\begin{matrix} Obj = \frac{1}{2} \left[ \frac{\left( \sum_{i \in I_{L}} g_{i} \right)^{2}}{\sum_{i \in I_{L}} h_{i} + λ} + \frac{\left( \sum_{i \in I_{R}} g_{i} \right)^{2}}{\sum_{i \in I_{R}} h_{i} + λ} - \frac{\left( \sum_{i \in I} g_{i} \right)^{2}}{\sum_{i \in I} h_{i} + λ} \right] \end{matrix}

(1)

where $I_{L}$ and $I_{R}$ denote the sample sets on the left and right sides of the decision tree split, respectively; $I$ is the sample set on a node (internal or leaf node); $g_{i}$ and $h_{i}$ denote the first-order and second-order gradient statistics of the loss function, respectively; and $λ$ is a constant.
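To make Equation (1) concrete, the following minimal sketch evaluates it for one candidate split. It assumes the squared-error loss ½(ŷ − y)², for which g_i = ŷ_i − y_i and h_i = 1; the function name `split_gain`, the toy gradients, and the choice λ = 1 are illustrative assumptions, not values from the study.

```python
from typing import Sequence


def split_gain(g_left: Sequence[float], h_left: Sequence[float],
               g_right: Sequence[float], h_right: Sequence[float],
               lam: float = 1.0) -> float:
    """Evaluate Equation (1) for one candidate split of a node."""

    def score(g_sum: float, h_sum: float) -> float:
        # (sum of gradients)^2 / (sum of second-order statistics + lambda)
        return g_sum ** 2 / (h_sum + lam)

    gl, hl = sum(g_left), sum(h_left)
    gr, hr = sum(g_right), sum(h_right)
    # The parent node I contains every sample of both children.
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr))


# Toy gradients (assumed, not from the paper): squared-error loss, so h_i = 1.
g_l, h_l = [-2.0, -1.0], [1.0, 1.0]
g_r, h_r = [1.5, 2.5], [1.0, 1.0]
gain = split_gain(g_l, h_l, g_r, h_r, lam=1.0)
print(round(gain, 4))
```

A split that separates samples with negative gradients from samples with positive gradients, as here, yields a large value; a split whose two children look alike yields a value near zero.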

The LightGBM model performs well and can be used as a wrapper learner for feature selection as well as for solving classification and regression problems. Because LightGBM is built from decision trees, the importance of each feature for prediction can be obtained by training the model. The importance of feature $j$ for a single decision tree $T$ is given by the following equation:

\begin{matrix} J_{j}^{2} (T) = \sum_{t = 1}^{L - 1} i_{t}^{2} \, \mathbb{1} (v_{t} = j) \end{matrix}

(2)

where $L$ denotes the number of leaf nodes of the decision tree; $L - 1$ denotes the number of non-leaf nodes of the tree; $v_{t}$ denotes the feature used to split node $t$; $\mathbb{1} (v_{t} = j)$ equals 1 when node $t$ splits on feature $j$ and 0 otherwise; and $i_{t}^{2}$ denotes the squared loss reduction obtained by splitting node $t$.

The global importance of feature $j$ is measured by averaging the importance of $j$ over the individual decision trees (base learners), as shown in the following equation:

\begin{matrix} J_{j}^{2} = \frac{1}{N} \sum_{n = 1}^{N} J_{j}^{2} (T_{n}) \end{matrix}

(3)

where $N$ is the number of decision trees (base learners); $T_{n}$ represents the $n$-th decision tree; and $J_{j}^{2}$ represents the global importance of feature $j$.
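Equations (2) and (3) can be illustrated with a small sketch. Each tree is represented only by its non-leaf nodes, stored as (feature, loss reduction) pairs; this representation, the feature names `x1` to `x3`, and the numbers are hypothetical and chosen for illustration. The LightGBM library reports its own importance types, so this is a sketch of the formulas rather than of the library's output.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy structure: one (feature_name, loss_reduction) pair per non-leaf node.
Tree = List[Tuple[str, float]]


def tree_importance(tree: Tree) -> Dict[str, float]:
    """Equation (2): sum the squared loss reductions of the nodes that split on each feature."""
    scores: Dict[str, float] = defaultdict(float)
    for feature, loss_reduction in tree:
        scores[feature] += loss_reduction ** 2
    return dict(scores)


def global_importance(trees: List[Tree]) -> Dict[str, float]:
    """Equation (3): average the per-tree importances over all N trees."""
    totals: Dict[str, float] = defaultdict(float)
    for tree in trees:
        for feature, score in tree_importance(tree).items():
            totals[feature] += score
    return {feature: total / len(trees) for feature, total in totals.items()}


# Made-up trees, not fitted on the study's data.
trees = [
    [("x1", 3.0), ("x2", 2.0), ("x1", 1.0)],  # tree 1: L = 4 leaves, 3 splits
    [("x2", 1.0), ("x3", 2.0)],               # tree 2: L = 3 leaves, 2 splits
]
importance = global_importance(trees)
print(importance)
```

A feature that never appears in a tree contributes zero for that tree, which is what the indicator in Equation (2) expresses.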

3.2. SHAP Model

SHAP (Shapley Additive Explanations) values are an additive feature attribution method [42] used to explain the outputs of machine learning models. By fairly distributing the contribution of each feature to the prediction results, SHAP values provide an intuitive understanding of the model’s decision-making process. This method interprets the model output as the sum of individual effects of the input features, revealing each feature’s contribution by calculating the Shapley values. SHAP satisfies three desirable properties of interpretation: consistency, local accuracy, and missingness.

By calculating each feature’s contribution to the model output, SHAP identifies the factors with the greatest impact on the prediction results. This supports further optimization of the model or refinement of the data to improve prediction accuracy and interpretability. SHAP values assign an importance (contribution) score $ϕ_{i}$ to the $i$-th feature, indicating the extent to which the model output for a single instance is influenced by that feature. The Shapley value for feature $i$ is calculated as follows:

\begin{matrix} ϕ_{i} = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ f_{S \cup \{i\}} (x_{S \cup \{i\}}) - f_{S} (x_{S}) \right] \end{matrix}

(4)

where $ϕ_{i}$ is the SHAP value of feature $i$, i.e., the contribution of feature $i$ to the prediction result; $N$ denotes the set of all features; $S \subseteq N \setminus \{i\}$ denotes a subset of the features that does not contain feature $i$; $f_{S \cup \{i\}}$ and $f_{S}$ denote the model prediction functions for the feature subset with and without feature $i$, respectively; and $x_{S \cup \{i\}}$ and $x_{S}$ denote the corresponding subsets of the input feature vector.

Due to the local accuracy property of SHAP values, the base value plus the sum of all feature importance scores equals the output of the predictive model for the explained instance $x^{*}$:

\begin{matrix} f (x^{*}) = ϕ_{0} + \sum_{i = 1}^{D} ϕ_{i} \end{matrix}

where $ϕ_{0}$ is the expected value of the model output, $D$ is the number of features, and $ϕ_{i}$ denotes the SHAP value of the $i$-th feature.
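The Shapley formula and the local accuracy property can be checked numerically on a tiny example. The sketch below enumerates every feature subset, which is only feasible for a handful of features. It makes one simplifying assumption that is not from the paper: features outside a subset are replaced by a fixed baseline value, so the base value here is the model output at the baseline rather than an expectation over data. The names `shapley_values` and `toy_model` are illustrative.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating all subsets S of the other features."""
    n = len(x)

    def f(subset: frozenset) -> float:
        # Assumption: features not in the subset take their baseline value.
        return model([x[j] if j in subset else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (f(s | {i}) - f(s))
        phi.append(total)
    return phi


def toy_model(v: Sequence[float]) -> float:
    # Hypothetical nonlinear model with interactions between features.
    return 2.0 * v[0] + v[1] * v[2] + 0.5 * v[0] * v[1]


x_star = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x_star, baseline)
phi_0 = toy_model(baseline)  # base value under the baseline assumption
print(phi, phi_0 + sum(phi), toy_model(x_star))
```

Interaction terms are shared equally between the features involved, and the base value plus the contributions reproduces the model output for the explained instance.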

3.3. Evaluation Criteria for the Model

Because the relationship between daily COVID-19 vaccination and single-day carbon emissions is nonlinear, multiple models are needed to capture it. In this paper, multiple prediction models, such as the KNN regressor, SVR, gradient boosting, bagging, and LightGBM, are evaluated using k-fold cross-validation. The mean square error (MSE) is selected as the performance metric. For $N$ samples with observed values $y_{i}$ and predicted values ${\hat{y}}_{i}$, it is defined as

\begin{matrix} MSE = \frac{1}{N} \sum_{i = 1}^{N} (y_{i} - {\hat{y}}_{i})^{2} \end{matrix}

(5)

Cross-validation is a model evaluation technique commonly used in machine learning to estimate the performance of a model on unseen data. The principle is to divide the dataset into a training set and a validation set, evaluate the model, and repeat the process several times. K-fold cross-validation is a common variant in which the dataset is divided into k parts: k − 1 parts are used as training data and the remaining part is used as test data [43]. The model is trained k times, each time with a different part held out, so that every sample is used to validate the model exactly once. This procedure is well suited to small datasets. One advantage of k-fold cross-validation is that it makes fuller use of the dataset, because each sample is used for both training and testing. In addition, repeating the evaluation over multiple training and test sets reduces the bias caused by an uneven division of the dataset. Finally, the results of the k evaluations are averaged to give the performance metric of the model, where $MSE_{k}$ is the MSE on the $k$-th test fold and $K$ is the number of folds:

\begin{matrix} CV\text{-}MSE = \frac{1}{K} \sum_{k = 1}^{K} MSE_{k} \end{matrix}

(6)
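The evaluation procedure in Equations (5) and (6) can be sketched in plain Python. A one-feature least-squares line stands in for the regressors compared in the paper, and folds are assigned by index rather than at random; both choices, as well as the function names and toy data, are assumptions made to keep the example short and deterministic.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Equation (5): mean of the squared prediction errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least squares with one feature; a stand-in for any regressor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_mse(xs: Sequence[float], ys: Sequence[float], k: int) -> Tuple[float, List[float]]:
    """Equation (6): average of the k per-fold test MSE values."""
    fold_scores = []
    for fold in range(k):
        # Sample i belongs to test fold i % k, so each sample is tested exactly once.
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [model(xs[i]) for i in test_idx]
        fold_scores.append(mse([ys[i] for i in test_idx], preds))
    return sum(fold_scores) / k, fold_scores


xs = [float(i) for i in range(10)]
ys = [3.0 * x + 1.0 for x in xs]  # noiseless toy data, so a line fits exactly
score, per_fold = cv_mse(xs, ys, k=5)
print(score, per_fold)
```

Comparing models then amounts to computing this averaged score for each candidate and preferring the one with the smallest value.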

4. Sample Selection and Data Sources

The European Union, the third largest economy in the world, was among the first to establish a robust carbon market mechanism. The COVID-19 pandemic significantly impacted both the EU’s economy and its environmental initiatives. To comprehend this impact, this study simulates the carbon emissions of EU member states during the pandemic, providing a foundation for related research.

4.1. Dependent Variable: Daily Carbon Emissions

Daily carbon emission data from 19 February 2021 to 28 December 2023 were selected for the sample. The end-of-day carbon emission data from six sectors were used for empirical analysis [44]. As illustrated in Figure 1, the EU’s carbon emissions exhibited significant fluctuations during this period due to various factors, including the pandemic, but overall maintained a relatively stable trend.

4.2. Independent Variable: Vaccine-Related Indicators

Vaccination-related indicators were obtained from the Our World in Data database [45]. This open-access database covers major issues such as poverty, disease, hunger, climate change, war, existential risk, and inequality. It was created and is maintained by academics and researchers at the University of Oxford and is widely regarded as reliable and authoritative. After applying a logarithmic transformation to the original data, the results are presented in Table 2.

4.3. Control Variables

Based on prior research, 11 relevant factors were selected as control variables. These factors complement the vaccine-related indicators and cover commodity, financial market, and energy market variables.

The original data were subjected to incremental processing, and the results were then logarithmically transformed. The descriptive statistics of these results are presented in Table 3.

5. Empirical Analysis

5.1. Data Pre-Processing

The collected data must be pre-processed before validation to ensure their usability. Data pre-processing includes data cleaning and data standardization. Because financial markets such as stock and futures markets close on some days, dates with incomplete data (blank values) are removed by matching dates across all series before validation.
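The date-matching step described above can be sketched as follows. The series names, dates, and values are made up for illustration and are not data from the study; `None` stands for a blank value on a market-closure day.

```python
from typing import Dict, List, Optional, Tuple

Series = Dict[str, Optional[float]]  # ISO date -> value; None marks a blank


def align_on_common_dates(series: Dict[str, Series]) -> Tuple[List[str], Dict[str, List[float]]]:
    """Keep only the dates on which every series has a non-blank value."""
    common = None
    for values in series.values():
        valid = {day for day, v in values.items() if v is not None}
        common = valid if common is None else common & valid
    dates = sorted(common or [])  # ISO dates sort chronologically as strings
    aligned = {name: [values[d] for d in dates] for name, values in series.items()}
    return dates, aligned


# Made-up toy values: the stock index has no quote on the weekend.
raw = {
    "emissions": {"2021-02-19": 8.1, "2021-02-20": 7.4,
                  "2021-02-21": 7.0, "2021-02-22": 8.3},
    "stock_index": {"2021-02-19": 3713.5, "2021-02-20": None,
                    "2021-02-22": 3699.9},
}
dates, aligned = align_on_common_dates(raw)
print(dates, aligned)
```

Both a missing date and an explicit blank lead to the date being dropped from every series, so the remaining rows line up one to one.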

Because the vaccination figures of EU member states are large, the natural logarithm ($\ln (x)$) was applied to the COVID-19 vaccination data, and the control variables were processed as increments and log-transformed as described in Section 4.3. Given the large number of independent and control variables involved and the presence of multicollinearity in some data, a correlation analysis with carbon emissions and a collinearity test were conducted. Only some of the variables were found to have a significant effect on carbon emissions, as listed in Table 4.

Through this correlation screening, 14 variables (DV, DVR, TBPH, PV, PFVPH, TV, TVPH, Rotterdam, ERETI, T Bill, Brent, STOXX50, DPV, and IPENG) were selected as independent and control variables to predict the carbon emissions of the European Union during the pandemic.
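A correlation screening of this kind might look like the sketch below. The paper does not report the exact test or cut-off, so the absolute-correlation threshold of 0.3, the feature names, and the numbers are all assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def screen_by_correlation(features: Dict[str, Sequence[float]],
                          target: Sequence[float],
                          threshold: float) -> List[str]:
    """Keep features whose absolute correlation with the target reaches the threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]


# Made-up toy numbers, not data from the study.
raw_vaccinations = [100_000, 200_000, 400_000, 800_000, 1_600_000]
features = {
    "log_vaccinations": [math.log(v) for v in raw_vaccinations],  # ln(x) transform
    "noise": [1.0, -1.0, 1.0, -1.0, 1.0],
}
emissions = [1.0, 2.0, 3.0, 4.0, 5.0]
selected = screen_by_correlation(features, emissions, threshold=0.3)
print(selected)
```

The log transform turns the doubling vaccination counts into an evenly spaced series, which is why the transformed feature lines up with the linearly increasing toy target.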

5.2. Model Selection

The dataset was divided into a training set and a test set. The training set, comprising 80% of the samples randomly selected from the preprocessed dataset, was used to train the model. The remaining 20% of the samples formed the test set, which was used to validate and evaluate the model’s accuracy.
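A random 80/20 split of this kind can be written with the standard library alone. The helper name `train_test_split` and the seed are illustrative assumptions; the paper does not state which routine or seed was used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly assign a fraction of the rows to the test set; the rest form the training set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


rows = list(range(100))  # stand-in for the pre-processed daily observations
train, test = train_test_split(rows, test_fraction=0.2, seed=42)
print(len(train), len(test))
```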

5.2.1. Model Training

The computer used for model training and validation ran a 64-bit Windows 11 operating system with 16 GB of memory and an AMD Ryzen 5 5600 processor (4.50 GHz). The LightGBM algorithm was invoked on the training set data through its scikit-learn interface in the Python 3.10 programming language. The main software packages and machine learning libraries used in the modeling process are Pandas (2.2.0), NumPy (1.26.0), Scikit-learn (1.4.0), and others. The model input variables are DV, DVR, TBPH, PV, PFVPH, TV, TVPH, Rotterdam, ERETI, T Bill, Brent, STOXX50, DPV, and IPENG.

When constructing a predictive model with the LightGBM algorithm, the hyperparameters have a significant impact on the model’s predictions. Therefore, the LightGBM hyperparameters were optimized using random search over ‘n_estimators’, ‘learning_rate’, ‘bagging_fraction’, ‘feature_fraction’, ‘max_depth’, and ‘min_child_samples’.
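The random search itself is simple to sketch. The search ranges below are hypothetical, since the paper does not report them, and `placeholder_objective` is a stand-in for "train LightGBM with these parameters and return the cross-validated MSE", used so that the example runs without the LightGBM library.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]

# Hypothetical search ranges; the ranges used in the study are not reported.
SEARCH_SPACE = {
    "n_estimators": lambda rng: rng.randint(100, 1000),
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, -0.5),  # log-uniform
    "bagging_fraction": lambda rng: rng.uniform(0.5, 1.0),
    "feature_fraction": lambda rng: rng.uniform(0.5, 1.0),
    "max_depth": lambda rng: rng.randint(3, 12),
    "min_child_samples": lambda rng: rng.randint(5, 50),
}


def random_search(objective: Callable[[Params], float],
                  n_iter: int = 50, seed: int = 0) -> Tuple[Params, float]:
    """Sample n_iter random configurations and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = {}, float("inf")
    for _ in range(n_iter):
        params = {name: draw(rng) for name, draw in SEARCH_SPACE.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def placeholder_objective(params: Params) -> float:
    # Stand-in for a real CV-MSE; it simply prefers learning_rate near 0.05 and max_depth near 6.
    return (params["learning_rate"] - 0.05) ** 2 + 1e-3 * (params["max_depth"] - 6) ** 2


best_params, best_score = random_search(placeholder_objective, n_iter=50, seed=1)
print(best_params, best_score)
```

In an actual run, the objective would fit the model on the training folds for each sampled configuration, which makes the number of iterations the main cost control.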

5.2.2. Model Validation

In this paper, k-fold cross-validation is used to evaluate several prediction models, including LR, KNN regressor, SVR, ridge, lasso, elastic net, Bayesian ridge, MLP regressor, decision tree, extra tree, XGBoost, random forest, Ada boost, and gradient boost. The mean square error (MSE) is selected as the performance evaluation metric. MSE is calculated as the mean of the squared differences between the predicted and actual observed values; a lower MSE indicates a better model fit. The models’ fitting accuracy is compared using the MSE values obtained from the k-fold cross-validation.

The results of the model evaluation are shown in Figure 2.

The results show that the mean k-fold cross-validation MSE of the LightGBM model, 0.569804751, is substantially smaller than that of the other models, indicating that the LightGBM model predicts more accurately than the remaining models.

6. Result and Discussion

6.1. Result

This section analyzes the contribution of COVID-19 vaccination to the prediction of carbon emissions. SHAP values are used to weigh the importance of the predictor variables [42]: the larger the magnitude of a variable’s SHAP values, the greater its contribution to the model. The specific importance levels are shown in Figure 3.
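One common way to turn per-sample SHAP values into a single importance ranking is the mean absolute SHAP value per feature. Whether Figure 3 was produced exactly this way is an assumption; the feature names and numbers below are made up and are not results from the study.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(shap_rows: Sequence[Sequence[float]],
                          feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by the mean absolute SHAP value across samples, largest first."""
    n = len(shap_rows)
    means = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values: one row per sample, one column per feature.
names = ["x1", "x2", "x3"]
shap_rows = [
    [0.8, -0.3, 0.05],
    [-0.6, 0.4, -0.01],
    [1.0, 0.2, 0.0],
]
ranking = rank_by_mean_abs_shap(shap_rows, names)
print(ranking)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.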

From Figure 3, we can see that, among all the selected variables, daily vaccination rates (DV) are the most significant factor affecting the carbon emissions of EU member states. Specifically, higher daily vaccination rates correlate with more substantial changes in carbon emissions. In contrast, the total booster vaccination rate (TBPH) is the input variable with the least impact on carbon emissions. Among the economic variables that we controlled for, the European Renewable Energy Total Return Index (ERETI) has the most significant effect on CO₂ emissions. This suggests that renewable energy plays a crucial role in carbon emissions.

All the variables that we selected positively affect CO₂ emissions, which aligns with our expectations. From the data, we can infer that high daily vaccination rates mean that more people can return to their workplaces each day, improving marginal productivity. Consequently, the economy and society can resume normal production sooner, reducing the abnormal impacts of policies like lockdowns on the economy.

The total booster vaccination rate has a smaller impact on carbon emissions. This is likely because many companies require employees to return to work after receiving the first dose of the vaccine. Therefore, the booster shot has less influence on employees returning to work and, subsequently, on carbon emissions. This finding suggests that while booster shots are essential for public health, their role in promoting economic recovery and reducing carbon emissions is limited.

The research results indicate that daily vaccination rates and the European Renewable Energy Total Return Index are key factors influencing the carbon emissions of EU member states, while the impact of total booster vaccination rates is smaller. Policymakers can use these findings to optimize vaccination strategies and energy policies to achieve both economic recovery and carbon emission reductions.

6.2. Comparison with Other Studies

In this study, the relationship between vaccination programs and carbon emissions during the pandemic was examined from the perspective of emergency management. Various models were compared to discuss the impact of vaccination rates on carbon emissions. This study shows that LightGBM is highly effective in associating daily vaccination rates with carbon emissions, and this impact is an important variable not to be overlooked during the COVID-19 period.

Comparing the risk assessment model proposed in this study with models from other similar research can better elucidate the differences between this study and others. Currently, there are few detailed studies on COVID-19 and carbon emissions. As far as we know, few studies on carbon emissions have used the LightGBM model to simulate and assess actual carbon emissions. Compared to other carbon dioxide prediction models proposed in the literature [46], such as SVM, the LightGBM model introduced in this study, as shown in Figure 2, demonstrates a higher model fit. The LightGBM model can effectively address the issue of carbon dioxide assessment during the COVID-19 period, with significantly higher accuracy than other models.

When measuring and assessing the daily variation in carbon dioxide emissions during the COVID-19 pandemic, incorporating daily COVID-19 vaccination rates as an important indicator can effectively enhance the model’s effectiveness and accuracy. Although previous studies have pointed out that booster vaccinations might be related to daily carbon emissions and air pollution [47], the current research shows that this correlation is extremely weak and can be ignored after including the total daily vaccination rate and other vaccine-related variables.

Our results suggest that policymakers should be aware that vaccination plans for emergency incidents can increase carbon emissions, thereby exacerbating the greenhouse effect. They can use the models and results of this study to measure the potential association between vaccination rates and carbon dioxide emissions and to develop more sustainable vaccination policies, thus curbing the worsening of the greenhouse effect.

7. Conclusions

This study utilizes daily vaccination data from the EU, collected after the implementation of vaccination programs, and daily carbon emission data from the EU region. Using the LightGBM algorithm and 14 other machine learning models, this study simulates carbon emissions and compares and validates the models with daily vaccination data from 2021 to 2023. The results indicate that, compared to the other models, the LightGBM model demonstrates significant accuracy advantages in assessing carbon emissions during the large-scale vaccination period following the pandemic outbreak. This suggests that the LightGBM model could be used in the future to evaluate the impact of pandemics on carbon emissions, excluding fluctuations caused by economic, energy, and other macro factors. This has important implications for predicting carbon emission fluctuations resulting from public health emergencies.

(1): Through cross-validation analysis, multiple models were compared for their evaluation effects, calculating evaluation metrics and the fitting degree of daily CO₂ emissions. This study considered the performance of different models on different datasets to ensure the accuracy and reliability of the evaluation results. It was found that the MSE value of the LightGBM model was significantly lower than other models, indicating that LightGBM performed very well in terms of fitting accuracy. Further analysis showed that the LightGBM model is highly efficient in handling large-scale data, with good model stability, making it more promising in practical applications. Upon extension, this model can not only be used for sustainable development assessment in public health but can also be expanded to other environmental monitoring and management fields, providing scientific evidence and decision support for policymakers.
(2): Daily total vaccination count and total vaccinations per hundred people have a significant contribution to daily CO₂ emissions. On one hand, as the vaccination program progresses, the larger the vaccination count, the more people return to work, increasing CO₂ emissions from industrial production. On the other hand, higher vaccination counts also increase CO₂ emissions from vaccine manufacturers and their supply chains. Among the selected factors, the daily vaccination count has the highest impact on CO₂ emissions, while booster shots, although increasing CO₂ emissions related to vaccine companies and transportation, do not significantly affect companies’ production resumption decisions, and thus have the least impact on daily CO₂ emissions.
(3): The model can calculate the contribution of important evaluation indicators to daily CO₂ emissions. Besides vaccination-related indicators, other daily macro factors also have an impact on daily CO₂ emissions. The European Renewable Energy Total Return Index (ERETI) has the greatest influence, while the influence of the Stoxx 50 and the US three-year Treasury yield is minimal. Quantifying the contribution of different indicators to the evaluation results helps to clarify the systematic relationship between indicators and fitting results, and thus to open the “black box” of machine learning.
(4): The number and quality of indicators are crucial for model fitting. When using machine learning and other quantitative analysis methods, ignoring the association path of indicators to CO₂ emissions and only using relevant daily data for fitting predictions requires a sufficient number of indicators and corresponding sample sizes. In cases where there are too many indicators, preliminary screening can be performed through correlation analysis before fitting the model. Increasing the number of relevant indicators can effectively break down the processes affecting CO₂ emissions. Increasing the sample size helps to further improve the fitting accuracy of the model.

In future emergency management of unexpected events, machine learning models such as LightGBM can be used to assess the impact of those events on carbon emissions. Because research on daily CO₂ emissions is still scarce, future work should focus on three directions. First, more variables should be included in the prediction model: adding variables and samples improves the accuracy of simulated assessments of CO₂ emissions and helps quantify the mechanisms behind CO₂ fluctuations during public health events. Second, existing models should be improved: this paper used only 14 basic models without enhancing their algorithms, which leaves room for similar studies to do so. Third, public health event data should be collected from other regions: vaccination data and the related daily CO₂ data from regions outside the EU, such as North America and Asia, would make it possible to verify whether this model generalizes to other regions.

Author Contributions

Conceptualization, X.Y.; methodology, X.Y.; software, X.Y. and Y.L.; validation, X.Y.; formal analysis, X.Y.; investigation, X.Y.; resources, X.Y.; data curation, X.Y.; writing—original draft preparation, X.Y.; writing—review and editing, X.Y. and Y.L.; visualization, X.Y.; supervision, Y.L.; project administration, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data, models, and code generated or used during the study appear in the submitted article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Khalaf, A.T.; Wei, Y.; Wan, J.; Kadir, S.Y.A.; Zainol, J.; Jiang, H.; Abdalla, A.N. How Did the Pandemic Affect Our Perception of Sustainability? Enlightening the Major Positive Impact on Health and the Environment. Sustainability 2023, 15, 892.
Boëlle, P.-Y.; Valdano, E. The importance of increasing primary vaccinations against COVID-19 in Europe. Infect. Dis. Model. 2024, 9, 1–9.
Li, S.; Li, M.; Fu, R. A sustainable pandemic response: The impact of COVID-19 vaccination coverage on economic policy uncertainty. Int. Rev. Econ. Financ. 2024, 92, 316–332.
Ren, H.; Zheng, Y. COVID-19 vaccination and household savings: An economic recovery channel. Financ. Res. Lett. 2023, 54, 103711.
Pham, S.D.; Nguyen, T.T.T.; Li, X.-M. Stabilizing global foreign exchange markets in the time of COVID-19: The role of vaccinations. Glob. Financ. J. 2024, 59, 100923.
Liu, Y.; Zhu, J.; Tuwor, C.P.; Ling, C.; Yu, L.; Yin, K. The impact of the COVID-19 pandemic on global trade-embodied carbon emissions. J. Clean. Prod. 2023, 408, 137042.
Kumar, A.; Singh, P.; Raizada, P.; Hussain, C.M. Impact of COVID-19 on greenhouse gases emissions: A critical review. Sci. Total Environ. 2022, 806, 150349.
Gary, V.; Sarah, S.; Deborah, N. Long-Term Effects of COVID-19, and Its Impact on Business, Employees, and CO₂ Emissions, a Study Using Arc-GIS Survey 123 and Arc-GIS Mapping. Sustainability 2022, 14, 13689.
Klemeš, J.J.; Jiang, P.; Van Fan, Y.; Bokhari, A.; Wang, X.-C. COVID-19 pandemics Stage II–energy and environmental impacts of vaccination. Renew. Sustain. Energy Rev. 2021, 150, 111400.
Ma, X.; Zhao, C.; Song, C.; Meng, D.; Xu, M.; Liu, R.; Yan, Y.; Liu, Z. The impact of regional policy implementation on the decoupling of carbon emissions and economic development. J. Environ. Manag. 2024, 355, 120472.
Wang, Z.-J.; Zhao, L.-T. The impact of the global stock and energy market on EU ETS: A structural equation modelling approach. J. Clean. Prod. 2021, 289, 125140.
Wang, J.-H.; Mamkhezri, J.; Khezri, M.; Karimi, M.S.; Khan, Y.A. Insights from European nations on the spatial impacts of renewable energy sources on CO₂ emissions. Energy Rep. 2022, 8, 5620–5630.
Sadiq, M.; Chau, K.Y.; Ha, N.T.T.; Phan, T.T.H.; Ngo, T.Q.; Huy, P.Q. The impact of green finance, eco-innovation, renewable energy and carbon taxes on CO₂ emissions in BRICS countries: Evidence from CS ARDL estimation. Geosci. Front. 2024, 15, 101689.
Xu, R.; Chou, L.-C.; Zhang, W.-H. The effect of CO₂ emissions and economic performance on hydrogen-based renewable production in 35 European Countries. Int. J. Hydrog. Energy 2019, 44, 29418–29425.
Li, Z.-P.; Yang, L.; Li, S.-R.; Yuan, X. The long-term trend analysis and scenario simulation of the carbon price based on the energy-economic regulation. Int. J. Clim. Change Strateg. Manag. 2020, 12, 653–668.
Hoque, M.E.; Soo-Wah, L.; Billah, M. Time-frequency connectedness and spillover among carbon, climate, and energy futures: Determinants and portfolio risk management implications. Energy Econ. 2023, 127, 107034.
Jiménez-Rodríguez, R. What happens to the relationship between EU allowances prices and stock market indices in Europe? Energy Econ. 2019, 81, 13–24.
Ben Ismail, N.; Alcouffe, S.; Galy, N.; Ceulemans, K. The impact of international sustainability initiatives on Life Cycle Assessment voluntary disclosures: The case of France’s CAC40 listed companies. J. Clean. Prod. 2021, 282, 124456.
Dutta, A.; Bouri, E.; Noor, M.H. Return and volatility linkages between CO₂ emission and clean energy stock prices. Energy 2018, 164, 803–810.
Zhao, X.; Han, M.; Ding, L.; Kang, W. Usefulness of economic energy data at different frequencies for carbon price forecasting in the EU ETS. Appl. Energy 2018, 216, 132–141.
Doğan, M.; Raikhan, S.; Zhanar, N.; Gulbagda, B. Analysis of dynamic connectedness relationships among clean energy, carbon emission allowance, and BIST indexes. Sustainability 2023, 15, 6025.
Chevallier, J. Carbon futures and macroeconomic risk factors: A view from the EU ETS. Energy Econ. 2009, 31, 614–625.
Yahşi, M.; Çanakoğlu, E.; Ağralı, S. Carbon price forecasting models based on big data analytics. Carbon Manag. 2019, 10, 175–187.
Zhu, C.; Su, Y.; Fan, R.; Qin, M.; Fu, H. Exploring provincial carbon-pollutant emission efficiency in China: An integrated approach with social network analysis and spatial econometrics. Ecol. Indic. 2024, 159, 111662.
Wang, Y.-A.; Huang, Q.; Yao, Z.; Zhang, Y. On a class of linear regression methods. J. Complex. 2024, 82, 101826.
Altman, N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 1992, 46, 175–185.
Liu, C.; Qian, Q. Twin proximal support vector regression with heteroscedastic Gaussian noise. Expert Syst. Appl. 2024, 250, 123840.
Hastie, T. Ridge regularization: An essential concept in data science. Technometrics 2020, 62, 426–433.
Ranstam, J.; Cook, J.A. LASSO regression. J. Br. Surg. 2018, 105, 1348.
Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
Abbas, H.W.; Sajid, Z.; Dao, U. Assessing the Impact of Risk Factors on Vaccination Uptake Policy Decisions Using a Bayesian Network (BN) Approach. Systems 2024, 12, 167.
Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the ICML’96: Thirteenth International Conference on International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; Volume 96, pp. 148–156.
Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
Dai, H.; Huang, G.; Zeng, H.; Yu, R. Haze risk assessment based on improved PCA-MEE and ISPO-LightGBM model. Systems 2022, 10, 263.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 31, 3149–3157.
Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
Zhang, H. Application of Cross-Validation in Model Comparison. Adv. Appl. Math. 2023, 12, 1866–1873.
Ke, P.; Deng, Z.; Zhu, B.; Zheng, B.; Wang, Y.; Boucher, O.; Arous, S.B.; Zhou, C.; Andrew, R.M.; Dou, X. Carbon Monitor Europe near-real-time daily CO2 emissions for 27 EU countries and the United Kingdom. Sci. Data 2023, 10, 374.
Mathieu, E.; Ritchie, H.; Ortiz-Ospina, E.; Roser, M.; Hasell, J.; Appel, C.; Rodés-Guirao, L. A global database of COVID-19 vaccinations. Nat. Hum. Behav. 2021, 5, 947–953.
Namboori, S. Forecasting Carbon Dioxide Emissions in the United States Using Machine Learning. Ph.D. Thesis, National College of Ireland, Dublin, Ireland, 2020.
Ounsaneha, W.; Laosee, O.; Rattanapan, C. Influence of Environmental Risk Exposure on the Determinants of COVID-19 Booster Vaccination in an Urban Thai Population. Int. J. Environ. Res. Public Health 2024, 21, 745.

Figure 1. Daily vaccination and carbon emissions. Daily vaccination data are logarithmically processed and CO₂ emissions are in Mt.

Figure 2. Model evaluation results. Red dots represent smaller MSE values, i.e., better predictive fit; blue dots indicate the opposite.

Figure 3. Analysis of SHAP values.

Table 1. Control variables.

Control Factor	Symbol	Explanation	Reference
Carbon market	C Market	EU carbon emissions futures listed on the Intercontinental Exchange in the US.	[4]
Brent Crude Oil Futures Price	Brent.	Standardized futures contract prices.	[13]
CAC 40 Market Index	CAC40	Compiled by the Paris Stock Exchange (PSE) from the share prices of its top 40 listed companies. It is the barometer of the French economy and is known as one of the three major stock indices in Europe, together with the UK FTSE 100 and the German DAX.	[15]
European Renewable Energy Total Return Index	ERETI	European New Energy Total Return Index, published as an index.
FTSE-100 Index	FTSE	A market-capitalization-weighted index of the 100 largest stocks traded on the London Stock Exchange; a barometer of the UK economy.
DAX30 Index	DAX	An average index of stock prices weighted by market capitalization, taking into account dividend income.
Stoxx 50 Index	STOXX50	Market-capitalization-weighted average price of all stocks in the Stoxx 50 Index.	[20]
IPE Natural Gas Closing Price	IPENG	Standardized futures contract prices.
Rotterdam Coal Futures	Rotterdam	Standardized futures contract prices.
US 3-month Treasury Bill Yields	T Bill	Published in the form of interest rates, representing the yield on 3-month U.S. government treasury bills.	[22]
Standard and Poor’s Composite Index	S&P	Market-capitalization-weighted average price of all stocks in the S&P 500.	[23]

Table 2. Descriptive statistics for independent variables.

Index	Meaning of the Indicator	Mean	Std	Min	Max	Skewness
TV	Total vaccinations	20.46561	0.341136	18.96301	20.67279	−2.37505597
PV	People vaccinated	19.56716	0.159744	18.63656	19.63876	−3.08147792
PFV	People fully vaccinated	19.4858	0.320796	17.69328	19.60842	−2.53125286
TB	Total boosters	18.0764	2.510475	11.32851	19.45811	−1.14011376
DVR	Daily vaccinations raw	12.04133	2.473965	5.902633	15.47559	−0.98224572
DV	Daily vaccinations	11.92405	2.455365	6.079933	15.25222	−0.97247736
TVPH	Total vaccinations per hundred	5.145694	0.341134	3.643097	5.3529	−2.37506788
PVPH	People vaccinated per hundred	4.247248	0.159743	3.316728	4.318821	−3.08160689
PFVPH	People fully vaccinated per hundred	4.165892	0.320792	2.373044	4.28854	−2.53134862
TBPH	Total boosters per hundred	2.756594	2.510231	−3.91202	4.138202	−1.13224469
DVPM	Daily vaccinations per million	5.807854	2.470602	0	9.142597	−0.99304321
DPV	Daily people vaccinated	9.618789	2.787705	3.044522	14.66705	0.50462919
DPVPH	Daily people vaccinated per hundred	−5.00686	2.123437	−6.90776	−0.65201	0.50407158

Table 3. Descriptive statistics for control variables.

Symbol	Mean	Std	Min	50%	Max	Skewness
CAC40	0.000267	0.011601	−0.05093	0.000804	0.068828	−0.0192
Brent.	0.000222	0.025132	−0.13312	0.002736	0.081564	−0.61355
IPENG	−0.00021	0.04721	−0.18066	0.002468	0.146575	−0.3157
ERETI	−0.00019	0.020246	−0.07688	−0.00015	0.100297	0.268231
FTSE	0.000137	0.008745	−0.03955	0.000731	0.038391	−0.51286
DAX	0.000148	0.011775	−0.04508	0.000523	0.076232	0.147707
S&P	0.000204	0.011798	−0.0442	0.000264	0.05568	0.023749
STOXX50	0.00019	0.012103	−0.05092	0.000829	0.071745	0.021431
Rotterdam	0.000571	0.046191	−0.53688	0.001175	0.326216	−2.21381
T Bill	0.004027	0.036872	−0.1747	0.003715	0.213574	0.3533
C Market	−0.00019	0.020246	−0.07688	−0.00015	0.100297	0.268231
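
The columns reported in Tables 2 and 3 can be computed for any numeric series. The following is a minimal sketch under stated assumptions: the source does not say which estimators were used, so the sample standard deviation (divisor n − 1) and the moment-based skewness m₃ / m₂^1.5 (divisor n) are choices made here for illustration, and the helper name `describe` and the `sample` values are hypothetical. Statistical packages often apply a small-sample correction to skewness, so their output can differ slightly.

```python
import math
import statistics

def describe(values):
    """Summary statistics in the layout of the descriptive tables."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments with divisor n (no small-sample correction).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "mean": mean,
        "std": statistics.stdev(values),   # sample standard deviation (divisor n - 1)
        "min": min(values),
        "50%": statistics.median(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else math.nan,
    }

# Illustrative values only, not taken from the study's data.
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
stats = describe(sample)
```

A negative skewness, as in most rows of Table 2, indicates a longer left tail; the single large value in `sample` produces a positive skewness instead.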

Table 4. Correlation analysis of variables.

Symbol	Correlation
Emission	1
TV	0.191088
PV	0.229879
DVR	0.263666
DV	0.271767
TVPH	0.191088
PFVPH	0.22786
TBPH	0.259859
DPV	−0.033269
Brent.	0.017134
IPENG	−0.05608
ERETI	0.032286
STOXX50	−0.003391
Rotterdam	0.041965
T Bill	0.021114
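
Correlations such as those in Table 4 can serve as a preliminary screen before model fitting. The sketch below computes the Pearson correlation of each candidate indicator with the target and keeps those whose absolute value reaches a cutoff. It is an illustration under assumptions, not the authors' selection rule: the function names, the cutoff of 0.1, and the toy data are hypothetical, and Table 4 itself retains several indicators with correlations close to zero.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0                      # a constant series carries no linear signal
    return cov / math.sqrt(vx * vy)

def screen_by_correlation(features, target, threshold=0.1):
    """Return all correlations and the names with |r| >= threshold (assumed cutoff)."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return scores, kept

# Toy data for illustration only; the symbols reuse names from Table 4.
emission = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "DV": [2.0, 4.0, 6.0, 8.0, 10.0],      # rises with the target
    "STOXX50": [1.0, -1.0, 0.0, -1.0, 1.0],  # uncorrelated with the target
    "DPV": [5.0, 4.0, 3.0, 2.0, 1.0],      # falls as the target rises
}
scores, kept = screen_by_correlation(features, emission)
```

Screening on the absolute value keeps strongly negative indicators as well as positive ones. Because tree-based models can exploit nonlinear relationships that a linear correlation misses, a cutoff of this kind is best treated as a coarse first pass.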

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
