1. Introduction
Coal remains the main fuel resource for the power sector in India, despite the use of natural-gas-based, solar, wind, and hydropower plants for power generation. Since coal is abundantly available in India, it is a reliable source of energy for the power and industrial sectors. With a population of more than 1.3 billion people, India requires a vast amount of power to meet demand from its domestic, agricultural, and industrial sectors. According to Udemba et al. (2021) [1], India’s power generation will grow the fastest because of rising demand, driven mainly by increased agricultural use and increased economic and industrial activities. Electricity generation, through an optimal mix of energy resources and transmission, plays a crucial role in the growth of emerging economies [2].
As per the CEA report, power generation using coal (and lignite) will increase by 23.7 percent to a level of 267 GW by 2029–2030 compared with that of 2021. This is despite the fact that Indian coal has a very high ash content (~50 percent), which reduces its heating value to a low level of 15 MJ/kg compared with the normal range of 21–33 MJ/kg. The coal sector meets more than half of the energy needs (~216 GW based on coal) of the country and employs approximately 0.5 million people. Thus, the coal sector is the most important and stable resource for power generation in India.
Coal-based power generation is a significant contributor to greenhouse gas (GHG) emissions, accounting for approximately 30% of global GHG emissions in 2018 [3]. Therefore, a balanced view must be taken between coal-based electricity generation and GHG emissions to meet the country’s commitment to reducing GHG emissions by 30% from the 2015 level. The increased demand for coal must be offset by a much higher share of renewable sources for electricity generation.
CO2 emissions and economic activity have a causal relationship [4]. The Kaya identity defines carbon emissions from anthropogenic activities as a function of population, GDP, and energy consumption [4]. Policymakers use this causal relationship to assess the intensity of emissions in the short and long terms [5,6,7].
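For reference, the Kaya identity is commonly written in the textbook form below. The symbols are assumptions introduced here for illustration and are not taken from this paper: P is population, GDP is gross domestic product, and E is primary energy consumption.

```latex
\mathrm{CO_2}
  = P \times \frac{\mathrm{GDP}}{P}
      \times \frac{E}{\mathrm{GDP}}
      \times \frac{\mathrm{CO_2}}{E}
```

Read from left to right, the factors are population, GDP per capita, the energy intensity of output, and the carbon intensity of energy. The product collapses to total emissions because the intermediate terms cancel.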
Most developed nations are adopting policies to reduce their carbon footprint, even at the cost of economic development [8]. However, the situation is different in a developing country like India, where GDP growth is a critical factor for formulating growth policies. This is evident from the fact that India’s per capita GDP in 2021 was only USD 2204, compared with USD 89,301 for the USA and USD 38,237 for EU countries. This large gap forces India to formulate policies that prioritize double-digit growth in GDP with high electricity production and consumption.
India is the second-largest coal producer globally, with coal deposits mainly found in the eastern and southern central parts of the country, serving as the backbone of the Indian industrial growth story. Coal provides 53% of India’s energy needs and is a vital revenue generator for the government. It is also the primary contributor to industrial employment and a crucial source of freight revenue for Indian railways, accounting for 44% of the total freight revenue [9,10]. With a demonstrated reserve of 319.02 billion tonnes of coal in 2018, it is the most abundant natural resource, generating 500,000 jobs.
At the end of 2021, the total installed capacity for electricity generation in India was 459.15 GW, with 25.5% owned by the central government, 27.1% by the state government, and 47.4% by the private sector [11], as shown in Figure 1. Of the total electricity produced, coal contributed 53% as fuel, while lignite, gas, and diesel contributed 8.3%; hydropower 12.2%; nuclear 1.8%; and renewable energy sources 24.8%, as shown in Figure 2. India has power plants with a total capacity of 78,000 MW that are mainly owned, operated, and used by industries. In 2020–2021, the electricity supply data showed a requirement of 12,75,534 million units (MUs) with an availability deficit of 4.87 MU [12]. However, the aggregate transmission and commercial losses were 21.35%, which is very high compared with the USA, which had only 6.6% in 2018.
Electricity consumption is a dominant factor in economic growth. Previous research established that electricity availability to rural farmers was the most critical factor in India’s agricultural growth and green revolution, surpassing other factors such as the improved quality of fertilizers and farm automation equipment [13,14]. Similarly, most industrial activities, such as metal processing, cement, automobile and ancillary production, metalworking, crude extraction, and refining, are highly dependent on electricity consumption as a source of power. As depicted in Figure 3, the industrial sector consumed 42.7% of the electricity generated, followed by the domestic and agriculture sectors in 2020. India’s annual GDP growth was 8.26% in 2016 but decreased to 4.18% in 2019 due to pandemic disruptions. Future GDP growth will be driven by the industrial sector, thereby increasing electricity consumption (Cosmas et al., 2019) [15].
Several researchers [16,17,18,19,20,21,22] established a positive correlation between an uninterrupted, high-quality electricity supply and an increase in the contribution of the industrial sector to national GDP growth. Based on this argument, we can conclude that electricity availability is the most crucial factor for India’s economic development and growth. With the stiff target of double-digit growth in the near future, the demand for electricity will continue to increase, resulting in a similar growth in coal consumption. Therefore, there is a need to model the growth of electricity and coal along with the industrial output for the future and implement policies for green growth in the coal sector that are specific to India’s context.
Although it is an older concept, green marketing gained prominence at the Rio+20 conference in Brazil in 2012, where path-breaking guidelines were formulated on green economic policies [23]. The conference’s outcome document clearly emphasized the need for a green economy and green economic growth. Green growth is a set of measures that promote economic growth and development while ensuring that nature continues to provide the resources and environmental services on which our health and overall happiness depend. Green growth aims to accelerate investment and innovation that support sustainable development and create new economic opportunities. Furthermore, green strategies must lead to pro-environmental behavior from all stakeholders in the entire chain of economic activity, not limited to the reallocation of capital, labor, location, land, and technology. These strategies can lead to a greener outcome toward the development of green innovations. Green marketing strategies can demonstrate the competence of organizations and become the critical driver for influencing the entire marketing process, thereby bringing vital revenue and contributing toward making the Earth a sustainable system.
This study explored the causal relationships between coal, electricity, and industrial activity and developed policy recommendations to green the Indian coal sector. The novelty of this study lay in the use of a mix of modeling approaches, such as linear cointegration, non-linear cointegration, and ARIMA models. We assessed the forecasting performance of these models using in-sample training and testing to identify the best-performing models for predicting the values that coal, electricity, and industrial activity may take in the forecasted period. The set of modeling techniques applied in this study carries specific advantages, such as ruling out spurious relationships, capturing non-linear dynamics due to structural breaks, considering the impact of asymmetric variations, and ensuring robustness and consistency of interpretation. To the best of our knowledge, this is the first attempt of its kind. Additionally, this study considered the economic disruption caused by the COVID-19 pandemic and realigned the forecast of the economic indicators accordingly.
2. Literature Review
A large body of literature is available on the empirical analysis of energy, electricity, coal consumption, growth of economic activities, and greenhouse gas emissions. Since the index of industrial production (IIP) correlates better with electricity consumption in India than GDP, we used it in our study. As we mainly examined the Granger causality association between coal production, electricity generation, and the index of industrial production in the present study, we focused on studies mainly related to the Indian context. Some recent studies that used the VECM are listed in Table 1.
Previous regional studies confirmed GDP growth as the main source of carbon dioxide or sulfur dioxide emissions [31,32,33]. Grossman and Krueger (1995) employed a panel dataset of different countries and established that amplified domestic production has resulted in environmental degradation. Using a nonparametric approach to panel data, the authors of [34] confirmed a positive and non-linear relationship between GDP and carbon emissions. However, they rejected a polynomial relationship between the two variables. In other words, this study did not find an environmental Kuznets curve (EKC) for the selected country. Conversely, Narayan and Narayan (2010) [35] studied the panel data of 43 developing countries and established an EKC in the long term.
In [36], researchers analyzed a panel dataset comprising eight countries and investigated the relationship between GDP, energy use, trade expansion, population, and environmental quality. They discovered that the environmental Kuznets curve (EKC) existed in two countries; however, in the long term, the association was of the inverted N-type for the other six countries. In a similar vein, de Souza et al. (2018) [37] utilized panel data from five countries to explore the impact of renewable energy on pollution levels. They posited that the use of renewable energy sources mitigates pollution, whereas the use of non-renewable energy contributes to environmental degradation. Furthermore, Mert et al. (2019) [38] used the autoregressive distributed lag (ARDL) technique to analyze data from 26 countries. They established the existence of the EKC in five countries and found that implementing pollution mitigation legislation significantly improved the environmental quality in these countries.
Mert et al. (2019) [38] used the Dumitrescu–Hurlin methodology with panel data from European countries and found that trade expansion and the use of green energy led to a decrease in the intensity of pollution in those countries. Therefore, they advocated for the industrial sector to use green energy. Balogh (2017) [39] employed the generalized method of moments (GMM) approach with GDP, FDI, tourism, agriculture, and trade expansion as variables to calculate the CO2 emissions in 168 countries. They found that the use of green and nuclear energy helped to mitigate pollution in the selected countries. These conclusions were also confirmed by Shahbaz et al. (2019) [40] in their panel data study. Sharma et al. (2020) [41] established an N-type association between GDP and CO2 emissions. Based on this argument, we decided to use the index of industrial production, electricity generation, and coal production as research variables in the present study.
4. Methodology
4.1. Johansen Cointegration Test
Cointegration between two or more non-stationary series indicates a systematic co-movement between them over the long run. Engle and Granger (1987) [42] demonstrated that cointegration between two or more I(1) series might indicate (a) the absence of a spurious correlation, (b) a causal relationship in at least one direction, and (c) long-run Granger causality of cointegrating vectors from a vector error correction model. We applied the Johansen cointegration test [43], which is considered one of the best linear cointegration techniques, to analyze the cointegration relationships between CoP, IIPG, and ELG. The Johansen cointegration test, which was suggested by Johansen (1988) [43], is equation-based, unlike residual-based cointegration techniques such as the one proposed by Engle and Granger (1987) [42]. We used VAR lag order selection criteria based on the sequential modified LR test statistic, the final prediction error, and the Akaike information criterion to reduce the bias and increase the accuracy of the cointegration tests. The resulting optimal lag length is the maximum lag interval for the differenced endogenous variables in the Johansen test.
The vector error correction model (VECM), rather than the VAR/Granger causality model, should be used for causality analysis of the variables. The VECM can examine both short- and long-run causality. It captures the effect on the dependent variable of changes in the error correction term and of the lagged differences of the independent variables. The VECM can be expressed as shown in Equations (A1)–(A3) in Appendix A.
Here, the β’s are the coefficients to be estimated, p is the optimal lag, εt−1 is the error correction term (ECT), and the ut’s are serially uncorrelated error terms. The t-statistics of the lagged ECT coefficients are used to examine long-run causality from the independent to the dependent variables. Similarly, the F-statistics of the lagged independent variables may be used to examine short-run causality in the ECM. If the null hypothesis that the coefficients on the lagged IIPG terms are jointly zero is rejected in Equation (A1), it signifies short-run Granger causality from IIPG to CoP. The coefficient of the ECT shows the speed of adjustment from the perturbed state to the equilibrium state. If the null hypothesis that the ECT coefficient is zero is rejected in Equation (A1), this establishes the existence of long-run causality from one or more independent variables to CoP.
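As a reading aid, a generic error correction equation for CoP consistent with the description above can be sketched as follows. This is not a reproduction of Equation (A1): the index layout of the β coefficients and the symbol λ for the ECT coefficient are assumptions introduced here.

```latex
\Delta CoP_t = \beta_{10}
  + \sum_{i=1}^{p} \beta_{11,i}\, \Delta CoP_{t-i}
  + \sum_{i=1}^{p} \beta_{12,i}\, \Delta IIPG_{t-i}
  + \sum_{i=1}^{p} \beta_{13,i}\, \Delta ELG_{t-i}
  + \lambda_1\, \varepsilon_{t-1} + u_{1t}

% Short-run Granger non-causality from IIPG to CoP (joint F-test):
H_0:\ \beta_{12,1} = \beta_{12,2} = \dots = \beta_{12,p} = 0

% No long-run causality toward CoP (t-test on the ECT coefficient):
H_0:\ \lambda_1 = 0
```

Under this sketch, a negative and significant ECT coefficient indicates that deviations from the long-run equilibrium are corrected over time, and its magnitude is the speed of adjustment.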
4.2. Regime Shift Cointegration Model
This and similar cointegration methodologies proposed by Johansen (1988) [43] and others have been criticized for unrealistically assuming a constant cointegrating relationship between variables over the entire data span [44]. This overgeneralized assumption becomes especially problematic when the study period is long. In the case of one or more structural breaks, the above cointegration tests may produce misleading results.
We applied the cointegration test with two endogenously determined regime shifts suggested by Hatemi-j (2008) [45], known as the Hatemi-J model. The model introduces two endogenous regime shifts through both level (intercept) and slope dummies. The Hatemi-J model is given in Equation (A3).
Here, α0 is the common intercept, while α1 and α2 are the intercept dummies reflecting the first and second regime shifts’ differential repercussions over α0, respectively. β01 is the base slope, while β11 and β21 are the first and second regime shifts’ differential slope coefficients of IIPG. Similarly, β02, β12, and β22 can be defined with respect to ELG. The error term in the above Hatemi-J model is represented by εt. The endogenous regime shifts were incorporated using two dummy variables D1t and D2t, as shown below.
T1 and T2 indicate the relative timings of the two regime shifts and can have fractional values in the range (0, 1).
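The construction of the two regime-shift dummies can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual step-dummy convention (the dummy is 0 up to the break observation and 1 afterward, with the break located at the integer part of n times the break fraction); the function name, argument names, and sample values are hypothetical.

```python
import math


def regime_dummies(n, t1_frac, t2_frac):
    """Build step dummies D1t and D2t for a sample of n observations.

    t1_frac and t2_frac are the relative break timings in (0, 1).
    Assumed convention: D_it = 1 if t > floor(n * fraction), else 0,
    with observations indexed t = 1, ..., n.
    """
    if not 0.0 < t1_frac < t2_frac < 1.0:
        raise ValueError("require 0 < t1_frac < t2_frac < 1")
    break1 = math.floor(n * t1_frac)
    break2 = math.floor(n * t2_frac)
    d1 = [1 if t > break1 else 0 for t in range(1, n + 1)]
    d2 = [1 if t > break2 else 0 for t in range(1, n + 1)]
    return d1, d2


# Hypothetical sample of 8 observations with breaks at 25% and 75%.
d1, d2 = regime_dummies(8, 0.25, 0.75)
print(d1)
print(d2)
```

In an endogenous-break test, these dummies would be rebuilt for every admissible pair of break fractions, and the pair that yields the smallest test statistic identifies the regime shift dates.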
The Hatemi-J model uses modified ADF*, Zα*, and Zt* tests while examining the cointegration relationships in endogenous regime shifts to avoid misspecification errors in the residual-based cointegration approach.
4.3. Non-Linear Auto-Regressive Distributed Lag (NARDL) Model of Cointegration
The NARDL model was suggested by [46]. It enables the simultaneous estimation of long-run and short-run asymmetric nonlinearities. The NARDL framework for CoP, IIPG, and ELG is given below.
Here, Δ signifies the first difference operator. The long-run coefficients are represented by $\omega_{1Y}$, $\omega_{2Y}$, and $\omega_{3Y}$, while the short-run coefficients are represented by $\alpha_{1Y}$, $\alpha_{2Y}$, and $\alpha_{3Y}$. $IIPG_t^{+}$ and $IIPG_t^{-}$ (say) reflect the positive and negative changes in the partial sum of $IIPG_t$. Similarly, other partial sums can be interpreted. The null hypothesis (no asymmetric cointegration) is examined using F-statistics. For instance, for Equation (A10), $H_0$ is $\omega_{1CoP} = \omega^{+}_{2CoP} = \omega^{-}_{2CoP} = \omega^{+}_{3CoP} = \omega^{-}_{3CoP} = 0$. The long- and short-run symmetries can be examined by applying the standard Wald test [46]. The long-run symmetry null hypothesis is $\delta^{+} = \delta^{-}$, where $\delta^{+} = -\omega^{+}_{jY}/\omega_{1Y}$ and $\delta^{-} = -\omega^{-}_{jY}/\omega_{1Y}$, where j = 1, 2, … Similarly, the short-run symmetry null hypothesis is that the corresponding positive and negative short-run coefficients are equal, where j = 1, 2, … for Equation (A10).
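The positive and negative partial sums described above can be illustrated with a short sketch. It assumes the standard decomposition in which the positive (negative) partial sum accumulates only the increases (decreases) in the first differences, starting from zero; the function name and the sample IIPG values are hypothetical.

```python
def partial_sums(series):
    """Return (positive, negative) partial sums of the first differences.

    positive[t] accumulates max(change, 0) and negative[t] accumulates
    min(change, 0), both starting at 0 for the first observation.
    """
    positive, negative = [0.0], [0.0]
    for previous, current in zip(series, series[1:]):
        change = current - previous
        positive.append(positive[-1] + max(change, 0.0))
        negative.append(negative[-1] + min(change, 0.0))
    return positive, negative


# Hypothetical index values, for illustration only.
iipg = [100.0, 102.0, 101.0, 104.0, 103.0]
iipg_pos, iipg_neg = partial_sums(iipg)
print(iipg_pos)
print(iipg_neg)
```

By construction, the original series equals its first value plus the two partial sums at every date, so the decomposition loses no information; it only lets increases and decreases carry different coefficients.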
4.4. General ARIMA Model
In this study, the ARIMA technique was used for the univariate modeling of CoP, IIPG, and ELG for their possible use in out-of-sample forecasting. The general seasonal ARIMA model can adequately explain the seasonal changes and trend effects observed in practice in the time series [47].
In this model, L is the backward shift operator and d is the order of differencing needed to make Xt stationary; the model also contains a moving average parameter and a fixed seasonal autoregressive parameter. Xt may be expressed as ARIMA (p, d, q) if the stationary series obtained after differencing Xt d times can be expressed as ARMA (p, q).
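For orientation, the multiplicative seasonal ARIMA model is often written in the standard textbook form below. The notation is an assumption and may differ from the paper's own equation: s is the seasonal period, D is the seasonal differencing order, and the φ, Φ, θ, and Θ polynomials are the non-seasonal AR, seasonal AR, non-seasonal MA, and seasonal MA lag polynomials of orders p, P, q, and Q, respectively.

```latex
\phi_p(L)\, \Phi_P(L^{s})\, (1 - L)^{d}\, (1 - L^{s})^{D}\, X_t
  = \theta_q(L)\, \Theta_Q(L^{s})\, \varepsilon_t
```

With monthly data such as those used here, s = 12 would be the natural seasonal period. Setting P = Q = D = 0 recovers the non-seasonal ARIMA (p, d, q) model.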
6. Discussion
This study aimed to (i) examine the causal relationships between the monthly data of coal production (CoP), a general index of industrial production (IIPG), and electricity generation (ELG) from April 1999 to December 2020; (ii) verify whether the COVID-19 pandemic impacted the observed historical trends in the research variables and their interrelatedness; and (iii) forecast the three research variables from January 2021 to December 2025. To decode the dynamics of CoP, IIPG, and ELG, a mix of linear and non-linear cointegration models was used. Next, the forecasting performances of the three cointegration models and the autoregressive integrated moving average (ARIMA) model were compared based on in-sample training data points to identify the best forecasting model candidate for each research variable. Finally, the respective best forecasting model was applied to forecast the 60 data points for each variable.
Initially, we applied the linear cointegration model to the research variables, as they possessed I(1) characteristics. The linear cointegration technique revealed bidirectional short-run causality between CoP and ELG, indicating the continued impact of coal production on electricity generation in India. A sudden reduction in coal production due to environmental concerns may have adverse effects on electricity generation or increase dependence on coal imports. Moreover, the reverse causality from ELG to CoP suggested that an increase in electricity demand may call for more coal mining activities. Alternatively, policymakers may need to shift toward renewable energy sources. The linear cointegration technique established the existence of two long-run co-movement causalities from CoP and ELG to IIPG with moderate speeds of adjustment (−0.14 and −0.21, respectively). Additionally, significant unidirectional causalities were detected from CoP to IIPG and ELG to IIPG. Therefore, there existed causality from CoP to IIPG and ELG to IIPG both in the long run and short run. These findings align with previous studies [50,51] and carry substantial policy implications, as policymakers need to synergize the connectedness between CoP and ELG to accelerate IIPG.
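One way to read the reported speeds of adjustment is to convert them into a half-life, that is, the number of months needed to close half of a deviation from the long-run equilibrium. The sketch below assumes a simple first-order adjustment in which a fraction equal to the absolute ECT coefficient is corrected each month; this interpretation and the function name are illustrative assumptions, not part of the study's method.

```python
import math


def adjustment_half_life(ect_coefficient):
    """Months needed to correct half of a disequilibrium.

    Assumes the gap shrinks by the factor (1 + ect_coefficient) each
    period, which requires -1 < ect_coefficient < 0.
    """
    if not -1.0 < ect_coefficient < 0.0:
        raise ValueError("ECT coefficient must lie strictly between -1 and 0")
    return math.log(0.5) / math.log(1.0 + ect_coefficient)


# Speeds of adjustment reported for the two long-run relationships.
for speed in (-0.14, -0.21):
    print(speed, round(adjustment_half_life(speed), 2))
```

Under this assumption, coefficients of −0.14 and −0.21 correspond to half-lives of roughly 4.6 and 2.9 months, which is in keeping with the description of the adjustment as moderate.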
The cointegration tests suggested by Hatemi-j (2008) [45] established the presence of two endogenously determined regime shift cointegration relationships in each of the three models with CoP, IIPG, and ELG as dependent variables. October 2011 was identified as the regime shift location by the two models with CoP and IIPG as dependent variables. The monthly plots of IIPG displayed a local decline in growth trend in 2011, which may be attributed to the overall fall in GDP to 5.3% compared with 8.5% in 2010. This decline in GDP growth was caused by prolonged policy paralysis (coalition government politics), lack of ease of business environment (including delayed tax reforms), reduced domestic demand due to increasing inflation, weakened currency, and reduced external demand due to the US and Euro crises, which was consistent with the findings of Sen and Sen (2019) [52]. It severely impacted the industrial sector’s performance. The CoP plot also showed a local dip in the growth trend in 2011, which was accurately captured as a regime shift location by the CoP regime shift cointegration model.
The government’s corrective actions resulted in the gradual recovery of the Indian economy. After 2011, the GDP growth gradually increased and peaked at 8.26% in 2016. The new government formed in 2014 successfully created a positive atmosphere through a series of infrastructure development initiatives and policy reforms. This was reflected in IIPG, which also showed gradual improvement until 2015 before rapidly improving in 2016. The regime shift location of April 2014 captured by the ELG model was consistent with the ELG plot, as marked by a shift to a higher growth pedestal and a change in seasonal characteristics. This increased shift in ELG was observed around 2019 before declining due to the overall slowdown of the Indian economy in 2019 and the COVID-19 pandemic’s effect in 2020. The CoP plot also showed a local increase in its growth trend in 2015, which continued until 2017 before experiencing a fall.
In line with the findings of linear cointegration, the regime shift model of IIPG (as a dependent variable) established a causal relationship from ELG and CoP to IIPG [29,53]. However, this regime shift model revealed additional insights into how the relationship from ELG to IIPG was elastic until October 2011, changed to an inelastic nature from November 2011 to September 2015, and then partially regained elasticity afterward. IIPG’s increasing trend was moderate from 2011 to 2015; however, ELG continued with its regular growth trend. Thus, the elasticity of CoP reduced in this period. The elasticity of the causal relationship from CoP to IIPG did not vary significantly across the two regime shifts. For instance, a 1% increase in CoP resulted in a 0.23% increase in IIPG.
The NARDL techniques revealed that CoP and ELG, as dependent variables, possessed asymmetric cointegration relationships with their respective independent variables. The short-run asymmetric causality findings from Wald tests aligned with the findings from coefficient estimation analysis of CoP and ELG asymmetric cointegration models. The long-run asymmetric relationship analysis in the CoP (as a dependent variable) model indicated that a 1% increase (decrease) in IIPG resulted in a 0.40% (0.35%) increase (decrease) in CoP. This has important policy implications as it suggests that positive and negative changes in IIPG will have a different magnitude of impact on CoP. The short-run asymmetric effects analysis revealed that a 1% increase (decrease) in IIPG decreased (increased) CoP by 0.83% (0.83%), while a 1% increase (decrease) in ELG decreased (decreased) CoP by 0.37% (2.11%). This analysis suggested the combined usage of short-run and long-run asymmetric variations in IIPG and ELG to evaluate their impact on coal production (CoP variable) planning.
The ELG NARDL model showed asymmetric long-run causality from IIPG+ to ELG, wherein a 1% increase in IIPG required a 0.42% increase in electricity generation, which was consistent with previous studies [8,15,27]. This provides an important indicator to strategize electricity generation to meet the increase in the index of industrial production. A 1% short-run increase (decrease) in CoP increased (decreased) ELG by 2.16% (2.70%), while a 1% increase (decrease) in IIPG decreased (decreased) ELG by 0.05% (0.79%). This analysis suggested the combined usage of short-run and long-run asymmetric variations in IIPG and CoP to evaluate their impact on electricity generation (ELG variable) planning.
ARIMA, which is a popular univariate model, and the three multivariate cointegration models were applied to the training data series (April 1999 to December 2017) to generate respective forecasting models. Based on the post-facto forecasting performance of these models on the testing data (January 2018 to December 2019), the Johansen cointegration model emerged as the best forecasting model for CoP and IIPG, while ARIMA was the best-suited model for forecasting ELG, which was consistent with the results found by Dua et al. (2023) and Telarico (2023) [54,55]. The forecasting performance of the four models for each variable during the COVID-19 period (January 2020 to December 2020) showed a distinct decline compared with their respective test data period performance, thereby establishing the socio-economic disruption effect in the three research variables. The monthly in-sample average values of the three research variables (2015 to 2020) and their forecasted monthly average values (2021 to 2025) are presented in Table 14. Based on the above estimates, India needs to plan to produce 67.84 million tonnes of coal and generate 150,679 GWh of electricity to achieve a general index of industrial production of 156.32, compared with its present value of 122.22.
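The model selection step described above, in which each candidate is fitted on the training window and ranked on the held-out test window, can be sketched as follows. The choice of root mean squared error as the ranking metric is an assumption made for illustration, since this section does not name the accuracy measure used, and the model names and numbers are hypothetical placeholders rather than results from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rank_models(actual, forecasts):
    """Score every candidate forecast on the test window; lowest RMSE wins."""
    scores = {name: rmse(actual, pred) for name, pred in forecasts.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical test-window values and forecasts from two candidate models.
test_actual = [100.0, 110.0, 120.0]
candidate_forecasts = {
    "model_a": [101.0, 109.0, 121.0],
    "model_b": [95.0, 115.0, 125.0],
}
best_name, rmse_scores = rank_models(test_actual, candidate_forecasts)
print(best_name, rmse_scores)
```

Repeating the same scoring on a later window, such as the January 2020 to December 2020 period, would show whether forecast errors rose relative to the test period, which is the kind of comparison used above to identify the pandemic disruption.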