1. Introduction
The pandemic has had a significant impact on the energy sector. The outbreak dampened the demand for oil, resulting in plummeting oil prices and production declines. The COVID-19 pandemic significantly disrupted living standards around the world and left behind entirely new commercial patterns and business ideas that heavily affected electrical consumption [1,2]. In 2020, to reduce the number of infections, governments in many countries imposed lockdowns and strict restrictions on their inhabitants, closing educational centers and leisure businesses and preventing people from leaving their homes except in emergencies [2]. Only essential workers in the health system and other crucial sectors were allowed to commute. These decisions strongly affected people's lifestyles and, together with the contraction in industrial activity, eventually led to a significant reduction in greenhouse gas emissions and energy demand [3,4]. In the U.S., states such as Florida and California reported changes in the seasonal energy consumption pattern during the lockdowns. For instance, California reported a reduction of up to 12%, while for Florida the changes did not imply a reduction in all cases [5].
In Europe, studies showed that COVID-19 affected the level of weekly electricity demand and that consumption patterns remained permanently modified afterwards (Werth et al., 2020) [6]. Latin America and the Caribbean experienced similar conditions, and most countries displayed a decrease, to a greater or lesser extent, during 2020 compared to non-COVID scenarios; Bolivia and Peru showed shifts of about 30% [7]. Unsurprisingly, small businesses dropped by 22% in the first months of 2020, with over 3.3 million stores inactive in the U.S. alone. These closures, together with the accelerated shift to a digital economy, have left the energy industry in a situation of high uncertainty [8]. The oil and gas sector was one of the most affected by COVID-19; as economic activity decelerated across the globe, demand for fossil fuels and their derivatives plunged [9]. As a result, the analysis of oil and gas prices and consumption became critical for investors, companies, and governments looking for solutions to the imminent energy crisis. This article investigates the changes in petroleum fuel consumption for power generation and in the price of petroleum fuel caused by the COVID-19 pandemic.
An accurate forecast is essential for energy management systems, allowing them to maintain a reliable source of power for industry and households, even during disruptive events such as the COVID-19 pandemic or new outbreaks like monkeypox. Forecasting models are key in power system operation and energy demand planning [10]. However, obtaining accurate predictions from those models is challenging, because the data often show unpredictable and fluctuating behavior rather than clear patterns. Human activity also plays a complex role, making accurate predictions difficult. Since COVID-19 appeared, many studies have analyzed the effect of the pandemic on renewable energy sources, fossil fuels, energy consumption, and human behavior [5]. Nevertheless, most of them have ignored several factors that also affected energy demand and oil prices, leading to biased results.
The oil and gas sector is one of the most influential industries globally, directly affecting society's patterns of consumption and, thus, behavior. The oil price influences the costs of other production and manufacturing. Having a clear idea of the behavior of oil and gas prices is vital for governments, companies, and investors. However, the price is affected by many other factors, such as political instability in petroleum-producing countries, economic recessions, and energy demand. During COVID-19, as the economy slowed, oil prices fell below zero for the first time in history, a collapse caused by the drop in demand and an unexpected increase in supply (OECD 2020). Subsequently, this affected the price of refined petroleum products and other downstream items, notably gasoline. As economies reopened, the initial price downturn gave way to reduced oil production and some renewed demand. As a result, prices for oil products partially recovered [9].
In the present study, data were taken from the U.S. Energy Information Administration (EIA) for seven years, from 2016 to 2022, to analyze the petroleum fuel consumption for power generation and fossil fuel prices before, during, and after the pandemic. The data cover the entire United States; the whole country was considered to avoid the uncertainty and biases of fuel consumption and price. Annual population growth, weather, and seasonal factors were taken into account when investigating the impact of the COVID-19 pandemic. The forecasting has been performed using the benchmark methods together with the STL, ETS, and ARIMA methods, in order to compare multiple methods and pick the one that best predicts the general pattern of change in price and consumption, irrespective of seasonal and pandemic impacts.
Later, to isolate only the pandemic impact, the ARIMAX model has been used after eliminating all seasonal effects. ARIMA models forecast time-series data based solely on its own past values, capturing the autoregressive (AR) and moving average (MA) components and handling non-stationarity through differencing [11,12]. Without considering any external inputs, ARIMA models presume that the underlying data are generated from a combination of their own historical values. In contrast, ARIMAX models extend ARIMA by including exogenous variables (X), which are outside elements that could affect the time series [12,13]. Exogenous variables allow ARIMAX models to capture the influence of outside variables on the time-series behavior, in addition to the autocorrelation and moving average features of the data. ARIMAX models are particularly suitable for predicting irregular behavior because they can account for the influence of external factors that may contribute to the irregularity or unpredictability in the time series [14]. These external factors could include seasonality, weather conditions, economic indicators, or other relevant variables that may affect the time-series behavior. By incorporating these exogenous variables, ARIMAX models can better capture and explain irregular patterns, leading to more accurate predictions [14]. Therefore, several exogenous variables on which petroleum fuel consumption is likely to depend, such as temperature, price, and mileage traveled, have been incorporated. In this article, the ARIMAX model is intended to capture the stochastic and non-smooth effect of the pandemic on petroleum fuel consumption and price.
Plenty of research has been conducted to understand the impact of COVID-19 on different sectors, including the fluctuation of energy demand [15]. However, none of the work so far has investigated the impact of the pandemic alone on the forecasting of fuel consumption. The volatility of fuel prices due to COVID-19 has also been studied in the previous literature, but there is hardly any explicit research on how fuel prices are affected by the pandemic alone, with the COVID-19 impact isolated. In addition, the correlation with other exogenous variables apart from seasonal influence, such as the mileage traveled and the average temperature, was not studied in the earlier literature to provide an accurate forecast of fuel price and consumption for any future pandemic period. The added value and essential novelty of this paper lie in connecting the demand analysis with the forecasting of consumption and cost during the COVID-19 pandemic, considering the new demand and the behavioral and cultural changes. The paper also examines the impact of the COVID-19 pandemic as an exogenous variable on the forecast model performance, which will help to predict anomalies if a similar pandemic appears in the near future. Moreover, it helps to differentiate the regular forecasting of fuel price and consumption from the pandemic's impact, which can be used by the designated authority for energy planning and distribution under different scenarios. To the best of our knowledge, this is the first study that has developed an ARIMAX model to predict oil price and consumption during the pandemic, has used it to study the effect of the pandemic on fuel price and demand, and has compared different scenarios for the pandemic.
2. Fuel Consumption Pattern Analysis
In the United States and around the globe, governments implemented travel restrictions to mitigate the coronavirus outbreak, and the resulting economic slowdown caused petroleum product consumption to drop to its lowest level in more than 30 years. In the first quarter of 2020, total U.S. petroleum demand averaged 14.1 million barrels per day (b/d), which was 31% lower than in the same period in 2019 [9]. This changed the whole product supply chain, mainly for motor gasoline, distillate fuel oil, jet fuel, and chemical feedstocks.
Based on a machine-learning model, Ou et al. (2020) included pandemic scenarios and trip activities to forecast future U.S. fuel demand, showing a decrease of 22% in gasoline consumption in comparison with non-COVID schemes [16]. Güngör et al. (2020) analyzed the effect of COVID-19 on Turkish gas consumption from 2014 to 2020; their results displayed variations of up to 30% between the periods before and after the global event [17]. Tian et al. (2021) studied the impact of COVID-19 on urban transportation in Canada and found a reduction of at least 60% in the main cities, which aligns with the reported reduction in petroleum product consumption; in the case of diesel, the drop reached up to 49.8% in May 2020 [18]. Smith et al. (2021) projected that fossil fuel consumption would reach the pre-crisis level in 2023 [3]. The COVID-19 pandemic will continue to affect petroleum product demand and, thus, many aspects of human behavior. This study analyzes seasonality and autocorrelation to distinguish the seasonal and non-seasonal impact on fuel consumption. The forecasting model has been selected according to the nature of the data.
Figure 1 exhibits the seasonality of petroleum fuel consumption in the United States, showing the peak in oil barrels consumed at the beginning of the year; then, it decreases and increases again during summer. Clearly, 2020 and 2018 are the outliers in this analysis.
2.1. Seasonality Analysis of Petroleum Fuel Consumption
The seasonality of U.S. petroleum fuel consumption for five years is explored in this segment. Generally, during non-pandemic years, gasoline and petroleum product consumption exhibit seasonal patterns, principally increasing in the summer season and dropping in the winter, due to the increased human activity and air conditioning usage to avoid the high temperatures. Other peaks in consumption are due to holidays such as Thanksgiving. The measures taken by many countries to reduce the impact of the Coronavirus, such as closing schools and industries, implementing remote work, and enforcing lockdowns, greatly affected the usual patterns of oil demand. This impact was particularly notable in petroleum products like gasoline and diesel. These dramatic changes in human lifestyle immensely influenced the environment and the oil supply chain. Most countries tried to reduce oil and gas consumption in favor of renewables, increasing the crisis in the sector.
Figure 2a displays the monthly petroleum fuel consumption from 2016 to 2022 in the U.S. It shows how the energy demand in 2020 became the lowest in the data set, with clear lows in April and September during the lockdowns.
From the seasonal analysis of monthly fuel consumption in Figure 2b, it is clear that consumption was higher at the beginning of every year (January). It then dropped drastically until the summertime, at which point it increased again. Once again, the impact of COVID-19 is clear, with 2020 showing the lowest values in every month; the monthly seasonality in this time series has also been disrupted for the reasons mentioned previously.
2.2. Autocorrelation Analysis of Fuel Consumption
This section aims to find the time correlation in the fuel consumption time series and then remove it to show only the demand reduction related to the pandemic. To identify any correlation within the fuel consumption time series, the partial autocorrelation function (PACF) was calculated, as shown in Figure 3.
The PACF plot shows the correlation between the fuel consumption time series at time t (L_t) and its lagged values for up to a specific number of lags. The PACF can be mathematically described as in Equation (1) [19,20]. The PACF analysis aims to find the relationship between two points of the time series without considering the effect of the time points (lags) in between. In contrast, the autocorrelation function (ACF) is used to find the correlation between the consumption time series and its lagged values for different lags (seasonal or calendar patterns). However, the previous consumption analysis shows a lack of seasonality in 2020 during the COVID-19 pandemic compared to previous years. Therefore, the PACF is used in this section to find notable demand trends that are not seasonal. The PACF, as shown in Figure 3b, presents the correlation between the fuel consumption time series and its lagged points, at lag k, after removing the effect of all time series points (1, 2, …, k − 1) between them [19,20].
The PACF plot does not show any significant values occurring or repeating, indicating the lack of seasonality in fuel consumption for power generation. In Figure 3, the ACF and PACF are used to detect the seasonality of the data set. However, because of the COVID-19 pandemic, the data do not follow a seasonal pattern. The significant spike at lag 1 in the ACF suggests a non-seasonal MA(1) component, and the PACF shows no pattern.
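As an illustration of how the ACF and PACF discussed above can be computed, the following is a minimal Python sketch using the sample autocorrelation and the Durbin-Levinson recursion. It is not the code used in this study; the function names and the toy series are assumptions made for illustration only.

```python
def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Sample partial autocorrelation for lags 0..max_lag (Durbin-Levinson)."""
    rho = acf(series, max_lag)
    values = [1.0]          # PACF at lag 0 is 1 by convention
    phi_prev = []           # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append the new one.
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        values.append(phi_kk)
        phi_prev = phi
    return values


# Hypothetical toy series (not EIA data), used only to exercise the functions.
toy_series = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4]
print(acf(toy_series, 3))
print(pacf(toy_series, 3))
```

At lag 1 the PACF equals the ACF; from lag 2 onward the PACF removes the contribution of the intermediate lags, which is the property used above to look for non-seasonal structure.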
2.3. Peak Consumption Analysis
Table 1 displays the petroleum consumption statistical data such as maximum, minimum, and average amounts at monthly intervals. The results showed that there was a significant decrease in oil demand in 2020 as compared to previous years. For instance, the peak oil demand decreased from 2506 million barrels (Mbl) in 2019 to 1741 Mbl in 2020. The peak demand in 2020 was reduced by 49% and 81% compared to 2017 and 2018, respectively. Similarly, the minimum and mean oil demand in 2020 were lower than in previous years. In 2020, petroleum demand hit a record low of 1169 Mbl, down from 1451 Mbl in 2018 and 1417 Mbl in 2019. In comparison to 2019 and 2018, 2020 exhibited a reduction of 17% and 19%, respectively. The outbreak of COVID-19 and subsequent lockdowns caused an immediate shift in peak oil and energy demands in the U.S. from March to May 2020, as a reduction in energy usage and transportation needs was observed. This study considers the data available on the U.S. Energy Information Administration (EIA) website [21] up until August 2022. The maximum, minimum, and average are calculated over all the months. The fuel consumed in 2020 refers to the average fuel consumed over the year.
The following empirical equation has been used to calculate the change in fuel consumption with respect to the pandemic year 2020:
It is clear that the fuel consumption started to increase after the onset of the pandemic, particularly in 2022, with almost 68% increase for maximum consumption, 10% increase for minimum consumption, and 28% increase for average consumption, compared to the pandemic year 2020. This occurred due to the opening of the industrial and business sectors after the pandemic lock-down was over.
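The percentage comparisons in this section can be reproduced with a simple relative-change calculation. The sketch below assumes the change is computed as (value − reference) / reference × 100; this form is an assumption, since the empirical equation itself is not reproduced here, and the function name is hypothetical. The input figures are the peak and minimum values quoted above.

```python
def percent_change(value, reference):
    """Relative change of `value` with respect to `reference`, in percent."""
    return (value - reference) / reference * 100.0


# Values quoted in the text (million barrels, Mbl).
peak_2019, peak_2020 = 2506.0, 1741.0
min_2018, min_2019, min_2020 = 1451.0, 1417.0, 1169.0

print(round(percent_change(peak_2020, peak_2019), 1))  # peak, 2020 vs. 2019
print(round(percent_change(min_2020, min_2019), 1))    # minimum, 2020 vs. 2019
print(round(percent_change(min_2020, min_2018), 1))    # minimum, 2020 vs. 2018
```

With the reference year placed in the denominator, the minimum-demand changes come out close to the 17% and 19% reductions reported above.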
4. Forecasting Model for Fuel Consumption
Forecasting models are generally developed to predict demand profiles and follow fluctuating demand [22]. As illustrated before, the stochastic and non-smooth behavior of petroleum fuel consumption during and after the COVID-19 pandemic makes it harder to predict the demand accurately than in previous years. Normally, point forecasts are used to generate the demand with a single estimated value for each time step [22,23]. However, this approach relies only on the time-series data and captures little of the uncertainty in the data. In highly stochastic and unpredictable conditions, a forecast model is required that can handle new and unforeseen conditions (such as the COVID-19 pandemic) and work under different degrees of uncertainty. The point forecast model generates a future demand profile over a specific period without updating the observations. Therefore, another model, called ARIMAX, was used in this article to investigate the correlation of fuel consumption with different external observations. Before applying the ARIMAX model, all the data were transformed into stationary data. Several traditional time-series forecasting methods have been considered for predicting the price and consumption of energy, but these methods are limited in predicting the unusual behavior caused by a pandemic. Therefore, the ARIMAX model has been chosen, incorporating the external variables. All of the forecasting models produced forecasts of fuel consumption for two years, from January 2022 to December 2023. The training data covered six years, from January 2016 to December 2021, for all of the forecasting models. The test set ran from January 2022 to September 2022.
4.1. Forecasting of Fuel Consumption Using the Benchmark Methods
The benchmark approaches are among the most commonly used methods for forecasting time series. The Mean, Naïve, Drift, and Seasonal naïve (SNAIVE) methods are standard methods of this kind. While the Mean method uses all of the observations in the data, the Naïve approach considers only the last observation to forecast future values.
The Drift and SNAIVE methods are variations of the Naïve approach. Drift extrapolates the observations into the future along the line joining the first and the last data points, and SNAIVE sets each forecast equal to the last observed value from the same season (Rob J Hyndman and George Athanasopoulos).
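The four benchmark methods described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in this study; the function names, the toy series, and the default seasonal period m = 12 (monthly data) are assumptions.

```python
def mean_forecast(y, h):
    """Every future value equals the mean of all observations."""
    return [sum(y) / len(y)] * h


def naive_forecast(y, h):
    """Every future value equals the last observation."""
    return [y[-1]] * h


def drift_forecast(y, h):
    """Extend the straight line joining the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + slope * step for step in range(1, h + 1)]


def snaive_forecast(y, h, m=12):
    """Each future value equals the last observed value from the same season."""
    return [y[len(y) - m + (step % m)] for step in range(h)]


# Hypothetical toy series with a seasonal period of 2.
toy = [10, 12, 14, 16]
print(mean_forecast(toy, 2))
print(naive_forecast(toy, 2))
print(drift_forecast(toy, 2))
print(snaive_forecast(toy, 3, m=2))
```

None of these methods has parameters to estimate, which is why they serve as a baseline that any more elaborate model should outperform.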
Figure 7 shows the results of the benchmark forecasting methods. Based on Figure 7, the SNAIVE method can capture most of the fluctuation of fuel consumption and, hence, it is expected to provide a more accurate prediction than the other benchmark methods.
4.2. Forecasting of Fuel Consumption Using the STL Decomposition Methods
Decomposition techniques are used for finding and extracting the critical elements of a time series. They can split the data into trend, seasonal, and cyclical patterns. STL stands for Seasonal and Trend decomposition using Loess. It performs an additive decomposition of the data through a sequence of applications of the Loess smoother, which applies locally weighted polynomial regression at each point in the data set. The STL technique is robust to outliers, can handle seasonal time series with any seasonal frequency greater than one, and is not restricted to monthly or quarterly data [15,24].
Figure 8 displays the STL method results.
4.3. Forecasting of Fuel Consumption Using the ETS Methods
ETS is an acronym for Error, Trend, and Seasonal. It is a practical algorithm for data sets with seasonality and other prior assumptions about the data. ETS computes a weighted average over all observations in the input time series data set as its prediction. ETS point forecasts are equal to the medians of the forecast distribution. In this study, the ETS (M, N, N) approach is implemented following the single exponential smoothing with multiplicative errors. For ETS models with multiplicative errors, the point forecasts will not be equal to the means of the forecast distributions.
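To make the weighted-average idea concrete, the following Python sketch implements simple exponential smoothing, whose level recursion underlies the (·, N, N) family of ETS models. It is a simplified illustration, not the fitted model of this study: the smoothing parameter is supplied by hand rather than estimated, the initial level defaults to the first observation, and the function name is an assumption.

```python
def simple_exponential_smoothing(y, alpha, initial_level=None):
    """Return the final level and the one-step-ahead fitted values.

    The level is updated as: level = alpha * observation + (1 - alpha) * level.
    The point forecast for every future horizon is the final level.
    """
    level = y[0] if initial_level is None else initial_level
    fitted = []
    for observation in y:
        fitted.append(level)  # forecast made before seeing the observation
        level = alpha * observation + (1 - alpha) * level
    return level, fitted


# Hypothetical toy series.
toy = [10, 20, 30]
print(simple_exponential_smoothing(toy, alpha=0.5))
print(simple_exponential_smoothing(toy, alpha=1.0))  # level follows the last value
print(simple_exponential_smoothing(toy, alpha=0.0))  # level stays at its start
```

A smoothing parameter close to 1 gives most of the weight to recent observations, while a value close to 0 keeps the level near its initial value.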
The ETS decomposition plot shown in Figure 9 has been obtained with an (M, N, M) model, which stands for multiplicative (error), none (trend), and multiplicative (season), selected as the best model in terms of AIC, AICc, and BIC by the ets() function of the forecast package in R. The function estimates the model parameters and returns information about the fitted model. The estimated exponential smoothing parameters were α = 0.893 and γ = 0.0001, and σ² was 0.0716. These parameters control the rate of change of the components; α and γ govern the flexibility of the level and the seasonal component, respectively. When α = 0, the level never updates (mean), and with γ = 0 the seasonality is fixed (seasonal means).
Figure 10 is consistent with the latter insights. The seasonality is not changing in the time series, and the level almost does not change.
Figure 10 depicts the forecasting of fuel consumption for a couple of years, from 2022 to 2024, using the ETS method. The level is the weighted average of previous observations, the season is the seasonality in the data, and the remainder is the data point that the model cannot predict (Rob J Hyndman and George Athanasopoulos).
4.4. Forecasting of Fuel Consumption Using the ARIMA Method
ARIMA refers to “Autoregressive Integrated Moving Average”. It is a forecasting approach based on previous observations, assuming a dynamic correlation among the data points over time. The method combines autoregressive and moving average features. In the autoregressive part, the current value is calculated from preceding values: from the previous value in an AR(1) model, or from the previous two values in an AR(2) model. The moving average part models the current value from past forecast errors. ARIMA models forecast stationary time series, that is, series whose statistical properties do not depend on time. Equation (3) gives the autoregressive part of the model:
$$L_t = E + \sum_{i=1}^{p} \phi_i L_{t-i} + \varepsilon_t \qquad (3)$$
where $\phi_i$ is the parameter for the ith lag of the model, E is a constant, and $\varepsilon_t$ is the error term. So, the model assumes that the most recent observation is influenced by the p previous observations. The consumption data series has been checked with the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test [25], and it was found that no differencing is required to make it stationary. The model is written as ARIMA(p, d, q), where p is the order of the autoregressive part, d is the degree of first differencing, and q is the order of the moving average part.
Figure 11 exhibits the ARIMA model predictions with just a first-order moving average term.
The best-suited ARIMA(p, d, q) model is ARIMA(0, 0, 1), selected using the auto.arima() function in R, which implements a variation of the Hyndman–Khandakar algorithm (Hyndman and Khandakar, 2008) that combines unit root tests, minimization of the AICc, and maximum likelihood estimation (MLE) to obtain an ARIMA model.
4.5. Comparison of the Models in Terms of Errors
Table 3 shows the error analysis of different applied forecast methods. The smallest value of RMSE has been observed for the ARIMA method with 953.74. The MAE and other error values were also the smallest for the ARIMA method to forecast fuel consumption.
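For reference, the two error measures named above can be computed as in the following Python sketch. The function names and the toy numbers are assumptions for illustration; they are not values from Table 3.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical toy values.
observed = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]
print(rmse(observed, forecast))
print(mae(observed, forecast))
```

Because the RMSE squares the errors before averaging, it penalizes a few large misses more heavily than the MAE does.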
The residual plot in Figure 12 shows an approximately normal distribution, which supports the validity of the model.
Also, the Ljung–Box test in Table 4 shows that the p-value is much higher than 0.05. The residuals therefore show no significant autocorrelation, and the forecast model is adequate.
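The Ljung–Box statistic behind this test can be sketched as follows. This is an illustration under the standard definition Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1, …, h; the function name and the toy residuals are assumptions, and the p-value (obtained by comparing Q with a chi-square distribution) is deliberately left out of the sketch.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Hypothetical residuals that alternate in sign, i.e. strongly autocorrelated.
alternating = [1.0, -1.0] * 10
print(ljung_box_q(alternating, max_lag=1))
```

A small Q (large p-value) indicates that the residuals behave like white noise; a large Q, as for the alternating toy residuals, signals structure that the model has not captured.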
Moreover, the accuracy is evaluated based on the test data set in Table 5. The best forecasting method observed is ETS, based on the RMSE value, which is 799.59.
4.6. Forecasting of Fuel Consumption Using the ARIMAX Method
An Autoregressive Integrated Moving Average with Explanatory variables (ARIMAX) model is a multiple regression model that includes one or more autoregressive (AR) terms and/or one or more moving average (MA) terms along with exogenous variables. The model considers the interaction of multiple variables and uncertainty to generate a wider range of forecast scenarios. The ARIMAX model is suitable for forecasting stationary or non-stationary data with any type of multivariate pattern: level, trend, seasonality, or cyclicity. The ARIMAX model makes it possible to exploit any autocorrelation present in the residuals of the regression to improve the accuracy of the forecast. Therefore, the ARIMAX model has been used here to capture the volatile and uncertain fuel consumption due to the pandemic by integrating several exogenous variables. The ARIMAX model is described by Equation (4), a common model for forecasting consumption time series.
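$$L_t = E + \sum_{i=1}^{p} \phi_i L_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \sum_{a=1}^{A} \beta_a X_{a,t} + \varepsilon_t \qquad (4)$$
where $L_t$ is the estimate of the differenced consumption at time t, $\sum_{i=1}^{p} \phi_i L_{t-i}$ is the autoregressive term of order p (AR(p) model), $\sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$ is the moving average term of order q (MA(q)), $\beta_a X_{a,t}$ is the ath exogenous variable term, and E is a constant value.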
The p, d, and q orders for the ARIMAX model are determined with the Arima() function in the forecast R package. The Autoregressive Integrated Moving Average (ARIMA) model has been extended by incorporating exogenous variables (X) and modified into an ARIMAX model using a Bayesian framework. The exogenous variables chosen for the model are:
X1 = Mean monthly temperature in the U.S. from January 2016 to August 2022
X2 = Mean monthly petroleum fuel price in the U.S. from January 2016 to August 2022
X3 = Mean monthly mileage traveled by vehicles in the U.S. from Jan 2016 to Aug 2022
The temperature variable was selected because the pandemic was observed to be likely weaker at high temperatures. It has therefore been considered a significant factor in understanding the impact of the pandemic on fuel consumption. Also, during the pandemic, the fuel price was volatile, and, thus, fuel consumption is likely to be affected by the price. Lastly, during the pandemic, people hardly traveled, reducing vehicle mileage; therefore, fuel consumption was also expected to decrease. In this model, all of the seasonality has been eliminated from the time-series data. Then, the model has been developed using each single variable, each combination of two variables, and the combination of all three variables to find the best forecast of the pandemic impact on fuel consumption. The temperature variable has been checked with the KPSS test for whether any first-order differencing is required, and it was found that no differencing is required, given a p-value of 0.1 (>0.05). Then, the same data series has been checked with the KPSS test for whether any seasonal differencing is required, and it was found that seasonal differencing is required. Seasonal differencing with m = 12 has been applied, and the differenced data have been checked again using the KPSS test. The resulting data series was stationary, and no further differencing was needed.
Figure 13a shows the average temperature profile over seven years and Figure 13b shows the differenced temperature variable, which has been converted into stationary data and does not show any seasonality impacts.
The price variable has been checked with the KPSS test for whether any first-order differencing is required. We found that first-order differencing is required, given a p-value of 0.01 (<0.05). Then, the same data have been checked with the KPSS test for whether any seasonal differencing is required, and it was found that no seasonal differencing is required. Therefore, first-order differencing has been applied, and the differenced data have been checked again using the KPSS test. The resulting data series was stationary, with a p-value of 0.1, and no further differencing was needed.
Figure 14a shows the seasonal impact of the price pattern and Figure 14b shows the time-series pattern after eliminating the seasonal impact.
The miles variable has been checked with the KPSS test for whether any first-order differencing is required, and it was found that no differencing is required, given a p-value of 0.1 (>0.05). Then, the same data series has been checked with the KPSS test for whether any seasonal differencing is required, and it was found that seasonal differencing is required. Seasonal differencing with m = 12 has been applied, and the differenced data have been checked again using the KPSS test.
The resulting data series was stationary, and no further differencing was needed.
Figure 15 shows the differencing of miles variables over the seven years to convert into stationary data and eliminate any seasonality impacts.
So, all of the variables have been converted to stationary data series. For consumption, there is no seasonal or first-order differencing required, confirmed by the KPSS and “unit_nseasonal” tests.
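The two differencing operations applied to the exogenous variables can be sketched in Python as follows. This is an illustrative sketch with a hypothetical function name and toy data; it shows only the differencing step, not the KPSS test that decides whether differencing is needed.

```python
def difference(y, lag=1):
    """Return y[t] - y[t - lag] for every t where both values exist.

    lag=1 gives first-order differencing; lag=12 gives seasonal
    differencing for monthly data (m = 12).
    """
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


# Hypothetical toy data: a trend-like series and a purely seasonal monthly series.
trend_like = [1, 4, 9, 16]
seasonal_monthly = [t % 12 for t in range(36)]

print(difference(trend_like))                # first-order differencing
print(difference(seasonal_monthly, lag=12))  # seasonal differencing, m = 12
```

Seasonal differencing of the purely seasonal toy series leaves only zeros, which illustrates how the operation removes a repeating 12-month pattern while shortening the series by one season.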
Analysis of Fuel Consumption Forecast Using Combination of ARIMAX Variables
The complete data set contains information of the associate variables for around seven years from January 2016 to September 2022. For the ARIMAX forecast model, the data set has been split into a training data set for five years from January 2016 to December 2020. The remaining data set has been used as a test data set. The accuracy has been estimated using the test data set of consumption for the ARIMAX model using multiple interactions.
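The combinations of exogenous variables and the train/test split described above can be enumerated as in the following Python sketch. The variable labels, function names, and the month counts derived from the stated dates (60 training months from January 2016 to December 2020, followed by 21 test months to September 2022) are illustrative assumptions; fitting the ARIMAX model itself is outside the scope of the sketch.

```python
from itertools import combinations

# Hypothetical short labels for the three exogenous variables X1, X2, X3.
EXOGENOUS = ["temp", "price", "miles"]
N_TRAIN_MONTHS = 5 * 12  # January 2016 to December 2020


def regressor_sets(names):
    """All non-empty subsets: single variables, pairs, and the full set."""
    return [
        combo
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


def split_train_test(series, n_train=N_TRAIN_MONTHS):
    """Chronological split: the first n_train points train, the rest test."""
    return series[:n_train], series[n_train:]


for combo in regressor_sets(EXOGENOUS):
    print(" and ".join(combo))

# Placeholder monthly index for January 2016 to September 2022 (81 months).
months = list(range(81))
train, test = split_train_test(months)
print(len(train), len(test))
```

Each of the seven subsets would be fitted on the training months and scored on the test months, and the subset with the lowest test RMSE would be retained.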
From Table 6, the lowest RMSE, 735.62, is obtained with the “price and temp” combination. Therefore, the forecast has been performed using the test data set consisting of the observed price and temperature for 2021 and 2022, to predict the pandemic's impact. The mileage does not seem to contribute significantly to the fuel consumption forecast.
The reason may be working from home during the lockdown period, which led to less travel and consequently less fuel consumption.
Figure 16 shows the forecasting of fuel consumption due to the impact of the pandemic and stochastic behavior.