Article

Forecasting Long-Term Electricity Consumption in Saudi Arabia Based on Statistical and Machine Learning Algorithms to Enhance Electric Power Supply Management

by Salma Hamad Almuhaini and Nahid Sultana *
Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 34212, Saudi Arabia
* Author to whom correspondence should be addressed.
Energies 2023, 16(4), 2035; https://doi.org/10.3390/en16042035
Submission received: 6 January 2023 / Revised: 30 January 2023 / Accepted: 10 February 2023 / Published: 18 February 2023
(This article belongs to the Section C: Energy Economics and Policy)

Abstract

This study aims to develop statistical and machine learning methodologies for forecasting yearly electricity consumption in Saudi Arabia. The novel contributions of this study include (i) determining the significant features that have a considerable influence on electricity consumption, (ii) utilizing a Bayesian optimization algorithm (BOA) to tune the models’ hyperparameters, (iii) hybridizing the BOA with the machine learning algorithms, viz., support vector regression (SVR) and nonlinear autoregressive networks with exogenous inputs (NARX), to model long-term electricity consumption individually, (iv) comparing their performance with the widely used classical time-series algorithm autoregressive integrated moving average with exogenous inputs (ARIMAX) with regard to accuracy, computational efficiency, and generalizability, and (v) forecasting and validating future yearly electricity consumption. The population, gross domestic product (GDP), imports, and refined oil products were found to be significantly related to the total yearly electricity consumption in Saudi Arabia. The coefficient of determination ($R^2$) values for all the developed models are >0.98, indicating an excellent fit of the models to the historical data. However, among the three proposed models, the BOA–NARX performs best, improving the forecasting accuracy (root mean square error (RMSE)) by 71% and 80% compared to the ARIMAX and BOA–SVR models, respectively. The overall results of this study confirm the higher accuracy and reliability of the proposed methods in total electricity consumption forecasting, which power system operators can use to forecast electricity consumption more accurately and ensure the sustainability of electric energy. This study can also provide significant guidance and helpful insights for researchers to enhance their understanding of crucial research, emerging trends, and new developments in future energy studies.

1. Introduction

Forecasting electricity consumption is essential for capacity planning, transmission planning, and pricing. The characteristics of the forecasting task vary with the prediction perspective. A long-term projection of total consumption as a function of economic or demographic criteria is essential for capacity planning. Forecasting electricity consumption depends on time series analysis of the total usage of this energy. For long-term forecasting, the analysis of past and present yearly data can be beneficial in predicting future values [1]. Yearly data can be analyzed and partitioned based on several methods, such as seasonality, time intervals, or other external variables, to give a clear view of the total consumption of electric power in the future [2].
Forecasting methods are divided into three groups: judgmental, univariate, and multivariate. Judgmental forecasts depend on judgment or intuition, univariate forecasts produce future predictions using only the past and present values of the series, and multivariate forecasts include at least one extra variable to generate the future values [2]. Numerous studies have analyzed electrical energy, especially the forecasting of yearly electricity consumption, utilizing different statistical, deep learning, and machine learning techniques [1,3,4,5,6,7,8,9,10,11].
Currently, the Kingdom of Saudi Arabia is witnessing considerable development in population, urbanization, domestic products, and consumption of all public goods and services, leading to an extreme increase in electricity demand [12]. To find the best balance between electric power generation and electricity consumption, Saudi Arabia is keen to provide the best strategies, solutions, and alternatives suitable for balancing all sectors’ electricity supply and demand. Therefore, to overcome the challenges associated with electricity availability, it is essential to develop plans that estimate the extent of electrical energy demand and forecast consumption based on factors, namely, temperatures, population, gross domestic product, and other related variables [12].
However, very few studies have been conducted on electricity demand and consumption in Saudi Arabia [3,9,10]. Among the energy research in Saudi Arabia, most studies focused only on statistical analysis, and some forecasted electricity consumption using statistical methods (electricity generation demand for the $i$th year ($EGD_i$), vector autoregression (VAR), autoregressive distributed lag (ARDL), linear, quadratic, sinusoidal, and regression models, Harvey’s structural time series, ARIMA, and SARIMAX models), while forecasting using ML and DL has rarely been studied [12,13,14,15,16,17,18,19].
Therefore, this study mainly focuses on forecasting the total electricity consumption using statistical and ML algorithms, namely ARIMAX, SVR, and NARX. In addition, this work introduces automated hyperparameter tuning for the suggested ML models by employing Bayesian optimization, a sequential design technique for the global optimization of black-box functions without derivatives [11]. By building a probability model from the outcomes of previous evaluations, Bayesian optimization finds the hyperparameter values that minimize an objective function. The approach evaluates the model’s error for a given set of hyperparameters and seeks the hyperparameters with the lowest error on the validation set. Bayesian optimization is a frequently used optimization technique because it uses an acquisition function to determine where to sample next, which distinguishes it from other optimization techniques such as manual search, grid search, and random search [12].
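As a point of reference, the hyperparameter search described above can be written as a minimization problem. The notation below ($\theta$ for a hyperparameter vector, $\Theta$ for the search space, $L_{\mathrm{val}}$ for the validation error, $L_{\min}$ for the best error observed so far, $\mu$ and $\sigma$ for the posterior mean and standard deviation of a Gaussian surrogate model) is assumed here for illustration. Expected improvement is shown only as one common choice of acquisition function; the text above does not state which one is used in this study.

```latex
\theta^{*} = \arg\min_{\theta \in \Theta} L_{\mathrm{val}}(\theta)

\mathrm{EI}(\theta)
  = \mathbb{E}\left[\max\left(L_{\min} - L_{\mathrm{val}}(\theta),\, 0\right)\right]
  = \left(L_{\min} - \mu(\theta)\right)\Phi(z) + \sigma(\theta)\,\phi(z),
\qquad
z = \frac{L_{\min} - \mu(\theta)}{\sigma(\theta)}
```

Here $\Phi$ and $\phi$ denote the standard normal cumulative distribution and density functions. At each iteration, the next hyperparameter set to evaluate is the one that maximizes the acquisition function, which is what separates this approach from grid or random search.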
The Kingdom of Saudi Arabia, with an area of approximately 2,000,000 km², is located in the southwest of the Asian continent. Most of Saudi Arabia’s land is desert; for that reason, the climate is mostly hot throughout the year. In 2020, the population of the Kingdom was about 35 million [20]. The Kingdom of Saudi Arabia has witnessed tremendous development in all sectors in recent decades. As a result of the strategic importance of Saudi Arabia’s geographical location, it is regarded as the most important free economic market in the Middle East and North Africa, accounting for 25% of the total Arab GDP [21]. In addition, it is considered to have the largest oil reserves globally, holding 25% of the world’s oil.
Moreover, Saudi Vision 2030 contributes to providing alternative energy resources besides petroleum, which creates opportunities to enhance economic growth. Recently, Saudi Arabia has been ranked among the twenty largest economies worldwide and ranks first in the Middle East and North Africa. The Kingdom greatly supports all economic and development projects; the most important are the extraction and distribution of natural gas, water desalination, electric power generation, and information technology [21].
Electric power generation in Saudi Arabia is controlled by 26 authorized producers that fall into five categories, namely: service providers such as independent power producers (IPPs), independent water and power producers (IWPPs), the Power and Water Utility Company for Jubail and Yanbu (MARAFIQ), the Saudi Electricity Company (SEC), renewable energy, and others. The SEC accounts for most of the generated capacity, at 65% of the total electrical energy [22]. There are 40 electric power generation stations located in different regions of the Kingdom, and they include around 497 electric power production units authorized by the SEC. The actual capacities of these units range from 15 megawatts to 720 megawatts [23].
Moreover, based on the dataset retrieved from the Saudi Central Bank [22], the power generation capacity increased from 2005 to 2017, then declined in the next two years. Finally, the power generation capacity increased again in 2019 and 2020 to reach about 64,800 megawatts, the highest capacity of the period. Similarly, the power sold by the SEC and MARAFIQ during the last 16 years increased from 153,283 thousand kWh in 2005 to 289,327 thousand kWh in 2020. Therefore, there has been a noticeable increase in electricity production and in the number of subscribers to services provided by the Saudi Electricity Company and MARAFIQ, which is consistent with the remarkable growth of the Saudi GDP and the increasing population density throughout the Kingdom [22].
The total electricity consumption (TEC) in Saudi Arabia is distributed across four regions: the Central, Eastern, Western, and Southern regions. Furthermore, electricity consumption is divided into six categories according to the sector, namely, residential, agricultural, industrial, commercial, governmental, and others, which includes consumption for educational, health, and desalination purposes [22]. The sector-level data on electricity consumption in Saudi Arabia show that the residential sector was the largest consumer. The second largest consumer was the industrial sector, followed by the commercial, governmental, and other sectors. The agricultural sector consumed the least. This ordering is consistent with the recent increase in urbanization resulting from population growth and the expansion of the number of buildings [24]. In addition, the agricultural sector in Saudi Arabia has the lowest electricity consumption compared to other sectors because the desert lands in some regions of the Kingdom are unfit for agriculture.
This study sheds light on electric power in Saudi Arabia, forecasts the long-term total electricity consumption (TEC), and investigates the features that significantly correlate with TEC. The following essential objectives are addressed in this regard:
(1) Explore the specifics of Saudi Arabia’s total electricity consumption.
(2) Examine the variables that have a substantial impact on TEC.
(3) Utilize modern data science approaches, namely, the statistical method (ARIMAX) and the machine learning algorithms (SVR and NARX), to predict long-term TEC.
(4) Develop super learner models using the Bayesian optimization approach (BOA) to tune hyperparameters automatically.
(5) Evaluate the performance of the proposed models using several evaluation metrics (viz., MAE, RMSE, MAPE, and $R^2$).
(6) Forecast future yearly electricity consumption and validate the forecasts.
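The evaluation metrics named in objective (5) can be illustrated with a minimal sketch. The definitions below are the standard textbook ones; the function names and the sample values are hypothetical and are not taken from the study's data or code.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is zero)."""
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical yearly consumption values, for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 129.0]

for name, fn in [("MAE", mae), ("RMSE", rmse), ("MAPE", mape), ("R2", r2)]:
    print(name, round(fn(actual, predicted), 4))
```

MAE, RMSE, and MAPE are error measures (lower is better), whereas $R^2$ measures goodness of fit (closer to 1 is better), so the four are usually reported together rather than interchangeably.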
Notably, the present study uses super learner models (BOA–SVR and BOA–NARX) for the first time to forecast long-term electricity consumption, specifically the yearly electricity consumption in Saudi Arabia. The proposed long-term electricity consumption forecasting models will enhance the assurance, constancy, and sustainability of the power generation and planning process. The structure of this paper is as follows: Section 2 is a review of the literature that focuses on discovering and indicating a clear gap in the current level of knowledge on proven forecasting methodologies. Section 3 describes the data and the proposed models. The results and discussions are provided in Section 4. Finally, the conclusions and recommendations are stated in Section 5. Furthermore, to improve the clarity and readability of this article, all abbreviations are listed in Table 1.

2. Literature Review

Many scholars in different countries are highly interested in the electricity consumption forecasting domain due to its critical role in planning electricity resources and assisting policymakers in setting plans to maintain the sustainability of such essential energy. Several studies were conducted to predict electricity consumption in the short, medium, and long term by applying various forecasting techniques.
Khan et al. [1] forecasted electricity consumption in the 12 member countries of the Organization of Petroleum Exporting Countries (OPEC). The dataset was compiled from annual electricity consumption from 1980 to 2012 to forecast demand three, six, nine, and thirteen years ahead. The Cuckoo search algorithm with Lévy flights was combined with the ANN to produce the CSNN model for forecasting electricity consumption. To measure the model performance, the study compared its MSE with that of the artificial particle swarm optimization-based ANN model (APSONN), the genetic algorithm-based ANN model (GANN), and the artificial bee colony-based ANN model (ABCNN). The findings showed that CSNN performed the best among these models.
Aghay Kaboli et al. [3] estimated long-term electricity demand in Iran using the artificial cooperative search (ACS) technique, a recently developed evolutionary algorithm with a significant chance of discovering the most efficient solution to complicated optimization problems. This study examined the population, stock index, GDP, and import and export of products and facilities, which might significantly raise or reduce electricity consumption. This study employed a yearly dataset of energy demand from 1992 to 2013 for model construction and evaluation. The researchers claim that the created ACS algorithm is more efficient in predicting than existing optimization methods used for energy consumption forecasting, such as the genetic algorithm (GA), differential evolution (DE), Cuckoo search algorithm (CS), particle swarm optimization (PSO), simulated annealing (SA), and independent component analysis (ICA). Furthermore, linear, quadratic, exponential, and logarithmic mathematical models were used to find the appropriate weighting factors for the route coefficient analysis. Finally, the findings of this study revealed that ACS performed well in projecting energy consumption with a low error rate.
Ur Rehman et al. [4] developed three models to forecast the energy usage of electricity and four other essential fuels in Pakistan’s main sectors. The yearly energy data from Pakistan’s Hydrocarbon Development Institute (HDIP) from 1992 to 2014 were used in this study. The analysis then projected energy use for the next 21 years. This paper utilized the autoregressive integrated moving average (ARIMA) and Holt–Winters algorithms, and the findings were evaluated and compared using RMSE and MAPE. Moreover, the long-range energy alternative planning (LEAP) software tool was employed to construct the prediction models. The researchers demonstrated that with a 95% confidence interval, the ARIMA model outperformed the other two models in forecasting energy usage.
Kankal et al. [5] created models to predict electricity demand in Turkey. Data for the external features, GDP, population, import, and export were obtained from various resources from 1980 to 2012. This study employed a novel optimized technique based on an ANN named ANN-teaching learning-based optimization (ANN-TLBO) to construct an electricity demand forecasting model. The suggested algorithm’s prediction performance was compared to an artificial neural network with backpropagation (ANN-BP) and an artificial neural network with artificial bee colony algorithm (ANN-ABC) models. The ANN-TLBO model outperformed the other proposed models, reducing RMSE by 42.3% and 39.3%, respectively. According to the researchers, the ANN-TLBO method has a substantial benefit in minimizing computing complexity.
Yukseltan et al. [6] demonstrated the application of Fourier series expansion in Turkish electricity demand forecasting. Feedback-based forecasting methods were used in this study to forecast electricity consumption hourly, daily, and annually. Data were collected from 2012 to 2017. Additionally, a 2-year observation period was used to produce the prediction for the year ahead. The suggested model’s performance in forecasting electricity consumption was strong, with MAPE values of 0.87% for hourly, 2.90% for daily, and 3.54% for yearly consumption forecasts. Furthermore, the researchers used an autoregressive (AR) model to improve the forecasting accuracy achieved by the Fourier series expansion.
In [8], Cho et al. applied several forecasting techniques based on time series, machine learning, and hybrid models to identify the best-performing model for forecasting peak load in Korea. The time series model is the seasonal autoregressive integrated moving average with exogenous variables (SARIMAX); the machine learning models are ANN, support vector regression (SVR), and long short-term memory (LSTM). The hybrid models are SARIMAX–ANN, SARIMAX–SVR, and SARIMAX–LSTM. The peak load data for Korea were obtained from the Korea Power Exchange (KPX), while the weather data were obtained from the National Climate Data Center. Five years of daily data were divided into training and testing datasets. Data from 1 January 2014 to 31 December 2018 were used for training and from 1 January 2019 to 19 October 2019 for testing. The findings show that hybrid models considerably outperform the SARIMAX model. The performance results of the hybrid SARIMAX–LSTM are MAE = 2326.8125, RMSE = 3093.37, MSE = 9568.9936, MAPE = 3.4737, and $R^2$ = 0.918.
In [9], Sutthichaimethee et al. constructed a model to forecast energy consumption in Thailand by adopting the concept of structural equation modeling-vector autoregressive with exogenous variables (SEM-VARIMAX). The dataset consists of time series data for 1990–2017. The shift is predominantly impacted by a causal relationship with passive factors, including the economic factor (ECON), social component (SOCI), environmental factor (ENVI), and other indicator factors. The SEM-VARIMAX model’s performance was evaluated, and the model generated MAPE of 1.06% and RMSE of 1.19%. Moreover, the developed model achieved higher performance than other forecasting models, namely backpropagation neural network, ANN, ARIMA, MLR, and gray models.
Aurangzeb et al. [10] studied forecasting the power load for a set of customers rather than individual energy customers. Convolutional neural network (CNN) layers in a pyramidal architecture based on a deep learning approach were applied in this study. The dataset was obtained from the Smart Grid Smart City project in Australia. In the first stage, the density-based (DBSCAN) approach was utilized to group a subset of the customers. Then, CNN layers were used for feature selection to extract similar features and aggregate them to build the training database for each individual cluster. The proposed model improved the MAPE by up to 10%.
Khalid et al. [11] established a JLSTM model to forecast the electric load and price by obtaining data from different big data sources. To enhance the forecasting results, the authors utilized several techniques: the z-score method, the Jaya optimization method, and normalization. Hourly data with single and multiple features were used in the proposed models to forecast electricity demand and price for a week, a month, and three months. The proposed models were compared with other models based on univariate LSTM and SVM. The proposed model achieved the best results compared with the other methods, with RMSE at 0.02 and 0.04, while MAE was 0.1 and 0.47 for demand and price, respectively.
Hadjout et al. [25] forecasted monthly power consumption for the Algerian economic sector. Using monthly power use data from 2006 to 2019, the authors created three deep learning models: long short-term memory (LSTM), gated recurrent unit (GRU), and temporal convolutional networks (TCNs). A grid search approach was utilized to identify the ideal weight coefficients of each model. MAE, MAPE, and RMSE were used to assess the presented models. TCN produced the best RMSE values due to the minimal variation in the errors.
Peng et al. [26] used models based on the empirical wavelet transform (EWT) and long short-term memory (LSTM) to forecast energy consumption. Monthly industrial electricity consumption data from January 2010 to December 2015 and China’s monthly total power consumption from 2010 to 2019 were used in this study. Based on the prediction accuracy of one comparative example and two extended applications, the proposed EWT-LSTM model performed better than basic long short-term memory and other popular existing models.
Da Silva et al. [27] developed time-series forecasting models based on statistical and artificial neural network approaches to forecast industrial electricity consumption in the Brazilian system. The Holt–Winters, SARIMA, dynamic linear model, and TBATS (trigonometric Box–Cox transform, ARMA errors, trend, and seasonal components) models were evaluated for the statistical approach. The NNAR (neural network autoregression) and MLP (multilayer perceptron) models were chosen for the ANN approach. The study used monthly power usage statistics from the Brazilian industry from 1979 through 2020. The findings show that the MLP model had the best predictive performance, with a MAPE of 2.32.
Saoud et al. [28] developed an optimum long short-term memory autoencoder (LSTM-AE) model. The metaheuristic method, namely, particle swarm optimization (PSO), was used to find the optimal hyperparameters for the model to produce improved prediction accuracy. The results were then compared to existing forecasting models. The hourly dataset from American Electric Power from 2004 to 2018 was used in this study. The proposed model achieved the best results, with RMSE = 680.89 and MAE = 486.28.
In Saudi Arabia, forecasting electric energy consumption has been studied from different aspects. Ouda et al. [13] compared Saudi Arabia’s power demand per capita to that of the UAE and Australia. According to the findings of this study, Saudi Arabia has the lowest power consumption of the three countries. Furthermore, this paper projected Saudi Arabia’s electricity consumption according to three scenarios. The optimistic scenario expected that average population growth would be 2.5% per year, while electricity consumption would increase by 1% per year. The second scenario, known as the moderate scenario, predicted a 3% annual rise in population growth. The third scenario was pessimistic, predicting that the average population growth rate would remain the same as in the previous forty years and annual electricity consumption would stay the same as in the last twenty years. This paper also projected electricity usage from 2014 to 2040 by calculating the electricity generation demand ($EGD_i$) for the $i$th year in million GW. According to the findings, the KSA would be required to increase electricity generation by 215% under the optimistic scenario and 514% under the pessimistic scenario to offer reliable power consumption and sustain availability for all sectors.
Alsaedi et al. [14] explored the correlation of two independent variables, the gross domestic product (GDP) and the peak load (PL), with electricity consumption (EC) in Saudi Arabia. The authors applied a vector autoregression (VAR) analysis together with several tests: forecast error variance decompositions, Granger causality testing, and the impulse response function. The time series data covered the period from 1990 to 2015. The results illustrate a growth rate of 7.21% for EC and 6.87% for PL, while the GDP was 14.14% higher in the last ten years.
In [15], Senan et al. applied the augmented Dickey–Fuller (ADF) test and autoregressive distributed lag (ARDL) cointegration technique to discover the correlation between electricity usage and the financial market development in Saudi Arabia. This study obtained the data for analysis from World Development Indicators from 1970 to 2015. The results showed that economic growth and urbanization have a positive relationship with electricity consumption, thus positively influencing the financial market development. Finally, this study suggests the need to raise electricity generation in the future to meet the demand.
In contrast, Alkhraijah et al. [16] investigated the influence of social distancing during the COVID-19 pandemic on electricity use and temperature in Saudi Arabia. This research offered an overview of the consequences of social distancing for energy use in various nations. The researchers then explored the effect of social distancing policies in Saudi Arabia as a case study. The linear correlation coefficient was employed in this study to examine the relationship between average power consumption and daily temperature from 1 January 2020 to 21 June 2020. Furthermore, extensive research was conducted to investigate the correlation during the curfew time (from 6 April to 26 April 2020). When compared to the previous five years, the data show a significant link between temperature and electricity usage, resulting in high electricity demand when the temperature is high, mostly for cooling purposes during the whole curfew. Finally, the researchers discovered a delay in the time necessary to respond to temperature changes.
Furthermore, some studies focused more on electricity consumption in Saudi residential buildings and other factors related to energy performance. In [17], Mikayilov et al. explored the effect of energy price, weather conditions, and income on the TEC of residential buildings. The study applied a structural time series modeling approach to data obtained from 1990 to 2018. In the long term, the impact of the energy price varied across the Central, Eastern, Western, and Southern regions, ranging from 0.20 in the Central region to 0.46 in the Eastern region. Similarly, the influence of income differed by region, ranging from 1.02 in the Western region to 0.27 in the Eastern region. Finally, for the weather conditions, all regions showed a significant relationship between hot weather and electricity consumption [17].
In [12], Alharbi et al. proposed a framework for the long-term forecasting of several quantities, including electricity consumption, generation, peak load, and installed capacity, to examine the performance of the electric sector in Saudi Arabia. The authors utilized data collected from Saudi Arabia from 1980 to 2020. The SARIMAX model was applied to estimate the aforementioned factors in Saudi Arabia for 30 years, starting from 2021. The results of the proposed models were $R^2$ = 0.99 and MAPE = 0.30. Furthermore, the authors mentioned that analyzing additional external aspects and their linkages may improve the research and accuracy of projections.
Furthermore, in [18], Alarenan et al. proposed a model of the aggregate industrial energy demand in Saudi Arabia and the economic growth factors by utilizing Harvey’s (1989) structural time series model (STSM) for the period between 1986 and 2016. This model depended on the energy prices and the average incomes to estimate long-term industrial energy consumption. The aggregated model findings revealed that the responses of demand to average incomes and prices were inelastic in the long term. This result indicates that incomes and energy prices influence the total demand for industrial energy. Therefore, industrial energy demand will continue to grow in the coming decades in response to economic growth. At the same time, the results also showed the possibility of reducing this growth by increasing energy prices. In 2016, the Kingdom of Saudi Arabia introduced a program aimed at raising the prices of services such as water, electricity, and fuel for the industrial and residential sectors. These price increases significantly decreased consumption by 6.9%, around 3.0 million tonnes.
In a country with a hot climate, such as Saudi Arabia, some studies on electric energy consumption considered the temperature as a significant factor that may influence EC. AL-Zayer et al. [19] proposed an econometric model to find the correlation between the EC and the ambient temperature in the Eastern region of Saudi Arabia. EC data were retrieved from 1986 to 1990. Then, the researchers applied a regression to each year individually, from May to September, to measure the influence of the temperature on EC. Later, the entire 5-year period of data was used for forecasting electricity demand. The obtained results showed that the models that achieved the best estimation for forecasting EC were linear and quadratic at the 5% level, as evaluated by the mean absolute percentage deviation (MAPD), the mean square percentage error (MSPE), and $R^2$. Furthermore, the cyclical sinusoidal autoregressive model was utilized to forecast the consumption for 12 months divided into two equal parts. Finally, this study illustrated the effectiveness of the models’ performances and demonstrated that the yearly regression model performed best compared with the quadratic and sinusoidal models.
In addition, AL-Garni et al. [29] developed a regression model to forecast EC in Saudi Arabia’s Eastern province. Several significant features were selected in this study using stepwise regression techniques. The selected features are the population of the targeted region, weather conditions based on temperature and humidity parameters, and solar radiation. A monthly dataset covering five years, from August 1987 to July 1992, was used to build the model. The results confirmed the correlation between electric energy consumption and the independent variables, particularly the weather temperatures, which significantly affected the demand stability at high and low temperatures. For model validation, the predicted values were compared to the observed values, and slight deviations from the true values curve were observed, but these deviations are considered statistically acceptable. The authors suggested that the presented model be applied to estimate future electric energy consumption.
In [30], Abdel-aal et al. predicted the consumption of electrical energy in Saudi Arabia’s Eastern province based on meteorological, demographic, and economic statistics. The authors used monthly data from August 1987 to July 1993 to conduct a univariate Box–Jenkins time-series analysis. The study used two AR and MA models: nonseasonal autoregressive and seasonal autoregressive models. Furthermore, numerous models were built using various machine learning (ML) techniques, including the abductory induction mechanism (AIM) and multivariate regression models. The results revealed that ARIMA models, with average percentage errors of 3.8%, performed better in forecasting than multivariate regression, at 5.6%, and AIM, at 8.1%.
Almazrouee et al. [31] presented the Prophet model’s efficacy in long-term peak load forecasting in Kuwait. The Prophet and Holt–Winters models were compared to examine their practicality and accuracy in forecasting long-term peak loads. The electric load peaks from Kuwait power plants from 2010 to 2020 were utilized to predict the peak load for the next ten years. The study delivered a high-performing Prophet model that was evaluated with five evaluation metrics.
J. Buitrago et al. [32] developed a neural network-based NARX model to train the data in an open loop, while the predicted values were produced in a closed loop. A short-term forecasting model was applied in this study to forecast EC in New England by utilizing hourly data for ten years from 2005 to 2015 to predict the next twenty-four hours to improve the electricity energy resources and lower the costs. External variables were included to enhance the model performance, namely, wet and dry bulb temperatures. The presented model’s efficiency was compared to the ARMAX model, and the findings indicated that NARX excelled with a MAPE of 0.85%, whereas ARMAX obtained 1.09%.
In [33], Fahmy et al. forecast the electric energy consumption in Saudi Arabia from 2020 to 2030 using data from 1990 to 2019. The primary purpose of this research is to test the hypothesis that the prediction accuracy of compound models is superior to that of a single polynomial or stochastic model. To this end, a two-part compound model was created. The polynomial model is the first part, while the ARIMA model is the second. The findings confirm the null hypothesis, demonstrating that polynomials have the benefit of being able to express a wide variety of mathematical models.
A summary of the related studies, including the application region, selected features, implemented algorithms, hyperparameter tuning, benchmarked methods, and performance evaluation metrics, is presented in Table 2. Very few studies focus on long-term electricity consumption forecasting in Saudi Arabia. Most studies did not include exogenous variables, especially GDP, population, exports, imports, and refined oil products, which may significantly affect TEC forecasts in Saudi Arabia. Few researchers utilized optimization approaches to tune the hyperparameters of ML models properly, even though optimized hyperparameters have a substantial influence on model prediction performance. Moreover, none of these studies implemented BOA to tune the hyperparameters automatically. Various statistical, ML, and DL algorithms have been used for benchmarking. Likewise, various metrics were used for performance evaluation and comparison. Table 2 (the very last row) also emphasizes the novelty of this study: this is the first paper that includes various significant exogenous variables (GDP, population, imports, and refined oil products) in developing forecasting models in Saudi Arabia; uses BOA to tune the hyperparameters; and develops novel super learner models based on SVR and NARX algorithms for forecasting long-term electricity consumption, benchmarked against the classical time series method ARIMAX on a variety of metrics, including generalizability and time complexity.

3. Methodology

This section begins with a description of the dataset used in this study and some fundamental statistical analysis. The mathematical basis and working principles of the suggested algorithms are then briefly presented. This section also outlines the mathematical description and theoretical concepts of the Bayesian algorithm used to find the optimal hyperparameters of the suggested methods. Figure 1 depicts the key phases of the approach used in this study.

3.1. Data Description

Yearly TEC data of the Kingdom of Saudi Arabia from 2005 to 2020 was used in this study. Initially, five features, namely, GDP, population, production of refined products, imports, and exports, were considered. All the data were collected from the Saudi Central Bank from 2005 to 2020 [22]. The time series plot of the yearly EC (PJ) from 2005 to 2020 indicates an overall increasing trend (see Figure 2).

3.2. Computational Techniques

This paper uses the traditional statistical approach ARIMAX and two widely used ML algorithms, viz., SVR and NARX, to forecast yearly TEC in Saudi Arabia. Data analysis was performed using MATLAB (version R2021a).

3.2.1. Statistical Approach (ARIMAX)

The ARIMA is a mathematical technique for forecasting future values of a time series while minimizing random errors. This model was developed by George Box and Gwilym Jenkins (1976), and it consists of autoregression (AR), integration (I), and moving average (MA) components. The model is specified by the order (p, d, q), where p is the number of AR lag observations included in the model, d is the differencing order, i.e., the number of times the raw observations are differenced, and q is the MA lag, i.e., the size of the MA window [2].
The ARIMAX forecasting model is a variant of the ARIMA model that includes exogenous variables to increase the prediction performance and obtain more accurate and better results [2]. Furthermore, this model is applicable to any form of data pattern, including stationary and nonstationary data; however, this model is more appropriate for data that does not show any seasonality. The ARIMAX model can be mathematically represented in Equations (1) and (2) [12]:
$$e_t = \frac{\gamma(G)}{\varphi(G)}\, a_t \tag{1}$$
$$y_t = \sum_{i=1}^{m} \frac{\gamma_i(G)}{\varphi_i(G)}\, G^{l_i} X_t + e_t \tag{2}$$
where $\varphi(G)$ represents the AR parameters and $\gamma(G)$ represents the MA parameters. Moreover, $e_t$ is the regression error, $a_t$ denotes the zero-mean error term of the time series, $G$ is the backshift operator, $X_t$ is the observed value at time $t$, $l_i$ is the lag degree, and $y_t$ is the output. The autocorrelation function (ACF) and the partial autocorrelation function (PACF) can be used to select the values of p and q. The ACF and PACF are also useful for fitting autoregressive models and detecting periodicities and outliers [12,34].
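As a rough illustration of the differencing order d and of ACF-based lag screening described above, the following is a minimal sketch in plain Python. The function names and the toy series are hypothetical and are not taken from the study, and the band 1.96/sqrt(n) is the usual large-sample approximation of a 95% confidence bound, assumed here for illustration only.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' part of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significant_lags(acf_values, n):
    """Lags (k > 0) whose |ACF| exceeds the approximate band 1.96 / sqrt(n)."""
    bound = 1.96 / n ** 0.5
    return [k for k, r in enumerate(acf_values) if k > 0 and abs(r) > bound]


# Hypothetical upward-trending yearly series (not the paper's data).
demo = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0, 37.0, 45.0]
print(difference(demo, d=1))
print(sample_acf(demo, max_lag=3))
```

Lags flagged in this way are only candidates for p and q; the final orders would still be chosen by comparing fitted models, for example with an information criterion.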

3.2.2. Machine Learning Approach (SVR)

Support vector regression (SVR) is a supervised learning algorithm applied to predict continuous values. SVR uses the same principles as the support vector machine (SVM), which was first put forward in the 1960s by Vapnik and underwent significant development over the following decades. SVM offers a principled solution to machine learning problems because of its mathematical grounding in statistical learning theory, known as Vapnik–Chervonenkis (VC) theory. SVR and SVM are machine learning techniques that find the best-fitting hyperplane based on several factors. However, the SVR solves a regression problem that estimates continuous values, while the SVM is a classifier that categorizes the data based on the provided class labels [35].
The SVR algorithm aims to construct a hyperplane for data regression such that the training samples’ predicted response values deviate from their observed (actual) response values by no more than a tolerance ε. To this end, the algorithm incorporates an ε-insensitive loss function, and the regression generalization boundaries are calculated using an ε-insensitive tube (or band) defined around the hyperplane. Optimization is achieved by making the ε-insensitive tube as flat (thin) as feasible while containing most of the training data. In this scenario, the hyperplane is determined by a few support vectors, i.e., training samples that lie on or beyond the border of the ε-insensitive tube. As a result of the SVR training, a regression model is obtained that predicts a response output for a new sample [35,36].
A nonlinear SVR method is recommended for complicated datasets. It maps the feature vectors into a higher dimensional feature space via some nonlinear mapping and constructs the best feasible hyperplane in the new feature space. Nevertheless, computing this mapping explicitly requires extensive computation and becomes inefficient [36]. The SVR algorithm solves this problem with the kernel trick, using kernel functions that allow nonlinear problems to be solved linearly and efficiently. The kernel function is critical to the SVR model’s prediction ability. Linear, Gaussian, and polynomial kernel functions are commonly used.
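The three kernel families named above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names, the `scale` and `degree` parameters, and the particular parametrizations (for example, the Gaussian kernel written as exp(-||u - v||^2 / (2 scale^2))) are assumptions, and toolboxes may define the kernel scale differently.

```python
import math


def linear_kernel(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def gaussian_kernel(u, v, scale=1.0):
    """Gaussian (RBF) kernel; 'scale' plays the role of a kernel scale."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * scale ** 2))


def polynomial_kernel(u, v, degree=2):
    """Polynomial kernel of the form (1 + <u, v>) ** degree."""
    return (1.0 + linear_kernel(u, v)) ** degree


# Two hypothetical standardized feature vectors.
u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), gaussian_kernel(u, v), polynomial_kernel(u, v))
```

Each function returns the inner product of the two inputs in some feature space without ever constructing that space, which is the saving the kernel trick provides.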
The performance of the SVR model highly depends on the appropriate choice of hyperparameters, including the kernel function, kernel scale, box constraint, and ε. The box constraint regulates the tradeoff between the model complexity and the degree of tolerance for points outside the ε-insensitive zone in the optimization design. As the box constraint increases, the penalty on points outside the ε-insensitive zone increases, so the tolerance for such points decreases. The parameter ε, in turn, adjusts the width of the ε-insensitive zone used to fit the training data and affects the number of support vectors used to construct the SVR model [37,38]. A larger value of ε indicates a higher tolerance for error and results in fewer support vectors in the SVR model. Hence, both the box constraint and ε significantly influence the model complexity, and a tradeoff between them is necessary for better performance.
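To make the roles of ε and the box constraint concrete, here is a minimal sketch of the ε-insensitive loss and of the standard linear SVR primal objective, 0.5·||w||² + C·Σ loss. The function names and the toy numbers are hypothetical; the sketch illustrates the textbook formulation and is not the study's implementation.

```python
def eps_insensitive_loss(actual, predicted, eps):
    """Zero inside the eps-tube, linear in the excess deviation outside it."""
    return max(0.0, abs(actual - predicted) - eps)


def outside_tube(actuals, predictions, eps):
    """Indices of samples lying strictly outside the eps-tube."""
    return [
        i for i, (a, p) in enumerate(zip(actuals, predictions))
        if abs(a - p) > eps
    ]


def svr_objective(weights, actuals, predictions, box_constraint, eps):
    """Flatness term plus box-constraint-weighted eps-insensitive losses."""
    flatness = 0.5 * sum(w * w for w in weights)
    penalty = sum(
        eps_insensitive_loss(a, p, eps) for a, p in zip(actuals, predictions)
    )
    return flatness + box_constraint * penalty


# Hypothetical toy values.
actuals = [1.0, 2.0, 4.0]
predictions = [1.0, 3.0, 4.25]
print(outside_tube(actuals, predictions, eps=0.1))   # narrow tube
print(outside_tube(actuals, predictions, eps=0.5))   # wider tube
print(svr_objective([3.0, 4.0], actuals, predictions, 2.0, 0.5))
```

Widening the tube leaves fewer samples outside it (fewer candidate support vectors), while raising the box constraint makes every remaining violation more costly.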

3.2.3. Machine Learning Approach (NARX)

Due to their structure, which simulates the brain’s biological neural system and provides a strong capacity to learn, retain, and analyze data, artificial neural networks (ANNs) are being adapted and implemented in many applications such as classification, prediction, and recognition [39]. ANNs consist of multiple layers, including input, output, and hidden layers, that generate mathematical models based on prior knowledge. The nonlinear autoregressive network with exogenous inputs (NARX), a dynamic recurrent neural network (RNN), has been used in time series forecasting with effective results and has achieved high performance in finding short- and long-term patterns [32,40,41]. The NARX-based ANN can be defined by its input–output relation as described in Equation (3) [41]:
$$y(t) = F\left[x(t),\, x(t-\Delta t),\, \ldots,\, x(t-n\,\Delta t),\, y(t),\, y(t-\Delta t),\, \ldots,\, y(t-m\,\Delta t)\right] \tag{3}$$
Here, $x(t)$ represents the exogenous time series, $y(t)$ represents the response time series, $n$ is the input delay (delay of the exogenous time series), $m$ is the feedback delay (delay of the response variable), and $F$ is the nonlinear function. The NARX neural network model has two types of architectures, namely, open loop and closed loop, that are executed in parallel, as shown in Equations (4) and (5) [39]:
$$\hat{y}(t+1) = F\left(y(t),\, y(t-1),\, \ldots,\, y(t-n_y),\, x(t+1),\, x(t),\, x(t-1),\, \ldots,\, x(t-n_x)\right) \tag{4}$$
$$\hat{y}(t+1) = F\left(\hat{y}(t),\, \hat{y}(t-1),\, \ldots,\, \hat{y}(t-n_y),\, x(t+1),\, x(t),\, x(t-1),\, \ldots,\, x(t-n_x)\right) \tag{5}$$
Here, $F(\cdot)$ represents the mapping function of the neural network architecture; $\hat{y}(t+1)$ stands for the NARX output (predicted value) produced at time $t$ for time $t+1$; $\hat{y}(t), \hat{y}(t-1), \ldots, \hat{y}(t-n_y)$ are the previous predicted values of the NARX model; $y(t), y(t-1), \ldots, y(t-n_y)$ are the previous true values of the time series; $x(t+1), x(t), x(t-1), \ldots, x(t-n_x)$ are the NARX inputs; and $n_x$ and $n_y$ are the input and output delays, respectively [39]. The training phase in the open loop uses all of the historical data of the variables to establish the node weights and to calculate the output that is fed to the feedforward network’s input. The model then enters the closed-loop phase for forecasting future values, in which the actual output is no longer available and the delayed forecasted output is used to generate the prediction [39]. The NARX model’s forecasting ability strongly depends on the right selection of its hyperparameters, including the number of hidden layers, neurons, input delay, and feedback delay.
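The difference between the open-loop form of Equation (4) and the closed-loop form of Equation (5) can be sketched as follows. The sketch assumes that `F` is any callable taking the lagged outputs and lagged inputs; `toy_map` is a hypothetical stand-in for a trained network, and the closed loop is seeded with known outputs up to the forecast origin, which is a common convention assumed here rather than stated in the source.

```python
def open_loop_predictions(F, y, x, n_y, n_x):
    """Eq. (4): one-step predictions that feed back the TRUE past outputs."""
    start = max(n_y, n_x)
    preds = []
    for t in range(start, len(y) - 1):
        y_lags = [y[t - j] for j in range(n_y + 1)]
        x_lags = [x[t + 1]] + [x[t - j] for j in range(n_x + 1)]
        preds.append(F(y_lags, x_lags))
    return preds


def closed_loop_forecast(F, y_seed, x, n_y, n_x, steps):
    """Eq. (5): multi-step forecast that feeds back its OWN predictions.

    y_seed holds the known outputs up to the forecast origin; x must also
    cover the future time steps being forecast.
    """
    y_hat = list(y_seed)
    for _ in range(steps):
        t = len(y_hat) - 1
        y_lags = [y_hat[t - j] for j in range(n_y + 1)]
        x_lags = [x[t + 1]] + [x[t - j] for j in range(n_x + 1)]
        y_hat.append(F(y_lags, x_lags))
    return y_hat[len(y_seed):]


def toy_map(y_lags, x_lags):
    """Hypothetical stand-in for the trained network F (not a real model)."""
    return 0.5 * y_lags[0] + x_lags[0]


x = [0.0, 0.0, 1.0, 2.0]
print(open_loop_predictions(toy_map, [2.0, 4.0, 6.0, 8.0], x, n_y=1, n_x=0))
print(closed_loop_forecast(toy_map, [2.0, 4.0], x, n_y=1, n_x=0, steps=2))
```

In the closed loop, forecast errors can accumulate from one step to the next because each prediction becomes an input to the following one, which is why open-loop training followed by closed-loop forecasting is typically evaluated on held-out years.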

3.2.4. Hyperparameters Optimization for SVR and NARX

The study applied the BOA that follows the Bayes’ rule as represented in Equation (6):
$$p(w \mid D) = \frac{p(D \mid w)\, p(w)}{p(D)} \tag{6}$$
where $w$ represents an unobserved value, $p(w)$ represents the prior distribution, $p(D \mid w)$ represents the likelihood, and $p(w \mid D)$ indicates the posterior distribution. Bayes’ rule uses past information to calculate the posterior probability, which implies that the outcomes of previous iterations are employed when selecting values for the following iteration. As a result, it can reach the optimum more efficiently than random selection [42].
The BOA consists of two submodels: a surrogate model and an acquisition function. The Gaussian process (GP), a typical surrogate for modeling the objective function, is used to approximate the objective function in the surrogate model. The GP is a generalization of the Gaussian distribution. In general, the GP establishes a prior over functions that can be turned into a posterior over functions after observing certain function values. In this method, the function $f(z)$ is a realization of the GP with the mean function $m(z)$ and the covariance function $k(z_i, z_j)$, as given in Equation (7). More details can be found in [7].
$$f(z) \sim \mathcal{GP}\left(m(z),\, k(z_i, z_j)\right) \tag{7}$$
where $z$ is a point in the input domain and $(z_i, z_j)$ is any possible pair of such points. All the variables in the input domain are related to each other through the covariance function, also called the kernel, which governs the smoothness and amplitude of the GP samples. The acquisition function of the BOA, in turn, is based on past observations and is refined over iterations. Using the outcomes of the surrogate model, the acquisition function proposes the next location to evaluate [42]. The hyperparameter optimization using BOA is mathematically represented in Equation (8) as
$$g^{*} = \underset{g \in G}{\arg\min}\, f(g) \tag{8}$$
where $f(g)$ is the objective function (here, the root mean square error to be minimized), $g^{*}$ is the set of hyperparameters that yields the lowest value of the objective function, and $g$ is any set of hyperparameters in the search space $G$. The BOA was used in this study because it is a systematic method for global optimization of black-box functions, and it is more efficient than other known optimization methodologies such as grid, random, and manual search, which are time-consuming and computationally expensive [7,43,44,45,46,47,48].
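As an illustration of how an acquisition function turns the surrogate's outputs into a choice of the next point, the sketch below implements expected improvement (EI) for a minimization problem. EI is one common acquisition function and is assumed here for illustration; the source does not state which acquisition function was used. The candidate labels and the surrogate means and standard deviations are hypothetical numbers, not results from the study.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best_so_far):
    """EI (minimization) at a candidate with surrogate mean mu and std sigma."""
    if sigma <= 0.0:
        return max(0.0, best_so_far - mu)
    z = (best_so_far - mu) / sigma
    return (best_so_far - mu) * normal_cdf(z) + sigma * normal_pdf(z)


def pick_next(candidates, best_so_far):
    """candidates: list of (label, mu, sigma); return the label with max EI."""
    best = max(
        candidates,
        key=lambda c: expected_improvement(c[1], c[2], best_so_far),
    )
    return best[0]


# Hypothetical surrogate predictions of the objective for three candidates.
candidates = [("a", 0.05, 0.0), ("b", 0.03, 0.01), ("c", 0.10, 0.001)]
print(pick_next(candidates, best_so_far=0.04))
```

EI favors candidates whose predicted objective is low or whose prediction is uncertain, which is how the search balances exploitation of good regions against exploration of untested ones.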

3.2.5. Performance Evaluation Metrics

It is critical to employ several statistical metrics to assess the model’s accuracy since it may perform well with one metric but less effectively with another. Four performance evaluation metrics are generated using the following equations to evaluate the constructed models’ performance:
$$\text{Mean Absolute Error (MAE)} = \frac{1}{n} \sum \left| A_v - P_v \right| \tag{9}$$
$$\text{Root Mean Square Error (RMSE)} = \sqrt{\frac{\sum \left( A_v - P_v \right)^2}{n}} \tag{10}$$
$$\text{Mean Absolute Percentage Error (MAPE)} = \frac{1}{n} \sum \frac{\left| A_v - P_v \right|}{\left| A_v \right|} \times 100 \tag{11}$$
$$\text{Coefficient of Determination } (R^2) = \left( \frac{\sum XY}{n\, \sigma_x \sigma_y} \right)^2 \tag{12}$$
where $n$ represents the number of data points, $Y$ is the data of the dependent variable, $X$ is the data of the exogenous variable, $\sigma_x$ is the standard deviation of data $X$, $\sigma_y$ is the standard deviation of data $Y$, $A_v$ is the historical data values, and $P_v$ is the forecasted values.
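The four metrics can be computed directly from their definitions. The following is a minimal sketch with hypothetical function names and toy numbers; $R^2$ is implemented as the squared Pearson correlation between the actual and forecasted values with explicit mean-centering, which is an assumption about how the formula above is meant to be applied.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    ma = sum(actual) / len(actual)
    mp = sum(predicted) / len(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (va * vp)


# Hypothetical values (not the paper's data).
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(mae(actual, predicted), rmse(actual, predicted))
print(mape(actual, predicted), r_squared(actual, predicted))
```

Because MAE and RMSE are expressed in the units of the target while MAPE and $R^2$ are scale-free, the two groups are only comparable across models when computed on the same data in the same units.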

4. Results and Discussion

The yearly datasets for TEC from 2005 to 2017 were mainly used to train the models, and data from 2018 to 2020 were used to investigate forecasting errors and to avoid overfitting. The original dataset involves five features: population, GDP, imports, exports, and total refined products. However, the initial feature set may contain redundant data, which can increase the computation costs and reduce forecasting accuracy [49,50]. To select the most significant features of TEC, several feature selection techniques were used, namely, the Pearson correlation coefficient, univariate feature ranking for regression using F-tests, rank importance of predictors using the RReliefF algorithm, sequential feature selection, and neighborhood component analysis. Based on the results of the correlation analysis, population, GDP, total refined products, and imports are strongly associated with TEC (all p-values < 0.001), while exports are not (p-value = 0.754 > 0.05) (see Table 3). The results of the other feature selection algorithms also indicate that exports are not an important feature. The overall ranking of the features is presented in Figure 3. Thus, the four features (population, imports, GDP, and refined oil products) are considered in this study to construct the forecast models. Data were standardized as part of data pre-processing.
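The correlation-based screening and the standardization step can be sketched as below. The column names and values are hypothetical placeholders, not the Saudi Central Bank data, and the use of the sample standard deviation (n − 1) for the z-score is an assumption, since the source does not specify the standardization formula.

```python
import statistics


def standardize(values):
    """Z-score a column using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def rank_features(features, target):
    """Sort feature names by |r| with the target, strongest first."""
    scores = {name: pearson_r(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy columns.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "feature_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "feature_b": [3.0, 1.0, 4.0, 1.0, 5.0],
}
ranking = rank_features(features, target)
print(ranking)
print(standardize(features["feature_a"]))
```

A correlation ranking of this kind captures only linear association with the target, which is one reason to cross-check it against other selection methods, as the study does.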
In this study, three models, viz., ARIMAX, BOA–SVR, and BOA–NARX, were developed to forecast the yearly total electricity consumption in Saudi Arabia.

4.1. Development of ARIMAX Model

The forecast performance of the ARIMAX model depends on several hyperparameters, namely the autoregressive order (AR), the moving average order (MA), and the degree of differencing (d). To obtain an optimum model, these hyperparameters must be tuned. The sample autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of TEC may be used to identify candidate AR and MA values. The ACF plot depicts how a time series’ current value correlates with its prior values, with the number of lags on the x-axis and the correlation coefficient on the y-axis, while the PACF provides the partial correlation between the time series and its lagged values. If the ACF or PACF value at a given lag exceeds the confidence bounds, that lag can be taken into account when selecting the hyperparameters AR and MA. Based on the ACF plot, the significant lags are at 1 and 2, while the PACF plot indicates significant lags at 1, 8, and 13 (Figure 4). Several runs were conducted based on this observation, and the best ARIMAX model, with the lowest normalized Bayesian information criterion (BIC = 11.374), was achieved with the optimum values for AR, MA, and d of 2, 8, and 0, respectively (shown in Table 4).

4.2. Development of Hybrid BOA–SVR Model

A hybrid BOA–SVR model was developed to forecast the TEC. The kernel function type, the value of the kernel parameter, the value of ε, and the box constraint were optimized using the BOA. Several experiments were conducted, and the predictive capabilities of the models were evaluated. The optimization progress of the best BOA–SVR model is shown in Figure 5a. The best observed feasible point for the BOA–SVR model was achieved at iteration 98, with a minimum observed objective value of 0.029947. This minimum validation error for the constructed model was attained with the linear kernel function and with values of ε and the box constraint of 0.0017742 and 319.26, respectively. The optimized hyperparameters of the developed BOA–SVR model are shown in Table 4.

4.3. Development of Hybrid BOA–NARX Model

This study applied the BOA approach to find the optimal hyperparameters for the NARX model to predict TEC in Saudi Arabia. The BOA was employed to tune the number of hidden layers, hidden layer size, feedback delay, input delay, and training function. Figure 5b represents the progress of the Bayesian hyperparameter optimization. The best feasible point was achieved at iteration 86, with a minimum observed objective value of 0.000975. The optimized hyperparameters of the developed BOA–NARX model are presented in Table 4, and the network structure is shown in Figure 6. Generally, in the development of an artificial neural network, the weight values of the neurons are updated at each epoch, and a larger number of epochs results in longer training, testing, and validation processes. The performance plot of the developed BOA–NARX model is shown in Figure 7, which indicates that the training, comprising the weight adjustment, stopped in the second epoch, with the best validation performance of 0.003399. As shown in Figure 7, the mean squared error (MSE) curves for the test and validation data have similar characteristics, which suggests no overfitting of the developed model.

4.4. Performance Evaluation and Model Comparison

Prediction accuracy, computational efficiency, and generalizability are crucial characteristics of a forecasting model. All these properties of the three developed models were compared in this study.

4.4.1. Prediction Accuracy of the Developed Models

The prediction performance of the proposed ARIMAX, BOA–SVR, and BOA–NARX models for TEC in Saudi Arabia based on the training and testing datasets is presented in Table 5. As shown in Table 5, no remarkable differences were observed between the performance on the training and testing datasets for any of the three models, indicating that the developed models are neither underfitted nor overfitted. The model performance was evaluated based on several performance indicators, viz., MAPE, MAE, and RMSE, for the whole historical data from 2005 to 2020 (see Table 6). The developed BOA–NARX (MAPE = 0.3205, MAE = 598.7603, RMSE = 1080.6) provides the lowest predictive error compared to the ARIMAX (MAPE = 1.9410, MAE = 14.7855, RMSE = 20.1059) and BOA–SVR (MAPE = 2.2444, MAE = 17.8103, RMSE = 28.9151). ARIMAX, in turn, performs better than BOA–SVR. The BOA–NARX model improved the RMSE by 71% compared to the ARIMAX model and by 80% compared to the BOA–SVR model. Thus, the overall results indicate that the BOA–NARX predictions agree well with the historical data.
The comparisons between the historical and predicted values of the three approaches for TEC are presented in Figure 8. All models clearly show a promising capability for forecasting TEC. However, for the testing data in 2019, the value forecast by BOA–SVR is slightly higher than the historical value, while the forecasts of the other two models almost overlap with the historical data. The fitted line plots were analyzed to investigate the association of historical data with the TEC forecast by ARIMAX, BOA–SVR, and BOA–NARX (shown in Figure 9). The $R^2$ values for ARIMAX, BOA–SVR, and BOA–NARX are 98.8%, 97.5%, and 99.6%, respectively. The highest value of $R^2$ is provided by the BOA–NARX model.
The relative deviations of the forecasted TEC for the developed models from the historical data were calculated. As shown in the clustered column chart in Figure 10a, the relative deviations in most years for all models are scattered around the zero line. It can also be observed from Figure 10b that the range of relative deviation from the historical data for BOA–SVR (−0.105, 0.043) is wider than that of the ARIMAX (−0.092, 0.033) and BOA–NARX (−0.015, 0.008) models, and the lowest deviation was found for the BOA–NARX model.
Investigating the distributions of both historical and forecasted TEC is also crucial for model evaluation. Thus, boxplots for the historical and forecasted TEC for the various modeling approaches were generated and compared (shown in Figure 11). There are no outliers, the shapes of the distributions for the historical and all forecasted outputs are similar, and the data are left-skewed. The centers of the distributions for ARIMAX (median = 1012.1) and BOA–NARX (median = 1018.7) are very close to that of the historical data (median = 1018.3), while the center for BOA–SVR is slightly higher (median = 1028.8). Similar observations were made for the spread of the distributions: the spread of the TEC predicted by BOA–SVR (interquartile range (IQR) = 162.4) is wider than that of the historical data (IQR = 130.5) and of the TEC predicted by both the ARIMAX (IQR = 126) and BOA–NARX (IQR = 139.4) models. Thus, these results also indicate that ARIMAX and BOA–NARX performed better than the developed BOA–SVR model, and that BOA–NARX performed the best.

4.4.2. Computational Efficiency of the Developed Models

The average computation time of each model in seconds is summarized in Table 7. The BOA–SVR model shows higher computational efficiency (2.6871 s/run) than the time series model ARIMAX (6.186 s/run) and the BOA–NARX model (12.4476 s/run). Compared to the BOA–NARX model, the ARIMAX model reduces the computation time by about 50%.
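The relative reductions in computation time can be checked directly from the per-run times quoted above. The arithmetic below uses only those three values; the two BOA–SVR comparisons are not stated in the text and are derived here for completeness.

```latex
\begin{align*}
\text{ARIMAX vs. BOA--NARX:}\quad & \frac{12.4476 - 6.186}{12.4476} \times 100\% \approx 50.3\% \\
\text{BOA--SVR vs. ARIMAX:}\quad & \frac{6.186 - 2.6871}{6.186} \times 100\% \approx 56.6\% \\
\text{BOA--SVR vs. BOA--NARX:}\quad & \frac{12.4476 - 2.6871}{12.4476} \times 100\% \approx 78.4\%
\end{align*}
```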

4.4.3. Generalizability of the Developed Models

To analyze the generalizability, the study calculated the future forecasted value for the next four years (2021–2024) using the features generated by the SPSS modeler. Figure 8 demonstrates the four years of future TEC forecasts for all three models. The future forecasting for all three models has a similar trend, and the values are very close to each other, suggesting that the developed models are comparable. Furthermore, due to the latest updated data for TEC in Saudi Arabia that was released in 2021 [48], this study has compared the original value for TEC with the predicted values for 2021 (shown in Table 8). The results indicate a good forecasting ability for all three developed models, while the BOA–NARX performed the best.

4.4.4. Evaluation of the Proposed Model with Other Studies in the Literature

The proposed models were compared with other studies that applied similar methods to forecast the annual electricity consumption (shown in Table 2). Among the previous studies that explored electric energy in Saudi Arabia, the study in [14] investigated the relation between the EC, PL, and GDP by utilizing the VAR model. The results illustrate a growth rate for EC of 7.21% and for PL of 6.87%, and finally, the GDP grew 14.14% higher in the last ten years. In [17], EC in Saudi residential buildings was explored using the STSM approach, considering the effect of energy price, weather conditions, income, and other factors related to energy performance; the results showed a significant relationship between hot weather and electricity consumption in all regions [17]. In [18], the STSM was applied, including the energy prices and the average incomes, to estimate long-term industrial energy consumption, and the authors found that the price increases resulted in a significant decrease in consumption of 6.9%, around 3.0 million tonnes.
Moreover, in [12], electricity consumption, generation, peak load, and installed capacity have been forecasted to measure the electric system in Saudi Arabia, and the results for the proposed SARIMAX model were ( R 2 = 0.99 and MAPE = 0.30). In [19], the study applied linear, quadratic, and sinusoidal models to forecast electricity consumption in Saudi Arabia, including external variables (weather data and temperature), and the results were as follows: linear model R 2 = 0.808, quadratic model R 2 = 0.818, and sinusoidal model R 2 = 0.875. In [30], the consumption of electrical energy for Saudi Arabia’s Eastern province was forecasted based on meteorological, demographic, and economic statistics by utilizing three models. The results revealed that the ARIMA model with APE of 3.8% performed better than multivariate regression at 5.6% and AIM at 8.1% in forecasting [30].
However, in contrast with the studies discussed above, this study included two super learner ML approaches (BOA–SVR and BOA–NARX) to forecast TEC and compared their performances with the widely used statistical approach (ARIMAX). Furthermore, as previously stated, the proposed models had four features, and the feature of refined oil products had not been incorporated as an exogenous variable to forecast TEC in any previous study. Finally, this study applied BOA to select the optimum hyperparameters, which enhanced the prediction performance.
Electricity forecasting is an essential tool to address a country’s long-term energy plans and policies. Saudi Arabia has a growing population, abundant natural resources, and good prospects for industrialization, which requires adequate energy supplies for the economy. However, studies on electricity consumption forecasting in Saudi Arabia are still at an early stage or lacking. Thus, this study will contribute to the development of Saudi Arabia by providing a future view of long-term electricity consumption to ensure the sustainability of electric energy. The benefit of the proposed approach can be seen in Table 6 and Figure 9. All proposed models had a low error (MAPE < 2.4) and a high coefficient of determination ($R^2$ > 97%), suggesting excellent prediction performance. The hybrid BOA–NARX model produced the best forecasting performance (MAPE = 0.3219 and $R^2$ = 99.6%). This study achieved these satisfactory results through the combined use of feature selection methods, the hyperparameter optimization algorithm BOA, and the machine learning algorithm NARX, which exploits the effectiveness of neural networks to achieve the highest prediction performance.

5. Conclusions

The Kingdom of Saudi Arabia is one of the developing countries witnessing remarkable development in many fields, including medical, educational, engineering, and urban, especially in the economic and industrial areas. These require intensive amounts and many types of energy such as fuel, solar, and electricity. Recently, one of the most important objectives of the 2030 vision of Saudi Arabia is to encourage all sectors to adopt recent technologies and solutions to achieve the best practices that ensure the availability of energy resources. In this regard, electricity is the leading energy utilized in all fields, such as residential buildings, factories, schools, hospitals, and industrial fields, and the Saudi Electricity Company mainly provides it to cover all areas and cities in the Kingdom.
The high electricity consumption requires the company that supplies this energy to study and evaluate the current demand and all the factors that increase or decrease electricity consumption in order to maintain its services. Equally important, the digital transformation of sectors has made data available to researchers in all fields to support the setting of future development plans. The total electricity consumption for 2005–2020 in Saudi Arabia was investigated in this study. Several feature selection techniques were executed to select the significant features, namely, population, GDP, refined oil products, and imports. Furthermore, the study utilized statistical and machine learning approaches, namely, ARIMAX, BOA–SVR, and BOA–NARX models, to forecast yearly TEC. The performances of the models were evaluated using various performance indicators. A considerably low error (MAPE < 2.4) and a high coefficient of determination ($R^2$ > 97%) were achieved for all proposed models, indicating excellent prediction performance. However, the best forecasting performance was achieved by the hybrid BOA–NARX model (MAPE = 0.3219 and $R^2$ = 99.6%). Thus, this study will contribute to the development of Saudi Arabia by providing a future view of long-term electricity consumption to ensure the security and sustainability of electric energy. This research work could be extended in the future to apply different machine learning and deep learning algorithms to forecast long-, medium-, and short-term electricity consumption. The residential sector is one of the major sectors of electricity consumption, consuming 50% of the total electricity consumption, and is the most well understood among all sectors. Electricity in the residential sector is mainly consumed for lighting, space cooling, space heating, water heating, and appliances. Thus, it would also be interesting to develop models for forecasting electricity consumption in the residential sector in Saudi Arabia.

Author Contributions

S.H.A.: resources, data collection, software, methodology, visualization, and writing; N.S.: conceptualization, software, methodology, visualization, writing–review and editing, and funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deanship of Scientific Research, Imam Abdulrahman Bin Faisal University, Saudi Arabia, via grant number 2022-005-CSIT.

Data Availability Statement

Datasets are available from the corresponding author on reasonable request.

Acknowledgments

All authors would like to acknowledge Imam Abdulrahman Bin Faisal University for funding this research and other facilities. We thank the anonymous reviewers for carefully reading our manuscript and for many insightful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Khan, A.; Chiroma, H.; Imran, M.; Khan, A.; Bangash, J.I.; Asim, M.; Hamza, M.F.; Aljuaid, H. Forecasting Electricity Consumption Based on Machine Learning to Improve Performance: A Case Study for the Organization of Petroleum Exporting Countries (OPEC). Comput. Electr. Eng. 2020, 86, 106737.
  2. Shadkam, A. Using SARIMAX to Forecast Electricity Demand and Consumption in University Buildings; University of British Columbia: Vancouver, BC, Canada, 2020.
  3. Kaboli, S.H.A.; Selvaraj, J.; Rahim, N.A. Long-Term Electric Energy Consumption Forecasting via Artificial Cooperative Search Algorithm. Energy 2016, 115, 857–871.
  4. Rehman, S.; Cai, Y.; Fazal, R.; Das Walasai, G.; Mirjat, N. An Integrated Modeling Approach for Forecasting Long-Term Energy Demand in Pakistan. Energies 2017, 10, 1868.
  5. Kankal, M.; Uzlu, E. Neural Network Approach with Teaching–Learning-Based Optimization for Modeling and Forecasting Long-Term Electric Energy Demand in Turkey. Neural Comput. Appl. 2017, 28, 737–747.
  6. Yukseltan, E.; Yucekaya, A.; Bilge, A.H. Hourly Electricity Demand Forecasting Using Fourier Analysis with Feedback. Energy Strateg. Rev. 2020, 31.
  7. Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian optimization of machine learning algorithms. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 4, pp. 2951–2959.
  8. Lee, J.; Cho, Y. National-Scale Electricity Peak Load Forecasting: Traditional, Machine Learning, or Hybrid Model? Energy 2022, 239, 122366.
  9. Sutthichaimethee, P.; Naluang, S. The Efficiency of the Sustainable Development Policy for Energy Consumption under Environmental Law in Thailand: Adapting the SEM-Varimax Model. Energies 2019, 12, 3092.
  10. Aurangzeb, K.; Alhussein, M.; Javaid, K.; Haider, S.I. A Pyramid-CNN Based Deep Learning Model for Power Load Forecasting of Similar-Profile Energy Customers Based on Clustering. IEEE Access 2021, 9, 14992–15003.
  11. Khalid, R.; Javaid, N.; Al-zahrani, F.A.; Aurangzeb, K.; Qazi, E.U.H.; Ashfaq, T. Electricity Load and Price Forecasting Using Jaya-Long Short Term Memory (JLSTM) in Smart Grids. Entropy 2020, 22, 10.
  12. Alharbi, F.R.; Csala, D. A Seasonal Autoregressive Integrated Moving Average with Exogenous Factors (SARIMAX) Forecasting Model-Based Time Series Approach. Inventions 2022, 7, 94.
  13. Ouda, M.; El-Nakla, S.; Yahya, C.B.; Omar Ouda, K.M. Electricity demand forecast in Saudi Arabia. In Proceedings of the IEEE 7th Palestinian International Conference on Electrical and Computer Engineering, PICECE 2019, Gaza, Palestine, 26–27 March 2019.
  14. Alsaedi, Y.H.; Tularam, G.A. The Relationship between Electricity Consumption, Peak Load and GDP in Saudi Arabia: A VAR Analysis. Math. Comput. Simul. 2020, 175, 164–178.
  15. Senan, N.A.M.; Mahmood, H.; Liaquat, S. Financial Markets and Electricity Consumption Nexus in Saudi Arabia. Int. J. Energy Econ. Policy 2018, 8, 12–16.
  16. Alkhraijah, M.; Alowaifeer, M.; Alsaleh, M.; Alfaris, A.; Molzahn, D.K. The Effects of Social Distancing on Electricity Demand Considering Temperature Dependency. Energies 2021, 14, 473.
  17. Mikayilov, J.I.; Darandary, A.; Alyamani, R.; Hasanov, F.J.; Alatawi, H. Regional Heterogeneous Drivers of Electricity Demand in Saudi Arabia: Modeling Regional Residential Electricity Demand. Energy Policy 2020, 146, 111796.
  18. Alarenan, S.; Gasim, A.A.; Hunt, L.C. Modelling Industrial Energy Demand in Saudi Arabia. Energy Econ. 2020, 85, 104554.
  19. Al-Zayer, J.; Al-Ibrahim, A.A. Modelling the Impact of Temperature on Electricity Consumption in the Eastern Province of Saudi Arabia. J. Forecast. 1996, 15, 97–106.
  20. Know About Kingdom of Saudi Arabia. Available online: https://www.my.gov.sa/wps/portal/snp/aboutksa (accessed on 3 May 2022).
  21. Emerging Economy. Available online: https://www.my.gov.sa/wps/portal/snp/content/1economic (accessed on 3 May 2022).
  22. Yearly Statistics. Available online: https://www.sama.gov.sa/en-us/EconomicReports/pages/YearlyStatistics.aspx (accessed on 3 May 2022).
  23. Saudi Electricity Company. Available online: https://www.se.com.sa/ar-sa/Pages/AnnualReports.aspx (accessed on 3 May 2022).
  24. Krarti, M.; Aldubyan, M.; Williams, E. Residential Building Stock Model for Evaluating Energy Retrofit Programs in Saudi Arabia. Energy 2020, 195, 116980.
  25. Hadjout, D.; Torres, J.F.; Troncoso, A.; Sebaa, A.; Martínez-Álvarez, F. Electricity Consumption Forecasting Based on Ensemble Deep Learning with Application to the Algerian Market. Energy 2022, 243, 123060.
  26. Peng, L.; Wang, L.; Xia, D.; Gao, Q. Effective Energy Consumption Forecasting Using Empirical Wavelet Transform and Long Short-Term Memory. Energy 2022, 238, 121756.
  27. da Silva, F.L.C.; da Costa, K.; Rodrigues, P.C.; Salas, R.; López-Gonzales, J.L. Statistical and Artificial Neural Networks Models for Electricity Consumption Forecasting in the Brazilian Industrial Sector. Energies 2022, 15, 588.
  28. Saoud, A.; Recioui, A. Load Energy Forecasting Based on a Hybrid PSO LSTM-AE Model. Alger. J. Environ. Sci. 2021, 9, 2886–2894.
  29. Al-Garni, A.Z.; Zubair, S.M.; Nizami, J.S. A Regression Model for Electric-Energy-Consumption Forecasting in Eastern Saudi Arabia. Energy 1994, 19, 1043–1049.
  30. Abdel-Aal, R.E.; Al-Garni, A.Z. Forecasting Monthly Electric Energy Consumption in Eastern Saudi Arabia Using Univariate Time-Series Analysis. Energy 1997, 22, 1059–1069.
  31. Almazrouee, A.I.; Almeshal, A.M.; Almutairi, A.S.; Alenezi, M.R.; Alhajeri, S.N. Long-Term Forecasting of Electrical Loads in Kuwait Using Prophet and Holt–Winters Models. Appl. Sci. 2020, 10, 5627.
  32. Buitrago, J.; Asfour, S. Short-Term Forecasting of Electric Loads Using Nonlinear Autoregressive Artificial Neural Networks with Exogenous Vector Inputs. Energies 2017, 10, 40.
  33. Fahmy, M.S.E.; Ahmed, F.; Durani, F.; Bojnec, Š.; Ghareeb, M.M. Predicting Electricity Consumption in the Kingdom of Saudi Arabia. Energies 2023, 16, 506.
  34. Dürre, A.; Fried, R.; Liboschik, T. Robust Estimation of (Partial) Autocorrelation. Wiley Interdiscip. Rev. Comput. Stat. 2015, 7, 205–222.
  35. Zhang, F.; O’Donnell, L.J. Support Vector Regression; Elsevier Inc.: Amsterdam, The Netherlands, 2019; ISBN 9780128157398.
  36. Mohammadi, K.; Shamshirband, S.; Anisi, M.H.; Amjad Alam, K.; Petković, D. Support Vector Regression Based Prediction of Global Solar Radiation on a Horizontal Surface. Energy Convers. Manag. 2015, 91, 433–441.
  37. Cherkassky, V.S.; Mulier, F. Learning from Data: Concepts, Theory, and Methods; John Wiley & Sons: Hoboken, NJ, USA, 2007; p. 538.
  38. Vapnik, V. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995; ISBN 978-1-4757-3264-1.
  39. Boussaada, Z.; Curea, O.; Remaci, A.; Camblong, H.; Bellaaj, N.M. A Nonlinear Autoregressive Exogenous (NARX) Neural Network Model for the Prediction of the Daily Direct Solar Radiation. Energies 2018, 11, 620.
  40. De Andrade, L.C.M.; Oleskovicz, M.; Santos, A.Q.; Coury, D.V.; Fernandes, R.A.S. Very short-term load forecasting based on NARX recurrent neural networks. In Proceedings of the IEEE Power and Energy Society General Meeting, National Harbor, MD, USA, 27–31 July 2014; Volume 2014, pp. 1–5.
  41. Guzman, S.M.; Paz, J.O.; Tagert, M.L.M. The Use of NARX Neural Networks to Forecast Daily Groundwater Levels. Water Resour. Manag. 2017, 31, 1591–1603.
  42. Chang, D.T. Bayesian Hyperparameter Optimization with BoTorch, GPyTorch and Ax. arXiv 2019, arXiv:1912.05686.
  43. Mockus, J. Global Optimization and the Bayesian Approach; Springer: Berlin/Heidelberg, Germany, 1989; pp. 1–3.
  44. Sultana, N. Predicting Sun Protection Measures against Skin Diseases Using Machine Learning Approaches. J. Cosmet. Dermatol. 2022, 21, 758–769.
  45. Sultana, N.; Hossain, S.M.Z.; Abusaad, M.; Alanbar, N.; Senan, Y.; Razzak, S.A. Prediction of Biodiesel Production from Microalgal Oil Using Bayesian Optimization Algorithm-Based Machine Learning Approaches. Fuel 2022, 309, 122184.
  46. Alam, M.S.; Sultana, N.; Hossain, S.M.Z. Bayesian Optimization Algorithm Based Support Vector Regression Analysis for Estimation of Shear Capacity of FRP Reinforced Concrete Members. Appl. Soft Comput. 2021, 105.
  47. Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. 2019, 17, 26–40.
  48. Statistical Report. Available online: https://www.sama.gov.sa/en-US/EconomicReports/Pages/report.aspx?cid=126 (accessed on 11 December 2022).
  49. Saeys, Y.; Inza, I.; Larranaga, P. A Review of Feature Selection Techniques in Bioinformatics. Bioinformatics 2007, 23, 2507–2517.
  50. Yang, R.; Zhang, C.; Zhang, L.; Gao, R. A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique. Biomed. Res. Int. 2018, 2018, 1–10.
Figure 1. Methodology to forecast TEC in Saudi Arabia.
Figure 2. The overall trend of yearly total electricity consumption (PJ) from 2005 to 2020.
Figure 3. Features ranking.
Figure 4. The sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) plots of TEC. The blue area denotes the confidence interval.
Figure 5. The progression of the BOA approach to tune the hyperparameters of (a) SVR and (b) NARX.
Figure 6. BOA–NARX networks for (a) open loop and (b) closed loop architecture.
Figure 7. Performance plot of the proposed NARX neural network model.
Figure 8. Comparison between the historical data and forecasted TEC of the three proposed models for training, testing, and future four years.
Figure 9. Fitted line plot of the historical data with corresponding forecasted values of ARIMAX, BOA–SVR, and BOA–NARX. Here, R-Sq and R-Sq (adj) represent the coefficient of determination and the adjusted coefficient of determination, respectively.
Figure 10. Clustered column chart (a) and individual value plot (b) for relative deviation of the forecasted TEC from the historical data.
Figure 11. Box plot for historical and forecasted TEC for different modeling approaches.
Table 1. Acronyms used in this paper.

| Acronym | Description | Acronym | Description |
| --- | --- | --- | --- |
| BOA | Bayesian optimization algorithm | AE | Absolute error |
| SVR | Support vector regression | ARIMA | Autoregressive integrated moving average |
| NARX | Nonlinear autoregressive networks with exogenous inputs | LEAP | Long-range energy alternative planning |
| ARIMAX | Autoregressive integrated moving average with exogenous inputs | HDIP | Hydrocarbon development institute |
| GDP | Gross domestic product | OPEC | Organization of Petroleum Exporting Countries |
| R² | The coefficient of determination | ANN | Artificial neural network |
| RMSE | Root mean square error | CSNN | Cuckoo search-based neural network |
| EGDi | Electricity generation demand for ith | APSONN | Artificial particle swarm optimization-based neural network |
| VAR | Vector autoregression | GANN | Genetic algorithm-based neural network |
| ARDL | Autoregressive distributed lag | ABCNN | Artificial bee colony-based neural network |
| ML | Machine learning | LSTM | Long short-term memory |
| DL | Deep learning | KPX | Korea Power Exchange |
| SEC | Saudi Electricity Company | SEM-VARIMAX | Structural equation modeling-vector autoregressive with exogenous variables |
| MARAFIQ | Power and Water Utility Company for Jubail and Yanbu | ECON | Economic factor |
| IWPPs | Independent water and power producers | SOCI | Social component |
| IPPs | Independent power producers | ENVI | Environmental factor |
| TEC | Total electricity consumption | MLR | Multiple linear regression |
| MAE | Mean absolute error | AR | Autoregressive |
| MAPE | Mean absolute percentage error | CNN | Convolutional neural network |
| ACS | Artificial cooperative search | JLSTM | Jaya long short-term memory |
| GA | Genetic algorithm | EL | Electric load |
| PSO | Particle swarm optimization | EP | Electric price |
| ICA | Independent component analysis | UAE | United Arab Emirates |
| CS | Cuckoo search algorithm | PJ | Petajoule |
| SA | Simulated annealing | MA | Moving average |
| DE | Differential evolution | ACF | Autocorrelation function |
| GW | Gigawatt | PACF | Partial autocorrelation function |
| EGDi | Electricity generation demand | GP | The Gaussian process |
| KSA | Kingdom of Saudi Arabia | IQR | Interquartile range |
| ADF | Augmented Dickey–Fuller | RE | Relative error |
| MAPD | Mean absolute percentage deviation | CVRMSE | Coefficient of the variation of the root mean square error |
| MSPE | Mean square percentage error | STSM | Structural time series model |
| AIM | Abductory induction mechanism | EC | Electricity consumption |
Table 2. Comparative table of reported studies.

| Ref. | Region | Data Description | Method | Hyperparameters Tuning | Benchmarked Methods | Metrics | Performance |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [1] | 12 OPEC countries | | CSNN | | APSONN, GANN, ABCNN | MSE | CSNN has the best performance among other models |
| [3] | Iran | annual dataset: 1992–2013; variables: GDP, POP, SI, IMP, EXP | ACS | GA, PSO, ICA, CS, SA, and DE | linear, quadratic, exponential, logarithmic models | AE, RMSE, U-statistic, MAPE | ACS achieved higher performance; RMSE of ACS (exponential) = 0.2495; RMSE of ACS (logarithmic) = 0.1652; MAPE of ACS (exponential) = 1.0435%; MAPE of ACS (logarithmic) = 0.6790% |
| [4] | Pakistan | annual dataset: (HDIP) 1992–2014; electricity, natural gas, oil, coal, and LPG of six sectors (domestic, industrial, commercial, transportation, agriculture and other government sectors) | ARIMA | | Holt-Winter | RMSE, MAPE | ARIMA is more appropriate for energy-demand forecasting; confidence interval of 95%; ARIMA model for electricity: RMSE = 157,556.11, MAPE = 140,742,305.26 |
| [5] | Turkey | annual dataset: from 1980 to 2012; GDP-POP import–export | ANN-TLBO | | ANN-BP, ANNABC | RMSE, MAE | the average root-mean-square error decreased by 42.3 and 39.3%; RE = 1.21%; RMSE = 3.06 TWh; MAE = 2.47 TWh |
| [6] | Turkey | dataset: 2012–2017; hourly, daily, and yearly manner | AR | | Fourier series expansion; feedback-based forecasting | MAPE, RMSPE | MAPE with 0.87% hourly, 2.90% daily, and 3.54% yearly |
| [8] | Korea | the power peak load data from KPX and the weather data were obtained from the National Climate Data Center (2014–2019) | SARIMAX–LSTM | | SARIMAX, ANN, SVR, SARIMAX–ANN, SARIMAX–SVR | MAE, RMSE, MSE, MAPE, R² | SARIMAX–LSTM hybrid model achieved the best results: MAE = 2326.8125, RMSE = 3093.37, MSE = 9568.9936, MAPE = 3.4737, and R² = 0.918 |
| [9] | Thailand | ECON, SOCI, environmental factor (ENVI) and other indicator factors | SEM-VARIMAX | | ARIMA, MLR, backpropagation neural network, ANN, and gray models | RMSE, MAPE | MAPE of 1.06% and RMSE of 1.19% |
| [10] | Australia | the dataset was obtained from Smart Grid Smart City project | CNN; DBSCAN | | | MAPE | the proposed model has achieved improvement by up to 10% of the MAPE |
| [11] | Big data sources | electric load and price | JLSTM | the z-score method, variables, the Jaya optimization method, and normalization | univariate LSTM and SVM | RMSE, MAE | RMSE = 0.02 and 0.04, while MAE was 0.1 and 0.47 for demand and price, respectively |
| [14] | Saudi Arabia | yearly data from 1990 to 2015; GDP, PL | VAR | | Granger causality testing, the impulse response function and forecast error variance decompositions | | EC with 7.21%, while the PL was 6.87%; GDP achieved 14.14% higher than the last 10 years |
| [15] | Saudi Arabia | dataset from 1970 to 2015; economic growth and urbanization | ADF test and ARDL cointegration technique | | | | positive correlation |
| [16] | Saudi Arabia | January 2020–21 June 2020, from 6 to 26 April; effect of social distancing and temperature | linear correlation coefficients | | | | a strong correlation between the temperature and electricity consumption during the curfew |
| [17] | Saudi Arabia | time series data from 1990 to 2018; energy price, weather conditions and income | STSM | | | | all regions showed a significant relationship between hot weather and electricity consumption |
| [12] | Saudi Arabia | electricity consumption, generation, peak load, and installed capacity | SARIMAX | | | MAE, RMSE, MSE, MAPE, R² | R² = 0.99 and MAPE = 0.30, MAE = 0.60, RMSE = 1 |
| [18] | Saudi Arabia | hourly data for 54 prototypes describing the location (Middle–East–West–South) | bottom-up approach | | | | up to 50% reduction |
| [19] | Saudi Arabia | monthly data 1986–1990 (weather: temperature) | AR | | linear, quadratic, and sinusoidal models | MAPD, MSPE, R² | significant at 5% level; linear model R² = 0.808; quadratic model R² = 0.818; sinusoidal model R² = 0.875 |
| [29] | Eastern region in Saudi Arabia | monthly dataset, five years, August 1987–July 1992; POP; weather condition: air temperature, humidity, solar radiation | regression model | | | predicted values were compared to actual values to calculate the differences | weather temperatures significantly affected the demand stability in high and low temperatures |
| [30] | Saudi Arabia | monthly data for six years, August 1987–July 1993; weather parameters, demographic, and economic variables | ARIMA | | AIM and multivariate regression models | APE, MAE | ARIMA: APE = 3.8%, MAE = 0.1308; AIM: APE = 8.1%, MAE = 0.1308; multivariate regression model: APE = 5.6%, MAE = 0.2264 |
| [31] | Kuwait | peak load, dataset obtained powerplants from 2010 to 2020 | The Prophet model | | Holt–Winters model | MAPE, MAE, RMSE, CVRMSE, R² | MAPE = 1.75%, MAE = 147.89, RMSE = 205.64, CVRMSE = 7.61%, and R² = 0.9942 |
| Proposed models | Saudi Arabia | dataset: yearly electricity consumption from 2005 to 2020; GDP, population, import, refined oil products | BOA–SVR; BOA–NARX | BOA | ARIMAX | RMSE, MAE, MAPE, R², relative deviation, five-number summary | ARIMAX: MAPE = 1.9410, MAE = 14.7855, RMSE = 20.1059, R² = 98.8%; BOA–SVR: MAPE = 2.2444, MAE = 17.8103, RMSE = 28.9151, R² = 97.5%; BOA–NARX: MAPE = 0.3205, MAE = 598.7603, RMSE = 1080.6, R² = 99.6% |
Table 3. Results of the correlation between TEC and the studied features.

| Variable | Significant or Not | p-Value | Pearson Test |
| --- | --- | --- | --- |
| Population | Significant | <0.001 | 0.952 ** |
| GDP | Significant | <0.001 | 0.886 ** |
| Total refined products | Significant | 0.001 | 0.728 ** |
| Exports | Not significant | 0.754 | 0.085 |
| Imports | Significant | <0.001 | 0.843 ** |

** represents that the correlation is significant at the 0.01 level.
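The Pearson coefficients in Table 3 measure the linear association between each candidate feature and TEC. The following minimal sketch shows how such a coefficient can be computed and used to screen features. The series, the feature names, and the 0.7 cutoff are hypothetical values chosen for illustration; the study itself judges significance by p-values, which this sketch does not compute.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy series (not the study's data).
tec = [10.0, 12.0, 15.0, 19.0, 24.0]
features = {
    "trending_feature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "unrelated_feature": [3.0, 1.0, 4.0, 1.0, 3.0],
}

threshold = 0.7  # illustrative cutoff, not taken from the paper
selected = [
    name
    for name, values in features.items()
    if abs(pearson_r(values, tec)) >= threshold
]
print(selected)
```

With these toy values only the trending feature passes the cutoff, mirroring how Exports (r = 0.085) is excluded in Table 3 while the four strongly correlated features are retained.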
Table 4. Tuned hyperparameters for ARIMAX, BOA–SVR, and BOA–NARX models.

| Model | Parameter | Range for BOA | Optimized value |
| --- | --- | --- | --- |
| ARIMAX | AR | | 2 |
| ARIMAX | MA | | 8 |
| ARIMAX | d | | 1 |
| BOA–SVR | Kernel function | {‘Gaussian’, ‘Linear’, ‘Polynomial’} | Linear |
| BOA–SVR | Kernel scale | [0.001, 1000] | - |
| BOA–SVR | ε | [0.0014, 144.2987] | 0.0017742 |
| BOA–SVR | Box constraints | [0.001, 1000] | 319.26 |
| BOA–NARX | No. of hidden layers | - | 1 |
| BOA–NARX | Hidden layer size | [5, 30] | 12 |
| BOA–NARX | Input delay | [1, 7] | 6 |
| BOA–NARX | Feedback delay | [1, 7] | 6 |
| BOA–NARX | Training function | - | Levenberg-Marquardt |
| BOA–NARX | Training error | - | MSE |
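The search ranges and tuned values in Table 4 can be written down as a small configuration and checked for consistency, as in the sketch below. The dictionary keys are hypothetical names, and the numeric values assume that the run-together digits in the extracted table split as ε = 0.0017742, box constraint = 319.26, hidden layer size = 12, and both delays = 6, which is the only split that keeps every value inside its stated range.

```python
# Search ranges (low, high) as listed in Table 4.
search_space = {
    "svr_epsilon": (0.0014, 144.2987),
    "svr_box_constraint": (0.001, 1000.0),
    "narx_hidden_layer_size": (5, 30),
    "narx_input_delay": (1, 7),
    "narx_feedback_delay": (1, 7),
}

# Tuned values, under the digit-splitting assumption stated above.
tuned = {
    "svr_epsilon": 0.0017742,
    "svr_box_constraint": 319.26,
    "narx_hidden_layer_size": 12,
    "narx_input_delay": 6,
    "narx_feedback_delay": 6,
}


def within_range(name):
    """Return True if the tuned value lies inside its search range."""
    low, high = search_space[name]
    return low <= tuned[name] <= high


checks = {name: within_range(name) for name in tuned}
print(checks)
```

A check of this kind is a cheap guard when reporting optimizer output, since a value outside its declared range usually signals a transcription or configuration error.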
Table 5. Performance of the developed three forecasting models for the training (from 2005 to 2017) and testing (from 2018 to 2020) datasets.

| Model | Dataset | MAE | RMSE | MAPE |
| --- | --- | --- | --- | --- |
| ARIMAX | Training | 14.9762 | 20.9214 | 2.0741 |
| ARIMAX | Testing | 13.9590 | 16.1015 | 1.3643 |
| BOA–SVR | Training | 16.7835 | 26.4134 | 2.2524 |
| BOA–SVR | Testing | 22.2601 | 37.8932 | 2.2096 |
| BOA–NARX | Training | 3.3921 | 6.4297 | 0.3435 |
| BOA–NARX | Testing | 2.8242 | 4.0295 | 0.2717 |
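Table 5 reports errors separately for the training years (2005 to 2017) and the testing years (2018 to 2020). A chronological split of that kind can be sketched as follows. The helper name and the record layout are assumptions for illustration, and the placeholder values stand in for the yearly TEC observations rather than reproducing the study's data.

```python
def chronological_split(records, last_train_year):
    """Split (year, value) records by time, without shuffling."""
    ordered = sorted(records, key=lambda rec: rec[0])
    train = [rec for rec in ordered if rec[0] <= last_train_year]
    test = [rec for rec in ordered if rec[0] > last_train_year]
    return train, test


# Placeholder values (None) stand in for the yearly TEC observations.
records = [(year, None) for year in range(2005, 2021)]
train, test = chronological_split(records, last_train_year=2017)
print(len(train), len(test))  # 13 training years, 3 testing years
```

Keeping the split ordered in time avoids leaking future observations into the training set, which matters for autoregressive models such as ARIMAX and NARX.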
Table 6. Performance of the developed models for the historical data from 2005 to 2020.

| Metric | ARIMAX | BOA–SVR | BOA–NARX | Improvement of BOA–NARX wrt ARIMAX (%) | Improvement of BOA–NARX wrt BOA–SVR (%) |
| --- | --- | --- | --- | --- | --- |
| MAE | 14.78551 | 17.8103 | 3.2217 | 78 | 82 |
| RMSE | 20.10587 | 28.9151 | 5.8146 | 71 | 80 |
| MAPE | 1.941027 | 2.2444 | 0.3219 | 83 | 86 |
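The improvement percentages in Table 6 are consistent with the relative error reduction (baseline − model) / baseline × 100, rounded to the nearest integer. The sketch below recomputes them from the tabulated errors; the function and variable names are illustrative, and the formula is inferred from the numbers rather than stated in this section.

```python
def improvement_pct(baseline_error, model_error):
    """Relative reduction of an error metric with respect to a baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Error values as listed in Table 6.
table6 = {
    "MAE": {"ARIMAX": 14.78551, "BOA-SVR": 17.8103, "BOA-NARX": 3.2217},
    "RMSE": {"ARIMAX": 20.10587, "BOA-SVR": 28.9151, "BOA-NARX": 5.8146},
    "MAPE": {"ARIMAX": 1.941027, "BOA-SVR": 2.2444, "BOA-NARX": 0.3219},
}

improvements = {
    metric: (
        round(improvement_pct(errors["ARIMAX"], errors["BOA-NARX"])),
        round(improvement_pct(errors["BOA-SVR"], errors["BOA-NARX"])),
    )
    for metric, errors in table6.items()
}
print(improvements)
# {'MAE': (78, 82), 'RMSE': (71, 80), 'MAPE': (83, 86)}
```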
Table 7. Average computation time for each model (in seconds).

| Model | Elapsed Time (s/run) |
| --- | --- |
| ARIMAX | 6.186 |
| BOA–SVR | 2.6871 |
| BOA–NARX | 12.4476 |
Table 8. Comparison between the original TEC value and the predicted value for 2021.

| | TEC (PJ) in 2021 | Residual | RE (%) |
| --- | --- | --- | --- |
| Historical | 1085.6290 | - | - |
| ARIMAX | 1036.9823 | 48.6467 | 4.48 |
| BOA–SVR | 1027.5000 | 58.1290 | 5.35 |
| BOA–NARX | 1050.2000 | 35.4290 | 3.26 |
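The residual and relative error (RE) columns of Table 8 follow from the 2021 historical value and each model's forecast: residual = historical − forecast, and RE = |residual| / historical × 100. The sketch below reproduces them; the names are illustrative, and the RE formula is inferred from the tabulated numbers.

```python
historical_2021 = 1085.6290  # TEC (PJ) in 2021, from Table 8

forecasts_2021 = {
    "ARIMAX": 1036.9823,
    "BOA-SVR": 1027.5000,
    "BOA-NARX": 1050.2000,
}


def residual_and_re(actual, predicted):
    """Return (residual, relative error in percent) for one forecast."""
    residual = actual - predicted
    return residual, 100.0 * abs(residual) / abs(actual)


results = {}
for model, forecast in forecasts_2021.items():
    residual, re_pct = residual_and_re(historical_2021, forecast)
    results[model] = (round(residual, 4), round(re_pct, 2))

print(results)
```

All three models underestimate the 2021 value, and BOA–NARX has the smallest residual, in line with its ranking on the historical data.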
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Almuhaini, S.H.; Sultana, N. Forecasting Long-Term Electricity Consumption in Saudi Arabia Based on Statistical and Machine Learning Algorithms to Enhance Electric Power Supply Management. Energies 2023, 16, 2035. https://doi.org/10.3390/en16042035
