Application of SHAP and Multi-Agent Approach for Short-Term Forecast of Power Consumption of Gas Industry Enterprises

Stepanova, Alina I.; Khalyasmaa, Alexandra I.; Matrenin, Pavel V.; Eroshenko, Stanislav A.

doi:10.3390/a17100447

Ural Power Engineering Institute, Ural Federal University Named after the First President of Russia B.N. Yeltsin, Ekaterinburg 620062, Russia

* Author to whom correspondence should be addressed.

Algorithms 2024, 17(10), 447; https://doi.org/10.3390/a17100447

Submission received: 3 September 2024 / Revised: 6 October 2024 / Accepted: 7 October 2024 / Published: 8 October 2024

(This article belongs to the Special Issue Advanced Artificial Intelligence/Machine Learning Techniques for Safe Operation and Control in Power and Sustainable Energy Systems)

Abstract:

Machine learning methods are now widely applied in the power industry to solve various tasks, including short-term power consumption forecasting. However, their lack of interpretability can lead to incorrect use, potentially resulting in electrical system instability or equipment failures. This article addresses short-term power consumption forecasting as one of the tasks for enhancing the energy efficiency of gas industry enterprises. To reduce the risk of incorrect decisions based on short-term power consumption forecasts produced by machine learning methods, the use of the SHapley Additive exPlanations (SHAP) method is proposed. Additionally, the application of a multi-agent approach to the decomposition of production processes using self-generation agents, energy storage agents, and consumption agents is demonstrated. This can enable the safe operation of critical infrastructure, for instance, by adjusting the operation modes of self-generation units and energy-storage systems, optimizing the power consumption schedule, and reducing electricity and power costs. A comparative analysis of various algorithms for constructing decision tree ensembles was conducted to forecast power consumption by gas industry enterprises with different numbers of categorical features. The experiments demonstrated that using the developed method and production process factors reduced the MAE from 105.00 kWh (MAPE of 16.81%), obtained through expert forecasting, to 15.52 kWh (MAPE of 3.44%). Examples are provided of how SHAP can increase the safety of electrical system management at gas industry enterprises by improving experts’ confidence in the results of the information system.

Keywords:

compressor station of the main gas pipeline; gas industry; machine learning; multi-agent approach; short-term power consumption forecast; SHapley Additive exPlanation

1. Introduction

Oil and gas enterprises use special energy programs to meet the requirements of critical infrastructure operations. These programs are aimed at reducing energy resource costs and consumption, as well as improving energy efficiency and the social responsibility of personnel, which in turn affects energy efficiency [1].

One of the promising measures that can be implemented to enhance the energy efficiency of production processes in the gas industry is the development and implementation of a short-term (1–3 days) load-forecasting system [2,3].

Forecasting of power consumption is essential for planning the operation of power systems [4]. Many countries are introducing economic incentives to promote demand-responsive power consumption [5]. One mechanism that encourages enterprises to plan their daily power consumption schedules is the wholesale electricity and capacity market [2]. By connecting to such a market, gas industry enterprises can benefit from lower tariffs than those on the retail market, provided that their power consumption forecasts are accurate, because electricity tariffs include charges for deviations of actual consumption from planned schedules. The consumption patterns of gas industry enterprises often have a large aperiodic load component, which requires consideration of numerous production factors and makes power consumption forecasting labor-intensive [3].

Thus, the introduction of a short-term power consumption forecasting system can enable the following:

Flattening the power consumption curve;
Optimizing the operation of self-generation units and energy-storage systems;
Optimizing maintenance, repairs, and other support systems;
Minimizing power consumption costs.

Optimizing the operation of self-generation units and energy-storage systems can be achieved because their purpose is to manage the power consumption schedule, cover the enterprise’s own needs, and ensure power supply quality [6].

Optimizing the support system processes becomes possible because increased accuracy in short-term power consumption forecasting allows the enterprise to develop an optimal maintenance schedule based on peak load hours [7].

Cost minimization can be achieved on the wholesale electricity and power market by reducing the effective tariff through load planning and scheduling of controllable load consumers, since wholesale electricity and power tariffs incentivize off-peak power consumption. Additionally, since part of the electricity tariff accounts for deviations in consumed power from the forecasted amount, improving forecast accuracy can help reduce electricity costs [2,8].

Implementing this measure is complicated by the need to account for the influence of numerous production process factors on the power consumption schedule of gas industry enterprises [8]. Generally, existing studies are aimed at short-term forecasts of power consumption in relation to power systems, microgrids, and residential power demand [9,10]. Due to the high proportion of periodic components in electricity consumption processes within power systems, the average forecast accuracy reaches 98–99% [3,10,11,12,13].

For short-term (1–3 days) forecasting of power consumption schedules, various methods are applied, which can be divided into deterministic (statistical) methods and machine learning methods. The former include seasonal models [14] and autoregression-based methods [15,16,17].

Machine learning methods can account for many factors, including weather [9,18] and production [19] factors and their interdependencies [20]. Studies on short-term power consumption forecasting for industrial enterprises that adjust power consumption in real time based on demand within a microgrid are presented in [21,22]. The advantages of machine learning methods that account for different factors, not just seasonality, over deterministic methods have been demonstrated in [20,23,24]. Typically, the best forecast accuracy is achieved using neural network models [25,26,27], including recurrent [28] and deep neural networks [26,29,30], as well as decision tree ensembles [31,32]. Currently, there is a lack of studies on forecasting power consumption for gas industry enterprises.

However, despite the efficiency of machine learning methods, forecast accuracy may reach only 60% because production processes are difficult to account for. To address this issue, a multi-agent approach can be applied. The multi-agent methodology models production processes by decomposing them into agents. It allows the behavior of real production processes to be investigated by constructing loosely coupled autonomous agents that interact with each other to achieve set goals. An agent can be defined as an object that receives data from the environment and acts upon it [33].

Based on the multi-agent approach, a multi-agent system (MAS) can be created, which is defined as a system consisting of several interacting agents. The terms multi-agent system and multi-agent approach are related but refer to different aspects of studying and applying agents. When discussing the term multi-agent system, the main focus is on the system as a whole, the mechanisms of interaction between agents, and between agents and the environment [34]. The application of the multi-agent approach is associated with considering mechanisms for distributing tasks among agents, coordinating their actions, and managing their interactions to achieve individual and common goals. Thus, a multi-agent system is a type of system that consists of several interacting agents, while the multi-agent approach is a methodology for solving problems by decomposing production processes and objects into agents and their interactions. The difference lies in what the focus is on—the system itself or the methodology for designing and analyzing systems [35].

Currently, multi-agent systems (MAS) are applied to solve various optimization problems related to the operation of enterprises and systems. Some of the applications of MAS include but are not limited to the following:

The selection of control strategies for microgrid modes considering energy-storage systems [36,37];
The management of power consumption in a microgrid with photovoltaic stations, introducing a generator agent whose objective function is to minimize power purchase costs based on forecasted generation [38];
The management of a power system with renewable energy generation objects [39];
Processing heterogeneous information in microgrid [40].

The use of MAS for gas industry enterprises was proposed in [41,42]. The authors describe the use of generator agents, power supply system agents, and consumer agents. Agents are used to ensure the reliable operation of the enterprise by optimizing electrical modes, network topology, and power balance. This approach allows for analyzing the operation modes of the system. However, it should be noted that these articles primarily use an object-oriented approach, as the goals of the agents, their input and output data, and the interactions between agents are not defined or described.

It can be assumed that considering production process factors from the perspective of a multi-agent approach may improve the accuracy of power consumption forecasting for gas industry enterprises.

Furthermore, current research on power consumption forecasting primarily focuses on improving forecast accuracy, while the models remain “black boxes” for experts. This complicates the implementation of these models in enterprises because experts lack trust in their results [43]. Studies in the field of eXplainable Artificial Intelligence (XAI) have been conducted to address this issue [44,45]. For complex models that are not interpretable, a post hoc (a posteriori) explanation can be used. Currently, post hoc explanation in short-term power consumption forecasting is mainly represented by algorithms such as Local Interpretable Model-Agnostic Explanations (LIME) [46,47] and SHapley Additive exPlanations (SHAP) [32,48].

The drawback of the LIME method is the need to select, configure, and train a surrogate model, as well as the implicit violation of the accuracy principle, as the hypothesis that the surrogate model’s explanation corresponds to the decision-making process of the model under explanation is an untestable assumption.

For an expert, displaying the features that influenced the decision-making process along with their importance (weights) builds trust in the system and increases the probability of successful joint operation [49]. Therefore, a SHAP-based additive explanation method is relevant for a short-term power consumption forecasting system installed at gas industry enterprises, where the model is used by an expert. The SHapley Additive exPlanations algorithm is based on the theoretically optimal Shapley values from game theory. SHAP determines the influence of each feature on the machine learning model’s results. Currently, SHAP is primarily used for the global explanation of parameter influence on model outputs in short-term power consumption forecasting [50].

The literature review reveals a lack of research on power consumption forecasting for gas industry enterprises. This gap is associated with the complexity of accounting for numerous production process factors.

The objectives of the study are to develop a method for decomposing production processes at a gas industry enterprise using a multi-agent approach and to create a method for short-term power consumption forecasting at a gas industry enterprise based on explainable artificial intelligence.

The contributions of this study are the following:

The MAS-based algorithm for improving the energy efficiency of production processes at gas industry enterprises was proposed;
The necessity of the application of production process factors for increased accuracy of power consumption forecast was experimentally proven;
A comparative analysis of different decision tree ensemble algorithms for power consumption forecasting in a gas industry enterprise with varying numbers of features was conducted;
The possibility of enhancing the interpretability of power consumption forecasts for each hour of the day using the SHAP algorithm, considering changes in SHAP results depending on the applied machine learning model, was explored.

In addition, this study covers data preprocessing, which may be of interest to researchers who process data obtained from real-life power facilities with a low level of digitization in data gathering, processing, and storage.

The article is organized as follows. Section 2 describes the studied object, the proposed data-preprocessing algorithm, the proposed method for decomposing production processes at a gas industry enterprise using a multi-agent approach, the method for short-term power consumption forecasting at a gas industry enterprise based on explainable artificial intelligence, the machine learning methods, and the approach to interpreting the results. Section 3 presents and discusses the results of applying machine learning methods and SHAP to the preprocessed dataset in the short-term power consumption forecasting task. Section 4 concludes the article and describes the future prospects of the study.

2. Materials and Methods

This section describes the method for increasing the energy efficiency of production processes in gas industry enterprises based on the multi-agent approach, the machine learning methods, and the approach to interpreting machine learning models in the problem of short-term power consumption forecasting.

The first step of the algorithm proposed in the article involves applying a multi-agent approach to decompose production processes into objects/agents necessary for short-term power consumption forecasting. However, the short-term power consumption forecasting system itself is not implemented as a multi-agent system.

After data preprocessing, ensemble machine learning methods are employed to obtain short-term power consumption forecast values.

Following the power consumption forecasting results, their interpretation is conducted using the SHAP algorithm.

The pipeline of the proposed algorithm is shown in Figure 1.

2.1. Gas Industry Enterprise under Consideration

The gas industry encompasses the extraction, transportation, storage, and distribution of natural gas. Since the nature of a company’s operations determines the factors influencing production processes, it is essential to identify the type of enterprise beforehand to analyze these processes. In this study, a compressor station was selected as the object within the gas industry. A compressor station of a main gas pipeline is an enterprise in the gas industry, which can be defined as a complex of equipment for increasing gas pressure and cooling during gas transportation through a main gas pipeline. The compressor station discussed in the article consists of three compressor compartments.

The compressor station includes a compressor compartment with gas compressor units (GCU) and cooling units (CU). The GCUs of main gas pipelines are driven by gas turbine units (GTU), asynchronous motors with a capacity of 4.5 MW, or synchronous motors with a capacity of 4 to 25 MW. In addition, the GTUs of GCUs can be used as self-generation units. The most common CUs are air-cooled units (ACU), which use electric fan drives rated from 10 kW to 100 kW. ACU consumption can account for 50–80% of the total compressor station load. Figure 2 illustrates the main elements of the compressor station.

2.2. Initial Dataset

The initial dataset for this article included data from the automated commercial electricity metering system (ACEMS) of the gas industry enterprises and weather data. Data from the ACEMS included three-year values (791 days):

Hourly power consumption by the entire compressor station;
Daily power consumption by ACUs of each compartment;
Daily power consumption by GCUs of each compartment.

Hourly power consumption data for each month was provided in separate files. Daily consumption data of ACUs and GCUs for each day was presented in separate files. The data had missing values.

The statistical characteristics of power consumption for the compressor station under consideration are shown in Table 1.

Analysis of the distribution of hourly power consumption values for the entire period shows a large standard deviation of 541.3 kWh compared to the average value of 675.8 kWh. The difference between the maximum consumption in the winter (2837 kWh) and the minimum in the summer (99 kWh) is 2738 kWh.

The analysis of the statistical characteristics showed that the following factors are to be considered for the short-term power consumption forecast:

The dependence of power consumption on the operating modes of GCUs and ACUs;
The non-stationary nature of gas transportation processes;
The seasonality factor.

Figure 3 presents a fragment of the hourly power consumption for the compressor station from December 2020 to January 2021, demonstrating the high stochasticity of power consumption.

Practice in working with data shows that sometimes, due to low automation of processes, it is not possible to obtain values for production process factors. Therefore, the production process index was introduced by the authors to account for the magnitude of power consumption. By the production process index, we mean a value that indirectly accounts for production processes. Its value was 1 for power consumption greater than 1 MWh; otherwise, it was 0. The value of 1 MWh was selected based on an analysis of the consumption load curve. While we do not provide a specific methodology for selecting this value, we demonstrate that its use improves the accuracy of power consumption forecasting.
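The thresholding rule behind the production process index can be sketched as follows. This is a minimal illustration, assuming that hourly consumption is expressed in kWh (so that 1 MWh corresponds to 1000 kWh); the function name and the sample values are hypothetical, and the source does not state which consumption values (for example, current or lagged) the index is computed from.

```python
from typing import List, Sequence

# 1 MWh expressed in kWh: the threshold stated in the text.
THRESHOLD_KWH = 1000.0


def production_process_index(hourly_kwh: Sequence[float],
                             threshold_kwh: float = THRESHOLD_KWH) -> List[int]:
    """Return 1 where consumption is strictly greater than the threshold, else 0."""
    return [1 if value > threshold_kwh else 0 for value in hourly_kwh]


# Hypothetical sample of hourly consumption values in kWh.
sample = [99.0, 675.8, 1000.0, 1000.1, 2837.0]
index = production_process_index(sample)
print(index)
```

Because the rule is "greater than 1 MWh", a value of exactly 1000 kWh maps to 0 in this sketch.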

Since we are forecasting the power consumption of a compressor station whose production processes are season-dependent, weather data must be considered. The data were taken from the archive of the rp5.ru website for a weather station located 5 km from the compressor station. Using archived weather data instead of forecasted data can be justified by the fact that meteorological parameters are additional factors and that weather forecasts are sufficiently close to the actual weather parameters.

The meteorological data include:

Wind speed (m/s);
Temperature (°C);
Atmospheric pressure (mm Hg);
Humidity (%).

The discretization step for all meteorological parameters was initially 3 h.

Initial research data are shown in Table 2.

2.3. Multi-Agent Approach

In this article, a multi-agent approach is proposed to enhance the energy efficiency of production processes of gas industry enterprises. The algorithm for applying the multi-agent approach can be described as follows:

Defining the objective function of the short-term power consumption forecasting system for the compressor station (in our case study, the energy costs; see Section 2.3.1);
Describing the information model for the short-term power consumption forecast of the compressor station (see Section 2.3.2);
Applying the multi-agent approach to analyze the production processes of the compressor station (see Section 2.3.3), which involves:
Identifying agents based on objects involved in the production processes of the compressor station (in our case study, GCU and ACU);
Defining the objective functions of the agents;
Specifying the input and output data flows of the agents;
Establishing connections between the agents.

2.3.1. Defining the Objective Function of the Short-Term Power Consumption Forecasting System for the Compressor Station

The first step in creating a short-term power consumption forecasting system for the compressor station is to define its overall objective function. The objective function of this system is the energy cost given by Formula (1), subject to constraint (2), which represents the power balance.

S = \sum (P_{ext} \cdot T + f(P_{gen}) \cdot k_{gen} + P_{stor+} \cdot k_{stor}),    (1)

P_{ext} + P_{gen} + P_{stor+} = P_{cons} + P_{stor-} + \Delta P,    (2)

where S is the costs, P_{ext} is the power supplied by the external power system, T is the electricity tariff, f(P_{gen}) is the function relating the amount of fuel used to the output of self-generation units, P_{stor+} is the power output of the energy-storage system, k_{gen} is the utilization factor for self-generation units, k_{stor} is the utilization factor for energy-storage systems, P_{gen} is the power supplied by self-generation units, P_{cons} is the power consumption of load objects (GCU, ACU), P_{stor-} is the power consumed by energy-storage systems during the charge cycle, and \Delta P is the power loss in the power supply system.
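As an illustration of how Formula (1) and constraint (2) fit together, the sketch below evaluates the cost over a list of hourly records and checks the power balance for each of them. It is a minimal sketch under stated assumptions: the summation is taken over hours, the fuel function is a hypothetical linear function, and all names and numerical values are invented for the example rather than taken from the study.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class HourlyBalance:
    p_ext: float       # P_ext: power supplied by the external power system
    p_gen: float       # P_gen: power supplied by self-generation units
    p_stor_out: float  # P_stor+: power output of the energy-storage system
    p_cons: float      # P_cons: consumption of load objects (GCU, ACU)
    p_stor_in: float   # P_stor-: power consumed by storage while charging
    loss: float        # Delta P: power loss in the power supply system
    tariff: float      # T: electricity tariff for this hour


def balance_residual(h: HourlyBalance) -> float:
    """Left-hand side minus right-hand side of constraint (2); zero when balanced."""
    return (h.p_ext + h.p_gen + h.p_stor_out) - (h.p_cons + h.p_stor_in + h.loss)


def total_cost(hours: Iterable[HourlyBalance],
               fuel_function: Callable[[float], float],
               k_gen: float,
               k_stor: float) -> float:
    """Evaluate Formula (1), assuming the sum runs over the given hours."""
    return sum(h.p_ext * h.tariff
               + fuel_function(h.p_gen) * k_gen
               + h.p_stor_out * k_stor
               for h in hours)


# Hypothetical data: two hours, a linear fuel function, arbitrary factors.
hours = [
    HourlyBalance(p_ext=500, p_gen=200, p_stor_out=50,
                  p_cons=700, p_stor_in=0, loss=50, tariff=2.0),
    HourlyBalance(p_ext=300, p_gen=200, p_stor_out=0,
                  p_cons=400, p_stor_in=90, loss=10, tariff=1.0),
]
cost = total_cost(hours, fuel_function=lambda p: 0.5 * p, k_gen=1.0, k_stor=0.5)
print(cost, [balance_residual(h) for h in hours])
```

Keeping the balance check separate from the cost makes it easy to reject infeasible schedules before comparing their costs.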

2.3.2. Description of the Information Model for Short-Term Forecast of Power Consumption of the Compressor Station

To create a short-term power consumption forecasting system, it is necessary to identify the system users and the real-world objects required for forecasting power consumption and to develop an information model of the system. The information model is understood as a set of interconnected entities (objects). An entity is an abstraction of a real object, process, or phenomenon about which information needs to be stored in a database.

The short-term power consumption forecasting system is designed for use by the following subjects:

An expert in power consumption forecasting and energy accounting with expert knowledge in power consumption forecasting;
A chief engineer making decisions related to the production processes of the compressor station.

An expert in power consumption forecasting and energy accounting receives information concerning the schedule of short-term power consumption forecast for enterprises of the gas industry (henceforth—schedule of power consumption forecast), which they can correct in accordance with their knowledge.

A chief engineer receives three schedules:

Operating schedule of energy-storage systems;
Schedule of controllable load consumers;
Operating schedule of self-generation units (henceforth—operating schedule of self-generation).

The chief engineer can make expert adjustments to these schedules.

Appropriate information models based on the multi-agent approach were developed to model generation, storage, and consumption of power by the compressor station. The information model used to generate the “Schedule of power consumption forecast”, “Schedule of controllable load consumers”, and “Operating schedule of energy storage systems” is shown in Figure 4. The information model used to generate the “Operating schedule of self-generation” is shown in Figure 5. Objects formed by the information system are marked green. Objects from adjacent information systems are marked orange. Users who interact with the system are marked blue. Objects entered by users are shown in white. Arrows between objects indicate the direction of the connection: the object at the head of an arrow is formed based on the object at its tail.

The object “Production plan data” is necessary in the information model because, according to the specifics of the production processes, “Data on gas transportation plan” have a discretization ranging from one to seven days. Therefore, the object “Data on gas transportation plan” can be distributed differently among working days depending on the equipment maintenance schedule. The object “Production plan data” is generated by the short-term power consumption forecasting system for the compressor station based on the following objects:

“Production calendar data”;
“Data on gas transportation plan”;
“Generation equipment data”, which is used to arrange the “Operating schedule of self-generation”;
“Equipment data”, which is used to arrange other schedule objects.

The “Schedule of power consumption forecast” relates to the following objects:

“Data on actual power consumption”;
“Weather data”;
“Production plan data”;
“Expert adjustments to schedule of power consumption forecast”.

“Schedule of controllable load consumers” is created for the equipment related to controllable load consumers after “Schedule of power consumption forecast” is formed. This schedule describes changes made in “Schedule of power consumption forecast” depending on the forecast of electricity tariff, forecast of peak load hour, and data on equipment of controllable load consumers. Hence, the object “Schedule of controllable load consumers” relates to the following objects:

“Schedule of power consumption forecast”;
“Data on forecast of electricity tariff”;
“Equipment data”;
“Operating schedule of self-generation”;
“Expert adjustments to schedule of controllable load”.

The object “Schedule of controllable load” is connected to “Operating schedule of self-generation” assuming that the distribution of controllable load value is to be determined according not only to electricity tariff but also to operating schedule of self-generation.

An energy-storage system can be a special case of controllable load since it can function as consumer, generator, and energy-storage device. “Operating schedule of energy storage systems” relates to the following objects:

“Schedule of controllable load”;
“Storage equipment data”;
“Expert adjustments to operating schedule of energy storage system”.

The object “Operating schedule of self-generation” is formed based on interaction with the following objects:

“Data on actual power consumption”;
“Data on actual self-generation”;
“Weather data”;
“Production plan data”;
“Expert adjustments to operating schedule of self-generation”.
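The dependencies listed above imply an order in which the generated objects can be produced. The sketch below encodes them as a dependency graph and derives one valid order with the standard-library `graphlib` module (Python 3.9 or later). It is only an illustration of the listed relations: the dictionary layout is an assumption, and “Schedule of controllable load” is treated here as the same object as “Schedule of controllable load consumers”.

```python
from graphlib import TopologicalSorter

PLAN = "Production plan data"
FORECAST = "Schedule of power consumption forecast"
SELF_GEN = "Operating schedule of self-generation"
LOAD = "Schedule of controllable load consumers"
STORAGE = "Operating schedule of energy storage systems"

# Each generated object maps to the objects it is formed from, as listed in the text.
DEPENDS_ON = {
    PLAN: {"Production calendar data", "Data on gas transportation plan",
           "Generation equipment data", "Equipment data"},
    FORECAST: {"Data on actual power consumption", "Weather data", PLAN,
               "Expert adjustments to schedule of power consumption forecast"},
    SELF_GEN: {"Data on actual power consumption", "Data on actual self-generation",
               "Weather data", PLAN,
               "Expert adjustments to operating schedule of self-generation"},
    LOAD: {FORECAST, "Data on forecast of electricity tariff", "Equipment data",
           SELF_GEN, "Expert adjustments to schedule of controllable load"},
    STORAGE: {LOAD, "Storage equipment data",
              "Expert adjustments to operating schedule of energy storage system"},
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(DEPENDS_ON).static_order())

# Keep only the objects generated by the system, in a valid generation order.
generated = [name for name in order if name in DEPENDS_ON]
print(generated)
```

The forecast schedule and the self-generation schedule do not depend on each other, so their relative order may vary; both follow the production plan data and precede the controllable load schedule, which in turn precedes the storage schedule.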

2.3.3. Application of Multi-Agent Approach for Analyzing Production Processes of Compressor Station

After the objects required to create a schedule of power consumption forecast are defined, agents can be determined. Within the multi-agent approach, an agent is described by a tuple:

T = \langle I_{t}, O_{t}, S_{t}, R_{t}, A_{t} \rangle,    (3)

where t is the moment of time, I_t is the input data of an agent (for example, retrospective power consumption of the compressor station), O_t is the output data (for example, power consumption), S_t is the object state (for example, equipment status), R_t is the rules of agent’s behavior (for example, emergency start-up/shutdown), and A_t is the list of agent actions.

The proposed short-term power consumption forecasting system is considered as a system whose functions are defined by all actions of agents aimed at achieving their goals. In the field of power system control, intelligent information systems may be used only for decision support because of their non-deterministic nature. Thus, in this study, agents are described by the tuple without the actions A_t. Consequently, agents are described by the following tuple:

T = \langle I_{t}, O_{t}, S_{t}, R_{t} \rangle.    (4)

The elements are related as follows:

O_{t+1} = f(I_{t}, S_{t+1}),    (5)

S_{t+1} = f(I_{t}, S_{t}).    (6)
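Relations (5) and (6) can be read as a two-stage update: the state is first advanced from the current input, and the output is then computed from the same input and the new state. The sketch below illustrates this order of evaluation. It is a toy example, not the authors' implementation: the `Agent` class, the assumption that the behavior rules R_t are encoded in the state-update function, and the fan rule with its 20 °C threshold and 100 kW load are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Inputs = Dict[str, Any]
State = Dict[str, Any]


@dataclass
class Agent:
    state: State                                     # S_t
    update_state: Callable[[Inputs, State], State]   # S_{t+1} = f(I_t, S_t), Eq. (6)
    compute_output: Callable[[Inputs, State], Any]   # O_{t+1} = f(I_t, S_{t+1}), Eq. (5)

    def step(self, inputs: Inputs) -> Any:
        # Advance the state first, then derive the output from the new state.
        self.state = self.update_state(inputs, self.state)
        return self.compute_output(inputs, self.state)


def fan_rule(inputs: Inputs, state: State) -> State:
    # Hypothetical rule: the cooling fan runs when the air temperature exceeds 20 C.
    return {"fan_on": inputs["temperature_c"] > 20.0}


def fan_power(inputs: Inputs, state: State) -> float:
    # Hypothetical load of 100 kW while the fan is running.
    return 100.0 if state["fan_on"] else 0.0


acu = Agent(state={"fan_on": False}, update_state=fan_rule, compute_output=fan_power)
outputs = [acu.step({"temperature_c": t}) for t in (10.0, 25.0, 15.0)]
print(outputs)
```

There is no action list in this sketch, which mirrors tuple (4): the agent only reports an output for decision support and does not act on the equipment.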

The objective function of a consumer agent is the reduction in energy costs according to (1).

Compressor stations can include power consumption objects, self-generation objects, and energy-storage system objects. Power consumers can be divided into those involved in the main production process and regulator consumers. A regulator consumer is a power consumer whose operating mode allows for load limitation during peak hours to smooth out the load curve. A special case of a regulator consumer can be a storage system, which can act as a consumer, generator, or storage system for electrical energy depending on economic factors and production process factors.

Thus, all objects can be classified into storage agents, generator agents (in our case study, GTU) and consumer agents (in our case study, ACU and GCU). The agent decomposition for the compressor station is shown in Figure 6.

Input and output parameters for a consumer agent are described by (7) and (8), respectively.

I_{t} = \langle Power, T, Plan, Exp_{cons}, Exp_{reg}, Meteo, Eq, Cal, Act_{gen} \rangle,    (7)

where Power is the retrospective power consumption, T is the forecasted electricity tariff, Plan is the gas transportation plan, Exp_{cons} is the expert adjustments to the schedule of power consumption forecast, Exp_{reg} is the expert adjustments to the schedule of controllable load, Meteo is the weather data, Eq is the equipment data (on/off, resource), Cal is the calendar-related features (time, date, working day/day off), and Act_{gen} is the operating schedule of self-generation objects.

O_{t} = \langle Act_{cons}, Forecast \rangle,    (8)

where Act_{cons} is the schedule of controllable load consumers and Forecast is the schedule of power consumption forecast.

The objective function of both a generator agent and a consumer agent is the reduction in energy costs according to (1), subject to the power balance constraint (2).

Input and output parameters for a generator agent are described by (9) and (10), respectively.

I_{t} = \langle Power, Gen, Plan, Eq_{gen}, Cal, Meteo, Exp_{gen} \rangle,    (9)

where Gen is the retrospective self-generation, Eq_{gen} is the generation equipment data (data on the technical state and on/off state of generation equipment), and Exp_{gen} is the expert adjustments to the operating schedule of self-generation.

O_{t} = \langle Act_{gen} \rangle.    (10)

Input and output parameters for a storage agent are described by (11) and (12), respectively.

I_{t} = \langle Eq_{storage}, Forecast, Exp_{storage} \rangle,    (11)

where Eq_{storage} is the storage equipment data and Exp_{storage} is the expert adjustments to the operating schedule of energy-storage systems.

O_{t} = \langle Act_{storage} \rangle,    (12)

where Act_{storage} is the operating schedule of energy-storage systems.

The agents interact as shown in Figure 7 through the information system, whose information models are discussed above. The schedule of power consumption forecast is produced by consumer agents, which provide indicators of the main production process. Moreover, the consumer agents form the schedule of controllable load consumers to flatten the power consumption curve based on the forecasted electricity tariff and the operating schedule of self-generation. The operating schedules of self-generation and energy storage are produced by a generator agent and a storage agent, respectively, to maintain the power balance.

In this section, a method for decomposing production processes at a compressor station enterprise based on a multi-agent approach is presented. The following sections discuss the developed method for short-term power consumption forecasting.

2.4. Data Preprocessing

The data preprocessing algorithm is shown in Figure 8.

The first step of the data preprocessing algorithm was to create a dataset from separate files of hourly power consumption of the entire compressor station and files of daily power consumption by ACUs and GCUs.

The second step was to convert the matrix of hourly power consumption to a vector.
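The matrix-to-vector conversion can be sketched as follows. This is a minimal illustration that assumes each row of the matrix holds one day and each column one hour of that day; the source does not state the layout, and the data here are synthetic.

```python
import numpy as np

# Hypothetical 3-day matrix: one row per day, one column per hour (0..23).
hourly_matrix = np.arange(3 * 24, dtype=float).reshape(3, 24)

# Row-major flattening keeps the hours in chronological order:
# day 0 hours 0..23, then day 1 hours 0..23, and so on.
hourly_vector = hourly_matrix.reshape(-1)
```

With this layout, element 24 of the vector is the first hour of the second day.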

The third step was to perform linear interpolation for the weather data to obtain hourly values.
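A possible way to carry out this interpolation with pandas is sketched below. The 3 h sampling step follows Table 2; the temperature values, the dates, and the variable names are illustrative assumptions rather than the authors' code.

```python
import pandas as pd

# Hypothetical 3-hourly temperature readings (deg C); values are illustrative only.
index_3h = pd.date_range("2020-09-09 00:00", periods=4, freq="3h")
temperature_3h = pd.Series([10.0, 13.0, 19.0, 16.0], index=index_3h)

# Upsample to an hourly grid (new hours become NaN), then fill them linearly in time.
temperature_1h = temperature_3h.resample("1h").asfreq().interpolate(method="time")
```

Each intermediate hour lies on the straight line between its two neighbouring measurements, which is the linearity assumption discussed later for the second experiment.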

The fourth step was to create an initial dataset by adding the hourly weather data to the power consumption data and performing a Spearman correlation analysis.

The fifth step was to create a dataset for the machine learning application. The power consumption forecast is based on retrospective data:

y_{i}^{*} = f (g (y_{i - h}, y_{i - h - 1}, \dots, y_{i - h - w}), X),

(13)

where y_{i}^{*} is the forecast power consumption at time i, f is the forecasting model, g is the function that defines the rule for selecting actual retrospective values of power consumption, h is the forecasting horizon, w is the width of the window of retrospective data, and X is the set of other features.

After the initial correlation analysis, it was determined that the forecast model can be adequately built with a 12 h time step and a 4-day retrospective depth [8]. This can be justified by the cyclic nature of production processes. In addition, the values 12, 24, and 36 h before the forecast hour were not considered when formulating the schedule of power consumption forecast because these data were not available in the ACEMS at the time the forecast was created. For example, the following data were used to forecast power consumption for 9 September 19:00–20:00:

First, 7 September 19:00–20:00 and 07:00–08:00;
Second, 6 September 19:00–20:00 and 07:00–08:00;
Third, 5 September 19:00–20:00.

Therefore, y_{i}^{*} can be described as

y_{i}^{*} = f (y_{i - 48}, y_{i - 60}, y_{i - 72}, y_{i - 84}, y_{i - 96}, X),

(14)

where X can include the following:

Weather data:
Wind speed in hour i;
Temperature in hour i;
Atmospheric pressure in hour i;
Humidity in hour i;
Production plan data:
Hour number i (from 0 to 23);
Day of the month (from 1 to 31);
Day of the week (from 1 to 7);
Month number (from 1 to 12);
Production process index;
Daily power consumption by ACUs and GCUs.
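The lagged inputs of Formula (14) and the calendar features can be assembled as in the following sketch. The column names, the synthetic series, and the date range are assumptions made for illustration; only the lags (48, 60, 72, 84, and 96 h) and the calendar ranges come from the text.

```python
import numpy as np
import pandas as pd

LAGS_H = (48, 60, 72, 84, 96)  # lags from Formula (14), in hours

# Hypothetical hourly consumption series (kWh); the values are synthetic.
index = pd.date_range("2020-09-01 00:00", periods=10 * 24, freq="1h")
power = pd.Series(np.arange(len(index), dtype=float), index=index, name="power")

features = pd.DataFrame(index=index)
for lag in LAGS_H:
    # shift(lag) moves the series forward, so row i holds y_{i - lag}.
    features[f"power_lag_{lag}"] = power.shift(lag)

# Calendar features listed above; day of week is mapped to 1..7.
features["hour"] = index.hour
features["day"] = index.day
features["day_of_week"] = index.dayofweek + 1
features["month"] = index.month

# Rows without a full 96 h history cannot be used.
dataset = features.assign(target=power).dropna()
```

For the row of 9 September 19:00, the 48 h lag column holds the value of 7 September 19:00, which matches the worked example given before Formula (14).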

The final step involved removing rows with zero power consumption values from the dataset. After preprocessing, the dataset contained 16,896 rows.

The whole dataset was split into training and testing sets in a 90:10 ratio.
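The zero-row removal and the 90:10 split might look like the sketch below. The source does not say whether the split was chronological or random; the code assumes an unshuffled split so that the test rows are the most recent ones, and the small data frame is synthetic.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: 20 hourly rows with two zero-consumption readings.
dataset = pd.DataFrame({
    "power": [0.0, 5.0, 6.0, 0.0] + list(np.linspace(7.0, 22.0, 16)),
    "hour": list(range(20)),
})

# Final preprocessing step: drop rows with zero power consumption.
dataset = dataset[dataset["power"] != 0].reset_index(drop=True)

# 90:10 split without shuffling, so the test rows are the most recent ones.
# Assumption: the source does not state whether the split was chronological.
split_at = int(len(dataset) * 0.9)
train, test = dataset.iloc[:split_at], dataset.iloc[split_at:]
```

Removing zero rows also keeps the MAPE well defined, since that metric divides by the actual consumption.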

2.5. Machine Learning Methods

The following four ensemble regression methods of machine learning were chosen for power consumption forecast: random forest [51], adaptive boosting (AdaBoost) [51], extreme gradient boosting (XGBoost) [52], and light gradient boosting (LightGBM) [53]. They are used to build ensembles from regression decision trees:

y_{i}^{*} = T (Z_{i}),

(15)

where T is a hierarchical system of rules, each of which compares the value of a certain feature with a threshold, and Z_{i} is the vector of input features of the i-th data instance.

Random forest builds a model as an ensemble consisting of k decision trees:

y_{i}^{*} = f (Z_{i}) = \frac{1}{k} \sum_{j = 1}^{k} T_{j} (Z_{i}),

(16)

AdaBoost, XGBoost, and LightGBM are based on the concept of gradual ensemble improvement (boosting):

y_{i}^{*} = f (Z_{i}) = \sum_{j = 1}^{k} w_{j} T_{j} (Z_{i}),

(17)

where w_{j} is the weight coefficient of model j.
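The relation between Formulas (16) and (17) can be checked numerically with scikit-learn, as in the sketch below. The data are synthetic, and the equal weights w_j = 1/k are a deliberate simplification used only to show that the weighted sum reduces to the average; boosting libraries fit their trees and weights sequentially.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data; the features and target are illustrative only.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 4))
y = 3.0 * Z[:, 0] - 2.0 * Z[:, 1] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(Z, y)

# Formula (16): average the k individual tree outputs T_j(Z_i).
tree_outputs = np.stack([tree.predict(Z) for tree in forest.estimators_])
averaged = tree_outputs.mean(axis=0)

# Formula (17) with equal weights w_j = 1/k reduces to the same average.
weights = np.full(len(forest.estimators_), 1.0 / len(forest.estimators_))
weighted = weights @ tree_outputs
```

Here `averaged` should agree with `forest.predict(Z)`, since the random forest regressor returns the mean of its trees.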

In order to compare the results of experiments, the following metrics were calculated: mean absolute percentage error (MAPE), mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R²), defined by Formulas (18)–(21), respectively.

MAPE = \frac{1}{N} \sum_{i = 1}^{N} \frac{|y_{i} - y_{i}^{*}|}{y_{i}},

(18)

MAE = \frac{1}{N} \sum_{i = 1}^{N} |y_{i} - y_{i}^{*}|,

(19)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - y_{i}^{*})}^{2}},

(20)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - y_{i}^{*})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}},

(21)

where y_{i} is the actual i-th value of power consumption, y_{i}^{*} is the forecast i-th value of power consumption, N is the number of values, and \bar{y} is the mean actual value.
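Formulas (18)–(21) translate directly into NumPy, as in the following sketch. The function name and the sample values are hypothetical; MAPE is multiplied by 100 here because the results are reported in percent, a factor that Formula (18) as printed leaves implicit.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAPE (in %), MAE, RMSE and R^2 per Formulas (18)-(21)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mape = 100.0 * np.mean(np.abs(err) / y_true)  # requires y_true > 0
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAPE": mape, "MAE": mae, "RMSE": rmse, "R2": r2}

# Hypothetical actual and forecast values in kWh.
metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

For these three points, the MAPE is 5% and the MAE is about 6.67 kWh.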

Additionally, to assess the lower bound of accuracy, results were obtained for a ridge regression model, that is, linear regression with Tikhonov regularization (henceforth, Ridge). The linear regression model can be formulated as follows [41]:

y_{i}^{*} = X \cdot W + d,

(22)

where X is the vector of independent variables (features, described in Formula (14)), W is the vector of regression coefficients, and d is the bias coefficient.
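As an illustration of Formula (22) with Tikhonov regularization, the sketch below fits the coefficients in closed form with NumPy. It is not the authors' Scikit-Learn pipeline: the function name `fit_ridge`, the penalty name `alpha`, and the synthetic data are assumptions, and the bias is left unpenalized by centering the data first.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge fit; the bias d is left unpenalized via centering."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # W = (Xc^T Xc + alpha * I)^(-1) Xc^T yc
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    d = y_mean - x_mean @ W
    return W, d

# Synthetic data generated from known coefficients (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0

W, d = fit_ridge(X, y, alpha=1e-8)  # tiny penalty: close to least squares
y_hat = X @ W + d                   # Formula (22)
```

A larger `alpha` shrinks the coefficients toward zero; because the model stays linear in the features, it serves as a simple baseline.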

Data preprocessing, model construction, and testing were performed using Python 3 with open-source libraries: Scikit-Learn (AdaBoost, random forest, and Ridge) [51], XGBoost [52], and LightGBM [53].

2.6. Interpretation of Model Output

The SHAP algorithm was applied to interpret the results of the short-term power consumption forecast for the compressor station. This algorithm enables identifying the influence of parameters related to the operation of equipment (ACU and GCU) on the power consumption forecast. The importance of the j-th feature (e.g., gas transportation plan or the daily power consumption of GCU) for the model f is calculated by analyzing its impact on the output results (power consumption forecast) with input data Z_{i}, considering all possible feature combinations, as described in the following [48]:

\varphi_{j} (f, Z_{i}) = \sum_{S \subseteq P \setminus \{j\}} \frac{|S|! (|P| - |S| - 1)!}{|P|!} [f_{S \cup \{j\}} (Z_{i, S \cup \{j\}}) - f_{S} (Z_{i, S})],

(23)

where P is the set of all features, S is a subset of features that does not contain feature j, Z_{i, S} is the input data of instance i restricted to the features in S, f_{S} is the model evaluated using only the features in S, j is the index of the feature, and i is the index of the data instance.
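Formula (23) can be evaluated exactly for a handful of features by enumerating every subset S, as in the sketch below. The toy model, the baseline values used to stand in for features outside S, and all names are assumptions for illustration; the cost grows exponentially with the number of features, so practical libraries rely on faster, model-specific algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values per Formula (23) for a set function value(S)."""
    players = range(n_features)
    phi = []
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                s = frozenset(subset)
                weight = (factorial(len(s)) * factorial(n_features - len(s) - 1)
                          / factorial(n_features))
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model f(z) = 2*z0 + 3*z1*z2; features outside S are
# replaced by baseline values (an assumed way to evaluate f_S).
z = (1.0, 2.0, 3.0)
baseline = (0.0, 1.0, 1.0)

def f_S(subset):
    x = [z[k] if k in subset else baseline[k] for k in range(3)]
    return 2 * x[0] + 3 * x[1] * x[2]

phi = shapley_values(f_S, 3)
```

The contributions sum to the difference between the model output with all features present and the output with none, which is the additivity property used when reading the SHAP plots.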

3. Results and Discussion

3.1. Model Training Results

After the dataset was formed, Spearman’s correlation coefficients were analyzed for the gas transportation plan features (daily power consumption values by GCUs and ACUs and their sums: GCU_1, GCU_2, GCU_3, GCU, ACU_1, ACU_2, ACU_3, and ACU, respectively). Spearman’s correlation coefficients were calculated according to the following:

r = 1 - \frac{6 \sum d_{i}^{2}}{n \cdot (n^{2} - 1)},

(24)

where r is Spearman’s correlation coefficient, d_i is the difference between the ranks of the two variables for the i-th observation, and n is the number of observations.
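Formula (24) can be sketched in a few lines of NumPy. The function name and the sample values are hypothetical, and the formula in this form holds only when neither variable contains tied values.

```python
import numpy as np

def spearman_no_ties(a, b):
    """Spearman's r via Formula (24); valid only when there are no tied values."""
    a, b = np.asarray(a), np.asarray(b)
    rank_a = np.argsort(np.argsort(a)) + 1  # ranks 1..n
    rank_b = np.argsort(np.argsort(b)) + 1
    d = rank_a - rank_b
    n = len(a)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Hypothetical daily consumption values of two units (kWh), no ties.
gcu = [310.0, 295.0, 402.0, 388.0, 350.0]
acu = [120.0, 101.0, 150.0, 160.0, 131.0]
r = spearman_no_ties(gcu, acu)
```

In this example only two observations swap ranks, so the sum of squared rank differences is 2 and r = 1 - 12/120 = 0.9.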

Spearman’s correlation coefficients for data on planned gas transportation are shown in Figure 9.

Removing features whose Spearman’s correlation coefficients exceed 0.70 from the dataset can improve the generalization properties of the model. Therefore, only the total daily values of power consumption by GCUs and ACUs, as well as the daily values of the second ACU (ACU_2), were considered in the experiments.
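One way to apply a 0.70 threshold is sketched below. The greedy, order-dependent rule (keep a column only if it is not strongly correlated with any column already kept) is an assumption, since the source does not describe how the retained features were chosen within a correlated group; the data and names are synthetic.

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.70):
    """Greedily drop the later column of every pair with |Spearman r| > threshold."""
    # Pearson correlation of the ranks equals Spearman's correlation.
    corr = df.rank().corr().abs()
    keep = []
    for col in df.columns:
        if all(corr.loc[col, kept] <= threshold for kept in keep):
            keep.append(col)
    return df[keep]

# Synthetic features: "b" is a monotone transform of "a", "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
frame = pd.DataFrame({"a": a, "b": np.exp(a), "c": rng.normal(size=300)})
selected = drop_correlated(frame)
```

Because "b" is a monotone function of "a", their rank correlation is 1 and "b" is dropped, while the independent column "c" is kept.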

Based on the initial data, the following features were chosen for model training:

Retrospective data on actual power consumption (power consumption (hour − h), where h = 48, 60, 72, 84, and 96 is the number of hours before the forecast hour);
Calendar features (hour, day, day of week, and month);
Data on the gas transportation plan (the production process index and total daily power consumption by GCUs, ACUs, and ACU_2);
Weather factors (air temperature, atmospheric pressure, humidity, and wind speed).

Experiments were conducted considering various features. In addition to comparing the training results of the models, the errors of the expert forecast were used as reference values: MAPE of 16.81% and MAE of 105 kWh.

In the first experiment, only retrospective values of power consumption (y_{i - 48}, y_{i - 60}, y_{i - 72}, y_{i - 84}, y_{i - 96}) and calendar features were considered. The experimental results are shown in Table 3.

LightGBM demonstrated the best results on the test dataset by the MAPE criterion, at 12.38%. This is an improvement of 4.43 percentage points over the expert forecast. It should be noted that the R² for the best result is only 0.67. Consequently, the LightGBM-based model explains a substantial part, but not all, of the variance in the compressor station’s power consumption. Even though the selected features contribute substantially to the model, it still makes errors when predicting individual hours of power consumption. In other words, the stochastic nature of power consumption cannot be accurately described when only retrospective power consumption and calendar features are considered. It is also important to note that underfitting is observed in the Ridge model training: the model fails to find a function that adequately describes the data. Overfitting is observed in the training of the models based on the other methods: they fail to generalize to new data because they fit the specifics of the training data rather than the underlying patterns. Moreover, the MAPE of the AdaBoost model exceeds the MAPE of the expert forecast.

In the second experiment, weather data were added to the features used in the first experiment. The training results are presented in Table 4.

LightGBM demonstrated the best results using the MAPE criterion of 11.99%. The addition of weather data does not lead to a significant improvement in results in comparison with the results of the first experiment. This can be explained by the fact that hourly weather data were obtained through linear interpolation, assuming that the parameter changes were linear, whereas, in reality, the parameters may change nonlinearly between measurements. In order to prevent this unfavorable outcome, a large dataset of weather data should be used to analyze changes in weather parameters.

In the third experiment, the data on the gas transportation plan (daily values of power consumption by GCUs, ACUs, and ACU_2) were added to the features used in the first experiment. The results are shown in Table 5.

The best results in terms of MAPE on the test set were demonstrated by LightGBM and XGBoost, with values of 3.20% and 3.44%, respectively. A comparison of the other metrics shows that XGBoost gives the best overall result, with a higher R² and lower values of MAE and RMSE. Hence, considering production process factors reduces MAPE by 13.37 percentage points compared to the expert forecast. The R² of 0.98 on the testing set indicates a strong relationship between the input parameters and the model results: the model explains 98% of the variance in the compressor station’s power consumption, leaving only 2% unexplained. This means that the model can make very accurate forecasts on previously unseen data. Comparing the MAE of 10.24 kWh and RMSE of 17.74 kWh with the MAE and RMSE values in the first experiment, which were 37.86 kWh and 75.61 kWh, respectively, shows that considering production process factors reduces the forecast error for all analyzed metrics. At the same time, the models based on random forest and LightGBM show good learning results (the MAPE on the test set differs from the MAPE on the training set by less than 4%). The results of the models based on AdaBoost and Ridge also improve.

In the fourth experiment, all features were considered, except for the production process index. The results are presented in Table 6.

The best MAPE on the testing set was achieved by XGBoost and LightGBM, with values of 3.70% and 3.25%, respectively. The higher MAPE compared to the third experiment can be attributed to inaccurate consideration of weather data.

In the fifth experiment, the production process index was added to the features used in the first experiment. The results are presented in Table 7.

LightGBM yielded the best results. Considering the production process index without other production process factors reduced the MAPE by 5.81 percentage points compared to the expert forecast. It is important to note that overfitting is observed in the training of the models based on the other methods.

The summary results of the experiments are shown in Table 8. The absolute and relative improvement in the forecast accuracy was estimated in relation to the MAPE of the expert forecast (16.81%).
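As a worked example of these two quantities, take the expert MAPE of 16.81% and the XGBoost MAPE of 3.44% from the third experiment. The relative improvement is assumed here to be the absolute improvement divided by the expert MAPE; Table 8 may define it differently.

```latex
\Delta_{abs} = 16.81\% - 3.44\% = 13.37 \text{ percentage points}, \qquad
\Delta_{rel} = \frac{16.81 - 3.44}{16.81} \cdot 100\% \approx 79.5\%
```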

Figure 10 and Figure 11 depict the power consumption graphs for the third experiment, which yielded the lowest error values, clearly illustrating the high accuracy of the forecast.

It should be noted that LightGBM demonstrated better results than other machine learning models for a small number of training features. This can be explained by two characteristics of the algorithm [54]:

Use of a leaf-wise tree growth instead of level-wise tree growth;
Bundling of exclusive features to reduce feature space dimensionality.

Thus, LightGBM can be efficient in the case of a small number of training features.

3.2. The Interpretation Examples

For the third experiment, which yielded the best results, the SHAP algorithm was applied to the models based on XGBoost and LightGBM. The results of applying the SHAP algorithm show the significance (weights) of the parameters that influenced the forecast of power consumption for each hour. These weights can be interpreted by an expert and used to make expert adjustments. The features are arranged from the bottom of the graph in ascending order of influence on the predicted value of power consumption f(x) relative to the average value of power consumption across the whole dataset, E[f(X)]. The pink color indicates values that increase the power consumption forecast f(x) relative to E[f(X)], and the blue color indicates values that decrease f(x). Thus, by adding and subtracting the influence values of the factors from E[f(X)], the value of f(x) is obtained. The gray color next to the factor labels shows the values of the factors that influenced the forecast.

The features that influenced the power consumption forecast for the second hour of 6 February 2020 are shown in Figure 12.

The decrease in power consumption forecast of the compressor station (f(x) = 665.51 kWh) in relation to the average value (E[f(X)] = 720.41 kWh) for the results of LightGBM is justified by

Low power consumption 60 h before the forecast hour (1494 kWh);
The absence of consumption by air-cooling unit 2.

The influence of high consumption factors by all GCUs (13,050 kWh) was compensated by relatively low consumption by all ACUs (7430 kWh) and forecast hour (02:00), which is characterized by statistically low power consumption.
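The additive reading of the plot can be written out for this LightGBM case. Assuming the SHAP contributions sum exactly to the gap between the forecast and the dataset average, the values reported above imply a net contribution of all features of about -54.90 kWh:

```latex
f(x) = E[f(X)] + \sum_{j \in P} \varphi_{j}(f, x), \qquad
\sum_{j \in P} \varphi_{j}(f, x) = 665.51 - 720.41 = -54.90 \text{ kWh}
```

The individual contributions shown in Figure 12 are not reproduced here; only their sum follows from the two reported values.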

For the results obtained using XGBoost, the decrease in power consumption of the second hour (f(x) = 634.65 kWh) in relation to the average value (E[f(X)] = 720.40 kWh) is justified by

The forecast hour (02:00), which is characterized by statistically low power consumption;
Low power consumption 48 h before the forecast hour (818 kWh).

The influence of high consumption factors by all GCUs (13,050 kWh) was compensated by relatively low consumption by all ACUs (7430 kWh) and the value of consumption 60 h before the forecast hour (1494 kWh).

Despite the difference in model construction, the same features had the largest weight in the decision-making process for both models.

The features that influenced the forecast for midnight on 3 April 2020 are shown in Figure 13.

For the results obtained using LightGBM, the decrease in power consumption forecast (f(x) = 635.98 kWh) in relation to the average value is mainly explained by the forecast hour (00:00), which is characterized by statistically low power consumption. Other factors were compensated for by each other.

For the results obtained using XGBoost, the decrease in power consumption (f(x) = 608.30 kWh) in relation to the average value is explained by low power consumption 48 and 60 h before the forecast hour (818 kWh). Other factors were compensated for by each other.

Despite the differences in model construction, it can be noted that both models assigned the highest weight to the same features when making decisions.

The overall feature importance is presented in Table 9. It can be noticed that, despite XGBoost making decisions mainly based on the daily power consumption of ACU, other factors can have an influence on certain forecast hours. The same conclusion can be made for LightGBM. Thus, the usage of the SHAP algorithm improves the interpretability of results, resulting in greater trust by experts in training results for machine learning models.

4. Conclusions

The algorithm for improving the safe operation of gas industry enterprises and the energy efficiency of their production processes using a multi-agent approach was proposed in this article. This approach considers the interdependence of various production processes. The developed method was applied to solve the short-term power consumption forecasting problem for a compressor station. The research indicates that the accuracy of short-term power consumption forecasts can be enhanced through the modeling of production processes using a multi-agent approach.

The experiments were conducted using various ensemble models with different features on real three-year data of a compressor station. It was found that considering production process factors can reduce the MAPE by 13.37 percentage points relative to the expert forecast. Specifically, the MAE obtained by XGBoost decreased from 105 kWh for the expert forecast to 15.57 kWh. In comparison with the experiment where only retrospective power consumption and the production calendar were considered, R² increased from 0.67 to 0.98, MAE decreased from 61.94 kWh to 15.57 kWh, and RMSE decreased from 91.22 kWh to 22.39 kWh. It was also found that LightGBM provides better results for models built on a dataset with a small number of input features.

The SHAP algorithm was applied to enhance the interpretability of the results. The use of this algorithm can increase experts’ trust in the results provided by intelligent information systems. It should be noted that although SHAP facilitates the interpretation of machine learning results by visualizing parameters, this algorithm does not allow for complete explainability of the results. Therefore, the proposed method can only be used if an expert with competencies in enterprise consumption forecasting interacts with the explanation results.

Further research is planned to focus on the short-term forecasting of self-generation in oil and gas industry enterprises, scheduling of controllable load, scheduling of energy-storage systems, and experiments on data from other oil and gas industry enterprises. In addition, the developed method is planned to be applied to consider production processes for other optimization problems in power systems.

Author Contributions

Conceptualization, A.I.S., A.I.K. and P.V.M.; methodology, A.I.S. and P.V.M.; software, A.I.S.; validation, A.I.S. and P.V.M.; formal analysis, A.I.S. and A.I.K.; investigation, A.I.S., A.I.K. and P.V.M.; resources, S.A.E.; data curation, A.I.S. and S.A.E.; writing—original draft preparation, A.I.S.; writing—review and editing, A.I.S. and P.V.M.; visualization, A.I.S.; supervision, A.I.K. and P.V.M.; project administration, A.I.K. and S.A.E.; funding acquisition, A.I.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was carried out within the state assignment with the financial support of the Ministry of Science and Higher Education of the Russian Federation (subject No. FEUZ-2022-0030, development of an intelligent multi-agent system for modeling deeply integrated technological systems in the power industry).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Midor, K.; Ivanova, T.N.; Molenda, M.; Biały, W.; Zakharov, O.V. Aspects of Energy Saving of Oil-Producing Enterprises. Energies 2022, 15, 259.
2. Lee, E.; Kim, J.; Jang, D. Load Profile Segmentation for Effective Residential Demand Response Program: Method and Evidence from Korean Pilot Study. Energies 2020, 13, 1348.
3. Li, K.; Yang, Z.; Li, D.; Xing, Y.Y.; Nai, W. A Short-Term Forecasting Approach for Regional Electricity Power Consumption by Considering Its Co-movement with Economic Indices. In Proceedings of the 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 12–14 June 2020; pp. 551–555.
4. Park, S.; Ryu, S.; Choi, Y.; Kim, J.; Kim, H. Data-Driven Baseline Estimation of Residential Buildings for Demand Response. Energies 2015, 8, 10239–10259.
5. Faria, P.; Vale, Z. Demand Response in Smart Grids. Energies 2023, 16, 863.
6. Schulz, J.; Leinmüller, D.; Misik, A.; Zaeh, M.F. Renewable On-Site Power Generation for Manufacturing Companies—Technologies, Modeling, and Dimensioning. Sustainability 2021, 13, 3898.
7. Ben Mabrouk, A.; Chelbi, A.; Aguir, M.S.; Dellagi, S. Optimal Maintenance Policy for Equipment Submitted to Multi-Period Leasing as a Circular Business Model. Sustainability 2024, 16, 5238.
8. Stepanova, A.I.; Matrenin, P.V. New Load Forecasting Ensemble Model based on LightGBM for Gas Industry Enterprises. In Proceedings of the 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), Yekaterinburg, Russia, 13–15 May 2024; pp. 71–75.
9. Rodrigues, F.; Cardeira, C.; Calado, J.M.F.; Melicio, R. Short-Term Load Forecasting of Electricity Demand for the Residential Sector Based on Modelling Techniques: A Systematic Review. Energies 2023, 16, 4098.
10. Koukaras, P.; Mustapha, A.; Mystakidis, A.; Tjortjis, C. Optimizing Building Short-Term Load Forecasting: A Comparative Analysis of Machine Learning Models. Energies 2024, 17, 1450.
11. Caro, E.; Juan, J.; Nouhitehrani, S. Optimal Selection of Weather Stations for Electric Load Forecasting. IEEE Access 2023, 11, 42981–42990.
12. Chapagain, K.; Gurung, S.; Kulthanavit, P.; Kittipiyakul, S. Short-Term Electricity Demand Forecasting Using Deep Neural Networks: An Analysis for Thai Data. Appl. Syst. Innov. 2023, 6, 100.
13. Stamatellos, G.; Stamatelos, T. Short-Term Load Forecasting of the Greek Electricity System. Appl. Sci. 2023, 13, 2719.
14. Fan, J.; Liu, X.; Li, Z.; Wang, X.; Cao, S.; Lei, J. Power load forecasting research based on neural network and Holt-winters method. IOP Conf. Series Earth Environ. Sci. 2021, 692, 022120.
15. Chodakowska, E.; Nazarko, J.; Nazarko, L. ARIMA Models in Electrical Load Forecasting and Their Robustness to Noise. Energies 2021, 14, 7952.
16. Sharma, S.; Majumdar, A.; Elvira, V.; Chouzenoux, E. Blind Kalman Filtering for Short-Term Load Forecasting. IEEE Trans. Power Syst. 2020, 35, 4916–4919.
17. Madhukumar, M.; Sebastian, A.; Liang, X.; Jamil, M.; Shabbir, M.N.S.K. Regression Model-Based Short-Term Load Forecasting for University Campus Load. IEEE Access 2022, 10, 8891–8905.
18. Sergeev, N.; Matrenin, P. Enhancing Efficiency of Ensemble Machine Learning Models for Short-Term Load Forecasting through Feature Selection. In Proceedings of the 2022 IEEE 23rd International Conference of Young Professionals in Electron Devices and Materials (EDM), Altai, Russia, 30 June–4 July 2022; pp. 368–371.
19. Habbak, H.; Mahmoud, M.; Metwally, K.; Fouda, M.M.; Ibrahem, M.I. Load Forecasting Techniques and Their Applications in Smart Grids. Energies 2023, 16, 1480.
20. Akhtar, S.; Shahzad, S.; Zaheer, A.; Ullah, H.S.; Kilic, H.; Gono, R.; Jasiński, M.; Leonowicz, Z. Short-Term Load Forecasting Models: A Review of Challenges, Progress, and the Road Ahead. Energies 2023, 16, 4060.
21. Slowik, M.; Urban, W. Machine Learning Short-Term Energy Consumption Forecasting for Microgrids in a Manufacturing Plant. Energies 2022, 15, 3382.
22. Yu, F.; Wang, L.; Jiang, Q.; Yan, Q.; Qiao, S. Self-Attention-Based Short-Term Load Forecasting Considering Demand-Side Management. Energies 2022, 15, 4198.
23. Ryu, S.; Noh, J.; Kim, H. Deep Neural Network Based Demand Side Short Term Load Forecasting. Energies 2017, 10, 3.
24. Szczepaniuk, H.; Szczepaniuk, E.K. Applications of Artificial Intelligence Algorithms in the Energy Sector. Energies 2023, 16, 347.
25. Shiwakoti, R.K.; Charoenlarpnopparut, C.; Chapagain, K. A Deep Learning Approach for Short-Term Electricity Demand Forecasting: Analysis of Thailand Data. Appl. Sci. 2024, 14, 3971.
26. Gonzalez, R.; Ahmed, S.; Alamaniotis, M. Implementing Very-Short-Term Forecasting of Residential Load Demand Using a Deep Neural Network Architecture. Energies 2023, 16, 3636.
27. Chen, J.; Liu, L.; Guo, K.; Liu, S.; He, D. Short-Term Electricity Load Forecasting Based on Improved Data Decomposition and Hybrid Deep-Learning Models. Appl. Sci. 2024, 14, 5966.
28. Lee, D.; Kim, J.; Kim, S.; Kim, K. Comparison Analysis for Electricity Consumption Prediction of Multiple Campus Buildings Using Deep Recurrent Neural Networks. Energies 2023, 16, 8038.
29. Kang, T.; Lim, D.Y.; Tayara, H.; Chong, K.T. Forecasting of Power Demands Using Deep Learning. Appl. Sci. 2020, 10, 7241.
30. Son, N. Comparison of the Deep Learning Performance for Short-Term Power Load Forecasting. Sustainability 2021, 13, 12493.
31. Klyuev, R.V.; Morgoeva, A.D.; Gavrina, O.A.; Bosikov, I.I.; Morgoev, I.D. Forecasting planned electricity consumption for the united power system using machine learning. J. Min. Inst. 2023, 261, 392–402.
32. Singh, N.K.; Nagahara, M. LightGBM-, SHAP-, and Correlation-Matrix-Heatmap-Based Approaches for Analyzing Household Energy Data: Towards Electricity Self-Sufficient Houses. Energies 2024, 17, 4518.
33. Cui, Y.; Xu, Y.; Wang, Y.; Zhao, Y.; Zhu, H.; Cheng, D. Peer-to-peer energy trading with energy trading consistency in interconnected multi-energy microgrids: A multi-agent deep reinforcement learning approach. Int. J. Electr. Power Energy Syst. 2024, 156, 109753.
34. Li, Y.; Tan, C. A survey of the consensus for multi-agent systems. Syst. Sci. Control Eng. 2019, 7, 468–482.
35. Perez-Pons, M.-E.; Domingues, J.P.; Anzola-Rojas, C.; Barroso, R.J.D.; Miguel, I.; Queiroz, J.; Leitao, P. A Brief Review on Multi-Agent System Approaches and Methodologies. In Proceedings of the IV Workshop on Disruptive Information and Communication Technologies for Innovation and Digital Transformation, Online, 18 June 2021; pp. 35–47.
36. Bui, V.-H.; Hussain, A.; Kim, H.-M. Q-Learning-Based Operation Strategy for Community Battery Energy Storage System (CBESS) in Microgrid System. Energies 2019, 12, 1789.
37. Zhou, H.; Erol-Kantarci, H. Correlated Deep Q-learning based Microgrid Energy Management. In Proceedings of the 2020 IEEE 25th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), Pisa, Italy, 14–16 September 2020; pp. 1–6.
38. Arwa, E.O.; Folly, K.A. Improved Q-learning for Energy Management in a Grid-tied PV Microgrid. SAIEE Afr. Res. J. 2021, 112, 77–88.
39. Li, Q.; Lin, T.; Yu, Q.; Du, H.; Li, J.; Fu, X.; Li, Q. Review of Deep Reinforcement Learning and Its Application in Modern Renewable Power System Control. Energies 2023, 16, 4143.
40. Yu, Q.; Wang, X.; Lv, D.; Qi, B.; Wei, Y.; Liu, L.; Zhang, P.; Zhu, W.; Zhang, W. Data Fusion and Situation Awareness for Smart Grid and Power Communication Network Based on Tensor Computing and Deep Reinforcement Learning. Electronics 2023, 12, 2606.
41. Pavlov, N.V.; Petrochenkov, A.B. Multi-agent Approach to Modeling of Electrotechnical Complexes Elements at the Oil and Gas Production Enterprise. In Proceedings of the 2021 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus), St. Petersburg, Russia, 26–29 January 2021; pp. 1504–1508.
42. Hanga, K.M.; Kovalchuk, Y. Machine learning and multi-agent systems in oil and gas industry applications: A survey. Comput. Sci. Rev. 2019, 34, 100191.
43. Xu, C.; Liao, Z.; Li, C.; Zhou, X.; Xie, R. Review on Interpretable Machine Learning in Smart Grid. Energies 2022, 15, 4427.
44. Ahmed, I.; Jeon, G.; Piccialli, F. From Artificial Intelligence to Explainable Artificial Intelligence in Industry 4.0: A Survey on What, How, and Where. IEEE Trans. Ind. Inform. 2022, 18, 5031–5042.
45. Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
46. Ribeiro, M.T.; Singh, S.; Guestrin, C. Why Should I Trust You? Explaining the Predictions of Any Classifier. 2016. Available online: https://arxiv.org/abs/1602.04938 (accessed on 30 June 2024).
47. Grzeszczyk, T.A.; Grzeszczyk, M.K. Justifying Short-Term Load Forecasts Obtained with the Use of Neural Models. Energies 2022, 15, 1852.
48. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 4768–4777.
49. Matrenin, P.V.; Gamaley, V.V.; Khalyasmaa, A.I.; Stepanova, A.I. Solar Irradiance Forecasting with Natural Language Processing of Cloud Observations and Interpretation of Results with Modified Shapley Additive Explanations. Algorithms 2024, 17, 150.
50. Maarif, M.R.; Saleh, A.R.; Habibi, M.; Fitriyani, N.L.; Syafrudin, M. Energy Usage Forecasting Model Based on Long Short-Term Memory (LSTM) and eXplainable Artificial Intelligence (XAI). Information 2023, 14, 265.
51. Ensembles: Gradient Boosting, Random Forests, Bagging, Voting, Stacking. Available online: https://scikit-learn.org/stable/modules/ensemble.html (accessed on 30 June 2024).
52. XGBoost. Introduction to Boosted Trees. Available online: https://xgboost.readthedocs.io/en/stable/tutorials/model.html (accessed on 30 June 2024).
53. LightGBM. Available online: https://lightgbm.readthedocs.io/en/stable/ (accessed on 30 June 2024).
54. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, Y.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 1–9.

Figure 1. Pipeline of the proposed method.

Figure 2. The main elements of typical compressor station.

Figure 3. Actual power consumption of compressor station from December 2020 to January 2021.

Figure 4. The first information model used to generate “Schedule of power consumption forecast”, “Schedule of controllable load consumers”, and “Operating schedule of energy storage systems”.

Figure 5. The second information model used to generate “Operating schedule of self-generation”.

Figure 6. The agent decomposition of elements of compressor station.

Figure 7. Interaction of storage agent, generator agent, and consumer agent.

Figure 8. The algorithm of data preprocessing.

Figure 9. Spearman’s correlation coefficients.

Figure 10. Actual and forecast power consumption from December 2020 to January 2021.

Figure 11. Actual and forecast power consumption from September 2020 to October 2021.

Figure 12. Influence of the features on the power consumption forecast for the 2nd hour of 6 February 2020 obtained by (a) LightGBM; (b) XGBoost.

Figure 13. Influence of the features on the forecast of power consumption for midnight on 3 April 2020 obtained by (a) LightGBM; (b) XGBoost.
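
Figures 12 and 13 attribute a single forecast to individual features. As a minimal sketch of the underlying idea, and not the authors' implementation, the following Python code computes exact Shapley values by brute force for a hypothetical three-feature model. The function `toy_model`, its coefficients, the feature names, and the baseline are illustrative assumptions; practical SHAP tooling for tree ensembles relies on much faster specialized algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to baseline.

    Features absent from a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical load model (kWh); coefficients are made up."""
    acu_on, gcu_on, lag_48 = features
    return (100.0 + 150.0 * acu_on + 80.0 * gcu_on
            + 0.5 * lag_48 + 20.0 * acu_on * gcu_on)


x = [1.0, 1.0, 400.0]          # hour being explained (hypothetical)
baseline = [0.0, 0.0, 300.0]   # reference input (hypothetical)
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the forecast for `x` and the forecast for the baseline. This additivity property is what allows such plots to be read as a decomposition of one forecast.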

Table 1. Statistical characteristics of power consumption of compressor station.

Season/Year	Average Value, kWh	Standard Deviation, kWh	Min Value, kWh	Max Value, kWh
Seasonal consumption
Whole dataset	675.8	541.3	99.4	2837.0
Summer	270.4	155.6	99.4	1556.0
Autumn	983.9	604.9	125.3	2797.2
Winter	1307.5	453.6	458.5	2837.0
Spring	476.3	230.4	123.2	2017.8
Consumption by year for one season (autumn)
2019	430.2	108.1	266.0	737.0
2020	1400.2	493.6	207.2	2797.2
2021	511.7	288.7	125.3	1808.0
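
The characteristics in Table 1 can be obtained by grouping the hourly series by season. The following sketch shows one way to do this under stated assumptions: the sample records are invented, the month-to-season mapping follows the usual meteorological convention, and the population standard deviation is used because the source does not say which estimator was applied.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Assumed meteorological seasons (not specified in the source)
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def seasonal_stats(records):
    """records: iterable of (month, hourly consumption in kWh)."""
    groups = defaultdict(list)
    for month, kwh in records:
        groups[SEASON_BY_MONTH[month]].append(kwh)
    return {
        season: {
            "mean": mean(values),
            "std": pstdev(values),  # population estimator (assumption)
            "min": min(values),
            "max": max(values),
        }
        for season, values in groups.items()
    }


# Invented sample values, for illustration only
records = [(1, 1300.0), (1, 1500.0), (7, 250.0), (7, 300.0), (7, 200.0)]
stats = seasonal_stats(records)
print(stats["Winter"], stats["Summer"])
```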

Table 2. Initial data description.

Parameter	Units	Time Sampling Step, h	Source
Hourly power consumption of entire compressor station	kWh	1	ACEMS
Daily power consumption of ACU 1	kWh	24	ACEMS
Daily power consumption of ACU 2	kWh	24	ACEMS
Daily power consumption of ACU 3	kWh	24	ACEMS
Total daily power consumption of ACUs	kWh	24	ACEMS
Daily power consumption of GCU 1	kWh	24	ACEMS
Daily power consumption of GCU 2	kWh	24	ACEMS
Daily power consumption of GCU 3	kWh	24	ACEMS
Total daily power consumption of GCUs	kWh	24	ACEMS
Production process index	-	1	Authors
Wind speed	m/s	3	rp5.ru
Temperature	°C	3	rp5.ru
Atmospheric pressure	mm Hg	3	rp5.ru
Humidity	%	3	rp5.ru

Table 3. Results of power consumption forecast for the compressor station in the first experiment.

Model	MAE Train., kWh	MAE Val., kWh	MAPE Train., %	MAPE Val., %	RMSE Train., kWh	RMSE Val., kWh	R² Train.	R² Val.
AdaBoost	31.35	53.75	8.46	19.94	39.46	86.75	1.00	0.57
XGBoost	53.45	41.59	10.12	13.58	79.35	79.25	0.98	0.64
Random Forest	43.79	40.59	7.93	12.90	66.58	81.71	0.99	0.61
LightGBM	61.94	37.86	11.77	12.38	91.22	75.61	0.97	0.67
Ridge	111.39	43.52	19.18	14.70	175.35	84.25	0.90	0.59
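
Tables 3–7 report four accuracy metrics for each model. As a sketch, assuming the standard definitions of MAE, MAPE, RMSE, and the coefficient of determination, they can be computed as follows. The function name `forecast_metrics` and the sample values are hypothetical.

```python
from math import sqrt


def forecast_metrics(actual, forecast):
    """Return MAE (kWh), MAPE (%), RMSE (kWh) and R² for two equal-length series."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    ss_res = sum(e * e for e in errors)
    rmse = sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mape": mape, "rmse": rmse, "r2": r2}


# Invented hourly values in kWh, for illustration only
actual = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 310.0, 380.0]
metrics = forecast_metrics(actual, forecast)
print(metrics)
```

MAPE weights each error by the actual load, so the same absolute error counts more in low-consumption hours than in high-consumption hours, whereas MAE and RMSE are insensitive to the load level.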

Table 4. Results of forecasting the power consumption of the compressor station in the second experiment.

Model	MAE Train., kWh	MAE Val., kWh	MAPE Train., %	MAPE Val., %	RMSE Train., kWh	RMSE Val., kWh	R² Train.	R² Val.
AdaBoost	29.26	50.51	7.94	17.91	36.47	85.28	1.00	0.58
XGBoost	71.57	51.84	13.21	17.88	108.91	87.79	0.96	0.56
Random Forest	40.73	42.46	7.43	13.35	62.08	83.85	0.99	0.59
LightGBM	48.68	38.61	9.50	11.99	70.29	78.06	0.98	0.65
Ridge	110.23	49.61	18.81	16.81	172.08	87.09	0.91	0.56

Table 5. Results of power consumption forecast for the compressor station in the third experiment.

Model	MAE Train., kWh	MAE Val., kWh	MAPE Train., %	MAPE Val., %	RMSE Train., kWh	RMSE Val., kWh	R² Train.	R² Val.
AdaBoost	14.75	23.40	4.22	8.98	18.35	36.84	1.00	0.92
XGBoost	15.52	10.24	2.66	3.44	22.57	17.74	1.00	0.98
Random Forest	14.31	17.93	2.42	5.60	22.02	36.75	1.00	0.92
LightGBM	11.58	11.19	2.05	3.20	16.54	26.30	1.00	0.96
Ridge	49.62	18.51	7.52	6.52	76.18	29.91	0.98	0.95

Table 6. Results of power consumption forecast for the compressor station in the fourth experiment.

Model	MAE Train., kWh	MAE Val., kWh	MAPE Train., %	MAPE Val., %	RMSE Train., kWh	RMSE Val., kWh	R² Train.	R² Val.
AdaBoost	14.73	23.35	4.21	9.10	18.31	36.04	1.00	0.93
XGBoost	15.57	10.43	2.71	3.70	22.38	17.12	1.00	0.98
Random Forest	14.35	17.69	2.41	5.52	22.04	36.67	1.00	0.92
LightGBM	11.53	11.31	2.50	3.25	16.39	25.88	1.00	0.96
Ridge	49.55	19.33	7.59	6.72	75.75	31.47	0.98	0.94

Table 7. Results of power consumption forecast for the compressor station in the fifth experiment.

Model	MAE Train., kWh	MAE Val., kWh	MAPE Train., %	MAPE Val., %	RMSE Train., kWh	RMSE Val., kWh	R² Train.	R² Val.
AdaBoost	32.47	45.48	7.69	15.45	40.45	75.87	1.00	0.67
XGBoost	54.49	42.11	10.43	14.62	80.59	69.50	0.98	0.72
Random Forest	40.52	40.41	7.61	14.11	60.65	74.70	0.99	0.68
LightGBM	60.42	35.02	10.52	11.00	89.70	67.04	0.97	0.74
Ridge	104.89	63.08	18.93	26.04	155.47	85.58	0.92	0.57

Table 8. Summary results of experiments on short-term forecasting of power consumption by a compressor station.

Experiment	Model	MAPE, %	Improvement over Expert Forecast MAPE, percentage points	Improvement over Expert Forecast MAPE, p.u.
1	LightGBM	12.38	4.43	0.26
2	LightGBM	11.99	4.82	0.29
3	XGBoost	3.44	13.37	0.80
4	XGBoost	3.70	13.12	0.78
5	LightGBM	11.00	5.81	0.35
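
The improvement columns of Table 8 appear to be measured against the expert forecast MAPE of 16.81% quoted in the abstract. The sketch below reproduces the arithmetic under that assumption: the first value is the difference in percentage points, and the second is that difference divided by the expert MAPE. The function name `improvement` is hypothetical.

```python
EXPERT_MAPE = 16.81  # %, expert forecast MAPE quoted in the abstract

# Best validation MAPE per experiment, transcribed from Table 8
best_mape = {1: 12.38, 2: 11.99, 3: 3.44, 4: 3.70, 5: 11.00}


def improvement(mape, reference=EXPERT_MAPE):
    """Return (percentage points, per-unit share) of MAPE reduction."""
    points = reference - mape
    return round(points, 2), round(points / reference, 2)


for experiment, mape in best_mape.items():
    points, per_unit = improvement(mape)
    print(experiment, points, per_unit)
```

For experiment 4, this calculation gives 13.11 rather than the tabulated 13.12. A plausible explanation, although the source does not confirm it, is that the table was computed from unrounded MAPE values.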

Table 9. Feature importance.

LightGBM Feature	LightGBM Feature Importance	XGBoost Feature	XGBoost Feature Importance
hour	0.1995	air-cooling unit	0.5068
air-cooling unit	0.1790	power consumption (hour − 48)	0.3419
power consumption (hour − 48)	0.1600	compressor unit	0.0437
compressor unit	0.1100	air-cooling unit 2	0.0316
power consumption (hour − 60)	0.0985	power consumption (hour − 60)	0.0265
power consumption (hour − 96)	0.0610	power consumption (hour − 96)	0.0155
air-cooling unit 2	0.0510	hour	0.0117
power consumption (hour − 72)	0.0470	month	0.0087
power consumption (hour − 84)	0.0440	day	0.0070
day	0.0325	power consumption (hour − 84)	0.0037
month	0.0090	power consumption (hour − 72)	0.0025
weekday	0.0085	weekday	0.0003
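
The two rankings in Table 9 can be compared by how concentrated the importance is and by how many leading features the models share. The following sketch does this with the tabulated values; the short dictionary keys (for example, `lag_48` for the consumption 48 h earlier) and both helper functions are hypothetical names introduced for illustration.

```python
# Importance values transcribed from Table 9; keys are hypothetical labels
lightgbm_importance = {
    "hour": 0.1995, "acu": 0.1790, "lag_48": 0.1600,
    "compressor_unit": 0.1100, "lag_60": 0.0985, "lag_96": 0.0610,
    "acu_2": 0.0510, "lag_72": 0.0470, "lag_84": 0.0440,
    "day": 0.0325, "month": 0.0090, "weekday": 0.0085,
}
xgboost_importance = {
    "acu": 0.5068, "lag_48": 0.3419, "compressor_unit": 0.0437,
    "acu_2": 0.0316, "lag_60": 0.0265, "lag_96": 0.0155,
    "hour": 0.0117, "month": 0.0087, "day": 0.0070,
    "lag_84": 0.0037, "lag_72": 0.0025, "weekday": 0.0003,
}


def top_k_share(importance, k):
    """Total importance carried by the k highest-ranked features."""
    return sum(sorted(importance.values(), reverse=True)[:k])


def common_top_features(first, second, k):
    """Features that appear among the top k of both rankings."""
    def top(importance):
        return set(sorted(importance, key=importance.get, reverse=True)[:k])
    return top(first) & top(second)


print(round(top_k_share(lightgbm_importance, 2), 4))
print(round(top_k_share(xgboost_importance, 2), 4))
print(sorted(common_top_features(lightgbm_importance, xgboost_importance, 4)))
```

With the Table 9 values, the two leading XGBoost features carry about 0.85 of the total importance, compared with about 0.38 for LightGBM, and three of the four leading features coincide.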

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

