Interpretable Machine Learning for Multi-Energy Supply Station Revenue Forecasting: A SHAP-Driven Framework to Accelerate Urban Carbon Neutrality

Zhao, Zhihui; Wang, Minjuan; Wei, Jin; Cen, Xiao; Du, Shengnan; Wu, Ziwen; Liu, Huanying; Wang, Weiqiang

doi:10.3390/en18071624


by Zhihui Zhao ¹, Minjuan Wang ¹,*, Jin Wei ¹, Xiao Cen ¹, Shengnan Du ², Ziwen Wu ³, Huanying Liu ⁴ and Weiqiang Wang ¹,*

¹ National & Local Joint Engineering Research Center of Harbor Oil & Gas Storage and Transportation Technology/Zhejiang Key Laboratory of Pollution Control for Port-Petrochemical Industry, School of Petrochemical Engineering & Environment, Zhejiang Ocean University, Zhoushan 316022, China

² College of Mechanical and Automotive Engineering, Ningbo University of Technology, Ningbo 315211, China

³ National Engineering Laboratory for Pipeline Safety/MOE Key Laboratory of Petroleum Engineering/Beijing Key Laboratory of Urban Oil and Gas Distribution Technology, China University of Petroleum-Beijing, Fuxue Road No. 18, Changping District, Beijing 102249, China

⁴ Sinopec Sales Co., Ltd. Zhejiang Hangzhou Petroleum Branch, Hangzhou 310013, China

* Authors to whom correspondence should be addressed.

Energies 2025, 18(7), 1624; https://doi.org/10.3390/en18071624

Submission received: 19 February 2025 / Revised: 18 March 2025 / Accepted: 21 March 2025 / Published: 24 March 2025

(This article belongs to the Section H: Geo-Energy)


Abstract:

The transition towards carbon neutrality and sustainable urban development necessitates innovative strategies for managing multi-energy supply stations (MESS), which integrate oil, gas, electricity, and hydrogen to support diversified energy demands. Existing revenue prediction models for MESS lack interpretability and multi-energy adaptability, hindering actionable insights for sustainable operations. This study proposes a novel Shapley additive explanations (SHAP)-driven machine learning framework for multi-energy supply station revenue forecasting. By leveraging real-world consumption data from Hangzhou West Lake Tanghe Station, we constructed a dataset with nine critical parameters, including energy types, transaction frequency, and temporal features. Four machine learning models—decision tree regression, random forest (RF), support vector regression, and multilayer perceptron—were evaluated using MAE, MSE, and R² metrics. The RF model achieved an R² of 0.98, demonstrating superior accuracy in predicting hourly gross transaction values. SHAP analysis further identified consumption volume and transaction frequency as the most influential factors, providing actionable insights for operational optimization. This research not only advances the scientific management of MESS but also contributes to carbon emission reduction by enabling data-driven resource allocation. The proposed framework offers policymakers and industry stakeholders a scalable tool to accelerate urban energy transitions under carbon neutrality goals, bridging the gap between predictive analytics and sustainable infrastructure planning.

Keywords:

multi-energy supply station; revenue forecasting; data-driven model; SHAP algorithm; interpretability

1. Introduction

1.1. Background

Multi-energy Supply Stations (MESS) [1], which combine oil, gas, electricity, and hydrogen, have become a cornerstone of modern urban energy systems, particularly in the context of smart cities and energy transition [2,3]. These stations play a pivotal role in meeting the diverse energy demands of electric vehicles (EVs), hydrogen fuel cell vehicles (HFCVs), and conventional vehicles, while also addressing the challenges of fluctuating energy prices and dynamic consumer behaviors [4]. However, the operational complexity of MESS, particularly in revenue forecasting and resource allocation, remains a significant barrier to their efficient management [5]. Accurate prediction of transaction values is crucial for optimizing energy supply chains, enhancing operational efficiency, and supporting the development of smart cities [6,7].

According to the review by Hong et al. [8] on global energy regulations, many countries are formulating policies to reduce the role of fossil fuels in their energy systems. These policies include carbon pricing, subsidy reform, and bans on new fossil fuel exploration, development, and infrastructure. Carbon taxes and environmental taxes are also used, working with strict regulations to lower energy intensity and boost efficiency [9,10]. China’s "Energy Law" and "Energy & Infrastructure Legal Outlook 2025" stress legislative pushes for energy diversification, efficiency, phasing out fossil fuels, and advancing new energy to cut greenhouse gas emissions. Under this legal framework, Multi-energy Supply Stations are in the spotlight. The revenue model of the Multi-energy Supply Station studied here can optimize resource allocation, support energy policies, and contribute to the energy transition.

To address these challenges, this study proposes a novel data-driven framework that leverages machine learning models and interpretable AI techniques for revenue prediction in MESS. By integrating real-world consumption data from Hangzhou West Lake Tanghe Station, we aim to provide actionable insights for urban planners and energy providers, enabling them to optimize resource allocation and improve operational efficiency. Our approach not only advances the scientific understanding of revenue forecasting in MESS but also contributes to the broader goals of energy transition and smart city development.

1.2. Related Works

The revenue prediction of multi-energy supply stations should consider consumers, products, and the stations themselves. From the consumer perspective, the amount spent per transaction, the frequency of consumption, and the time interval between each purchase are all factors to be considered in revenue forecasting. The corresponding influencing factors are largely demographic, including the client’s age, income level, education level, and vehicle type [11]. Customers with different income levels exhibit significant differences in the frequency and amount of their purchases. The products can be categorized into type, price, and market sales performance. Fluctuations in the prices of various energy sources and trends in the energy market significantly impact the revenue of energy supply stations [12]. As for the supply stations themselves, policies such as preferential measures and value-added services must be viewed in revenue forecasting. The geographical location of the supply station and the preferential policies for local members also influence consumers’ purchasing psychology. Finally, days and weather are weighty factors. Holidays, weekdays, sunny days, and rainy days affect people’s travel [13], impacting energy supply stations’ revenue. Various forecasting methods can be constructed based on analyzing these influencing factors.

Revenue forecasting methods can be subdivided into traditional machine learning (ML) methods and prediction models incorporating various optimization techniques on top of existing machine learning methods. The first category involves using traditional machine learning models for conventional forecasting. Yang et al. [14] analyzed daily refined oil sales records and related auxiliary information at gas stations. They utilized back propagation (BP) for short-term forecasting at actual gas stations. However, this approach needs a large amount of data and involves complex modeling. Hao [15] used various forecasting methods, such as the proportion coefficient method, grey models (GM), linear regression (LR), and BP, to forecast the fueling volume at airports. However, this study only predicted the fueling volume and did not address revenue forecasting. Zhang et al. [16] used decision tree regression (DTR), random forest (RF), and gradient boosting decision tree (GBDT) to forecast gas station sales. By leveraging accumulated historical sales data and related feature data, these models provide high-precision predictions, meeting the tangible needs of enterprises. However, their study considers only a single type of oil. Deng et al. [17], based on the Light Gradient Boosting Machine (LightGBM) framework, conducted feature engineering and extracted and classified features to achieve sales prediction. The results showed that the accuracy was superior to Support Vector Regression (SVR) and LR without performing feature processing. Cui et al. [18] proposed an iterative model combining the eXtreme Gradient Boosting (XGBoost) and LightGBM, processed the features and conducted iterative training, which improved the performance of the used car price prediction model without performing algorithm optimization. Kumar et al. [19] selected features using the artificial bee colony algorithm and employed models such as AdaBoost, RF, decision tree (DT), and Support Vector Machine (SVM) to predict user repurchase. After comparison, it was found that AdaBoost had the highest accuracy.

The second category involves improvements and optimizations built on traditional algorithms. Wu et al. [20] used an improved RFM model to extract user features. They employed the K-means clustering algorithm to achieve user classification, supporting the platform in implementing precise marketing strategies based on big data. Chen et al. [21] proposed a new framework called task-discriminant component analysis (TDCA), which uses dual attention and multi-task recurrent neural network (RNN) for trend calibration in revenue forecasting but does not optimize the algorithm. Zhang and Wang [22] proposed an improved deep forest (DF) model for predicting user repurchase behavior. Compared with other models such as LR, RF, convolutional neural networks models (CNN), and k-nearest neighbor (KNN), the improved DF demonstrates better performance in user repurchase behavior prediction. However, the model does not undergo an analysis of its interpretability. Liu et al. [23] used grey relational analysis (GRA) to determine the correlation between influencing factors and electric vehicle sales, proposing a discrete wavelet transformation (DWT) and long short-term memory (LSTM) model combined with the K-means algorithm to verify the regional differences in implementation effects under supply-side and demand-side scenarios, but did not achieve feature selection. Li et al. [24] considered market coupling and proposed three deep learning frameworks based on LSTM for electricity price forecasting. They comprehensively compared feature selection algorithms and used the Shapley additive explanations (SHAP) values for model interpretability analysis and feature importance assessment. The related research results are summarized in Table 1.

The continuous optimization and improvement of the algorithms have resulted in enhanced accuracy and flexibility of the prediction model for oil sales and an improvement in the selection of the influencing factors of the gas station, which have provided a superior marketing plan for the gas station. Previous studies mostly focused on single oil products, which cannot fully reflect the revenue status of energy supply stations or meet the needs of modern integrated energy stations. They also faced limitations in feature selection and model interpretability, with insufficient analysis of feature importance, hindering model transparency and credibility. Therefore, this study examines multiple energy types to better support the operation and management of energy supply stations and employs methods such as SHAP analysis to delve into feature significance.

1.3. Contributions of This Work

Therefore, addressing the abovementioned problems, this paper studies the overall aspects of gas stations, considering fueling, non-energy, and electricity categories. The main contributions are as follows:

(1): A data-driven framework for sustainable revenue forecasting is proposed. It integrates transaction data from multiple energy sources, including petroleum, natural gas, electricity, and hydrogen. Using four machine learning algorithms—DTR, RF, SVR, and MLP—the framework predicts hourly total transaction value with high accuracy (R² = 0.98), providing real-time decision support for resource allocation.
(2): The SHAP algorithm is applied to identify key drivers of revenue fluctuations (such as consumption volume and transaction frequency) and quantify their impacts. This provides strategies for reducing energy waste and carbon emissions while enhancing model interpretability.
(3): Policy and Practical Implications: Our findings offer scalable solutions for urban planners and energy providers to design efficient MESS networks, directly supporting the implementation of carbon neutrality policies in smart cities.

1.4. Organization

The structure of this paper is as follows: Section 2 introduces the research framework and methods used in this paper; Section 3 describes the application of the methods, and contains the correlation analysis of features, the setup and prediction effects of each model, and the interpretability analysis of the models; and Section 4 presents the conclusions and significance of the paper.

2. Methodology

For the prediction of the GTV per hour at multi-energy supply stations, this paper is based on the consumption frequency at the Hangzhou West Lake Tanghe Station in Zhejiang Province from February to June 2022 for feature extraction. Features include fueling, electricity, and non-energy aspects, as well as constructing the relevant parameters for the GTV per hour. Then, DTR, RF, SVR, and MLP are used to predict the hourly transaction total of the multi-energy supply station. The prediction effects of the above models are comprehensively compared using the constructed evaluation indicators. Finally, the SHAP algorithm is used to explain and analyze the models with better performance, predicting the daily GTV of the multi-energy supply station, which provides strong support for the operational decision-making of the multi-energy supply station, aiding the scientific management and efficient development of the energy supply station in the energy transition process. The research process of this paper is shown in Figure 1.

2.1. Data Preprocessing and Feature Engineering

2.1.1. Decision Tree Regression (DTR)

The DT algorithm makes decisions based on a tree-like structure and is a predictive model in supervised learning [25,26]. Each internal node represents a test on an attribute, the branches are the outcomes of the test, and the leaf nodes represent categories or values [1]. By selecting the best features to split the dataset, the DT aims to maximize the purity gain of the target variable in the resulting subsets, which also quantifies the relative importance of influencing factors [27]. The principle of the DT is illustrated in Figure 2. When predicting the GTV, the DT can be used to progressively segment the data, thereby analyzing the impact of each feature value on the GTV. Decision trees are adept at capturing nonlinear relationships between features and target variables, which matters here because the relationship between revenue and its influencing factors may not be simply linear. The resulting splits give energy supply stations a basis for adjusting their revenue strategies.
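The split-selection step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the configuration used in the study: it assumes a squared-error criterion (the text only speaks of "purity gain"), a single feature, and hypothetical toy values for transactions per hour and hourly GTV.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) of the best binary split on one feature.

    Assumption: node impurity is measured by squared error, a common
    choice for regression trees.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # no split: impurity of the whole node
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [target for _, target in pairs[:k]]
        right = [target for _, target in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: transactions per hour vs. hourly GTV.
transactions = [1, 2, 3, 10, 11, 12]
gtv = [100.0, 110.0, 105.0, 900.0, 950.0, 920.0]
threshold, remaining_sse = best_split(transactions, gtv)
print(threshold, remaining_sse)
```

A full tree repeats this search recursively on each side of the chosen threshold and over all features, stopping at a depth or sample-count limit.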

2.1.2. Random Forest (RF)

RF is a technique that combines the bootstrap aggregation (bagging) algorithm with DT, integrating multiple trees into a single ensemble [28,29]. It uses bootstrapping to draw multiple subsamples with replacement from the original training data, constructs a DT for each subsample, and aggregates the results of the trees for the final prediction. It has been widely applied in both classification and regression tasks, showing excellent capabilities in handling complex and nonlinear datasets [30]. The RF model can also score the feature variables used in the model, characterizing the contribution of each input variable to the prediction results [31]. Figure 3 illustrates the principle of RF. For this study, the RF can assess the importance of each feature for the output, thereby identifying which factors have a greater impact on revenue. This provides a basis for selecting and retaining the features that are most conducive to accurate prediction.
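The bagging idea behind RF can be illustrated with a small sketch. It is deliberately simplified and rests on stated assumptions: each "tree" is a one-split stump on a single feature, there is no random feature subsampling, and the data and the function names (`fit_stump`, `fit_forest`, `predict_forest`) are hypothetical.

```python
import random
from statistics import fmean


def fit_stump(x, y):
    """Fit a one-split regression tree: (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # the largest value cannot split
        left = [t for f, t in zip(x, y) if f <= threshold]
        right = [t for f, t in zip(x, y) if f > threshold]
        lm, rm = fmean(left), fmean(right)
        err = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    if best is None:  # all feature values identical: predict the mean
        m = fmean(y)
        return (x[0], m, m)
    return best[1:]


def predict_stump(stump, value):
    threshold, left_mean, right_mean = stump
    return left_mean if value <= threshold else right_mean


def fit_forest(x, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest, value):
    """Aggregate by averaging the predictions of all trees."""
    return fmean(predict_stump(stump, value) for stump in forest)


# Hypothetical toy data: transactions per hour vs. hourly GTV.
transactions = [1, 2, 3, 10, 11, 12]
gtv = [100.0, 110.0, 105.0, 900.0, 950.0, 920.0]
forest = fit_forest(transactions, gtv)
low, high = predict_forest(forest, 2), predict_forest(forest, 11)
print(low, high)
```

Because every tree sees a different resample, averaging their outputs reduces the variance of a single tree, which is the main reason an ensemble is usually more stable than one DT.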

2.1.3. Support Vector Regression (SVR)

SVR is an analytical technique that applies SVM to regression problems, offering advantages such as high predictive accuracy and robustness [32,33]. Its fundamental concept is based on mathematical theory and statistics, utilizing a nonlinear mapping function to project feature data into a high-dimensional space, thereby linearizing the feature data [34]. SVR aims to minimize structural risk under certain constraints, which means finding appropriate values for the weight vector w and the bias b to reduce the error on the training data while keeping the model complexity low. SVR evolves from SVM by introducing an ε-insensitive loss function. This function is defined as follows:

L_{\varepsilon}(y, f(x)) = 0, \quad |y - f(x)| \leq \varepsilon

(1)

L_{\varepsilon}(y, f(x)) = |y - f(x)| - \varepsilon, \quad |y - f(x)| > \varepsilon

(2)

This means that when the error between the predicted value f(x) = w^{T} x + b and the actual value y is within the range of ε, the loss is considered 0; only when the error exceeds ε is the excess part counted as loss.

The objective function of SVR can be expressed as follows:

\min_{w, b} \frac{1}{2} \|w\|^{2} + C \sum_{i = 1}^{n} L_{\varepsilon}(y_{i}, w^{T} x_{i} + b)

(3)

Here, \frac{1}{2} \|w\|^{2} is the regularization term, which controls the complexity of the model to prevent overfitting; L_{\varepsilon} is the ε-insensitive loss function defined in Equations (1) and (2); and C is the penalty parameter [1] used to balance the regularization term and empirical risk (i.e., the loss function term). A larger C value means the model emphasizes penalizing errors, tending to reduce the error on the training data; a smaller C value means the model focuses more on reducing complexity and may tolerate a certain degree of error.

For this study, the advantage of this method lies in its ability to both tolerate a certain degree of error to enhance model robustness and use kernel functions to map data into a higher-dimensional space, thereby addressing nonlinear relationships and accommodating the complex relationship between revenue and features.
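As a worked illustration of the loss and objective defined in this subsection, the sketch below evaluates both for a linear model f(x) = w^T x + b. It only evaluates the objective for given parameters and does not solve the optimization problem; the function names and numbers are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Loss is 0 inside the eps-tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_objective(w, b, X, y, C, eps):
    """Regularization term plus C times the summed eps-insensitive loss."""
    regularization = 0.5 * sum(wj * wj for wj in w)
    empirical_risk = 0.0
    for xi, yi in zip(X, y):
        prediction = sum(wj * xj for wj, xj in zip(w, xi)) + b
        empirical_risk += eps_insensitive_loss(yi, prediction, eps)
    return regularization + C * empirical_risk


# Hypothetical one-feature example.
X = [[1.0], [2.0]]
y = [3.0, 7.0]
# Predictions are 3.0 and 5.0, so the losses are 0.0 and 1.5.
print(svr_objective(w=[2.0], b=1.0, X=X, y=y, C=10.0, eps=0.5))
```

Raising C in this call increases the weight of the second term, which is the trade-off between training error and model complexity described above.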

2.1.4. Multilayer Perceptron (MLP)

MLP is a feedforward neural network consisting of an input layer, multiple hidden layers, and an output layer. It is commonly used for prediction and modelling [35]. It performs nonlinear transformations on input data through connections between neurons and activation functions, thereby learning complex patterns and relationships within the data. MLP has strong nonlinear fitting capabilities, handling complex functional relationships and data distributions. It is well suited for large-scale and high-dimensional data, effectively extracting feature information.

z = \sum_{i = 1}^{n} w_{i} x_{i} + b

(4)

A single neuron receives multiple inputs x_{i}, each of which corresponds to a weight w_{i}. The neuron multiplies the inputs by their weights, sums them up, and adds a bias b.

To give the neuron the ability to perform nonlinear transformations, an activation function f is applied to z, resulting in the output y = f(z). Modeling complex data patterns through nonlinear functions allows the network to be used effectively for classification, regression, and pattern recognition [36]. Common activation functions include the following:

Sigmoid function: f(z) = \frac{1}{1 + e^{-z}}, which maps the output to the interval (0, 1).

Tanh function: f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, whose output values lie in (−1, 1).

ReLU function: f(z) = \max(0, z), which outputs z when z > 0 and 0 when z ≤ 0.

To address the substantial data from the energy supply station between February and June, a Multilayer Perceptron (MLP) was introduced to tackle or mitigate the nonlinear effects in forecasting. Through supervised learning, the MLP can automatically learn the most useful feature representations for predicting revenue, thereby simplifying the work of data preprocessing and feature selection to some extent.
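The neuron computation in Equation (4) and the three activation functions listed above can be written out directly. This is a minimal sketch of a single neuron's forward pass with hypothetical inputs and weights, not the network architecture used in the study.

```python
import math


def sigmoid(z):
    """Maps z to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Maps z to the interval (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Returns z for z > 0 and 0 otherwise."""
    return max(0.0, z)


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical inputs and weights.
inputs = [1.0, 2.0]
weights = [0.5, -0.25]
for activation in (sigmoid, tanh, relu):
    print(activation.__name__, neuron(inputs, weights, 0.1, activation))
```

An MLP layer applies many such neurons to the same inputs, and stacking layers feeds each layer's outputs into the next.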

2.2. Evaluating Index

Evaluating the model is essential when making predictions based on machine learning models. To assess the models’ predictive performance, this paper selects six regression metrics: MAE, MSE, RMSE, MAPE, R², and EV. They are defined as follows:

(1): Mean Squared Error (MSE)

MSE = \frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}

(5)

MSE is the average of the squared differences between predicted values and true values. Squaring amplifies errors, highlighting the impact of larger discrepancies. The greater the deviation of the model’s predicted values from the true values, the larger the MSE, which intuitively reflects the model’s prediction error.

(2): Root Mean Squared Error (RMSE)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(6)

RMSE is the square root of the mean squared error. Compared to MSE, RMSE brings the error back to the scale of the original data, making the size of the error more meaningful and better reflecting the degree of deviation in the model’s predictions.

(3): Mean Absolute Error (MAE)

MAE = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}|

(7)

MAE is the average of the absolute values of the differences between predicted values and true values. It intuitively reflects the average size of the error between predicted and true values, and it is less sensitive to outliers compared to MSE and RMSE.

(4): Mean Absolute Percentage Error (MAPE)

MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100\%

(8)

MAPE is a statistical indicator that measures the error between predicted and actual values, expressed as a percentage. It intuitively reflects the degree to which predicted values deviate from actual values. Being expressed as a percentage, it is not affected by the scale of the data, allowing for direct comparison of prediction errors across different datasets or variables.

(5): R-squared (R²)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(9)

R² considers the overall trend of the data and can measure the model’s ability to explain the variability of the data. It intuitively reflects the model’s goodness of fit. The value of R² typically ranges from 0 to 1 (it becomes negative when a model fits worse than simply predicting the mean), and the closer the R² value is to 1, the better the model’s predictive performance. It is one of the crucial indicators for assessing the precision of the model.

(6): Explained variance (EV)

EV = 1 - \frac{\mathrm{Var}(y_{i} - \hat{y}_{i})}{\mathrm{Var}(y_{i})}

(10)

EV evaluates the precision of the model through variance calculation, where Var denotes the variance. The value of explained variance typically ranges from 0 to 1. The closer it is to 1, the higher the proportion of the dependent variable’s variance that the regression model can explain, indicating a better model fit to the data. Additionally, EV can be used to compare the explanatory power of regression models on the same dataset.

In the above formulas, n is the number of samples, y_{i} is the actual value of the i-th sample, \hat{y}_{i} is the predicted value of the i-th sample, and \bar{y} is the average of the n actual values. This paper primarily selects the six regression metrics above to evaluate the model’s predictive ability. Larger values of EV and R² and smaller values of MAE, MAPE, MSE, and RMSE indicate better predictive performance.
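The six metrics in Equations (5) to (10) can be computed together as in the following sketch. It uses only the Python standard library; the function name `regression_metrics` and the sample values are hypothetical, and MAPE is undefined if any actual value is zero.

```python
from statistics import fmean, pvariance


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE (%), R2, and EV for paired sequences."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = fmean(e * e for e in errors)
    rmse = mse ** 0.5
    mae = fmean(abs(e) for e in errors)
    # MAPE divides by the actual value, so every y_true must be non-zero.
    mape = fmean(abs(e / t) for e, t in zip(errors, y_true)) * 100
    y_bar = fmean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    # EV uses the variance of the errors, so a constant bias does not lower it.
    ev = 1 - pvariance(errors) / pvariance(y_true)
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2, "EV": ev}


# Hypothetical hourly GTV values and predictions.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

R² and EV coincide in this example because the errors average to zero; they differ when predictions are systematically biased.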

2.3. SHAP Implementation

As discussed above, this paper constructs data-driven models using four machine learning algorithms: DTR, RF, SVR, and MLP. To better explain the reasons behind model predictions and strengthen the explainability of the prediction results, this paper employs the SHAP algorithm for interpretability analysis.

The SHAP algorithm originates from game theory and is now used to assess the overall feature importance in ML models with optimal predictive performance [37]. It assigns a specific contribution value to each feature of the machine-learning model. The final prediction of the model is obtained by calculating these contribution values and summing them over all features. These contribution values are known as SHAP values. By comparing the magnitudes of SHAP values for different features, one can intuitively understand the importance of each feature to the model’s prediction results. The larger the absolute SHAP value, the greater the influence of that feature on the model’s prediction, and thus, the more critical it is.

The SHAP formula is as follows:

ϕ_{i} = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! (|F| - |S| - 1)!}{|F|!} [v(S \cup \{i\}) - v(S)]

(11)

For a model with n features, the SHAP value ϕ_{i} of a particular feature i is the weighted average marginal contribution of that feature across all possible combinations of the other features. To calculate it, consider every subset S of features that excludes i and compute the difference in the model’s prediction between the subset S ∪ {i}, which includes feature i, and the subset S, which does not. This difference represents the marginal contribution of feature i. Performing this calculation for all such subsets and taking a weighted average, with weights determined by the subset sizes, yields the SHAP value of feature i. F is the set of all features, |S| is the size of subset S, and v(S) is the model’s prediction for the feature subset S.

The SHAP values for each sample are calculated as follows:

y_{i} = y_{base} + f(x_{i1}) + f(x_{i2}) + \dots + f(x_{in})

(12)

Here, y_{base} represents the baseline of the model, which is the average value of the target variable across all samples; n represents the number of input features; x_{i} represents the i-th sample; x_{in} represents the n-th feature of the i-th sample; and f(x_{in}) represents the SHAP value of the n-th feature of the i-th sample.

The SHAP algorithm provides detailed insights into the impact and magnitude of each feature on the prediction [38]. The SHAP values of each feature for each sample contribute differently to the predicted value [39], with some being positive and others negative. A positive value indicates a positive contribution to the expected value, while a negative value indicates a negative contribution. Adding the baseline value (the average of the target variable) to the SHAP values of all features of a sample recovers the model’s output value for that sample, as expressed in Equation (12).
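Equation (11) can be evaluated exactly by enumerating all feature subsets when the number of features is small. The sketch below does this for a hypothetical value function v(S) with three made-up features; it illustrates the formula only and is not the SHAP library or the model used in the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset that excludes i."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|F| - |S| - 1)! / |F|! from Equation (11).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(subset):
    """Hypothetical v(S): baseline 100 plus effects of the present features."""
    total = 100.0
    if "volume" in subset:
        total += 50.0
    if "count" in subset:
        total += 30.0
    if "volume" in subset and "count" in subset:
        total += 20.0  # interaction, shared equally by both features
    return total  # "holiday" has no effect in this toy function


features = ["volume", "count", "holiday"]
phi = shapley_values(features, toy_value)
baseline = toy_value(frozenset())
print(phi)
print(baseline + sum(phi.values()), toy_value(frozenset(features)))
```

The last line checks the additivity expressed in Equation (12): the baseline plus the sum of the contributions equals the value for the full feature set. Enumeration costs grow exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.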

3. Method Application

3.1. Data Description

The research object of this paper is the consumption frequency at Hangzhou West Lake Tanghe Station in Zhejiang Province from February to June 2022. To enhance the representativeness of the study, future work will consider expanding the data set to include more stations and longer time periods. Based on the data from February to June 2022, which includes 7735 records, the paper aims to predict the GTV per hour for a future energy supply station.

There are 27 kinds of basic information in the consumption records of energy supply stations. Four kinds of information, including order number, member number, member name, and fuel card number, are unique identifiers. The other 23 kinds of information have been statistically analyzed and include parent company invoicing capability, parent company invoicing amount, subsidiary invoicing capability, subsidiary invoicing amount, refund status, card type, energy type, main card number, license plate number, license plate color, vehicle type, product (type of energy), station name, consumption volume, energy name, amount payable, actual payment, discount amount, total amount payable, total amount payable, balance, contact person, and consumption time.

Figure 4 shows that most parent companies are not invoiceable, while the number of parent companies that have been invoiced or can be invoiced is relatively small. The invoice amounts for parent companies are generally low, and as the amount increases, the number of companies decreases significantly, presenting a right-skewed (positively skewed) distribution. This means that most data are concentrated at lower values, while higher values are rare. In contrast, most subsidiaries have invoicing capabilities and higher invoicing amounts. This distribution may be relevant for understanding the financial status of companies and predicting whether they will continue to consume at this energy supply station. The distribution of discounts, actual payments, and payable amounts is also informative. Discount amounts are generally low, while the distributions of actual payment and payable amounts are similar, with most values concentrated in the lower amount ranges. As the amount increases, the quantity decreases significantly. This distribution may be crucial for understanding consumer payment behavior and the effectiveness of discount activities.

Regarding the distribution of energy types and names, fueling holds an absolute advantage among energy types, and 92-octane gasoline (Grade VI) is the most common energy name. This distribution may be important for understanding energy consumption patterns and market demands. Most transactions have relatively low consumption volumes and GTV; as the amount increases, the number of transactions decreases significantly. This distribution may be significant for understanding consumer spending habits and the revenue structure of the energy supply station.

3.2. Correlation Analysis

Based on the consumption frequency of its energy supply station, this paper conducted an in-depth analysis of data from Tanghe Station in Hangzhou’s West Lake District, Zhejiang Province, covering the period from February to June. A scientifically rigorous data collection method was employed to ensure the accuracy and representativeness of the data, providing an ideal data source for training machine learning models. For parameter selection in revenue forecasting, we referred to the cases in references [40,41,42]. Researchers selected time, consumer value, and consumer types as key factors for revenue forecasting. The raw data were carefully filtered and processed through meticulous data cleaning. During this process, we screened the 27 original features, removing unnecessary ones and adding important features relevant to revenue forecasting. Since data were not recorded for every hour, we used data imputation methods to fill in missing values. Specific techniques included mean imputation, median imputation, and machine-learning-based K-Nearest Neighbors (KNN) imputation to ensure data completeness. Additionally, we optimized data quality through key steps such as identifying and replacing outliers, standardizing the data, and ensuring consistency and reliability. Ultimately, the dataset was reduced to 7735 data points. These 7735 data points, both abundant in quantity and superior in quality, laid a solid foundation for model training and prediction. From the original 27 features, redundant and irrelevant features were simplified through feature selection and extraction, ultimately obtaining nine parameters for each sample data point, as shown in Table 2. The parameter this paper aims to predict is the hourly GTV (Gross Transaction Value), which refers to the total amount generated by the energy supply station.
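The aggregation of individual transactions into an hourly GTV series, followed by filling hours without records, might look like the sketch below. The record layout (timestamp, amount), the function names, and the choice of median imputation for the missing hours are assumptions made for illustration; the text lists mean, median, and KNN imputation without stating which was applied to which variable.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def hourly_gtv(records):
    """Sum payment amounts per clock hour.

    records: iterable of (timestamp, amount) pairs (hypothetical layout).
    """
    totals = defaultdict(float)
    for timestamp, amount in records:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[hour] += amount
    return dict(totals)


def impute_missing_hours(totals):
    """Fill hours without records using the median of observed hourly totals."""
    fill_value = median(totals.values())
    hour, end = min(totals), max(totals)
    filled = {}
    while hour <= end:
        filled[hour] = totals.get(hour, fill_value)
        hour += timedelta(hours=1)
    return filled


# Hypothetical transactions: two at 08:xx, none at 09:xx, one at 10:xx.
records = [
    (datetime(2022, 2, 1, 8, 5), 100.0),
    (datetime(2022, 2, 1, 8, 40), 50.0),
    (datetime(2022, 2, 1, 10, 15), 200.0),
]
series = impute_missing_hours(hourly_gtv(records))
for hour, total in series.items():
    print(hour.isoformat(), total)
```

Whether an empty hour should be imputed or treated as zero revenue depends on why the record is missing, so this choice is worth checking against the station's logging practice.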

Correlation analysis reveals the relationship between any two variables and helps clarify the impact of different parameters on the GTV. The Pearson correlation coefficient (PCC) is chosen to assess the rationality of the parameters, as it measures the strength and direction of the linear relationship between variables in the dataset. In the resulting heatmap, the intensity of the colors shows at a glance which variables have strong correlations (either positive or negative) and which have weak or no correlations. The results are shown in Figure 5.

The PCC heatmap displays the linear correlation between different variables. The value of the PCC ranges from −1 to 1. A value of 1 indicates a perfect positive correlation, meaning that as one variable increases, the other also increases. A value of −1 indicates a perfect negative correlation, meaning that as one variable increases, the other decreases. A value of 0 indicates no linear correlation. The colors in the heatmap represent the magnitude of the correlation coefficients, with red typically indicating positive correlation and blue indicating negative correlation; the deeper the color, the stronger the correlation. Gray may indicate weak correlation or missing data. From Figure 5, it can be observed that the values on the diagonal are all 1, as each variable is perfectly positively correlated with itself. The GTV is positively correlated with consumption volume, number of transactions, hourly consumption time, personal vehicle data volume, non-personal vehicle data volume, and whether it is a weekend. In contrast, the GTV is negatively correlated with whether it is a holiday and with the season. This indicates a certain degree of correlation between the parameters selected in this study, while none of the parameters are excessively correlated with one another, thus avoiding adverse effects on the model training weights. Therefore, the selection of input parameters in this paper is reasonable and meets the requirements for input parameters of machine learning models.
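As a rough illustration of what each cell of such a heatmap contains, the sketch below computes the PCC from its definition (covariance divided by the product of the standard deviations). The column values and the helper names `pearson` and `correlation_matrix` are hypothetical and are not taken from the station data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Pairwise PCC for a dict of named columns (the numbers behind a heatmap)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical values for three of the variables; not the paper's data.
columns = {
    "number_of_transactions": [10, 20, 30, 40],
    "whether_holiday": [1, 0, 0, 0],
    "gtv": [500.0, 900.0, 1600.0, 2000.0],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

The diagonal of the resulting matrix is 1 by construction, and the matrix is symmetric, which is why heatmaps of this kind often show only one triangle.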

3.3. Model Parameter Settings

This paper divides the energy supply station dataset into a training set and a test set at a ratio of 8:2. The data-driven models are tuned by combining five-fold cross-validation with grid search, using R² as the selection metric. The hyperparameters are tuned using the grid search (GS) algorithm, with the search range and final parameters shown in Table 3.
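The tuning procedure described above (an 8:2 split, then grid search scored by five-fold cross-validated R² on the training portion) can be sketched as follows. To stay dependency-free, a one-dimensional k-nearest-neighbor regressor stands in for the paper's models; the synthetic data, the search grid, and all function names are assumptions for illustration and do not reproduce Table 3.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def knn_predict(train, x, k):
    """Stand-in model: average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(target for _, target in nearest) / k

def cross_val_r2(data, k_neighbors, n_folds=5):
    """Mean R² over n_folds folds; each fold is held out once for validation."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, valid in enumerate(folds):
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        preds = [knn_predict(train, x, k_neighbors) for x, _ in valid]
        scores.append(r2_score([t for _, t in valid], preds))
    return sum(scores) / n_folds

# Synthetic (feature, target) pairs; NOT the station data.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 100 * x + rng.gauss(0, 30)))
rng.shuffle(data)

split = int(0.8 * len(data))              # 8:2 train/test split
train_set, test_set = data[:split], data[split:]

grid = [1, 3, 5, 9, 15]                   # hypothetical search range
cv_scores = {k: cross_val_r2(train_set, k) for k in grid}
best_k = max(cv_scores, key=cv_scores.get)

# The test set is touched only once, after the hyperparameter is fixed.
test_r2 = r2_score([t for _, t in test_set],
                   [knn_predict(train_set, x, best_k) for x, _ in test_set])
print(best_k, round(test_r2, 3))
```

Keeping the test set out of the search loop is the design point: the cross-validated score selects the hyperparameter, and the held-out 20% gives an estimate that the selection step has not seen.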

3.4. Model Performance Analysis

In this paper, the predictive accuracy of the models is evaluated using several indicators: the magnitude of error (MAE, MSE, RMSE), the relative size of error (MAPE), and the model’s explanatory power (R², EV). Based on these indicators, hyperparameters are adjusted to optimize model performance, allowing for an intuitive comparison of the predictive performance of each model. On the training set, the performance of prediction models based on single algorithms is compared. The model evaluation indicators MAE, MSE, RMSE, MAPE, R², and EV are obtained through the five-fold cross-validation method, with the relevant indicator values shown in Figure 6.
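For reference, the six indicators can be computed directly from their standard definitions, as in the sketch below. It assumes that EV denotes the usual explained variance score and that MAPE is expressed in percent; the sample values and the function name `regression_metrics` are illustrative only.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, MAPE (%), R² and explained variance for two sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the actual value, so it is undefined when an actual value is 0.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    # Explained variance removes the mean error first, so a constant bias is not penalized.
    mean_e = sum(errors) / n
    ev = 1 - sum((e - mean_e) ** 2 for e in errors) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "MAPE": mape, "R2": r2, "EV": ev}

# Hypothetical hourly GTV values with predictions that are all 10 yuan too high.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 210.0, 310.0, 410.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(name, round(value, 4))
```

In this example R² is 0.992 while EV is exactly 1, because every prediction is off by the same amount; the two indicators coincide only when the mean error is zero.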

Figure 6 indicates the following:

(1): The SVR model has the highest MAE value, indicating that its average absolute prediction error is the largest; the RF and MLP models have relatively lower MAE values, suggesting that these two models have smaller average absolute prediction errors, i.e., they are more accurate in predicting the hourly GTV.
(2): The SVR model’s MSE value is significantly higher than that of the other models, which indicates that its average squared prediction error is larger and that there may be significant prediction bias or outliers; the RF model has the lowest MSE value, indicating that its average squared prediction error is the smallest.
(3): The SVR model’s RMSE value is the highest, consistent with the MSE analysis, indicating that its prediction error is larger; the RF model’s RMSE value is the lowest, indicating that its root mean square prediction error is the smallest.
(4): The SVR model has the highest MAPE value, which means that its prediction error as a percentage of the actual value is the largest; the DTR model has the lowest MAPE value, indicating that its prediction error as a percentage of the actual value is the smallest.
(5): The R² values of the DTR and RF models are close to each other and, as shown in Figure 6, close to 1, which indicates that these two models have high accuracy in predicting the GTV and can capture most of the variability in the data; in contrast, the SVR model has the lowest R² value, indicating that its ability to explain data variability is relatively weak, and its prediction accuracy is relatively poor.
(6): The EV values of the DTR and RF models are close to 1, consistent with the analysis of R², indicating that these two models have strong explanatory power; the SVR model has the lowest EV value, suggesting that its explanatory power is relatively weak.

Overall, the RF model performs well in all indicators, especially in error indicators (MAE, MSE, RMSE). The DTR model performs well in R² and EV indicators but is slightly inferior to RF in error indicators. The performance of the MLP model is between RF and SVR. The SVR model performs worse than other models in all indicators. According to the horizontal comparison in the figure, RF has the strongest predictive performance for the GTV.

When evaluating the models on the test set, we compared the performance of prediction models established using individual algorithms. Based on the original dataset, the energy supply station dataset was split into training and test sets at a ratio of 8:2. Combining five-fold cross-validation [43] and grid search methods, we obtained model evaluation metrics such as MAE, MSE, RMSE, MAPE, R², and EV with the corresponding values shown in Figure 7.

Figure 7 indicates the following:

(1): The DTR model has the highest MAE value, indicating the largest average absolute prediction error. In contrast, the RF and MLP models have relatively lower MAE values, suggesting that these two models have smaller average absolute errors and thus provide more accurate predictions of the total transaction amount per hour.
(2): The MSE value of the DTR model is significantly higher than that of the other models, implying a larger average of squared prediction errors. This may indicate substantial prediction bias or the presence of outliers. The RF model, however, has the lowest MSE value, meaning it has the smallest average of squared prediction errors.
(3): The RMSE value of the DTR model is the highest, consistent with the MSE analysis, indicating larger prediction errors. The RF model has the lowest RMSE value, showing the smallest root mean square error.
(4): The DTR model has the highest MAPE value, meaning its prediction errors as a percentage of actual values are the largest. The RF model has the lowest MAPE value, indicating the smallest percentage errors relative to actual values.
(5): The R² values of the RF and MLP models are close to each other and, as shown in Figure 7, close to 1, indicating high accuracy in predicting total transaction amounts and the ability to capture most of the data variability. Conversely, the DTR model has the lowest R² value, suggesting weaker explanatory power and lower prediction accuracy.
(6): The EV values of the RF and MLP models are close to 1, consistent with the R² analysis, indicating strong explanatory power. The DTR model has the lowest EV value, reflecting weaker explanatory ability.

Overall, the RF model performs best across all metrics, especially in terms of error indicators (MAE, MSE, RMSE). The MLP model performs well in R² and EV but slightly lags behind the RF model in error metrics. The DTR model underperforms compared to the others. Based on the horizontal comparison in the figure, the RF model has the strongest performance in predicting total transaction amounts.

By comparing the results on the training set and the test set, it is evident that the RF model performs well across all evaluation metrics. On the training set, the RF model achieves an R² value of 0.98, while on the test set, the R² value is 0.96. The small gap between the two values indicates that the model has a high level of predictive accuracy and strong generalization ability, with no evident signs of overfitting. Additionally, the RF model outperforms the other models in terms of error metrics (MAE, MSE, RMSE). As shown in Figure 8, the predicted values of the RF model are highly consistent with the actual values in terms of overall trends, further demonstrating its superiority in forecasting total transaction amounts.

Moreover, other datasets or time frames may alter the optimal model. Since this study is purely data-driven, variations in the feature distribution, noise levels, and dynamic characteristics of the data within different time frames can all impact a model’s predictive capabilities. Considering this, we plan to adopt a mechanism–data fusion approach in future work, combining physical models with data-driven models to enhance model stability and generalizability and improve the reliability of predictions across different datasets and time frames.

3.5. Interpretability Analysis

To analyze the reasons behind the predictions of data-driven models, this paper selects the RF model, which has the highest prediction accuracy, as the focus of analysis and uses the SHAP algorithm to explain its prediction results.

(1): Global Interpretation Based on the SHAP Algorithm

Feature importance reflects the degree of influence each feature has on the prediction results during the model fitting process. The importance of features can be assessed through the average of the absolute values of SHAP. Specifically, features with a larger average of SHAP absolute values are more important, meaning these features significantly impact the model’s prediction of the GTV. Figure 9 displays the ranking of feature importance for the RF model.

Figure 9 shows that consumption volume, number of transactions, and personal vehicle data volume have the three largest average absolute SHAP values. This means that they have the greatest impact on the prediction results of the RF model, and their importance is correspondingly high. In contrast, the feature of whether it is a holiday has the smallest average absolute SHAP value, from which it can be inferred that its impact on the prediction results of the RF model is the smallest, and its importance is the lowest.
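The ranking rule used here (average the absolute SHAP values of each feature over all samples, then sort) can be written in a few lines. The SHAP matrix below is invented for illustration and covers only three of the features; the function name `mean_abs_shap` is likewise an assumption.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value across samples.

    shap_rows: one list of SHAP values per sample, ordered like feature_names.
    Returns (feature, importance) pairs sorted from most to least important.
    """
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)

feature_names = ["consumption volume", "number of transactions", "whether holiday"]

# Hypothetical SHAP values for three samples; NOT the values behind Figure 9.
shap_rows = [
    [300.0, -120.0, 5.0],
    [-250.0, 80.0, -3.0],
    [100.0, -40.0, 1.0],
]

ranking = mean_abs_shap(shap_rows, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Taking absolute values before averaging matters: a feature whose contributions are large but alternate in sign would otherwise average out to near zero and appear unimportant.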

Although Figure 9 reveals the degree of influence of each feature on the prediction results of the RF model, it does not clarify the direction of influence. To address this, this paper further draws Figure 10 to intuitively display the degree and direction of influence of each feature on the GTV prediction results of the RF model in the data-driven model.

In Figure 10, each marker represents a data sample, and when multiple points overlap, they are randomly offset in the vertical direction to distinguish them; the higher the concentration of points, the denser the data samples at that location. The left vertical axis displays the input features for predicting the GTV, and these features are sorted according to their importance, consistent with the ranking in Figure 9. The right vertical axis indicates the magnitude of feature values through the intensity of colors, transitioning from light blue to deep purple and then to red, reflecting the distribution of feature values from low to high. The horizontal axis shows the SHAP values of the features, with a reference line at SHAP value = 0: points with SHAP values greater than 0, located to the right of the reference line, raise the predicted GTV, whereas points with SHAP values less than 0, located to the left of the reference line, lower the predicted GTV.

Taking the first row of Figure 10 as an example, the feature of consumption volume ranks first, indicating that it has the most significant influence on the RF model’s prediction of the GTV in the data-driven model. Upon closer examination, it can be observed that the blue points of the consumption volume feature are primarily distributed to the left of the central axis (i.e., the SHAP value equals 0). In contrast, the red points are mainly located to the right of the central axis. As one moves from left to right, the color of the consumption volume feature gradually transitions from blue to purple and eventually to the red area. This trend in color change indicates that when the value of the consumption volume feature is low, its SHAP value is negative; however, as the value of the consumption volume feature increases, its SHAP value also gradually increases, eventually turning positive. This trend fully confirms the positive impact of the consumption volume feature on the RF model’s prediction results. That is, there is a positive correlation between consumption volume and GTV. In short, the higher the consumption volume, the higher the corresponding GTV.

Similarly, we can infer that in the RF model, features such as consumption volume, number of transactions, hourly consumption time, personal vehicle data volume, non-personal vehicle data volume, and whether it is a weekend contribute positively to the model’s prediction of the GTV, indicating that these factors are positively correlated with the GTV. That is, an increase in these features is usually accompanied by an increase in the GTV. In contrast, the season and whether it is a holiday contribute negatively to the model’s prediction results, indicating a negative correlation between these factors and the GTV. These conclusions are consistent with the results of some studies. Therefore, the feature analysis obtained through the SHAP algorithm aligns with the actual situation, supporting the effectiveness and accuracy of the RF model constructed in this paper.
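Reading the direction of influence from the summary plot amounts to checking whether samples with high feature values tend to have higher SHAP values than samples with low feature values. One simple numerical stand-in for that visual check is sketched below; the median split, the helper name `shap_direction`, and the sample values are assumptions for illustration, not the procedure used in the paper.

```python
from statistics import mean, median

def shap_direction(feature_values, shap_values):
    """Compare the mean SHAP value of high-valued samples with that of low-valued ones."""
    cut = median(feature_values)
    low = [s for v, s in zip(feature_values, shap_values) if v <= cut]
    high = [s for v, s in zip(feature_values, shap_values) if v > cut]
    if not low or not high:
        return "flat"   # all samples fall on one side of the median
    gap = mean(high) - mean(low)
    return "positive" if gap > 0 else "negative" if gap < 0 else "flat"

# Hypothetical feature values and SHAP values for four samples.
volume_values = [100.0, 200.0, 300.0, 400.0]
volume_shap = [-150.0, -40.0, 60.0, 180.0]

holiday_values = [0, 0, 0, 1]
holiday_shap = [2.0, 1.0, 3.0, -6.0]

print("consumption volume:", shap_direction(volume_values, volume_shap))
print("whether holiday:", shap_direction(holiday_values, holiday_shap))
```

A check like this summarizes only the overall trend; the summary plot remains more informative because it also shows how widely the SHAP values are spread at each feature level.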

(2): Local Explanation Based on the SHAP Algorithm

This paper randomly selects a sample. Figure 11 displays the waterfall plot of the explanation for this sample based on the SHAP algorithm, where the vertical axis represents the feature values, and the horizontal axis represents the SHAP values. The SHAP values measure the impact of each feature on the sample’s prediction result, with positive values indicated by red arrows, meaning the feature has a positive gain on the prediction result; negative values are indicated by blue arrows, meaning they have a negative gain. Such visualization helps us understand how the model makes predictions based on the input features.

In Figure 11, we can observe that the model’s final prediction value, X, is 1236.586, while the model’s average prediction value, which is the expectation of all instances’ prediction values, is 807.958. The figure lists the features that affect the prediction value and their corresponding SHAP values, which indicate each feature’s contribution to the prediction result.

Specifically, for this sample, the number of transactions and personal vehicle data volume have a significant positive impact on the prediction value, increasing it by 208.14 and 168.22, respectively. Hourly consumption time also positively affects the prediction value, adding 82.37. However, consumption volume negatively affects the prediction value, reducing it by 26.62. Non-personal vehicle data volume also has a slight negative effect, decreasing the prediction value by 9.64. The weekend feature slightly increases the prediction value by 6.38, whereas the holiday feature slightly decreases it by 4.1. The season feature also has a slight positive impact on the prediction value, adding 3.89.
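SHAP values are additive: the average prediction plus the sum of the per-feature contributions should reproduce the prediction for the sample. The short check below applies this to the figures reported for Figure 11; only the variable names are introduced here.

```python
# Values as reported in the text for the sample in Figure 11.
base_value = 807.958          # average prediction over all instances
reported_prediction = 1236.586

contributions = {
    "number of transactions": 208.14,
    "personal vehicle data volume": 168.22,
    "hourly consumption time": 82.37,
    "consumption volume": -26.62,
    "non-personal vehicle data volume": -9.64,
    "whether weekend": 6.38,
    "whether holiday": -4.10,
    "season": 3.89,
}

# Additivity: prediction = base value + sum of SHAP values.
total_contribution = sum(contributions.values())
reconstructed = base_value + total_contribution
difference = reconstructed - reported_prediction

print(round(total_contribution, 2), round(reconstructed, 3), round(difference, 3))
```

The contributions sum to 428.64, giving a reconstructed prediction of about 1236.598. This differs from the reported 1236.586 by roughly 0.012, which is consistent with the contributions being displayed to two decimal places.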

(3): Comparison of Feature Importance between SHAP Algorithm and RF Algorithm

We leverage the built-in interpretability feature of the RF algorithm to output feature importance, and compare it with the results from the SHAP algorithm.

As shown in Table 4, the two methods demonstrate a high degree of consistency in the ranking of feature importance. According to the built-in feature importance of the RF algorithm, the three most important features are Consumption Volume, Number of Consumption Occasions, and Personal Consumption Vehicle Data Volume. Conversely, the three least important features are Whether Weekend, Season, and Whether Holiday. These findings align with the results obtained from the SHAP algorithm. This consistency indicates that both the RF and SHAP algorithms yield similar assessments of feature importance, further supporting the reliability of the analysis.

The analysis using SHAP algorithms further confirmed that consumption volume and frequency of purchases have a significant impact on the prediction results. The direction of their influence aligns with real-world scenarios, thereby validating the effectiveness and accuracy of the model. The SHAP analysis highlighted that consumption volume and frequency are key drivers of revenue. This finding not only reveals the critical factors influencing revenue but also provides a clear direction for practical operations. For example, intensifying marketing activities on non-working days can capitalize on consumers’ free time and their willingness to spend during these periods. This approach can attract more potential customers, increasing both purchase frequency and market share. Moreover, the model’s ability to forecast future energy demand and supply based on historical data and current trends enables policymakers to proactively develop rational energy policies to address potential shortages or surpluses. The efficiency and accuracy of the Random Forest model in energy forecasting make it a powerful tool for policy formulation and energy management.

4. Conclusions

The transition towards sustainable urban development and carbon neutrality has placed multi-energy supply stations (MESS) at the forefront of energy infrastructure innovation. Previous studies have largely focused on single types of fuel, failing to capture the full revenue picture of energy supply stations and falling short of meeting the needs of modern integrated energy stations. Moreover, these studies have shortcomings in feature selection and model interpretability, lacking enough analysis of feature importance, which undermines the transparency and credibility of the models. This study addresses a critical gap in revenue forecasting for MESS by proposing a data-driven framework that integrates machine learning models with interpretable AI techniques. Our findings demonstrate that the random forest (RF) model achieves exceptional accuracy (R² = 0.98) in predicting hourly gross transaction values, with low error metrics (MAE, MSE, RMSE), highlighting its potential for real-world application. The analysis using SHAP algorithms further confirmed that consumption volume and frequency of purchases have a significant impact on the prediction results. The direction of their influence aligns with real-world scenarios, thereby validating the effectiveness and accuracy of the model.

This study focuses on the challenge of revenue forecasting for Multi-Energy Supply Stations (MESS) and proposes a data-driven framework that integrates machine learning models with interpretable AI technologies. However, because the current data come from a single site, the generalizability of our findings may be limited. In future research, we therefore plan to obtain data from additional sites for further validation and analysis. We will also expand the feature set by incorporating external factors such as macroeconomic conditions, policy changes, and broader market dynamics, and we will employ time-series validation techniques to assess model performance more thoroughly. These efforts are intended to enhance the robustness and accuracy of the forecasting framework.

Author Contributions

Conceptualization, Z.Z., M.W., J.W., S.D. and W.W.; methodology, Z.Z., M.W., J.W., X.C. and W.W.; software, Z.Z., M.W., J.W. and Z.W.; validation, Z.Z., X.C., S.D. and W.W.; formal analysis, M.W., J.W., X.C., S.D. and Z.W.; investigation, Z.Z., M.W., J.W. and W.W.; resources, H.L. and W.W.; data curation, M.W., X.C., S.D., Z.W. and H.L.; writing (original draft preparation), Z.Z., M.W., J.W. and W.W.; writing (review and editing), Z.Z., M.W., J.W., X.C., S.D. and H.L.; visualization, Z.Z., M.W., S.D., Z.W. and W.W.; supervision, W.W.; project administration, W.W.; funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ23E040004, the Natural Science Foundation of Chongqing, China (CSTB2023NSCQ-MSX0050), and the Project of Sinopec Sales Co., Ltd. (32850024-23-ZC0607-0001).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Huanying Liu was employed by the company Sinopec Sales Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Nomenclature

The following abbreviations are used in this manuscript:

BP	Back Propagation
CNN	Convolutional Neural Networks
DF	Deep Forest
DT	Decision Tree
DTR	Decision Tree Regression
DWT	Discrete Wavelet Transformation
EV	Explained Variance
GBDT	Gradient Boosting Decision Tree
GRA	Grey Relational Analysis
GTV	Gross Transaction Value
GM	Grey Models
KNN	K-Nearest Neighbor
LightGBM	Light Gradient Boosting Machine
LR	Linear Regression
LSTM	Long Short-Term Memory
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
ML	Machine Learning
MLP	Multilayer Perceptron
MSE	Mean Squared Error
R²	R-squared
RF	Random Forest
RMSE	Root Mean Squared Error
RNN	Recurrent Neural Network
SHAP	Shapley Additive Explanations
SVM	Support Vector Machine
SVR	Support Vector Regression
XGBoost	eXtreme Gradient Boosting

References

1. Cen, X.; Chen, Z.; Chen, H.; Ding, C.; Ding, B.; Li, F.; Lou, F.; Zhu, Z.; Zhang, H.; Hong, B. User repurchase behavior prediction for integrated energy supply stations based on the user profiling method. Energy 2024, 286, 129625.
2. Han, Y.; Meng, J.; Luo, Z. Multi-agent deep reinforcement learning for blockchain-based energy trading in decentralized electric vehicle charger-sharing networks. Electronics 2024, 13, 4235.
3. Yiğit, Y.; Karabatak, M. A deep reinforcement learning-based speed optimization system to reduce fuel consumption and emissions for smart cities. Appl. Sci. 2025, 15, 1545.
4. Zeng, B.; Wang, W.; Zhang, W.; Wang, Y.; Tang, C.; Wang, J. Optimal configuration planning of vehicle sharing station-based electro-hydrogen micro-energy systems for transportation decarbonization. J. Clean. Prod. 2023, 387, 135906.
5. Jia, J.; Li, H.; Wu, D.; Guo, J.; Jiang, L.; Fan, Z. Multi-objective optimization study of regional integrated energy systems coupled with renewable energy, energy storage, and inter-station energy sharing. Renew. Energy 2024, 225, 120328.
6. Petroșanu, D.M.; Pîrjan, A.; Căruţaşu, G.; Tăbușcă, A.; Zirra, D.L.; Perju-Mitran, A. E-commerce sales revenues forecasting by means of dynamically designing, developing and validating a directed acyclic graph (DAG) network for deep learning. Electronics 2022, 11, 2940.
7. Bilal, G.A.; Al-Saadi, M.K.; Al-Sultany, G.A.; Al-Maliki, W.A.K. Optimal operation of CCHP smart distribution grid with integration of renewable energy. Appl. Sci. 2025, 15, 1407.
8. Shahzad, U. Environmental taxes, energy consumption, and environmental quality: Theoretical survey with policy implications. Environ. Sci. Pollut. Res. Int. 2020, 27, 24848–24862.
9. Yasmeen, R.; Zhang, X.; Tao, R.; Shah, W.U.H. The impact of green technology, environmental tax and natural resources on energy efficiency and productivity: Perspective of OECD Rule of Law. Energy Rep. 2023, 9, 1308–1319.
10. Yu, H.; Chen, W.; Wang, X.; Delina, L.; Cheng, Z.; Zhang, L. The impact of the energy conservation law on enterprise energy efficiency: Quasi-experimental evidence from Chinese firms. Energy Econ. 2025, 143, 108252.
11. Guo, J.; Hao, H.; Wang, M.; Liu, Z. An empirical study on consumers’ willingness to buy agricultural products online and its influencing factors. J. Clean. Prod. 2022, 336, 130403.
12. Agboola, E.; Chowdhury, R.; Yang, B. Oil price fluctuations and their impact on oil-exporting emerging economies. Econ. Model. 2024, 132, 106665.
13. Semenza, J.C.; Herbst, S.; Rechenburg, A.; Suk, J.E.; Höser, C.; Schreiber, C.; Kistemann, T. Climate change impact assessment of food-and waterborne diseases. Crit. Rev. Environ. Sci. Technol. 2012, 42, 857–890.
14. Yang, Q.; Huang, D.X. Gasoline sales prediction and sample data design. Comput. Eng. Appl. 2007, 43, 210–213,219.
15. Hao, Q. Fueling rule analysis and fuel charge prediction of jet fuel in airport. Oil Gas Storage Transp. 2016, 35, 315–320.
16. Zhang, C.; Qiu, T. Gas station sales forecast based on decision tree integration model. Comput. Appl. Chem. 2019, 36, 615–619.
17. Deng, T.; Zhao, Y.; Wang, S.; Yu, H. Sales forecasting based on LightGBM. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15–17 January 2021; pp. 383–386.
18. Cui, B.; Ye, Z.; Zhao, H.; Renqing, Z.; Meng, L.; Yang, Y. Used car price prediction based on the iterative framework of XGBoost+ LightGBM. Electronics 2022, 11, 2932.
19. Kumar, A.; Kabra, G.; Mussada, E.K.; Dash, M.K.; Rana, P.S. Combined artificial bee colony algorithm and machine learning techniques for prediction of online consumer repurchase intention. Neural Comput. Appl. 2019, 31, 877–890.
20. Wu, J.; Shi, L.; Yang, L.; Niu, X.; Li, Y.; Cui, X.; Tsai, S.B.; Zhang, Y. User value identification based on improved RFM model and K-means++ algorithm for complex data analysis. Wirel. Commun. Mob. Comput. 2021, 2021, 9982484.
21. Chen, T.; Yin, H.; Chen, H.; Wang, H.; Zhou, X.; Li, X. Online sales prediction via trend alignment-based multitask recurrent neural networks. Knowl. Inf. Syst. 2020, 62, 2139–2167.
22. Zhang, W.; Wang, M. An improved deep forest model for prediction of e-commerce consumers’ repurchase behavior. PLoS ONE 2021, 16, e0255906.
23. Liu, B.; Song, C.; Liang, X.; Lai, M.; Yu, Z.; Ji, J. Regional differences in China’s electric vehicle sales forecasting: Under supply-demand policy scenarios. Energy Policy 2023, 177, 113554.
24. Li, W.; Becker, D.M. Day-ahead electricity price prediction applying hybrid models of LSTM-based deep learning methods and feature selection algorithms under consideration of market coupling. Energy 2021, 237, 121543.
25. Costa, V.G.; Pedreira, C.E. Recent advances in decision trees: An updated survey. Artif. Intell. Rev. 2023, 56, 4765–4800.
26. Kim, C.; Kim, S. A decision tree-based pattern classification and regression for a mobility support scheme in industrial wireless sensor networks. Appl. Sci. 2025, 15, 1408.
27. Hynes, N.R.J.; Sankaranarayanan, R.; Sujana, J.A.J. A decision tree approach for energy efficient friction riveting of polymer/metal multi-material lightweight structures. J. Clean. Prod. 2021, 292, 125317.
28. Hong, B.Y.; Liu, S.N.; Li, X.P.; Fan, D.; Ji, S.P.; Chen, S.H.; Li, C.C.; Gong, J. A liquid loading prediction method of gas pipeline based on machine learning. Pet. Sci. 2022, 19, 3004–3015.
29. Ma, Y.; He, L.; Zheng, J. STL-DCSInformer-ETS: A hybrid model for medium- and long-term sales forecasting of fast-moving consumer goods. Appl. Sci. 2025, 15, 1516.
30. Wang, Z.; Wu, F.; Hao, N.; Wang, T.; Cao, N.; Wang, X. The combined machine learning model SMOTER-GA-RF for methane yield prediction during anaerobic digestion of straw lignocellulose based on random forest regression. J. Clean. Prod. 2024, 466, 142909.
31. Pham, Q.B.; Tran, D.A.; Ha, N.T.; Islam, A.R.M.T.; Salam, R. Random forest and nature-inspired algorithms for mapping groundwater nitrate concentration in a coastal multi-layer aquifer system. J. Clean. Prod. 2022, 343, 130900.
32. Cai, W.; Wen, X.; Li, C.; Shao, J.; Xu, J. Predicting the energy consumption in buildings using the optimized support vector regression model. Energy 2023, 273, 127188.
33. Btoush, E.; Zhou, X.; Gururajan, R.; Chan, K.C.; Alsodi, O. Achieving excellence in cyber fraud detection: A hybrid ML+DL ensemble approach for credit cards. Appl. Sci. 2025, 15, 1081.
34. Li, Q.; Li, D.; Zhao, K.; Wang, L.; Wang, K. State of health estimation of lithium-ion battery based on improved ant lion optimization and support vector regression. J. Energy Storage 2022, 50, 104215.
35. Amiri, M.K.; Zaferani, S.P.G.; Emami, M.R.S.; Zahmatkesh, S.; Pourhanasa, R.; Namaghi, S.S.; Klemeš, J.J.; Bokhari, A.; Hajiaghaei-Keshteli, M. Multi-objective optimization of thermophysical properties GO powders-DW/EG Nf by RSM, NSGA-II, ANN, MLP and ML. Energy 2023, 280, 128176.
36. Chen, D.; Chui, C.K.; Lee, P.S. Adaptive physically consistent neural networks for data center thermal dynamics modeling. Appl. Energy 2025, 377, 124637.
37. Qi, X.; Wang, S.; Fang, C.; Jia, J.; Lin, L.; Yuan, T. Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants. Redox Biol. 2024, 79, 103470.
38. Luo, Z.; Qi, X.; Sun, C.; Dong, Q.; Gu, J.; Gao, X. Investigation of influential variations among variables in daylighting glare metrics using machine learning and SHAP. Build. Environ. 2024, 254, 111394.
39. Yuan, Y.; Guo, W.; Tang, S.; Zhang, J. Effects of patterns of urban green-blue landscape on carbon sequestration using XGBoost-SHAP model. J. Clean. Prod. 2024, 476, 143640.
40. Zhao, J.; Zhang, Q.; Tian, L. Market revenue prediction and error analysis of products based on fuzzy logic and artificial intelligence algorithms. J. Ambient. Intell. Smart Environ. 2020, 11, 4011–4018.
41. Weng, C.H. Revenue prediction by mining frequent itemsets with customer analysis. Eng. Appl. Artif. Intell. 2017, 63, 85–97.
42. Golderzahi, V.; Pao, H.K.K. Revenue forecasting in smart retail based on customer clustering analysis. Internet Things 2024, 27, 101286.
43. Asgarkhani, N.; Kazemi, F.; Jankowski, R. Machine learning-based prediction of residual drift and seismic risk assessment of steel moment-resisting frames considering soil-structure interaction. Comput. Struct. 2023, 289, 107181.

Figure 1. Interpretable revenue forecasting methods for multi-energy supply stations based on data-driven models.

Figure 2. The structure of DT.

Figure 3. The structure of RF.

Figure 4. Histograms of parameter distributions.

Figure 5. PCC heatmap.

Figure 6. The predictive performance of data-driven models.

Figure 7. Test set results of the data-driven model.

Figure 8. RF Actual vs. predicted values.

Figure 9. SHAP feature importance ranking chart.

Figure 10. SHAP summary plot.

Figure 11. SHAP-based single-sample explanation waterfall plot.

Table 1. Research on energy supply station-related predictions using ML techniques.

Reference	Feature Selection	Optimization Algorithm	ML Techniques	Interpretability Analysis	Multi-Energy Analysis
Yang et al. [14]	√	×	BP, LR	×	×
Hao [15]	√	×	LR, GM, BP	×	×
Zhang et al. [16]	√	×	DT, RF, GBDT	×	×
Deng et al. [17]	√	×	LightGBM, LR, SVR	×	×
Cui et al. [18]	√	×	LightGBM, XGBoost	×	×
Kumar et al. [19]	√	√	DT, AdaBoost, RF, SVM	×	×
Wu et al. [20]	√	√	RFM, K-means	×	×
Chen et al. [21]	√	×	RNN, LSTM, TADA	×	×
Zhang and Wang [22]	√	√	DF, KNN, LR, RF, CNN	×	×
Liu et al. [23]	×	√	GRA, DWT, LSTM, K-means	√	×
Li et al. [24]	√	√	LSTM, SHAP, RNN, CNN	√	×
This Paper	√	√	DT, RF, SVR, MLP, SHAP	√	√

Table 2. The specific meanings of each feature.

Feature	Unit	Meaning	Continuity
Hourly Consumption Time	Hours	Divides a day into 24 h	Continuous
Consumption Volume	Yuan	Total consumption within an hour	Discrete
Number of Consumption Occasions	Times	Number of consumption behaviors occurring within an hour	Discrete
Personal Consumption Vehicle Data Volume	Vehicles	Number of vehicles for personal consumption within an hour	Discrete
Non-Personal Consumption Vehicle Data Volume	Vehicles	Number of vehicles for non-personal consumption within an hour	Discrete
Whether Holiday	0 or 1	1 indicates a holiday, 0 indicates not a holiday	Discrete
Whether Weekend	0 or 1	1 indicates a weekend, 0 indicates not a weekend	Discrete
Season	0–3	Season divided according to the date: 0 represents spring, 1 represents summer, 2 represents autumn, and 3 represents winter	Discrete
Total Transaction Amount	Yuan	Total amount of all transactions within an hour	Discrete
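
The calendar features in Table 2 (hour, holiday flag, weekend flag, season) can be derived from a transaction timestamp. The following is a minimal sketch of one way to do this, not the authors' preprocessing code. The holiday set, the month-to-season mapping (meteorological seasons), the 0-23 hour convention, and the names `HOLIDAYS`, `SEASON_BY_MONTH`, and `temporal_features` are all assumptions introduced for illustration; only the 0/1 flags and the 0-3 season codes come from Table 2.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; the source does not list the dates it used.
HOLIDAYS = {date(2024, 1, 1), date(2024, 10, 1)}

# Assumed month-to-season mapping (meteorological seasons).
# Codes follow Table 2: 0 = spring, 1 = summer, 2 = autumn, 3 = winter.
SEASON_BY_MONTH = {
    3: 0, 4: 0, 5: 0,
    6: 1, 7: 1, 8: 1,
    9: 2, 10: 2, 11: 2,
    12: 3, 1: 3, 2: 3,
}


def temporal_features(ts: datetime) -> dict:
    """Derive the four calendar features of Table 2 from a timestamp."""
    return {
        "hour": ts.hour,                           # Hourly Consumption Time (0-23 assumed)
        "is_holiday": int(ts.date() in HOLIDAYS),  # Whether Holiday
        "is_weekend": int(ts.weekday() >= 5),      # Whether Weekend (Saturday=5, Sunday=6)
        "season": SEASON_BY_MONTH[ts.month],       # Season
    }


example = temporal_features(datetime(2024, 10, 1, 14, 30))
print(example)
```

Hourly aggregates such as the number of consumption occasions would then be computed by grouping transactions on the date and `hour` fields.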

Table 3. The details of hyperparameter tuning using the GS algorithm.

Model	Hyper-Parameter	Search Space	Optimal Hyper-Parameter
RF	n_estimators	1~200	181
	max_depth	1~21	13
	max_features	[1,5,10,15,20, ‘auto’, ‘sqrt’, ‘log2’, None]	sqrt
	min_samples_leaf	1~20	1
	min_samples_split	1~20	2
SVR	kernel	[‘linear’, ‘rbf’, ‘poly’, ‘sigmoid’]	poly
	C	[0.01,0.1,0.2,0.5,0.8,1,5,10,25,50,75,100,125]	100
	gamma	[0.01,0.05,0.1,0.2,0.5,0.8,1]	1
	epsilon	[0.01,0.05,0.1,0.2,0.5,0.8,1]	0.01
DT	max_depth	1~20	14
	max_features	[‘auto’, ‘sqrt’, ‘log2’, None, 5]	sqrt
	min_samples_leaf	1~20	6
	min_samples_split	2~20	1
MLP	alpha	[0.01, 0.05, 0.1]	0.01
	hidden_layer_sizes	1~50	104
	max_iter	1~1001	101
	activation	[‘relu’, ‘tanh’, ‘logistic’]	relu
	solver	[‘lbfgs’, ‘sgd’, ‘adam’]	lbfgs
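
The grid search (GS) summarized in Table 3 can be sketched with scikit-learn's `GridSearchCV`, whose parameter names match those in the table. This is an illustration under stated assumptions rather than the authors' tuning script: the data are synthetic, the grid is a deliberately small subset of the RF search space, and the scoring metric (`r2`) and fold count (`cv=3`) are assumed because this section does not state them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 8 input features, one continuous target.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.05, size=200)

# Small subset of the RF search space in Table 3, chosen to keep the run short.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, 13],
    "max_features": ["sqrt", None],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    scoring="r2",  # assumed selection metric
    cv=3,          # assumed number of cross-validation folds
)
search.fit(X, y)  # fits every combination in the grid on every fold

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

Two practical points when reproducing such a search: scikit-learn requires an integer `min_samples_split` to be at least 2, and recent scikit-learn releases no longer accept `'auto'` for `max_features` in tree-based regressors, so those grid entries may need adjusting depending on the installed version.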

Table 4. Comparison of feature importance from the built-in RF measure and from SHAP.

Rank	Feature (RF built-in)	Importance Score (RF built-in)	Feature (SHAP)	Importance Score (SHAP)
1	Consumption Volume	0.468331	Consumption Volume	0.522293
2	Number of Consumption Occasions	0.199677	Number of Consumption Occasions	0.213521
3	Personal Consumption Vehicle Data Volume	0.186803	Personal Consumption Vehicle Data Volume	0.210576
4	Hourly Consumption Time	0.022131	Hourly Consumption Time	0.035886
5	Non-Personal Consumption Vehicle Data Volume	0.012713	Non-Personal Consumption Vehicle Data Volume	0.008716
6	Whether Weekend	0.006382	Whether Weekend	0.003901
7	Season	0.005971	Season	0.003147
8	Whether Holiday	0.003965	Whether Holiday	0.001961
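
The SHAP scores in Table 4 aggregate per-sample Shapley attributions of the kind shown in the waterfall plot of Figure 11. To make the underlying quantity concrete, the sketch below computes exact Shapley values for a single sample by enumerating all feature coalitions, with absent features replaced by a baseline value. It is a toy illustration, not the procedure used in the paper: `toy_model`, its coefficients, the sample `x`, and the `baseline` are hypothetical, and the source does not state which SHAP explainer was used. The SHAP library relies on much faster algorithms for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy revenue model: volume, number of transactions, weekend flag.
def toy_model(f):
    volume, count, weekend = f
    return 7.5 * volume + 2.0 * count + 0.1 * volume * weekend


x = [100.0, 12.0, 1.0]       # sample to explain
baseline = [60.0, 8.0, 0.0]  # reference point, for example feature means
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])
```

The attributions sum to `toy_model(x) - toy_model(baseline)`, which is the additivity property that a waterfall plot visualizes. Averaging the absolute attributions over many samples gives a global ranking comparable in spirit to the SHAP column of Table 4.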

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
