Article

Contextual Background Estimation for Explainable AI in Temperature Prediction

1 Faculty of Science and Technology, University of Silesia in Katowice, Bedzinska 39, 41-200 Sosnowiec, Poland
2 Institute of Computer Science, Faculty of Science and Technology, University of Silesia in Katowice, Bedzinska 39, 41-200 Sosnowiec, Poland
3 Independent Researcher, 40-069 Katowice, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1057; https://doi.org/10.3390/app15031057
Submission received: 31 December 2024 / Revised: 14 January 2025 / Accepted: 17 January 2025 / Published: 22 January 2025

Abstract

Accurate weather prediction and electrical load modeling are critical for optimizing energy systems and mitigating environmental impacts. This study explores the integration of the novel Mean Background Method and Background Estimation Method with Explainable Artificial Intelligence (XAI), with the aim of enhancing the evaluation and understanding of time-series models in these domains. Electrical load and temperature prediction are regression-based problems. Some XAI methods, such as SHAP, require the base value of the model to be used as the background for providing an explanation. However, in contextualized situations, the default base value is not always the best choice, and the selection of the background can significantly affect the corresponding Shapley values. This paper presents two innovative XAI methods designed to provide robust context-aware explanations for regression and time-series problems, addressing critical gaps in model interpretability. They can be used to improve background selection, support more conscious decisions, and improve the understanding of predictions made by models that use time-series data.

1. Introduction

1.1. Significance of Machine Learning in Power Systems

Over the past few decades, technology has rapidly advanced in areas such as smart grids, renewable energy sources, and distributed generation [1,2,3,4,5]. Forecasting has become a critical focus in these fields, gaining significant attention for its potential impact [6,7,8]. Machine learning (ML) models have been introduced to enhance system resilience, improve stability, and achieve financial savings [9,10,11,12]. Moreover, these models have been employed to analyze complex phenomena in power systems, leveraging their capability to capture non-linear relationships [13].
There are numerous ML algorithms that can be used for different purposes and different datasets. A few of the most notable branches of ML are deep learning, classification [14], regression [15], and clustering [16]. They have many applications in computer vision, safety [17], forecasting [18], and disease detection [19], among many others. All of these branches are used in power systems; however, regression-based forecasting problems are the most fundamental because of their capability to reduce contingencies and enable better planning.
A growing body of research underscores the critical importance of explaining ML models [20,21], with Explainable Artificial Intelligence (XAI) gaining increasing prominence. This is particularly crucial for ML models used in power systems, where a clear and comprehensive understanding is necessary before deploying them in sensitive and fragile networks [22]. Amid the pressing challenges in this field, XAI serves as a vital tool, providing valuable insights and practical information about the factors influencing various phenomena in power systems.
XAI methods are being developed at a rapid pace. They can be divided into several categories depending on the purpose and application. The categorization of some of the well-known methods is visualized in Figure 1. The main distinctions are as follows:
1.
Model-agnostic or model-specific—If the method can be applied to any type of ML model, then the method is model-agnostic. On the other hand, if the method is designed for one type of ML model, then it is model-specific. Examples of model-specific methods include Integrated Gradients [23] and Layer-Wise Relevance Propagation (LRP) [24], the latter of which is designed to explain the decisions made by neural networks. Model-agnostic methods include ELI5 [25], Kernel SHAP [26], and LIME [27].
2.
Local or global—Global XAI methods can explain the entire scope of the decisions the model can make, even for non-existing cases. In other words, they explain the behavior of the entire model. In contrast, local XAI methods explain only a particular decision made by the model. Global methods include, for example, Global Surrogate Models [28], Permutation Feature Importance [29], and the Partial Dependence Plot [30]. LIME and Kernel SHAP are examples of local methods.
3.
Intrinsic (ante hoc) or post hoc—Post hoc methods are those that can be applied after model training to explain its behavior. In contrast, a model is intrinsically explainable if it is explainable “by design”. Simple models such as linear models [31] and decision trees [32] can be explained without any external method [33]. Grad-CAM [34], LIME, and Kernel SHAP are examples of post hoc methods.
The method used far more often than others in power systems is Kernel SHAP [20,26,35]. It is a post hoc, local, and model-agnostic method. This article presents results obtained with this method. Kernel SHAP is an approximation method; its aim is to approximate Shapley values.
A Shapley value, calculated for feature i, is defined as follows. Let a sampled data point x from the dataset consist of i_max features. A coalition S is any subset of the features of x, ranging from the set of all features {x_1, …, x_i_max}, which is equal to x, down to the empty set {}. x_S denotes the data point x in which only the features in S have been retained. The Shapley value is calculated with the following equation:

$$\varphi_i = \sum_{S \,:\, x_i \notin S} \alpha_S \cdot \left[ f\!\left(x_{S \cup \{i\}}\right) - f\!\left(x_S\right) \right] \qquad (1)$$

where $\alpha_S = |S|!\,(i_{max} - |S| - 1)!\,/\,i_{max}!$. This equation tests different subsets of the features that the model uses. Each time a feature is not included in a subset, a certain value is used to replace it. Usually, the mean value of each feature, calculated over the entire dataset, is used as the replacement.
Calculating Shapley values is computationally expensive, so Kernel SHAP is used as an approximator. Kernel SHAP is not the only XAI method that can be used with the methods presented in this article; the two proposed methods could be applied with any other XAI method that uses base values and provides an importance for each feature. For each feature x_i, Kernel SHAP estimates its importance as the Shapley value φ_i, quantifying the contribution of the feature to the model’s prediction. So, if there are i_max features, there are also i_max Shapley values. Shapley values are expressed in the unit of the target; if the model predicts temperature, then each Shapley value provides importance in the unit of temperature.
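As a brief illustration of the role the background plays in Kernel SHAP, the following sketch uses a toy linear regression model on synthetic data (not code from the accompanying repository) to show how a background of randomly selected samples determines the base value and how the Shapley values sum back to the prediction:

```python
import numpy as np
import shap
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                         # five toy features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)
model = LinearRegression().fit(X, y)

# Default-style background: 100 randomly selected samples, as suggested in the SHAP documentation.
background = X[rng.choice(len(X), size=100, replace=False)]

explainer = shap.KernelExplainer(model.predict, background)
phi = explainer.shap_values(X[:1])                    # one Shapley value per feature, in target units
y_b = explainer.expected_value                        # base value implied by the chosen background
y_e = model.predict(X[:1])[0]
print(abs(y_b + phi[0].sum() - y_e))                  # close to 0: base value + Shapley values = prediction
```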
XAI could be employed to explain the prediction of electrical load, one of the most fundamental challenges in the field of power systems [36,37,38]. It would be important to understand what exactly defines and influences power demand, such as how much it depends on temperature, humidity, or other meteorological features, and how much on geography, seasonality, or changes in energy prices. Understanding the importance and dependency of the features would also make the models safer to use [39,40,41]. There are numerous statistical and machine learning approaches that lead to more accurate predictions of the electrical load [42,43]. Electrical load prediction is a regression-based problem that is usually divided into four types: very short-term load forecasting (up to one day), short-term load forecasting (up to two weeks), medium-term load forecasting (up to three years), and long-term load forecasting (over three years) [44]. Models that predict electrical load use time-series sequences: based on several steps from the past, they forecast several steps forward. Recurrent neural networks, such as Long Short-Term Memory (LSTM) neural networks, are often employed to predict power demand [45]. Meteorological data are the most important data that can be used to predict electrical load, notably temperature, which is highly correlated with electrical load [46]. Building on the above, this article focuses on temperature and meteorological data.

1.2. Limits of XAI

As several authors have observed, despite their utility, XAI methods have notable limitations. For instance, many methods struggle to provide meaningful explanations for correlated features [47,48]. Additionally, these methods can be computationally expensive, and the approximating white-box models they rely on may sometimes produce misleading results [49]. As is explained later, of particular interest to this article is the issue of fluctuations in Shapley values, as highlighted in [50]. Another significant challenge, especially for regression-based problems, lies in the selection of two important elements of contextualized XAI: the base value and the background [51].

1.2.1. Base Value

The user of the model f can utilize it to make a certain prediction y_e. Then, if the user wants to understand why the model f made this particular prediction, an explanation is required. To obtain this explanation, one can use a prominent local XAI method called Kernel SHAP. This method uses a base value y_b to provide explanations by measuring the distance between this base value y_b and the given prediction y_e that requires explanation. This process is schematically illustrated in Figure 2. For example, if a model predicts the price of real estate based on five features x = [x_1, …, x_5] and provides a prediction of f(x) = y_e = 72,000 USD, Kernel SHAP would generate an explanation as the importance of each feature. It would indicate how much each feature contributed to moving the prediction from y_b = 43,000 USD toward 72,000 USD or away from it. For each feature x_i, Kernel SHAP provides a Shapley value φ_i in the unit of the target. In Figure 2, the Shapley values are equal to φ_1 = 19,000 USD, φ_2 = 7000 USD, φ_3 = 4000 USD, φ_4 = 5000 USD, and φ_5 = 10,000 USD for the respective features.
It should also be pointed out that the sum of the base value y_b and all the Shapley values φ_i is equal to the prediction y_e that the user wants to explain:
$$f(x) = y_e = y_b + \sum_{i=1}^{i_{max}} \varphi_i \qquad (2)$$
In [26], the base value is defined as the prediction of the model when none of the feature values for the current output are known. In practice, this base value is typically the mean prediction of the model [52,53]. The process involves calculating the mean of each feature across the entire dataset (x_mean = [x̄_1, …, x̄_i_max]), and the prediction based on these mean feature values becomes the base value (f(x_mean) = y_b). In the case of Figure 2, the base value is calculated from the mean of each feature x_1, …, x_5, which results in a prediction of y_b = 43,000 USD. These mean features are then used to compute the Shapley values for each feature, which explain how the features contribute to moving the base value to the final prediction of y_e = 72,000 USD.
As a result, the base value may fall outside the range of the model’s known predictions, as it is derived from mean feature values that might not be part of the dataset used to train or validate the model. It is also important to note that, in practice, the mean values are not calculated over the entire dataset. According to the official SHAP Documentation (https://shap.readthedocs.io/en/latest/index.html, accessed on 20 December 2024), the mean is typically calculated using 100 randomly selected samples. However, as shown in [50], this approach can lead to fluctuations in Shapley values, introducing uncertainties into the results. The authors explained their model by repeatedly selecting different sets of 100 (or more) random samples to compute the averages for the base value. For each set of 100 samples, they obtained a different base value, which resulted in different Shapley values that fluctuated around a certain mean, thereby introducing variability and uncertainty into the explanations.
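Continuing the toy example above, the fluctuation described in [50] can be reproduced by repeatedly drawing different sets of 100 background samples and observing how the resulting base value drifts (the variable names follow the previous sketch and are illustrative only):

```python
base_values = []
for seed in range(20):
    draw = np.random.default_rng(seed).choice(len(X), size=100, replace=False)
    base_values.append(float(shap.KernelExplainer(model.predict, X[draw]).expected_value))

# Each draw yields a slightly different base value, and hence slightly different Shapley values.
print(np.mean(base_values), np.std(base_values))
```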
The authors of [51] highlighted that the default base value is not useful in contextualized situations. For example,
  • If a real-estate agent wanted to understand why a model predicted a property price of 40,000 USD instead of the typical 70,000 USD for the district, a base value of 45,000 USD or 30,000 USD would be unhelpful.
  • If someone wanted to know why a model predicted an electrical load of 15 GW for next Wednesday, while it normally predicts 10 GW for Wednesdays at this time of year, a base value of 50 GW or 0 GW would provide no useful insight.
  • If a model predicted that tomorrow’s temperature would be 25 °C, whereas the average temperature for the week was 18 °C, a base value of 0 °C would not help explain this difference effectively.
As illustrated in Figure 3, all three examples mentioned above are regression-based problems with real-valued predictions. However, predicting real-estate prices can be performed using a single vector of features x, where the features might include distance from the city center (x_1), crime rate (x_2), age of the estate (x_3), and so on.
Electrical load and temperature predictions are similar in nature as both rely on time-series data to capture seasonality and generate forecasts. They use features such as temperature (x_1), humidity (x_2), wind speed (x_3), and so on, but the key difference from price prediction is that they require a matrix of features M that represents each feature over a time period instead of a single vector. This matrix, that is, a time-series sequence, has rows representing the different features and columns corresponding to time steps from the past (t_1 to t_j_max). For example, suppose that, to predict the temperature for the next day, the model needs to consider the temperature, humidity, wind speed, pressure, and precipitation from the past six days. In this case, the time-series sequence consists of five rows (i_max = 5) and six columns (j_max = 6), giving the dimensions of the time-series sequence as M_5×6.
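For illustration, such a sequence can be represented as a two-dimensional array; the feature names and values below are placeholders, not data from the studied sets:

```python
import numpy as np

# i_max = 5 features (rows) x j_max = 6 time steps (columns): one model input
features = ["temperature", "humidity", "wind speed", "pressure", "precipitation"]
M = np.random.default_rng(1).normal(size=(len(features), 6))
print(M.shape)  # (5, 6), i.e. M has dimensions i_max x j_max
```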

1.2.2. TimeSHAP and Background

There are several methods available for explaining predictions made by models using time-series data. One such method is timeSHAP [54], which builds upon the Kernel SHAP framework and incorporates base values to explain predictions. It is schematically described in Figure 4. The method is specifically designed to handle time-series data, taking into account the temporal dependencies within the data. However, when the model input is not a vector with i_max features but a matrix M with i_max rows and j_max columns (as is the case with time-series data), the input M_b used to predict the base value must also be an i_max × j_max matrix.
The timeSHAP framework consists of several steps, of which only the first two are used to obtain results in this article: (i) pruning and (ii) calculation of feature importance.
To optimize the calculation, timeSHAP in the first step iterates over the time steps t_j to find the threshold step that divides the input sequence into the most recent time steps, which influence the results, and the earlier ones, which do not change the prediction. Thereafter, the pruned (older) time steps are treated as a single separate feature for which a Shapley value is also calculated.
After pruning, timeSHAP creates a set S of coalitions with different combinations of features. Each feature that is not included in a coalition is replaced by the corresponding feature from a background M_b. The coalition consisting of all the features is equal to the time-series M_e for the prediction y_e that requires explanation, while the coalition consisting of the empty set (no features) is equal to the background M_b, in which all the features have been replaced with features from the background.
Currently, the process of background preparation involves calculating the mean values of each feature over the entire dataset, just as in the case of one-dimensional input. However, in the case of two-dimensional input, the mean values of the features ([x̄_1, …, x̄_i_max]) are not simply used as one vector. Instead, this vector is repeated j_max times (the length of the input sequence) to obtain the matrix M_b of size i_max × j_max, which fits the shape of the input. In this matrix, each column is the same, representing the mean values of the features.
The two-dimensional sequence used in this context to predict the base value is referred to as the background. The background is used to calculate the base value y b of the model.
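A minimal sketch of this default background construction is given below. It assumes a dataset tensor D of shape (number of sequences, i_max, j_max) and a model wrapper predict(); it reproduces the idea of repeating the per-feature means over all time steps, not any exact timeSHAP internals:

```python
import numpy as np

def default_background(D, predict):
    """Repeat the per-feature means over all j_max time steps and return (M_b, y_b)."""
    feature_means = D.mean(axis=(0, 2))                     # mean of each feature over the whole dataset
    M_b = np.tile(feature_means[:, None], (1, D.shape[2]))  # every column of M_b is identical
    # Depending on the model, M_b may need to be transposed to (time steps, features) before prediction.
    return M_b, predict(M_b)                                # y_b: base value predicted from the background
```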
With background M b and set of coalitions S , Kernel SHAP is applied within timeSHAP to approximate the Shapley values corresponding to each feature. Here, Shapley values for features (not time steps) are used. This means that, further in this article, the main focus will be on the importance of features, not on the exact time steps that were important within each feature.
By using this background, timeSHAP can explain how each feature at each time step contributes to the model’s prediction, again showing it as a shift from the base value towards a particular prediction or away from it.

1.3. The Aim of This Article

Numerous widely used XAI methods rely on base values or backgrounds to generate explanations (as discussed in Section 1.2.1 and Section 1.2.2). However, it is rarely addressed in the literature that base values and backgrounds can significantly influence the explanations provided by these methods, potentially causing misleading results. Furthermore, contextualized use of XAI is also barely discussed in the literature. Usually, approximated mean values of the entire dataset are used to calculate base values.
The purpose of this article is to develop and describe two new methods aimed at improving the understanding of explanations and their reliability. The first method is called the Mean Background Method and the second is called the Background Estimation Method. These methods can help to assess the appropriateness and reliability of explanations, therefore providing a better understanding of the model’s decisions. They can be applied particularly in sensitive domains such as power systems (Section 1.1). In this article, we present the results of tests conducted on models predicting temperature, a key indicator of electrical load (Section 3). The code used to obtain the results presented here is provided in the GitLab repository (https://gitlab.com/Barszo/timeseries_explainer, accessed on 19 January 2025). Information regarding reproducibility, with step-by-step instructions, is described in the README file, which provides detailed information about the use of the methods presented here. They are open and free to use under the MIT license. In short, the results presented here were produced with the following libraries: modified timeSHAP 1.0.4 (the required modifications are described in the README), SHAP 0.45.1, tensorflow 2.15.0, and Python 3.11.10. Additionally, the authors provide three classes written in Python to facilitate the process: BackgroundHandler, ModelsForMeteorology, and TimeseriesData. They are all available from GitLab.
Specifically, the Mean Background Method can be used to prepare mean backgrounds for given base or reference values, as opposed to the default approach proposed by the authors of Kernel SHAP or timeSHAP, which assumes the calculation of mean values and leaves no possibility of specifying a custom base value. The Mean Background Method is the first proposition of a more conscious background selection approach that meets the requirements of the user for contextualized problems. It does not modify either Kernel SHAP or timeSHAP; instead, it is meant to obtain backgrounds that can be used with these XAI methods.
The Background Estimation Method provides an even deeper understanding of the influence of backgrounds that could be used for methods such as Kernel SHAP or timeSHAP. It can reveal the sources of fluctuations in explanations and provide more insight into the way the model makes decisions by providing context. With this method, the user can individually select which background (if any) among the time-series sequences existing in the dataset meets their demands.
Both of these methods can also be used to demonstrate the limits of XAI methods in certain cases. If the XAI method is used to obtain importance and the proposed methods reveal large fluctuations caused by backgrounds, then it might mean that the given XAI method cannot be used to obtain the explanation and thus that the explanation cannot be trusted.

2. Materials and Methods

In this article, we will focus on situations where the user predicts the value y_e with a model f, so that f(M_e) = y_e, and then wants to explain the difference between y_e and a base value f(M_b) = y_b. The user of the model might want to know the answer to a contextualized question, like the ones presented above: “why the model predicted y_e and not y_b”. The user cannot determine the answer using a default baseline, which is created with the mean values of each feature. To address this, the existing baseline must be replaced with y_b, which is important for the user. We propose two new methods to help achieve viable results by estimating uncertainty, fluctuations, and the influence of dataset structure.

2.1. Description of Mean Background Method

One of the methods we propose assumes the creation of a mean background that represents all the backgrounds capable of predicting the value y_b across the entire dataset. The entire dataset is a tensor D consisting of matrices M. Each matrix (time-series sequence) can be passed through the model to obtain a prediction f(M), which is then checked against whether it lies within a small margin m of y_b. So, for each M in D, we check whether it meets the following condition:

$$y_b - m \le f(M) \le y_b + m \qquad (3)$$
Each M that meets this requirement is used to calculate the mean background. An important aspect of this approach, specific to meteorological time-series data, is that the target variable—in our case, temperature—exhibits its own frequency. Temperature generally increases from morning to late afternoon, decreases during the night, and repeats this cycle indefinitely.
For hourly data, temperature follows a periodic pattern that repeats every 24 time steps. To represent this periodicity, we could calculate the mean values of each feature for each hour. Specifically, we would use all feature values recorded at 12 a.m. to create mean feature values for 12 a.m., those at 1 p.m. to create mean values for 1 p.m., and so on. This process enables us to construct an artificial background that mimics the real frequency of the data.
This representation of frequency in the background may not be essential if the user is focused solely on explaining the importance of features rather than time steps. However, by incorporating this approach, we can prepare the backgrounds for further analysis of the importance of individual time steps.
The Mean Background Method is summarized in the following steps and in Figure 5:
  • checking which time-series sequences can be used to predict y_b ± m, where m is some small margin. In Figure 5, two example values are shown (y_b and y_b′);
  • all time-series sequences selected in step I can be used to create a mean time-series background that could resemble the frequency of the target;
  • an XAI method is applied to explain the changes between y_b and y_e. In Figure 5, there are two values, y_b and y_b′, which result in two different explanations for each feature. For example, moving the temperature prediction from y_b = 12 °C to y_e = 24 °C might be influenced by different features than moving the prediction from y_b′ = 0 °C to 24 °C. A minimal code sketch of these steps is given after this list.
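The sketch below assumes an array `sequences` of shape (n, i_max, j_max) holding all the time-series sequences in the dataset and a wrapper `predict` around the model; it is a simplified illustration of the steps above, not the exact implementation from the repository:

```python
import numpy as np

def mean_background(sequences, predict, y_b, m=0.1):
    """Step I: select all sequences predicting y_b within margin m; step II: average them."""
    preds = np.array([predict(M) for M in sequences])
    mask = (preds >= y_b - m) & (preds <= y_b + m)     # condition (3)
    if not mask.any():
        raise ValueError("no sequence in the dataset predicts y_b within the margin")
    return sequences[mask].mean(axis=0)                # element-wise mean keeps the temporal pattern

# Step III: pass the resulting mean background to the chosen XAI method (e.g. timeSHAP)
# to explain the shift from y_b to y_e.
```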
There is a drawback to the mean background obtained by the method described above. By averaging sequences, we risk creating a background that is significantly distinct from the values and trends present within the dataset. In other words, the resulting background may not accurately represent the true values provided in the dataset. For example, let us assume the user wants to explain why the model predicted a temperature of 15 °C instead of 0 °C. Following the described steps, the user could identify all time-series sequences that lead the model to predict 0 °C. So, y e = 15 °C and y b = 0 °C. However, the user may not realize that, in this hypothetical example, there are two types of time-series sequences capable of predicting 0 °C. The first type consists of sequences with a downward trend in temperature approaching 0 °C, while the second type consists of sequences with an upward trend in temperature reaching 0 °C. Averaging these two types of sequences could produce a background that results in a prediction different from 0 °C. Moreover, consider a scenario where the user has two datasets but only one model. In the first dataset, sequences with downward trends dominate, whereas, in the second dataset, upward trends are more frequent. The mean time-series background could differ depending on whether it is calculated from the first dataset or the second. Even if both backgrounds result in the same prediction of 0 °C, they could yield different Shapley values for the features.
Nonetheless, the proposed method can be useful for conducting a quick analysis of how different y_b values and the corresponding mean backgrounds influence Shapley values, providing a rough estimation of which features are important in shifting the prediction from different values of y_b to the chosen y_e. The proposed novelty is that it offers a contrast to the default approach of calculating an irrelevant base value from the mean features of the entire dataset.

2.2. Description of Background Estimation Method

To address the issues of fluctuating Shapley values and the presence of different types of time-series sequences, a second new method was developed. This method offers deeper insights into the shifts caused by specific backgrounds, thereby providing more comprehensive information about the model’s behavior.
The main idea behind the Background Estimation Method is that, given a predictive model f, a sequence for prediction y e , and a reference target y b , the validity of the explanations can be assessed. This method quantifies the extent to which the explanations vary depending on the background context for a given y e . Furthermore, it evaluates the reliability of the results produced by the applied XAI method for each feature. Additionally, it captures the dispersion of the importance across selected backgrounds.
For example, if a user wants to explain why a model predicted a specific value y_e in comparison to y_b, the user can select from the dataset D all time-series sequences M that result in the prediction of y_b, following the steps of the Mean Background Method depicted in Figure 5. This results in a new set of backgrounds [M_1, …, M_z_max]. However, instead of creating a single mean background, each sequence is used as a background individually, one at a time. This approach results in numerous Shapley values for each feature.
Assuming there are 200 sequences (z_max = 200) that meet requirement (3), there is a new set of backgrounds [M_1, …, M_200]. In this situation, each feature x_i receives 200 Shapley values, one calculated for each background M_z. The resulting importance can be organized into a matrix E of size z_max × i_max, where z_max represents the number of backgrounds M and i_max represents the number of features x_i. Thus, each column contains all the Shapley values calculated for a given feature. To enhance interpretability, the elements of the matrix E are transformed into percentages of importance. The values e_zi represent the importance as a percentage of the distance between the prediction y_e and the reference base value y_b.
The matrix E consisting of the importance percentages can be presented as follows:

$$E = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1 i_{max}} \\ e_{21} & e_{22} & \cdots & e_{2 i_{max}} \\ \vdots & \vdots & \ddots & \vdots \\ e_{z_{max} 1} & e_{z_{max} 2} & \cdots & e_{z_{max} i_{max}} \end{bmatrix}, \qquad e_{zi} = \frac{\varphi_{zi} \cdot 100}{|y_e - y_b|}$$

where index i is used to represent feature x_i, index z is used to represent a background meeting the requirement from Equation (3), and φ_zi represents the Shapley value corresponding to the given feature i and background z.
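For illustration (with arbitrary example numbers, not values from the datasets studied here): if y_e = 24 °C, y_b = 0 °C, and the Shapley value of feature x_i under background M_z is φ_zi = 6 °C, then e_zi = 6 · 100 / |24 − 0| = 25%; that is, under this background, the feature accounts for a quarter of the distance between the reference base value and the prediction.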
In Figure 6, the importance percentages of three features are presented. The three columns, corresponding to features x_1, x_2, and x_3 from matrix E, are used to form the histograms visible in Figure 6. This visualization makes it easy to understand the distribution of importance, and it is also much easier to compare the importance of each feature for different values of y_b when they are expressed in percentages. If the distribution of percentage importance [e_1i, …, e_z_max,i] for a particular feature x_i is wide and short, it indicates high fluctuations, making the importance assigned to this feature uncertain and potentially risky to use. Conversely, if the distribution is narrow and tall, the importance for the feature is stable across the backgrounds for the given y_b.
In cases where the importance for a feature consists of multiple distinct distributions, further analysis is required. It may be possible to identify groups of backgrounds that cause this distinction. The decision of which background group to use and why depends on the user’s context and goals.
To facilitate the estimation of the reliability of explanations, an index named utility, ψ, was created. It can be calculated for each feature (x_1 to x_i_max) and gathered in the vector Ψ = [ψ_1, …, ψ_i_max]. To calculate these values, the mean percentage importance e̅_i and the standard deviation σ_i are used as follows:
$$\bar{e}_i = \frac{1}{z_{max}} \sum_{z=1}^{z_{max}} e_{zi}, \qquad \sigma_i = \sqrt{\frac{1}{z_{max}} \sum_{z=1}^{z_{max}} \left| e_{zi} - \bar{e}_i \right|^2}, \qquad \psi_i = \frac{\bar{e}_i}{\sigma_i}$$
where e̅_i is the mean percentage importance of feature x_i across all backgrounds and σ_i is the standard deviation of the percentage importance of feature x_i across all backgrounds. The utility index ψ_i for feature x_i is calculated by dividing e̅_i by σ_i. The utility index favors features with a high mean percentage importance, as these features provide more explanatory power, while it penalizes features with a wide standard deviation of percentage importance.
If the mean percentage importance for a feature is high but its fluctuation is also significant, the utility would not recommend making decisions based on it. In contrast, a feature with a slightly lower mean importance but a much lower standard deviation would be more reliable and, therefore, preferred.
The process used for the Background Estimation Method is detailed in Algorithm 1.
Algorithm 1: Simplified algorithm for Background Estimation Method that uses timeSHAP as an XAI method
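A compact sketch of the procedure is given below. It assumes a list `backgrounds` containing the sequences selected with condition (3) and a wrapper `explain(M_e, M_b)` around the XAI method (here timeSHAP/Kernel SHAP) that returns one Shapley value per feature; it illustrates Algorithm 1 rather than reproducing it exactly:

```python
import numpy as np

def background_estimation(M_e, backgrounds, explain, y_e, y_b):
    """Return the percentage-importance matrix E and the per-feature mean, std and utility."""
    phi = np.array([explain(M_e, M_b) for M_b in backgrounds])  # shape (z_max, i_max)
    E = phi * 100.0 / abs(y_e - y_b)                            # percentage of the distance y_e - y_b
    e_mean = E.mean(axis=0)                                     # mean percentage importance per feature
    sigma = E.std(axis=0)                                       # fluctuation across backgrounds
    psi = e_mean / sigma                                        # utility index per feature
    return E, e_mean, sigma, psi
```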

2.3. Summary of Methods and Scenarios of Use

For now, traditional XAI methods are usually used only as a means of understanding certain decisions without context. Both of the methods presented here open new ways to enable the conscious use of backgrounds in contextualized situations. The authors of [51] used XAI in contextualized situations and proposed two methods that do not require backgrounds at all. However, those methods can only be used with propagation-based methods (such as LRP) and neural networks and thus cannot be used with Shapley values. Nevertheless, the authors open an important discussion about the contextualized use of XAI. The two methods described in this article are the first methods that enable estimation of the impact of the background on explanations provided by XAI, and they can be used for every ML model as they work with model-agnostic XAI methods such as Kernel SHAP. The methods aim to provide better explainability and reduce uncertainty regarding black-box models. For example, the methods could be used in the following scenarios:
  • A local power supplier predicts the power demand for the next day at 8:00, which is equal to 180 MW. The prediction is unusually high and the supplier wants to make sure that it is valid. The user wants to know why the model changed the prediction to 180 MW while the expected level for this hour and time of year is 150 MW. This is a contextualized situation, so it is not possible to simply use an XAI method with a default base value to find answers. The user can apply the Mean Background Method, assuming y_e = 180 MW and y_b = 150 MW. The mean background is created from the entire dataset and the XAI method is applied. It turns out that the extreme rise in temperature on the previous day is mainly responsible for the shift; it comprises 80% of the shift. Humidity is the second most important variable; it comprises 10% of the shift from 150 MW to 180 MW. Based on the historical data and expert knowledge, the supplier can believe this prediction. However, the supplier has doubts about whether the XAI method provides viable results. The background used requires verification, so the Background Estimation Method is applied. The results show that the importance of temperature did not change radically depending on the background used, while the importance of humidity fluctuated strongly from 2% to 17%. The user still decides to trust the prediction since the utility for temperature is much higher than the utility for humidity. Even though the importance of humidity experiences fluctuations, its importance is low, while temperature is much more important and its importance is stable. The power supplier can thus use a precise and reliable model to steer efficient power production, meeting the demand.
  • A microgrid is supplied with energy by a local wind farm. It is crucial to predict how much power will be generated by the wind farm and how much should be ordered from a utility distribution network to meet demand. It is predicted that the wind farm will generate 10 MWh, as opposed to the 17 MWh that was generated a day before. The user wants to know whether this is a viable prediction. An XAI method cannot be used for that in the traditional way since this is a contextualized situation and the user wants to know what caused the shift from 17 MWh to 10 MWh. A new base value is required, which can be obtained with the Mean Background Method. It is applied with y_e = 10 MWh and y_b = 17 MWh over the entire dataset. It turns out that certain features were important in creating the shift from 17 MWh to 10 MWh—wind speed (50%), air density (20%), temperature (10%), and humidity (5%). However, after applying the Background Estimation Method, all the features are assigned a low utility—below 25 a.u. After a thorough investigation of the results from the Background Estimation Method, it is noted that there are a few groups of backgrounds. The user decides to use the Background Estimation Method again, but this time using only one group of backgrounds—the one that includes the day before the original prediction. This time the utility is high (over 75) for each feature. The user decides to trust the results, as the power production can change rapidly due to abrupt changes in the important features. Based on this decision, more energy is ordered from the utility distribution network.
  • A new model predicting power production is prepared to work with a SCADA system that controls power usage and production in a microgrid. To make sure that the model is working properly and bases its predictions on the correct features, it undergoes tests. The calculation of utility indices for numerous test cases can be one of those tests. The utility indices are high for two features that should be used but low for the other two features that should also be used. Even if the accuracy of the model is high, two out of the four features that are known to be important cannot be explained viably. The model should be rejected to protect fragile systems.

3. Results

Two data sources were utilized for the experiments. The first is the well-known dataset recorded by the Max Planck Institute for Biogeochemistry [55]. This dataset contains 14 features, including temperature, pressure, and humidity, recorded every 10 min at a weather station located at the Max Planck Institute for Biogeochemistry in Jena, Germany. The dataset spans the period from 10 January 2009 to 31 December 2016. The second dataset was provided by the Institute of Meteorology and Water Management (Instytut Meteorologii i Gospodarki Wodnej, IMGW) [56]. This dataset is more recent and includes data from 10 randomly selected meteorological stations across Poland. The identification numbers of these stations are 351180455, 352200375, 353170235, 349190650, 351190469, 352140310, 352150300, 351160424, 353200272, and 354160115. Twelve features, similar to those in the Jena dataset, were selected for model training and analysis. Unlike the Jena dataset, these measurements were recorded at an hourly frequency, covering the period from 1 January 2015 to 31 December 2023. The data gathered from Jena were also pre-processed to resemble an hourly frequency.
A separate LSTM model was trained on each dataset. The models use hourly time-series sequences and predict the temperature two hours ahead based on sequences spanning the previous 72 h. Thus, each time-series sequence created for this article consists of 14 features (x_1, …, x_14) for the Jena dataset or 12 features (x_1, …, x_12) for the IMGW datasets and 72 time steps (t_1, …, t_72). Consequently, the dimensions of the sequences are M_14×72 or M_12×72.
It also has to be mentioned that the models were not optimized. It is possible to limit the features and time steps without any impact on the accuracy of the models; in other words, features of no importance or minuscule importance were included in the training, as were such time steps. This was done on purpose. The aim of this article is not to provide efficient ML models; quite the contrary, the aim is to test the two novel methods on imperfect models.
Each model used the same hyperparameters, comprising one LSTM layer with 64 units and one dense output layer. The models were trained on standardized data. It is important to note that the predictions of the models are also standardized, as are the Shapley values. Standardization consisted of subtracting the mean value of a given feature from each value and dividing the result by the standard deviation of that feature. Standardization was applied as a standard procedure to prevent the dominance of features with larger values, which is especially required when comparing the importance of features. Additionally, standardization helps to ensure stable learning and faster convergence and should also prevent exploding or vanishing gradients.
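A minimal sketch of this architecture is shown below for the IMGW case (12 features, 72 hourly time steps); the optimizer and loss are assumptions, as they are not specified here:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(72, 12)),   # 72 hourly time steps, 12 standardized features
    tf.keras.layers.LSTM(64),                # one LSTM layer with 64 units
    tf.keras.layers.Dense(1),                # standardized temperature two hours ahead
])
model.compile(optimizer="adam", loss="mse")
```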
The datasets were split into three subsets: 50% for training, 25% for testing, and 25% for evaluation. To make the estimation more realistic, the data were not shuffled before being split. This was done to make sure that the models would be tested and validated on future data and trained on past data, simulating a realistic application. The evaluation results are presented in Table 1. Due to the standardization, metrics such as the Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) can be used to demonstrate that the accuracy of the models is comparable. Furthermore, R² clearly indicates that the models achieve very high accuracy on the validation dataset.
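The chronological split can be sketched as follows, assuming arrays `X_seq` and `y_target` ordered in time (hypothetical names):

```python
n = len(X_seq)
i_train, i_test = int(0.50 * n), int(0.75 * n)                      # 50% / 25% / 25%, no shuffling
X_train, y_train = X_seq[:i_train], y_target[:i_train]              # oldest data for training
X_test,  y_test  = X_seq[i_train:i_test], y_target[i_train:i_test]  # later data for testing
X_eval,  y_eval  = X_seq[i_test:], y_target[i_test:]                # most recent data for evaluation
```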
The performance of the model and the specific machine learning (ML) algorithm are of lesser importance in this article. LSTM is used merely as an example. The methods presented here can be applied to results obtained from different ML algorithms. The primary focus is placed on explanations.
The explainability of the models was assessed using the Kernel SHAP method, which was applied indirectly through the timeSHAP library to calculate feature importance.
Each model for IMGW was used to predict the temperature (in standardized units) for 15 July 2021 at 01:00:00. For this prediction, the time-series sequences spanning from 12 July 2021 23:00:00 to 14 July 2021 23:00:00, consisting of 72 h, were used separately from each dataset. Similarly, the temperature for 15 July 2014 at 01:00:00 was predicted for the Jena dataset using the corresponding time-series sequences. The dates were chosen randomly.

3.1. Evaluation of Mean Background Method

The datasets from the IMGW stations were used to calculate Shapley values in combination with the Mean Background Method to explain the prediction for 15 July 2021 at 01:00:00. To show the difference in explanations between backgrounds, multiple values of y_b were selected. The number of y_b values differed between datasets because only the integer temperatures (in °C) actually present in each dataset, within a margin of 0.1 °C, were used as y_b; a mean background was created for each such integer temperature. Shapley values were calculated for each background using Kernel SHAP and timeSHAP. The results for four stations (352140310, 352150300, 351190469, and 351160424) are presented below in Figure 7. The margin could be modified. Ideally, it should be equal to 0 to include only the time-series sequences that result in a prediction exactly equal to y_b. However, this is not realistic for regression-based problems, where predictions are rarely equal to a single exact value. This is why a small margin needs to be added to include backgrounds from the vicinity of y_b.
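This sweep over base values can be sketched by reusing mean_background() from the illustrative snippet in Section 2.1 (the names `sequences`, `predict`, `explain`, and `M_e` are the same hypothetical placeholders as before, and the candidate temperature range is only an example):

```python
results = {}
for y_b in range(-30, 41):                       # candidate integer temperatures in deg C
    try:
        M_b = mean_background(sequences, predict, float(y_b), m=0.1)
    except ValueError:
        continue                                 # this temperature never occurs in the dataset
    results[y_b] = explain(M_e, M_b)             # Shapley values per feature for this background
```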
The number of backgrounds varied by station. Specifically, for stations 352140310, 352150300, 351190469, and 351160424, the background numbers were 52, 50, 56, and 53, respectively. Each background corresponds to a different base value y b . Additionally, the actual background predictions (Act. bg pred.), the sum of Shapley values (Shap. vals. sum), and the sum of these two quantities are provided.
The actual background prediction refers to the prediction obtained using the created mean background. If a specific mean background (b) is associated with a particular y b value, the prediction f ( b ) can either closely approximate y b or be equal to it. This explains why, in all the figures, the actual background prediction exhibits a linear trend with the temperature y b , which is shown on the x-axis. Nevertheless, slight fluctuations are visible, particularly at the extreme values of the predictions, but not exclusively. For instance, in Figure 7a, the predictions of the background slightly deviate from y b in the vicinity of 1.5 °C.
Although, for the presented datasets, these fluctuations are neither significant nor frequent, users cannot reliably assume that a mean background created for a given y_b will always result in a prediction equal to y_b. This introduces potential uncertainties into the explanation of the model. For example, if a user aims to explain why the model predicted 25 °C instead of 0 °C, a background created for y_b = 0 °C that produces an actual prediction of 1 °C may not effectively capture the importance of the features driving the change from 0 °C to 25 °C.
The sum of Shapley values also exhibits a linear trend, but it is oriented in the opposite direction. This occurs because the Shapley values indicate how much the base value y b shifts toward the given prediction y e that the user aims to explain. As y b approaches y e , the Shapley values do not need to be large to represent the extent of the prediction shift.
The sum of all the Shapley values and the actual background prediction is always equal to a single value—the real prediction for 15 July 2021 at 01:00:00 ( y e ). This serves as a verification to ensure that all the calculations have been performed correctly. The value y e differs for each station as the temperature (and hence the prediction) varies slightly across stations. It is important to note that, when y b crosses y e , the Shapley values change signs. Prior to the crossing, they were shifting the prediction from a smaller y b toward a larger y e . After crossing, they shift the prediction from a larger y b toward a smaller y e and therefore become negative. It might be easier to interpret after referring to Figure 2 again, knowing that the base value shifts from the lowest values to the highest values, passing y e along the way.
The Shapley values assigned to each of the 12 features are also presented. Additionally, there is a feature labeled ‘Pruned Events’, which represents the insignificant time steps separated out by timeSHAP and treated as a single feature. These ‘Pruned Events’ do not have any significance but can be used to determine whether timeSHAP was applied correctly. The three most significant features for each station are air temperature, vapor pressure, and relative humidity. For stations (a) and (b) in Figure 7, sunshine duration is also slightly important.
What is most significant in these results is that they clearly demonstrate the high importance of the selection of y_b in the calculation of Shapley values. For every station, the Shapley values differ significantly when, for example, y_b = −2 °C compared to y_b = 2 °C. The importance of features changes depending on whether the prediction is being shifted from one of these two points toward y_e.
It is also evident that the importance of features changes gradually over the backgrounds. When y b is low and the Shapley values need to push the prediction toward a higher y e , air temperature is the most important feature, followed by vapor pressure. Neither relative humidity nor sunshine duration are significant at this point. As y b increases, vapor pressure, relative humidity, and sunshine duration gain importance. In some cases, vapor pressure becomes more important than air temperature. When the Shapley values become negative, relative humidity becomes more important, sometimes surpassing air temperature, while vapor pressure remains positive.
The proposed method can be used to provide a more thorough understanding of the overall importance of the features during feature selection. It clearly outlines which features should be kept and which should be removed. Kernel SHAP alone could also be used to determine that the most important features are air temperature, vapor pressure, and relative humidity; nevertheless, the Mean Background Method enables an understanding of how this importance changes with different backgrounds.
Each significant feature exhibits minor fluctuations, introducing some uncertainty. This method also does not provide certainty regarding the Shapley values between y e and y b predicted with the real time-series sequence from the dataset but only with the average background. If there are specific types or clusters of backgrounds for a given y b , they are not identified in this approach. For a further and more thorough analysis, the next step is taken.

3.2. Evaluation of Background Estimation Method

The predictions for 15 July 2021 at 01:00:00 (IMGW) and 15 July 2014 at 01:00:00 (Jena) were explained using the Background Estimation Method. In this case, only one example y b was chosen, set to 0 °C for both IMGW and Jena. The margin m was set to 0.1 °C. The XAI method was applied using Algorithm 1. During the calculation, the matrix E was generated, containing percentage explanations for each feature and background. Each Shapley value was then transformed into the percentage of the explanation for the difference between the prediction and the background target. The mean values and standard deviations for each feature and dataset are presented in Table 2, Table 3 and Table 4. A detailed analysis of the percentage of explanation distributions for four stations is presented in Figure 8 and Figure 9. The utility index calculated via mean and standard deviation is presented in Figure 10.
Not all the features are included in Figure 8 and Figure 9; only the features with a mean percentage of explanation equal to or greater than 2% are shown. Several significant observations can be made. The station in Figure 8 on the right is an example of a relatively stable explanation across the given set of backgrounds resulting in the prediction y_b. The Shapley values for each feature do not fluctuate significantly across the different backgrounds in the entire dataset, which is reflected in the narrow and tall histograms.
In the same figure, on the left side, there are also histograms that are narrow and tall; however, several of them appear within the same feature. In other words, the presented percentages consist of two or more separate distributions. Although each individual distribution shows relatively minor fluctuations, taken together within the same feature they exhibit a wide range and large fluctuations. A similar situation can be observed in Figure 9 on the right side, where the percentage assigned to each feature is divided into three distributions. This strongly suggests that there are several types or clusters of sequences that can be used as backgrounds. Whenever possible, these should be treated separately, and the user should decide which type of sequence to use. This is a key difference from the previous method: previously, all the sequences would have been combined and averaged, and it would not have been possible to distinguish these clusters, but they are now clearly visible. This also explains why the previous method, while fast and providing rough estimations of explanations across the entire dataset, is less effective.
A difficult and cumbersome situation is presented in Figure 9 on the left. The range of the feature “Pressure at station level” is large, and there are no means of modifying the backgrounds to limit these fluctuations. This highlights the need to identify challenging features that can introduce uncertainty in explanations. Such features can be identified using the utility index.
The utility index indicates whether a feature can be reliably used to explain the model’s behavior and whether we can trust the explanations. If the results do not highlight features that can effectively explain most of the prediction, further analysis is required to reduce fluctuations. For instance, clustering the backgrounds and dividing the set of backgrounds into more precise subsets may better represent the context. The utility indices are presented in Figure 10.
The utility indices for the stations discussed above can be verified. In the case of station 353200272, for which the explanations were visibly stable, high utility values are notable. Air temperature has a much higher utility value, even though it does not explain as much as “Pressure at station level”. However, it has a significantly lower standard deviation, and, as a result, it was rewarded for its stability.
On the other hand, the explanations for station 352150300 are extremely unstable, and, therefore, each feature has a low utility value. The steam pressure for this station has a high mean importance of 41 % but also a large standard deviation of 5 % . At the same time, wind gusts demonstrate a very low standard deviation of 0.10 % but also an insignificantly low share of explanation. The utility values for this station are low because there are several distributions within each feature. Further analysis could help to distinguish these distributions and create groups of stable backgrounds.
In the case of station 352140310, steam pressure and air temperature both have comparable percentages of explanation and similar standard deviations, which is why they were assigned similar utility values.
Utility indices are valid only for a given set of backgrounds, so they cannot be used to draw conclusions about the entire dataset or all the possible backgrounds. Additionally, they do not provide further insights into the behavior of the model beyond the particular prediction y_e and the corresponding y_b. Utility could be calculated over the entire dataset, similarly to the Mean Background Method, to observe how it changes across the dataset; this could provide an impression of stability over the entire dataset. However, the main focus of this article is the study of local importance.

4. Discussion and Future Work

This article is the next step toward improving the use of XAI, notably for power systems. It corroborates the need to provide new solutions regarding contextualized XAI methods for regression-based problems. This is the first article to discuss the significance of different backgrounds in the context of regression-based problems and shifting references. The proposed methods open a new chapter in the analysis of background significance for those problems. Even though this was the first attempt at finding the optimal method for the estimation of backgrounds, it was not limited to vector features but was already tested and applied to two-dimensional time-series problems. It was demonstrated that the methods are beneficial for estimating the stability of explanations and for finding clusters and distributions that can be used to better understand the behavior of the model. The natural next step would be to find a way to easily and automatically assign time-series sequences to specific clusters and types of backgrounds.
The methods presented here are not without limitations. First of all, they are time-consuming. The calculation of Shapley values is time-consuming by itself, and it depends on the number of features. Additionally, much more time is required to calculate Shapley values within timeSHAP due to the numerous time steps. Finally, each step, already time-consuming, needs to be repeated for each prepared background. This is mainly relevant to the Background Estimation Method and depends on the number of elements in the dataset. The next step would be to optimize the algorithm, for example, by detecting clusters of backgrounds beforehand or limiting the number of processed backgrounds based on contextual information.
Undoubtedly, by limiting the dataset to a specific range of backgrounds, the interpretation of the behavior of the model as a whole can also be limited. The methods aim at exploring model behavior only in certain contextualized situations, not as a whole, so understanding why a model made a certain prediction in one context will not reveal much about its decisions in different contexts. These are context-specific methods. However, by performing a thorough analysis with numerous base values, these methods could describe the behavior of a model better than the XAI method by itself, thus potentially serving as supports and estimators for the XAI method and providing even more information about the model by demonstrating the influence of the backgrounds.
The solutions presented here have high significance for forecasts in power systems. They prove that, in this fragile field, the explanations provided by conventional XAI methods that use backgrounds cannot be fully trusted and there is a need for further verification. Kernel SHAP, used in this article, can be employed to determine the reasons for changes in predictions of temperature, which is the main indicator of electrical load. The utility index, which was presented here, can facilitate this process. It can be used whenever the user requires contextualized explanations by providing indications of the viability and utility of features for explanations in power systems.

5. Conclusions

This article presented two new methods for analyzing the influence of different backgrounds on the explanations provided by the XAI method in contextualized situations, which can be used for time-series and regression-based problems. The methods are the Mean Background Method and the Background Estimation Method. The XAI method that was used was Kernel SHAP. Both methods were used on eleven datasets, ten from IMGW stations and one from a station in Jena.
The results of the Mean Background Method demonstrated that background modifications can strongly influence the explanations provided by local XAI methods. The method was used to present and quantify the uncertainties and fluctuations caused by different sets of mean backgrounds. The user should consider which background provides the best explanations and whether it is viable given the fluctuations present in the explanations.
To provide a more detailed analysis, the Background Estimation Method was introduced. Instead of using mean backgrounds, it uses original time-series sequences from the datasets to single out groups of sequences that yield different explanations. In addition, it indicates the fluctuations and uncertainties in the importance of each feature. The method also provides a simple index that can be used to quickly determine whether a feature is suitable for importance analysis and whether its results can be trusted.
In combination, the two methods give the user a full picture of how backgrounds change explanations, to what extent those explanations can be trusted, and whether any features are assigned importance values of limited usefulness.

Author Contributions

Conceptualization, B.S.; methodology, B.S.; software, B.S. and M.M.; validation, B.S.; formal analysis, B.S. and R.D.; investigation, B.S.; resources, B.S.; data curation, B.S. and M.M.; writing—original draft preparation, B.S.; writing—review and editing, B.S. and R.D.; visualization, B.S.; supervision, R.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code presented in this study is openly available from GitLab at https://gitlab.com/Barszo/timeseries_explainer (accessed on 1 March 2024). Datasets used for the results part are available at https://www.bgc-jena.mpg.de/wetter/ (accessed on 1 March 2024) and at https://danepubliczne.imgw.pl (accessed on 15 December 2024). They are provided by Max Planck Institute for Biogeochemistry and the Institute of Meteorology and Water Management, respectively.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Moreno Escobar, J.J.; Morales Matamoros, O.; Tejeida Padilla, R.; Lina Reyes, I.; Quintana Espinosa, H. A comprehensive review on smart grids: Challenges and opportunities. Sensors 2021, 21, 6978. [Google Scholar] [CrossRef] [PubMed]
  2. Strielkowski, W.; Civín, L.; Tarkhanova, E.; Tvaronavičienė, M.; Petrenko, Y. Renewable energy in the sustainable development of electrical power sector: A review. Energies 2021, 14, 8240. [Google Scholar] [CrossRef]
  3. Almihat, M.G.M.; Kahn, M.; Aboalez, K.; Almaktoof, A.M. Energy and sustainable development in smart cities: An overview. Smart Cities 2022, 5, 1389–1408. [Google Scholar] [CrossRef]
  4. Rahman, M.M.; Dadon, S.H.; He, M.; Giesselmann, M.; Hasan, M.M. An Overview of Power System Flexibility: High Renewable Energy Penetration Scenarios. Energies 2024, 17, 6393. [Google Scholar] [CrossRef]
  5. Yu, K.; Wei, Q.; Xu, C.; Xiang, X.; Yu, H. Distributed Low-Carbon Energy Management of Urban Campus for Renewable Energy Consumption. Energies 2024, 17, 6182. [Google Scholar] [CrossRef]
  6. El Rhatrif, A.; Bouihi, B.; Mestari, M. AI-based solutions for grid stability and efficiency: Challenges, limitations, and opportunities. Int. J. Internet Things Web Serv. 2024, 9, 16–28. [Google Scholar]
  7. Khodayar, M.; Liu, G.; Wang, J.; Khodayar, M.E. Deep learning in power systems research: A review. CSEE J. Power Energy Syst. 2020, 7, 209–220. [Google Scholar]
  8. Ozcanli, A.K.; Yaprakdal, F.; Baysal, M. Deep learning methods and applications for electrical power systems: A comprehensive review. Int. J. Energy Res. 2020, 44, 7136–7157. [Google Scholar] [CrossRef]
  9. Forootan, M.M.; Larki, I.; Zahedi, R.; Ahmadi, A. Machine learning and deep learning in energy systems: A review. Sustainability 2022, 14, 4832. [Google Scholar] [CrossRef]
  10. Janjua, J.I.; Ahmad, R.; Abbas, S.; Mohammed, A.S.; Khan, M.S.; Daud, A.; Abbas, T.; Khan, M.A. Enhancing smart grid electricity prediction with the fusion of intelligent modeling and XAI integration. Int. J. Adv. Appl. Sci. 2024, 11, 230–248. [Google Scholar] [CrossRef]
  11. Park, J.; Kang, D. Artificial Intelligence and Smart Technologies in Safety Management: A Comprehensive Analysis Across Multiple Industries. Appl. Sci. 2024, 14, 11934. [Google Scholar] [CrossRef]
  12. Elmousalami, H.; A Alnaser, A.; Kin Peng Hui, F. Advancing Smart Zero-Carbon Cities: High-Resolution Wind Energy Forecasting to 36 Hours Ahead. Appl. Sci. 2024, 14, 11918. [Google Scholar] [CrossRef]
  13. Titz, M.; Pütz, S.; Witthaut, D. Identifying drivers and mitigators for congestion and redispatch in the German electric power system with explainable AI. Appl. Energy 2024, 356, 122351. [Google Scholar] [CrossRef]
  14. Doroz, R.; Orczyk, T.; Wrobel, K.; Porwik, P. Adaptive classifier ensemble for multibiometric Verification. Procedia Comput. Sci. 2024, 246, 4038–4047. [Google Scholar] [CrossRef]
  15. Hamrani, A.; Medarametla, A.; John, D.; Agarwal, A. Machine-Learning-Driven Optimization of Cold Spray Process Parameters: Robust Inverse Analysis for Higher Deposition Efficiency. Coatings 2024, 15, 12. [Google Scholar] [CrossRef]
  16. Ali, A.; Faheem, Z.B.; Waseem, M.; Draz, U.; Safdar, Z.; Hussain, S.; Yaseen, S. Systematic review: A state of art ML based clustering algorithms for data mining. In Proceedings of the 2020 IEEE 23rd International Multitopic Conference (INMIC), Bahawalpur, Pakistan, 5–7 November 2020; pp. 1–6. [Google Scholar]
  17. Orczyk, T.; Porwik, P.; Doroz, R. A preliminary study on the dispersed classification system for recognizing safety of drivers’ maneuvers. Procedia Comput. Sci. 2023, 225, 2604–2613. [Google Scholar] [CrossRef]
  18. Masini, R.P.; Medeiros, M.C.; Mendes, E.F. Machine learning advances for time series forecasting. J. Econ. Surv. 2023, 37, 76–111. [Google Scholar] [CrossRef]
  19. Wrobel, K.; Doroz, R.; Porwik, P.; Orczyk, T.; Cavalcante, A.B.; Grajzer, M. Features of Hand-Drawn Spirals for Recognition of Parkinson’s Disease. In Proceedings of the Asian Conference on Intelligent Information and Database Systems, Ho Chi Minh City, Vietnam, 28–30 November 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 458–469. [Google Scholar]
  20. Machlev, R.; Heistrene, L.; Perl, M.; Levy, K.Y.; Belikov, J.; Mannor, S.; Levron, Y. Explainable Artificial Intelligence (XAI) techniques for energy and power systems: Review, challenges and opportunities. Energy AI 2022, 9, 100169. [Google Scholar] [CrossRef]
  21. Letzgus, S.; Müller, K.R. An explainable AI framework for robust and transparent data-driven wind turbine power curve models. Energy AI 2024, 15, 100328. [Google Scholar] [CrossRef]
  22. Panagoulias, D.P.; Sarmas, E.; Marinakis, V.; Virvou, M.; Tsihrintzis, G.A.; Doukas, H. Intelligent decision support for energy management: A methodology for tailored explainability of artificial intelligence analytics. Electronics 2023, 12, 4430. [Google Scholar] [CrossRef]
  23. Davydko, O.; Pavlov, V.; Longo, L. Selecting Textural Characteristics of Chest X-Rays for Pneumonia Lesions Classification with the Integrated Gradients XAI Attribution Method. In Proceedings of the World Conference on Explainable Artificial Intelligence, Lisbon, Portugal, 26–28 July 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 671–687. [Google Scholar]
  24. Mahendran, A.; Vedaldi, A. Salient deconvolutional networks. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VI 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 120–135. [Google Scholar]
  25. Victoria, A.H.; Tiwari, R.S.; Ghulam, A.K. Libraries for Explainable Artificial Intelligence (EXAI): Python. In Explainable AI (XAI) for Sustainable Development; Chapman and Hall/CRC: Boca Raton, FL, USA, 2024; pp. 211–232. [Google Scholar]
  26. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar]
  27. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should i trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  28. Monteiro, W.R.; Reynoso-Meza, G. On the generation of global surrogate models through unconstrained multi-objective optimization. arXiv 2022. [Google Scholar] [CrossRef]
  29. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  30. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  31. Chambers, J.M. Linear models. In Statistical Models in S; Routledge: London, UK, 2017; pp. 95–144. [Google Scholar]
  32. De Ville, B. Decision trees. Wiley Interdiscip. Rev. Comput. Stat. 2013, 5, 448–455. [Google Scholar] [CrossRef]
  33. Puthanveettil Madathil, A.; Luo, X.; Liu, Q.; Walker, C.; Madarkar, R.; Cai, Y.; Liu, Z.; Chang, W.; Qin, Y. Intrinsic and post-hoc XAI approaches for fingerprint identification and response prediction in smart manufacturing processes. J. Intell. Manuf. 2024, 35, 4159–4180. [Google Scholar] [CrossRef]
  34. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  35. Strumbelj, E.; Kononenko, I. An efficient explanation of individual classifications using game theory. J. Mach. Learn. Res. 2010, 11, 1–18. [Google Scholar]
  36. Yaprakdal, F.; Varol Arısoy, M. A multivariate time series analysis of electrical load forecasting based on a hybrid feature selection approach and explainable deep learning. Appl. Sci. 2023, 13, 12946. [Google Scholar] [CrossRef]
  37. Powroźnik, P.; Szcześniak, P. Predictive Analytics for Energy Efficiency: Leveraging Machine Learning to Optimize Household Energy Consumption. Energies 2024, 17, 5866. [Google Scholar] [CrossRef]
  38. Laitsos, V.; Vontzos, G.; Paraschoudis, P.; Tsampasis, E.; Bargiotas, D.; Tsoukalas, L.H. The State of the Art Electricity Load and Price Forecasting for the Modern Wholesale Electricity Market. Energies 2024, 17, 5797. [Google Scholar] [CrossRef]
  39. Gürses-Tran, G.; Körner, T.A.; Monti, A. Introducing explainability in sequence-to-sequence learning for short-term load forecasting. Electr. Power Syst. Res. 2022, 212, 108366. [Google Scholar] [CrossRef]
  40. Grzeszczyk, T.A.; Grzeszczyk, M.K. Justifying short-term load forecasts obtained with the use of neural models. Energies 2022, 15, 1852. [Google Scholar] [CrossRef]
  41. Sarker, M.A.A.; Shanmugam, B.; Azam, S.; Thennadil, S. Enhancing smart grid load forecasting: An attention-based deep learning model integrated with federated learning and XAI for security and interpretability. Intell. Syst. Appl. 2024, 23, 200422. [Google Scholar] [CrossRef]
  42. Abumohsen, M.; Owda, A.Y.; Owda, M. Electrical load forecasting using LSTM, GRU, and RNN algorithms. Energies 2023, 16, 2283. [Google Scholar] [CrossRef]
  43. Cordeiro-Costas, M.; Villanueva, D.; Eguía-Oller, P.; Martínez-Comesaña, M.; Ramos, S. Load forecasting with machine learning and deep learning methods. Appl. Sci. 2023, 13, 7933. [Google Scholar] [CrossRef]
  44. Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. 2016, 32, 914–938. [Google Scholar] [CrossRef]
  45. Ungureanu, S.; Topa, V.; Cziker, A.C. Analysis for non-residential short-term load forecasting using machine learning and statistical methods with financial impact on the power market. Energies 2021, 14, 6966. [Google Scholar] [CrossRef]
  46. Cassarino, T.G.; Sharp, E.; Barrett, M. The impact of social and weather drivers on the historical electricity demand in Europe. Appl. Energy 2018, 229, 176–185. [Google Scholar] [CrossRef]
  47. Baur, L.; Ditschuneit, K.; Schambach, M.; Kaymakci, C.; Wollmann, T.; Sauer, A. Explainability and interpretability in electric load forecasting using machine learning techniques—A review. Energy AI 2024, 16, 100358. [Google Scholar] [CrossRef]
  48. Olsen, L.H.B.; Glad, I.K.; Jullum, M.; Aas, K. A comparative study of methods for estimating model-agnostic Shapley value explanations. Data Min. Knowl. Discov. 2024, 38, 1782–1829. [Google Scholar] [CrossRef]
  49. Alkhatib, A.; Boström, H.; Johansson, U. Estimating Quality of Approximated Shapley Values Using Conformal Prediction. Proc. Mach. Learn. Res. 2024, 230, 1–17. [Google Scholar]
  50. Yuan, H.; Liu, M.; Kang, L.; Miao, C.; Wu, Y. An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models. arXiv 2022, arXiv:2204.11351. [Google Scholar]
  51. Letzgus, S.; Wagner, P.; Lederer, J.; Samek, W.; Müller, K.R.; Montavon, G. Toward explainable artificial intelligence for regression models: A methodological perspective. IEEE Signal Process. Mag. 2022, 39, 40–58. [Google Scholar] [CrossRef]
  52. Errousso, H.; Abdellaoui Alaoui, E.A.; Benhadou, S.; Medromi, H. Exploring how independent variables influence parking occupancy prediction: Toward a model results explanation with SHAP values. Prog. Artif. Intell. 2022, 11, 367–396. [Google Scholar] [CrossRef]
  53. Książek, W. Explainable Thyroid Cancer Diagnosis Through Two-Level Machine Learning Optimization with an Improved Naked Mole-Rat Algorithm. Cancers 2024, 16, 4128. [Google Scholar] [CrossRef]
  54. Bento, J.; Saleiro, P.; Cruz, A.F.; Figueiredo, M.A.; Bizarro, P. Timeshap: Explaining recurrent models through sequence perturbations. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 2565–2573. [Google Scholar]
  55. Max Planck Institute for Biogeochemistry. Jena, Germany. 2009–2016. Available online: https://www.bgc-jena.mpg.de/wetter/ (accessed on 1 March 2024).
  56. Instytut Meteorologii i Gospodarki Wodnej Państwowy Instytut Badawczy. 2015–2023. Available online: https://danepubliczne.imgw.pl/ (accessed on 1 March 2024).
Figure 1. Categorization of some XAI methods.
Figure 2. Schematic description of explanation provided by SHAP method.
Figure 3. Schematic description of difference between (a) single-vector input and (b) time-series sequence input.
Figure 4. Schematic description of results of first two steps of timeSHAP—pruning and calculation of feature importance.
Figure 5. Schematic description of the Mean Background Method.
Figure 6. Schematically presented importance of each feature for set of time-series sequences chosen for particular y_b.
Figure 7. Results of Mean Background Method for stations (a) 352150300, (b) 352140310, (c) 351190469, and (d) 351160424.
Figure 8. Dispersion of percentage of explanation for each feature that has mean percentage equal to or larger than 2% for stations 352200375 and 353200272.
Figure 9. Dispersion of percentage of explanation for each feature that has mean percentage equal to or larger than 2% for stations 352140310 and 352150300.
Figure 10. Utility of each feature of IMGW stations.
Table 1. Metrics used for evaluation of the models.

| Dataset | Mean Absolute Error | Mean Squared Error | Root Mean Squared Error | R² |
|---|---|---|---|---|
| Jena | 0.73 | 1.04 | 1.02 | 0.99 |
| IMGW 351180455 | 0.77 | 1.11 | 1.05 | 0.98 |
| IMGW 352200375 | 0.76 | 1.12 | 1.06 | 0.98 |
| IMGW 353170235 | 0.70 | 0.94 | 0.97 | 0.99 |
| IMGW 349190650 | 0.65 | 0.81 | 0.90 | 0.99 |
| IMGW 351190469 | 0.77 | 1.15 | 1.07 | 0.98 |
| IMGW 352140310 | 0.78 | 1.12 | 1.06 | 0.98 |
| IMGW 352150300 | 0.71 | 0.93 | 0.96 | 0.99 |
| IMGW 351160424 | 0.87 | 1.39 | 1.18 | 0.98 |
| IMGW 353200272 | 0.74 | 1.06 | 1.03 | 0.98 |
| IMGW 354160115 | 0.68 | 1.01 | 1.00 | 0.98 |
Table 2. Distribution of explanation descriptions for each IMGW dataset (part 1). Values are given as mean ± standard deviation of the percentage of explanation.

| Feature Name | IMGW 351180455 | IMGW 352200375 | IMGW 353170235 | IMGW 349190650 | IMGW 351190469 |
|---|---|---|---|---|---|
| Air temperature | 31.81 ± 0.63 | 26.27 ± 0.50 | 25.98 ± 0.76 | 36.61 ± 0.84 | 41.46 ± 0.71 |
| Wind direction | 0.61 ± 0.13 | 0.26 ± 0.27 | 1.14 ± 0.10 | 0.26 ± 0.31 | 3.15 ± 0.56 |
| Wind speed | 0.73 ± 0.38 | 0.37 ± 0.39 | 1.50 ± 0.31 | 6.57 ± 0.78 | 1.46 ± 0.58 |
| Wind gust | 0.14 ± 0.32 | 0.15 ± 0.16 | 0.14 ± 0.07 | 0.58 ± 0.40 | 0.12 ± 0.14 |
| Steam pressure | 45.29 ± 0.71 | 32.20 ± 1.39 | 36.52 ± 0.90 | 57.06 ± 3.77 | 25.17 ± 0.74 |
| Relative humidity | 28.66 ± 0.53 | 17.36 ± 1.12 | 22.75 ± 0.23 | 27.67 ± 0.57 | 8.62 ± 1.29 |
| Pressure at station level | 13.44 ± 1.35 | 34.82 ± 2.33 | 26.55 ± 1.11 | 75.39 ± 2.51 | 20.44 ± 2.99 |
| Precipitation over 6 h | 0.27 ± 0.09 | 0.24 ± 0.20 | 0.75 ± 0.13 | 0.09 ± 0.09 | 0.30 ± 0.21 |
| Sunshine duration | 0.47 ± 0.50 | 0.39 ± 0.50 | 2.08 ± 0.80 | 1.95 ± 1.06 | 4.03 ± 1.27 |
| Max wind gust over 12 h | 0.39 ± 0.12 | 0.77 ± 0.38 | 0.15 ± 0.06 | 4.04 ± 0.54 | 0.25 ± 0.12 |
| Min temperature over 12 h | 0.02 ± 0.03 | 0.16 ± 0.19 | 0.25 ± 0.09 | 0.21 ± 0.18 | 0.50 ± 0.28 |
| Max temperature over 12 h | 0.24 ± 0.07 | 0.22 ± 0.08 | 0.58 ± 0.14 | 1.01 ± 0.38 | 0.46 ± 0.37 |
Table 3. Distribution of explanation descriptions for each IMGW dataset (part 2). Values are given as mean ± standard deviation of the percentage of explanation.

| Feature Name | IMGW 352140310 | IMGW 352150300 | IMGW 351160424 | IMGW 353200272 | IMGW 354160115 |
|---|---|---|---|---|---|
| Air temperature | 53.35 ± 1.18 | 33.75 ± 2.76 | 26.44 ± 0.49 | 43.72 ± 0.25 | 36.60 ± 0.73 |
| Wind direction | 0.92 ± 0.39 | 2.02 ± 1.08 | 1.00 ± 0.30 | 0.77 ± 0.08 | 0.38 ± 0.10 |
| Wind speed | 53.35 ± 1.18 | 0.52 ± 0.85 | 1.22 ± 0.42 | 0.30 ± 0.13 | 0.58 ± 0.22 |
| Wind gust | 0.06 ± 0.12 | 0.09 ± 0.10 | 0.14 ± 0.07 | 1.13 ± 0.11 | 0.09 ± 0.03 |
| Steam pressure | 52.47 ± 0.97 | 40.93 ± 5.14 | 41.83 ± 0.46 | 23.45 ± 0.27 | 34.10 ± 0.56 |
| Relative humidity | 21.35 ± 1.10 | 19.05 ± 1.76 | 16.46 ± 0.43 | 14.37 ± 0.20 | 13.63 ± 0.41 |
| Pressure at station level | 13.07 ± 4.42 | 11.27 ± 4.88 | 1.36 ± 1.23 | 57.02 ± 0.57 | 32.86 ± 1.80 |
| Precipitation over 6 h | 0.22 ± 0.21 | 0.15 ± 0.36 | 0.37 ± 0.14 | 0.80 ± 0.21 | 0.09 ± 0.09 |
| Sunshine duration | 1.43 ± 1.18 | 1.15 ± 1.64 | 1.18 ± 1.36 | 0.00 ± 0.00 | 1.17 ± 1.06 |
| Max wind gust over 12 h | 0.23 ± 0.13 | 0.10 ± 0.11 | 0.20 ± 0.07 | 0.66 ± 0.05 | 0.12 ± 0.03 |
| Min temperature over 12 h | 1.22 ± 0.32 | 0.41 ± 0.66 | 0.09 ± 0.09 | 0.07 ± 0.04 | 0.33 ± 0.11 |
| Max temperature over 12 h | 0.72 ± 0.34 | 0.49 ± 0.55 | 0.51 ± 0.25 | 0.09 ± 0.04 | 0.76 ± 0.10 |
Table 4. Distribution of explanation description of Jena dataset.

| Feature Name | Mean [%] | Std [%] |
|---|---|---|
| p | 0.40 | 0.52 |
| T | 5.60 | 0.06 |
| Tpot | 40.24 | 0.40 |
| Tdew | 3.04 | 0.07 |
| rh | 5.00 | 0.06 |
| VPmax | 3.43 | 0.05 |
| VPact | 1.67 | 0.06 |
| VPdef | 0.00 | 0.00 |
| sh | 4.94 | 0.11 |
| H2OC | 7.83 | 0.07 |
| rho | 10.44 | 0.19 |
| wv | 0.00 | 0.00 |
| max. wv | 1.23 | 0.33 |
| wd | 0.69 | 0.04 |