Article

Evidential Interpretation Approach for Deep Neural Networks in High-Frequency Electromagnetic Wave Processing

1 Xinjiang Jiaokan Zhiyuan Engineering Technology Co., Ltd., Urumqi 830022, China
2 School of Transportation, Southeast University, Nanjing 210018, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(16), 3277; https://doi.org/10.3390/electronics14163277
Submission received: 11 July 2025 / Revised: 9 August 2025 / Accepted: 16 August 2025 / Published: 18 August 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

Despite the widespread adoption of high-frequency electromagnetic wave (HF-EMW) processing, deep neural networks (DNNs) remain primarily black boxes. Interpreting the semantics behind the high-dimensional representations of a DNN is crucial for gaining insight into the network. This study proposes an evidential representation fusion approach that interprets the high-dimensional representations of a DNN as HF-EMW semantics, such as time- and frequency-domain signal features and their physical interpretation. In this approach, an evidential discrete model based on Dempster–Shafer theory (DST) converts a subset of DNN representations to mass functions reasoning on a class set, indicating whether the subset contains HF-EMW semantic information. An interpretable continuous DST-based model maps the subset into HF-EMW semantics via representation fusion. Finally, the two DST-based models are extended to interpret the learning processes of high-dimensional DNN representations. Experiments on two datasets with 2680 and 4000 groups of HF-EMWs demonstrate that the approach can find and interpret representation subsets as HF-EMW semantics, achieving an absolute fractional output change of 39.84% when 10% of the most important feature elements are removed. The interpretations can be applied to visual learning evaluation, semantic-guided reinforcement learning with an improvement of 4.23% in classification accuracy, and even HF-EMW full-waveform inversion.

1. Introduction

High-frequency electromagnetic waves (HF-EMWs) have been widely used to probe specific objects, including concrete [1], metals [2], naval objects [3], and archaeology [4]. An HF-EMW is emitted by a transmitting antenna at frequencies between 10 MHz and 2.6 GHz. A target object or a border with different permittivities causes the wave to be reflected, refracted, and dispersed. The reflected, refracted, and dispersed waves are observed by an HF-EMW receiving antenna for wave inversion.
Wave inversion describes the structure and shape of a target object or a border based on the observed HF-EMWs. Full-waveform inversion (FWI) [5] is a high-resolution HF-EMW inversion approach that analyzes the shape and properties of target objects based on observed waves. FWI, which belongs to the family of partial differential equation-constrained optimization problems [6], minimizes the misfit between the observed and predicted HF-EMW waveforms to build a reasonable velocity model.
In recent studies, deep neural networks (DNNs) have been widely adopted for the optimization process of FWI modeling [7], aiming to address the nonlinearity of the objective function, local-minimum interference, and wave noise. There are two main directions for DNN-based FWI. One direction utilizes a pairwise dataset of HF-EMWs and velocity models to train a DNN, which generates a direct inversion mapping from observed waves to the shape and properties of target objects [8]. Although the accuracy of the FWI process is improved by the DNNs’ powerful capacity for denoising and parameter optimization, the generalization of these approaches cannot be guaranteed due to the black-box property of DNNs. Specifically, the high-dimensional representations of a DNN cannot be interpreted in terms of the semantics of full waveforms, uncertainty, or partial differential equation constraints [9]. For instance, neural-network-based FWI sometimes loses low-frequency components, known as the cycle-skipping problem [10]. The other direction of DNN-based FWI formulates FWI as a physics-constrained problem by combining neural networks and partial differential equations. For example, generative adversarial networks [11] and convolutional neural networks [12] have been used to build a prior model related to the shape and properties of target objects; the prior model then fits the observed waves by optimizing a lower-dimensional latent variable. These approaches face the problem of determining a prior model for a complex inversion workflow, since the semantics of high-dimensional DNN representations cannot be interpreted well.
The problems in both directions derive from the fact that a comprehensive theoretical understanding of high-dimensional DNN representations in terms of HF-EMW semantics is still lacking. This paper defines HF-EMW semantics as the time- and frequency-domain signal features of an HF-EMW, which can be interpreted as its physical behavior during propagation, such as physical properties, boundary conditions, and signal patterns. For the first direction of DNN-based FWI, understanding high-dimensional representations may provide insights into the model, such as the relationships between the representations and HF-EMW semantics. With such relationships, it is possible to add the partial differential equation constraint to the representations to improve the reliability of an FWI approach. The understanding also indicates when FWI models are likely incorrect [13]. For the second direction, understanding the DNN-based physics-constrained problem is important for assessing the trustworthiness of an FWI parameter from a DNN model [14]. Further, the understanding can transform an untrustworthy FWI model or prediction into a trustworthy one. For example, if we understand why a DNN model selects a parameter related to HF-EMW semantics in a physics-constrained problem, it is possible to know whether an FWI model is imperfect in a particular situation and to calibrate the inversion error. Finally, the error can easily be corrected by understanding HF-EMW semantics.
Many studies have tried to understand neural network representations in the last decade. These studies can be divided into four groups. The first method is the visualization of DNN representations in intermediate network layers [15], such as gradient-based modeling [16]. This method explores the middle-layer representations by maximizing the outcome of a specific unit in a DNN or inverting representations from a middle layer returning to the input. Even though it is the most straightforward way to investigate high-dimensional representations, the method faces the challenge of exploring the semantics of middle-to-end representations. The second is the diagnosis of DNN representations, which either diagnoses a DNN feature space for a class set or discovers potential representation flaws in middle layers [17]. This approach can find the conflicts between the ground truth and some local representations by masking one or more parts of an input. Thus, the exploration heavily depends on whether the masked parts have high semantics about the ground truth, which requires the masked parts to be easily explored by a human, such as the eyes and mouth in a face image. The third disentangles DNN representations into explanatory graphs or decision trees [18], while the fourth learns neural networks with interpretable representations [19]. The two approaches are also helpful for semantic explorations but require prior knowledge about DNNs. More details of interpretation algorithms for deep learning can be found in [20]. In summary, interpretation is still the Achilles’ heel of DNNs, even though these approaches can interpret high-dimensional representations to some degree.
This study proposes an approach that interprets the HF-EMW semantics behind the high-dimensional representations of a DNN. In this approach, an evidential discrete model based on Dempster–Shafer theory (DST) first transforms a subset of representations to mass functions reasoning on a class set, indicating whether the subset contains HF-EMW semantic information. An interpretable continuous DST-based model maps the subset into the HF-EMW semantics via representation fusion. Finally, the two DST-based models are extended to interpret the learning processes of high-dimensional representations in a DNN. The main contributions are summarized as follows.
  • The proposed approach can indicate whether a subset of DNN representations contains an HF-EMW semantic supporting one or more classes. At the same time, it can also explain the subset by an HF-EMW semantic that a human can understand. The subset is trustworthy for high-dimensional representation explanations with small uncertainty. The proposed approach outperforms the other DNN interpretation approaches in HF-EMW processing.
  • The explained subsets can be applied to evaluate the DNN learning process to avoid over-fitting. The application visually shows the two-phase learning process of how a DNN captures HF-EMW semantics. Furthermore, the explained subsets can also be used for semantics-guided reinforcement learning, which can improve the performance of a DNN.
The remainder of this paper is organized as follows. Section 2 recalls the background on HF-EMW semantics and DST. The proposed DST-based approach is described in Section 3. Section 4 adopts a sampling method to pick potential subsets instead of interpreting all of them. Section 5 presents experiments on two HF-EMW datasets, demonstrating the effectiveness of the proposed approach. Finally, Section 6 concludes this study.

2. Background

This section recalls background knowledge on HF-EMW semantics and DST. The definitions and forms of HF-EMW semantics are described in Section 2.1. Basic notions about DST are summarized in Section 2.2.

2.1. HF-EMW Semantics

HF-EMW semantics, also known as the “HF-EMW waveform”, describes the time- and frequency-domain features of an HF-EMW. Generally, there are two main types of semantic content in an HF-EMW. The first comprises semantic contents that can be formalized, such as the time- and frequency-domain waveform equations in Table 1. The second comprises semantic contents that are hard to formalize but easy for humans to understand. For example, Figure 1a shows reflected HF-EMWs with a 100 MHz frequency caused by pipeline leakage. The difference in the reflected waves in the red box is caused by different water contents in the soil, which is easy for experts to understand but hard to describe with formulations. Similarly, time delay and the integral of amplitude over travel time are shown in Figure 1b,c, respectively. Previous studies have widely reported these semantic features [21,22].
In recent years, many studies have used HF-EMW semantic contents and DNN-based methods for FWI. Despite achieving remarkable success, they still pay insufficient attention to exploring DNN representations, as introduced in Section 1. From the viewpoint of FWI, interpreting high-dimensional DNN representations means explaining the semantic relationship between the DNN representations and HF-EMWs. The interpretation captures the intuition behind DNNs about the FWI process, which can transform an untrustworthy FWI model into a trustworthy one. It may even uncover new HF-EMW semantic content that provides feedback to FWI research.

2.2. Dempster–Shafer Theory

As a framework for reasoning and decision making under uncertainty [24], the Dempster–Shafer theory (DST) of belief functions is now well established [25,26]. In [27,28], Denœux demonstrates that DST can convert the inputs of a neural network into mass functions, which quantify the uncertainty arising from lack of evidence (when no evidence provides discriminant information) and from conflicting evidence (when different inputs support different classes). This idea has been extended to transform the high-dimensional representations from a DNN into mass functions for object classification, detection, and segmentation with uncertainty quantification [29,30,31]. These findings provide a potential way to interpret the HF-EMW semantics behind high-dimensional DNN representations.

2.2.1. Mass Functions

Let $\Omega = \{\omega_i\}_{i=1}^{M}$ be a class set. A discrete mass function on $\Omega$ is a mapping $m: 2^{\Omega} \to [0,1]$ such that
$$\sum_{A \subseteq \Omega} m(A) = 1$$
and $m(\emptyset) = 0$. A mass $m(A)$ is the share of a unit mass of belief allocated to the hypothesis that the truth is in $A$ but cannot be allocated to any strict subset of $A$. Any subset $A \subseteq \Omega$ is defined as a focal set iff $m(A) > 0$. Then, a simple mass function is defined as
$$m(A) = s, \qquad m(\Omega) = 1 - s,$$
with $A \subset \Omega$, $A \neq \emptyset$, and $s \in [0,1]$.

2.2.2. Dempster’s Rule

Let $m_1$ and $m_2$ be two independent mass functions. They can be aggregated as their orthogonal sum, called Dempster’s rule [32], as
$$(m_1 \oplus m_2)(A) := \frac{1}{1-\kappa} \sum_{B \cap C = A} m_1(B)\, m_2(C),$$
and $(m_1 \oplus m_2)(\emptyset) := 0$, for all nonempty $A \subseteq \Omega$. In Equation (3), $\kappa$ is the degree of conflict between $m_1$ and $m_2$:
$$\kappa := \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).$$
Dempster’s rule is commutative and associative.
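As a concrete illustration, Dempster’s rule can be sketched in a few lines of Python (a minimal sketch, not code from the paper; mass functions are represented as hypothetical dicts from focal sets to masses):

```python
def dempster_combine(m1, m2):
    """Orthogonal sum of two mass functions by Dempster's rule (Eq. 3).

    Mass functions are dicts mapping focal sets (frozensets) to masses.
    Returns the combined mass function and the degree of conflict kappa.
    """
    raw, kappa = {}, 0.0
    for B, mB in m1.items():
        for C, mC in m2.items():
            A = B & C
            if A:                                   # non-empty intersection
                raw[A] = raw.get(A, 0.0) + mB * mC
            else:                                   # conflicting evidence (Eq. 4)
                kappa += mB * mC
    # Normalize by 1 - kappa; the rule is undefined for totally conflicting evidence.
    return {A: v / (1.0 - kappa) for A, v in raw.items()}, kappa

# Two simple mass functions with the same focal set {w1} on Omega = {w1, w2}
Omega = frozenset({"w1", "w2"})
A = frozenset({"w1"})
m1 = {A: 0.5, Omega: 0.5}   # s1 = 0.5
m2 = {A: 0.4, Omega: 0.6}   # s2 = 0.4
m12, kappa = dempster_combine(m1, m2)
# Per Eq. (5a), the combined support is 1 - (1 - s1)(1 - s2) = 0.7
```

Since both focal sets here are nested, no conflict arises and the combined mass on $A$ matches Equation (5a) exactly.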

2.2.3. Weights of Evidence

Define $m_1$ and $m_2$ as two independent simple mass functions with the same focal set $A$ and degrees of support $s_1$ and $s_2$; their orthogonal sum in Equation (3) can be re-written as
$$(m_1 \oplus m_2)(A) = 1 - (1-s_1)(1-s_2),$$
$$(m_1 \oplus m_2)(\Omega) = (1-s_1)(1-s_2).$$
The weight of evidence [33] is then defined as
$$w = -\ln\left[(1-s_1)(1-s_2)\right] = -\ln(1-s_1) - \ln(1-s_2) = w_1 + w_2.$$
Thus, weights of evidence add up when aggregating mass functions using Equation (3). This study denotes this property as
$$A^{w_1} \oplus A^{w_2} = A^{w_1 + w_2},$$
where $A^{w}$ denotes the simple mass function with focal set $A$ and weight $w$. In addition, the term “weight” in this study denotes $\ln w$.
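The additivity of weights can be checked numerically (a small sketch; the support values s1 and s2 are arbitrary illustrative choices):

```python
import math

def weight(s):
    """Weight of evidence of a simple mass function with degree of support s."""
    return -math.log(1.0 - s)

def support(w):
    """Inverse mapping: degree of support carried by weight w."""
    return 1.0 - math.exp(-w)

s1, s2 = 0.5, 0.4
combined = 1.0 - (1.0 - s1) * (1.0 - s2)      # Equation (5a)
# Weights add up under Dempster's rule: A^{w1} (+) A^{w2} = A^{w1 + w2}
assert abs(support(weight(s1) + weight(s2)) - combined) < 1e-12
```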

3. Interpretation of High-Dimensional Representations

This section describes a DST-based approach to interpreting the high-dimensional representations of a DNN with HF-EMW semantics. Section 3.1 starts with measuring the degree of support from a subset of the DNN representations reasoning on a class set. Section 3.2 then proposes a model that explores the subset with HF-EMW semantics.

3.1. Evidential Reasoning on Class Set

Previous studies [30,34] have demonstrated that a DST-based evidential discrete model can weight the degrees of support from the DNN representations to a class set. Let $X = \{x_j\}_{j=1}^{P}$ be a representation vector from a DNN. The DST-based evidential discrete model first linearly converts an element $x_j$ into a sign $\tau_{ij}$ as
$$\tau_{ij} := \beta_{ij}\, x_j + \alpha_{ij},$$
where $\alpha_{ij}$ and $\beta_{ij}$ are two model parameters associated with element $x_j \in X$ and class $\omega_i \in \Omega$. The weights of $x_j$ for $\{\omega_i\}$ and $\overline{\{\omega_i\}}$ are then defined as the positive and negative portions of $\tau_{ij}$: $w_{ij}^{+} = \tau_{ij}^{+} = \max(0, \tau_{ij})$ and $w_{ij}^{-} = \tau_{ij}^{-} = \max(0, -\tau_{ij})$, respectively. Two simple mass functions are then computed as $m_{ij}^{+} := \{\omega_i\}^{w_{ij}^{+}}$ and $m_{ij}^{-} := \overline{\{\omega_i\}}^{w_{ij}^{-}}$ using Equation (6a,b). Finally, two simple mass functions w.r.t. all elements in $X$ are fused by adding up the positive and negative weights of evidence w.r.t. singleton set $\{\omega_i\}$ as
$$m_i^{+} = \bigoplus_{j=1}^{P} m_{ij}^{+} = \{\omega_i\}^{w_i^{+}},$$
$$m_i^{-} = \bigoplus_{j=1}^{P} m_{ij}^{-} = \overline{\{\omega_i\}}^{w_i^{-}},$$
with
$$w_i^{+} := \sum_{j=1}^{P} w_{ij}^{+} \quad \text{and} \quad w_i^{-} := \sum_{j=1}^{P} w_{ij}^{-},$$
where $\bigoplus$ is the orthogonal sum of the mass functions from all elements in $X$. Finally, the model outputs $m_X$ as
$$m_X = \bigoplus_{i=1}^{M} \left( \{\omega_i\}^{w_i^{+}} \oplus \overline{\{\omega_i\}}^{w_i^{-}} \right).$$
In practice, $m_X = [m(A),\ A \subseteq \Omega]^{T}$ can be expressed as
$$m_X(\{\omega_i\}) = \eta\, \eta^{+} \eta^{-}\, \exp(-w_i^{-}) \left\{ \exp(w_i^{+}) - 1 + \prod_{l \neq i} \left[ 1 - \exp(-w_l^{-}) \right] \right\}$$
for $i = 1, \ldots, M$, and
$$m_X(A) = \eta\, \eta^{+} \eta^{-} \prod_{\omega_i \notin A} \left[ 1 - \exp(-w_i^{-}) \right] \prod_{\omega_i \in A} \exp(-w_i^{-})$$
for each $A \subseteq \Omega$ such that $|A| > 1$. In (11) and (12), $\eta$, $\eta^{-}$, and $\eta^{+}$ are functions of the degree of conflict between $m^{-}$ and $m^{+}$ (4):
$$\eta = \frac{1}{1-\kappa} = \frac{1}{1 - \sum_{i=1}^{M} \eta^{+} \left( \exp(w_i^{+}) - 1 \right) \left[ 1 - \eta^{-} \exp(-w_i^{-}) \right]},$$
$$\eta^{+} = \left[ \sum_{i=1}^{M} \exp(w_i^{+}) - M + 1 \right]^{-1},$$
$$\eta^{-} = \left[ 1 - \prod_{i=1}^{M} \left( 1 - \exp(-w_i^{-}) \right) \right]^{-1}.$$
The proof of Equation (13a–c) has been reported in [27,34].
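The singleton masses of Equation (11) and the normalization constants of Equation (13a–c) can be sketched in Python (an illustrative implementation as we read the formulas, not the authors’ code; `w_pos` and `w_neg` are hypothetical inputs holding the aggregated weights of Equation (10)):

```python
import math

def singleton_masses(w_pos, w_neg):
    """Masses m_X({omega_i}) of Eq. (11) with constants of Eq. (13a-c).

    w_pos[i], w_neg[i] are the aggregated positive/negative weights w_i+ and w_i-.
    """
    M = len(w_pos)
    eta_p = 1.0 / (sum(math.exp(w) for w in w_pos) - M + 1)              # Eq. (13b)
    eta_n = 1.0 / (1.0 - math.prod(1.0 - math.exp(-w) for w in w_neg))   # Eq. (13c)
    kappa = sum(eta_p * (math.exp(w_pos[i]) - 1.0)
                * (1.0 - eta_n * math.exp(-w_neg[i])) for i in range(M))
    eta = 1.0 / (1.0 - kappa)                                            # Eq. (13a)
    masses = []
    for i in range(M):
        rest = math.prod(1.0 - math.exp(-w_neg[l]) for l in range(M) if l != i)
        masses.append(eta * eta_p * eta_n * math.exp(-w_neg[i])
                      * (math.exp(w_pos[i]) - 1.0 + rest))
    return masses

# Symmetric two-class check: all weights equal to ln 2 gives
# m({w1}) = m({w2}) = 3/7, matching a direct combination of the
# four simple mass functions by Dempster's rule.
m = singleton_masses([math.log(2)] * 2, [math.log(2)] * 2)
```

The symmetric check was worked out by hand (four simple mass functions combined step by step), so it exercises all three normalization constants at once.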
Shwartz-Ziv and Tishby [35] demonstrate that interpreting a single element of DNN representations is meaningless because of the weak relationships among the parameters of different units in a DNN layer. This finding indicates two points. First, two mass functions based on two elements in $X$ are independent and can be fused using Dempster’s rule. Second, interpreting the semantic contents of a subset $\mathbf{x} \subseteq X$ such that $|\mathbf{x}| > 1$ may be more meaningful than exploring any single element $x_j$.
Following the commutative and associative properties of Dempster’s rule, the mass function $m_{\mathbf{x}} = [m(A),\ A \subseteq \Omega]^{T}$ can be expressed by adding up the weights of all elements in subset $\mathbf{x}$, re-forming Equation (9a,b) as
$$m_{\mathbf{x}}(\{\omega_i\}) = \eta\, \eta^{+} \eta^{-}\, \exp(-w_{i,\mathbf{x}}^{-}) \left\{ \exp(w_{i,\mathbf{x}}^{+}) - 1 + \prod_{l \neq i} \left[ 1 - \exp(-w_{l,\mathbf{x}}^{-}) \right] \right\}$$
for $i = 1, \ldots, M$, and
$$m_{\mathbf{x}}(A) = \eta\, \eta^{+} \eta^{-} \prod_{\omega_i \notin A} \left[ 1 - \exp(-w_{i,\mathbf{x}}^{-}) \right] \prod_{\omega_i \in A} \exp(-w_{i,\mathbf{x}}^{-}),$$
with
$$w_{i,\mathbf{x}}^{+} := \sum_{x_j \in \mathbf{x}} w_{ij}^{+} \quad \text{and} \quad w_{i,\mathbf{x}}^{-} := \sum_{x_j \in \mathbf{x}} w_{ij}^{-}.$$
Mass $m_{\mathbf{x}}(\{\omega_i\})$ is the $\mathbf{x}$-based belief that the truth is class $\omega_i$, while mass $m_{\mathbf{x}}(A)$ is the $\mathbf{x}$-based belief that the truth is one of the classes in $A$ but which one cannot be determined. Thus, the following exploration can be performed to find the potential representation subsets with HF-EMW semantics.
  • Mass $m_{\mathbf{x}}(\{\omega_i\}) \approx 1$ indicates that subset $\mathbf{x}$ highly supports that the truth is class $\omega_i$. Thus, subset $\mathbf{x}$ may contain one or more HF-EMW semantic contents related to class $\omega_i$.
  • Mass $m_{\mathbf{x}}(A) \approx m_{\mathbf{x}}(B)$ indicates that subset $\mathbf{x}$ cannot determine whether the true class is in subset $A$ or $B$. Thus, subset $\mathbf{x}$ may have the common HF-EMW semantics related to all classes in the intersection of $A$ and $B$.
  • Mass $m_{\mathbf{x}}(\Omega) \approx 1$ indicates that subset $\mathbf{x}$ cannot provide any useful support to any class on $\Omega$. Thus, subset $\mathbf{x}$ has very low HF-EMW semantics.
Therefore, the DST-based evidential discrete model can be used to determine whether a subset of the DNN representations has HF-EMW semantics. Please note that the learning strategy of the DST-based evidential discrete model has been reported in our previous studies [30,34], while the sensitivity of these parameters in the calculation of mass functions has been demonstrated in a previous study [27].

3.2. Evidential Reasoning on HF-EMW Semantics

3.2.1. Continuous DST-Based Model

Once the potential subsets with HF-EMW semantics are found, the next step is to map these subsets into HF-EMW semantics with uncertainty quantification. A mapping with small uncertainty from a subset $\mathbf{x}$ to a certain HF-EMW semantic content $F$ reveals that the DNN has learned the semantic content and encoded it as $\mathbf{x}$.
This study extends the DST-based model in [36] to map these subsets into HF-EMW semantics with uncertainty quantification. The basic idea is to convert a subset $\mathbf{x}$ into a Gaussian random fuzzy number (GRFN) via representation fusion, which represents the most plausible predicted value $\mu(\mathbf{x})$, the variability around this value $\sigma(\mathbf{x})$, and the epistemic uncertainty $h(\mathbf{x})$, respectively. A good mapping with small uncertainty has $\mu(\mathbf{x}) \approx F$, $\sigma(\mathbf{x}) \approx 0$, and $h(\mathbf{x}) \gg 1$. Given a subset $\mathbf{x} \in \mathbb{R}^{p}$ with potential HF-EMW semantics, the proposed continuous DST-based model can be summarized in the following steps.
Step 1. 
The similarity between subset x and a prototype vector in the continuous DST-based model is computed as
$$d_z(\mathbf{x}) = \exp\!\left( -\gamma_z^2\, \|\mathbf{x} - \varsigma_z\|^2 \right),$$
where $\gamma_z$ is a scale factor associated with prototype $\varsigma_z$; a continuous DST-based model has $Z$ trainable prototype vectors of dimension $p$, denoted $\varsigma_1, \ldots, \varsigma_Z$.
Step 2. 
The similarity w.r.t ς z is then converted into a GRFN as
$$\tilde{F}_z(\mathbf{x}) \sim \tilde{N}\!\left( \mu_z(\mathbf{x}),\ \sigma_z^2,\ d_z(\mathbf{x})\, h_z \right),$$
where $\tilde{N}(\mu, \sigma^2, h)$ is a GRFN with mean $\mu$, variance $\sigma^2$, and precision $h$; $\sigma_z^2$ and $h_z$ are the parameters associated with prototype $z$; the mean $\mu_z(\mathbf{x})$ can be computed as
$$\mu_z(\mathbf{x}) = \vartheta_z^{T} \mathbf{x} + \vartheta_{z0},$$
where $\vartheta_z$ and $\vartheta_{z0}$ are a trainable parameter vector and a trainable scalar associated with prototype $\varsigma_z$. The quantity $\mu_z(\mathbf{x})$ is a prediction of the conditional expectation of an HF-EMW semantic content $F$, while $\sigma_z$ is the corresponding variability. As $\|\mathbf{x} - \varsigma_z\|$ tends toward infinity, the precision $d_z(\mathbf{x})\, h_z$ approaches zero, and $\tilde{F}_z(\mathbf{x})$ cannot support the hypothesis that the true value is $\mu_z(\mathbf{x})$.
Step 3. 
The $Z$ GRFNs from prototypes $\varsigma_1, \ldots, \varsigma_Z$ are then aggregated by a generalized Dempster’s rule operation $\boxplus$ as $\tilde{F}(\mathbf{x}) \sim \tilde{N}\!\left( \mu(\mathbf{x}), \sigma^2(\mathbf{x}), h(\mathbf{x}) \right)$ such that
$$\mu(\mathbf{x}) = \frac{\sum_{z=1}^{Z} d_z(\mathbf{x})\, h_z\, \mu_z(\mathbf{x})}{\sum_{z=1}^{Z} d_z(\mathbf{x})\, h_z},$$
$$\sigma^2(\mathbf{x}) = \frac{\sum_{z=1}^{Z} d_z^2(\mathbf{x})\, h_z^2\, \sigma_z^2}{\left[ \sum_{z=1}^{Z} d_z(\mathbf{x})\, h_z \right]^2},$$
$$h(\mathbf{x}) = \sum_{z=1}^{Z} d_z(\mathbf{x})\, h_z.$$
The proof of the continuous DST-based model is introduced in Appendix A. The aggregated GRFN F ˜ ( x ) can explore subset x with HF-EMW semantic F as follows:
  • The output μ ( x ) represents the estimate of the conditional expectation of HF-EMW semantics F. A small distance | μ ( x ) F | indicates that subset x has useful information to support the semantics.
  • The variance output $\sigma^2(\mathbf{x})$ represents the conditional variability w.r.t. $F$ when the given input is $\mathbf{x}$, which can be regarded as aleatory uncertainty. A large value of $\sigma^2(\mathbf{x})$ indicates large aleatory uncertainty, which might be caused by random noise or by elements in $\mathbf{x}$ that do not contain information related to $F$. In this case, there may be a strict subset of $\mathbf{x}$ that carries the learned semantic knowledge about $F$.
  • The precision output $h(\mathbf{x})$ represents the conditional precision of $F$ when the input is $\mathbf{x}$, which can be regarded as epistemic uncertainty. A small value of $h(\mathbf{x})$ indicates large epistemic uncertainty, which might originate from the fact that $\mathbf{x}$ does not include enough information about $F$ or contains conflicting information.
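The three-step fusion above can be sketched with NumPy (a minimal illustration with hypothetical parameter arrays; in practice the trainable values would come from the learning strategy of Section 3.2.4):

```python
import numpy as np

def fuse_grfns(x, prototypes, gammas, thetas, theta0, sigma2, h):
    """Fuse Z prototype GRFNs into one GRFN (Eqs. 16-19).

    prototypes: (Z, p) prototype vectors; gammas, sigma2, h: (Z,) parameters;
    thetas: (Z, p) regression vectors; theta0: (Z,) intercepts.
    Returns (mu(x), sigma^2(x), h(x)).
    """
    d = np.exp(-gammas**2 * np.sum((x - prototypes)**2, axis=1))  # Eq. (16)
    mu_z = thetas @ x + theta0                                    # Eq. (18)
    dh = d * h                                                    # precisions d_z h_z
    mu = np.sum(dh * mu_z) / np.sum(dh)                           # mean, Eq. (19)
    var = np.sum(dh**2 * sigma2) / np.sum(dh)**2                  # variance, Eq. (19)
    prec = np.sum(dh)                                             # precision, Eq. (19)
    return mu, var, prec

# With a single prototype located exactly at x, d_1 = 1 and the fused GRFN
# reduces to that prototype's own GRFN.
x = np.array([1.0, 2.0])
mu, var, prec = fuse_grfns(x, np.array([[1.0, 2.0]]), np.array([0.5]),
                           np.array([[1.0, -1.0]]), np.array([0.2]),
                           np.array([0.3]), np.array([2.0]))
```

The single-prototype case is a useful sanity check: the aggregation must be consistent with one source of evidence taken alone.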

3.2.2. Evidential Belief Prediction Interval

It is challenging to understand the correlation between x and F based on F ˜ ( x ) with three outputs. Therefore, this study utilizes a belief prediction interval to evaluate whether the predicted probability distribution F ˜ ( x ) is similar to the true conditional distribution of F.
Let $\hat{C}_{\tilde{F}|\mathbf{x}}$ be the predicted cumulative distribution function of $\tilde{F}(\mathbf{x})$. With $\Pi \in (0,1]$, a belief prediction interval is defined to calibrate the aleatory and epistemic uncertainties in $\tilde{F}$ as
$$G_{\Pi}(\mathbf{x}) = \left[ \hat{C}_{\tilde{F}|\mathbf{x}}^{-1}\!\left( \frac{1-\Pi}{2} \right),\ \hat{C}_{\tilde{F}|\mathbf{x}}^{-1}\!\left( \frac{1+\Pi}{2} \right) \right],$$
where $\hat{C}_{\tilde{F}|\mathbf{x}}^{-1}(\pi)$ is the inverse of the cumulative distribution function $\hat{C}_{\tilde{F}|\mathbf{x}}$, i.e., the value of $\tilde{F}|\mathbf{x}$ at which the cumulative probability equals $\pi$. The interval satisfies
$$\forall \Pi \in (0,1),\quad P_{\mathbf{x},\tilde{F}}\left( F \in G_{\Pi}(\mathbf{x}) \right) \geq \Pi.$$
For any $\Pi \in (0,1]$, the $\Pi$-level interval in Equation (20) can be regarded as a prediction, with confidence level $\Pi$, that $\mathbf{x}$ contains the truth of HF-EMW semantics $F$. Thus, the prediction $\tilde{F}(\mathbf{x})$ can be said to be well calibrated with a coverage probability
$$P_{\mathbf{x},\tilde{F}}\left( F(\mathbf{x}) \in G_{\Pi}(\mathbf{x}) \right) \approx \Pi.$$
Finally, a calibration plot of coverage probability vs. Π can be used to visually examine the predictions F ˜ . The predictions F ˜ are well-calibrated when the plot of coverage probability vs. Π is close to the first diagonal. A good calibration indicates that μ ( x ) is close to the true HF-EMW semantics F, and two types of uncertainties are small. This behavior demonstrates that subset x is the encoded HF-EMW semantic content F.
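As an illustration, if the predicted cumulative distribution is approximated by a plain Gaussian (a simplification for the sketch: the true $\hat{C}$ of a GRFN mixes aleatory and epistemic components), the $\Pi$-level interval of Equation (20) can be computed with the standard library:

```python
from statistics import NormalDist

def prediction_interval(mu, sigma, Pi):
    """Pi-level interval of Eq. (20) under a Gaussian approximation of C^."""
    cdf_inv = NormalDist(mu, sigma).inv_cdf
    return cdf_inv((1.0 - Pi) / 2.0), cdf_inv((1.0 + Pi) / 2.0)

lo, hi = prediction_interval(mu=0.0, sigma=1.0, Pi=0.95)
# The interval is symmetric about mu and widens as Pi grows toward 1.
```

With $\mu = 0$, $\sigma = 1$, and $\Pi = 0.95$, this yields the familiar $\pm 1.96\sigma$ interval; counting how often labels fall inside such intervals at each $\Pi$ gives the points of the calibration plot.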

3.2.3. Interpretation of DNN Learning Processes

Once we find some representation subsets containing HF-EMW semantics, it is natural to understand the process of how a DNN captures the HF-EMW semantics by learning. The discrete mass functions provide a way to visualize the representation learning process of HF-EMW semantics. This study designs two observations in learning as follows:
  • A curve of $\frac{1}{|T_i|} \sum_{X \in T_i,\ \mathbf{x} \subseteq X} m_{\mathbf{x}}(\{\omega_i\})$ vs. epoch is plotted to visualize the change in the evidence supporting class $\omega_i$ based on subset $\mathbf{x}$, where $T_i$ is the subset of a training set $T$ containing the samples belonging to class $\omega_i$.
  • A curve of $\frac{1}{|T_i|} \sum_{X \in T_i,\ \mathbf{x} \subseteq X} m_{\mathbf{x}}(\Omega)$ vs. epoch is plotted to visualize the change in the lack of evidence (no information in $\mathbf{x}$ provides related classification information).

3.2.4. Learning Strategy

Section 3.2.1 and Section 3.2.2 introduce a parameter set $\Psi = \{\psi_1, \ldots, \psi_Z\}$ associated with the $Z$ prototypes, with $\psi_z = (\varsigma_z, \gamma_z, \vartheta_z, \vartheta_{z0}, \sigma_z^2, h_z)$. These parameters should be adjusted on a learning set before the proposed approach interprets the high-dimensional representations of a DNN.
Let $T = \{(W_q, F_q)\}_{q=1}^{Q}$ be a learning set for a DNN, where $W_q$ is an HF-EMW with manually labeled semantics $F_q$ and $Q$ is the number of samples in the set. The DNN generates a high-dimensional representation vector $X$ once an input $W$ is given. Then, a subset $\mathbf{x} \subseteq X$ is selected to obtain a new learning set $T = \{(W_q, X_q, \mathbf{x}_q, F_q)\}_{q=1}^{Q}$. The subset-selection method will be introduced in Section 4. The new learning set $T$ can be used to adjust the parameter set $\Psi$.
Once given $\mathbf{x}_q$ with its labeled HF-EMW feature $F_q$, Equation (20) outputs an interval $G_{\Pi}(\mathbf{x}_q)$. To measure the gap between label $F_q$ and interval $G_{\Pi}(\mathbf{x}_q)$, we define a small value $\varepsilon$ as a learning hyper-parameter and obtain an interval w.r.t. $F_q$ as $[F]_{\varepsilon} = [F - \varepsilon, F + \varepsilon]$. Following the general definitions of random fuzzy sets in [37,38], the expected necessity and possibility w.r.t. $[F]_{\varepsilon}$ can be computed as
$$Bel_{\tilde{F}}([F]_{\varepsilon}) = \int N_{\tilde{F}}([F]_{\varepsilon})\, dP = \Phi\!\left( \frac{F+\varepsilon-\mu}{\sigma} \right) - \Phi\!\left( \frac{F-\varepsilon-\mu}{\sigma} \right) - pl_{\tilde{F}}(F-\varepsilon)\left[ \Phi\!\left( \frac{F-\mu}{\sigma\sqrt{h\sigma^2+1}} \right) - \Phi\!\left( \frac{F-\varepsilon-\mu}{\sigma\sqrt{h\sigma^2+1}} \right) \right] - pl_{\tilde{F}}(F+\varepsilon)\left[ \Phi\!\left( \frac{F+\varepsilon-\mu}{\sigma\sqrt{h\sigma^2+1}} \right) - \Phi\!\left( \frac{F-\mu}{\sigma\sqrt{h\sigma^2+1}} \right) \right],$$
$$Pl_{\tilde{F}}([F]_{\varepsilon}) = 1 - Bel_{\tilde{F}}([F]_{\varepsilon}^{c}) = \Phi\!\left( \frac{F+\varepsilon-\mu}{\sigma} \right) - \Phi\!\left( \frac{F-\varepsilon-\mu}{\sigma} \right) + pl_{\tilde{F}}(F-\varepsilon)\, \Phi\!\left( \frac{F-\varepsilon-\mu}{\sigma\sqrt{h\sigma^2+1}} \right) + pl_{\tilde{F}}(F+\varepsilon)\left[ 1 - \Phi\!\left( \frac{F+\varepsilon-\mu}{\sigma\sqrt{h\sigma^2+1}} \right) \right],$$
with
$$pl_{\tilde{F}}(x) = \frac{1}{\sqrt{1+h\sigma^2}} \exp\!\left( -\frac{h(x-\mu)^2}{2(1+h\sigma^2)} \right),$$
where $\Phi$ is the standard normal cumulative distribution function, and $[F]_{\varepsilon}^{c}$ is the complement of $[F]_{\varepsilon}$.
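The contour function of Equation (24) is straightforward to implement (a sketch; mu, sigma, and h stand for the outputs of the continuous model):

```python
import math

def pl(x, mu, sigma, h):
    """Contour (plausibility) function pl_F~ of a GRFN, Eq. (24)."""
    scale = 1.0 + h * sigma**2
    return math.exp(-h * (x - mu)**2 / (2.0 * scale)) / math.sqrt(scale)

# pl peaks at x = mu and flattens toward 1 everywhere as h -> 0
# (vacuous evidence: every value remains fully plausible).
```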
The learning strategy computes the gap of [ F ] ε and G Π ( x ) as a trade-off between the expected necessity and possibility as
$$C_{\lambda,\varepsilon}\!\left( G_{\Pi}(\mathbf{x}), F \right) = -\lambda \ln Bel_{\tilde{F}}([F]_{\varepsilon}) - (1-\lambda) \ln Pl_{\tilde{F}}([F]_{\varepsilon}),$$
where λ [ 0 , 1 ] is the trade-off weight of the expected necessity; a small value of λ amounts to favoring cautious prediction.
Therefore, the loss function w.r.t [ F ] ε and G Π ( x ) is defined as
$$L_{\lambda,\varepsilon,\xi,\rho}(T) = \frac{1}{Q} \sum_{q=1}^{Q} C_{\lambda,\varepsilon}\!\left( G_{\Pi}(\mathbf{x}_q), F_q \right) + \frac{\xi}{Z} \sum_{z=1}^{Z} h_z + \frac{\rho}{Z} \sum_{z=1}^{Z} \gamma_z^2.$$
On the right side, the first term computes the average loss over the $Q$ samples; the second term, with regularization coefficient $\xi$, reduces the effect of the number of prototypes; and the third term, with regularization coefficient $\rho$, shrinks the solution towards a linear model.

4. Sampling Method for Representation Subsets

Although the proposed approach provides an understanding of HF-EMW semantics, it is still not practical to interpret all subsets of representations due to trillions of potential options. For example, exhaustively exploring all subsets has a complexity of $O\!\left( (2^{|X|} - 1) \times |\Omega| \right)$. Unfortunately, the cardinality of representation $|X|$ is more than a thousand in practice, such as 2048 in an AlexNet model [39]. Thus, this study adopts a sampling method to pick potential subsets and then interpret them using the proposed approach.
For the learning set $T = \{(W_q, X_q, \mathbf{x}_q, F_q)\}_{q=1}^{Q}$, the element importance $I(x_j; \omega_i)$ is defined to measure the information in $x_j$ for exploring the instances belonging to class $\omega_i$ as
$$I(x_j; \omega_i) = \frac{1}{Q} \sum_{q=1}^{Q} \frac{\tau_{ijq}^{+} - \mathrm{Min}_{\tau}}{\mathrm{Max}_{\tau} - \mathrm{Min}_{\tau}},$$
with
$$\mathrm{Max}_{\tau} = \max_{k=1,\ldots,Q} \tau_{ijk}^{+} \quad \text{and} \quad \mathrm{Min}_{\tau} = \min_{k=1,\ldots,Q} \tau_{ijk}^{+},$$
where $x_j$ is the $j$th element in $X$, and $\tau_{ijq}^{+}$ is the positive sign of element $x_j$ from sample $q$ supporting class $\omega_i$, which can be computed via Equation (8). A large value of $I$ indicates that element $x_j$ contains much information exploring class $\omega_i$, following the intuition of the sign in [27].
Using Equation (26a,b), some elements have large values of $I$ for two or more classes. Such elements contain the information of HF-EMW semantics for two or more classes. For example, the two types of waves in Figure 2 have similar time-domain amplitude features in the 5–6 ns time window. Thus, the elements should be placed into $2^{|\Omega|} - 1$ bins using $I$, as in the example in Table 2. The importance of Elements 1–3 is larger than the threshold supporting class $\omega_1$, while the importance of Elements 2 and 4 is larger than the threshold supporting class $\omega_2$. The importance of Element 5 is smaller than the thresholds for both classes $\omega_1$ and $\omega_2$. Thus, Elements 1 and 3 form a subset with the potential HF-EMW semantics only in class $\omega_1$, while Element 2 belongs to a subset with the potential HF-EMW semantics in both classes $\omega_1$ and $\omega_2$.
Algorithm 1 is proposed to place each element $x \in X$ into one of the $2^{|\Omega|} - 1$ bins. The complexity is $O(|X| \times |\Omega|) \ll O\!\left( (2^{|X|} - 1) \times |\Omega| \right)$. The final outputs are a set of bins $B = \{B_A \mid A \subseteq \Omega\}$, in which each bin $B_A$ contains the element(s) in $X$ supporting hypothesis $A$. The elements in bin $B_A$ build a subset $\mathbf{x}$ supporting $A$.
Algorithm 1 Sampling algorithm for local representations
     Input: $x_j \in X$
     Output: $B = \{B_A \mid A \subseteq \Omega\}$
     Require: $V_i$ as a threshold for $\omega_i \in \Omega$, $i = 1, \ldots, M$
     for all $x_j \in X$ do
         for all $\omega_i \in \Omega$ do
             if $I(x_j; \omega_i) > V_i$ then                ▹ Using (26a,b)
                 $B_i$.insert($j$)                ▹ Element $x_j$ contains the semantics of class $i$ with $I(x_j; \omega_i) > V_i$.
             end if
         end for
     end for
     for all $i = 1, \ldots, M$ do
         $A = B_i \cap B_{\neg i}$                ▹ Find elements containing the semantics of class $i$ and also of other classes.
         if $A \neq \emptyset$ then
             $B_A$.insert($A$)                ▹ Include elements with the semantics of all classes in $A$.
             $B_i \leftarrow B_i \setminus A$                ▹ Remove the elements in $B_A$ from $B_i$ with $i \in A$.
         end if
     end for
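The binning of Algorithm 1 can be condensed into a single pass if each element is keyed directly by the set of classes it supports (an equivalent Python sketch with an illustrative importance matrix; the two-pass merge of the pseudocode produces the same bins):

```python
def bin_elements(importance, thresholds):
    """Place each representation element into one of the 2^|Omega| - 1 bins.

    importance[j][i] = I(x_j; omega_i); thresholds[i] = V_i.
    Returns {A: elements whose importance exceeds V_i exactly for the classes in A}.
    """
    bins = {}
    for j, row in enumerate(importance):
        A = frozenset(i for i, v in enumerate(row) if v > thresholds[i])
        if A:  # elements supporting no class carry no HF-EMW semantics
            bins.setdefault(A, []).append(j)
    return bins

# The Table 2 example (0-based indices): Elements 1-3 support class w1,
# Elements 2 and 4 support class w2, and Element 5 supports neither.
I = [[0.9, 0.1],   # Element 1 -> {w1}
     [0.8, 0.7],   # Element 2 -> {w1, w2}
     [0.9, 0.2],   # Element 3 -> {w1}
     [0.1, 0.9],   # Element 4 -> {w2}
     [0.1, 0.2]]   # Element 5 -> discarded
bins = bin_elements(I, [0.5, 0.5])
```

Like Algorithm 1, this runs in $O(|X| \times |\Omega|)$ time; the dict key plays the role of hypothesis $A$.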

5. Numerical Experiments

This section presents two numerical experiments that use the proposed approach to interpret the HF-EMW semantics of high-dimensional DNN representations. The experiment settings are described in Section 5.1, followed by the results of semantic explanations and trustworthiness evaluations in Section 5.2. Finally, Section 5.3 presents the three applications of the semantics interpretation.

5.1. Experiment Setting

5.1.1. Datasets

Two HF-EMW datasets are adopted in the numerical experiments: airport road EM (AREM) and underground object EM (UOEM).
The AREM dataset [40] includes 2680 HF-EMWs with a frequency of 1.0 GHz. All waves were collected by an IDS radar at Nanjing Dajiaochang and Xuzhou Guanyin airports with a maximum detection speed of 18 km/h. The HF-EMWs were collected in the temperature range of 3–10 °C on four different cement-based materials with various material conditions, as reported in our previous studies [40,41]. The number of sampling points in each HF-EMW is normalized to 477 after start-time moving processing. There are four types of HF-EMWs: slab bottom interface $\omega_1$, void without rebar interference $\omega_2$, single-layer rebars $\omega_3$, and void with rebar interference $\omega_4$. Figure 2 shows examples of the four types. The dataset is split into training, validation, and testing sets with a ratio of 6:2:2.
The UOEM dataset [23] contains 4000 groups of three-dimensional (3D) HF-EMWs collected from the underground areas of three Chinese cities: Nanjing, Suzhou, and Nanchang. The groups were collected by several 3D radars with antenna frequencies of 200 MHz, 450 MHz, 800 MHz, and 1.2 GHz and nonlinear antenna gains; the sample distribution over frequencies is shown in Table 3. Each group of 3D HF-EMWs covers an underground range with a width of 1.5 m, a length of 1.0 m at a sampling interval of 0.02 m, and a time-window depth of 30 ns with 128 sampling points. The dataset has three classes: (a) normal 3D HF-EMW group without an underground object ω_1, (b) abnormal 3D HF-EMW group with an underground pipeline ω_2, and (c) abnormal 3D HF-EMW group with pipeline leakage ω_3. The ratio of the three classes is about 3:1:1. The dataset is split into training, validation, and testing sets with a ratio of 6:2:2.

5.1.2. Network Details

The proposed approach interprets the high-dimensional features from two state-of-the-art networks trained by the two datasets.
On the AREM dataset, a signal-wise cascade deep network (SWC-Net) [40] is used to detect the abnormal areas of the HF-EMWs, whose architecture is shown in Figure 3. The optimal architecture of the SWC-Net includes, in order, a VGG-16 backbone, two 1 × 1 convolution layers, a softmax layer for classification, and a regression layer for abnormal area prediction. The proposed approach is adopted to interpret the HF-EMW semantics of the high-dimensional feature vector X at the end of the backbone, where the dimension of the vector is 1972.
On the UOEM dataset, a transformer model called the dual-branch frequency domain feature fusion transformer (DBFFT) [42] is used to detect the abnormal areas of the three classes, as shown in Figure 4. The optimal architecture of the DBFFT network (DBFFT-H-L) has four stages, where the first stage consists of a progressive patch embedding with the channels [48, 48, 96, 96, 96], and each stage has parallel frequency and spatial encoders. The proposed approach is adopted to interpret the HF-EMW semantics of the high-dimensional feature vector X at the end of the final stage, in which the dimension of each feature vector is 512.

5.1.3. Training Details

The two networks were trained from scratch on two NVIDIA RTX A6000 GPUs. For the SWC-Net, the optimizer is NADAM with an initial learning rate of 0.001 and beta coefficients of (0.9, 0.999). The batch size and number of training epochs are 20 and 200, respectively. For the DBFFT network, the optimizer is AdamW with a momentum of 0.9. The initial learning rate is set to batch_size/512 × 0.0005 and is decayed by a cosine schedule. The batch size and number of training epochs are 2 and 300, respectively. The hyper-parameters of the two networks are the same as those in their original works [40,42].
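The DBFFT learning-rate rule above can be written out as a short sketch. The exact schedule form (no warmup, decay to zero over the full run) is our assumption; only the base-rate scaling batch_size/512 × 0.0005 and the cosine decay are stated in the text.

```python
# Sketch of the DBFFT learning-rate rule described above: a base rate
# scaled by batch_size/512 x 0.0005, decayed with a cosine schedule.
# The schedule details (no warmup, decay to zero) are our assumptions.
import math

def lr_at_epoch(epoch, total_epochs=300, batch_size=2):
    base_lr = batch_size / 512 * 0.0005
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

# the rate starts at the base value and falls to ~0 at the final epoch
```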
After training the SWC-Net and DBFFT networks, the learning strategy in Section 3.2.4 is adopted with a NADAM optimizer to train the proposed approach, with an initial learning rate of 0.001 and beta coefficients of (0.9, 0.999). The batch size and number of training epochs are 20 and 150, respectively. The hyper-parameters of the proposed method are optimized on the validation sets of the two datasets.

5.1.4. Comparison Study and Trustworthiness Metrics

In the two experiments, the proposed approach is compared with other widely used interpretation approaches, including LIME [14], GALE [43], FullGrad [44], Chefer et al. [45], Score-CAM [46], DiCE [47], and TCAV [48]. In the comparison study, each approach finds at least three subsets of the high-dimensional feature vector from the SWC-Net or DBFFT model that are most relevant to the tasks of the two models. The trustworthiness of the subsets is measured to evaluate the capacity of each approach for the semantic exploration of high-dimensional representations.
Three trustworthiness methods have been used to compare the interpretability of the proposed approach with the others as follows.
  • Most Relevant First (MoRF) [49]: Partial elements in the three subsets are replaced by random values, and the output change is measured to evaluate whether the subsets are important for the related task. A large change indicates that the subsets are trustworthy and important for HF-EMW semantics.
  • Remove and Retrain (ROAR) [50]: In the MoRF method, the output change might result from the network not being well trained. Thus, the ROAR method retrains the network with the subset elements replaced by random values. The subsets are not necessarily important if the accuracy does not drop.
  • Calibration plot [38]: In the probabilistic case, the trustworthiness can be evaluated by a calibration plot of the coverage rate (22) versus confidence levels Π ∈ {0.1, …, 0.9}. A subset is trustworthy if the curve lies above the first diagonal.
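The MoRF metric in the first bullet can be sketched concretely: randomize a growing fraction of the most relevant feature elements and record the absolute fractional change of the model output. The `model` callable, the relevance ordering, and the Gaussian replacement values are placeholders, not the paper's implementation.

```python
# Minimal sketch of the MoRF evaluation: replace the top-k most relevant
# elements by random values and measure the absolute fractional output
# change. `model` and `order` are illustrative placeholders.
import random

def morf_curve(model, x, order, fractions, seed=0):
    rng = random.Random(seed)
    y0 = model(x)
    curve = []
    for frac in fractions:
        k = int(len(order) * frac)
        xr = list(x)
        for idx in order[:k]:              # most relevant first
            xr[idx] = rng.gauss(0.0, 1.0)  # replace by a random value
        curve.append(abs(model(xr) - y0) / (abs(y0) + 1e-12))
    return curve

# toy check: a model that sums the first two features
model = lambda v: v[0] + v[1]
curve = morf_curve(model, [1.0, 1.0, 1.0], order=[0, 1, 2],
                   fractions=[0.0, 1.0])
```

A large value late in the curve means the removed elements mattered, which is how the trustworthiness of a subset is read off in Section 5.2.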

5.2. Semantic Explanations and Trustworthiness Evaluations

5.2.1. Experiment on the AREM Dataset

We start with the semantic explanations of the SWC-Net representations. The proposed approach finds the three subsets x_1, x_2, and x_3 with the maximum information I using Equation (26a,b) and Algorithm 1 (details of the three subsets can be found at https://docs.qq.com/sheet/DVUtxcktOWlpKU1ls, accessed on 17 August 2025), where x_1, x_2, and x_3 belong to bins B_{ω_1}, B_{ω_3}, and B_{ω_1, ω_2}, respectively.
Figure 5 presents the discrete mass functions of the three subsets reasoning on different classes. In the first row of subset x_1 in Figure 5a, the majority of the samples belonging to ω_1 (red points) achieve mass functions m_{x_1}(ω_1) ≈ 1, while the samples belonging to the other classes (yellow, blue, and green points) have m_{x_1}(ω_1) ≈ 0. This indicates that x_1 may represent the HF-EMW semantics related to class ω_1, following the theoretical explanation in Section 3.1. Similarly, subset x_2 carries the HF-EMW semantics related to class ω_3. Different from x_1 and x_2, subset x_3 has similar values of m_{x_3}(ω_1) and m_{x_3}(ω_2) and a large value of m_{x_3}({ω_1, ω_2}) when a sample belongs to class ω_1 or ω_2. Thus, subset x_3 characterizes the HF-EMW semantics related to classes ω_1 and ω_2.
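The reading of Figure 5 can be condensed into a small decision rule: a subset is taken to carry the semantics of class i when its averaged mass m_x(ω_i) is near 1 over class-i samples and near 0 over the others. The threshold and data layout below are illustrative, not from the paper.

```python
# Hedged sketch of reading Figure 5: average the discrete mass a subset
# assigns to each class, separately over same-class and other-class
# samples. The 0.8 threshold is illustrative only.

def supported_classes(masses, labels, threshold=0.8):
    """masses: list of {class: mass} per sample; labels: true class per sample."""
    classes = sorted({c for m in masses for c in m})
    mean = lambda v: sum(v) / len(v) if v else 0.0
    support = []
    for c in classes:
        same = [m.get(c, 0.0) for m, y in zip(masses, labels) if y == c]
        other = [m.get(c, 0.0) for m, y in zip(masses, labels) if y != c]
        if mean(same) > threshold and mean(other) < 1 - threshold:
            support.append(c)
    return support

# toy data mimicking subset x_1: high mass on class 1 only for class-1 samples
masses = [{1: 0.95}, {1: 0.9}, {1: 0.05}, {1: 0.1}]
labels = [1, 1, 2, 2]
```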
Figure 6 shows the belief prediction interval G_Π(x_1) on the HF-EMW semantics F_8 in the time window [8 ns, 10 ns], where the form of F_8 is shown in Table 1. If a subset contains specific semantics, the continuous DST-based model in Section 3.2 can effectively predict the semantic value using the subset as an input, and the predicted value falls into a belief prediction interval with Π = 0.5. In Figure 6a, when the testing samples belong to class ω_1, all predicted values μ(x_1) are close to the truths, and the majority of the truths fall into the belief prediction interval G_{Π=0.5}(x_1), indicating that x_1 can predict the semantics F_8 in the time window [8 ns, 10 ns] with small uncertainty if the samples belong to class ω_1. Therefore, x_1 contains the specific semantics F_8 in the time window [8 ns, 10 ns]. However, Figure 6b–d indicate that x_1 cannot predict the values of F_8 in this time window when the samples belong to ω_2, ω_3, or ω_4. Thus, x_1 carries the HF-EMW semantics w.r.t. F_8 within the time window [8 ns, 10 ns]. In detail, for class ω_1, there are several significant pairs of peaks and troughs in the HF-EMWs within the time window [8 ns, 10 ns], as shown in the red box of Figure 2a. Such pairs lead to a large value of F_8 in this window. Since the pairs derive from the wave propagation from a slab bottom to a soil surface, it can be inferred that subset x_1 carries the semantics that an input may belong to class ω_1 if it has pairs of peaks and troughs in the time window [8 ns, 10 ns]. Similarly, following Figure 2c and Figure 7, x_2 carries the HF-EMW semantics relevant to the HF-EMW vibrations F_3 in the time window [3 ns, 5 ns] originating from rebar interference. As for x_3, which is related to F_4 in the time window [8 ns, 10 ns], it carries the semantics that there is an HF-EMW vibration originating from a void at the slab bottom; this phenomenon can be observed in Figure 2a,b and Figure 8.
The results of MoRF and ROAR are shown in Figure 9 to evaluate the semantic effectiveness of the three most important subsets from the proposed method and the other explanation approaches. For the proposed method, as the percentage of randomized elements in x_1, x_2, and x_3 increases, the absolute fractional output change and the ignorance m(Ω) increase, and the testing accuracy decreases. This indicates that the proposed approach finds the most important feature subsets for the detection task. The three subsets are trustworthy because they contribute significantly to the task: randomizing even part of their elements leads to a significant decrease in accuracy and an increase in ignorance. In addition, the changes in absolute fractional output and testing accuracy of the proposed approach are more significant than those of the other interpretation approaches, indicating that the subsets found by the proposed approach are more important. The examples in Figure 10 visually explain the behavior in Figure 9. For an HF-EMW belonging to class ω_1, its most significant feature is the pairs of peaks and troughs in the time window [8 ns, 10 ns], such as the one in the red box in Figure 2a. Thus, the most important HF-EMW semantics should focus on this time window, which is the most significant difference between class ω_1 and the others. Figure 10a indicates that x_1 from the proposed method focuses on this time window, while Figure 10b–h show that the other methods focus on time windows either smaller or larger than [8 ns, 10 ns], or even completely deviating from it.
This is mainly because the DST-based method can represent conflict via the products m_x(A)·m_x(B) of masses on disjoint sets and ignorance via m(Ω) in the reasoning on the class set, and it can also represent the aleatory and epistemic uncertainty via the variance σ² and the precision h, respectively, in the reasoning on HF-EMW semantics. In contrast, the other methods work within the framework of probability theory, which captures only the conflict and aleatory-uncertainty aspects of an HF-EMW, but neither the ambiguity nor the incompleteness inherent in uncertain data. Thus, the proposed approach is more trustworthy than the other explanation approaches in the interpretation of HF-EMWs.
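The two DST quantities named above, conflict and ignorance, can be made concrete by combining two simple mass functions with Dempster's rule on a toy frame of two classes. The frame and mass values below are illustrative only.

```python
# Illustration of the DST quantities discussed above: combining two mass
# functions with Dempster's rule exposes both the conflict (mass products
# m1(A)*m2(B) with A and B disjoint) and the ignorance m(Omega), neither
# of which a single probabilistic score carries. Toy frame and masses.

def dempster(m1, m2):
    """m1, m2: {frozenset: mass} on the same frame. Returns (m12, conflict)."""
    raw, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                raw[inter] = raw.get(inter, 0.0) + wa * wb
            else:                     # disjoint focal sets -> conflict
                conflict += wa * wb
    norm = 1.0 - conflict
    return {s: w / norm for s, w in raw.items()}, conflict

omega = frozenset({"w1", "w2"})
m1 = {frozenset({"w1"}): 0.6, omega: 0.4}   # 0.4 of ignorance
m2 = {frozenset({"w2"}): 0.5, omega: 0.5}
m12, conflict = dempster(m1, m2)
```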
The calibration plots of x_1 are shown in Figure 11a to measure the uncertainty in HF-EMW semantics. The calibration curve of class ω_1 is close to the first diagonal, while those of classes ω_2–ω_4 are far from it. This indicates that the proposed approach can predict the HF-EMW semantics F_8 in [8 ns, 10 ns] using x_1 with high precision h(x_1) and small errors |F − μ(x_1)|. Thus, the two phenomena in Figure 6 and Figure 11 indicate that subset x_1 contains the information of the HF-EMW semantics F_8 supporting, and only supporting, class ω_1. Similar behaviors can also be found for subsets x_2 and x_3. Thus, the subsets from the proposed approach are trustworthy with small uncertainty. Although the proposed approach can find subsets with potential HF-EMW semantics and map the subsets to certain semantics, it is still challenging to determine the HF-EMW semantics of these subsets. For example, the proposed approach finds the mapping from x_1 to F_8 by a uniform search strategy. Such a challenge makes it difficult to explore subsets using HF-EMW semantics that are not well defined by humans.
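The calibration curve itself is simple to compute: for each confidence level Π, the empirical coverage is the fraction of truths falling inside the level-Π prediction interval. In the sketch below, central Gaussian intervals stand in for the paper's belief prediction intervals, which we do not reproduce.

```python
# Sketch of the calibration plot: coverage of level-Pi prediction
# intervals vs. Pi. Gaussian intervals are a stand-in assumption for the
# belief prediction intervals used in the paper.
from statistics import NormalDist

def calibration_curve(truths, mus, sigmas, levels):
    curve = []
    for pi in levels:
        z = NormalDist().inv_cdf(0.5 + pi / 2)     # central interval half-width
        hits = sum(abs(t - m) <= z * s
                   for t, m, s in zip(truths, mus, sigmas))
        curve.append(hits / len(truths))
    return curve

# a perfectly predicted toy case: every truth lies in every interval,
# so the curve sits on or above the first diagonal (trustworthy)
levels = [0.1, 0.5, 0.9]
curve = calibration_curve([0.0, 1.0, 2.0], [0.0, 1.0, 2.0],
                          [1.0, 1.0, 1.0], levels)
```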

5.2.2. Experiment on the UOEM Dataset

The proposed approach can also interpret the HF-EMW semantics of the three most important subsets, as shown in Figure 12, Figure 13, Figure 14 and Figure 15. For example, subset x_1 contains the encoded HF-EMW semantics related to the integral of amplitude with travel time in the first 3 ns. In detail, subset x_1 reflects that underground pipelines in soils with different water contents cause differences in the reflected HF-EMWs in the first 3 ns, as shown in Figure 1c. Figure 12c indicates that x_1 captures the HF-EMW semantics about the effects of pipelines and water on class ω_3, which contains pipelines and water. Similarly, the transformer model also learns the HF-EMW semantics about the signal phase reversal (Figure 1a) and time delay (Figure 1b), as shown in Figure 13, Figure 14 and Figure 15b,c. In addition, the three most important subsets x_1, x_2, and x_3 are trustworthy based on the evaluation results in Figure 16. In summary, the transformer model can capture the HF-EMW semantics by encoding the input into high-dimensional representations.

5.3. Applications of HF-EMW Semantics Explanations

The HF-EMW semantic explanations can be used in several directions, such as network learning evaluation in Section 5.3.1 and semantics-guided reinforcement learning in Section 5.3.2.

5.3.1. Learning Evaluation

Figure 17 presents the evolution of the discrete mass functions vs. the learning epoch on the AREM training set. Before learning optimization, the averaged values of m_x(ω_i) are close to 0 and the averaged value of m_x(Ω) is close to 1. This is because the randomly initialized network parameters cannot capture information about the four classes of the task. This phenomenon has also been reported in a previous study [30], indicating that the network cannot output useful information before learning. The learning process then has two phases: quick representation learning and slow representation fine-tuning. At the beginning of the learning optimization (about the first thirty epochs in Figure 17), there is an abrupt decrease in the averaged value of m_x(Ω) and a significant increase in the averaged value of m_x(ω_1), as shown by the blue surface in Figure 17a, the orange surface in Figure 17b, and the blue and green surfaces in Figure 17c. In this phase, the network quickly learns some HF-EMW semantics and outputs x_1 with useful information by encoding the input into efficient representations. The network then supports class ω_1 and rejects complete ignorance. After the first thirty epochs, the changes in the two averaged values are slow and small, and the network weights fluctuate primarily due to random diffusion, with minimal influence of the error gradients. This phase is marked by slow representation fine-tuning. This behavior has also been explained from an information-plane view [35]. A similar phenomenon can be found in the learning optimization on the UOEM training set, as shown in Figure 18. The phenomena of the two learning optimizations indicate that the mass functions visually present how a network learns HF-EMW semantics during training. Therefore, the discrete mass function m(Ω) can be used to evaluate the learning process, which may help avoid over-fitting.
In detail, a complete learning optimization should have two learning phases, and the learning optimization should be stopped if m x ( Ω ) is smaller than an ignorance threshold. The threshold heavily depends on the uncertainty of a learning set, whose determination method will be our future work.
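The stopping rule just described can be sketched as a few lines of Python: halt training at the first epoch whose averaged ignorance m(Ω) falls below a threshold. Both the threshold value and the ignorance trace below are illustrative; as noted above, choosing the threshold is left to future work.

```python
# Minimal sketch of the stopping rule above: stop at the first epoch
# whose averaged ignorance m(Omega) drops below a threshold. The
# threshold and the history values are illustrative only.

def stop_epoch(ignorance_history, threshold):
    """Return the first epoch with averaged m(Omega) < threshold, else None."""
    for epoch, m_omega in enumerate(ignorance_history):
        if m_omega < threshold:
            return epoch
    return None

# a typical two-phase trace: fast drop, then a slow plateau
history = [0.98, 0.60, 0.22, 0.12, 0.11, 0.10]
```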

5.3.2. Semantics-Guided Reinforcement Learning

The subsets with explained HF-EMW semantics can also be used for semantics-guided reinforcement learning. Take the AREM dataset as an example. Having determined the subsets x_1, x_2, and x_3 in Section 5.2, we used finite-difference time-domain (FDTD) simulation [40] to generate ideal HF-EMWs and compute the perfect subsets x_1, x_2, and x_3 of a wave without real-world noise, such as material heterogeneity and wave diffusion. The gaps between the perfect and on-site subsets are computed and used to fine-tune the network parameters along the negative direction of the blue arrow in Figure 3. This process is semantics-guided reinforcement learning. Table 4 shows the performance of the networks before and after fine-tuning. The fine-tuned network exceeds the original on the classification and regression metrics, indicating that semantics-guided reinforcement learning can improve the accuracy of the SWC-Net.
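The fine-tuning signal above can be sketched as an auxiliary loss: the gap between the on-site subsets and their FDTD-simulated "perfect" counterparts is added to the task loss, so gradients pull the representations toward the noise-free semantics. The weight `lam` and the squared-error form of the gap are our assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a semantics-guided loss: task loss plus a weighted gap
# between on-site subsets x_i and FDTD-derived perfect subsets x_i*.
# The lambda weight and squared-error gap are illustrative assumptions.

def guided_loss(task_loss, subsets, perfect_subsets, lam=0.1):
    gap = 0.0
    for x, x_star in zip(subsets, perfect_subsets):
        gap += sum((a - b) ** 2 for a, b in zip(x, x_star)) / len(x)
    return task_loss + lam * gap

# zero gap leaves the task loss unchanged
loss = guided_loss(0.5, [[1.0, 2.0]], [[1.0, 2.0]])
```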
In addition, different from the applications in deep learning, the high-dimensional features with HF-EMW semantics also provide a new option for the inputs of FWI. Common FWI methods predict the permittivity distribution of a space using widely accepted features of HF-EMWs, such as electromagnetic field intensity and wave speed. These approaches face the challenge of non-unique solutions because the input features are not informative enough for the inversion. It might be helpful to use the subsets, rather than these widely accepted features, to address this challenge. For example, the subsets x_1, x_2, and x_3 from an HF-EMW can be used as the inputs of FWI to predict the permittivity distribution along the propagation path of the HF-EMW. Then, the permittivity distribution of a 2D/3D space can be predicted using the HF-EMWs in the space. This is a potential application of the proposed approach in the field of electromagnetism, which will be our future work.

6. Conclusions

This study has proposed an evidential representation fusion approach that interprets the HF-EMW semantics behind the high-dimensional representations of a DNN by mapping and fusing some subsets of DNN representations into HF-EMW semantics. In this approach, an evidential discrete model first converts a subset of DNN representations to mass function reasoning on the class set. An interpretable continuous DST-based model then maps the subset into the HF-EMW semantics. Finally, the two DST-based models are extended to interpret the learning processes of high-dimensional representations in a DNN. The numerical experiments on two HF-EMW datasets demonstrate the effectiveness of the proposed framework. The conclusions can be drawn as follows.
  • The evidential discrete model can indicate whether a subset of representations contains HF-EMW semantics supporting one or more classes, while the interpretable continuous DST-based model interprets the subset as HF-EMW semantics that humans can understand.
  • The trustworthiness evaluation indicates that the representation subsets from the proposed approach are trustworthy for high-dimensional representation explanations with small uncertainty. The proposed approach outperforms the other interpretation approaches under the MoRF and ROAR tests, achieving an absolute fractional output change of 39.84% with 10% of the elements removed from the most important features.
  • The explained subsets can be applied to evaluate the learning process to avoid under- and over-fitting. This application visually shows the two-phase learning process through which the subsets capture semantics. Furthermore, the explained subsets can also be used for semantics-guided reinforcement learning, which yields an improvement of 4.23% in classification accuracy.
  • Regarding limitations, the proposed approach cannot interpret the non-formalized semantics of electromagnetic signals, such as the ones in Figure 1. To address this issue, we consider converting the interpretable continuous DST-based model into an evidential signal inversion model, which can directly invert the subsets of DNN representations into the distribution of an electromagnetic property in a 2D/3D space. One potential way is to use the important semantic subsets, such as x_1, x_2, and x_3 in Section 5, as the inputs of FWI to predict the permittivity distribution along the propagation path of the HF-EMW. Then, the permittivity distribution of a 2D/3D space can be predicted using the HF-EMWs in the space.

Author Contributions

Conceptualization, Z.T.; methodology, X.L., M.S. and Z.T.; software, X.L. and M.S.; validation, M.S., Y.Z., S.M. and S.L.; investigation, X.L., M.S., Y.Z., S.M. and S.L.; data curation, X.L., M.S. and Y.Z.; writing—original draft preparation, X.L.; writing—review and editing, Z.T. and M.S.; visualization, M.S., Y.Z., S.M. and S.L.; supervision, Z.T.; project administration, X.L. and M.S.; funding acquisition, X.L. and Z.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Xinjiang Uygur Autonomous Region Key Research and Development Project, grant number 2021B01005, National Natural Science Foundation of China, grant number 52308447, and Jiangsu Province Youth Science and Technology Talent Lifting Project under Grant JSTJ-2024-089.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

Authors Xueliang Li, Ming Su, Yu Zhu, Shansong Ma and Shifu Liu were employed by the company Xinjiang Jiaokan Zhiyuan Engineering Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Proof of the Predictive GRFN

A predictive GRFN can map x to a random variable F in Section 3.2.1. Define a Gaussian fuzzy number (GFN) as a fuzzy subset of ℝ with
GFN(x; m, h) = exp(−(h/2)(x − m)²), (A1)
where m ∈ ℝ is the mode and h ∈ [0, +∞] is the precision. Two GFNs GFN_1(x; m_1, h_1) and GFN_2(x; m_2, h_2) can be combined into GFN_12(x; m_12, h_12) with
m_12 = (h_1 m_1 + h_2 m_2)/(h_1 + h_2), h_12 = h_1 + h_2. (A2)
A GRFN can be expressed as a GFN whose mode is a Gaussian random variable. In detail, a Gaussian random variable with mean μ and variance σ² is defined as M : Ω → ℝ on a probability space (Ω, σ_Ω, P). Then, the random fuzzy set Ñ : Ω → [0, 1]^ℝ in Equation (17) is written as
Ñ(ω) = GFN(M(ω), h), (A3)
where Ñ(μ, σ², h), called the GRFN, has a location parameter μ and two uncertainty parameters σ² and h, which represent probabilistic and possibilistic uncertainty, respectively. In particular, a GRFN Ñ with h = +∞ equals a Gaussian random variable with mean μ and variance σ². In contrast, the case with h = 0 has Ñ(ω)(x) = 1 for all ω ∈ Ω and x ∈ ℝ. Furthermore, Ñ with σ² = 0 has a constant random variable M taking the value μ, which can be seen as a possibilistic variable with possibility distribution GFN(μ, h).
In addition, Equation (19a–c) in Section 3.2.1 combines several GRFNs into one. Given two GRFNs Ñ_1(μ_1, σ_1², h_1) and Ñ_2(μ_2, σ_2², h_2), the combination Ñ_1 ⊞ Ñ_2 = Ñ_12(μ_12, σ_12², h_12) can be defined based on the property of GFNs in Equation (A2) as
μ_12 = (h_1 μ_1 + h_2 μ_2)/(h_1 + h_2), (A4)
σ_12² = (h_1² σ_1² + h_2² σ_2²)/(h_1 + h_2)², (A5)
h_12 = h_1 + h_2. (A6)
Obviously, this combination is commutative and associative. Thus, two or more GRFNs can be aggregated by an accumulation operation as in Equation (19a–c).
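The GRFN combination above is easy to verify numerically. The sketch below implements the three combination formulas for the parameters (μ, σ², h); the tuple representation is ours, and the example values are arbitrary.

```python
# Executable form of the GRFN combination rules given above: combine
# N1(mu1, s1, h1) and N2(mu2, s2, h2) into N12. Tuples (mu, sigma^2, h)
# are our representation, not the paper's code.

def combine_grfn(n1, n2):
    (m1, s1, h1), (m2, s2, h2) = n1, n2
    h12 = h1 + h2
    m12 = (h1 * m1 + h2 * m2) / h12
    s12 = (h1 ** 2 * s1 + h2 ** 2 * s2) / h12 ** 2
    return (m12, s12, h12)

a, b = (1.0, 0.4, 2.0), (3.0, 0.1, 6.0)
ab = combine_grfn(a, b)
ba = combine_grfn(b, a)   # commutative, as stated above
```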

References

  1. Wu, H.T.; Li, H.; Chi, H.L.; Kou, W.B.; Wu, Y.C.; Wang, S. A hierarchical federated learning framework for collaborative quality defect inspection in construction. Eng. Appl. Artif. Intell. 2024, 133, 108218. [Google Scholar] [CrossRef]
  2. Yang, H.; Ma, T.; Huyan, J.; Han, C.; Wang, H. Aggregation segregation generative adversarial network (AG-GAN) facilitated multi-scale segregation detection in asphalt pavement paving stage. Eng. Appl. Artif. Intell. 2024, 129, 107663. [Google Scholar] [CrossRef]
  3. Yu, C.M.; Lin, Y.H. The docking control system of an autonomous underwater vehicle combining intelligent object recognition and deep reinforcement learning. Eng. Appl. Artif. Intell. 2025, 139, 109565. [Google Scholar] [CrossRef]
  4. Alzubaidi, L.; Chlaib, H.K.; Fadhel, M.A.; Chen, Y.; Bai, J.; Albahri, A.S.; Gu, Y. Reliable deep learning framework for the ground penetrating radar data to locate the horizontal variation in levee soil compaction. Eng. Appl. Artif. Intell. 2024, 129, 107627. [Google Scholar] [CrossRef]
  5. Virieux, J.; Operto, S. An overview of full-waveform inversion in exploration geophysics. Geophysics 2009, 74, WCC1–WCC26. [Google Scholar] [CrossRef]
  6. Plessix, R.E. A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J. Int. 2006, 167, 495–503. [Google Scholar] [CrossRef]
  7. Adler, A.; Araya-Polo, M.; Poggio, T. Deep learning for seismic inverse problems: Toward the acceleration of geophysical analysis workflows. IEEE Signal Process. Mag. 2021, 38, 89–119. [Google Scholar] [CrossRef]
  8. Kazei, V.; Ovcharenko, O.; Plotnitskii, P.; Peter, D.; Zhang, X.; Alkhalifah, T. Mapping full seismic waveforms to vertical velocity profiles by deep learning. Geophysics 2021, 86, R711–R721. [Google Scholar] [CrossRef]
  9. Gebraad, L.; Boehm, C.; Fichtner, A. Bayesian elastic full-waveform inversion using Hamiltonian Monte Carlo. J. Geophys. Res. Solid Earth 2020, 125, e2019JB018428. [Google Scholar] [CrossRef]
  10. Sun, H.; Demanet, L. Extrapolated full-waveform inversion with deep learning. Geophysics 2020, 85, R275–R288. [Google Scholar] [CrossRef]
  11. Mosser, L.; Dubrule, O.; Blunt, M.J. Stochastic seismic waveform inversion using generative adversarial networks as a geological prior. Math. Geosci. 2020, 52, 53–79. [Google Scholar] [CrossRef]
  12. Wu, Y.; McMechan, G.A. CNN-boosted full-waveform inversion. In Proceedings of the SEG International Exposition and Annual Meeting, Online, 11–16 October 2020; SEG: Houston, TX, USA, 2020; p. D031S057R003. [Google Scholar]
  13. Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 1321–1330. [Google Scholar]
  14. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  15. Zhang, K.; Yan, J.; Zhang, F.; Ge, C.; Wan, W.; Sun, J.; Zhang, H. Spectral-Spatial Dual Graph Unfolding Network for Multispectral and Hyperspectral Image Fusion. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5508718. [Google Scholar] [CrossRef]
  16. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034. [Google Scholar]
  17. Keshk, M.; Koroniotis, N.; Pham, N.; Moustafa, N.; Turnbull, B.; Zomaya, A.Y. An explainable deep learning-enabled intrusion detection framework in IoT networks. Inf. Sci. 2023, 639, 119000. [Google Scholar] [CrossRef]
  18. Zhang, Q.; Yang, Y.; Ma, H.; Wu, Y.N. Interpreting cnns via decision trees. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6261–6270. [Google Scholar]
  19. Yan, J.; Zhang, K.; Zhang, F.; Ge, C.; Wan, W.; Sun, J. Multispectral and hyperspectral image fusion based on low-rank unfolding network. Signal Process 2023, 213, 109223. [Google Scholar] [CrossRef]
  20. Sun, L.; Zhang, K.; Zhang, F.; Wan, W.; Sun, J. Deep Rank-N Decomposition Network for Image Fusion. IEEE Trans. Multimed. 2024, 26, 7335–7348. [Google Scholar] [CrossRef]
  21. Zou, L.; Tosti, F.; Alani, A.M. Nondestructive inspection of tree trunks using a dual-polarized ground-penetrating radar system. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–8. [Google Scholar] [CrossRef]
  22. Liu, B.; Ren, Y.; Liu, H.; Xu, H.; Wang, Z.; Cohn, A.G.; Jiang, P. GPRInvNet: Deep learning-based ground-penetrating radar data inversion for tunnel linings. IEEE Trans. Geosci. Remote Sens. 2021, 59, 8305–8325. [Google Scholar] [CrossRef]
  23. Fang, Y.; Ma, T.; Tong, Z. Ground-Penetrating Radar Wave Response Simulation of Pipe Leakage in Subgrade Soil of Urban Road with Couple of Leakage and Radio Frequency. In Proceedings of the 103rd Transportation Research Board (TRB) Annual Meeting, Washington, DC, USA, 8–12 January 2023. [Google Scholar]
  24. Yager, R.R.; Liu, L. Classic Works of the Dempster-Shafer Theory of Belief Functions; Springer: Berlin/Heidelberg, Germany, 2008; Volume 219. [Google Scholar]
  25. Dempster, A.P. Upper and lower probabilities induced by a multivalued mapping. In Classic Works of the Dempster-Shafer Theory of Belief Functions; Springer: Berlin/Heidelberg, Germany, 2008; pp. 57–72. [Google Scholar]
  26. Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976; Volume 42. [Google Scholar]
  27. Denœux, T. Logistic regression, neural networks and Dempster–Shafer theory: A new perspective. Knowl.-Based Syst. 2019, 176, 54–67. [Google Scholar] [CrossRef]
  28. Denœux, T. An evidential neural network model for regression based on random fuzzy numbers. In Proceedings of the International Conference on Belief Functions, Paris, France, 26–28 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 57–66. [Google Scholar]
  29. Huang, L.; Ruan, S.; Decazes, P.; Denœux, T. Lymphoma segmentation from 3D PET-CT images using a deep evidential network. Int. J. Approx. Reason. 2022, 149, 39–60. [Google Scholar] [CrossRef]
  30. Tong, Z.; Xu, P.; Denoeux, T. An evidential classifier based on Dempster-Shafer theory and deep learning. Neurocomputing 2021, 450, 275–293. [Google Scholar] [CrossRef]
  31. Tong, Z.; Xu, P.; Denoeux, T. Evidential fully convolutional network for semantic segmentation. Appl. Intell. 2021, 51, 6376–6399. [Google Scholar] [CrossRef]
  32. Denoeux, T. Distributed combination of belief functions. Inf. Fusion 2021, 65, 179–191. [Google Scholar] [CrossRef]
  33. Denoeux, T.; Kanjanatarakul, O.; Sriboonchitta, S. A new evidential k-nearest neighbor rule based on contextual discounting with partially supervised learning. Int. J. Approx. Reason. 2019, 113, 287–302. [Google Scholar] [CrossRef]
  34. Tong, Z.; Ma, T.; Zhang, W.; Huyan, J. Evidential transformer for pavement distress segmentation. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 2317–2338. [Google Scholar] [CrossRef]
  35. Shwartz-Ziv, R.; Tishby, N. Opening the black box of deep neural networks via information. arXiv 2017, arXiv:1703.00810. [Google Scholar] [CrossRef]
  36. Denœux, T. Quantifying prediction uncertainty in regression using random fuzzy sets: The ENNreg model. IEEE Trans. Fuzzy Syst. 2023, 31, 3690–3699. [Google Scholar] [CrossRef]
  37. Zadeh, L.A. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst. 1978, 1, 3–28. [Google Scholar] [CrossRef]
  38. Denœux, T. Reasoning with fuzzy and uncertain evidence using epistemic random fuzzy sets: General framework and practical models. Fuzzy Sets Syst. 2023, 453, 1–36. [Google Scholar] [CrossRef]
  39. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  40. Zhang, Y.; Tong, Z.; She, X.; Wang, S.; Zhang, W.; Fan, J.; Cheng, H.; Yang, H.; Cao, J. SWC-Net and Multi-Phase Heterogeneous FDTD Model for Void Detection Underneath Airport Pavement Slab. IEEE Trans. Intell. Transp. Syst. 2024, 25, 20698–20714. [Google Scholar] [CrossRef]
  41. Tong, Z.; Zhang, Y.; Ma, T. Permittivity measurement with uncertainty quantification in cement-based composites using ENNreg-ANet and high-frequency electromagnetic waves. Measurement 2025, 244, 116537. [Google Scholar] [CrossRef]
  42. Zeng, J.; Huang, L.; Bai, X.; Wang, K. DBFFT: Adversarial-robust dual-branch frequency domain feature fusion in vision transformers. Inf. Fusion 2024, 108, 102387. [Google Scholar] [CrossRef]
  43. Van Der Linden, I.; Haned, H.; Kanoulas, E. Global aggregations of local explanations for black box models. arXiv 2019, arXiv:1907.03039. [Google Scholar] [CrossRef]
  44. Srinivas, S.; Fleuret, F. Full-gradient representation for neural network visualization. Adv. Neural Inf. Process. Syst. 2019, 32, 4124–4133. [Google Scholar]
  45. Chefer, H.; Gur, S.; Wolf, L. Transformer interpretability beyond attention visualization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 782–791. [Google Scholar]
  46. Wang, H.; Wang, Z.; Du, M.; Yang, F.; Zhang, Z.; Ding, S.; Mardziel, P.; Hu, X. Score-CAM: Score-weighted visual explanations for convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 13–19 June 2020; pp. 24–25. [Google Scholar]
  47. Mothilal, R.K.; Sharma, A.; Tan, C. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, 27–30 January 2020; pp. 607–617. [Google Scholar]
  48. Kim, B.; Wattenberg, M.; Gilmer, J.; Cai, C.; Wexler, J.; Viegas, F.; Sayres, R. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 2668–2677. [Google Scholar]
  49. Samek, W.; Binder, A.; Montavon, G.; Lapuschkin, S.; Müller, K.R. Evaluating the visualization of what a deep neural network has learned. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2660–2673. [Google Scholar] [CrossRef]
  50. Hooker, S.; Erhan, D.; Kindermans, P.J.; Kim, B. A benchmark for interpretability methods in deep neural networks. Adv. Neural Inf. Process. Syst. 2019, 32, 9737–9748. [Google Scholar]
Figure 1. Semantic contents of HF-EMWs are hard to formalize but easy for a human to understand: (a) time-domain HF-EMWs caused by a pipe leakage [23], where curves of different colors are HF-EMWs collected at different distances from the leakage center; (b) time delay of HF-EMWs owing to the increase of water content in the soil, where the red and black dots indicate the time difference of two HF-EMWs traveling the same distance; and (c) integral of amplitude over travel time, indicating the difference in energy dissipation of HF-EMWs at different water contents; ns denotes nanoseconds.
Figure 2. HF-EMW examples on the AREM dataset [40]: (a) slab bottom interface ω 1 , (b) void without rebar interference ω 2 , (c) single-layer rebars ω 3 and (d) void with rebar interference ω 4 .
Figure 3. Architecture of the signal-wise cascade deep network (SWC-net) [40]. The green arrows show the inference flow of the SWC-net, while the blue ones show the process of representation interpretation. The architecture of the interpretable continuous DST-based model is shown in the pink box.
Figure 4. Architecture of the dual-branch frequency domain feature fusion transformer (DBFFT) [42]. The architecture of the interpretable continuous DST-based model, which performs the representation interpretation, is shown in the pink box.
Figure 5. On the AREM dataset, the first, second, and third rows are the discrete masses based on x 1 , x 2 , x 3 , respectively. The first, second, third, and fourth columns are the masses’ reasoning on sets { ω 1 } , { ω 3 } , { ω 1 , ω 2 } , and Ω , respectively. Different colors stand for samples labeled with different classes.
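The discrete masses in Figure 5 assign belief not only to singletons such as { ω 1 } but also to compound sets such as { ω 1 , ω 2 } and to the whole frame Ω (ignorance). As a hedged, minimal sketch of how such mass functions combine under Dempster’s rule of combination, with illustrative mass values that are not taken from the figure:

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dicts {frozenset: mass}) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            C = A & B
            if C:  # non-empty intersection: product mass flows to C
                combined[C] = combined.get(C, 0.0) + a * b
            else:  # empty intersection: conflicting mass
                conflict += a * b
    # Normalize by 1 - K, where K is the degree of conflict
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

# Frame of discernment with four classes, as on the AREM dataset
Omega = frozenset({"w1", "w2", "w3", "w4"})
# One source supports {w1}; another supports {w1, w2}; both keep some
# mass on Omega (ignorance), as in the Figure 5 columns
m1 = {frozenset({"w1"}): 0.6, Omega: 0.4}
m2 = {frozenset({"w1", "w2"}): 0.5, Omega: 0.5}
m = dempster_combine(m1, m2)
```

Here the combined mass concentrates on { ω 1 } while retaining residual mass on { ω 1 , ω 2 } and Ω, illustrating how fusion sharpens belief without discarding ignorance.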
Figure 6. AREM testing data, truth (red solid line) and predictions from a GRFN model: expected value μ ( x 1 ) (yellow broken line) and belief prediction intervals at levels Π ∈ { 0.5 , 0.9 , 0.99 } , where sub-figures (a), (b), (c), and (d) are the samples belonging to ω 1 , ω 2 , ω 3 , and ω 4 , respectively.
Figure 7. Testing data of the AREM dataset, truth (red solid line) and predictions from a GRFN model: expected value μ ( x 2 ) (yellow broken line) and belief prediction intervals at levels Π ∈ { 0.5 , 0.9 , 0.99 } , where sub-figures (a), (b), (c), and (d) are the samples belonging to ω 1 , ω 2 , ω 3 , and ω 4 , respectively.
Figure 8. Testing data of the AREM dataset, truth (red solid line) and predictions from a GRFN model: expected value μ ( x 3 ) (yellow broken line) and belief prediction intervals at levels Π ∈ { 0.5 , 0.9 , 0.99 } , where sub-figures (a), (b), (c), and (d) are the samples belonging to ω 1 , ω 2 , ω 3 , and ω 4 , respectively.
Figure 9. The curves of absolute fractional output change, testing accuracy, and m ( Ω ) vs. the percentage of removed elements in the most relevant features x 1 (a), x 2 (b), and x 3 (c) in the AREM testing set.
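The Figure 9 evaluation follows the remove-and-retest idea of [49,50]: zero out a growing fraction of the most relevant feature elements and record the absolute fractional change of the network output. A minimal sketch under stated assumptions; the linear "model", its weights, and the relevance scores below are illustrative stand-ins for the DNN and the interpretation method, not the paper's implementation:

```python
import numpy as np

def afoc_curve(model, x, relevance, percentages):
    """Absolute fractional output change as the most relevant elements are removed."""
    base = model(x)
    order = np.argsort(relevance)[::-1]        # most relevant elements first
    curve = []
    for p in percentages:
        x_removed = x.copy()
        n_remove = int(round(p * len(x)))
        x_removed[order[:n_remove]] = 0.0      # "remove" by zeroing the element
        curve.append(abs(model(x_removed) - base) / abs(base))
    return np.array(curve)

# Hypothetical linear model whose output leans on the first three features
w = np.array([5.0, 4.0, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1])
model = lambda x: float(w @ x)
x = np.ones(8)
relevance = np.abs(w * x)                      # simple relevance proxy
curve = afoc_curve(model, x, relevance, [0.0, 0.25, 0.5, 1.0])
# The curve rises steeply at small removal percentages because the
# most relevant features dominate the output, mirroring Figure 9.
```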
Figure 10. Heat maps of the most important features on an example belonging to class ω 1 : (a) ours, (b) LIME, (c) GALE, (d) FullGrad, (e) Chefer et al., (f) Score-CAM, (g) DiCE, and (h) TCAV.
Figure 11. On the AREM dataset, calibration plots for belief prediction intervals of (a) x 1 , (b) x 2 , (c) x 3 , showing the testing coverage rates vs. the belief confidence levels Π ∈ { 0.1 , 0.2 , … , 0.9 } .
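The calibration plots compare, for each belief level Π, the empirical coverage of the belief prediction intervals against Π itself; a well-calibrated model sits near the diagonal. A sketch of the coverage computation on synthetic data; the Gaussian ground truth and the quantile-based interval half-widths are illustrative assumptions, not the GRFN model:

```python
import numpy as np

def coverage_rates(y_true, lower, upper):
    """Empirical coverage per level; lower/upper have shape (n_levels, n_samples)."""
    inside = (y_true >= lower) & (y_true <= upper)
    return inside.mean(axis=1)

levels = np.arange(0.1, 1.0, 0.1)              # belief levels 0.1, ..., 0.9
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=5000)            # synthetic ground truth
mu = np.zeros_like(y)                          # predicted expected value
# Half-widths chosen so each interval covers ~level of |y - mu| by construction,
# standing in for the belief prediction intervals of a calibrated model
hw = np.quantile(np.abs(y - mu), levels)
lower, upper = mu - hw[:, None], mu + hw[:, None]
rates = coverage_rates(y, lower, upper)        # rates track the levels closely
```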
Figure 12. Testing data of the UOEM dataset, truth (red solid line) and predictions from a GRFN model: expected value μ ( x 1 ) (yellow broken line) and belief prediction intervals at levels Π ∈ { 0.5 , 0.9 , 0.99 } , where sub-figures (a), (b), and (c) are the samples belonging to ω 1 , ω 2 , and ω 3 , respectively.
Figure 13. Testing data of the UOEM dataset, truth (red solid line) and predictions from a GRFN model: expected value μ ( x 2 ) (yellow broken line) and belief prediction intervals at levels Π ∈ { 0.5 , 0.9 , 0.99 } , where sub-figures (a), (b), and (c) are the samples belonging to ω 1 , ω 2 , and ω 3 , respectively.
Figure 14. Testing data of the UOEM dataset, truth (red solid line) and predictions from a GRFN model: expected value μ ( x 3 ) (yellow broken line) and belief prediction intervals at levels Π ∈ { 0.5 , 0.9 , 0.99 } , where sub-figures (a), (b), and (c) are the samples belonging to ω 1 , ω 2 , and ω 3 , respectively.
Figure 15. On the UOEM dataset, calibration plots for belief prediction intervals of x 1 (a), x 2 (b), x 3 (c), showing the testing coverage rates vs. the belief confidence levels Π ∈ { 0.1 , 0.2 , … , 0.9 } .
Figure 16. The curves of absolute fractional output change, testing accuracy, and m ( Ω ) vs. the percentage of removed elements in the most relevant features x 1 (a), x 2 (b), and x 3 (c) on the UOEM dataset.
Figure 17. On the AREM dataset, the evolution of the mass functions $\frac{1}{|T_i|}\sum_{X \in T_i, x \in X} m_x(\{\omega_i\})$ and $\frac{1}{|T_i|}\sum_{X \in T_i, x \in X} m_x(\Omega)$ during training epochs, where T i is the set of testing samples belonging to class ω i .
Figure 18. On the UOEM dataset, the evolution of the mass functions $\frac{1}{|T_i|}\sum_{X \in T_i, x \in X} m_x(\{\omega_i\})$ and $\frac{1}{|T_i|}\sum_{X \in T_i, x \in X} m_x(\Omega)$ during training epochs, where T i is the set of testing samples belonging to class ω i .
Table 1. Time- and frequency-domain HF-EMW waveform features, where t i is the ith point in an HF-EMW, i = 1 , … , N ; f ( k ) is the amplitude of the kth frequency component obtained by fast Fourier transformation, k = 1 , … , K , and K is the number of frequency components.
$F_1=\frac{1}{N}\sum_{i=1}^{N}t_i$;  $F_2=\frac{1}{N}\sum_{i=1}^{N}|t_i|$;  $F_3=\max_i|t_i|$;  $F_4=\frac{1}{N}\sum_{i=1}^{N}(t_i-F_1)^2$;  $F_5=\frac{1}{N}\sum_{i=1}^{N}(t_i-F_1)^3$
$F_6=\frac{1}{N}\sum_{i=1}^{N}(t_i-F_1)^4$;  $F_7=\frac{1}{N}\sum_{i=1}^{N}t_i^2$;  $F_8=\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(t_i-F_1)^2}$;  $F_9=\max_i t_i-\min_i t_i$;  $F_{10}=\left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{|t_i|}\right)^2$
$F_{11}=\frac{\max_i|t_i|}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}t_i^2}}$;  $F_{12}=\frac{\sum_{i=1}^{N}(t_i-F_1)^3}{(N-1)F_8^3}$;  $F_{13}=\frac{\sum_{i=1}^{N}(t_i-F_1)^4}{(N-1)F_8^4}$;  $F_{14}=\frac{\max_i|t_i|}{F_{10}}$;  $F_{15}=\frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}t_i^2}}{\frac{1}{N}\sum_{i=1}^{N}|t_i|}$
$F_{16}=\frac{\max_i|t_i|}{\frac{1}{N}\sum_{i=1}^{N}|t_i|}$;  $F_{17}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}t_i^2}$;  $F_{18}=\frac{\frac{1}{N}\sum_{i=1}^{N}t_i-F_{17}}{F_8^3}$;  $F_{19}=\frac{1}{K}\sum_{k=1}^{K}f(k)$;  $F_{20}=\frac{\sum_{k=1}^{K}(f(k)-F_{19})^2}{K-1}$
$F_{21}=\frac{\sum_{k=1}^{K}(f(k)-F_{19})^3}{(K-1)\left(\sqrt{F_{20}}\right)^3}$;  $F_{22}=\frac{\sum_{k=1}^{K}(f(k)-F_{19})^4}{(K-1)F_{20}^2}$;  $F_{23}=\frac{\sum_{k=1}^{K}k\,f(k)}{\sum_{k=1}^{K}f(k)}$;  $F_{24}=\sqrt{\frac{\sum_{k=1}^{K}(k-F_{23})^2 f(k)}{K}}$;  $F_{25}=\sqrt{\frac{\sum_{k=1}^{K}k^2 f(k)}{\sum_{k=1}^{K}f(k)}}$
$F_{26}=\sqrt{\frac{\sum_{k=1}^{K}k^4 f(k)}{\sum_{k=1}^{K}k^2 f(k)}}$;  $F_{27}=\frac{\sum_{k=1}^{K}k^2 f(k)}{\sqrt{\sum_{k=1}^{K}f(k)\cdot\sum_{k=1}^{K}k^4 f(k)}}$;  $F_{28}=\frac{F_{24}}{F_{23}}$;  $F_{29}=\frac{\sum_{k=1}^{K}(k-F_{23})^3 f(k)}{K\,F_{24}^3}$;  $F_{30}=\frac{\sum_{k=1}^{K}(k-F_{23})^4 f(k)}{K\,F_{24}^4}$
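As a hedged illustration of how a few of these statistical features are computed, the sketch below follows the standard definitions these features commonly take (mean, peak, standard deviation, crest factor, skewness, spectral mean, spectral centroid); exact correspondence with the paper's implementation is an assumption, and the damped sinusoid standing in for an HF-EMW trace is synthetic:

```python
import numpy as np

def waveform_features(t):
    """A subset of Table 1 time- and frequency-domain features."""
    t = np.asarray(t, dtype=float)
    N = len(t)
    F1 = t.mean()                                       # mean amplitude
    F3 = np.abs(t).max()                                # peak amplitude
    F8 = np.sqrt(((t - F1) ** 2).sum() / (N - 1))       # standard deviation
    F11 = F3 / np.sqrt((t ** 2).mean())                 # crest factor
    F12 = ((t - F1) ** 3).sum() / ((N - 1) * F8 ** 3)   # skewness
    f = np.abs(np.fft.rfft(t))                          # amplitude spectrum f(k)
    K = len(f)
    F19 = f.sum() / K                                   # spectral mean
    k = np.arange(1, K + 1)
    F23 = (k * f).sum() / f.sum()                       # spectral centroid
    return {"F1": F1, "F3": F3, "F8": F8, "F11": F11,
            "F12": F12, "F19": F19, "F23": F23}

# Synthetic damped sinusoid standing in for a time-domain HF-EMW trace
tau = np.linspace(0.0, 1.0, 512)
t = np.exp(-5.0 * tau) * np.sin(2 * np.pi * 8 * tau)
feats = waveform_features(t)
```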
Table 2. Examples of sampling local representation, where ω * is the labeled class and a check mark (✓) indicates that an element supports the labeled class.

Examples                  Element Number (1–5)
Sample 1 ( ω * = ω 1 )    ✓ ✓ ✓
Sample 2 ( ω * = ω 1 )    ✓ ✓ ✓
Sample 3 ( ω * = ω 2 )    ✓ ✓
Sample 4 ( ω * = ω 2 )    ✓ ✓
Table 3. Split protocols on the HF-EMW dataset of underground objects. Each column presents the number of HF-EMWs with different frequencies.

Frequency     200 MHz   450 MHz   800 MHz   1.2 GHz
Training      240       360       900       900
Validation    80        120       300       300
Testing       80        120       300       300
Total         400       600       1500      1500
Table 4. Testing performance of the SWC nets without/with semantics-guided reinforcement learning. The metric C I o U evaluates the overlap between predicted and labeled abnormal-area intervals [40]. The models “SWC-net” and “Semantics-guided SWC-net” are the SWC nets without and with semantics-guided reinforcement learning, respectively.

Metric                       RNN     YOLO v8   DetTransformer   StreamPETR   SWC-Net   Semantics-Guided SWC-Net
Classification accuracy/%    76.85   89.91     90.32            88.26        91.27     94.26
CIoU/%                       79.34   85.28     81.58            82.36        87.15     90.03
Li, X.; Su, M.; Zhu, Y.; Ma, S.; Liu, S.; Tong, Z. Evidential Interpretation Approach for Deep Neural Networks in High-Frequency Electromagnetic Wave Processing. Electronics 2025, 14, 3277. https://doi.org/10.3390/electronics14163277