Causality-Aware Training of Physics-Informed Neural Networks for Solving Inverse Problems

Kim, Jaeseung; Son, Hwijae

doi:10.3390/math13071057

Open AccessArticle

Causality-Aware Training of Physics-Informed Neural Networks for Solving Inverse Problems

by

Jaeseung Kim

^† and

Hwijae Son

^*,†

Department of Mathematics, Konkuk University, Seoul 05029, Republic of Korea

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Mathematics 2025, 13(7), 1057; https://doi.org/10.3390/math13071057

Submission received: 21 February 2025 / Revised: 21 March 2025 / Accepted: 22 March 2025 / Published: 24 March 2025

(This article belongs to the Section E1: Mathematics and Computer Science)

Download

Browse Figures

Versions Notes

Abstract

:

Inverse Physics-Informed Neural Networks (inverse PINNs) offer a robust framework for solving inverse problems governed by partial differential equations (PDEs), particularly in scenarios with limited or noisy data. However, conventional inverse PINNs do not explicitly incorporate causality, which hinders their ability to capture the sequential dependencies inherent in physical systems. This study introduces Causal Inverse PINNs (CI-PINNs), a novel framework that integrates directional causality constraints across both temporal and spatial domains. Our approach leverages customized loss functions that adjust weights based on initial conditions, boundary conditions, and observed data, ensuring the model adheres to the system’s intrinsic causal structure. To evaluate CI-PINNs, we apply them to three representative inverse PDE problems, including an inverse problem involving the wave equation and inverse source problems for the parabolic and elliptic equations, each requiring distinct causal considerations. Experimental results demonstrate that CI-PINNs significantly improve accuracy and stability compared to conventional inverse PINNs by progressively enforcing causality-driven conditions and data consistency. This work underscores the potential of CI-PINNs to enhance robustness and reliability in solving complex inverse problems across diverse physical domains.

Keywords:

physics-informed neural networks (PINNs); inverse problems; respecting causality

MSC:

68T01; 68T07; 35R30

1. Introduction

Recent breakthroughs in deep learning have transformed numerous fields, including computer vision and natural language processing [1,2,3,4,5,6,7]. In parallel, partial differential equations (PDEs) and delay differential equations (DDEs) have long been essential tools for modeling nonlinear dynamics in natural and engineered systems. These equations capture complex phenomena such as turbulence, pattern formation, and chaotic behavior and have been the subject of extensive theoretical and numerical investigations. Recent studies have provided significant advancements in this field, as demonstrated in works such as [8,9,10,11]. Furthermore, recent investigations have extended our understanding of nonlinear and delay systems. Xiuli Xu [8] analyzed the asymptotic limit of the Navier–Stokes–Poisson–Korteweg system in a half-space, providing key insights into boundary layer phenomena and long-term dynamics in complex fluid systems. In the realm of geophysical flows, Boling Guo [10] studied the decay of solutions to a two-layer quasi-geostrophic model, elucidating the mechanisms of energy dissipation and stabilization in stratified environments. Moreover, Wentao Wang has made significant contributions to the stability analysis of delay systems; his work on the global exponential stability of periodic solutions for inertial delayed BAM neural networks [9] and his study employing the characteristics method to investigate delayed inertial neural networks [11] offer rigorous frameworks to ensure robustness in systems affected by delays. These studies deepen our theoretical understanding and underscore the importance of integrating accurate physical modeling into data-driven approaches.

This progress has given rise to Physics-Informed Neural Networks (PINNs), which incorporate governing equations, initial conditions, and boundary conditions directly into the neural network’s loss function [12,13]. By leveraging automatic differentiation, PINNs accurately compute the necessary derivatives, ensuring that the learned solutions conform to the underlying physics even when data are sparse or noisy. In this framework, the unknown solution of a PDE is represented as a deep neural network, and all physical constraints are enforced simultaneously through a composite loss function [12,13,14]. PINNs have proven effective for solving forward problems such as fluid dynamics, heat conduction, and electromagnetics [13,15,16]. PINNs compute the solutions to the PDEs when all necessary conditions are provided.

Inverse problems governed by partial differential equations (PDEs), where one must infer unknown quantities in the system from limited measurements, are ubiquitous in science and engineering, with applications spanning material characterization to biomedical imaging. However, these problems are inherently challenging due to their ill-posed nature and the need to adhere to fundamental physical laws. Inverse PINNs address these challenges by incorporating data loss terms that align predictions with sparse and noisy observations while satisfying the physical principles. Recent progress in diverse scientific problems, including material science, biomedical imaging, and system identification, exhibits the inverse PINNs’ effectiveness [13,17,18,19,20]. Additionally, Bayesian PINNs have been introduced to incorporate prior knowledge and quantify uncertainty in inverse problems, providing a probabilistic framework for PDE-constrained learning [21]. Meanwhile, sparsity-aware neural networks enforce parsimonious representations by penalizing unnecessary parameters, which can be particularly advantageous for high-dimensional or ill-posed systems [22]. Integrating these approaches with causal training strategies may further enhance the robustness and interpretability of inverse PDE solvers.

Despite their effectiveness, both conventional and inverse PINNs share a critical limitation: they often fail to enforce the inherent causal structure of physical systems, especially in time-dependent problems. Temporal causality mandates that solutions evolve naturally from initial conditions and progress continuously. Without enforcing this principle, training may converge to non-physical or unstable solutions. Recent studies have introduced causal training frameworks [23] and augmented Lagrangian methods [24] to tackle this issue by prioritizing learning during early time steps or near boundaries. More recently, the training paradigm that respects temporal causality has been developed by incorporating forward numerical solvers and multiple neural networks [25,26]. However, researchers have not yet adequately adapted these approaches for inverse problems, which require balancing the influence of additional observational data with physical constraints because current methods focus primarily on temporal causality. This limitation arises from the restrictive design of the causal weights, which motivates the current work.

In this paper, we propose a novel framework, Causal Inverse PINNs (CI-PINNs), a causality-respecting training framework of PINNs for solving inverse problems. Unlike conventional approaches, CI-PINNs integrate multidirectional causal weights—covering temporal, spatial, and data dimensions—directly into the loss function. This innovative strategy enables the model to prioritize supervised regions, such as initial and boundary conditions and observational data, thereby enforcing the natural causal structure of the underlying physical system. Consequently, our framework produces more stable and physically consistent reconstructions.

However, the development and application of CI-PINNs come with several technical challenges. First, the design and tuning of the causal weights require careful consideration; the weights must accurately reflect the significance of various regions in the domain while ensuring a smooth transition from supervised to unsupervised areas. In particular, selecting an optimal causality parameter (e.g.,

ϵ

) is critical, as inappropriate values can lead to insufficient propagation of supervision or cause an overly rapid convergence that masks the true dynamics of the system. The computational overhead introduced by computing multidimensional causal weights can be nontrivial, especially when scaling to high-dimensional or more complex physical problems. Addressing these challenges is essential for ensuring that the benefits of the proposed method are realized in practical applications.

We organize this paper as follows. First, we offer a comprehensive review of PINNs by examining the core mechanisms used to solve forward problems and exploring the extensions developed for inverse problems. This review synthesizes insights from seminal works (e.g., [12,27,28,29]) as well as modern reviews on physics-informed machine learning. Second, we offer an in-depth examination of state-of-the-art developments in inverse PINNs and related parameter identification methods, incorporating recent advances in Bayesian approaches and sparse identification techniques as detailed in [21,22]. Third, we introduce the concept of CI-PINNs—a novel methodology that leverages temporal, spatial, and data causal weights to balance residual minimization. Building on recent causal training strategies [23], we propose an approach that addresses the imbalance between observational data and physical constraints. Finally, we validate the proposed framework through benchmark inverse problems, including an inverse problem for the wave equation, a parabolic inverse source problem, and an elliptic inverse source problem. The experimental results demonstrate substantial improvements in convergence behavior and reconstruction fidelity compared to conventional inverse PINNs.

This work deepens our understanding of how physics-informed machine learning can address challenging inverse problems. It advances the state of the art through a causality-aware training paradigm. By explicitly enforcing the sequential structure inherent in time-dependent and spatially distributed processes, our framework provides a robust, generalizable solution for a broad spectrum of scientific and engineering applications.

2. Preliminaries

2.1. Physics-Informed Neural Networks (PINNs)

Jo et al. [27] proposed PINNs to infer the solutions of partial differential equations (PDEs). Consider a generic time-dependent PDE defined as:

\begin{matrix} u_{t} + N [u] & = 0, & x \in D, t \in [0, T], \end{matrix}

(1)

\begin{matrix} I [u] & = g (x), & x \in D, \end{matrix}

(2)

\begin{matrix} B [u] & = h (x, t), & x \in \partial D, t \in [0, T], \end{matrix}

(3)

where

N [\cdot]

denotes a differential operator, and

I [\cdot]

and

B [\cdot]

represent the initial and boundary operator, respectively. In the PINN literature, the solution to the PDEs (1)–(3) is approximated using a neural network

u_{θ} (x, t)

, where

θ

denotes the vector consisting of all trainable parameters, including weights and biases. To ensure that

u_{θ} (x, t)

satisfies the PDE, the model is trained by minimizing a composite loss function evaluated over the

(x, t)

grid. This loss function is designed to enforce the conditions in Equations (1)–(3) [12,13].

Specifically, the training process minimizes the following Mean Squared Error (MSE)-based loss:

\begin{matrix} L (θ) & = λ_{r} L_{r} (θ) + λ_{t b} L_{t b} (θ) + λ_{s b} L_{s b} (θ) . \end{matrix}

with

\begin{matrix} L_{r} (θ) & = \frac{1}{N_{r}} \sum_{i = 1}^{N_{r}} {|\frac{\partial u_{θ}}{\partial t} (x_{r}^{i}, t_{r}^{i}) + N [u_{θ}] (x_{r}^{i}, t_{r}^{i})|}^{2}, \\ L_{t b} (θ) & = \frac{1}{N_{t b}} \sum_{i = 1}^{N_{t b}} {|I [u_{θ}] (x_{t b}^{i}, 0) - g (x_{t b}^{i})|}^{2}, \\ L_{s b} (θ) & = \frac{1}{N_{s b}} \sum_{i = 1}^{N_{s b}} {|B [u_{θ}] (x_{s b}^{i}, t_{s b}^{i}) - h (x_{s b}^{i})|}^{2} . \end{matrix}

(4)

Here,

L_{r} (θ)

is the residual loss that ensures the PDE holds at interior points,

L_{t b} (θ)

enforces the initial conditions, and

L_{s b} (θ)

enforces the boundary conditions.

The grid points

{\{x_{r}^{i}, t_{r}^{i}\}}_{i = 1}^{N_{r}}

,

{\{x_{t b}^{i}, t_{t b}^{i}\}}_{i = 1}^{N_{t b}}

, and

{\{x_{s b}^{i}, t_{s b}^{i}\}}_{i = 1}^{N_{s b}}

are sampled from the domain. In our case, these points are uniformly sampled such that

{\{x_{r}^{i}, t_{r}^{i}\}}_{i = 1}^{N_{r}} \in D \times [0, T]

,

{\{x_{t b}^{i}, t_{t b}^{i}\}}_{i = 1}^{N_{t b}} \in D \times \{0\}

, and

{\{x_{s b}^{i}, t_{s b}^{i}\}}_{i = 1}^{N_{s b}} \in \partial D \times [0, T]

.

To efficiently compute gradients concerning input variables or the network parameters

θ

, PINNs leverage automatic differentiation. This capability is crucial for accurately calculating the PDE residuals and ensuring the network adheres to the underlying physics. Additionally, the hyper-parameters

\{λ_{r}, λ_{t b}, λ_{s b}\}

can be manually set or adapted during training to balance the contributions of the different loss terms, thus optimizing the model’s performance.

2.2. Inverse PINNs

PINNs can naturally be extended to address inverse problems, which involve inferring unknown quantities of a physical system from observed data. The unknown quantities are generally represented as parameters, functions, or other hidden physical properties included in the PDE. Inverse problems aim to determine the unknown quantities that are aligned with both the observed data and the governing equations defined in (1), (2), and (3). Such problems arise in various fields, including material science, biomedical imaging, and fluid dynamics.

Consider a PDE with unknown quantity f defined by:

\begin{matrix} u_{t} + N [u, f] & = 0, & x \in D, t \in [0, T], \\ I [u] & = g (x), & x \in D, \\ B [u] & = h (x, t), & x \in \partial D, t \in [0, T] . \\ u & = u_{d}, & (x, t) \in D_{d} . \end{matrix}

In this formulation, the additional constraint

u = u_{d}, (x, t) \in D_{d},

specifies that, in the data subdomain

D_{d}

, the solution u must match the observed data

u_{d}

. This condition provides an extra supervision signal to guide the inversion process and ensure that the learned solution is consistent with the available measurements in

D_{d}

. The inverse PINNs approximate both f and u using neural networks

f_{η}

and

u_{θ}

, respectively, thereby ensuring that the governing PDEs are satisfied while aligning the network’s output with the observed data in

D_{d}

.

To achieve this, the loss function for inverse PINNs builds upon the standard PINNs’ loss function by incorporating an additional data loss term [30]. The total loss function is expressed as:

\begin{matrix} L (θ, η) & = λ_{r} L_{r} (θ, η) + λ_{t b} L_{t b} (θ) + λ_{s b} L_{s b} (θ) + λ_{d} L_{d} (θ) \end{matrix}

where initial condition loss

L_{t b} (θ, η)

and boundary condition loss

L_{t b} (θ, η)

are defined as in the forward PINNs’ formulation. The additional residual loss term and data loss term are given by:

\begin{matrix} L_{r} (θ, η) & = \frac{1}{N_{r}} \sum_{i = 1}^{N_{r}} {|\frac{\partial u_{θ}}{\partial t} (x_{r}^{i}, t_{r}^{i}) + N [u_{θ}, f_{η}] (x_{r}^{i}, t_{r}^{i})|}^{2}, \\ L_{d} (θ) & = \frac{1}{N_{d}} \sum_{i = 1}^{N_{d}} {|u_{θ} (x_{d}^{i}, t_{d}^{i}) - u_{o b s}^{i}|}^{2}, \end{matrix}

where

u_{o b s}^{i}

denotes the observed data at the sampled points

{\{x_{d}^{i}, t_{d}^{i}\}}_{i = 1}^{N_{d}}

. The hyper-parameter

λ_{d}

controls the influence of this data loss relative to the other loss components. By optimizing both

θ

and

η

, inverse PINNs seek a solution that respects the governing PDEs while matching the observed data.

However, conventional inverse PINNs face a significant limitation: they do not explicitly account for the causal nature of dynamic systems. Temporal causality ensures that the predicted solution evolves naturally from the initial conditions and progresses consistently. Without such considerations, the model may converge to unstable or non-physical solutions, especially in time-dependent problems. Recent studies have proposed frameworks that integrate causal training mechanisms [23], which sequentially prioritize residuals to enforce causality. These advancements lay the foundation for extending inverse PINNs into causality-aware domains—a concept we explore further in this work.

2.3. Causal Training for Physics-Informed Neural Networks

Conventional PINN training can yield solutions that neglect temporal causality, which is crucial for dynamic systems. A causal training framework was introduced in [23] to address this issue, aligning the training process with the natural sequential structure of physical systems.

First, the training objective is reformulated to prioritize learning time sequentially. Specifically, the residual loss function is modified by incorporating a weighting mechanism that emphasizes earlier time points. The basic loss function (4) is rewritten by separating the contributions of time and space as follows:

\begin{matrix} L_{r} (θ) & = \frac{1}{N_{t}} \sum_{i = 1}^{N_{t}} L_{r} (t_{i}, θ) \\ = \frac{1}{N_{t} N_{x}} \sum_{i = 1}^{N_{t}} \sum_{j = 1}^{N_{x}} {|\frac{\partial u_{θ}}{\partial t} (x_{r}^{j}, t_{r}^{i}) + N [u_{θ}] (x_{r}^{j}, t_{r}^{i})|}^{2} . \end{matrix}

(5)

Here,

N_{t}

and

N_{x}

denote the number of temporal and spatial discretization points, respectively—that is,

N_{t}

is the total number of time points sampled with

[0, T]

and

N_{x}

is the number of spatial grid points sampled within the domain D. The loss function for the spatial component at a given time

t_{r}^{i}

becomes:

L_{r} (t_{r}^{i}, θ) = \frac{1}{N_{x}} \sum_{j = 1}^{N_{x}} {|\frac{\partial u_{θ}}{\partial t} (x_{r}^{j}, t_{r}^{i}) + N [u_{θ}] (x_{r}^{j}, t_{r}^{i})|}^{2} .

Next, to enforce the inherent causality of the system, a time-dependent weighting function

w_{i}

is introduced. This function assigns greater importance to earlier time steps so that the model focuses on minimizing residuals at these points before addressing later times:

w_{i} = exp (- ϵ \sum_{k = 1}^{i - 1} L_{r} (t_{k}, θ)), for i = 2, 3, \dots, N .

(6)

Here,

ϵ

is the causality parameter controlling the weight decay. In [23],

ϵ

was experimentally chosen from the range [100, 10, 1, 0.1, 0.01, 0.001]. When applying

w_{i}

to the loss function (6), the overall training objective becomes:

\begin{matrix} L_{r} (θ) = \frac{1}{N_{t}} \sum_{i = 1}^{N_{t}} exp (- ϵ \sum_{k = 1}^{i - 1} L_{r} (t_{k}, θ)) L_{r} (t_{i}, θ) . \end{matrix}

Although this weighting mechanism is designed primarily for temporal causality, it can be extended to spatial dimensions. During training, introducing an analogous spatial weighting function can prioritize points closer to given conditions—such as boundary values or observed data. This spatial extension complements the temporal weighting mechanism and lays the groundwork for addressing both temporal and spatial causality, as further explored in the context of inverse problems. Figure 1 illustrates the proposed CI-PINN algorithm.

Hyperparameter Selection: In all subsequent CI-PINN experiments, we utilize the causality parameter

ϵ

as introduced above to control the decay behavior of causal weights. To determine an optimal value for

ϵ

, we experimentally evaluated candidate values from the set {10, 1, 0.1, 0.01, 0.001}. Candidate values that resulted in insufficient propagation (i.e., the weights did not spread effectively from the observation and boundary regions) or led to an overly rapid convergence (i.e., the weights quickly converged to 1 across the domain) were excluded. Among the remaining candidates, we selected the value that produced the smallest

L^{2}

relative error on the validation set as the optimal

ϵ

. This systematic selection process ensures that the causal weights are balanced and effectively guide the training process for stable and accurate reconstructions. The following sections illustrate how these ideas are applied to specific inverse problems.

Causal weights: In Section 2.3, we introduced the concept of causal weights using notation similar to

w_{i}

in Equation (6), focusing primarily on the temporal dimension. In this work, we adopt a more general representation for the causal weights,

w_{d} (x_{i}), w_{t} (t_{i}), w_{x y} (x_{i}, y_{i})

. Henceforth, we use w to denote these causal weights in all scenarios. Each experiment section will specify how w is constructed (e.g., temporal, spatial, or data causal weights) to emphasize critical regions such as boundaries, initial conditions, or observation points. Although the formulation of these weights is based on a relatively simple functional form, their effective integration into our inverse PINN framework poses several challenges. Unlike conventional applications, where a simple decay function is applied to a single scalar term, our method requires a coordinated balance among multiple causal weights across all dimensions. Determining an optimal value for

ϵ

is particularly demanding because the chosen parameter must allow supervision to propagate smoothly from regions with strong data support (e.g., boundaries and initial conditions) to areas with sparse information, all without compromising convergence or stability. In essence, while the mathematical basis for the weights is well established, combining them in a multidimensional setting to enforce the complex causal structure of physical systems necessitates careful analysis and calibration, highlighting the sophisticated nature of our approach.

3. Inverse Problem for the Wave Equation

3.1. The Underlying Inverse Problem

The wave equation is a fundamental model for physical phenomena such as acoustics, electromagnetics, and elasticity. As a representative linear hyperbolic PDE [31], it provides a valuable testbed for assessing the proposed CI-PINN framework. Let

D \subset R^{d}

be an open, bounded, and simply connected set with smooth boundary

\partial Ω

. We consider the wave equation with zero Dirichlet boundary conditions:

\begin{matrix} u_{t t} - Δ u & = f, \forall (x, t) \in D \times (0, T), \\ u & \equiv 0, \forall (x, t) \in \partial D \times (0, T), \end{matrix}

where

Δ

denotes the spatial Laplace operator and

f \in L^{2} (D_{T})

denotes a source function with

D_{T} = D \times (0, T)

. In the forward problem, if the initial conditions:

\begin{matrix} u (x, 0) & = u_{0} (x), \forall x \in D, \\ u_{t} (x, 0) & = u_{1} (x), \forall x \in D, \end{matrix}

are prescribed with

u_{0} \in L^{2} (D)

and

u_{1} \in H^{- 1} (D)

, classical theory guarantees that a unique solution

u (x, t)

exists. In our study, the specific solution we aim to approximate is given by

u (x, t) = sin (π x) cos (2 π x) .

In many practical applications, however, the initial data

u_{0}

and

u_{1}

are unavailable. Instead, we obtain measurements of the solution over an observation subdomain

D^{'} \subset \bar{D} .

That is, we are provided with

u (x, t) = g, (x, t) \in D^{'} \times (0, T) .

The inverse problem is a data assimilation problem of recovering the unknown initial conditions and, consequently, the entire solution

u (x, t)

of the wave equation from the available partial data. In other words, the solution we seek is a function

u (x, t)

that satisfies the wave equation and the zero boundary conditions throughout

D \times (0, T)

while simultaneously matching the measurement data

g (x, t)

on

D^{'} \times (0, T)

and implicitly determining the initial state. Although the lack of initial data renders the inverse problem ill posed in the classical sense, stability can be recovered under additional conditions. In particular, if the observation subdomain

D^{'}

satisfies the Geometric Control Conditions (GCCs) [32,33], which, in essence, ensures that every generalized bicharacteristic (or “light ray”) of the wave operator, including reflected paths, intersects

D^{'}

, then conditional stability estimates can be established. As shown in Theorem 5.2 (cf. [31]), under the GCCs, the energy of the solution (measured in appropriate norms) can be bounded in terms of the measurement data g, the source term f, and the residual of the wave operator. These conditional stability results provide a theoretical foundation for the existence and uniqueness of the solution to the inverse problem in a conditional sense. By incorporating the PDE residual, boundary conditions, and measurement data into the loss function, the CI-PINN is trained to approximate a solution

u^{*} (x, t)

that converges to the true solution

u (x, t)

when training error is sufficiently small.

3.2. Residuals and Network Smoothness

For the neural network

u_{θ} (x, t)

with parameters

θ \in Θ

, we assume sufficient smoothness (

C^{k}

with

k \geq 2

) over

D_{T}

. The PDE residual is defined as

R_{θ} (x, t) = \partial_{t t} u_{θ} - Δ u_{θ} - f (x, t), \forall (x, t) \in D_{T} .

To enforce the boundary conditions in

\partial D \times (0, T)

, we define the boundary residual as

R_{s b, θ} = u_{θ}, \forall (x, t) \in \partial D \times (0, T) .

Furthermore, the data residual on the observation domain

D_{T}^{'}

is

R_{d, θ} = u_{θ} - g, \forall (x, t) \in D_{T}^{'} .

3.3. Causal Weights

Since this inverse problem does not involve initial conditions, we apply spatial and data causal weights to prioritize learning in critical spatial regions. In our framework, we define two types of causal weights.

Data weights: These weights focus on regions where observation data g are provided. For each spatial points

x_{i}

, we set

w_{d} (x_{i}) = \{\begin{matrix} 1, & x_{i} \in D_{T}^{'} \\ \exp (- ϵ \sum_{k = 1}^{i - 1} L_{r} (x_{k}, θ, f)), & x_{i} \in D_{T} \ D_{T}^{'} . \end{matrix}

Spatial weights: These weights emphasize the contribution of the boundary conditions. For each spatial point

x_{i}

, we define

w_{x} (x_{i}) = \{\begin{matrix} 1, & x_{i} \in \partial D \\ \exp (- ϵ \sum_{k = 1}^{N_{x} - i} L_{r} (x_{N_{x} - k + 1}, θ, f)), & x_{i} \in D \ \partial D . \end{matrix}

The final causal weight at each point is then given by

w_{x d} (x_{i}) = max (w_{d} (x_{i}), w_{x} (x_{i})) .

As depicted in the right panel of Figure 2, the data weights propagate upward from the observation region

D_{T}^{'}

into the rest of the domain, while the spatial weights move inward from the boundary

\partial D

. This combined weighting mechanism first emphasizes critical regions (observation data and boundary conditions). Then, their influence gradually extends to the entire domain, promoting a more stable and accurate solution.

3.4. Loss Function

The overall loss function for the inverse problem is constructed by combining the weighted residual loss, the boundary loss, and the data loss:

L (θ) = \sum_{\begin{matrix} i = 1 \\ (x_{i}, t_{i}) \in D_{T} \end{matrix}}^{N_{i n t}} w_{x d} (x_{i}) | R_{θ} (x_{i}, t_{i}) |^{2} + \sum_{\begin{matrix} i = 1 \\ (x_{i}, t_{i}) \in \partial D_{T} \end{matrix}}^{N_{s b}} | R_{s b, θ} (x_{i}, t_{i}) |^{2} + \sum_{\begin{matrix} i = 1 \\ (x_{i}, t_{i}) \in D_{T}^{'} \end{matrix}}^{N_{d}} {| R_{d, θ} (x_{i}, t_{i}) |}^{2},

where the causal weights w are applied as described above.

3.5. Numerical Experiments

In our experiments, the source term f and the observation data g are generated directly from the exact solution. Specifically, with the exact solution defined as

u (x, t) = sin (π x) cos (2 π t)

we substitute into the wave equation

\partial_{t t} u - Δ u = f,

which yields

f = 0

. Similarly, the observation data g are obtained by evaluating

u (x, t)

on the observation domain

D^{'} = (0, 0.2)

.

For the numerical implementation, we use a fully connected neural network with four hidden layers, each containing 24 neurons and employing the

t a n h

activation function. The network takes a 2D input

(x, t)

and outputs the scalar prediction

u (x, t)

. We use the Adam optimizer with a learning rate of

1 \times 10^{- 4}

for 50,000 epochs.

Figure 2 illustrates the domain

D_{T}

, the observation domain

D_{T}^{'}

, and the interior collocation points, boundary points, and observation points.

The first row of Figure 3 presents the exact solution alongside the CI-PINNs’ predicted solution. The second row illustrates the pointwise absolute error and the

L^{2}

highlighting that CI-PINNs achieve significantly lower errors compared to conventional PINNs and Causal PINNs (Section 2.3 [23]). Specifically, the

L^{2}

relative error for conventional PINNs is

0.004321

; for Causal PINNs, it is 0.003865. In contrast, CI-PINNs reduce this error to

0.00206

, a reduction of approximately

52 %

compared to conventional PINNs and

47 %

compared to Causal PINNs. This substantial improvement underscores the effectiveness of incorporating spatial causal weights in enhancing solution accuracy. Additionally, as summarized in Table 1, the training speed for CI-PINNs remains comparable to that of conventional PINNs, indicating that the accuracy gains do not come at a significant computational cost.

The experimental results demonstrate that CI-PINNs significantly outperform conventional PINNs in reconstructing the wave equation’s solution

u (x, t)

. The effective incorporation of spatial causal weights derived from observation and boundary considerations guides the training process to prioritize regions of high importance, thereby enhancing stability and accuracy.

Figure 4 displays the evolution of the spatial causal weights during training. As illustrated in the right panel of Figure 2, the data weights propagate upward from the observation region, while the boundary weights move inward from the domain edges. At early epochs, the weights are concentrated in the supervised regions (boundaries and observation areas), ensuring that these regions receive priority. As training progresses, the influence of the weights gradually spreads across the domain, contributing to enhanced stability and overall accuracy.

4. Inverse Source Problem for a Parabolic Equation

4.1. The Underlying Inverse Problem

To demonstrate the versatility of our CI-PINN framework, we consider an inverse source problem governed by a parabolic equation as described in [34]. Parabolic equations are widely used to model diffusion processes such as heat conduction and pollutant dispersion. The governing equation for the parabolic system is given by:

\begin{matrix} \partial_{t} u + Δ u & = F (x, y, t), & (x, y, t) & \in D_{T} : = D \times (0, T], \\ u (x, y, t) & = b (x, y, t), & (x, y, t) & \in \partial D_{T} : = \partial D \times (0, T], \\ u (x, y, t) & = u_{0} (x, y), & (x, y) & \in D, \end{matrix}

where

D \subset R^{d}

is a bounded domain with a smooth boundary,

b (x, y, t)

denotes the boundary conditions, and

u_{0} (x, y)

represents the initial condition. For the forward problem, if the source function

F (x, y, t)

, and the initial and boundary data are all known, it is well posed in appropriate function spaces. In many practical applications, however, the internal source function

F (x, y, t) = f (x, y) g (t)

is difficult to measure directly. Instead, one assumes

g (t)

is given in advance and recovers

f (x, y)

by using the final time measurement

u (x, y, T) = ϕ (x, y), (x, y) \in D .

Thus, the inverse problem is to recover the unknown spatial source

f (x, y)

(and, consequently, the entire solution

u (x, y, t)

) from the final time data

ϕ (x, y)

. It is well known that the inverse source problem of recovering

f (x, y)

from the final time data is inherently ill posed, meaning that small perturbations in

ϕ (x, y)

could lead to significant errors in

f (x, y)

. Nevertheless, when

ϕ (x, y)

is noise free and sufficiently regular (for instance,

ϕ (x, y) \in H^{2} (D)

), uniqueness and conditional stability results (see, e.g., [35,36,37]) ensure that a unique solution exists in a regularized sense. We consider a two-dimensional model with

D = [0, 1] \times [0, 1]

and

T = \frac{4}{3} .

The exact solution of the parabolic problem is chosen as

u (x, y, t) = sin (π t) sin (π x) sin (π y),

for

(x, y, t) \in D_{T} : = [0, 1] \times [0, 1] \times [0, \frac{4}{3}] .

This function satisfies the zero initial condition

u (x, y, 0) = 0

and homogeneous boundary conditions. From direct computation, the exact source term is given by

f (x, y) = sin (π x) sin (π y),

and the known temporal source term is

g (t) = π cos (π t) + 2 π^{2} sin (π t) .

Thus, the exact final time measurement is

u (x, y, \frac{4}{3}) = ϕ (x, y) = sin (\frac{4 π}{3}) sin (π x) sin (π y) .

In our reconstruction scheme, both

u (x, y, t)

and

f (x, y)

are approximated by deep neural networks

u_{θ} (x, y, t)

and

f_{η} (x, y)

, respectively. We incorporate derivative terms of the residuals into the loss function to mitigate the inherent ill-posed nature of the inverse source problem, thereby enforcing higher regularity in the solution. This approach stabilizes the inversion and yields improved reconstruction accuracy compared to standard least-squares loss functions. Our numerical experiments demonstrate that this framework yields enhanced accuracy in recovering the unknown source

f (x, y)

and the solution

u (x, y, t)

.

4.2. Residuals and Network Smoothness

By using the

t a n h

activation function, we can assume that the neural network approximations

u_{θ}

and

f_{η}

belong to

C^{\infty} (\bar{D} \times [0, T])

. Therefore, the network

u_{θ} (x, y, t)

and its derivatives up to second order are uniformly bounded on

\bar{D} \times [0, T]

.

The PDE residual is defined as

R_{θ, η} (x, y, t) = \partial_{t} u_{θ} (x, y, t) + Δ u_{θ} (x, y, t) - f_{η} (x, y) g (t), (x, y, t) \in D_{T} .

To enforce the boundary and initial conditions, we define:

\begin{matrix} R_{s b, θ} (x, y, t) & = u_{θ} (x, y, t), (x, y, t) \in \partial D_{T}, \\ R_{t b, θ} (x, y) & = u_{θ} (x, y), x \in D . \end{matrix}

and the data residual as

R_{d, θ} (x, y) = u_{θ} (x, y, T) - ϕ (x, y), x \in D

4.3. Causal Weights

To enhance stability and accuracy, we incorporate causal weights, prioritizing learning in critical regions across the spatial dimensions x and y and the temporal dimension t. Given that boundary conditions, initial conditions, and observation data are specified, we define separate causal weights for each dimension.

Spatial Weights for x and y: For each spatial coordinate, we define weights that emphasize the domain boundaries. For the x-dimension:

w_{x} (x_{i}) = \{\begin{matrix} 1, & x_{i} \in \partial D, \\ \begin{matrix} max & [exp (- ϵ \sum_{k = 1}^{i - 1} L_{r} (x_{k}, θ, η)), \\ exp (- ϵ \sum_{k = 1}^{N_{x} - i} L_{r} (x_{N_{x} - k + 1}, θ, η))], \end{matrix} & x_{i} \in D \ \partial D . \end{matrix}

Similarly, for the y-dimension:

w_{y} (y_{i}) = \{\begin{matrix} 1, & y_{i} \in \partial D, \\ \begin{matrix} max & [exp (- ϵ \sum_{k = 1}^{i - 1} L_{r} (y_{k}, θ, η)), \\ exp (- ϵ \sum_{k = 1}^{N_{y} - i} L_{r} (y_{N_{y} - k + 1}, θ, η))] . \end{matrix} & y_{i} \in D \ \partial D . \end{matrix}

Temporal weights: For the time dimension, we define the causal weights to emphasize the initial time:

w_{t} (t_{i}) = \{\begin{matrix} 1, & t_{i} = 0, \\ \begin{matrix} exp (- ϵ \sum_{k = 1}^{i - 1} L_{r} (t_{k}, θ, η)), \end{matrix} & t_{i} \in (0, T] . \end{matrix}

Data weights: Since data are provided at

t = T

in the time dimension, we can define the weights with a focus on this point as follows:

w_{d} (t_{i}) = \{\begin{matrix} 1, & t_{i} = T, \\ \begin{matrix} exp (- ϵ \sum_{k = 1}^{N_{d} - i} L_{r} (t_{N_{t} - k + 1}, θ, η)), \end{matrix} & t_{i} \in [0, T) . \end{matrix}

The overall causal weight for grid points

(x_{i}, y_{i}, t_{i})

is then given by

w_{x y t d} (x_{i}, y_{i}, t_{i}) = max {w_{x} (x_{i}), w_{y} (y_{i}), w_{t} (t_{i}), w_{d} (t_{i})}

These weights are incorporated into the loss function to prioritize learning near boundaries, initial conditions, and observation data.

4.4. Loss Function

The standard loss function combines the residuals for the PDE, boundary, initial, and data constraints:

\begin{matrix} L^{s} (θ, η) = & \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}, t_{i}) \in D_{T} \end{matrix}}^{N_{i n t}} w_{x y t d} (x_{i}, y_{i}, t_{i}) | R_{θ, η} (x_{i}, y_{i}, t_{i}) |^{2} + \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}) \in D_{t = T} \end{matrix}}^{N_{d}} {| R_{d, θ} (x_{i}, y_{i}) |}^{2} + \\ \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}) \in D \end{matrix}}^{N_{t b}} | R_{t b, θ} (x_{i}, y_{i}) |^{2} + \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}, t_{i}) \in \partial D_{T} \end{matrix}}^{N_{s b}} {| R_{s b, θ} (x_{i}, y_{i}, t_{i}) |}^{2} . \end{matrix}

Following the modified loss proposed in [34], we further incorporate terms that involve time derivatives of the residual and additional quadrature weights, leading to

\begin{matrix} L (θ, η) = & \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}) \in D_{t = T} \end{matrix}}^{N_{d}} {| R_{d, θ} (x_{i}, y_{i}) |}^{2} + \\ \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}, t_{i}) \in D_{T} \end{matrix}}^{N_{i n t}} w_{x y t d}^{0} (x_{i}, y_{i}, t_{i}) {| R_{θ, η} (x_{i}, y_{i}, t_{i}) |}^{2} + \\ \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}, t_{i}) \in D_{T} \end{matrix}}^{N_{i n t}} w_{x y t d}^{1} (x_{i}, y_{i}, t_{i}) {| \partial_{t} R_{θ, η} (x_{i}, y_{i}, t_{i}) |}^{2} + \\ \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}) \in D \end{matrix}}^{N_{t b}} | R_{t b, θ} (x_{i}, y_{i}) |^{2} + \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}) \in D \end{matrix}}^{N_{t b}} {| Δ R_{t b, θ} (x_{i}, y_{i}) |}^{2} + \\ \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}, t_{i}) \in \partial D_{T} \end{matrix}}^{N_{s b}} | R_{s b, θ} (x_{i}, y_{i}, t_{i}) |^{2} + \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}, t_{i}) \in \partial D_{T} \end{matrix}}^{N_{s b}} {| \partial_{t} R_{s b, θ} (x_{i}, y_{i}, t_{i}) |}^{2} + \\ \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}, t_{i}) \in \partial D_{T} \end{matrix}}^{N_{s b}} {| \partial_{t}^{2} R_{s b, θ} (x_{i}, y_{i}, t_{i}) |}^{2}, \end{matrix}

where

w_{i}^{0}, w_{i}^{1}

are causal weights for PDE and PDE differentiated concerning t, respectively. These additional terms help enforce the PDE and its temporal derivatives, improving overall reconstruction accuracy.

4.5. Numerical Experiments

We conduct experiments by adopting the setup in [34]. Two separate neural networks approximate

u_{θ} (x, y, t)

and

f_{η} (x, y)

. Both networks are fully connected with two hidden layers containing 20 neurons each and use the

t a n h

activation function.

The total number of collocation points is

N_{i n t} = 50 \times 50 \times 50

, and the points are sampled uniformly from

D_{T}

. Boundary points are chosen along

\partial D \times [0, T]

, initial points are taken at

t = 0

, and data points correspond to the observation domain at

t = \frac{4}{3}

. This diversified sampling ensures that the model learns from all critical regions.

The initial learning rate is

1 \times 10^{- 3}

, decaying by a factor of 10 every

10^{5}

iterations to facilitate convergence. In our experiments, the optimal value of

ϵ

was determined to be 0.1. The loss function incorporates the causal weights, emphasizing critical regions near the observation data at

t = \frac{4}{3}

and along the boundaries.

Figure 5 shows the reconstruction accuracy. The upper panels compare the exact source term

f (x, y)

with its reconstruction obtained using CI-PINNs, showing excellent agreement. The lower left panel illustrates the absolute error

| f - f^{*} |

between the exact and predicted solutions, with only minor discrepancies in regions where observation data are sparse. Notably, the lower right panel shows the

L^{2}

relative error convergence: conventional PINNs converge to a final error of 0.0855, Causal PINNs reduce this to 0.0673, and CI-PINNs further lower it to 0.0397, an improvement of approximately 53.6% over conventional PINNs. This substantial improvement confirms that integrating 3D causal weights enhances reconstruction accuracy significantly. Moreover, despite the cubic nature of the 3D causal weights, the computational overhead remains minimal. The iteration speed for CI-PINNs is comparable to that of conventional PINNs, indicating that the improved stability and accuracy do not come at a steep computational cost (Table 1).

Figure 6 visualizes the evolution of the 3D causal weights. Each column represents a different time step (e.g.,

t = \frac{3}{15}, \frac{7}{15}, \frac{11}{15}, 1

), and each row corresponds to a training epoch (e.g., 400, 800, 1,200, 1,600). Initially, the weights are concentrated on the boundaries, initial conditions, and observation regions; as training progresses, they gradually propagate into the interior, enabling the network to capture the solution’s local and global features.

Overall, the experimental results validate that CI-PINNs significantly outperform conventional PINNs for the inverse source problem of a parabolic equation. The effective integration of 3D causal weights, which balance the contributions from the spatial and temporal domains, results in faster convergence and improved accuracy in determining both the solution

u (x, y, t)

and the unknown spatial source term

f (x, y)

.

5. Inverse Source Problem for Elliptic Equation

5.1. The Underlying Inverse Problem

To illustrate the adaptability of our CI-PINN framework, we consider an inverse source problem governed by an elliptic equation as detailed in [38]. Elliptic equations are fundamental for describing steady-state phenomena such as electrostatics, thermal equilibrium, and fluid flow. Let

D \subset R^{2}

be a bounded, connected domain with a smooth boundary

\partial D

. We consider the elliptic problem

\begin{matrix} P u : = - \nabla \cdot (σ (x, y) \nabla u (x, y)) + c (x, y) u (x, y) & = f (x, y), (x, y) \in D, \end{matrix}

with the Neumann boundary condition

\begin{matrix} B u : = σ (x, y) \frac{\partial u (x, y)}{\partial ν} & = g (x, y), (x, y) \in \partial D . \end{matrix}

Here,

σ (x, y)

and

c (x, y)

are given smooth functions (with

σ (x, y) \geq σ_{0} > 0

and

c (x, y)

bounded) and

ν

is the outward normal vector on

\partial D

, and for a specified source

f (x, y)

and boundary data

g (x, y)

, the forward problem is well posed and admits a unique solution

u \in H^{1} (D) .

The source term

f (x, y)

is not directly measurable in many practical applications. Instead, one often acquires internal measurement data on a subdomain

D_{s} \subset D

. In our noise-free setting, we assume that

u (x, y) = u_{s} (x, y), (x, y) \in D_{s},

where

u_{s} (x, y)

represents the exact measurement of the solution within

D_{s}

. Due to the unique continuation property of elliptic equations, the internal data

u_{s}

uniquely determine the solution u throughout D. Consequently, the source term can be formally computed via

\begin{matrix} f (x, y) = - \nabla \cdot (σ (x, y) \nabla u (x, y)) + c (x, y) u (x, y) . \end{matrix}

However, because the continuation and the differentiation are unstable, the inverse problem is inherently unstable, even in the noise-free case. For our numerical experiment, we consider a two-dimensional model with

D = {[0, 1]}^{2}

and specify the exact solution as

u (x, y) = sin (π x) + y^{3}, (x, y) \in D .

We choose the coefficients as

c (x, y) = σ (x, y) = exp (- 30 [{(x - 0.5)}^{2} + {(y - 0.5)}^{2}]) .

A straightforward calculation then yields the exact source term:

\begin{matrix} f (x, y) = exp (- 30 [{(x - 0.5)}^{2} + {(y - 0.5)}^{2}] \times [(π^{2} + 1) sin (π x) + \\ 60 π (x - 0.5) cos (π x) + 180 (y - 0.5) y^{2} + y^{3} - 6 y]) \end{matrix}

The internal measurement

u_{s} (x, y)

is taken as the restriction of

u (x, y)

to the subdomain

D_{s}

; for instance, we may choose

D_{s} = [\frac{1}{4}, \frac{3}{4}] \times [\frac{1}{2}, \frac{3}{4}] .

Figure 7 illustrates the domain D and the observation subdomain

D_{s}

along with the distribution of training points. Training points are sampled uniformly over D for the interior residual, along

\partial D

for boundary residuals, and within

D_{s}

for observation data. Spatial causal weights are constructed to propagate the influence of the observation subdomain

D_{s}

and the boundary conditions throughout the domain.

In our reconstruction approach, both

u (x, y)

and

f (x, y)

are approximated by deep neural networks

u_{θ} (x, y)

and

f_{η} (x, y)

, respectively. Since recovering f from u involves differentiation, which is unstable, the inverse problem is ill posed. We augment our loss function with additional regularization terms (detailed in Section 5.4) that penalize large gradients in the network outputs to stabilize the inversion. This regularization enforces a Lipschitz condition on the approximations, thereby mitigating the instability and improving reconstruction accuracy. Our numerical experiments demonstrate that, with this regularized framework, the unknown source

f (x, y)

and the solution

u (x, y)

can be recovered accurately over the entire domain.

5.2. Residuals and Network Smoothness

Both

u (x, y)

and

f (x, y)

are approximated by deep neural networks

u_{θ} (x, y)

and

f_{η} (x, y)

, respectively. The PDE residual is defined as

R_{θ, η} (x, y) = - \nabla \cdot (σ \nabla u_{θ} (x, y)) + c (x, y) u_{θ} - f_{η} (x, y), (x, y) \in D .

For the boundary data, the residual is

R_{s b, θ} (x, y) = σ \frac{\partial u_{θ}}{\partial ν} - g (x, y), (x, y) \in \partial D .

The data residual corresponding to observation data is

R_{d, θ} (x, y) = u_{θ} (x, y) - u_{s} (x, y), (x, y) \in D_{s} .

5.3. Causal Weights

The elliptic problem differs from the wave and parabolic cases in that the observation data are confined to a specific subdomain

D_{s}

, and the boundary conditions are provided on

\partial D

. We construct data weights that expand outward to propagate the influence of the observation data and boundary conditions on the entire domain. Specifically, we divide the domain D into a sequence of expanding regions:

D_{s} = Ω_{0} \subset Ω_{1} \subset Ω_{2} \subset \dots \subset Ω_{m} = D .

(See Figure 7 for a schematic illustration of how these regions

Ω_{j}

expand outward from

D_{s}

).

Data weights: For points

x_{i}

within

Ω_{0}

, the causal weight is set to 1. For points in each subsequent layer

Ω_{j}

(for

j \geq 1)

, the weight is defined as

w_{d} (x_{i}, y_{i}) = \{\begin{matrix} 1, & if (x_{i}, y_{i}) \in D_{s} = Ω_{0}, \\ exp (- ϵ \sum_{(x_{k}, y_{k}) \in Ω_{j - 1}} L_{r} ((x_{k}, y_{k}), θ, η)), & if (x_{i}, y_{i}) \in Ω_{j} \ Ω_{j - 1}, j = 1, 2, \dots m . \end{matrix}

Spatial weights: To ensure that the boundary conditions are accurately enforced, we define causal weights for the domain boundaries in the x and y directions as

w_{x} (x_{i}) = \{\begin{matrix} 1, & x_{i} \in \partial D \\ \begin{matrix} max & [exp (- ϵ \sum_{k = 1}^{i - 1} L_{r} (x_{k}, θ, η)), \\ exp (- ϵ \sum_{k = 1}^{N_{x} - i} L_{r} (x_{N_{x} - k + 1}, θ, η))], \end{matrix} & x_{i} \in D \ \partial D \end{matrix}

and similarly for y:

w_{y} (y_{i}) = \{\begin{matrix} 1, & y_{i} \in \partial D \\ \begin{matrix} max & [exp (- ϵ \sum_{k = 1}^{i - 1} L_{r} (y_{k}, θ, η)), \\ exp (- ϵ \sum_{k = 1}^{N_{y} - i} L_{r} (y_{N_{y} - k + 1}, θ, η))], \end{matrix} & y_{i} \in D \ \partial D \end{matrix}

The final causal weight for a point

x_{i}

is then defined by combining the weights from the observation and boundary considerations:

w_{x y d} (x_{i}, y_{i}) = max (w_{d} (x_{i}, y_{i}), w_{x} (x_{i}), w_{y} (y_{i}))

As shown in Figure 7, the domain D is decomposed into nested regions

Ω_{0}, Ω_{1}, \dots, Ω_{m}

. The data weights (blue arrows in the right panel) propagate outward from the observation subdomain

D_{s} = Ω_{0}

, while the spatial weights (red arrows) move inward from the boundary of D. Consequently, the influence of observation data and boundary conditions gradually converges toward the domain’s interior. This ensures that the entire domain is guided by critical constraints—observation data in

D_{s}

and boundary conditions on

\partial D

—ultimately leading to more stable and accurate reconstructions. The flow of causal weights depicted in the right panel of Figure 7 thus matches the final distribution of weights observed during training.

5.4. Loss Function

The overall loss function for the elliptic inverse problem is constructed by combining the weighted residual losses from the PDE, the boundary conditions, and the internal measurement data:

\begin{matrix} L (θ, η) = & \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}) \in D \end{matrix}}^{N_{i n t}} w_{x y d} (x_{i}, y_{i}) {| R_{θ, η} (x_{i}, y_{i}) |}^{2} + \\ \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}) \in \partial D \end{matrix}}^{N_{s b}} | R_{s b, θ} (x_{i}, y_{i}) |^{2} + \sum_{\begin{matrix} i = 1 \\ (x_{i}, y_{i}) \in D_{s} \end{matrix}}^{N_{d}} {| R_{d, θ} (x_{i}, y_{i}) |}^{2} . \end{matrix}

Here,

w_{x y d}

are the causal weights, which are computed so that critical regions (such as the boundary or observation subdomain) exert a greater influence during training. However, to address the ill-posed nature inherent in differentiating u to recover the source f, we augment this basic loss function with Lipschitz-based regularization terms:

\begin{matrix} L_{R} (θ, η) = & L (θ, η) + λ max_{1 \leq i \leq N_{i n t}} {∥\nabla f_{η} (x_{i}, y_{i})∥}^{2} + λ max_{1 \leq i \leq N_{d}} {∥\nabla u_{θ} (x_{i}, y_{i})∥}^{2} + \end{matrix}

\begin{matrix} λ max_{1 \leq i \leq N_{i n t}} {∥\nabla P [u_{θ}] (x_{i}, y_{i})∥}^{2} + λ max_{1 \leq i \leq N_{s b}} {∥\nabla B [u_{θ}] (x_{i}, y_{i})∥}^{2} . \end{matrix}

where

P [u_{θ}]

represents the PDE operator (

- \nabla \cdot (σ (\nabla u_{θ}) + c u_{θ})

and

B [u_{θ}]

corresponds to the boundary residual (e.g., the Neumann boundary condition). The parameter

λ = 10^{- 4}

scales these gradient-based penalties. By penalizing large gradients in the source network

f_{η}

, the solution network

u_{θ}

, the PDE residual

P [u_{θ}]

, and the boundary residual

B [u_{θ}]

, this Lipschitz regularization effectively enforces higher regularity on the network outputs. Such regularity is crucial for stabilizing the differentiation process needed to recover f, thereby mitigating the ill-posed nature of the inverse source problem.

5.5. Numerical Experiments

This section presents the numerical experiments for the inverse source problem for the elliptic equation. All experiments follow the setup described in Section 5.1, and we provide additional details on the network architectures and hyper-parameters used. In our implementation, two deep neural networks are employed: one to approximate the solution

u (x, y)

with parameters

θ

, and the other to approximate the unknown source term

f (x, y)

with parameters

η

. The network

u_{θ} (x, y)

consists of 2 hidden layers with 10 neurons per layer, while the network

f_{η} (x, y)

is deeper, comprising 14 hidden layers with 10 neurons per layer. Both networks utilize the

t a n h

activation function. Training is performed using the Adam optimizer with a learning rate of

1 \times 10^{- 4}

for

5 \times 10^{5}

epochs.

Figure 8 presents the reconstruction results. The top row compares the exact solution

u (x)

(left) with the CI-PINN prediction (middle), along with the corresponding absolute error (right). The bottom row shows the exact source term

f (x)

(left), its reconstruction (middle), and the absolute error (right).

Figure 9 illustrates the evolution of the spatial causal weights during training. Initially, the weights are concentrated in the observation subdomain

D_{s}

and along the domain boundaries. As training progresses, the weights gradually propagate into the interior, ensuring the model captures the solution’s local and global features.

Figure 10 presents the

L^{2}

errors during training for both u and f. Since this problem does not involve a temporal domain, the Causal PINN approach introduced in Section 2.3 [23] is not applicable here. Therefore, we compare only conventional PINNs and CI-PINNs. The smoother convergence of CI-PINNs, in contrast to the oscillatory behavior observed in conventional PINNs, further demonstrates the effectiveness of the causal weighting strategy in improving stability and accuracy. Quantitatively, the

L^{2}

relative error for the solution u of conventional PINNs is 0.06823, while CI-PINNs achieve an error of 0.033448—a reduction of over

50 %

. For the source term f, conventional PINNs yield an

L^{2}

error of 0.08981, whereas CI-PINNs reduce the error to 0.06434, corresponding to an improvement of approximately

28 %

. Moreover, conventional PINNs exhibit significant oscillations in the error for f during the training process, indicating unstable convergence behavior. In contrast, CI-PINNs show a much smoother and more monotonic decrease in error, reflecting improved stability and more consistent propagation from the supervised region.

Table 1 presents the number of iterations and iteration times for vanilla PINNs and CI-PINNs. While both methods exhibit similar iteration times for the wave and parabolic problems, CI-PINNs run slower in the elliptic case. This slowdown is primarily due to the additional computational overhead required to define and process the expanding subdomains,

Ω_{k}

, which propagate the influence of the observation data and boundary conditions. In contrast, for other inverse problems, CI-PINNs can be applied without such subdomain constructions.

These results demonstrate that the CI-PINN framework significantly improves reconstruction accuracy for the elliptic inverse problem. The method reduces the final

L^{2}

errors for u and f and ensures smoother convergence during training, thereby providing a more robust and reliable solution.

6. Conclusions

In this paper, we introduced a novel framework called Causal Inverse PINNs (CI-PINNs), marking a significant advancement in physics-informed machine learning for solving inverse problems governed by partial differential equations. By integrating multidirectional causal weights into the loss function, our approach prioritizes learning in supervised regions to capture physical systems’ inherent causal structure, such as initial conditions, boundary conditions, and observational data. This innovative strategy directly addresses one of the key limitations of conventional PINNs, resulting in enhanced stability and improved reconstruction accuracy.

Comprehensive numerical experiments on the wave equation and inverse source problems for the parabolic and elliptic equation demonstrated that CI-PINNs significantly outperform standard methods in convergence behavior and solution fidelity. The framework’s ability to enforce the natural causal progression within the system enables more accurate prediction of unknown parameters and solutions, even in sparse data scenarios. These results underscore the robustness and generalizability of our proposed method across various types of inverse problems.

While incorporating causal weights represents a major innovation, the development and application of CI-PINNs also attain several practical challenges. In particular, careful tuning of the causality parameter is essential, as suboptimal settings can impede effective supervision propagation or lead to premature convergence. Additionally, the extra computational overhead of calculating multidimensional causal weights poses a challenge when extending the approach to high-dimensional or more complex systems. Another notable difficulty arises from applying the method to diverse domain geometries and configurations. The performance and stability of CI-PINNs can vary significantly depending on the domain shape, boundary conditions, and data distribution, indicating that further investigation is needed to adapt and optimize the framework for a broader range of practical scenarios.

In summary, CI-PINNs not only establish a new benchmark for stability and accuracy in solving inverse problems but also offer a flexible and scalable framework that can be adapted to a wide range of scientific and engineering applications. We believe this work paves the way for diverse future research directions, including developing adaptive weighting strategies, integrating uncertainty quantification techniques, and extending the framework to tackle nonlinear multi-physics systems, more complex domain geometries, and realistic noisy data scenarios.

Author Contributions

Conceptualization, H.S.; Methodology, J.K.; Writing—original draft, J.K.; Visualization, J.K.; Supervision, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data used in this study can be generated following the procedures described in the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [PubMed]
Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [Google Scholar] [PubMed]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 25 (NIPS 2012): 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Sutskever, I. Sequence to Sequence Learning with Neural Networks. arXiv 2014, arXiv:1409.3215. [Google Scholar]
Vaswani, A. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Devlin, J. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
Xu, X.; Pu, X.; Zhang, J. Asymptotic limit of the Navier-Stokes-Poisson-Korteweg system in the half-space. J. Differ. Equ. 2022, 335, 201–243. [Google Scholar]
Wang, W.; Zeng, W.; Chen, W. Global exponential stability of periodic solutions for inertial delayed BAM neural networks. Commun. Nonlinear Sci. Numer. Simul. 2025, 145, 108728. [Google Scholar]
Guo, B.; Huang, D.; Zhang, J. Decay of solutions to a two-layer quasi-geostrophic model. Anal. Appl. 2017, 15, 595–606. [Google Scholar]
Wang, W.; Wu, J.; Chen, W. The characteristics method to study global exponential stability of delayed inertial neural networks. Math. Comput. Simul. 2025, 232, 91–101. [Google Scholar]
Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440. [Google Scholar] [CrossRef]
Lu, L.; Meng, X.; Mao, Z.; Karniadakis, G.E. DeepXDE: A deep learning library for solving differential equations. SIAM Rev. 2021, 63, 208–228. [Google Scholar] [CrossRef]
Sirignano, J.; Spiliopoulos, K. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 2018, 375, 1339–1364. [Google Scholar] [CrossRef]
Zhu, Y.; Zabaras, N.; Koutsourelakis, P.S.; Perdikaris, P. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput. Phys. 2019, 394, 56–81. [Google Scholar] [CrossRef]
Chen, Y.; Lu, L.; Karniadakis, G.E.; Dal Negro, L. Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Opt. Express 2020, 28, 11618–11633. [Google Scholar] [CrossRef]
Lu, L.; Pestourie, R.; Yao, W.; Wang, Z.; Verdugo, F.; Johnson, S.G. Physics-informed neural networks with hard constraints for inverse design. SIAM J. Sci. Comput. 2021, 43, B1105–B1132. [Google Scholar] [CrossRef]
Jagtap, A.D.; Mao, Z.; Adams, N.; Karniadakis, G.E. Physics-informed neural networks for inverse problems in supersonic flows. J. Comput. Phys. 2022, 466, 111402. [Google Scholar] [CrossRef]
Son, H.; Lee, M. A PINN approach for identifying governing parameters of noisy thermoacoustic systems. J. Fluid Mech. 2024, 984, A21. [Google Scholar] [CrossRef]
Yang, L.; Meng, X.; Karniadakis, G.E. B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys. 2021, 425, 109913. [Google Scholar] [CrossRef]
Blalock, D.; Gonzalez Ortiz, J.J.; Frankle, J.; Guttag, J. What is the state of neural network pruning? Proc. Mach. Learn. Syst. 2020, 2, 129–146. [Google Scholar]
Wang, S.; Sankaran, S.; Perdikaris, P. Respecting causality for training physics-informed neural networks. Comput. Methods Appl. Mech. Eng. 2024, 421, 116813. [Google Scholar]
Son, H.; Cho, S.W.; Hwang, H.J. Enhanced physics-informed neural networks with augmented Lagrangian relaxation method (AL-PINNs). Neurocomputing 2023, 548, 126424. [Google Scholar]
Jung, J.; Kim, H.; Shin, H.; Choi, M. CEENs: Causality-enforced evolutional networks for solving time-dependent partial differential equations. Comput. Methods Appl. Mech. Eng. 2024, 427, 117036. [Google Scholar]
Li, Y.; Chen, S.; Shan, B.; Huang, S.J. Causality-enhanced discreted physics-informed neural networks for predicting evolutionary equations. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Jeju, Republic of Korea, 3–9 August 2024; pp. 4497–4505. [Google Scholar]
Jo, H.; Son, H.; Hwang, H.J.; Kim, E. Deep neural network approach to forward-inverse problems. arXiv 2019, arXiv:1907.12925. [Google Scholar]
Han, J.; Jentzen, A.; E, W. Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. USA 2018, 115, 8505–8510. [Google Scholar]
E, Y.; Yu, B. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 2018, 6, 1–12. [Google Scholar]
Jagtap, A.D.; Kharazmi, E.; Karniadakis, G.E. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Comput. Methods Appl. Mech. Eng. 2020, 365, 113028. [Google Scholar]
Mishra, S.; Molinaro, R. Estimates on the generalization error of physics-informed neural networks for approximating PDEs. IMA J. Numer. Anal. 2023, 43, 1–43. [Google Scholar]
Bardos, C.; Lebeau, G.; Rauch, J. Un exemple d’utilisation des notions de propagation pour le contrôle et la stabilisation de problemes hyperboliques. Rend. Sem. Mat. Univ. Politec. Torino 1988, 46, 11–31. [Google Scholar]
Le Rousseau, J.; Lebeau, G.; Terpolilli, P.; Trélat, E. Geometric control condition for the wave equation with a time-dependent observation domain. Anal. PDE 2017, 10, 983–1015. [Google Scholar]
Zhang, M.; Li, Q.; Liu, J. On stability and regularization for data-driven solution of parabolic inverse source problems. J. Comput. Phys. 2023, 474, 111769. [Google Scholar] [CrossRef]
Isakov, V. Inverse Problems for Partial Differential Equations; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
Prilepko, A.I.; Kostin, A.B. On certain inverse problems for parabolic equations with final and integral observation. Mat. Sb. 1992, 183, 49–68. [Google Scholar] [CrossRef]
Prilepko, A.I.; Orlovsky, D.G.; Vasin, I.A. Methods for Solving Inverse Problems in Mathematical Physics; CRC Press: Boca Raton, FL, USA, 2000. [Google Scholar]
Zhang, H.; Liu, J. Solving an inverse source problem by deep neural network method with convergence and error analysis. Inverse Probl. 2023, 39, 075013. [Google Scholar]

Figure 1. Schematic illustration of CI-PINNs. The definitions of causal weights vary depending on the problem and are detailed in the following sections.

Figure 2. (Left) Visualization of the domain

D_{T}

and observation domain

D_{T}^{'}

for the wave equation. Blue dots represent interior collocation points, while gray dots indicate boundary and observation data points where supervision is provided. (Right) An illustration depicting the propagation of data weights upward from the observation subdomain

D_{T}^{'}

into the domain

D_{T}

, as well as the inward propagation of spatial weights from the domain boundaries. Arrows indicate the direction of influence for each weight type.

Figure 2. (Left) Visualization of the domain

D_{T}

and observation domain

D_{T}^{'}

for the wave equation. Blue dots represent interior collocation points, while gray dots indicate boundary and observation data points where supervision is provided. (Right) An illustration depicting the propagation of data weights upward from the observation subdomain

D_{T}^{'}

into the domain

D_{T}

, as well as the inward propagation of spatial weights from the domain boundaries. Arrows indicate the direction of influence for each weight type.

Figure 3. (Upper left) Exact solution

u (x, t)

; (upper right) CI-PINNs’ predicted solution; (lower left) pointwise absolute error; (lower right)

L^{2}

relative error during training. Three methods are compared: PINNs (blue), Causal PINNs (green), and CI-PINNs (red).

Figure 3. (Upper left) Exact solution

u (x, t)

; (upper right) CI-PINNs’ predicted solution; (lower left) pointwise absolute error; (lower right)

L^{2}

relative error during training. Three methods are compared: PINNs (blue), Causal PINNs (green), and CI-PINNs (red).

Figure 4. Evolution of causal weights over different epochs. In training, weights concentrate on critical regions; the influence spreads across the domain as training proceeds.

Figure 5. (Upper left) Exact source term

f (x, y)

; (upper right) reconstructed source term using CI-PINNs; (lower left) pointwise absolute error

| f - f^{*} |

between the exact solution

f^{*}

and the CI-PINNs’ prediction; (lower right)

L^{2}

relative error convergence during training for three methods: PINNs (blue), Causal PINNs (green), and CI-PINNs (red).

Figure 5. (Upper left) Exact source term

f (x, y)

; (upper right) reconstructed source term using CI-PINNs; (lower left) pointwise absolute error

| f - f^{*} |

between the exact solution

f^{*}

and the CI-PINNs’ prediction; (lower right)

L^{2}

relative error convergence during training for three methods: PINNs (blue), Causal PINNs (green), and CI-PINNs (red).

Figure 6. Evolution of the 3D causal weights during training for the parabolic inverse problem. Each column represents a different time step (e.g.,

t = \frac{3}{15}, \frac{7}{15}, \frac{11}{15}, 1

), and each row corresponds to a training epoch (e.g., 400, 800, 1200, 1600). Initially, the weights concentrate on critical regions (boundaries, initial conditions, and observation areas), and as training progresses, they gradually propagate into the interior, facilitating improved accuracy and stability.

Figure 6. Evolution of the 3D causal weights during training for the parabolic inverse problem. Each column represents a different time step (e.g.,

t = \frac{3}{15}, \frac{7}{15}, \frac{11}{15}, 1

), and each row corresponds to a training epoch (e.g., 400, 800, 1200, 1600). Initially, the weights concentrate on critical regions (boundaries, initial conditions, and observation areas), and as training progresses, they gradually propagate into the interior, facilitating improved accuracy and stability.

Figure 7. (Left) Visualization of the domain D and observation subdomain

D_{s}

for the elliptic inverse problem. Blue points indicate interior training points, while gray points represent observation data points. (Right) A schematic illustration of how

Ω_{0} = D_{s}

expands layer by layer (

Ω_{1}, Ω_{2}, \dots

toward the full domain

Ω_{m} = D

). The blue arrows represent data weights propagating outward from the observation subdomain

D_{s}

, while the red arrows indicate spatial weights moving inward from the

\partial D

.

Figure 7. (Left) Visualization of the domain D and observation subdomain

D_{s}

for the elliptic inverse problem. Blue points indicate interior training points, while gray points represent observation data points. (Right) A schematic illustration of how

Ω_{0} = D_{s}

expands layer by layer (

Ω_{1}, Ω_{2}, \dots

toward the full domain

Ω_{m} = D

). The blue arrows represent data weights propagating outward from the observation subdomain

D_{s}

, while the red arrows indicate spatial weights moving inward from the

\partial D

.

Figure 8. Visualization of results for the elliptic inverse problem. The top row compares the exact solution u (upper left) with the CI-PINN prediction (upper middle) and their absolute difference

| u - u^{*} |

(upper right). The bottom row illustrates the exact source term f (upper left), the CI-PINNs’ reconstruction (upper middle), and their absolute difference

| f - f^{*} |

(upper right).

Figure 8. Visualization of results for the elliptic inverse problem. The top row compares the exact solution u (upper left) with the CI-PINN prediction (upper middle) and their absolute difference

| u - u^{*} |

(upper right). The bottom row illustrates the exact source term f (upper left), the CI-PINNs’ reconstruction (upper middle), and their absolute difference

| f - f^{*} |

(upper right).

Figure 9. Evolution of spatial causal weights during the training process for the elliptic problem. The central rectangular region corresponds to the observation domain

D_{s}

, where data are available. The edges represent the domain boundaries

\partial D

. The weights progressively spread outward as training advances, enabling accurate reconstruction throughout the domain.

Figure 9. Evolution of spatial causal weights during the training process for the elliptic problem. The central rectangular region corresponds to the observation domain

D_{s}

, where data are available. The edges represent the domain boundaries

\partial D

. The weights progressively spread outward as training advances, enabling accurate reconstruction throughout the domain.

Figure 10. Convergence comparison for the elliptic inverse problem. The left panel shows the

L^{2}

error for the solution u, while the right panel illustrates the

L^{2}

error for the source term f.

Figure 10. Convergence comparison for the elliptic inverse problem. The left panel shows the

L^{2}

error for the solution u, while the right panel illustrates the

L^{2}

error for the source term f.

Table 1. Summary of the maximum number of training iterations and the corresponding iteration time (in seconds) for the wave, parabolic, and elliptic inverse problems. The “Max iteration” row indicates the total training iterations for each problem, while the remaining rows show the average iteration time for the vanilla PINNs and CI-PINNs, respectively.

		Wave	Parabolic	Elliptic
Max iteration		$8 \times 10^{5}$	$5 \times 10^{5}$	$8 \times 10^{5}$
Iteration time (s)	PINNs	0.006	0.026	0.009
Iteration time (s)	CI-PINNs	0.006	0.026	0.019

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kim, J.; Son, H. Causality-Aware Training of Physics-Informed Neural Networks for Solving Inverse Problems. Mathematics 2025, 13, 1057. https://doi.org/10.3390/math13071057

AMA Style

Kim J, Son H. Causality-Aware Training of Physics-Informed Neural Networks for Solving Inverse Problems. Mathematics. 2025; 13(7):1057. https://doi.org/10.3390/math13071057

Chicago/Turabian Style

Kim, Jaeseung, and Hwijae Son. 2025. "Causality-Aware Training of Physics-Informed Neural Networks for Solving Inverse Problems" Mathematics 13, no. 7: 1057. https://doi.org/10.3390/math13071057

APA Style

Kim, J., & Son, H. (2025). Causality-Aware Training of Physics-Informed Neural Networks for Solving Inverse Problems. Mathematics, 13(7), 1057. https://doi.org/10.3390/math13071057

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Causality-Aware Training of Physics-Informed Neural Networks for Solving Inverse Problems

Abstract

1. Introduction

2. Preliminaries

2.1. Physics-Informed Neural Networks (PINNs)

2.2. Inverse PINNs

2.3. Causal Training for Physics-Informed Neural Networks

3. Inverse Problem for the Wave Equation

3.1. The Underlying Inverse Problem

3.2. Residuals and Network Smoothness

3.3. Causal Weights

3.4. Loss Function

3.5. Numerical Experiments

4. Inverse Source Problem for a Parabolic Equation

4.1. The Underlying Inverse Problem

4.2. Residuals and Network Smoothness

4.3. Causal Weights

4.4. Loss Function

4.5. Numerical Experiments

5. Inverse Source Problem for Elliptic Equation

5.1. The Underlying Inverse Problem

5.2. Residuals and Network Smoothness

5.3. Causal Weights

5.4. Loss Function

5.5. Numerical Experiments

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI