1. Introduction
Recent breakthroughs in deep learning have transformed numerous fields, including computer vision and natural language processing [1,2,3,4,5,6,7]. In parallel, partial differential equations (PDEs) and delay differential equations (DDEs) have long been essential tools for modeling nonlinear dynamics in natural and engineered systems. These equations capture complex phenomena such as turbulence, pattern formation, and chaotic behavior and have been the subject of extensive theoretical and numerical investigation. Recent studies have provided significant advances in this field, as demonstrated in works such as [8,9,10,11]. Furthermore, recent investigations have extended our understanding of nonlinear and delay systems. Xiuli Xu [8] analyzed the asymptotic limit of the Navier–Stokes–Poisson–Korteweg system in a half-space, providing key insights into boundary layer phenomena and long-term dynamics in complex fluid systems. In the realm of geophysical flows, Boling Guo [10] studied the decay of solutions to a two-layer quasi-geostrophic model, elucidating the mechanisms of energy dissipation and stabilization in stratified environments. Moreover, Wentao Wang has made significant contributions to the stability analysis of delay systems; his work on the global exponential stability of periodic solutions for inertial delayed BAM neural networks [9] and his study employing the method of characteristics to investigate delayed inertial neural networks [11] offer rigorous frameworks for ensuring robustness in systems affected by delays. These studies deepen our theoretical understanding and underscore the importance of integrating accurate physical modeling into data-driven approaches.
This progress has given rise to Physics-Informed Neural Networks (PINNs), which incorporate governing equations, initial conditions, and boundary conditions directly into the neural network’s loss function [12,13]. By leveraging automatic differentiation, PINNs accurately compute the necessary derivatives, ensuring that the learned solutions conform to the underlying physics even when data are sparse or noisy. In this framework, the unknown solution of a PDE is represented as a deep neural network, and all physical constraints are enforced simultaneously through a composite loss function [12,13,14]. PINNs have proven effective for forward problems in fluid dynamics, heat conduction, and electromagnetics [13,15,16]; in these settings, PINNs compute the solutions of the PDEs when all necessary conditions are provided.
Inverse problems governed by partial differential equations (PDEs), in which one must infer unknown quantities of a system from limited measurements, are ubiquitous in science and engineering, with applications spanning material characterization to biomedical imaging. However, these problems are inherently challenging due to their ill-posed nature and the need to adhere to fundamental physical laws. Inverse PINNs address these challenges by incorporating data loss terms that align predictions with sparse and noisy observations while satisfying the physical principles. Recent progress in diverse scientific problems, including material science, biomedical imaging, and system identification, demonstrates the effectiveness of inverse PINNs [13,17,18,19,20]. Additionally, Bayesian PINNs have been introduced to incorporate prior knowledge and quantify uncertainty in inverse problems, providing a probabilistic framework for PDE-constrained learning [21]. Meanwhile, sparsity-aware neural networks enforce parsimonious representations by penalizing unnecessary parameters, which can be particularly advantageous for high-dimensional or ill-posed systems [22]. Integrating these approaches with causal training strategies may further enhance the robustness and interpretability of inverse PDE solvers.
Despite their effectiveness, both conventional and inverse PINNs share a critical limitation: they often fail to enforce the inherent causal structure of physical systems, especially in time-dependent problems. Temporal causality mandates that solutions evolve naturally from initial conditions and progress continuously. Without enforcing this principle, training may converge to non-physical or unstable solutions. Recent studies have introduced causal training frameworks [23] and augmented Lagrangian methods [24] to tackle this issue by prioritizing learning during early time steps or near boundaries. More recently, training paradigms that respect temporal causality have been developed by incorporating forward numerical solvers and multiple neural networks [25,26]. However, because current methods focus primarily on temporal causality, these approaches have not yet been adequately adapted for inverse problems, which require balancing the influence of additional observational data against physical constraints. This limitation arises from the restrictive design of the causal weights, which motivates the current work.
In this paper, we propose a novel framework, Causal Inverse PINNs (CI-PINNs), a causality-respecting training framework of PINNs for solving inverse problems. Unlike conventional approaches, CI-PINNs integrate multidirectional causal weights—covering temporal, spatial, and data dimensions—directly into the loss function. This innovative strategy enables the model to prioritize supervised regions, such as initial and boundary conditions and observational data, thereby enforcing the natural causal structure of the underlying physical system. Consequently, our framework produces more stable and physically consistent reconstructions.
However, the development and application of CI-PINNs come with several technical challenges. First, the design and tuning of the causal weights require careful consideration; the weights must accurately reflect the significance of various regions in the domain while ensuring a smooth transition from supervised to unsupervised areas. In particular, selecting an optimal causality parameter $\epsilon$ is critical, as inappropriate values can lead to insufficient propagation of supervision or cause an overly rapid convergence that masks the true dynamics of the system. Moreover, the computational overhead introduced by computing multidimensional causal weights can be nontrivial, especially when scaling to high-dimensional or more complex physical problems. Addressing these challenges is essential for ensuring that the benefits of the proposed method are realized in practical applications.
We organize this paper as follows. First, we offer a comprehensive review of PINNs by examining the core mechanisms used to solve forward problems and exploring the extensions developed for inverse problems. This review synthesizes insights from seminal works (e.g., [12,27,28,29]) as well as modern reviews on physics-informed machine learning. Second, we provide an in-depth examination of state-of-the-art developments in inverse PINNs and related parameter identification methods, incorporating recent advances in Bayesian approaches and sparse identification techniques as detailed in [21,22]. Third, we introduce the concept of CI-PINNs—a novel methodology that leverages temporal, spatial, and data causal weights to balance residual minimization. Building on recent causal training strategies [23], we propose an approach that addresses the imbalance between observational data and physical constraints. Finally, we validate the proposed framework through benchmark inverse problems, including an inverse problem for the wave equation, a parabolic inverse source problem, and an elliptic inverse source problem. The experimental results demonstrate substantial improvements in convergence behavior and reconstruction fidelity compared to conventional inverse PINNs.
This work deepens our understanding of how physics-informed machine learning can address challenging inverse problems. It advances the state of the art through a causality-aware training paradigm. By explicitly enforcing the sequential structure inherent in time-dependent and spatially distributed processes, our framework provides a robust, generalizable solution for a broad spectrum of scientific and engineering applications.
2. Preliminaries
2.1. Physics-Informed Neural Networks (PINNs)
Jo et al. [27] proposed PINNs to infer the solutions of partial differential equations (PDEs). Consider a generic time-dependent PDE defined as:

$$\mathcal{N}[u](x, t) = 0, \quad (x, t) \in D \times (0, T], \tag{1}$$
$$\mathcal{I}[u](x) = 0, \quad x \in D, \tag{2}$$
$$\mathcal{B}[u](x, t) = 0, \quad (x, t) \in \partial D \times (0, T], \tag{3}$$

where $\mathcal{N}$ denotes a differential operator, and $\mathcal{I}$ and $\mathcal{B}$ represent the initial and boundary operators, respectively. In the PINN literature, the solution to the PDE (1)–(3) is approximated using a neural network $u_\theta(x, t)$, where $\theta$ denotes the vector consisting of all trainable parameters, including weights and biases. To ensure that $u_\theta$ satisfies the PDE, the model is trained by minimizing a composite loss function evaluated over a grid of collocation points. This loss function is designed to enforce the conditions in Equations (1)–(3) [12,13].
Specifically, the training process minimizes the following Mean Squared Error (MSE)-based loss:

$$\mathcal{L}(\theta) = \lambda_r \mathcal{L}_r(\theta) + \lambda_{ic} \mathcal{L}_{ic}(\theta) + \lambda_{bc} \mathcal{L}_{bc}(\theta), \tag{4}$$

with

$$\mathcal{L}_r(\theta) = \frac{1}{N_r} \sum_{i=1}^{N_r} \big| \mathcal{N}[u_\theta](x_r^i, t_r^i) \big|^2, \quad
\mathcal{L}_{ic}(\theta) = \frac{1}{N_{ic}} \sum_{i=1}^{N_{ic}} \big| \mathcal{I}[u_\theta](x_{ic}^i) \big|^2, \quad
\mathcal{L}_{bc}(\theta) = \frac{1}{N_{bc}} \sum_{i=1}^{N_{bc}} \big| \mathcal{B}[u_\theta](x_{bc}^i, t_{bc}^i) \big|^2.$$

Here, $\mathcal{L}_r$ is the residual loss that ensures the PDE holds at interior points, $\mathcal{L}_{ic}$ enforces the initial conditions, and $\mathcal{L}_{bc}$ enforces the boundary conditions.
The grid points $\{(x_r^i, t_r^i)\}_{i=1}^{N_r}$, $\{x_{ic}^i\}_{i=1}^{N_{ic}}$, and $\{(x_{bc}^i, t_{bc}^i)\}_{i=1}^{N_{bc}}$ are sampled from the domain. In our case, these points are uniformly sampled such that $(x_r^i, t_r^i) \in D \times (0, T]$, $x_{ic}^i \in D$, and $(x_{bc}^i, t_{bc}^i) \in \partial D \times (0, T]$.
To efficiently compute gradients with respect to the input variables or the network parameters $\theta$, PINNs leverage automatic differentiation. This capability is crucial for accurately calculating the PDE residuals and ensuring that the network adheres to the underlying physics. Additionally, the hyper-parameters $\lambda_r$, $\lambda_{ic}$, and $\lambda_{bc}$ can be manually set or adapted during training to balance the contributions of the different loss terms, thus optimizing the model’s performance.
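For concreteness, the following PyTorch sketch assembles the composite loss (4) for an illustrative residual $u_t - u_{xx}$; the `MLP` architecture, the choice of operator, and all identifiers are illustrative assumptions rather than a prescribed implementation:

```python
import torch

class MLP(torch.nn.Module):
    """Fully connected network u_theta; two hidden layers of width 20 by default."""
    def __init__(self, in_dim=2, width=20, depth=2, out_dim=1):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [torch.nn.Linear(in_dim, width), torch.nn.Tanh()]
            in_dim = width
        layers += [torch.nn.Linear(in_dim, out_dim)]
        self.net = torch.nn.Sequential(*layers)

    def forward(self, *coords):                  # e.g., (x, t), each of shape (N, 1)
        return self.net(torch.cat(coords, dim=-1))

def grad(out, inp):
    """First-order derivative d(out)/d(inp) via automatic differentiation."""
    return torch.autograd.grad(out, inp, torch.ones_like(out), create_graph=True)[0]

def pinn_loss(u_net, xr, tr, x0, u0, xb, tb, ub, lam=(1.0, 1.0, 1.0)):
    """Composite MSE loss (4): PDE residual + initial + boundary terms."""
    u = u_net(xr, tr)
    res = grad(u, tr) - grad(grad(u, xr), xr)    # illustrative residual u_t - u_xx
    l_r = res.pow(2).mean()
    l_ic = (u_net(x0, torch.zeros_like(x0)) - u0).pow(2).mean()
    l_bc = (u_net(xb, tb) - ub).pow(2).mean()
    return lam[0] * l_r + lam[1] * l_ic + lam[2] * l_bc
```

In practice, the collocation tensors passed as `xr` and `tr` must be created with `requires_grad=True` so that automatic differentiation can evaluate the residual.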
2.2. Inverse PINNs
PINNs can naturally be extended to address inverse problems, which involve inferring unknown quantities of a physical system from observed data. The unknown quantities are generally represented as parameters, functions, or other hidden physical properties included in the PDE. Inverse problems aim to determine the unknown quantities that are aligned with both the observed data and the governing equations defined in (1)–(3). Such problems arise in various fields, including material science, biomedical imaging, and fluid dynamics.
Consider a PDE with an unknown quantity $f$ defined by:

$$\mathcal{N}[u; f](x, t) = 0, \quad (x, t) \in D \times (0, T],$$

subject to the initial and boundary conditions (2) and (3), together with the additional data constraint

$$u(x, t) = u_d(x, t), \quad (x, t) \in D_d.$$

In this formulation, the additional constraint specifies that, in the data subdomain $D_d$, the solution $u$ must match the observed data $u_d$. This condition provides an extra supervision signal to guide the inversion process and ensure that the learned solution is consistent with the available measurements in $D_d$. Inverse PINNs approximate both $f$ and $u$ using neural networks $f_\theta$ and $u_\theta$, respectively, thereby ensuring that the governing PDE is satisfied while aligning the network’s output with the observed data in $D_d$.
To achieve this, the loss function for inverse PINNs builds upon the standard PINN loss by incorporating an additional data loss term [30]. The total loss function is expressed as:

$$\mathcal{L}(\theta) = \lambda_r \mathcal{L}_r(\theta) + \lambda_{ic} \mathcal{L}_{ic}(\theta) + \lambda_{bc} \mathcal{L}_{bc}(\theta) + \lambda_d \mathcal{L}_d(\theta),$$

where the initial condition loss $\mathcal{L}_{ic}$ and boundary condition loss $\mathcal{L}_{bc}$ are defined as in the forward PINN formulation. The residual loss term and the additional data loss term are given by:

$$\mathcal{L}_r(\theta) = \frac{1}{N_r} \sum_{i=1}^{N_r} \big| \mathcal{N}[u_\theta; f_\theta](x_r^i, t_r^i) \big|^2, \qquad
\mathcal{L}_d(\theta) = \frac{1}{N_d} \sum_{i=1}^{N_d} \big| u_\theta(x_d^i, t_d^i) - u_d(x_d^i, t_d^i) \big|^2,$$

where $u_d$ denotes the observed data at the sampled points $\{(x_d^i, t_d^i)\}_{i=1}^{N_d} \subset D_d$. The hyper-parameter $\lambda_d$ controls the influence of this data loss relative to the other loss components. By optimizing both $u_\theta$ and $f_\theta$, inverse PINNs seek a solution that respects the governing PDE while matching the observed data.
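A minimal sketch of the corresponding inverse objective follows, reusing the `MLP` and `grad` helpers from the previous sketch; the residual form $u_t - u_{xx} - f$ and all names are again illustrative assumptions:

```python
def inverse_pinn_loss(u_net, f_net, xr, tr, x0, u0, xb, tb, ub,
                      xd, td, ud, lam_d=1.0):
    """Forward PINN terms plus a data-misfit term on the subdomain D_d."""
    u = u_net(xr, tr)
    res = grad(u, tr) - grad(grad(u, xr), xr) - f_net(xr, tr)  # residual with unknown f
    l_r = res.pow(2).mean()
    l_ic = (u_net(x0, torch.zeros_like(x0)) - u0).pow(2).mean()
    l_bc = (u_net(xb, tb) - ub).pow(2).mean()
    l_d = (u_net(xd, td) - ud).pow(2).mean()     # supervision from observed data
    return l_r + l_ic + l_bc + lam_d * l_d
```

Note that the optimizer must update the parameters of both `u_net` and `f_net` jointly so that the source estimate and the solution estimate co-adapt.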
However, conventional inverse PINNs face a significant limitation: they do not explicitly account for the causal nature of dynamic systems. Temporal causality ensures that the predicted solution evolves naturally from the initial conditions and progresses consistently. Without such considerations, the model may converge to unstable or non-physical solutions, especially in time-dependent problems. Recent studies have proposed frameworks that integrate causal training mechanisms [23], which sequentially prioritize residuals to enforce causality. These advancements lay the foundation for extending inverse PINNs into causality-aware domains—a concept we explore further in this work.
2.3. Causal Training for Physics-Informed Neural Networks
Conventional PINN training can yield solutions that neglect temporal causality, which is crucial for dynamic systems. A causal training framework was introduced in [23] to address this issue, aligning the training process with the natural sequential structure of physical systems.
First, the training objective is reformulated to prioritize learning sequentially in time. Specifically, the residual loss function is modified by incorporating a weighting mechanism that emphasizes earlier time points. The basic loss function (4) is rewritten by separating the contributions of time and space as follows:

$$\mathcal{L}_r(\theta) = \frac{1}{N_t} \sum_{i=1}^{N_t} \mathcal{L}_r(t_i, \theta). \tag{5}$$

Here, $N_t$ and $N_x$ denote the number of temporal and spatial discretization points, respectively—that is, $N_t$ is the total number of time points sampled with $0 = t_1 < t_2 < \cdots < t_{N_t} = T$, and $N_x$ is the number of spatial grid points sampled within the domain $D$. The loss function for the spatial component at a given time $t_i$ becomes:

$$\mathcal{L}_r(t_i, \theta) = \frac{1}{N_x} \sum_{j=1}^{N_x} \big| \mathcal{N}[u_\theta](x_j, t_i) \big|^2. \tag{6}$$

Next, to enforce the inherent causality of the system, a time-dependent weighting function $w_i$ is introduced. This function assigns greater importance to earlier time steps so that the model focuses on minimizing residuals at these points before addressing later times:

$$w_i = \exp\left(-\epsilon \sum_{k=1}^{i-1} \mathcal{L}_r(t_k, \theta)\right), \quad i = 2, \dots, N_t, \qquad w_1 = 1. \tag{7}$$

Here, $\epsilon$ is the causality parameter controlling the weight decay. In [23], $\epsilon$ was experimentally chosen from the set $\{100, 10, 1, 0.1, 0.01, 0.001\}$. When applying $w_i$ to the loss function (6), the overall training objective becomes:

$$\mathcal{L}_r(\theta) = \frac{1}{N_t} \sum_{i=1}^{N_t} w_i\, \mathcal{L}_r(t_i, \theta). \tag{8}$$
Although this weighting mechanism is designed primarily for temporal causality, it can be extended to spatial dimensions. During training, introducing an analogous spatial weighting function can prioritize points closer to given conditions—such as boundary values or observed data. This spatial extension complements the temporal weighting mechanism and lays the groundwork for addressing both temporal and spatial causality, as further explored in the context of inverse problems.
Figure 1 illustrates the proposed CI-PINN algorithm.
Hyperparameter Selection: In all subsequent CI-PINN experiments, we utilize the causality parameter $\epsilon$ introduced above to control the decay behavior of the causal weights. To determine an optimal value for $\epsilon$, we experimentally evaluated candidate values from the set $\{10, 1, 0.1, 0.01, 0.001\}$. Candidates that resulted in insufficient propagation (i.e., the weights did not spread effectively from the observation and boundary regions) or in overly rapid convergence (i.e., the weights quickly collapsed to 1 across the domain) were excluded. Among the remaining candidates, we selected the value that produced the smallest relative error on the validation set as the optimal $\epsilon$. This systematic selection process ensures that the causal weights are balanced and effectively guide training toward stable and accurate reconstructions. The following sections illustrate how these ideas are applied to specific inverse problems.
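The selection procedure above can be summarized in the following sketch, where `train_ci_pinn` and `validation_rel_error` are hypothetical placeholders for the training loop and validation metric, and the numeric cutoffs are illustrative rather than values from our experiments:

```python
def select_epsilon(candidates=(10, 1, 0.1, 0.01, 0.001)):
    """Sweep the causality parameter and keep the best-validating value."""
    best_eps, best_err = None, float("inf")
    for eps in candidates:
        model, final_weights = train_ci_pinn(eps)      # hypothetical trainer
        if final_weights.min() < 1e-3:   # weights never spread: insufficient propagation
            continue
        if final_weights.min() > 0.99:   # weights collapsed to 1: overly rapid convergence
            continue
        err = validation_rel_error(model)              # hypothetical metric
        if err < best_err:
            best_eps, best_err = eps, err
    return best_eps
```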
Causal weights: In Section 2.3, we introduced the concept of causal weights using the temporal weights $w_i$ applied to Equation (6), focusing primarily on the temporal dimension. In this work, we adopt a more general representation for the causal weights, denoted $w$. Henceforth, we use $w$ to denote these causal weights in all scenarios. Each experiment section specifies how $w$ is constructed (e.g., temporal, spatial, or data causal weights) to emphasize critical regions such as boundaries, initial conditions, or observation points. Although the formulation of these weights is based on a relatively simple functional form, their effective integration into our inverse PINN framework poses several challenges. Unlike conventional applications, where a simple decay function is applied to a single scalar term, our method requires a coordinated balance among multiple causal weights across all dimensions. Determining an optimal value for $\epsilon$ is particularly demanding because the chosen parameter must allow supervision to propagate smoothly from regions with strong data support (e.g., boundaries and initial conditions) to areas with sparse information, all without compromising convergence or stability. In essence, while the mathematical basis for the weights is well established, combining them in a multidimensional setting to enforce the complex causal structure of physical systems necessitates careful analysis and calibration.
4. Inverse Source Problem for a Parabolic Equation
4.1. The Underlying Inverse Problem
To demonstrate the versatility of our CI-PINN framework, we consider an inverse source problem governed by a parabolic equation, as described in [34]. Parabolic equations are widely used to model diffusion processes such as heat conduction and pollutant dispersion. The governing equation for the parabolic system is given by:

$$u_t(x, t) - \Delta u(x, t) = f(x)\, g(t), \quad (x, t) \in D \times (0, T],$$
$$u(x, t) = b(x, t), \quad (x, t) \in \partial D \times (0, T],$$
$$u(x, 0) = u_0(x), \quad x \in D,$$

where $D$ is a bounded domain with a smooth boundary, $b$ denotes the boundary conditions, and $u_0$ represents the initial condition. For the forward problem, if the source function $f(x)\,g(t)$ and the initial and boundary data are all known, the problem is well posed in appropriate function spaces. In many practical applications, however, the internal source function $f$ is difficult to measure directly. Instead, one assumes the temporal factor $g$ is given in advance and recovers $f$ by using the final time measurement

$$u_T(x) := u(x, T), \quad x \in D.$$
Thus, the inverse problem is to recover the unknown spatial source $f$ (and, consequently, the entire solution $u$) from the final time data $u_T$. It is well known that this inverse source problem is inherently ill posed, meaning that small perturbations in $u_T$ can lead to significant errors in $f$. Nevertheless, when $u_T$ is noise free and sufficiently regular (for instance, belonging to a suitable Sobolev space), uniqueness and conditional stability results (see, e.g., [35,36,37]) ensure that a unique solution exists in a regularized sense. We consider a two-dimensional model with spatial variables $(x, y) \in D$ and $t \in (0, T]$. The exact solution of the parabolic problem is chosen so that it satisfies the zero initial condition $u(x, y, 0) = 0$ and homogeneous boundary conditions. By direct computation, this choice determines the exact spatial source term $f(x, y)$ and the known temporal source term $g(t)$, and hence the exact final time measurement $u_T(x, y) = u(x, y, T)$.
In our reconstruction scheme, both $u$ and $f$ are approximated by deep neural networks $u_\theta$ and $f_\theta$, respectively. We incorporate derivative terms of the residuals into the loss function to mitigate the inherent ill-posedness of the inverse source problem, thereby enforcing higher regularity in the solution. This approach stabilizes the inversion and yields improved reconstruction accuracy compared to standard least-squares loss functions. Our numerical experiments demonstrate that this framework achieves enhanced accuracy in recovering both the unknown source $f$ and the solution $u$.
4.2. Residuals and Network Smoothness
By using a smooth activation function such as tanh, we can assume that the neural network approximations $u_\theta$ and $f_\theta$ belong to $C^\infty$. Therefore, each network and its derivatives up to second order are uniformly bounded on $\overline{D} \times [0, T]$.
The PDE residual is defined as

$$r_{\mathrm{pde}}(x, y, t) = \partial_t u_\theta(x, y, t) - \Delta u_\theta(x, y, t) - f_\theta(x, y)\, g(t).$$

To enforce the boundary and initial conditions, we define:

$$r_{\mathrm{bc}}(x, y, t) = u_\theta(x, y, t) - b(x, y, t), \quad (x, y, t) \in \partial D \times (0, T], \qquad
r_{\mathrm{ic}}(x, y) = u_\theta(x, y, 0) - u_0(x, y),$$

and the data residual as

$$r_{\mathrm{d}}(x, y) = u_\theta(x, y, T) - u_T(x, y).$$
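Under the assumption that `u_net` and `f_net` follow the `MLP` sketch of Section 2.1 and that `grad` is the autodiff helper defined there, the PDE and data residuals above can be evaluated as follows; the argument names and the explicit `t_final` parameter are illustrative:

```python
def parabolic_residuals(u_net, f_net, g, x, y, t, xT, yT, uT, t_final):
    """r_pde = u_t - (u_xx + u_yy) - f(x, y) g(t), plus the data residual at t = T."""
    u = u_net(x, y, t)
    r_pde = (grad(u, t)
             - (grad(grad(u, x), x) + grad(grad(u, y), y))
             - f_net(x, y) * g(t))
    r_data = u_net(xT, yT, torch.full_like(xT, t_final)) - uT
    return r_pde, r_data
```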
4.3. Causal Weights
To enhance stability and accuracy, we incorporate causal weights, prioritizing learning in critical regions across the spatial dimensions x and y and the temporal dimension t. Given that boundary conditions, initial conditions, and observation data are specified, we define separate causal weights for each dimension.
Spatial weights for x and y: For each spatial coordinate, we define weights that emphasize the domain boundaries, in direct analogy with the temporal weights in Equation (7). For the $x$-dimension, supervision propagates inward from both boundaries:

$$w_x(x_i) = \max\left\{ \exp\left(-\epsilon \sum_{k<i} \mathcal{L}_r(x_k, \theta)\right),\ \exp\left(-\epsilon \sum_{k>i} \mathcal{L}_r(x_k, \theta)\right) \right\},$$

where $\mathcal{L}_r(x_k, \theta)$ denotes the residual loss restricted to the grid slice $x = x_k$. Similarly, for the $y$-dimension:

$$w_y(y_j) = \max\left\{ \exp\left(-\epsilon \sum_{k<j} \mathcal{L}_r(y_k, \theta)\right),\ \exp\left(-\epsilon \sum_{k>j} \mathcal{L}_r(y_k, \theta)\right) \right\}.$$

Temporal weights: For the time dimension, we define the causal weights to emphasize the initial time:

$$w_t(t_k) = \exp\left(-\epsilon \sum_{m<k} \mathcal{L}_r(t_m, \theta)\right).$$

Data weights: Since data are provided at $t = T$ in the time dimension, we define weights that propagate backward from this point:

$$w_d(t_k) = \exp\left(-\epsilon \sum_{m>k} \mathcal{L}_r(t_m, \theta)\right).$$

The overall causal weight for a grid point $(x_i, y_j, t_k)$ is then given by

$$w(x_i, y_j, t_k) = w_x(x_i)\, w_y(y_j)\, \max\{w_t(t_k),\ w_d(t_k)\}.$$
These weights are incorporated into the loss function to prioritize learning near boundaries, initial conditions, and observation data.
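A vectorized sketch of this construction is given below, assuming the per-slice residual losses along each axis are available as 1D tensors ordered by grid index; combining the two directional weights with an elementwise maximum follows the reconstruction above and is one plausible choice, not a unique one:

```python
def directional_weights(per_slice_loss, eps, reverse=False):
    """Cumulative-exponential weights along one grid axis; reverse=True
    propagates supervision from the far end of the axis (e.g., t = T)."""
    loss = per_slice_loss.flip(0) if reverse else per_slice_loss
    cum = torch.cumsum(loss, dim=0) - loss       # exclusive prefix sum
    w = torch.exp(-eps * cum)
    return (w.flip(0) if reverse else w).detach()

def combined_weights(loss_x, loss_y, loss_t, eps):
    """3D causal weights w(x_i, y_j, t_k) over the full (N_x, N_y, N_t) grid."""
    w_x = torch.maximum(directional_weights(loss_x, eps),
                        directional_weights(loss_x, eps, reverse=True))
    w_y = torch.maximum(directional_weights(loss_y, eps),
                        directional_weights(loss_y, eps, reverse=True))
    w_t = torch.maximum(directional_weights(loss_t, eps),                # from t = 0
                        directional_weights(loss_t, eps, reverse=True))  # from t = T
    return w_x[:, None, None] * w_y[None, :, None] * w_t[None, None, :]
```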
4.4. Loss Function
The standard loss function combines the residuals for the PDE, boundary, initial, and data constraints:

$$\mathcal{L}(\theta) = \frac{\lambda_r}{N_r} \sum_{i=1}^{N_r} w_i\, \big| r_{\mathrm{pde}}(x_i, y_i, t_i) \big|^2 + \lambda_{bc} \mathcal{L}_{bc}(\theta) + \lambda_{ic} \mathcal{L}_{ic}(\theta) + \lambda_d \mathcal{L}_d(\theta).$$

Following the modified loss proposed in [34], we further incorporate terms that involve time derivatives of the residual and additional quadrature weights, leading to

$$\mathcal{L}_{\mathrm{total}}(\theta) = \mathcal{L}(\theta) + \frac{\lambda_{r,t}}{N_r} \sum_{i=1}^{N_r} \tilde{w}_i\, \big| \partial_t r_{\mathrm{pde}}(x_i, y_i, t_i) \big|^2,$$

where $w_i$ and $\tilde{w}_i$ are the causal weights for the PDE residual and for the PDE residual differentiated with respect to $t$, respectively. These additional terms help enforce the PDE and its temporal derivative, improving overall reconstruction accuracy.
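Assuming the residual tensor was built with `create_graph=True` so it can be differentiated again, the augmented term can be assembled as in the following sketch, with `w` and `w_tilde` being the causal weight tensors described above:

```python
def augmented_residual_loss(r_pde, t, w, w_tilde):
    """Causal-weighted residual plus its time derivative (Section 4.4)."""
    r_t = torch.autograd.grad(r_pde, t, torch.ones_like(r_pde),
                              create_graph=True)[0]
    return (w * r_pde.pow(2)).mean() + (w_tilde * r_t.pow(2)).mean()
```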
4.5. Numerical Experiments
We conduct experiments by adopting the setup in [34]. Two separate neural networks approximate $u$ and $f$. Both networks are fully connected with two hidden layers of 20 neurons each and use the tanh activation function.
Collocation points are sampled uniformly from $D \times (0, T]$. Boundary points are chosen along $\partial D \times (0, T]$, initial points are taken at $t = 0$, and data points correspond to the observation domain at $t = T$. This diversified sampling ensures that the model learns from all critical regions.
The learning rate decays by a factor of 10 at fixed iteration intervals to facilitate convergence. In our experiments, the optimal value of $\epsilon$ was determined to be 0.1. The loss function incorporates the causal weights, emphasizing critical regions near the observation data at $t = T$ and along the boundaries.
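The following sketch summarizes this training setup, reusing the `MLP` class from Section 2.1; `total_ci_pinn_loss` is a hypothetical wrapper around the loss of Section 4.4, and the initial learning rate, decay interval, and iteration count are illustrative assumptions:

```python
u_net = MLP(in_dim=3, width=20, depth=2)   # approximates u(x, y, t)
f_net = MLP(in_dim=2, width=20, depth=2)   # approximates f(x, y)
opt = torch.optim.Adam(list(u_net.parameters()) + list(f_net.parameters()),
                       lr=1e-3)            # initial rate is an assumption
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=5000, gamma=0.1)

for it in range(20_000):                   # iteration count is illustrative
    opt.zero_grad()
    loss = total_ci_pinn_loss(u_net, f_net, eps=0.1)   # hypothetical wrapper
    loss.backward()
    opt.step()
    sched.step()
```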
Figure 5 shows the reconstruction accuracy. The upper panels compare the exact source term $f$ with its reconstruction obtained using CI-PINNs, showing excellent agreement. The lower left panel illustrates the absolute error between the exact and predicted solutions, with only minor discrepancies in regions where observation data are sparse. Notably, the lower right panel shows the relative error convergence: conventional PINNs converge to a final error of 0.0855, causal PINNs reduce this to 0.0673, and CI-PINNs further lower it to 0.0397, an improvement of approximately 53.6% over conventional PINNs. This substantial improvement confirms that integrating 3D causal weights significantly enhances reconstruction accuracy. Moreover, despite the cubic growth in the number of 3D causal weights, the computational overhead remains minimal: the iteration speed of CI-PINNs is comparable to that of conventional PINNs, indicating that the improved stability and accuracy do not come at a steep computational cost (Table 1).
Figure 6 visualizes the evolution of the 3D causal weights. Each column represents a different time step, and each row corresponds to a training epoch (e.g., 400, 800, 1200, 1600). Initially, the weights are concentrated on the boundaries, initial conditions, and observation regions; as training progresses, they gradually propagate into the interior, enabling the network to capture both the local and global features of the solution.
Overall, the experimental results validate that CI-PINNs significantly outperform conventional PINNs for the inverse source problem of a parabolic equation. The effective integration of 3D causal weights, which balance the contributions from the spatial and temporal domains, results in faster convergence and improved accuracy in determining both the solution $u$ and the unknown spatial source term $f$.
6. Conclusions
In this paper, we introduced a novel framework called Causal Inverse PINNs (CI-PINNs), marking a significant advancement in physics-informed machine learning for solving inverse problems governed by partial differential equations. By integrating multidirectional causal weights into the loss function, our approach prioritizes learning in supervised regions such as initial conditions, boundary conditions, and observational data, thereby capturing the inherent causal structure of physical systems. This strategy directly addresses one of the key limitations of conventional PINNs, resulting in enhanced stability and improved reconstruction accuracy.
Comprehensive numerical experiments on the wave equation and on inverse source problems for parabolic and elliptic equations demonstrated that CI-PINNs significantly outperform standard methods in convergence behavior and solution fidelity. The framework’s ability to enforce the natural causal progression of the system enables more accurate prediction of unknown parameters and solutions, even in sparse data scenarios. These results underscore the robustness and generalizability of our proposed method across various types of inverse problems.
While incorporating causal weights represents a major innovation, the development and application of CI-PINNs also present several practical challenges. In particular, careful tuning of the causality parameter $\epsilon$ is essential, as suboptimal settings can impede effective supervision propagation or lead to premature convergence. Additionally, the extra computational overhead of calculating multidimensional causal weights poses a challenge when extending the approach to high-dimensional or more complex systems. Another notable difficulty arises from applying the method to diverse domain geometries and configurations. The performance and stability of CI-PINNs can vary significantly depending on the domain shape, boundary conditions, and data distribution, indicating that further investigation is needed to adapt and optimize the framework for a broader range of practical scenarios.
In summary, CI-PINNs not only establish a new benchmark for stability and accuracy in solving inverse problems but also offer a flexible and scalable framework that can be adapted to a wide range of scientific and engineering applications. We believe this work paves the way for diverse future research directions, including developing adaptive weighting strategies, integrating uncertainty quantification techniques, and extending the framework to tackle nonlinear multi-physics systems, more complex domain geometries, and realistic noisy data scenarios.