The combination of multiple loss functions plays a significant role in the convergence of PINNs [10]. The most common way to combine the losses of the individual constraints is a weighted summation, but existing weighting schemes are either nonadaptive or require training the network many times at an increased computational cost. Here, we propose a simple procedure using fully trainable weights. It is in line with the idea of neural network adaptation: the dynamic weights in the loss function are updated together with the network parameters through backpropagation.
3.1. Dynamic Weights Strategy for Physics-Informed Neural Networks
We define the residuals as the left-hand sides of Equations (1a) and (1b) and proceed by approximating the unknown solution with a neural network. This assumption, together with Equation (2b), results in two physical constraints: the first indicates that the numerical solution satisfies the conservation of momentum, and the second indicates that the numerical solution satisfies the conservation of mass. The physical constraints of the network are obtained by applying the chain rule for differentiating compositions of functions using automatic differentiation. In order to balance the training of the residuals in each part of the loss, we multiply each residual term of the PINNs loss function by a trainable weight. The objective function is defined as the weighted sum of the individual loss terms,
where the newly introduced balance parameters weight the mean squared errors computed over the observed data (if any), the initial and boundary training data, and the collocation points at which the residuals are enforced, each normalized by the corresponding number of data points. The initial and boundary training data are sampled randomly on the corresponding boundaries, and the collocation points are selected by Latin hypercube sampling. We determine the unknown parameters by minimizing the objective with respect to the network parameters while maximizing it with respect to the balance parameters.
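Because the display equations are not reproduced here, the following is only a minimal sketch of a weighted PINN objective of this general form; the symbols $\lambda_u,\lambda_0,\lambda_b,\lambda_f$, the loss terms $\mathcal{L}_\bullet$, and the point counts $N_\bullet$ are illustrative notation rather than the paper's exact formulas:
\begin{equation*}
\mathcal{L}(\theta;\lambda)=\lambda_u\,\mathcal{L}_u(\theta)+\lambda_0\,\mathcal{L}_0(\theta)+\lambda_b\,\mathcal{L}_b(\theta)+\lambda_f\,\mathcal{L}_f(\theta),
\qquad
\mathcal{L}_\bullet(\theta)=\frac{1}{N_\bullet}\sum_{i=1}^{N_\bullet}\big|r_\bullet\big(x_\bullet^{\,i};\theta\big)\big|^{2},
\end{equation*}
where $r_\bullet$ denotes the data misfit, the initial/boundary misfit, or the PDE residual evaluated at the corresponding points, and the parameters are sought from the saddle-point problem
\begin{equation*}
\min_{\theta}\;\max_{\lambda\ge 0}\;\mathcal{L}(\theta;\lambda).
\end{equation*}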
This can be accomplished by a gradient descent/ascent procedure: the network parameters are updated by descending the gradient of the objective with a learning rate for the kth step, while the balance parameters are updated by ascending their gradient with a separate learning rate for the kth step. Considering one of the dynamic weights, to fix ideas, we see that the gradient of the objective with respect to it is simply the (non-negative) mean squared error of the corresponding loss term, so each ascent step increments the weight by the learning rate times that error.
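As an illustration, and again in assumed rather than original notation, descent/ascent updates of this kind take the form
\begin{equation*}
\theta^{k+1}=\theta^{k}-\eta_k\,\nabla_{\theta}\mathcal{L}\big(\theta^{k};\lambda^{k}\big),
\qquad
\lambda_\bullet^{k+1}=\lambda_\bullet^{k}+\rho_k\,\nabla_{\lambda_\bullet}\mathcal{L}\big(\theta^{k};\lambda^{k}\big)
=\lambda_\bullet^{k}+\rho_k\,\mathcal{L}_\bullet\big(\theta^{k}\big).
\end{equation*}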
The sequence of weights is therefore monotonically nondecreasing, provided that each weight is initialized to a non-negative value. Furthermore, (7) shows that the magnitude of the gradient, and therefore of the update, is larger when the corresponding mean squared error is large. This progressively penalizes the network more for not fitting the initial points closely. Notice that any of the weights can be set to fixed, non-trainable values if desired; for example, by fixing some of them, only the weights of the initial and collocation points would be trained. If necessary, the weights can also be changed to other types of functions. The convergence of the weight sequence plays an important role in the stability of the Min system. Next, we prove the convergence of the weight sequence. From (6) we obtain a recursion for each weight, and according to (7) the successive increments shrink as the corresponding mean squared error decreases during training. Therefore, by the contraction mapping principle, the weight sequence is convergent and has an upper bound. The analysis for the remaining weights is identical.
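For intuition only, under the illustrative ascent update sketched above, telescoping the recursion gives
\begin{equation*}
\lambda_\bullet^{K}=\lambda_\bullet^{0}+\sum_{k=0}^{K-1}\rho_k\,\mathcal{L}_\bullet\big(\theta^{k}\big),
\end{equation*}
so the nondecreasing weight sequence converges to a finite limit whenever the accumulated increments $\rho_k\,\mathcal{L}_\bullet(\theta^{k})$ are summable, for example when the mean squared error decreases sufficiently fast over the course of training.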
Remark 1. From Formula (8), it can be seen that the convergence of the weight sequence depends on the monotonic decrease in the corresponding mean squared error. The mean squared error decreases monotonically in theory, so the weight sequence is theoretically convergent. In actual training, however, the mean squared error is not strictly monotonically decreasing, which may slightly degrade the performance of our method. To overcome this problem as far as possible, Algorithm 1 is adopted: to strengthen the condition for the balance-weight update, we update a weight only when the corresponding mean squared error has decreased. This keeps the weight sequence satisfying the convergence conditions as far as possible and reduces the fluctuations that would otherwise adversely affect our method.

In our implementation of dwPINNs, we use TensorFlow with a fixed number of iterations of Adam followed by another fixed number of iterations of the L-BFGS quasi-Newton method [27,28]. This is consistent with the PINNs formulation in [9], as well as the follow-up literature [11]. The adaptive weights are only updated during the Adam training steps and are held constant during L-BFGS training. The dwPINNs algorithm is summarized in Algorithm 1.
Algorithm 1: Dynamic weights strategy for PINNs
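Since the algorithm box itself is not reproduced above, the following is a minimal TensorFlow sketch of one Adam training step with dynamically updated weights in the spirit of this procedure. The network architecture, the illustrative Burgers-type residual, the helper pde_residual, and the gating condition on the weight update are assumptions made for the sketch, not the paper's exact implementation; only initial, boundary, and collocation terms are shown.

import tensorflow as tf

# Fully connected network u_theta(t, x); the architecture is an illustrative choice.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])

# Trainable balance weights, one per loss term, initialized to non-negative values.
lam_0 = tf.Variable(1.0)   # initial-condition weight
lam_b = tf.Variable(1.0)   # boundary weight
lam_f = tf.Variable(1.0)   # PDE-residual (collocation) weight

opt_theta = tf.keras.optimizers.Adam(1e-3)   # descent on the network parameters
opt_lam = tf.keras.optimizers.Adam(1e-2)     # ascent on the balance weights

def pde_residual(t, x):
    """Residual of an illustrative PDE, u_t + u * u_x = 0, via automatic differentiation."""
    with tf.GradientTape(persistent=True) as tape:
        tape.watch([t, x])
        u = model(tf.concat([t, x], axis=1))
    u_t = tape.gradient(u, t)
    u_x = tape.gradient(u, x)
    return u_t + u * u_x

# Previous value of each loss term, used to gate the weight update (assumed condition).
prev_mse = [tf.Variable(1e9), tf.Variable(1e9), tf.Variable(1e9)]

def train_step(t0, x0, u0, tb, xb, ub, tc, xc):
    """One eager Adam step: descent on theta, gated ascent on the balance weights."""
    with tf.GradientTape(persistent=True) as tape:
        mse_0 = tf.reduce_mean(tf.square(model(tf.concat([t0, x0], axis=1)) - u0))
        mse_b = tf.reduce_mean(tf.square(model(tf.concat([tb, xb], axis=1)) - ub))
        mse_f = tf.reduce_mean(tf.square(pde_residual(tc, xc)))
        loss = lam_0 * mse_0 + lam_b * mse_b + lam_f * mse_f
    # Gradient descent on the network parameters.
    grads = tape.gradient(loss, model.trainable_variables)
    opt_theta.apply_gradients(zip(grads, model.trainable_variables))
    # Gradient ascent on each balance weight (negated gradient), applied only when
    # the corresponding mean squared error has decreased since the previous step.
    for lam, mse, prev in zip([lam_0, lam_b, lam_f], [mse_0, mse_b, mse_f], prev_mse):
        if mse < prev:
            opt_lam.apply_gradients([(-tape.gradient(loss, lam), lam)])
        prev.assign(mse)
    return loss

In a full run, train_step would be called with float32 tensors of shape (N, 1) for the fixed number of Adam iterations; the balance weights would then be frozen before switching to the L-BFGS stage, consistent with the description above.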
3.2. A Brief Note on the Errors Involved in the dwPINNs Methodology
Consider the family of functions that can be represented by the chosen neural network and let u be the exact solution of the PDE. We define the best approximation to u within this family, the solution of the network at a global minimum of the training loss, and the solution of the network at a local minimum. The total error therefore consists of an approximation error, an optimization error, and a generalization error. In PINNs, the number and location (distribution) of the residual points are two important factors that affect the generalization error [13]. The optimization error is introduced by the complexity of the loss function, and the performance of PINNs is closely related to an appropriate combination of the loss terms, which may help avoid poor local minima. Thus, the total error in PINNs can be bounded by the sum of these three errors.
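A sketch of this standard decomposition, in generic notation ($u_{\mathcal{F}}$, $u_T$, and $\tilde{u}_T$ are illustrative names for the best approximation, the global-minimum solution, and the local-minimum solution, respectively), follows from the triangle inequality:
\begin{equation*}
\big\|\tilde{u}_T-u\big\|\;\le\;
\underbrace{\big\|u_{\mathcal{F}}-u\big\|}_{\text{approximation error}}
+\underbrace{\big\|\tilde{u}_T-u_T\big\|}_{\text{optimization error}}
+\underbrace{\big\|u_T-u_{\mathcal{F}}\big\|}_{\text{generalization error}}.
\end{equation*}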