3.2.2. Training of DBNN Model

The dataset in this study is obtained by the spatio-temporal matching of CYGNSS and SMAP data from May to September 2020, where the CYGNSS data provide the DDMs and the seven features listed in Table 3, and the SMAP data offer vegetation information and the target label, which is derived from the SMAP-soil-moisture-based classification of inundated versus non-inundated areas [43]. In this study, 50,000 samples were randomly selected from the dataset as the sample set of the DBNN model, and the sample set was divided into training, validation, and testing subsets at a ratio of 80%, 15%, and 5%, respectively [15]. These subsets provide sufficient data to train the network, tune the hyper-parameters, and evaluate its performance. In addition, all remaining samples in the dataset are predicted by the trained DBNN model for the flood monitoring inversion.
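As an illustration, the random sampling and splitting step might look as follows in Python; the dataset size, variable names, and random seed are placeholders, since the paper does not specify these implementation details.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Placeholder size of the full matched CYGNSS/SMAP dataset; in practice each
# record carries a DDM, the seven features of Table 3, vegetation
# information, and the SMAP-derived inundation label.
n_dataset = 200_000

# Randomly draw the 50,000-sample set used for the DBNN model
sample_idx = rng.choice(n_dataset, size=50_000, replace=False)
rng.shuffle(sample_idx)

# Split into training (80%), validation (15%), and testing (5%) subsets
n_train = int(0.80 * sample_idx.size)
n_val = int(0.15 * sample_idx.size)
train_idx = sample_idx[:n_train]
val_idx = sample_idx[n_train:n_train + n_val]
test_idx = sample_idx[n_train + n_val:]
```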

The forward propagation of information and the back-propagation of errors are adopted to train the DBNN model. During forward propagation, the parameters of the neural network are held constant, whereas during back-propagation they are updated automatically by the Adam optimizer so as to minimize the loss function. The loss function of the model is given in Equation (5):

$$L = -\frac{1}{M} \sum\_{m=1}^{M} \sum\_{k=1}^{K} y\_m^k \log\left(h\_{\theta}(\mathbf{x}\_m, k)\right) \tag{5}$$

where $M$ is the number of training samples in each round; $K$ is the number of classes, set to 2 in this study; $y\_m^k$ is the target label of training sample $m$ for class $k$, derived from the SMAP-based classification of inundated versus non-inundated areas [43]; $\mathbf{x}\_m$ is the input of training sample $m$; and $h\_{\theta}$ is the neural network model with weights $\theta$, whose output $h\_{\theta}(\mathbf{x}\_m, k)$ is the predicted probability that sample $m$ belongs to class $k$. In each round of training, the weights $\theta$ of the DBNN model are updated according to the following formula:
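A minimal Python sketch of Equation (5) is given below, assuming the network outputs class probabilities (e.g., via a softmax output layer); the function name and the tiny worked batch are illustrative only.

```python
import numpy as np

def cross_entropy_loss(probs, targets):
    """Mean cross-entropy over a batch, as in Equation (5).

    probs:   (M, K) network outputs h_theta(x_m, k), assumed to be
             class probabilities (e.g., after a softmax layer).
    targets: (M, K) one-hot labels y_m^k from the SMAP-derived classes.
    """
    eps = 1e-12  # guard against log(0)
    return -np.mean(np.sum(targets * np.log(probs + eps), axis=1))

# Tiny worked example with M = 2 samples and K = 2 classes
probs = np.array([[0.9, 0.1], [0.3, 0.7]])
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
print(cross_entropy_loss(probs, targets))  # ≈ 0.231
```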

$$\begin{cases} g = \nabla\_{\theta} L \\ m = \beta\_1 m + (1 - \beta\_1) g \\ s = \beta\_2 s + (1 - \beta\_2) g^2 \\ \hat{m} = \dfrac{m}{1 - \beta\_1^t} \\ \hat{s} = \dfrac{s}{1 - \beta\_2^t} \\ \theta = \theta - \eta \hat{m} / \sqrt{\hat{s} + \varepsilon} \end{cases} \tag{6}$$

where $\nabla\_{\theta}$ and $g$ denote the gradient operator and the gradient of the loss function $L$ with respect to the parameters $\theta$; $\eta$ is the learning rate, set to the default value of 0.001; $m$ and $s$ denote the first- and second-order moment variables, respectively (both initialized to 0); $\beta\_1$ and $\beta\_2$ represent the exponential decay coefficients of $m$ and $s$; $\hat{m}$ and $\hat{s}$ are the corresponding bias-corrected moment estimates; $\varepsilon$ is a small constant, set to $10^{-6}$, that prevents division by zero; and $t$ is the number of iterations, set to 300 in this experiment.
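One Adam update of Equation (6) can be written in a few lines of Python; the decay coefficients $\beta\_1 = 0.9$ and $\beta\_2 = 0.999$ used below are the commonly adopted defaults and are assumptions here, since the paper does not state their values.

```python
import numpy as np

def adam_step(theta, grad, m, s, t,
              eta=0.001, beta1=0.9, beta2=0.999, eps=1e-6):
    """One Adam parameter update, following Equation (6).

    theta: model parameters; grad: gradient g of the loss w.r.t. theta;
    m, s:  first- and second-order moment variables (initialized to 0);
    t:     1-based iteration count used by the bias corrections.
    """
    m = beta1 * m + (1 - beta1) * grad           # first-moment update
    s = beta2 * s + (1 - beta2) * grad**2        # second-moment update
    m_hat = m / (1 - beta1**t)                   # bias-corrected first moment
    s_hat = s / (1 - beta2**t)                   # bias-corrected second moment
    theta = theta - eta * m_hat / np.sqrt(s_hat + eps)
    return theta, m, s
```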

Specifically, the training process of the proposed DBNN model can be described as follows: (1) initialize the network weights $\theta$ and the moment variables $m$ and $s$; (2) forward-propagate a batch of $M$ training samples through the network to obtain the predictions $h\_{\theta}(\mathbf{x}\_m, k)$; (3) compute the cross-entropy loss of Equation (5); (4) back-propagate the error and update the weights with the Adam rule of Equation (6); (5) repeat steps (2)–(4) for $t = 300$ iterations while monitoring the loss on the validation subset; and (6) evaluate the trained model on the testing subset.
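Putting these steps together, the loop below is a minimal, self-contained sketch in Python; a single softmax layer stands in for the DBNN (whose dual-branch architecture is not reproduced here), and the batch size, decay coefficients, and synthetic data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy stand-in for the DBNN: one softmax layer over 7 input features,
# purely to illustrate the training loop structure.
M, n_features, K = 64, 7, 2
theta = rng.normal(scale=0.01, size=(n_features, K))
m = np.zeros_like(theta)                      # first-moment variable
s = np.zeros_like(theta)                      # second-moment variable
beta1, beta2, eta, eps = 0.9, 0.999, 0.001, 1e-6  # beta1/beta2 assumed

x = rng.normal(size=(M, n_features))          # placeholder input batch
y = np.eye(K)[rng.integers(0, K, size=M)]     # placeholder one-hot labels

for t in range(1, 301):                       # 300 iterations, as in the text
    # Forward propagation: softmax probabilities
    logits = x @ theta
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.mean(np.sum(y * np.log(p + 1e-12), axis=1))  # Equation (5)
    # Back-propagation and Adam update, Equation (6)
    g = x.T @ (p - y) / M
    m = beta1 * m + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    s_hat = s / (1 - beta2**t)
    theta -= eta * m_hat / np.sqrt(s_hat + eps)
```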

