*2.1. Traditional DBN Method*

The DBN can be approximated as a stack of restricted Boltzmann machines (RBMs) [28], as shown in Figure 1a. An RBM is a generative stochastic network containing a visible layer $v = \{v_i\}_{i=1}^{N}$ and a hidden layer $h = \{h_j\}_{j=1}^{M}$, with parameters $\theta = \{W, d, c\}$. The energy function and the joint probability distribution of the RBM can be stated as

$$E(v, h) = -\sum\_{i,j} v\_i W\_{ij} h\_j - \sum\_i d\_i v\_i - \sum\_j c\_j h\_j, \tag{1}$$

$$P(v,h) = \frac{1}{Z} \exp(-E(v,h)),\tag{2}$$

where $v_i \in \{0, 1\}$ and $h_j \in \{0, 1\}$, $W = (W_{ij}) \in \mathbb{R}^{N \times M}$ is the weight matrix connecting the visible and hidden layers, $d$ and $c$ are the bias vectors of the visible and hidden layers, respectively, and $Z$ is the partition function. Moreover, the conditional probabilities $P(v \mid h)$ and $P(h \mid v)$ can be calculated by

$$P(v\_i = 1 | h) = \sigma \left(\sum\_j W\_{ij} h\_j + d\_i\right),\tag{3}$$

$$P(h\_j = 1 | v) = \sigma \left(\sum\_i W\_{ij} v\_i + c\_j\right),\tag{4}$$

where $\sigma(\cdot)$ is the sigmoid function, defined as $\sigma(x) = 1/(1 + e^{-x})$.
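
As an illustration of Equations (1)–(4), the following minimal sketch builds a toy RBM that is small enough for the partition function $Z$ to be enumerated exactly, and then compares the closed-form conditionals with the ones obtained directly from the joint table. The sizes `N = 3` and `M = 2`, the random parameters, and the helper names (`energy`, `all_binary`) are assumptions made only for this example; exact enumeration is not feasible for realistic layer sizes.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, d, c):
    """Eq. (1): E(v, h) = -v^T W h - d^T v - c^T h."""
    return -(v @ W @ h) - d @ v - c @ h

def all_binary(n):
    """All 2^n binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0.0, 1.0], repeat=n)))

rng = np.random.default_rng(0)
N, M = 3, 2                       # toy sizes (assumed) so that Z can be enumerated
W = rng.normal(size=(N, M))       # weights, visible x hidden
d = rng.normal(size=N)            # visible biases
c = rng.normal(size=M)            # hidden biases

V, H = all_binary(N), all_binary(M)

# Eq. (2): unnormalised weights exp(-E) for every (v, h) pair, then Z.
unnorm = np.array([[np.exp(-energy(v, h, W, d, c)) for h in H] for v in V])
Z = unnorm.sum()
P = unnorm / Z                    # joint table, shape (2^N, 2^M)

# Eq. (4): P(h_j = 1 | v) from the joint table versus the sigmoid form.
v_idx = 5                         # arbitrary visible configuration
p_h_given_v = P[v_idx] / P[v_idx].sum()
p_h1_joint = H.T @ p_h_given_v
p_h1_eq4 = sigmoid(V[v_idx] @ W + c)

# Eq. (3): P(v_i = 1 | h) from the joint table versus the sigmoid form.
h_idx = 2                         # arbitrary hidden configuration
p_v_given_h = P[:, h_idx] / P[:, h_idx].sum()
p_v1_joint = V.T @ p_v_given_h
p_v1_eq3 = sigmoid(W @ H[h_idx] + d)

print(np.allclose(p_h1_joint, p_h1_eq4), np.allclose(p_v1_joint, p_v1_eq3))
```

The agreement between the two computations reflects the fact that, given one layer, the units of the other layer are conditionally independent, which is why the conditionals factorize into per-unit sigmoids.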

**Figure 1.** (**a**) The structure of the DBN, which can be approximated as a stack of RBMs. (**b**) The feedforward architecture of the *j*-th hidden neuron. (**c**) The reconstruction architecture of the *j*-th hidden neuron.

A DBN with $\lambda$ hidden layers contains $\lambda$ connection weight matrices $W_1, W_2, \ldots, W_\lambda$ and $\lambda + 1$ bias vectors $d_0, d_1, \ldots, d_\lambda$, where $d_0$ is the bias of the visible layer $v$. The output probability is then computed from the hidden vector $h_{\lambda-1}$ as $\Phi = \sigma(W_\lambda h_{\lambda-1} + d_\lambda)$.
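
The layer-wise computation of the output probability can be sketched as a deterministic upward pass. This is a minimal sketch under stated assumptions: each weight matrix is stored with shape (inputs × outputs), matching the $N \times M$ convention of Equation (1), so the product is written `h @ W_l`; the layer sizes and the function name `dbn_forward` are hypothetical; and activations are propagated as probabilities rather than sampled binary states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbn_forward(v, weights, biases):
    """Propagate a visible vector upward through the stacked layers.

    weights: list of lambda matrices, one per hidden layer
    biases:  list of lambda + 1 vectors; biases[0] belongs to the visible
             layer and is not used in the upward pass
    """
    h = v
    for W_l, d_l in zip(weights, biases[1:]):
        h = sigmoid(h @ W_l + d_l)
    return h

rng = np.random.default_rng(0)
sizes = [6, 4, 3, 2]              # visible layer, then 3 hidden layers (assumed)
weights = [rng.normal(scale=0.1, size=(a, b))
           for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes]

v = rng.integers(0, 2, size=sizes[0]).astype(float)
phi = dbn_forward(v, weights, biases)
print(phi.shape)
```

With three hidden layers the sketch holds three weight matrices and four bias vectors, in line with the count given in the text.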

Figure 1b,c show the feedforward and reconstruction architectures of the *j*-th hidden neuron, respectively. The RBM parameters $\theta = \{W, d, c\}$ are updated by a stochastic approximation method that maximizes the likelihood $P_\theta(v)$.
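
The text does not specify which stochastic approximation procedure is used. As one common choice, the sketch below uses one-step contrastive divergence (CD-1), which approximates the log-likelihood gradient (the expectation of $v_i h_j$ under the data minus its expectation under the model) by replacing the model expectation with a single Gibbs reconstruction. The function name `cd1_update`, the learning rate, the batch size, and the random training batch are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p, rng):
    """Draw binary states with success probabilities p."""
    return (rng.random(p.shape) < p).astype(float)

def cd1_update(v0, W, d, c, lr, rng):
    """One CD-1 step on a batch v0 of shape (batch, N)."""
    # Positive phase: hidden probabilities given the data, Eq. (4).
    ph0 = sigmoid(v0 @ W + c)
    h0 = sample_bernoulli(ph0, rng)
    # Negative phase: reconstruct the visible layer, Eq. (3), then Eq. (4) again.
    pv1 = sigmoid(h0 @ W.T + d)
    v1 = sample_bernoulli(pv1, rng)
    ph1 = sigmoid(v1 @ W + c)
    # Gradient estimate: data statistics minus reconstruction statistics.
    batch = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    d = d + lr * (v0 - v1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return W, d, c

rng = np.random.default_rng(0)
N, M, B = 6, 4, 8                 # toy sizes (assumed)
W = rng.normal(scale=0.01, size=(N, M))
d = np.zeros(N)
c = np.zeros(M)
data = rng.integers(0, 2, size=(B, N)).astype(float)  # placeholder binary batch

for _ in range(100):
    W, d, c = cd1_update(data, W, d, c, lr=0.1, rng=rng)
```

CD-1 gives a biased estimate of the true gradient; other stochastic approximation schemes, such as running a persistent Gibbs chain across updates, follow the same positive-phase and negative-phase structure.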
