**Appendix A**

According to the chain rule, *<sup>H</sup><sup>s</sup>* in (31) is:

$$\begin{aligned} \mathrm{d}\Gamma_C(\mathbf{X})/\mathrm{d}\mathbf{x}_i &= \left[\mathrm{d}\Gamma_C(\mathbf{X})/\mathrm{d}M(D_{l-1}(\dots(M(D_1(\mathbf{X})))\dots))\right] \times \dots \times \left[\mathrm{d}D_1(\mathbf{X})/\mathrm{d}\mathbf{X}\right] \times \left[\mathrm{d}\mathbf{X}/\mathrm{d}\mathbf{x}_i\right] \\ &= \mathbf{w}_l \left[\mathrm{d}_{\mathbf{x}} M(D_{l-1}(\dots(M(D_1(\mathbf{X})))\dots))\right] \times \mathbf{w}_{l-1} \times \dots \times \mathbf{w}_1^i \end{aligned} \tag{A1}$$

To simplify (A1), the following functions are defined:

$$
\mathbf{w}_1^i = \mathbf{w}_1 \mathbf{e}_i \tag{A2}
$$

$$A_l(\mathbf{X}) = \mathrm{d}_{\mathbf{x}} M(D_l(\dots(M(D_1(\mathbf{X})))\dots)) \times \mathbf{w}_l \tag{A3}$$

$$\prod_{i=l_1}^{l_2} A_i(\mathbf{X}) = \begin{cases} A_{l_1} \times \dots \times A_{l_2}, & \text{if } l_2 < l_1 < l, \\ A_{l_2} \times \dots \times A_{l_1}, & \text{if } l_1 < l_2 < l, \\ 1, & \text{if } l_2 < l \le l_1, \\ A_{l_1}, & \text{if } l_1 = l_2 \end{cases} \tag{A4}$$
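
As a quick check of the ordering convention in (A4), the following worked instance assumes a network with $l = 4$ (a value chosen only for illustration). Reading the lower limit as $l_1$ and the upper limit as $l_2$, the four cases give:

$$\prod_{i=3}^{2} A_i(\mathbf{X}) = A_3 \times A_2, \qquad \prod_{i=2}^{3} A_i(\mathbf{X}) = A_3 \times A_2, \qquad \prod_{i=4}^{3} A_i(\mathbf{X}) = 1, \qquad \prod_{i=2}^{2} A_i(\mathbf{X}) = A_2$$

Under this reading, either ordering of the limits places the factors in descending layer index, and a product whose lower limit reaches $l$ is empty.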

where $\mathrm{d}_{\mathbf{x}} M(D_l(\dots(\mathbf{X})\dots))$ in (A3) can be derived from the following matrix formulation:

$$\mathrm{d}_{\mathbf{x}} M(D_l(\dots(\mathbf{X})\dots)) = \begin{bmatrix} \mathrm{d}_{D_l^1(\dots(\mathbf{X})\dots)} M(D_l^1(\dots(\mathbf{X})\dots)) & 0 & \dots & 0 \\ 0 & \mathrm{d}_{D_l^2(\dots(\mathbf{X})\dots)} M(D_l^2(\dots(\mathbf{X})\dots)) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \mathrm{d}_{D_l^{n_l}(\dots(\mathbf{X})\dots)} M(D_l^{n_l}(\dots(\mathbf{X})\dots)) \end{bmatrix} \tag{A5}$$

$$\begin{aligned} D_l(\dots(\mathbf{X})\dots) &= \left[D_l^1(\dots(\mathbf{X})\dots); \dots; D_l^{n_l}(\dots(\mathbf{X})\dots)\right] \in \mathbb{R}^{n_l \times 1}, \\ D_l^k(\dots(\mathbf{X})\dots) &= \mathbf{w}_l^k M(D_{l-1}(\dots(\mathbf{X})\dots)) + b_l^k \end{aligned} \tag{A6}$$
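
The structure of (A5) and (A6) can be illustrated numerically. The sketch below is not taken from the paper: it assumes `tanh` as the activation $M$, uses random weights, and the names `hidden_layer` and `activation_jacobian` are hypothetical. It builds one layer row by row as in (A6) and forms the diagonal derivative matrix of (A5).

```python
import numpy as np

def tanh_prime(d):
    # Derivative of the assumed activation M = tanh.
    return 1.0 - np.tanh(d) ** 2

def hidden_layer(w, b, d_prev, act=np.tanh):
    # (A6) in stacked form: D_l = w_l M(D_{l-1}) + b_l.
    return w @ act(d_prev) + b

def activation_jacobian(d, act_prime=tanh_prime):
    # (A5): M acts elementwise, so its derivative is a diagonal matrix
    # whose k-th diagonal entry is M'(D_l^k).
    return np.diag(act_prime(d))

rng = np.random.default_rng(0)
d_prev = rng.normal(size=3)              # stands in for D_{l-1}
w = rng.normal(size=(4, 3))              # rows play the role of w_l^k
b = rng.normal(size=4)

d = hidden_layer(w, b, d_prev)
# Row-by-row evaluation of (A6), one entry D_l^k per row of w.
d_rows = np.array([w[k] @ np.tanh(d_prev) + b[k] for k in range(4)])
J = activation_jacobian(d)               # 4 x 4 diagonal matrix
```

Because the activation is applied elementwise, all off-diagonal entries of `J` vanish, which is what makes the layer factors $A_l$ cheap to form.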

Using (A2)–(A4), the *Jacobian* matrix simplifies to:

$$\partial \Gamma_C / \partial \mathbf{x}_i = \mathbf{w}_l \left[\prod_{i=l-1}^{2} A_i(\mathbf{X})\right] \times \left[\mathrm{d}_{\mathbf{x}} M(D_1(\mathbf{X})) \times \mathbf{w}_1^i\right] \tag{A7}$$
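
The product form of (A7) can be checked against finite differences. The following is a minimal sketch under stated assumptions, not the authors' implementation: the activation $M$ is taken to be `tanh`, the first layer is assumed to be $D_1(\mathbf{X}) = \mathbf{w}_1 \mathbf{X} + b_1$, the output is assumed to be $\Gamma_C = \mathbf{w}_l M(D_{l-1}) + b_l$, and the function names and layer sizes are hypothetical.

```python
import numpy as np

def tanh_prime(d):
    # Derivative of the assumed activation M = tanh.
    return 1.0 - np.tanh(d) ** 2

def forward(x, weights, biases):
    # Returns the output and the pre-activations D_1, ..., D_{l-1}.
    D = [weights[0] @ x + biases[0]]
    for w, b in zip(weights[1:-1], biases[1:-1]):
        D.append(w @ np.tanh(D[-1]) + b)
    out = weights[-1] @ np.tanh(D[-1]) + biases[-1]
    return out, D

def jacobian(x, weights, biases):
    _, D = forward(x, weights, biases)
    # A_k = d_x M(D_k) x w_k for k = 1, ..., l-1, as in (A3).
    A = [np.diag(tanh_prime(d)) @ w for d, w in zip(D, weights[:-1])]
    J = weights[-1]                      # leading factor w_l
    for A_k in reversed(A):              # A_{l-1}, ..., A_2, then A_1
        J = J @ A_k
    return J                             # column i is dGamma/dx_i

rng = np.random.default_rng(0)
sizes = [3, 5, 4, 1]                     # hypothetical layer widths
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
x = rng.normal(size=sizes[0])
J = jacobian(x, weights, biases)         # shape (1, 3)
```

Here the last factor uses the full matrix $\mathbf{w}_1$ rather than a single column, so all partial derivatives are obtained at once; selecting column $i$ of the result corresponds to using $\mathbf{w}_1^i$ in (A7).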

Following the same derivation as for the *Jacobian* matrix, the *Hessian* matrix is obtained from the following expressions:

$$\begin{aligned} \mathrm{d}_{\mathbf{x}_j} A_l(\mathbf{X}) &= \mathrm{d}^2_{\mathbf{x},\mathbf{x}_j} M(D_l(\dots(\mathbf{X})\dots)) \times \mathbf{w}_l \\ &= \begin{bmatrix} \mathrm{d}^2_{\mathbf{x},\mathbf{x}_j} M(D_l^1(\dots(\mathbf{X})\dots)) & 0 & \dots & 0 \\ 0 & \mathrm{d}^2_{\mathbf{x},\mathbf{x}_j} M(D_l^2(\dots(\mathbf{X})\dots)) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \mathrm{d}^2_{\mathbf{x},\mathbf{x}_j} M(D_l^{n_l}(\dots(\mathbf{X})\dots)) \end{bmatrix} \times \mathbf{w}_l \end{aligned} \tag{A8}$$

$$\mathrm{d}^2_{\mathbf{x},\mathbf{x}_j} M(D_l^k(\dots(\mathbf{X})\dots)) = \left[\mathrm{d}^2 M(D_l^k(\dots(\mathbf{X})\dots)) / \mathrm{d}(D_l^k(\dots(\mathbf{X})\dots))^2\right] \times \mathbf{w}_l \times \prod_{i=l-1}^{2} A_i(\mathbf{X}) \times \left[\mathrm{d}_{\mathbf{x}} M(D_1(\mathbf{X})) \times \mathbf{w}_1^j\right] \tag{A9}$$

where $\mathbf{w}_l = \mathbf{w}_1^i$ if $l = 1$ in (A8) and (A9). Next, $\Lambda_k$ is defined in (A10) as follows:

$$\Lambda_k = \begin{cases} \left[\prod_{i=k+1}^{l-1} A_i(\mathbf{X})\right] \times \mathrm{d}_{\mathbf{x}_j} A_k(\mathbf{X}) \times \left[\prod_{i=k-1}^{1} A_i(\mathbf{X})\right], & \text{if } k \ge 2, \\ \left[\prod_{i=k+1}^{l-1} A_i(\mathbf{X})\right] \times \mathrm{d}_{\mathbf{x}_j} A_k(\mathbf{X}), & \text{if } k = 1, \\ \mathrm{d}_{\mathbf{x}_j} A_k(\mathbf{X}) \times \left[\prod_{i=k-1}^{1} A_i(\mathbf{X})\right], & \text{if } k = l - 1 \end{cases} \tag{A10}$$

Then, the Hessian matrix can be derived as (A11):

$$\partial^2 \Gamma_C / (\partial \mathbf{x}_i \partial \mathbf{x}_j) = \mathbf{w}_l \left[\sum_{k=1}^{l-1} \Lambda_k\right] \tag{A11}$$
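
The sum-of-products structure of (A10) and (A11) can be sketched in code and compared with a finite-difference Hessian. This is an illustrative sketch under assumptions rather than the paper's implementation: $M$ is taken to be `tanh`, the first layer is assumed to be $D_1(\mathbf{X}) = \mathbf{w}_1 \mathbf{X} + b_1$, the output is assumed scalar with $\Gamma_C = \mathbf{w}_l M(D_{l-1}) + b_l$, and all function names and layer sizes are hypothetical.

```python
import numpy as np

def d_tanh(d):
    # First derivative of the assumed activation M = tanh.
    return 1.0 - np.tanh(d) ** 2

def d2_tanh(d):
    # Second derivative of tanh, used for d^2 M / d(D_l^k)^2 in (A9).
    t = np.tanh(d)
    return -2.0 * t * (1.0 - t ** 2)

def forward(x, weights, biases):
    # Returns the output and the pre-activations D_1, ..., D_{l-1}.
    D = [weights[0] @ x + biases[0]]
    for w, b in zip(weights[1:-1], biases[1:-1]):
        D.append(w @ np.tanh(D[-1]) + b)
    return weights[-1] @ np.tanh(D[-1]) + biases[-1], D

def hessian(x, weights, biases):
    _, D = forward(x, weights, biases)
    n, L = x.size, len(D)                # L = l - 1 hidden layers
    A = [np.diag(d_tanh(d)) @ w for d, w in zip(D, weights[:-1])]
    # right[k] = A_k ... A_1 below layer k+1 (identity for the first layer).
    right = [np.eye(n)]
    for k in range(1, L):
        right.append(A[k - 1] @ right[k - 1])
    # left[k] = w_l A_{l-1} ... A_{k+2}: the factors above layer k+1.
    left = [None] * L
    left[L - 1] = weights[-1]
    for k in range(L - 2, -1, -1):
        left[k] = left[k + 1] @ A[k + 1]
    H = np.zeros((n, n))
    for j in range(n):
        for k in range(L):
            # dD_{k+1}/dx_j, then the x_j-derivative of A_{k+1} as in (A8), (A9).
            dD_j = (weights[k] @ right[k])[:, j]
            dA = np.diag(d2_tanh(D[k]) * dD_j) @ weights[k]
            # One term w_l * Lambda_k of (A11); entries run over i.
            H[:, j] += (left[k] @ dA @ right[k]).ravel()
    return H

rng = np.random.default_rng(0)
sizes = [3, 5, 4, 1]                     # hypothetical layer widths
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
x = rng.normal(size=sizes[0])
H = hessian(x, weights, biases)          # shape (3, 3)
```

Each pass through the inner loop differentiates exactly one layer factor and leaves the others unchanged, which mirrors the product rule behind the sum over $k$ in (A11).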
