*2.1. NMF*

Consider the hyperspectral image data *X* = [**x**<sub>1</sub>, **x**<sub>2</sub>, ⋯, **x**<sub>*N*</sub>], where *X* ∈ *R*<sup>*L*×*N*</sup>, *L* is the number of spectral bands, and *N* is the number of pixels. In the linear mixing model, the hyperspectral data *X* can be represented as:

$$X = WH + E\tag{1}$$

where *W* = [**w**<sub>1</sub>, **w**<sub>2</sub>, …, **w**<sub>*P*</sub>] ∈ *R*<sup>*L*×*P*</sup> denotes the endmember matrix, *P* is the number of endmembers, *H* ∈ *R*<sup>*P*×*N*</sup> denotes the abundances of the respective endmembers, and *E* is a residual term. The NMF algorithm is designed to find an approximate factorization of *X*, such that *X* ≈ *WH*, where *W* ≥ 0 and *H* ≥ 0. To quantify the quality of the approximate factorization, the Euclidean distance is commonly used to measure the distance between *X* and *WH*. The loss function of NMF based on the Euclidean distance is defined as follows:
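As an illustration of the linear mixing model in (1), the following NumPy sketch generates synthetic hyperspectral data from random nonnegative endmembers and abundances. The dimensions, the noise level, and the abundance sum-to-one normalization (a common convention in hyperspectral unmixing, not stated above) are our own illustrative choices:

```python
import numpy as np

# Minimal sketch of the linear mixing model X = WH + E in Eq. (1).
rng = np.random.default_rng(0)
L, N, P = 50, 100, 3                     # bands, pixels, endmembers (example values)
W = rng.random((L, P))                   # endmember matrix, W >= 0
H = rng.random((P, N))                   # abundance matrix, H >= 0
H /= H.sum(axis=0)                       # optional: each pixel's abundances sum to one
E = 0.01 * rng.standard_normal((L, N))   # small residual term
X = W @ H + E                            # hyperspectral data, L x N
```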

$$f(W, H) = \frac{1}{2} \|X - WH\|\_F^2 \tag{2}$$

where || · ||<sub>*F*</sub> is the Frobenius norm. The NMF problem is nonconvex in *W* and *H* jointly, but it is convex in either block of variables when the other is fixed: estimating *W* with *H* fixed (or *H* with *W* fixed) is a convex optimization problem. Multiplicative update rules for the standard NMF algorithm are presented in [18] to locally minimize the cost function in (2):

$$W = W.\*(XH^T)./(WHH^T)\tag{3}$$

$$H = H.\*(W^T X)./(W^T W H)\tag{4}$$

where .\* and ./ denote element-wise multiplication and division, respectively.
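The multiplicative updates (3) and (4) can be sketched in NumPy as follows. The function name, iteration count, random initialization, and the small constant added to the denominators for numerical stability are our own choices, not part of the original algorithm description:

```python
import numpy as np

def nmf_multiplicative(X, P, n_iter=500, eps=1e-9, seed=0):
    """Factorize a nonnegative X (L x N) as X ~ W H with W (L x P), H (P x N)
    by alternating the multiplicative updates in Eqs. (3) and (4)."""
    rng = np.random.default_rng(seed)
    L, N = X.shape
    W = rng.random((L, P)) + eps   # nonnegative random initialization
    H = rng.random((P, N)) + eps
    for _ in range(n_iter):
        # Eq. (3): W <- W .* (X H^T) ./ (W H H^T)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Eq. (4): H <- H .* (W^T X) ./ (W^T W H)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H
```

Because the updates only rescale entries by nonnegative factors, *W* and *H* remain nonnegative throughout, and each step does not increase the cost in (2).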

#### *2.2. NMF with Sparseness Constraints*
