*3.4. J-LASU*

We formulate the new problem by adding the LA term with Equation (7) and the additional TV term. Thus, the problem of the proposed J-LASU algorithm in a convex form becomes

$$\min \; g(\mathbf{X}) = \frac{1}{2} \| \mathbf{A} \mathbf{X} - \mathbf{Y} \|\_{F}^{2} + \lambda \| \mathbf{X} \|\_{2,1} + \gamma \| \mathbf{X} \|\_{TV} + \rho \| \mathbf{X} \|\_{LA\*} \quad \text{s.t.} \quad \mathbf{X} \ge \mathbf{0} \tag{10}$$

where *λ*, *γ*, and *ρ* are the regularization parameter for the collaborative sparsity, TV, and LA term, respectively. We use the anisotropic TV, which is used in SUnSAL-TV [27], defined as

$$\|\mathbf{X}\|\_{TV} = \|\mathbf{D}\mathbf{X}\|\_1 \tag{11}$$

where **D** = [**<sup>D</sup>***h*; **<sup>D</sup>***v*], **D***h* : R*m*×*n* → R*m*×*n* and **D***v* : R*m*×*n* → R*<sup>m</sup>*×*n*, are horizontal and vertical differential operators, respectively. The **D***h***X** computes the differences between the components of **X** and the corresponding right-side adjacent pixels with cyclic boundary assumption, and the same way for **D***v***X**, which corresponds to the differences with the up-side adjacent pixels [27].

We estimate the abundance matrix **X** by solving problem in Equation (10) by using ADMM. The cost function in Equation (10) written in ADMM form becomes

$$f\_1(\mathbf{X}) + f\_2(\mathbf{V}) \quad \text{s.t.} \quad \mathbf{V} = \mathbf{G}\mathbf{X} \tag{12}$$

where

$$f\_1(\mathbf{X}) = \frac{1}{2} \|\mathbf{A}\mathbf{X} - \mathbf{Y}\|\_F^2 \tag{13}$$

$$f\_2(\mathbf{V}) = \lambda \|\mathbf{V}\_1\|\_{2,1} + \gamma \|\mathbf{V}\_2\|\_1 + \rho \|\mathbf{V}\_3\|\_{LA} + \iota\_{\mathbb{R}+}(\mathbf{V}\_4) \tag{14}$$

$$\mathbf{V} = \begin{bmatrix} \mathbf{V}\_1 \\ \mathbf{V}\_2 \\ \mathbf{V}\_3 \\ \mathbf{V}\_4 \end{bmatrix}, \text{ and } \mathbf{G} = \begin{bmatrix} \mathbf{I} \\ \mathbf{D} \\ \mathbf{I} \\ \mathbf{I} \end{bmatrix}. \tag{15}$$

Here, the *<sup>ι</sup>R*+ term projects the solution onto the nonnegative orthant (*<sup>ι</sup>R*+(*x*) = 0 if *x* ≥ 0 and *<sup>ι</sup>R*+(*x*)=+<sup>∞</sup> otherwise), and **I** is an identity matrix with a proportional size. The constraint in Equation (12) satisfies the relations

$$\mathbf{V}\_1 = \mathbf{X}; \quad \mathbf{V}\_2 = \mathbf{D} \mathbf{X}; \quad \mathbf{V}\_3 = \mathbf{X}; \quad \mathbf{V}\_4 = \mathbf{X}. \tag{16}$$

*Remote Sens.* **2017**, *9*, 1224

Using a positive constant *μ* and the Lagrange multipliers **<sup>B</sup>**/*μ* corresponding to the constraint **V** = **GX**, the cost function is minimized using ADMM. Then, the steps for the proposed algorithm are as follows

$$\mathbf{X}^{(k+1)} = \arg\min\_{\mathbf{X}} \left. f\_1(\mathbf{X}) + \frac{\mu}{2} \| \mathbf{G} \mathbf{X} - \mathbf{V}^{(k)} - \mathbf{B}^{(k)} \|\_{F}^{2} \tag{17}$$

$$\mathbf{V}^{(k+1)} = \arg\min\_{\mathbf{V}} \ f\_2(\mathbf{V}) + \frac{\mu}{2} \| \mathbf{G} \mathbf{X}^{(k)} - \mathbf{V} - \mathbf{B}^{(k)} \|\_{2}^{2} \tag{18}$$

$$\mathbf{B}^{(k+1)} = \mathbf{B}^{(k)} - (\mathbf{G}\mathbf{X}^{(k+1)} - \mathbf{V}^{(k+1)}).\tag{19}$$

To find the solution for **X** of the augmented Lagrangian formula, we calculate the solution of Equation (17) by taking the partial derivative as follows:

$$\begin{split} \mathbf{X}^{(k+1)} &= \arg\min\_{\mathbf{X}} \frac{1}{2} \| \mathbf{A} \mathbf{X} - \mathbf{Y} \|\_{F}^{2} + \frac{\mu}{2} \| \mathbf{G} \mathbf{X} - \mathbf{V}^{(k)} - \mathbf{B}^{(k)} \|\_{F}^{2} \\ &= \left( \mathbf{A}^{T} \mathbf{A} + \mu \mathbf{G}^{T} \mathbf{G} \right)^{-1} \left( \mathbf{A}^{T} \mathbf{Y} + \mu \mathbf{G}^{T} (\mathbf{V}^{(k)} + \mathbf{B}^{(k)}) \right) \end{split} \tag{20}$$

The detailed steps for computing the values of variables **V**1, **V**2, **V**3, and **V**4 for each iteration are written in general form of the proximal operator (General form of the proximal operator is as follows: prox*γ<sup>h</sup>*(**x**¯) = arg min **v**∈R*<sup>N</sup> h*(**v**) + 12*γ* **v** − **x**¯22 ) [40,41].

**V**4

$$\begin{array}{ll} \mathbf{V}\_{1}^{(k+1)} = & \operatorname{prox}\_{\frac{k}{\mu}||\cdot||\_{2,1}}^{\frac{1}{k}}(\mathbf{R}\_{1}) \\\\ &= & \arg\min\_{\mathbf{V}\_{1}} \lambda ||\mathbf{V}\_{1}||\_{2,1} + \frac{\mu}{2} ||\mathbf{V}\_{1} - \mathbf{R}\_{1}||\_{2}^{2} \\\\ \mathbf{V}\_{2}^{(k+1)} = & \operatorname{prox}\_{\frac{\mu}{\mu}||\cdot||\_{1}}^{\frac{1}{k}}(\mathbf{R}\_{2}) \\\\ &= & \arg\min\_{\mathbf{V}\_{2}} \gamma ||\mathbf{V}\_{2}||\_{1} + \frac{\mu}{2} ||\mathbf{V}\_{2} - \mathbf{R}\_{2}||\_{2}^{2} \end{array} \tag{22}$$

$$\begin{split} \mathbf{V}\_{3}^{(k+1)} &= \quad \text{prox}\_{\frac{\mu}{\mu} \parallel \cdots \parallel L\_{4} \text{\*}} (\mathbf{R}\_{3}) \\ &= \quad \text{arg}\min\_{\mathbf{V}\_{3}} \rho \| \mathbf{V}\_{3} \|\_{L A \*} + \frac{\mu}{2} \| \mathbf{V}\_{3} - \mathbf{R}\_{3} \|\_{2}^{2} \\\\ \mathbf{V}\_{4}^{(k+1)} &= \quad \text{prox}\_{\frac{1}{\mu} \left( \iota\_{\mathbb{R}\_{4}} \right)} (\mathbf{R}\_{4}) \\ &= \quad \text{arg}\min\_{\mathbf{V}\_{4}} \iota\_{\mathbb{R}\_{4} +} (\mathbf{V}\_{4}) + \frac{\mu}{2} \| \mathbf{V}\_{4} - \mathbf{R}\_{4} \|\_{2}^{2} \end{split} \tag{24}$$

where **R**1 = **X**(*k*) − **B**(*k*) 1 , **R**2 = **DX**(*k*) − **B**(*k*) 2 , **R**3 = **X**(*k*) − **B**(*k*) 3 , and **R**4 = **X**(*k*) − **B**(*k*) 4 , and **B** = [**<sup>B</sup>**1;**B**2;**B**3;**B**4].

For **V**(*k*+<sup>1</sup>) 1 , since the *<sup>L</sup>*2,1 norm is not differentiable, the solution is obtained by the shrinkage for the group lasso as follows:

$$w\_{1(i)}^{(k+1)} = \begin{cases} \ r\_{1(i)} - \frac{\lambda}{\mu} \frac{r\_{1(i)}}{\|r\_{1(i)}\|\_2} & \text{if } \|r\_{1(i)}\|\_2 > \frac{\lambda}{\mu} \\ 0 & \text{otherwise} \end{cases} \tag{25}$$

where *v*(*k*+<sup>1</sup>) 1(*i*) and *<sup>r</sup>*1(*i*) denote the *i*-row of **V**(*k*+<sup>1</sup>) 1 and **R**1, respectively. This operation is denoted as *group* − *lasso*(·, *<sup>τ</sup>*), where *τ* is the threshold.

The TV term in Equation (22) is solved by soft-thresholding on each element of **V**(*k*+<sup>1</sup>) 2.

$$\begin{array}{rcl} r\_{2(i,j)}^{(k+1)} = \begin{cases} r\_{2(i,j)} - \frac{\gamma}{\mu} & \text{if } r\_{2(i,j)} > \frac{\gamma}{\mu} \\\ r\_{2(i,j)} + \frac{\gamma}{\mu} & \text{if } r\_{2(i,j)} < -\frac{\gamma}{\mu} \\\ 0 & \text{if } -\frac{\gamma}{\mu} \le r\_{2(i,j)} \le \frac{\gamma}{\mu} \end{array} \tag{26}$$

where *v*(*k*+<sup>1</sup>) <sup>2</sup>(*<sup>i</sup>*,*j*) and *<sup>r</sup>*2(*<sup>i</sup>*,*j*) denote the (*i*, *j*)-element of **V**(*k*+<sup>1</sup>) 2 and **R**2, respectively. This operation is denoted as *so f <sup>t</sup>*(·, *<sup>τ</sup>*), where *τ* is the threshold.

The solution of **V**(*k*+<sup>1</sup>) 3 in Equation (23) is acquired by constructing the LA matrices, applying singular value shrinkage to each matrix, and reconstructing the output abundance matrix, which is denoted as

$$\mathbf{V}\_{\mathfrak{Z}}^{(k+1)} = \operatorname{sbr}(\mathfrak{X}^{(k)} - \mathbf{B}\_{\mathfrak{Z}}^{(k)}, \frac{\rho}{\mu}) \tag{27}$$

where *shr*(·, *τ*) denotes the singular value shrinkage (*y* → diag(*max*{*SVD*(*y*) − *τ*, 0})) of the LA matrices **Hxˆ** *b* , where the singular value decomposition *SVD*(·) produces a vector containing the singular values in decreasing order and *τ* is the threshold.

Let *v*(*k*+<sup>1</sup>) <sup>4</sup>(*<sup>i</sup>*,*j*) denotes the (*i*, *j*)-element of **V**(*k*+<sup>1</sup>) 4 , finally, the solution of **V**(*k*+<sup>1</sup>) 4 is obtained by

$$w\_{4(i,j)}^{(k+1)} = \max(r\_{4(i,j)}, 0) \tag{28}$$

where *<sup>r</sup>*4(*<sup>i</sup>*,*j*) denotes the (*i*, *j*)-element of **R**4.

> The whole procedure of ADMM is summarized in Algorithm 1.


#### **4. Experiment and Analysis**

We tested the proposed algorithm on several simulated data sets for three signal-to-noise ratio (SNR) levels, i.e., 10, 20, and 30 dB, and two real data sets. We evaluated the results by conducting a fair comparison with the CLSUnSAL [28] and SunSAL-TV [27]. State-of-the-art low-rank algorithm is also compared, which is sparse and low-rank unmixing by using ADMM (ADSpLRU) [34].

#### *4.1. Simulated Data Sets*

To simulate the condition of hyperspectral data with and without the presence of pure pixels, we used two types of data distribution for data generation. Both use the same library generated from 240 types of minerals selected randomly from the splib06 USGS library [42], which consists of 224 spectral bands ranging between 0.4–2.5 μm. The mutual coherence among the spectral signatures is very close to one, but we set the SA to be larger than 4.4 to make the sparse regression problem easier.

The first data set, *DS*, is a representation of the data with pure pixels and adopted from that of Iordache et al. [27] consisting of 224 bands for 75 × 75 pixels. The data generation follows the LMM with the abundance sum-to-one constraint imposed on each pixel. Five spectral signatures are randomly selected from the library as the endmembers and distributed spatially in the form of distinct square regions. In some pixels, the endmembers stay pure and in others they are mixed with two until there are five endmembers. In Figure 4, the red squares in each abundance map represent 100% intensity which means the pure pixel regions of each endmember. The background consists of mixed pixels with randomly fixed fractional abundance values of 0.1149, 0.0741, 0.2003, 0.2055, and 0.4051 for the five endmembers.

**Figure 4.** True abundance matrix of simulated data set 1 (*DS*). (**a**) Endmember 1; (**b**) Endmember 2; (**c**) Endmember 3; (**d**) Endmember 4; (**e**) Endmember 5.

To demonstrate the proposed algorithm under the condition without the presence of pure pixels, the distribution with a distinct spatial pattern and mixture was selected. We used the fractal database (*FR*) [39] consisting of five data sets, namely *FR1, FR2, FR3, FR4*, and *FR5*. Each is composed of 100 × 100 pixels with 224 spectral bands for each pixel and contains no completely pure pixels that are close to the ground-truth characteristic in which completely pure pixels are rarely found. The distribution is generated such that pixels near the edges of regions are more highly mixed than those in the center of the regions. These center pixels have a purity index between 0.95–0.99, directly proportional to the broadness of the regions. In this experiment, we set the number of endmembers to 9. Figure 5 shows *FR1, FR2, FR3, FR4*, and *FR5* represented in pseudocolor.

**Figure 5.** Fractal data sets represented in pseudocolor. (**a**) *FR1*; (**b**) *FR2*; (**c**) *FR3*; (**d**) *FR4*; (**e**) *FR5*.

#### *4.2. Real Data Sets*

For the real-data experiment, we used two real data from different sensors. The first hyperspectral scene is the widely used data set of Cuprite mining district, Nevada in 1997 [43]. We used a subscene with the size of 150 × 130 pixels whose area is shown in Figure 6a. The data are composed of 224 spectral bands with 3.7 m spatial resolution from the AVIRIS sensor. Prior to analysis, several bands were removed due to the low SNR; thus, remaining 188 bands. In this experiment, we used the USGS library of 498 spectral signatures as the standard spectral library for the data, with the corresponding bands removed. Figure 6b shows the USGS mineral distribution map of the Cuprite area [44]. From the figure, the area of interest contains at least three types of minerals: *alunite, chalcedony*, and *kaolinite*. The mineral map was produced using Tricorder 3.3 software in 1995, while the AVIRIS Cuprite data were collected in 1997. Hence, in our experiment, the mineral map was used only for visual qualitative evaluation, compared with the abundance maps of different sparse unmixing algorithms.

**Figure 6.** (**a**) Cuprite data generated in pseudocolor. Black rectangle shows area of our experiment; (**b**) USGS mineral distribution map of Cuprite mining district in Nevada [44].

The second hyperspectral scene is Urban data captured by the HYDICE sensor over an area located at Copperas Cove near Fort Hood, TX, U.S., in October 1995. It consists of 307 × 307 pixels with 2 m of the pixel resolution. The wavelengths range from 0.4 to 2.5 μm divided into 210 spectral bands. After some bands with low SNRs due to dense water vapor and atmospheric effects are discarded, it remains 162 bands. We used a subscene with the size of 100 × 100 pixels. Figure 7a shows the subscene used in the experiment. The ground truth of the Urban data set is not available, however, we used the reference abundance maps obtained from [45]. The maps are achieved via the method provided in [46–48] and consist of four endmembers, i.e., *asphalt, grass, tree,* and *roof*. Figure 7b shows the spectral signatures of the four endmembers.

**Figure 7.** (**a**) A subscene of Urban data used in our experiment, generated in pseudocolor; (**b**) Spectral signatures of the endmembers[48–50], *x*-axis and *y*-axis represent the band number and reflectance unit (0–1), respectively.

#### *4.3. Parameters Setting and Evaluation Metrics*

In the simulated-data experiment, to build spectral library **A**, the spectral signatures in the USGS spectral library were selected and sorted such that the SAs between the spectral signatures were not less than 4.4 degrees in increasing order. The parameter settings of J-LASU are for the collaborative sparsity (*λ*), TV (*γ*), and LA nuclear norm (*ρ*) regularizer. For the compared algorithms, *λSP* is the sparsity term for CLSUnSAL, SUnSAL-TV, and ADSpLRU [34]. For SUnSAL-TV, the TV term is controlled by *λTV*. The low-rank regularizer parameter is denoted as *λLR* for ADSpLRU. These parameters are adjusted for every data set under different SNR levels. However, we used the same parameter settings for the five fractal data sets since the characteristics of the scenes tend to be similar. Table 1 summarizes these settings. The values of *λLR* were the optimal ones after the experiment for some recommended values. For the LA regularization, the block size is another parameter to be set to control the coverage of adjacent pixels in the spatial and the endmember directions. After several trials in this experiment, the optimum size was found to be [5 5 5] with no overlap.


**Table 1.** Parameter Settings.

We evaluated the performance of the algorithms using root mean square error (RMSE) [39,51] and signal-to-reconstruction error (SRE) [26]. The RMSE measures the error between the original and reconstructed abundance matrices. The lower the RMSE, the more accurate the estimation is. The RMSE formula for the *i*-th endmember is defined as

$$\text{RMSE}\_{i} = \sqrt{\frac{1}{n} \sum\_{h=1}^{n} (\mathbf{X}\_{i,h} - \mathbf{X}\_{i,h})^2},\tag{29}$$

where *n*, **X** and **X** ¯ represent the number of pixels, true abundance matrices, and estimated abundance matrices, respectively. Then, we compute the mean value of all endmembers' RMSEs.

The SRE represents the ratio between the reconstructed abundance matrix and error, and is defined as

$$\text{SRE} = 10 \log\_{10} \left( ||\mathfrak{X}||\_F^2 / ||\mathfrak{X} - \mathfrak{X}||\_2^F \right). \tag{30}$$

For the simulated data, the original abundance matrix was generated for each data set. We compared the visual appearance among the maps of the estimated abundance matrix in addition to RMSE and SRE comparison. As for the first real data set, Cuprite, the comparison was among the estimated abundance maps of the sparse unmixing algorithms and the mineral map of each expected endmembers. For the second real data set, Urban, RMSE and SRE of each method are calculated with the ground truth abundance maps as the reference value.
