*2.2. MASR*

Compared with the pixelwise SRC model, JSRC achieves more accurate classification results because it incorporates the spatial information of local regions. However, the region size (or *region scale*) has a great influence on the classification performance, so determining an optimal region scale for JSRC is of great importance.

Fang et al. then proposed MASR to alleviate the difficulty of choosing a region scale. MASR effectively exploits spatial information at multiple scales via an adaptive sparse strategy. The adaptive sparse strategy not only restricts pixels from different scales to be represented by training atoms from a particular class, but also allows the selected atoms for these pixels to vary, thus providing an improved representation. Given a test pixel **y**<sub>1</sub> in the HSI, its *T* neighboring regions are selected via different predefined scales. The neighboring regions are defined by multiscale patches centered at the test pixel. A multiscale matrix **Y**<sup>mp</sup> = [**Y**<sub>1</sub>, ··· , **Y**<sub>*t*</sub>, ··· , **Y**<sub>*T*</sub>] is then constructed from the pixels within the selected regions, where **Y**<sub>*t*</sub> contains the pixels from the *t*th scale region. Since the spatial structures and characteristics of regions at different scales are distinct, the generated multiscale matrix **Y**<sup>mp</sup> for the test pixel **y**<sub>1</sub> provides complementary yet correlated information, which can be utilized to classify **y**<sub>1</sub> more accurately.

In MASR, an adaptive sparse strategy is adopted to utilize the correlated information among the scales and achieve a flexible atom-selection process. An important part of the adaptive strategy is the use of a collection of adaptive sets. Each adaptive set is defined as the indexes of a set of nonzero scalar coefficients belonging to the same class in the multiscale sparse matrix **A**<sup>mp</sup>. By combining the adaptive set with the ℓ<sub>row,0</sub> norm, a new adaptive norm ℓ<sub>adaptive,0</sub> is defined on **A**<sup>mp</sup>, which can be used to select a small number of adaptive sets from **A**<sup>mp</sup>. The matrix **A**<sup>mp</sup> can then be recovered by applying the adaptive norm as follows:

$$\hat{\mathbf{A}}^{\mathrm{mp}} = \arg\min_{\mathbf{A}^{\mathrm{mp}}} \|\mathbf{Y}^{\mathrm{mp}} - \mathbf{D}\mathbf{A}^{\mathrm{mp}}\|_F \quad \text{subject to } \|\mathbf{A}^{\mathrm{mp}}\|_{\mathrm{adaptive},0} \leqslant K \tag{4}$$

After recovering the multiscale sparse representation matrix $\hat{\mathbf{A}}^{\mathrm{mp}}$, a single decision can be made on the test pixel **y**<sub>1</sub> based on the lowest total representation error:

$$\hat{c} = \arg\min_{c} \|\mathbf{Y}^{\mathrm{mp}} - \mathbf{D}_{c}\hat{\mathbf{A}}_{c}^{\mathrm{mp}}\|_F, \quad c = 1, \dots, C \tag{5}$$

where $\hat{\mathbf{A}}_{c}^{\mathrm{mp}}$ denotes the rows in $\hat{\mathbf{A}}^{\mathrm{mp}}$ corresponding to the *c*th class.
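The class-wise residual decision of Equations (4) and (5) can be sketched as follows, assuming the dictionary, the recovered coefficient matrix, and a per-atom class label vector are already available (a minimal illustration; the variable names are not from the authors' code):

```python
import numpy as np

def classify_by_residual(Y, D, A_hat, class_of_atom):
    """Pick the class whose atoms best reconstruct the multiscale matrix Y.

    Y            : (B, P) pixels from all scale regions, stacked as columns
    D            : (B, M) dictionary of training atoms
    A_hat        : (M, P) recovered sparse coefficient matrix
    class_of_atom: (M,)   class label of each dictionary atom
    """
    classes = np.unique(class_of_atom)
    residuals = []
    for c in classes:
        mask = class_of_atom == c  # rows of A_hat belonging to class c
        # total representation error using only class-c atoms (Eq. 5)
        r = np.linalg.norm(Y - D[:, mask] @ A_hat[mask, :], ord="fro")
        residuals.append(r)
    return classes[int(np.argmin(residuals))]
```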

#### **3. Multiscale Union Regions Adaptive Sparse Representation**

The aforementioned MASR shows good performance for HSI classification, but it relies on multiscale patches to exploit spatial information. In a patch, most of the pixels may belong to classes different from that of the test pixel, for example when the test pixel lies on the edge of a building. The classification may then be misled by noise pixels from other classes that are similar to atoms in the dictionary, producing an incorrect label for the test pixel. In computer vision, superpixels have been studied to provide an efficient representation that facilitates visual recognition [35–37]. Each superpixel is a perceptually meaningful region whose shape and size adapt to the local spatial structure. However, finding an optimal scale for superpixels is still a challenge; without an optimal scale, some mixed superpixels will be generated. Based on the fact that both patches and superpixels may include pixels from different classes, a multiscale union regions adaptive sparse representation model is proposed to reduce the influence of noise pixels on the test pixel. The union region is the union of the patch and the corresponding superpixel at the same scale (see Figure 2). For a test pixel, if the patch includes some noise pixels, the superpixel can provide more similar pixels to reduce their impact. Likewise, if the test pixel falls into a wrong superpixel that contains few pixels similar to it, the patch can provide more similar pixels to support the correct representation.

**Figure 2.** Three kinds of spatial regions: (**a**) fixed-size patch; (**b**) adaptive-size superpixel; and (**c**) union of patch and superpixel. The blue pixel is the test pixel, orange pixels are neighbors defined by the patch, green pixels are neighbors defined by the superpixel, and red pixels are the overlap of the two neighborhoods.

#### *3.1. Generation of Multiscale Union Regions*

Before generating multiscale union regions, we must obtain multiscale superpixels. Various studies have focused on segmentation [36–39]. In this paper, an oversegmentation algorithm called ERS [37] is applied to generate 2-D superpixel maps on the base images because of its high efficiency. Unlike a single-band gray or three-band color image, an HSI usually has hundreds of spectral bands. To improve computational efficiency, PCA [40] is first used to reduce the spectral bands of the HSI. Since the important information of the HSI lies in the principal components (e.g., the first three principal components), they are used as the base images; in this paper, only the first principal component is chosen as the base image. Instead of choosing superpixel scales empirically, we calculate them from the corresponding patch sizes. Assuming that **PS**<sub>*t*</sub> denotes the patch size at the *t*th scale and **N**<sub>*total*</sub> is the total number of pixels in the image (note that the original image is extended for edge pixels), the number of superpixels **n**<sub>*t*</sub> for the *t*th segmentation is calculated as:

$$\mathbf{n}_t = \mathbf{N}_{total} / \mathbf{PS}_t \tag{6}$$

In this way, the average superpixel size equals the patch size, so most superpixels have sizes similar to the patches. This guarantees that the superpixel and the patch have comparable influence on the union region. Moreover, as the patch size increases, the number of superpixels decreases quickly, so only a limited number of segmentations need to be generated, which makes it easier for users to determine the number of scales from their performance. After segmentation, *T* superpixels are generated for each test pixel **y**<sub>1</sub>, and these superpixels form the corresponding multiscale matrix **Y**<sup>ms</sup> = [**Y**<sub>1</sub>, ··· , **Y**<sub>*t*</sub>, ··· , **Y**<sub>*T*</sub>], where **Y**<sub>*t*</sub> contains the pixels from the *t*th superpixel. For a specific *t*th scale, the union region **Y**<sub>*t*</sub><sup>mu</sup> is then defined as follows:

$$\mathbf{Y}_t^{\mathrm{mu}} = \mathbf{Y}_t^{\mathrm{ms}} \cup \mathbf{Y}_t^{\mathrm{mp}} \tag{7}$$
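Constructing the union region of Equation (7) amounts to taking the set union of the patch pixel indexes and the superpixel pixel indexes. A minimal sketch, assuming a superpixel label map is already available (e.g., from an ERS-style segmentation) and ignoring the image-extension step for edge pixels:

```python
import numpy as np

def union_region(img, sp_labels, row, col, ps):
    """Union of the ps x ps patch around (row, col) and its superpixel (Eq. 7).

    img       : (H, W, B) hyperspectral cube
    sp_labels : (H, W)    superpixel label map from a segmentation
    ps        : odd patch width for this scale
    Returns the union-region pixels as a (B, n) matrix, i.e. Y_t^mu.
    """
    H, W, _ = img.shape
    half = ps // 2
    # index set of the fixed-size patch, clipped at the image border
    rows, cols = np.meshgrid(
        np.clip(np.arange(row - half, row + half + 1), 0, H - 1),
        np.clip(np.arange(col - half, col + half + 1), 0, W - 1),
        indexing="ij",
    )
    patch_idx = set(zip(rows.ravel().tolist(), cols.ravel().tolist()))
    # index set of the superpixel containing the test pixel
    sp_idx = set(zip(*np.nonzero(sp_labels == sp_labels[row, col])))
    union_idx = patch_idx | sp_idx           # Eq. (7): Y^mp ∪ Y^ms
    rr, cc = zip(*sorted(union_idx))
    return img[list(rr), list(cc), :].T      # (B, n)
```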

#### *3.2. Multiscale Union Regions Adaptive Sparse Representation*

For a test pixel **y**<sub>1</sub>, the corresponding multiscale matrix is **Y**<sup>mu</sup> = [**Y**<sub>1</sub>, ··· , **Y**<sub>*t*</sub>, ··· , **Y**<sub>*T*</sub>], where **Y**<sub>*t*</sub> is the union of **Y**<sub>*t*</sub><sup>mp</sup> and **Y**<sub>*t*</sub><sup>ms</sup>. The sparse coefficient matrix **A**<sup>mu</sup> can then be recovered by solving the following problem:

$$\hat{\mathbf{A}}^{\mathrm{mu}} = \arg\min_{\mathbf{A}^{\mathrm{mu}}} \|\mathbf{Y}^{\mathrm{mu}} - \mathbf{D}\mathbf{A}^{\mathrm{mu}}\|_F \quad \text{subject to } \|\mathbf{A}^{\mathrm{mu}}\|_{\mathrm{adaptive},0} \leqslant K \tag{8}$$

To solve this problem, the method used in MASR is applied. At each iteration, the current residual correlation matrix is calculated first. A new adaptive set is then selected based on this matrix and merged with the previously selected adaptive sets. Next, the sparse coefficient matrix is estimated on the merged adaptive sets, and finally the residue is updated. The iterations stop when the termination criterion is satisfied. After the multiscale sparse representation matrix $\hat{\mathbf{A}}^{\mathrm{mu}}$ is recovered, the final label of the test pixel **y**<sub>1</sub> can be determined by the minimal total representation error:

$$\hat{c} = \arg\min_{c} \|\mathbf{Y}^{\mathrm{mu}} - \mathbf{D}_{c}\hat{\mathbf{A}}_{c}^{\mathrm{mu}}\|_F, \quad c = 1, \dots, C \tag{9}$$
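The iterative recovery loop described above follows a greedy simultaneous-pursuit pattern. The sketch below is a deliberately simplified single-atom-per-iteration version for illustration only; the actual MASR/MURASR solver selects whole class-wise adaptive sets across multiple scales rather than individual atoms:

```python
import numpy as np

def greedy_joint_pursuit(Y, D, K):
    """Simplified SOMP-style sketch of the iterative recovery loop.

    Y : (B, P) union-region pixels; D : (B, M) dictionary; K : sparsity level.
    """
    residual = Y.copy()
    support = []
    for _ in range(K):
        # residual correlation of every atom with the current residual
        corr = np.linalg.norm(D.T @ residual, axis=1)
        corr[support] = -np.inf                  # never reselect an atom
        support.append(int(np.argmax(corr)))     # grow the selected set
        # re-estimate coefficients on the merged support (least squares)
        A_sub, *_ = np.linalg.lstsq(D[:, support], Y, rcond=None)
        residual = Y - D[:, support] @ A_sub     # update the residue
    A = np.zeros((D.shape[1], Y.shape[1]))
    A[support, :] = A_sub
    return A
```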

#### *3.3. Probability Majority Voting*

Because multiscale union regions adaptive sparse representation is a pixel-based classifier, some salt-and-pepper noise pixels will appear inside ground-truth objects. A majority voting process is therefore helpful for optimizing the classification result. As mentioned above, a union region is generated for each test pixel at each scale, and the probabilities of the union region belonging to each class are then calculated. If a union region at the *i*th scale contains **N**<sub>*i*</sub><sup>*total*</sup> labeled pixels, of which **N**<sub>*i*</sub><sup>*j*</sup> pixels are classified to the *j*th class, the probability of belonging to the *j*th class **P**<sub>*i*</sub><sup>*j*</sup> is calculated as:

$$\mathbf{P}_i^j = \mathbf{N}_i^j / \mathbf{N}_i^{total} \tag{10}$$

Assuming that there are **k** classes and *T* scales of segmentation maps, the class label $\hat{j}$ of the test pixel can be obtained by:

$$\hat{j} = \arg\max_{j} \sum_{i=1}^{T} \mathbf{P}_i^j, \quad j = 1, \dots, \mathbf{k} \tag{11}$$
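Equations (10) and (11) amount to summing per-scale class frequencies and taking the maximum. A minimal sketch, assuming the per-pixel labels inside each union region are already available from the classification step:

```python
import numpy as np

def probability_majority_vote(region_labels_per_scale, num_classes):
    """Probability majority voting over T scales (Eqs. 10 and 11).

    region_labels_per_scale : list of T 1-D integer arrays; entry i holds
    the class labels of the labeled pixels inside the union region at
    scale i. Returns the class with the highest summed probability.
    """
    scores = np.zeros(num_classes)
    for labels in region_labels_per_scale:
        counts = np.bincount(labels, minlength=num_classes)
        scores += counts / labels.size  # Eq. (10): P_i^j = N_i^j / N_i^total
    return int(np.argmax(scores))       # Eq. (11): argmax over summed P_i^j
```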

#### **4. Experimental Results and Discussion**

## *4.1. Data Sets*

To verify the effectiveness of the proposed MURASR method and the superiority of the union region, experiments are conducted on the following three hyperspectral data sets: the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Indian Pines data, the AVIRIS Salinas data, and the Reflective Optics System Imaging Spectrometer (ROSIS-03) University of Pavia data. The AVIRIS Indian Pines image has 220 data channels of size 145 × 145 across the spectral range from 0.4 to 2.5 μm. It was captured over the agricultural Indian Pines test site in northwestern Indiana with a spatial resolution of 20 m per pixel. Before classification, 20 water absorption bands (No. 104–108, 150–163 and 220) were discarded [41]. Figure 3a,b show the color composite of the Indian Pines image and the corresponding reference data with 16 reference classes covering different types of crops.

**Figure 3.** Indian Pines image: (**a**) three-band color composite image; (**b**) reference image.

The Salinas image was also acquired by the AVIRIS sensor over Salinas Valley, California. The image is of size 512 × 217 × 224 with a spatial resolution of 3.7 m per pixel. Similar to the Indian Pines image, 20 water absorption spectral bands (No. 108–112, 154–167 and 224) were removed and 16 different reference classes are considered for this image. Figure 4a,b show the color composite of the Salinas image and the corresponding reference data.

**Figure 4.** Salinas image: (**a**) three-band color composite image; (**b**) reference image.

The University of Pavia image, which captures an urban area surrounding the University of Pavia, Italy, was recorded by the ROSIS-03 sensor. The image is of size 610 × 340 × 115 with a spatial resolution of 1.3 m per pixel and a spectral coverage ranging from 0.43 to 0.86 μm. The 12 very noisy channels were discarded before the experiments, and nine information classes are considered for this image. Figure 5a,b show the color composite of the University of Pavia image and the corresponding reference data.

**Figure 5.** University of Pavia image: (**a**) three-band color composite image; (**b**) reference image.

#### *4.2. Comparison of Experiment Results*

In the experiments, all compared algorithms are based on sparse representation. Besides the published algorithms SRC, JSRC and MASR, five further methods were evaluated: JUSRC (Joint Union Sparse Representation Classification), MJSRC (Multiscale Joint Sparse Representation Classification), MJUSRC (Multiscale Joint Union Sparse Representation Classification), MURASR\* and MURASR. To further verify the superiority of the union region, JUSRC replaces the patch used in JSRC with the union region. To demonstrate the superiority of the multiscale adaptive strategy, we extended JSRC and JUSRC with a simple multiscale scheme that applies majority voting to the results of all scales for the final decision; the extended algorithms are called MJSRC and MJUSRC. Furthermore, MURASR\* is MURASR without the probability majority voting process, so the comparison between MURASR\* and MURASR shows the effect of the probability majority voting method. The parameters of the SRC, JSRC, and JUSRC algorithms were tuned to reach the best results in these experiments. For all multiscale algorithms, seven scales were adopted simultaneously, with region scales 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, and 15 × 15. The superpixel numbers for segmentation were then calculated with Equation (6) and are listed in Table 1. Other parameters in MJSRC, MJUSRC, MASR, MURASR\* and MURASR were the same as in [28]. To evaluate the performance of the classifiers, three objective metrics (overall accuracy (OA), average accuracy (AA) and the kappa coefficient) are adopted. In addition, McNemar's test is applied to analyze the experimental results. McNemar's test is based on the standardized normal test statistic, as described in [42]:

$$\mathbf{Z} = \frac{h_{12} - h_{21}}{\sqrt{h_{12} + h_{21}}} \tag{12}$$

where *h*<sub>12</sub> denotes the number of samples correctly classified by method 1 but incorrectly classified by method 2, and *h*<sub>21</sub> the converse. If |**Z**| > 1.96, the accuracy difference between the two methods can be considered statistically significant. The sign of **Z** indicates which method is better: if **Z** > 0, method 1 is more accurate than method 2.
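Equation (12) is straightforward to compute once the two disagreement counts are known; a minimal sketch:

```python
import math

def mcnemar_z(h12, h21):
    """Standardized McNemar statistic of Eq. (12).

    h12: number of samples correct under method 1 but wrong under method 2
    h21: number of samples correct under method 2 but wrong under method 1
    |Z| > 1.96 indicates a statistically significant difference at the 5%
    level; Z > 0 favors method 1.
    """
    return (h12 - h21) / math.sqrt(h12 + h21)
```

For example, with 30 samples won by method 1 and 10 won by method 2, Z ≈ 3.16 > 1.96, so the difference is significant.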

The Indian Pines data set was classified first. For each class, 10% of the labeled pixels were randomly sampled for training, while the remaining 90% were used to test the classifiers (see Table 2). The classification maps generated by the different classifiers on the Indian Pines image are shown in Figure 6, and the classification results averaged over ten runs with randomly sampled training samples are tabulated in Table 3. The results of the McNemar's tests between classifiers are listed in Table 4. It can be seen that JUSRC, MJUSRC and MURASR\* perform better than JSRC, MJSRC and MASR, which demonstrates the superiority of the union region over the patch region. In addition, the multiscale majority voting based MJSRC and MJUSRC perform worse than the multiscale adaptive strategy based MASR and MURASR\* for this image; compared with MJSRC and MJUSRC, the accuracy improvements of MASR and MURASR\* exceed 3%. MURASR achieves a better result than MURASR\* in both accuracy and classification map: as can be observed from their classification maps, many misclassifications in MURASR\* are efficiently eliminated by the probability majority voting method. Furthermore, MURASR performs best among all algorithms in terms of OA and AA, and the results of the McNemar's test are statistically significant and coherent with the obtained overall accuracies.

**Table 1.** Number of Superpixels in Each Scale.



**Table 2.** Sixteen reference classes in the Indian Pines image.

The second experiment was performed on the Salinas data set. To compare the classification with MASR, only 1% of the labeled pixels of each class were randomly selected for training, and the remaining 99% of the labeled data were classified to demonstrate the superiority of the proposed MURASR (see Table 5). The classification maps for the various classifiers are illustrated in Figure 7, and the average quantitative results of ten runs are tabulated in Table 6. The results of the McNemar's tests are shown in Table 7. As can be observed, the union region based algorithms JUSRC, MJUSRC and MURASR\* still obtain more accurate results than the patch region based JSRC, MJSRC and MASR in terms of OA, AA and kappa coefficient. The classification maps of MJSRC and MJUSRC contain more salt-and-pepper noise pixels than those of MASR and MURASR\*. Comparing the classification maps of MURASR\* and MURASR, we find that most misclassifications generated by MURASR\* are corrected by the probability majority voting method. In addition, the average accuracy of MURASR reaches 99.70%, which is very high for classification. Moreover, the McNemar's tests between classifiers are again statistically significant and coherent with the obtained overall accuracies.

The final experiment was conducted on the University of Pavia image, in which the shapes of the surface objects are more complex than in the previous two images. For each reference class, 200 training samples were randomly selected from the labeled data and the remaining pixels were used for testing the performance of the various classifiers (see Table 8). The classification maps are shown in Figure 8, and the results averaged over ten runs in terms of OA, AA, and kappa coefficient are listed in Table 9. McNemar's tests between classifiers were also conducted on this image, and the results are tabulated in Table 10. As with the previous two images, the union region based classifiers performed better than the patch region based classifiers, and the multiscale adaptive strategy again works better than the multiscale majority voting strategy. The accuracy improvement gained by probability majority voting is smaller than for the previous two images because the University of Pavia image has fewer large homogeneous regions. From Table 9, we can see that MASR yields a more accurate result than MURASR for only one class, while MURASR performs best among all classifiers for seven classes, which further proves the superiority of MURASR. The results of the McNemar's tests also support this analysis.

Compared with many existing algorithms, MASR is time-consuming. The proposed MURASR is designed based on the multiscale adaptive representation in MASR; in addition, the generation of union regions consumes some time, and a union region contains more pixels than a patch region. Therefore, MURASR is also time-consuming, with a time cost about twice that of MASR. However, the proposed MURASR was coded in MATLAB (R2016a, Mathworks, Portola Valley, CA, USA) and was not optimized for speed. MURASR could be significantly sped up by porting the code from MATLAB to C++ and adopting a general-purpose graphics processing unit (GPU).

**Figure 6.** Classification maps for the Indian Pines image by different algorithms: (**a**) SRC-Pixel-Wise; (**b**) JSRC; (**c**) JUSRC; (**d**) MJSRC; (**e**) MJUSRC; (**f**) MASR; (**g**) MURASR\*; and (**h**) MURASR.


**Table 3.** Classification accuracy (averaged on ten runs with randomly sampled training samples) of the Indian Pines image. The best results are highlighted in bold typeface.

**Table 4.** The McNemar's tests between classifiers (averaged on ten runs with randomly sampled training samples) of the Indian Pines image.


**Table 5.** Sixteen reference classes in the Salinas image.



**Table 6.** Classification accuracy (averaged on ten runs with randomly sampled training samples) of the Salinas image. The best results are highlighted in bold typeface.

**Table 7.** The McNemar's tests between classifiers (averaged on ten runs with randomly sampled training samples) of the Salinas image.


**Table 8.** Nine reference classes in the University of Pavia image.




**Figure 7.** Classification maps for the Salinas image by different algorithms: (**a**) SRC-Pixel-Wise; (**b**) JSRC; (**c**) JUSRC; (**d**) MJSRC; (**e**) MJUSRC; (**f**) MASR; (**g**) MURASR\*; and (**h**) MURASR.


**Table 10.** The McNemar's tests between classifiers (averaged on ten runs with randomly sampled training samples) of the University of Pavia image.

**Figure 8.** Classification maps for the University of Pavia image by different algorithms: (**a**) SRC-Pixel-Wise; (**b**) JSRC; (**c**) JUSRC; (**d**) MJSRC; (**e**) MJUSRC; (**f**) MASR; (**g**) MURASR\*; and (**h**) MURASR.

#### *4.3. Effects of Region Scales*

Except for pixel-wise SRC, all the compared algorithms can be affected by the number of scales. In the previously mentioned experiments, seven scales were chosen to compare the performance of all algorithms. The effect of the region scales for JSRC, MJSRC, and MASR has been presented in [28].

**Figure 9.** Effect of the region scales on single scale algorithms JSRC, JUSRC and the multiscale algorithms MJSRC, MJUSRC, MASR, MURASR\* and MURASR for the: (**a**) Indian Pine image; (**b**) Salinas image; and (**c**) University of Pavia image.

From Table 1, we can see that when the number of scales is 7, the calculated superpixel scale is already large; if the scale continued to increase, more mixed superpixels would be generated. Moreover, the classification results of MURASR on the three images are encouraging when the number of scales is 7. Therefore, the effect of scale numbers up to 7 is analyzed in this section, which means that the patch scales range from 3 × 3 to 15 × 15. Figure 9 shows the average OA over ten runs for JSRC, JUSRC, MJSRC, MJUSRC, MASR, MURASR\* and the proposed MURASR. For the multiscale algorithms, each scale represents the combination of the current scale and all smaller scales. The union region based classifiers JUSRC, MJUSRC, and MURASR\* generally outperform the corresponding patch region based JSRC, MJSRC and MASR, and the probability majority voting method improves the classification result at every region scale. In addition, the proposed MURASR consistently outperforms the other algorithms at all region scales.

#### *4.4. Effects of Training Samples Number*

The number of training samples may affect the performance of the classifiers. Therefore, the effects of different numbers of training samples on JSRC, MJSRC, JUSRC, MJUSRC, MASR, MURASR\* and the proposed MURASR were examined on the three images. For the Indian Pines image, the percentage of training samples selected from every class varies from 1% to 20%; for the Salinas image, the percentage ranges from 0.1% to 2%; and for the University of Pavia image, 60–500 training samples were selected for each reference class. The classification OA of each classifier with different numbers of training samples, averaged over ten runs, is illustrated in Figure 10. As can be observed, the union region based classifiers JUSRC, MJUSRC and MURASR\* always perform better than the corresponding patch region based JSRC, MJSRC, and MASR. Comparing the results of MURASR\* and MURASR, the improvement obtained from the probability majority voting method increases as the number of training samples decreases. Moreover, the proposed MURASR generally outperforms the other classifiers for all training sample sizes.

**Figure 10.** Effect of the number of training samples on JSRC, JUSRC, MJSRC, MJUSRC, MASR, MURASR\* and MURASR for the: (**a**) Indian Pine image; (**b**) Salinas image; and (**c**) University of Pavia image.
