*3.3. Classification Results*

Classification results for all compared methods are shown in Figures 4–6 and Tables 1–3. Since H2F is a fusion approach combining several spectral and spatial features, we deliberately chose comparison methods that use a single feature or combine two features [32,37,46,47], so as to validate the effectiveness of the proposed fusion strategy.

**Figure 4.** Classification maps by compared methods for Indian Pines data set. (**a**) the ground truth; (**b**) GE; (**c**) LGE; (**d**) EPF; (**e**) IIDF; (**f**) RVCANet; (**g**) HiFi; (**h**) H2F.

**Figure 5.** Classification maps by compared methods for KSC data set. (**a**) the ground truth; (**b**) GE; (**c**) LGE; (**d**) EPF; (**e**) IIDF; (**f**) RVCANet; (**g**) HiFi; (**h**) H2F.

**Figure 6.** Classification maps by compared methods for GRSS\_DFC\_2014 data set. (**a**) the ground truth; (**b**) GE; (**c**) LGE; (**d**) EPF; (**e**) IIDF; (**f**) RVCANet; (**g**) HiFi; (**h**) H2F.



*Remote Sens.* **2017**, *9*, 1094



**Table 3.** Classification accuracies of different methods on the GRSS\_DFC\_2014 data set (%).


#### 3.3.1. Results on Indian Pines Data Set

Experiments on this data set appear in nearly all HSI classification works, probably because it is somewhat more difficult to classify than other popular data sets such as Salinas or KSC, especially when the number of training samples is limited. Table 1 shows that the seven compared methods perform quite differently with only 20 training samples per class. H2F slightly outperforms HiFi, and achieves a 5–10% advantage over the other methods. It is worth noting that, in some classes with a large number of testing samples (such as classes 2, 3, and 11), all methods show a sharp drop in accuracy. This is because so few training samples cannot fully represent the data distribution of these classes. On the other hand, Figure 4 shows that spatial consistency is roughly preserved by every method. Since all of these methods utilize joint spatial-spectral features, Figure 4 demonstrates that spatial information is indeed beneficial to HSI classification.

#### 3.3.2. Results on KSC Data Set

It is observed in Figure 5 and Table 2 that results on this data set are much better. Although only 20 samples per class are used for training, H2F achieves above 99% overall accuracy, about a 0.4% advantage over the second-best method. Additionally, H2F reports more than 96% accuracy in every class, and performs best in nine of the 13 classes. However, since most methods achieve better than 97% OA on this data set, it is not safe to conclude which one is best. Therefore, experiments on more challenging data sets are of vital importance.

#### 3.3.3. Results on GRSS\_DFC\_2014 Data Set

Apparently, this data set is more difficult to classify. Although 20 samples per class are still used for training, the accuracies of all methods decline noticeably, as shown in Figure 6 and Table 3. The reason may be that the imaging quality of the long-wave infrared channels is relatively low. Nevertheless, H2F still outperforms the other methods by about 2%. The comparison with LGE is especially meaningful because H2F can be regarded as an improvement of LGE, in which we extract hierarchical features rather than performing a simple fusion. From Tables 1–3, we find that H2F is slightly better than LGE on all three data sets. These results indicate that the hierarchical strategy in H2F is effective.

#### *3.4. Analysis and Discussion*

Figure 7 shows the box plots of OAs by different methods. The box plot is a simple summary of a data distribution: the red line in the box denotes the median, the top and bottom of the box are the 75th and 25th percentiles, respectively, and data outside the box are mild or extreme outliers. In this paper, we ran each method 50 times, and the results of each run are summarized by the box plots. Because LGE, HiFi and H2F achieve the closest accuracies on the three data sets, we only show the box plots of these methods in Figure 7, taking OA as an example. We can see that the boxes of H2F are higher than the others on all three data sets, and the advantage is more apparent on GRSS\_DFC\_2014. Moreover, we use a paired *t*-test to further validate that the improvements by H2F are statistically significant, which is defined as follows:

$$\frac{\left(\overline{a}_1 - \overline{a}_2\right)\sqrt{n_1 + n_2 - 2}}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\left(n_1 s_1^2 + n_2 s_2^2\right)}} > t_{1-\alpha}\left[n_1 + n_2 - 2\right],\tag{12}$$

where *a*1 and *a*2 are the mean OAs of H2F and a compared method, respectively; *s*1 and *s*2 are the corresponding standard deviations; *n*1 and *n*2 are the numbers of repeated runs, both set to 50 here; and *t*1−*α* is the (1 − *α*) quantile of Student's *t*-distribution with *n*1 + *n*2 − 2 degrees of freedom. The results indicate that the improvement by H2F is statistically significant on all three data sets (at the 90% confidence level).
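The test in Equation (12) can be evaluated directly from the per-run OAs. The sketch below is an illustrative implementation, not the paper's code; the function name is ours, and SciPy's `t.ppf` is used for the quantile *t*1−*α*:

```python
import numpy as np
from scipy import stats

def improvement_test(oa_1, oa_2, alpha=0.10):
    """Two-sample t-test of Equation (12): is method 1 significantly better?

    oa_1, oa_2 -- overall accuracies from repeated runs (50 each in the paper).
    The pooled term n1*s1^2 + n2*s2^2 uses biased variances (np.var divides by n).
    """
    oa_1, oa_2 = np.asarray(oa_1), np.asarray(oa_2)
    n1, n2 = len(oa_1), len(oa_2)
    t_stat = (oa_1.mean() - oa_2.mean()) * np.sqrt(n1 + n2 - 2) / np.sqrt(
        (1.0 / n1 + 1.0 / n2) * (n1 * oa_1.var() + n2 * oa_2.var()))
    t_crit = stats.t.ppf(1.0 - alpha, n1 + n2 - 2)  # t_{1-alpha}[n1 + n2 - 2]
    return t_stat, t_crit, t_stat > t_crit
```

For example, with 50 OAs per method, `improvement_test(oa_h2f, oa_lge)` returns the test statistic, the critical value, and whether the improvement is significant at the chosen level.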

**Figure 7.** Box plots of different methods on (**a**) Indian Pines; (**b**) KSC and (**c**) GRSS\_DFC\_2014 data sets.

It is worth noting that the parameters of each single feature, such as Gabor and RGF, do not need to be tuned: H2F ensembles many groups of features, and applying different parameter settings is a natural way to generate diverse sub-features. Therefore, the most important parameters in H2F are the number of sub-feature sets *M* and the number of features *N* in each subset. Figure 8 analyzes the influence of *M* and *N*, and the results are interesting. Although *M* and *N* vary drastically, the OAs change little on the Indian Pines and KSC data sets. In Figure 8c, however, more features lead to better accuracy. The reason may be that, for the former two data sets, the multiple features already contain redundant information, so it is unnecessary to extract too many features there. This does not hold for the GRSS\_DFC\_2014 data set, where further increasing the number of features continues to improve the classification accuracy. Because GRSS\_DFC\_2014 is a long-wave infrared data set, its quality is much lower than that of the other two, and it is not appropriate to assume that it also contains redundant information. In this case, integrating more features may further enhance the feature representation; the results in Table 3 also support this view. Overall, the main point we try to emphasize with Figure 8 is that information redundancy does not exist in all HSI data. For popular data sets such as Indian Pines and KSC, redundancy probably does exist; however, it is not safe to conclude that dimension reduction always brings competitive or even better classification accuracy. This is exactly why we try to extract hierarchical features.

**Figure 8.** The influence of parameters on OA (%) in H2F. Results on (**a**) Indian Pines; (**b**) KSC and (**c**) GRSS\_DFC\_2014 data sets. *M* is the number of sub-feature sets, and *N* is the number of features in each subset.
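The role of *M* and *N* can be illustrated with a toy sketch of the ensemble structure: *M* sub-feature sets of *N* features each are generated under different parameter settings and concatenated. Here random nonlinear projections stand in for the Gabor/RGF sub-features of H2F; the generator is purely illustrative, not the H2F pipeline:

```python
import numpy as np

def build_ensemble_features(X, M, N, seed=0):
    """Concatenate M sub-feature sets of N features each (M*N total).

    Each subset uses a different random projection as a stand-in for one
    parameter setting of a Gabor/RGF-style feature extractor.
    """
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(M):
        W = rng.normal(size=(X.shape[1], N))  # one "parameter setting"
        subsets.append(np.tanh(X @ W))        # nonlinear sub-features
    return np.concatenate(subsets, axis=1)    # final M*N-dimensional feature

# Toy spectra: 100 pixels with 200 bands
X = np.random.default_rng(1).normal(size=(100, 200))
F = build_ensemble_features(X, M=5, N=20)
print(F.shape)  # 100 pixels, M*N = 100 features each
```

Sweeping a grid of (*M*, *N*) values over such a builder and recording the OA of a fixed classifier is the kind of analysis summarized in Figure 8.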

Since the deep features extracted by H2F are usually high-dimensional, popular classifiers such as SVM become time-consuming. Table 4 compares the training and testing times of ELM and SVM. To be fair and to avoid parameter tuning, a linear kernel is adopted for both; a further advantage of the linear kernel is its lower computational complexity. The OAs of ELM and SVM are also reported. Note that the running time in Table 4 covers only the classifiers' training and testing, not the feature extraction. Table 4 shows that ELM performs slightly better than SVM with lower computational cost. Because the choice of classifier is not the emphasis of H2F, we choose ELM according to the results in Table 4.

**Table 4.** The OA (%)/running time (s) by ELM and SVM.
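An ELM of the kind compared in Table 4 is a random, untrained hidden layer followed by a closed-form least-squares readout, which explains its low training cost. The sketch below is a generic single-hidden-layer ELM timed against scikit-learn's linear SVM on toy data; it is not the paper's implementation, and the hidden size and ridge value are illustrative:

```python
import time
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

class ELM:
    """Minimal Extreme Learning Machine: random hidden layer, least-squares readout."""

    def __init__(self, n_hidden=200, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Hidden weights are drawn randomly and never trained
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(y.max() + 1)[y]  # one-hot targets
        # Closed-form ridge-regularized output weights
        self.beta = np.linalg.solve(
            H.T @ H + self.ridge * np.eye(self.n_hidden), H.T @ T)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Toy data standing in for the high-dimensional H2F features
X, y = make_blobs(n_samples=600, centers=3, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)
Xtr, ytr, Xte, yte = X[:400], y[:400], X[400:], y[400:]

t0 = time.time()
elm = ELM().fit(Xtr, ytr)
elm_acc = np.mean(elm.predict(Xte) == yte)
print(f"ELM acc={elm_acc:.3f} time={time.time() - t0:.3f}s")

t0 = time.time()
svm = LinearSVC().fit(Xtr, ytr)
svm_acc = np.mean(svm.predict(Xte) == yte)
print(f"SVM acc={svm_acc:.3f} time={time.time() - t0:.3f}s")
```

Because ELM training reduces to one linear solve while SVM training is iterative, ELM's advantage grows with the feature dimension, which matches the trend reported in Table 4.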


Finally, we evaluate the influence of the number of training samples in Figure 9. Classes with only around 20 samples in total are ignored because they have little influence on OA. As in Figure 7, HiFi and LGE are used for comparison. As expected, the accuracy improves as the number of training samples increases, and H2F outperforms the others in most cases. In particular, the gaps are more apparent when training samples are limited. These results indicate that H2F provides a more representative feature expression of the original HSI data.

**Figure 9.** Influence of training samples number on (**a**) Indian Pines; (**b**) KSC and (**c**) GRSS\_DFC\_2014 data sets.
