**1. Introduction**

Hyperspectral sensors can provide images with hundreds of contiguous spectral bands, which has enabled a number of applications such as environmental monitoring and mineral prospecting [1–4]. Among the many research topics in hyperspectral imagery (HSI) analysis, accurate land cover classification is an important one. Supervised spectral classifiers were popular in early research, including multinomial logistic regression [5], support vector machines (SVMs) [6–8] and sparse representation classifiers [9].

During the last decade, considerable effort has been devoted to extracting more representative features from original HSI data. It is widely recognized that joint spectral and spatial information can significantly improve the performance of HSI classification methods. The Markov random field (MRF) is a powerful tool for modeling the spatial relationships among neighboring pixels. In [10,11], MRF was combined with subspace multinomial logistic regression and a Gaussian mixture model, respectively. In [12], MRF was used as a postprocessing step to refine the classification maps obtained by SVM. The morphological profile (MP) is another powerful tool for exploiting spatial contextual information. In [13], Benediktsson et al. improved the original MP and proposed the extended morphological profile (EMP) method for HSI classification. Motivated by the promising performance of EMP, two improved methods, the extended attribute profile and the extended multi-attribute profile, were proposed in [14].

Because a single kind of feature may not describe the integrated characteristics of HSI data, multiple feature fusion (MFF) was proposed and used to improve the performance of HSI classification models. MFF based methods can be roughly divided into four classes [15]: multiple kernel learning, band selection, subspace feature extraction and ensemble based methods. Li et al. constructed a series of generalized composite kernels where no weight parameters were required [16]. Gu et al. employed multiple kernel learning to combine different spectral-spatial features [17–19]. Band selection methods try to find the most discriminative hyperspectral channels while preserving their physical meanings. In [20], a discriminative sparse multimodal learning based method was proposed for multiple feature selection. In [21], spectral and spatial information were utilized simultaneously to select the representative bands. Different from band selection, subspace methods transform the original multiple features into a new low-dimensional sub-feature space. Zhang et al. introduced a patch alignment based method and a modified stochastic neighbor embedding based method for feature fusion [22,23]. In [24], a low-rank representation based feature extraction method was proposed for HSI classification, where local spatial similarity and spectral space structure were combined. In [15], Zhong et al. conducted dimension reduction on multiple features by hashing methods. Ensemble learning is another typical feature fusion method. Ensemble learning methods aim at achieving better generalization capacity by integrating different features or individual learners [25]. SVM [26,27] and random forest [28–30] based HSI classification methods were proposed in recent studies. In [31], Chen et al. improved the classification accuracy by stacked generalization of magnitude and shape feature spaces. In [32], Pan et al. combined spatial relationships at different scales via a weighted voting strategy. In addition, feature fusion methods using different data sources have also been investigated [33,34].

Recently, deep learning based methods have attracted great interest in HSI classification, e.g., [35–40]. The basic idea of these methods is to extract "deep" features from the original HSI data, and hierarchical network models are designed for this purpose. This idea is promising and encouraging. In some natural scene image classification tasks, deep learning methods have achieved results that surpass human-level performance [41]. In [35], deep learning was first used in HSI classification, where a stacked autoencoder was adopted. Subsequently, deep belief networks [42], convolutional neural networks [39,43,44] and recurrent neural networks [38] were investigated. In order to improve the computational efficiency, some simplified deep learning models were developed [36,37]. Most of these methods have also considered the spatial relationship via 3D networks or neighborhood information. However, the performance of deep learning methods is heavily dependent on abundant training samples, which are difficult to acquire for HSI data. Compared with traditional methods, deep learning methods usually require more labeled samples. For example, in [35,36], about half of all the labeled pixels were used for training. Although deep features can indeed improve the classification accuracy, more research is required on ways to extract hierarchical features outside the deep learning framework.

Inspired by the ideas of MFF and deep learning, in this paper we propose a novel hashing based hierarchical feature (H2F) extraction method for HSI classification. The motivations of H2F come from two aspects: (1) low-level features, such as spectral variations, local texture and global texture information, should be combined to produce a comprehensive feature set, which can serve as the input of the next layer; (2) based on the obtained feature set, a further feature extraction process should follow, so as to generate a hierarchical feature. This hierarchical feature should perform better than any single feature. Different from traditional MFF based methods, H2F is not a simple combination or voting of multiple features. Instead, H2F attempts to construct a more representative feature descriptor from the already extracted feature set.

Based on these two motivations, we propose a cascaded feature extraction framework with two major processes: the generation of a spectral-spatial feature set and hashing based hierarchical feature extraction. In the first process, we construct a feature set composed of spectral variations and local and global textures. In this paper, we use rolling guidance filtering (RGF) [45], local binary pattern (LBP) [46] and global Gabor filtering [47] to form the multiple features. Although many recent works have demonstrated that there is information redundancy in some popular HSI data sets [48–50], it may not be appropriate to conclude that information redundancy exists in all HSI data. Therefore, different from traditional feature fusion based methods, in this paper we do not conduct dimension reduction, so as to better preserve the distinctive classification information. All these features are collected into a feature set. In the second process, we design a hashing histogram based feature extraction strategy to give a more representative description of the HSI data. To avoid complex computation, the feature set is separated into several groups. The hashing histogram features in all the groups are concatenated as the final feature expression. It is worth noting that H2F is actually an ensemble based method, rather than a deep learning based one.

Finally, an extreme learning machine (ELM) classifier [51] is used to determine the label of each pixel. The most important reason for using ELM is to improve the computing speed. Feature fusion methods usually generate relatively high-dimensional features, and this is more apparent in H2F since dimension reduction is not adopted. ELM has a simple structure, and it can be trained very fast because its input weights are generated randomly and its output weights are obtained by a least squares solution. Furthermore, some research has shown that ELM is effective for HSI classification [46,52,53]. We compare the effectiveness and efficiency of ELM and several other classifiers in the experimental section.
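The two properties of ELM mentioned above, random input weights and a least squares solution for the output weights, can be illustrated with a minimal NumPy sketch. This is not the configuration used in this paper: the function names `train_elm` and `predict_elm`, the `tanh` activation, the hidden layer size and the toy two-class data are all assumptions made for illustration.

```python
import numpy as np

def train_elm(X, y, n_hidden=50, seed=0):
    """Fit a single-hidden-layer ELM (illustrative sketch, not the paper's setup)."""
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    # Input weights and biases are drawn at random and never updated.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                  # hidden layer outputs (assumed activation)
    T = np.eye(n_classes)[y]                # one-hot target matrix
    # Output weights come from a single least squares solve.
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy data (hypothetical): two well-separated Gaussian classes in 5 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 5)), rng.normal(2.0, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

model = train_elm(X, y, n_hidden=50)
acc = np.mean(predict_elm(model, X) == y)   # training accuracy on the toy data
```

Because training reduces to one linear solve, the cost grows mainly with the number of hidden units and samples, which is why ELM is attractive when the input features are high-dimensional.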

The major contribution of this paper is that a hashing based hierarchical feature ensemble method is developed. The ensemble strategy proposed in H2F could provide a new way to utilize multiple features.

The remainder of this paper is organized as follows. In Section 2, we give a detailed description of the proposed method. In Section 3, experiments and discussion on two popular data sets and one challenging data set are provided. We conclude this paper in Section 4.

#### **2. H2F Based Classification**

The proposed H2F based HSI classification method can be divided into three steps: (1) multiple features extraction; (2) hashing based hierarchical feature representation and (3) ELM based classification. The flowchart of the proposed method is shown in Figure 1.

#### *2.1. Multiple Features Extraction*

Research has demonstrated that joint spectral-spatial information can significantly contribute to the performance of HSI classification methods. However, it is hard to judge which feature extraction approach performs best, since each single feature has its own emphasis. In this paper, we select three disparate features that reflect different characteristics of HSI data to construct a feature set, namely, RGF (for spectra), LBP (for local texture) and Gabor (for global texture). It is worth noting that each feature will generate one or several sub-feature sets. Take the Gabor feature as an example. Suppose that four wavelengths and four orientations are used. Then, for each pixel, there will be 16 sub-features. If we set the number of features in a sub-feature set to eight, two sub-feature sets are obtained. The subsequent hierarchical feature representation operation is conducted on these subsets. Using the whole feature set directly for hierarchical feature representation is not appropriate because different types of features are heterogeneous.
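The Gabor arithmetic above (four wavelengths times four orientations gives 16 sub-features, split into two sets of eight) can be checked with a short sketch. The wavelength and orientation values, the sub-feature names and the helper `split_into_groups` are hypothetical, and grouping consecutive sub-features is an assumption, since the text does not specify how sub-features are assigned to sets.

```python
from itertools import product

def split_into_groups(features, group_size):
    """Split a list of sub-features into consecutive groups of at most group_size."""
    return [features[i:i + group_size] for i in range(0, len(features), group_size)]

# Hypothetical Gabor parameters: four wavelengths and four orientations (degrees).
wavelengths = [2, 4, 8, 16]
orientations = [0, 45, 90, 135]

# One sub-feature per (wavelength, orientation) pair.
gabor_features = [f"gabor_w{w}_o{o}" for w, o in product(wavelengths, orientations)]

# Eight sub-features per set, as in the example in the text.
groups = split_into_groups(gabor_features, 8)

print(len(gabor_features), len(groups))
```

With these settings the script prints `16 2`, matching the counts given in the text.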

**Figure 1.** The flowchart of the H2F based method.
