1. Introduction
With the development of sensor technologies, diverse information about materials on the ground can be collected from multiple image sensor sources, e.g., hyperspectral images (HSI) with spectral characteristics, light detection and ranging (LiDAR) images with shape and elevation information, thermal images with temperature information, and synthetic aperture radar (SAR) images with terrain texture information. These multi-source images make it possible to exploit the complementary and redundant information of each source, which can provide reliable image interpretation and improve the performance of intelligent processing algorithms. Although these datasets provide a wealth of information, the automatic interpretation of remote sensing data remains very challenging [1].
Hyperspectral images (HSI) have hundreds of narrow spectral bands throughout the visible and infrared portions of the electromagnetic spectrum. LiDAR is an active remote sensing method that uses pulsed laser light to measure distances (ranges). Similar to radar, coherent light pulses are transmitted, reflected by a target, and detected by a receiver. LiDAR data products usually include the LiDAR point cloud, the digital terrain model (DTM), the canopy height model (CHM), and the digital surface model (DSM). The LiDAR point cloud contains 3D coordinates, classified ground returns, above-ground-level (AGL) heights, and apparent reflectance. The CHM represents the actual height of buildings, trees, etc., without the influence of ground elevation. The DTM is the bare-earth elevation. The DSM describes the height of all objects on the surface of the earth and is the sum of the DTM and the CHM. The passive sensing of hyperspectral systems describes the spectral characteristics of the observed scenes, whereas the active sensing of LiDAR systems offers the height and shape information of the scenes. LiDAR also provides high accuracy and flexibility, since it can be operated at any time, is less sensitive to weather conditions, and has adjustable system parameters, for example, flying speed/height, scan angle, pulse rate, and scan rate, amongst others.
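As a concrete illustration of the relationship among these raster products, the following is a minimal sketch on hypothetical co-registered grids (the array values are illustrative only):

```python
import numpy as np

# Hypothetical co-registered rasters on the same grid:
# dtm holds bare-earth elevation (m); chm holds above-ground object height (m).
dtm = np.array([[10.0, 10.5],
                [11.0, 11.2]])
chm = np.array([[0.0, 3.2],
                [8.5, 0.1]])

dsm = dtm + chm       # DSM: ground elevation plus the objects standing on it
chm_back = dsm - dtm  # conversely, a CHM can be recovered as DSM minus DTM
```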
Currently, various algorithms are designed only for HSI or only for LiDAR. Several algorithms for classification, feature extraction, and segmentation have been proposed for HSI [2,3,4,5,6,7,8,9,10], while many feature extraction and detection algorithms are designed only for LiDAR [11,12,13,14,15,16]. However, it is evident that no single type of sensor is always adequate for reliable image interpretation. For instance, HSI cannot be utilized to distinguish objects composed of the same material, such as roofs and roads with the same pavement material. On the other hand, LiDAR data alone cannot be adopted to differentiate objects with the same elevation, such as roofs of the same height made of either concrete or tile [1].
In many applications, HSI and LiDAR have been used successfully in combination, such as biomass estimation [17], micro-climate modeling [18], mapping plant richness [19], and fuel type mapping [20]. Furthermore, the combined use of HSI and LiDAR results in higher classification accuracies than using either source separately. For example, the elevation and shape information in LiDAR data and the spectral information acquired by HSI have been jointly investigated [1,21,22,23,24,25,26,27,28]. These works show that LiDAR and HSI complement each other well: by adequately integrating the two data sets, the advantages of both can be fully utilized while the shortcomings of each are compensated. Their results show that the combined use of LiDAR and optical data improves classification accuracies in forests and urban areas. This line of research on the combined use of LiDAR and HSI led to the 2013 and 2018 data fusion contests organized by the IEEE Geoscience and Remote Sensing Society (GRSS).
Hyperspectral and LiDAR fusion classification methods operate at either the feature level or the decision level. In feature-level fusion, features extracted from the HSI and LiDAR data are stacked for further processing [21]. However, the stacked features may contain redundant information. With the limited number of labeled samples available in many real applications, the stacked features may suffer from the curse of dimensionality and therefore risk overfitting the training data. Therefore, many works in the literature adopt a fusion framework of stacking followed by dimension reduction [22,23,26,27,28]. W. Liao et al., Y. Gu et al., and P. Ghamisi et al. use graph-based methods [22,23,28], where constructing the graph requires considerable computation and memory. B. Rasti et al. use a sparse representation based method [26] and a total variation component analysis based method [27], in which solving the optimization problem iteratively requires much computation. In decision-level fusion, the classification results of different classifiers are merged with a majority voting strategy [24]. However, voting can lead to coarse results. In this paper, we propose an effective fusion method with low computational complexity that is suitable for real-time applications.
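For reference, a minimal sketch of the feature-level stacking step is given below, assuming per-pixel feature matrices have already been extracted from each source (the function and variable names are hypothetical):

```python
import numpy as np

def stack_features(hsi_features, lidar_features):
    """Feature-level fusion by stacking.

    hsi_features:   (n_pixels, d_hsi) array of HSI-derived features.
    lidar_features: (n_pixels, d_lidar) array of LiDAR-derived features.
    """
    def zscore(x):
        # Standardize each feature so neither source dominates the stack.
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

    # The stacked (n_pixels, d_hsi + d_lidar) features typically undergo
    # dimension reduction next, to limit redundancy and overfitting.
    return np.concatenate([zscore(hsi_features), zscore(lidar_features)], axis=1)
```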
In our fusion framework, we need to extract the spatial features of the HSI and LiDAR images. M. Dalla Mura et al. introduced the concept of the attribute profile (AP) [29] as a generalization of the morphological profile [30]. The AP extracts multi-level representations of an image through a sequential application of morphological attribute filters. To further improve the conceptual capability of the AP and the corresponding classification accuracies, extinction profiles (EPs) were proposed [31,32]. The EP has the following advantages. First, it can simultaneously remove insignificant details and preserve the geometrical characteristics of the input image. Second, it delivers better recognition performance than the traditional spatial-spectral AP feature extraction method.
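To make the profile idea concrete, the sketch below builds the simpler morphological profile that APs and EPs generalize, using openings and closings by reconstruction at increasing scales (scikit-image is an assumed dependency, and the radii are illustrative):

```python
import numpy as np
from skimage.morphology import disk, erosion, dilation, reconstruction

def opening_by_reconstruction(img, radius):
    # Erode, then geodesically reconstruct under the original image.
    seed = erosion(img, disk(radius))
    return reconstruction(seed, img, method='dilation')

def closing_by_reconstruction(img, radius):
    # Dilate, then geodesically reconstruct above the original image.
    seed = dilation(img, disk(radius))
    return reconstruction(seed, img, method='erosion')

def morphological_profile(img, radii=(1, 2, 4)):
    # Multi-level representation: openings, the image itself, and closings.
    layers = [opening_by_reconstruction(img, r) for r in radii]
    layers.append(img)
    layers += [closing_by_reconstruction(img, r) for r in radii]
    return np.stack(layers, axis=-1)  # (H, W, 2 * len(radii) + 1)
```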
For fusion, the spatial features of the HSI and LiDAR images as well as the spectral features of the HSI are stacked first. The stacked features carry the complete information. Then, to avoid high dimensionality and the risk of overfitting the training data, we need a fast and efficient method to remove redundant information from the HSI and LiDAR data. Traditional dimension reduction (DR) methods, for example, linear discriminant analysis (LDA) [33,34], local Fisher discriminant analysis (LFDA) [35], local discriminant embedding (LDE) [36], and nonparametric weighted feature extraction (NWFE) [37], are spectral-based DR methods. However, for the stacked features, vector-domain similarity is not sufficient to reveal the intrinsic relationships among samples: two samples with a small vector distance may lie at a large spatial pixel distance, and a projection based on the vector similarity metric alone may yield misleading features. A. Plaza et al. have shown that spatial contextual information is useful for increasing the classification accuracy of HSI [38]. Therefore, a spatial similarity-based method, local pixel neighborhood preserving embedding (LPNPE), is utilized [39]. LPNPE learns discriminant projections from the local spatial pixel neighborhood. However, it has the converse problem: two samples in the same spatial pixel neighborhood may have a sizeable vector distance. To solve this problem, the entropy rate superpixel (ERS) segmentation method is utilized [40]. A superpixel created by ERS contains pixels with small vector distances. Therefore, we intersect each superpixel with the defined local spatial pixel neighborhood to remove the pixels with large vector distances and obtain an ideal spatial neighborhood. Finally, the discriminant projections are learned from these ideal spatial neighborhoods. Recently, new clustering algorithms have been proposed [41,42]; they open the possibility of designing better fusion algorithms and will be discussed in future work.
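As an illustration of the neighborhood-intersection step, the sketch below assumes an ERS label map has already been computed (the window size and function name are hypothetical):

```python
import numpy as np

def ideal_neighborhood(superpixel_labels, row, col, half_window=2):
    """Intersect the local spatial window centered at (row, col) with the
    ERS superpixel containing that pixel, keeping only pixels that are
    both spatially close and spectrally consistent."""
    h, w = superpixel_labels.shape
    r0, r1 = max(0, row - half_window), min(h, row + half_window + 1)
    c0, c1 = max(0, col - half_window), min(w, col + half_window + 1)
    rows, cols = np.mgrid[r0:r1, c0:c1]
    keep = superpixel_labels[r0:r1, c0:c1] == superpixel_labels[row, col]
    return np.stack([rows[keep], cols[keep]], axis=1)  # (k, 2) pixel coords
```

The discriminant projections are then learned from these pruned neighborhoods rather than from the full spatial window.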
Our proposed fusion framework can be successfully applied to G-LiHT data. The G-LiHT airborne imager, created by the US National Aeronautics and Space Administration (NASA), is an airborne system that simultaneously collects LiDAR, hyperspectral, and thermal data [43]. The G-LiHT data contain a CHM, a DTM, a LiDAR point cloud, and a hyperspectral reflectance image of the same area at 1 m spatial resolution, providing new opportunities for the design of new hyperspectral and LiDAR fusion classification algorithms. C. Zhang et al. evaluated the G-LiHT data for mapping urban land-cover types [24]; however, they merely took advantage of the LiDAR CHM and the HSI and did not fully exploit the potential of the G-LiHT data.
In this paper, an innovative superpixel segmentation based local pixel neighborhood preserving embedding (SSLPNPE) method is proposed for the fusion of HSI and LiDAR data. In particular, the main contributions of this paper are as follows.
- (1)
This paper presents a novel fusion method, SSLPNPE. The proposed method has low computational complexity and significantly improves the classification accuracy of LiDAR and HSI fusion.
- (2)
A new workflow is proposed to calibrate the G-LiHT data. With this workflow, the proposed method can be applied in practical applications. Experimental results show that, for the G-LiHT data, the proposed SSLPNPE method is fast and effective in hyperspectral and LiDAR data fusion. The proposed workflow can be generalized to any fusion classification method and to G-LiHT data of any scene.
- (3)
This paper shows that processing the CHM and the DTM separately achieves higher classification accuracy than using the DSM. To the best of our knowledge, this is the first time in the remote sensing community that the CHM and the DTM are used separately instead of the DSM.
The structure of the paper is as follows.
Section 2 introduces the methodology.
Section 3 presents the data, the experimental setup, and the experimental results.
Section 4 provides the main concluding remarks.