Machine Learning and Deep Learning Techniques for Spectral Spatial Classification of Hyperspectral Images: A Comprehensive Survey

Grewal, Reaya; Singh Kasana, Singara; Kasana, Geeta

doi:10.3390/electronics12030488

Open Access Review


Reaya Grewal, Singara Singh Kasana * and Geeta Kasana

Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala 147004, India

* Author to whom correspondence should be addressed.

Electronics 2023, 12(3), 488; https://doi.org/10.3390/electronics12030488

Submission received: 17 December 2022 / Revised: 6 January 2023 / Accepted: 8 January 2023 / Published: 17 January 2023

(This article belongs to the Special Issue Deep Learning Based Techniques for Multimedia Systems)


Abstract

The growth of Hyperspectral Image (HSI) analysis has been driven by technological advances that enable cameras to capture hundreds of contiguous spectral bands for each pixel in an image. HSI classification is challenging due to the large number of redundant spectral bands, limited training samples and the non-linear relationship between spatial positions and spectral bands. Our survey highlights recent research in HSI classification using traditional Machine Learning (ML) techniques such as kernel-based learning, Support Vector Machines, Dimension Reduction and Transform-based techniques. Our study also examines Deep Learning (DL) techniques that use Autoencoders and 1D, 2D and 3D Convolutional Neural Networks to classify HSI. The comparison shows that DL-based classification techniques outperform ML-based techniques. It is also observed that spectral-spatial HSI classification outperforms pixel-by-pixel classification because it incorporates both spectral signatures and spatial domain information. The performance of ML- and DL-based classification techniques is reviewed on commonly used land cover datasets such as Indian Pines, Salinas Valley and Pavia University.

Keywords:

hyperspectral images; classification; deep learning; PSO; SVM; KNN; decision tree; PCA; DWT; ANN; CNN

1. Introduction

Remote Sensing (RS) has advanced significantly, leading to the adoption of new technologies. These enable novel data processing methods and higher-quality data with enhanced spatial and spectral resolution for a variety of research applications. HSI are high-spectral-resolution images of the electromagnetic spectrum with a large number of contiguous bands. Each material responds distinctly to incident light, and this spectral response is responsible for its colour. The availability of HSI has enriched RS research with better data quality and the capability to distinguish features by their spectral profiles, which are unique to them, as shown in Figure 1. An HSI has two spatial dimensions (Sx and Sy) and one spectral dimension (Sz), so hyperspectral data is illustrated as a 3D hyperspectral data cube in Figure 2. Spatial and spectral resolution play essential roles in various HSI applications, which has drawn researchers to develop HSI techniques in both the spatial and spectral domains.
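As a minimal illustration of the cube structure described above, the NumPy sketch below builds a random stand-in cube (the sizes are illustrative assumptions, roughly the scale of a benchmark scene, and the values are not real data). It shows that a pixel's spectral signature is a 1D slice along Sz and that pixel-wise classifiers typically flatten the cube into a pixel-by-band matrix.

```python
import numpy as np

# Illustrative sizes only; random values stand in for real reflectance data.
rows, cols, bands = 145, 145, 200
rng = np.random.default_rng(0)
cube = rng.random((rows, cols, bands))   # axes: (Sx, Sy, Sz)

# The spectral signature of a single pixel is a 1D slice along the spectral axis.
signature = cube[10, 20, :]

# Pixel-wise classifiers usually treat the cube as a matrix with one row per pixel.
pixels = cube.reshape(-1, bands)

print(signature.shape, pixels.shape)     # (200,) (21025, 200)
```

Because the reshape is row-major, pixel (r, c) ends up in row `r * cols + c` of the flattened matrix, which is how a predicted label vector is later folded back into a classification map.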

The term “classification” refers to the process of assigning individual pixels of an image to a class and producing a classification map as output. Based on the training pattern and the availability of data labels, classification techniques are broadly categorized as supervised and unsupervised. Supervised techniques classify input data using a set of representative labelled samples. Unsupervised classifiers segregate pixels on the basis of similar spectral or spatial behaviour without any prior information. HSI classification consistently faces hurdles such as high dimensionality, limited or unbalanced training samples, spectral variability and mixed pixels.
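To make the notion of a classification map concrete, the sketch below implements a deliberately simple supervised classifier, a minimum-distance (nearest class mean) rule. It is not one of the surveyed methods and is used here only because it fits in a few lines; the function name `minimum_distance_map` and the toy cube are hypothetical.

```python
import numpy as np

def minimum_distance_map(cube, train_coords, train_labels):
    """Supervised pixel-wise classification with a nearest-class-mean rule.

    cube: (rows, cols, bands) array; train_coords: list of (r, c) labelled pixels;
    train_labels: class id of each labelled pixel. Returns a (rows, cols) map.
    """
    rows, cols, bands = cube.shape
    labels = np.asarray(train_labels)
    samples = np.array([cube[r, c] for r, c in train_coords])
    classes = np.unique(labels)
    # One mean spectrum per class, estimated from the labelled samples only.
    means = np.stack([samples[labels == k].mean(axis=0) for k in classes])
    pixels = cube.reshape(-1, bands)
    # Squared Euclidean distance from every pixel to every class mean.
    dists = ((pixels[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)].reshape(rows, cols)

# Toy 4 x 4 cube with 3 bands: the left half and right half have different signatures.
cube = np.zeros((4, 4, 3))
cube[:, :2, 0] = 1.0
cube[:, 2:, 2] = 1.0
cmap = minimum_distance_map(cube, [(0, 0), (0, 3)], [0, 1])
print(cmap)  # columns 0-1 are assigned class 0, columns 2-3 class 1
```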

1.1. Advantages of HSI

Hyperspectral techniques are widely used in various research areas and applications due to their proficiency in detecting and distinguishing the characteristic features of entities using multiple contiguous spectral channels. A few advantages of HSI technology are listed below:

HSI captures finer details that distinguish objects with similar features (e.g., different plant species or tree species).
HSI delivers more detailed information than multispectral or RGB images. For example, HSI can differentiate three minerals in a region because of its higher spectral resolution, whereas multispectral or RGB imaging cannot, owing to its wider spectral bands.
Another benefit of HSI is that the data analyst does not need any prior knowledge of the sample as a full spectrum is acquired at each point and post-processing can extract all available information from the data set.
HSI can also exploit the structural relationships between the various spectra in a neighborhood, fostering more sophisticated spectral structure models for more precise analysis and classification of an image.
The rich spectral-spatial detail has been very beneficial for implementing traditional and some newer classification methods.

1.2. Applications of HSI

HSI is employed in various industrial, commercial and military applications. A few applications are listed below and illustrated in Figure 3.

Food and Safety—HSI has contributed to food quality and safety assessment. It has been used to identify defects and levels of contamination. For example, Leiva et al. [3] employed HSI to assess the firmness of blueberries and achieved an accuracy of 87%.
Medical Diagnosis—Due to its high spectral resolution, HSI captures materials sharply and highlights their chemical and physical compositions. HSI has shown excellent performance in studying and diagnosing tissues. For example, Liu, Wang and Li [4] used HSI of tongue tissues to detect tumors; the spectral signatures of the tissues played a vital role in detection.
Precision Agriculture—Manual crop monitoring is limited because visible symptoms often develop late in the disease’s progression, making it difficult to restore plant health. Advances in HSI methodologies have made crop stress assessment and the study of soil and vegetation attributes more cost-effective. For example, Liu et al. [5] used spectral signatures to estimate the yield of a wheat crop.
Environment Monitoring—HSI has also been applied to flood and water resources management. HSI provides efficient and reliable information on water quality parameters, including hydrophysical, biochemical and biological properties. For example, Kutser et al. [6] used HSI to measure chlorophyll content in water bodies.

There are many approaches to classifying an HSI. In this work, ML and DL classification techniques are reviewed and compared. ML-based image classification focuses on developing algorithms that predict and detect patterns without human intervention. Various classifiers such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Decision Trees (DT) are trained for this purpose. Several steps of data pre-processing and feature engineering are needed to extract insights from raw images and improve the performance of classification techniques. In this study, we sub-categorise traditional ML techniques into those commonly employed in recent years: kernel-based learning, SVM classification, dimension reduction and transform-based techniques. Researchers have mainly used kernel-based techniques to learn the non-linearity of HSI datasets efficiently, and have added spectral and spectral-spatial kernels as another dimension of learning to capture complex details of HSI. The SVM classifier also belongs to the family of kernel learning and has been used extensively to classify high-dimensional HSI data. With transform-based techniques, authors have been able to extract useful information from HSI datasets while suppressing noise. Classification performance improves as the number of available training samples increases; conversely, when training samples are limited, performance degrades as the spectral dimension grows. This effect is famously termed the “Hughes phenomenon.” To address this challenge, many authors have applied dimension reduction techniques prior to classification. We discuss various dimension-reduction-driven HSI classification techniques that work on spectral features.

Unlike traditional ML techniques, DL offers a dynamic approach to automatic feature learning from large raw image datasets. DL-based techniques can model complex relationships in data through numerous neural connections. DL models for HSI classification generally consist of three layers: (i) input data, (ii) construction of the deep layers and (iii) classification [7]. A general representation of DL-based HSI classification is illustrated in Figure 4.

The reviewed papers focus on how different state-of-the-art classification techniques have been applied to HSI over the previous decade. Section 2 briefly discusses existing classification techniques and states the methodology adopted for this survey. Section 3 elaborates on traditional ML techniques employed by authors, such as SVM, kernel-based methods, dimension reduction and transform-based methods. Section 4 focuses on DL techniques for spectral and spectral-spatial HSI classification. Sections 5 and 6 present the analysis of this survey, comparing the performance of ML and DL techniques for HSI. The paper concludes with challenges and the future scope of research in HSI analysis.

2. Preliminaries

This section briefly defines the HSI classification techniques utilised in the surveyed publications.

2.1. Overview of Classification Techniques

The major techniques mentioned in the paper are briefly discussed:

Kernel-based Classification Techniques—HSI data are highly non-linear and complex, which can be addressed using mathematical functions termed kernels [9]. Kernels provide a better data representation and help segregate data into separate classes, as depicted in Figure 5. Since available HSI datasets suffer from limited training samples, this approach has been viewed as stable and efficient.
Support Vector Machine-based Classification—SVM is a linear classifier grounded in optimization theory that is typically combined with kernel functions, and it is prominent in HSI classification. It finds a separating hyperplane that maximizes the margin, i.e., the distance between the hyperplane and the nearest data points of each class, as shown in Figure 6. Combined with a kernel, which implicitly maps the data into a high-dimensional feature space, SVM can separate data that are not linearly separable in the original space by learning an optimal hyperplane in that feature space. It outperforms conventional supervised classification methods, especially when the number of spectral bands is large and training samples are limited.

Figure 5. A schematic approach of Kernel-based learning to classify data.

Figure 6. A schematic approach of SVM hyperplane separating different classes [10].

Wavelet Transform-based Classification—Wavelet transform-based classification techniques have been used in many studies [11]. These techniques filter out noise and also aid in data compression. Wavelet analysis is a time-frequency method for determining the optimum frequency band for images based on their attributes. Figure 7 shows how the wavelet transform generally decomposes data into approximation data and horizontal and vertical detail data at different decomposition levels.
Dimension Reduction—The curse of dimensionality, often known as the Hughes phenomenon, poses the greatest difficulty for HSI classification. To overcome this problem, feature extraction methods are employed to reduce dimensionality by retaining the most important characteristics. Unsupervised approaches arrange pixels with comparable spectral features (means, standard deviations, etc.) into discrete clusters based on statistically derived parameters, and they require no labelled training data. Well-known unsupervised methods include Principal Component Analysis (PCA) [12] and Independent Component Analysis (ICA). Figure 8 illustrates the basic steps of PCA dimension reduction.
Deep Learning-based Classification—The vast and growing family of Neural Networks has been studied for HSI classification, as depicted in Figure 4. Recent research shows that DL has improved HSI classification accuracy. DL models have exploited spectral, spatial and joint spectral-spatial features. Unlike kernel and dimension reduction methods, DL methods extract features automatically and handle non-linearity on their own. Commonly employed DL models for HSI include Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Multilayer Perceptrons (MLPs) and Deep Belief Networks (DBNs). However, limitations persist: training samples are limited, and results can be overly optimistic when training and testing sets overlap.
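The kernel idea behind the first two techniques above can be sketched with the Gaussian RBF kernel, one common choice, and the SVM-style decision function f(x) = Σ α_i y_i k(x_i, x) + b. In this minimal NumPy sketch the dual coefficients α_i are placeholders set to one rather than values learned by an SVM solver, so it only illustrates how a kernel expansion can separate the XOR pattern, which no single straight line can. The function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

def kernel_decision(X_train, y_train, alphas, X_test, bias=0.0, gamma=0.5):
    """SVM-style decision values f(x) = sum_i alpha_i * y_i * k(x_i, x) + b."""
    K = rbf_kernel(X_train, X_test, gamma)   # shape (n_train, n_test)
    return (alphas * y_train) @ K + bias

# XOR pattern: no straight line separates the +1 points from the -1 points.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alphas = np.ones(4)  # placeholder: a trained SVM would learn these weights
print(np.sign(kernel_decision(X, y, alphas, X)))  # [ 1.  1. -1. -1.]
```

In practice the α_i and b come from an SVM solver; scikit-learn's `SVC`, for instance, stores the products α_i y_i of its support vectors in `dual_coef_`.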

2.2. Methodology Adopted for Survey

The research papers reviewed date from 2010 onwards and were drawn from reputed, leading sources such as ScienceDirect, IEEE and others.
The survey is broadly organised into two categories of techniques: traditional ML and DL.
A performance comparison of these techniques is presented at the end to help readers understand how they differ.
Land cover benchmark datasets such as Indian Pines, Pavia University, and others have been used to assess the performance of techniques.
The performance of each technique was compared on the basis of its classification accuracy.

3. Traditional Machine Learning Classification Techniques

3.1. Kernel Learning-Based Classification Techniques

Gu and Liu [13] classified HSI with limited samples using Sample Screening Multiple Kernel Learning (S2MKL) in 2016. The training samples were adaptively screened using their probability distributions, and subsets of samples were fed into different base kernels of SVM. The base kernels were combined linearly, with their optimal weights determined in the process. Instead of the original spectra, Morphological Profiles (MPs) were extracted using erosion and dilation operations. The MPs provided spatial and spectral features used in Multiple Kernel Learning (MKL) with an AdaBoost strategy. Evaluated on the Indian Pines, Salinas Valley and Pavia University datasets, the proposed method achieved better results than other state-of-the-art approaches.
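The erosion/dilation stacking behind an MP can be sketched as follows for a single band or principal component. This is a simplified assumption: MPs in the literature are usually built from openings and closings by reconstruction, and the square windows and sizes used here are arbitrary illustrative choices. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def grey_morph(img, size, op):
    """Grey-level erosion (op=np.min) or dilation (op=np.max) with a square window."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # (rows, cols, size, size)
    return op(windows, axis=(-2, -1))

def simple_morph_profile(band, sizes=(3, 5)):
    """Stack erosions (largest window first), the band itself, then dilations.

    Simplified illustration only: standard MPs use openings/closings by reconstruction.
    """
    layers = [grey_morph(band, s, np.min) for s in reversed(sizes)]
    layers.append(band)
    layers += [grey_morph(band, s, np.max) for s in sizes]
    return np.stack(layers, axis=-1)  # (rows, cols, 2 * len(sizes) + 1)

band = np.zeros((5, 5))
band[2, 2] = 1.0          # a single bright pixel
mp = simple_morph_profile(band)
print(mp.shape)           # (5, 5, 5)
```

Each pixel's MP feature vector is then the stack `mp[r, c, :]`, which shows how the pixel's value changes as structures of increasing size are suppressed or expanded.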

In 2017, Fang et al. [14] used an Extinction Profiles (EPs)-based spectral-spatial feature extraction method. Earlier researchers had stacked EPs, which did not utilise all of their information. Hence, to extract better features, the authors created a fusion framework named EPs-F that exploited the information both within and among EPs. For within-EP information, three independent components of the HSI were taken into account, and adaptive superpixel-based composite kernels were proposed that fused the kernels of various features. The classification maps obtained from different EPs reflected the information of the whole HSI from different perspectives. Hence, for information among EPs, these maps were fused to produce a final classification map. The approach achieved an Overall Accuracy (OA) of 96.12% and 99.14% on Indian Pines and Pavia University, respectively.

Li et al. [15] used composite kernels in 2018 to extract spatial and spectral features within an AdaBoost framework with a weighted Extreme Learning Machine (ELM). A Radial Basis Function (RBF) kernel was applied to spatial information and a polynomial kernel to spectral information. The AdaBoost algorithm was applied iteratively and the weights were updated accordingly. Experiments on the Indian Pines and Pavia University datasets achieved accuracies of 98.08% and 96.46%, respectively, and were compared with SVM, ELM, Kernel-SVM, Kernel-ELM, Combined Kernel-SVM and Combined Kernel-ELM. In this work, only simple spatial features such as the local standard deviation and local mean were extracted; features such as Gabor filter responses and MPs could be applied in future for better results. The classification accuracy could also be improved using other machine learning algorithms.
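A weighted-sum composite kernel of the kind described above, with an RBF kernel on spatial features and a polynomial kernel on spectral features, might look like the sketch below. The weight `mu`, the kernel parameters, the function names and the toy feature matrices are assumptions, and the AdaBoost and ELM components of the original method are not shown.

```python
import numpy as np

def rbf(A, B, gamma):
    """RBF kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def poly(A, B, degree=2, c=1.0):
    """Polynomial kernel (a . b + c)^degree between rows of A and B."""
    return (A @ B.T + c) ** degree

def composite_kernel(spec_a, spec_b, spat_a, spat_b, mu=0.5, gamma=1.0, degree=2):
    """mu * K_spatial (RBF) + (1 - mu) * K_spectral (polynomial)."""
    return mu * rbf(spat_a, spat_b, gamma) + (1.0 - mu) * poly(spec_a, spec_b, degree)

rng = np.random.default_rng(1)
spec = rng.random((6, 10))   # 6 pixels x 10 bands (toy spectral features)
spat = rng.random((6, 4))    # 6 pixels x 4 spatial features (e.g. local means)
K = composite_kernel(spec, spec, spat, spat)
print(K.shape, np.allclose(K, K.T))  # (6, 6) True
```

Because both base kernels are positive semi-definite and the weights are non-negative, the combination is itself a valid kernel, which is what allows it to be plugged directly into kernel classifiers.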

In 2019, Li, Lu and Zhang [16] proposed a combination of SVM and Multiple Kernel Boosting (MKB), which searched for the optimum combination of kernel SVMs. Initially, multiple kernel functions were prepared and an SVM was trained over the dataset for each, forming a pool of kernel SVMs. Strong SVM classifiers were repeatedly chosen from this pool and added to the final decision function, and the weights of each SVM and of the samples were updated in each iteration. The experiment was performed on plankton reflectance spectra and the Salinas Valley and Pavia University datasets. The method outperformed classifiers such as KNN, SVM, Bayes and the Sparse Representation method. In future, combining spatial-contextual constraints with the MKB framework could achieve higher accuracy.

In 2020, Li, Wang and Kong [17] proposed an adaptive kernel sparse representation method based on multiple feature learning (AKSR-MFL). Initially, multiple spatial and spectral features were extracted from different perspectives: five types of feature descriptors were taken from the original HSIs, namely spectral, Extended MP (EMP), differential MP, Local Binary Pattern (LBP) and Gabor texture features, of which the last four provided spatial information. For each test pixel, a shape-adaptive region was constructed using a shape-adaptive algorithm based on local variations of the HSI, so that the region conformed to the spatial structure and exploited contextual information. The AKSR method was designed using a kernel joint sparse pattern to address the linearly inseparable problem in the multiple feature space, grouping pixels with similar distributions. In the composite kernel, the base kernels used for the different feature descriptors were combined with optimal weights. Experiments on the Indian Pines, Pavia University and Salinas Valley datasets achieved the highest OA in comparison with various state-of-the-art approaches.

In the same year, Gao et al. [18] used a composite Spectral-Spatial Kernel for Anomaly Detection (SSCAD). It considered the non-linear characteristics of the data, unlike other detection models that worked in linear space and exploited only spectral information. With a kernel-based approach, the data are implicitly mapped into a high-dimensional feature space that handles non-linear problems well. Exploiting local homogeneity, superpixels were extracted using Entropy Rate Superpixel (ERS) segmentation to provide spatial information, which was fused with spectral information extracted directly from the images to form a composite kernel. The weights were determined adaptively using an iterative kernel learning algorithm based on Centred Kernel Alignment (CKA), which measures the cosine similarity between two centred kernels; a high CKA value indicates that two kernels are similar. The authors sought the highest possible CKA between the composite kernel and a target kernel. The detection map was built using the kernel-based Reed-Xiaoli anomaly detection algorithm, which uses the Mahalanobis distance to form decision rules that distinguish test pixels from the background. The proposed work was evaluated on real datasets acquired by the HYDICE sensor, the ROSIS sensor over Pavia Centre and the AVIRIS sensor over the San Diego area. It performed better in terms of the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC Curve (AUC) than state-of-the-art anomaly detection methods.
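CKA as described above can be written compactly: with the centring matrix H = I - (1/n)11ᵀ, the alignment of kernels K₁ and K₂ is ⟨HK₁H, HK₂H⟩_F / (‖HK₁H‖_F ‖HK₂H‖_F). The sketch below computes it with NumPy. The label-based target kernel yyᵀ in the example is a common choice in kernel alignment and is an assumption here, not necessarily the target used by Gao et al.; the function names are hypothetical.

```python
import numpy as np

def centre_kernel(K):
    """Return H K H with the centring matrix H = I - (1/n) * ones."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K1, K2):
    """Centred kernel alignment: cosine similarity of two centred kernel matrices."""
    A, B = centre_kernel(K1), centre_kernel(K2)
    return float((A * B).sum() / (np.linalg.norm(A) * np.linalg.norm(B)))

y = np.array([1.0, 1.0, -1.0, -1.0])
K_target = np.outer(y, y)                 # assumed "ideal" target kernel from labels
print(round(cka(K_target, K_target), 6))  # 1.0
```

A kernel-weight search of the kind described above would evaluate `cka(K_composite, K_target)` for candidate weightings and keep the one with the highest alignment.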

Following this, Wang et al. [19] used an MKL-based approach with SVM that involved spectral, spatial and semantic information for better HSI classification. The first three principal components (PC1-PC3) were obtained by applying PCA and were used to obtain Gabor features, an entropy rate superpixel (ERS) segmentation map and EMPs. Structural and textural features were extracted and stacked as feature vectors for each pixel using a combination of Gabor and EMP features. For uniformity in spatial characteristics, mean filtering was performed within each superpixel. For semantic information, a k-means clustering map and the ERS segmentation map were used to produce a semantic feature vector for each superpixel, with each superpixel treated as a separate document/image. Spectral features, the ERS map and a manually chosen number k of cluster centroids were the inputs used to create semantic features with a Bag of Visual Words (BOVW) model. K-means clustering was performed on the spectral features to group them into k cluster centres, which served as the visual dictionary. The number of pixels belonging to each cluster inside each superpixel was counted, giving a k × 1 histogram feature vector for each superpixel. Three individual kernels were used for the spectral, spatial and semantic information, and the final results were obtained by applying SVM with a composite kernel formed as a weighted sum of these three kernels. The work was evaluated on Indian Pines and Pavia University and obtained the highest OAs of 98.39% and 99.77%, respectively.
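The per-superpixel histogram step of the BOVW model can be sketched as follows, assuming the k-means cluster index of every pixel and the ERS superpixel label of every pixel have already been computed (both are inputs here, and the function name is hypothetical).

```python
import numpy as np

def bovw_histograms(cluster_map, superpixel_map, k):
    """k x 1 bag-of-visual-words histogram for every superpixel.

    cluster_map: (rows, cols) int array, k-means cluster index of each pixel's spectrum.
    superpixel_map: (rows, cols) int array, superpixel id of each pixel.
    Returns {superpixel_id: array of k counts}.
    """
    hists = {}
    for sp in np.unique(superpixel_map):
        words = cluster_map[superpixel_map == sp]      # visual words inside this superpixel
        hists[int(sp)] = np.bincount(words, minlength=k)
    return hists

cluster_map = np.array([[0, 1], [1, 2]])
superpixel_map = np.array([[0, 0], [1, 1]])
hists = bovw_histograms(cluster_map, superpixel_map, k=3)
print({sp: h.tolist() for sp, h in hists.items()})  # {0: [1, 1, 0], 1: [0, 1, 1]}
```

Every pixel can then inherit the histogram of its superpixel as its semantic feature vector before the semantic kernel is computed.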

HSI datasets contain mixed pixels, and purely pixel-driven classifiers such as SVM cannot deal with such overlapping data. Recently, in 2021, Ma et al. [20] overcame this using Kernel Constrained Energy Minimization (KCEM) and Kernel Linearly Constrained Minimum Variance (KLCMV) classification, with KCEM used for binary classification and KLCMV for multi-class classification. KCEM and KLCMV achieved OAs of 99.48% and 99.50%, respectively, on the Indian Pines dataset, and both achieved an OA of 99.6% on Salinas Valley, surpassing other spectral-spatial methods. The aforementioned kernel-based classification techniques are compared in Table 1.
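KCEM is the kernelized form of Constrained Energy Minimization (CEM). As a point of reference, the sketch below implements only the linear CEM filter, w = R⁻¹d / (dᵀR⁻¹d), where R is the sample correlation matrix of the pixels and d is the target signature. The small ridge term and the function name are assumptions added for numerical stability and illustration.

```python
import numpy as np

def cem_detector(pixels, target, reg=1e-6):
    """Linear Constrained Energy Minimization filter outputs w^T x for every pixel.

    pixels: (N, bands) matrix; target: (bands,) signature d.
    w = R^-1 d / (d^T R^-1 d) minimises the average output energy w^T R w
    subject to the constraint w^T d = 1.
    """
    N, bands = pixels.shape
    R = pixels.T @ pixels / N + reg * np.eye(bands)   # ridge keeps R well conditioned
    Rinv_d = np.linalg.solve(R, target)
    w = Rinv_d / (target @ Rinv_d)
    return pixels @ w

rng = np.random.default_rng(2)
background = rng.random((100, 5))
target = np.array([0.9, 0.1, 0.8, 0.2, 0.7])
scores = cem_detector(np.vstack([background, target]), target)
print(round(float(scores[-1]), 6))  # 1.0, because the constraint fixes w^T d = 1
```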

3.2. Support Vector Machine

In 2010, Dalla Mura et al. [24] addressed data redundancy and extracted useful information from raw HSI using ICA and Extended Morphological Attribute Profiles (EAPs). ICA mapped the data into a subspace where the components were as independent as possible. APs with various attributes were then applied to each of these components, followed by morphological processing and SVM classification. The methodology obtained the highest OA of 94.47% on Pavia University.

PCA has previously been used to extract useful bands from HSI, but it is a linear spectral technique applied to non-linear HSI data. In 2011, Licciardi et al. [25] classified HSI using MPs built from selected informative features obtained by non-linear PCA, with SVM and Neural Network classifiers. The results improved on those of Kernel PCA and standard PCA.
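For comparison with the non-linear variant, standard linear PCA on an HSI cube can be sketched in a few lines of NumPy: centre each band, eigendecompose the band covariance matrix and project the pixels onto the leading eigenvectors. The function name and the number of retained components are illustrative assumptions.

```python
import numpy as np

def pca_reduce(cube, n_components=3):
    """Project an HSI cube onto its first principal components (linear PCA).

    Returns the (rows, cols, n_components) PC images and their eigenvalues.
    """
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    X = X - X.mean(axis=0)                      # centre each band
    cov = X.T @ X / (X.shape[0] - 1)            # band-by-band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    pcs = (X @ eigvecs[:, order]).reshape(rows, cols, n_components)
    return pcs, eigvals[order]

rng = np.random.default_rng(3)
cube = rng.random((8, 8, 20))                   # toy cube: 8 x 8 pixels, 20 bands
pcs, ev = pca_reduce(cube, n_components=3)
print(pcs.shape)                                # (8, 8, 3)
```

The variance of each PC image equals its eigenvalue, which is why the leading components are taken as the most informative ones; PCA is linear precisely because each PC is a fixed weighted sum of the bands.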

In 2013, Dopido et al. [26] addressed the challenge of limited training samples in HSI classification. They used active learning, in which a trained expert selected unlabeled samples using spatial information derived from the labeled samples. This was further adapted into a self-learning framework in which the classifier itself chose the most informative unlabeled samples and determined their labels, reducing effort and cost. Probabilistic SVM and multinomial logistic regression were employed as classifiers.

Zhong et al. [27] used both spatial and spectral information for HSI classification in 2018. Because the SVM classifier is a pixel-based spectral technique, they used it iteratively in combination with a Gaussian filter. Initially, they combined the original image with its first PC and obtained a classification map using SVM. Spatial information was then extracted from this map through a Gaussian filter and fed back in the next iteration to be combined with the data cube being processed. The approach obtained high OAs of 98.60% and 98.68% on the Indian Pines and Pavia University datasets, respectively.

SVM is a spectral classifier that cannot deal with spectral variability on its own. Recently, in 2022, Pathak et al. [28] addressed this issue by implementing SVM with RBF, polynomial and linear kernels on spectral-spatial features of each pixel: a patch was chosen around each pixel and the corresponding spatial features were fused with the spectral values. It achieved an OA of 95.75% with the RBF kernel on the Indian Pines dataset, while for Pavia University the polynomial kernel gave the highest OA of 98.84%.
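The patch-based fusion can be sketched as below. The exact spatial features used by Pathak et al. are not specified here, so the mean spectrum of each pixel's patch is an assumed stand-in; the function name, window size and reflect padding at the image borders are likewise illustrative choices.

```python
import numpy as np

def spectral_spatial_features(cube, window=3):
    """Concatenate each pixel's spectrum with the mean spectrum of its window x window patch.

    Returns a (rows * cols, 2 * bands) matrix, one row per pixel in row-major order.
    """
    rows, cols, bands = cube.shape
    pad = window // 2
    padded = np.pad(cube, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    patch_mean = np.zeros_like(cube, dtype=float)
    for dr in range(window):                    # sum the shifted copies of the cube
        for dc in range(window):
            patch_mean += padded[dr:dr + rows, dc:dc + cols, :]
    patch_mean /= window * window
    fused = np.concatenate([cube, patch_mean], axis=-1)
    return fused.reshape(rows * cols, 2 * bands)

rng = np.random.default_rng(4)
cube = rng.random((5, 5, 4))
feats = spectral_spatial_features(cube, window=3)
print(feats.shape)  # (25, 8)
```

The resulting matrix can be passed to an SVM with any of the RBF, polynomial or linear kernels mentioned above.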

In 2022, Li et al. [29] also focused on spectral-spatial features using SVM. In pre-processing, the authors used Shape Adaptive Reconstruction, which extracted neighbours for each pixel based on the Pearson correlation within its shape-adaptive region. The probability maps for classification were obtained from SVM. In post-processing, these maps were denoised and the final classification map was obtained using Smoothed Total Variation (STV). The method achieved an OA of 92% and 95% on the Pavia University and Indian Pines datasets, respectively.

3.3. Transform-Based Techniques

In 2008, Akbari et al. [30] applied HSI analysis in surgery to detect organs. The images were compressed using the wavelet transform, and Learning Vector Quantization (LVQ) and an ANN were used for segmentation and classification. The images were post-processed using an image fill function and region growing. The experiment was performed on seven images each of different abdominal organs captured with an ImSpector HSI sensor, with the False Negative Rate (FNR) and False Positive Rate (FPR) as evaluation criteria for detection quality. The spleen was detected best, with an FNR of 1.3% and an FPR of 0.5%.

In 2014, Chen et al. [31] used the Mean Shift Algorithm (MSA) and the wavelet transform to obtain useful features and good classification results. The dimension was reduced using PCA, and the images were smoothed using MSA, which made the spectral curves of similar pixels converge. Edge information was extracted using the wavelet transform. The experiment was conducted on the Indian Pines and Changchun datasets. Compared with the Canny and LoG (Laplacian of Gaussian) operators, the method produced better results with less noisy edges.

Spectral-spatial classification of HSI was performed using wavelet transforms and morphological operations by Quesada et al. [32]. The dimensions of the HSI were reduced using spectral features extracted by wavelets, and EMPs were obtained from the dimensionally reduced images. Wavelets were also used for noise removal. The information obtained from both was combined and classified using SVM. The proposed work achieved an OA of 98.8% and an AA of 99.0% on the Pavia University dataset.
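Wavelet-based spectral dimension reduction can be illustrated with a multi-level Haar DWT applied along the spectral axis of each pixel, where every level halves the number of approximation coefficients. Haar is an assumed choice of mother wavelet and the function name is hypothetical; in practice a library such as PyWavelets would normally be used.

```python
import numpy as np

def haar_spectral_dwt(pixels, levels=1):
    """Multi-level orthonormal Haar DWT along the spectral axis of a (N, bands) matrix.

    Returns the final approximation coefficients (reduced spectral features)
    and the list of detail coefficients from each level.
    """
    approx = np.asarray(pixels, dtype=float)
    details = []
    for _ in range(levels):
        if approx.shape[1] % 2:                  # repeat the last band if the count is odd
            approx = np.concatenate([approx, approx[:, -1:]], axis=1)
        even, odd = approx[:, 0::2], approx[:, 1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return approx, details

rng = np.random.default_rng(5)
spectra = rng.random((3, 200))                  # 3 toy pixels with 200 bands
approx, details = haar_spectral_dwt(spectra, levels=2)
print(approx.shape)                             # (3, 50)
```

Because this Haar transform is orthonormal, the total energy of the approximation and detail coefficients equals that of the original spectra, so discarding the small detail coefficients is a controlled form of compression and denoising.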

In 2017, a Two-Dimensional Empirical Wavelet Transform (2D-EWT) was used by Prabhakar and Geetha [33] to select informative, non-redundant bands, and it was compared with Image Empirical Mode Decomposition (IEMD). EWT segmented the Fourier spectrum of the signal to estimate supports, which were used to build the wavelet filter banks. Filtering the signal with these banks provided frequency components, with detail and approximation coefficients, for further processing. The proposed work implemented a 2-D extension of the Littlewood-Paley transform. Sparse-based classifiers such as Subspace Pursuit (SP) and Orthogonal Matching Pursuit (OMP) were employed to classify the HSI dataset, along with SVM and Hybrid Support Vector Selection and Adaptation (HSVSA). The methodology was evaluated on the Indian Pines dataset. IEMD gave better OA but took more time than 2D-EWT. The low-frequency components of IEMD and 2D-EWT improved the kappa measure, OA and Average Accuracy (AA).

In 2019, Ji et al. [34] detected bruises on potatoes using the Discrete Wavelet Transform (DWT). Characteristic bands were selected using PCA, and the most significant PC images were chosen. The texture of the images was enhanced with histogram equalization, and the processed PC images were decomposed using DWT. Textural properties such as contrast, entropy and correlation were obtained using the Gray-Level Co-occurrence Matrix (GLCM). Features were further extracted from these data using the AdaBoost-Fisher Linear Discriminant (FLD) algorithm, and bruised potatoes were identified by AdaBoost modelling. The highest detection accuracy obtained was 99.82%.
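The GLCM texture properties named above can be computed as in the following sketch for a single pixel offset. The function name, the default offset and the assumption that the image is already quantised to integer grey levels are illustrative; only non-negative offsets are handled.

```python
import numpy as np

def glcm_features(img, levels, offset=(0, 1)):
    """Contrast, entropy and correlation from a normalised grey-level co-occurrence matrix.

    img: 2D int array with values in [0, levels); offset: non-negative (dr, dc).
    """
    dr, dc = offset
    rows, cols = img.shape
    ref = img[:rows - dr, :cols - dc]            # reference pixels
    nbr = img[dr:, dc:]                          # neighbours at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)  # count co-occurring grey-level pairs
    P /= P.sum()                                 # normalise to joint probabilities
    i, j = np.indices(P.shape)
    contrast = ((i - j) ** 2 * P).sum()
    nz = P[P > 0]
    entropy = -(nz * np.log2(nz)).sum()
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * P).sum())
    correlation = ((i - mu_i) * (j - mu_j) * P).sum() / (sd_i * sd_j)
    return contrast, entropy, correlation

img = np.tile([0, 1, 0, 1], (4, 1))              # vertical stripes, two grey levels
c, e, r = glcm_features(img, levels=2)
print(c, e, r)  # contrast 1.0, entropy about 0.918, correlation -1.0
```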

In 2021, Anand et al. [35] extracted 3D spectral-spatial features of an HSI cube simultaneously using Haar, Coiflet and Fejer-Korovkin filters. The features were fed into SVM, KNN and Random Forest classifiers, with KNN and Random Forest achieving the highest performance.

Recently, in 2022, Xu, Zhao and Liu [36] applied a 3D wavelet transform in the pre-processing step to reduce the number of learnable parameters of a CNN. It extracted both spatial and spectral features and provided a robust feature representation. The Haar wavelet was used as the mother wavelet.

In 2022, Miclea et al. [37] obtained spectral features with the aid of wavelets and concatenated them with LBP spatial features. The spectral-spatial features were fed to SVM. To prevent the data overlap that exaggerates classification results, the training and testing sets were divided through controlled sampling: training samples were selected so that they captured spectral and spatial variance, and were added through region growing with a specified window size. The aforementioned transform-based classification techniques are compared in Table 2.

3.4. Dimension Reduction-Based Techniques

High dimensionality is a challenge for HSI classification. This section discusses and analyses researchers’ contributions to improving HSI classification through dimension reduction. A general schema of dimension reduction is depicted in Figure 9.

3.4.1. Unsupervised

In 2011, Villa et al. [43] focused on removing redundant bands using Independent Component Discriminant Analysis (ICDA). Classification results were obtained using a Bayesian classifier, and the approach achieved better accuracy than SVM classification.

In 2016, Santos and Pedrini [44] performed HSI band selection using a combination of entropy filtering and K-means clustering. To increase intra-cluster similarity and inter-cluster variance, the bands were grouped using their correlations. The images were downsized using bi-cubic interpolation to reduce computation time. K-means was applied with each band treated as a sample and the Pearson correlation coefficient as the similarity measure. Then, k representative bands were selected from the grouped bands and a 2D entropy filter was applied to each band: the central pixel of each window was replaced with the computed entropy, giving a new feature vector that was submitted to an SVM with a radial kernel. The methodology obtained OAs of 97.1%, 98.3% and 97.1% on the Indian Pines, Salinas Valley and Pavia Centre datasets, respectively.
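The 2D entropy filter can be sketched as follows: each pixel is replaced by the Shannon entropy of the quantised grey levels in its neighbourhood. The window size, the number of quantisation bins and the function name are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def entropy_filter(band, window=3, bins=16):
    """Replace each pixel by the Shannon entropy (bits) of its window x window neighbourhood."""
    lo, hi = band.min(), band.max()
    # Quantise the band into `bins` grey levels so that local histograms are comparable.
    q = np.floor((band - lo) / (hi - lo + 1e-12) * bins).astype(int)
    q = np.clip(q, 0, bins - 1)
    pad = window // 2
    windows = sliding_window_view(np.pad(q, pad, mode="edge"), (window, window))
    out = np.zeros(band.shape)
    for r in range(band.shape[0]):
        for c in range(band.shape[1]):
            counts = np.bincount(windows[r, c].ravel(), minlength=bins)
            p = counts[counts > 0] / counts.sum()
            out[r, c] = -(p * np.log2(p)).sum()
    return out

checker = (np.indices((5, 5)).sum(axis=0) % 2).astype(float)  # 0/1 checkerboard band
ent = entropy_filter(checker, window=3, bins=2)
print(ent[2, 2])  # about 0.991: the 3 x 3 window holds five 0s and four 1s
```

Homogeneous regions give values near zero while textured regions and edges give high values, which is what makes the filtered bands useful as spatial features.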

In 2017, Schclar and Averbuch [45] focused on improving HSI classification using a Diffusion Bases (DB)-based methodology. The non-linear correlations among wavelengths were captured to produce a low-dimensional representation of the data, reducing the amount of noise. A modified version of the DB method was also proposed that used the eigendecomposition of symmetric matrices conjugate to the non-symmetric Markov matrix, with weight functions built from pairwise similarities between pixels. To cluster the low-dimensional data, a two-phase histogram-based segmentation method named Wavelength-Wise Global segmentation (WWG) was used. In the wavelength-wise view of an n-band HSI, the cube was considered as a collection of n images of size m × m. Clustering was performed on the basis of colour similarity. The colour-based segmentation normalised the input image and then quantized it. A frequency colour histogram was built, and a certain number of its highest peaks were detected and assumed to belong to different objects in the image, with the highest peak corresponding to the largest homogeneous area, i.e., the background. Quantized colour vectors belonging to the same peak were assumed to be part of the same coloured object. After the peaks were identified, each quantized colour vector was associated with a single peak using the Euclidean distance, and the final images were constructed. Various iterations of the proposed methodology were performed on microscopy images and on remotely sensed images of Washington DC’s National Mall. The classification results depended on the dimension of the diffusion space, whose optimal selection was yet to be studied by the authors.

In 2018, Jain et al. [46] proposed an HSI classification method in which an SVM was optimized using Self-Organizing Maps (SOM) to learn the important features. Interior and exterior (boundary) pixels were distinguished using posterior probabilities. SOM is a data compression technique in which an input signal or pattern of any dimension is mapped onto a 1D or 2D lattice through competitive learning among neurons. In their approach, the input images were converted to grayscale and Regions of Interest (ROI) were selected. The SOM algorithm was applied over these regions to group the pixels by features and intensity levels. The SOM training algorithm provided inputs and weights to each edge of the image. The Best Matching Unit (BMU) was found using Euclidean distance, and the weights of the BMU's neighbouring nodes were updated iteratively to move them closer to the input pattern. To classify interior and exterior pixels, posterior probabilities and an optimal threshold were computed. A pixel whose probability exceeded the threshold was assigned to the interior of a class; otherwise, it was assigned to the class boundary. Experiments on the Indian Pines and Pavia University datasets showed that the method outperformed the baseline methods, achieving accuracies of 85.29% and 95.46%, respectively.
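A minimal SOM training loop illustrates the BMU search and the neighbourhood update described above. The grid size, the linearly decaying learning rate and the Gaussian neighbourhood are common textbook choices assumed here. They are not parameters reported by Jain et al., and the posterior-probability thresholding stage is not shown.

```python
import numpy as np

def train_som(data, grid=(5, 5), n_iter=1000, lr0=0.5, sigma0=None, seed=0):
    """Train a 2D self-organizing map on the rows of `data`; return the weight grid."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = sigma0 or max(rows, cols) / 2
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        # Best Matching Unit: node whose weight vector is closest (Euclidean) to the input
        dist = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(dist.argmin(), dist.shape)
        # learning rate and neighbourhood radius shrink linearly over time
        frac = t / n_iter
        lr = lr0 * (1 - frac)
        sigma = sigma0 * (1 - frac) + 1e-3
        grid_d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-grid_d2 / (2 * sigma ** 2))[..., None]
        # pull the BMU and its lattice neighbours toward the input pattern
        weights += lr * h * (x - weights)
    return weights

def map_to_bmu(weights, data):
    """Index (flattened grid position) of the BMU for every row of `data`."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)
```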

Band reduction techniques can reveal non-linear properties, but at the expense of losing the original data's representation. To address this, Ahmad et al. [47] in 2019 proposed non-linear, unsupervised, non-segmented and segmented Denoising Autoencoder (UDAE)-based methods to improve HSI classification. In the segmented UDAE, the HSI cubes were segmented spatially based on pixel locations, and the autoencoder then processed the segmented HSI images spectrally. Experiments were performed on the Pavia Centre, Pavia University and Salinas Valley datasets, where the proposed methodology achieved the highest accuracy with an SVM classifier.

3.4.2. Semi-Supervised

In 2016, Romaszewski et al. [48] proposed a co-training approach based on the P-N learning scheme. The scheme was inspired by the Tracking-Learning-Detection (TLD) framework used to track objects in videos. In the P-N scheme, two independent learners, P and N, scored the unlabeled samples in different feature spaces and extended the training set.

The P-expert assumed that spatially close pixels belong to the same class, based on region growing. Its score function was estimated by Gaussian Kernel Density Estimation using the distance from known samples (seeds).

The N-expert assumed that pixels with similar spectra belong to the same class. It was defined as a Nearest Neighbour (NN) classifier with a rejection score for pixel i. It identified the n closest spectral neighbours among the seeds by computing the spectral Euclidean distance between pixel i and each seed pixel j. Its score was based on probability estimation with the distance-weighted KNN rule.

The scores from both experts were combined. Some unlabeled pixels could not be labeled by region growing because they lay in disjoint regions; these were assigned by spectral classification. The approach was applied to several datasets, including Indian Pines, Salinas Valley, University of Pavia, La Selva Biological Station and Madonna (Villelongue, France). It achieved the highest classification accuracy in comparison with various state-of-the-art approaches.
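The two experts' scoring rules can be sketched in NumPy. The N-expert is a distance-weighted KNN estimate in spectral space, and the P-expert is a Gaussian kernel density estimate over spatial distance to the seeds. The inverse-distance weights, the kernel bandwidth and the simple averaging in `combine` are assumptions, and the N-expert's rejection score is omitted.

```python
import numpy as np

def n_expert_scores(x, seeds, seed_labels, n_classes, n_neighbors=5, eps=1e-12):
    """Distance-weighted KNN class estimate for spectrum x against labelled seed spectra."""
    d = np.linalg.norm(seeds - x, axis=1)              # spectral Euclidean distances
    nn = np.argsort(d)[:n_neighbors]
    w = 1.0 / (d[nn] + eps)                             # closer seeds weigh more
    scores = np.bincount(seed_labels[nn], weights=w, minlength=n_classes)
    return scores / scores.sum()

def p_expert_scores(pos, seed_pos, seed_labels, n_classes, bandwidth=2.0):
    """Gaussian kernel density over the spatial distance to each class's seeds."""
    d2 = ((seed_pos - pos) ** 2).sum(axis=1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    scores = np.bincount(seed_labels, weights=k, minlength=n_classes)
    return scores / (scores.sum() + 1e-12)

def combine(n_scores, p_scores):
    """Hypothetical combination rule: the mean of the two experts' estimates."""
    return 0.5 * (np.asarray(n_scores) + np.asarray(p_scores))
```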

3.4.3. Supervised

In 2016, Li et al. [49] used a dual-layer supervised Mahalanobis distance kernel for HSI classification. They modified the traditional unsupervised approach with a supervised Mahalanobis matrix. This produced a new kernel that exploits relativity information among the various materials present in the images. The approach had two steps. First, a traditional Mahalanobis matrix was used to map the raw data. Then, using the mapped data, the classes that were difficult to identify were selected, and a second Mahalanobis matrix was learned from these data only. A new Mahalanobis kernel was formed by combining the two matrices. Finally, an SVM was applied to the dimensionally reduced data, achieving high performance on the Indian Pines, Salinas Valley and Pavia University datasets. The method resolved a drawback of traditional Mahalanobis distance metric learning, which learns a matrix without taking the weight of each class into account.
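A Mahalanobis kernel has the form K(x, y) = exp(-γ (x - y)^T M (x - y)), and the sketch below computes it for any positive semidefinite matrix M. The classical unsupervised choice of M is the inverse data covariance. The way Li et al. combined their two matrices is not specified here, so `compose_metrics` is a hypothetical combination. If the first metric maps the data by z = L1 x and a second metric M2 is learned on z, then the equivalent metric on x is L1^T M2 L1.

```python
import numpy as np

def mahalanobis_kernel(X, Y, M, gamma=1.0):
    """K(x, y) = exp(-gamma * (x - y)^T M (x - y)) for every row pair of X and Y.

    M should be symmetric positive semidefinite so that the exponent is a distance."""
    diff = X[:, None, :] - Y[None, :, :]                  # (n, m, d) pairwise differences
    d2 = np.einsum("nmi,ij,nmj->nm", diff, M, diff)        # squared Mahalanobis distances
    return np.exp(-gamma * d2)

def inverse_covariance_metric(X, reg=1e-6):
    """Classical (unsupervised) Mahalanobis matrix: inverse of the regularized covariance."""
    cov = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])
    return np.linalg.inv(cov)

def compose_metrics(L1, M2):
    """Hypothetical combination: M2 learned on z = L1 @ x corresponds to L1^T M2 L1 on x."""
    return L1.T @ M2 @ L1
```

The resulting Gram matrix can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.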

In 2019, Nhaila et al. [50] performed supervised classification of HSI using SVM, KNN, RF and Linear Discriminant Analysis (LDA) with different kernels, together with MI for dimension reduction. Bands were selected by computing the MI between the ground truth and each band. The band subset was initialised with the band having the highest MI with the ground truth. At each step, the average of the last selected band and a new candidate band formed a reference map called the estimated ground truth. The candidate was added to the subset if it increased the MI between the ground truth and the reference map beyond its previous value. Experiments were performed on the Indian Pines, Salinas Valley and Pavia University datasets, where SVM with an RBF kernel and RF outperformed the other learners.
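The greedy MI procedure can be sketched with histogram-based MI estimates. Several details are assumptions made for illustration: the number of histogram bins, the order in which candidates are tried (descending individual MI) and the cap `max_bands`. The reference map ("estimated ground truth") is the average of the last selected band and the candidate, as described.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram estimate of I(A; B) in bits between two arrays of equal size."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def select_bands_mi(cube, gt, max_bands, bins=16):
    """Greedy band selection driven by MI with the ground-truth map `gt`."""
    bands = [cube[..., i].ravel().astype(float) for i in range(cube.shape[-1])]
    labels = gt.ravel().astype(float)
    mi = np.array([mutual_information(x, labels, bins) for x in bands])
    order = np.argsort(mi)[::-1]                      # try candidates by individual MI
    selected = [int(order[0])]                        # start with the most informative band
    best = mi[order[0]]
    for cand in order[1:]:
        if len(selected) == max_bands:
            break
        # "estimated ground truth": average of the last selected band and the candidate
        estimate = (bands[selected[-1]] + bands[cand]) / 2
        score = mutual_information(estimate, labels, bins)
        if score > best:                              # keep the candidate only if MI increases
            selected.append(int(cand))
            best = score
    return selected
```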

The aforementioned supervised, semi-supervised and unsupervised dimension reduction-based classification techniques have been compared in Table 3.

3.4.4. Features Selection

In 2016, Qi et al. [54] used an MKB framework in which an SVM was built on a Kullback-Leibler (KL) distance-based kernel function. In this ensemble learning framework, KL-MKB used the AdaBoost strategy to learn an MKL-based classifier. The Optimum Index Factor (OIF) was employed to select informative features, since it favours bands with the most variance and the least correlation. The method performed stably and achieved a higher accuracy (85.89%) on the Indian Pines dataset than an SVM with a single fixed kernel and SimpleMKL. Its drawback was the choice of the number of kernels, which is a trade-off between efficiency and accuracy; the number was chosen between 9 and 12. Future work could use band clustering and selection, and a sparse MKL could be built for a more compact representation.
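OIF is commonly defined for a band combination as the sum of the bands' standard deviations divided by the sum of the absolute pairwise correlation coefficients. High-variance, weakly correlated combinations therefore score highest. The sketch below uses that standard definition and an exhaustive search over band triples, which is only practical for small band counts. It does not reproduce how Qi et al. applied OIF inside KL-MKB.

```python
from itertools import combinations

import numpy as np

def oif(cube, bands):
    """Optimum Index Factor: sum of band standard deviations / sum of |pairwise correlations|."""
    x = cube.reshape(-1, cube.shape[-1])[:, list(bands)].astype(float)
    std_sum = x.std(axis=0).sum()
    r = np.corrcoef(x, rowvar=False)
    corr_sum = sum(abs(r[i, j]) for i, j in combinations(range(len(bands)), 2))
    return std_sum / corr_sum

def best_oif_combination(cube, size=3):
    """Exhaustively return the band combination of the given size with the largest OIF."""
    return max(combinations(range(cube.shape[-1]), size), key=lambda c: oif(cube, c))
```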

In 2017, Yang et al. [55] also worked on representative band selection in HSI. Distances between spectral bands were computed using disjoint information, which is derived from the joint entropy and the MI of two spectral images. Bands were clustered using k-means, and K representative bands were selected from the clusters. The criterion for optimal selection minimized the distances between bands within a cluster and maximized the gap between different representative bands. With KNN and SVM classifiers, the technique outperformed various state-of-the-art techniques on the Indian Pines dataset.
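Assuming the usual definition, the disjoint information between two band images X and Y is D(X, Y) = H(X, Y) - I(X; Y). This equals H(X | Y) + H(Y | X) and is zero when each band determines the other. The histogram estimator and bin count below are illustrative assumptions. The resulting pairwise distances could then feed the band clustering step.

```python
import numpy as np

def _entropy(p):
    """Shannon entropy (bits) of a probability array, ignoring zero cells."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def disjoint_information(a, b, bins=32):
    """D(X, Y) = H(X, Y) - I(X; Y), estimated from a 2D histogram of two band images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    hx, hy, hxy = _entropy(pxy.sum(axis=1)), _entropy(pxy.sum(axis=0)), _entropy(pxy.ravel())
    mi = hx + hy - hxy                     # I(X; Y)
    return hxy - mi                        # equals H(X|Y) + H(Y|X)
```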

In 2018, Medjahed et al. [56] formulated feature selection in HSI as an optimization problem solved with a stochastic approach, namely simulated annealing. The objective function combined the classification accuracy rate with the relevance among features measured by MI. The method was compared with existing feature selection approaches such as MI Feature Selection, MI Maximization (MIM), Joint MI (JMI), Minimum Redundancy Maximum Relevance (MRMR) and Conditional MI Maximization (CMIM). On the Pavia University dataset, it achieved the highest accuracy rate, 88.75% with 10 features, in comparison with these techniques. On the same dataset, it also achieved the highest OA of 91.47% in comparison with other classifiers reported in the literature. On the Indian Pines dataset, using 10 features and 20% of the pixels for training, it obtained the highest OA of 76.48% and AA of 71.72% in comparison with SVM and a genetic algorithm.
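A generic simulated-annealing loop over fixed-size feature subsets is sketched below. The swap move, the geometric cooling schedule and the parameter values are assumptions. `score` is a placeholder: in the setting described, it would combine classification accuracy with MI-based relevance. The usage example instead uses a toy additive score so that the optimum is known.

```python
import math
import random

def anneal_feature_subset(n_features, k, score, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a k-feature subset maximizing score(subset) by simulated annealing.

    A move swaps one selected feature for an unselected one; a worse move is accepted
    with probability exp(delta / T), and the temperature T decays geometrically."""
    rng = random.Random(seed)
    current = set(rng.sample(range(n_features), k))
    cur_score = score(current)
    best, best_score = set(current), cur_score
    t = t0
    for _ in range(n_iter):
        drop = rng.choice(sorted(current))
        add = rng.choice(sorted(set(range(n_features)) - current))
        cand = (current - {drop}) | {add}
        cand_score = score(cand)
        delta = cand_score - cur_score
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = cand, cand_score
            if cur_score > best_score:                 # remember the best subset seen
                best, best_score = set(current), cur_score
        t *= cooling
    return sorted(best), best_score
```

For instance, with per-feature weights `w = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7]`, the call `anneal_feature_subset(6, 3, lambda S: sum(w[i] for i in S))` should return the subset `[1, 3, 5]`.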

In 2019, Xie et al. [57] addressed dimensionality reduction by selecting bands that were information-rich and less redundant. They used Improved Subspace Decomposition (ISD) and the Artificial Bee Colony (ABC) algorithm. The correlation coefficients between adjacent bands were calculated. Their local minima, together with visualization of the spectral curves, were used to decompose the original n bands into subspaces from which m bands were chosen. For band subset selection, k bands were randomly chosen from each band subspace. The subset was then optimized by the ABC algorithm using ISD and maximum entropy. Finally, an SVM was applied to classify the optimized band subsets. On the Pavia University, Indian Pines and Salinas Valley datasets, the method performed better than various state-of-the-art feature selection approaches.
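The subspace decomposition step can be illustrated by computing the correlation coefficients between adjacent bands and splitting the spectrum at pronounced local minima. The minimum-drop threshold replaces the visual inspection mentioned in the text and is an assumption. The uniform random draw of k bands per subspace, which would seed the ABC search (not shown), is also an assumption.

```python
import numpy as np

def adjacent_band_correlation(cube):
    """Pearson correlation between each band and the next one (length B - 1)."""
    x = cube.reshape(-1, cube.shape[-1]).astype(float)
    return np.array([np.corrcoef(x[:, i], x[:, i + 1])[0, 1] for i in range(x.shape[1] - 1)])

def split_at_local_minima(corr, min_drop=0.05):
    """Return band ranges [start, end) cut where the adjacent-band correlation has a
    local minimum lying at least `min_drop` below both neighbours."""
    cuts = [i + 1 for i in range(1, len(corr) - 1)
            if corr[i] < min(corr[i - 1], corr[i + 1]) - min_drop]
    edges = [0] + cuts + [len(corr) + 1]
    return [(edges[j], edges[j + 1]) for j in range(len(edges) - 1)]

def random_subset(subspaces, k, seed=0):
    """Draw k distinct bands uniformly from each subspace (an initial candidate subset)."""
    rng = np.random.default_rng(seed)
    return sorted(int(band) for s, e in subspaces
                  for band in rng.choice(np.arange(s, e), size=min(k, e - s), replace=False))
```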

In 2019, Sellami et al. [58] tackled the curse of dimensionality and the limited number of training samples by selecting appropriate bands. Adaptive dimension reduction was used to find relevant bands with high discrimination, high information content and low redundancy. To extract spectral-spatial information, a spatial window gathered features from neighbouring pixels. These features were fed into a semi-supervised 3-D CNN whose convolutional encoder-decoder layers performed 3-D convolution and max-pooling. The classification map was created using a linear regression classifier. Experiments were carried out on the Indian Pines, Pavia University and Salinas Valley datasets. In comparison with other recent techniques, the method attained the highest OA on all of them.

In 2019, Elzaimi et al. [59] reduced dimensionality using a filter-based approach with an information gain function. Bands were chosen based on their interaction and complementarity, and classification was performed using an SVM. The algorithm selected discriminative bands by evaluating an interaction gain that maximized the trade-off of the MI between the ground truth and the selected band. The average interaction information was used to control redundancy. The selected band subset was initialized with the band having the highest MI with the class labels, and this band served as the estimated ground truth. Candidate bands were then considered iteratively by computing their MI with the ground truth. Their information gain was calculated from the mean interaction information between the candidate band, the ground truth and the estimated ground truth. The band that maximized the information gain criterion was chosen at each step. The method was evaluated on two benchmark hyperspectral datasets, Indian Pines and Pavia University. It was compared with other band selection algorithms such as MI Feature Selection, Minimum Redundancy Maximum Relevance (MRMR) and the MI-based Filter approach (MIBF). It achieved the highest OAs of 95.25% and 96.83% on the Indian Pines and Pavia University datasets, respectively.

In 2020, Sawant et al. [60] proposed a meta-heuristic band selection method based on a Modified Cuckoo Search (MCS) algorithm. It modified the standard Cuckoo Search (CS) algorithm in two ways. First, a Chebyshev chaotic map was used to initialize the nest locations (solutions), which prevented the repeated generation of similar bands. Second, the step size and a scaling factor of the Lévy flight were updated iteratively using the fitness value and the current iteration number, and the Lévy flight generated new solutions (bands) in every iteration. Together, these modifications produced MCS and helped it escape local optima. A wrapper-based selection method was used, so the classifier evaluated accuracy in every iteration, and the global best solution was obtained at the end. The technique outperformed the standard CS algorithm, achieving a maximum OA of 95.10% on the Pavia University dataset and 86.92% on the Indian Pines dataset.
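The two MCS ingredients can be sketched as follows. The first is a Chebyshev chaotic map, x_{k+1} = cos(a · arccos x_k), whose values are mapped onto band indices to initialize the nests. The second is a source of Lévy-distributed steps generated with Mantegna's algorithm. Several settings are assumptions borrowed from common Cuckoo Search implementations: the map order a = 4, the starting value, the exponent β = 1.5 and the step form α · L · (x - x_best). The fitness- and iteration-dependent update of α used by Sawant et al. is left to the caller.

```python
import math

import numpy as np

def chebyshev_sequence(n, x0=0.7, order=4):
    """Chaotic sequence x_{k+1} = cos(order * arccos(x_k)), which stays in [-1, 1]."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = math.cos(order * math.acos(x))
        xs[i] = x
    return xs

def init_nests(n_nests, n_select, n_bands, x0=0.7):
    """Map the chaotic sequence onto band indices: one row of n_select bands per nest."""
    seq = chebyshev_sequence(n_nests * n_select, x0)
    idx = np.floor((seq + 1) / 2 * n_bands).clip(0, n_bands - 1).astype(int)
    return idx.reshape(n_nests, n_select)

def levy_step(size, beta=1.5, rng=None):
    """Lévy-distributed steps via Mantegna's algorithm."""
    if rng is None:
        rng = np.random.default_rng()
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def new_nest(nest, best, n_bands, alpha, rng=None):
    """Lévy flight relative to the best nest; alpha is the (caller-updated) step scale."""
    step = alpha * levy_step(nest.shape, rng=rng) * (nest - best)
    return np.clip(np.rint(nest + step), 0, n_bands - 1).astype(int)
```

A wrapper loop would score each nest with the classifier, keep the global best and shrink `alpha` over the iterations. Duplicate band indices within a nest would also need handling.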

To reduce the complexity arising from the numerous spectral bands, Zhu et al. [61] used an improved Affinity Propagation (AP) clustering algorithm. In this algorithm, subsets were created inside the clusters, and information entropy was incorporated into the availability matrix so that clusters of arbitrary shape could be formed. It achieved an OA of 91.5% on the Salinas Valley dataset.

The aforementioned features selection-based classification techniques have been compared in Table 4.

3.4.5. Features Extraction

In 2016, Imani et al. [64] applied Binary Coding-based Feature Extraction (BCFE) to HSI data. The spectral signature of every pixel was divided into equal segments, and in each segment features were extracted as a weighted mean of the spectral bands. BCFE used a new method for calculating these weights based on the binary codes of the class means. The weight of each spectral band was calculated from the information in the positive or negative edges and from the binary values of the class means in that band. The method employed SVM and maximum likelihood classifiers. It achieved better results than various other feature extraction techniques on the Indian Pines, Pavia University and Kennedy Space Center datasets.

In 2017, Qi et al. [65] used an MKB framework based on Particle Swarm Optimization (PSO). Features were extracted using the standard deviation, the correlation coefficient and the KL divergence, and PSO with a mutation mechanism was used to select the best SVM parameters. The method outperformed many state-of-the-art approaches, achieving OAs of 88.02% and 95.81% on the Indian Pines and Pavia University datasets, respectively. It ran faster than single-kernel methods but slower than mixture-kernel methods, so its computational cost and efficiency need improvement.

In 2018, Ksieniewicz et al. [66] proposed a novel pipeline for feature extraction and classification of HSI. Statistical properties of the images were extracted and embedded into a feature space with 14 features (channels) for dimension reduction. These features were input into an ensemble of randomized neural networks, namely Extreme Learning Machines (ELMs), built on randomized feature subspaces with a trained combiner.

To obtain the statistical features, edge detection was performed first, and a filter was then generated to remove noisy pixels. Features were then extracted from the filtered images: the red, green and blue channels and the minimum, maximum, median and mean of the spectral signature.

The ELM ensemble was formed using the Random Subspace Method (RSM). For an image with a d-dimensional feature vector in a feature space F, each base classifier was constructed on r < d features randomly selected from F. The ELMs were combined by a trained, weighted classifier combination. The continuous outputs of the ELMs for the different classes were treated as support values, i.e., the degree to which a classifier supports the claim that an object belongs to class m. The final class was chosen by the maximum rule (winner takes all). A perceptron-based combiner computed the weights assigned to each class and classifier.

Experiments were performed on the Salinas Valley, Salinas A (a subset of Salinas Valley), Indian Pines, Pavia Centre, Pavia University, Botswana and Kennedy Space Centre datasets. The method achieved higher accuracy than a single ELM and than SVMs with one-versus-all and one-versus-one strategies.
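The RSM ensemble of ELMs can be sketched compactly. Each ELM draws random hidden weights and solves for its output weights by least squares, and each ensemble member sees a random subset of r < d features. The hidden-layer size, the tanh activation and the subspace size r = d/2 are assumptions. Equal combiner weights stand in for the trained perceptron combiner.

```python
import numpy as np

class ELM:
    """Extreme Learning Machine: random hidden layer, output weights by least squares."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))   # fixed random weights
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        self.beta = np.linalg.pinv(H) @ T                            # Moore-Penrose solution
        return self

    def support(self, X):
        """Continuous per-class outputs (support values)."""
        return np.tanh(X @ self.W + self.b) @ self.beta

def rsm_elm_ensemble(X, y, n_models=10, r=None, seed=0):
    """Random Subspace Method: each ELM is trained on r < d randomly chosen features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    r = r or max(1, d // 2)
    models = []
    for m in range(n_models):
        feats = rng.choice(d, size=r, replace=False)
        models.append((feats, ELM(seed=seed + m + 1).fit(X[:, feats], y)))
    return models

def predict(models, X, weights=None):
    """Weighted sum of supports followed by the maximum rule (winner takes all)."""
    weights = np.ones(len(models)) if weights is None else weights
    total = sum(w * elm.support(X[:, f]) for w, (f, elm) in zip(weights, models))
    return models[0][1].classes_[total.argmax(axis=1)]
```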

Qiao et al. [67] combined joint bilateral filtering with Spectral Similarity-based Joint Sparse Representation Classification (SS-JSRC). Spatial features were extracted by applying a joint bilateral filter with the first PC image. Within the neighbourhood window of every pixel, SS-JSRC filtered out pixels belonging to different classes. The method gave better results than SRC, JSRC and NLW-JSRC. It achieved OAs of 98.13% on Indian Pines and 99.76% on Pavia University, higher than other state-of-the-art approaches. The work could be improved using saliency-based algorithms, weakly supervised learning and histograms of sparse codes.

In 2018, Paul et al. [68] used an MI-based S-SAE method. MI measures the dependency between bands: an MI of 0 indicates independent bands, while higher values (up to 1 when normalized) indicate stronger dependency. Non-parametric MI-based spectral segmentation was performed, and the local features of each segment were extracted using S-SAE. Morphological Profiles (MPs) of the segmented spectral features provided spatial information. Experiments used 10%, 5% and 10% of the samples of each class for training on the Indian Pines, Pavia University and Botswana datasets, respectively. An SVM with a Gaussian kernel performed better on the Pavia University and Botswana datasets, whereas Random Forest performed better on Indian Pines. The method overcame the time-consuming and complex nature of SAE-based feature extraction and performed well even with a limited number of samples. In future, other non-linear feature extraction methods such as kernel PCA could be combined with the method, and DL models could be incorporated for spectral-spatial classification.

The comparative study of aforementioned features extraction-based classification techniques is presented in Table 5.

4. Deep Learning-Based Classification

DL-based classification techniques have recently been explored extensively. This research is discussed below, and a general DL mechanism is depicted in Figure 10.

In 2010, Ratle et al. [73] proposed a semi-supervised neural network framework to deal with the limited number of HSI samples. They added a flexible embedding regularizer to the loss function and additional balancing constraints to Stochastic Gradient Descent (SGD) to avoid the local minima problem. Compared with methods such as k-means, Laplacian SVM and Transductive SVM, their approach had better accuracy and scalability.

In 2013, Lin et al. [74] used autoencoders of different depths to classify HSI. PCA was used for spectral dimension reduction, while autoencoders extracted spatial features. A single-layer autoencoder extracted spectral features, which were classified using an SVM. For deep representations, a stacked autoencoder was used with logistic regression classification, and PCA was introduced in the second layer for dimension reduction. Experiments on the Kennedy Space Centre and Pavia University datasets recorded low error rates.

In 2015, Yue et al. [75] utilised both spatial and spectral features in a novel DL framework that combined Deep CNNs (DCNNs), PCA and Logistic Regression (LR). Spectral feature maps were generated using a mathematical algorithm, and PCA was applied to them to produce joint spectral-spatial information. DCNNs and LR were then applied to this joint information to extract high-level features and fine-tune the model. They achieved the highest OA of 95.18% on Pavia University.

In 2015, Hu et al. [76], inspired by the application of CNNs to 2D images, applied CNNs to the spectral domain of HSI. They used a five-layer 1-D CNN with input, convolution, max-pooling, fully connected and output layers, which helped discriminate each spectral signature from the others. This five-layer architecture achieved better accuracy than a traditional SVM, a 2-layer neural network and the LeNet-5 architecture.

In 2015, Chan et al. [77] proposed a DL network, called PCANet, built from basic processing components. It used cascaded PCA to learn multistage filter banks, followed by binary hashing and blockwise histograms for indexing and pooling. It was applied to benchmark visual datasets for digit and face recognition. There, PCANet served as an effective baseline against which more advanced processing components or more sophisticated architectures could be justified.
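One PCANet stage can be sketched as follows. The filters are the leading principal components of mean-removed k × k patches. The filter responses are binarized and packed into integer codes, and the codes are pooled by blockwise histograms. This single-stage illustration assumes the patch size, the filter count and non-overlapping blocks; the original PCANet cascades two PCA stages before hashing.

```python
import numpy as np

def pca_filters(images, k=3, n_filters=4):
    """Leading principal components of all mean-removed k x k patches, as k x k filters."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())                 # remove the patch mean
    X = np.array(patches).T                                  # (k*k, n_patches)
    _, eigvecs = np.linalg.eigh(X @ X.T)                     # ascending eigenvalues
    return eigvecs[:, ::-1][:, :n_filters].T.reshape(n_filters, k, k)

def filter_responses(img, filters):
    """Valid 2D correlation of one image with every filter: (L, H-k+1, W-k+1)."""
    k = filters.shape[1]
    h, w = img.shape
    out = np.zeros((len(filters), h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out[:, i, j] = (filters * img[i:i + k, j:j + k]).sum(axis=(1, 2))
    return out

def binary_hash(maps):
    """Binarize the L responses and pack them into one integer code in [0, 2^L) per pixel."""
    bits = (np.asarray(maps) > 0).astype(int)
    weights = 2 ** np.arange(bits.shape[0])[::-1]
    return np.tensordot(weights, bits, axes=1)

def block_histograms(codes, n_filters, block=4):
    """Concatenated histograms of codes over non-overlapping block x block regions."""
    h, w = codes.shape
    feats = [np.bincount(codes[i:i + block, j:j + block].ravel(), minlength=2 ** n_filters)
             for i in range(0, h - block + 1, block) for j in range(0, w - block + 1, block)]
    return np.concatenate(feats)
```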

DL has been used extensively for HSI analysis and classification, but it needs high-quality labelled samples to be used efficiently. In 2016, Liu et al. [78] tackled this challenge by developing an active learning algorithm based on weighted incremental dictionary learning. Only training samples that improved the selection criteria, namely uncertainty and representativeness, were selected. In this way, the deep network learned which samples to select at each training iteration and how to select them. Their approach achieved accuracies of 92.4% and 91.6% on the Pavia University and Botswana datasets, respectively.

In 2016, Chen et al. [79] addressed the challenges of limited training samples and high dimensionality using a regularized deep feature extraction method. To obtain better spectral-spatial features, they employed a 3D CNN, and they applied L2 regularization and dropout to overcome overfitting. They further improved CNN performance using virtual samples, generated by multiplying training samples by a random factor and adding noise. Their work achieved OAs of 97.56%, 99.54% and 96.31% on the Indian Pines, Pavia University and Kennedy Space Centre datasets, respectively. Future work could apply a post-processing method to improve the classification further.
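Virtual-sample augmentation as described, x' = αx + n with the label of x kept, takes only a few lines. The range of the random factor α and the noise level are hypothetical values chosen for illustration.

```python
import numpy as np

def virtual_samples(X, y, n_new, scale_range=(0.9, 1.1), noise_std=0.01, seed=0):
    """Create virtual samples x' = alpha * x + noise, each keeping the label of its source x."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=n_new)                       # source samples
    alpha = rng.uniform(*scale_range, size=(n_new,) + (1,) * (X.ndim - 1))
    noise = rng.normal(0.0, noise_std, size=(n_new,) + X.shape[1:])
    return alpha * X[idx] + noise, y[idx]
```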

In 2016, Zabalza et al. [80] performed dimension reduction and feature extraction using Segmented Stacked Autoencoders (S-SAE). S-SAE performs spectral segmentation of the pixels. The original features are divided into smaller segments, and smaller, local SAEs process each segment of the spectrum separately. This greatly reduced complexity and achieved better accuracy in segmentation and classification of the Indian Pines and Pavia Centre scenes. The major drawback was that spatial features were not extracted. The work could be extended using saliency detection, adaptive sparse representation and weakly supervised learning.

To deal with highly correlated bands and limited samples, Yu et al. [81] proposed a CNN in 2017 that processed raw HSI input in an end-to-end manner. They optimized the CNN parameters with small training datasets while addressing the problem of overfitting, and they adopted 1 × 1 convolutional layers to handle the HSI information. With only 3 labelled training samples per class, their approach obtained OAs of 64.19% on Indian Pines, 67.85% on Pavia University and 85.4% on Salinas Valley.
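A 1 × 1 convolution applies the same linear map across channels at every pixel, which is why it suits the spectral dimension of HSI. The sketch below uses arbitrary example shapes. It shows that an einsum-based 1 × 1 convolution matches an explicit per-pixel matrix product.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1 x 1 convolution on an (H, W, C_in) map: one C_in -> C_out linear map per pixel."""
    return np.einsum("hwc,cd->hwd", x, weight) + bias

def conv1x1_loop(x, weight, bias):
    """Reference version: explicit matrix-vector product at every pixel."""
    h, w, _ = x.shape
    out = np.empty((h, w, weight.shape[1]))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i, j] @ weight + bias
    return out
```

For example, a `weight` of shape (200, 32) compresses a 200-band pixel into 32 learned spectral features without mixing neighbouring pixels.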

CNNs have many parameters, so their convolution filters require many training samples. With the limited samples available for HSI, overfitting occurs and gives over-optimistic results. To address these concerns, Chen et al. [82] reduced the feature extraction burden on the CNN by using Gabor filters, which extract spatial, edge and textural information. They combined the convolution filters with the Gabor filters and used grid search to find the parameters of the Gabor filters. Compared with traditional methods such as SVM, CNN with PCA and a simple CNN, their approach achieved the highest OA and AA.
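A bank of real-valued Gabor filters, of the kind that can be combined with learned convolution filters, can be generated as below. The kernel size, σ, wavelength, aspect ratio γ and number of orientations are placeholder values; in the described approach, such parameters were found by grid search. The kernel size is assumed to be odd.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter on an odd size x size grid centred at the origin."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)           # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / lambd + psi)

def gabor_bank(size=7, sigma=2.0, lambd=4.0, n_orient=4):
    """Filters at n_orient evenly spaced orientations in [0, pi)."""
    thetas = np.arange(n_orient) * np.pi / n_orient
    return np.stack([gabor_kernel(size, sigma, t, lambd) for t in thetas])
```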

Yunsong et al. [83] used a deep CNN to reconstruct images and enhance their spatial features. Each band was normalized to the range [0, 1]. The spatial features of different classes with similar characteristics were enhanced while avoiding spectral distortion. PCA was performed to extract PC images, and the first PC image was chosen as the reference image because it contains the most spatial information. The Gray-Level Co-occurrence Matrix (GLCM) was used to extract spatial features such as entropy, contrast, correlation and dissimilarity. The GLCM features of each band were compared with those of the first PC in the form of a ratio, and the band with the minimum ratio was selected as the training label. A CNN with optimized parameters was used to train on the data, and an ELM was used for the final classification. This combined framework performed well with fewer training samples on the Indian Pines, Salinas Valley and Pavia Centre datasets. Image reconstruction increased the AA of the ELM by as much as 30.04%, and the method was faster than other state-of-the-art classifiers.
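The GLCM texture features named above can be computed with a small NumPy routine. The number of gray levels, the single non-negative offset and the symmetric counting are assumptions. The band-to-first-PC ratio used for label selection is not reproduced because its exact form is not given.

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for a non-negative offset (dy, dx)."""
    lo, hi = img.min(), img.max()
    q = np.floor((img - lo) / (hi - lo + 1e-12) * levels).clip(0, levels - 1).astype(int)
    h, w = q.shape
    a = q[0:h - dy, 0:w - dx].ravel()                   # reference pixels
    b = q[dy:h, dx:w].ravel()                           # neighbours at the offset
    P = np.zeros((levels, levels))
    np.add.at(P, (a, b), 1)
    P = P + P.T                                         # count both directions
    return P / P.sum()

def glcm_features(P):
    """Contrast, dissimilarity, entropy (bits) and correlation of a normalized GLCM."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * P).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * P).sum())
    nz = P[P > 0]
    return {
        "contrast": float(((i - j) ** 2 * P).sum()),
        "dissimilarity": float((np.abs(i - j) * P).sum()),
        "entropy": float(-(nz * np.log2(nz)).sum()),
        "correlation": float((((i - mu_i) * (j - mu_j)) * P).sum() / (sd_i * sd_j + 1e-12)),
    }
```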

Although earlier 1-D CNNs performed well, they lose information when representing HSI pixels, which are inherently sequential data. Hence, Mou et al. [84] analysed the pixels using a deep Recurrent Neural Network (RNN). Instead of commonly used activation functions such as tanh or the rectified linear unit, the network used a Parametric Rectified tanh (PRetanh). This approach captured band-to-band variability and spectral correlation well. It also allowed training with high learning rates without risk of divergence. The authors reduced the number of parameters by building their network from Gated Recurrent Units (GRUs), which used PRetanh for the hidden representation and processed HSI efficiently. Their approach outperformed traditional methods such as SVM, RF and CNN.

In 2018, Zhang et al. [85] used a CNN framework that encoded context-aware semantic information. Their approach gained discriminative power from diverse region-based inputs. The model had multiple CNN branches, each representing a different region around the pixel under inspection. Unlike the traditional square window around a pixel, they extracted six regions with flexible patch shapes: right, left, top, bottom, whole and local. They also extracted deep spectral-spatial features using a multi-scale summation module, which dealt with limited training samples, enhanced learning capability and improved generalization. Accuracies of 98.54% and 98.33% were recorded on Indian Pines and Salinas Valley, respectively.

Although many earlier methods learned joint spectral-spatial feature representations from limited samples, they were not very generic or robust. Deng et al. [86] built a unified deep network combined with Active Transfer Learning (ATL). They first extracted joint spectral-spatial features using Stacked Sparse Autoencoders (SSAE). Using ATL, they transferred the pre-trained SSAE network and limited training samples from a source domain to a target domain. The SSAE network was then fine-tuned with active learning strategies, using limited samples from both the source and target domains. They obtained the highest OAs of 99.61% and 99.86% when transferring from the Pavia University dataset to the Pavia Centre dataset and vice versa, respectively.

Fusing spectral-spatial information improves HSI classification. Taking advantage of this, Liang et al. [87] extracted deep multi-scale spectral-spatial features from HSI in a framework named DMSF. They transferred the filter banks of a VGG16 model to learn the spatial structure of HSI. They then used sparse autoencoders to fuse these deep spatial features with the raw spectral information. The final discriminative features were obtained by a weighted fusion of the spectral-spatial features from VGG16. These features were classified using an SVM and obtained high accuracy.

Wang et al. [88] focused on improving the training time and accuracy of HSI classification. Traditional methods relied on hand-crafted features and needed improved accuracy. To address this, the authors developed an end-to-end Fast Dense Spectral-Spatial Convolution (FDSSC) framework that did not rely on PCA or any other feature extractor. FDSSC used "valid" convolutions of different sizes to extract spectral-spatial features and reduce dimensions. For highly accurate results, it used densely connected layers in which every preceding layer contributed to each subsequent layer. The authors used a dynamic learning rate, Parametric Rectified Linear Unit (PReLU) activation, batch normalisation and dropout to increase speed and reduce overfitting. This allowed them to achieve high performance within 80 epochs.

In 2018, Yang et al. [89] exploited the success of CNNs in HSI classification. Using both spectral and spatial information, they built 2D CNN, 3D CNN, Recurrent 2D CNN (R-2D CNN) and R-3D CNN models. These models converged faster than traditional methods such as CNN and SVM. Although the models were superior, they needed more training samples than other methods. Incorporating prior domain knowledge of the dataset and transfer learning could improve performance further.

Pan et al. [90] used PCANet as the foundation of a multi-grained network, MugNet, which integrated multi-grain and semi-supervised information. MugNet was a simplified DL model designed for small training sets. Each grain contained a DL model, and the classification results were obtained through an ensemble.

MugNet used three strategies to enhance classification accuracy. First, multi-grained scanning exploited the spectral relationships between bands and the spatial correlation among neighbouring pixels to extract joint spectral-spatial information. Second, the convolutional kernels were generated in a semi-supervised manner. Third, the model had no hyperparameters to tune.

MugNet had two parallel branches, spectral MugNet and spatial MugNet. Both were based on the Semi-Supervised PCANet (SSPCANet), which had four layers: an input layer, two convolutional layers and an output layer. SSPCANet used the unlabeled pixels to obtain more representative convolutional kernels, while the labeled pixels were used to train an SVM classifier. In comparison with other state-of-the-art approaches, MugNet obtained the highest OAs of 90.65%, 90.82% and 93.15% on the Indian Pines, Grss_dfc_2013 and Grss_dfc_2014 datasets, respectively. Its computational efficiency still needs improvement, and in future MugNet could be made fully end-to-end.

Paoletti et al. [91] proposed a 3D CNN architecture to obtain spectral and spatial features of HSI and implemented the classification on Graphics Processing Units (GPUs). A border mirroring strategy was applied to process the border areas of the image. The image was divided into patches of size d × d × n, where d is the width and height of the neighbourhood window centred at a pixel and n is the number of spectral bands of the original image. The outermost d/2 border pixels were mirrored outwards so that border pixels could be processed like any other pixel. The 3D patches were grouped into batches and sent to the convolutional layers. Four fully connected layers were used, with cross-entropy as the loss function. Experiments were run on the Indian Pines and Pavia University datasets with various values of d. For different values of d, the method achieved the highest accuracy in comparison with 1D, 2D and 3D CNNs and a Multi-Layer Perceptron (MLP). However, the classification accuracy depended on the manual selection of parameters.
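The border mirroring strategy can be sketched with NumPy's `np.pad`. The exact mirroring variant is not stated, so the choice of `mode="symmetric"` (which repeats the edge pixel) is an assumption. d is required to be odd so that each patch has a centre pixel.

```python
import numpy as np

def mirrored_patches(cube, d):
    """Extract a d x d x n patch centred on every pixel of an (H, W, n) cube.

    Borders are mirrored outward by d // 2 pixels so edge pixels get full neighbourhoods."""
    if d % 2 == 0:
        raise ValueError("d must be odd so that each patch has a centre pixel")
    r = d // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="symmetric")
    h, w, n = cube.shape
    patches = np.empty((h * w, d, d, n), dtype=cube.dtype)
    for i in range(h):
        for j in range(w):
            patches[i * w + j] = padded[i:i + d, j:j + d, :]
    return patches
```

The resulting array can be split along its first axis into batches for the 3D convolutional layers.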

In 2018, Chen et al. [92] proposed a joint spatial-spectral feature-driven HSI classification method. Convolutional layers merged the spatial and spectral features of image blocks containing local neighbourhood information, and the results were obtained from the fully connected layer. The method outperformed other state-of-the-art approaches. The network was also combined with an SVM (RBF kernel) at some of the fully connected layers, and an adaptive mechanism was proposed for selecting the spatial window size.

The first convolution layer was a multi-scale feature extraction layer that extracted features invariant to deformation and scaling. The second, a feature fusion layer, merged the spatial and spectral features, and it was followed by a feature reduction convolution layer. The network obtained an OA of 98.02% on the Indian Pines dataset, higher than other approaches. Combined with the SVM, it achieved the highest accuracies of 98.39% and 98.44% on the Indian Pines and Pavia University datasets, respectively.

The adaptive window size was selected using a confidence criterion, where Conf(k) denotes the probability that the input pattern is classified into the kth class. The algorithm worked as follows. Two window sizes, A × A and B × B with A > B, were chosen. With the A × A window, let m be the most probable class and n the second most probable class. If Conf(n) < Conf(m) × θ, the output was class m. Otherwise, the smaller B × B window gave the more confident result, and the input block was classified into class m′. Adaptive window size selection overcame a problem of large windows, which may contain many intersecting categories and confuse the network. It significantly improved the classification accuracy for HSI.
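The window-selection rule can be written as a short function. The inputs are hypothetical: Conf is assumed to be a per-class confidence vector produced by the network for each window, and the threshold θ is assumed to be less than 1. Taking m′ as the most probable class under the smaller window is an interpretation of the rule.

```python
def adaptive_window_class(conf_large, conf_small, theta):
    """Pick a class from per-class confidences computed with A x A (large) and B x B (small) windows."""
    ranked = sorted(range(len(conf_large)), key=lambda k: conf_large[k], reverse=True)
    m, n = ranked[0], ranked[1]            # most and second most probable classes for A x A
    if conf_large[n] < conf_large[m] * theta:
        return m                           # the large window is confident enough
    # otherwise trust the smaller window: class m' is its most probable class
    return max(range(len(conf_small)), key=lambda k: conf_small[k])
```

For example, `adaptive_window_class([0.7, 0.2, 0.1], [0.1, 0.8, 0.1], 0.5)` keeps class 0 because 0.2 < 0.35. With large-window confidences [0.45, 0.4, 0.15], the same call falls back to the small window and returns class 1.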

Earlier classification techniques did not extract HSI features effectively. To address this concern, Singh and Kasana [93] used deep features to classify HSI. The authors first reduced the dimension using Locality Preserving Projection (LPP) to suppress data redundancy. The processed data was then forwarded to a Stacked Auto Encoder (SAE) for deep feature extraction, and logistic regression was used for classification. Their work achieved an OA of 84.4% and 87.2% on Indian Pines and Salinas Valley, respectively.
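The LPP step can be illustrated with a compact dense implementation of the standard formulation: build a k-nearest-neighbour graph with heat-kernel weights, then solve the generalised eigenproblem Xᵀ L X a = λ Xᵀ D X a and keep the eigenvectors with the smallest eigenvalues. This is a sketch only; the neighbourhood size `k`, the heat-kernel width heuristic and the small ridge term `reg` are assumptions, and the SAE and logistic regression stages are not shown.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_components=10, k=5, t=None, reg=1e-6):
    """Locality Preserving Projection.

    X: (n_samples, n_features) pixel spectra. Returns an (n_features, n_components)
    projection matrix P; the reduced data is X @ P.
    """
    n, p = X.shape
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    np.fill_diagonal(d2, np.inf)                                     # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :k]                             # k nearest neighbours
    rows, cols = np.repeat(np.arange(n), k), nbrs.ravel()
    if t is None:
        t = np.mean(d2[rows, cols])                                  # heat-kernel width (heuristic)
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / t)                      # heat-kernel edge weights
    W = np.maximum(W, W.T)                                           # symmetric adjacency graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                                        # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(p)                                # ridge keeps B positive definite
    _, vecs = eigh(A, B)                                             # eigenvalues in ascending order
    return vecs[:, :n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))        # 300 toy pixels with 50 bands
P = lpp(X, n_components=10)
Z = X @ P                             # reduced features, e.g. the input to an SAE
```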

In 2019, Zhou et al. [94] used the spectral-spatial LSTM networks shown in Figure 11 for the classification of HSI. The spectral values of each pixel in all the channels were fed into the Spectral LSTM (SeLSTM). First, the pixel vector with K bands was transformed into a K-length sequence. This sequence was fed element by element into the SeLSTM, and the last output was fed to the SVM. For the spatial LSTM (SaLSTM), local patches of the first principal component (PC) image centred at each pixel were taken, and the row vectors of each patch were fed one by one into the SaLSTM; that is, the rows of the neighbourhood were converted into an S-length sequence. Figure 12 and Figure 13 display the structures of SeLSTM and SaLSTM, respectively. For classification, spectral and spatial features were obtained separately for each pixel, and a decision fusion strategy was adopted: for joint spectral-spatial classification (SSLSTM), the results of the individual LSTMs were fused by weighted summation. The performance of SeLSTM, SaLSTM and SSLSTM was compared with several methods, including PCA, LDA, non-parametric weighted feature extraction (NWFE), regularized local discriminant embedding (RLDE), matrix-based discriminant analysis (MDA) and CNN, and their method improved the classification accuracy by at least 2.69%, 1.53% and 1.08% on the Indian Pines, Pavia University and Kennedy Space Center datasets, respectively.
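The sketch below shows how the SeLSTM and SaLSTM input sequences can be formed and how the two decisions can be fused by weighted summation. The LSTMs themselves are omitted, the fusion weight `w` is an assumed parameter, and a random image stands in for the first PC image.

```python
import numpy as np

def spectral_sequence(cube, r, c):
    """SeLSTM input: the K band values of pixel (r, c) as a K-step sequence of scalars."""
    return cube[r, c, :].reshape(-1, 1)              # shape (K, 1)

def spatial_sequence(pc1, r, c, S):
    """SaLSTM input: the S rows of the S x S patch of the first PC image centred on (r, c)."""
    half = S // 2
    padded = np.pad(pc1, half, mode="reflect")
    return padded[r:r + S, c:c + S]                  # S time steps, each a row vector of length S

def fuse_decisions(p_spectral, p_spatial, w=0.5):
    """Decision fusion: weighted sum of the two networks' class-probability vectors."""
    p = w * p_spectral + (1.0 - w) * p_spatial
    return int(np.argmax(p)), p

rng = np.random.default_rng(0)
cube = rng.random((20, 20, 103))                     # toy cube with K = 103 bands
pc1 = rng.random((20, 20))                           # stand-in for the first PC image
seq_spec = spectral_sequence(cube, 5, 5)
seq_spa = spatial_sequence(pc1, 5, 5, S=9)
label, fused = fuse_decisions(np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.7, 0.1]), w=0.5)
```

With equal weights the fused vector is [0.4, 0.5, 0.1], so the spatial network's stronger vote for class 1 decides the label.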

In 2019, Fang et al. [95] also extracted deep spectral-spatial features at different patch scales using 3D dilated convolutions, with all the feature maps densely connected to each other. To obtain more discriminative and less redundant spectral features, the authors also built a spectral-wise attention mechanism (SA) that assigned soft weights to the features. It achieved an OA of 86.62% on Indian Pines and 92.99% on Pavia University.
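A spectral-wise attention step that assigns soft weights to feature channels can be sketched as below. This is a generic illustration, not the module of Fang et al.: it assumes a squeeze-and-excitation-style bottleneck (global average pooling, a ReLU layer with reduction ratio `r`, then a softmax over channels), and the random matrices `W1` and `W2` stand in for learned weights.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def spectral_attention(features, W1, W2):
    """Re-weight the C spectral channels of an (H, W, C) feature map with soft weights."""
    z = features.mean(axis=(0, 1))          # global average pooling -> (C,)
    hidden = np.maximum(W1 @ z, 0.0)        # bottleneck layer with ReLU -> (C // r,)
    weights = softmax(W2 @ hidden)          # soft weights over the C channels, summing to 1
    return features * weights, weights      # broadcast the weights over the spatial axes

rng = np.random.default_rng(0)
C, r = 64, 4
feats = rng.random((7, 7, C))
W1 = rng.normal(scale=0.1, size=(C // r, C))   # would be learned in a real network
W2 = rng.normal(scale=0.1, size=(C, C // r))
out, w = spectral_attention(feats, W1, W2)
```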

Earlier research implementing ELM did not deal efficiently with insufficient samples. To address this, Liu et al. [96] in 2020 implemented ELM-based ensemble transfer learning. The learners of the target domain helped determine whether the source data was useful. They retained the weights and biases of the ELM learned in the target domain and used the instances of the source domain to iteratively update the output weights of the ELM. These weights were used to train models, which were then combined into an ensemble using the same weights. In this manner, the source data improved the ability of the learner in the target domain. Pavia University and Pavia Centre were used interchangeably as source and target domains to check the efficiency of the approach.
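The core ELM mechanics, and the idea of keeping the hidden layer fixed while re-solving only the output weights with extra source-domain samples, can be sketched as follows. This is a simplified illustration: it performs a single least-squares refit rather than the iterative, ensemble-based update of Liu et al., and the toy Gaussian "domains" are assumptions.

```python
import numpy as np

class ELM:
    """Single-hidden-layer Extreme Learning Machine with sigmoid hidden units."""

    def __init__(self, n_features, n_hidden=100, seed=0):
        rng = np.random.default_rng(seed)
        # Input weights and biases are random and are never trained.
        self.W = rng.normal(size=(n_features, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.beta = None                                  # output weights

    def hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_output(self, X, Y):
        """Solve the output weights by least squares (Moore-Penrose pseudo-inverse)."""
        self.beta = np.linalg.pinv(self.hidden(X)) @ Y
        return self

    def predict(self, X):
        return np.argmax(self.hidden(X) @ self.beta, axis=1)

def one_hot(y, n_classes):
    return np.eye(n_classes)[y]

rng = np.random.default_rng(1)

def blobs(n, shift):
    """Two toy classes in 5-D; `shift` mimics a domain difference."""
    X = np.vstack([rng.normal(-3 + shift, 1, (n, 5)), rng.normal(3 + shift, 1, (n, 5))])
    return X, np.repeat([0, 1], n)

X_tgt, y_tgt = blobs(10, 0.0)       # few labelled target-domain samples
X_src, y_src = blobs(200, 0.5)      # many samples from a related source domain
elm = ELM(n_features=5, n_hidden=50).fit_output(X_tgt, one_hot(y_tgt, 2))
# Keep W and b from the target model; refit only the output weights with source data added.
elm.fit_output(np.vstack([X_tgt, X_src]), one_hot(np.concatenate([y_tgt, y_src]), 2))
X_test, y_test = blobs(100, 0.0)
accuracy = np.mean(elm.predict(X_test) == y_test)
```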

Ramamurthy et al. [97] tried to reduce computational complexity by denoising HSI and reducing its dimensions. Initially, they detected image edges through image denoising and David Marr's edge detection together with the Canny edge detector. Further, they segmented the HSIs into pixels, reconstructed them and optimised the reconstruction loss. The HSIs were denoised again using autoencoders, and the dimension was reduced using PCA. Finally, the classification results were obtained using a CNN. They obtained an OA of 92.5% on the Pavia University dataset.
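PCA-based spectral dimension reduction, used here and in many other works in this survey, can be sketched with an SVD of the centred pixel matrix. The toy cube with 103 bands (the band count of Pavia University) and the choice of 10 components are illustrative assumptions.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Reduce the spectral dimension of an (H, W, B) cube with PCA computed via SVD."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(np.float64)   # one row per pixel (astype makes a copy)
    X -= X.mean(axis=0)                          # centre each band
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # fraction of variance per component
    reduced = X @ Vt[:n_components].T            # project onto the leading components
    return reduced.reshape(H, W, n_components), explained[:n_components]

cube = np.random.default_rng(0).random((30, 30, 103))   # toy stand-in for a 103-band image
reduced, ratio = pca_reduce(cube, n_components=10)
```

The returned ratios indicate how much spectral variance the retained components keep, which is the usual basis for choosing `n_components`.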

Sharifi et al. [98] also focused on extracting spectral-spatial features of HSI. Earlier, Gabor filters had been used to extract shallow texture features that were fed into a DL model. To improve performance, the authors extracted two-stage textural features. They applied PCA, then extracted Gabor features and took their mean over all orientations at each scale. They then computed the LBP of these Gabor features, which was more discriminative than either Gabor features or LBP alone. These features were stacked and classified using a 3D CNN. Their work recorded an OA of 97.72% on the Indian Pines dataset.
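The two-stage texture features (Gabor responses averaged over orientations at each scale, followed by LBP) can be sketched on a single principal-component image. The Gabor parameters (the `sigma = 0.56 * wavelength` rule, aspect ratio `gamma`, four orientations, two wavelengths) and the basic 8-neighbour LBP are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(wavelength, theta, sigma=None, gamma=0.5):
    """Real part of a 2D Gabor filter with orientation theta."""
    sigma = sigma if sigma is not None else 0.56 * wavelength   # assumed bandwidth rule
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / wavelength)

def mean_gabor(img, wavelength, n_orient=4):
    """Average the Gabor responses over n_orient orientations at one scale."""
    responses = [convolve(img, gabor_kernel(wavelength, k * np.pi / n_orient), mode="reflect")
                 for k in range(n_orient)]
    return np.mean(responses, axis=0)

def lbp8(img):
    """Basic 8-neighbour local binary pattern code (0..255) for every pixel."""
    H, W = img.shape
    p = np.pad(img, 1, mode="edge")
    c = p[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((H, W), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = p[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        code |= (nb >= c).astype(np.uint8) << bit   # set bit if neighbour >= centre
    return code

pc1 = np.random.default_rng(0).random((32, 32))   # stand-in for the first PC image
features = np.stack([lbp8(mean_gabor(pc1, wl)) for wl in (4.0, 8.0)], axis=-1)
```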

Cao et al. [99] proposed a new CNN architecture termed 3D-2D SSHDR, an end-to-end hybrid dilated residual network that takes 3D hyperspectral cubes as input. 3D-2D SSHDR contained five parts: a spectral feature learning process, a 3D-to-2D deformable part, a spatial feature learning process, an average pooling layer and a fully connected layer. The 3D spectral residual blocks learned discriminant spectral features. For spatial feature learning, the extracted spectral features of the 3D images were converted into 2D feature maps. To continue learning discriminative spatial features, hybrid dilated convolution (HDC) residual blocks were used, which increased the receptive field of the convolution kernel without adding any other parameters. The network was trained using supervised learning. Experiments on the Indian Pines, Kennedy Space Center and Pavia University datasets achieved high OAs of 99.46%, 99.89% and 99.81%, respectively, compared with other CNN models. However, the spatial features were not extracted in 3D. In future, transfer learning could help augment the training samples and improve accuracy.
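Why dilation enlarges the receptive field without adding weights can be shown with a short calculation: each stride-1 k × k layer with dilation rate r adds (k − 1) · r pixels to the receptive field, while its kernel still holds k × k weights. The dilation rates [1, 2, 5] are a common HDC choice assumed here for illustration; the reshape function shows, under an assumed (C, D, H, W) layout, how a 3D feature volume can be flattened into 2D maps.

```python
def receptive_field(dilations, k=3):
    """Receptive field of stacked stride-1 k x k convolutions with the given dilation rates."""
    rf = 1
    for r in dilations:
        rf += (k - 1) * r          # each layer adds (k - 1) * r pixels
    return rf

def to_2d_shape(shape_3d):
    """3D-to-2D step: merge the channel and spectral axes of (C, D, H, W) into C*D 2D maps."""
    C, D, H, W = shape_3d
    return (C * D, H, W)

plain = receptive_field([1, 1, 1])     # three ordinary 3 x 3 layers
hdc = receptive_field([1, 2, 5])       # same number of weights, larger receptive field
print(plain, hdc, to_2d_shape((24, 20, 9, 9)))
```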

Nalepa et al. [100] proposed a resource-frugal quantized spectral CNN. The weights/activations were represented in a compact format, such as integer or binary numbers, without affecting the classification process. They used multi-stage quantization-aware training: the deep model was trained in full precision, then fake-quantized and trained again before being quantized to the final low-bit version. Fake quantization was used as an intermediate step to simulate the quantization of weights/activations. Experiments were performed on Pavia University and Salinas Valley. The model, four times smaller than its original counterparts, segmented equally well, which reduced the memory footprint of a large-capacity model for HSI classification. Varying the quantization levels could help to better understand the abilities of DL models.
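Fake quantization can be sketched as a quantize-then-dequantize round trip: values are snapped to a uniform integer grid but kept in floating point, so training can continue while "seeing" the quantization error. This sketch assumes an asymmetric uniform quantizer over the tensor's min/max range; the 8-bit setting is only an example (storing float32 values as 8-bit integers is what gives a four-fold size reduction), not necessarily the authors' configuration.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Quantize-dequantize x with an asymmetric uniform quantizer (values stay float)."""
    qmax = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, qmax)   # integer grid 0..qmax
    return q * scale + lo, scale

w = np.random.default_rng(0).normal(size=(64, 32)).astype(np.float32)
w_fq, scale = fake_quantize(w, num_bits=8)
max_err = np.max(np.abs(w - w_fq))                      # rounding error is at most about scale / 2
size_ratio = w.itemsize / np.dtype(np.uint8).itemsize   # float32 -> 8-bit storage
```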

Vaddi et al. [101] worked on data normalization and CNN-based classification of HSI. Normalization was performed by downscaling the pixel values, dividing them by the maximum pixel intensity value. Probabilistic PCA was used to extract spectral features, and Gabor filters helped acquire the spatial features. The spatial and spectral information were integrated to form fused features used by the CNN. Experiments were performed on the Indian Pines, Salinas Valley and Pavia University datasets, where the proposed approach gave the highest accuracy compared with other state-of-the-art approaches. The running time of the proposed approach needs to be improved.
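The normalization and feature-level fusion steps can be sketched as follows. The random arrays are hypothetical stand-ins for the probabilistic-PCA spectral features and the Gabor spatial features; fusion is assumed here to be simple per-pixel concatenation.

```python
import numpy as np

def max_normalize(cube):
    """Scale pixel values into [0, 1] by dividing by the maximum intensity in the cube."""
    return cube / cube.max()

def fuse_features(spectral, spatial):
    """Feature-level fusion: concatenate per-pixel spectral and spatial features."""
    return np.concatenate([spectral, spatial], axis=-1)

rng = np.random.default_rng(0)
cube = rng.integers(0, 8000, size=(20, 20, 200)).astype(np.float64)   # toy raw intensities
norm = max_normalize(cube)
spectral = rng.random((20, 20, 30))    # hypothetical stand-in for probabilistic-PCA features
spatial = rng.random((20, 20, 12))     # hypothetical stand-in for Gabor features
fused = fuse_features(spectral, spatial)
```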

Jiao et al. [102] used various deep neural network models for HSI classification. In the first approach, multi-scale spatial features were extracted using a convolutional network based on VGG-VeryDeep-16, which contained 13 convolutional layers, five pooling layers, three fully connected layers, and activation and dropout layers. The deep multi-scale spatial features were fused with spectral features using z-score normalization and a weighted fusion method. This was used to segment the scenes and obtain pixel-based classification results on the Indian Pines dataset. In the second approach, Recursive Autoencoders (RAE) were employed to form high-level spatial-spectral features from the original data. The RAE learned the local homogeneous area of the image around the pixel under investigation. The spatial features of the pixel were learned using a weighting scheme based on the neighbouring pixels, where the weights were determined by the spectral similarity between the investigated pixel and its neighbours. The unsupervised RAE was employed on the Pavia University dataset, achieving an accuracy of 99.91%. The third approach involved a Superpixel-based Multiple Local CNN (SML-CNN). Superpixels were formed using the simple linear iterative clustering (SLIC) algorithm. Multiple local regions of each superpixel, namely the original, central and corner regions, were jointly represented. This gave each superpixel a different semantic context even when superpixels were spectrally similar, and features were fused from these regions. Classification was improved by a multi-information modification strategy that eliminated errors by combining semantic (superpixel-level) and detailed (pixel-level) information. The proposed algorithm achieved good accuracy.
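The neighbourhood weighting used in the RAE approach (weights derived from the spectral similarity between the investigated pixel and its neighbours) can be sketched as below. The Gaussian similarity, the median-distance bandwidth and the 3 × 3 window are assumptions for illustration; the exact weighting function of Jiao et al. is not given here.

```python
import numpy as np

def similarity_weighted_feature(cube, r, c, w=3, sigma=None):
    """Weight the pixels of a w x w window by their spectral similarity to the centre pixel."""
    half = w // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    window = padded[r:r + w, c:c + w, :].reshape(-1, cube.shape[2])   # (w*w, bands)
    centre = cube[r, c, :]
    d2 = np.sum((window - centre) ** 2, axis=1)          # squared spectral distances
    if sigma is None:
        sigma = np.median(d2[d2 > 0]) if np.any(d2 > 0) else 1.0
    weights = np.exp(-d2 / sigma)                         # assumed Gaussian similarity
    weights /= weights.sum()                              # weights sum to 1
    return weights @ window, weights                      # similarity-weighted spectrum

cube = np.random.default_rng(0).random((15, 15, 50))
feat, weights = similarity_weighted_feature(cube, 7, 7)
```

Pixels whose spectra resemble the centre pixel contribute most, so the weighted spectrum favours the local homogeneous area rather than neighbours from a different class.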

Sharifi et al. [98] extracted complex spatial features using a multi-scale CNN with patches of different sizes. Since spatial features have been shown to improve classification performance, the authors also included spatial features obtained from Gabor filters, morphological operations and LBP. All these features were fused with the PCA spectral features at the decision level for classification. It achieved OAs of 97.98% and 99.44% when 1% and 5% of the samples from each class, respectively, were used for training.
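Experiments that train on a fixed percentage of samples from each class, as in the 1% and 5% settings above, need a per-class (stratified) split. A minimal sketch is given below; it assumes the common ground-truth convention that label 0 marks unlabelled background and keeps at least one training pixel per class so that small classes are not dropped.

```python
import numpy as np

def per_class_split(labels, fraction, seed=0, min_per_class=1):
    """Pick `fraction` of the labelled pixels of every class for training; the rest are test."""
    rng = np.random.default_rng(seed)
    flat = labels.ravel()
    train_idx = []
    for cls in np.unique(flat):
        if cls == 0:                                   # 0 = unlabelled background (assumed)
            continue
        idx = np.flatnonzero(flat == cls)
        n_train = max(min_per_class, int(round(fraction * idx.size)))
        train_idx.append(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.concatenate(train_idx)
    labelled = np.flatnonzero(flat > 0)
    test_idx = np.setdiff1d(labelled, train_idx)
    return train_idx, test_idx

labels = np.zeros((10, 10), dtype=int)
labels[:5, :] = 1                                      # 50 pixels of class 1
labels[5:, :4] = 2                                     # 20 pixels of class 2
train, test = per_class_split(labels, fraction=0.1)
```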

Radiometric and atmospheric corrections can cause many informative bands to be lost. In 2021, Singh and Kasana [103] performed a different spectral-spatial classification by approximating the lost noisy bands. They used linear interpolation to obtain the approximated bands. They then reduced the spectral dimension and obtained spatial features through a combination of LPP and PCA. The features were classified using a deep network along with an SAE. The work achieved OAs of 88.9%, 93.3%, 91% and 91.5% on Indian Pines (IP), Salinas (Sa), Kennedy Space Center (KSC) and Pavia University (PU), respectively. The recent DL classification techniques discussed above are compared in Table 6.
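Approximating removed or noisy bands by linear interpolation along each pixel's spectrum can be sketched with `np.interp`. The band indices are illustrative; note that `np.interp` holds the end values constant, so bands outside the range of retained bands would be extrapolated flat rather than linearly.

```python
import numpy as np

def interpolate_bands(cube, bad_bands):
    """Replace the listed (removed or noisy) bands by linear interpolation along the spectrum."""
    H, W, B = cube.shape
    good = np.setdiff1d(np.arange(B), bad_bands)   # sorted indices of retained bands
    out = cube.astype(np.float64)                  # astype returns a contiguous copy
    flat = out.reshape(-1, B)                      # view: one spectrum per row
    for i in range(flat.shape[0]):
        flat[i, bad_bands] = np.interp(bad_bands, good, flat[i, good])
    return out

cube = np.tile(np.arange(10, dtype=float) * 2.0, (4, 4, 1))   # linear toy spectra 0, 2, ..., 18
cube[..., [3, 4]] = 999.0                                      # corrupt two bands
fixed = interpolate_bands(cube, np.array([3, 4]))
```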

5. Discussion

After an extensive survey of spectral, spatial and spectral-spatial feature-based classification of HSI, the following insights have been observed.

This work mainly covers land cover HSI datasets. Indian Pines and Pavia University are the most commonly used datasets for classification, as depicted in Figure 14. Figure 15 displays the highest and lowest OA achieved by the different classification techniques in the survey.
In traditional ML, kernel-based techniques have been employed for land cover images. Table 1 shows the greatest OA of 99.5%, obtained with shape-adaptive kernels, which incorporated spectral and spatial features and thereby increased performance. The main disadvantage of mathematical kernels is their computational overhead.
The SVM, a kernel-based classifier, has been widely used for land cover images. The highest accuracy achieved was 98.68%. The SVM improves classification results when combined with a spatial Gaussian filter.
Transform-based techniques aid in the denoising and compression of HSI. Table 2 shows the highest OA of 99.0% with SVM on benchmark land cover images, and of 99.82% using AdaBoost modelling to detect bruising in fruits.
PCA has commonly been used as a data pre-processing step in traditional ML approaches, where it helps eliminate redundant spectral data.
Many classification methods include dimension reduction techniques as pre-processing steps. However, we have explicitly included a few different strategies, such as supervised, unsupervised, feature selection and feature extraction methods, to emphasise their performance. Table 5 shows that, on land cover images, sparse representation classification using joint bilateral filtering and spectral similarity achieved the greatest OA of 99.76%.
DL techniques have been used extensively in HSI research. They have shown better performance owing to built-in feature extraction and convolution kernels that can deal with complex HSI data. The resource-frugal networks for land cover images achieved the highest OA of 99.89%, as evident in Table 7. However, data partitioning remains a challenge for HSI: because of the limited samples, training and test data can overlap, which leads to exaggerated results.
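The overlap problem in data partitioning can be quantified directly for patch-based models. The sketch below counts the test pixels that fall inside the d × d input patch of at least one training pixel (Chebyshev distance ≤ d // 2); such pixels have effectively been seen during training. The coordinates and patch size are illustrative.

```python
import numpy as np

def fraction_test_inside_train_patches(train_rc, test_rc, d):
    """Fraction of test pixels lying inside the d x d patch of at least one training pixel."""
    half = d // 2
    train_rc = np.asarray(train_rc)[:, None, :]     # (n_train, 1, 2)
    test_rc = np.asarray(test_rc)[None, :, :]       # (1, n_test, 2)
    # Chebyshev distance <= half means the test pixel falls inside that training patch.
    inside = np.max(np.abs(train_rc - test_rc), axis=2) <= half
    return float(np.mean(inside.any(axis=0)))

leak = fraction_test_inside_train_patches([(5, 5)], [(6, 6), (20, 20)], d=5)
print(leak)
```

A non-zero fraction signals that a spatially disjoint split would be needed for an unbiased accuracy estimate.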

The purpose of this paper is to explore how well various classification techniques perform for HSI analysis. Some authors employed either spectral or spatial data, but in recent papers the emphasis has shifted to using both. Table 7 compares classic ML and DL approaches in terms of OA. Although the OA of both families of algorithms is comparable, DL performs better owing to its automatic feature learning and its robustness in dealing with complex HSI.

6. Conclusions and Future Scope

6.1. Conclusions

Exploration of HSI has opened a new path in research. With numerous real-world applications, its efficient classification is of utmost importance. Imaging hundreds of spectral bands has certain advantages over multispectral and RGB imaging, but it also has its fair share of limitations. The main challenges that HSI analysis brings with it, and that need to be handled better, are listed below:

High cost and data complexity.
Even though rich spectral information is available, the low spatial resolution introduces irregularities and makes interpretation difficult.
The vast number of contiguous spectral channels also gives rise to redundant and less informative bands.
The available datasets have limited labelled training samples.
With few training samples and a huge number of spectral bands, the Hughes phenomenon occurs in HSI: as the number of bands increases for a fixed number of training samples, classification performance initially increases but then gradually decreases.
Target detection also remains one of HSI's significant challenges, as the inherent variability in target and background spectra is a severe obstacle to developing effective target detection algorithms for HSI. This may be due to unknown backgrounds or a shortage of target data, which makes the problem harder and calls for more sophisticated techniques.

6.2. Future Scope

This survey has brought to light the existing classification techniques applied to HSI and compared their performance. Keeping the limitations and challenges of HSI in view, the techniques discussed below could open newer and brighter directions for researchers in HSI analysis.

Meta-Learning: Learning how to learn is the core of meta-learning. It constructs algorithms that combine the predictions of other models trained on the same dataset. Meta-learning is an untapped area of research for HSI classification.
Newer datasets: Existing and improved techniques need to be tested on newer datasets such as Berlin, the RIT-18 Remote Sensory Dataset and others, which would lead to more robust and general classification techniques.
ELM: ELM is a relatively unexplored area for HSI that might better handle overfitting and slow training.
Research in the automatic selection and optimization of parameters for SVM, dimensionality reduction, DL and other techniques calls for efficient evolutionary techniques or genetic algorithms, as this is still an open research area with enormous scope for refinement.

Author Contributions

All the authors made significant contributions to this work. Conceptualization, S.S.K. and G.K.; Writing—original draft preparation, R.G.; Writing—revision and editing, R.G., S.S.K. and G.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://rslab.ut.ac.ir/data.

Conflicts of Interest

The authors declare that there is no conflict of interest.

References

Huete, A.R. Vegetation indices, remote sensing and forest monitoring. Geogr. Compass 2012, 6, 513–532.
Khan, M.J.; Khan, H.S.; Yousaf, A.; Khurshid, K.; Abbas, A. Modern trends in hyperspectral image analysis: A review. IEEE Access 2018, 6, 14118–14129.
Leiva-Valenzuela, G.A.; Lu, R.; Aguilera, J.M. Prediction of firmness and soluble solids content of blueberries using hyperspectral reflectance imaging. J. Food Eng. 2013, 115, 91–98.
Liu, Z.; Wang, H.; Li, Q. Tongue tumor detection in medical hyperspectral images. Sensors 2011, 12, 162–174.
Liu, L.; Wang, J.; Huang, W.; Zhao, C.; Zhang, B.; Tong, Q. Improving winter wheat yield prediction by novel spectral index. Trans. CSAE 2004, 20, 172–175.
Kutser, T.; Paavel, B.; Verpoorter, C.; Kauer, T.; Vahtmäe, E. Remote sensing of water quality in optically complex lakes. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, 39, B8.
Zhang, L.; Zhang, L.; Du, B. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geosci. Remote Sens. Mag. 2016, 4, 22–40.
Gogineni, R.; Chaturvedi, A. Hyperspectral image classification. In Processing and Analysis of Hyperspectral Data; IntechOpen: London, UK, 2019.
Gu, Y.; Chanussot, J.; Jia, X.; Benediktsson, J.A. Multiple kernel learning for hyperspectral image classification: A review. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6547–6565.
Rani, A.; Kumar, N.; Kumar, J.; Sinha, N.K. Machine learning for soil moisture assessment. In Deep Learning for Sustainable Agriculture; Elsevier: Amsterdam, The Netherlands, 2022; pp. 143–168.
Lakshmi, T.V.H.; Madhu, T. Satellite Image Resolution Enhancement Using Discrete Wavelet Transform and Gaussian Mixture Model. Int. Res. J. Eng. Technol. IRJET 2015, 2, 95–100.
Maduranga, U. Dimensionality Reduction in Data Mining. 2020. Available online: https://towardsdatascience.com/dimensionality-reduction-in-data-mining-f08c734b3001 (accessed on 25 December 2022).
Gu, Y.; Liu, H. Sample-screening MKL method via boosting strategy for hyperspectral image classification. Neurocomputing 2016, 173, 1630–1639.
Fang, L.; He, N.; Li, S.; Ghamisi, P.; Benediktsson, J.A. Extinction profiles fusion for hyperspectral images classification. IEEE Trans. Geosci. Remote Sens. 2017, 56, 1803–1815.
Li, L.; Wang, C.; Li, W.; Chen, J. Hyperspectral image classification by AdaBoost weighted composite kernel extreme learning machines. Neurocomputing 2018, 275, 1725–1733.
Li, F.; Lu, H.; Zhang, P. An innovative multi-kernel learning algorithm for hyperspectral classification. Comput. Electr. Eng. 2019, 79, 106456.
Li, D.; Wang, Q.; Kong, F. Adaptive Kernel Sparse Representation Based on Multiple Feature Learning for Hyperspectral Image Classification. Neurocomputing 2020, 400, 97–112.
Gao, Y.; Cheng, T.; Wang, B. Nonlinear Anomaly Detection Based on Spectral-Spatial Composite Kernel for Hyperspectral Images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1269–1273.
Wang, Y.; Yu, W.; Fang, Z. Multiple kernel-based SVM classification of hyperspectral images by combining spectral, spatial, and semantic information. Remote Sens. 2020, 12, 120.
Ma, K.Y.; Chang, C.I. Kernel-based constrained energy minimization for hyperspectral mixed pixel classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–23.
Ansari, M.; Homayouni, S.; Safari, A.; Niazmardi, S. A New Convolutional Kernel Classifier for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 11240–11256.
Krishna, S.L.; Jeya, I.; Deepa, S. Fuzzy-twin proximal SVM kernel-based deep learning neural network model for hyperspectral image classification. Neural Comput. Appl. 2022, 34, 19343–19376.
Wang, A.; Xing, S.; Zhao, Y.; Wu, H.; Iwahori, Y. A hyperspectral image classification method based on adaptive spectral spatial kernel combined with improved vision transformer. Remote Sens. 2022, 14, 3705.
Dalla Mura, M.; Villa, A.; Benediktsson, J.A.; Chanussot, J.; Bruzzone, L. Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis. IEEE Geosci. Remote Sens. Lett. 2010, 8, 542–546.
Licciardi, G.; Marpu, P.R.; Chanussot, J.; Benediktsson, J.A. Linear versus nonlinear PCA for the classification of hyperspectral data based on the extended morphological profiles. IEEE Geosci. Remote Sens. Lett. 2011, 9, 447–451.
Dópido, I.; Li, J.; Marpu, P.R.; Plaza, A.; Dias, J.M.B.; Benediktsson, J.A. Semisupervised self-learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4032–4044.
Zhong, S.; Chang, C.I.; Zhang, Y. Iterative support vector machine for hyperspectral image classification. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 3309–3312.
Pathak, D.K.; Kalita, S.K.; Bhattacharya, D.K. Hyperspectral image classification using support vector machine: A spectral spatial feature based approach. Evol. Intell. 2022, 15, 1809–1823.
Li, R.; Cui, K.; Chan, R.H.; Plemmons, R.J. Classification of hyperspectral images using SVM with shape-adaptive reconstruction and smoothed total variation. arXiv 2022, arXiv:2203.15619.
Akbari, H.; Kosugi, Y.; Kojima, K.; Tanaka, N. Wavelet-based compression and segmentation of hyperspectral images in surgery. In Medical Imaging and Augmented Reality, Proceedings of the International Workshop on Medical Imaging and Virtual Reality, Tokyo, Japan, 1–2 August 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 142–149.
Chen, C.; Guo, B.; Wu, X.; Shen, H. An edge detection method for hyperspectral image classification based on mean shift. In Proceedings of the 2014 7th International Congress on Image and Signal Processing, Dalian, China, 14–16 October 2014; pp. 553–557.
Quesada-Barriuso, P.; Argüello, F.; Heras, D.B. Spectral–spatial classification of hyperspectral images using wavelets and extended morphological profiles. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 1177–1185.
Prabhakar, T.N.; Geetha, P. Two-dimensional empirical wavelet transform based supervised hyperspectral image classification. ISPRS J. Photogramm. Remote Sens. 2017, 133, 37–45.
Ji, Y.; Sun, L.; Li, Y.; Ye, D. Detection of bruised potatoes using hyperspectral imaging technique based on discrete wavelet transform. Infrared Phys. Technol. 2019, 103, 103054.
Anand, R.; Veni, S.; Aravinth, J. Robust classification technique for hyperspectral images based on 3D-discrete wavelet transform. Remote Sens. 2021, 13, 1255.
Xu, J.; Zhao, J.; Liu, C. An Effective Hyperspectral Image Classification Approach Based on Discrete Wavelet Transform and Dense CNN. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
Miclea, A.V.; Terebes, R.M.; Meza, S.; Cislariu, M. On Spectral-Spatial Classification of Hyperspectral Images Using Image Denoising and Enhancement Techniques, Wavelet Transforms and Controlled Data Set Partitioning. Remote Sens. 2022, 14, 1475.
Ji, Y.; Sun, L.; Li, Y.; Li, J.; Liu, S.; Xie, X.; Xu, Y. Non-destructive classification of defective potatoes based on hyperspectral imaging and support vector machine. Infrared Phys. Technol. 2019, 99, 71–79.
Cao, X.; Yao, J.; Fu, X.; Bi, H.; Hong, D. An enhanced 3-D discrete wavelet transform for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1104–1108.
Zikiou, N.; Lahdir, M.; Helbert, D. Hyperspectral image classification using graph-based wavelet transform. Int. J. Remote Sens. 2020, 41, 2624–2643.
Manoharan, P.; Boggavarapu, P.K.L. Improved whale optimization based band selection for hyperspectral remote sensing image classification. Infrared Phys. Technol. 2021, 119, 103948.
Tulapurkar, H.; Banerjee, B.; Buddhiraju, K.M. Multi-head attention with CNN and wavelet for classification of hyperspectral image. Neural Comput. Appl. 2022, 1–15.
Villa, A.; Benediktsson, J.A.; Chanussot, J.; Jutten, C. Hyperspectral image classification with independent component discriminant analysis. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4865–4876.
Santos, A.; Pedrini, H. A combination of k-means clustering and entropy filtering for band selection and classification in hyperspectral images. Int. J. Remote Sens. 2016, 37, 3005–3020.
Schclar, A.; Averbuch, A. A diffusion approach to unsupervised segmentation of hyper-spectral images. In Computational Intelligence, Proceedings of the International Joint Conference on Computational Intelligence, Funchal-Madeira, Portugal, 1–3 November 2017; Springer: Cham, Switzerland, 2017; pp. 163–178.
Jain, D.K.; Dubey, S.B.; Choubey, R.K.; Sinhal, A.; Arjaria, S.K.; Jain, A.; Wang, H. An approach for hyperspectral image classification by optimizing SVM using self organizing map. J. Comput. Sci. 2018, 25, 252–259.
Ahmad, M.; Alqarni, M.A.; Khan, A.M.; Hussain, R.; Mazzara, M.; Distefano, S. Segmented and non-segmented stacked denoising autoencoder for hyperspectral band reduction. Optik 2019, 180, 370–378.
Romaszewski, M.; Głomb, P.; Cholewa, M. Semi-supervised hyperspectral classification from a small number of training samples using a co-training approach. ISPRS J. Photogramm. Remote Sens. 2016, 121, 60–76.
Li, L.; Sun, C.; Lin, L.; Li, J.; Jiang, S. A dual-layer supervised Mahalanobis kernel for the classification of hyperspectral images. Neurocomputing 2016, 214, 430–444.
Nhaila, H.; Elmaizi, A.; Sarhrouni, E.; Hammouch, A. Supervised classification methods applied to airborne hyperspectral images: Comparative study using mutual information. Procedia Comput. Sci. 2019, 148, 97–106.
Ren, J.; Wang, R.; Liu, G.; Feng, R.; Wang, Y.; Wu, W. Partitioned relief-F method for dimensionality reduction of hyperspectral images. Remote Sens. 2020, 12, 1104.
Liu, H.; Li, W.; Xia, X.G.; Zhang, M.; Tao, R. Superpixelwise Collaborative-Representation Graph Embedding for Unsupervised Dimension Reduction in Hyperspectral Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4684–4698.
Ding, S.; Keal, C.A.; Zhao, L.; Yu, D. Dimensionality reduction and classification for hyperspectral image based on robust supervised ISOMAP. J. Ind. Prod. Eng. 2022, 39, 19–29.
Qi, C.; Wang, Y.; Tian, W.; Wang, Q. Multiple kernel boosting framework based on information measure for classification. Chaos Solitons Fractals 2016, 89, 175–186.
Yang, R.; Su, L.; Zhao, X.; Wan, H.; Sun, J. Representative band selection for hyperspectral image classification. J. Vis. Commun. Image Represent. 2017, 48, 396–403.
Medjahed, S.A.; Ouali, M. Band selection based on optimization approach for hyperspectral image classification. Egypt. J. Remote Sens. Space Sci. 2018, 21, 413–418.
Xie, F.; Li, F.; Lei, C.; Yang, J.; Zhang, Y. Unsupervised band selection based on artificial bee colony algorithm for hyperspectral image classification. Appl. Soft Comput. 2019, 75, 428–440.
Sellami, A.; Farah, M.; Farah, I.R.; Solaiman, B. Hyperspectral imagery classification based on semi-supervised 3-D deep neural network and adaptive band selection. Expert Syst. Appl. 2019, 129, 246–259.
Elmaizi, A.; Nhaila, H.; Sarhrouni, E.; Hammouch, A.; Nacir, C. A novel information gain based approach for classification and dimensionality reduction of hyperspectral images. Procedia Comput. Sci. 2019, 148, 126–134.
Sawant, S.; Manoharan, P. Hyperspectral band selection based on metaheuristic optimization approach. Infrared Phys. Technol. 2020, 107, 103295.
Zhu, Q.; Wang, Y.; Wang, F.; Song, M.; Chang, C.I. Hyperspectral band selection based on improved affinity propagation. In Proceedings of the 2021 11th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Amsterdam, The Netherlands, 24–26 March 2021; pp. 1–4.
Uddin, M.P.; Mamun, M.A.; Afjal, M.I.; Hossain, M.A. Information-theoretic feature selection with segmentation-based folded principal component analysis (PCA) for hyperspectral image classification. Int. J. Remote Sens. 2021, 42, 286–321.
Zhang, J. A hybrid clustering method with a filter feature selection for hyperspectral image classification. J. Imaging 2022, 8, 180.
Imani, M.; Ghassemian, H. Binary coding based feature extraction in remote sensing high dimensional data. Inf. Sci. 2016, 342, 191–208.
Qi, C.; Zhou, Z.; Sun, Y.; Song, H.; Hu, L.; Wang, Q. Feature selection and multiple kernel boosting framework based on PSO with mutation mechanism for hyperspectral classification. Neurocomputing 2017, 220, 181–190.
Ksieniewicz, P.; Krawczyk, B.; Woźniak, M. Ensemble of Extreme Learning Machines with trained classifier combination and statistical features for hyperspectral data. Neurocomputing 2018, 271, 28–37.
Qiao, T.; Yang, Z.; Ren, J.; Yuen, P.; Zhao, H.; Sun, G.; Marshall, S.; Benediktsson, J.A. Joint bilateral filtering and spectral similarity-based sparse representation: A generic framework for effective feature extraction and data classification in hyperspectral imaging. Pattern Recognit. 2018, 77, 316–328.
Paul, S.; Kumar, D.N. Spectral-spatial classification of hyperspectral data with mutual information based segmented stacked autoencoder approach. ISPRS J. Photogramm. Remote Sens. 2018, 138, 265–280.
Chen, Z.; Jiang, J.; Zhou, C.; Fu, S.; Cai, Z. SuperBF: Superpixel-based bilateral filtering algorithm and its application in feature extraction of hyperspectral images. IEEE Access 2019, 7, 147796–147807.
Li, Q.; Zheng, B.; Tu, B.; Wang, J.; Zhou, C. Ensemble EMD-based spectral-spatial feature extraction for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5134–5148.
Wang, D.; Du, B.; Zhang, L.; Xu, Y. Adaptive spectral–spatial multiscale contextual feature extraction for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2461–2477.
Liang, N.; Duan, P.; Xu, H.; Cui, L. Multi-View Structural Feature Extraction for Hyperspectral Image Classification. Remote Sens. 2022, 14, 1971.
Ratle, F.; Camps-Valls, G.; Weston, J. Semisupervised neural networks for efficient hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2010, 48, 2271–2282.
Lin, Z.; Chen, Y.; Zhao, X.; Wang, G. Spectral-spatial classification of hyperspectral image using autoencoders. In Proceedings of the 2013 9th International Conference on Information, Communications & Signal Processing, Taiwan, China, 10–13 December 2013; pp. 1–5.
Yue, J.; Zhao, W.; Mao, S.; Liu, H. Spectral–spatial classification of hyperspectral images using deep convolutional neural networks. Remote Sens. Lett. 2015, 6, 468–477.
Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep convolutional neural networks for hyperspectral image classification. J. Sens. 2015, 2015, 258619.
Chan, T.H.; Jia, K.; Gao, S.; Lu, J.; Zeng, Z.; Ma, Y. PCANet: A simple deep learning baseline for image classification? IEEE Trans. Image Process. 2015, 24, 5017–5032.
Liu, P.; Zhang, H.; Eom, K.B. Active deep learning for classification of hyperspectral images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 10, 712–724.
Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251.
Zabalza, J.; Ren, J.; Zheng, J.; Zhao, H.; Qing, C.; Yang, Z.; Du, P.; Marshall, S. Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging. Neurocomputing 2016, 185, 1–10.
Yu, S.; Jia, S.; Xu, C. Convolutional neural networks for hyperspectral image classification. Neurocomputing 2017, 219, 88–98.
Chen, Y.; Zhu, L.; Ghamisi, P.; Jia, X.; Li, G.; Tang, L. Hyperspectral images classification with Gabor filtering and convolutional neural network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2355–2359.
Li, Y.; Xie, W.; Li, H. Hyperspectral image reconstruction by deep convolutional neural network for classification. Pattern Recognit. 2017, 63, 371–383.
Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655.
Zhang, M.; Li, W.; Du, Q. Diverse region-based CNN for hyperspectral image classification. IEEE Trans. Image Process. 2018, 27, 2623–2634. [Google Scholar] [CrossRef]
Deng, C.; Xue, Y.; Liu, X.; Li, C.; Tao, D. Active transfer learning network: A unified deep joint spectral–spatial feature learning model for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1741–1754. [Google Scholar] [CrossRef] [Green Version]
Liang, M.; Jiao, L.; Yang, S.; Liu, F.; Hou, B.; Chen, H. Deep multiscale spectral-spatial feature fusion for hyperspectral images classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2911–2924. [Google Scholar] [CrossRef]
Wang, W.; Dou, S.; Jiang, Z.; Sun, L. A fast dense spectral–spatial convolution network framework for hyperspectral images classification. Remote Sens. 2018, 10, 1068. [Google Scholar] [CrossRef] [Green Version]
Yang, X.; Ye, Y.; Li, X.; Lau, R.Y.; Zhang, X.; Huang, X. Hyperspectral image classification with deep learning models. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5408–5423. [Google Scholar] [CrossRef]
Pan, B.; Shi, Z.; Xu, X. MugNet: Deep learning for hyperspectral image classification using limited samples. ISPRS J. Photogramm. Remote Sens. 2018, 145, 108–119. [Google Scholar] [CrossRef]
Paoletti, M.; Haut, J.; Plaza, J.; Plaza, A. A new deep convolutional neural network for fast hyperspectral image classification. ISPRS J. Photogramm. Remote Sens. 2018, 145, 120–147. [Google Scholar] [CrossRef]
Chen, C.; Jiang, F.; Yang, C.; Rho, S.; Shen, W.; Liu, S.; Liu, Z. Hyperspectral classification based on spectral–spatial convolutional neural networks. Eng. Appl. Artif. Intell. 2018, 68, 165–171. [Google Scholar] [CrossRef]
Singh, S.; Kasana, S.S. Efficient classification of the hyperspectral images using deep learning. Multimed. Tools Appl. 2018, 77, 27061–27074. [Google Scholar] [CrossRef]
Zhou, F.; Hang, R.; Liu, Q.; Yuan, X. Hyperspectral image classification using spectral-spatial LSTMs. Neurocomputing 2019, 328, 39–47. [Google Scholar] [CrossRef]
Fang, B.; Li, Y.; Zhang, H.; Chan, J.C.W. Hyperspectral images classification based on dense convolutional networks with spectral-wise attention mechanism. Remote Sens. 2019, 11, 159. [Google Scholar] [CrossRef] [Green Version]
Liu, X.; Hu, Q.; Cai, Y.; Cai, Z. Extreme learning machine-based ensemble transfer learning for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3892–3902. [Google Scholar] [CrossRef]
Ramamurthy, M.; Robinson, Y.H.; Vimal, S.; Suresh, A. Auto encoder based dimensionality reduction and classification using convolutional neural networks for hyperspectral images. Microprocess. Microsyst. 2020, 79, 103280. [Google Scholar] [CrossRef]
Sharifi, O.; Mokhtarzade, M.; Beirami, B.A. A Deep Convolutional Neural Network based on Local Binary Patterns of Gabor Features for Classification of Hyperspectral Images. In Proceedings of the 2020 International Conference on Machine Vision and Image Processing (MVIP), Qom, Iran, 18–20 February 2020; pp. 1–5. [Google Scholar]
Cao, F.; Guo, W. Deep hybrid dilated residual networks for hyperspectral image classification. Neurocomputing 2020, 384, 170–181. [Google Scholar] [CrossRef]
Nalepa, J.; Antoniak, M.; Myller, M.; Lorenzo, P.R.; Marcinkiewicz, M. Towards resource-frugal deep convolutional neural networks for hyperspectral image segmentation. Microprocess. Microsyst. 2020, 73, 102994. [Google Scholar] [CrossRef]
Vaddi, R.; Manoharan, P. Hyperspectral image classification using CNN with spectral and spatial features integration. Infrared Phys. Technol. 2020, 107, 103296. [Google Scholar] [CrossRef]
Jiao, L.; Shang, R.; Liu, F.; Zhang, W. Brain and Nature-Inspired Learning, Computation and Recognition; Elsevier: Amsterdam, The Netherlands, 2020. [Google Scholar]
Singh, S.; Kasana, S.S. A Pre-processing framework for spectral classification of hyperspectral images. Multimed. Tools Appl. 2021, 80, 243–261. [Google Scholar] [CrossRef]
Li, L.; Ge, H.; Gao, J. A spectral-spatial kernel-based method for hyperspectral imagery classification. Adv. Space Res. 2017, 59, 954–967. [Google Scholar] [CrossRef]
Manifold, B.; Men, S.; Hu, R.; Fu, D. A versatile deep learning architecture for classification and label-free prediction of hyperspectral images. Nat. Mach. Intell. 2021, 3, 306–315. [Google Scholar] [CrossRef]
Xue, Z.; Yu, X.; Liu, B.; Tan, X.; Wei, X. HResNetAM: Hierarchical residual network with attention mechanism for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3566–3580. [Google Scholar] [CrossRef]
Sellami, A.; Tabbone, S. Deep neural networks-based relevant latent representation learning for hyperspectral image classification. Pattern Recognit. 2022, 121, 108224. [Google Scholar] [CrossRef]
Zhan, Y.; Wu, K.; Dong, Y. Enhanced Spectral–Spatial Residual Attention Network for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7171–7186. [Google Scholar] [CrossRef]
Sharifi, O.; Mokhtarzadeh, M.; Asghari Beirami, B. A new deep learning approach for classification of hyperspectral images: Feature and decision level fusion of spectral and spatial features in multiscale CNN. Geocarto Int. 2021, 37, 1–26. [Google Scholar] [CrossRef]

Figure 1. Spectral curves of soil and of unhealthy and healthy plants [1].

Figure 2. Hyperspectral Image Cube Representation [2].

Figure 3. General Applications of HSI.

Figure 4. A general representation of DL classification of a hyperspectral image [8].

Figure 7. A schematic approach of Wavelet Transform decomposing data into two levels.

Figure 8. General schema of PCA dimension reduction.

Figure 9. General schema of dimension reduction used for classification.

Figure 10. General DL mechanism for image classification.

Figure 11. Joint spectral spatial-based LSTM [94]. Adapted with permission from ref. [94]. 2019 Neurocomputing.

Figure 12. Spectral LSTM architecture [94]. Adapted with permission from ref. [94]. 2019 Neurocomputing.

Figure 13. Spatial LSTM architecture [94]. Adapted with permission from ref. [94]. 2019 Neurocomputing.

Figure 14. Percentage of use of the most commonly used datasets in existing techniques.

Figure 15. The highest and lowest OA achieved by different classification techniques in the survey.

Table 1. Comparison analysis of Kernel-based classification.

Year	Authors	Methodology Used	Evaluation Parameters
2016	Gu and Liu [13]	Sample-screening MKL, boosting strategy for a limited number of training samples, MPs, SVM classification.	Indian Pines: OA-93.30% and AA-95.52%. Pavia University: OA-96.03% and AA-96.36%. Salinas Valley: OA-94.78% and AA-97.15%.
2018	Li et al. [15]	Composite kernels, AdaBoost framework, weighted ELM, RBF kernel, polynomial kernel, spectral-spatial feature extraction.	Indian Pines: OA-98.08% and AA-97.67%. Pavia University: OA-96.46% and AA-93.32%.
2019	Li, Lu and Zhang [16]	Combination of SVM, MKL and a boosting algorithm.	Pavia University: OA-85.3% and AA-89.74%. Salinas Valley: OA-97.55% and AA-98.02%.
2020	Li, Wang and Kong [17]	Multiple feature learning, shape-adaptive kernel with sparse representation. Spectral, EMP, DMP, LBP and Gabor texture feature descriptors were used.	Indian Pines: OA-99.19% and AA-99.24%. Pavia University: OA-97.09% and AA-95.55%. Salinas Valley: OA-99.57% and AA-99.51%.
2020	Gao et al. [18]	Composite spectral-spatial kernel, ERS for superpixels, CKA-based kernel learning, Reed-Xiaoli anomaly detection algorithm, Mahalanobis distance to form decision rules.	Better performance in terms of the ROC curve and the area under the ROC curve.
2020	Wang et al. [19]	An MKL approach. Spectral features obtained through PCA, spatial features through Gabor filtering and EMP, and semantic information through k-means clustering on each superpixel. Composite kernels used in SVM classification.	Indian Pines: OA-98.39% and AA-98.30%. Pavia University: OA-99.77% and AA-99.80%.
2021	Ma et al. [20]	Spectral-spatial kernel generation network; region segmentation, clustering and mapping operations for spatial kernels; correlation characteristics of bands fed into a spectral attention mechanism for spectral kernels.	Indian Pines: OA-99.29%. Pavia University: OA-99.56%. Salinas Valley: OA-99.37%.
2021	Ansari et al. [21]	Convolutional kernel classifier, Nyström approximation to reduce the high dimensionality of basis kernels, deep kernels using 1D CNN.	Pavia University: OA-95.22% and AA-94.26%. Salinas Valley: OA-95.23% and AA-97.66%.
2022	Krishna et al. [22]	Fuzzy twin SVM kernel deep learning framework; Gaussian, ANOVA and linear spline kernels with fuzzy triangular function SVM hyperplanes.	Indian Pines: OA-99.82%. Pavia University: OA-99.73%. Salinas Valley: OA-99.83%.
2022	Wang et al. [23]	Spectral-spatial kernel, vision transformer.	Indian Pines: OA-98.81% and AA-98.83%. Pavia University: OA-99.76% and AA-99.60%.
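Several rows of Table 1 (e.g., Li et al. [15], Gao et al. [18] and Wang et al. [19]) combine spectral and spatial information through composite kernels. The following is a minimal NumPy sketch of one common form, the weighted summation kernel K = mu * K_spec + (1 - mu) * K_spat with RBF base kernels. The function names, the weight `mu` and the bandwidths `gamma_spec` and `gamma_spat` are illustrative assumptions, and the spatial feature vectors (for example, neighbourhood means of each pixel) are assumed to be computed beforehand; none of the surveyed methods is reproduced exactly.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def composite_kernel(spec_a, spat_a, spec_b, spat_b,
                     mu=0.5, gamma_spec=1.0, gamma_spat=1.0):
    """Weighted summation kernel: mu * K_spectral + (1 - mu) * K_spatial.

    spec_* hold one spectrum per row; spat_* hold the matching (assumed,
    precomputed) spatial feature vectors of the same pixels.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    k_spec = rbf_kernel(spec_a, spec_b, gamma_spec)
    k_spat = rbf_kernel(spat_a, spat_b, gamma_spat)
    return mu * k_spec + (1.0 - mu) * k_spat
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, the resulting matrix could, for instance, be passed to an SVM that accepts precomputed kernels (such as scikit-learn's `SVC(kernel="precomputed")`); `mu` then trades spectral similarity against spatial similarity.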

Table 2. Comparison analysis of transform-based methods.

Year	Authors	Methodology Used	Evaluation Parameters
2008	Akbari et al. [30]	Compression of medical images using wavelet transform. LVQ ANN for segmentation and classification. Post-processing using region growing.	
2014	Chen et al. [31]	PCA, MSA and edge extraction using wavelet transform.	False Negative Rate (FNR) and False Positive Rate (FPR) were the evaluation criteria for detection quality. The spleen was detected best, with an FNR of 1.3% and an FPR of 0.5%. Compared with the Canny and LoG operators, it produced better results with less noisy edges.
2014	Quesada et al. [32]	Morphological operations, spectral-spatial feature extraction through wavelets, SVM classification.	The proposed work achieved an OA of 98.8% and an AA of 99.0% on the Pavia University dataset.
2017	Prabhakar and Geetha [33]	2D-EWT for band selection, 2D extension of the Littlewood-Paley transform and sparse-based classifiers.	IEMD gave better OA than 2D-EWT but took more time. The low-frequency components of IEMD and 2D-EWT yielded improved kappa, OA and AA.
2019	Ji et al. [38]	PCA, image decomposition using DWT, GLCM and AdaBoost-FLD. Classification using AdaBoost modeling.	Accuracy of 99.82% in the detection of bruises.
2020	Cao et al. [39]	3D-DWT to extract features and remove the stripe noise effect. CNN classification with an active learning strategy.	Indian Pines: OA-91.98% and AA-80.34%. Pavia University: OA-91.27% and AA-82.37%.
2020	Zikiou et al. [40]	Texture classification using graph-based wavelet transform, decorrelation of neighbouring pixels based on spectral similarity to build spectral graph wavelets, SVM classification.	Indian Pines: OA-98.90% and AA-98.77%. Pavia University: OA-99.65% and AA-99.47%. Kennedy Space Centre: OA-99.73% and AA-99.80%.
2021	Manoharan et al. [41]	Whale optimization band selection technique, 3D-DWT to increase variation in the selected bands, CNN with 3D convolutions fusing spectral and wavelet spatial features.	Indian Pines: OA-99.44%. Pavia University: OA-99.85%. Salinas Valley: OA-99.83%.
2021	Anand et al. [35]	3D spectral-spatial features. Haar, Fejér-Korovkin and Coiflet filters. SVM, KNN and RF classification.	Indian Pines: OA-90.4%. Salinas Valley: OA-96.7%.
2022	Xu et al. [36]	3D Haar wavelet filter for spectral-spatial features, CNN.	Indian Pines: OA-98.15%. Pavia University: OA-96.27%. Kennedy Space Centre: OA-96.84%.
2022	Miclea et al. [37]	Spectral features through wavelets, LBP for spatial features and SVM classification.	Indian Pines: OA-93.85%. Pavia University: OA-92.86%.
2022	Tulapurkar et al. [42]	Multi-head transformer-based attention mechanism to select features, Coiflet wavelet filter-CNN-based feature extraction.	Indian Pines: OA-65.54% and AA-57.95%. Pavia University: OA-94.62% and AA-94.25%.
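Figure 7 depicts a two-level wavelet decomposition, and Haar filters appear in several rows of Table 2 (e.g., Anand et al. [35] and Xu et al. [36]). As a minimal sketch, assuming a plain orthonormal Haar filter applied along the spectral axis only (the surveyed works use 3D transforms and other wavelet families), a two-level decomposition of each pixel spectrum can be written in NumPy as follows. The function names are hypothetical, and the number of bands must be divisible by 2 at every level.

```python
import numpy as np


def haar_dwt_1d(signal):
    """One level of the orthonormal Haar DWT along the last axis.

    Returns (approximation, detail), each half the input length.
    """
    x = np.asarray(signal, dtype=float)
    if x.shape[-1] % 2:
        raise ValueError("length along the last axis must be even")
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass (scaling) coefficients
    detail = (even - odd) / np.sqrt(2.0)  # high-pass (wavelet) coefficients
    return approx, detail


def haar_dwt_multilevel(signal, levels=2):
    """Repeatedly split the approximation band.

    Returns [cA_n, cD_n, ..., cD_1]; works on a single spectrum or on a
    (rows, cols, bands) cube, since only the last axis is transformed.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_1d(approx)
        details.append(detail)
    return [approx] + details[::-1]
```

The output order [cA2, cD2, cD1] mirrors the convention of PyWavelets' `pywt.wavedec`. Because the transform is orthonormal, the coefficients preserve the energy of the input spectrum, so discarding small detail coefficients acts as denoising or compression.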

Table 3. Comparison analysis of supervised, semi-supervised and unsupervised dimension reduction-based classification techniques.

Year	Authors	Methodology Used	Evaluation Parameters
2016	Romaszewski et al. [48]	P-N learning scheme: the P learner extracted spatial features through region growing and the N learner extracted spectral filters using an NN classifier.	Indian Pines: OA-94.05%. Pavia University: OA-97.40%. Salinas Valley: OA-98.38%.
2016	Li et al. [49]	Dual-layer supervised Mahalanobis distance kernel for HSI classification. SVM classification.	Indian Pines: OA-71.24% and AA-78.43%. Pavia University: OA-77.67% and AA-84.12%. Salinas Valley: OA-88.04% and AA-94.01%.
2016	Santos and Pedrini [44]	Entropy filtering, k-means clustering, band grouping using correlation, SVM classification.	Indian Pines: OA-97.1%. Salinas Valley: OA-97.1%. Pavia Centre: OA-98.3%. Pavia University: OA-96.2%.
2017	Schclar and Averbuch [45]	Diffusion bases method, wavelength-wise global segmentation to cluster low-dimensional data.	NA
2018	Jain et al. [46]	Data compression using SOM, SVM classification.	Indian Pines: OA-85.29% and AA-86.23%. Pavia University: OA-95.46% and AA-94.27%.
2019	Ahmad et al. [47]	UDAE for spectral and spatial feature extraction.	Segmented DAE: OA = 99.06% on Salinas-A, 91.37% on Salinas, 97.69% on Pavia Centre and 84.07% on Pavia University. Non-segmented DAE: OA = 98.79% on Salinas-A, 91.50% on Salinas, 98.13% on Pavia Centre and 89.20% on Pavia University.
2019	Nhaila et al. [50]	MI-based band reduction, supervised classification of HSI using SVM, KNN, RF and LDA with different kernels.	Indian Pines: highest OA-93.27% (SVM). Salinas Valley: highest OA-97.09% (RF). Pavia University: highest OA-95.50% (RF).
2020	Ren et al. [51]	Supervised dimension reduction using the Relief-F method. Band selection from the reduced feature set on the basis of high importance scores to eliminate contiguous bands.	Salinas Valley: OA-94.45%. Pavia University: OA-94.71%.
2021	Liu et al. [52]	Superpixel Collaborative Representation (SPCR) of pixels having similar spectral signatures and spatial adjacency. Global projection matrix to reduce discrepancies between the original spectral features and SPCR.	OA of 97.64% and AA of 97.77% on Pavia University.
2022	Ding et al. [53]	Supervised isometric feature mapping for dimension reduction. Triple geodesic distance learning using pixel label, neighbourhood and credibility information. Generalised regression NN classification.	OA of 96.83% on Indian Pines.
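Table 3 groups techniques by whether dimension reduction uses class labels. The most widely used unsupervised step, PCA (Figure 8), also appears in several rows of Tables 5 and 6. A minimal NumPy sketch follows, assuming the cube is stored as (rows, cols, bands) and that the number of retained components is chosen by the user; `pca_reduce` is a hypothetical helper, not the implementation of any surveyed paper.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project a (rows, cols, bands) HSI cube onto its top principal components.

    Returns the reduced cube (rows, cols, n_components) and the fraction of
    total variance explained by each retained component.
    """
    rows, cols, bands = cube.shape
    if not 1 <= n_components <= bands:
        raise ValueError("n_components must be between 1 and the number of bands")
    # One row per pixel; astype makes a copy, so the input cube is not modified.
    X = cube.reshape(-1, bands).astype(float)
    X -= X.mean(axis=0)  # centre each band
    # Economy SVD: rows of Vt are principal axes, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T
    variance = s**2
    ratio = variance[:n_components] / variance.sum()
    return scores.reshape(rows, cols, n_components), ratio
```

Because neighbouring HSI bands are highly correlated, a handful of components typically carries most of the variance, which is why PCA is a common first step before spatial feature extraction or a CNN.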

Table 4. Comparison analysis of features selection-based classification.

Year	Authors	Methodology Used	Evaluation Parameters
2016	Qi et al. [54]	OIF for dimension reduction, KL distance-based kernel function, MKL and SVM classification.	Accuracy of 85.89% on the Indian Pines dataset.
2017	Yang et al. [55]	k-means for band clustering, KNN and SVM classification.	NA
2018	Medjahed and Ouali [56]	Simulated annealing for feature selection, with MI and classification embedded in the objective function.	Accuracy rate of 88.75% with 10 features on the Pavia University dataset. For the Indian Pines dataset, highest OA of 76.48% and AA of 71.72%.
2019	Xie et al. [57]	Band selection through ISD, ABC and maximum entropy algorithms. SVM classification.	NA
2019	Sellami et al. [58]	Adaptive dimension reduction and spectral-spatial features using a semi-supervised 3D CNN.	NA
2019	Elzaimi et al. [59]	Filter-based approach using an information gain function to reduce the dimensionality.	Indian Pines: OA-95.25%. Pavia University: OA-96.83%.
2020	Sawant and Manoharan [60]	Band selection using a Modified Cuckoo Search algorithm (Lévy flight, meta-heuristic-based optimization).	Indian Pines: OA-86.92%. Pavia University: OA-95.10%.
2021	Uddin et al. [62]	Feature selection using MI-based minimum redundancy and maximum variance.	OA-95.39% and AA-95.09% on Indian Pines.
2022	Zhang et al. [63]	Hybrid clustering with filtering feature selection based on weights of the similarity measure between bands.	OA of 79.24% on Indian Pines.
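Mutual information (MI) between bands and class labels drives several of the selection schemes in Tables 3 and 4 (e.g., Nhaila et al. [50], Medjahed and Ouali [56] and Uddin et al. [62]). The sketch below assumes a simple histogram (plug-in) estimator with equal-width bins and ranks bands by relevance alone; unlike the minimum-redundancy criterion of [62], it ignores redundancy between the selected bands. The helper names and the default `n_bins=16` are illustrative assumptions.

```python
import numpy as np


def band_label_mutual_info(X, y, n_bins=16):
    """Histogram estimate of I(band; label), in nats, for every column of X.

    X: (n_pixels, n_bands) spectra; y: (n_pixels,) class labels.
    """
    X = np.asarray(X, dtype=float)
    classes, y_idx = np.unique(np.asarray(y), return_inverse=True)
    mi = np.zeros(X.shape[1])
    for b in range(X.shape[1]):
        # Equal-width bins over the observed range of this band.
        edges = np.histogram_bin_edges(X[:, b], bins=n_bins)
        x_idx = np.clip(np.digitize(X[:, b], edges[1:-1]), 0, n_bins - 1)
        # Joint histogram of (bin, class), normalised to probabilities.
        joint = np.zeros((n_bins, len(classes)))
        np.add.at(joint, (x_idx, y_idx), 1.0)
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0  # empty cells contribute nothing to the sum
        mi[b] = np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz]))
    return mi


def select_bands_by_mi(X, y, k, n_bins=16):
    """Indices of the k bands with the highest estimated MI with the labels."""
    mi = band_label_mutual_info(X, y, n_bins)
    return np.argsort(mi)[::-1][:k]
```

The plug-in estimate is biased upwards for small samples and many bins, so with the limited training sets typical of HSI it is safer to compare bands under the same binning than to interpret the absolute MI values.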

Table 5. Comparison analysis of features extraction-based Classification.

Year	Authors	Methodology Used	Evaluation Parameters
2016	Imani et al. [64]	Binary coding-based feature extraction (BCFE), SVM and maximum likelihood classifiers.	NA
2017	Qi et al. [65]	Feature extraction using PSO, standard deviation, correlation coefficient and KL divergence, MKB framework.	Indian Pines: OA-88.02%. Pavia University: OA-95.81%.
2018	Ksieniewicz et al. [66]	14 statistical features, ensemble of ELMs using the Random Subspace Method.	NA
2018	Qiao et al. [67]	Joint bilateral filtering, spectral similarity, PCA, Joint Sparse Representation Classification (JSRC).	Indian Pines: OA-98.13%. Pavia University: OA-99.76%.
2018	Paul et al. [68]	MI-based SAEs, MPs for spatial features.	NA
2019	Chen et al. [69]	Bilateral filter-based feature extraction on superpixels, SVM classification.	Indian Pines: OA-93.69% and AA-89.98%. Pavia University: OA-93.30% and AA-92.78%.
2020	Li et al. [70]	PCA, adaptive total variation filtering to extract features from the top PCs, and features transformed using ensemble IEMD. All the obtained features were stacked for classification.	Indian Pines: OA-98.84% and AA-99.01%. Salinas Valley: OA-99.51% and AA-99.68%.
2021	Wang et al. [71]	Multi-scale spectral features using band grouping and LSTM; spatial features based on the multi-scale spectral features and convolutional LSTM.	Indian Pines: OA-98.30% and AA-99.09%. Pavia University: OA-96.26% and AA-98.12%.
2022	Liang et al. [72]	Dimension reduction using Minimum Noise Fraction (MNF), local features extracted using relative total variation, and superpixel segmentation to extract non-local structural features.	Indian Pines: OA-90.32% and AA-93.04%. Salinas Valley: OA-98.13% and AA-97.57%.
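Several feature extraction methods in Table 5 filter the image spatially and then stack the filtered output with the spectral features before classification (e.g., Chen et al. [69] and Li et al. [70]). The sketch below illustrates only the stacking idea, assuming a plain w x w moving-average filter as a stand-in for the bilateral and total-variation filters used in those works; the function names and the 3 x 3 default window are hypothetical.

```python
import numpy as np


def window_mean(cube, win=3):
    """Per-band mean over a win x win neighbourhood (reflect padding at borders)."""
    if win % 2 == 0:
        raise ValueError("win must be odd")
    r = win // 2
    rows, cols, _ = cube.shape
    padded = np.pad(cube.astype(float), ((r, r), (r, r), (0, 0)), mode="reflect")
    out = np.zeros(cube.shape, dtype=float)
    # Sum the win * win shifted copies of the cube, then divide by their count.
    for dr in range(win):
        for dc in range(win):
            out += padded[dr:dr + rows, dc:dc + cols, :]
    return out / (win * win)


def stack_spectral_spatial(cube, win=3):
    """Concatenate each pixel's spectrum with its neighbourhood-mean spectrum.

    Output shape: (rows, cols, 2 * bands).
    """
    return np.concatenate([cube.astype(float), window_mean(cube, win)], axis=-1)
```

Each pixel then carries both its own spectral signature and a summary of its neighbourhood, which is the basic reason spectral-spatial features tend to outperform pixel-by-pixel spectral features on the land cover datasets compared here.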

Table 6. Comparison analysis of DL Classification Techniques.

Year	Authors	Methodology Used	Evaluation Parameters
2016	Zabalza et al. [80]	Band segmentation using SSAE, local SAEs to process the original features.	Indian Pines: OA-80.66%. Pavia Centre: OA-97.5%.
2017	Li et al. [104]	Band normalisation to [0, 1], PCA, GLCM for spatial feature extraction, CNN and ELM model.	Indian Pines: OA-98.08%, AA-97.67% and k-97.81%. Pavia University: OA-96.46%, AA-93.32% and k-95.31%. Image reconstruction increased the AA of ELM by as much as 30.04%.
2018	Singh and Kasana [93]	LPP, deep features using SAE, logistic regression.	Indian Pines: OA-84.4%. Salinas Valley: OA-87.2%.
2018	Pan et al. [90]	Spectral and spatial MugNet DL architecture to deal with fewer samples, SVM classification.	Indian Pines: OA-90.65%. OA of 90.82% and 93.15% on the Grss_dfc_2013 and Grss_dfc_2014 datasets, respectively.
2018	Paoletti et al. [91]	3D CNN run on GPUs for spatial and spectral features.	Indian Pines: OA-98.37%, AA-99.27% and k-98.15%. Pavia University: OA-98.06%, AA-98.61% and k-97.44%.
2018	Chen et al. [92]	Large spatial windows to extract local neighbourhood features, convolution kernels to merge spectral features, SVM classification with RBF kernel.	Indian Pines: OA-98.02%. Combined with SVM, the highest accuracies of 98.39% and 98.44% were obtained on the Indian Pines and Pavia University datasets, respectively.
2019	Zhou et al. [94]	Spectral-spatial LSTM, PCA and softmax classification.	Improved the classification accuracy by at least 2.69%, 1.53% and 1.08% on the Indian Pines, Pavia University and Kennedy Space Centre datasets, respectively.
2020	Cao et al. [99]	Supervised learning using 3D-2D spectral-spatial hybrid dilated residual networks.	Indian Pines: OA-99.46%, AA-99.43% and k-99.38%. Kennedy Space Centre: OA-99.89%, AA-99.77% and k-99.88%. Pavia University: OA-99.81%, AA-99.69% and k-99.74%.
2020	Nalepa et al. [100]	Resource-frugal spectral CNN. The deep model was trained in full precision, then fake-quantized and trained again before being quantized to the final low-bit version.	The model, four times smaller than its original counterparts, segmented equally well.
2020	Vaddi et al. [101]	Data normalization and CNN-based classification of HSI, probabilistic PCA for spectral features and Gabor filter for spatial features.	Indian Pines: OA-99.02% and AA-99.17%. Pavia University: OA-99.94% and AA-99.92%.
2020	Jiao et al. [102]	Various deep neural network models: VGG-verydeep-16, RAEs and superpixel-based multi-local CNN.	With the second approach, an accuracy of 99.91% was obtained on the Pavia University dataset.
2021	Singh and Kasana [103]	Approximation of lost noisy bands, PCA and LPP-based spectral-spatial features, deep SAE network.	Indian Pines: OA-99.02% and AA-99.17%. Pavia University: OA-99.94% and AA-99.92%.
2021	Manifold et al. [105]	U-within-U-Net architecture handling both spectral and spatial features for segmentation, feature extraction and classification.	OA of 99.48% on Indian Pines.
2021	Xue et al. [106]	Hierarchical residual network with attention mechanism for multi-scale spectral-spatial features.	OA of 99.80% and AA of 99.29% on Pavia University.
2022	Sellami et al. [107]	Spatial features using AE; spectral and spatial features fused by a deep AE into a joint latent representation; graph CNN.	Indian Pines: OA-97.68% and AA-97.55%. Salinas Valley: OA-98.24% and AA-98.17%. Pavia University: OA-99.16% and AA-99.04%.
2022	Zhan et al. [108]	Combination of LSTM, residual network and spectral-spatial attention network for HSI classification.	Indian Pines: OA-97.69% and AA-97.19%. Salinas Valley: OA-98.34% and AA-98.84%. Pavia University: OA-95.87% and AA-95.37%.
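The tables report overall accuracy (OA), average accuracy (AA) and the kappa coefficient (k). Under the usual definitions, OA is the fraction of correctly labelled test pixels, AA is the mean of the per-class accuracies, and kappa corrects OA for chance agreement, kappa = (p_o - p_e) / (1 - p_e), where p_o is OA and p_e is the agreement expected by chance. A minimal NumPy sketch follows, assuming integer labels 0..C-1 and that every class occurs in the reference data; the function names are illustrative.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Counts with rows indexing the reference class and columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def oa_aa_kappa(cm):
    """Overall accuracy, average (per-class) accuracy and Cohen's kappa.

    Assumes every class has at least one reference sample (no empty rows).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    oa = np.trace(cm) / n
    per_class = np.diag(cm) / cm.sum(axis=1)  # per-class accuracy (recall)
    aa = per_class.mean()
    # Chance agreement from the row (reference) and column (predicted) marginals.
    pe = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / n**2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```

For example, with reference labels [0, 0, 0, 0, 1, 1, 2, 2, 2, 2] and predictions [0, 0, 0, 1, 1, 1, 2, 2, 2, 0], `oa_aa_kappa(confusion_matrix(y_true, y_pred, 3))` gives OA = 0.8, AA of about 0.833 and kappa of about 0.697. Because AA weights every class equally, it falls well below OA when small classes are misclassified, as in the Indian Pines result of Tulapurkar et al. [42] in Table 2.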

Table 7. Comparison of performances of ML and DL techniques on landcover HSI.

Techniques	OA	Remarks
SVM [28]	95.75%	SVM implemented with different kernels.
SVM + DL [92]	98.4%	CNN offers more computational capacity to handle complex data and generate useful features. It does not need an expert for manual labeling, as is the case for supervised classifiers such as SVM. In this work, convolution kernels were used to extract spatial features, which helped the spectral SVM to perform better classification.
Wavelet Transform [37]	93.85%	DWT was used for denoising and enhancement of the HSI, which was then fed to SVM.
Wavelet Transform + DL [36]	98.64%	CNN offers high computational capacity and deals with complex data with many parameters. Here, DWT combined with CNN reduced the number of learnable parameters and created a light, robust CNN architecture.
Simple band reduction [65]	95.8%	Features extracted through heavy computations of KL divergence, PSO and MKB.
Band reduction + DL [47]	99.06%	DL offers automatic, multi-layer processing for extracting features. It is more powerful than manual trial and error with different feature engineering techniques. The authors used the computational power of autoencoders to extract informative spectral-spatial features.
SVM + Gaussian Filter [27]	98.68%	SVM is a spectral classifier. The spatial features from the filter were combined with the SVM classification map.
DL + Gabor Filter [109]	99.38%	The high computational power of the multi-scale CNN extracted better spectral-spatial features than SVM + Gaussian filter.
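Table 7 attributes the gain of SVM + Gaussian filter [27] to combining filtered spatial information with the SVM classification map. One simple way to do this is sketched below, under the assumption that the spectral classifier outputs a per-pixel score (for example, a class probability) map for each class: every class map is smoothed with a separable 2D Gaussian kernel, and each pixel is relabelled by its largest smoothed score. `sigma` and the helper names are illustrative, and the exact scheme of [27] may differ.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius=None):
    """Normalised 1D Gaussian weights on [-radius, radius]."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    w = np.exp(-0.5 * (x / sigma) ** 2)
    return w / w.sum()


def smooth_score_maps(scores, sigma=1.0):
    """Separable Gaussian filtering of each class map in scores (rows, cols, n_classes)."""
    w = gaussian_kernel_1d(sigma)
    r = len(w) // 2
    rows, cols, _ = scores.shape
    padded = np.pad(scores.astype(float), ((r, r), (r, r), (0, 0)), mode="edge")
    # Filter along the row axis, then along the column axis.
    tmp = sum(w[k] * padded[k:k + rows, :, :] for k in range(len(w)))
    return sum(w[k] * tmp[:, k:k + cols, :] for k in range(len(w)))


def spatially_regularised_labels(scores, sigma=1.0):
    """Class label per pixel after Gaussian smoothing of the class-score maps."""
    return np.argmax(smooth_score_maps(scores, sigma), axis=-1)
```

A larger `sigma` removes more isolated misclassifications but also blurs thin or small land-cover regions, so it would typically need tuning per dataset.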

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Grewal, R.; Singh Kasana, S.; Kasana, G. Machine Learning and Deep Learning Techniques for Spectral Spatial Classification of Hyperspectral Images: A Comprehensive Survey. Electronics 2023, 12, 488. https://doi.org/10.3390/electronics12030488

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
