Article

A Novel Technique for Semantic Segmentation of Hyperspectral Images Using Multi-View Features

1 Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala 147004, India
2 Computer Science and Information Technology, Central University of Haryana, Mahendergarh 123031, India
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4909; https://doi.org/10.3390/app14114909
Submission received: 15 April 2024 / Revised: 26 May 2024 / Accepted: 29 May 2024 / Published: 5 June 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

This research presents an innovative technique for semantic segmentation of Hyperspectral Images (HSIs) while focusing on their dimensionality reduction. A unique technique is applied to three distinct HSI landcover datasets, Indian Pines, Pavia University, and Salinas Valley, acquired from diverse sensors. HSIs are inherently multi-view structures, causing redundancy and computational overload due to their high dimensionality. The technique utilizes Canonical Correlation Analysis (CCA) variants, Pairwise CCA (PCCA) and Multiple Set CCA (MCCA), to extract features from multiple views of the input image simultaneously. The performance of PCCA and MCCA is compared with the traditional Principal Component Analysis (PCA) on all datasets. The superior performance of the CCA variants, particularly MCCA, is demonstrated by their higher Overall Accuracy (OA) for semantic segmentation compared to PCA. The research extends the analysis by integrating machine learning classifiers for per-pixel prediction, demonstrating the effectiveness of the proposed techniques, i.e., PCCA-SVM and MCCA-SVM.

1. Introduction

Hyperspectral Imaging (HSI) has emerged as a pivotal technology in remote sensing, revolutionizing our ability to observe and comprehend the world from above. Unlike traditional RGB images, which capture just three bands of light, hyperspectral sensors are capable of acquiring data across hundreds of contiguous narrow spectral bands spanning the electromagnetic spectrum. This unique capability enables hyperspectral images to provide a profound level of spectral detail, which is instrumental in applications such as vegetation, mineral and forest mapping [1,2,3], chemical imaging [4], landcover classification [5], and anomaly detection [6], among others.
Semantic segmentation is an advanced computer vision task used to comprehend HSI information and perform extended analysis; it classifies each pixel in an image into a specific semantic category, thereby assigning a label to every region of interest. It offers a finer level of information by supplying semantic labels to individual pixels in the image. This degree of detail improves the machine’s ability to learn and helps it comprehend more intricate concepts, such as the boundaries and spatial relationships between several objects in an image. Additionally, semantic segmentation eliminates manual labeling of images, which can be expensive and time-consuming. It can also identify several objects in an image and is not restricted to a single type of object.
Owing to continuous advancements and research in semantic segmentation techniques, they are being used for a variety of data-rich and diversified remote sensing problems [7]. They have played an immense role in applications such as lane analysis for autonomous vehicles [8] and geolocalization for Unmanned Aerial Vehicles [9]. Environmental monitoring [10], crop cover and type analysis [11], and land usage analysis in urban areas [12] are a few instances of semantic segmentation of remote sensing imagery. However, semantic segmentation of HSI becomes particularly challenging due to the high-dimensional nature of the data, where each pixel is represented by a spectrum across numerous spectral bands [13]. The hundreds of spectral bands are highly redundant and cause Hughes’s curse of dimensionality [14], which necessitates efficient techniques that can capture the relevant information while managing the risk of overfitting and computational inefficiency.
To deal with the curse of dimensionality, a number of Feature Selection (FS) and Feature Extraction (FE) techniques exist. While FS picks a subset of the raw features and therefore discards some of the original features, FE converts the high-dimensional space into a meaningful low-dimensional feature space without losing important information. FE techniques like Principal Component Analysis (PCA) [15], Independent Component Analysis (ICA) [16], Linear Discriminant Analysis (LDA) [17], the Wavelet Transform [18] and more have been implemented on HSI. Some adopted bilateral-filter-based feature extraction on superpixels of HSI [19], while others utilised adaptive variation filtering over the principal components acquired through PCA [19]. Minimum Noise Fraction has also played a vital role in reducing dimensions on superpixels of HSI to extract reduced spectral and spatial features [20]. The most important and relevant spectral bands are also found by utilizing a variety of metaheuristic optimization techniques, including Particle Swarm Optimization (PSO) [21], the Genetic Algorithm (GA) [22], Gray Wolf Optimization (GWO) [23], and the Whale Optimization Algorithm (WOA) [24]. Ranking-based [25] and clustering-based [26] techniques have been explored as well to reduce the dimensions.
The aforementioned dimension reduction techniques focus on global features, consider HSI as a single dataset and give a single reduced spectral dataset. However, a novel perspective emerges when we consider hyperspectral images as multi-view datasets. In traditional multi-view learning scenarios, multiple sets of features or views of the same data are available. The term ’views’ refers to the different ways in which the data is observed, captured, or represented. These views provide complementary information that can enhance the learning process and lead to more robust models. Recent research has focused on multi-view feature learning and fusion, along with the supremacy of the Support Vector Machine (SVM). A variety of features like Local Binary Patterns (LBP), Gabor Filters and the Gray Level-Gradient Co-occurrence Matrix (GLCM) were fused to provide important features to SVM to detect weed and corn in a field [27]. A kernelized SVM helped to fuse different selected features for multi-class classification with recursive feature elimination [28]. Multi-view features were embedded with SVM to detect clouds in remote sensing images [29]. Multiple feature views extracted through GLCM and LBP and fused with SVM also improved the classification of mitotic and non-mitotic cells in breast cancer images [30]. Hence, multi-view feature learning forms a strong research area for application to HSI. HSIs, by their very nature, can be treated as multi-view datasets by partitioning them on the basis of each wavelength at which the scene has been captured; each spectral band provides a distinct view of the same scene. To extract useful patterns from multiple views and extract features simultaneously, Canonical Correlation Analysis (CCA) is a useful technique.

1.1. Motivation

The major motivation of this paper is to highlight the superiority of CCA in comparison with PCA. As discussed above, many researchers have addressed the high dimensionality of HSI and subsequently classified it. To the best of our knowledge, this is the first study to employ CCA while treating HSI as multi-view data and obtaining reduced spectral features. The reduced features have been projected to SVM for semantic segmentation (dense per-pixel prediction), owing to its aforementioned advantages and good performance.
Pairwise CCA (PCCA) and Multi-Set CCA (MCCA) have been employed to extract features in the multi-view fashion. While both techniques share the common objective of uncovering correlations across multiple views, they differ significantly in their approach and applicability. PCCA is a variant of CCA that operates by computing pairwise correlations between each pair of views and extracts features between two views at a time.
On the other hand, MCCA is generalized CCA, which extends the concept of CCA to more than two views simultaneously. Unlike PCCA, which focuses on pairwise comparisons, MCCA considers the joint correlation structure across multiple views. It aims to uncover common latent factors that are shared across all views, capturing the global relationships among all datasets simultaneously.
The suitability and effectiveness of the CCA-SVM approach for semantic segmentation of landcover HSI can open further opportunities for the research community in this domain.

1.2. Contribution

The following are the main contributions of the paper to the field of HSI analysis:
  • A novel multi-dimensional perspective and feature extraction approach to HSI analysis is introduced, treating each hyperspectral image as multi-view/multi-modal dataset. Each spectral band of the image provides a distinct “view” of the scene, facilitating innovative techniques for simultaneous feature extraction from multiple views. For this, CCA is applied for the first time due to its ability to capture correlated information across all the views simultaneously.
  • The multi-view features extracted using CCA are used to predict the class of each pixel, i.e., to perform semantic segmentation, using SVM.
  • A rigorous experimental investigation is conducted to evaluate the performance metrics associated with semantic segmentation of HSIs using multi-view features obtained through CCA. The approach is meticulously compared with conventional feature extraction technique PCA and other machine learning algorithms for segmentation. This comprehensive analysis not only validates the effectiveness of the Proposed Technique but also provides insights into its superiority over existing techniques.

1.3. Organisation of Paper

This paper is structured as follows: Section 1 introduces the concept of HSI, highlighting one of its major challenges, the curse of dimensionality, and how semantic segmentation helps in HSI analysis. Section 2 discusses the existing state-of-the-art techniques for HSI dimension reduction and segmentation. Section 3 gives a deeper, mathematical understanding of the techniques proposed in this research paper. Section 4 outlines our Proposed Technique, elucidating the fusion of CCA and SVM within the context of per-pixel prediction, i.e., semantic segmentation. Section 5 presents our experimental setup, including the datasets used and the performance metrics employed. The results and discussion are provided as well, offering insights into the effectiveness of our approach. Finally, Section 6 encapsulates our conclusions and outlines avenues for future research.

2. Background

The literature suggests that, for handling the huge spectral dimension, Principal Component Analysis (PCA) has been a commonly employed technique for HSI analysis in various applications. In 2002, Rodarmel and Shan [31] introduced the application of the statistical approach of PCA on HSI to reduce the number of spectral bands. Their study on two landcover datasets, HYDICE and AVIRIS, revealed that only the first few Principal Components (PCs) were needed to achieve about a 70% correct classification rate. Their results became a benchmark for other researchers, who heavily relied on the first few PCs in their experiments.
Even until recently, PCA has stood as the most widely adopted pre-processing step for dimensionality reduction, as it optimizes variance preservation and retains comprehensive spatial information on a global scale. Castaings et al. [32] employed PCA and a Gaussian kernel PCA to obtain informative bands; the reduced spectrum, along with morphological operations, was used to classify hyperspectral images. Che et al. [33] segmented bruises on apples, where PCA was primarily used to remove redundant information from the HSI cube and the first three PCs were used as input for the segmentation task. Torres et al. [34] reduced the abundant spectral information using a combination of PCA and Analysis of Variance (ANOVA) to separate green oranges from leaves. Fernandez et al. [35] pointed out the complex matrix computations of eigenvectors and eigenvalues on such high-dimensional data that are needed for PCA, and therefore introduced a reconfigurable hardware version of PCA using field-programmable gate arrays for high-speed computing of HSI. Sun et al. [36] selected informative bands from HSI using fast and robust PCA on a Laplacian graph, so that the original data could be cleanly represented by a low-rank approximation. After the double PCA implementation by Li et al. [37], the third PC of the HSI gave the best results for segmenting the decayed portions of apples. Because it captures only the global variance in the data and is dominated by the visible and near-infrared bands, PCA is unable to extract the fundamental structure of HSI, and cumulative-variance accumulation cannot take advantage of non-linear correlations among the transformed features that PCA-based approaches provide. Hence, Uddin et al. [38] used a Mutual Information based Minimum Redundancy Maximum Relevance measure with PCA to capture non-linear features of HSI. The existing literature is saturated with a variety of combinations of PCA employed on HSI to reduce dimensions.
CCA and PCA can be considered sister techniques. However, PCA suffers from loss of information and a lack of interpretability of the principal components in terms of the original features. A brief abstract overview of the difference between PCA and CCA is depicted in Figure 1. Since it can reduce the labeled-instance complexity, CCA, which was first developed to measure the linear correlation between two sets of variables, was formally described as a multi-view dimensionality reduction technique by Foster et al. [39]. An additional attractive feature of CCA is that, if the noise in one view is uncorrelated with the other, the learnt subspace will not contain the noise in the uncorrelated dimensions [40].
CCA is an effective approach for extracting features from multi-view images by uncovering the relationships between different sets of variables. Recent research shows that it improves the robustness of classification techniques by combining the strengths of different viewpoints to create a more comprehensive representation. CCA has been used to create a hybrid, improved kernel for HSI classification by capturing similarities between spectral and spatial features and combining them for SVM [41]. It has also addressed the limited-training-samples issue by performing cross-domain collaborative learning [42], assigning labels to unknown samples based on the correlation between target clusters created using a random walker algorithm. With the advent of big data, CCA has helped to find patterns and associations between hundreds or even thousands of behavioural, neurological, and genetic phenotypic subject characteristics in brain image datasets in neuroscience [43]. CCA has also played a significant role in the medical field by combining with deep learning techniques to segment white blood cells and classify blood samples [44]; it extracted various features from overlapping nuclei, compressed the dimensions of the input images, and helped the network converge faster. Along with dimension reduction, the powerful predictive capability of machine learning algorithms has shown great results for the classification and segmentation of such images [45].
For the last decade, deep learning based techniques have been employed to perform semantic segmentation [46]. Various researchers have based their semantic segmentation on popular Convolutional Neural Network (CNN) architectures, drawing inspiration from AlexNet [47], VGGNet [48], UNet [49] and GoogLeNet [50]. AlexNet is made up of five convolutional layers and three fully connected layers, with a pooling layer, designed to lower dimensionality, placed between adjacent convolutional layers. VGGNet consists of several convolutional layers followed by three fully connected layers; a suite of VGGNets, such as VGG-11 and VGG-16, can be obtained by changing the number of convolutional layers. Three things set GoogLeNet apart from other CNN variants: it has one fully connected layer, it contains an inception module, and it uses auxiliary classifiers during training. The inception module applies filters of three sizes, 1 × 1, 3 × 3, and 5 × 5, to the input and concatenates the filtering results with the max-pooling result. UNet has been a frequently applied architecture for semantically segmenting HSI. It consists of a contracting path (encoder) and an expanding path (decoder) forming a symmetric U-shaped structure. The contracting path gradually reduces spatial dimensions while increasing feature depth, capturing context through convolutional and max-pooling layers. To provide accurate localization and detailed segmentation outputs, the expanding path upsamples the feature maps using transposed convolutions and concatenates them with the corresponding feature maps from the contracting path. Some researchers have combined UNet with Transformers to extract spectral-spatial features from HSI [51].
Despite their impressive capabilities, deep learning models come with some significant drawbacks. The limited number of HSI samples is challenging for these models, as they rely heavily on large volumes of labeled data for effective training, which can be both costly and time-consuming to gather. Another issue is their tendency to overfit, particularly when the HSI dataset is small, resulting in poor generalization to new, unseen data; the training and testing samples also tend to overlap, which further causes overfitting. Additionally, deep learning models become computationally expensive, and tuning their hyperparameters becomes challenging. Machine learning techniques like SVM [52] therefore become a robust choice for performing semantic segmentation on HSI, owing to their ability to handle high-dimensional data efficiently by finding the optimal hyperplane that maximizes class separation for each pixel.
SVM has been a powerful supervised learning algorithm for HSI analysis in a variety of applications. It excels in scenarios where the data exhibit complex, nonlinear relationships and is particularly well suited to hyperspectral images due to their high dimensionality. SVM classification is based on the formation of a decision boundary with the largest margin of separation between data samples from different classes; this decision boundary can be linear or, with the help of kernels, non-linear [53]. Hu et al. [54] utilized a Gaussian kernel SVM for spectral-spatial classification of the Pavia University data along with mathematical morphology, converting the multi-class SVM results into multiple binary classification results to reduce the noise. Ye et al. [55] applied HSI analysis to detect decay in potatoes, where SVM again played a pivotal role; to achieve the best detection accuracy, the parameters of SVM were tuned through the grid search method. In the approach by Ji et al. [56], SVM played a major role in the non-invasive identification of defective potatoes and, combined with k-means clustering, gave the desired results. Apart from its use in agriculture and food quality assessment, SVM has been utilized to identify multi-class landcover scenes as well. Akbari et al. [57] employed SVM on spectral-spatial features obtained through Gabor and wavelet filters on Pavia University (PU), obtaining a high classification accuracy of 96.7%. SVM has also been applied in multiple kernel learning settings; for example, Wang et al. [58] utilized a composite kernel infused with spectral, semantic and spatial features to classify the landcover Indian Pines and Pavia University datasets. Anand et al. [59] also fed 3D spectral-spatial features of these datasets into SVM for per-pixel prediction. Miclea et al. [18] reduced the spectral dimensions through the wavelet transform and classified the result using SVM. As evident from the discussed literature, the ability of SVM to discern complex, nonlinear relationships and form maximum-margin decision boundaries makes it invaluable for classification and segmentation tasks.

3. Materials and Methods

In this section, we discuss the datasets and the major methods used for extracting multi-view features and performing semantic segmentation.

3.1. Datasets Description

The following datasets have been used for the implementation of the Proposed Technique.

3.1.1. Indian Pines

The Indian Pines scene is situated in Indiana, USA. The landscape includes lush agricultural land, vegetation, and a natural woodland area, and the image has a total resolution of 145 × 145 × 224. Bands covering the water absorption region, where reflection and refraction by water distort the signal, are removed: bands 104 to 108, 150 to 163, and 220 have been eliminated. As a result, the updated image has 200 bands. As seen in Figure 2, it has 16 distinct classes with per-class pixel counts ranging from 20 to 2455.
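As a small illustration, the band removal described above can be carried out with a few lines of NumPy. The .mat file name, variable key, and band count below are assumptions based on the commonly distributed version of the scene, not details given in this paper.

```python
# Minimal sketch: removing the listed water-absorption bands from the Indian Pines cube.
import numpy as np
from scipy.io import loadmat

cube = loadmat("Indian_pines.mat")["indian_pines"]   # shape (145, 145, 220) in the commonly distributed file (assumption)
# 0-based indices of bands 104-108, 150-163, and 220
noisy = list(range(103, 108)) + list(range(149, 163)) + [219]
keep = [b for b in range(cube.shape[2]) if b not in noisy]
cube_clean = cube[:, :, keep]                         # retained spectral bands
print(cube_clean.shape)                               # expected: (145, 145, 200)
```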

3.1.2. Pavia University

The Pavia University scene was captured by the ROSIS sensor during a flight over Pavia, Italy. The original scene resolution was 610 × 610 × 115; however, some areas of the image containing no information were removed, leaving a final resolution of 610 × 340 × 103. As illustrated in Figure 3, it comprises nine distinct classes and has a geometric resolution of 1.3 m. Twelve bands were removed during radiometric correction.

3.1.3. Salinas Valley

The Salinas scene, with a high spatial resolution of 3.7 m per pixel, is located in California and was recorded by the AVIRIS sensor. It has a resolution of 512 × 217 × 224. Bands 108 to 112, 154 to 167, and 224 are removed, leaving 204 bands in the updated image. Figure 4 illustrates its 16 classes, which include stubble, romaine lettuce, broccoli green weeds, and fallow.

3.2. Canonical Correlation Analysis

CCA is a statistical technique employed to examine the associations between two sets of variables. It seeks linear combinations of variables in each set, referred to as canonical variables, such that the correlation between corresponding canonical variables is maximized. CCA is particularly useful when exploring relationships between multivariate datasets, aiming to uncover latent structures and patterns that exist between them. The primary objective of CCA is:
  • Maximize Correlation: Find pairs of canonical variables, one from each set, such that the correlation between these pairs is maximized. These canonical variables capture the shared variance between the sets.
  • Sequential Independence: Subsequent canonical variables are sought in a way that ensures they are uncorrelated with previous canonical variables. This ensures that each new canonical variable captures distinct, independent information.
Let $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d_x \times N}$ and $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{d_y \times N}$ represent two data matrices with $N$ instances, where $d_x$ and $d_y$ are the feature dimensions of $X$ and $Y$, respectively. CCA works towards finding $K$ pairs of linear projections, called canonical vectors, $W_x = [w_{x,1}, w_{x,2}, \ldots, w_{x,K}] \in \mathbb{R}^{d_x \times K}$ and $W_y = [w_{y,1}, w_{y,2}, \ldots, w_{y,K}] \in \mathbb{R}^{d_y \times K}$, such that the correlation between $W_x^T X$ and $W_y^T Y$ is maximized. If we consider a canonical vector $w_x \in \mathbb{R}^{d_x \times 1}$ for $X$ and a canonical vector $w_y \in \mathbb{R}^{d_y \times 1}$ for $Y$, CCA maximizes the correlation coefficient $\rho$ between $X^T w_x$ and $Y^T w_y$, as shown in Equation (1):
$$\rho(X^T w_x, Y^T w_y) = \frac{w_x^T X Y^T w_y}{\sqrt{(w_x^T X X^T w_x)\,(w_y^T Y Y^T w_y)}} \tag{1}$$
Equation (1) in constrained form is represented as:
$$\max_{w_x, w_y} \; w_x^T X Y^T w_y \quad \text{s.t.} \quad w_x^T X X^T w_x = 1, \; w_y^T Y Y^T w_y = 1$$
The covariance matrix $X X^T$ or $Y Y^T$ is often singular due to the high feature dimensionality, which results in an underdetermined optimization problem. Regularization in CCA also helps to deal with noise in the data. Introducing the regularized covariance matrices of Equations (3) and (4) alleviates this problem.
$$\Sigma_{xx} = \frac{1}{N} X X^T + r_x I \tag{3}$$
$$\Sigma_{yy} = \frac{1}{N} Y Y^T + r_y I \tag{4}$$
where $r_x$ and $r_y$ are the regularization coefficients.
It is assumed that the noise vectors in the $d_x$-dimensional and $d_y$-dimensional columns of $X$ and $Y$, respectively, are independent, identically distributed and Gaussian. As a result, all covariances between the columns of $X$ and $Y$ equal zero, except the covariance between a specific column vector and itself. This variance of each column of $X$ and $Y$, labeled $r_x$ and $r_y$ respectively, gives the regularization coefficients. For choosing these parameters, let $w_x^i$ and $w_y^i$ be the weights calculated through CCA when the data samples $X_i$ and $Y_i$ are removed, where $i \in \{1, 2, \ldots, n\}$. The coefficients $r_x$ and $r_y$ are chosen through Grid Search [60] optimization of the following cost function [61]:
$$\max_{r_x, r_y} \left[ \mathrm{corr}\left( \{X_i w_x^i\}_{i=1}^{n}, \; \{Y_i w_y^i\}_{i=1}^{n} \right) \right]$$
where ‘corr’ refers to Pearson’s correlation coefficient. Whenever a sample $i$ is removed, the above cost function measures the change in $w_x^i$ and $w_y^i$ and seeks the optimal values of the regularization coefficients for which the change is minimized [62].
Eigenvalue Decomposition (EVD) and Singular Value Decomposition (SVD) are the two major approaches for calculating $W_x$ and $W_y$.
  • The generalised EVD is formulated as:
    $$\begin{bmatrix} 0 & \Sigma_{xy} \\ \Sigma_{yx} & 0 \end{bmatrix} \begin{bmatrix} w_x \\ w_y \end{bmatrix} = \lambda \begin{bmatrix} \Sigma_{xx} & 0 \\ 0 & \Sigma_{yy} \end{bmatrix} \begin{bmatrix} w_x \\ w_y \end{bmatrix}$$
  • Here, the $\Sigma$'s are covariance matrices, with $\Sigma_{xy} = \frac{1}{N} X Y^T$ and $\Sigma_{yx} = \frac{1}{N} Y X^T$.
  • The top $K$ generalised eigenvectors are $\{[w_{x,k}; w_{y,k}]\}_{k=1}^{K}$.
  • The $k$-th generalised eigenvalue is equal to the correlation $\rho(w_{x,k}^T X, w_{y,k}^T Y)$.
The second solution for calculating $W_x$ and $W_y$ involves performing SVD.
  • SVD is performed on a matrix $T$.
  • The matrix $T$ is given by $T = \Sigma_{xx}^{-1/2} \, \Sigma_{xy} \, \Sigma_{yy}^{-1/2}$.
  • Let $\widetilde{W}_x$ and $\widetilde{W}_y$ be the $K$ leading left and right singular vectors of $T$.
  • The canonical matrices are then $W_x = \Sigma_{xx}^{-1/2} \widetilde{W}_x$ and $W_y = \Sigma_{yy}^{-1/2} \widetilde{W}_y$.
  • The $k$-th leading singular value of $T$ is equal to the correlation $\rho(w_{x,k}^T X, w_{y,k}^T Y)$.
On obtaining $W_x$ and $W_y$, the canonical variables, i.e., the new projected features, are computed as $Z_x = W_x^T X$ and $Z_y = W_y^T Y$.
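As a minimal illustration of the SVD route just described (regularized covariances followed by the SVD of $T$), the following NumPy sketch computes the canonical vectors for two views. The centering step and the small, fixed regularization coefficients are assumptions of this sketch; it is not the authors' exact implementation.

```python
import numpy as np

def cca_svd(X, Y, K, r_x=1e-3, r_y=1e-3):
    """Two-view regularized CCA via the SVD of T = Sxx^{-1/2} Sxy Syy^{-1/2}.

    X: (d_x, N), Y: (d_y, N); rows are features, columns are samples.
    Returns canonical vectors W_x (d_x, K), W_y (d_y, K) and the top correlations.
    """
    N = X.shape[1]
    X = X - X.mean(axis=1, keepdims=True)            # center each feature
    Y = Y - Y.mean(axis=1, keepdims=True)
    Sxx = X @ X.T / N + r_x * np.eye(X.shape[0])     # regularized covariances
    Syy = Y @ Y.T / N + r_y * np.eye(Y.shape[0])
    Sxy = X @ Y.T / N

    def inv_sqrt(S):                                 # symmetric inverse square root
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(T)
    W_x = inv_sqrt(Sxx) @ U[:, :K]                   # canonical vectors for X
    W_y = inv_sqrt(Syy) @ Vt.T[:, :K]                # canonical vectors for Y
    return W_x, W_y, s[:K]                           # s[:K] approximate the canonical correlations

# Canonical variables (projected features), as in the text:
# Z_x = W_x.T @ X, Z_y = W_y.T @ Y
```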
However, CCA works on two views at a time. To utilise multiple views at once, MCCA can be employed. Let us denote $X X^T$ as the covariance matrix $C_{xx}$; similarly, $X Y^T$ corresponds to $C_{xy}$, and so on. MCCA is computed using the transformations $w_k$ and $w_q$, where $k, q \in \{1, 2, \ldots, K\}$ index the views and $K$ is the total number of views. It satisfies the following:
$$\arg\max_{w_k, w_q} \frac{\sum_{k \neq q} (X_k w_k)^T (X_q w_q)}{\sqrt{\sum_k (X_k w_k)^T (X_k w_k) \; \sum_q (X_q w_q)^T (X_q w_q)}} = \frac{\sum_{k \neq q} w_k^T C_{kq} w_q}{\sqrt{\sum_k w_k^T C_{kk} w_k \; \sum_q w_q^T C_{qq} w_q}}$$
The denominators can be fixed as $w_k^T C_{kk} w_k = 1$, since we are interested only in the direction of $w_k$, which gives:
$$\arg\max_{w_k, w_q} \; \sum_{k \neq q} w_k^T C_{kq} w_q \quad \text{s.t.} \quad w_k^T C_{kk} w_k = 1, \; k = 1, \ldots, K$$
By using Lagrange multipliers $\lambda$ and equating the partial derivatives to zero, the above optimization can be solved. This sums up MCCA as:
$$\begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1K} \\ C_{21} & C_{22} & \cdots & C_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ C_{K1} & C_{K2} & \cdots & C_{KK} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_K \end{bmatrix} = \lambda \begin{bmatrix} C_{11} & 0 & \cdots & 0 \\ 0 & C_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_{KK} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_K \end{bmatrix}$$
CCA for just two views is a special case of the above formulation.
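One way to realize the block generalized eigenvalue problem above is to assemble the full cross-covariance matrix $[C_{kq}]$ and the block-diagonal matrix $\mathrm{diag}(C_{kk})$ explicitly and pass them to a generalized symmetric eigensolver. The SciPy sketch below does exactly that; the centering, the small ridge term, and taking the leading eigenvectors as components are assumptions of this illustration rather than details taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def mcca(views, n_comps, reg=1e-3):
    """Multi-set CCA via the block generalized eigenproblem shown above.

    views: list of arrays X_k with shape (N, d_k), rows = pixel samples.
    Returns a list of per-view projection matrices w_k with shape (d_k, n_comps).
    """
    views = [V - V.mean(axis=0, keepdims=True) for V in views]   # center each view
    N = views[0].shape[0]
    dims = [V.shape[1] for V in views]
    D = sum(dims)
    C = np.zeros((D, D))          # full block covariance matrix [C_kq]
    B = np.zeros((D, D))          # block-diagonal matrix diag(C_kk), with a small ridge
    offsets = np.cumsum([0] + dims)
    for k, Xk in enumerate(views):
        for q, Xq in enumerate(views):
            Ckq = Xk.T @ Xq / N
            C[offsets[k]:offsets[k+1], offsets[q]:offsets[q+1]] = Ckq
            if k == q:
                B[offsets[k]:offsets[k+1], offsets[q]:offsets[q+1]] = Ckq + reg * np.eye(dims[k])
    vals, vecs = eigh(C, B)                                       # generalized EVD
    top = vecs[:, np.argsort(vals)[::-1][:n_comps]]               # leading eigenvectors
    return [top[offsets[k]:offsets[k+1], :] for k in range(len(views))]
```

The canonical variables of view $k$ are then obtained as `views[k] @ w_k`; how these per-view variables are combined into a single reduced feature matrix (the paper combines the least correlated components) is left to the pipeline stage described later.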

3.3. Support Vector Machine

SVMs constitute a powerful class of supervised machine learning algorithms widely employed in classification and regression tasks. Developed by Cortes et al. [63], SVMs have since gained popularity for their effectiveness in handling high-dimensional data and achieving robust generalization performance. For a binary classification problem, let us denote the two classes as +1 and −1. The goal of SVM is to find a hyperplane represented by Equation (11):
$$f(x) = w \cdot x + b \tag{11}$$
where $w$ is the weight vector, $x$ is the input vector and $b$ is the bias term. The decision rule for classification is based on the sign of $f(x)$:
$$y = \begin{cases} +1 & \text{if } f(x) \geq 0 \\ -1 & \text{if } f(x) < 0 \end{cases}$$
The goal of SVM is to maximise the margin, i.e., the distance between the two closest points of different classes, which are the support vectors. The distance from a data point to the decision boundary is given by:
$$\text{Distance} = \frac{|f(x)|}{\|w\|}$$
where $\|w\|$ represents the Euclidean norm of the weight vector. The objective of SVM is to maximise the margin while reducing classification errors, which is achieved by solving the optimization problem:
$$\min_{w, b} \; \frac{1}{2} \|w\|^2$$
subject to the constraints
$$y_i (w \cdot x_i + b) \geq 1$$
for all data points $(x_i, y_i)$, where $y_i$ is the class label of the $i$-th data point. Whenever the data is not linearly separable, SVM uses the kernel trick, i.e., a mathematical function that transforms the data into a higher-dimensional space where it becomes separable [64].
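As a quick illustration of this margin-based decision rule and the kernel trick, the following scikit-learn sketch fits a linear and an RBF-kernel SVM on toy, non-linearly-separable data. It is illustrative only; the experiments in this paper apply SVM to the reduced CCA features of real HSI.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)       # classes not linearly separable

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)   # kernel trick: implicit higher-dimensional mapping

print("linear accuracy:", linear_svm.score(X, y))
print("rbf accuracy:   ", rbf_svm.score(X, y))
# sign(decision_function(x)) reproduces the +1 / -1 decision rule above
print(np.all(np.sign(rbf_svm.decision_function(X[:5])) == rbf_svm.predict(X[:5])))
```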

4. Proposed Technique

The hyperspectral image is envisioned to be segmented by utilizing the hyperspectral data’s block structure in a multi-view manner. The entire hypercube is automatically divided into disjoint sub-blocks along the spectral dimension. An overview of the proposed technique to perform semantic segmentation on HSI is shown in Figure 5.
Each spectral band essentially represents a distinct “view” of the same scene. These views encapsulate valuable information pertaining to the unique spectral characteristics of the materials and objects present in the scene. By treating each spectral band as an individual view, CCA can be employed to uncover meaningful relationships and correlations among these views. The key insight behind this approach lies in the fact that different spectral bands may convey complementary information. While some bands may excel in capturing the spectral signatures of specific materials, others may be more adept at highlighting spatial or structural details. By applying CCA to these multiple views, it becomes possible to extract features that are not only representative of the spectral information but also incorporate spatial characteristics. However, considering every single spectral band as a separate view for CCA sharply increases the computational complexity.
In practical terms, hyperspectral images are divided into subsets or views, with each subset corresponding to a specific wavelength or band. CCA is then applied to identify linear combinations of bands that exhibit the highest correlation. These linear combinations effectively serve as new features, encapsulating both spectral and spatial information from the original dataset.

4.1. Multi-View Feature Extraction

CCA is a multivariate statistical technique used to explore relationships between two datasets at a time. The algorithm begins with a pre-processing step, standardizing each dataset to have zero mean and unit variance. This normalization ensures that each dataset contributes equally to the analysis, regardless of differences in scale. CCA operating on two views at a time is referred to as PCCA in this paper.
Along with PCCA, we have also utilised MCCA, because the former considers the information only locally, between two datasets at a time, whereas MCCA provides simultaneous exploration of shared information across all the datasets via reduced features, thus also reducing the overload and time. In the context of MCCA, PCCA is a special case when there are only two datasets.
The next step involves formulating covariance matrices for within-dataset and between-dataset relationships. These matrices, denoted as C i i and C i j , capture the statistical associations within each dataset and the relationships between different datasets. Solving the generalized eigenvalue problem leads to the identification of canonical vectors a 1 , a 2 , , a k and their corresponding canonical correlations λ 1 , λ 2 , , λ m . These canonical variables represent linear combinations of the original features that maximize the associations between datasets.
The output of CCA, including the canonical vectors and canonical correlations, can be further utilized in various applications. In this study, the exploration of hyperspectral image data involves the application of CCA to unveil intricate relationships among various views and reduce the dimension of HSI. The derived canonical variables, representing consolidated information from different spectral views, serve as reduced features that encapsulate the underlying multi-view structure. These canonical variables, extracted through MCCA, are instrumental in capturing the shared information across multiple datasets. Leveraging these reduced features, subsequent analyses, such as semantic segmentation using SVM, gain a comprehensive understanding of the inter-dataset relationships. The canonical variables provide a compact representation that enhances the efficiency and interpretability of downstream tasks, thereby contributing to the advancement of hyperspectral image analysis techniques.
Treating each spectral band as a different view, HSI automatically offers the opportunity to exploit it for CCA and reduce the dimension of the original HSI cube. Rather than picking every band as a view for CCA, we divided the whole dataset into subsequent groups of views. This reduced the computational overload for CCA and gave better results. For a single HSI dataset with hundreds of bands, K = 4 views were generated by grouping together spectral bands from consecutive wavelengths. This produced four different datasets $X_1, X_2, X_3, X_4$ for CCA to act upon.
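A small helper along the following lines can build the K = 4 views from a hypercube by grouping consecutive bands. The flattening of the spatial grid into pixel samples is an implementation choice of this sketch rather than a prescription from the paper.

```python
import numpy as np

def make_views(cube, K=4):
    """Split an HSI cube (H, W, B) into K views of consecutive spectral bands.

    Each view is returned as a (H*W, ~B/K) matrix of pixel samples, so that
    CCA/MCCA can treat the K band groups as separate datasets X_1, ..., X_K.
    """
    H, W, B = cube.shape
    pixels = cube.reshape(H * W, B).astype(np.float64)
    splits = np.array_split(np.arange(B), K)        # consecutive, nearly equal band groups
    return [pixels[:, idx] for idx in splits]

# Example with the corrected Indian Pines cube (145 x 145 x 200):
# X1, X2, X3, X4 = make_views(indian_pines_cube, K=4)
```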

4.2. Semantic Segmentation Using Support Vector Machine

SVM is a powerful machine learning algorithm commonly used for classification tasks. In this paper, we focus on two fundamental tasks: per-pixel prediction and semantic segmentation. Per-pixel prediction involves the classification of each individual pixel within an HSI dataset into specific land-cover categories. Semantic segmentation, on the other hand, extends this concept by partitioning the entire image into coherent regions, each corresponding to a distinct class or category. The algorithm begins by splitting the dataset into training and testing sets to evaluate the model’s performance. To ensure consistent and meaningful comparisons, the feature matrix obtained from CCA is normalized to have zero mean and unit variance. This step is crucial for preventing certain features from dominating the learning process due to their larger scales.
The selection of an appropriate kernel function and regularization parameter C is a pivotal aspect of SVM; these have been selected through Grid Search optimization. The kernel computes the similarity between data points in the transformed space, where the choice of the kernel bandwidth parameter $\sigma$ is crucial in capturing the relationships between the features. The SVM model is trained using the normalized feature matrix, incorporating the kernel matrix that quantifies the pairwise similarities between data points. C balances the trade-off between achieving a low training error and a simple model, helping prevent overfitting.
Once trained, the SVM model is capable of predicting the class labels of new data points. This is achieved by computing the decision function, which evaluates the weighted sum of the Lagrange multipliers ( α i ) and the corresponding labels ( y i ) for the support vectors. The model’s performance is then assessed on the testing set, using standard metrics such as accuracy. Overall, SVM is a versatile and effective algorithm for semantic segmentation tasks, providing a robust solution when combined with feature reduction techniques like CCA.
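A hedged sketch of this SVM stage is given below: it standardizes the reduced feature matrix U produced by the CCA step, tunes the kernel, C and gamma with grid search, and predicts a label for every labeled pixel to form the segmentation map. The train/test split ratio, the grid values, and the exclusion of background (label 0) pixels are assumptions of the sketch, not settings reported in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def segment_with_svm(U, gt, test_size=0.7, random_state=0):
    """Per-pixel SVM classification of reduced features U with shape (H*W, n_comps).

    gt is the (H, W) ground-truth map; pixels labeled 0 (background) are
    excluded from training and testing, a common convention for these benchmarks.
    """
    H, W = gt.shape
    labels = gt.reshape(-1)
    mask = labels > 0
    U_std = StandardScaler().fit_transform(U)            # zero mean, unit variance
    X, y = U_std[mask], labels[mask]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state)

    grid = {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": ["scale", 0.1, 1.0]}
    svm = GridSearchCV(SVC(), grid, cv=3).fit(X_tr, y_tr)  # grid search over kernel parameters
    print("best params:", svm.best_params_, " held-out OA:", svm.score(X_te, y_te))

    seg_map = np.zeros(H * W, dtype=int)
    seg_map[mask] = svm.predict(X)                        # per-pixel prediction
    return seg_map.reshape(H, W)                          # segmentation map
```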
The Proposed Technique of extracting features from the multi-view dataset and performing Semantic Segmentation using SVM has been discussed in Algorithm 1.
Algorithm 1: Semantic Segmentation using Multi-View Features
Require: Hyperspectral image dataset $X_{n \times m}$ with $n$ pixel samples and $m$ spectral bands; number of subsets/views $K$.
Ensure: Predicted class of each pixel, $C = 1, 2, 3, \ldots, C_i$.
1: begin
2: Divide the hyperspectral image dataset $X$ into $K$ subsets, each containing adjacent spectral bands. Denote these subsets as $\{X_i\}_{i=1}^{K}$.
3: for $j$ from 3 to 10 do
4:   Standardize the data in every $X_i$ to have zero mean and unit variance.
5:   Perform CCA to extract canonical variables $U_j$.
6:   Combine the least correlated $U_j$ to form a feature matrix $U$.
7: end for
8: Hyper-tune the SVM parameters (kernel function, $C$ and gamma) using Grid Search optimization.
9: Compute the kernel matrix $M$ based on the chosen best values of the kernel function, $C$ and gamma.
10: Train the SVM classifier on $U$.
11: Predict the class labels for each pixel in the hyperspectral image using the trained SVM model.
12: end

5. Experimental Results and Discussion

The proposed technique has been implemented using Python 3.10.9 in the Jupyter Notebook IDE on a Lenovo Legion 5 Pro system with an Intel Core i7 processor and 16 GB RAM, manufactured in India. The experiments have been performed on real hyperspectral landcover scenes, i.e., IP, PU and SA. To tackle the high dimension of spectral features, HSI has been treated in a new light as a collection of multiple views, and features have been extracted using CCA variants.

5.1. Evaluation Parameters

Semantic segmentation, i.e., per-pixel classification, is performed with the help of SVM. The performance of the proposed technique is measured through the parameters listed below; a minimal computation sketch follows the list.
  • Overall Accuracy (OA): It measures the percentage of correctly classified samples, i.e., the number of correctly recognized samples divided by the total number of samples.
    $$OA = \frac{\sum_{i=1}^{C} M_{ii}}{N}$$
    where $C$ is the total number of labels/classes and $M_{ii}$ represents the samples that actually belong to the $i$-th class and were predicted to belong to the $i$-th class.
  • Average Accuracy (AA): It measures the average percentage of correctly classified samples per individual class.
    $$AA = \frac{1}{C} \sum_{i=1}^{C} \frac{M_{ii}}{\sum_{j=1}^{C} M_{ij}}$$
    where $M_{ij}$ represents the samples that actually belong to the $i$-th class and were predicted to belong to the $j$-th class.
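The sketch below shows how these two metrics can be computed from a confusion matrix, matching the formulas above; scikit-learn is used only to build the matrix.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def overall_and_average_accuracy(y_true, y_pred):
    """OA = sum of the diagonal over all samples; AA = mean of per-class recalls."""
    M = confusion_matrix(y_true, y_pred)        # M[i, j]: samples of class i predicted as class j
    oa = np.trace(M) / M.sum()
    per_class = np.diag(M) / M.sum(axis=1)      # M_ii / sum_j M_ij for each class
    aa = per_class.mean()
    return oa, aa
```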

5.2. Comparative Analysis of PCA and CCA

The analysis of hyperspectral images is very challenging due to many complexities. One such complexity is the Hughes phenomenon, or the curse of dimensionality, which, when handled beforehand, aids smooth HSI analysis. As discussed earlier, PCA has been the most common technique for dimension reduction. However, it does not treat HSI as a composition of hundreds of views or single-band datasets; PCA works on a single dataset and gives reduced features capturing maximum variance. With CCA, we have been able to reduce the dimension by treating HSI in its original habitat, i.e., as multiple views. By applying PCCA, which works locally between two views at a time, we have achieved a higher OA than PCA for semantic segmentation of IP, PU and SA. The paper also implements an improved and faster variant of CCA, MCCA, which works on the views globally and gives a higher OA on all three datasets. The detailed comparison between PCA, PCCA and MCCA is highlighted in Table 1, Table 2 and Table 3.
Table 1, Table 2 and Table 3 bring out the better performance of CCA as compared to PCA for semantic segmentation with SVM using reduced features of the IP, PU and SA datasets, respectively. The features reduced using the inherent multi-view nature of HSI have given a better performance than the features reduced in the usual way using PCA. In earlier studies of PCA implementation for dimension reduction, researchers have usually selected the value Number of Components = 3. Here, the Number of Components means the number of reduced features or Principal Components being selected; for CCA, it is the number of Canonical Variables, each a linear combination of the original features. Table 1 presents the results of feature extraction with 3 components on the IP dataset, with the highest OA of 77.02% and 85.75% for PCCA and MCCA, respectively, as compared to an OA of 62.32% by PCA. For the same number of components on the PU dataset, PCA records an OA of 78.83%, whereas a significant jump is observed for PCCA and MCCA, which give an OA of 89.95% and 90.90%, respectively, indicating better dense pixel prediction. The proposed PCCA and MCCA techniques record the highest OA on SA as compared to IP and PU: for 3 components, an OA of 92.15% and 93.88% is recorded for PCCA and MCCA, respectively.
Although choosing No. of Components = 3 follows the criterion used in earlier research, for experimental comparison the performance of PCA and CCA has been evaluated over a range of 3, 4, 5, ..., 10 components, which has shown the superiority of multi-view feature extraction using CCA as compared to PCA. Figure 6, Figure 7 and Figure 8 show the resultant segmentation maps obtained from PCA-SVM, PCCA-SVM and MCCA-SVM, compared together with the Ground Truth images of the datasets. These figures show the comparison of ground truth, PCA, PCCA and MCCA on IP, PU and SA, respectively, where MCCA showcases the best segmentation map with fewer misclassified pixels (seen as noise in the map), followed by PCCA. The better segmentation and fewer misclassified pixels obtained using the Proposed Technique with CCA variants are evident from the images.

5.3. Comparison of Proposed Technique with Existing Techniques

The performance of the proposed technique is compared with other traditional machine learning algorithms i.e., Naive Bayes ([65]), Random Forest ([66]) and k-NN ([67]).

5.3.1. Pairwise Canonical Correlation Analysis

In PCCA, the analysis is conducted independently for each pair of views of the dataset, seeking canonical variables that maximize the correlation between the two views. The different ML algorithms have been compared against different numbers of Canonical Variables, i.e., No. of Components. This approach explores the relationships between pairs of datasets individually, providing insights into the shared information within each distinct combination. It has aided us in dimension reduction, which enabled semantic segmentation with SVM using the reduced, extracted features. Table 4, Table 5 and Table 6 show the robustness of SVM in handling high dimensionality, with appropriate kernel selection enhancing its performance compared to other machine learning classifiers. Table 4 compares the OA of PCCA with different ML techniques on IP, where it is evident that SVM combined with PCCA gives the highest OA of 77.02%, followed by RF with an OA of 71.80%, for number of components = 3. Similarly, Table 5 highlights the supremacy of SVM for PU, with an OA of 89.95% followed by RF with an OA of 86.58%. For SA, SVM with CCA recorded the highest OA of 92.15%.
The superiority of the fusion of PCCA and SVM is also evident in the class-wise accuracies of the HSI datasets for No. of Components = 3, as shown in Table 7, Table 8 and Table 9. The proposed PCCA-SVM technique records an AA of 78.00% on IP, as shown in Table 7, which is the highest as compared to AAs of 70%, 64.00% and 70.50% for RF, Naive Bayes and KNN. For PU in Table 8, PCCA-SVM gives the highest AA of 87.00% in comparison with AAs of 80%, 72% and 78% for RF, Naive Bayes and KNN. Amongst all the datasets used in the research, the highest AA of 96.00% is obtained for SA, as shown in Table 9; RF records the second best AA of 95.00%, followed by KNN with an AA of 94.00%.
The segmentation maps generated after application of PCCA and the different machine learning techniques are shown in Figure 9, Figure 10 and Figure 11. The proposed technique gives better segmentation maps in comparison to the other techniques. In Figure 9, PCCA-SVM produces the least distorted segmentation map of IP, while the segmentation maps of Naive Bayes and KNN are the noisiest, with the most misclassified pixels. Similarly, for PU and SA in Figure 10 and Figure 11, the proposed technique produces the segmentation maps closest to the Ground Truth, indicating more accurate semantic segmentation of the landcover.

5.3.2. Multiple-Set Canonical Correlation Analysis

As evident in Section 5.2, PCCA has given better performance than PCA. We applied PCCA on four views of every HSI dataset by selecting two views at a time and reduced the dimensions. On pairing the reduced features with SVM for per-pixel prediction, i.e., semantic segmentation, it outperforms other machine learning algorithms and gives good performance. However, when working with more than two views, sequential application of pairwise CCA might lead to information loss, as the joint interdependencies among all datasets are not considered simultaneously. In such cases, MCCA becomes advantageous, offering a more comprehensive exploration of relationships in multi-view datasets. The existing ML techniques have been compared against different numbers of Canonical Variables, i.e., Number of Components. Table 10, Table 11 and Table 12 highlight the robustness of the proposed technique, i.e., the fusion MCCA-SVM, over the other existing techniques.
Table 10 compares OA of MCCA with different ML techniques on IP where it is evident that SVM when combined with MCCA gives the highest OA of 85.75% followed by RF with OA of 83.41% for number of components = 3. Similarly, Table 11 highlights supremacy of SVM for PU, with an OA of 90.90% followed by RF with OA of 84.59%. For SA, SVM with MCCA recorded the highest OA of 93.88%.
The superiority of the fusion of MCCA and SVM is also evident in the class-wise accuracies of the HSI datasets for No. of Components = 3, as shown in Table 13, Table 14 and Table 15. The proposed MCCA-SVM technique records an AA of 85.75% on IP, as shown in Table 13, which is the highest as compared to AAs of 80%, 64.00% and 79.50% for RF, Naive Bayes and KNN. For PU in Table 14, MCCA-SVM gives the highest AA of 88.50% in comparison with AAs of 79%, 73% and 78% for RF, Naive Bayes and KNN. Amongst all the datasets used in the research, the highest AA of 97.00% is obtained for SA, as shown in Table 15; RF records the second best AA of 96.00%, followed by KNN with an AA of 94.00%.
The segmentation maps generated after application of the Proposed Technique (MCCA-SVM) and the existing techniques are shown in Figure 12, Figure 13 and Figure 14. The proposed MCCA-SVM gives better segmentation maps in comparison to the other techniques with MCCA. In Figure 12, MCCA-SVM produces the best segmentation map of IP, closest to the Ground Truth, as compared to the other techniques, while the segmentation map of Naive Bayes is the noisiest, with the most misclassified pixels. Similarly, for PU and SA in Figure 13 and Figure 14, the proposed technique produces the segmentation maps closest to the Ground Truth, indicating more accurate semantic segmentation of the landcover.
It is evident from Table 13, Table 14 and Table 15 that the Proposed Technique has performed better than the existing techniques in terms of Average Accuracy (AA): for the IP, PU and SA datasets, the highest AAs of 85.75%, 88.50% and 97.00%, respectively, are given by the Proposed Technique (MCCA-SVM).

5.3.3. Comparison between Semantic Segmentation Performed Using the Proposed Techniques PCCA-SVM and MCCA-SVM

Understanding HSI as a multi-view dataset and applying CCA has given better results than PCA-SVM for semantic segmentation of HSI. The implementation of MCCA in conjunction with SVM has notably improved computational results in HSI segmentation when compared to its counterpart, PCCA. PCCA involves multiple analyses, each focusing locally on a pair of views, potentially leading to scalability issues as the number of views increases. In contrast, MCCA addresses the dimensionality challenge by jointly analyzing all views in a unified framework. MCCA excels in handling multiple sets of data simultaneously, enabling a more comprehensive exploration of spectral relationships within hyperspectral scenes. The enhanced computational efficiency of MCCA arises from its parallelism and intrinsic capacity to consider correlations across multiple sets of spectral bands, leading to a better understanding of the intricate relationships present in hyperspectral data. This extension not only refines the characterization of spectral information but also contributes to the optimization of the SVM-based classification process.
By incorporating a broader spectrum of correlations, MCCA provides a richer feature space for SVM, facilitating more accurate and discriminative classification outcomes. Figure 15 highlights the improved OA and AA of MCCA with respect to PCCA.
Table 16 highlights the improvement in computational time of MCCA-SVM as compared to PCCA-SVM. For simplicity, No. of Components is specified as ‘n_comps’ in the Table 16. The results in Table 16 and Figure 16 have been rigorously studied and compared by varying No. of Components for all the three HSI datasets where MCCA-SVM outshines PCCA-SVM.
It is evident in Table 16 that MCCA-SVM has given results in less computational time than PCCA-SVM. The lowest times recorded for semantic segmentation using MCCA-SVM are 3.02 s for the IP dataset, 6.23 s for the PU dataset and 8.59 s for the SA dataset. Additionally, MCCA-SVM takes an average of 3.58 s, 8.19 s, and 12.22 s on IP, PU, and SA, respectively. Due to variations in spatial resolution, the time curves for the three datasets differ in Figure 16: IP has the lowest average computational time, followed by PU and SA. PU and SA have approximately 4.2 and 5.4 times more pixel samples than IP, respectively, which explains why their time curves are higher than IP’s.

6. Conclusions and Future Scope

In conclusion, this research has presented a comprehensive exploration of Canonical Correlation Analysis variants, namely PCCA and MCCA, as powerful techniques in fusion with SVM for hyperspectral image semantic segmentation. Through rigorous experimentation and evaluation on benchmark datasets, the efficacy of these CCA variants in capturing meaningful spectral relationships has been demonstrated. The approach exhibits notable performance in preserving fine-grained details and accurately delineating land cover classes within hyperspectral scenes. The Proposed Technique’s ability to mitigate the curse of dimensionality, effectively exploit spectral information, and adapt to varying scene complexities positions it as a promising solution for HSI segmentation challenges. The fusion of PCCA and MCCA facilitates the capture of pairwise and collective spectral correlations, offering a more holistic representation of hyperspectral data. Furthermore, the scalability of the approach has been evidenced through experiments on datasets with varying spatial and spectral resolutions. The consistent performance across different scenarios underscores the robustness of the CCA-based technique. As the research community endeavours to advance remote sensing capabilities, the suggested future directions emphasize the potential for further refinement and extension of the Proposed Technique. Integration with deep learning models, adaptation to larger datasets, exploration of unsupervised learning scenarios, and incorporation of multimodal data present exciting opportunities for enhancing the accuracy and versatility of hyperspectral image segmentation. In essence, the findings presented in this research contribute to the evolving landscape of remote sensing applications, providing a foundation for future innovations and improvements in hyperspectral image analysis. The demonstrated capabilities of CCA variants pave the way for more sophisticated and effective techniques, fostering advancements in environmental monitoring, agriculture, and resource management.

Author Contributions

R.G.; Conceptualization, Methodology, Result analysis, Visualization, Writing—original draft. G.K.; Investigation and Supervision. S.S.K.; Investigation and Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found at https://rslab.ut.ac.ir/data (accessed on 8 August 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kruse, F.A.; Boardman, J.W.; Huntington, J.F. Comparison of airborne hyperspectral data and EO-1 Hyperion for mineral mapping. IEEE Trans. Geosci. Remote Sens. 2003, 41, 1388–1400.
  2. Yang, C.; Everitt, J.H.; Bradford, J.M. Yield estimation from hyperspectral imagery using spectral angle mapper (SAM). Trans. ASABE 2008, 51, 729–737.
  3. Fagan, M.E.; DeFries, R.S.; Sesnie, S.E.; Arroyo-Mora, J.P.; Soto, C.; Singh, A.; Townsend, P.A.; Chazdon, R.L. Mapping species composition of forests and tree plantations in Northeastern Costa Rica with an integration of hyperspectral and multitemporal Landsat imagery. Remote Sens. 2015, 7, 5660–5696.
  4. Burger, J.; Gowen, A.A. The interplay of chemometrics and hyperspectral chemical imaging. In Proceedings of the 2011 3rd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Lisbon, Portugal, 6–9 June 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1–4.
  5. Manian, V.; Velez-Reyes, M. Support vector classification of land cover and benthic habitat from hyperspectral images. Int. J. High Speed Electron. Syst. 2008, 18, 337–348.
  6. Wang, X.; Luo, G.; Tian, L. Application of hyperspectral image anomaly detection algorithm for Internet of things. Multimed. Tools Appl. 2019, 78, 5155–5167.
  7. Ball, J.E.; Anderson, D.T.; Chan, C.S. Comprehensive survey of deep learning in remote sensing: Theories, tools, and challenges for the community. J. Appl. Remote Sens. 2017, 11, 042609.
  8. Fischer, P.; Azimi, S.M.; Roschlaub, R.; Krauß, T. Towards hd maps from aerial imagery: Robust lane marking segmentation using country-scale imagery. ISPRS Int. J. Geo-Inf. 2018, 7, 458.
  9. Nassar, A.; Amer, K.; ElHakim, R.; ElHelw, M. A deep CNN-based framework for enhanced aerial imagery registration with applications to UAV geolocalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1513–1523.
  10. Yuan, X.; Sarma, V. Automatic urban water-body detection and segmentation from sparse ALSM data via spatially constrained model-driven clustering. IEEE Geosci. Remote Sens. Lett. 2010, 8, 73–77.
  11. Jadhav, J.K.; Singh, R. Automatic semantic segmentation and classification of remote sensing data for agriculture. Math. Model. Eng. 2018, 4, 112–137.
  12. Fang, B.; Li, Y.; Zhang, H.; Chan, J.C.W. Hyperspectral images classification based on dense convolutional networks with spectral-wise attention mechanism. Remote Sens. 2019, 11, 159.
  13. Grewal, R.; Kasana, S.S.; Kasana, G. Hyperspectral image segmentation: A comprehensive survey. Multimed. Tools Appl. 2023, 82, 20819–20872.
  14. Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63.
  15. Datta, A.; Ghosh, S.; Ghosh, A. PCA, kernel PCA and dimensionality reduction in hyperspectral images. In Advances in Principal Component Analysis: Research and Development; Springer: Singapore, 2018; pp. 19–46.
  16. Datta, D.; Mallick, P.K.; Bhoi, A.K.; Ijaz, M.F.; Shafi, J.; Choi, J. Hyperspectral image classification: Potentials, challenges, and future directions. Comput. Intell. Neurosci. 2022, 2022.
  17. Jia, S.; Zhao, Q.; Zhuang, J.; Tang, D.; Long, Y.; Xu, M.; Zhou, J.; Li, Q. Flexible Gabor-based superpixel-level unsupervised LDA for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10394–10409.
  18. Miclea, A.V.; Terebes, R.M.; Meza, S.; Cislariu, M. On spectral-spatial classification of hyperspectral images using image denoising and enhancement techniques, wavelet transforms and controlled data set partitioning. Remote Sens. 2022, 14, 1475.
  19. Li, Q.; Zheng, B.; Tu, B.; Wang, J.; Zhou, C. Ensemble EMD-based spectral-spatial feature extraction for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5134–5148.
  20. Liang, N.; Duan, P.; Xu, H.; Cui, L. Multi-View Structural Feature Extraction for Hyperspectral Image Classification. Remote Sens. 2022, 14, 1971.
  21. Paul, A.; Chaki, N. Band selection using spectral and spatial information in particle swarm optimization for hyperspectral image classification. Soft Comput. 2022, 26, 2819–2834.
  22. Paul, A.; Chaki, N. Dimensionality reduction of hyperspectral image using signal entropy and spatial information in genetic algorithm with discrete wavelet transformation. Evol. Intell. 2021, 14, 1793–1802.
  23. Wang, Y.; Zhu, Q.; Ma, H.; Yu, H. A hybrid gray wolf optimizer for hyperspectral image band selection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13.
  24. Phaneendra Kumar, B.L.; Manoharan, P. Whale optimization-based band selection technique for hyperspectral image classification. Int. J. Remote Sens. 2021, 42, 5105–5143.
  25. Datta, A.; Ghosh, S.; Ghosh, A. Combination of clustering and ranking techniques for unsupervised band selection of hyperspectral images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2814–2823.
  26. Beirami, B.; Mokhtarzade, M. Supervised and unsupervised clustering based dimensionality reduction of hyperspectral data. Int. J. Eng. 2021, 34, 1407–1412.
  27. Chen, Y.; Wu, Z.; Zhao, B.; Fan, C.; Shi, S. Weed and corn seedling detection in field based on multi feature fusion and support vector machine. Sensors 2020, 21, 212.
  28. Guo, Y.; Zhang, Z.; Tang, F. Feature selection with kernelized multi-class support vector machine. Pattern Recognit. 2021, 117, 107988.
  29. Zhang, W.; Jin, S.; Zhou, L.; Xie, X.; Wang, F.; Jiang, L.; Zheng, Y.; Qu, P.; Li, G.; Pan, X. Multi-feature embedded learning SVM for cloud detection in remote sensing images. Comput. Electr. Eng. 2022, 102, 108177.
  30. Rehman, M.U.; Akhtar, S.; Zakwan, M.; Mahmood, M.H. Novel architecture with selected feature vector for effective classification of mitotic and non-mitotic cells in breast cancer histology images. Biomed. Signal Process. Control 2022, 71, 103212.
  31. Rodarmel, C.; Shan, J. Principal component analysis for hyperspectral image classification. Surv. Land Inf. Sci. 2002, 62, 115–122.
  32. Castaings, T.; Waske, B.; Atli Benediktsson, J.; Chanussot, J. On the influence of feature reduction for the classification of hyperspectral images based on the extended morphological profile. Int. J. Remote Sens. 2010, 31, 5921–5939.
  33. Che, W.; Sun, L.; Zhang, Q.; Tan, W.; Ye, D.; Zhang, D.; Liu, Y. Pixel based bruise region extraction of apple using Vis-NIR hyperspectral imaging. Comput. Electron. Agric. 2018, 146, 12–21. [Google Scholar] [CrossRef]
  34. Torres, I.; Sánchez, M.T.; Cho, B.K.; Garrido-Varo, A.; Pérez-Marín, D. Setting up a methodology to distinguish between green oranges and leaves using hyperspectral imaging. Comput. Electron. Agric. 2019, 167, 105070. [Google Scholar] [CrossRef]
  35. Fernandez, D.; Gonzalez, C.; Mozos, D.; Lopez, S. FPGA implementation of the principal component analysis algorithm for dimensionality reduction of hyperspectral images. J. Real-Time Image Process. 2016, 16, 1395–1406. [Google Scholar] [CrossRef]
  36. Sun, W.; Du, Q. Graph-regularized fast and robust principal component analysis for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3185–3195. [Google Scholar] [CrossRef]
  37. Li, J.; Luo, W.; Wang, Z.; Fan, S. Early detection of decay on apples using hyperspectral reflectance imaging combining both principal component analysis and improved watershed segmentation method. Postharvest Biol. Technol. 2019, 149, 235–246. [Google Scholar] [CrossRef]
  38. Uddin, M.P.; Mamun, M.A.; Afjal, M.I.; Hossain, M.A. Information-theoretic feature selection with segmentation-based folded principal component analysis (PCA) for hyperspectral image classification. Int. J. Remote Sens. 2021, 42, 286–321. [Google Scholar] [CrossRef]
  39. Foster, D.P.; Kakade, S.M.; Zhang, T. Multi-View Dimensionality Reduction via Canonical Correlation Analysis; Toyota Technical Institute: Chicago, IL, USA, 2008. [Google Scholar]
  40. Andrew, G.; Arora, R.; Bilmes, J.; Livescu, K. Deep canonical correlation analysis. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; PMLR: Atlanta, GA, USA, 2013; pp. 1247–1255. [Google Scholar]
  41. Chen, H.; Liu, J.; Xiao, L. An improved composite kernel framework for hyperspectral image classification using canonical correlation analysis. Remote Sens. Lett. 2019, 10, 411–420. [Google Scholar] [CrossRef]
  42. Qin, Y.; Bruzzone, L.; Li, B.; Ye, Y. Cross-domain collaborative learning via cluster canonical correlation analysis and random walker for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3952–3966. [Google Scholar] [CrossRef]
  43. Wang, H.T.; Smallwood, J.; Mourao-Miranda, J.; Xia, C.H.; Satterthwaite, T.D.; Bassett, D.S.; Bzdok, D. Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists. NeuroImage 2020, 216, 116745. [Google Scholar] [PubMed]
  44. Patil, A.; Patil, M.; Birajdar, G. White blood cells image classification using deep learning with canonical correlation analysis. Irbm 2021, 42, 378–389. [Google Scholar] [CrossRef]
  45. Grewal, R.; Singh Kasana, S.; Kasana, G. Machine Learning and Deep Learning Techniques for Spectral Spatial Classification of Hyperspectral Images: A Comprehensive Survey. Electronics 2023, 12, 488. [Google Scholar] [CrossRef]
  46. Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417. [Google Scholar] [CrossRef]
  47. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1–9. [Google Scholar] [CrossRef]
  48. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  49. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  50. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  51. Niu, B.; Feng, Q.; Chen, B.; Ou, C.; Liu, Y.; Yang, J. HSI-TransUNet: A transformer based semantic segmentation model for crop mapping from UAV hyperspectral imagery. Comput. Electron. Agric. 2022, 201, 107297. [Google Scholar] [CrossRef]
  52. Kaul, A.; Raina, S. Support vector machine versus convolutional neural network for hyperspectral image classification: A systematic review. Concurr. Comput. Pract. Exp. 2022, 34, e6945. [Google Scholar] [CrossRef]
  53. Scholkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  54. Hu, L.; Qi, C.; Wang, Q. Spectral-spatial hyperspectral image classification based on mathematical morphology post-processing. Procedia Comput. Sci. 2018, 129, 93–97. [Google Scholar] [CrossRef]
  55. Ye, D.; Sun, L.; Tan, W.; Che, W.; Yang, M. Detecting and classifying minor bruised potato based on hyperspectral imaging. Chemom. Intell. Lab. Syst. 2018, 177, 129–139. [Google Scholar] [CrossRef]
  56. Ji, Y.; Sun, L.; Li, Y.; Li, J.; Liu, S.; Xie, X.; Xu, Y. Non-destructive classification of defective potatoes based on hyperspectral imaging and support vector machine. Infrared Phys. Technol. 2019, 99, 71–79. [Google Scholar] [CrossRef]
  57. Akbari, D. Improving spatial-spectral classification of hyperspectral imagery by using extended minimum spanning forest algorithm. Can. J. Remote Sens. 2020, 46, 146–153. [Google Scholar] [CrossRef]
  58. Wang, Y.; Yu, W.; Fang, Z. Multiple kernel-based SVM classification of hyperspectral images by combining spectral, spatial, and semantic information. Remote Sens. 2020, 12, 120. [Google Scholar] [CrossRef]
  59. Anand, R.; Veni, S.; Aravinth, J. Robust classification technique for hyperspectral images based on 3D-discrete wavelet transform. Remote Sens. 2021, 13, 1255. [Google Scholar] [CrossRef]
  60. Guo, Y.; Hastie, T.; Tibshirani, R. Regularized linear discriminant analysis and its application in microarrays. Biostatistics 2007, 8, 86–100. [Google Scholar] [CrossRef] [PubMed]
  61. González, I.; Déjean, S.; Martin, P.G.; Baccini, A. CCA: An R package to extend canonical correlation analysis. J. Stat. Softw. 2008, 23, 1–14. [Google Scholar] [CrossRef]
  62. Golugula, A.; Lee, G.; Master, S.R.; Feldman, M.D.; Tomaszewski, J.E.; Speicher, D.W.; Madabhushi, A. Supervised regularized canonical correlation analysis: Integrating histologic and proteomic measurements for predicting biochemical recurrence following prostate surgery. BMC Bioinform. 2011, 12, 483. [Google Scholar] [CrossRef] [PubMed]
  63. Cortes, C.; Vapnik, V. Support vector machine. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  64. Tharwat, A. Parameter investigation of support vector machine classifier with kernel functions. Knowl. Inf. Syst. 2019, 61, 1269–1302. [Google Scholar] [CrossRef]
  65. Everingham, M.; Zisserman, A.; Williams, C.K.; Van Gool, L.; Allan, M.; Bishop, C.M.; Chapelle, O.; Dalal, N.; Deselaers, T.; Dorkó, G.; et al. The 2005 pascal visual object classes challenge. In Proceedings of the Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment: First PASCAL Machine Learning Challenges Workshop, MLCW 2005, Southampton, UK, 11–13 April 2005; Revised Selected Papers. Springer: Berlin/Heidelberg, Germany, 2006; pp. 117–176. [Google Scholar]
  66. Cutler, A.; Cutler, D.R.; Stevens, J.R. Random forests. In Ensemble Machine Learning: Methods and Applications; Springer: New York, NY, USA, 2012; pp. 157–175. [Google Scholar]
  67. Deng, Z.; Zhu, X.; Cheng, D.; Zong, M.; Zhang, S. Efficient kNN classification algorithm for big data. Neurocomputing 2016, 195, 143–148. [Google Scholar] [CrossRef]
Figure 1. Overview of PCA vs. CCA. (a) A breakdown of major steps in PCA, (b) A breakdown of major steps in CCA.
Figure 2. Indian Pines image (a) Sample band image (b) Ground Truth.
Figure 3. Pavia University image (a) Sample band image (b) Ground Truth.
Figure 4. Salinas Valley image (a) Sample band image (b) Ground Truth.
Figure 5. Overview of the Proposed Technique for Semantic Segmentation of HSI. (The same image scene captured at different spectral bands is shown as views with different colors).
Figure 6. Comparison of PCA and CCA on IP dataset (a) Ground truth, (b) Segmentation map after PCA-SVM, (c) Segmentation map after PCCA-SVM, (d) Segmentation map after MCCA-SVM.
Figure 7. Comparison of PCA and CCA on PU dataset (a) Ground truth, (b) Segmentation map after PCA-SVM, (c) Segmentation map after PCCA-SVM, (d) Segmentation map after MCCA-SVM.
Figure 8. Comparison of PCA and CCA on SA dataset (a) Ground truth, (b) Segmentation map after PCA-SVM, (c) Segmentation map after PCCA-SVM, (d) Segmentation map after MCCA-SVM.
Figure 9. Comparison of Proposed Technique (PCCA-SVM) with Existing Techniques on IP dataset (a) Ground Truth, (b) Segmentation map after SVM, (c) Segmentation map after RF, (d) Segmentation map after Naive Bayes, (e) Segmentation map after KNN.
Figure 10. Comparison of Proposed Technique (PCCA-SVM) with Existing Techniques on PU dataset (a) Ground Truth, (b) Segmentation map after SVM, (c) Segmentation map after RF, (d) Segmentation map after Naive Bayes, (e) Segmentation map after KNN.
Figure 11. Comparison of Proposed Technique (PCCA-SVM) with Existing Techniques on SA dataset (a) Ground Truth, (b) Segmentation map after SVM, (c) Segmentation map after RF, (d) Segmentation map after Naive Bayes, (e) Segmentation map after KNN.
Figure 12. Comparison of MCCA and different ML techniques on IP (a) Ground Truth, (b) Segmentation map after SVM, (c) Segmentation map after RF, (d) Segmentation map after Naive Bayes, (e) Segmentation map after KNN.
Figure 13. Comparison of MCCA and different ML techniques on PU (a) Ground Truth, (b) Segmentation map after SVM, (c) Segmentation map after RF, (d) Segmentation map after Naive Bayes, (e) Segmentation map after KNN.
Figure 14. Comparison of MCCA and different ML techniques on SA (a) Ground Truth, (b) Segmentation map after SVM, (c) Segmentation map after RF, (d) Segmentation map after Naive Bayes, (e) Segmentation map after KNN.
Figure 15. Comparison of PCCA-SVM and MCCA-SVM in terms of OA and AA on (a) Indian Pines, (b) Pavia University and (c) Salinas Valley.
Figure 16. Comparison of the time taken by PCCA-SVM and MCCA-SVM on (a) Indian Pines, (b) Pavia University and (c) Salinas Valley.
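For readers who wish to reproduce the kind of PCA-SVM baseline outlined in Figure 1a and compared in the tables that follow, a minimal sketch is given below. It is not the authors' exact pipeline: the .mat file names and keys, the 70/30 split, the number of components, and the RBF kernel settings are illustrative assumptions for the Indian Pines benchmark.

```python
# Minimal PCA-SVM baseline sketch (assumed file names, split and parameters).
import numpy as np
from scipy.io import loadmat
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the hyperspectral cube (H x W x B) and its ground-truth labels (H x W).
cube = loadmat("Indian_pines_corrected.mat")["indian_pines_corrected"]
gt = loadmat("Indian_pines_gt.mat")["indian_pines_gt"]

H, W, B = cube.shape
X = cube.reshape(-1, B).astype(np.float64)   # one spectral vector per pixel
y = gt.reshape(-1)

# Keep only labelled pixels (label 0 is unlabelled background in these benchmarks).
mask = y > 0
X, y = X[mask], y[mask]

# Reduce the B spectral bands to a handful of principal components.
X_pca = PCA(n_components=5).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_pca, y, test_size=0.3, random_state=0, stratify=y)

clf = SVC(kernel="rbf")                      # per-pixel classifier
clf.fit(X_tr, y_tr)
print("Overall Accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```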
Table 1. Comparison of Overall Accuracy of PCA and CCA on IP dataset.
No. of Components | PCA-SVM (%) | PCCA-SVM (%) | MCCA-SVM (%)
3 | 62.32 | 77.02 | 85.75
4 | 63.25 | 73.46 | 89.41
5 | 64.14 | 75.07 | 89.56
6 | 64.42 | 73.31 | 88.92
7 | 64.61 | 72.53 | 88.24
8 | 64.87 | 71.02 | 88.09
9 | 65.44 | 70.90 | 88.48
10 | 65.75 | 69.25 | 87.46
Table 2. Comparison of Overall Accuracy of PCA and CCA on PU dataset.
No. of Components | PCA-SVM (%) | PCCA-SVM (%) | MCCA-SVM (%)
3 | 78.83 | 89.95 | 90.90
4 | 81.34 | 88.56 | 92.66
5 | 81.98 | 88.38 | 95.07
6 | 84.86 | 87.94 | 94.92
7 | 86.22 | 87.45 | 94.79
8 | 86.75 | 87.39 | 94.55
9 | 86.83 | 86.66 | 94.50
10 | 87.62 | 86.43 | 94.43
Table 3. Comparison of Overall Accuracy of PCA and CCA on SA dataset.
No. of Components | PCA-SVM (%) | PCCA-SVM (%) | MCCA-SVM (%)
3 | 86.00 | 92.15 | 93.88
4 | 87.80 | 91.20 | 94.45
5 | 88.12 | 90.33 | 94.67
6 | 88.07 | 89.05 | 94.64
7 | 88.38 | 88.36 | 94.62
8 | 89.16 | 88.01 | 94.88
9 | 89.18 | 87.51 | 95.08
10 | 89.22 | 87.61 | 95.04
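Tables 1–3 sweep the number of retained components from 3 to 10 for each reduction technique. The sketch below shows one plausible way to run such a sweep for the pairwise CCA (PCCA) variant using scikit-learn's CCA, treating two halves of the spectral bands as the two views; the view construction, split ratio and kernel are assumptions for illustration rather than the authors' implementation.

```python
# Hedged sketch: two spectral "views" + pairwise CCA + SVM, swept over component counts.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def pcca_svm_accuracy(X, y, n_components, test_size=0.3, seed=0):
    """X: (n_pixels, n_bands) labelled spectra, y: class labels. Returns OA on a held-out split."""
    half = X.shape[1] // 2
    view1, view2 = X[:, :half], X[:, half:]       # assumed split of the bands into two views
    cca = CCA(n_components=n_components)
    z1, z2 = cca.fit_transform(view1, view2)      # paired canonical variates for each view
    features = np.hstack([z1, z2])                # fuse the projected views per pixel
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, y, test_size=test_size, random_state=seed, stratify=y)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

# Example sweep mirroring the 3-10 component range in the tables,
# reusing the labelled X, y arrays prepared in the earlier PCA-SVM sketch:
# for k in range(3, 11):
#     print(k, round(100 * pcca_svm_accuracy(X, y, k), 2))
```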
Table 4. Comparison of Proposed Technique (PCCA-SVM) with Existing Techniques on IP dataset.
No. of Components | PCCA-SVM (%) | PCCA-RF (%) | PCCA-Naive Bayes (%) | PCCA-KNN (%)
3 | 77.02 | 71.80 | 54.68 | 69.21
4 | 73.46 | 71.95 | 54.97 | 69.36
5 | 75.07 | 67.36 | 52.97 | 63.02
6 | 73.31 | 53.90 | 42.14 | 47.70
7 | 72.53 | 51.95 | 40.68 | 47.90
8 | 71.02 | 43.75 | 36.92 | 40.48
9 | 70.90 | 42.97 | 34.34 | 38.39
10 | 69.25 | 44.50 | 40.14 | 37.60
Table 5. Comparison of Proposed Technique (PCCA-SVM) with Existing Techniques on PU dataset.
No. of Components | PCCA-SVM (%) | PCCA-RF (%) | PCCA-Naive Bayes (%) | PCCA-KNN (%)
3 | 89.95 | 86.58 | 62.76 | 83.62
4 | 88.56 | 71.85 | 57.81 | 66.88
5 | 88.38 | 67.91 | 55.71 | 66.88
6 | 87.64 | 47.76 | 46.24 | 41.19
7 | 87.45 | 45.05 | 45.87 | 38.20
8 | 87.39 | 44.91 | 45.47 | 37.75
9 | 86.66 | 43.25 | 44.21 | 35.88
10 | 86.43 | 46.13 | 46.50 | 38.62
Table 6. Comparison of Proposed Technique (PCCA-SVM) with Existing Techniques on SA dataset.
No. of Components | PCCA-SVM (%) | PCCA-RF (%) | PCCA-Naive Bayes (%) | PCCA-KNN (%)
3 | 92.15 | 91.32 | 81.36 | 89.54
4 | 91.20 | 88.92 | 79.26 | 85.61
5 | 90.33 | 85.10 | 68.53 | 78.59
6 | 89.05 | 78.93 | 66.57 | 73.37
7 | 88.36 | 66.92 | 63.22 | 67.02
8 | 88.01 | 62.95 | 62.77 | 61.28
9 | 87.51 | 62.09 | 59.39 | 60.34
10 | 87.61 | 50.66 | 59.99 | 57.39
Table 7. Comparison of Class-wise Accuracy of the Proposed Technique with Existing Techniques on IP dataset.
Class | PCCA-RF (%) | PCCA-Naive Bayes (%) | PCCA-KNN (%) | Proposed Technique (PCCA-SVM) (%)
1 | 67.00 | 100.00 | 83.00 | 69.00
2 | 65.00 | 16.00 | 68.00 | 73.00
3 | 51.00 | 38.00 | 55.00 | 61.00
4 | 47.00 | 65.00 | 41.00 | 58.00
5 | 71.00 | 53.00 | 79.00 | 86.00
6 | 87.00 | 69.00 | 85.00 | 91.00
7 | 78.00 | 100.00 | 100.00 | 80.00
8 | 100.00 | 98.00 | 100.00 | 99.00
9 | 25.00 | 75.00 | 25.00 | 100.00
10 | 57.00 | 54.00 | 63.00 | 69.00
11 | 81.00 | 21.00 | 73.00 | 79.00
12 | 80.00 | 40.00 | 70.00 | 75.00
13 | 86.00 | 93.00 | 93.00 | 95.00
14 | 83.00 | 74.00 | 65.00 | 87.00
15 | 35.00 | 27.00 | 29.00 | 39.00
16 | 100.00 | 100.00 | 100.00 | 94.00
AA | 70.00 | 64.00 | 70.50 | 78.00
OA | 71.80 | 54.68 | 69.21 | 77.02
Table 8. Comparison of Class-wise Accuracy of the Proposed Technique with Existing Techniques on PU dataset.
Class | PCCA-RF (%) | PCCA-Naive Bayes (%) | PCCA-KNN (%) | Proposed Technique (PCCA-SVM) (%)
1 | 92.00 | 77.00 | 86.00 | 92.00
2 | 96.00 | 54.00 | 95.00 | 96.00
3 | 59.00 | 62.00 | 60.00 | 63.00
4 | 82.00 | 76.00 | 75.00 | 87.00
5 | 100.00 | 100.00 | 100.00 | 100.00
6 | 73.00 | 60.00 | 64.00 | 82.00
7 | 45.00 | 60.00 | 55.00 | 73.00
8 | 77.00 | 59.00 | 68.00 | 86.00
9 | 98.00 | 96.00 | 98.00 | 100.00
AA | 80.00 | 72.00 | 78.00 | 87.00
OA | 86.58 | 62.76 | 83.62 | 89.95
Table 9. Comparison of Class-wise Accuracy of the Proposed Technique with Existing Techniques on SA dataset.
Class | PCCA-RF (%) | PCCA-Naive Bayes (%) | PCCA-KNN (%) | Proposed Technique (PCCA-SVM) (%)
1 | 96.00 | 83.00 | 95.00 | 99.00
2 | 98.00 | 96.00 | 100.00 | 99.00
3 | 96.00 | 80.00 | 95.00 | 99.00
4 | 99.00 | 98.00 | 100.00 | 99.00
5 | 97.00 | 84.00 | 98.00 | 99.00
6 | 99.00 | 96.00 | 100.00 | 100.00
7 | 98.00 | 97.00 | 100.00 | 100.00
8 | 88.00 | 55.00 | 83.00 | 89.00
9 | 98.00 | 93.00 | 98.00 | 99.00
10 | 97.00 | 89.00 | 97.00 | 98.00
11 | 90.00 | 91.00 | 87.00 | 93.00
12 | 95.00 | 80.00 | 92.00 | 99.00
13 | 96.00 | 97.00 | 96.00 | 100.00
14 | 98.00 | 93.00 | 97.00 | 97.00
15 | 74.00 | 74.00 | 62.00 | 75.00
16 | 98.00 | 97.00 | 99.00 | 99.00
AA | 95.00 | 88.00 | 94.00 | 96.00
OA | 91.32 | 81.36 | 89.54 | 92.15
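The class-wise tables report per-class accuracy together with Average Accuracy (AA) and Overall Accuracy (OA), following the standard definitions: per-class accuracy is the fraction of each class's test pixels that are correctly labelled, AA is the mean of the per-class accuracies, and OA is the fraction of all labelled test pixels that are correctly classified. The helper below (the function name is ours) sketches how these figures can be obtained from predictions.

```python
# Hedged sketch: class-wise accuracy, AA and OA from predicted labels.
import numpy as np
from sklearn.metrics import confusion_matrix

def classwise_aa_oa(y_true, y_pred):
    """Return per-class accuracy, Average Accuracy (AA) and Overall Accuracy (OA)."""
    cm = confusion_matrix(y_true, y_pred)
    class_acc = np.diag(cm) / cm.sum(axis=1)   # recall of each class present in y_true
    aa = class_acc.mean()                      # AA: mean of per-class accuracies
    oa = np.diag(cm).sum() / cm.sum()          # OA: correct pixels / all labelled pixels
    return class_acc, aa, oa
```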
Table 10. Comparison of Proposed Technique (MCCA-SVM) with Existing Techniques on IP dataset.
No. of Components | MCCA-SVM (%) | MCCA-RF (%) | MCCA-Naive Bayes (%) | MCCA-KNN (%)
3 | 85.75 | 83.41 | 59.12 | 79.07
4 | 89.41 | 87.51 | 63.17 | 82.92
5 | 89.56 | 87.65 | 63.27 | 85.02
6 | 88.92 | 87.85 | 63.07 | 83.95
7 | 88.24 | 87.24 | 62.24 | 84.09
8 | 88.09 | 87.12 | 64.48 | 84.14
9 | 88.48 | 87.31 | 63.65 | 83.41
10 | 87.46 | 86.63 | 63.75 | 82.97
Table 11. Comparison of Proposed Technique (MCCA-SVM) with Existing Techniques on PU dataset.
No. of Components | MCCA-SVM (%) | MCCA-RF (%) | MCCA-Naive Bayes (%) | MCCA-KNN (%)
3 | 90.90 | 84.59 | 71.76 | 82.24
4 | 92.66 | 88.94 | 74.66 | 86.61
5 | 95.07 | 92.85 | 79.13 | 91.07
6 | 94.92 | 93.27 | 80.31 | 91.70
7 | 94.79 | 92.97 | 80.58 | 90.76
8 | 94.55 | 91.87 | 81.42 | 89.60
9 | 94.50 | 91.85 | 82.46 | 89.25
10 | 94.43 | 91.68 | 82.41 | 88.53
Table 12. Comparison of Proposed Technique (MCCA-SVM) with Existing Techniques on SA dataset.
No. of Components | MCCA-SVM (%) | MCCA-RF (%) | MCCA-Naive Bayes (%) | MCCA-KNN (%)
3 | 93.88 | 92.60 | 82.84 | 89.59
4 | 94.45 | 93.54 | 85.53 | 90.57
5 | 94.67 | 93.78 | 85.82 | 90.61
6 | 94.64 | 94.41 | 86.50 | 90.21
7 | 94.62 | 94.61 | 88.34 | 90.66
8 | 94.88 | 94.69 | 88.98 | 90.95
9 | 95.08 | 95.26 | 88.25 | 91.09
10 | 95.04 | 95.60 | 89.34 | 91.02
Table 13. Comparison of Class-wise Accuracy of the Proposed Technique with Existing Techniques on IP dataset.
Class | MCCA-RF (%) | MCCA-Naive Bayes (%) | MCCA-KNN (%) | Proposed Technique (MCCA-SVM) (%)
1 | 67.00 | 83.00 | 100.00 | 100.00
2 | 79.00 | 47.00 | 76.00 | 80.00
3 | 66.00 | 36.00 | 64.00 | 70.00
4 | 84.00 | 37.00 | 71.00 | 92.00
5 | 96.00 | 30.00 | 96.00 | 94.00
6 | 97.00 | 83.00 | 97.00 | 96.00
7 | 67.00 | 100.00 | 89.00 | 89.00
8 | 100.00 | 89.00 | 99.00 | 97.00
9 | 50.00 | 50.00 | 50.00 | 50.00
10 | 76.00 | 56.00 | 70.00 | 83.00
11 | 85.00 | 53.00 | 77.00 | 85.00
12 | 76.00 | 44.00 | 68.00 | 83.00
13 | 100.00 | 95.00 | 100.00 | 100.00
14 | 98.00 | 98.00 | 96.00 | 97.00
15 | 47.00 | 29.00 | 32.00 | 62.00
16 | 94.00 | 100.00 | 94.00 | 100.00
AA | 80.00 | 64.00 | 79.50 | 85.75
OA | 83.41 | 59.12 | 79.07 | 86.00
Table 14. Comparison of Class-wise Accuracy of the Proposed Technique with Existing Techniques on PU dataset.
Class | MCCA-RF (%) | MCCA-Naive Bayes (%) | MCCA-KNN (%) | Proposed Technique (MCCA-SVM) (%)
1 | 90.00 | 70.00 | 84.00 | 90.00
2 | 97.00 | 78.00 | 94.00 | 96.00
3 | 53.00 | 28.00 | 54.00 | 78.00
4 | 90.00 | 88.00 | 84.00 | 93.00
5 | 100.00 | 100.00 | 100.00 | 100.00
6 | 41.00 | 31.00 | 45.00 | 63.00
7 | 54.00 | 80.00 | 65.00 | 82.00
8 | 87.00 | 86.00 | 78.00 | 88.00
9 | 100.00 | 99.00 | 100.00 | 99.00
AA | 79.00 | 73.00 | 78.00 | 88.50
OA | 84.59 | 71.76 | 82.24 | 90.90
Table 15. Comparison of Class-wise Accuracy of the Proposed Technique with Existing Techniques on SA dataset.
Class | MCCA-RF (%) | MCCA-Naive Bayes (%) | MCCA-KNN (%) | Proposed Technique (MCCA-SVM) (%)
1 | 99.00 | 92.00 | 97.00 | 99.00
2 | 99.00 | 97.00 | 98.00 | 100.00
3 | 97.00 | 73.00 | 93.00 | 99.00
4 | 100.00 | 99.00 | 100.00 | 100.00
5 | 98.00 | 97.00 | 97.00 | 98.00
6 | 99.00 | 96.00 | 100.00 | 100.00
7 | 100.00 | 99.00 | 100.00 | 100.00
8 | 90.00 | 53.00 | 82.00 | 88.00
9 | 99.00 | 93.00 | 97.00 | 99.00
10 | 98.00 | 93.00 | 98.00 | 99.00
11 | 95.00 | 90.00 | 91.00 | 93.00
12 | 100.00 | 96.00 | 99.00 | 100.00
13 | 98.00 | 98.00 | 97.00 | 97.00
14 | 99.00 | 94.00 | 96.00 | 99.00
15 | 75.00 | 75.00 | 62.00 | 73.00
16 | 99.00 | 97.00 | 99.00 | 100.00
AA | 96.00 | 90.00 | 94.00 | 97.00
OA | 92.60 | 82.84 | 89.59 | 93.88
Table 16. Time comparison between PCCA-SVM and MCCA-SVM on HSI datasets.
No. of Components | Indian Pines PCCA-SVM | Indian Pines MCCA-SVM | Pavia University PCCA-SVM | Pavia University MCCA-SVM | Salinas Valley PCCA-SVM | Salinas Valley MCCA-SVM
3 | 29.34 s | 3.38 s | 14.57 s | 6.23 s | 23.60 s | 8.59 s
4 | 38.98 s | 3.23 s | 48.35 s | 6.60 s | 32.62 s | 9.83 s
5 | 43.48 s | 3.02 s | 71.45 s | 7.33 s | 49.06 s | 9.96 s
6 | 60.33 s | 3.36 s | 114.63 s | 7.49 s | 70.77 s | 10.63 s
7 | 62.44 s | 3.73 s | 210.43 s | 8.33 s | 108.98 s | 11.77 s
8 | 68.24 s | 3.80 s | 250.99 s | 9.35 s | 135.11 s | 13.27 s
9 | 73.83 s | 4.07 s | 273.71 s | 9.99 s | 154.00 s | 15.86 s
10 | 86.90 s | 4.08 s | 300.11 s | 10.26 s | 201.33 s | 17.89 s
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
