1. Introduction
Hyperspectral images are challenging to work with because of their unfavorable spatial resolution, high dimensionality, and the scarcity of ground truth data for training. Hyperspectral systems acquire images in a hundred or more contiguous spectral bands [see Figure 1]. Materials reflect and absorb light differently at different wavelengths, so if the detection system has sufficient spectral and spatial resolution, the materials can be identified from their spectral signatures [1]. Unsupervised classification discovers structure in the data on its own and leaves the number of classes to be chosen by the user.
Hyperspectral image classification is difficult because of image noise, complex backgrounds, varied object characteristics [2], and high dimensionality [3]. Each pixel is assigned to a class, leaving the user to interpret the meaning of each class. Moreover, no existing classifier offers the best classification for all types of data, which is why studying the classification of hyperspectral images can provide general information about classification accuracy. Hyperspectral images are an important tool for identifying spectrally unique materials, as they provide sufficient information for classifying data from different areas. Furthermore, hyperspectral images can be used to capture the different phases of the eruptive activity of volcanoes, to obtain a detailed chronology of lava flow emplacement, and to estimate radiative power [4].
In Reference [6], unsupervised algorithms (K-means and ISODATA) were applied to hyperspectral images and, although both gave accurate results, K-means performed better. Another approach to hyperspectral classification is described in Reference [7], where the authors compared Maximum Likelihood (ML), Spectral Angle Mapper (SAM), and Support Vector Machine (SVM), and found SVM more effective than the others.
A study such as that in Reference [8] proposed a hierarchical SVM that achieved an acceptable accuracy value. Reference [9], which experimented with supervised and unsupervised methods of classification, concluded that the best result was obtained by the hierarchical tree. Reference [10] presented an approach using principal component analysis (PCA), k-means, and Multi-Class Support Vector Machine (M-SVM); the approach achieved accuracy values over 77%, confirming that hyperspectral image classification is a challenging task.
Other approaches were devised to exploit the full potential of hyperspectral images: Reference [11] proposed a compression-based non-negative tensor canonical polyadic (CP) decomposition to analyze hyperspectral images, and Reference [12] proposed feature extraction and classification algorithms based on orthogonal or non-negative tensor decomposition and higher-order discriminant analysis.
Another approach to analyzing and processing hyperspectral images is the Cellular Nonlinear Network (CNN), which is mostly used in recognition problems to improve run-time. The authors of Reference [13] showed that the CNN is an appropriate tool for discovering recurrent patterns and for creating a link between circuits, art, and neuroscience. Hyperspectral analysis is not an easy task because of the high dimensionality and the spatial and spectral resolution of hyperspectral images.
2. Results of Unsupervised Clustering
In this section, we explore the efficacy of k-means clustering, hierarchical clustering, and tensor decomposition classification. Ground truths for the studied data sets were provided by direct observation. The results were obtained with Python on a Windows 10 machine with 8 GB of memory. The data sets used in our study were taken from the Grupo de Inteligencia Computacional [14]; the Danube Delta data set was acquired from the U.S. Geological Survey [15]. The tensor dimensions of the data sets were 103 × 101 × 204 (Set 1), 401 × 146 × 204 (Set 2), 181 × 111 × 103 (Set 3), 191 × 151 × 105 (Set 4), and 181 × 151 × 53 (Set 5). The geometric resolutions of the images were 3.7 m for Sets 1 and 2, 1.3 m for Sets 3 and 4, and 30 m for Set 5.
We prepared the analysis by building the matricized tensor, constructed by reordering the elements of the data cube into a matrix. To compare the final results, we first classified the training data, building a matrix that contained only the samples whose class membership was known. We then classified the entire data set, including the pixels whose class was unknown. Before applying the classifiers, the data were normalized to values between 0 and 1.
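The matricization and normalization steps above can be sketched as follows; the cube dimensions match Set 3, but the data here are random stand-ins, not the actual scenes:

```python
import numpy as np

# Synthetic stand-in cube with the dimensions of Set 3 (181 x 111 x 103).
rows, cols, bands = 181, 111, 103
cube = np.random.default_rng(0).random((rows, cols, bands))

# Matricize: reorder the 3-D cube into a 2-D matrix with one pixel
# per row and one spectral band per column.
X = cube.reshape(rows * cols, bands)

# Min-max normalization so every value lies in [0, 1].
X = (X - X.min()) / (X.max() - X.min())
```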
2.1. Hierarchical Clustering
The agglomerative hierarchical clustering technique starts by treating each data point as an individual cluster. Similar clusters are then merged iteratively until all similar data points form a single cluster [16].
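A minimal sketch of this technique, using scikit-learn's agglomerative clustering on two well-separated synthetic "spectral" groups (the data are illustrative, not the paper's sets):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
# Two well-separated synthetic groups standing in for pixel spectra.
X = np.vstack([rng.normal(0.0, 0.05, (50, 10)),
               rng.normal(1.0, 0.05, (50, 10))])

# Agglomerative clustering with Ward linkage (scikit-learn's default):
# every point starts as its own cluster; the closest pair of clusters
# is merged repeatedly until n_clusters remain.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
```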
Table 1 shows the confusion matrix obtained for Set 1, where entry (i, j) counts the observations known to be in class i that were predicted to be in class j. As we can see, the table demonstrates a good classification [see Figure 2], except for class 4, which was classified as class 2.
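The structure of such a confusion matrix can be illustrated with scikit-learn; the labels below are hypothetical, chosen only to reproduce the class 4 / class 2 confusion pattern, not taken from Table 1:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels for a handful of pixels;
# class 4 is systematically confused with class 2.
y_true = [1, 1, 2, 2, 4, 4]
y_pred = [1, 1, 2, 2, 2, 2]

# Entry cm[i, j] counts observations known to be in the i-th listed class
# that were predicted as the j-th listed class.
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 4])
```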
Figure 3 shows the output of classification for Set 2.
Figure 4 depicts the output for Set 5, the Danube Delta, Romania, for which we have no ground truth. By visual inspection of the classification results shown in Figure 4, it can be seen that the classification is quite well defined.
2.2. K-Means Clustering
The K-means algorithm classifies the data by grouping the items into k different clusters, assigning each item to the cluster whose centroid is nearest in Euclidean distance [17].
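A sketch of K-means on the same kind of synthetic two-group data (scikit-learn's `KMeans` uses the Euclidean distance, as described above; the data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Two well-separated synthetic groups standing in for pixel spectra.
X = np.vstack([rng.normal(0.0, 0.05, (50, 10)),
               rng.normal(1.0, 0.05, (50, 10))])

# K-means with k = 2: iteratively assigns each sample to the nearest
# centroid (Euclidean distance) and recomputes the centroids.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```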
As before, the confusion matrix is shown in Table 2. As we can see, the table demonstrates a good classification [see Figure 5], except for class 4, which was classified as class 2.
Figure 6 shows the output of classification for Set 2.
Figure 7 shows the output for Set 5, the Danube Delta, Romania, for which we have no ground truth. By visual inspection of the classification results shown in Figure 7, it can be seen that the classification is quite well defined.
2.3. Parafac Classification
Parallel Factor Analysis (Parafac) is a method for decomposing multi-way data so that the result can be analyzed with unsupervised classification tools. Figure 8 shows the mean of the spectral bands for each class for the Danube Delta, Romania (Set 5). The number of Parafac components was 100, the algorithm converged after 2 iterations, and the explained variation was 98.73%.
Figure 9 shows the estimated abundance maps for each class, ranging from blue for a low contribution to yellow for a high contribution. We can also observe that some classes are not distinguished very well; one reason for this could be the poor resolution of the hyperspectral image. In the second image on the fourth row, it can easily be seen that the areas are very well delimited, as confirmed by comparison with Figure 7 (left).
The pseudo-code that describes the construction of an abundance map is presented in Algorithm 1.
Figure 9 shows the abundance maps for each class, constructed using the algorithm described previously. We used the same algorithm for all the data sets discussed in this paper. In Section 3, we discuss the results presented previously.
Algorithm 1: The construction of an abundance map.
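Since the body of Algorithm 1 is not reproduced here, the following is only a hedged sketch of one common way to build a per-component abundance map from the two spatial factor matrices of a CP decomposition (the outer product of the r-th columns, rescaled to [0, 1] for display); the factor matrices below are random placeholders:

```python
import numpy as np

def abundance_map(A, B, r):
    """Sketch: spatial abundance of CP component r, formed as the outer
    product of the r-th columns of the two spatial factor matrices
    A (image rows) and B (image columns), rescaled to [0, 1]."""
    m = np.abs(np.outer(A[:, r], B[:, r]))
    return m / m.max()

# Hypothetical spatial factors for a 20 x 15 image with 5 components.
rng = np.random.default_rng(4)
A, B = rng.random((20, 5)), rng.random((15, 5))
amap = abundance_map(A, B, 0)
```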
3. Discussion
The results of the three classification methods are summarized in Table 3, Table 4, and Table 5. Because we started from the assumption that the ground truth is unknown, we evaluated the classifications using the following measures: the Silhouette Coefficient and the Davies-Bouldin Index. We then calculated the accuracy and the confusion matrix using the ground truth.
Table 3 and Table 4 contain the evaluation results of the classification. The good classification performance is demonstrated by accuracy values higher than 84% for both K-means and hierarchical clustering. We can see that the number of classes in the ground truth does not match the number of classes proposed by the Silhouette score for all data sets; one reason may be the resolution of the images and the quality of the ground truth. For Set 5, the Danube Delta, the ground truth is missing, so we have neither an accuracy value nor a number of classes.
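Both ground-truth-free measures are available in scikit-learn; a sketch on synthetic clusters (higher Silhouette and lower Davies-Bouldin values indicate better-separated clusters):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(5)
# Two well-separated synthetic clusters.
X = np.vstack([rng.normal(0.0, 0.05, (50, 10)),
               rng.normal(1.0, 0.05, (50, 10))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

sil = silhouette_score(X, labels)      # in [-1, 1]; higher is better
dbi = davies_bouldin_score(X, labels)  # >= 0; lower is better
```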
In Table 5, the rows correspond to the data sets and the columns to the classes. For each set, we calculated the Root Mean Squared Error (RMSE) between the mean of the spectral bands for each class in the ground truth and the mean of the spectral bands for each class in the classification result, and we highlighted the pairs with the lowest RMSE. Even though we classified the spectral bands into 7 clusters for this data set, the RMSE associates one class resulting from the Parafac decomposition with more than one class from the ground truth [see Table 5].
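The class-matching step can be sketched as follows; the two-band mean spectra below are hypothetical, chosen only to illustrate pairing each clustered class with the ground-truth class of lowest RMSE:

```python
import numpy as np

def rmse(a, b):
    """Root Mean Squared Error between two mean spectra."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Hypothetical per-class mean spectra (two bands for brevity):
# ground truth vs. clustering result.
gt_means = np.array([[0.1, 0.2], [0.8, 0.9]])
cl_means = np.array([[0.12, 0.18], [0.7, 0.85]])

# Match each clustered class to the ground-truth class with lowest RMSE.
pairs = [(i, int(np.argmin([rmse(c, g) for g in gt_means])))
         for i, c in enumerate(cl_means)]
```

Note that nothing prevents two clustered classes from matching the same ground-truth class, which is exactly the many-to-one association observed in Table 5.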
Table 5 also shows the execution times for the classification of the spectral bands after Parafac decomposition, K-means, and hierarchical clustering. Given that the run time after decomposition is noticeably shorter, this result is promising for big data analysis [see Table 3 and Table 4].
5. Conclusions
This study has addressed the problem of classifying hyperspectral images using K-means, Parafac decomposition, and hierarchical clustering, through a comparative analysis of the three methods. The run-time is the total time from the start of an algorithm to its finish [26]. Execution time depends on many factors, the most important being the memory used, the percentage of CPU utilized by the algorithm, and the dimensions of the data set. Considering these results, Parafac decomposition is very useful for big data, because it is important to reduce the execution time and to use memory efficiently. According to the results in Section 3, for small data sets, K-means or hierarchical classification can be used. For big data, Parafac decomposition can be used because it reduces the data, makes it easier to interpret, and considerably reduces the execution time. Additionally, the abundance maps are very well defined, considering that the fraction of each class within a pixel is estimated by an abundance map. In other words, the abundance map could be seen as a form of ground truth, with the advantage that possible human errors in class assignment are absent. K-means and hierarchical clustering offered very good accuracy values but, at the same time, we must take into account that, for big data, their execution time grows considerably.
Comparing supervised and unsupervised classification, supervised classification yielded better results, but it requires ground truth [27]. However, the results of unsupervised classification are not far from those of supervised classification. The poorer classification results after Parafac decomposition may be caused by the spatial and spectral resolution. Finally, the studied algorithms have proven effective for our problems and are very useful in applications such as model reduction, pattern recognition, and classification.