Article

Dual-Window Superpixel Data Augmentation for Hyperspectral Image Classification

Álvaro Acción, Francisco Argüello and Dora B. Heras
1 Centro Singular de Investigación en Tecnologías Inteligentes, Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Spain
2 Departamento de Electrónica y Computación, Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Spain
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(24), 8833; https://doi.org/10.3390/app10248833
Submission received: 6 November 2020 / Revised: 1 December 2020 / Accepted: 5 December 2020 / Published: 10 December 2020
(This article belongs to the Special Issue Deep Image Semantic Segmentation and Recognition)

Abstract

Deep learning (DL) has been shown to obtain superior results for classification tasks in the field of remote sensing hyperspectral imaging. Superpixel-based techniques can be applied to DL, significantly decreasing training and prediction times, but the results are usually far from satisfactory due to overfitting. Data augmentation techniques alleviate the problem by synthetically generating new samples from an existing dataset in order to improve the generalization capabilities of the classification model. In this paper we propose a novel data augmentation framework in the context of superpixel-based DL called dual-window superpixel (DWS). With DWS, data augmentation is performed over patches centered on the superpixels obtained by the application of simple linear iterative clustering (SLIC) superpixel segmentation. DWS is based on dividing the input patches extracted from the superpixels into two regions and independently applying transformations over them. As a result, four different data augmentation techniques are proposed that can be applied to a superpixel-based CNN classification scheme. An extensive comparison with other data augmentation techniques from the literature, in terms of classification accuracy over two datasets, is also presented. One of the datasets consists of small hyperspectral scenes commonly found in the literature. The other consists of large multispectral vegetation scenes of river basins. The experimental results show that the proposed approach increases the overall classification accuracy for the selected datasets. In particular, two of the data augmentation techniques introduced, namely, dual-flip and dual-rotate, obtained the best results.

1. Introduction

Hyperspectral images (HSIs) are formed by a grid of pixels, each of them represented by a high-dimensional vector capturing a fraction of the electromagnetic spectrum for that point, sampled at different wavelengths [1]. The high density of the spectral information contained in HSIs, in the order of tens to hundreds of bands for a single scene, increases the ability to identify the materials present in it. This characteristic makes HSIs popular candidates for supervised classification in the field of remote sensing [2].
Deep learning (DL) models have been introduced in the last few years for HSI classification tasks [3,4,5,6] with promising results. Convolutional neural networks (CNNs) [7], in particular, have been successfully used for solving problems requiring multi-class, multi-label classification involving feature extraction (FE) from images [8]. CNNs operate over small cubes of data called patches instead of relying on spectral information alone. These patches are centered on a pixel of the image and taken from a sliding window of a certain size in order to extract spatial-spectral information. A patch is extracted for every pixel of the image using this procedure.
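As an illustration of this patch extraction step, the following minimal sketch (our own, not code from the paper) extracts a square patch centered on a given pixel of an HSI stored as a NumPy array of shape (H, W, B); the reflect padding used at the image borders is an assumption.

```python
import numpy as np

def extract_patch(image, row, col, patch_size=25):
    """Extract a patch_size x patch_size x B window centered on (row, col).

    The image is reflect-padded so that patches near the borders keep the
    requested spatial size (the border handling is our assumption).
    """
    half = patch_size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    return padded[row:row + patch_size, col:col + patch_size, :]

# Example on a randomly generated stand-in for a 103-band scene.
hsi = np.random.rand(610, 340, 103).astype(np.float32)
print(extract_patch(hsi, row=100, col=200).shape)  # (25, 25, 103)
```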
CNNs generally require large amounts of training samples in order to prevent overfitting. Data augmentation is a technique that synthetically generates new samples by applying a set of domain-specific transformations over the original input dataset to improve the generalization capabilities of a classification model. Several data augmentation techniques applicable to HSIs have been proposed, most of which are based on geometric transformations commonly used for image recognition tasks [9]. More recently, [10,11] described hyperspectral data augmentation techniques where pixels are grouped in blocks and different block pairs are used as the input to a CNN. In [12], samples in the original dataset were shifted along its first principal component or based on the average value in each band. Augmentation based on randomly erasing parts of the input patches has also been proven effective for HSI classification in [13]. Finally, generative adversarial networks have been proposed recently as a data augmentation technique in order to generate new samples mimicking the distribution of the original data [14,15,16].
Segmentation is a preprocessing technique capable of simplifying images, reducing them to meaningful, independent regions of pixels with high intra-region similarity and high inter-region dissimilarity called segments. The simplification makes this technique useful to reduce the complexity of subsequent processing tasks. Examples of the use of superpixels in hyperspectral classification as a way to exploit context information can be found in [17,18,19], and as part of a DL classification scheme in [20,21]. In our proposal, superpixel segmentation [22] is used to reduce the computational cost associated with CNN-based classification. During the training and prediction stages, only a representative subset of pixels from each superpixel of the image is selected, allowing for a significant reduction of the computational cost when compared to the sliding window extraction from pixel-based classification.
In this paper, several augmentation techniques relying on geometric transformations aimed at efficient, superpixel-based DL classification for large images are introduced. The main contributions are:
1.
A data augmentation framework called dual-window superpixel (DWS), based on a combination of superpixel segmentation for patch extraction and geometric transformations is proposed. Patches are divided into two regions and the transformations are applied independently to them. This framework is introduced as part of a CNN classification scheme capable of improving the classification accuracy and significantly reducing the execution time of the classification process.
2.
A number of fast and simple data augmentation techniques based on the DWS data augmentation framework are also proposed.
The rest of the paper is divided into four sections. Section 2 describes DWS, the proposed classification scheme and the derived data augmentation techniques. Section 3 details the experimental setup and lists the results obtained. Section 4 presents the discussion about the experimental results. Finally, Section 5 summarizes the main conclusions.

2. Dual-Window Superpixel Data Augmentation (DWS)

This section describes in detail the DWS data augmentation framework developed as part of this work, and the data augmentation techniques obtained from it. The main stages of DWS are explained below.

2.1. Superpixel-Based Patch Extraction

Usually, in hyperspectral classification using CNNs, a sliding window of a certain size is applied [23] to extract spatial-spectral information from the image I. The contents of this window P, also called patch, are then fed to the network.
The proposed scheme replaces the sliding window with the extraction of patches based on superpixel information in order to reduce the computational cost. These superpixels are obtained by applying simple linear iterative clustering (SLIC) [24], a low-complexity superpixel segmentation method commonly used in computer vision, to the image I. Figure 1 shows the result of applying SLIC to the Salinas hyperspectral scene. Strong adherence of the superpixel boundaries to the edges of the image can be observed.
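A minimal sketch of this segmentation step using the SLIC implementation available in scikit-image is shown below; mapping the paper's superpixel size (an average area in pixels) to the n_segments parameter is our assumption, and optional parameter names may vary slightly across scikit-image versions.

```python
import numpy as np
from skimage.segmentation import slic

def segment_scene(image, superpixel_size=50, compactness=20):
    """Run SLIC over an (H, W, B) cube and return an integer superpixel label map."""
    h, w, _ = image.shape
    # Derive the number of segments from the desired average superpixel area.
    n_segments = max(1, (h * w) // superpixel_size)
    return slic(image, n_segments=n_segments, compactness=compactness, start_label=0)

# Example on a random stand-in for a hyperspectral cube.
segments = segment_scene(np.random.rand(512, 217, 220).astype(np.float32))
print(segments.max() + 1, "superpixels")
```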
Figure 2 shows the first stages of DWS, related to the acquisition of the patches, prior to the data augmentation itself. The chosen patch size in this example is 25 × 25 × B, with B being the number of bands in the image I. After the application of SLIC to I, each of the resulting superpixels is considered a sample, and a patch is extracted from its center. This reduces the number of processed patches from W × H, with W and H being the spatial dimensions of the image, to a much smaller number of superpixels.
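The selection of one patch per superpixel can be sketched as follows, reusing the extract_patch and segment_scene helpers from the previous sketches; taking the rounded centroid of each superpixel as its central pixel is our assumption.

```python
import numpy as np

def superpixel_centers(segments):
    """Return one (row, col) per superpixel: the rounded centroid of its pixels."""
    centers = []
    for label in np.unique(segments):
        rows, cols = np.nonzero(segments == label)
        centers.append((int(round(rows.mean())), int(round(cols.mean()))))
    return centers

def superpixel_patches(image, segments, patch_size=25):
    """Extract one patch per superpixel instead of one per pixel of the image."""
    # extract_patch is the helper defined in the earlier sketch.
    return np.stack([extract_patch(image, r, c, patch_size)
                     for r, c in superpixel_centers(segments)])
```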

2.2. Patch Subdivision

Augmentation operations are commonly performed on the patch as a whole: the complete patch is flipped, rotated, etc. This paper hypothesizes that, in contrast, applying transformations over independent regions of the patch produces better results. Figure 2 depicts the patch subdivision stage proposed in this paper. Patches are divided into two regions according to the distance from the central pixel. In the example, the inner region is set to 15 × 15 pixels. This subdivision makes it possible to apply transformations to the inner and outer regions of the patch independently. Any transformation able to operate over a patch of data can be applied, regardless of its nature.
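A sketch of this subdivision, assuming square patches with both regions centered on the same pixel, is the following:

```python
import numpy as np

def split_patch(patch, inner_size=15):
    """Return the inner window and a boolean mask selecting the outer ring.

    For a 25 x 25 patch and inner_size = 15, the margin is 5 pixels per side.
    """
    size = patch.shape[0]
    margin = (size - inner_size) // 2
    inner = patch[margin:margin + inner_size, margin:margin + inner_size, :]
    outer_mask = np.ones((size, size), dtype=bool)
    outer_mask[margin:margin + inner_size, margin:margin + inner_size] = False
    return inner, outer_mask
```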
The process of patch extraction and subdivision is shown in more detail in Figure 3. On the left, the borders of the different superpixels obtained after applying SLIC over the image are depicted, along with the central pixel of one of the superpixels. The black square centered on that same pixel represents a patch of the desired dimensions that will be extracted at that location. The patch, shown on the right, is then subdivided into two regions as part of this stage.

2.3. Patch Transformation

Several augmentation techniques are introduced in this paper using the patch subdivision principle described in the previous section in combination with the traditional rotate and flip transformations. They can be divided into techniques with transformations applied to the inner region (prefixed with the term inner) and techniques with transformations applied to both the outer and inner regions independently (prefixed with the term dual).
Figure 4 shows examples of the outputs of all the techniques considered in this paper. None indicates that no data augmentation is applied to the input patch. Random occlusion (RO), as presented in [13], performs selective data erasing. Rotate and flip are the traditional data augmentation techniques where the homonymous operation is applied over the whole patch. In addition to those, the following techniques are proposed:
1.
Inner-rotate (4×): A set of 90, 180 and 270-degree rotation operations is applied to the inner region only, yielding up to four samples.
2.
Inner-flip (4×): A set of three flip or mirroring operations is applied to the inner region only, yielding up to four samples.
3.
Dual-rotate (4×/16×): A set of 90, 180 and 270-degree rotation operations is applied to the full patch, yielding four samples. Afterward, for each of those four samples, the same operations are applied again to only the inner region, producing up to sixteen samples per input patch.
4.
Dual-flip (4×/16×): A set of three flip or mirroring operations is applied to the full patch, yielding four samples. Afterward, for each of those samples, the same operations are applied again to only the inner region, producing up to sixteen samples per input patch.
The dual-flip 16× technique is illustrated in Figure 5. The arrows next to the patches indicate flip transformations performed on certain axes. Long arrows each represent a flip applied to the full patch, whereas small arrows each represent the same operation applied to the inner region. During dual-flip 16× patch transformation, the following operations take place: first, flip transformations over the horizontal axis, the vertical axis and a combination of both are applied to the full patch, producing patches 1, 5, 9 and 13; next, for each of the outputs from that step, the same flip transformations are applied only to the inner region (row of transformations at the bottom of the figure).
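A compact sketch of dual-flip 16× under these definitions is given below (our own illustration; the ordering of the sixteen outputs is an assumption, only their composition follows the description above).

```python
import numpy as np

def flips(window):
    """Identity plus the three flip/mirror operations over the spatial axes."""
    return [window, window[::-1, :, :], window[:, ::-1, :], window[::-1, ::-1, :]]

def dual_flip(patch, inner_size=15):
    """Dual-flip 16x: flip the full patch, then flip only its inner region."""
    size = patch.shape[0]
    m = (size - inner_size) // 2
    augmented = []
    for outer in flips(patch):                      # up to 4 full-patch variants
        for inner in flips(outer[m:m + inner_size, m:m + inner_size, :]):
            sample = outer.copy()
            sample[m:m + inner_size, m:m + inner_size, :] = inner
            augmented.append(sample)                # up to 16 samples in total
    return np.stack(augmented)

print(dual_flip(np.random.rand(25, 25, 5)).shape)  # (16, 25, 25, 5)
```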

3. Results

This section contains information about the experimental conditions, datasets used in the experiments and parameter selection. Lastly, the classification results obtained are presented.

3.1. Experimental Conditions

All the experiments were run using the classification scheme and network architecture described in Figure 6. The figure shows, on the left, a patch extracted from the HSI. The black boundaries represent the superpixel edges. The extraction process is explained in Section 2.1. The data augmentation technique of choice was then applied to the patch. The data augmentation techniques based on DWS are explained in Section 2.3. The patches resulting from this data augmentation process were then fed to the CNN.
The network consists of two blocks of 2D-convolutional layers coupled with 2D-max-pooling layers, and two dense layers. With the aim of reducing overfitting, two dropout layers were added to the network, both of them using an aggressive dropout ratio of 0.5. Table 1 details the parameters for each layer. ELU activations were used for all layers due to the advantages this function has over others such as the ReLU family; namely, ELU provides more robust training and faster learning [25]. All trainings were run for 112 epochs using the NADAM [26] optimizer with learning_rate = 0.0001, β1 = 0.9, β2 = 0.999 and ε = 1 × 10⁻⁷.
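A tf.keras sketch consistent with Table 1 and the training settings above is shown below; valid padding and the sparse categorical cross-entropy loss are our assumptions, inferred from the listed output shapes and the multi-class setting.

```python
import tensorflow as tf

def build_cnn(patch_size=25, bands=103, num_classes=9):
    """CNN following Table 1: two Conv2D + MaxPooling2D blocks, dropout 0.5,
    a 256-unit dense layer and a softmax output layer."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(patch_size, patch_size, bands)),
        tf.keras.layers.Conv2D(128, kernel_size=1, activation="elu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Conv2D(256, kernel_size=3, activation="elu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="elu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    optimizer = tf.keras.optimizers.Nadam(learning_rate=1e-4, beta_1=0.9,
                                          beta_2=0.999, epsilon=1e-7)
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",  # assumed loss function
                  metrics=["accuracy"])
    return model

build_cnn().summary()
```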
In the experiments, data augmentation was performed online for the first epoch and then cached for the rest of the training. The techniques considered are a set of commonly used data augmentation techniques from the literature and the four proposals introduced as part of this work. In the first group we have: rotate, the standard 90 degree rotation applied four times; flip, the standard flip over both axes of the image; PVS(+/−) or pixel-value shift augmentation, where pixel values of the input image are shifted relative to the average band [12]; and random occlusion, which removes rectangular regions of the patch on up to 50% of the input patches in a batch [13]. The new proposals to be studied include inner-rotate, inner-flip, dual-rotate and dual-flip, as described in Section 2.3. The results are compared in terms of classification accuracy after the application of the different data augmentation techniques.
All the tests were run on a 6-core Intel i5 8400 CPU at 2.80 GHz and 48 GB of RAM, and an NVIDIA GeForce GTX 1060 with 6 GB. All experiments were run under Ubuntu Linux 16.04 64-bits, Docker 19.03.5, Python 3.6.7, Tensorflow 2.0.0 and CUDA toolkit 10.0. All the training instances were performed on Tensorflow with GPU support enabled, using single-precision arithmetic.

3.2. Datasets

This section describes the characteristics, including the compositions of the disjoint training and testing sets, of the seven scenes used to evaluate the performance of the proposed data augmentation techniques. Three widely available hyperspectral scenes [27] from the literature (the standard dataset from now on) and four large multispectral images of river basins belonging to the Galicia dataset [28] were considered. All the images of the Galicia dataset were captured at an altitude of 120 m by a UAV mounting a MicaSense RedEdge multispectral camera [29]. Its spatial resolution is 0.082 m/pixel and it covers a spectral range from 475 to 840 nm. Figure 7 and Figure 8 display the false color composites and reference data for the scenes of the standard and Galicia datasets, respectively.
More specifically, the Salinas Valley, Pavia University and Pavia Centre scenes from the standard dataset along with the River Oitavén, Creek Ermidas, Eiras Dam and River Mestas scenes from the Galicia dataset were selected for the experiments. The detailed descriptions of the scenes are as follows:
  • Salinas Valley (Salinas): Mixed vegetation scene in California. It was obtained by the NASA AVIRIS sensor with a spatial resolution of 3.7 m/pixel, covering a spectral range from 400 to 2500 nm. The image is 512 × 217 pixels and has 220 spectral bands. The reference information contains sixteen classes. The scene is located at 36°39′33.8″N 121°39′58.7″W.
  • Pavia University (PaviaU): Urban scene acquired by the ROSIS-03 sensor over the city of Pavia, Italy. Its spatial resolution is 2.6 m/pixel and it covers a spectral range from 430 to 860 nm. The image is 610 × 340 pixels and has 103 spectral bands. The ground truth contains nine classes. The scene is located at 45°12′09.2″N 9°08′08.6″E.
  • Pavia Centre (PaviaC): Urban scene acquired by the ROSIS-03 sensor over the city of Pavia, Italy. Its spatial resolution is 2.6 m/pixel and it covers a spectral range from 430 to 860 nm. The image is 1096 × 715 pixels and has 103 spectral bands. The ground truth contains nine classes. The scene is located at 45°11′12.7″N 9°08′48.7″E.
  • River Oitavén (Oitaven): Multispectral vegetation scene of the Oitavén river in Pontevedra, Spain. The image is 6689 × 6722 pixels and has 5 bands. The scene is located at 42°22′15.3″N 8°25′47.4″W.
  • Creek Ermidas (Ermidas): Multispectral vegetation scene showing the point where Creek Ermidas and the River Oitavén meet, in Pontevedra, Spain. The image is 11,924 × 18,972 pixels and has 5 bands. The scene is located at 42°22′51.9″N 8°24′53.5″W.
  • Eiras Dam (Eiras): Multispectral vegetation scene showing the reservoir that supplies running water to the town of Vigo, in Pontevedra, Spain. The image is 5176 × 18,224 pixels and has 5 bands. The scene is located at 42°20′46.5″N 8°30′10.5″W.
  • River Mestas (Mestas): Multispectral vegetation scene showing the River Mestas in Pontevedra, Spain. The image is 4915 × 9040 pixels and has 5 bands. The scene is located at 43°38′29.8″N 7°58′42.2″W.
Table 2 and Table 3 show the number of samples for each class for all the scenes in all the scenarios considered in this comparison. The scenes from both datasets were used as follows: 60% for training samples, 20% for testing samples and 20% for validation samples for superpixel-based classification. Table 2 also displays the number of samples for pixel-based classification; 20% of the samples were used, again, as the validation set for this scenario. The number of samples was chosen to prevent an excessively high baseline accuracy when no data augmentation technique was used.
Scenes 1 to 3 were segmented using SLIC with a superpixel size of 50 and a compactness parameter of 20, whereas scenes 4 to 7 used a superpixel size of 800 and a compactness parameter of 40. The compactness determines the balance between spatial and spectral proximity, with higher values favoring spatial proximity and causing segments to take on a more regular, square-like shape.
Two augmentation factors, 4× and 16×, were considered, and are displayed in the tables next to the name of each data augmentation technique. For every experiment, the following three accuracy measures [30] are reported: overall accuracy (OA), the fraction of all pixels correctly predicted; average accuracy (AA), the mean of the per-class accuracies; and Kappa (κ), which measures the agreement between pixel predictions across all classes while also taking into account the agreement attributable to chance [31]. The values shown are the results of 20 Monte Carlo runs for each scenario. All the values were obtained under identical experimental conditions.
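These three measures can be computed from a confusion matrix as in the following sketch (a generic implementation, not code from the paper).

```python
import numpy as np

def accuracy_metrics(y_true, y_pred, num_classes):
    """Overall accuracy, average (per-class) accuracy and Cohen's kappa."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)          # confusion matrix
    total = cm.sum()
    oa = np.trace(cm) / total                   # fraction of correct predictions
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))  # mean per-class accuracy
    expected = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2
    kappa = (oa - expected) / (1.0 - expected)  # chance-corrected agreement
    return oa, aa, kappa

oa, aa, kappa = accuracy_metrics(np.array([0, 0, 1, 2]), np.array([0, 1, 1, 2]), 3)
print(round(oa, 2), round(aa, 2), round(kappa, 2))
```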

3.3. Superpixel-Based Classification

This section contains the experimental results of the proposal for superpixel-based classification, as described in Section 2. A single scenario, training the network with 60% of the available labeled superpixels, was considered. The classification performance was measured at the pixel level, i.e., assigning the same label to all the pixels belonging to the same superpixel.
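Propagating the per-superpixel predictions back to a pixel-level classification map reduces to an indexing operation over the SLIC label image, as in this minimal sketch:

```python
import numpy as np

def superpixel_to_pixel_map(segments, superpixel_labels):
    """Assign every pixel the class predicted for the superpixel it belongs to.

    segments: (H, W) SLIC label image with values 0..S-1.
    superpixel_labels: length-S array with one predicted class per superpixel.
    """
    return np.asarray(superpixel_labels)[segments]
```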
In order to select values for the superpixel size and patch size, some tests were run using one image considered representative of each dataset: PaviaC was selected from the standard dataset and Oitaven from the Galicia dataset. The results were obtained for two scenarios, one where no augmentation was applied and a second one where the DWS-based dual-flip 16× technique was used in order to improve accuracy. Superpixel size values represent the average superpixel area used in the SLIC segmentation. Patch size values represent the side length of the square patch used in the CNN classification process. Inner region sizes are 10 pixels smaller than the corresponding outer regions. This relationship between inner and outer size was chosen as a trade-off between the variability introduced by the transformations applied to the data and the relevance of the inner region. Small changes in the inner patch size would not significantly alter the results obtained.
Figure 9 and Figure 10 show the overall classification accuracy for the PaviaC and Oitaven images, respectively, as the superpixel area (left) and the patch size (right) vary. In general, we can observe an inverse correlation between superpixel sizes and accuracy of the CNN model. The accuracy without applying data augmentation is correlated with patch size, and bigger patches produce better results for the Oitaven image. The observed effect is smaller in the case of PaviaC. In this last case, the oscillations in the graph are caused by the dispersion of the values across experiments. Finally, results show a very limited increase in accuracy as the size of the patches grows when using the dual-flip augmentation technique.
The patch size selected for the experiments was 25 × 25 pixels, with an inner region size of 15 × 15. The complexity of the proposed CNN is low and, as such, increases in patch size have a small impact during training, allowing us to work with larger amounts of data at this stage. In contrast, the number of samples, especially when augmentation is applied, has a large impact on the speed of the training stage. For this reason, and in order to keep computation times at moderate values, 50 and 800 were selected as the superpixel sizes for the scenes of the standard and Galicia datasets, respectively.
Figure 11, Figure 12, Figure 13 and Figure 14 show the evolution of the training metrics across all the training epochs. It can be seen how the additional data generated by the augmentation methods cause the training process to converge earlier. As a result, significantly steeper curves in both loss and accuracy can be observed.
Table 4 shows the classification results for Salinas, PaviaU and PaviaC. These images were used for comparison purposes due to their prevalence in land-cover classification papers. Nevertheless, scenes this small see limited benefits from superpixel-level classification, as they are low-resolution and contain very small and irregular structures. Dual-rotate 16× obtained the highest OA for Salinas, which contains bigger and more regular structures. The best results for the PaviaU and PaviaC scenes were obtained by the flip 4× and rotate 4× techniques, respectively.
Table 5 shows the classification results for the large multispectral scenes of the Galicia dataset, namely, Oitaven, Ermidas, Eiras and Mestas. It can be seen that the proposed classification scheme obtained high accuracy across all scenes, even when no data augmentation was applied during training. Among the techniques tested, approaches based on the DWS framework introduced in this work achieved the best results: dual-rotate 16× and dual-flip 16× reached 96.20% and 96.19% OA for Oitaven, respectively. The results for Ermidas, Eiras and Mestas share many similarities, with dual-flip 16× and dual-rotate 16× leading in terms of OA. When a lower data augmentation factor is considered (4×), inner-flip 4× and inner-rotate 4× can be seen systematically obtaining higher OA values than other methods. It can be seen that techniques based on flips tend to perform better than those based on rotations when applied under similar constraints. Figure 15 shows the resulting classification maps for the images of the dataset.
In order to summarize the results, Table 6 displays the differences in performance between the baseline and each of the augmentation techniques. The results for the standard dataset show that dual-rotate 16× was the best performing method overall, followed by flip 4×. For the Galicia dataset, dual-flip 16× and dual-rotate 16× obtained the highest increases in OA, with 1.51% and 1.42%, respectively.

3.4. Pixel-Based Classification

This section contains the experimental results of the proposal for pixel-based classification. This scenario was considered in order to show the performances of the proposed augmentation techniques with a traditional pixel-based classification scheme. The scheme applied was the same one described in Section 2, albeit without the superpixel segmentation step. The patches are centered on pixels using a sliding window approach. Tests were run for the Salinas, PaviaU and PaviaC scenes.
Table 7 displays the execution times of all the augmentation techniques reviewed in this study for the PaviaC scene. Based on the data obtained, we can observe that the total execution times are significantly higher in the case of pixel-based classification. Training times have a very strong linear correlation with the augmentation factor, and thus, with the number of samples being processed. Prediction times are only dependent on the dimensions of the image, and there is little variation across the different executions. It is worth noting that pixel-based classification is not practical for large images due to the large execution time required. The large number of pixels that need to be predicted, three orders of magnitude higher than the number of training samples, causes prediction times to be significantly higher than training times.
Table 8 summarizes the pixel-based classification results for the Salinas, PaviaU and PaviaC images. It can be seen that dual-flip 16× obtained the highest OA for the Salinas scene; inner-flip 4× yielded the highest OA for PaviaU; and flip 4× was the best performing technique for PaviaC. Methods based on flip operations consistently outperformed rotations in this scenario.

4. Discussion

The literature on multispectral and hyperspectral image augmentation contains a multitude of different data augmentation techniques, most of which are used in combination with pixel-based classification. There is a notable lack of proposals approaching data augmentation in combination with superpixel-based classification schemes. Examples of the use of geometric transforms in pixel-based classification schemes can be seen in [13,32,33]. Attempts to synthesize new samples drawing data from a multivariate normal distribution initialized with the standard deviations of the bands in an HSI can be found in [34]. Reference [15] approaches the generation of synthetic samples to be later used in the training of deep networks through the use of GANs, as does [14]. These generative models are very costly to train and require fine-tuned adjustments of the hyperparameters on a per scene basis. The data augmentation framework proposed in this work leverages the composition of simple transformations that require no parameter tuning in order to achieve robust increases in classification accuracy when used in a superpixel-based classification scheme. This makes the derived methods a convenient replacement for the traditional rotation and flip (mirroring) operations.
Reference [35] already introduced the idea of dividing an input patch into several regions in order to better exploit the spatial information. The paper focuses on the selection and extraction of a number of predefined regions surrounding a pixel of interest that are later fed to several CNNs. More examples of the use of the spatial correlation between pixels in a scene can be found in [10], where the authors define pixel-pair features. During the training phase, sample pairs are fed to a CNN architecture that, once trained, is used during the testing phase to compute the labels of the samples surrounding the pixel being tested; the final label is then obtained by a majority vote over the outputs. A similar approach is taken in [11] with pixel-block pairs (PBPs), where the authors further build upon that idea by adding explicit spatial information to the new PBP features. These proposals try to achieve a goal that is similar, in essence, to what DWS does by using the central pixel of each superpixel in order to extract a patch. By selecting the central pixel, the probability of obtaining data from a region with high homogeneity is maximized, yielding an increase in classification accuracy.
To the best of our knowledge, no systematic comparison of data augmentation techniques focused on superpixel-based classification has been published yet in the field of remote sensing. Existing papers focus on pixel-based classification, and the datasets usually comprise small, low-resolution scenes. An additional difficulty when comparing the results of different proposals arises from the use of different models or network architectures to obtain them, which makes assessing the relative quality of the techniques a challenging endeavor. In this work we provide a view of the current data augmentation landscape through a comparison that demonstrates the effectiveness of the proposed DWS approach.
The experimental results for all the datasets show that techniques based on the DWS proposal outperform the other techniques from the literature considered in this comparison for the classification of large, high-resolution images. It is important to note that, using DWS, training is performed using a single observation per superpixel, reducing the amount of data that has to be evaluated by over two orders of magnitude compared to pixel-based schemes. In most of the previous tests, augmentation techniques making use of the DWS framework showed less dispersion in the results across runs for all scenes, with noticeably smaller standard deviations than traditional techniques. Based on the evidence observed, dual-rotate 16×, with an increase in OA of 2.01%, should be the preferred augmentation technique when processing small, low-resolution scenes, and dual-flip 16×, with an increase in OA of 1.51%, should be the best method for large, high-resolution scenes.
As part of future work, we plan to study the viability of further improving the data augmentation techniques for images containing irregular structures when performing superpixel-based classification. The possibility of adding some parametrization to the augmentation techniques based on the characteristics of the patches extracted from the superpixels is being considered.

5. Conclusions

In this work, the DWS data augmentation framework for superpixel-based DL classification of large hyperspectral scenes is presented. DWS relies on patch extraction using a superpixel segmentation obtained by the application of the SLIC algorithm in order to reduce the complexity of the classification process. The extracted patches undergo patch subdivision, creating two regions over which transformations are independently applied. Four data augmentation techniques based on the DWS framework using rotate and flip transformations are proposed: inner-rotate, dual-rotate, inner-flip and dual-flip. These techniques can also be used for pixel-based classification with minimal changes.
A comprehensive comparison of the proposals to other data augmentation techniques from the literature was carried out for both superpixel and pixel-based classification scenarios in terms of classification accuracy and execution times. The results obtained show that the proposed DWS approach successfully manages to reduce overfitting and increase the generalization capabilities of the resulting models. Execution times are also reduced when compared to traditional pixel-based classification schemes. Based on the results obtained, the DWS-based dual-rotate 16× is the preferred augmentation technique when processing small, low-resolution scenes, and dual-flip 16× is the best method for large, high-resolution scenes.

Author Contributions

Conceptualization, F.A.; data curation, Á.A.; formal analysis, F.A. and D.B.H.; investigation, Á.A.; methodology, F.A. and D.B.H.; resources, Á.A., F.A. and D.B.H.; software, Á.A.; supervision, F.A. and D.B.H.; validation, Á.A., F.A. and D.B.H.; visualization, Á.A.; writing—original draft, Á.A.; writing—review and editing, Á.A., F.A. and D.B.H. All authors have read and agreed to the published version of the manuscript.

Funding

The images of the Galicia dataset were obtained in partnership with the Babcock company, supported in part by the Civil Program UAVs Initiative, promoted by the Xunta de Galicia. This work was supported in part by Ministerio de Ciencia e Innovación, Government of Spain (grant numbers PID2019-104834GB-I00 and BES-2017-080920), and Consellería de Educación, Universidade e Formación Profesional (grant number ED431C 2018/19, and accreditation 2019–2022 ED431G-2019/04). All are co-funded by the European Regional Development Fund (ERDF).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AA      average accuracy
CNN     convolutional neural network
DL      deep learning
DWS     dual-window superpixel
FE      feature extraction
HSI     hyperspectral image
OA      overall accuracy
RO      random occlusion
Spp.    superpixels
UAV     unmanned aerial vehicle

References

  1. Ghamisi, P.; Maggiori, E.; Li, S.; Souza, R.; Tarabalka, Y.; Moser, G.; De Giorgi, A.; Fang, L.; Chen, Y.; Chi, M.; et al. New frontiers in spectral-spatial hyperspectral image classification: The latest advances based on mathematical morphology, Markov random fields, segmentation, sparse representation, and deep learning. IEEE Geosci. Remote Sens. Mag. 2018, 6, 10–43. [Google Scholar] [CrossRef]
  2. Ghamisi, P.; Plaza, J.; Chen, Y.; Li, J.; Plaza, A.J. Advanced spectral classifiers for hyperspectral images: A review. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–32. [Google Scholar] [CrossRef] [Green Version]
  3. Chen, Y.; Zhao, X.; Jia, X. Spectral—Spatial classification of hyperspectral data based on deep belief network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2381–2392. [Google Scholar] [CrossRef]
  4. Lee, H.; Kwon, H. Going deeper with contextual CNN for hyperspectral image classification. IEEE Trans. Image Process. 2017, 26, 4843–4855. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Audebert, N.; Le Saux, B.; Lefèvre, S. Deep learning for classification of hyperspectral data: A comparative review. IEEE Geosci. Remote Sens. Mag. 2019, 7, 159–173. [Google Scholar] [CrossRef] [Green Version]
  6. Paoletti, M.; Haut, J.; Plaza, J.; Plaza, A. Deep learning classifiers for hyperspectral imaging: A review. ISPRS J. Photogramm. Remote Sens. 2019, 158, 279–317. [Google Scholar] [CrossRef]
  7. Lawrence, S.; Giles, C.L.; Tsoi, A.C.; Back, A.D. Face recognition: A convolutional neural-network approach. IEEE Trans. Neural Netw. 1997, 8, 98–113. [Google Scholar] [CrossRef] [Green Version]
  8. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [Google Scholar] [CrossRef] [Green Version]
  9. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  10. Li, W.; Wu, G.; Zhang, F.; Du, Q. Hyperspectral image classification using deep pixel-pair features. IEEE Trans. Geosci. Remote Sens. 2016, 55, 844–853. [Google Scholar] [CrossRef]
  11. Li, W.; Chen, C.; Zhang, M.; Li, H.; Du, Q. Data augmentation for hyperspectral image classification with deep CNN. IEEE Geosci. Remote Sens. Lett. 2018, 16, 593–597. [Google Scholar] [CrossRef]
  12. Nalepa, J.; Myller, M.; Kawulok, M. Training-and test-time data augmentation for hyperspectral image segmentation. IEEE Geosci. Remote Sens. Lett. 2019, 17, 292–296. [Google Scholar] [CrossRef]
  13. Haut, J.M.; Paoletti, M.E.; Plaza, J.; Plaza, A.; Li, J. Hyperspectral image classification using random occlusion data augmentation. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1751–1755. [Google Scholar] [CrossRef]
  14. Alipourfard, T.; Arefi, H. Virtual Training Sample Generation by Generative Adversarial Networks for Hyperspectral Images Classification. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, 42, 63–69. [Google Scholar] [CrossRef] [Green Version]
  15. Audebert, N.; Le Saux, B.; Lefèvre, S. Generative adversarial networks for realistic synthesis of hyperspectral samples. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 4359–4362. [Google Scholar]
  16. Zhu, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Generative adversarial networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5046–5063. [Google Scholar] [CrossRef]
  17. He, Z.; Shen, Y.; Zhang, M.; Wang, Q.; Wang, Y.; Yu, R. Spectral-spatial hyperspectral image classification via SVM and superpixel segmentation. In Proceedings of the 2014 IEEE International Instrumentation and Measurement Technology Conference (I2MTC) Proceedings, Montevideo, Uruguay, 12–15 May 2014; pp. 422–427. [Google Scholar]
  18. Fang, L.; Li, S.; Duan, W.; Ren, J.; Benediktsson, J.A. Classification of hyperspectral images by exploiting spectral–spatial information of superpixel via multiple kernels. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6663–6674. [Google Scholar] [CrossRef] [Green Version]
  19. Priya, T.; Prasad, S.; Wu, H. Superpixels for spatially reinforced Bayesian classification of hyperspectral images. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1071–1075. [Google Scholar] [CrossRef]
  20. Liu, Y.; Cao, G.; Sun, Q.; Siegel, M. Hyperspectral classification via deep networks and superpixel segmentation. Int. J. Remote Sens. 2015, 36, 3459–3482. [Google Scholar] [CrossRef]
  21. Cao, J.; Chen, Z.; Wang, B. Deep convolutional networks with superpixel segmentation for hyperspectral image classification. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 3310–3313. [Google Scholar]
  22. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282. [Google Scholar] [CrossRef] [Green Version]
  23. Petersson, H.; Gustafsson, D.; Bergstrom, D. Hyperspectral image analysis using deep learning—A review. In Proceedings of the 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), Oulu, Finland, 12–15 December 2016; pp. 1–6. [Google Scholar]
  24. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC Superpixels; Technical Report; EPFL: Lausanne, Switzerland, 2010. [Google Scholar]
  25. Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv 2015, arXiv:cs.LG/1511.07289. [Google Scholar]
  26. Dozat, T. Incorporating Nesterov Momentum into Adam. In Proceedings of the 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
  27. Hyperspectral Remote Sensing Scenes. Available online: http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes (accessed on 13 October 2020).
  28. Bascoy, P.G.; Garea, A.S.; Heras, D.B.; Argüello, F.; Ordóñez, A. Texture-based analysis of hydrographical basins with multispectral imagery. In Proceedings of the Remote Sensing for Agriculture, Ecosystems, and Hydrology XXI. International Society for Optics and Photonics, Strasbourg, France, 9–11 September 2019; Volume 11149, p. 111490Q. [Google Scholar]
  29. MicaSense RedEdge MX Multispectral Camera. Available online: https://micasense.com/rededge-mx/ (accessed on 13 October 2020).
  30. He, L.; Li, J.; Liu, C.; Li, S. Recent advances on spectral–spatial hyperspectral image classification: An overview and new guidelines. IEEE Trans. Geosci. Remote Sens. 2017, 56, 1579–1597. [Google Scholar] [CrossRef]
  31. Richards, J.A.; Richards, J. Remote Sensing Digital Image Analysis; Springer: Berlin/Heidelberg, Germany, 1999; Volume 3. [Google Scholar]
  32. Zhang, H.; Li, Y.; Zhang, Y.; Shen, Q. Spectral-spatial classification of hyperspectral imagery using a dual-channel convolutional neural network. Remote Sens. Lett. 2017, 8, 438–447. [Google Scholar] [CrossRef] [Green Version]
  33. Yu, X.; Wu, X.; Luo, C.; Ren, P. Deep learning in remote sensing scene classification: A data augmentation enhanced convolutional neural network framework. Gisci. Remote Sens. 2017, 54, 741–758. [Google Scholar] [CrossRef] [Green Version]
  34. Slavkovikj, V.; Verstockt, S.; De Neve, W.; Van Hoecke, S.; Van de Walle, R. Hyperspectral image classification with convolutional neural networks. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; Association for Computing Machinery: New York, NY, USA; pp. 1159–1162.
  35. Zhang, M.; Li, W.; Du, Q. Diverse region-based CNN for hyperspectral image classification. IEEE Trans. Image Process. 2018, 27, 2623–2634. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Cropped section of the Salinas scene after SLIC superpixel segmentation.
Figure 2. Patch preparation stages of the dual-window superpixel data augmentation (DWS) framework.
Figure 3. Details of the patch extraction and patch subdivision stages.
Figure 4. List of data augmentation techniques considered. The red orientation marks (dots) are shown to indicate the orientation of the data relative to the original image.
Figure 5. Proposed dual-flip 16× data augmentation technique based on the DWS framework. The red orientation marks (dots) are shown to indicate the orientation of the data relative to the original image.
Figure 6. Classification scheme and CNN architecture.
Figure 7. False color composite and reference data for the selected images from the standard dataset: (a) Salinas, (b) PaviaU and (c) PaviaC.
Figure 8. False color composite and ground truth for images from the Galicia dataset: (a) Oitaven, (b) Eiras, (c) Ermidas and (d) Mestas.
Figure 9. PaviaC. Evolution of overall accuracy (OA) values with varying superpixel size (left) and varying patch size (right). The tests used a base patch size of 25 × 25 and a base superpixel size of 50.
Figure 10. Oitaven. Evolution of OA values varying the superpixel size (left) and the patch size (right). The tests used a base patch size of 25 × 25 and a base superpixel size of 800.
Figure 11. PaviaC. Evolution of loss (left) and accuracy (right) with no augmentation technique.
Figure 12. PaviaC. Evolution of loss (left) and accuracy (right) with dual-flip 16× data augmentation.
Figure 13. Oitaven. Evolution of loss (left) and accuracy (right) with no augmentation technique.
Figure 14. Oitaven. Evolution of loss (left) and accuracy (right) with dual-flip 16× data augmentation.
Figure 15. Classification maps for images from the Galicia dataset using superpixel-based classification with dual-flip 16× augmentation: (a) Oitaven, (b) Eiras, (c) Ermidas and (d) Mestas. Color codes are the same as those introduced in Table 3.
Table 1. Neural network architecture.

# | Type | Output Shape | Activation | Filter Size | Stride | Padding
1 | 2D-Convolutional | 25 × 25 × 128 | ELU | 1 × 1 | 1 × 1 | None
2 | 2D-MaxPooling | 12 × 12 × 128 | - | 2 × 2 | 2 × 2 | None
3 | 2D-Convolutional | 10 × 10 × 256 | ELU | 3 × 3 | 1 × 1 | None
4 | 2D-MaxPooling | 5 × 5 × 256 | - | 2 × 2 | 2 × 2 | None
5 | Dropout | 5 × 5 × 256 | - | - | - | -
6 | Flatten | 1 × 6400 | - | - | - | -
7 | Dense | 1 × 256 | ELU | - | - | -
8 | Dropout | 1 × 256 | - | - | - | -
9 | Dense | 1 × C | Softmax | - | - | -
Table 2. Salinas, PaviaU and PaviaC scenes. Color codes for the classes and number of samples in the training and testing sets used for both superpixel and pixel-based classification.
(a)
Salinas
Spp Pixels
# Classes Train Test Train Test
1Brocoli_gw_1131111575
2Brocoli_gw_2209152971
3Fallow2613111573
4Fallow_r_p17991120
5Fallow_smooth3716152117
6Stubble5714223137
7Celery266142867
8Grapes_unt11641588952
9Soil_v_d7420264915
10Corn_s_g_w5812242615
11Lettuce_4wk15108842
12Lettuce_5wk2811151528
13Lettuce_6wk1576726
14Lettuce_7wk2136848
15Vinyard_unt7726385781
16Vinyard_v_t162101426
(b)
PaviaU PaviaC
Spp Pixels Spp Pixels
#Classes Train Test Train Test Classes Train Test Train Test
1Asphalt8729355207Water83330234052,377
2Meadows217709314,842Trees16654436020
3Gravel23791673Asphalt4415192457
4Trees6117172464S-B Bricks368132149
5Metal21481079Bitumen8920395263
6Bare soil4219264004Tiles18855487328
7Bitumen16441065Shadows5822345821
8Bricks5616292888Meadows51617121834,057
9Shadows2163754Bare Soil8625142280
Table 3. Oitaven, Ermidas, Eiras and Mestas scenes. Color codes for the classes and number of samples in the training and testing sets used for superpixel-based classification. The "-" symbol denotes the absence of samples for the specified class.
OitavenErmidas
Total Total
#ClassesSpp.PixelsTrainTestClassesSpp.PixelsTrainTest
1Water611309,248367120Water322163,93018370
2Oak37101,374,8892227739Oak1992804,0401201390
3Tiles19278,78511248Tiles224138,67813733
4Meadows44642,440,3312709863Meadows59013,423,50635501142
5Asphalt9143,8615214Asphalt1632737,409976331
6Bare Soil434113,32925692Bare Soil388123,41625383
7Rock33679,15219272Rock859174,088522172
8Concrete358128,02220679Concrete6432,866458
9Endemic veg.968458,565592197Endemic veg.----
10Eucaliptus4260863,6982553838Eucaliptus45341,135,9972704926
11Pines717193,884430151Pines700184,54741 3149
Eiras Mestas
Total Total
#ClassesSpp.PixelsTrainTestClassesSpp.PixelsTrainTest
1Water1379734,617823278Water----
2Oak55592,067,38033321066Oak----
3Tiles178232122Tiles----
4Meadows2457773,9641476485Meadows89034,489,18953731780
5Asphalt14585,2098632Asphalt----
6Bare Soil40196,93524786Bare Soil27081,051,2751653487
7Rock1383144,800830286Rock----
8Concrete8527,0615616Concrete----
9Endemic veg.----Endemic veg.21480,85813840
10Eucaliptus318451235Eucaliptus9329449,504755401887
11Pines30995,13218770Pines----
Table 4. Superpixel-based classification performance (in percent) for Salinas, PaviaU and PaviaC scenes using the training and testing sets detailed in Table 2. The best result for each scene is displayed in bold.
(a) Salinas

Technique | OA | AA | κ
None | 93.86 ± 1.33 | 95.71 | 93.16
Rotate 4× | 94.02 ± 2.30 | 95.53 | 93.33
Inner-Rotate 4× * | 95.18 ± 0.89 | 95.71 | 94.63
Dual-Rotate 4× * | 86.24 ± 3.23 | 89.83 | 84.64
Dual-Rotate 16× * | 97.04 ± 0.35 | 97.45 | 96.71
Flip 4× | 95.02 ± 1.05 | 94.34 | 94.46
Inner-Flip 4× * | 95.66 ± 0.63 | 95.95 | 95.17
Dual-Flip 4× * | 87.30 ± 2.00 | 91.05 | 85.80
Dual-Flip 16× * | 96.55 ± 0.44 | 96.36 | 96.15
PVS(+/−) 4× [12] | 95.22 ± 0.46 | 96.30 | 94.67
PVS(+/−) 16× [12] | 95.97 ± 0.37 | 95.96 | 95.51
Random Occlusion 4× [13] | 95.22 ± 0.50 | 95.24 | 94.68
Random Occlusion 16× [13] | 95.85 ± 0.42 | 96.59 | 95.38
(b) PaviaU and PaviaC

Technique | OA (PaviaU) | AA | κ | OA (PaviaC) | AA | κ
None | 89.88 ± 0.88 | 87.06 | 86.50 | 96.76 ± 0.70 | 90.00 | 95.40
Rotate 4× | 93.02 ± 0.63 | 89.10 | 90.71 | 97.70 ± 0.18 | 93.17 | 96.74
Inner-Rotate 4× * | 91.46 ± 1.05 | 88.28 | 88.67 | 97.33 ± 0.20 | 92.07 | 96.22
Dual-Rotate 4× * | 88.90 ± 1.27 | 81.14 | 85.20 | 95.84 ± 0.84 | 87.57 | 94.11
Dual-Rotate 16× * | 91.99 ± 0.56 | 88.02 | 89.39 | 97.49 ± 0.20 | 93.11 | 96.44
Flip 4× | 93.69 ± 1.02 | 90.81 | 91.62 | 97.60 ± 0.23 | 92.80 | 96.60
Inner-Flip 4× * | 92.33 ± 0.49 | 87.86 | 89.80 | 97.12 ± 0.24 | 91.21 | 95.92
Dual-Flip 4× * | 88.50 ± 1.26 | 81.89 | 84.52 | 96.15 ± 0.54 | 88.39 | 94.54
Dual-Flip 16× * | 91.00 ± 0.63 | 84.26 | 88.00 | 97.58 ± 0.18 | 92.96 | 96.57
PVS(+/−) 4× [12] | 89.82 ± 0.44 | 84.63 | 86.42 | 97.11 ± 0.20 | 91.08 | 95.90
PVS(+/−) 16× [12] | 90.68 ± 0.74 | 86.41 | 87.57 | 97.36 ± 0.16 | 91.50 | 96.25
Random Occlusion 4× [13] | 89.45 ± 0.58 | 84.61 | 85.96 | 97.07 ± 0.24 | 90.11 | 95.85
Random Occlusion 16× [13] | 91.08 ± 0.48 | 87.94 | 88.12 | 97.15 ± 0.19 | 91.12 | 95.95
* Denotes the techniques proposed in this work.
Table 5. Superpixel-based classification performance (in percent) for Oitaven, Ermidas, Eiras and Mestas scenes using the training and testing sets detailed in Table 3. The best result for each scene is displayed in bold.

Technique | OA (Oitaven) | AA | κ | OA (Ermidas) | AA | κ
None | 94.23 ± 0.44 | 90.46 | 92.37 | 97.20 ± 0.38 | 92.92 | 96.00
Rotate 4× | 95.18 ± 0.46 | 92.40 | 93.62 | 97.83 ± 0.27 | 94.81 | 96.91
Inner-Rotate 4× * | 95.39 ± 0.55 | 92.22 | 93.90 | 98.08 ± 0.29 | 95.61 | 97.25
Dual-Rotate 4× * | 94.96 ± 0.33 | 91.45 | 93.33 | 97.70 ± 0.16 | 94.11 | 96.71
Dual-Rotate 16× * | 96.20 ± 0.14 | 93.76 | 94.97 | 98.32 ± 0.11 | 96.36 | 97.61
Flip 4× | 95.21 ± 0.39 | 92.78 | 93.67 | 98.01 ± 0.26 | 95.49 | 97.16
Inner-Flip 4× * | 95.56 ± 0.20 | 93.08 | 94.13 | 98.12 ± 0.22 | 95.86 | 97.31
Dual-Flip 4× * | 95.17 ± 0.30 | 91.78 | 93.62 | 97.91 ± 0.26 | 95.58 | 97.02
Dual-Flip 16× * | 96.19 ± 0.13 | 93.16 | 94.97 | 98.56 ± 0.15 | 97.00 | 97.95
PVSA(+/−) 4× [12] | 95.32 ± 0.25 | 91.26 | 93.81 | 97.82 ± 0.29 | 95.06 | 96.88
PVSA(+/−) 16× [12] | 95.75 ± 0.15 | 92.66 | 94.39 | 98.05 ± 0.18 | 95.35 | 97.22
Random Occlusion 4× [13] | 93.81 ± 0.49 | 89.18 | 91.81 | 97.19 ± 0.30 | 93.67 | 95.99
Random Occlusion 16× [13] | 95.68 ± 0.22 | 92.07 | 94.28 | 98.08 ± 0.24 | 96.08 | 97.26

Technique | OA (Eiras) | AA | κ | OA (Mestas) | AA | κ
None | 95.92 ± 0.53 | 88.22 | 93.85 | 91.05 ± 0.21 | 85.23 | 84.85
Rotate 4× | 96.82 ± 0.32 | 90.99 | 95.21 | 90.76 ± 0.41 | 84.24 | 84.32
Inner-Rotate 4× * | 96.82 ± 0.35 | 88.19 | 95.21 | 91.32 ± 0.18 | 86.07 | 85.30
Dual-Rotate 4× * | 96.36 ± 0.19 | 85.63 | 94.51 | 91.06 ± 0.13 | 84.37 | 84.86
Dual-Rotate 16× * | 97.08 ± 0.25 | 88.37 | 95.60 | 91.67 ± 0.10 | 87.65 | 85.93
Flip 4× | 96.92 ± 0.24 | 89.47 | 95.36 | 90.75 ± 0.34 | 83.62 | 84.30
Inner-Flip 4× * | 96.96 ± 0.17 | 89.65 | 95.42 | 91.27 ± 0.24 | 86.91 | 85.24
Dual-Flip 4× * | 96.79 ± 0.22 | 87.15 | 95.16 | 91.15 ± 0.21 | 85.69 | 85.02
Dual-Flip 16× * | 97.13 ± 0.20 | 88.81 | 95.67 | 91.70 ± 0.09 | 87.51 | 85.97
PVSA(+/−) 4× [12] | 96.35 ± 0.26 | 85.70 | 94.49 | 91.31 ± 0.24 | 86.89 | 85.30
PVSA(+/−) 16× [12] | 96.36 ± 0.26 | 86.27 | 94.49 | 91.35 ± 0.14 | 85.32 | 85.34
Random Occlusion 4× [13] | 95.79 ± 0.28 | 85.13 | 93.64 | 90.72 ± 0.19 | 82.66 | 84.27
Random Occlusion 16× [13] | 96.83 ± 0.20 | 86.15 | 95.21 | 91.47 ± 0.11 | 86.24 | 85.54
* Denotes the techniques proposed in this work.
Table 6. Superpixel-based overall accuracy delta (in percent) per augmentation technique over baseline performance for all the scenes from the standard and Galicia datasets. The best average result for each dataset is displayed in bold. RO denotes random occlusion.

Scene | Rotate 4× | Inner-Rotate 4× | Dual-Rotate 4× | Dual-Rotate 16× | Flip 4× | Inner-Flip 4× | Dual-Flip 4× | Dual-Flip 16× | PVSA(+/−) 4× | PVSA(+/−) 16× | RO 4× | RO 16×
Salinas | 0.16 | 1.32 | −7.63 | 3.18 | 1.16 | 1.80 | −6.56 | 2.69 | 1.36 | 2.11 | 1.36 | 1.99
PaviaU | 3.13 | 1.57 | −0.98 | 2.11 | 3.80 | 2.45 | −1.38 | 1.11 | −0.07 | 0.80 | −0.44 | 1.19
PaviaC | 0.93 | 0.57 | −0.92 | 0.72 | 0.84 | 0.36 | −0.62 | 0.82 | 0.35 | 0.59 | 0.31 | 0.38
Average | 1.41 | 1.16 | −3.18 | 2.01 | 1.93 | 1.54 | −2.85 | 1.54 | 0.55 | 1.17 | 0.41 | 1.19
Oitaven | 0.95 | 1.16 | 0.72 | 1.97 | 0.98 | 1.33 | 0.94 | 1.96 | 1.09 | 1.52 | −0.42 | 1.45
Ermidas | 0.63 | 0.88 | 0.50 | 1.12 | 0.81 | 0.92 | 0.71 | 1.37 | 0.62 | 0.85 | −0.01 | 0.88
Eiras | 0.90 | 0.90 | 0.44 | 1.16 | 1.00 | 1.04 | 0.88 | 1.21 | 0.43 | 0.44 | −0.13 | 0.91
Mestas | −0.29 | 0.27 | 0.01 | 0.62 | −0.30 | 0.22 | 0.10 | 0.65 | 0.26 | 0.30 | −0.33 | 0.42
Average | 0.83 | 0.98 | 0.55 | 1.42 | 0.93 | 1.09 | 0.84 | 1.51 | 0.71 | 0.94 | −0.19 | 1.08
Table 7. Execution times (in seconds) for the classification of the PaviaC scene and for each augmentation technique. Prediction time refers to the computation for the whole image.

Method | Training (superpixel-based) | Prediction | Total | Training (pixel-based) | Prediction | Total
None | 76.48 | 5.09 | 81.57 | 31.26 | 2453.79 | 2485.05
Rotate 4× | 339.76 | 6.01 | 345.77 | 137.19 | 2645.13 | 2782.32
Inner-Rotate 4× * | 325.28 | 5.88 | 331.16 | 130.44 | 2611.15 | 2741.59
Dual-Rotate 4× * | 328.10 | 6.54 | 334.64 | 135.38 | 2550.93 | 2686.31
Dual-Rotate 16× * | 1305.47 | 6.16 | 1311.63 | 508.97 | 2770.77 | 3279.74
Flip 4× | 325.65 | 5.30 | 330.95 | 129.62 | 2525.46 | 2655.08
Inner-Flip 4× * | 327.90 | 5.35 | 333.25 | 132.81 | 2541.49 | 2674.30
Dual-Flip 4× * | 318.98 | 6.01 | 324.99 | 132.51 | 2528.27 | 2660.78
Dual-Flip 16× * | 1320.23 | 7.47 | 1327.70 | 505.77 | 2739.75 | 3245.52
PVSA(+/−) 4× [12] | 329.50 | 7.62 | 337.12 | 133.04 | 2767.88 | 2900.92
PVSA(+/−) 16× [12] | 1306.37 | 7.53 | 1313.90 | 509.75 | 2770.44 | 3280.19
Random Occlusion 4× [13] | 325.10 | 7.98 | 333.08 | 127.09 | 2724.73 | 2851.82
Random Occlusion 16× [13] | 1297.85 | 7.58 | 1305.43 | 510.65 | 2752.50 | 3263.15
* Denotes the techniques proposed in this work.
Table 8. Pixel-based classification performance (in percent) for Salinas, PaviaU and PaviaC scenes using the training and testing sets detailed in Table 2. The best result for each scene is displayed in bold.
(a) Salinas

Technique | OA | AA | κ
None | 82.76 ± 2.42 | 86.15 | 80.74
Rotate 4× | 87.74 ± 1.21 | 89.07 | 86.39
Inner-Rotate 4× * | 87.96 ± 0.35 | 89.75 | 86.60
Dual-Rotate 4× * | 87.12 ± 0.70 | 89.58 | 85.65
Dual-Rotate 16× * | 91.02 ± 0.71 | 93.38 | 90.00
Flip 4× | 88.84 ± 0.91 | 91.14 | 87.58
Inner-Flip 4× * | 87.92 ± 1.39 | 89.39 | 86.55
Dual-Flip 4× * | 87.73 ± 1.77 | 90.01 | 86.35
Dual-Flip 16× * | 93.40 ± 0.17 | 94.80 | 92.65
PVS(+/−) 4× [12] | 86.57 ± 0.56 | 86.91 | 85.04
PVS(+/−) 16× [12] | 91.15 ± 0.25 | 91.62 | 90.12
Random Occlusion 4× [13] | 85.24 ± 0.64 | 86.95 | 83.54
Random Occlusion 16× [13] | 92.21 ± 0.34 | 94.67 | 91.56
(b) PaviaU and PaviaC

Technique | OA (PaviaU) | AA | κ | OA (PaviaC) | AA | κ
None | 84.30 ± 0.62 | 71.69 | 78.91 | 96.82 ± 0.18 | 88.78 | 95.50
Rotate 4× | 88.02 ± 1.45 | 77.60 | 83.88 | 97.32 ± 0.35 | 91.52 | 96.21
Inner-Rotate 4× * | 85.24 ± 0.73 | 76.25 | 80.23 | 97.11 ± 0.13 | 90.01 | 95.89
Dual-Rotate 4× * | 85.59 ± 0.54 | 77.86 | 80.92 | 96.85 ± 0.45 | 89.78 | 95.53
Dual-Rotate 16× * | 86.91 ± 0.28 | 77.54 | 82.58 | 97.63 ± 0.07 | 92.71 | 96.63
Flip 4× | 85.51 ± 5.22 | 78.56 | 81.10 | 98.23 ± 0.02 | 94.59 | 97.50
Inner-Flip 4× * | 88.17 ± 1.89 | 82.33 | 84.33 | 97.43 ± 0.10 | 90.77 | 96.36
Dual-Flip 4× * | 87.02 ± 0.90 | 78.18 | 82.86 | 97.32 ± 0.35 | 91.36 | 96.20
Dual-Flip 16× * | 86.51 ± 0.86 | 78.02 | 82.14 | 97.27 ± 0.05 | 91.33 | 96.13
PVS(+/−) 4× [12] | 85.51 ± 0.89 | 77.91 | 80.70 | 97.13 ± 0.07 | 91.03 | 95.93
PVS(+/−) 16× [12] | 85.26 ± 0.56 | 76.21 | 80.33 | 96.54 ± 0.21 | 87.78 | 95.09
Random Occlusion 4× [13] | 82.94 ± 0.45 | 70.84 | 77.03 | 96.74 ± 0.17 | 89.74 | 95.38
Random Occlusion 16× [13] | 84.22 ± 0.61 | 68.47 | 78.78 | 96.27 ± 0.21 | 87.94 | 94.70
* Denotes the techniques proposed in this work.