**Josué López <sup>1,\*</sup>, Deni Torres <sup>1</sup>, Stewart Santos <sup>2</sup> and Clement Atzberger <sup>3</sup>**


Received: 22 November 2019; Accepted: 11 January 2020; Published: 5 February 2020

**Abstract:** This work addresses two issues simultaneously: data compression at the input space and semantic segmentation. Semantic segmentation of remotely sensed multi- or hyperspectral images through deep learning (DL) artificial neural networks (ANNs) delivers as output the corresponding matrix of pixels classified elementwise, achieving competitive performance metrics. With technological progress, current remote sensing (RS) sensors have more spectral bands and higher spatial resolution than before, which means a greater number of pixels in the same area. However, the more spectral bands and pixels, the higher the computational complexity and the longer the processing time. Without dimensionality reduction, the classification task therefore becomes challenging, particularly when large areas have to be processed. To solve this problem, our approach maps an RS image, i.e., a third-order tensor, into a core tensor representative of the input image, with the same spatial domain but a lower number of new tensor bands, using a Tucker decomposition (TKD). A new input space with reduced dimensionality is thereby built. The core tensor is found with the higher-order orthogonal iteration (HOOI) algorithm. A fully convolutional network (FCN) is then employed to classify each core tensor at the pixel level. The whole framework, called here HOOI-FCN, achieves performance metrics competitive with state-of-the-art methods for semantic segmentation of RS multispectral images (MSIs), while significantly reducing computational complexity and, thereby, processing time.
We used a Sentinel-2 image data set from Central Europe as a case study, for which our framework outperformed other methods, including the FCN itself, which reached an average pixel accuracy (PA) of 90% (computational time ∼90 s) with nine spectral bands; HOOI-FCN achieved a higher average PA of 91.97% (computational time ∼36.5 s) and an average PA of 91.56% (computational time ∼9.5 s) for seven and five new tensor bands, respectively.

**Keywords:** fully convolutional network; semantic segmentation; spectral image; tensor decomposition

#### **1. Introduction**

Remote sensing (RS) images are of great use in many earth observation applications, such as agriculture, forest monitoring, disaster prevention, security affairs, and others [1]. The recent and upcoming availability of multispectral and hyperspectral satellites facilitates specific tasks, such as detection, classification, and semantic segmentation. In semantic segmentation, also called pixel-wise classification, each pixel in an RS image is assigned to one class [1]. This classification becomes easier when higher-dimensional spectral information is acquired [1]. Spectral systems split the incoming radiance with physical filters and provide a vector of spectral reflectance values called a spectral signature. The remotely sensed spectral signatures enable a precise interpretation and recognition of different elements of interest covering the earth surface [2].

Supervised and unsupervised classification of RS images is a very active research area in spectral analysis [3]. To reduce the data dimensionality and concentrate the information into fewer features, a once widely used approach was to define various indices to facilitate the classification of diverse land cover [4]. For instance, the normalized difference vegetation index (NDVI) [5] and the normalized difference water index (NDWI) [6] use combinations of visible and near-infrared (NIR) spectral reflectance to assess land cover, vegetation vitality, and water status [4]. Additionally, supervised machine learning techniques such as random forest [7], support vector machines (SVMs) [8,9], decision trees [10], and ANNs [11] have been used for RS spectral image classification and have achieved very high accuracy rates [12]. More recently, convolutional neural networks (CNNs) have been used for semantic segmentation of multispectral images (MSIs), promising to be an alternative for solving semantic segmentation issues [13].

The high spectral redundancy of spectral images produces a huge number of unnecessary computations in classification/segmentation algorithms. It is therefore advisable to implement these algorithms together with a dimensionality reduction preprocessing step [14]. Spectral data are stored as three-dimensional arrays, so it is natural to use tensor decomposition (TD) methods [15] for preprocessing, to reduce the high redundancy while avoiding information loss [14]. Unlike matrix-based decomposition algorithms such as principal component analysis (PCA) [16] and singular value decomposition (SVD), the TD approach treats spectral data as a third-order tensor, preserving the spatial information that supports the pixel-wise classification task.

In this work we aim at addressing two main issues: data compression at the input space and semantic segmentation, i.e., pixel-wise classification of RS imagery. We introduce a spectral data preprocessing step that preserves the tensor structure and reduces information loss through tensor algebra [17], with the ultimate aim of reducing processing time while keeping high accuracy in subsequent semantic segmentation CNNs. The preprocessing compresses the MSI by preserving the spatial domain while reducing the spectral domain: in the context of tensor algebra [17], the original tensor is decomposed into a core tensor of the same order but much lower dimensionality, multiplied by a matrix in each mode. The core tensor, with lower rank than the original data, is used as the input to the semantic segmentation ANN instead of the MSI, decreasing the number of computations and, in turn, the execution time. Previous experimental results demonstrated high performance in semantic segmentation with circa 10× speedup in execution time [18].
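The spectral-mode compression idea above can be sketched in a few lines of numpy. The sketch below reduces only the spectral mode while keeping the spatial modes at full rank, in which case the Tucker core reduces to a truncated SVD of the spectral unfolding; this is an illustrative simplification under our own naming, not the authors' HOOI implementation.

```python
import numpy as np

def compress_spectral_mode(X, k):
    """Compress an H x W x B image tensor into a core with k new tensor bands.

    Only the spectral mode is reduced; the spatial modes keep full rank,
    so the optimal spectral factor is given by the leading left singular
    vectors of the spectral unfolding (a special case of Tucker/HOOI).
    """
    H, W, B = X.shape
    X3 = X.reshape(H * W, B).T              # spectral unfolding: B x (H*W)
    U, _, _ = np.linalg.svd(X3, full_matrices=False)
    Uk = U[:, :k]                           # leading k spectral components
    core = (Uk.T @ X3).T.reshape(H, W, k)   # core tensor: H x W x k
    return core, Uk

# toy multispectral cube: 4 x 4 pixels, 6 bands, compressed to 3 new bands
rng = np.random.default_rng(0)
X = rng.random((4, 4, 6))
core, Uk = compress_spectral_mode(X, k=3)
print(core.shape)  # (4, 4, 3)
```

The core keeps the full spatial grid, so a segmentation network can consume it directly in place of the original bands.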

The proposed framework can be applied to multispectral, hyperspectral, and even multitemporal datasets. As a particular case, in this study we performed experiments using an RS multispectral dataset from the European Space Agency (ESA) program Sentinel-2 [19] with five classes (soil, water, vegetation, cloud, and shadow).

#### *1.1. Related Work*

In recent years, spectral data for earth surface classification have been a very active research area. Methods proposed by Kemker et al. [11,20], Hamida et al. [21], and López et al. [18] use CNNs for pixel-wise classification of RS-MSIs. Nevertheless, processing raw spectral data with deep learning (DL) algorithms is computationally very expensive. Wang et al. [22] introduced a salient band selection method for hyperspectral images (HSIs) based on manifold ranking, and Li et al. [23] proposed a band selection method based on spectral shape similarity analysis of RS-HSIs to reduce computational complexity. However, some surface materials can only be distinguished in specific bands, so cutting off spectral bands can negatively affect further classification tasks.

More recently, tensor approaches for spectral image compression have been introduced; see Zhang et al. [24]. Many authors have adopted dimensionality reduction algorithms, such as PCA [16] and SVD, for spectral image compression. Other authors have made efforts to reduce the computational cost of CNNs for image classification by using TD algorithms [25,26]. Astrid et al. [25] proposed a CNN compression method based on the canonical polyadic decomposition (CPD) and the tensor power method, achieving a significant reduction in memory and computational cost. Chien et al. [26] presented a tensor-factorized ANN, which integrates TD and ANNs for multi-way feature extraction and classification. Nevertheless, although the idea is to compress data in order to reduce computational cost and processing time, these works compress or decompose the parameters within the network rather than the input data, which slows down the training of the semantic segmentation or classification network because the weights change within the tensor decomposition.

Recently, three works close to our research [27–29] were published. In [27], An et al. proposed an unsupervised tensor-based multiscale low-rank decomposition (T-MLRD) method for hyperspectral image dimensionality reduction, and Li et al. [28] proposed a low-complexity compression approach for multispectral images based on CNNs with nonnegative Tucker decomposition (NTD). Nevertheless, these methods reduce the tensor in every dimension, which is self-defeating for a segmentation CNN. Besides, the nonnegative decomposed tensor proposed in [28] causes slower convergence in DL algorithms. In [29], An et al. proposed a tensor discriminant analysis (TDA) model via compact feature representation, wherein traditional linear discriminant analysis was extended to tensor space to make the resulting feature representation more discriminative. However, this approach still degrades the spatial resolution, which disturbs CNN performance. See Table 1 for a summary of the related works.

**Table 1.** Related work in spectral imagery semantic segmentation.


#### *1.2. Contribution*

The contribution of this work is summarized into three main points:


The remainder of this work is organized as follows. Section 2 introduces tensor algebra notation and basic concepts to familiarize the reader with the symbology used in this paper. Section 3 presents the problem statement of this work and its mathematical formulation. In Section 4, CNN theory is described for classification and semantic segmentation. Section 5 presents the framework proposed for compression and semantic segmentation of spectral images. Experimental results are presented in Section 6. Finally, Sections 7 and 8 present a discussion and conclusions based on the results obtained in the experiments.

#### **2. Tensor Algebra Basic Concepts**

For this work we used the conventional tensor algebra notation [15]. Scalars, or zero-order tensors, are represented by italic lowercase letters, e.g., $a$. Vectors, or first-order tensors, are denoted by boldface lowercase letters, e.g., $\mathbf{a}$. Matrices, or second-order tensors, are denoted by boldface capital letters, e.g., $\mathbf{A}$, and tensors of order three or higher by boldface Euler script letters, e.g., $\mathcal{A}$. For an $N$-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, where $\mathbb{R}$ denotes the set of real numbers, $I_n$ indicates the size of the tensor in each mode $n \in \{1, \ldots, N\}$. An element of $\mathcal{A}$ is denoted with indices in lowercase letters, e.g., $a_{i_1 \ldots i_N}$, where $i_n$ indexes mode $n$ of $\mathcal{A}$ [17]. A fiber is the vector obtained by fixing every index of a tensor but one; for a third-order tensor it is denoted by $\mathbf{a}_{:i_2 i_3}$, $\mathbf{a}_{i_1 : i_3}$, and $\mathbf{a}_{i_1 i_2 :}$ for column, row, and tube fibers, respectively. A slice is the matrix obtained by fixing every index of a tensor but two; for a third-order tensor it is denoted by $\mathbf{A}_{i_1 ::}$, $\mathbf{A}_{: i_2 :}$, and $\mathbf{A}_{:: i_3}$, or more compactly, $\mathbf{A}_{i_1}$, $\mathbf{A}_{i_2}$, and $\mathbf{A}_{i_3}$ for horizontal, lateral, and frontal slices, respectively. Finally, $\mathbf{A}^{(n)}$ denotes the $n$-th matrix in a sequence of matrices [17].
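The fiber and slice definitions map directly onto array indexing. A small numpy illustration (numpy is our choice here; the paper does not prescribe an implementation):

```python
import numpy as np

# third-order tensor A of shape I1 x I2 x I3
A = np.arange(24).reshape(2, 3, 4)

# fibers: fix every index but one
col_fiber  = A[:, 1, 2]   # a_{: i2 i3}, column fiber, length I1
row_fiber  = A[0, :, 2]   # a_{i1 : i3}, row fiber, length I2
tube_fiber = A[0, 1, :]   # a_{i1 i2 :}, tube fiber, length I3

# slices: fix every index but two
horizontal = A[0, :, :]   # A_{i1}, horizontal slice, I2 x I3
lateral    = A[:, 1, :]   # A_{i2}, lateral slice, I1 x I3
frontal    = A[:, :, 2]   # A_{i3}, frontal slice, I1 x I2
```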

It is also necessary to introduce some tensor algebra operations and basic concepts used in later explanations. These definitions are taken verbatim from [17].

#### *2.1. Matricization*

The mode-$n$ matricization is the process of reordering the elements of a tensor into a matrix along mode $n$, and it is denoted as $\mathbf{A}_{(n)} \in \mathbb{R}^{I_n \times \prod_{m \neq n} I_m}$.
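A minimal numpy sketch of mode-$n$ matricization (the function name is our own; the exact column ordering of an unfolding varies by convention, but the $I_n \times \prod_{m \neq n} I_m$ shape is the same):

```python
import numpy as np

def unfold(A, n):
    """Mode-n matricization: bring mode n to the front, then flatten the rest,
    yielding an I_n x prod(I_m, m != n) matrix."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

A = np.arange(24).reshape(2, 3, 4)   # I1=2, I2=3, I3=4
print(unfold(A, 0).shape)  # (2, 12)
print(unfold(A, 1).shape)  # (3, 8)
print(unfold(A, 2).shape)  # (4, 6)
```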

#### *2.2. Outer Product*

The outer product of $N$ vectors, $\mathcal{X} = \mathbf{a}^{(1)} \circ \cdots \circ \mathbf{a}^{(N)}$, produces a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, where $\circ$ denotes the outer product and $\mathbf{a}^{(n)}$ denotes the $n$-th vector in the sequence; each element of the tensor is the product of the corresponding vector elements, i.e., $x_{i_1 i_2 \ldots i_N} = a^{(1)}_{i_1} \cdots a^{(N)}_{i_N}$.
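For instance, the outer product of two vectors yields a rank-one matrix whose $(i_1, i_2)$ entry is the product of the corresponding vector elements (an illustrative numpy sketch, with our own variable names):

```python
import numpy as np

a1 = np.array([1.0, 2.0])        # a^(1), I1 = 2
a2 = np.array([3.0, 4.0, 5.0])   # a^(2), I2 = 3

# outer product a^(1) o a^(2): x_{i1 i2} = a^(1)_{i1} * a^(2)_{i2}
X = np.einsum('i,j->ij', a1, a2)
print(X.shape)   # (2, 3)
print(X[1, 2])   # 2.0 * 5.0 = 10.0

# with a third vector the same einsum pattern builds a third-order tensor:
a3 = np.array([6.0, 7.0])
T = np.einsum('i,j,k->ijk', a1, a2, a3)
print(T.shape)   # (2, 3, 2)
```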

#### *2.3. Inner Product*

The inner product of two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is the sum of the products of their entries, i.e., $\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} a_{i_1 \ldots i_N} \, b_{i_1 \ldots i_N}$.
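In code, the tensor inner product is simply an elementwise multiplication followed by a full sum, equivalent to a dot product of the flattened tensors (an illustrative numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((2, 3, 4))
B = rng.random((2, 3, 4))

# <A, B> = sum over all entries of the elementwise products
inner = np.sum(A * B)

# equivalent formulation: dot product of the vectorized tensors
assert np.isclose(inner, A.ravel() @ B.ravel())
```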
