#### *6.3. Metrics*

#### 6.3.1. Pixel Accuracy (PA)

We used the PA metric to compute a ratio between the amount of correctly classified pixels and the total number of pixels as

$$PA = \frac{\sum_{c=0}^{C} p_{cc}}{\sum_{c=0}^{C} \sum_{d=0}^{C} p_{cd}} \tag{19}$$

where *C* is the total number of classes, *p<sub>cc</sub>* is the number of pixels of class *c* correctly assigned to class *c* (true positives), and *p<sub>cd</sub>* is the number of pixels of class *c* inferred to belong to class *d* (false positives for class *d* when *d* ≠ *c*). Table 3 lists the PA values for our proposed framework in comparison with other state-of-the-art methods. From Table 3, we can see that:


The PA and the computational times for FCN and HOOI-FCN with different numbers of tensor bands are shown in Figure 5.

**Figure 5.** Box and whiskers plot of the pixel accuracy (PA) for the 10 testing scenarios shown in Table 3.
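As an illustration of (19), the following minimal sketch computes the PA from a pair of label maps. The function name `pixel_accuracy` and the toy label maps are assumptions made for this example; they are not taken from our implementation.

```python
import numpy as np

def pixel_accuracy(y_true, y_pred, num_classes):
    """Pixel accuracy as in (19): correctly classified pixels over all pixels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # p[c, d] counts pixels of true class c that were predicted as class d.
    p = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(p, (y_true, y_pred), 1)
    # Numerator: sum of p[c, c]; denominator: sum of all p[c, d].
    return np.trace(p) / p.sum()

# Hypothetical 2 x 3 label maps with three classes (toy data, not from the dataset).
truth = np.array([[0, 1, 2], [2, 1, 0]])
pred = np.array([[0, 1, 1], [2, 1, 0]])
pa = pixel_accuracy(truth, pred, num_classes=3)
print(f"PA = {pa:.4f}")
```

In this toy case five of the six pixels are labeled correctly, so the PA is 5/6.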

#### 6.3.2. Relative Mean Square Error (rMSE)

In order to compute the reconstruction error of the tensor X for the implementation of HOOI, the rMSE was used:

$$rMSE\left(\hat{\mathcal{X}}\right) = \frac{1}{Q} \sum_{q=1}^{Q} \frac{\left\|\mathcal{X}_q - \hat{\mathcal{X}}_q\right\|_F^2}{\left\|\mathcal{X}_q\right\|_F^2},\tag{20}$$

where X<sub>*q*</sub> represents the *q*-th CNNMSI from our dataset with *Q* MSIs and X̂<sub>*q*</sub> its corresponding reconstruction computed by (4).
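A minimal sketch of (20) is given next, assuming that the original tensors and their reconstructions are available as NumPy arrays. The function name `rmse` and the toy tensors are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def rmse(originals, reconstructions):
    """Relative MSE as in (20), averaged over the Q tensors of a dataset."""
    ratios = []
    for x, x_hat in zip(originals, reconstructions):
        # With default arguments, np.linalg.norm returns the Frobenius norm
        # of the flattened array, also for third-order tensors.
        residual = np.linalg.norm(x - x_hat) ** 2
        energy = np.linalg.norm(x) ** 2
        ratios.append(residual / energy)
    return float(np.mean(ratios))

# Toy data: two 4 x 4 x 3 tensors, one reconstructed with a 10% error
# in every entry and one reconstructed exactly.
x1 = np.ones((4, 4, 3))
x2 = np.full((4, 4, 3), 2.0)
error = rmse([x1, x2], [0.9 * x1, x2.copy()])
print(f"rMSE = {error:.4f}")
```

The first tensor contributes a relative error of 0.01 and the second contributes 0, so the average is 0.005, i.e., 0.5%.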

Figure 6a shows the behavior of the reconstruction rMSE for our 100 training images for *J*<sub>3</sub> = 1, ... , *I*<sub>3</sub>. With this metric we can quantify how well the decomposition represents the input data. The rMSE is also one of the decisive parameters to set the value of *rank*<sub>3</sub>(X) = *J*<sub>3</sub>. To preserve a high performance in the pixel-wise classification task, we set the threshold *ψ* to a value for which the rMSE is less than or equal to 0.05%, since deeper decompositions decrease the PA to less than 90%, as we can see in Figure 5. For a rank decomposition (128, 128, 5) our rMSE is 0.04%, which means that we reduce the dimensionality of our input data to almost half (from nine to five bands) with a very low loss in performance. Besides, comparing this error with matrix-based methods such as PCA, we can see that our tensor-based decomposition produces a lower rMSE for every value of *J*<sub>3</sub> except for the first one.
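The selection of *J*<sub>3</sub> from the threshold *ψ* can be sketched as follows. This is not the HOOI implementation used in this work: the sketch assumes that only the third mode is truncated (*J*<sub>1</sub> = *I*<sub>1</sub> and *J*<sub>2</sub> = *I*<sub>2</sub>, as in the rank (128, 128, 5) case), where the best reconstruction is given by a truncated SVD of the mode-3 unfolding. The function name and the synthetic tensor are hypothetical.

```python
import numpy as np

def mode3_rank_for_threshold(x, psi):
    """Smallest J3 whose mode-3 truncation keeps the relative error <= psi.

    Assumption: modes 1 and 2 are not truncated, so the relative squared
    reconstruction error equals the discarded share of the squared
    singular values of the mode-3 unfolding.
    """
    i1, i2, i3 = x.shape
    unfolded = x.reshape(i1 * i2, i3)            # rows: pixels, columns: bands
    s = np.linalg.svd(unfolded, compute_uv=False)
    energy = s ** 2
    errors = 1.0 - np.cumsum(energy) / energy.sum()
    errors = np.clip(errors, 0.0, None)          # remove tiny negative round-off
    for j3, err in enumerate(errors, start=1):
        if err <= psi:
            return j3, errors
    return i3, errors

# Synthetic 8 x 8 x 9 tensor whose nine bands are mixtures of two sources,
# so two tensor bands are enough to reconstruct it.
rng = np.random.default_rng(0)
sources = rng.normal(size=(64, 2))
mixing = rng.normal(size=(2, 9))
x = (sources @ mixing).reshape(8, 8, 9)

j3, errors = mode3_rank_for_threshold(x, psi=5e-4)   # psi = 0.05%
print("selected J3:", j3)
```

For real MSIs the error curve decays gradually, as in Figure 6a, and the same loop returns the first *J*<sub>3</sub> that falls below the threshold.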

#### 6.3.3. Orthogonality Degree of Factor Matrices and Tensor Bands

One way to analyze the efficiency of the HOOI algorithm is to compute the orthogonality degree of the core tensor G and the projection matrices **U**<sup>(*n*)</sup>. As we mentioned in Section 3, we use the all-orthogonality property proposed in [32] and described in (7) and (8) to evaluate the orthogonality degree of our core tensors. Table 4 shows the results of the inner products between each tensor band and the others for one of our training images. We can see that these values are practically zero, which means that our bands are orthogonal. Furthermore, we can see in Figure 6b that (8) is fulfilled.
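The check reported in Table 4 and Figure 6b can be reproduced in miniature with the sketch below. To keep the example short and exact, it uses a full (untruncated) HOSVD instead of HOOI, and a random tensor instead of an MSI; both are assumptions of this illustration, as are the helper names `unfold` and `hosvd`.

```python
import numpy as np

def unfold(t, mode):
    """Mode-n unfolding: rows index the chosen mode."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t):
    """Full HOSVD: factor matrices from the SVD of each unfolding."""
    factors = [np.linalg.svd(unfold(t, n), full_matrices=False)[0]
               for n in range(t.ndim)]
    core = t
    for n, u in enumerate(factors):
        # Mode-n product with U^T projects mode n onto the factor basis.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, n)), 0, n)
    return core, factors

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6, 4))          # random stand-in for an MSI
core, factors = hosvd(x)

bands = unfold(core, 2)                 # one row per tensor band
gram = bands @ bands.T                  # inner products between bands
off_diag = gram - np.diag(np.diag(gram))
band_norms = np.sqrt(np.diag(gram))

print("largest inner product between different bands:", np.abs(off_diag).max())
print("band norms relative to the first:", band_norms / band_norms[0])
```

The off-diagonal entries of `gram` play the role of the values in Table 4, and the ordered band norms correspond to the behavior plotted in Figure 6b.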

It is also important to know the orthogonality degree of our projection matrices. From Theorem 2 in [32], we start from the condition **U**<sup>(*n*)T</sup>**U**<sup>(*n*)</sup> = **I**<sup>(*n*)</sup>; then, we create a vector **ô** whose components are the traces of the resulting matrices, i.e., tr(**I**<sup>(*n*)</sup>), and compute the MSE with respect to the rank vector **o** = (*J*<sub>1</sub>, *J*<sub>2</sub>, *J*<sub>3</sub>) as

$$MSE(\hat{\mathbf{o}}) = \sum_{q=1}^{3} \left\| o_q - \hat{o}_q \right\|_F^2. \tag{21}$$

Using this orthogonality analysis, we obtain MSE values very close to zero, e.g., on the order of 10<sup>−20</sup>, which means that the projection matrices present a high orthogonality degree.
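The computation in (21) can be sketched as follows. The factor matrices here are hypothetical: they are built by a QR factorization of random matrices so that their columns are orthonormal, and their shapes are chosen to mimic a rank (128, 128, 5) decomposition of a nine-band image.

```python
import numpy as np

def orthogonality_mse(factors):
    """Squared distance between tr(U^T U) and the rank J_n, summed as in (21)."""
    o = np.array([u.shape[1] for u in factors], dtype=float)   # (J1, J2, J3)
    o_hat = np.array([np.trace(u.T @ u) for u in factors])     # traces of U^T U
    return float(np.sum((o - o_hat) ** 2))

rng = np.random.default_rng(0)
# Hypothetical factor matrices with orthonormal columns (reduced QR).
shapes = [(128, 128), (128, 128), (9, 5)]
factors = [np.linalg.qr(rng.normal(size=s))[0] for s in shapes]

mse = orthogonality_mse(factors)
print(f"MSE = {mse:.3e}")
```

The result is at the level of floating-point round-off. A stricter check, not used in the text, would compare **U**<sup>(*n*)T</sup>**U**<sup>(*n*)</sup> with the identity entry by entry, since a trace equal to *J<sub>n</sub>* is necessary but not sufficient for orthonormal columns.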


**Table 3.** Quantitative results for 10 test MSIs running on an NVIDIA GeForce GTX 1050 Ti graphics processing unit (GPU), Intel Core i7 processor, 8 GB RAM, 128 GB SSD, and 1 TB HDD. Values in blue and red represent the highest PA and the lowest time, respectively.

**Figure 6.** TD metrics: (**a**) reconstruction error computed by the relative mean square error (rMSE) for *J*<sub>3</sub> = 1, ..., *I*<sub>3</sub> and (**b**) norm of each subtensor G<sub>*i<sub>n</sub>*</sub>, relative to the norm of the first tensor band G<sub>*i*<sub>1</sub></sub>.

**Figure 7.** Box and whiskers plots of the behavior of five classes of interest: (**a**) in the original spectral domain, (**b**) the tensor band domain after decomposition for nine bands, and (**c**) the new tensor band domain for five bands.

#### *6.4. FCN Specifications*

We used hyperparameter search [33] to set the learning rate to 1 × 10<sup>−3</sup>. The model was trained for 100 epochs using 100 CNNMSIs from our dataset. We used Adam as our optimization algorithm. Xavier initialization was used to set the initial values of the weights in the model. The SegNet FCN was used as the base model, since it achieves very high performance metrics in semantic segmentation [35].
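For reference, the Xavier (Glorot) uniform initialization mentioned above can be sketched in NumPy as follows. This is an illustration of the general scheme and not the TensorFlow code of our model; the kernel size and channel counts are hypothetical, and whether the uniform or the normal variant was used is not specified here.

```python
import numpy as np

def xavier_uniform(shape, rng):
    """Glorot/Xavier uniform init for a kernel of shape (kh, kw, in_ch, out_ch)."""
    kh, kw, in_ch, out_ch = shape
    fan_in = kh * kw * in_ch        # inputs feeding each output unit
    fan_out = kh * kw * out_ch      # outputs fed by each input unit
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.default_rng(0)
# Hypothetical first layer: 3 x 3 kernels, 5 tensor bands in, 64 feature maps out.
weights = xavier_uniform((3, 3, 5, 64), rng)
print("max |w| =", np.abs(weights).max())
```

The bound shrinks as the layer grows, which keeps the variance of the activations roughly constant across layers at the start of training.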

#### *6.5. Hardware/Software Specifications*

Our framework was implemented using Python 3.7 with TensorFlow-GPU version 1.13. Experiments were run on an NVIDIA GeForce GTX 1050 Ti GPU. The processor used was an Intel Core i7 with 8 GB RAM, a 128 GB SSD, and a 1 TB HDD.

**Table 4.** Inner products of each tensor band with the others from one image of our dataset decomposed by HOOI.


**Figure 8.** Comparison of the PA and the computational time of FCN with the proposed HOOI-FCN (seven and five bands) for semantic segmentation. See Table 3.

**Figure 9.** Qualitative results testing a scene of interest with abundant vegetation, and presence of shadows and clouds. (**a**) Original true color scenario of 128 × 128 pixels, in Central Europe: (**b**) five classes semi-manually labeled ground truth of the MSIs, (**c**) classification with an unsupervised normalized difference index (NDI) fusion algorithm, and (**d**) output prediction after 100 epochs in the FCN used for this work without data compression. (**e**) PCA-FCN framework output; (**f**) prediction of the whole framework HOOI-FCN proposed in this work; and (**g**) PA behavior of the HOOI-FCN versus number of tensor bands.

**Figure 10.** Qualitative results testing a scene of interest with abundant vegetation, and presence of shadows and clouds. (**a**) Original true color scenario of 128 × 128 pixels, in Central Europe: (**b**) five classes semi-manually labeled ground truth of the MSIs, (**c**) classification with an unsupervised normalized difference index (NDI) fusion algorithm, and (**d**) output prediction after 100 epochs in the FCN used for this work without data compression. (**e**) PCA-FCN framework output; (**f**) prediction of the whole framework HOOI-FCN proposed in this work; and (**g**) PA behavior of the HOOI-FCN versus number of tensor bands.

**Figure 11.** Qualitative results testing a scene of interest with abundant presence of soil. (**a**) Original true color scenario of 128 × 128 pixels, in Central Europe: (**b**) five classes semi-manually labeled ground truth of the MSIs, (**c**) classification with an unsupervised normalized difference index (NDI) fusion algorithm, and (**d**) output prediction after 100 epochs in the FCN used for this work without data compression. (**e**) PCA-FCN framework output; (**f**) prediction of the whole framework HOOI-FCN proposed in this work; and (**g**) PA behavior of the HOOI-FCN versus number of tensor bands.

**Figure 12.** Qualitative results testing a scene of interest with abundant presence of clouds. (**a**) Original true color scenario of 128 × 128 pixels, in Central Europe: (**b**) five classes semi-manually labeled ground truth of the MSIs, (**c**) classification with an unsupervised normalized difference index (NDI) fusion algorithm, and (**d**) output prediction after 100 epochs in the FCN used for this work without data compression. (**e**) PCA-FCN framework output; (**f**) prediction of the whole framework HOOI-FCN proposed in this work; and (**g**) PA behavior of the HOOI-FCN versus number of tensor bands.

#### **7. Discussion and Comparison with Other Methods**

Original spectral bands (Figure 7a) were transformed or mapped into new tensor bands (Figure 7b,c) which preserved features of our classes of interest within the first tensor bands, avoiding the use of all the original spectral bands, thereby reducing computational load in further applications.

From Figure 7b,c, we can see that, for the classes of interest in this case study, the selected error margin *ψ* is indeed a good parameter to restrict the rank in the third mode, since the information needed to differentiate these five classes is concentrated in the first tensor bands to a greater degree than in the first elements of the spectral domain. Nevertheless, if a smaller value of *J*<sub>3</sub> were used, there would be a trade-off in the performance of the semantic segmentation.

Quantitative results in Figures 8–12 and Table 3 compare the processing time and PA of our proposed framework with those of a model without any preprocessing data decomposition algorithm and of a normalized difference index based method in different scenarios. The accuracy values obtained by the proposed HOOI-FCN framework are better overall than those obtained by the other methods under the same conditions and scenarios, with a significant decrease in processing time, on the order of 10 times. It is worth noting that our HOOI-FCN framework with seven and five tensor bands outperforms, in terms of PA, the same FCN with the original nine bands. This means that the decomposition produces better features for the classification ANN.

In the confusion matrix presented in Figure 13, we can see the accuracy of the proposed HOOI-FCN framework for each class and the overall accuracy. Rows correspond to the output class or prediction and columns to the true class. Diagonal cells show the correctly classified pixels. Off-diagonal cells show where the errors come from. The rightmost column shows the accuracy for each predicted class, while the bottom row shows the accuracy for each true class. It is important to note that the vegetation and cloud classes are close to 95% accuracy, while the water and cloud shadow classes have less than 90% accuracy. The latter can be caused by the lack of samples with a greater contribution of these elements in the training dataset, as well as by the similarity of these elements to others in the scenes.

**Figure 13.** Confusion matrix of the proposed framework. The main diagonal indicates the pixel accuracy for each class in % for the ten selected scenarios.
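The way the margins of such a confusion matrix are obtained can be sketched as follows, using the same convention as Figure 13 (rows are predictions, columns are true classes). The three-class matrix below is made up for illustration and does not reproduce the values of Figure 13.

```python
import numpy as np

def confusion_summary(cm):
    """Per-class and overall accuracies for a matrix with rows = predicted class
    and columns = true class."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    per_predicted = correct / cm.sum(axis=1)   # rightmost column of the figure
    per_true = correct / cm.sum(axis=0)        # bottom row of the figure
    overall = correct.sum() / cm.sum()         # bottom-right cell (overall PA)
    return per_predicted, per_true, overall

# Hypothetical pixel counts for three classes (not the values of Figure 13).
cm = np.array([[8, 1, 0],
               [2, 7, 1],
               [0, 2, 9]])
per_predicted, per_true, overall = confusion_summary(cm)
print("per predicted class:", np.round(per_predicted, 3))
print("per true class:     ", np.round(per_true, 3))
print("overall:            ", overall)
```

The per-predicted-class values correspond to precision and the per-true-class values to recall, and the overall value coincides with the PA of (19).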

#### **8. Conclusions**

Any RS-MSI or -HSI, or any third-order tensor image, is mapped by the TKD to another tensor, called the core tensor, which is representative of the original and preserves its spatial structure, but with fewer tensor bands. In other words, a new subspace embedded in the original space was found and used as the new input space for the task of pixel-level classification or semantic segmentation. Due to the success of DL for image processing, our approach employs an FCN network as the classifier, which delivers the corresponding prediction matrix of pixels classified element-wise.

The efficiency of the proposed higher order orthogonal iteration (HOOI)-FCN framework is measured by metrics such as pixel accuracy (PA) or recall as a function of the number of new tensor bands, which is defined by the reconstruction error computed by the rMSE. Another important parameter in the TKD is the orthogonality degree of each component, i.e., the core tensor and the factor matrices, computed by the inner products of each band with the others.

Our experimental results for a case study show that the proposed HOOI-FCN framework for CNNMSI semantic segmentation reduced the number of spectral bands from nine to seven or five tensor bands, for which PA values converge or are very close to the maximum.

State-of-the-art methods, such as normalized difference indexes, PCA with five principal components, and the same FCN network with the nine original bands, with an average pixel accuracy of 90% (computational time ∼90 s), were outperformed by the HOOI-FCN framework, which achieved a higher average pixel accuracy of 91.97% (computational time ∼36.5 s) and an average PA of 91.56% (computational time 9.5 s) for seven and five new tensor bands, respectively.

These results are very promising in RS, since the use of other algorithms for the calculation of core tensors and a deeper data analysis of weights and initialization of the convolutional neural network (CNN) can increase performance metrics of the segmentation for RS spectral data. Some limitations for a better validation of this approach are: denoising is not included; there is a need for new cases to enhance the input space; use of a greater number of classifiers is needed.

Finally, this research allows us to emphasize two main points: (1) RS images are characterized by a large number of bands, high correlation between neighboring bands, and high data redundancy; (2) they are also corrupted by several types of noise. Some issues related to our approach remain open.
