#### 3.1.2. Sampling Interval Analysis

To simulate the use of different HS cameras that capture a different number of spectral bands over the same spectral range, the following methodology was employed to vary the spectral sampling interval of the HS data. The main goal of this analysis is to reduce the number of bands used for classification in this particular application and, consequently, the size and cost of the HS camera, as well as the computational effort.

The spectral sampling interval is the distance between adjacent sampling points in the spectrum, i.e., between adjacent spectral bands. It is calculated by Equation (3), where $\lambda_{\max} - \lambda_{\min}$ is the difference between the maximum and minimum wavelengths captured by the sensor, also known as the spectral range. The number of spectral bands was progressively reduced, and the sampling interval correspondingly increased, to simulate diverse HS cameras that capture different numbers of bands.

$$\text{Sampling Interval (nm)} = \frac{\lambda_{\max} - \lambda_{\min}}{\text{number of bands}} \tag{3}$$
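Equation (3) can be checked numerically with a minimal Python sketch; the function name `sampling_interval_nm` is a hypothetical choice of ours, not part of the original work. Plugging in the raw sensor figures given in this section (400–1000 nm captured in 826 bands) reproduces the stated sampling interval of about 0.73 nm.

```python
def sampling_interval_nm(lambda_min: float, lambda_max: float, n_bands: int) -> float:
    """Equation (3): spectral range (nm) divided by the number of bands."""
    if n_bands <= 0:
        raise ValueError("n_bands must be positive")
    if lambda_max <= lambda_min:
        raise ValueError("lambda_max must be greater than lambda_min")
    return (lambda_max - lambda_min) / n_bands


if __name__ == "__main__":
    # Raw sensor: 400-1000 nm in 826 bands -> 600 / 826 = 0.726... nm
    print(round(sampling_interval_nm(400.0, 1000.0, 826), 2))  # 0.73
```

Halving the number of bands over a fixed range doubles the interval, which is the relationship exploited to emulate cameras with coarser spectral sampling.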

Table 2 shows the sampling interval obtained for each selected number of spectral bands. The original raw HS image captured by the sensor is composed of 826 bands, with a spectral resolution of 2–3 nm and a sampling interval of 0.73 nm, covering the range between 400 and 1000 nm. To avoid the noise that the CCD (charge-coupled device) sensor produces in the extreme bands, these bands were removed, yielding a final spectral range from 440 to 902 nm with 645 spectral bands and the same sampling interval. Using this HS cube as a reference, several sampling intervals were applied, reducing the number of bands and, consequently, the size of the HS image. The original raw image exceeded 1 GB, whereas reducing the number of spectral bands from 826 to 8 yielded an image of ~12 MB. These sizes are average values over the HS test cubes, since the spatial dimensions of the cubes differ.
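The cropping and band-reduction steps might be sketched as follows. This sketch assumes the HS cube is a NumPy array of shape (rows, cols, bands) with a matching vector of band-center wavelengths. The text does not state how bands were discarded when increasing the sampling interval, so the hypothetical `reduce_bands` simply keeps evenly spaced bands; averaging or interpolating neighboring bands would be equally plausible alternatives. The uniform wavelength grid and random cube in the usage block are synthetic placeholders, not the camera's real calibration.

```python
import numpy as np


def crop_spectral_range(cube, wavelengths, lo_nm=440.0, hi_nm=902.0):
    """Remove the noisy extreme bands, keeping lo_nm <= wavelength <= hi_nm."""
    keep = (wavelengths >= lo_nm) & (wavelengths <= hi_nm)
    return cube[..., keep], wavelengths[keep]


def reduce_bands(cube, wavelengths, n_bands):
    """Keep n_bands evenly spaced bands (hypothetical decimation scheme).

    Fewer bands over the same range means a larger sampling interval (Eq. 3).
    """
    if not 1 <= n_bands <= wavelengths.size:
        raise ValueError("n_bands must be between 1 and the number of available bands")
    # Evenly spaced indices from the first to the last available band.
    idx = np.round(np.linspace(0, wavelengths.size - 1, n_bands)).astype(int)
    return cube[..., idx], wavelengths[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wl = np.linspace(400.0, 1000.0, 826)                # synthetic, uniform grid
    cube = rng.random((50, 40, 826), dtype=np.float32)  # synthetic rows x cols x bands
    cropped, wl_c = crop_spectral_range(cube, wl)
    small, wl_s = reduce_bands(cropped, wl_c, 8)
    print(small.shape)                  # (50, 40, 8)
    print(small.nbytes / cube.nbytes)   # storage ratio = 8 / 826
```

Because an uncompressed cube stores one value per pixel and band, its size scales linearly with the number of bands, which agrees in order of magnitude with the drop from more than 1 GB to ~12 MB reported above.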


**Table 2.** Summary of the HS dataset with different sampling intervals and number of spectral bands.

#### 3.1.3. Training Dataset Reduction

Supervised classifiers rely on the quality and quantity of the labeled data to train a generalized model that produces accurate results. However, in some cases, the labeled data can be imbalanced across classes and may contain redundant information, which increases the execution time of the training process and can even worsen the classification performance. Because the optimization algorithms used in this work must train the classifier iteratively to find the wavelengths most relevant for an accurate classification, accelerating the training process is beneficial.

The methodology proposed in this section for optimizing the training dataset is based on K-Means unsupervised clustering [54]. Figure 4 shows the block diagram of the proposed approach for reducing the training dataset. In this approach, the training dataset is separated into four groups that correspond to the classes available in the labeled dataset: normal, tumor, hypervascularized and background. The entire dataset contains 269,676 labeled pixels: 101,706 normal, 11,054 tumor, 38,784 hypervascularized and 118,132 background pixels (see Table 1). K-Means clustering is applied independently to each group of labeled pixels to obtain 100 clusters (K = 100) per group (400 clusters in total). Hence, 100 centroids are obtained for each class.

To reduce the original training dataset, these centroids are used to identify the most representative pixels of each class by means of the SAM algorithm [44]. For each centroid, only the 10 most similar pixels are selected, yielding a total of 1000 pixels per class (100 centroids × 10 pixels). The reduced dataset is thus intended to avoid including redundant information in the training of the supervised classifier. In the end, the reduced training dataset is composed of 4000 pixels (after applying K-Means four times independently), which are the most representative pixels of the original dataset and form a dataset balanced across the classes. It is worth noting that this procedure is executed within the leave-one-patient-out cross-validation methodology. Thus, the labeled pixels of the current test patient are not included in the original dataset used for the reduction process.

**Figure 4.** Block diagram of the training dataset reduction algorithm.
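The per-class reduction described in this section could look roughly like the following Python sketch, which uses scikit-learn's `KMeans` and a hand-written Spectral Angle Mapper. Several details are assumptions rather than the authors' specification: the 10 most similar pixels are searched among all training pixels of the class (not only among the members of the centroid's cluster), pixels picked by more than one centroid are kept as duplicates so that every class yields exactly K × 10 rows, and the function names and the `patient_ids` array are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def spectral_angle(pixels, reference):
    """Spectral Angle Mapper: angle (radians) between each row of `pixels` and `reference`."""
    dots = pixels @ reference
    norms = np.linalg.norm(pixels, axis=1) * np.linalg.norm(reference)
    cos = np.clip(dots / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.arccos(cos)


def reduce_class(pixels, n_clusters=100, per_centroid=10, seed=0):
    """K-Means on one class, then keep the pixels with the smallest SAM angle to each centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(pixels)
    picked = [np.argsort(spectral_angle(pixels, c))[:per_centroid]
              for c in km.cluster_centers_]
    # Assumption: a pixel chosen by several centroids is kept each time (duplicates allowed).
    return pixels[np.concatenate(picked)]


def reduce_training_set(X, y, patient_ids, test_patient,
                        n_clusters=100, per_centroid=10, seed=0):
    """Balanced, reduced training set for one leave-one-patient-out fold."""
    train = patient_ids != test_patient  # test patient's pixels never enter the reduction
    X_tr, y_tr = X[train], y[train]
    X_out, y_out = [], []
    for label in np.unique(y_tr):  # clustering runs independently for each class
        reduced = reduce_class(X_tr[y_tr == label], n_clusters, per_centroid, seed)
        X_out.append(reduced)
        y_out.append(np.full(len(reduced), label))
    return np.vstack(X_out), np.concatenate(y_out)
```

With the settings given in the text (`n_clusters=100`, `per_centroid=10`) and four classes, each call would return 4000 pixels per fold, provided every class keeps at least 100 training pixels after the test patient is excluded.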
