*2.3. Hyperspectral Dataset*

Using the aforementioned instrumentation, we imaged some of the areas highlighted by pathologists on each slide. The positioning joystick of the microscope was used to select the initial position of the first HS image to be captured within a ROI. Then, we configured in the software the number of images to be captured consecutively. This number should be kept relatively low to prevent the focus from degrading as the acquisition moves across the specimen. In this case, a maximum of 10 HS images were captured consecutively from a ROI. We used 20× magnification for image acquisition, producing an HS image size of 375 × 299 μm. This magnification was chosen because it allows the visualization of cell morphology; hence, the classifier can exploit both the spatial and the spectral features of the data. Figure 3 shows some examples of the HS images used in this study, together with the spectral signatures of representative tissue components, i.e., cells and background for both tumor and non-tumor regions.

**Figure 3.** HS histopathological dataset. (**a**,**b**) HS cubes from tumor and non-tumor samples, respectively. (**c**) Spectral signatures of different parts of the tissue: tumor cells (red), non-tumor cells (blue), tumor background tissue (black), and non-tumor background tissue (green).

In this research, we used a CNN to classify the samples. Due to the nature of the data in this study, the ground truth assignment into tumor or non-tumor is shared across each selected ROI; thus, each HS image is assigned to a single class. For this reason, we performed the classification with a patch-based approach, because a fully convolutional design was not feasible. Two considerations guided the selection of the patch size. Firstly, the patch should be large enough to contain more than one cell, but if the patch is too large, the CNN could learn that tumor is located only in dense cell patches. Secondly, the smaller the patches, the more patches can be extracted from a single HS image, which increases the number of samples available to train the CNN. Finally, we chose a patch size of 87 × 87 pixels. In principle, 99 patches (a 9 × 11 grid) can be extracted from a spectral cube with a spatial size of 800 × 1004 pixels. However, in some cases most of a patch consisted only of blank background light. For this reason, we rejected patches in which more than 50% of the pixels were background light, i.e., more than half of the patch was empty.
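The patch count quoted above can be checked with a short sketch. It assumes a non-overlapping grid in which incomplete patches at the image border are discarded; the text does not state this explicitly, but it is consistent with the figure of 99 patches. The function name `extract_patches` and the number of spectral bands used in the example are hypothetical placeholders.

```python
import numpy as np


def extract_patches(cube, patch_size=87):
    """Tile an HS cube of shape (rows, cols, bands) into square patches.

    Assumption: patches do not overlap and any remainder at the right
    and bottom borders is dropped.
    """
    rows, cols = cube.shape[:2]
    n_rows = rows // patch_size  # 800 // 87 = 9
    n_cols = cols // patch_size  # 1004 // 87 = 11
    patches = []
    for i in range(n_rows):
        for j in range(n_cols):
            r0 = i * patch_size
            c0 = j * patch_size
            patches.append(cube[r0:r0 + patch_size, c0:c0 + patch_size, :])
    return np.stack(patches)


# Placeholder cube: the band count (4) is arbitrary and only keeps the example light.
cube = np.zeros((800, 1004, 4), dtype=np.float32)
patches = extract_patches(cube)
print(patches.shape)  # (99, 87, 87, 4)
```

Swapping the two spatial dimensions gives the same count, since the grid is 9 × 11 in either orientation.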

The method used to reject patches with a high amount of background light is as follows. Firstly, the RGB image is extracted from the HS cube and transformed to the hue-saturation-value (HSV) color representation. Then, the hue channel of each image is extracted and binarized using an empirically configured threshold to separate the pixels belonging to the specimen from the pixels containing background light. The generation of the patches is illustrated in Figure 4, where the last row of patches (Figure 4c) shows patches that were rejected from the database due to their high content of background light pixels.

**Figure 4.** Generation of patches. (**a**) Original HS image; (**b**) grid of patches within the HS image; (**c**) patches of size 87 × 87 used in the classification. The last row contains patches that were rejected from the dataset for having more than 50% empty pixels. HSI: hyperspectral imaging.
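The rejection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the input is an RGB patch already derived from the HS cube (the derivation itself is not shown), the threshold value of 0.5 is a hypothetical placeholder for the empirically configured one, and the choice that low hue values mark background light is likewise an assumption. The function names are hypothetical.

```python
import colorsys

import numpy as np


def background_fraction(rgb_patch, hue_threshold=0.5):
    """Return the fraction of pixels labeled as background light.

    rgb_patch: float array of shape (H, W, 3) with values in [0, 1].
    hue_threshold: hypothetical value; the study set it empirically.
    Assumption: pixels with hue below the threshold are background.
    """
    r = rgb_patch[..., 0]
    g = rgb_patch[..., 1]
    b = rgb_patch[..., 2]
    # colorsys works on scalars, so it is vectorized over the pixel grid.
    hue, _, _ = np.vectorize(colorsys.rgb_to_hsv)(r, g, b)
    background = hue < hue_threshold  # binarized hue channel
    return float(background.mean())


def keep_patch(rgb_patch, hue_threshold=0.5, max_background=0.5):
    """Keep a patch unless more than half of its pixels are background."""
    return background_fraction(rgb_patch, hue_threshold) <= max_background


# Synthetic example: white pixels stand in for background light and a
# pink-purple color stands in for stained tissue.
mostly_tissue = np.ones((87, 87, 3))
mostly_tissue[:60] = (0.8, 0.4, 0.7)  # 60 of 87 rows are tissue
mostly_light = np.ones((87, 87, 3))
mostly_light[:20] = (0.8, 0.4, 0.7)   # only 20 of 87 rows are tissue

print(keep_patch(mostly_tissue))  # True
print(keep_patch(mostly_light))   # False
```

In practice, both the threshold and the direction of the comparison would need to be tuned on real images, since the hue of near-white pixels is sensitive to noise.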

The database used in this work consists of 527 HS images, of which 337 are non-tumor brain samples and 190 were diagnosed as GB. It should be highlighted that the biopsies from only 8 patients presented both non-tumor and tumor samples; the other 5 patients presented only tumor samples. The dataset is summarized in Table 1. After extracting the patches that were valid for processing, we had a total of 32,878 patches from non-tumor tissue and 16,687 from tumor tissue.

