*3.3. Heat Map Results*

Beyond the results obtained from the analysis of the patches, we also qualitatively evaluated the outcomes of the classification by generating classification heat maps from the HS images. These maps represent the probability of each pixel being classified as tumor, where red indicates a high probability and blue indicates a low probability. Because the inputs of our CNN are patches of 87 × 87 pixels, the heat maps cannot contain pixel-level detail. To give them sufficient resolution for a useful interpretation, we generated classification results using a sliding window with a step of 23 pixels. We show two different types of heat maps in Figures 7 and 8.
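A minimal sketch of how such a sliding-window heat map could be computed is shown below, using Python and NumPy. It assumes a hypothetical `predict_patch` callable that wraps the trained CNN and returns the tumor probability for one 87 × 87 patch. The 87-pixel patch size and the 23-pixel step come from the text; the function names and the choice to store one probability per window position (rather than, for example, averaging overlapping windows per pixel) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

PATCH = 87   # CNN input size in pixels (from the text)
STRIDE = 23  # sliding-window step in pixels (from the text)


def sliding_window_heat_map(cube, predict_patch, patch=PATCH, stride=STRIDE):
    """Return a coarse grid of tumor probabilities, one value per window position.

    cube: array of shape (H, W, bands), a hyperspectral image.
    predict_patch: hypothetical callable mapping a (patch, patch, bands) array
        to the probability of the tumor class.
    """
    h, w = cube.shape[:2]
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    heat = np.empty((len(rows), len(cols)), dtype=float)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            heat[i, j] = predict_patch(cube[r:r + patch, c:c + patch, :])
    return heat


# Example with random data and a stand-in predictor (mean intensity), for illustration only.
rng = np.random.default_rng(0)
demo_cube = rng.random((300, 400, 8))
demo_heat = sliding_window_heat_map(demo_cube, lambda p: float(p.mean()))
print(demo_heat.shape)  # (10, 14)
```

The resulting grid can then be upsampled and color-mapped (red for high, blue for low) for display; a smaller step gives a finer map at the cost of more CNN evaluations.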

**Figure 7.** Heat maps from well-performing patients. (**a**) Non-tumor tissue with no false positives; (**b**) non-tumor tissue with some false positives; (**c**) tumor tissue with no false negatives; (**d**) tumor tissue with some false negatives.

**Figure 8.** Heat maps from poorly performing patients. (**a**,**b**) Non-tumor and tumor maps from Patient P4; (**c**,**d**) non-tumor and tumor maps from Patient P6.

On the one hand, Figure 7 illustrates the different types of results obtained for patients whose samples the models classified accurately. Figure 7a,c show examples of non-tumor and tumor images that were classified correctly, with no false positives and no false negatives, respectively. In Figure 7b, some false positives appear in a non-tumor tissue image, but they are located in an area with a cluster of cells, which indicates a suspicious region. Finally, Figure 7d shows a tumor image where some regions are classified as non-tumor tissue. Nevertheless, these false negatives are located in areas where there are no cells. The false positives (FPs) in Figure 7b suggest that the CNN perceives areas with high cell density as tumor. The false negatives (FNs) in Figure 7d have a clinical interpretation, but they are counted as errors in the quantitative evaluation of the classification. Furthermore, a more detailed ground-truth scheme for classification, e.g., one including brain background tissue, blood vessels, or blood cells, may improve the classification performance.
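The gap between the clinical and the quantitative reading of Figure 7d can be made concrete with a small sketch. It assumes, for illustration only, that every heat-map window inherits the label of its image and that a window is called tumor when its probability is at least 0.5 (a hypothetical threshold); `window_errors` is a hypothetical helper, not part of the authors' pipeline.

```python
import numpy as np


def window_errors(heat, image_is_tumor, threshold=0.5):
    """Fraction of heat-map windows that contradict the image-level label.

    Assumes every window inherits the label of its image (hypothetical helper).
    For a tumor image the result is the false-negative fraction; for a
    non-tumor image it is the false-positive fraction.
    """
    predicted_tumor = np.asarray(heat, dtype=float) >= threshold
    if image_is_tumor:
        return float(np.mean(~predicted_tumor))  # windows called non-tumor
    return float(np.mean(predicted_tumor))  # windows called tumor


example_heat = np.array([[0.9, 0.8],
                         [0.2, 0.6]])  # one low-probability window, e.g. a cell-free area
print(window_errors(example_heat, image_is_tumor=True))   # 0.25
print(window_errors(example_heat, image_is_tumor=False))  # 0.75
```

Under these assumptions, the low-probability window in a tumor image is counted as a false negative whether or not it actually contains tumor cells, which is how a clinically reasonable map can still lower the quantitative scores.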

On the other hand, Figure 8 shows the heat maps from the patients with the worst performance in the quantitative evaluation. First, Figure 8a,b show the results for Patient P4. The heat maps for the two kinds of tissue are similar, with false positives in the non-tumor image and false negatives in the tumor image. For this patient, the heat maps and the quantitative results are consistent, showing that the CNN is not able to accurately classify the samples from this patient. Second, Figure 8c,d show the heat maps for Patient P6. As mentioned before, the non-tumor tissue of this patient was found to be adjacent to the tumor area, and hence cannot be considered non-tumor tissue. In this case, Figure 8c shows that the non-tumor area has been classified as tumor, which is in fact correct. Finally, as also discussed earlier, Patient P6 presented an atypical GB with low cellularity. A heat map from a tumor image of this patient (Figure 8d) shows that tumor cells are highlighted as tumor, but the areas with low cellular presence are classified as non-tumor.

#### **4. Discussion and Conclusion**

In this work, we present a hyperspectral microscopic imaging system and a deep learning approach for the classification of hyperspectral images of H&E-stained pathological slides of brain tissue samples from patients with glioblastoma.

As described in the introduction, this work continues a previous study that had several limitations [23]. First, the total number of HS cubes used in our previous work was limited to 40, only 4 HS cubes per patient. Second, the instrumentation used in that work had limitations in both the spectral and the spatial information. Regarding the spectral information, the spectral range was restricted to 419–768 nm due to limitations of the microscope optical path. The spatial information was limited because the scanning platform was unable to image the complete scene, so the analysis of the HS images was restricted to a low magnification (5×), which was not sufficient to image the morphological features of the sample. Additionally, the main goal of that work was to provide a preliminary proof of concept of the use of HSI for differentiating tumor and non-tumor samples, which showed promising results.

In this work, an improved acquisition system capable of capturing high-quality images at a higher magnification (20×) and over a wider spectral range (400–1000 nm) was used to capture a total of 517 HS cubes. The 20× magnification allows the classifier to exploit both the spectral and the spatial differences between the samples to make a decision.

This dataset was then used to train a CNN to classify non-tumor and tumor tissue. Because of the limited number of patients involved in this study, and to provide a data partition scheme with minimal bias, we split the dataset into four folds in which the training, validation, and test data belonged to different patients. Each fold was trained with 9 patients, only 5 of whom presented both types of samples, i.e., tumor and non-tumor tissue.
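The patient-wise partition described above can be sketched as follows. This is an illustrative Python helper, not the authors' code: the function name, the per-patch list of patient IDs, and the example assignment of patients to subsets are hypothetical, and the actual fold composition used in this study is not reproduced here. The key property is that each patient's patches fall into exactly one of the training, validation, and test subsets.

```python
import numpy as np


def split_by_patient(patch_patients, train_ids, val_ids, test_ids):
    """Return train/val/test patch indices so that no patient spans two subsets.

    patch_patients: sequence holding the patient ID of every patch.
    train_ids, val_ids, test_ids: patient IDs assigned to each subset.
    """
    train_ids, val_ids, test_ids = set(train_ids), set(val_ids), set(test_ids)
    if train_ids & val_ids or train_ids & test_ids or val_ids & test_ids:
        raise ValueError("a patient is assigned to more than one subset")
    patch_patients = np.asarray(patch_patients)

    def indices(ids):
        # Positions of all patches whose patient belongs to `ids`.
        return np.flatnonzero(np.isin(patch_patients, sorted(ids)))

    return indices(train_ids), indices(val_ids), indices(test_ids)


# Hypothetical example: six patches from four patients.
patients = ["P1", "P1", "P2", "P3", "P3", "P4"]
tr, va, te = split_by_patient(patients, ["P2", "P3"], ["P4"], ["P1"])
print(tr, va, te)  # [2 3 4] [5] [0 1]
```

Splitting by patient rather than by patch keeps patches from the same patient out of both the training and the test data, which matters given the high inter-patient variability discussed in this section.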

After selecting models with a high AUC and balanced accuracy, sensitivity, and specificity in the validation phase, some of the results on the test set were still highly inaccurate. For this reason, we carefully inspected the heat maps generated by the classifiers for each patient to find an explanation for the inaccurate results. This inspection revealed four types of problems in the images that could worsen the results, namely the presence of ink, the presence of other artifacts, unfocused images, and an excess of red blood cells. For a fair experimental design, we reported the results both before and after removing the defective HS images. We consider the test results obtained after removing these images to be unbiased, because the rationale for removing them from the test set is justifiable and transparent. Such corrupted images were also part of the training set, and it is unknown whether they affected the training process of the CNN.
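The metrics named above, and the comparison before and after removing defective images, can be sketched as follows. This is an illustrative Python implementation, not the authors' evaluation code: the 0.5 decision threshold and the per-patch `flagged` mask marking patches from defective HS images are assumptions, and the AUC is computed with the rank-based (Mann-Whitney) formulation, i.e., the probability that a randomly chosen tumor patch scores higher than a randomly chosen non-tumor patch.

```python
import numpy as np


def evaluate(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, balanced accuracy and AUC (1 = tumor).

    Both classes must be present in y_true.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = y_prob >= threshold
    tp = np.sum(y_pred & y_true)
    fn = np.sum(~y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # AUC: fraction of (tumor, non-tumor) pairs ranked correctly; ties count as one half.
    diff = y_prob[y_true][:, None] - y_prob[~y_true][None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    return {
        "sensitivity": float(sensitivity),
        "specificity": float(specificity),
        "balanced_accuracy": float((sensitivity + specificity) / 2),
        "auc": float(auc),
    }


def evaluate_before_after(y_true, y_prob, flagged):
    """Metrics on all patches, then after dropping patches from flagged images."""
    keep = ~np.asarray(flagged, dtype=bool)
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    return evaluate(y_true, y_prob), evaluate(y_true[keep], y_prob[keep])


before, after = evaluate_before_after(
    y_true=[1, 1, 0, 0],
    y_prob=[0.9, 0.4, 0.3, 0.6],
    flagged=[False, True, False, True],  # hypothetical: patches from defective images
)
print(before)  # sensitivity, specificity, balanced accuracy 0.5; AUC 0.75
print(after)   # all four metrics 1.0
```

Reporting both dictionaries side by side mirrors the practice of publishing results before and after cleaning, so a reader can judge how much the removal changes the outcome.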

We also found a patient, P6, for whom the results were highly inaccurate. For this reason, the pathologists re-examined the regions of interest (ROIs) that were analyzed by HS. The re-examination revealed an atypical subtype of GB and showed that the ROIs selected as normal samples were close to the tumor area and therefore cannot be considered non-tumor. Although the classification results were not valid for this patient, the outcomes of the CNN allowed us to identify a problem with the prior examination and the ROI selection within the sample. Additionally, although this patient was used both for training and for validation (in different folds), the results were not significantly affected by this fact. These results highlight the robustness of the CNN for tumor classification. First, the model from fold 1, whose validation on patient P6 yielded good results, was also able to accurately classify patients P1 and P11. Second, although patient P6 was part of the training data for folds 3 and 4, there is no evidence that the outcomes of these models were significantly affected by the contaminated training data.

Although the results are not accurate for every patient, after excluding incorrectly labeled and contaminated HS images, nine patients showed accurate classification results (P1 to P3 and P8 to P13). Two patients provided acceptable results (P5 and P7), and a single patient presented results only slightly better than random guessing (P4). Nevertheless, these results can be considered promising for two main reasons. First, a limited number of samples was used for training, especially for the non-tumor class, which was limited to only five training patients for each CNN. Second, there is high inter-patient variability, with significant differences between the tumor samples of different patients. As the analysis of the heat maps (Figures 7 and 8) shows, there is significant heterogeneity in cellular morphology across patients' specimens, which makes GB detection an especially challenging application. To handle these challenges, the number of patients should be increased in future work, and, to deal with the high inter-patient variability, HS data from more than a single patient should be used to validate the models.

Finally, we found that HSI data perform slightly better than RGB images for the classification. This improvement is more evident when the classification is performed on challenging patients (e.g., P5 or P7) or on patients with only tumor samples. Furthermore, the HSI classification results show more balanced sensitivity and specificity, which is the goal for clinical applications, improving the average sensitivity and specificity by 7% and 9%, respectively, with respect to the RGB imaging results. Nevertheless, more research is needed to definitively demonstrate the superiority of HSI over conventional RGB imagery.

**Author Contributions:** Conceptualization, S.O., M.H., H.F., G.M.C. and B.F.; software, S.O. and M.H.; validation, S.O. and M.H.; investigation, S.O., M.H., and H.F.; resources, G.M.C. and B.F.; data curation, R.C. and M.d.l.L.P.; writing—original draft preparation, S.O. and M.H.; writing—review and editing, H.F., F.G., R.C., M.d.l.L.P., G.M.C., and B.F.; supervision, R.C., M.d.l.L.P., G.M.C. and B.F.; project administration, G.M.C. and B.F.; funding acquisition, G.M.C. and B.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is supported in part by the Cancer Prevention and Research Institute of Texas (CPRIT) grant RP190588. This work has been supported by the Spanish Government through the PLATINO project (TEC2017-86722-C4-4-R), and by the Canary Islands Government through the ACIISI (Canarian Agency for Research, Innovation and the Information Society), ITHaCA project "Hyperspectral identification of Brain tumors" under Grant Agreement ProID2017010164. This work was completed while Samuel Ortega was the beneficiary of a pre-doctoral grant given by the "*Agencia Canaria de Investigación, Innovación y Sociedad de la Información (ACIISI)*" of the "*Consejería de Economía, Industria, Comercio y Conocimiento*" of the "*Gobierno de Canarias*", which is part-financed by the European Social Fund (FSE) (*POC 2014-2020, Eje 3 Tema Prioritario 74 (85%)*).

**Conflicts of Interest:** The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

**Ethical Statements:** Written informed consent was obtained from all participating subjects, and the study protocol and consent procedures were approved by the Comité Ético de Investigación Clínica-Comité de Ética en la Investigación (CEIC/CEI) of the University Hospital Doctor Negrín: IdenTificacion Hiperespectral de tumores CerebrAles (ITHaCA), Code: 2019-001-1.
