1. Introduction
The spread of Deep Learning (DL) techniques and frameworks has led to a revolution in the medical imaging field. The assessment of organ viability through the examination of donor kidney biopsies is essential prior to transplantation. Traditionally, trained pathologists evaluate biopsy slides visually under a light microscope, which is a time-consuming and highly variable procedure. The high inter-observer variability results in poor reproducibility among pathologists, which may cause inappropriate organ discard. Therefore, the development of new techniques able to interpret donor kidney biopsies objectively and rapidly, in support of the pathologist’s decision making, is strongly encouraged. The increasing availability of whole-slide scanners, which facilitate the digitization of histopathological tissue, has led to a new research field known as digital pathology and has generated a strong demand for Computer-Aided Diagnosis (CAD) systems. As reported in the literature, the application of deep learning techniques to the analysis of Whole-Slide Images (WSIs) has shown significant results, suggesting that the integration of DL frameworks with CAD systems is a valuable solution.
In the realm of digital pathology, several recent studies have proposed CAD systems for glomerulus identification and classification in renal biopsies [1,2,3,4,5,6,7,8]. The eligibility for transplantation of a kidney retrieved from Expanded Criteria Donors (ECD) relies on a rush histological examination of the organ [9]. The Karpinski score is based on the microscopic examination of four compartments (glomerular, tubular, interstitial and vascular) in order to assess the degree of chronic injury. Each compartment is assigned a score from 0 to 3, where 0 corresponds to normal histology and 3 to the highest degree of, respectively, global glomerulosclerosis, tubular atrophy, interstitial fibrosis, and arterial and arteriolar narrowing [9,10]. The evaluation of global glomerulosclerosis requires the detection and classification of all the glomeruli present in a kidney biopsy, distinguishing between healthy (non-sclerotic) and non-healthy (sclerotic) ones.
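As a minimal sketch of how glomerulus counts could be turned into a glomerular score, the snippet below computes the fraction of sclerotic glomeruli and maps it to a value from 0 to 3. The function name and the cut-off values (20% and 50%) are assumptions introduced for illustration; they are not stated in this text and should be checked against the scoring protocol actually in use.

```python
def glomerulosclerosis_score(n_non_sclerotic, n_sclerotic, cutoffs=(0.20, 0.50)):
    """Return (fraction of sclerotic glomeruli, score in 0..3).

    The cutoffs are ASSUMED values, not taken from the text:
    0 = no sclerotic glomeruli, 1 = below the first cutoff,
    2 = up to and including the second cutoff, 3 = above it.
    """
    total = n_non_sclerotic + n_sclerotic
    if total == 0:
        raise ValueError("no glomeruli detected in the biopsy")
    fraction = n_sclerotic / total
    if n_sclerotic == 0:
        score = 0
    elif fraction < cutoffs[0]:
        score = 1
    elif fraction <= cutoffs[1]:
        score = 2
    else:
        score = 3
    return fraction, score
```

For example, a biopsy with 18 non-sclerotic and 2 sclerotic glomeruli has a sclerotic fraction of 0.1, which falls in the lowest non-zero band under the assumed cut-offs.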
The two fundamental components that characterize a non-sclerotic glomerulus are the capillary tuft with the mesangium and the Bowman’s capsule. The former lies inside the glomerulus, while the latter is peripheral and contains the tuft. The space between these two components is called Bowman’s space. Morphologically, the non-sclerotic glomerulus generally has an elliptical shape. The capillary tuft has a pomegranate-like appearance, caused by the simultaneous presence of blue points (cell nuclei), white areas (capillary lumens) and a variable amount of regions with similar hue and different levels of saturation (mesangial matrix). A non-healthy glomerulus, from the point of view of the Karpinski score, is a globally sclerotic glomerulus, namely one in which the capillary lumens are completely obliterated by an increase in extracellular matrix and Bowman’s space is completely filled with collagenous material. Examples of non-sclerotic and sclerotic glomeruli are depicted in Figure 1.
Ledbetter et al. proposed a Convolutional Neural Network (CNN) to predict kidney function (evaluated as the quantity of primary filtrate that passes from the blood through the glomeruli per minute) in chronic kidney disease patients from whole-slide images of their kidney biopsies [3]. Gallego et al. proposed a method based on the pretrained AlexNet model [11] to perform glomerulus classification and detection in kidney tissue segments [2]. Gadermayr et al. focused on the segmentation of the glomeruli. The authors proposed two different CNN cascades for segmentation applications with sparse objects. They applied these approaches to the glomerulus segmentation task and compared them with conventional fully convolutional networks, concluding that cascade networks can be a powerful tool for segmenting renal glomeruli [4]. Temerinac-Ott et al. compared the performance of a CNN classifier with that of a support-vector machine (SVM) classifier that exploits histogram of oriented gradients (HOG) features [12] for the task of glomeruli detection in WSIs with multiple stains, using a sliding window approach. The results showed that the CNN method outperformed the HOG and SVM classifier [1]. Kawazoe et al. addressed the task of glomeruli detection in multistained human kidney biopsy slides by using a Deep Learning approach based on Faster R-CNN [6]. Marsh et al. developed a deep learning model that recognizes and classifies sclerotic and non-sclerotic glomeruli in whole-slide images of frozen donor kidney biopsies. They used a Fully Convolutional Network (FCN) followed by a blob-detection algorithm [13], based on Laplacian-of-Gaussian, to post-process the FCN probability maps into object detection predictions [8]. Ginley et al. proposed a CAD system to classify renal biopsies of patients with diabetic nephropathy [7], using a combination of classical image processing and novel machine learning techniques. Hermsen et al. adopted CNNs, namely an ensemble of five U-Nets, for the segmentation of ten tissue classes from WSIs of periodic acid-Schiff (PAS) stained kidney transplant biopsies [14].
The analysis of the literature suggests that most works focused on the glomerular detection task only, without considering the further classification into sclerotic and non-sclerotic glomeruli [1,2,4,6]. Few papers considered the assessment of global glomerulosclerosis from kidney biopsies [7,8,14].
In our previous works, we focused on other kidney biopsy analysis tasks, such as the classification of tubules and vessels [15] and the classification of non-sclerotic and sclerotic glomeruli [5]. In this work, we propose a CAD system that addresses the segmentation and classification of glomeruli in order to obtain a reliable estimate of the Karpinski histological score. The proposed approach achieved better results in the classification task than those previously reported in the literature.
4. Experimental Results
We distinguish between the results obtained at the pixel level (semantic segmentation task) and at the object detection level.
In particular, for the semantic segmentation task we group the metrics into Dataset Metrics and Class Metrics [33].
The group of Dataset Metrics includes semantic segmentation metrics aggregated over the dataset: Global Accuracy, Mean Accuracy (the mean of the accuracies calculated per class), Mean IoU (the mean of the IoUs calculated per class), Weighted IoU (the mean of the IoUs, weighted by the number of pixels in each class) and Mean F-score (the mean of the F-measures calculated per class).
The group of Class Metrics includes semantic segmentation metrics calculated for each class, namely: Accuracy (2), IoU (3) and Mean F-score (F-measure for each class, averaged over all images).
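The accuracy and IoU metrics listed above can be sketched from a pixel-level confusion matrix as follows. This is an illustrative sketch, not the authors' implementation: the function name is hypothetical, and because Equations (2) and (3) are not reproduced in this section, the code assumes the usual definitions, namely per-class accuracy TP / (TP + FN) and per-class IoU TP / (TP + FP + FN). The F-score metrics are left out because their exact definition cannot be confirmed from the text.

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute dataset and class metrics from a confusion matrix.

    conf[i, j] = number of pixels of true class i predicted as class j.
    Assumes every class occurs at least once in the ground truth.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                 # correctly classified pixels per class
    true_pixels = conf.sum(axis=1)     # TP + FN per class
    pred_pixels = conf.sum(axis=0)     # TP + FP per class
    class_accuracy = tp / true_pixels
    class_iou = tp / (true_pixels + pred_pixels - tp)
    return {
        "global_accuracy": tp.sum() / conf.sum(),
        "mean_accuracy": class_accuracy.mean(),
        "mean_iou": class_iou.mean(),
        # IoU of each class weighted by its share of ground-truth pixels
        "weighted_iou": (class_iou * true_pixels).sum() / conf.sum(),
        "class_accuracy": class_accuracy,
        "class_iou": class_iou,
    }
```

Because background pixels usually dominate a kidney biopsy image, the Weighted IoU tends to be driven by the background class, whereas the Mean IoU gives every class the same weight.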
For the object detection task, confusion matrices are calculated by counting a predicted mask and a ground truth mask as a true positive match when their pixel-wise IoU (3) is at least 0.2. Besides confusion matrices, the metrics used for assessing the results of the object detection task are:
The best results on non-sclerotic glomeruli have been obtained using DeepLab v3+, while for sclerotic glomeruli the best model was SegNet. An example of the output of our semantic segmentation framework is depicted in Figure 9.
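The IoU-based matching rule used for the object detection task (a match requires a pixel-wise IoU of at least 0.2) can be sketched as below for a single class. The greedy one-to-one matching strategy, the function names, and the use of precision and recall alongside the F-score are assumptions made for illustration; the text does not specify how ties or multiple overlaps were resolved.

```python
import numpy as np

def mask_iou(a, b):
    """Pixel-wise IoU between two binary masks of the same shape."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def match_detections(pred_masks, gt_masks, iou_threshold=0.2):
    """Greedy one-to-one matching (an assumed strategy); returns (tp, fp, fn)."""
    unmatched_gt = list(range(len(gt_masks)))
    tp = 0
    for pred in pred_masks:
        best_iou, best_idx = 0.0, None
        for idx in unmatched_gt:
            iou = mask_iou(pred, gt_masks[idx])
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            tp += 1
            unmatched_gt.remove(best_idx)  # each ground truth is matched once
    fp = len(pred_masks) - tp   # predictions without a ground-truth match
    fn = len(unmatched_gt)      # ground-truth objects that were missed
    return tp, fp, fn

def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics; zero is returned for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

To reproduce class-wise figures, the same matching would be run separately on the non-sclerotic and sclerotic masks.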
4.1. Pixel-Level Metrics
Pixel-level dataset metrics for both SegNet and DeepLab v3+ are reported in Table 5. The pixel-level class metrics of SegNet and DeepLab v3+ are reported in Table 6 and Table 7, respectively. The normalized pixel-level confusion matrices are reported in Table 8 and Table 9. Pixel-level confusion matrices are normalized per row; B, NS and S stand for Background, Non-sclerotic and Sclerotic, respectively.
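Row normalization of a confusion matrix, as used for the pixel-level tables, can be sketched as follows. The function name is hypothetical and the rows are assumed to correspond to the true classes (B, NS, S), so that each normalized row sums to 1 and its diagonal entry is the fraction of that class's pixels that were labelled correctly.

```python
import numpy as np

def normalize_rows(conf):
    """Divide each row by its sum; all-zero rows are left as zeros."""
    conf = np.asarray(conf, dtype=float)
    row_sums = conf.sum(axis=1, keepdims=True)
    return np.divide(conf, row_sums, out=np.zeros_like(conf), where=row_sums > 0)
```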
4.2. Object Detection Metrics
In the object detection confusion matrices, B, NS and S stand for Background, Non-sclerotic and Sclerotic, respectively.
The object detection confusion matrices for SegNet and DeepLab v3+ are reported in Table 10 and Table 11, respectively. The detection metrics for both the proposed models and a comparison with the method proposed by Marsh et al. [8] are reported in Table 12. Compared with the method of Marsh et al., the SegNet-based model obtained a better F-score for both glomeruli classes, whereas the DeepLab v3+-based model obtained a better F-score for non-sclerotic glomeruli and a slightly worse F-score for sclerotic glomeruli.
5. Conclusions and Future Work
The proposed approach achieved high performance at both the pixel level and the object detection level. The semantic segmentation achieved a mean F-score higher than 0.81 and a Weighted IoU higher than 0.97 for both the SegNet and DeepLab v3+ approaches; the glomeruli detection achieved a best F-score of 0.924 for non-sclerotic glomeruli and of 0.730 for sclerotic glomeruli. We compared our performance with the state of the art. As stated in Section 1, three main works address the problem of glomerular classification. Ginley et al. considered the glomerular assessment for patients affected by diabetic nephropathy, but not for transplantation purposes [7]. Hermsen et al. considered many tissue classes, but the number of sclerotic glomeruli in their datasets is too small for a comparison with our method [14]. Marsh et al. considered the problem of global glomerulosclerosis in kidney transplant biopsies stained with haematoxylin and eosin (HE) [8]. The performance comparison between our proposed methods and the work of Marsh et al. is reported in Table 12. The results show an improvement over the work of Marsh et al. Thus, CNNs for semantic segmentation are a viable approach to glomerular segmentation and classification, making it possible to obtain a reliable estimate of global glomerulosclerosis. In many centers, assessing the suitability of kidneys from ECD relies on the histological examination of kidney biopsies performed at the time of organ retrieval and processed and evaluated by an on-call pathologist, who is not necessarily an expert trained in renal pathology. The importance of training in renal pathology when assessing biopsies in such cases has been evaluated in some studies, which report that histological scores provided by renal pathologists correlate better with subsequent allograft outcome than those provided by general pathologists, who risk “overscoring” and thereby discarding kidneys that could have been transplanted [34,35,36]. The results were validated by renal pathologists, who assessed the reliability of the proposed workflow; the applied methodology constitutes a milestone in the creation of a CAD system for renal transplant assessment. The proposed system could help pathologists in the laborious task of evaluating the eligibility of a kidney for transplantation by providing a rapid and accurate result. Future work will include the use of Deep Learning models explicitly designed for the detection task, such as Faster R-CNN and Mask R-CNN.