Kidney cancer is the 9th most commonly occurring cancer in men and the 14th in women [1]. Renal cell carcinoma (RCC) is the most common and aggressive type of kidney cancer in adults. It affects nearly 300,000 individuals worldwide annually and is responsible for more than 100,000 deaths each year [2]. It develops in the lining of the proximal kidney tubules, where cancerous cells grow over time into a mass and may spread to other organs. The symptoms of RCC are usually hidden, and the disease is not easily diagnosed.
The histological classification of RCC is essential for cancer prognosis and the management of treatment plans. Manual tumor subtype classification from renal histopathologic slides is time-consuming and dependent on the pathologist’s experience. Each RCC subtype may lead to a different treatment plan, a different survival rate, and a completely different prognosis. We propose an automated CNN-based RCC subtype classification system to assist pathologists with renal cancer diagnosis and prognosis.
1.1. Background
RCC is a type of kidney cancer that arises from the renal parenchyma, as shown in Figure 1. The renal parenchyma is the functional part of the kidney, containing the tubules that filter the blood.
RCC has many different subtypes, shown in Figure 2, each with different histology, distinctive genetic and molecular alterations, different clinical courses, and different responses to therapy [2]. The three main histological types of RCC are as follows [4]:
Clear cell RCC (70–90%): This type of lesion contains clear cells due to their lipid- and glycogen-rich cytoplasmic content. The tumor also contains cells with eosinophilic granular cytoplasm. Imaging of clear cell RCC (ccRCC) reveals hypervascularized and heterogeneous lesions because of the presence of necrosis, hemorrhage, cysts, and multiple calcifications [5]. It has the worst prognosis among the RCC subtypes, with a 5-year survival rate of 50–69%. When it expands within the kidney and spreads to other parts of the body, treatment becomes challenging, and the 5-year survival rate falls to about 10%, according to the National Cancer Institute (NCI).
Papillary RCC (14–17%): This type occurs sporadically or as a familial condition. Pathologists usually partition it into two subtypes based on the lesion’s histological appearance and biological behavior: papillary type 1 and papillary type 2, which have entirely different prognostic factors, with type 2 carrying a poorer prognosis than type 1. Papillary cells appear in a spindle-shaped pattern with some regions of internal hemorrhage and cystic alterations [5].
Chromophobe RCC (4–8%): Chromophobe RCC is more frequent after the age of 60 and less aggressive than ccRCC. It exhibits the best prognosis among the three main RCC subtypes. Under the microscope, it appears as a mass formed of large pale cells with reticulated cytoplasm and perinuclear halos. If sarcomatoid transformation occurs, the lesion becomes more aggressive, with a worse survival rate [5].
Fifteen years ago, pathologists recognized clear cell papillary renal cell carcinoma (ccpRCC), shown in Figure 3, as the fourth most prevalent type of renal cell carcinoma. It possesses distinct morphologic and immunohistochemical characteristics and an indolent clinical course. Under the microscope, it may resemble other RCCs with clear cell characteristics, such as ccRCC, translocation RCC, and papillary RCC with clear cell alterations. A high index of suspicion is therefore needed to accurately diagnose ccpRCC in the differential diagnosis of RCCs with clear cell and papillary architecture. Its clinical behavior is highly favorable, with rare, questionable reports of aggressive behavior [6].
Recent studies have demonstrated that imaging modalities such as magnetic resonance imaging (MRI) and computerized tomography (CT) scans can differentiate ccRCC from other histological types [5]. A recent RC-CAD (computer-assisted diagnosis) system can differentiate between benign and malignant renal tumors [7]. The system relies on contrast-enhanced computed tomography (CE-CT) images to classify the tumor subtype as ccRCC or non-ccRCC. While it achieves high accuracy on the given data, CT imaging lacks accuracy when the renal mass is small, leading to more false-positive diagnostic decisions [8,9]. Since pre-surgical CT provides excellent anatomical features with the advantage of three-dimensional reconstruction of the renal tumor, the authors in [10] proposed another radiomic machine learning model for ccRCC grading. Their model differentiates low-grade from high-grade ccRCC with an accuracy of up to 91%. All of these clinical studies conclude that biopsy is the only gold standard for a definite diagnosis of renal cancer. Microscopic images of hematoxylin and eosin (H&E)-stained slides of kidney tissue biopsies are the pathologists’ main tools for tumor detection and classification. Normal renal tissues present round nuclei with uniformly distributed chromatin, while renal tumors exhibit large heterogeneous nuclei. Hence, many studies are based on nuclei segmentation and classification for cancer diagnosis.
Deep learning algorithms achieve human-level performance on several medical tasks, including cancer identification, classification, and segmentation. RCC subtype microscopic images exhibit different morphological patterns and nuclear features. Hence, we apply a deep learning approach to RCC detection and subtype classification to extract these essential features.
1.2. Related Work
Computational pathology is a discipline of pathology that entails extracting information from digital pathology slides and their accompanying metadata, usually utilizing artificial intelligence approaches such as deep learning. Whole slide imaging (WSI) technology is essential for assessing specimens’ histological features. WSIs are easy to access, store, and share, and pathologists can easily embed their annotations and apply different image analysis techniques for diagnostic purposes.
Deep learning (DL) methods have revolutionized medical fields such as oncology, radiology, neurology, and cardiology. DL approaches have demonstrated superior performance in image-based tasks such as image segmentation and classification [11]. DL provides the most successful tools for segmenting histology objects such as cell nuclei and glands in computational pathology. However, providing annotations for these algorithms is time-consuming and potentially biased. Recent research aims to automate this process by proposing new strategies that could help provide the many annotated images needed for deep learning algorithms [12,13].
Researchers have recently deployed deep learning in various CAD systems for cancerous cell segmentation, classification, and grading. Deep CNNs are able to extract learned features that effectively replace hand-crafted features. Most classification models in computational pathology can be grouped, based on the available data annotations, into two main categories:
Patch-level classification models: These models use a sliding-window approach on the original WSI to extract small annotated patches. Feeding CNNs with high-resolution WSIs is impractical due to the extensive computational burden. Hence, patch-wise models rely on patch-level annotations, in which expert pathologists label regions as cancerous cells, normal cells, or cells of a specific tumor subtype. In [14], the authors developed a deep learning algorithm to identify nasopharyngeal carcinoma in nasopharyngeal biopsies. Their supervised patch-based classification model relies on valuable, detailed patch-wise annotations. Pathologists require more than an hour to annotate even portions of a whole slide image; the resulting labels yield a highly accurate patch-level learning model, but the process is time-consuming and requires experienced pathologists. They employed gradient-weighted class activation mapping to extract the morphological features and visualize the decision-making in the deep neural network. They applied the patch-level algorithm to classify WSIs at the slide level, scoring an AUC of 0.9848 for slide-level identification. In [15], the authors present a deep learning patch-wise model for the classification of histologic patterns of lung adenocarcinoma. In their research, a patch classifier combined with a sliding-window approach was used to identify primary and minor patterns in whole-slide images. The model incorporates a ResNet CNN to differentiate between the five histological patterns and normal tissue. They evaluated the proposed model against three pathologists with a robust agreement of 76.7%, meaning that at least two of the three pathologists matched the model’s predictions.
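The sliding-window extraction step that these patch-level models describe can be illustrated with a minimal sketch. This is not the cited authors’ code: it assumes the WSI (or a region of it) has already been loaded as a NumPy array at the desired magnification, and the near-white background threshold of 220 is an illustrative value.

```python
import numpy as np

def extract_patches(image, patch_size=256, stride=256):
    """Slide a window over an RGB image array and collect square patches.

    image: (H, W, 3) array, e.g. a WSI region read at a chosen magnification.
    A stride smaller than patch_size produces overlapping patches.
    """
    h, w, _ = image.shape
    patches, coords = [], []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patch = image[y:y + patch_size, x:x + patch_size, :]
            # Skip mostly-background (near-white) patches, a common filtering step.
            if patch.mean() < 220:
                patches.append(patch)
                coords.append((x, y))
    return patches, coords
```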
WSI-level classification models: Most deep learning approaches employ slide-level annotations to detect and classify cancer from histopathology WSIs. These systems follow weakly supervised learning techniques to overcome the lack of large-scale datasets with precisely localized annotations. They usually incorporate a two-stage workflow (a patch-level CNN followed by a slide-level algorithm) known as multiple-instance learning (MIL). In [16], the authors developed a method for training neural networks on entire WSIs for lung cancer type classification. They applied the unified memory (UM) mechanism and several GPU memory optimization techniques to train conventional CNNs with substantial image inputs without modifying the training pipelines or model architectures. They achieved an AUC of 0.9594 and 0.9414 for adenocarcinoma and squamous cell carcinoma classification, respectively, outperforming conventional MIL techniques. The papers mentioned above deployed well-known CNN architectures such as ResNet and Inception-V3. In [17], a simplified CNN architecture (PathCNN) was proposed for efficient pan-cancer whole-slide image classification. PathCNN’s training data combine three datasets of lung cancer, kidney cancer, and breast cancer tissue WSIs. The proposed architecture converged faster with less memory usage than complex architectures such as Inception-V3. Their model was able to classify the TCGA dataset for the RCC subtypes TCGA-KIRC (ccRCC), TCGA-KIRP (papillary RCC), and TCGA-KICH (chromophobe RCC), with AUCs of 0.994, 0.994, and 0.992, respectively. However, they used a large dataset of 430 kidney WSIs partially annotated by specialists. In [18], the researchers presented a semi-supervised learning approach to identify RCC regions using minimal-point-based annotation. A hybrid loss strategy utilizes the results of their classification model for RCC subtyping. The minimal-point-based classification model outperforms whole-slide-based models by 12%, although it requires partially annotated data, which is not always available and is more prone to human error.
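The two-stage MIL workflow outlined above can be sketched in a few lines of PyTorch. This is a generic illustration under stated assumptions, not any cited system: it assumes a recent torchvision, a hypothetical bag tensor holding all patches of one slide, and a top-k aggregation rule chosen purely for the example (published MIL systems differ in both backbone and aggregation).

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TwoStageMIL(nn.Module):
    """Two-stage MIL sketch: a patch-level CNN scores every patch in a slide
    (the "bag"), then a slide-level rule aggregates the patch scores."""

    def __init__(self, num_classes=3, top_k=8):
        super().__init__()
        backbone = models.resnet18(weights=None)          # patch-level CNN
        backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
        self.patch_cnn = backbone
        self.top_k = top_k

    def forward(self, bag):                               # bag: (num_patches, 3, 224, 224)
        patch_probs = self.patch_cnn(bag).softmax(dim=1)  # per-patch class probabilities
        k = min(self.top_k, patch_probs.shape[0])
        topk_probs, _ = patch_probs.topk(k, dim=0)        # most confident patches per class
        return topk_probs.mean(dim=0)                     # slide-level probabilities

# Example: slide_probs = TwoStageMIL()(torch.randn(32, 3, 224, 224))
```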
Attention-based models have been utilized in computational pathology and have presented results comparable to conventional multiple-instance learning (MIL) approaches. In contrast, we propose a multiscale MIL framework to raise the classification performance above that of conventional MIL methods. Attention mechanisms can be employed to provide a dynamic representation of features by assigning a weight to each recognized feature, improving interpretability and visualization [19]. A clustering-constrained attention multiple-instance learning (CLAM) system was proposed in [20]. This weakly supervised deep learning approach employs attention-based learning to highlight sub-regions with high diagnostic value for more accurate tumor subtype classification. The system scores and ranks all the patches of a given WSI, assigning each patch an attention score that declares its importance to the overall slide-level representation of a particular class. The developers tested CLAM on three different computational pathology problems, one of which is renal cell carcinoma (RCC) subtyping. A 10-fold macro-averaged one-vs.-rest evaluation produced an accuracy of 0.991 for the three-class RCC subtyping of papillary (PRCC), chromophobe (CRCC), and clear cell renal cell carcinoma (CCRCC). The attention model of CLAM helps to learn subcategory features in the data while simultaneously filtering noisy patches. However, CLAM did not evaluate the effect of the clustering component. Therefore, replacing the single-instance CNN with a MIL design that incorporates clustering in an end-to-end approach is recommended [21]. Furthermore, CLAM disregards the correlation between instances and does not consider the spatial information between patches [22]. Therefore, in our proposed system, we extract overlapping patches to maintain the connectivity of the renal tissue and avoid any loss of contextual and spatial information. In [23], the authors implemented the CLAM model for prostate cancer grading and compared multiclass MIL with CLAM and their proposed system. The multiclass MIL outperformed CLAM in terms of classification accuracy.
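Attention-based MIL pooling of the kind these systems rely on can be sketched as follows. This is a generic illustration in the spirit of attention pooling, not CLAM’s exact architecture; it assumes each patch has already been embedded into a feature vector (e.g., by a pretrained CNN), and the feature dimension of 512 is a hypothetical choice.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Attention-based MIL pooling sketch: patch features are weighted by
    learned attention scores and summed into one slide-level embedding."""

    def __init__(self, feat_dim=512, hidden_dim=128, num_classes=3):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):                                   # feats: (num_patches, feat_dim)
        weights = torch.softmax(self.attention(feats), dim=0)   # one attention weight per patch
        slide_feat = (weights * feats).sum(dim=0)               # attention-weighted slide embedding
        logits = self.classifier(slide_feat)                    # slide-level class logits
        return logits, weights.squeeze(1)                       # weights indicate diagnostic relevance

# Example: logits, attn = AttentionMILPooling()(torch.randn(100, 512))
```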
Proteomics data can be combined with histopathology images through machine learning for RCC diagnosis [24]. The proposed proteomics-based RF classifier achieved an overall accuracy of 0.98 (10-fold cross-validation) in distinguishing between ccRCC and normal tissues, with an average sensitivity of 0.97 and specificity of 0.99, while the histology-based classifier scored an accuracy of 0.95 on the test dataset. However, the proposed system was not tested for generalization on different datasets, although it highlights the importance of investigating the predictive relationships between histopathology- and proteomics-based diagnostic models.
Recent research employs multiscale WSI patch classification approaches to reflect pathologists’ actual practice in classifying cancerous regions. To provide a precise evaluation, a pathologist needs to examine the slides at different magnification levels to utilize various cellular features at different scales. In [25], the authors proposed a multiscale deep neural network for bladder cancer classification. The model can effectively extract similar tissue areas using an ROI extraction method. Their multiscale model achieved an F1-score of 0.986; however, their patch-based classification model uses data carefully annotated by expert pathologists. On the other hand, the researchers in [26] proposed another multiscale MIL CNN for cancer subtype classification using slide-level annotations, which is similar to our research. Their proposed system addressed malignant lymphoma subtype classification. The multiscale domain-adversarial MIL approach provided an average accuracy of 0.871 under 5-fold cross-validation. The authors in [27] proposed a state-of-the-art semantic segmentation model for histopathology images. HookNet utilizes concentric patches at multiple resolutions, fed to a series of convolutional and pooling layers, to effectively combine contextual information with the WSI’s fine details. The authors present two high-resolution semantic segmentation models for lung and breast cancer. HookNet outperforms the single-resolution U-Net and other conventional multi-resolution segmentation models for histopathology images.
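The concentric, multi-resolution patch idea these multiscale methods build on can be illustrated with a short NumPy sketch. The patch sizes and output size below are illustrative assumptions, and the resize uses nearest-neighbour indexing only to keep the example dependency-free.

```python
import numpy as np

def concentric_patches(image, center, sizes=(128, 256, 512), out_size=128):
    """Extract concentric patches around one center at increasing fields of view,
    each resized to a common shape. The smallest patch carries fine cellular
    detail; the larger ones carry tissue-level context.

    image: (H, W, 3) array; center: (x, y) pixel coordinates.
    """
    x, y = center
    h, w, _ = image.shape
    patches = []
    for s in sizes:
        half = s // 2
        x0, y0 = max(0, x - half), max(0, y - half)
        x1, y1 = min(w, x + half), min(h, y + half)
        crop = image[y0:y1, x0:x1, :]
        # Nearest-neighbour resize to a common (out_size, out_size) grid.
        ys = np.linspace(0, crop.shape[0] - 1, out_size).astype(int)
        xs = np.linspace(0, crop.shape[1] - 1, out_size).astype(int)
        patches.append(crop[np.ix_(ys, xs)])
    return patches  # one (out_size, out_size, 3) patch per scale
```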
In this research work, we propose a multiscale learning approach for renal cell carcinoma subtyping. To the best of our knowledge, this is the first study to combine the decisions of three CNNs in order to provide high accuracy in histopathology classification. We address the problem of choosing the optimal patch size for pathology slide classification while imitating the pathologists’ practice, since small patches close to the cell’s size embody different features than larger patches that maintain the connectivity between the different cells. The main contributions of our proposed framework are as follows:
- i.
The final classification decision is obtained by a multiscale learning approach to integrate the global features extracted from the large-size patches and the local ones extracted from the smaller patches.
- ii.
The end-to-end system framework is fully automated; hence, hand-crafted feature extraction methods are not required.
- iii.
The decision fusion approach discards the patches that do not represent the RCC subtype features we are looking for in our diagnosis, which significantly improves the classification accuracy (a minimal sketch of this fusion step follows this list).
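A minimal sketch of such a decision-fusion step is given below. It illustrates the general idea rather than the exact fusion rule of our framework: the confidence threshold, the per-scale averaging, and the final vote across the three scale-specific CNNs are all assumptions made for the example.

```python
import numpy as np

def fuse_scale_decisions(probs_per_scale, conf_threshold=0.6):
    """Fuse the softmax outputs of three scale-specific CNNs into one slide label.

    probs_per_scale: list of three (num_patches, num_classes) arrays, one per
    patch scale. Patches whose best class probability is below conf_threshold
    (an illustrative value) are discarded before the slide-level decision.
    """
    scale_votes = []
    for probs in probs_per_scale:
        confident = probs[probs.max(axis=1) >= conf_threshold]   # keep informative patches
        if confident.size:
            scale_votes.append(confident.mean(axis=0))            # per-scale slide-level vote
    if not scale_votes:
        return None                                               # no confident evidence at any scale
    slide_probs = np.mean(scale_votes, axis=0)                    # fuse the scale-level votes
    return int(slide_probs.argmax()), slide_probs
```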