1. Introduction
OKCs account for 3–11% of all jaw cysts. They are benign neoplasms of odontogenic origin recognized for their invasive tendency. Among odontogenic cysts, OKCs are of particular interest because of their high propensity to recur after surgical treatment, with reported recurrence rates varying widely (2–100%). These large differences are attributable to varying postoperative follow-up periods, the surgical methods used, and the inclusion of nevoid basal cell carcinoma syndrome (NBCCS) cases [
1,
2].
It is essential to note that distinguishing recurring from non-recurring OKCs can be challenging on the basis of clinical and radiographic features alone. Other factors, such as the specific location of the cyst, patient characteristics, and genetic factors, may also influence the likelihood of recurrence. Close follow-up, supported by histopathological confirmation, is essential in managing OKCs to monitor recurrence and ensure appropriate treatment [
3].
According to the literature, the histological determinants predicting recurrence include parakeratinization, basal mitosis, subepithelial split, satellite cysts, dental lamina rests, basal cell budding, reverse polarity, dense collagen, and diffuse inflammation. Recently added histopathological (h/p) features that strongly suggest recurrence include subepithelial hyalinization, an incomplete cystic lining, and a corrugated/wavy surface [
4,
5,
6].
Patch-level classification, with patch-level CNN outputs aggregated by a fusion model, yields better classification of the entire slide image. An expectation maximization (EM) technique robustly and automatically identifies the discriminative patches, utilizing the spatial relationships among them [
7].
Automating whole slide classification of cancer images is quite common. A lower zoom level is recommended for detecting artifacts, tissue areas, and abnormalities. WSIs can be stored for years and revisited for different analyses. Automation can help minimize diagnostic errors or provide pathologists with a first opinion. Identifying the relevant areas of interest inside WSIs is critical for the algorithm to succeed. ViT performs better than CNNs at higher WSI resolutions [
8].
Manual analysis of a whole slide image is time-consuming. The availability and accessibility of powerful computers make it feasible to automate the detection of different diseases. A key limitation of current computer-aided systems is the scarcity of standard datasets with reliable annotations. A CNN may fail to extract features from specific areas of interest in a whole slide image; hence, an attention mechanism is an appropriate addition to a CNN to achieve better performance in such cases [
9].
This study aims to develop an automation system that classifies whole slide images as either recurring OKC or non-recurring OKC based on specific h/p features. The current study is the first to analyze recurring and non-recurring OKC using whole slide images [
10].
Also called a digital or virtual slide, a WSI is a high-resolution digital rendering of an entire histopathology glass slide encapsulated in gigabytes of data. This technology captures the comprehensive image in a single sweep, affording the capability to zoom in and out on regions of interest (ROIs), an otherwise arduous task when using microscopy. The resultant digital depiction is an expansive, multi-gigapixel file, meticulously conserving all the inherent data within the original glass slide. This WSI platform opens the door to diverse image analysis methodologies, encompassing computer-aided algorithms for quantifying and extracting features [
11].
2. Related Work
2.1. Related Work on Whole Slide Image Challenges
The successful utilization of deep learning in analyzing whole slide images (WSIs) holds the potential to develop advanced clinical tools that excel in accuracy, reproducibility, and impartiality compared with current clinical methods. This approach also offers fresh insights into various pathological conditions. However, WSIs are large, multi-gigabyte images with resolutions of around 100,000 × 100,000 pixels. Existing hardware struggles to accommodate learning from such high-resolution images, necessitating some form of dimensionality reduction [
12].
2.2. Related Work on Preprocessing Images and Class Imbalances
In a previous study, whole slide images were taken from The Cancer Genome Atlas (TCGA) dataset and stained using hematoxylin and eosin. All images at the highest resolution were taken and resized, and tiling at a size of 1024 × 1024 was performed at 20× magnification. A trained pathologist inspected all the tiles to label them; tiles with little information were discarded [
13].
Generative adversarial networks (GANs) are very good at generating synthetic data that preserve the probability distribution of the original data. A hybrid system with a GAN often solves the class imbalance problem: class imbalance can lead to poor performance, as a recommendation system may learn to favor the majority class, and a hybrid GAN approach addresses this by generating synthetic negative examples to balance the dataset, improving the performance of the recommendation system. In one study, GANs were trained in an adversarial setting, where the generator and discriminator constantly tried to outsmart each other. This process forced the generator to learn to produce increasingly realistic data while the discriminator learned to become better at differentiating between real and fake data [
14].
2.3. Related Work on Vision Transformer in Image Processing
The advancements achieved using transformer networks in natural language processing have sparked significant interest among the computer vision community to apply these models to vision-related tasks. These fundamental concepts played a role in the development of traditional transformer models. The concept of self-attention enables capturing “long-term” connections between elements within a sequence, a capability lacking in conventional recurrent models that struggle to encode such associations [
15].
The vision transformer (ViT) stands out as a trailblazer in demonstrating that a pure transformer architecture can achieve exceptional performance comparable to models like ResNets and EfficientNet in image classification tasks. This accomplishment becomes evident when dealing with sufficiently large datasets like ImageNet-22k and JFT-300M. ViT’s methodology involves partitioning each image into sequences of fixed-length tokens (non-overlapping patches) and subsequently using standard transformer layers, which encompass both the multi-head self-attention module (MHSA) and the position-wise feed-forward module (FFN), to examine and depict these tokens [
16].
2.4. Related Work on Deep Learning in OKC
In a previous study, a multi-model ensemble learning technique delivered satisfactory outcomes when distinguishing between recurrent and non-recurrent categories of OKCs. The predictions generated by individual classifiers and by the conventional ensemble method were also effective, achieving accuracies spanning from 85% to 93% on the dataset. In that study, the multi-model ensemble outperformed the other evaluated models, including the traditional ensemble [
17].
3. Materials and Methods
This section elaborates on preprocessing and the proposed algorithm with a flow diagram.
3.1. Data Collection
A collaborative study pooled slide archives from multiple centers, consolidating 48 histopathology slides out of 113 cases reported between 2015 and 2020.
The Faculty of Dental Sciences, MSRUAS; Manipal College of Dental Sciences (MCODS), Manipal; the Institute of Dental Science, Bareilly; the S Nijalingappa Institute of Dental Sciences and Research, Rajapur, Gulbarga, Karnataka; Maratha Mandal’s Nathajirao G Halgekar Institute of Dental Science & Research Centre, Belgaum, Karnataka; and the SVS Institute of Dental Sciences, Mahbubnagar, Andhra Pradesh, India, voluntarily participated in the research initiative. Ethics approval was waived due to the retrospective nature of this study, and the slides collected from the other centers were anonymized and encoded to eliminate any patient identifiers. Our institute permitted this study with ethics clearance (NO.EC-2021/F/058) for the archived slides. Based on the features identified in the pilot study by Augustine, D., Rao, R.S., et al. (2021) [
18], histopathological features were re-evaluated on H and E-stained slides and recorded using WSI [
18]. This study included slides with no artifacts, good staining quality, and complete clinical records with post-treatment follow-up.
The slides were digitized using a whole slide imaging scanner (Morphle Labs Whole Slide Scanner, model INDEX). A total of 48 OKC whole slide images were collected at various zoom levels. Slide sizes varied from 50 megabytes to 3 gigabytes depending on the scanner and the zoom level used. Of these, 17 recurring and 31 non-recurring slides were identified by expert pathologists.
3.2. Data Preprocessing and Dataset Generation
A total of 48 of the 113 OKC WSIs (e.g.,
Figure 1 and
Figure 2) were identified for this purpose and evaluated by a pathologist, who segregated and annotated them as recurring and non-recurring slides. An experienced pathologist performed the annotation; to rule out subjectivity and inter-observer bias, a third pathologist, who was part of this study, was consulted, and the pathologists arrived at a consensus. Post-annotation, the slides were processed using an automated tile image generation system built on the open-source OpenSlide library and its DeepZoomGenerator. This system generated tiles of size 2048 × 2048 and discarded white tiles by computing the entropy and variance of the pixel values in each tile: tiles with very low entropy or variance were likely to contain uniform or little information and were treated as empty (white) tiles. Each remaining tile was then re-inspected by the pathologist and correctly labeled as recurring or non-recurring. The highest zoom-level slides were used to generate the tiles during this preprocessing. The entire dataset generation process flow is shown in
Figure 3.
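The white-tile filtering described above can be sketched as follows; the entropy and variance cutoffs here are illustrative assumptions (the text does not report exact values), and the commented lines show how OpenSlide’s DeepZoomGenerator could supply the tiles:

```python
import numpy as np

def is_informative(tile, entropy_thresh=4.0, var_thresh=50.0):
    """White-tile filter: keep a tile only if the entropy of its grayscale
    histogram and its pixel variance both exceed the thresholds.
    The threshold values are illustrative assumptions, not the paper's."""
    gray = np.asarray(tile, dtype=np.float64)
    if gray.ndim == 3:                      # RGB(A) -> simple luminance
        gray = gray[..., :3].mean(axis=-1)
    hist, _ = np.histogram(gray, bins=256, range=(0, 255))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    entropy = float(-(p * np.log2(p)).sum())
    return entropy > entropy_thresh and float(gray.var()) > var_thresh

# Tiling sketch (requires the non-stdlib `openslide` package):
# from openslide import OpenSlide
# from openslide.deepzoom import DeepZoomGenerator
# slide = OpenSlide("case.svs")
# dz = DeepZoomGenerator(slide, tile_size=2048, overlap=0)
# level = dz.level_count - 1              # highest zoom level
# cols, rows = dz.level_tiles[level]
# kept = [(c, r) for c in range(cols) for r in range(rows)
#         if is_informative(dz.get_tile(level, (c, r)))]
```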
3.3. Attention-Based Image Sequence Analyzer
The attention-based image sequence analyzer (ABISA) architecture is a hybrid model combining elements of a multi-head self-attention-based transformer architecture with an LSTM layer, the latter used to capture temporal dependencies. This combination leverages the spatial relationships captured by the self-attention mechanism and enhances them with the LSTM’s ability to model sequential patterns. The architecture flow diagram is described in
Figure 4.
3.4. Image Data Augmentation
Using the Keras image data generator, preprocessed tiles with the appropriate labels were augmented during the training process. A 70-10-20 split for training, validation, and testing was used, i.e., 20% of the tiled images from the different classes were used for testing the classification model. The following augmentation parameters were used: rotation_range = 20, width_shift_range = 0.1, height_shift_range = 0.1, shear_range = 0.2, zoom_range = 0.2, and horizontal_flip = True.
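The listed augmentation settings map directly onto Keras’s ImageDataGenerator:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings exactly as listed in the text; the 70-10-20 split
# itself is handled separately when building the train/val/test sets.
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)
```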
3.5. Patch Extraction
The patch extraction layer is a custom Keras layer that takes an input image and extracts non-overlapping patches. This allows the model to process smaller image regions independently, providing spatial invariance and reducing computational complexity. With a patch size of 6, each input image was divided into non-overlapping patches of 6 × 6 pixels. The calculation num_patches = (image_size // patch_size) ** 2 determines the total number of patches extracted from an input image. For an image of dimensions 64 × 64 (image_size = 64) and patches of size 6 × 6, (64 // 6) ** 2 = 100 patches were extracted, each of size 6 × 6 pixels. The patches layer and subsequent processing extracted and processed these patches for further transformation and classification.
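A minimal NumPy sketch of this patch arithmetic (not the actual Keras layer) makes the integer-division behavior explicit: 4 edge pixels per dimension are simply dropped.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split an H x W (x C) image into non-overlapping patch_size x patch_size
    patches, each flattened to a vector. Integer division means borders that
    do not fit are dropped: for 64 x 64 images and patch_size = 6, 64 // 6 = 10
    patches per side (100 total), discarding 4 edge pixels per dimension."""
    h, w = image.shape[:2]
    n_h, n_w = h // patch_size, w // patch_size
    img = image[: n_h * patch_size, : n_w * patch_size]
    patches = img.reshape(n_h, patch_size, n_w, patch_size, -1).swapaxes(1, 2)
    return patches.reshape(n_h * n_w, -1)
```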
3.6. Patch Encoder Layer
The patch encoder layer takes the extracted patches and encodes them into a meaningful representation. It uses dense (fully connected) layers and an embedding layer to map the patches into a higher-dimensional space, and it incorporates positional embeddings to retain spatial information about the original image patches. This model uses projection_dim = 64, meaning each patch is projected to a 64-dimensional embedding. These encoded patch representations are then used in subsequent layers of the ABISA model for tasks like self-attention and classification.
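The projection-plus-positional-embedding step can be illustrated with a minimal NumPy sketch, with random matrices standing in for the learned Dense and Embedding weights:

```python
import numpy as np

class PatchEncoder:
    """Project each flattened patch to projection_dim and add a positional
    embedding. Weights are randomly initialized here purely for illustration;
    in the model they are learned (Dense + Embedding layers)."""
    def __init__(self, num_patches, patch_dim, projection_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, (patch_dim, projection_dim))
        self.b = np.zeros(projection_dim)
        self.pos = rng.normal(0.0, 0.02, (num_patches, projection_dim))
    def __call__(self, patches):          # patches: (num_patches, patch_dim)
        return patches @ self.W + self.b + self.pos
```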
3.7. Multi-Head Self-Attention Mechanism
A multi-head self-attention mechanism is used to analyze the encoded patches, capturing complex relationships and dependencies among them. The multi-head attention layer involves three key inputs: queries, keys, and values.
The number of attention heads determines how many attention operations run in parallel; each head specializes in learning distinct aspects of the relationships between input patches. Within each head, the dimensionality defined by the key dimension is applied consistently to queries, keys, and values; all are set to 64 in this case.
A dropout rate is applied to the attention scores for regularization. The model computes attention scores to determine how each element in the encoded patches attends to every other element within the sequence, known as self-attention. Based on the relationships and interactions between the patches, the multi-head self-attention operation produces the attention output.
Skip Connection and Layer Normalization: After the attention operation, the attention output is combined with the original encoded patches using a skip connection (element-wise addition). Layer normalization is applied to the combined output to ensure stable training and to help with gradient flow during training.
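The sequence described above (self-attention, skip connection, layer normalization) can be sketched in NumPy; a single head is used for brevity, and all weight matrices are assumptions standing in for learned parameters:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize each row to zero mean and unit variance."""
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def attention_block(x, Wq, Wk, Wv, Wo):
    """One attention sub-block: scaled dot-product self-attention over the
    encoded patches, a skip (residual) connection, then layer normalization.
    A single head is shown for brevity; the model uses multi-head attention."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # softmax: rows sum to 1
    attn = (weights @ v) @ Wo                   # attention output
    return layer_norm(x + attn)                 # skip connection + layer norm
```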
3.8. LSTM Layer
The proposed LSTM layer had 32 units in its hidden and cell states. The encoded patches with attention were passed through the LSTM layer, which processes them sequentially, respecting the temporal order of the patches. This layer performs sequence modeling and captures temporal dependencies in the data. The output is a sequence of feature vectors representing the processed input, incorporating both spatial and sequential information.
Figure 5 shows the model summary of the proposed model.
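The LSTM recurrence over encoded patches can be sketched in NumPy with 32 units as in the model; the weights are random placeholders, so this illustrates the gated mechanics rather than trained behavior:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(seq, units=32, seed=0):
    """Minimal LSTM forward pass over the sequence of encoded patches,
    with 32 units as in the model. Weights are random (untrained); the
    point is the gated recurrence that carries state across patches."""
    d = seq.shape[-1]
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.1, (d + units, 4 * units))  # fused i, f, g, o gates
    b = np.zeros(4 * units)
    h = np.zeros(units)                               # hidden state
    c = np.zeros(units)                               # cell state
    for x in seq:                                     # one patch per time step
        z = np.concatenate([x, h]) @ W + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state update
        h = sigmoid(o) * np.tanh(c)                   # new hidden state
    return h                                          # final 32-dim summary
```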
3.9. Normalization and Flattening
The LSTM output was normalized and flattened to prepare it for further processing.
3.10. Dropout
A dropout layer, with a rate of 0.5 (50%) in this case, helps prevent overfitting. It reduces reliance on specific features and encourages the model to learn more robust representations.
3.11. Multi-Layer Perceptron (MLP)
The GELU (Gaussian Error Linear Unit) activation function was used in the classification layer to predict the output.
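A minimal sketch of the GELU activation and a dense classification head of the kind described (the helper names and placeholder weights are ours, for illustration):

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the Gaussian Error Linear Unit."""
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp_head(features, W1, b1, W2, b2):
    """Dense -> GELU -> dense classification head; weights are placeholders."""
    return gelu(features @ W1 + b1) @ W2 + b2
```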
4. Results
The dataset was used to run experiments with a standard CNN, pre-trained models, and vision transformer algorithms for image classification. The proposed model, which classifies recurring and non-recurring OKC, performed best on the given dataset. Since the proposed model extends the standard state-of-the-art vision transformer architecture, the following section compares the proposed model with the standard vision transformer model.
Table 1 describes the overall results of the performance metrics of various experiment models.
Table 2 gives all the hyperparameters used in the proposed model.
4.1. Confusion Matrix
A confusion matrix is a 2 × 2 table used in classification tasks to assess the performance of a machine learning model, as shown in
Figure 6 below.
4.2. ROC (Receiver Operating Characteristic) Curve
The ROC curve is used to evaluate the performance of classification models. It plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) at various classification thresholds, illustrating how well the model distinguishes between positive and negative classes. The area under the ROC curve (AUC) is a standard metric; a higher AUC indicates better discrimination ability. The AUC is especially useful when class imbalances exist in the dataset, and it provides a single value for comparing classifiers, making it easier to evaluate and choose the best model for a given task. The proposed classifier’s score of 0.98 indicates better performance than the standard ViT’s score of 0.94.
The ROC curve is given in
Figure 7 for the proposed model.
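With scikit-learn, the ROC curve and AUC are computed as below; the labels and probabilities are made-up illustrative values, not the study’s results:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative tile-level labels and predicted probabilities (made-up values,
# not the study's data): 1 = recurring, 0 = non-recurring.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.35, 0.2])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # points along the ROC curve
auc = roc_auc_score(y_true, y_prob)               # area under that curve
```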
4.3. Training vs. Validation Loss Curve
The training vs. validation loss curve is a plot that shows the changes in the training and validation loss during the training process of a machine learning or deep learning model.
The training vs. validation loss curve was plotted with the epochs (training iterations) on the
x-axis and the corresponding loss values on the
y-axis. As the model was trained over multiple epochs, the training loss generally decreased because the model was learning to fit the training data better. However, the validation loss might behave differently. Initially, it decreased along with the training loss as the model generalized better. However, at some point, the validation loss started to increase. This indicates that the model was overfitting the training data, and its performance on the validation data was deteriorating, even though it improved on the training data. So, training should be stopped at this epoch, which is the 24th epoch in this study.
Figure 8 shows the curve.
4.4. Classification Report—ABISA
A classification report summarizes performance metrics for a classification model, typically presented in a tabular format. It includes key metrics such as precision, recall, F1-score, and accuracy for each class in a classification problem. This report provides insights into the model’s performance for individual classes, highlighting strengths and weaknesses. It is a valuable tool for evaluating the effectiveness of a classification model across different categories.
Table 3 describes the classification report for the proposed model.
4.5. Log Loss
Log loss (logarithmic loss) is a commonly used loss function for evaluating the accuracy of probabilistic classification models, such as logistic regression or neural networks, that predict probabilities for each class. It measures the discrepancy between predicted probabilities and target values, penalizing more significant deviations. Lower log loss values indicate better alignment between predicted probabilities and actual outcomes.
Table 4 describes a comparison of log loss among the models.
The proposed model has a log loss value of 0.13, indicating that its predicted probabilities are quite accurate and very close to the true labels. In binary classification, a log loss close to zero indicates excellent performance, as the model’s predicted probabilities align well with the actual outcomes.
The standard ViT’s log loss of 1.04 means that, on average, its predicted probabilities are considerably farther from the true labels; its prediction confidence is relatively low compared with the proposed model.
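Log loss is simple to compute directly; the following sketch is equivalent to scikit-learn’s log_loss for the binary case:

```python
import numpy as np

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p)).
    Probabilities are clipped to avoid log(0)."""
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())
```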
Table 5 describes different metrics for the proposed model.
4.6. Pipeline Result
The devised pipeline system predicts whether an entire whole slide image (WSI) corresponds to a recurring or non-recurring OKC. It takes a WSI as input, tiles it in the preprocessing step, and then applies the proposed attention-based image sequence analyzer to classify each tile as recurring or not. Based on the counts of predicted tiles in the two classes, a designated threshold was applied. This threshold was established considering factors such as slide size and zoom level: while a 15% threshold was typically effective, smaller slides (up to 500 megabytes) might necessitate a lower threshold. The threshold value was determined in collaboration with a pathologist with comprehensive knowledge of the slide scanner. This approach was tested across seven distinct slides (four non-recurring and three recurring H and E-stained WSIs) that were not used during the training and validation of the model, yielding the statistical outcomes below. The prognostic accuracy of the framework was independently reported in the presence of a pathologist. Therefore, this model can be clinically applied to classify recurrent and non-recurrent OKCs.
Figure 9 describes the process flow.
Table 6 describes the sample whole slide image taken for pipeline experiments.
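The slide-level decision rule can be sketched as follows; the 15% default comes from the text, while the function and label names are illustrative:

```python
def classify_slide(tile_predictions, threshold=0.15):
    """Slide-level call from tile-level labels: the WSI is flagged as
    recurring when the fraction of tiles predicted 'recurring' exceeds
    the threshold (15% default, per the text; smaller slides may need a
    lower value). Function and label names are ours, for illustration."""
    n = len(tile_predictions)
    if n == 0:
        return "non-recurring"
    frac = sum(1 for p in tile_predictions if p == "recurring") / n
    return "recurring" if frac > threshold else "non-recurring"
```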
5. Discussion
This study performed different experiments with both the proposed algorithm and the state-of-the-art algorithms available for whole slide image processing. Due to the massive size of a whole slide image, it was essential to split each image into multiple smaller tiled images of size 2048 × 2048, which made it practical to visualize the images on a standard computer and have an expert pathologist label them correctly. This manual labeling was a somewhat lengthy process, as each tiled image was labeled individually. Experiments on this custom-labeled dataset included training standard models such as a CNN, VGG16, VGG19, and Inception V3. Model hyperparameters were tuned repeatedly across multiple experiments, with the objective of high classification accuracy. The proposed ABISA model was built on top of the vision transformer model: with the increasing application of vision transformers in computer vision, the ViT architecture was customized here to suit the OKC histopathology image dataset. Introducing an LSTM layer of size 32 after the self-attention block reduced the feature dimension from 64 to 32. This dimension reduction contributed to a significant reduction in the number of parameters, as the recurring versus non-recurring OKC problem has a comparatively small feature space, making the model more computationally and parameter efficient. Reducing the dimensionality of the features before feeding them into the dense layers was particularly useful given the limited number of training images. Multiple experiments were conducted to arrive at the LSTM layer size of 32. The LSTM layer’s integration helped maintain a memory of past states while processing new patches, which is crucial for recognizing patterns that span multiple patches in an image; it can learn to recognize object shapes, contours, and other sequential patterns that contribute to image classification.
The LSTM was beneficial when dealing with sequences of images where temporal relationships play a crucial role. Overall, by including an LSTM layer, this architecture creates a hybrid model that capitalizes on both spatial and temporal dependencies. This integration is beneficial when working with images containing sequential patterns or when aiming for higher-level feature abstraction. By training on OKC images with complete clinical variables and follow-up, the possibility of bias was minimized. The trained model classified the risk of recurrence into recurrent and non-recurrent OKCs based on histopathological evaluation. Further validation was performed by blinding the images pooled from other centers to rule out false positives and negatives, meeting clinical standards.
6. Conclusions
The proposed pipeline for risk stratification of OKCs is a powerful tool for improving patients’ dental health. The proposed model reduced trainable parameters significantly (by 58%) compared with its peer state-of-the-art algorithms, including the vision transformer. A recall of 1.0 and a precision of 0.96 indicate confidence in the proposed model’s ability to detect recurring OKC correctly. These results stratified risks into the correct groups of recurring or non-recurring OKC. The pipeline significantly aids pathologists in detecting recurring and non-recurring OKC, and the model can be utilized locally or remotely on any OKC WSI. The attention-based image sequence analyzer (ABISA) model reduces training time significantly and can process an entire WSI in less than 10 min, with execution time depending mostly on WSI size. The entire process was automated to take WSIs as input and conclude with risk stratification for odontogenic keratocysts, making the management of such patients comparatively smoother for dentists.
7. Future Work
Although the proposed model has very high accuracy in detecting recurring and non-recurring OKC at the tile level, slide-level accuracy depends on the number of tiles exhibiting recurring and non-recurring features. The dataset excludes blurred images and discards images that are not adequately stained. This study could help clinicians plan surgical management well in advance based on the automated h/p report, adding to the advantage of adopting a conservative mode of treatment in our institute. Transformer-based architectures are rapidly gaining popularity in computer vision applications; hence, this research can be suitably extended to different cancers where whole slide images are the gold standard for detection.
Author Contributions
R.S.R.: conceptualization and pathology; D.B.S.: research methodology and research guide; S.M.: algorithm definition and implementation; M.A.: dataset; C.C.: dataset; R.R.: dataset; S.S.: dataset; V.K.: dataset; P.K.: dataset. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the Government of India, Science and Engineering Research Board (SERB), Promoting Opportunities for Women in Exploratory Research (POWER), under grant number SPG/2020/000426-G.
Institutional Review Board Statement
This research was conducted at MS Ramaiah Dental College, MS Ramaiah University of Applied Sciences. Ethics approval was waived due to the study’s retrospective nature, and the collected slides were anonymized and encoded to eliminate any patient identifiers.
Informed Consent Statement
Not applicable.
Data Availability Statement
Restrictions apply to the availability of these data. The data were obtained from the mentioned centers and are available from the authors with permission.
Acknowledgments
We are very grateful to the oral pathologists of the Manipal College of Dental Sciences (MCODS), Manipal, Institute of Dental Science, Bareilly, S Nijalingappa Institute of Dental Sciences and Research, Rajapur, Gulbarga, Karnataka, Maratha Mandal’s Nathajirao G Halgekar, Institute of Dental Science & Research Centre, Belgaum, and SVS Institute of Dental Sciences, Mahbubnagar, Andhra Pradesh.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Kuroyanagi, N.; Sakuma, H.; Miyabe, S.; Machida, J.; Kaetsu, A.; Yokoi, M.; Maeda, H.; Warnakulasuriya, S.; Nagao, T.; Shimozato, K. Prognostic factors for keratocystic odontogenic tumor (odontogenic keratocyst): Analysis of clinicopathologic and immunohistochemical findings in cysts treated by enucleation. J. Oral Pathol. Med. 2009, 38, 386–392. [Google Scholar] [CrossRef] [PubMed]
- Radhakrishnan, R.; Chandrashekar, C.; Patel, P.; Thennavan, A. Odontogenic keratocyst: Analysis of recurrence by AgNOR, p53 and MDM2 profiling. J. Oral Maxillofac. Pathol. 2020, 24, 184–185. [Google Scholar] [CrossRef] [PubMed]
- Diniz, M.G.; Borges, R.; Guimarães, A.L.S.; Moreira, P.R.; Brito, J.A.R.; Gomez, M.V.; De Marco, L.; Gomez, R.S. PTCH1 isoforms in odontogenic keratocysts. Oral Oncol. 2009, 45, 291–295. [Google Scholar] [CrossRef] [PubMed]
- Cottom, H.E.; Bshena, F.I.; Speight, P.M.; Craig, G.T.; Jones, A.V. Histopathological features that predict the recurrence of odontogenic keratocysts. J. Oral Pathol. Med. 2011, 41, 408–414. [Google Scholar] [CrossRef] [PubMed]
- Augustine, D.; Rao, R.S.; Patil, S. Hyalinization as a histomorphological risk predictor in oral pathological lesions. J. Oral Biol. Craniofacial Res. 2021, 11, 415–422. [Google Scholar] [CrossRef] [PubMed]
- Keshani, F.; Jahanshahi, G.; Mirkazemi, Z.; Mirkazemi, H. Evaluating histopathological factors of predicting the recurrence rate of odontogenic keratocyst. Dent. Res. J. 2023, 20, 12. [Google Scholar] [CrossRef]
- Hou, L.; Samaras, D.; Kurc, T.M.; Gao, Y.; Davis, J.E.; Saltz, J.H. Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2424–2433. [Google Scholar] [CrossRef]
- Hossain, S.; Shahriar, G.M.; Syeed, M.M.M.; Uddin, M.F.; Hasan, M.; Shivam, S.; Advani, S. Region of interest (ROI) selection using vision transformer for automatic analysis using whole slide images. Sci. Rep. 2023, 13, 11314. [Google Scholar] [CrossRef] [PubMed]
- Rashmi, R.; Prasad, K.; Udupa, C.B.K. Breast histopathological image analysis using image processing techniques for diagnostic purposes: A methodological review. J. Med. Syst. 2021, 46, 1–24. [Google Scholar] [CrossRef] [PubMed]
- Fidele, N.; Yueyu, Z.; Zhao, Y.; Tianfu, W.; Liu, J.; Sun, Y.; Liu, B. Recurrence of odontogenic keratocysts and possible prognostic factors: Review of 455 patients. Med. Oral Patol. Oral Cir. Buccal 2019, 24, e491. [Google Scholar] [CrossRef] [PubMed]
- Aeffner, F.; Zarella, M.D.; Buchbinder, N.; Bui, M.M.; Goodman, M.R.; Hartman, D.J.; Lujan, G.M.; Molani, M.A.; Parwani, A.V.; Lillard, K.; et al. Introduction to Digital Image Analysis in Whole-slide Imaging: A White Paper from the Digital Pathology Association. J. Pathol. Inform. 2019, 10, 9. [Google Scholar] [CrossRef] [PubMed]
- Dimitriou, N.; Arandjelović, O.; Caie, P.D. Deep Learning for Whole Slide Image Analysis: An Overview. Front. Med. 2019, 6, 264. [Google Scholar] [CrossRef] [PubMed]
- Barker, J.; Hoogi, A.; Depeursinge, A.; Rubin, D.L. Automated classification of brain tumor type in whole-slide digital pathology images using local representative tiles. Med. Image Anal. 2016, 30, 60–71. [Google Scholar] [CrossRef] [PubMed]
- Shafqat, W.; Byun, Y.-C. A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems. IEEE Access 2022, 10, 11036–11047. [Google Scholar] [CrossRef]
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [CrossRef] [PubMed]
- Yuan, L.; Chen, Y.; Wang, T.; Yu, W.; Shi, Y.; Jiang, Z.H.; Tay, F.E.; Feng, J.; Yan, S. Tokens-to-token vit: Training vision transformers from scratch on imagenet. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 558–567. [Google Scholar]
- Rao, R.S.; Shivanna, D.B.; Lakshminarayana, S.; Mahadevpur, K.S.; Alhazmi, Y.A.; Bakri, M.M.H.; Alharbi, H.S.; Alzahrani, K.J.; Alsharif, K.F.; Banjer, H.J.; et al. Ensemble Deep-Learning-Based Prognostic and Prediction for Recurrence of Sporadic Odontogenic Keratocysts on Hematoxylin and Eosin Stained Pathological Images of Incisional Biopsies. J. Pers. Med. 2022, 12, 1220. [Google Scholar] [CrossRef] [PubMed]
- Rao, R.S.; Shivanna, D.B.; Mahadevpur, K.S.; Shivaramegowda, S.G.; Prakash, S.; Lakshminarayana, S.; Patil, S. Deep Learning-Based Microscopic Diagnosis of Odontogenic Keratocysts and Non-Keratocysts in Haematoxylin and Eosin-Stained Incisional Biopsies. Diagnostics 2021, 11, 2184. [Google Scholar] [CrossRef] [PubMed]
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).