Article

Keeping Pathologists in the Loop and an Adaptive F1-Score Threshold Method for Mitosis Detection in Canine Perivascular Wall Tumours

1 Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK
2 Surrey DataHub, University of Surrey, Guildford GU2 7AL, UK
3 School of Veterinary Medicine, University of Surrey, Guildford GU2 7AL, UK
4 Department of Veterinary Medical Sciences, University of Bologna, 40126 Bologna, Italy
5 AURA Veterinary, Guildford GU2 7AJ, UK
6 Department of Comparative, Diagnostic and Population Medicine, College of Veterinary Medicine, University of Florida, Gainesville, FL 32611, USA
7 Department of Diagnostic Pathology and Pathobiology, Kansas State University, Manhattan, KS 66506, USA
8 Department of Computer Science, University of Surrey, Guildford GU2 7XH, UK
9 National Physical Laboratory, London TW11 0LW, UK
10 School of Biosciences, University of Surrey, Guildford GU2 7XH, UK
* Author to whom correspondence should be addressed.
Cancers 2024, 16(3), 644; https://doi.org/10.3390/cancers16030644
Submission received: 30 November 2023 / Revised: 17 January 2024 / Accepted: 29 January 2024 / Published: 2 February 2024
(This article belongs to the Section Methods and Technologies Development)

Simple Summary

Performing a mitosis count (MC) is essential in grading canine Soft Tissue Sarcoma (cSTS) and canine Perivascular Wall Tumours (cPWTs), although it is subject to inter- and intra-observer variability. To enhance standardisation, an artificial intelligence mitosis detection approach was investigated. A two-step annotation process was utilised with a pre-trained Faster R-CNN model, refined through veterinary pathologists’ reviews of false positives, and subsequently optimised using an F1-score thresholding method to maximise accuracy measures. The study achieved a best F1-score of 0.75, demonstrating competitiveness in the field of canine mitosis detection.

Abstract

Performing a mitosis count (MC) is an essential diagnostic task in histologically grading canine Soft Tissue Sarcoma (cSTS). However, the mitosis count is subject to inter- and intra-observer variability. Deep learning models can help standardise the MC used to histologically grade cSTS. The focus of this study was therefore mitosis detection in canine Perivascular Wall Tumours (cPWTs). Generating mitosis annotations is a long and arduous process open to inter-observer variability. Therefore, by keeping pathologists in the loop, a two-step annotation process was performed in which a pre-trained Faster R-CNN model was trained on initial annotations provided by veterinary pathologists. The pathologists then reviewed the false positive mitosis candidates output by the model and determined whether these were overlooked mitoses, thus updating the dataset. Faster R-CNN was then trained on this updated dataset. An optimal decision threshold, predetermined on the validation set to maximise the F1-score, was applied and produced our best F1-score of 0.75, which is competitive with the state of the art in the canine mitosis domain.

1. Introduction

Canine Soft Tissue Sarcoma (cSTS) is a heterogeneous group of mesenchymal neoplasms (tumours) that arise in connective tissue [1,2,3,4,5,6]. cSTS is more prevalent in middle-aged to older dogs and in medium to large-sized breeds, with the median reported age of diagnosis between 10 and 11 years old [3,7,8,9,10]. The anatomical site of cSTS can vary considerably, but it is mostly found in the cutaneous and subcutaneous tissues [9]. In human Soft Tissue Sarcoma (STS), histological grade is an important prognostic factor and one of the most validated criteria to predict outcome following surgery in canines [10,11,12,13]. General treatment consists of surgically removing these cutaneous and subcutaneous sarcomas. Nevertheless, it is the higher-grade tumours that can be problematic, as their aggressiveness can reduce treatment options and result in a poorer prognosis. The focus of this study was on one common subtype found in dogs: canine Perivascular Wall Tumours (cPWTs). cPWTs arise from vascular mural cells and are often recognisable from their vascular growth patterns [14,15].
The scoring for cSTS grading is broken down into three major criteria: the mitotic count, differentiation and the level of necrosis [9]. Mitosis counting can be exposed to high inter-observer variability [16], depending on the expertise of the pathologist; however, the counting of mitotic figures is considered the most objective factor in comparison to tumour necrosis and cellular differentiation when grading cSTS [16]. It is routine practice to investigate mitosis using 40× magnification; however, manual investigation of high-powered fields (HPFs) at such magnification is a laborious task that is prone to error, thus leading to the previously discussed inter-observer variability.
For the purposes of this study, the focus was on creating a mitosis detection model, as the mitotic count is a significant criterion in the cSTS histological grading system [13] and the density of mitotic figures is considered highly correlated with tumour proliferation [17]. Mitosis detection has been pursued in the computer vision domain since the 1980s [18]. Before 2010, relatively few studies aimed to automate mitosis detection [19,20,21]. However, since the MITOS 2012 challenge [22], there has been a resurgence of interest. Mitosis detection can often be considered as an object detection problem [23]. Rather than categorising entire images as in image classification tasks, object detection algorithms output the object categories present in the image along with an axis-aligned bounding box for each instance, which indicates its position and scale. In the case of mitosis detection, the objects of interest are mitotic figures. As a result, several approaches have used object detection algorithms for mitosis detection. An example of an object detection algorithm is the region-based convolutional neural network (R-CNN) [24]. First, a selective search is performed on the input image to propose candidate regions, and then the CNN is used for feature extraction. These feature vectors are then used to train the bounding box regression. There have been many developments on this type of architecture such as Fast R-CNN [25] and Faster R-CNN [26], which is the primary object detection model used in this work. One set of authors detected mitosis using a variant of the Faster R-CNN (MITOS-RCNN), achieving an F-measure score of 0.955 [27].
Several challenges have been held in order to find novel and improved approaches for mitosis detection [17,22,23,28,29]. Some of these challenges and research on mitosis detection methods have also been conducted using tissue from the canine domain [30,31,32,33].
It was made apparent by the collaborating pathologists that AI approaches for grading tasks in cSTS were desirable, and so this study aims to tackle one criterion, which is to develop methods for mitosis detection in a subtype of cSTS: cPWT. To the best of our knowledge, this is the first work in the automated detection of mitoses in cPWTs.

2. Materials and Methods

2.1. Data Description and Annotation Process

A set of canine Perivascular Wall Tumour (cPWT) slides were obtained from the Department of Microbiology, Immunology and Pathology, Colorado State University. A senior veterinary pathologist at the University of Surrey confirmed the grade of each case (patient) and chose a representative histological slide for each patient. These histological slides were digitised using a Hamamatsu NDP Nanozoomer 2.0 HT slide scanner. A digital Whole Slide Image (WSI) was created via scanning under 40× magnification (0.23 µm/pixel) with a scanning speed of approximately 150 s at 40× mode (15 mm × 15 mm).
Veterinary pathologists independently annotated the WSIs for mitosis using the open-source Automated Slide Analysis Platform (ASAP) software (https://www.computationalpathologygroup.eu/software/asap/, accessed on 28 January 2024) [34]. The pathologists used different magnifications (ranging from 10× to 40×) to analyse the mitosis before creating mitosis annotations. These annotations were centroid coordinates, which were centred on the suspected mitotic candidate. Centroid coordinate annotations can be considered weak annotations, as they are simply coordinates placed in the centre of a mitotic figure and not fine-grained pixel-wise annotations around the mitosis. For a candidate to be categorised as a mitotic figure, both pathologist annotators needed to agree on it. As these were centroid coordinates, an agreement was determined when the two independent centroid annotations, one from each annotator, were overlaid on one another. Any centroid annotations without agreement were dismissed from being considered as a mitotic figure. Table 1 shows the differences between the two annotators for both training and validation when counting mitotic figures in our cPWT dataset.
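The agreement rule described above can be illustrated with a minimal sketch. The text only states that centroids from the two annotators had to be overlaid on one another; the pixel tolerance `max_dist`, the greedy nearest-partner matching and the function name `agreed_centroids` are assumptions made purely for illustration.

```python
import math

def agreed_centroids(ann_a, ann_b, max_dist=15.0):
    """Return the centroids of annotator A that have a partner from annotator B.

    ann_a, ann_b: lists of (x, y) centroid coordinates in pixels.
    max_dist: hypothetical tolerance (in pixels) for treating two centroids
    as overlaid; this value is not taken from the study.
    Each centroid from B can be used for at most one agreement.
    """
    unused_b = list(ann_b)
    agreed = []
    for ax, ay in ann_a:
        best, best_d = None, max_dist
        for b in unused_b:
            d = math.hypot(ax - b[0], ay - b[1])
            if d <= best_d:
                best, best_d = b, d
        if best is not None:
            unused_b.remove(best)   # consume the matched partner
            agreed.append((ax, ay))
    return agreed

annotator_1 = [(10, 10), (100, 100), (300, 300)]
annotator_2 = [(12, 11), (305, 298), (500, 500)]
consensus = agreed_centroids(annotator_1, annotator_2)
```

In this toy example, only the first and third centroids of annotator 1 have a nearby partner, so only those two would be kept as agreed mitotic figures.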
For patch extraction, binary image masks downsized by a factor of 32 were generated, depicting tissue from the biopsy samples against the background slide glass. A tissue threshold of 0.75 was applied to 512 × 512 patches for final patch extraction: if less than 75% of a patch was tissue, the patch was dismissed from the dataset. This ensured that the patches contained relevant information for mitosis object detection.
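A minimal sketch of this tissue filter is given below. It assumes the mask is a 2D list of 0/1 values at 1/32 of the slide resolution, so that a 512 × 512 patch corresponds to a 16 × 16 mask window; the function names are hypothetical and the study's own implementation may differ.

```python
def tissue_fraction(mask, row, col, size=16):
    """Fraction of tissue pixels in a size x size window of a binary mask.

    mask: 2D list of 0/1 values (1 = tissue) at 1/32 of slide resolution,
    so a 512 x 512 patch maps to a 16 x 16 window (512 / 32 = 16).
    """
    window = [mask[r][col:col + size] for r in range(row, row + size)]
    total = sum(len(line) for line in window)
    return sum(sum(line) for line in window) / total

def keep_patch(mask, row, col, size=16, threshold=0.75):
    """Keep the patch only if at least `threshold` of its window is tissue."""
    return tissue_fraction(mask, row, col, size) >= threshold

# Toy mask: left window fully tissue, right window tissue in the top 8 rows only.
toy_mask = [[1] * 16 + ([1] * 16 if r < 8 else [0] * 16) for r in range(16)]
left_kept = keep_patch(toy_mask, 0, 0)
right_kept = keep_patch(toy_mask, 0, 16)
```

Here the left window is kept, whereas the right window contains 50% tissue and is dismissed.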
The test set consisted of patches extracted from high-powered fields (HPFs) determined by the pathologist annotators. To replicate real-world test data, our collaborating pathologists selected 10 continuous non-overlapping HPFs from each WSI. The size of this area was determined by loosely following the Elston and Ellis [35] criteria of an area size of 2.0 mm². For 20× magnification (level 1 in the WSI pyramid), the width of the 10 HPFs was 4096 pixels and the height was 2560 pixels. This produced 40 non-overlapping patches of 512 × 512 pixels per WSI, giving a dataset of 440 patch images from the 11 hold-out test WSIs at 20× magnification. Only patches containing mitosis were used for training and validation, whereas for testing, all extracted patches were evaluated. Details on the number of mitoses per slide in the training/validation and test sets are provided in Appendix Table A1 and Table A2, respectively. Details on the number of patches used for training/validation and testing at 40× magnification are provided in Appendix Table A3, and those for 20× magnification are provided in Appendix Table A4.
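The patch counts follow directly from tiling the HPF region with non-overlapping 512 × 512 windows. The short sketch below reproduces that arithmetic; the helper name `grid_patches` is hypothetical.

```python
def grid_patches(width, height, patch=512):
    """Top-left (x, y) corners of non-overlapping patches tiling a region."""
    return [(x, y)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]

corners_20x = grid_patches(4096, 2560)      # 8 columns x 5 rows
patches_per_wsi = len(corners_20x)
total_test_patches = patches_per_wsi * 11   # 11 hold-out test WSIs
```

The same tiling applied to the 7680 × 5120 pixel region reported for 40× magnification in Table A1 gives 15 × 10 = 150 patches per WSI.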

2.2. Object Detection and Keeping the Pathologist in the Loop for Dataset Refinement

Mitosis detection is generally considered an object detection problem [23]. For this study, we used a Faster R-CNN model [26]. We initialised the Faster R-CNN model with pre-trained COCO [36] weights, with the ResNet-50 backbone pre-trained on ImageNet. The model was fine-tuned on our dataset, updating all of its parameters. Based on preliminary experiments, we used a learning rate of 0.01 with SGD as the optimiser and a batch size of 4. Training ran for 30 epochs, and the model with the lowest validation loss was saved for final evaluation. Faster R-CNN is jointly trained with four different losses: two for the region proposal network (RPN) and two for the Fast R-CNN. These are the RPN classification loss (for distinguishing between foreground and background), the RPN regression loss (for the difference between the regressed foreground bounding box and the ground truth bounding box), the Fast R-CNN classification loss (for object classes) and the Fast R-CNN bounding box regression loss (used to refine the bounding box coordinates). Therefore, when determining the lowest validation loss at every epoch, each loss type was weighted equally. We implemented 3-fold cross-validation at the patient (WSI) level to test the robustness of our approach, with the training data split into three folds for training and validation. We also used an unseen hold-out test set for final evaluation and for a fair comparison of all three folds. The training, validation and hold-out test splits for each fold are depicted in Appendix Table A5.
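The checkpoint selection rule can be sketched as an equal-weight sum of the four validation losses, taking the epoch with the smallest total. The dictionary keys below are assumed to mirror those returned by torchvision's Faster R-CNN in training mode, and the loss values are made up for illustration; neither is taken from the study.

```python
# Assumed key names for the two RPN and two Fast R-CNN losses.
LOSS_KEYS = ("loss_objectness", "loss_rpn_box_reg",
             "loss_classifier", "loss_box_reg")

def total_loss(loss_dict):
    """Equal-weight sum of the two RPN and two Fast R-CNN losses."""
    return sum(loss_dict[k] for k in LOSS_KEYS)

def best_epoch(val_losses_per_epoch):
    """0-based index of the epoch with the lowest summed validation loss."""
    totals = [total_loss(d) for d in val_losses_per_epoch]
    return min(range(len(totals)), key=totals.__getitem__)

# Illustrative (fabricated) validation losses for three epochs.
history = [
    {"loss_objectness": 0.30, "loss_rpn_box_reg": 0.10,
     "loss_classifier": 0.40, "loss_box_reg": 0.30},
    {"loss_objectness": 0.20, "loss_rpn_box_reg": 0.08,
     "loss_classifier": 0.25, "loss_box_reg": 0.22},
    {"loss_objectness": 0.22, "loss_rpn_box_reg": 0.09,
     "loss_classifier": 0.30, "loss_box_reg": 0.25},
]
selected = best_epoch(history)
```

With these illustrative numbers the second epoch (index 1) has the lowest total and would be the checkpoint retained for evaluation.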
Furthermore, as most mitotic figures from the same tissue type are generally of a similar size (dependent on the stage of mitosis, staining techniques, and slide quality), we opted to use the default anchor generator sizes provided by the PyTorch implementation of Faster R-CNN. These sizes were 32, 64, 128, 256 and 512 with aspect ratios of 0.5, 1.0 and 2.0. See Figure 1 for a depiction of the Faster R-CNN applied to the cPWT mitosis detection problem.
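For orientation, the anchor shapes implied by these default sizes and aspect ratios can be enumerated. The sketch below assumes that the aspect ratio is defined as height divided by width and that each anchor keeps an area of size², which is our understanding of the convention in torchvision's anchor generator; the helper name `anchor_shapes` is hypothetical.

```python
import math

def anchor_shapes(sizes=(32, 64, 128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """(width, height) in pixels for every size/ratio pair.

    Assumes ratio = height / width and area = size ** 2 for each anchor.
    """
    shapes = []
    for s in sizes:
        for r in ratios:
            h = s * math.sqrt(r)
            w = s / math.sqrt(r)
            shapes.append((round(w, 1), round(h, 1)))
    return shapes

shapes = anchor_shapes()
```

Five sizes and three aspect ratios give 15 anchor shapes in total, from roughly 45 × 23 pixels up to 362 × 724 pixels.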
During inference for evaluation, non-maximum suppression (NMS) with an IoU threshold of 0.1 was applied as a post-processing step to remove lower-scoring, redundant overlapping bounding boxes. This post-processing method is also consistent with other mitosis detection methods in the literature [38,39].
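A plain-Python sketch of greedy NMS at an IoU threshold of 0.1 is shown below for illustration. It is not the implementation used in the study, which presumably relied on the framework's built-in routine; boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.1):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already-kept box is at most iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 50, 50), (5, 5, 55, 55), (200, 200, 250, 250)]
scores = [0.9, 0.6, 0.8]
kept = nms(boxes, scores)
```

The second box overlaps the first with an IoU of about 0.68 and is suppressed, while the distant third box is retained. Such a low threshold removes almost any overlapping pair, which suits small, well-separated objects.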
In object detection, mean average precision (mAP) is typically used to evaluate the performance of a model depending on the task or dataset [40,41,42,43]. However, we opted to use the F1-score in order to compare our results to mitosis detection approaches in the literature. The F1-score was computed globally for each fold; that is, it was computed over the entire dataset of interest. A candidate detection was counted as a true positive (TP) if it had an IoU of ≥ 0.5 with a ground truth annotation. Any detection that did not meet the IoU threshold was considered a false positive (FP). Any missed ground truth annotations were considered false negatives (FNs). From these counts, we computed the F1-score, which is the harmonic mean of the precision and recall (sensitivity). Both precision (Equation (1)) and sensitivity (Equation (2)) contribute equally to the F1-score (Equation (3)):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$
$$F_1 = \frac{2 \cdot \mathrm{Sensitivity} \cdot \mathrm{Precision}}{\mathrm{Sensitivity} + \mathrm{Precision}} \tag{3}$$
where TP, FP and FN are true positives, false positives and false negatives, respectively.
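The evaluation can be sketched as follows, under the assumptions that detections are visited in descending score order and that each ground truth box can be matched by at most one detection. The text does not describe the matching procedure in this detail, so the one-to-one matching and the function names are illustrative only.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def count_tp_fp_fn(gt_boxes, det_boxes, iou_thresh=0.5):
    """Count TP, FP and FN; det_boxes are assumed sorted by descending score."""
    matched, tp = set(), 0
    for det in det_boxes:
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(gt_boxes):
            if j not in matched:
                v = box_iou(det, gt)
                if v > best_iou:
                    best_j, best_iou = j, v
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
    return tp, len(det_boxes) - tp, len(gt_boxes) - tp

def f1_metrics(tp, fp, fn):
    """Precision, sensitivity and F1-score as in Equations (1) to (3)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return precision, sensitivity, f1

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
det = [(0, 0, 10, 10), (50, 50, 60, 60)]
counts = count_tp_fp_fn(gt, det)
metrics = f1_metrics(*counts)
```

In this toy case one detection matches a ground truth box, one is a false positive and one ground truth box is missed, so precision, sensitivity and the F1-score are all 0.5.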
The models were implemented in Python using the PyTorch deep learning framework. Experiments were run on a Dell T630 system with 2 Intel Xeon E5 v4 series 8-core CPUs at 3.2 GHz, 128 GB of RAM (Dell Corporation Limited, London, UK) and 4 NVIDIA Titan X (Pascal, Compute 6.1, Single Precision) GPUs.
The mitosis annotation process is exhaustive and arduous, and thus the initial annotations may be suboptimal due to the vast area that annotators needed to examine for mitotic candidates. Taking inspiration from Bertram et al. [33], we used our deep learning object detection models from these experiments to refine the dataset (see Figure 2). We hypothesised that many of the FP candidates may have been incorrectly labelled. Our collaborating pathologists reviewed all the FP candidates (regardless of class score) from each validation fold and the hold-out test set and determined which candidates were mislabelled. As a result, we were able to formulate additional ground truth mitoses for use in the final set of experiments.

2.3. Adaptive F1-Score Threshold

For this method, the Faster R-CNN object detector was trained to detect mitotic candidates using the refined (updated) dataset. The same training hyperparameters as described earlier were applied; however, we lowered the number of epochs. It was observed that the models reached their lowest validation loss by epoch 7 across all three folds in the initial experiment runs. Therefore, to leave a margin beyond this point, we chose 12 epochs for training, again using the lowest validation loss to determine the “best” model. The trained Faster R-CNN model outputs potential mitosis candidates, but it also outputs probability scores relating to the strength of the object prediction. These scores ranged from 0 to 1, where 1 would indicate that the model is 100% certain that the candidate is a mitosis and 0.01 would describe a prediction that is very low in confidence. We optimised our models based on the F1-score [44,45,46]. The probability thresholds $t$ ranged from 0.01 to 1, and so choosing the optimal threshold $T$ for the F1-score $F_1$ can be represented formally as:
$$T = \underset{t}{\arg\max}\; F_1(t) \tag{4}$$
We determined the optimal F1-score threshold value using the validation set and applied this threshold value to the final evaluation on the hold-out test set. Figure 3 demonstrates the entire workflow of this method from the creation of the updated mitosis dataset where the pathologists reviewed all the FP candidates all the way to the adaptive F1-score thresholds applied to the mitosis candidate predictions.
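A minimal sketch of this threshold search is given below. It assumes a step of 0.01 between candidate thresholds and that each validation detection has already been labelled as TP or FP at the evaluation stage; the step size, the tie-breaking rule (the lowest threshold wins) and the toy scores are assumptions, not details from the study.

```python
def f1_at_threshold(scored, n_gt, t):
    """F1-score when only detections with score >= t are retained.

    scored: list of (score, is_tp) pairs for validation detections.
    n_gt: number of ground truth mitoses in the validation set.
    """
    tp = sum(1 for s, ok in scored if s >= t and ok)
    fp = sum(1 for s, ok in scored if s >= t and not ok)
    fn = n_gt - tp
    denom = 2 * tp + fp + fn   # F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0

def best_threshold(scored, n_gt):
    """Threshold in {0.01, 0.02, ..., 1.00} that maximises the F1-score."""
    thresholds = [k / 100 for k in range(1, 101)]
    return max(thresholds, key=lambda t: f1_at_threshold(scored, n_gt, t))

# Toy validation detections: (score, is_true_positive).
val = [(0.95, True), (0.90, True), (0.80, False), (0.60, True),
       (0.30, False), (0.20, False), (0.10, False)]
T = best_threshold(val, n_gt=3)
f1_before = f1_at_threshold(val, 3, 0.01)
f1_after = f1_at_threshold(val, 3, T)
```

In this toy example the threshold removes the three lowest-scoring false positives while keeping all true positives, so the F1-score rises from 0.6 to about 0.86. The threshold found on the validation set is then applied unchanged to the hold-out test detections.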

3. Results

The pathologists-in-the-loop approach for dataset refinement was first applied as demonstrated by Figure 2. In a preliminary investigation, two magnifications (40× and 20×) were used to determine the best resolution for our task (see Table 2).
Table A6 and Table A7 show the differences in mitotic candidate numbers before and after refinement (second review) for the training/validation and test sets, respectively. The first set of results from the optimised Faster R-CNN approach is depicted in Table 3. This shows a comparison of the performance of the Faster R-CNN trained on the initial mitosis dataset and on the updated refined mitosis dataset. Sensitivities improved for all folds when using the updated refined dataset; however, in some cases, such as fold-1 validation, fold-3 validation and fold-3 test, the F1-score is lower due to a decrease in precision. This could be due to the updated refined dataset containing more difficult examples for mitosis object detection training. The initial dataset may have contained more obvious mitosis examples, so that the model trained on it predicted detections that closely resembled these obvious examples. Table 4 shows the Faster R-CNN results before and after F1-score thresholding was applied to the models trained using the updated mitosis dataset. The thresholds were predetermined on the validation set for each fold using Equation (4) (see Figure 4). When applying the optimal thresholds, we saw large improvements in the F1-score, which were largely due to an improvement in precision because of a reduction in FPs. On the test set, the F1-score rose from 0.402 to 0.750. However, this increase in precision came at the expense of some sensitivity across all three folds; for example, on the test set the mean sensitivity over all three folds reduced from 0.952 to 0.803. Nevertheless, the loss in sensitivity is outweighed by the gain in precision: sensitivity decreased by 14.9 percentage points, whereas precision increased by 45.2 percentage points. This suggests that the majority of TP detections prior to the adaptive F1-score thresholding are of a high probability confidence compared to the FP detections.

4. Discussion

This study has demonstrated a method for mitosis detection in cPWT WSIs using a Faster R-CNN object detection model, an adaptive F1-score thresholding feature on output probabilities and the refinement of a mitotic figures dataset by keeping pathologists in the loop.
Many approaches in the literature use the highest resolution images for their object detection methods (typically at 40× objective); however, we preliminarily found that 20× magnification was beneficial for our task and the dataset provided, as shown in Table 2. Nevertheless, this warrants a further investigation and additional discussions with the collaborating pathologists, who may provide reasoning as to why certain candidates were classed as mitosis at different resolutions.
Initially, the raw outputs of a Faster R-CNN model produced promising results with high sensitivities; however, these outputs required further post-processing to improve precision. Applying adaptive F1-score thresholds, where the optimal values were predetermined on the validation set and applied to the test set, proved an effective method of reducing the number of FP predictions. This ultimately resulted in a dramatic increase in the F1-score due to a stark increase in precision, at a small expense of sensitivity. The changes in sensitivity and precision were not equal in magnitude, with precision improving far more than sensitivity declined. This suggests that the majority of FP detections are of lower probability confidence compared to TP detections.
Multi-stage (typically dual-stage) approaches have also become increasingly prevalent over the years where they typically take the form of selecting mitotic candidates in the first stage and then apply another classifier in the second stage [32,33,47,48,49]. Although not reflected in the main findings of this study, we attempted to use a second-stage classifier (Figure A1) on mitotic candidates to classify between TP and hard FPs to no avail (see results of the two-stage approach in Table A8 and its subsequent ROC curves in Figure A2). Most machine learning methods require large datasets for effective training, which in this case was not available once optimisation was applied using the adaptive F1-score threshold method. One could train models using the non-thresholded detections; however, this would result in a model that is able to distinguish between true positive mitosis and mostly obvious FP candidates. By applying the adaptive F1-score thresholding method, we constrained the dataset and attempted to learn differences between TP and high confidence hard false positive detections, but we did not provide an adequately large dataset for training. Figure 5 depicts a 512 × 512 pixel image in the test set, highlighting FN and FP detection.
Different phases and other biological phenomena could influence the size of the mitosis region of interest. Going forward, it may also be worth labelling mitosis with regard to its phases, thus creating a multi-class problem rather than the binary problem addressed in this study. As a consequence, the size of the ground truth bounding boxes could also be varied depending on the target phase being classified. Nonetheless, the models were still able to predict the vast majority of mitosis in these phases.
It must further be noted that the methodology is applied only to patches from HPFs containing mitosis that were annotated by the collaborating pathologists. Therefore, we propose expanding our dataset to include a broader range of sections, including those not initially marked by pathologists, to evaluate and enhance our model’s generalisability. The data should include labels for areas containing tumour and non-tumour tissue to fully consider the overall impact of this mitosis detection method.
Our focus for this study is on cPWT; however, we could potentially adapt this method to other cSTS subtypes as well as to other tumour types. An additional study might explore the application of cPWT-trained models to different cSTS subtypes to assess if comparable outcomes are achieved. Nevertheless, given that tumour types from various domains exhibit unique challenges due to their specific histological characteristics, it may be necessary to train or fine-tune models using tumour-specific datasets to evaluate the efficacy of this approach.
While our F1-score demonstrates competitive performance for detecting mitosis in the canine domain, the clinical relevance and applicability of this metric should be taken into account. Future work should focus on employing this method as a supportive tool, assessing its practical effectiveness and reliability in a veterinary clinical setting.
To conclude, by using our experimental set-up, the optimised Faster R-CNN model was a suitable method for determining mitosis in cPWT WSIs. To the best of our knowledge, this is the first mitosis detection model applied solely on cPWT data, and thus we consider this a baseline three-fold cross-validation mean F1-score of 0.750 for mitosis detection in cPWT.

Author Contributions

T.R. conducted all experiments. N.J.B., T.A., M.J.D., A.M. and B.B. assisted in the data collection and image capture process. A.M. and B.B. conducted the annotations process. T.R., R.M.L.R., K.W. and S.A.T. analysed the results. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Doctoral College, University of Surrey (UK), National Physical Laboratory (UK) and Zoetis.

Institutional Review Board Statement

For this study, a Non-Animals Scientific Procedures Act 1986 (NASPA) form was approved by the University of Surrey (approval number NERA-1819-045).

Informed Consent Statement

Not applicable.

Data Availability Statement

The Whole Slide Images used in this study are available from the corresponding author on reasonable request. Code and annotations are currently unavailable.

Conflicts of Interest

Zoetis funded part of the original study. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
WSI: Whole Slide Images
cPWT: Canine Perivascular Wall Tumours
cSTS: Canine Soft Tissue Sarcoma
HPF: High-Powered Fields
FP: False Positive
TP: True Positive
TN: True Negative
FN: False Negative
ROC: Receiver Operating Characteristic
CNN: Convolutional Neural Network
R-CNN: Region-Based Convolutional Neural Network
ASAP: Automated Slide Analysis Platform
NMS: Non-Maximum Suppression
IoU: Intersection over Union
mAP: Mean Average Precision
RPN: Region Proposal Network

Appendix A

Appendix A.1

Table A1. Two WSI magnification resolutions (40× and 20×) were initially investigated for determining a suitable resolution for mitosis detection using our cPWT dataset. Therefore, two separate datasets of the two resolutions were extracted. The 10 HPF size at 40× magnification (level 0 of the WSI pyramid) resulted in a width of 7680 pixels and height of 5120 pixels. In terms of physical distance, this is a width of 1.753 mm and height of 1.169 mm. When rounded to 1 decimal place, this approximately represents an aspect ratio of 3:2. When extracting 512 × 512 pixels from this area of interest, we ended up with 150 patches. This produced 150 non-overlapping patches of 512 × 512 pixels, producing a test dataset of 1650 patch images from 11 hold-out test WSIs. The details for the 20× dataset are in text. Presented below are the number of mitosis annotations per Whole Slide Image (WSI) for both 40× and 20× magnifications in the training/validation set.

| Slide | Agreement after Threshold (40×) | Agreement after Threshold (20×) |
|---|---|---|
| F17-04773 | 21 | 18 |
| F17-03141 | 35 | 38 |
| F17-1261 | 41 | 41 |
| F18-13364 | 437 | 437 |
| F17-02232 | 215 | 216 |
| F17-04911 | 37 | 37 |
| F17-0549 | 110 | 106 |
| F17-011577 | 23 | 23 |
| F17-011777 | 217 | 212 |
| F17-03855 | 69 | 68 |
| F17-04900 | 75 | 75 |
| F18-7832 | 335 | 331 |
| F17-09700 | 138 | 134 |
| F17-02641 | 40 | 41 |
| F17-09926 | 61 | 62 |
| F17-02723 | 38 | 39 |
| F17-05935 | 42 | 41 |
| F17-02120 | 43 | 42 |
| F18-79705 | 85 | 84 |
| Total: | 2062 | 2045 |
Table A2. The number of mitosis annotations in 10 continuous high-powered fields (HPFs) from each Whole Slide Image (WSI) for both 40× and 20× magnifications in the hold-out test set.

| Slide | Agreement after Threshold (40×) | Agreement after Threshold (20×) |
|---|---|---|
| F17-06348 | 12 | 13 |
| F17-010348 | 1 | 1 |
| F17-011490 | 2 | 2 |
| F19-03615 | 54 | 58 |
| F17-05256 | 25 | 35 |
| F17-08570 | 1 | 1 |
| F19-7408 | 3 | 3 |
| F18-2508 | 10 | 11 |
| F17-07510 | 17 | 17 |
| F17-08031 | 5 | 5 |
| F17-0260 | 1 | 1 |
| Total: | 131 | 147 |
Table A3. The number of patches per Whole Slide Image (WSI) in the train/validation and test sets for patches extracted from level 0 (40× magnification) of the WSI.

| Set | Slide | No. of Patches |
|---|---|---|
| Train/Val | F18-7832 | 305 |
| Train/Val | F17-02232 | 208 |
| Train/Val | F17-011777 | 206 |
| Train/Val | F17-09700 | 131 |
| Train/Val | F17-0549 | 105 |
| Train/Val | F18-79705 | 81 |
| Train/Val | F17-03855 | 68 |
| Train/Val | F17-09926 | 60 |
| Train/Val | F17-1261 | 41 |
| Train/Val | F17-02120 | 43 |
| Train/Val | F17-03141 | 35 |
| Train/Val | F17-05935 | 37 |
| Train/Val | F17-02723 | 37 |
| Train/Val | F17-011577 | 21 |
| Train/Val | F18-13364 | 401 |
| Train/Val | F17-04900 | 72 |
| Train/Val | F17-02641 | 40 |
| Train/Val | F17-04911 | 37 |
| Train/Val | F17-04773 | 21 |
| Total | | 1949 |
| Test | F17-05256 | 150 |
| Test | F17-02600 | 150 |
| Test | F17-07510 | 150 |
| Test | F17-08031 | 150 |
| Test | F17-011490 | 150 |
| Test | F17-006348 | 150 |
| Test | F19-03615 | 150 |
| Test | F17-010348 | 150 |
| Test | F18-2508 | 150 |
| Test | F17-08570 | 150 |
| Test | F19-7408 | 150 |
| Total | | 1650 |
Table A4. The number of patches per Whole Slide Image (WSI) in the train/validation and test sets for patches extracted from level 1 (20× magnification) of the WSI.

| Set | Slide | No. of Patches |
|---|---|---|
| Train/Val | F18-7832 | 251 |
| Train/Val | F17-02232 | 189 |
| Train/Val | F17-011777 | 179 |
| Train/Val | F17-09700 | 125 |
| Train/Val | F17-0549 | 97 |
| Train/Val | F18-79705 | 76 |
| Train/Val | F17-03855 | 66 |
| Train/Val | F17-09926 | 57 |
| Train/Val | F17-1261 | 40 |
| Train/Val | F17-02120 | 40 |
| Train/Val | F17-03141 | 37 |
| Train/Val | F17-05935 | 35 |
| Train/Val | F17-02723 | 34 |
| Train/Val | F17-011577 | 20 |
| Train/Val | F18-13364 | 339 |
| Train/Val | F17-04900 | 71 |
| Train/Val | F17-02641 | 40 |
| Train/Val | F17-04911 | 36 |
| Train/Val | F17-04773 | 18 |
| Total | | 1750 |
| Test | F17-05256 | 40 |
| Test | F17-02600 | 40 |
| Test | F17-07510 | 40 |
| Test | F17-08031 | 40 |
| Test | F17-011490 | 40 |
| Test | F17-006348 | 40 |
| Test | F19-03615 | 40 |
| Test | F17-010348 | 40 |
| Test | F18-2508 | 40 |
| Test | F17-08570 | 40 |
| Test | F19-7408 | 40 |
| Total | | 440 |
Table A5. The training, validation and hold-out test splits for each fold in the dataset.

| Slide | Fold 1 | Fold 2 | Fold 3 |
|---|---|---|---|
| F17-04773 | Val | Train | Train |
| F17-03141 | Train | Train | Val |
| F17-1261 | Train | Val | Train |
| F18-13364 | Val | Train | Train |
| F17-02232 | Train | Train | Val |
| F17-04911 | Val | Train | Train |
| F17-0549 | Train | Train | Val |
| F17-011577 | Train | Train | Val |
| F17-011777 | Train | Val | Train |
| F17-03855 | Train | Train | Val |
| F17-04900 | Val | Train | Train |
| F18-7832 | Train | Val | Train |
| F17-09700 | Train | Val | Train |
| F17-02641 | Val | Train | Train |
| F17-09926 | Train | Val | Train |
| F17-02723 | Train | Train | Val |
| F17-05935 | Train | Val | Train |
| F17-02120 | Train | Train | Val |
| F18-79705 | Train | Val | Train |
| F17-06348 | Test | Test | Test |
| F17-010348 | Test | Test | Test |
| F17-011490 | Test | Test | Test |
| F19-03615 | Test | Test | Test |
| F17-05256 | Test | Test | Test |
| F17-08570 | Test | Test | Test |
| F19-7408 | Test | Test | Test |
| F18-2508 | Test | Test | Test |
| F17-07510 | Test | Test | Test |
| F17-08031 | Test | Test | Test |
| F17-0260 | Test | Test | Test |
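As a consistency check on the split in Table A5, the following minimal sketch (Python, standard library only) transcribes the validation slides of each fold from Table A5 and the “Updated Agreement” counts from Table A6. It then checks that every train/validation slide is used for validation in exactly one fold and sums the agreed mitoses per validation fold. The names `VAL_SLIDES`, `UPDATED_AGREEMENT`, `each_slide_validated_once` and `validation_mitoses` are illustrative and do not come from the original study.

```python
# Validation slides per fold, transcribed from Table A5.
VAL_SLIDES = {
    1: ["F17-04773", "F18-13364", "F17-04911", "F17-04900", "F17-02641"],
    2: ["F17-1261", "F17-011777", "F18-7832", "F17-09700",
        "F17-09926", "F17-05935", "F18-79705"],
    3: ["F17-03141", "F17-02232", "F17-0549", "F17-011577",
        "F17-03855", "F17-02723", "F17-02120"],
}

# "Updated Agreement" mitosis counts per slide, transcribed from Table A6.
UPDATED_AGREEMENT = {
    "F17-04773": 18, "F17-03141": 39, "F17-1261": 41, "F18-13364": 460,
    "F17-02232": 236, "F17-04911": 39, "F17-0549": 112, "F17-011577": 24,
    "F17-011777": 227, "F17-03855": 71, "F17-04900": 76, "F18-7832": 350,
    "F17-09700": 138, "F17-02641": 42, "F17-09926": 62, "F17-02723": 39,
    "F17-05935": 45, "F17-02120": 44, "F18-79705": 91,
}


def each_slide_validated_once(val_slides, all_slides):
    """True if every slide appears in exactly one validation fold."""
    used = [s for slides in val_slides.values() for s in slides]
    return sorted(used) == sorted(all_slides)


def validation_mitoses(val_slides, counts):
    """Sum the agreed mitoses over the validation slides of each fold."""
    return {fold: sum(counts[s] for s in slides)
            for fold, slides in val_slides.items()}


print(each_slide_validated_once(VAL_SLIDES, UPDATED_AGREEMENT))
print(validation_mitoses(VAL_SLIDES, UPDATED_AGREEMENT))
```

Under this transcription, the per-fold validation totals are 635, 954 and 565, which equal TP + FN for the “Updated” validation rows of Table 3.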
Table A6. The updated agreed mitoses between annotators 1 and 2 for the training/validation sets. The “Agreement” column shows the number of ground truth agreed mitosis annotations for the 20× magnification dataset before refinement. “Updated Agreement” shows the number of mitoses after refinement. “Missed” Mitosis shows the difference in the number of mitoses before and after refinement. Lastly, % “Missed” Mitosis expresses that difference as a percentage of the updated agreed mitotic count. “Avg” is the mean of the per-slide percentages.

| Slide | Agreement | Updated Agreement | “Missed” Mitosis | % “Missed” Mitosis |
|---|---|---|---|---|
| F17-04773 | 18 | 18 | 0 | 0.00 |
| F17-03141 | 38 | 39 | 1 | 2.56 |
| F17-1261 | 41 | 41 | 0 | 0.00 |
| F18-13364 | 437 | 460 | 23 | 5.00 |
| F17-02232 | 216 | 236 | 20 | 8.47 |
| F17-04911 | 37 | 39 | 2 | 5.13 |
| F17-0549 | 106 | 112 | 6 | 5.36 |
| F17-011577 | 23 | 24 | 1 | 4.17 |
| F17-011777 | 212 | 227 | 15 | 6.61 |
| F17-03855 | 68 | 71 | 3 | 4.23 |
| F17-04900 | 75 | 76 | 1 | 1.32 |
| F18-7832 | 331 | 350 | 19 | 5.43 |
| F17-09700 | 134 | 138 | 4 | 2.90 |
| F17-02641 | 41 | 42 | 1 | 2.38 |
| F17-09926 | 62 | 62 | 0 | 0.00 |
| F17-02723 | 39 | 39 | 0 | 0.00 |
| F17-05935 | 41 | 45 | 4 | 8.89 |
| F17-02120 | 42 | 44 | 2 | 4.55 |
| F18-79705 | 84 | 91 | 7 | 7.69 |
| Total: | 2045 | 2154 | 109 | Avg: 3.93 |
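The percentage column and its average can be reproduced from the counts alone. The sketch below is illustrative only (the names `ROWS` and `pct_missed` are not from the original study) and uses the (Agreement, Updated Agreement) pairs transcribed from Table A6. It also contrasts the reported average, which is the mean of the per-slide percentages, with a pooled percentage over all slides.

```python
from statistics import mean

# (agreement, updated agreement) per slide, transcribed from Table A6.
ROWS = [
    (18, 18), (38, 39), (41, 41), (437, 460), (216, 236), (37, 39),
    (106, 112), (23, 24), (212, 227), (68, 71), (75, 76), (331, 350),
    (134, 138), (41, 42), (62, 62), (39, 39), (41, 45), (42, 44), (84, 91),
]


def pct_missed(agreement, updated):
    """Missed mitoses as a percentage of the updated agreed count."""
    return 100.0 * (updated - agreement) / updated


# Mean of the per-slide percentages (the "Avg" reported in the table).
per_slide_mean = mean(pct_missed(a, u) for a, u in ROWS)

# Pooled alternative: total missed over total updated agreement.
pooled = 100.0 * sum(u - a for a, u in ROWS) / sum(u for _, u in ROWS)

print(round(per_slide_mean, 2))
print(round(pooled, 2))
```

The per-slide mean rounds to 3.93, matching the table, whereas the pooled figure (109 of 2154) is about 5.06%. The two differ because slides with many mitoses weigh more heavily in the pooled figure.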
Table A7. The updated agreed mitoses between annotators 1 and 2 for the hold-out test set. The “Agreement” column shows the number of ground truth agreed mitosis annotations for the 20× magnification dataset before refinement. “Updated Agreement” shows the number of mitoses after refinement. “Missed” Mitosis shows the difference in the number of mitoses before and after refinement. Lastly, % “Missed” Mitosis expresses that difference as a percentage of the updated agreed mitotic count. “Avg” is the mean of the per-slide percentages.

| Slide | Agreement | Updated Agreement | “Missed” Mitosis | % “Missed” Mitosis |
|---|---|---|---|---|
| F17-06348 | 13 | 16 | 3 | 18.75 |
| F17-010348 | 1 | 5 | 4 | 80.00 |
| F17-011490 | 2 | 3 | 1 | 33.33 |
| F19-03615 | 58 | 81 | 23 | 28.40 |
| F17-05256 | 35 | 39 | 4 | 10.26 |
| F17-08570 | 1 | 1 | 0 | 0.00 |
| F19-7408 | 3 | 3 | 0 | 0.00 |
| F18-2508 | 11 | 16 | 5 | 31.25 |
| F17-07510 | 17 | 26 | 9 | 34.62 |
| F17-08031 | 5 | 5 | 0 | 0.00 |
| F17-0260 | 1 | 1 | 0 | 0.00 |
| Total: | 147 | 196 | 49 | Avg: 21.51 |
Figure A1. A depiction of the two-stage mitosis detection approach. Top (stage 1): 20× magnification images and annotations from the updated, refined mitosis dataset are used to train a Faster R-CNN model (the model is also presented in Figure 1). Optimal probability thresholds, determined from the validation set (based on Equation (4)), are applied to the output candidates. The selected candidates are then extracted as 64 × 64 pixel patches at 40× magnification from the original Whole Slide Images (WSIs) and passed to the second stage. Bottom (stage 2): the extracted patches are fed into a DenseNet-161 feature extractor pre-trained on ImageNet, and its outputs are fed into a logistic regression classifier that determines whether each candidate is a mitosis or a difficult false positive.
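The hand-over between the two stages requires mapping a candidate found at 20× to a 64 × 64 pixel patch at 40×. The following is a minimal sketch of that coordinate mapping, not the authors' implementation. It assumes that candidate centroids are expressed in whole-slide pixel coordinates at the 20× level and that the 40× level has exactly twice the resolution. The function name `crop_box_40x` and its parameters are hypothetical.

```python
def crop_box_40x(cx_20x, cy_20x, size=64, scale=2):
    """Map a candidate centroid at 20x to a square crop box at 40x.

    Returns (left, top, right, bottom) in 40x pixel coordinates.
    Assumes the 40x level is `scale` times the resolution of the 20x level.
    """
    # Scale the centroid into the higher-resolution coordinate system.
    cx = int(round(cx_20x * scale))
    cy = int(round(cy_20x * scale))
    # Centre a size x size window on the scaled centroid.
    half = size // 2
    left, top = cx - half, cy - half
    return left, top, left + size, top + size


print(crop_box_40x(100, 50))
```

A real pipeline would also need to handle candidates close to the slide border, where the box can extend beyond the image; this sketch leaves such boxes unclipped.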
Table A8. Results from the stage 2 logistic regression model. Across all folds, sensitivity decreased dramatically, offset by a large increase in precision, when compared with the results in Table 4. The mean F1-scores for the validation and test sets are 0.654 and 0.611, respectively.

| Fold | Set | Sensitivity | Precision | F1-Score | TP | FP | FN |
|---|---|---|---|---|---|---|---|
| 1 | Val | 0.561 | 0.906 | 0.693 | 356 | 37 | 279 |
| 1 | Test | 0.526 | 0.844 | 0.648 | 103 | 19 | 93 |
| 2 | Val | 0.487 | 0.906 | 0.634 | 465 | 48 | 489 |
| 2 | Test | 0.592 | 0.773 | 0.671 | 116 | 34 | 80 |
| 3 | Val | 0.487 | 0.920 | 0.637 | 275 | 24 | 290 |
| 3 | Test | 0.367 | 0.857 | 0.514 | 72 | 12 | 124 |
| Avg. (mean) | Val | 0.512 | 0.911 | 0.654 | | | |
| Avg. (mean) | Test | 0.495 | 0.825 | 0.611 | | | |
Figure A2. Receiver operating characteristic (ROC) curve plots from the second-stage logistic regression model results for each cross-validation fold. For each fold, it is evident that the models do not effectively learn the differences between true positive (TP) and false positive (FP) mitosis detections.

References

  1. Bostock, D.; Dye, M. Prognosis after surgical excision of canine fibrous connective tissue sarcomas. Vet. Pathol. 1980, 17, 581–588.
  2. Dernell, W.S.; Withrow, S.J.; Kuntz, C.A.; Powers, B.E. Principles of treatment for soft tissue sarcoma. Clin. Tech. Small Anim. Pract. 1998, 13, 59–64.
  3. Ehrhart, N. Soft-tissue sarcomas in dogs: A review. J. Am. Anim. Hosp. Assoc. 2005, 41, 241–246.
  4. Mayer, M.N.; LaRue, S.M. Soft tissue sarcomas in dogs. Can. Vet. J. 2005, 46, 1048.
  5. Cavalcanti, E.B.; Gorza, L.L.; de Sena, B.V.; Sossai, B.G.; Junior, M.C.; Flecher, M.C.; Marcolongo-Pereira, C.; dos Santos Horta, R. Correlation of Clinical, Histopathological and Histomorphometric Features of Canine Soft Tissue Sarcomas. Braz. J. Vet. Pathol. 2021, 14, 151–158.
  6. Torrigiani, F.; Pierini, A.; Lowe, R.; Simčič, P.; Lubas, G. Soft tissue sarcoma in dogs: A treatment review and a novel approach using electrochemotherapy in a case series. Vet. Comp. Oncol. 2019, 17, 234–241.
  7. Stefanello, D.; Avallone, G.; Ferrari, R.; Roccabianca, P.; Boracchi, P. Canine cutaneous perivascular wall tumors at first presentation: Clinical behavior and prognostic factors in 55 cases. J. Vet. Intern. Med. 2011, 25, 1398–1405.
  8. Chase, D.; Bray, J.; Ide, A.; Polton, G. Outcome following removal of canine spindle cell tumours in first opinion practice: 104 cases. J. Small Anim. Pract. 2009, 50, 568–574.
  9. Dennis, M.; McSporran, K.; Bacon, N.; Schulman, F.; Foster, R.; Powers, B. Prognostic factors for cutaneous and subcutaneous soft tissue sarcomas in dogs. Vet. Pathol. 2011, 48, 73–84.
  10. Bray, J.P.; Polton, G.A.; McSporran, K.D.; Bridges, J.; Whitbread, T.M. Canine soft tissue sarcoma managed in first opinion practice: Outcome in 350 cases. Vet. Surg. 2014, 43, 774–782.
  11. Kuntz, C.; Dernell, W.; Powers, B.; Devitt, C.; Straw, R.; Withrow, S. Prognostic factors for surgical treatment of soft-tissue sarcomas in dogs: 75 cases (1986–1996). J. Am. Vet. Med. Assoc. 1997, 211, 1147–1151.
  12. McSporran, K. Histologic grade predicts recurrence for marginally excised canine subcutaneous soft tissue sarcomas. Vet. Pathol. 2009, 46, 928–933.
  13. Avallone, G.; Rasotto, R.; Chambers, J.K.; Miller, A.D.; Behling-Kelly, E.; Monti, P.; Berlato, D.; Valenti, P.; Roccabianca, P. Review of histological grading systems in veterinary medicine. Vet. Pathol. 2021, 58, 809–828.
  14. Avallone, G.; Helmbold, P.; Caniatti, M.; Stefanello, D.; Nayak, R.; Roccabianca, P. The spectrum of canine cutaneous perivascular wall tumors: Morphologic, phenotypic and clinical characterization. Vet. Pathol. 2007, 44, 607–620.
  15. Loures, F.; Conceição, L.; Lauffer-Amorim, R.; Nóbrega, J.; Costa, E.; Torres, R.; Clemente, J.; Vilória, M.; Silva, J. Histopathology and immunohistochemistry of peripheral neural sheath tumor and perivascular wall tumor in dog. Arq. Bras. Med. Vet. Zootec. 2019, 71, 1100–1106.
  16. Mathew, T.; Kini, J.R.; Rajan, J. Computational methods for automated mitosis detection in histopathology images: A review. Biocybern. Biomed. Eng. 2021, 41, 64–82.
  17. Aubreville, M.; Stathonikos, N.; Bertram, C.A.; Klopfleisch, R.; Ter Hoeve, N.; Ciompi, F.; Wilm, F.; Marzahl, C.; Donovan, T.A.; Maier, A.; et al. Mitosis domain generalization in histopathology images—The MIDOG challenge. Med. Image Anal. 2023, 84, 102699.
  18. Kaman, E.; Smeulders, A.; Verbeek, P.; Young, I.; Baak, J. Image processing for mitoses in sections of breast cancer: A feasibility study. Cytom. J. Int. Soc. Anal. Cytol. 1984, 5, 244–249.
  19. Gallardo, G.M.; Yang, F.; Ianzini, F.; Mackey, M.; Sonka, M. Mitotic cell recognition with hidden Markov models. In Proceedings of the Medical Imaging 2004: Visualization, Image-Guided Procedures, and Display; SPIE: Bellingham, WA, USA, 2004; Volume 5367, pp. 661–668.
  20. Tao, C.Y.; Hoyt, J.; Feng, Y. A support vector machine classifier for recognizing mitotic subphases using high-content screening data. SLAS Discov. 2007, 12, 490–496.
  21. Liu, A.; Li, K.; Kanade, T. Mitosis sequence detection using hidden conditional random fields. In Proceedings of the 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Rotterdam, The Netherlands, 14–17 April 2010; pp. 580–583.
  22. Roux, L.; Racoceanu, D.; Loménie, N.; Kulikova, M.; Irshad, H.; Klossa, J.; Capron, F.; Genestie, C.; Le Naour, G.; Gurcan, M.N. Mitosis detection in breast cancer histological images An ICPR 2012 contest. J. Pathol. Inform. 2013, 4.
  23. Aubreville, M.; Bertram, C.; Veta, M.; Klopfleisch, R.; Stathonikos, N.; Breininger, K.; ter Hoeve, N.; Ciompi, F.; Maier, A. Quantifying the Scanner-Induced Domain Gap in Mitosis Detection. arXiv 2021, arXiv:2103.16515.
  24. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 142–158.
  25. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
  26. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497.
  27. Rao, S. Mitos-RCNN: A novel approach to mitotic figure detection in breast cancer histopathology images using region based convolutional neural networks. arXiv 2018, arXiv:1807.01788.
  28. Veta, M.; Van Diest, P.J.; Willems, S.M.; Wang, H.; Madabhushi, A.; Cruz-Roa, A.; Gonzalez, F.; Larsen, A.B.; Vestergaard, J.S.; Dahl, A.B.; et al. Assessment of algorithms for mitosis detection in breast cancer histopathology images. Med. Image Anal. 2015, 20, 237–248.
  29. Roux, L.; Racoceanu, D.; Capron, F.; Calvo, J.; Attieh, E.; Le Naour, G.; Gloaguen, A. Mitos & atypia. Detection of Mitosis and Evaluation of Nuclear Atypia Score in Breast Cancer Histological Images. 2014, Volume 1, pp. 1–8. Available online: http://ludo17.free.fr/mitos_atypia_2014/icpr2014_MitosAtypia_DataDescription.pdf (accessed on 28 January 2024).
  30. Aubreville, M. MItosis DOmain Generalization Challenge 2022 (MICCAI MIDOG 2022), Training Data Set (PNG version) (1.0) [Data Set]. Zenodo. Available online: https://zenodo.org/records/6547151 (accessed on 28 January 2024).
  31. Aubreville, M.; Wilm, F.; Stathonikos, N.; Breininger, K.; Donovan, T.A.; Jabari, S.; Veta, M.; Ganz, J.; Ammeling, J.; van Diest, P.J.; et al. A comprehensive multi-domain dataset for mitotic figure detection. Sci. Data 2023, 10, 484.
  32. Aubreville, M.; Bertram, C.A.; Marzahl, C.; Gurtner, C.; Dettwiler, M.; Schmidt, A.; Bartenschlager, F.; Merz, S.; Fragoso, M.; Kershaw, O.; et al. Deep learning algorithms out-perform veterinary pathologists in detecting the mitotically most active tumor region. Sci. Rep. 2020, 10, 1–11.
  33. Bertram, C.A.; Aubreville, M.; Marzahl, C.; Maier, A.; Klopfleisch, R. A large-scale dataset for mitotic figure assessment on whole slide images of canine cutaneous mast cell tumor. Sci. Data 2019, 6, 1–9.
  34. Litjens, G. Automated Slide Analysis Platform (ASAP). 2017. Available online: https://www.computationalpathologygroup.eu/software/asap/ (accessed on 28 January 2024).
  35. Elston, C.W.; Ellis, I.O. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: Experience from a large study with long-term follow-up. Histopathology 1991, 19, 403–410.
  36. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
  37. Mahmood, T.; Arsalan, M.; Owais, M.; Lee, M.B.; Park, K.R. Artificial intelligence-based mitosis detection in breast cancer histopathology images using faster R-CNN and deep CNNs. J. Clin. Med. 2020, 9, 749.
  38. Halmes, M.; Heuberger, H.; Berlemont, S. Deep Learning-based mitosis detection in breast cancer histologic samples. arXiv 2021, arXiv:2109.00816.
  39. Zhou, Y.; Mao, H.; Yi, Z. Cell mitosis detection using deep neural networks. Knowl.-Based Syst. 2017, 137, 19–28.
  40. Henderson, P.; Ferrari, V. End-to-end training of object class detectors for mean average precision. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 198–213.
  41. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
  42. Everingham, M.; Eslami, S.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
  43. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
  44. Rai, T.; Morisi, A.; Bacci, B.; Bacon, N.J.; Dark, M.J.; Aboellail, T.; Thomas, S.A.; Bober, M.; La Ragione, R.; Wells, K. Deep learning for necrosis detection using canine perivascular wall tumour whole slide images. Sci. Rep. 2022, 12, 10634.
  45. Morisi, A.; Rai, T.; Bacon, N.J.; Thomas, S.A.; Bober, M.; Wells, K.; Dark, M.J.; Aboellail, T.; Bacci, B.; La Ragione, R.M. Detection of Necrosis in Digitised Whole-Slide Images for Better Grading of Canine Soft-Tissue Sarcomas Using Machine-Learning. Vet. Sci. 2023, 10, 45.
  46. Rai, T.; Papanikolaou, I.; Dave, N.; Morisi, A.; Bacci, B.; Thomas, S.; La Ragione, R.; Wells, K. Investigating the potential of untrained convolutional layers and pruning in computational pathology. In Proceedings of the Medical Imaging 2023: Digital and Computational Pathology, San Diego, CA, USA, 19–23 February 2023; Volume 12471, pp. 323–332.
  47. Aubreville, M.; Bertram, C.A.; Donovan, T.A.; Marzahl, C.; Maier, A.; Klopfleisch, R. A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research. Sci. Data 2020, 7, 1–10.
  48. Piansaddhayanon, C.; Santisukwongchote, S.; Shuangshoti, S.; Tao, Q.; Sriswasdi, S.; Chuangsuwanich, E. ReCasNet: Improving consistency within the two-stage mitosis detection framework. arXiv 2022, arXiv:2202.13912.
  49. Çayır, S.; Solmaz, G.; Kusetogullari, H.; Tokat, F.; Bozaba, E.; Karakaya, S.; Iheme, L.O.; Tekin, E.; Özsoy, G.; Ayaltı, S.; et al. MITNET: A novel dataset and a two-stage deep learning approach for mitosis recognition in whole slide images of breast cancer tissue. Neural Comput. Appl. 2022, 34, 17837–17851.
Figure 1. A Faster R-CNN object detection model applied to the cPWT mitosis dataset (image inspired by Mahmood et al.’s depiction of Faster R-CNN [37]). An input image of 512 × 512 pixels is passed through the model, where the feature map is extracted using the ResNet-50 feature-extraction network. Region proposals are then generated by the Region Proposal Network (RPN), and finally mitoses are detected by the classifier.
Figure 2. Keeping humans in the loop: (a) Two pathologist annotators independently reviewed canine Perivascular Wall Tumour (cPWT) Whole Slide Images (WSIs) and applied centroid annotations to mitotic figures. (b) The mitoses on which both annotators initially agreed formed the initial dataset of patch images with annotations. (c) A Faster R-CNN object detector was trained and tested on these data. (d) Thereafter, false positive (FP) candidates were reviewed again by the pathologist annotators, and misclassified candidates were reassigned as true positives (TPs). (e) These new TPs were added to the updated dataset. (20× magnification images).
Figure 3. We used 20× magnification images and annotations from the updated mitosis dataset to train the Faster R-CNN object detection model (details of the Faster R-CNN model are also shown in Figure 1). Optimal thresholds, determined from the validation set using Equation (4), were applied to the output candidates.
Figure 4. Line graphs showing the sensitivity, precision and F1-score calculated at each probability threshold for the three validation folds. The optimal probability threshold is the one with the highest F1-score, as determined via Equation (4); in the plots, it is denoted as “best threshold”. This threshold was 0.96 for fold 1, 0.84 for fold 2 and 0.91 for fold 3.
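The threshold selection described in this caption can be sketched as a simple sweep. Equation (4) is not reproduced in this section, so the code below assumes only what the caption states: for each candidate probability threshold, compute the F1-score on the validation set and keep the threshold with the highest value. The function names, the toy detections and the threshold grid are illustrative assumptions, not the authors' code.

```python
def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, is_tp, n_ground_truth, thresholds):
    """Return (threshold, F1) maximising F1 over the candidate thresholds.

    scores: confidence score of each detection.
    is_tp: True where the detection matches a ground-truth mitosis.
    n_ground_truth: number of annotated mitoses in the validation set.
    Ties keep the first (lowest) threshold encountered.
    """
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        kept = [m for s, m in zip(scores, is_tp) if s >= t]
        tp = sum(kept)                # kept detections that are true mitoses
        fp = len(kept) - tp           # kept detections that are not
        fn = n_ground_truth - tp      # annotated mitoses no longer detected
        f1 = f1_score(tp, fp, fn)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation detections (hypothetical values, for illustration only).
scores = [0.99, 0.97, 0.95, 0.60, 0.30, 0.05]
is_tp = [True, True, False, True, False, False]
print(best_threshold(scores, is_tp, n_ground_truth=3,
                     thresholds=[0.1, 0.5, 0.9, 0.96]))
```

The threshold chosen on the validation fold would then be applied unchanged to the hold-out test set, as in Table 4.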
Figure 5. An example 512 × 512 pixel image from the test set with a false negative (FN) shown in the red bounding box and a false positive (FP) detection shown in the yellow bounding box (32 × 32 pixels). The FP detection has a confidence score of 5.3% and so would typically be dismissed as a mitosis candidate once the adaptive F1-score threshold is applied.
Table 1. The differences between the two annotators’ mitosis annotations for the training/validation set. The “Slide” column represents the anonymised set of slides annotated. “Anno 1” and “Anno 2” show the number of mitoses annotated per slide by each annotator. “Agreement” represents the number of mitoses agreed by both annotators. The “% Agreement” for each annotator is the agreed mitotic count as a percentage of that annotator’s mitotic count. “Avg” is the mean of the per-WSI % agreement, computed for each annotator.

| Slide | Anno 1 | Anno 2 | Agreement | % Agreement Anno 1 | % Agreement Anno 2 |
|---|---|---|---|---|---|
| F17-04773 | 31 | 31 | 23 | 74.19 | 74.19 |
| F17-03141 | 69 | 89 | 55 | 79.71 | 61.80 |
| F17-1261 | 45 | 46 | 41 | 91.11 | 89.13 |
| F18-13364 | 695 | 517 | 444 | 63.88 | 85.88 |
| F17-02232 | 331 | 264 | 218 | 65.86 | 82.58 |
| F17-04911 | 49 | 58 | 37 | 75.51 | 63.79 |
| F17-0549 | 157 | 142 | 112 | 71.34 | 78.87 |
| F17-011577 | 27 | 29 | 23 | 85.19 | 79.31 |
| F17-011777 | 449 | 367 | 290 | 64.59 | 79.02 |
| F17-03855 | 97 | 87 | 70 | 72.16 | 80.46 |
| F17-04900 | 91 | 86 | 75 | 82.42 | 87.21 |
| F18-7832 | 496 | 401 | 346 | 69.76 | 86.28 |
| F17-09700 | 202 | 187 | 139 | 68.81 | 74.33 |
| F17-02641 | 59 | 48 | 43 | 72.88 | 89.58 |
| F17-09926 | 77 | 71 | 62 | 80.52 | 87.32 |
| F17-02723 | 49 | 52 | 40 | 81.63 | 76.92 |
| F17-05935 | 55 | 46 | 44 | 80.00 | 95.65 |
| F17-02120 | 58 | 53 | 43 | 74.14 | 81.13 |
| F18-79705 | 132 | 99 | 87 | 65.91 | 87.88 |
| Total: | 3169 | 2673 | 2192 | Avg: 74.72 | Avg: 81.12 |
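Because each annotator marked a different number of mitoses, the same agreed count yields a different percentage for each annotator. A minimal sketch of the calculation, using two rows transcribed from Table 1 (the names `pct_agreement` and `SAMPLE_ROWS` are illustrative, not from the original study):

```python
def pct_agreement(agreed, annotator_count):
    """Agreed mitoses as a percentage of one annotator's own count."""
    return 100.0 * agreed / annotator_count


# slide -> (Anno 1 count, Anno 2 count, agreed count), from Table 1.
SAMPLE_ROWS = {
    "F17-04773": (31, 31, 23),
    "F17-03141": (69, 89, 55),
}

for slide, (anno1, anno2, agreed) in SAMPLE_ROWS.items():
    print(slide,
          round(pct_agreement(agreed, anno1), 2),
          round(pct_agreement(agreed, anno2), 2))
```

For F17-03141, the 55 agreed mitoses are 79.71% of annotator 1's 69 annotations but only 61.80% of annotator 2's 89, which is why the table reports one percentage per annotator.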
Table 2. Initial mitosis object detection results for the 40× and 20× magnification patch datasets. As the difference in performance between the two resolution datasets was of interest, we first present the initial results for 20× and 40× magnifications for the validation and test sets and for all three folds. Interestingly, although the models trained at 40× magnification produced better F1-scores on validation, the 20× magnification models performed better across all three folds when applied to the hold-out test set. It appears that, with our experimental set-up, the models trained on 20× magnification generalise better across the two evaluation datasets. As a consequence, and also to reduce computational requirements, we proceeded with the 20× magnification extracted dataset. Results for these initial experiments also suggested that the 20× object detector was highly sensitive on the test set (mean sensitivity of 0.918) but not as precise (mean precision of 0.249).

| Magnification | Fold | Set | Sensitivity | Precision | F1-Score | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| 40× | 1 | Val | 0.967 | 0.720 | 0.826 | 590 | 229 | 20 |
| 40× | 1 | Test | 0.957 | 0.132 | 0.232 | 135 | 890 | 6 |
| 40× | 2 | Val | 0.922 | 0.786 | 0.849 | 847 | 230 | 72 |
| 40× | 2 | Test | 0.965 | 0.173 | 0.294 | 136 | 649 | 5 |
| 40× | 3 | Val | 0.944 | 0.724 | 0.819 | 503 | 192 | 30 |
| 40× | 3 | Test | 0.957 | 0.185 | 0.311 | 135 | 593 | 6 |
| 20× | 1 | Val | 0.957 | 0.484 | 0.643 | 582 | 620 | 26 |
| 20× | 1 | Test | 0.932 | 0.207 | 0.338 | 137 | 526 | 10 |
| 20× | 2 | Val | 0.895 | 0.567 | 0.694 | 810 | 619 | 95 |
| 20× | 2 | Test | 0.918 | 0.221 | 0.356 | 135 | 477 | 12 |
| 20× | 3 | Val | 0.897 | 0.545 | 0.678 | 477 | 399 | 55 |
| 20× | 3 | Test | 0.905 | 0.320 | 0.473 | 133 | 282 | 14 |
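The sensitivity, precision and F1-score columns follow directly from the TP, FP and FN counts. The sketch below uses the standard definitions, which reproduce the tabulated values; the function name `detection_metrics` is illustrative.

```python
def detection_metrics(tp, fp, fn):
    """Return (sensitivity, precision, F1) from detection counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, precision, f1


# 20x, fold 1, test row of Table 2: TP = 137, FP = 526, FN = 10.
sens, prec, f1 = detection_metrics(137, 526, 10)
print(round(sens, 3), round(prec, 3), round(f1, 3))
```

For this row the function returns 0.932, 0.207 and 0.338 after rounding, in agreement with the table. The low precision reflects the 526 false positives retained when no probability threshold is applied.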
Table 3. A comparison of results of the models trained on the initial annotated dataset and the updated dataset. Results are shown for both the validation and test sets for folds 1, 2 and 3.

| Fold | Data | Set | Sensitivity | Precision | F1-Score | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| 1 | Initial | Val | 0.957 | 0.484 | 0.643 | 582 | 620 | 26 |
| 1 | Updated | Val | 0.961 | 0.452 | 0.615 | 610 | 740 | 25 |
| 1 | Initial | Test | 0.932 | 0.207 | 0.338 | 137 | 526 | 10 |
| 1 | Updated | Test | 0.954 | 0.239 | 0.383 | 187 | 594 | 9 |
| 2 | Initial | Val | 0.895 | 0.567 | 0.694 | 810 | 619 | 95 |
| 2 | Updated | Val | 0.919 | 0.557 | 0.694 | 877 | 698 | 77 |
| 2 | Initial | Test | 0.918 | 0.221 | 0.356 | 135 | 477 | 12 |
| 2 | Updated | Test | 0.959 | 0.281 | 0.435 | 188 | 480 | 8 |
| 3 | Initial | Val | 0.897 | 0.545 | 0.678 | 477 | 399 | 55 |
| 3 | Updated | Val | 0.935 | 0.398 | 0.558 | 528 | 798 | 37 |
| 3 | Initial | Test | 0.905 | 0.320 | 0.473 | 133 | 282 | 14 |
| 3 | Updated | Test | 0.944 | 0.244 | 0.387 | 185 | 574 | 11 |
Table 4. Results of the models trained on the updated dataset. The “Threshold” column indicates whether models were optimised using the adaptive F1-score threshold described in Equation (4); where a value is given, it is the probability threshold applied. The models with optimised thresholds produced the highest F1-scores across all folds, with a mean F1-score of 0.750 on the test set compared with 0.402 without thresholding.

| Fold | Threshold | Set | Sensitivity | Precision | F1-Score |
|---|---|---|---|---|---|
| 1 | None | Val | 0.961 | 0.452 | 0.615 |
| 1 | 0.96 | Val | 0.811 | 0.854 | 0.832 |
| 1 | None | Test | 0.954 | 0.239 | 0.383 |
| 1 | 0.96 | Test | 0.776 | 0.756 | 0.766 |
| 2 | None | Val | 0.919 | 0.557 | 0.694 |
| 2 | 0.84 | Val | 0.778 | 0.857 | 0.815 |
| 2 | None | Test | 0.959 | 0.281 | 0.435 |
| 2 | 0.84 | Test | 0.827 | 0.633 | 0.717 |
| 3 | None | Val | 0.935 | 0.398 | 0.558 |
| 3 | 0.91 | Val | 0.819 | 0.840 | 0.830 |
| 3 | None | Test | 0.944 | 0.244 | 0.387 |
| 3 | 0.91 | Test | 0.806 | 0.731 | 0.767 |
| Average (mean) | None | Val | 0.938 | 0.469 | 0.622 |
| Average (mean) | None | Test | 0.952 | 0.255 | 0.402 |
| Average (mean) | Optimised | Val | 0.803 | 0.850 | 0.826 |
| Average (mean) | Optimised | Test | 0.803 | 0.707 | 0.750 |
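The mean rows of Table 4 are arithmetic means over the three folds. As an illustrative check (the variable names are hypothetical), the test-set F1-scores can be averaged as follows:

```python
from statistics import mean

# Per-fold test-set F1-scores, transcribed from Table 4.
test_f1_no_threshold = [0.383, 0.435, 0.387]
test_f1_optimised = [0.766, 0.717, 0.767]

mean_no_threshold = round(mean(test_f1_no_threshold), 3)
mean_optimised = round(mean(test_f1_optimised), 3)

print(mean_no_threshold, mean_optimised)
```

This reproduces the reported means of 0.402 and 0.750. Averaging per-fold scores gives each fold equal weight, irrespective of how many detections it contains.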
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rai, T.; Morisi, A.; Bacci, B.; Bacon, N.J.; Dark, M.J.; Aboellail, T.; Thomas, S.A.; La Ragione, R.M.; Wells, K. Keeping Pathologists in the Loop and an Adaptive F1-Score Threshold Method for Mitosis Detection in Canine Perivascular Wall Tumours. Cancers 2024, 16, 644. https://doi.org/10.3390/cancers16030644
