3.1. Multi-Center Multi-Organ WSI Image Dataset
To assess the efficacy of the proposed method, we used a well-known and very challenging multi-center, multi-organ WSI dataset called MoNuSeg [14], made available by the Indian Institute of Technology Guwahati. The MoNuSeg dataset contains 30 WSIs with a total of 21,000 manually annotated nuclei. The dataset is particularly difficult because its WSIs cover seven different organs (breast, kidney, colon, stomach, prostate, liver, and bladder). Moreover, the WSIs were collected from several medical centers that employed different stains. Note that in our experiments, we used 23 WSIs to train the proposed method and kept one WSI per organ for testing (seven WSIs hand-picked from the dataset) to account for the diversity of organs and stain colors. The same test set was used to evaluate all segmentation models.
During training, each WSI was re-scaled and then divided into four non-overlapping sub-images matching the input size of the NSM models. Data augmentation was also employed by randomly cropping 200 patches from each WSI and flipping them vertically and horizontally, resulting in a total of 14,076 patches. During the test phase, each WSI was likewise re-scaled and divided into four non-overlapping sub-images. It should be noted that the proposed segmentation method classifies each pixel of the input WSI as a nucleus or non-nucleus pixel; thus, even when a splitting line crosses a nucleus, the pixels of each part of the cropped nucleus are still segmented. Afterward, the sub-image predictions belonging to one input WSI are stacked in their original spatial order to form the final segmentation mask.
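The tile-and-stitch procedure described above can be sketched as follows. Since the exact resize and tile dimensions were lost from this excerpt, the tile size is kept as a parameter rather than a fixed value:

```python
import numpy as np

def tile_wsi(wsi, tile):
    """Split a (2*tile, 2*tile, ...) image into four non-overlapping
    quadrants, in row-major (spatial) order."""
    return [
        wsi[:tile, :tile],                  # top-left
        wsi[:tile, tile:2 * tile],          # top-right
        wsi[tile:2 * tile, :tile],          # bottom-left
        wsi[tile:2 * tile, tile:2 * tile],  # bottom-right
    ]

def stitch_masks(tiles):
    """Reassemble four predicted masks in their original spatial order."""
    top = np.concatenate([tiles[0], tiles[1]], axis=1)
    bottom = np.concatenate([tiles[2], tiles[3]], axis=1)
    return np.concatenate([top, bottom], axis=0)
```

Because the tiles are non-overlapping and kept in spatial order, `stitch_masks(tile_wsi(x, t))` recovers the original layout exactly, which is why a nucleus cut by a splitting line is still reconstructed whole in the final mask.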
3.2. Analyzing the Performance of the Proposed Method
First, we analyzed the effect of the stain template selection algorithm on the performance of the proposed method.
Figure 6 shows the distance maps of the stain template selection algorithm for different numbers of clusters: K = 2, 3, and 5. In these maps, the darkest color indicates the WSI closest to the cluster centroid. The stain template selection algorithm selects the best stain templates based on these distance maps, as explained in Section 2.1.
Figure 6a shows the distance map of the stain selection algorithm with K = 2 (two clusters). As shown, the WSI closest to Centroid 1 (WSI #15) is very far from Centroid 2.
Figure 6b shows the distance map of the stain template selection algorithm with K = 3. As we can see, WSI 15 is the closest image to the centroid of the first cluster, with a Euclidean distance of 0.31 (the minimum value in the map). WSI 5 is the closest image to the centroid of the second cluster (Euclidean distance of 0.33), and WSI 1 is the closest image to the centroid of the third cluster (Euclidean distance of 0.42).
Figure 6c shows the distance map of the stain selection algorithm with K = 5 (five clusters). As one can see, Centroid 3 and Centroid 5 cannot yield distinct stain templates because most of the images are far from these two centroids. Therefore, the stain selection algorithm with five clusters (K = 5) may yield limited segmentation results, whereas the use of three clusters (K = 3) would lead to a more accurate nuclei segmentation.
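A minimal sketch of this template-selection step (clustering per-WSI stain descriptors and picking, for each cluster, the WSI closest to the centroid) is given below. The stain descriptor and the farthest-point initialization are illustrative assumptions; the paper's exact features and clustering setup are defined in Section 2.1:

```python
import numpy as np

def select_stain_templates(features, k, iters=100):
    """Cluster per-WSI stain descriptors with k-means and return the
    distance map (rows = WSIs, columns = centroids, as in Figure 6)
    plus, for each cluster, the index of the closest WSI (its template).

    `features` is an (n_wsi, d) array; the descriptor itself is an
    illustrative assumption, not the paper's exact feature."""
    # Deterministic farthest-point initialization (an assumption here).
    centroids = [features[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(
            features[:, None] - np.array(centroids)[None], axis=2
        ).min(axis=1)
        centroids.append(features[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    # Lloyd's iterations until the centroids stop moving.
    for _ in range(iters):
        dist = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Euclidean distance of every WSI to every centroid, then one
    # template per cluster: the WSI with the minimum distance.
    dist_map = np.linalg.norm(features[:, None] - centroids[None], axis=2)
    return dist_map, dist_map.argmin(axis=0)
```

The returned distance map corresponds to the maps shown in Figure 6: the darkest cell of each column marks the selected template WSI.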
Table 1 shows the performance of the stain template selection algorithm with different numbers of clusters K in terms of the F1- and AJI-scores. It is worth noting that K = 1 means that we train one NSM after normalizing the data with one stain template, K = 2 means that we train two NSMs after normalizing the data with two stain templates, and K = 5 means that we train five NSMs after normalizing the data with five stain templates. As one can see in Table 1, the proposed method obtains the lowest F1- and AJI-scores with a single stain template. A noticeable improvement in the segmentation results can be observed when using multiple stain templates, and the proposed method achieved its highest F1-score, AJI-score, precision, and recall when the number of stain templates K was set to three.
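Since the AJI-score is used throughout this evaluation, a simplified reference implementation of the Aggregated Jaccard Index over instance-labelled masks may help. This follows the commonly used definition (best-IoU matching per ground-truth nucleus, with unmatched predictions penalized in the union); it is a sketch, not the authors' own evaluation code:

```python
import numpy as np

def aji_score(gt, pred):
    """Aggregated Jaccard Index for instance-labelled masks.

    `gt` and `pred` are integer label maps (0 = background, each nucleus
    a distinct positive id). Each ground-truth nucleus is matched to the
    predicted nucleus with the highest IoU; pixels of unmatched
    predictions are added to the union term as a penalty."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    used = set()
    inter_sum = union_sum = 0
    for i in gt_ids:
        g = gt == i
        best_iou, best_j = 0.0, None
        for j in pred_ids:
            p = pred == j
            inter = np.logical_and(g, p).sum()
            if inter == 0:
                continue
            iou = inter / np.logical_or(g, p).sum()
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is None:
            union_sum += g.sum()  # ground-truth nucleus missed entirely
        else:
            p = pred == best_j
            inter_sum += np.logical_and(g, p).sum()
            union_sum += np.logical_or(g, p).sum()
            used.add(best_j)
    for j in pred_ids:  # penalize false-positive nuclei
        if j not in used:
            union_sum += (pred == j).sum()
    return inter_sum / union_sum if union_sum else 0.0
```

Unlike the pixel-level F1-score, this metric drops sharply when touching nuclei are merged or single nuclei are split, which is why it is the stricter of the two scores reported here.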
Second, we evaluated other variants of the proposed method by replacing the Choquet integral-based aggregation function with other aggregation operators: the median, mean, and max operators. Specifically, in this experiment, we fixed the number of templates to three (K = 3, meaning we used three NSMs) while swapping in each operator. As presented in Table 2, the mean operator achieved F1- and AJI-scores better than those of the maximum and median operators.
Although the mean aggregation function gave an acceptable performance, the proposed Choquet integral-based aggregation achieved slightly better results. It is worth noting that one of the main advantages of the Choquet integral is that it accounts, through the fuzzy measure, for the interactions between the components being combined. In contrast, operators like the maximum disregard the relationships between the components and ignore inherent information. Indeed, the Choquet integral generalizes various aggregation operators, such as the arithmetic mean, weighted sum, ordered weighted average, minimum, and maximum [27,29,30].
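To make this generalization property concrete, a discrete Choquet integral over the per-model scores can be sketched as below. The fuzzy measure values used in the test are illustrative placeholders; the measure actually used by the proposed method is not given in this excerpt:

```python
def choquet_integral(scores, mu):
    """Discrete Choquet integral of `scores` (one value per model) with
    respect to a fuzzy measure `mu`: a dict mapping frozensets of model
    indices to [0, 1], with mu[frozenset()] = 0 and mu[all indices] = 1.

    C(x) = sum_i (x_(i) - x_(i-1)) * mu(A_(i)), where x_(1) <= ... <= x_(n)
    are the sorted scores and A_(i) is the set of models whose score is
    at least x_(i)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        total += (scores[i] - prev) * mu[frozenset(order[k:])]
        prev = scores[i]
    return total
```

With an additive measure of equal weights the integral reduces to the arithmetic mean, and with a measure equal to 1 on every non-empty subset it reduces to the maximum, matching the list of special cases above.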
Figure 7 shows the segmentation results of the proposed method on four different organs: liver, breast, colon, and kidney. Despite the color variation across the four WSIs, the proposed method produced precise segmentation masks. As shown, it accurately segmented the nuclei in the kidney WSI. Furthermore, it segmented the liver WSI well, even though this WSI comprised many overlapping nuclei of different shapes.
Table 3 compares the performance of the proposed method with recently published nuclei segmentation methods [15,18,31].
Table 3 also compares the proposed method with two popular medical image segmentation models (UNet [8] and UNet++ [17]), as well as with efficient semantic segmentation models adapted to the nuclei segmentation task [7,10,23]. It should be noted that all of these models were trained on the MoNuSeg dataset. Moreover, in this experiment we used the same data augmentation strategy employed to train our model, and we tried different stain templates to obtain the best possible performance for every compared model. As one can see, the proposed method outperformed all compared methods. Both FCN8s and UNet++ achieved competitive performance relative to the other methods, but their AJI-scores remained lower than that of the proposed method. The NSM-based FCDenseNet103 (one of the key components of the proposed model) achieved F1- and AJI-scores higher than those of the other compared methods. In addition, we compared the proposed method with the popular segmentation software ImageFIJI [32], finding that the proposed method obtained superior results. It should be noted that recently published nuclei segmentation methods have achieved improvements of only 1–2% over the state-of-the-art models (for instance, [33]). As one can see in Table 3, the AJI-score of the proposed method was better than those of FCN8s, UNet++, and FCDenseNet103 by approximately 3, 3, and 2 points, respectively. Thus, considering this range of improvement and the difficulty of the problem, the proposed method clearly provides a solid enhancement over the existing methods.
Figure 8 shows the segmentation masks of the proposed method, FCN8s, UNet++, and ImageFIJI for the WSIs of the bladder, prostate, and stomach organs. The ImageFIJI software produced the worst segmentation results on all organ images; in particular, it severely over-segmented the WSIs of the bladder and stomach, which contain dense nuclei. In the case of the prostate WSI, Figure 8 shows that the proposed method largely recovers the mis-segmentations of the other methods, achieving an AJI-score better than that of the high-performing UNet++ model.
Figure 9 presents boxplots of the F1- and AJI-scores of the proposed method and of several compared methods (UNet, SegNet, UNet++, and ImageFIJI). Given the F1- and AJI-scores of the test images for a particular model, a boxplot displays a five-number summary: the minimum, the first quartile, the sample median, the third quartile, and the maximum. As shown in Figure 9, the proposed method achieved the highest median AJI- and F1-scores, without outliers.
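The five-number summary underlying each boxplot can be computed directly; the input values below are illustrative, not the paper's actual per-image scores:

```python
import numpy as np

def five_number_summary(x):
    """Minimum, first quartile, median, third quartile, and maximum:
    the statistics a boxplot visualizes."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return float(np.min(x)), float(q1), float(med), float(q3), float(np.max(x))
```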
To demonstrate the efficiency of the proposed method, Figure 10 plots the AJI-scores against the number of parameters for the proposed method, FCN8s, UNet, SegNet, UNet++, and FC-DenseNet103. As shown, the nuclei segmentation models vary noticeably in their number of parameters. On the one hand, UNet++ achieved a high AJI-score; however, it is a rather heavy model with more than 35 million trainable parameters (high computational cost). On the other hand, FCDenseNet also achieved a high AJI-score with fewer than 10 million trainable parameters (a light architecture), which made it the primary choice for constructing the NSMs. It is worth noting that although the number of trainable parameters of the proposed method (approximately 25 million) is higher than that of a single NSM (FCDenseNet), it achieved state-of-the-art results (AJI > 73%). All in all, the proposed method obtained the best segmentation results while keeping its number of parameters lower than those of the other methods with comparable results.