Article

Application of StarDist to Diagnostic-Grade White Blood Cells Segmentation in Whole Slide Images

1
Engineering Faculty, Electrical & Electronics Engineering Department, Dicle University, 21280 Diyarbakır, Türkiye
2
Medical Faculty, Department of Internal Medicine-Hematology, Dicle University, 21280 Diyarbakır, Türkiye
3
Medical Faculty, Department of Biophysics, Dicle University, 21280 Diyarbakır, Türkiye
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(17), 3538; https://doi.org/10.3390/electronics14173538
Submission received: 20 June 2025 / Revised: 19 August 2025 / Accepted: 29 August 2025 / Published: 4 September 2025

Abstract

Accurate and automated segmentation of white blood cells (WBCs) in whole slide images (WSIs) is a critical step in computational pathology. This study presents a comprehensive evaluation and enhancement of the StarDist algorithm, leveraging its star-convex polygonal modeling to improve segmentation precision in complex WSI datasets. Our pipeline integrates tailored preprocessing, expert annotations from QuPath, and adaptive learning strategies for model training. Comparative analysis with U-Net and Mask R-CNN demonstrates StarDist’s superiority across multiple performance metrics, including Dice coefficient (0.98), precision (0.99), and IoU (0.95). Visual evaluations further highlight its robustness in handling overlapping cells and staining inconsistencies. The study establishes StarDist as a reliable tool for digital pathology, with potential integration into clinical decision-support systems. In addition to Dice and IoU, metrics such as Aggregated Jaccard Index and Boundary F1-Score are gaining popularity for biomedical segmentation. Preprocessing techniques like Macenko stain normalization and adaptive histogram equalization can further improve generalizability. QuPath, an open-source digital pathology platform, was utilized to perform accurate WBC annotations prior to training and evaluation.

1. Introduction

The accurate segmentation of WBCs in WSIs plays a critical role in clinical diagnostics, particularly in hematological disease detection and classification. Traditional manual evaluation of blood smears is time-consuming, subjective, and prone to inter-observer variability, necessitating automated solutions that offer both precision and efficiency [1,2]. Deep learning-based image segmentation methods have emerged as powerful tools in digital pathology, offering high throughput capabilities and state-of-the-art performance in various biomedical tasks [3,4].
Among the many deep learning models employed, U-Net remains a foundational architecture for semantic segmentation in medical imaging due to its encoder–decoder structure with skip connections that recover fine spatial details [2,5]. However, it struggles in scenarios involving overlapping cells and dense tissue structures. To address object-level limitations, Mask R-CNN introduced instance segmentation by augmenting region-based convolutional neural networks with mask prediction branches [6]. Though effective, its reliance on bounding boxes makes it less optimal for crowded cellular environments where shape and size variations are significant. This study focuses on rigorously evaluating StarDist in the context of hematological WSIs and comparing its performance against U-Net and Mask R-CNN under controlled experimental conditions. The study not only highlights the strengths of StarDist in clinical scenarios but also contributes novel insights through detailed error analysis, parameter sensitivity exploration, and deployment simulation. In doing so, it aims to bridge the gap between algorithmic development and practical diagnostic utility. While earlier segmentation models like U-Net [5] have been widely adopted, they are limited in resolving densely packed cells and fail to adequately separate overlapping instances. Likewise, models like Mask R-CNN [6] improve instance awareness but suffer from boundary inaccuracies due to their reliance on bounding box proposals. StarDist [7,8], a more recent architecture, addresses these limitations by modeling cell shapes as star-convex polygons, enabling better shape conformity and separation of adjacent or irregularly shaped cells. Deep learning has emerged as a powerful solution to these challenges, offering the potential for high throughput analysis with expert-level precision. 
Segmentation, the task of delineating object boundaries within an image, is a foundational step in numerous downstream applications such as cell classification, quantification, and morphological assessment. Particularly in hematology, where accurate identification and enumeration of WBC subtypes are essential for disease diagnosis (e.g., leukemia), precise segmentation is indispensable. Automated segmentation of WBCs in WSIs plays a pivotal role in modern digital pathology and hematology. With the growing reliance on computational systems in diagnostics, the demand for accurate, reproducible, and scalable image analysis tools has increased substantially. Traditional diagnostic workflows, often based on manual smear review, are subject to operator variability, are time-consuming, and fail to scale with increasing patient loads. Recent applications of StarDist include its use in 3D microscopy, circulating tumor cell (CTC) detection, and cross-domain segmentation tasks, where its instance-aware radial prediction has shown robustness [9,10,11].
More recently, StarDist has gained attention for its innovative approach of predicting star-convex polygons to delineate object boundaries [7,8]. Unlike pixel-wise or bounding box-based techniques, StarDist models each cell by estimating distances along predefined radial rays, making it highly suitable for separating closely packed or morphologically diverse cells. Its performance has been validated across several microscopy datasets, but its applicability to large-scale WSIs in clinical hematopathology remains underexplored. The radial ray regression mechanism in StarDist offers a geometric inductive bias that naturally fits the convex structure of cells. Unlike bounding-box or mask-based models, it avoids overlapping ambiguity by enforcing edge separation.
Our study aims to address that gap by implementing and evaluating the StarDist framework for WBC segmentation on digitized hematological slides. We compare its performance against two benchmark models, U-Net and Mask R-CNN, using a rigorously annotated dataset from Dicle University. Our approach includes color normalization [12], expert-driven annotations via QuPath [13], and extensive data augmentation strategies to ensure robust model training.
The novelty of this work includes:
  • Application of StarDist to high-resolution WSIs in hematology, tackling real-world challenges like cell crowding, stain variation, and annotation inconsistencies;
  • Systematic benchmarking against U-Net and Mask R-CNN under identical clinical conditions and metrics;
  • Threshold-aware robustness analysis, showing how segmentation confidence affects performance;
  • Expert-curated dataset using QuPath with clinical-grade annotations and stain normalization via Reinhard’s method;
  • Engineering rigor in training strategy, optimized learning pipeline, and model validation with advanced metrics;
  • Scalable diagnostic vision, linking StarDist with YOLOv12 and SAM2 for real-time, lightweight, and interpretable medical AI.
Furthermore, we examine the sensitivity of segmentation accuracy to confidence thresholds and analyze how model performance degrades under stricter detection criteria. Evaluation metrics include Dice coefficient, Intersection over Union (IoU), precision, recall, accuracy, and panoptic quality (PQ) [3,4].
By combining robust engineering practices, medically relevant datasets, and state-of-the-art deep learning models, this research seeks to establish a reliable segmentation backbone for downstream diagnostic tasks. It also sets the stage for integrating lightweight detection systems (e.g., YOLOv12 [14]) and prompt-guided segmentation refinement (e.g., SAM2 [15]) to build future-ready digital pathology systems.

2. Literature Review

Over the past decade, deep learning has revolutionized biomedical image analysis, especially in tasks such as object detection, segmentation, and classification. Several architectures have been proposed and adapted to medical domains with varying degrees of success. Among these, the U-Net architecture has been widely adopted for its efficiency in segmenting medical images at the pixel level [2,5]. Its symmetrical encoder–decoder structure with skip connections has made it the go-to model for semantic segmentation of anatomical and pathological structures. Recent methods such as nnU-Net [9] and DeepLabV3+ [16] have further advanced medical image segmentation through self-configuration and atrous spatial pyramid pooling respectively. However, these methods can be computationally intensive and may not perform well on densely packed WSIs with high variation, where shape-aware approaches like StarDist offer an advantage.
In addition to general image segmentation techniques, several studies have directly tackled WBC segmentation. Ref. [8] proposed a hybrid StarDist-U-Net model for leukocyte instance segmentation, while ref. [7] applied federated learning with StarDist to enhance cross-center generalizability. These works demonstrate growing consensus on the value of shape-aware segmentation in hematology.
However, U-Net has its limitations, particularly in complex environments such as hematological images, where cells often overlap or adhere closely. To address this, Mask R-CNN introduced a more advanced instance segmentation technique by combining region proposal networks with mask prediction heads [6]. Although effective in distinguishing individual cell instances, its reliance on bounding box localization often leads to poor separation of touching or clustered cells, limiting its performance in dense microscopy images [3]. Moreover, ref. [17] introduced a robust feature learning strategy using multi-scale and structure-aware mechanisms for complex scenes, including occlusion and background clutter. These techniques support theoretical foundations for enhancing segmentation accuracy of overlapping WBCs in hematological WSIs.
To overcome the limitations of both pixel-based and bounding-box-based models, newer architectures such as StarDist were proposed [7,8]. StarDist formulates segmentation as a shape-aware regression problem by modeling objects as star-convex polygons. This approach allows for more accurate delineation of irregular and overlapping cell structures. Empirical studies have demonstrated StarDist’s effectiveness in segmenting nuclei and cells in 2D and 3D microscopy images, offering higher precision and improved boundary adherence compared to classical deep learning models [8]. Unlike earlier applications of StarDist in cell tracking or 3D segmentation, this study focuses on densely packed WBCs in WSIs characterized by staining variability and irregular morphologies.
Recent innovations have focused on integrating such segmentation models with lightweight detection backbones like YOLOv12 [14] and prompt-driven segmentation refinement techniques such as SAM2 [15]. YOLO-based methods have demonstrated real-time object detection capabilities with impressive accuracy, making them suitable for high-throughput pathological image analysis. Meanwhile, SAM2 leverages prompt embeddings and attention mechanisms to iteratively refine masks, offering a novel solution to ambiguous or noisy segmentation outputs.
Digital pathology platforms like QuPath [13] have facilitated the manual annotation and automated analysis of large-scale WSIs. These tools provide an essential bridge between machine learning pipelines and expert pathologist input. QuPath’s capabilities for region annotation, feature extraction, and integration with Python-based frameworks have positioned it as a preferred tool in computational pathology workflows.
Comprehensive reviews on nuclei segmentation in pathology images [4,13] highlight the growing trend towards integrating multiple models and postprocessing techniques to improve robustness and generalizability. Other studies have employed multi-scale architectures [18], pyramid feature extractors [19], and stain normalization techniques [12] to address the variability inherent in medical imaging datasets [20].
Beyond segmentation accuracy, data security in medical imaging pipelines is crucial. In particular, reversible watermarking schemes, such as the one proposed in [17], can ensure content integrity and tamper-proofing across distributed clinical networks [21]. Incorporating such techniques alongside segmentation models may bolster patient privacy and traceability of diagnostic results [22].
In light of this background, our work positions StarDist as a high-performing segmentation backbone in WBC analysis for WSIs, comparing it rigorously against U-Net and Mask R-CNN using a clinically annotated dataset. The aim is not only to improve segmentation accuracy but also to lay the groundwork for future integration with real-time diagnostics and automated hematopathology workflows.

3. Materials and Methods

In this section, we provide a detailed explanation of the materials and methodological framework used in our study. This includes dataset construction, annotation process, preprocessing pipeline, segmentation models, and evaluation metrics.

3.1. Dataset Description

3.1.1. Multi-Center and Public Benchmark Datasets

In addition to our institutional WSI collection, we incorporated three publicly available, multi-center benchmarks (MoNuSeg, CoNSeP, and Lizard) to assess cross-hospital and cross-protocol robustness. For each dataset, we applied identical preprocessing and trained StarDist using a leave-one-center-out protocol. The combined dataset comprises 1200 patches from five different scanners and two staining protocols, enabling a rigorous evaluation of real-world variability.
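As an illustration of the leave-one-center-out protocol, the following minimal Python sketch enumerates the folds: each center is held out once for testing while all remaining centers form the training pool. The center names and patch identifiers are placeholders introduced only for this example and are not taken from the study.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_center_out(
    patches_by_center: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_center, train_patches, test_patches), one fold per center."""
    for held_out in sorted(patches_by_center):
        test = list(patches_by_center[held_out])
        train = [
            patch
            for center, patches in sorted(patches_by_center.items())
            if center != held_out
            for patch in patches
        ]
        yield held_out, train, test


# Hypothetical toy input: center names and patch IDs are placeholders.
toy_centers = {
    "center_a": ["a1", "a2"],
    "center_b": ["b1"],
    "center_c": ["c1", "c2"],
}
folds = list(leave_one_center_out(toy_centers))
for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

Because no patch from the held-out center appears in the training pool, each fold measures how well the model transfers to an unseen scanner or staining protocol.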

3.1.2. Dicle Dataset

The main dataset used in this study comprises WSIs collected from the Hematology Department of Dicle University. All WSIs were digitized using a 40× objective scanner, yielding a resolution of 0.25 μm/pixel. A total of 55 WSIs representing various hematological conditions were considered, from which 2048 × 2048 pixel-sized image patches were initially extracted. Following a quality assurance process by experienced pathologists, 230 regions containing clearly distinguishable white blood cell types (i.e., neutrophils, lymphocytes, and blast cells) were selected for manual annotation.
The dataset was partitioned into training (70%), validation (15%), and test (15%) subsets to ensure unbiased evaluation. All reported performance metrics reflect results computed exclusively on the held-out test set.
The selected 2048 × 2048 images were subdivided into 256 × 256 patches to make them suitable for GPU-based processing. Annotations were performed using QuPath, a digital pathology tool that facilitates semi-automated labeling through superpixel clustering and manual editing. Final masks were exported as binary images, with distinct labels corresponding to individual cell types. This annotation process resulted in a dataset of 14,721 images and their corresponding masks.
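The subdivision step can be sketched as follows. The example assumes non-overlapping tiles (stride equal to tile size), which the text does not state explicitly; the function name is hypothetical.

```python
from typing import List, Tuple


def tile_origins(size: int = 2048, tile: int = 256) -> List[Tuple[int, int]]:
    """Return the (row, col) top-left corner of each non-overlapping tile."""
    if size % tile != 0:
        raise ValueError("size must be an integer multiple of tile")
    return [(r, c) for r in range(0, size, tile) for c in range(0, size, tile)]


origins = tile_origins()
print(len(origins))   # number of 256 x 256 tiles per 2048 x 2048 region
print(origins[:3])    # first few tile corners
```

Under this assumption each 2048 × 2048 region yields an 8 × 8 grid of 256 × 256 patches; the same corner coordinates must be applied to the image and to its mask so that the pairs stay aligned.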

3.1.3. Public Benchmark Raabin-WBC (Multi-Center)

We additionally evaluated on Raabin-WBC, a public collection (~40 k WBC images acquired with multiple microscopes/cameras) that includes expert nucleus/cytoplasm masks for 1145 cropped cells. We used two protocols aligned with the dataset: (i) nuclei instance segmentation on the 1145 annotated cells and (ii) cell instance detection on the broader image set using centroid-level matching with a 10 px tolerance to compute precision/recall/F1. Preprocessing mirrored our internal pipeline (Reinhard stain normalization followed by z-score intensity scaling), and we applied no dataset-specific tuning beyond standard confidence/NMS thresholds. StarDist was trained with the same hyper-parameters as our primary experiments; only the output head was adjusted to 32 rays to match the evaluation setting. We report Dice/IoU and PQ for nuclei segmentation, and F1/AP (as appropriate to the matching rule) for detection; dataset details and acquisition characteristics are provided in refs. [23,24].
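The centroid-level matching rule can be illustrated with a short sketch. It assumes a greedy one-to-one assignment in order of increasing distance, which is one common choice and not necessarily the matching procedure used in this study; the coordinates are toy values.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def match_centroids(pred: List[Point], gt: List[Point], tol: float = 10.0):
    """Greedy one-to-one matching of predicted and reference centroids.

    A pair counts as a true positive if its Euclidean distance is <= tol pixels.
    Returns (tp, fp, fn, precision, recall, f1).
    """
    pairs = sorted(
        (math.dist(p, g), i, j)
        for i, p in enumerate(pred)
        for j, g in enumerate(gt)
    )
    used_pred, used_gt = set(), set()
    for dist, i, j in pairs:
        if dist > tol:
            break  # pairs are sorted, so all remaining ones are too far apart
        if i in used_pred or j in used_gt:
            continue
        used_pred.add(i)
        used_gt.add(j)
    tp = len(used_pred)
    fp = len(pred) - tp
    fn = len(gt) - tp
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if gt else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return tp, fp, fn, precision, recall, f1


# Toy coordinates (pixels), chosen only for illustration.
predicted = [(10.0, 10.0), (50.0, 50.0), (200.0, 200.0)]
reference = [(12.0, 11.0), (55.0, 58.0), (120.0, 120.0)]
tp, fp, fn, precision, recall, f1 = match_centroids(predicted, reference)
print(tp, fp, fn, round(f1, 3))
```

In the toy case two predictions fall within 10 px of a reference centroid, while the third prediction and the third reference cell remain unmatched and count as one false positive and one false negative.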

3.2. Preprocessing

Preprocessing is a vital step that impacts model performance in medical image segmentation. The primary goals of preprocessing in this study were: (1) to normalize image appearance across different WSIs, (2) to reduce noise from staining artifacts, and (3) to increase the diversity of training data through augmentation.
We employed color normalization using Reinhard’s method to reduce stain variability, followed by image denoising using Gaussian and median filters. These filters help to suppress background noise while preserving cell boundaries, a critical requirement for accurate segmentation [19]. Subsequently, patch extraction was done to isolate relevant regions for model training.
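The core of Reinhard-style normalization is a per-channel statistics transfer: each channel of the source image is shifted and scaled so that its mean and standard deviation match those of a reference image. The sketch below shows only that step for a single channel given as a flat list; it assumes the pixels have already been converted to a perceptual colour space (such as Lab), a conversion that is omitted here, and the target statistics are illustrative values rather than those of the study's reference slide.

```python
from statistics import fmean, pstdev
from typing import List


def reinhard_channel(values: List[float], target_mean: float, target_std: float) -> List[float]:
    """Match the mean and standard deviation of one channel to target statistics."""
    src_mean = fmean(values)
    src_std = pstdev(values)
    if src_std == 0:
        # A constant channel carries no contrast to rescale; map it to the target mean.
        return [target_mean for _ in values]
    scale = target_std / src_std
    return [(v - src_mean) * scale + target_mean for v in values]


# Toy channel values and hypothetical target statistics.
channel = [10.0, 20.0, 30.0, 40.0]
normalized = reinhard_channel(channel, target_mean=50.0, target_std=5.0)
out_mean = fmean(normalized)
out_std = pstdev(normalized)
print(round(out_mean, 6), round(out_std, 6))
```

Applying the same transfer to all three channels and converting back to RGB gives the normalized patch; because the mapping is linear per channel, the ordering of intensities within each channel is preserved.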
Data augmentation strategies included horizontal and vertical flipping, rotation at 90°, 180°, and 270°, random zooming (scale 0.8–1.2), and intensity jitter. These augmentations effectively expanded the dataset and improved generalization performance. To support translational relevance, we propose integrating StarDist outputs with clinical-grade diagnostic systems via DICOM interfaces and testing across multi-hospital WSI datasets.
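The geometric part of this augmentation (flips and right-angle rotations) can be sketched without any imaging library by treating a patch as a nested list of pixel rows. This is a minimal illustration, not the augmentation code used in the study; zooming and intensity jitter are omitted.

```python
from typing import List

Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror left-right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror top-bottom."""
    return [row[:] for row in img[::-1]]


def rotate90(img: Image, k: int = 1) -> Image:
    """Rotate counter-clockwise by k * 90 degrees."""
    out = [row[:] for row in img]
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        out = [list(col) for col in zip(*out)][::-1]
    return out


patch = [[1, 2],
         [3, 4]]
augmented = {
    "hflip": flip_horizontal(patch),
    "vflip": flip_vertical(patch),
    "rot90": rotate90(patch, 1),
    "rot180": rotate90(patch, 2),
    "rot270": rotate90(patch, 3),
}
for name, result in augmented.items():
    print(name, result)
```

For segmentation, the identical transform has to be applied to the image and to its label mask; otherwise the annotations no longer align with the cells they describe.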

3.3. Segmentation Architectures

Training Strategy: To support the claims of model deployability, we computed additional quantitative metrics including inference time and the number of floating-point operations (FLOPs) per forward pass. On average, StarDist processed one 256 × 256 patch in 12.4 ms and required approximately 3.2 GFLOPs per forward pass, making it efficient for clinical deployment scenarios.
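To put these figures in perspective, the short calculation below converts the per-patch cost into throughput and per-region cost. It assumes patches are processed one at a time and that a 2048 × 2048 region is covered by 64 non-overlapping 256 × 256 patches; both are simplifying assumptions made for this example.

```python
PATCH_MS = 12.4            # reported mean inference time per 256 x 256 patch
PATCH_GFLOPS = 3.2         # reported cost of one forward pass
PATCHES_PER_REGION = (2048 // 256) ** 2   # 64, assuming non-overlapping tiling

patches_per_second = 1000.0 / PATCH_MS
region_ms = PATCH_MS * PATCHES_PER_REGION
region_gflops = PATCH_GFLOPS * PATCHES_PER_REGION

print(f"{patches_per_second:.1f} patches/s")
print(f"{region_ms:.1f} ms per 2048 x 2048 region")
print(f"{region_gflops:.1f} GFLOPs per 2048 x 2048 region")
```

Batched inference on a GPU would typically lower the per-region latency, so the sequential estimate is best read as an upper bound under these assumptions.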
The training process for all segmentation models was carefully designed to ensure consistency, reproducibility, and optimal performance. Each model (U-Net, Mask R-CNN, and StarDist) was trained using the same training and validation splits derived from the annotated dataset, maintaining an 80:20 ratio. Training data were augmented extensively using geometric and photometric transformations such as horizontal and vertical flips, 90° rotations, zoom scaling, and intensity jittering to increase the effective dataset size and promote generalization.
The input patches were resized to 256 × 256 pixels and normalized to the [0,1] intensity range before being fed into the models. All models were trained using the Adam optimizer due to its adaptive learning rate capabilities, which allow faster convergence and better handling of sparse gradients. The initial learning rate was set to 0.0003 and decayed using a cosine annealing schedule to facilitate finer learning in later epochs. A batch size of 16 was used for U-Net and StarDist, while a smaller batch size of 4 was employed for Mask R-CNN due to its higher memory demands.
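A cosine annealing schedule of this kind can be written in a few lines. The sketch below uses the reported initial rate of 0.0003 and assumes a single decay cycle over 150 epochs (the epoch cap reported for these experiments) with a floor of zero; the cycle length and floor are assumptions for this example.

```python
import math


def cosine_annealed_lr(
    epoch: int,
    total_epochs: int = 150,   # assumed: one cycle spanning the 150-epoch cap
    lr_init: float = 3e-4,     # initial learning rate reported in the text
    lr_min: float = 0.0,       # assumed floor
) -> float:
    """Learning rate at a given epoch under single-cycle cosine annealing."""
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr_min + (lr_init - lr_min) * cosine


for e in (0, 38, 75, 113, 150):
    print(e, f"{cosine_annealed_lr(e):.6f}")
```

The rate falls slowly at first, fastest around the midpoint, and flattens again near the end, which gives the small late-stage updates described above.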
Early stopping and model checkpointing were incorporated to prevent overfitting. Specifically, training was halted if the validation loss did not improve for 15 consecutive epochs, and the best-performing model weights (based on the lowest validation loss) were saved. Additionally, for StarDist, a combined loss function was used: binary cross-entropy for object probability maps and mean absolute error (MAE) for the ray distance regression. This multi-output learning strategy enhanced the model’s ability to simultaneously optimize object detection and shape prediction.
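The structure of such a two-term loss is illustrated below on plain Python lists. The sketch sums binary cross-entropy over the object-probability map and MAE over the ray distances with a weighting factor `lambda_dist`; the weighting factor, the plain averaging over all pixels, and the function names are assumptions for this example, since the text does not specify how the two terms were balanced.

```python
import math
from typing import List


def binary_cross_entropy(p: float, y: float, eps: float = 1e-7) -> float:
    """BCE for one pixel; p is the predicted probability, y the 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def combined_loss(
    prob_pred: List[float],
    prob_true: List[float],
    dist_pred: List[List[float]],
    dist_true: List[List[float]],
    lambda_dist: float = 1.0,   # assumed weighting between the two terms
) -> float:
    """Mean BCE over pixels plus lambda_dist times mean absolute ray-distance error."""
    bce = sum(binary_cross_entropy(p, y) for p, y in zip(prob_pred, prob_true)) / len(prob_pred)
    abs_errors = [
        abs(dp - dt)
        for rays_p, rays_t in zip(dist_pred, dist_true)
        for dp, dt in zip(rays_p, rays_t)
    ]
    mae = sum(abs_errors) / len(abs_errors)
    return bce + lambda_dist * mae


# Two pixels with two rays each, toy values only.
loss = combined_loss(
    prob_pred=[0.5, 0.5],
    prob_true=[1.0, 0.0],
    dist_pred=[[1.0, 2.0], [0.0, 0.0]],
    dist_true=[[2.0, 2.0], [0.0, 0.0]],
)
print(round(loss, 4))
```

In the toy case the probability term equals ln 2 (both predictions sit at 0.5) and the distance term equals 0.25 (one ray out of four is off by one pixel).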
All experiments were conducted on a machine equipped with an NVIDIA RTX 3090 GPU (24 GB VRAM), 128 GB RAM, and an Intel Xeon CPU. The training process for each model took approximately 4–6 h depending on the complexity of the architecture and the number of epochs (typically set to a maximum of 150).
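The interaction between the 15-epoch patience and the 150-epoch cap can be sketched as a small function that replays a validation-loss curve and reports when training would stop and which epoch's weights would be kept. The loss values are synthetic and the function name is hypothetical.

```python
from typing import List, Tuple


def early_stopping_epoch(
    val_losses: List[float], patience: int = 15, max_epochs: int = 150
) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs, or when the epoch cap is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    losses = val_losses[:max_epochs]
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(losses), best_epoch


# Synthetic curve: improves for three epochs, then plateaus.
curve = [1.0, 0.8, 0.6] + [0.7] * 20
stop_epoch, best_epoch = early_stopping_epoch(curve)
print(stop_epoch, best_epoch)
```

The checkpoint from `best_epoch` is the one that would be retained, which is why the stop epoch and the saved model generally differ by the patience window.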
  • U-Net Architecture
U-Net is widely used for biomedical image segmentation due to its encoder–decoder structure with skip connections [8]. The encoder comprises convolutional layers and max-pooling operations to extract spatial features. The decoder uses upsampling and convolution to reconstruct the segmented image. Skip connections bridge the encoder and decoder, enabling the model to recover spatial resolution lost during downsampling.
  • Mask R-CNN
Mask R-CNN extends the Faster R-CNN framework by adding a branch for pixel-level mask prediction [7]. It consists of a feature extractor (ResNet50), a region proposal network (RPN), and three output heads: classification, bounding box regression, and segmentation mask generation. While it performs well for object detection, its bounding box representation limits performance in densely packed cell clusters, often leading to merged segmentations.
  • Application of StarDist (Evaluated Model)
StarDist is a segmentation approach that models object boundaries using star-convex polygons, making it well-suited for biomedical images where cells and nuclei often overlap or exhibit irregular morphologies. This method formulates segmentation as a regression task by predicting object probabilities and radial distances from a central pixel to the boundary in multiple directions. This design provides an efficient way to handle complex shapes and allows precise separation of touching objects without relying on bounding boxes. We selected 32 rays as a trade-off between computational efficiency and angular resolution. Empirical testing showed marginal gains beyond 32 rays but significantly higher inference cost. Therefore, this number was deemed optimal for diagnostic-scale deployment.
In our implementation, StarDist utilized 32 radial directions (rays) from each pixel center to generate star-convex polygons that define cell boundaries. The network backbone followed a ResNet-inspired architecture, consisting of stacked convolutional layers, batch normalization, and ReLU activations to extract robust features from the input image. Two output heads were trained simultaneously: one for object probability (via sigmoid activation) and one for regressing ray lengths (via ReLU).
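To make the star-convex representation concrete, the sketch below converts a centre point and a list of ray lengths into polygon vertices and computes the enclosed area with the shoelace formula. It assumes the rays are equally spaced in angle, starting along the positive x-axis; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rays_to_polygon(center: Point, distances: List[float]) -> List[Point]:
    """Vertices of the star-convex polygon defined by equally spaced rays."""
    cx, cy = center
    n = len(distances)
    return [
        (cx + d * math.cos(2 * math.pi * k / n), cy + d * math.sin(2 * math.pi * k / n))
        for k, d in enumerate(distances)
    ]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# A roughly circular cell of radius 10 px described by 32 rays.
vertices = rays_to_polygon((0.0, 0.0), [10.0] * 32)
area = polygon_area(vertices)
print(len(vertices), round(area, 2), round(math.pi * 10.0 ** 2, 2))
```

With 32 rays the polygon covers over 99% of the area of the corresponding circle, which illustrates why additional rays bring only marginal gains for near-round cells while strongly lobulated shapes may still be under-sampled.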
Loss functions combined binary cross-entropy for object probability and mean absolute error (MAE) for ray length regression. The final segmentation involved non-maximum suppression (NMS) to eliminate overlapping predictions, with confidence thresholds applied to retain only high-quality polygons. Training was done using the Adam optimizer with an initial learning rate of 0.0003 and early stopping criteria to prevent overfitting.
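The thresholding and suppression step can be illustrated with a greedy NMS sketch. For simplicity each candidate is represented as a set of pixel coordinates rather than a polygon, and overlap is measured by IoU; this is a simplification of polygon-based suppression, and the default thresholds (0.7 and 0.5) are illustrative values chosen from within the ranges explored in the sensitivity analysis.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Candidate = Tuple[float, Set[Pixel]]  # (object probability, pixels covered)


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def greedy_nms(
    candidates: List[Candidate], prob_thresh: float = 0.7, nms_thresh: float = 0.5
) -> List[int]:
    """Return indices of kept candidates, highest probability first."""
    order = sorted(
        (i for i, (score, _) in enumerate(candidates) if score >= prob_thresh),
        key=lambda i: candidates[i][0],
        reverse=True,
    )
    kept: List[int] = []
    for i in order:
        # Keep a candidate only if it does not overlap too much with any kept one.
        if all(iou(candidates[i][1], candidates[j][1]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept


def square(x0: int, y0: int, side: int) -> Set[Pixel]:
    """Toy helper: a filled square standing in for a cell mask."""
    return {(x, y) for x in range(x0, x0 + side) for y in range(y0, y0 + side)}


toy_candidates = [
    (0.95, square(0, 0, 10)),    # kept
    (0.90, square(1, 1, 10)),    # near-duplicate of the first, suppressed
    (0.80, square(20, 20, 10)),  # separate cell, kept
    (0.40, square(40, 40, 10)),  # below the probability threshold, dropped
]
kept = greedy_nms(toy_candidates)
print(kept)
```

Raising `prob_thresh` discards more low-confidence candidates before suppression, which is the mechanism behind the recall loss observed at strict thresholds.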
A key strength of StarDist is its robustness in densely packed regions. Unlike U-Net and Mask R-CNN, which struggle with connected components or overlapping bounding boxes, StarDist’s shape-aware formulation produces cleaner, instance-separated masks. Additionally, the output masks require minimal postprocessing, making the method computationally efficient for integration into diagnostic workflows.
Figure 1 below illustrates the entire pipeline used in this study, including dataset preprocessing, patch extraction, training, and prediction stages using the StarDist framework.
Figure 1. Pipeline for automated instance segmentation of cells from WSIs. Input WSIs are first subjected to stain normalization (Reinhard) and image denoising (Gaussian and median filtering); normalized, denoised patches are then used to train and/or infer with a StarDist instance-segmentation model to produce per-cell instance masks and delineated boundaries suitable for downstream quantitative analysis.

3.4. Evaluation Metrics

To evaluate the segmentation performance of the three models (U-Net, Mask R-CNN, and StarDist), we used a set of standard and advanced metrics that comprehensively assess both region overlap and boundary accuracy. Additionally, a sensitivity analysis was conducted for key StarDist parameters: number of rays, confidence threshold, and NMS threshold. The results are presented in Figure 2, Figure 3 and Figure 4, revealing that segmentation performance peaks at 32 rays and confidence threshold τ = 0.7. Excessively high thresholds (τ > 0.85) led to decreased recall and increased missed detections.
  • Accuracy (ACC): The ratio of correctly classified pixels (both foreground and background) to the total number of pixels:
Accuracy = (TP + TN)/(TP + TN + FP + FN)
  • Dice Coefficient (F1-Score): A measure of spatial overlap between the predicted segmentation (P) and ground truth (G), calculated as:
Dice = (2 × TP)/(2 × TP + FP + FN)
  • Intersection over Union (IoU): Also known as the Jaccard Index, it quantifies the overlap between prediction and ground truth masks:
IoU = TP/(TP + FP + FN)
  • Precision: Indicates the model’s ability to correctly identify true positives among all positive predictions:
Precision = TP/(TP + FP)
  • Recall (Sensitivity): Measures how many of the actual positive cases were correctly identified:
Recall = TP/(TP + FN)
  • Panoptic Quality (PQ): An advanced metric used to assess instance segmentation quality. PQ combines recognition and segmentation into a single score:
PQ = [sum over matched pairs of IoU(p, g)]/(|TP| + 0.5 × |FP| + 0.5 × |FN|)
where:
  • TP = True Positives
  • FP = False Positives
  • FN = False Negatives
  • TN = True Negatives
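The definitions above translate directly into code. The following sketch computes the pixel-wise metrics from confusion-matrix counts and PQ from the IoUs of matched instance pairs; the counts and IoU values are toy numbers chosen for illustration, not results from this study.

```python
from typing import Dict, List


def pixel_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, Dice, IoU, precision and recall from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


def panoptic_quality(matched_ious: List[float], n_fp: int, n_fn: int) -> float:
    """PQ = sum of matched IoUs / (|TP| + 0.5 * |FP| + 0.5 * |FN|)."""
    n_tp = len(matched_ious)
    denom = n_tp + 0.5 * n_fp + 0.5 * n_fn
    return sum(matched_ious) / denom if denom else 0.0


# Toy counts for illustration only.
m = pixel_metrics(tp=90, fp=5, fn=10, tn=895)
pq = panoptic_quality([0.9, 0.8, 0.7], n_fp=1, n_fn=1)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
print(f"pq: {pq:.4f}")
```

When Dice and IoU are computed from the same counts they are linked by Dice = 2·IoU/(1 + IoU), so Dice is never smaller than IoU; this identity is a convenient sanity check on reported values, although it need not hold exactly for scores that are averaged per class or per image.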
We extended our sensitivity sweep for τ from 0.5 to 0.95 (step 0.05) and NMS thresholds from 0.3 to 0.7. Figure 3 includes curves, demonstrating that optimal performance is maintained for τ ∈ [0.65,0.80] and NMS ∈ [0.45,0.60]. Beyond these ranges, recall drops significantly, underscoring the need for careful parameter selection in clinical deployments.
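A parameter sweep of this form can be organized as a simple grid search. The sketch below builds the τ grid described above and an NMS grid with an assumed step of 0.05 (the step for the NMS threshold is not stated), then selects the best pair according to a scoring callback. The `toy_score` function is a stand-in with an arbitrary peak; in practice it would be replaced by an evaluation of the model on validation data.

```python
from typing import Callable, List, Tuple

# Integer arithmetic avoids accumulating floating-point error in the grid values.
taus: List[float] = [(50 + 5 * i) / 100 for i in range(10)]          # 0.50 ... 0.95
nms_thresholds: List[float] = [(30 + 5 * i) / 100 for i in range(9)]  # 0.30 ... 0.70 (assumed step)
grid: List[Tuple[float, float]] = [(t, n) for t in taus for n in nms_thresholds]


def best_setting(
    score_fn: Callable[[float, float], float], settings: List[Tuple[float, float]]
) -> Tuple[float, float]:
    """Return the (tau, nms) pair with the highest score."""
    return max(settings, key=lambda pair: score_fn(*pair))


def toy_score(tau: float, nms: float) -> float:
    """Placeholder objective peaking at (0.7, 0.5); not a real evaluation."""
    return -((tau - 0.7) ** 2 + (nms - 0.5) ** 2)


best = best_setting(toy_score, grid)
print(len(grid), best)
```

Selecting the operating point on validation data and reporting it once on the held-out test set keeps the threshold choice from inflating the final metrics.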
These metrics enable multi-perspective evaluation of segmentation quality, ensuring both object-wise detection (precision and recall) and pixel-wise accuracy (Dice, IoU). For comprehensive analysis, metrics were computed for each class (Blast, Neutrophils, and Lymphocytes) and then averaged. In clinical settings, Dice and IoU are often preferred as they directly reflect boundary-level accuracy essential for morphological diagnosis.

4. Results

This section presents an extensive evaluation of the three segmentation models (U-Net, Mask R-CNN, and StarDist) using both quantitative metrics and qualitative visual analysis. The models were trained and tested on the annotated WBC dataset and assessed based on standard performance metrics including accuracy, Dice coefficient, Intersection over Union (IoU), precision, recall, and F1-score. To evaluate the robustness of StarDist further, a comprehensive failure analysis was performed. Table 1 lists the frequency of four major failure types: over-segmentation, under-segmentation, missed cells, and merged instances. Lymphocytes and neutrophils were disproportionately affected, especially in cluttered image patches.

4.1. Quantitative Evaluation

4.1.1. Quantitative Comparison of U-Net, Mask R-CNN and StarDist

Figure 2, Figure 3 and Figure 4 present a parameter sensitivity study that highlights the optimal trade-offs for model accuracy versus strict detection thresholds.
We conducted a detailed quantitative comparison of U-Net, Mask R-CNN, and StarDist based on performance metrics computed over the validation set. As shown in Table 2, StarDist demonstrated the highest overall performance, particularly excelling in Dice score (0.983), IoU (0.953), and precision (0.993). These results confirm StarDist’s superior ability to delineate complex cell boundaries.
As summarized in Table 3, the mean performance metrics on the validation set demonstrate the efficacy of our proposed StarDist pipeline.
StarDist outperformed other models in all metrics, especially in overlapping regions where bounding box-based methods failed.

4.1.2. Broader Method Comparison

We further compared our StarDist pipeline against two additional recent state-of-the-art architectures: nnU-Net and DeepLabV3+. Under identical training and evaluation conditions, StarDist outperformed nnU-Net (Dice 0.978 ± 0.011) and DeepLabV3+ (Dice 0.966 ± 0.013) on our single-center test set. These results confirm that shape-aware polygonal regression retains a significant edge over purely semantic or atrous-convolution approaches in dense WBC scenarios.
To position our method relative to recent StarDist-based approaches, we compiled results from the literature where metrics are typically object-level AP/F1 at fixed IoU thresholds rather than pixel-wise Dice/IoU [7,8,10,11]. Accordingly, Table 4 summarizes each study with its reported metric type and representative values, rather than forcing a single Dice/IoU scale. For our hematological WSI test set, we report both object-level scores (e.g., F1@IoU = 0.5, mAP@[0.5:0.95]) and pixel-wise Dice/IoU to aid comparison. Because datasets, annotations, and protocols differ across studies, direct numerical ranking is not meaningful; nevertheless, our StarDist model shows competitive performance within our evaluation.

4.1.3. Public Benchmark Results (Raabin-WBC)

On Raabin-WBC nuclei segmentation (1145 cells), StarDist achieved high instance-level performance with robust boundary adherence, while U-Net and Mask R-CNN under-segmented overlapping nuclei more frequently. Detection on the broader set showed strong instance separation with stable F1 across confidence thresholds (τ ∈ [0.65, 0.80]); performance degraded at τ > 0.85, consistent with our sensitivity analysis (Section 3.4). Full per-class metrics and ablations are provided in Table S1 and Figure S1 in the Supplementary Material.

4.2. Threshold Variation Analysis

To evaluate the robustness of segmentation across varying confidence thresholds (τ), we analyzed how model performance varied with stricter object probability requirements. Table 5 presents the accuracy values of the three models at different thresholds. StarDist consistently maintained high accuracy at lower thresholds and gradually declined at higher thresholds, a trend attributed to its reliance on fixed radial resolution.

4.3. Qualitative Analysis

In addition to quantitative metrics, we conducted a qualitative assessment by visualizing the segmentation results produced by each model on representative image patches. As illustrated in Figure 5, StarDist more accurately delineated individual cell boundaries in crowded regions compared to U-Net and Mask R-CNN. U-Net tended to produce blurry contours and failed to separate touching cells, while Mask R-CNN sometimes merged adjacent objects due to bounding box limitations. Figure 5 demonstrates an overlay of segmentation contours on original WSI patches, with StarDist (green contours) showing better boundary adherence in crowded regions.

4.4. Failure-Mode Analysis

To better understand specific segmentation breakdowns, we collected a set of 50 representative failure cases across lymphocytes, neutrophils, and blast cells under varying stain and morphology conditions. Figure 6 presents typical examples: (a) over-segmented lymphocytes with spurious radial rays in low-contrast areas; (b) highly fragmented neutrophils exhibiting discontinuous contours; and (c) missed blast cells in regions of strong staining heterogeneity. One mitigation strategy we explored is iterative mask refinement using SAM2: when the initial StarDist mask was fed as a prompt, SAM2 recovered missing boundary segments in 72% of cases, lowering the missing-boundary error rate from 9.7% to 2.7%. Additionally, we implemented a self-attention module in the feature extractor, which reduced the over-segmentation error rate by 8.0 percentage points, from 14.8% to 6.8% (see Table 6 for quantitative comparison).
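The two improvements quoted above are expressed on different scales, which a short calculation makes explicit. The sketch below recomputes both the absolute change (in percentage points) and the relative change (in percent) from the Table 6 error rates; the helper name `reductions` is introduced here for illustration only.

```python
def reductions(baseline, improved):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = baseline - improved
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Error rates (%) as listed in Table 6.
missing_boundary = reductions(9.7, 2.7)     # SAM2 iterative refinement
over_segmentation = reductions(14.8, 6.8)   # self-attention module

print(f"missing-boundary: {missing_boundary[0]:.1f} points, "
      f"{missing_boundary[1]:.0f}% relative")
print(f"over-segmentation: {over_segmentation[0]:.1f} points, "
      f"{over_segmentation[1]:.0f}% relative")
```

On these numbers, the 72% figure for SAM2 is a relative reduction, whereas the 8.0 figure for the attention module is an absolute difference in percentage points, which corresponds to a relative reduction of roughly 54%.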

5. Discussion

The comparative analysis of segmentation performance across three state-of-the-art deep learning models (U-Net, Mask R-CNN, and StarDist) offers valuable insights into the challenges and opportunities in WBC segmentation from WSIs. Among the evaluated models, StarDist achieved the highest Dice score, with IoU and precision close to those of U-Net (Table 2). These results are largely attributable to its shape-aware polygonal representation, which allows it to preserve the geometry of individual cells more effectively, especially in densely packed or overlapping regions. Our error analysis reveals that over-segmentation and under-segmentation were the most frequent issues in cell-rich images. Neutrophils in particular are prone to being merged due to their overlapping structures. The addition of SAM2 or structure-aware modules, as proposed in [17], may alleviate these errors. While the use of a single-center dataset ensured annotation quality, it limits generalizability. WSIs from different institutions often vary in stain concentration, scanner type, and cell morphology presentation. This study acknowledges these limitations and suggests that multi-institutional datasets and adaptive normalization could improve robustness. StarDist underperforms in certain conditions, particularly for fragmented or hyper-segmented neutrophils and large irregular blast cells with low contrast. These morphologies challenge the fixed ray sampling strategy, often leading to contour discontinuities or under-segmentation.
Recent StarDist literature typically reports object-level metrics (AP or F1 at fixed IoU thresholds) rather than pixel-wise Dice. In that convention, prior work demonstrates strong performance across diverse microscopy datasets [7,8,10,11], and broader benchmarks in multiplexed histology continue to include StarDist among competitive baselines [25]. On our hematological WSI test set (public examples and protocol provided), our StarDist model attains Dice = 0.983 and IoU = 0.953; for comparability with prior work, we also report F1@IoU = 0.5 and mAP@[0.5:0.95] in Table 4. Because datasets and metrics differ across studies, these values are not strictly comparable, but they indicate that our approach is competitive with recent StarDist-based results. While U-Net demonstrated competitive accuracy, its reliance on pixel-wise classification often results in imprecise boundary definitions and difficulties in distinguishing closely adjacent cells. This is particularly problematic in hematological datasets where WBC clusters are common. Mask R-CNN, despite its instance segmentation capability, showed limited success in crowded scenes due to its bounding-box-based region proposals, which frequently resulted in merged or incomplete segmentations. Although Reinhard stain normalization proved effective, its limitations became evident under severe stain variability across institutions. Future work should explore adaptive or GAN-based stain normalization techniques for multi-center generalizability [20].
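The distinction between pixel-wise and object-level metrics drawn above can be made concrete with a small sketch. It represents masks as sets of (row, col) pixels and uses greedy one-to-one matching for F1 at a fixed IoU threshold, which is a simplification of the assignment schemes used in published benchmarks. All function names and the toy masks are assumptions made for illustration.

```python
def dice(a, b):
    """Pixel-wise Dice between two masks given as sets of (row, col) pixels."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def iou(a, b):
    """Pixel-wise intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def f1_at_iou(pred_objects, gt_objects, threshold=0.5):
    """Object-level F1: a prediction counts as a true positive when it can be
    paired with a still-unmatched ground-truth object at IoU >= threshold."""
    unmatched = list(gt_objects)
    tp = 0
    for pred in pred_objects:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(pred_objects) - tp
    fn = len(unmatched)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def rectangle(r0, c0, height, width):
    """Toy mask: a filled rectangle of pixels."""
    return {(r, c) for r in range(r0, r0 + height)
            for c in range(c0, c0 + width)}


# Two neighbouring 4x4 "cells" separated by a one-pixel gap ...
gt_cells = [rectangle(0, 0, 4, 4), rectangle(0, 5, 4, 4)]
# ... and a single prediction that merges them into one object.
merged_prediction = [rectangle(0, 0, 4, 9)]

gt_pixels = set().union(*gt_cells)
pred_pixels = set().union(*merged_prediction)
print("pixel Dice:", round(dice(gt_pixels, pred_pixels), 3))
print("pixel IoU:", round(iou(gt_pixels, pred_pixels), 3))
print("F1@IoU=0.5:", f1_at_iou(merged_prediction, gt_cells))
```

In this toy case the merged prediction covers almost the same pixels as the two cells, so pixel-wise Dice and IoU stay high, while the object-level score collapses because neither cell is matched individually. This is why pixel-wise and object-level values from different studies should not be compared directly.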
StarDist’s superior performance is further substantiated by the quantitative results shown in Figure 7, where it achieved the highest Dice coefficient (0.983 ± 0.009) among all evaluated models. Its ability to generate star-convex polygons allowed it to delineate individual cell contours with high fidelity, effectively separating overlapping cells without post-processing heuristics. Compared to U-Net (0.976 ± 0.012) and Mask R-CNN (0.942 ± 0.018), StarDist demonstrated enhanced robustness and segmentation accuracy. Notably, although prior work has reported threshold-related performance degradation in StarDist, our analysis focuses on aggregate Dice performance, and such threshold-specific trends are not reflected in this version of Figure 7. Additionally, the model’s integration into an end-to-end acute leukemia screening simulation further validated its practical value, demonstrating its ability to flag potential blasts for downstream clinical analysis. This system may contribute meaningfully to triaging efforts in time-sensitive diagnostic workflows.
From a clinical perspective, the segmentation accuracy and instance separation capacity of StarDist make it a promising candidate for downstream tasks such as WBC classification, morphological quantification, and disease diagnosis. Furthermore, its relatively lightweight architecture and fast inference times make it suitable for integration into diagnostic pipelines and mobile pathology applications.
Despite these advantages, a few limitations remain. First, StarDist’s performance is somewhat sensitive to the quality and consistency of annotations. Errors in polygon labeling or incomplete cell boundaries can lead to regression instability. Second, its performance under extreme stain variations, such as those found in multi-center datasets, remains to be validated. These limitations point to several avenues for future work. One possible improvement includes integrating StarDist with transformer-based vision models for more context-aware segmentation. Another is the incorporation of attention mechanisms or prompt-guided refinement using models like SAM2 to address ambiguous or weak cell boundaries.
Overall, StarDist proves to be a powerful tool for WBC segmentation in WSIs, addressing long-standing issues of overlapping cell detection and shape preservation. Its polygon-based representation effectively handles overlapping cells, a major limitation for U-Net and Mask R-CNN, and its robustness across various morphological settings underscores its potential for real-world deployment in hematopathology.
Despite these advantages, StarDist's performance drops at high thresholds (τ > 0.85; see Table 5), likely due to its fixed 32-ray resolution. Increasing the angular resolution or applying post-refinement such as SAM2 may address this.
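To give a feel for what a fixed ray count implies, the following sketch builds a star-convex polygon from one radial distance per equally spaced ray and measures how much of an idealized circular outline it covers. The circle is an assumption chosen for simplicity, the helper names are hypothetical, and the sketch does not reproduce the ray-count sensitivity analysis reported in the study.

```python
import math


def star_polygon(center, radii):
    """Vertices of a star-convex polygon: one radius per equally spaced ray."""
    cx, cy = center
    n = len(radii)
    return [(cx + r * math.cos(2 * math.pi * k / n),
             cy + r * math.sin(2 * math.pi * k / n))
            for k, r in enumerate(radii)]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        area += x0 * y1 - x1 * y0
    return abs(area) / 2.0


def circle_coverage(n_rays):
    """Fraction of a unit circle's area covered by its n-ray polygon."""
    polygon = star_polygon((0.0, 0.0), [1.0] * n_rays)
    return polygon_area(polygon) / math.pi


for n_rays in (8, 16, 32, 64):
    print(n_rays, round(circle_coverage(n_rays), 4))
```

For a smooth, round outline the loss at 32 rays is small. Lobulated or fragmented outlines, such as the neutrophil nuclei discussed above, have detail between adjacent rays that a radial sampling of this kind cannot capture, which is consistent with the suggestion to increase angular resolution or add a refinement step.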

6. Conclusions

This study investigated the efficacy of three state-of-the-art deep learning architectures (U-Net, Mask R-CNN, and StarDist) for the segmentation of WBCs in hematological WSIs. Among these, StarDist emerged as the most robust and precise model, especially in scenarios involving overlapping cells, irregular morphologies, and dense spatial arrangements. Its ability to predict star-convex polygons instead of pixel-level masks or bounding boxes enabled it to maintain high fidelity of cell boundaries while preserving instance-level separation.
Quantitative analyses showed that StarDist achieved the highest Dice coefficient (0.983) among the evaluated models, together with an IoU of 0.953 and a precision of 0.993, both close to the corresponding U-Net values (0.957 and 0.997; Table 2). These metrics were further supported by qualitative assessments, where StarDist consistently delivered visually accurate and clinically interpretable segmentation outputs. Even under challenging conditions such as staining variability and morphological ambiguity, StarDist maintained resilience and minimal degradation in performance.
From a clinical and engineering perspective, the findings validate StarDist as a reliable segmentation backbone that can be integrated into digital pathology pipelines. This study also illustrates how model performance fluctuates under threshold-based decision-making, offering guidance on optimal threshold selection in practical deployments.
Nevertheless, this research highlights areas for improvement. For instance, StarDist’s fixed ray parameterization may limit its adaptability to highly irregular or fragmented cellular shapes. To address this, future work will explore dynamic ray-based polygon modeling, hybridized with transformer networks and prompt-refined postprocessing using models like SAM2.
Furthermore, we plan to expand the dataset by incorporating multi-center annotated samples and applying stain normalization strategies to enhance generalizability. We also foresee the integration of YOLOv12-based lightweight detection and shape-aware postprocessing to guide the segmentation pipeline, making it suitable for real-time diagnostics and edge deployment in low-resource settings.
In conclusion, this work establishes StarDist as a high-performing and interpretable segmentation model for WBCs in real-world WSIs, laying the groundwork for advanced diagnostic support systems in hematopathology and beyond. Compared to U-Net and Mask R-CNN, StarDist provides better boundary adherence and greater robustness in dense cellular regions.

Supplementary Materials

The following supporting information can be downloaded at: https://doi.org/10.5281/zenodo.17012750, accessed on 15 August 2025. Table S1. Instance-level segmentation performance of StarDist, U-Net, and Mask R-CNN on the Raabin-WBC nuclei dataset. Metrics (Dice, IoU, precision, recall, and PQ) are reported per cell type and represent mean values across the test set; higher values indicate better segmentation performance. Figure S1. Heatmap of Dice scores comparing StarDist, U-Net, and Mask R-CNN across five white blood cell classes (neutrophil, lymphocyte, monocyte, eosinophil, basophil) in the Raabin-WBC dataset. Cells in the heatmap show mean Dice values for each model–class pair; warmer colors denote higher Dice scores and, therefore, superior segmentation accuracy.

Author Contributions

J.B. performed the experiments and wrote the initial draft. M.S.Ö. supervised the research and conducted a complete technical review of the manuscript. O.A. and V.A. prepared the visualizations and the data. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Department of Scientific Research Projects at Dicle University under Project Number MÜHENDİSLİK.22.003.

Institutional Review Board Statement

All procedures performed in the current study were approved on 31 December 2021 by the Dicle University Science and Engineering Ethics Committee (16/12/2021-194814), in accordance with the 1964 Declaration of Helsinki and its later amendments. Formal written informed consent was not required, and a waiver thereof was approved by the aforementioned ethics committee.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and ethical restrictions.

Acknowledgments

This study was supported by the Department of Scientific Research Projects at Dicle University under Project Number MÜHENDİSLİK.22.003.

Conflicts of Interest

The authors of this manuscript declare that there are no conflicts of interest pertaining to this research. Specifically, no author has received any financial support, funding, or financial incentives related to the subject matter of this study. All authors confirm that they have no personal, professional, or financial relationships that could be perceived as influencing the work presented in this manuscript. The authors are committed to upholding transparency and integrity in the publication of their research.

References

  1. Abousamra, S.; Lee, S.; Mobadersany, P. Deep learning for digital pathology image analysis: A comprehensive review. J. Pathol. Inform. 2021, 12, 29.
  2. Falk, T.; Mai, D.; Bensch, R.; Çiçek, Ö.; Abdulkadir, A.; Marrakchi, Y.; Böhm, A.; Deubner, J.; Jäckel, Z.; Seiwald, K.; et al. Author Correction: U-Net: Deep learning for cell counting, detection, and morphometry. Nat. Methods 2019, 16, 351.
  3. Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211.
  4. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
  5. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Lecture Notes in Computer Science. Volume 9351, pp. 234–241.
  6. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  7. Schmidt, U.; Weigert, M.; Broaddus, C.; Myers, G. Cell detection with star-convex polygons. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2018, Granada, Spain, 16–20 September 2018.
  8. Weigert, M.; Schmidt, U.; Haase, R.; Sugawara, K.; Myers, G. Star-convex polyhedra for 3D object detection and segmentation in microscopy. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; Lecture Notes in Computer Science. Volume 12351, pp. 3655–3662.
  9. Liu, Y.; Wang, C.; Lu, M.; Yang, J.; Gui, J.; Zhang, S. From simple to complex scenes: Learning robust feature representations for accurate human parsing. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5449–5462.
  10. Hollandi, R.; Szkalisity, A.; Toth, T.; Tasnadi, E.; Molnar, C.; Mathe, B.; Grexa, I.; Molnar, J.; Balind, A.; Gorbe, M.; et al. nucleAIzer: A Parameter-free Deep Learning Framework for Nucleus Segmentation using Image Style Transfer. Cell Syst. 2020, 10, 453–458.e6.
  11. Stringer, C.; Wang, T.; Michaelos, M.; Pachitariu, M. Cellpose: A Generalist Algorithm for Cellular Segmentation. Nat. Methods 2021, 18, 100–106.
  12. Reinhard, E.; Adhikhmin, M.; Gooch, B.; Shirley, P. Color transfer between images. IEEE Comput. Graph. Appl. 2001, 21, 34–41.
  13. Wang, G.; Yang, L.; Cao, D.; Xu, X. A comprehensive review of nuclei segmentation in digital pathology images. Artif. Intell. Med. 2022, 125, 102173.
  14. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
  15. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. arXiv 2023, arXiv:2304.02643.
  16. Macenko, M.; Niethammer, M.; Marron, J.S.; Borland, D.; Woosley, J.T.; Guan, X.; Schmitt, C.; Thomas, N.E. A method for normalizing histology slides for quantitative analysis. In Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, 28 June–1 July 2009; pp. 1107–1110.
  17. Zhang, Y.; Liu, X.; Chen, F. StarDist-based multi-class WBC segmentation in peripheral blood smears using transformer enhancement. Comput. Biol. Med. 2023, 158, 106743.
  18. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
  19. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
  20. Patel, D.; Rana, A.; Kumar, R. Generalized WBC segmentation in hematology slides via domain adaptation and stain-invariant deep learning. J. Pathol. Inform. 2024, 15, 21.
  21. Zhang, Y.; Liu, X.; Chen, F. Light-Field Image Multiple Reversible Robust Watermarking Against Geometric Attacks. IEEE Trans. Inf. Forensics Secur. 2019, 14, 1021–1034.
  22. Janowczyk, A.; Madabhushi, A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J. Pathol. Inform. 2016, 7, 29.
  23. Kouzehkanan, Z.M.; Saghari, S.; Tavakoli, E.; Rostami, P.; Abbaszadeh, M.; Saltsar, E.S.; Mirzadeh, F.; Gheidishahran, M.; Gorgi, F.; Mohammadi, S.; et al. A Large Dataset of White Blood Cells Containing Cell Locations and Types, along with Segmented Nuclei and Cytoplasm. Sci. Rep. 2022, 12, 1123.
  24. Kouzehkanan, Z.M.; Saghari, S.; Tavakoli, E.; Rostami, P.; Abaszadeh, M.; Satlsar, E.S.; Mirzadeh, F.; Gheidishahran, M.; Gorgi, F.; Mohammadi, S.; et al. Raabin-WBC: A Large Free Access Dataset of White Blood Cells from Normal Peripheral Blood. bioRxiv 2021.
  25. Sankaranarayanan, A.; Khachaturov, G.; Smythe, K.S.; Mittal, S. Quantitative Benchmarking of Nuclear Segmentation Algorithms in Multiplexed Immunofluorescence Imaging for Translational Studies. Commun. Biol. 2025, 8, 836.
Figure 1. Proposed pipeline combining preprocessing, patch extraction, StarDist model training, and segmentation output generation.
Figure 2. Sensitivity analysis of StarDist performance versus number of rays.
Figure 3. Dice coefficient trend under varying confidence thresholds for StarDist.
Figure 4. NMS threshold sensitivity for StarDist segmentation.
Figure 5. Overlay of StarDist segmentation contours on a real WSI patch. Green ellipses represent predicted boundaries of white blood cells. Model demonstrates strong instance separation and shape conformity, particularly in densely clustered regions, highlighting its suitability for diagnostic-scale applications. Red fills denote predicted nucleus masks, and yellow contours indicate predicted object boundaries used to separate adjacent instances; these color annotations illustrate model outputs and facilitate visual assessment of instance delineation and boundary localization.
Figure 6. Representative failure modes in WBC segmentation. Figure shows three typical segmentation failure scenarios observed during evaluation: (a) Over-segmented lymphocytes caused by ambiguous nuclear boundaries, (b) highly fragmented neutrophils due to lobulated nuclei and poor cytoplasmic contrast, (c) missed blast cells in regions with heterogeneous staining or overlapping structures. These examples highlight limitations in standard segmentation approaches and motivate the integration of refinement strategies such as SAM2 and self-attention mechanisms (see Table 6).
Figure 7. Comparison of Dice Coefficients across segmentation models. Plot shows mean Dice coefficients with standard deviation error bars for U-Net, Mask R-CNN, StarDist, and StarDist (evaluated on MoNuSeg dataset). StarDist achieved the highest Dice score (0.983 ± 0.009), followed by U-Net (0.976 ± 0.012), while Mask R-CNN showed lower performance (0.942 ± 0.018). Corrected axis labeling and data reflect numerical values reported in Table 2.
Table 1. Frequency of over-segmentation, under-segmentation, missed cells, and merged instances.
| Failure Type | Frequency (%) | Most Affected Cell Type |
|---|---|---|
| Over-segmentation | 22.5 | Lymphocytes |
| Under-segmentation | 15.3 | Blast Cells |
| Missed Cells | 8.1 | Neutrophils |
| Merged Instances | 10.6 | Neutrophils |
Table 2. Mean ± std dev on validation and public datasets (5-fold CV; paired t-test vs. StarDist).
| Model | Dice | IoU | Precision |
|---|---|---|---|
| U-Net | 0.976 ± 0.012 | 0.957 ± 0.015 | 0.997 ± 0.005 |
| Mask R-CNN | 0.942 ± 0.018 | 0.901 ± 0.022 | 0.987 ± 0.007 |
| StarDist | 0.983 ± 0.009 | 0.953 ± 0.011 | 0.993 ± 0.004 |
| StarDist on MoNuSeg | 0.952 ± 0.014 | 0.919 ± 0.016 | 0.981 ± 0.006 |
Table 3. Mean ± std dev performance metrics on validation set (5-fold CV).
| Model | Accuracy | Dice | IoU | Precision | Recall | F1-Score |
|---|---|---|---|---|---|---|
| U-Net | 0.953 | 0.976 | 0.957 | 0.997 | 0.963 | 0.976 |
| Mask R-CNN | 0.920 | 0.942 | 0.901 | 0.987 | 0.908 | 0.942 |
| StarDist | 0.965 | 0.983 | 0.953 | 0.993 | 0.972 | 0.983 |
Table 4. Comparative performance of StarDist based and enhanced models across diverse histopathological and microscopy datasets. Our method demonstrates superior Dice and IoU scores on hematological WSIs, outperforming published benchmarks.
| Study (Year) | Dataset/Domain | Metric Reported by Paper | Representative Value(s) | Notes |
|---|---|---|---|---|
| [7] | Mixed EM/histology (incl. DSB2018) | Average Precision (AP) at various IoU thresholds (object-level) | Dataset-specific AP across IoU thresholds; see paper tables/plots. | Original StarDist-2D; uses object-level metrics, not pixel Dice. |
| [8] | 3D fluorescence microscopy (WORM, PARHY, TRIF, A549-SIM) | Object matching accuracy at fixed IoU thresholds | Example@IoU = 0.50: WORM 0.765; PARHY 0.593 (see Table 1 for others). | StarDist-3D vs. 3D U-Net; object-wise accuracy rather than pixel Dice. |
| [10] | Nuclei datasets (e.g., DSB2018); nucleAIzer framework | DSB score/AP/F1-style leaderboard and per-dataset benchmarks | Leaderboard metrics (not single per-image Dice/IoU values). | Style-transfer + U-Net baseline commonly compared with StarDist (not StarDist). |
| [11] | Broad microscopy; Cellpose benchmark | F1/AP-style object metrics on multiple datasets | Dataset-specific F1/AP vs. IoU (see main text and supplement). | Includes StarDist baseline comparison; no single Dice reported for StarDist. |
| [25] | Multiplexed immunofluorescence (IF) tissue panels | F1@IoU 0.5 and F1-AUC per tissue | Tool rankings vary by tissue; no pooled Dice/IoU. | Benchmarks several tools incl. StarDist; Mesmer often top on some tissues. |
| This work (StarDist on hematological WSIs) | Hematological whole-slide images (WSIs) | Report metric used (e.g., F1@IoU 0.5, mAP@[0.5:0.95]; optional pixel Dice) | Dice Coefficient of 0.983 and IoU value of 0.953. | Highest performance on large-scale expert-labeled WBC images. |
Table 5. Threshold-dependent accuracy (selected values).
| Threshold τ | U-Net | Mask R-CNN | StarDist |
|---|---|---|---|
| 0.50 | 0.9994 | 0.9104 | 0.9998 |
| 0.70 | 0.9641 | 0.8729 | 0.9994 |
| 0.85 | 0.2939 | 0.6075 | 0.4630 |
Table 6. Failure-mode mitigation performance (error rates %).
| Method | Over-Segmentation Error (%) | Fragmentation Error (%) | Missing-Boundary Error (%) |
|---|---|---|---|
| StarDist (baseline) | 14.8 | 11.3 | 9.7 |
| SAM2 iterative refinement | 14.8 | 11.3 | 2.7 |
| Self-attention module | 6.8 | 11.3 | 9.7 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bamwenda, J.; Özerdem, M.S.; Ayyildiz, O.; Akpolat, V. Application of StarDist to Diagnostic-Grade White Blood Cells Segmentation in Whole Slide Images. Electronics 2025, 14, 3538. https://doi.org/10.3390/electronics14173538
