1. Introduction
Cell segmentation is an important step for imaging studies and is widely used in the life sciences, bioinformatics, and biomedical fields such as oncology, immunology, and histopathology, including the emerging field of spatial transcriptomics. Scientists apply cell segmentation and prior knowledge of cell-type-specific gene expression to analyze the morphology and location of individual cells, obtain single-cell gene counts, and detect fine intracellular variations [1, 2, 3].
Differences in biological tissue, intercellular heterogeneity, and high cell densities need to be resolved before analyzing cell imaging data. Further, differences in illumination gradients, imaging modalities, and imaging parameters have to be accounted for when considering segmentation solutions [4, 5]. In addition, microscopic techniques for cell imaging have improved, resulting in higher resolution, broader visualization, and noninvasive cell imaging. Common microscopic techniques include bright-field microscopy, fluorescence microscopy, confocal microscopy, and phase-contrast microscopy [6, 7, 8, 9]. More recent imaging methods, such as cell staining combined with immunohistochemistry (IHC), multiplex immunofluorescence (mIF) [10], and CO-Detection by IndEXing (CODEX) [11], can co-detect and co-localize multiple transcriptomes and proteins, resulting in the precise annotation of individual cell types and the resolution of biological functions. In general, cell images obtained using such techniques have different channel types. The main image channel types include three-channel RGB, three-channel HSV, single-channel grayscale, and specific fluorescently labeled channels, which are more commonly used by researchers.
Most traditional segmentation methods are based on the intensity and spatial relationship of pixels, and the constraint model is found by manual optimization, which requires expertise in basic techniques, including code adaptation [12]. Code adaptation is highly subjective, and its development has reached a bottleneck. For example, thresholding considers only the grayscale information of the image and is sensitive to noise, which can easily cause uneven segmentation results; region-based algorithms have low adaptability and perform poorly when used alone; the watershed algorithm is susceptible to over-segmentation due to noise; and graph-theory algorithms are complex, computationally intensive, and not easy to operate. Therefore, a sequential combination of these algorithms can effectively avoid these deficiencies and further separate touching or overlapping cells [13]. In addition, thresholding and the watershed algorithm are often used as preprocessing or postprocessing steps for Machine Learning (ML) and Deep Learning (DL) methods.
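The idea of chaining traditional steps can be sketched as follows. This is only a minimal illustration under stated assumptions, not the pipeline of any tool discussed here: it assumes NumPy and SciPy are available and that cells appear bright on a dark background, and it substitutes a simple nearest-seed assignment for a full watershed transform. The function names (`otsu_threshold`, `split_touching_cells`), the `min_distance` parameter, and the synthetic two-disk image are hypothetical.

```python
import numpy as np
from scipy import ndimage as ndi


def otsu_threshold(image, nbins=256):
    """Otsu's method: pick the gray level that maximizes between-class variance."""
    hist, edges = np.histogram(image, bins=nbins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    weight_lo = np.cumsum(hist)                # pixel count at or below each bin
    weight_hi = np.cumsum(hist[::-1])[::-1]    # pixel count at or above each bin
    mean_lo = np.cumsum(hist * centers) / np.maximum(weight_lo, 1)
    mean_hi = (np.cumsum((hist * centers)[::-1])
               / np.maximum(weight_hi[::-1], 1))[::-1]
    between = weight_lo[:-1] * weight_hi[1:] * (mean_lo[:-1] - mean_hi[1:]) ** 2
    return centers[np.argmax(between)]


def split_touching_cells(mask, min_distance=5):
    """Split merged blobs in a boolean mask using distance-transform seeds.

    Assumption: each cell is roughly convex, so its center is a local
    maximum of the distance map. Foreground pixels are assigned to the
    nearest seed, a simplified stand-in for marker-based watershed.
    """
    distance = ndi.distance_transform_edt(mask)
    window = 2 * min_distance + 1
    is_peak = (distance == ndi.maximum_filter(distance, size=window)) & mask
    seeds, n_seeds = ndi.label(is_peak)
    if n_seeds == 0:
        return seeds
    # For every pixel, look up the coordinates of the nearest seed pixel.
    nearest = ndi.distance_transform_edt(
        seeds == 0, return_distances=False, return_indices=True
    )
    labels = seeds[tuple(nearest)]
    labels[~mask] = 0
    return labels


# Synthetic example: two overlapping disks (radius 12, centers 20 px apart).
yy, xx = np.mgrid[:64, :64]
disk_a = (yy - 32) ** 2 + (xx - 22) ** 2 <= 12 ** 2
disk_b = (yy - 32) ** 2 + (xx - 42) ** 2 <= 12 ** 2
image = np.where(disk_a | disk_b, 0.9, 0.1)

mask = image > otsu_threshold(image)
_, n_merged = ndi.label(mask)         # thresholding alone merges the two disks
labels = split_touching_cells(mask)   # the seed-based step separates them
n_split = int(labels.max())
print(n_merged, n_split)
```

On this synthetic image, thresholding followed by connected-component labeling sees a single object, whereas the added seed-based step recovers two. On real, noisy images, spurious seeds lead to the over-segmentation described above, which is why smoothing or seed filtering is usually added.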
ML and DL have similar workflows: selection of the training data, data processing, model training, and model evaluation. These steps may be iterated until the model is appropriate and accurate. HK-means [14], Random Forests [15], and EM [16] are trainable ML methods that incorporate some prior knowledge into the segmentation process and improve segmentation accuracy. DL algorithms are better suited to addressing the challenges of cell segmentation, including multiple object morphologies and imaging techniques. DL network structures provide a generalized framework that can be applied to various tasks in different domains. They learn from data, adapt to different problem settings, and leverage the capabilities of pre-trained models, making them a convenient and effective strategy in many research and application fields. Further, manual tuning is not needed; however, retraining with annotated data is required [12].
The core algorithms in 2D segmentation tools are gradually shifting to more-complex deep learning networks. Initially, the earlier 2D segmentation tools CellProfiler [17] and Icy [14] used built-in traditional segmentation algorithms, such as watershed algorithms. Later, the classic U-net [18] deep learning architecture was widely used and improved. StarDist added a polygon distance output layer [19, 20]. Cellpose replaced the standard building blocks with residual blocks [21, 22]. Notably, many 2D segmentation tools keep adopting the best segmentation models and updating their software. For instance, CellProfiler and Icy update their deep learning model plugins (ClassifyPixels-U-net, DoGNet, etc.) for cell segmentation.
For this study, we used three publicly accessible datasets with annotations from several cell-imaging modalities to compare the generality of the tools.
4. Results
The default pre-trained model of each software tool was selected for the segmentation comparison. As shown in Table 4, Table 5, Table 6 and Table 7, we compared the segmentation results quantitatively. In these tables, N_true is the number of cells in the Ground Truth (GT), and N_pred is the number of cells predicted by each software tool. Accuracy, Recall, Precision, and F1 are the segmentation metrics reported. The F1-based Rank orders the software tools by their F1 scores.
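How these quantities relate can be sketched as follows. The exact metric definitions and the criterion for matching a predicted cell to a GT cell are not given in this section, so this sketch assumes the common object-level definitions, in which matched cells are true positives, unmatched predictions are false positives, and unmatched GT cells are false negatives. The function names and the example counts are hypothetical; the three F1 values are the post-preprocessing scores reported below.

```python
def segmentation_scores(n_true, n_pred, n_matched):
    """Object-level scores from cell counts (assumed definitions).

    n_true    -- number of cells in the ground truth (N_true)
    n_pred    -- number of cells predicted by the tool (N_pred)
    n_matched -- predictions matched one-to-one to a GT cell (true positives)
    """
    if not 0 <= n_matched <= min(n_true, n_pred):
        raise ValueError("n_matched must lie between 0 and min(n_true, n_pred)")
    tp = n_matched
    fp = n_pred - tp   # predicted cells with no GT counterpart
    fn = n_true - tp   # GT cells that were missed
    return {
        "precision": tp / n_pred if n_pred else 0.0,
        "recall": tp / n_true if n_true else 0.0,
        # Segmentation has no true negatives, so accuracy omits them.
        "accuracy": tp / (tp + fp + fn) if (tp + fp + fn) else 0.0,
        "f1": 2 * tp / (n_true + n_pred) if (n_true + n_pred) else 0.0,
    }


def rank_by_f1(f1_scores):
    """Return {tool: rank}, where rank 1 is the highest F1 score."""
    ordered = sorted(f1_scores, key=f1_scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical counts: 100 GT cells, 90 predicted cells, 80 matched.
scores = segmentation_scores(n_true=100, n_pred=90, n_matched=80)

# F1 scores reported in the text for three of the tools after preprocessing.
reported_f1 = {"StarDist": 0.5912, "CellProfiler": 0.6394, "Cellpose": 0.5763}
ranks = rank_by_f1(reported_f1)
print(scores, ranks)
```

With these definitions, F1 is the harmonic mean of Precision and Recall, so a tool that predicts too many cells (low Precision) and one that misses cells (low Recall) are penalized symmetrically.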
On the DSB2018 dataset, without preprocessing, the top three F1 scores were achieved by StarDist, Cellpose, and Omnipose. After preprocessing, the F1 scores of the five software tools other than StarDist and Plantseg improved, verifying the feasibility of the preprocessing operations. CellProfiler, which leverages the strengths of traditional algorithms, ranked first with an F1 score of 0.6394. StarDist was second with an F1 score of 0.5912, and Cellpose was third with an F1 score of 0.5763. The performance of the other software tools was comparatively low. DeepCell expanded the cell boundaries in its segmentation results and was therefore more susceptible to noise, which made cells separated by obvious gaps appear adherent and dense, resulting in over-segmentation. Icy was the most affected by noise: its segmentation results contained holes and showed under-segmentation (Figure 2).
On the Cellpose_cyto dataset, without preprocessing, the top three F1 scores were achieved by Cellpose, Omnipose, and StarDist. The same preprocessing operation did not give good results on this dataset, the input of which was three-channel images; this may also be a result of the diversity of its image types. After preprocessing, Cellpose demonstrated the best performance with an F1 score of 0.6929. DeepCell ranked second with an F1 score of 0.3847, followed by StarDist in third place with an F1 score of 0.3072. The performance of the other software tools was notably lower, with Precision, Recall, Accuracy, and F1 scores all falling behind the top three performers. The poorer results of Omnipose compared to Cellpose, which was highly affected by preprocessing, may be related to its suitability for regular (round or oval) small cells and bacterial cells. Plantseg had the worst results for both datasets, which may be because it is plant-cell-segmentation software that is better adapted to the segmentation of tightly arranged cells. For neuron cells, in addition to Omnipose, Plantseg and StarDist were among the software tools with poor segmentation results; these tools are more sensitive to differences in cell shape and better adapted to conventional round or oval cells. However, Icy and CellProfiler are more adaptable to such cells, with finer segmentation (Figure 3).
In addition, we selected the two software tools with the best F1 scores (Cellpose and StarDist) to compare the training adaptability of their models, using the default parameters and methods on the two preprocessed datasets. After model training, the segmentation metrics of the two software tools increased significantly, as shown in Table 6. On the DSB2018 dataset, the F1 score of Cellpose rose from 0.5763 to 0.7026, surpassing the F1 score of StarDist (0.5912). On the Cellpose_cyto dataset, the F1 score of StarDist rose from 0.3072 to 0.6739, lower than the F1 score of Cellpose (0.6929).
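To put the two improvements on a common scale, the absolute and relative F1 gains can be computed directly from the four scores quoted above. This is a small illustrative calculation; the helper name `f1_gain` and the rounding choices are assumptions made here, not part of the study.

```python
# (F1 before training, F1 after training), as quoted in the text.
f1_before_after = {
    ("Cellpose", "DSB2018"): (0.5763, 0.7026),
    ("StarDist", "Cellpose_cyto"): (0.3072, 0.6739),
}


def f1_gain(before, after):
    """Return (absolute gain, relative gain in percent) of an F1 score."""
    absolute = after - before
    relative_percent = 100 * absolute / before
    return round(absolute, 4), round(relative_percent, 1)


gains = {key: f1_gain(*pair) for key, pair in f1_before_after.items()}
for (tool, dataset), (absolute, percent) in gains.items():
    print(f"{tool} on {dataset}: +{absolute:.4f} F1 ({percent:.1f}% relative)")
```

Under this calculation, StarDist gains about 0.37 F1 on Cellpose_cyto and Cellpose about 0.13 on DSB2018, so retraining helps most where the pre-trained model was the poorest fit.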
We tested an additional 2D dataset (PhC-C2DL-PSC) for cell segmentation with and without preprocessing operations. This dataset consists of phase-contrast images of pancreatic stem cells on a polystyrene substrate. We chose 300 images from its training dataset (folder 01) and the corresponding masks in folder 01_ERR_SEG. Since the images in this dataset have a uniform size, DeepCell and CellProfiler could also perform segmentation without preprocessing. Omnipose and Plantseg were excluded because they produced no output on this dataset. The results are shown in Table 7 and Figure 4. Without preprocessing, the top three F1 scores were achieved by StarDist, Cellpose, and Icy. The same preprocessing operation did not give good results on this dataset, except for StarDist.
Finally, we quantitatively compared the computing resources that each selected segmentation software tool used to obtain the segmentation results for the preprocessed DSB2018 and Cellpose_cyto datasets, as shown in Table 8. In a GPU hardware environment, Cellpose ran fastest on the DSB2018 dataset and was second only to Omnipose on the Cellpose_cyto dataset. Apart from Plantseg, StarDist had the lowest maximum memory usage on both datasets. Plantseg took up the least memory, but ran the slowest on both datasets.
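A comparison of this kind can be approximated with a small timing and memory wrapper. The measurement procedure behind Table 8 is not described in this section, so the following is only a sketch using the Python standard library. The names `profile_call` and `fake_segment` are hypothetical, and `tracemalloc` tracks only memory allocated through Python, not GPU memory or memory allocated by native libraries that bypass Python's allocator.

```python
import time
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds, peak traced memory in MiB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak_bytes / 2 ** 20


def fake_segment(n_pixels):
    """Hypothetical stand-in for a segmentation call: allocates a label list."""
    return [0] * n_pixels


labels, seconds, peak_mib = profile_call(fake_segment, 512 * 512)
print(f"elapsed: {seconds:.4f} s, peak traced memory: {peak_mib:.2f} MiB")
```

For GPU-backed tools, the peak device memory would need to be read from the deep learning framework or the driver instead, and the first call is usually excluded from timing because it includes model loading.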
5. Discussion
This study examined the performance of different cell-image-segmentation software tools on the DSB2018, Cellpose_cyto, and PhC-C2DL-PSC datasets. The DSB2018 dataset is generic, and the overall shape of its cells is relatively uniform, with differences in gap densities and cell sizes. The Cellpose_cyto dataset comprises cells with various characteristics. The PhC-C2DL-PSC dataset is consistent in its cellular features, as it is a 2D time-lapse sequence of cell images. The final results showed performance differences across the software tools, with no single tool performing better than the others across all the measures and datasets evaluated. CellProfiler and StarDist performed well on the DSB2018 dataset, Cellpose performed well on the Cellpose_cyto dataset, and Cellpose and StarDist performed well on the PhC-C2DL-PSC dataset. After being trained on the respective datasets, Cellpose and StarDist showed similar performance on the two datasets. The segmentation result of Cellpose was slightly better than that of StarDist, indicating that Cellpose has better adaptability to different types of cells. Cellpose and StarDist use distance-transformed gradients to predict the final result, process faster during segmentation, and consume less memory, which can satisfy researchers’ need for the batch processing of cells. Meanwhile, a more complete workflow can be constructed by using deep learning model extension plug-ins, such as Cellpose and StarDist, in CellProfiler and Icy to achieve further statistical analysis of the segmented cells. Plantseg, Omnipose, and Icy showed limitations in working with specific types of cells. In addition, the adaptability of the different software tools to preprocessing operations varies considerably, and it is not yet possible to choose a uniform preprocessing method for evaluating software performance across different datasets.
Therefore, the specific requirements of the dataset and application scenarios should be considered when selecting the right software.
This study highlights the need for continuous development and improvement of cell-image-segmentation software. As technology continues to advance, further enhancements in algorithmic methods and optimization techniques may improve performance on different datasets. In the future, it is expected that general cell-segmentation software with good segmentation results and advanced functions will be developed. These software tools can be used interactively with other software; for example, the CellProfiler platform can use Cellpose, StarDist, and other DL models for cell segmentation through plug-ins. The segmentation output generated by Icy can be imported into ImageJ for further statistical analysis and processing. Icy can also be controlled through process scripts, which allows the cell structure to be analyzed more accurately and effectively and better serves the related biological research.