1. Introduction
A healthy human cell has 46 chromosomes, which occur in pairs: 22 pairs of autosomes and one pair of sex chromosomes. Chromosomes are rodlike structures formed by the condensation of chromatin during mitosis or meiosis. They carry the material essential to human genetics, and their morphology and structure are closely related to human health. Karyotyping is one of the most important techniques in the field of genetic measurement and diagnosis [1]. Its applications include prenatal screening for chromosomal abnormalities, screening for genetic diseases, etc. Generally, chromosome karyotyping is performed on metaphase chromosome micrographs [2], where the number, morphology, and structure of the chromosomes are analyzed and compared by a doctor or specialist, resulting in a karyotype map that helps doctors quickly diagnose and predict congenital defects, human genetic diseases, cancers, etc. Therefore, karyotype analysis is of great significance in both research and application. Chromosome karyotyping is divided into three main steps. First, the chromosomes are stained and captured with a light microscope. Then, each chromosome is segmented and extracted from the metaphase chromosome microscopic image. Finally, the extracted chromosomes are classified and sorted to form a karyotype image with 24 types of chromosomes [3]. For example, Figure 1a shows an image of the 46 chromosomes under a 100× microscope and Figure 1b shows the karyotype image of these chromosomes in pairs.
In early manual karyotyping, doctors were required to manually extract and classify the chromosomes for analysis, which was not only time-consuming and labor-intensive but also prone to human error due to its tedious process [4]. Fortunately, the emergence of automatic karyotype analysis systems has greatly reduced doctors' karyotyping workload. Such systems can automatically complete operations such as chromosome segmentation and classification and finally generate a karyotype map for medical diagnosis. At present, most existing karyotyping systems employ traditional methods, which mainly rely on manual features to complete chromosome segmentation and classification. For instance, the mainstream traditional chromosome segmentation methods include threshold-based methods [5], watershed-based methods [6], fuzzy-clustering-based methods [7], geometric-feature-based methods [8], etc. These methods depend heavily on manual features and preset parameters and are thus limited in their scope of application. For example, threshold-based segmentation algorithms rely heavily on pixel grayscale information and do not consider the spatial relationships between pixels. Owing to the strong variability of chromosome morphology and their complex distribution, traditional rule-based segmentation methods cannot adapt to such complex situations. In addition, traditional methods still require substantial manual intervention, which is time-consuming and labor-intensive. Therefore, traditional image segmentation methods still cannot solve the chromosome segmentation problem well.
With rapid technological advancement, deep learning technology has achieved remarkable results in various fields including computer vision and image processing [
9]. Convolutional Neural Networks (CNNs) are a class of deep neural networks with a convolutional structure, offering excellent feature extraction ability, self-learning ability, and low computational cost. In the past decade, CNNs have excelled in various fields of computer vision. Several CNNs, including AlexNet [10], ResNet [11], and VGGNet [12], have been applied to image classification and image segmentation and achieved superior performance. In the literature on medical image segmentation, a large number of CNN-based models have been proposed. Most such models are designed for semantic segmentation [13], e.g., segmenting cell contours. The task of chromosome image segmentation in this work, however, is to segment (and classify) each chromosome instance. In fact, there is a pressing need to develop instance-level segmentation models for chromosome segmentation because of its significant applications as discussed earlier.
Specifically, the challenge of chromosome instance segmentation stems from four main points [
14]: (1) chromosome data are medical privacy data, which are difficult to obtain and limited in quantity and quality; (2) chromosomes are prone to distortion, and chromosomes of the same type still exhibit deformation differences; (3) each chromosome image contains a large number of chromosomes; and (4) contacts and overlaps between chromosomes occur easily. Because of these challenges, existing segmentation networks cannot efficiently complete the task of instance-level chromosome segmentation. In view of the lack of data sources, we worked with the National Engineering Laboratory of Key Technologies for Birth Defect Prevention and Control to obtain chromosome data. However, the quantity and quality of the original chromosome data fell far short of our experimental standards. Therefore, we designed an enhanced processing pipeline for the chromosome data, which assisted our construction of a large-scale chromosome segmentation dataset. To overcome the limitations of existing segmentation networks, we propose in this paper a novel convolutional neural network for chromosome segmentation, called ChroSegNet. In this network, we designed a new hybrid attention module combining channel attention and spatial attention to extract key feature information and location information of chromosome instances. The channel attention mechanism helps the network extract key feature information of the target chromosome, while the spatial attention mechanism helps the network focus on the more important spatial information, such as the positional relationships between chromosomes. In addition, we appropriately deepen the network structure so that it can obtain richer multidimensional feature information. These improvements lead to the improved feature extraction capability and segmentation accuracy of ChroSegNet, as demonstrated in the experimental results reported later.
The main contributions of this paper are summarized as follows.
We process the chromosome data with particular techniques to enhance its quality and quantity. Based on the processed data, we constructed our chromosome segmentation dataset containing 13,096 pairs of chromosome data, ready for training not only our ChroSegNet but also other CNN models for chromosome processing.
We propose our end-to-end chromosome segmentation network, i.e., ChroSegNet. ChroSegNet can focus on the key feature information and location information of each chromosome through our proposed attention module. In addition, deep-level feature fusion further improves the network's ability to extract chromosome feature information.
The rest of this paper is organized as follows. In
Section 2, we present the related work. In
Section 3, we introduce the construction of our enhanced dataset, our enhanced processing, and the structure of ChroSegNet. In
Section 4, we evaluate ChroSegNet with 3 evaluation metrics and discuss the experimental results. In
Section 5, we conclude our work and discuss the future improvements to our research.
2. Related Works
Chromosome segmentation is a branch of medical image segmentation and one of the most critical stages in karyotype analysis. Its purpose is to separate chromosome instances from complex microscopic chromosome images. Unlike other medical images, chromosome microscopic images are susceptible to sensor noise, staining noise, and uneven-illumination noise. These noises stem from unavoidable factors in the process of image preparation and acquisition. In addition, chromosomes exhibit variable morphological structures and diverse contact and overlap patterns, which are difficult for traditional methods to identify. The need for hospitals to protect patient privacy makes chromosome microscopic images difficult to obtain; thus, there is a severe lack of data. These problems impose significant challenges on chromosome segmentation. In the early years, many researchers proposed traditional segmentation methods based on specific rules to segment chromosomes. Ji et al. [
15] proposed a rule based on geometric contour analysis to extract chromosomes. Shen [
16] and Karvelis [
17] proposed segmentation methods based on the watershed algorithm, which are too sensitive to noise and thus often lead to oversegmentation. Cao et al. [
18] proposed a method based on adaptive fuzzy c-means clustering, which better overcomes the problem of uneven illumination caused by microscope imaging systems and can segment overlapping adjacent chromosomes in different illumination areas. A segmentation method based on spatial variable thresholding was proposed by Grisan et al. [
19], which selects the best region for segmentation based on geometric features and pixel distribution. The above traditional segmentation methods, however, have limited performance and are time-consuming and labor-intensive.
In recent years, the excellent performance of deep learning has led to its wide use in medical image processing. One representative category of such methods is chromosome segmentation based on convolutional neural networks. For example, Esteban et al. [
20] proposed an overlapping chromosome segmentation method for MFISH (Multicolor Fluorescence In Situ Hybridization) images, which employs fully convolutional networks [
21] (FCN), using spatial and spectral information in an end-to-end manner. Xie et al. [
13] proposed a chromosome segmentation model combining Mask-RCNN [
22] and a geometric correction algorithm. This research achieved instance segmentation of chromosome microscopic images for the first time. Although this model achieved high accuracy, its structure is too complicated. When the scale of the real chromosome images is small, its segmentation accuracy drops drastically. Ronneberger et al. [
23] proposed U-Net in 2015, which is a network designed based on FCN [
21]. Because of its simple structure and effective use of high- and low-dimensional feature information, it is well suited to medical image segmentation, where data are scarce and image features are complex. Hariyanti et al. [
24] proposed a method for semantic segmentation of overlapping chromosomes based on U-Net [
23]. This study not only made structural improvements to the original network, such as adding an appropriate number of layers, but also used Test-Time Augmentation (TTA) to overcome the overfitting observed during training. Compared with previous similar work, the segmentation accuracy is improved, but it remains low owing to the lack of adaptation to chromosome characteristics. Altinsoy et al. [
25] proposed a primitive G-band chromosome image segmentation method based on U-Net. Bai et al. [
26] proposed a G-band chromosome segmentation method combining U-Net and YOLOv3. The method consists of two stages. In the first stage, YOLOv3 detects chromosome instances and obtains multiple detection boxes, each containing one or more chromosome instances. In the second stage, U-Net accurately extracts the single chromosome instance in each detection box. This method achieved high segmentation accuracy. However, it still falls into the category of semantic segmentation, and its implementation process is complicated. The huge number of parameters in its dual-network structure leads to a significant increase in computational cost.
To sum up, the existing deep-learning-based chromosome segmentation methods are still unable to strike a good balance between scale and accuracy. Based on our chromosome segmentation dataset, we propose ChroSegNet, built on the lightweight segmentation model U-Net. U-Net [
23] is a fast and accurate network for medical image segmentation, which has been widely applied in various subfields in medical image segmentation [
27,
28]. For example, it has been applied to segment ultrasound images by various organizations [
29,
30,
31] and so far has been the best structure for this task [
32]. We designed a new attention module according to the characteristics of chromosome instances and incorporated it into U-Net to realize the key information extraction of chromosome instances. In addition, we optimize the network structure on the basis of U-Net [
23] to expand the receptive field, which further improves the segmentation performance.
3. Method
In this work, our main focuses are on (1) the construction of a chromosome segmentation dataset and (2) the design of our ChroSegNet model with an effective attention mechanism.
We worked with genetic disease laboratory professionals to obtain raw chromosome data. High-quality biomarker slides were prepared by the professionals. Then, an optical microscope camera system (comprising a high-resolution camera, an optical microscope, and an image frame storage board), a three-dimensional stage, a three-dimensional stage automatic controller, an objective lens switching controller, a slide replacement controller, and a computer platform were assembled into a precise three-dimensional optical platform. This platform was used to locate and photograph the metaphase chromosomes on each slide, and the image data were stored by the computer. Next, we used an annotation tool to annotate the 46 chromosomes in each image one by one to obtain label data. Finally, we obtained about 430 RGB microscopic images with a resolution of 1280 × 1024 and the corresponding label data in JSON format.
Since deep learning needs to be driven by large amounts of data, and the original image data suffer from problems such as complex noise and inconspicuous chromosome features, we designed an enhanced processing pipeline to convert the limited original data into the enhanced dataset, ensuring that the resulting segmentation model has high precision and robustness (see
Section 3.1). Based on the enhanced dataset, we then propose ChroSegNet for fine chromosome instance segmentation. On the one hand, the hybrid attention mechanism is introduced into ChroSegNet to achieve efficient extraction of chromosome characteristics and location information. On the other hand, we deepen the network structure relative to the baseline in order to obtain richer multiscale information, which allows the model to adapt more quickly to microscopic image data containing a large number of tiny chromosomes (see
Section 3.2).
3.1. Enhanced Dataset and Enhanced Processing
The chromosome data used in this research were obtained from the National Engineering Laboratory of Key Technologies for Birth Defect Prevention and Control. After the steps of chromosome extraction, slide preparation, staining, and digital microscope camera acquisition, the lab's geneticists provided us with a total of 430 microscopic images of real G-band chromosomes. We also obtained the karyotype images corresponding to these chromosomes. All the chromosome images were manually annotated. Subsequently, the original dataset was fed into our enhanced preprocessing module to obtain the chromosome segmentation dataset, as illustrated in
Figure 2.
3.1.1. Image Enhancement Processing
During chromosome image acquisition, sensor noise, uneven illumination, and various cellular debris can severely degrade the quality of the chromosome images, significantly affecting the chromosome contours and banding characteristics. This would affect not only expert recognition of chromosomes but also the performance of the subsequent chromosome segmentation. The chromosome images provided by the geneticists are treated as black-and-white images, since color features do not help the subsequent chromosome segmentation but do increase the computation. Therefore, we applied gray-value adjustment as the initial processing of the microscopic images. As there are many impurities and slide noises in the chromosome images, we adopted a processing technique combining contrast adjustment and Contrast Limited Adaptive Histogram Equalization (CLAHE) to eliminate them. Meanwhile, the chromosome edges and band features in the images were enhanced. Specifically, the steps of the image enhancement are as follows.
Step-1 Grayscale Conversion: Converting all the chromosome microscopic images from RGB to grayscale reduces the computational dimensionality and the processing time without affecting image feature extraction. We adopt floating-point arithmetic to combine the R, G, and B channels of the original images into a single weighted value, so as to obtain the grayscale images.
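As a sketch, the floating-point conversion in Step-1 can be written as a weighted sum of the three channels; the ITU-R BT.601 luminance weights below are an assumed choice, since the paper does not state its exact coefficients:

```python
import numpy as np

def rgb_to_gray(img: np.ndarray) -> np.ndarray:
    """Floating-point grayscale conversion of an (H, W, 3) RGB image.
    The ITU-R BT.601 luminance weights used here are an assumption;
    the paper does not specify its coefficients."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = img.astype(np.float64) @ weights  # weighted sum over the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```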
Step-2 Contrast Stretching: Due to the unique characteristics of chromosomes such as uneven illumination and blurred imaging, the chromosomes in the original image often lose characteristic information. In addition, irresistible factors, e.g., imaging interference, during the chromosome slide preparation can make chromosome images dark and blurry. Image contrast refers to the difference between the brightest part and the darkest part in an image. In our processing, the contrast stretching operation is performed to map all the pixels in the image to a larger range in the grayscale space. This operation not only effectively reduces the noise interference but also makes the contour and band features of chromosomes more prominent. This operation is formulated in (1):
where f(x, y) represents the grayscale image, g(x, y) represents the contrast-stretched image, and b represents a quantitative value. Since excessive contrast stretching can wash out saturated regions of an image, we divided the images into two categories: low-light-intensity images and high-light-intensity images. For low-light-intensity images, we set b to 50 for a large stretch. For high-light-intensity images, the gray value of the whole image is already high owing to the high illumination intensity, so we set b to 20 for fine tuning. Moderate contrast stretching further clarifies the image and initially eliminates a large amount of background noise.
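Equation (1) itself is not reproduced here; the sketch below shows one common linear contrast-stretching form that matches the described behavior (a larger b gives a stronger stretch) and should be read as an illustrative assumption rather than the paper's exact formula:

```python
import numpy as np

def contrast_stretch(gray: np.ndarray, b: int) -> np.ndarray:
    """Assumed linear stretch: the input range [b, 255 - b] is remapped to
    the full range [0, 255] and values outside it are clipped, so a larger
    b produces a stronger stretch (b = 50 for low-light-intensity images,
    b = 20 for high-light-intensity ones, per the text). This exact form
    is an assumption, not the paper's Equation (1)."""
    f = gray.astype(np.float64)
    g = (f - b) * 255.0 / (255.0 - 2.0 * b)
    return np.clip(np.rint(g), 0, 255).astype(np.uint8)
```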
Step-3 Contrast Limited Adaptive Histogram Equalization (CLAHE): CLAHE is applied after the contrast stretching to further enhance the chromosome bands and contours without amplifying noise. CLAHE is a modified version of adaptive histogram equalization (AHE), which tends to amplify contrast in near-constant areas of an image because the histogram is highly concentrated there, causing noise in such regions to be amplified. In CLAHE, the contrast amplification in the vicinity of a given pixel value is given by the slope of the transformation function, which is proportional to the slope of the neighborhood cumulative distribution function (CDF) and therefore to the value of the histogram at that pixel value. CLAHE clips the histogram at a predetermined value before calculating the CDF. This limits the slope of the CDF, and thus of the transformation function, both bounding the contrast amplification and reducing the noise amplification problem.
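The contrast-limiting step described above can be sketched on a single tile as follows; this is a simplified illustration (real CLAHE tiles the image and bilinearly interpolates between tile mappings, as in OpenCV's createCLAHE), with the clip limit chosen arbitrarily:

```python
import numpy as np

def clipped_equalize(tile: np.ndarray, clip_limit: int = 40) -> np.ndarray:
    """Contrast-limited histogram equalization on a single uint8 tile.
    The histogram is clipped at clip_limit and the excess is redistributed
    uniformly before the equalizing CDF is computed, which bounds the slope
    of the transformation function as described in Step-3."""
    hist, _ = np.histogram(tile, bins=256, range=(0, 256))
    excess = np.sum(np.maximum(hist - clip_limit, 0))
    hist = np.minimum(hist, clip_limit) + excess // 256  # redistribute excess
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1.0) * 255.0
    return np.rint(cdf[tile]).astype(np.uint8)  # map each pixel through the CDF
```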
3.1.2. Data Augmentation Step
As described in the previous section, only 430 real chromosome images could be obtained for this research, which cannot meet the training requirements of a deep-learning-based model. Therefore, to train a model that is more flexible and copes better with various disturbances, we combine a series of data augmentation algorithms to generate chromosome data and labels in batches, further augmenting the chromosome data. Specifically, the data augmentation techniques employed include random panning, random flipping, brightness adjustment, salt-and-pepper noise, etc.
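A minimal sketch of these augmentations is given below; the parameter ranges (shift magnitude, brightness offset, noise fraction) are illustrative assumptions, not the settings used in this work:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray, mask: np.ndarray):
    """Sketch of the augmentations named in the text (random panning,
    random flipping, brightness adjustment, salt-and-pepper noise).
    Geometric transforms are applied to image and mask alike so they stay
    aligned; photometric ones are applied to the image only."""
    # random horizontal flip
    if rng.random() < 0.5:
        img, mask = img[:, ::-1], mask[:, ::-1]
    # random pan: shift by up to +/-10 px (wrapping borders for simplicity;
    # a real pipeline would pad instead)
    dy, dx = rng.integers(-10, 11, size=2)
    img = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    mask = np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    # brightness adjustment (image only)
    img = np.clip(img.astype(np.int16) + rng.integers(-30, 31), 0, 255).astype(np.uint8)
    # salt-and-pepper noise on roughly 1% of pixels (image only)
    noise = rng.random(img.shape)
    img[noise < 0.005] = 0
    img[noise > 0.995] = 255
    return img, mask
```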
After the above processing, a chromosome segmentation dataset enhanced in both scale and diversity, containing 13,096 pairs of chromosome data, was obtained. Each pair includes a chromosome microscopic image (JPG file) and a mask label (PNG file) for the chromosomes in that image. The dataset is divided proportionally, with 80% as the training set and 20% as the testing set.
3.2. Network Architecture
ChroSegNet is designed based on U-Net [
23] as we reviewed in
Section 2. On the one hand, compared with U-Net, we constructed more downsampling layers and convolutional layers to expand the receptive field and account for the segmentation of small chromosomes. On the other hand, traditional attention-based U-Nets only attend to the feature information of a single dimension (for example, the attention gate [33] attends only to the spatial dimension), which easily renders the obtained feature information insufficiently comprehensive, especially when segmenting images with complex features such as electron microscopic images. In contrast, the hybrid attention module we designed attends to both channel and spatial feature information. The network structure of ChroSegNet is shown in
Figure 3.
ChroSegNet has an encoder–decoder structure. The encoding part is a backbone feature extraction network mainly composed of convolution layers with 3 × 3 kernels, 2 × 2 max-pooling layers, and ReLU activation functions. Instead of directly adding attention modules to the original network, we designed additional downsampling and convolutional layers for ChroSegNet (the newly added parts are shown in bold) to further expand the receptive field and integrate more comprehensive multiscale feature information. The decoding part is an enhanced feature extraction network consisting of skip connections, upsampling layers, convolution layers, and hybrid attention modules. The hybrid attention modules are incorporated at the ends of the skip connections in layers 2, 3, and 4 to generate multiscale attention information. This attention information is then input into the feature fusion layer to promote the combination of high-dimensional key features and low-dimensional features, which helps the network focus on the more meaningful target regions and suppress the activation values of the background and irrelevant regions. The hybrid attention module is mainly responsible for extracting the key parts of the high-dimensional features. However, the first skip connection contains only shallow features and does not involve the fusion of high- and low-dimensional information; hence, no hybrid attention module is incorporated in this layer.
The structure of the hybrid attention module is shown in
Figure 4, where S represents the feature map input through the current skip connection, X represents the current input feature map, X_c represents the output of the channel attention module, X_s represents the output of the hybrid attention module, α_c represents the channel attention coefficient, and α_s represents the spatial attention coefficient.
The feature map from the encoder, input through the skip connection, and the current input feature map of the decoder first enter the channel attention module to extract channel attention information. The channel attention module proceeds as follows: an average pooling layer and a max pooling layer are used to compress the spatial dimensions of the input feature map. The average pooling layer is a commonly used means of aggregating spatial information, while the max pooling layer has been shown to collect key clues about the features of different objects [34]. We therefore combine the two pooling methods to obtain more representative information. Then, the fused descriptor information is fed into a Multilayer Perceptron (MLP) comprising a convolution layer and a ReLU layer to obtain the channel attention map. Finally, a Sigmoid converts it into the channel attention coefficient α_c between 0 and 1, which is multiplied with the current input feature map to obtain the feature map with channel attention information, X_c. Next, X_c and the encoder feature map are input into the spatial attention module, which proceeds as follows: a 1 × 1 convolution layer compresses the channel dimension of the input feature map, and then the ReLU activation function and another 1 × 1 convolution layer are used to obtain the spatial attention map. Finally, a Sigmoid yields the spatial attention coefficient α_s, which is multiplied with the current input feature map to obtain the feature map with spatial attention information, X_s. In conclusion, the hybrid attention module we designed can attend to both the "what" and the "where" of each chromosome instance.
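The data flow through the hybrid attention module can be sketched at the shape level as follows; the weights are random placeholders standing in for learned parameters, and the layer sizes (e.g., the C/2 bottleneck in the MLP) are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hybrid_attention(x, skip, rng=np.random.default_rng(0)):
    """Shape-level sketch of the hybrid attention on a decoder feature map
    x and a skip-connection feature map, both of shape (C, H, W). All
    weights here are random placeholders for learned parameters."""
    C, H, W = x.shape
    # channel attention: avg + max spatial pooling -> shared MLP -> Sigmoid
    avg, mx = x.mean(axis=(1, 2)), x.max(axis=(1, 2))          # (C,), (C,)
    w1, w2 = rng.standard_normal((C // 2, C)), rng.standard_normal((C, C // 2))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)               # conv -> ReLU -> conv
    alpha_c = sigmoid(mlp(avg) + mlp(mx))                      # channel coefficients in (0, 1)
    x_c = x * alpha_c[:, None, None]                           # feature map with channel attention
    # spatial attention: 1x1 conv channel compression -> ReLU -> 1x1 conv -> Sigmoid
    v1 = rng.standard_normal(C)                                # 1x1 conv, C channels -> 1
    s = np.maximum(np.tensordot(v1, x_c + skip, axes=1), 0.0)  # (H, W)
    alpha_s = sigmoid(rng.standard_normal() * s)               # spatial coefficients in (0, 1)
    return x_c * alpha_s[None, :, :]                           # output X_s
```

Because both coefficients lie in (0, 1), the output attenuates (never amplifies) the input activations, which is how background and irrelevant regions are suppressed.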
5. Conclusions
In this paper, we design ChroSegNet, a model with an effective attention mechanism for accurate chromosome segmentation. While remaining lightweight, ChroSegNet has not only a deepened network structure but also a hybrid attention structure responsible for simultaneously extracting the key features and position information of chromosomes. Experimental results show that ChroSegNet is better suited than most CNN models to chromosomes with complex structures and variable positions. To construct a dataset for model training, we cooperated with the laboratory to acquire chromosome data and proposed an enhanced processing pipeline to improve the quality and quantity of the data, resulting in a large-scale, high-quality chromosome segmentation dataset. Our experimental results show that a segmentation model trained with our dataset outperforms one trained with the original dataset.
However, the current ChroSegNet is still limited in the following respects. On the one hand, its segmentation performance is relatively limited for chromosomes of the same class with large deformations. To address this, we plan in future work to design and incorporate additional branches in ChroSegNet to learn and exploit chromosome shape information for improved segmentation accuracy. On the other hand, its segmentation performance on overlapping chromosomes still needs improvement. We plan to further optimize the network structure, for example by modeling the ROI as multiple layers and detecting overlapping chromosomes separately during segmentation.