Article

Vessel Segmentation in Fundus Images with Multi-Scale Feature Extraction and Disentangled Representation

Yuanhong Zhong, Ting Chen, Daidi Zhong and Xiaoming Liu
1 The School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
2 College of Bioengineering, Chongqing University, Chongqing 400044, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 5039; https://doi.org/10.3390/app14125039
Submission received: 1 April 2024 / Revised: 1 June 2024 / Accepted: 5 June 2024 / Published: 10 June 2024

Abstract

Vessel segmentation in fundus images is crucial for diagnosing eye diseases. The rapid development of deep learning has greatly improved segmentation accuracy. However, the scale of the retinal blood-vessel structure varies greatly, and fundus images contain considerable noise unrelated to the vessels, which increases the complexity and difficulty of the segmentation algorithm. Comprehensive consideration of factors such as scale variation and noise suppression is imperative to enhance segmentation accuracy and stability. Therefore, we propose a retinal vessel segmentation method based on multi-scale feature extraction and disentangled representation. Specifically, we design a multi-scale feature extraction module at the skip connections, utilizing dilated convolutions to capture multi-scale features and further emphasizing crucial information through channel attention modules. Additionally, to separate useful spatial information from redundant information and enhance segmentation performance, we introduce an image reconstruction branch to assist in the segmentation task. The specific approach involves using a disentangled representation method to decouple the image into content and style, utilizing the content part for the segmentation task. We conducted experiments on the DRIVE, STARE, and CHASE_DB1 datasets, and the results showed that our method outperformed others, achieving the highest accuracy across all three datasets (DRIVE: 0.9690, CHASE_DB1: 0.9757, and STARE: 0.9765).

1. Introduction

Medical image segmentation is of great significance in clinical applications. It plays a critical role in extracting and isolating regions of interest from medical images for further analysis and diagnosis. The retinal blood vessels in the fundus serve as a crucial indicator of ocular health and a biomarker for various systemic diseases, including diabetic retinopathy, glaucoma, and hypertension [1,2]. Changes in vessel diameter, caliber, and leakage can aid in identifying conditions like retinal artery occlusion or central serous chorioretinopathy [3]. Retinal vessel segmentation plays a pivotal role in the analysis and evaluation of ophthalmologic images. Therefore, precise and efficient segmentation of retinal vessels holds significant clinical importance, enhancing the accuracy and efficiency of medical diagnosis and treatment. Manual segmentation by skilled medical professionals is a challenging and tedious task, with the possibility of incorrect segmentation due to the subjective nature of the process [4]. Automated segmentation for retinal vessels is more efficient and consistent than manual segmentation, as it reduces human error and bias. It holds practical applications in telemedicine and low-resource settings. Currently, automatic retinal vessel segmentation methods are mainly divided into traditional methods and deep learning-based methods.
Traditional methods include matching filtering, wavelet transform, mathematical morphology, cluster analysis, and more [5]. These methods generally rely on handcrafted features for segmentation, necessitating rich prior knowledge and professional skills, resulting in poor model generalization and low segmentation accuracy [6]. The development of deep learning has significantly improved the accuracy and efficiency of medical image segmentation [7]. Deep learning techniques, such as convolutional neural networks, have shown great potential in accurately segmenting various types of medical images, including CT scans, MRI images, and X-rays. Segmentation methods based on deep learning, including fully convolutional networks (FCNs) [8], SegNet [9], and U-Net [10], can automatically learn the characteristics of retinal blood vessels, demonstrating strong generalization capabilities. Due to the superior performance in terms of accuracy and efficiency, most current research methods are based on U-Net, which employs an encoder–decoder structure [11]. The encoder extracts features through convolution and downsampling, while the decoder restores image resolution through upsampling [12]. Moreover, in the decoding stage, the skip connection enables the fusion of low-level features and high-level features at different resolutions.
In retinal image analysis, the scale variation of retinal vessels is a common and challenging issue. This scale variation can be caused by various factors, including vessel diameter, curvature, and branching degree in different regions. During retinal vessel segmentation, due to this scale variation, models trained on a single scale often fail to fully capture the subtle features of vessels, resulting in decreased segmentation accuracy. To address this problem, we adopted U-Net as the base architecture and designed a multi-scale feature extraction module at the skip connections to enhance the ability of the network to capture multi-scale information. Specifically, we employed dilated convolutions with different dilation rates to extract multi-scale information, followed by channel attention modules to emphasize important features. Furthermore, fundus images often contain unrelated noise that can interfere with blood-vessel segmentation, thereby impacting segmentation accuracy. Separating redundant noise information from useful spatial information can significantly improve the segmentation performance of the task. Drawing inspiration from [13,14], we designed an image reconstruction branch to assist in the segmentation task, which can suppress redundant information and reduce image noise. The specific implementation involves using the disentangled representation (DR) method to disentangle the image into content and style, where the content part is utilized for the segmentation branch, thereby encouraging the segmentation branch to focus more on spatial information relevant to the segmentation task while suppressing irrelevant information. In general, we propose a new network called UNet-MSFE-DR for retinal vessel segmentation, based on multi-scale feature extraction and disentangled representation.
The main contributions of this paper are summarized as follows:
  • A multi-scale feature extraction (MSFE) module is designed to replace the simple skip connection, which can enhance the ability of the network to capture multi-scale information and effectively improve segmentation performance.
  • An image reconstruction auxiliary branch is designed to assist in the segmentation task. The utilization of the disentangled representation method enables the segmentation branch to encompass spatial information relevant to the segmentation task while suppressing redundant information.
  • Experiments are conducted on three public datasets, and the results demonstrate the superiority of our method compared to other approaches.
The rest of this paper is organized as follows. Section 2 discusses related works, covering current deep learning-based retinal segmentation methods and disentangled representation methods. Section 3 details our approach, describing the designed network. Section 4 and Section 5 describe the experiments and results, and Section 6 draws the conclusions.

2. Related Works

With its advanced algorithms and neural network architectures, deep learning has revolutionized the field of medical imaging. Deep learning models can learn intricate features and patterns from large datasets, enabling them to effectively segment complex structures and identify abnormalities in medical images. Many networks are suitable for medical image segmentation, such as FCN, DeepLab, and U-Net. A fully convolutional network replaces the fully connected layers of a classification network with convolutional layers, turning an image-level classifier into a pixel-level segmenter; it can process inputs of different sizes and is suitable for segmenting irregular structures in medical images. DeepLab is a deep learning network based on FCNs that uses dilated convolution to enlarge the receptive field. U-Net is one of the most commonly used convolutional neural networks for medical image segmentation. Its success comes from its ability to leverage both local and global image features to generate highly accurate and interpretable segmentation maps. In addition to these, other networks can also be used for segmentation tasks. For example, transformer networks have played an important role in medical image segmentation by leveraging their superior ability to model global and long-range dependencies in medical images [15]. Generative adversarial networks can generate synthetic images that look very similar to real images, allowing for data augmentation and expanding the training dataset, which can help improve the robustness and generalizability of medical image segmentation models [16].
For retinal vessel segmentation, U-Net and its variants have become prevalent methods. To overcome the loss of vessel edge information associated with continuous pooling, Li et al. designed a double-coding path detection method and added an edge enhancement module at the skip connection. In addition, they investigated a graph convolutional network to fully exploit channel information [17]. Zhang et al. designed a dual local attention block for extracting edge information using atrous convolution, an edge detection method, and a channel attention module. They also introduced a global transformer module to capture distant dependencies between pixels, obtaining more comprehensive global information [18]. Several studies have suggested that using auxiliary branches to aid in the segmentation task can effectively enhance segmentation accuracy. Li et al. devised pixel-wise adaptive filters for capturing the texture of local regions and refining the segmentation results. Simultaneously, they utilized a response cue erasing strategy as an auxiliary branch to improve segmentation accuracy [19]. Xu et al. constructed a variational autoencoder as an auxiliary branch for image reconstruction, which regularizes the encoder shared with U-Nets, thereby improving the network’s generalization capabilities [20].
The decomposition of an image into “content” and “style” is known as content-style disentanglement (CSD) [13]. CSD is becoming increasingly popular in computer vision and is widely utilized in tasks such as style transfer [21], multi-task learning [22,23], and semantic segmentation. In content-style disentanglement methods, the content is typically encoded as a spatial representation to maintain spatial correlations, while the style is encoded as a vector to preserve image appearance information, including color and intensity. Through disentangled representation learning, the underlying factors of variation can be encoded as separate latent variables, which proves beneficial in capturing useful information for the current task [13,24]. Esser et al. represented the content as a shape estimate and combined it with the style obtained from the variational autoencoder (VAE) [25]. Through the disentangled representation and reorganization of style and content codes, Jiang et al. decomposed the latent space of an image into a content space and a style space, achieving cross-modal image-to-image translation [26]. Furthermore, CSD is also commonly employed in medical image analysis, encompassing tasks such as image synthesis and medical segmentation. Chartsias et al. disentangled medical images into anatomical elements and modality variables, demonstrating the suitability of disentangled representations for various medical image analysis tasks [22]. Liu et al. enhanced the dataset with a disentangled representation method to effectively improve the performance of heart segmentation [14].
Inspired by the discussions above, to achieve high-precision segmentation, this paper not only designs feature extraction modules at the skip connections to capture multi-scale features but also designs an image reconstruction branch to assist in the segmentation task using disentangled representation learning.

3. Methods

3.1. Overall Network Structure

U-Net is a popular choice in medical image processing due to its exceptional segmentation ability, and we selected it as our baseline model. To enhance the ability of the network to capture multi-scale information, we designed an MSFE module to replace the simple skip connection. Additionally, we designed an auxiliary branch, realized with the disentangled representation method for image reconstruction, to assist the segmentation task and achieve improved segmentation performance. Figure 1 presents a diagram of the proposed model. The image X is input into the content encoder E_c (our enhanced U-Net) to obtain a content feature map. Simultaneously, the image is passed through the style encoder E_s to acquire a style code. The content feature map is then fed into the segmentor to derive the final segmentation mask, while a decoder incorporating adaptive instance normalization layers, Decoder(AdaIN), combines the content feature map and the style code to reconstruct the image X. Figure 2 presents the detailed structure of each module, and the subsequent sections delve into the specific implementation of each module.
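To make the data flow concrete, the following PyTorch sketch wires the four modules together as described above. The class and argument names (UNetMSFEDR, content_encoder, etc.) are our own illustrative choices under the description in this section, not the authors' released code.

```python
import torch.nn as nn

class UNetMSFEDR(nn.Module):
    """Illustrative wrapper for the two-branch layout in Figure 1."""
    def __init__(self, content_encoder, style_encoder, segmentor, adain_decoder):
        super().__init__()
        self.content_encoder = content_encoder  # improved U-Net with MSFE skip connections
        self.style_encoder = style_encoder      # CNN + global average pooling -> style vector
        self.segmentor = segmentor              # small convolutional head -> vessel mask
        self.adain_decoder = adain_decoder      # AdaIN decoder for image reconstruction

    def forward(self, x):
        content = self.content_encoder(x)           # spatial content representation
        style = self.style_encoder(x)               # 8-dimensional style vector
        seg_mask = self.segmentor(content)          # segmentation branch output
        recon = self.adain_decoder(content, style)  # reconstruction branch output
        return seg_mask, recon, content, style
```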

3.2. Disentangled Representation

3.2.1. Content Encoder

The content encoder is implemented as an improved four-layer U-Net, consisting of an encoder and a decoder. The specific structure is depicted in Figure 2a. Each layer in the encoder comprises two consecutive 3 × 3 convolution layers for feature extraction, each followed by a batch normalization (BN) layer and a rectified linear unit (ReLU) activation. The layers are linked by 2 × 2 max-pooling layers. The decoder adopts a structure symmetrical to the encoder, with each layer employing the same blocks; the distinction lies in the use of an upsampling layer to restore the image resolution. The skip connections in the decoding stage facilitate the fusion of low-level and high-level features at various resolutions. To extract multi-scale information, we designed an MSFE at the skip connection, the specific implementation of which is described in a later section. Following a 1 × 1 convolution layer, a softmax activation function, and a differentiable rounding operator, the content encoder produces an 8-channel content feature map.
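The paper does not spell out the exact form of the differentiable rounding operator; a straight-through estimator is one common realization in content-style disentanglement work. The sketch below shows the final content head under that assumption, producing an 8-channel, near-binary content map.

```python
import torch
import torch.nn as nn

class ContentHead(nn.Module):
    """Final stage of the content encoder: 1x1 conv, channel-wise softmax,
    then a differentiable rounding step (straight-through estimator assumed)."""
    def __init__(self, in_channels, content_channels=8):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, content_channels, kernel_size=1)

    def forward(self, features):
        logits = self.proj(features)
        soft = torch.softmax(logits, dim=1)   # soft channel assignment per pixel
        hard = torch.round(soft)              # near-binary content channels
        # straight-through: forward pass uses rounded values, backward uses soft gradients
        return soft + (hard - soft).detach()
```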

3.2.2. Style Encoder

To remove content information from the style representation, average pooling and global pooling are often utilized [13]: global average pooling eliminates spatial correlations and encodes global information by averaging the feature values of each channel and flattening the feature map into a vector. As shown in Figure 2c, the style encoder consists of four 3 × 3 convolution layers, each downsampling the feature map by a factor of 2. A global average pooling layer is then employed to suppress spatially correlated information, and a 1 × 1 convolution projects its output into an 8-dimensional style vector.
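A minimal sketch of this style branch is shown below. The four stride-2 convolutions, global average pooling, and 1 × 1 projection follow the description above; the channel widths are assumptions.

```python
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Sketch of the style branch: four stride-2 3x3 convolutions, global average
    pooling to discard spatial layout, and a 1x1 projection to an 8-D style vector."""
    def __init__(self, in_channels=3, style_dim=8, width=16):
        super().__init__()
        chs = [in_channels, width, width * 2, width * 4, width * 8]
        layers = []
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)           # global average pooling
        self.proj = nn.Conv2d(chs[-1], style_dim, 1)  # 1x1 conv -> style vector

    def forward(self, x):
        s = self.proj(self.pool(self.features(x)))    # (B, style_dim, 1, 1)
        return s.flatten(1)                           # (B, style_dim)
```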

3.2.3. Segmentor

As shown in Figure 2b, the content feature map is fed into the segmentor module, which generates the final segmentation mask. The segmentor consists of two consecutive 3 × 3 convolution operations, followed by the BN layer and the ReLU function. A 1 × 1 convolution is employed to adjust the number of channels to 2, representing vascular and non-vascular components. Finally, the output is scaled to a range between 0 and 1 using a softmax activation function.
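The segmentor head could be sketched as follows; the intermediate channel width is an assumption.

```python
import torch
import torch.nn as nn

class Segmentor(nn.Module):
    """Sketch of the segmentor: two 3x3 conv+BN+ReLU blocks, a 1x1 conv to two
    channels (vessel / non-vessel), and a softmax over the channel dimension."""
    def __init__(self, in_channels=8, mid_channels=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 3, padding=1),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, padding=1),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, 2, 1),
        )

    def forward(self, content):
        return torch.softmax(self.block(content), dim=1)
```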

3.2.4. Decoder (AdaIN)

The research in [27] demonstrated that the affine parameters of the adaptive instance normalization (AdaIN) layer can alter the style of the output image. Through adaptive instance normalization, the decoder can combine content and style to reconstruct the image. The AdaIN decoder first uses instance normalization to effectively eliminate non-spatial information from the content factor and then allows the style factor to define the new mean and standard deviation of the normalized content. In this way, AdaIN encourages the content feature map to carry spatial information and the style code to focus on non-spatial information. As shown in Equations (1) and (2), AdaIN takes the content feature map c and the style code s as inputs. The affine parameters γ and β are computed from s and are used to scale and shift the normalized c, respectively:
$$\gamma,\ \beta = \mathrm{MLP}(s) \qquad (1)$$
$$\mathrm{AdaIN}(c, s) = \gamma\left(\frac{c - \mu(c)}{\sigma(c)}\right) + \beta \qquad (2)$$
where MLP stands for multilayer perceptron, and μ(c) and σ(c) denote the mean and standard deviation of the content feature map, computed per channel and per instance. The decoder comprises three 3 × 3 convolution layers with adaptive instance normalization layers for combining the content and style. The final layer is a 7 × 7 convolution layer paired with a Tanh activation function, which is utilized to reconstruct the image.
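Below is a hedged PyTorch sketch of AdaIN and the reconstruction decoder following Equations (1) and (2). The MLP hidden size and convolution widths are assumptions; the structure (one γ, β pair per AdaIN layer, three 3 × 3 AdaIN conv layers, final 7 × 7 conv with Tanh) follows the description above.

```python
import torch
import torch.nn as nn

def adain(content, gamma, beta, eps=1e-5):
    """AdaIN per Equations (1)-(2): normalize per instance and channel,
    then rescale/shift with style-derived gamma and beta."""
    mu = content.mean(dim=(2, 3), keepdim=True)
    sigma = content.std(dim=(2, 3), keepdim=True) + eps
    return gamma[..., None, None] * (content - mu) / sigma + beta[..., None, None]

class AdaINDecoder(nn.Module):
    """Sketch of the reconstruction decoder: an MLP maps the style vector to
    (gamma, beta) for each AdaIN conv layer; a 7x7 conv + Tanh reconstructs the image."""
    def __init__(self, content_channels=8, style_dim=8, width=64, out_channels=3):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(content_channels, width, 3, padding=1),
            nn.Conv2d(width, width, 3, padding=1),
            nn.Conv2d(width, width, 3, padding=1),
        ])
        # one (gamma, beta) pair of size `width` per AdaIN layer
        self.mlp = nn.Sequential(nn.Linear(style_dim, 128), nn.ReLU(inplace=True),
                                 nn.Linear(128, 2 * width * len(self.convs)))
        self.out = nn.Sequential(nn.Conv2d(width, out_channels, 7, padding=3), nn.Tanh())

    def forward(self, content, style):
        params = self.mlp(style).chunk(2 * len(self.convs), dim=1)
        h = content
        for i, conv in enumerate(self.convs):
            gamma, beta = params[2 * i], params[2 * i + 1]
            h = torch.relu(adain(conv(h), gamma, beta))
        return self.out(h)
```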

3.3. Multi-Scale Feature Extractor

The retinal vascular structure exhibits a high level of complexity and variability in both size and distribution. A smaller receptive field tends to capture more localized features, making it advantageous for detecting smaller targets. Conversely, features captured by a large receptive field are more global, facilitating the learning of the intricate relationship between the vessel pixels and their surroundings. Therefore, our designed MSFE employs convolutional layers with varying receptive fields to extract features separately, effectively capturing multi-scale information. Given that dilated convolution expands the receptive field without adding more network parameters, we utilize dilated convolution to replace traditional convolution. As illustrated in Figure 3, three parallel convolutional layers with specific dilation rates (1, 2, and 3) are used to independently extract features of different scales. The resulting feature maps are subsequently merged through concatenation.
As the channel attention module can emphasize crucial information and reduce redundant features by assigning varying weights to individual channels, a Squeeze-and-Excitation (SE) module is incorporated into the MSFE to enhance its feature-discriminating ability. The SE module consists of two steps: “squeeze” and “excitation”. First, the feature maps are squeezed using global average pooling, capturing the global distribution of each channel. The resulting feature vector is then fed into two fully connected layers for excitation: the first layer compresses the dimension to reduce computational complexity, while the second layer expands it back to the original dimension, producing the channel weights. Through the excitation operation, the interdependencies between channels are fully captured. The resulting weights are then multiplied with the concatenated feature map. Finally, a residual mechanism is introduced to prevent the loss of essential information. The described MSFE is presented in Equations (3)–(7):
$$f_m = \mathrm{Concat}\big(\mathrm{Conv}_{3\times3}^{1}(f_{in}),\ \mathrm{Conv}_{3\times3}^{2}(f_{in}),\ \mathrm{Conv}_{3\times3}^{3}(f_{in})\big) \qquad (3)$$
$$s = F_{sq}(f_m) = \frac{1}{w \times h}\sum_{i=1}^{w}\sum_{j=1}^{h} f_m(i, j) \qquad (4)$$
$$z = F_{ex}(s, W) = \sigma\big(W_2\,\delta(W_1 s)\big) \qquad (5)$$
$$f_{out} = f_m \cdot z \qquad (6)$$
$$f_{res\text{-}out} = f_{in} + \mathrm{Conv}_{1\times1}(f_{out}) \qquad (7)$$
where Equation (3) represents the dilated convolutional module, with Conv^n_{3×3} denoting the dilated convolutional layer with dilation rate n, and Concat(·) denoting the concatenation operation. Equations (4)–(6) represent the channel attention module, where F_sq is the global average pooling operation and F_ex is the nonlinear transformation. The width and height of the feature map f_m are denoted by w and h, respectively; W_1 and W_2 are learnable parameters; and σ and δ are the sigmoid and ReLU activation functions, respectively. Equation (7) represents the residual mechanism, where Conv_{1×1} is a 1 × 1 convolution layer used to adjust the number of channels.
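A compact PyTorch sketch of the MSFE following Equations (3)–(7) is given below. The SE reduction ratio r and the choice that each dilated branch keeps the input channel width are our assumptions.

```python
import torch
import torch.nn as nn

class MSFE(nn.Module):
    """Sketch of the multi-scale feature extractor: three parallel 3x3 dilated
    convolutions (dilation 1, 2, 3), concatenation, an SE channel-attention block,
    and a residual 1x1 projection back to the input width."""
    def __init__(self, channels, r=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in (1, 2, 3)
        ])
        merged = 3 * channels
        self.se = nn.Sequential(                      # squeeze-and-excitation
            nn.AdaptiveAvgPool2d(1),                  # squeeze: global average pooling, Eq. (4)
            nn.Conv2d(merged, merged // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(merged // r, merged, 1), nn.Sigmoid(),  # excitation, Eq. (5)
        )
        self.proj = nn.Conv2d(merged, channels, 1)    # 1x1 conv for the residual path

    def forward(self, f_in):
        f_m = torch.cat([b(f_in) for b in self.branches], dim=1)  # Eq. (3)
        z = self.se(f_m)                                          # Eqs. (4)-(5)
        f_out = f_m * z                                           # Eq. (6)
        return f_in + self.proj(f_out)                            # Eq. (7)
```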

3.4. Loss Function

In fundus images, the issue of an imbalanced sample distribution arises because non-blood-vessel pixels far outnumber blood-vessel pixels. To address this problem, we selected the dice loss L_dice [28] and the focal loss L_focal [29] as the segmentation losses. The mean absolute error between the reconstructed image and the input image serves as the image reconstruction loss L_rec in the image reconstruction branch. To promote disentanglement and prevent posterior collapse, we implemented a z reconstruction loss L_zrec, which calculates the mean absolute error between the re-encoded modality vector and the initially sampled modality vector. This loss helps prevent the loss of relevant information and allows for better disentanglement of the underlying factors in medical image data. The total loss function is expressed by the following equation:
$$L = L_{dice} + L_{focal} + L_{rec} + L_{zrec}$$
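A sketch of this composite loss is shown below. The specific dice/focal formulations and the equal (unit) weighting of the four terms are assumptions, since the paper does not report loss coefficients; inputs are assumed to be (B, 1, H, W) vessel-probability maps and binary targets.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over the vessel channel (a common formulation; assumption)."""
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def focal_loss(pred, target, gamma=2.0, eps=1e-6):
    """Binary focal loss on the vessel probability map; gamma=2 is an assumption."""
    pred = pred.clamp(eps, 1 - eps)
    pt = torch.where(target > 0.5, pred, 1 - pred)
    return (-(1 - pt) ** gamma * pt.log()).mean()

def total_loss(seg_pred, seg_gt, recon, image, z_reencoded, z_sampled):
    """Total loss above: segmentation terms + L1 reconstruction + L1 z-reconstruction."""
    l_seg = dice_loss(seg_pred, seg_gt) + focal_loss(seg_pred, seg_gt)
    l_rec = F.l1_loss(recon, image)              # mean absolute error of the reconstruction
    l_zrec = F.l1_loss(z_reencoded, z_sampled)   # style/modality vector reconstruction
    return l_seg + l_rec + l_zrec
```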

4. Experiments

4.1. Datasets

We evaluated the performance of our model on three datasets: DRIVE, STARE, and CHASE_DB1. The DRIVE dataset [30] was sourced from a diabetic retinopathy screening program in the Netherlands and includes 40 color fundus images with a resolution of 565 × 584. The dataset was split into a training set, comprising the first 20 images, and a test set, consisting of the remaining images. The CHASE_DB1 dataset [31] contains 28 fundus images with a resolution of 999 × 960. The dataset was divided into 20 training images and 8 test images [32]. The STARE dataset [33] comprises 20 fundus images with a resolution of 700 × 605. We adopted the same dataset partitioning method as [34], using 16 images as the training set and 4 images as the test set. For the DRIVE dataset, one expert manually annotated the training images, while two experts annotated the test images. For the CHASE_DB1 and STARE datasets, two experts manually annotated both the training and test images. Throughout the experiments, we used the annotations of the first expert as the ground truth.

4.2. Evaluation Metrics

A comprehensive assessment of our model’s segmentation performance was carried out using the four most commonly used evaluation metrics: the area under the receiver operating characteristic curve (AUC), accuracy (Acc.), specificity (Spe.), and sensitivity (Sen.). The formulas for the latter three indicators are given below:
$$\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN}$$
$$\mathrm{Spe} = \frac{TN}{FP + TN}$$
$$\mathrm{Sen} = \frac{TP}{TP + FN}$$
where TP (true positive) and TN (true negative) refer to the correctly classified vascular and non-vascular pixels, respectively. Similarly, FP (false positive) and FN (false negative) denote pixels that have been misclassified. Specificity determines the model’s accuracy in identifying non-vessel pixels, while sensitivity measures its correctness in detecting blood-vessel pixels. The overall capabilities of the model are captured by accuracy and AUC.
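For reference, these metrics can be computed from a predicted probability map and a binary ground truth as in the sketch below; the 0.5 threshold and the use of scikit-learn’s roc_auc_score are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(prob_map, ground_truth, threshold=0.5):
    """Pixel-wise Acc/Spe/Sen per the formulas above, plus AUC from raw probabilities.
    prob_map: vessel probabilities; ground_truth: binary labels (same shape)."""
    pred = (prob_map.ravel() >= threshold).astype(np.uint8)
    gt = ground_truth.ravel().astype(np.uint8)
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    acc = (tp + tn) / (tp + fp + tn + fn)
    spe = tn / (fp + tn)
    sen = tp / (tp + fn)
    auc = roc_auc_score(gt, prob_map.ravel())
    return {"Acc": acc, "AUC": auc, "Spe": spe, "Sen": sen}
```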

4.3. Implementation Setup

All experiments were conducted on an NVIDIA RTX 4090 GPU platform, and the PyTorch 1.6 framework was used to develop the network. We employed the Adam optimizer with a learning rate of 0.001. The batch size was set to 4, and training was conducted over 5000 iterations. Because retinal images contain personal medical information, obtaining these data may raise privacy concerns. Consequently, acquiring fundus images is challenging, resulting in a limited number of samples in the datasets. Segmentation networks based on deep learning have a large number of parameters, and insufficient training data can lead to overfitting and poor generalization. Therefore, random flipping and random cropping were used for data augmentation in this paper. We adopted a uniform patch training strategy that involved extracting random patches sized 224 × 224. The same network configuration was used for the segmentation experiments on all three public datasets.
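A minimal sketch of the patch-based augmentation described above (random 224 × 224 crops with random flips) is given below; the exact flip probabilities are assumptions.

```python
import random
import numpy as np

def random_patch(image, mask, patch_size=224):
    """Random crop plus random horizontal/vertical flips.
    image is HxWxC, mask is HxW; flip probabilities of 0.5 are assumptions."""
    h, w = mask.shape
    top = random.randint(0, h - patch_size)
    left = random.randint(0, w - patch_size)
    img_p = image[top:top + patch_size, left:left + patch_size]
    msk_p = mask[top:top + patch_size, left:left + patch_size]
    if random.random() < 0.5:                    # random horizontal flip
        img_p, msk_p = np.fliplr(img_p), np.fliplr(msk_p)
    if random.random() < 0.5:                    # random vertical flip
        img_p, msk_p = np.flipud(img_p), np.flipud(msk_p)
    return img_p.copy(), msk_p.copy()
```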

5. Results

5.1. Comparison with State-of-the-Art Methods

We compared our proposed method with numerous existing advanced methods on the DRIVE, CHASE_DB1, and STARE datasets. The comparison results for each dataset are presented in Table 1, Table 2 and Table 3, with the optimal performance values highlighted in bold for each metric.
For the DRIVE dataset, the proposed method achieved the highest accuracy, AUC, specificity, and sensitivity values of 0.9690, 0.9858, 0.9829, and 0.8230, respectively, among all the comparison methods. Notably, our method significantly outperformed the others in terms of accuracy.
Regarding the CHASE_DB1 dataset, while Wang et al. [35] achieved the highest AUC, the difference was merely 0.0063 compared to our method. However, our method significantly outperformed theirs in terms of the other three metrics (accuracy, specificity, and sensitivity), with improvements of 0.0087, 0.0045, and 0.0024, respectively. Compared with all other methods, ours yielded the best results in most evaluation metrics, particularly in accuracy and specificity, reaching values as high as 0.9757 and 0.9858, respectively.
Regarding the STARE dataset, our method achieved the highest accuracy and specificity, reaching 0.9765 and 0.9862, respectively, with an accuracy improvement of 0.0053 compared to the second-best method. Although the method in [36] achieved the highest sensitivity value, its accuracy and specificity values were lower than those of our proposed method.
According to the experimental data presented in Table 1, Table 2 and Table 3, our proposed method exhibited superior segmentation performance compared to the other methods, achieving the highest accuracy values of 0.9690, 0.9757, and 0.9765, respectively, on all three datasets. Accuracy serves as a critical measure of the model’s overall performance, with higher values indicating superior segmentation capabilities. Furthermore, our approach outperformed other methods across most of the evaluated metrics on all three datasets. In summary, the quantitative experimental results clearly showcase the exceptional segmentation performance of our model.
Table 1. Comparison results on the DRIVE dataset.

Method | Year | Acc | AUC | Spe | Sen
Zhang [37] | 2017 | 0.9466 | 0.9703 | 0.9712 | 0.7861
Alom [38] | 2018 | 0.9556 | 0.9784 | 0.9813 | 0.7792
Yan [39] | 2018 | 0.9542 | 0.9752 | 0.9818 | 0.7653
Oliveira [40] | 2018 | 0.9576 | 0.9821 | 0.9804 | 0.8039
Li [12] | 2019 | 0.9568 | 0.9806 | 0.9810 | 0.7921
Wang [35] | 2020 | 0.9581 | 0.9823 | 0.9813 | 0.7991
Lu [41] | 2020 | 0.9547 | 0.9739 | 0.9769 | 0.8026
Wang [42] | 2020 | 0.9565 | 0.9801 | 0.9782 | 0.8071
Liu [43] | 2022 | 0.9581 | 0.9836 | 0.9790 | 0.8170
Liu [44] | 2023 | 0.9664 | — | 0.9764 | 0.8164
Du [45] | 2023 | 0.9572 | 0.9460 | 0.9783 | 0.7391
Li [46] | 2023 | 0.9580 | 0.9820 | 0.9818 | 0.8139
Proposed | 2023 | 0.9690 | 0.9858 | 0.9829 | 0.8230
Table 2. Comparison results on the CHASE_DB1 dataset.

Method | Year | Acc | AUC | Spe | Sen
Zhang [37] | 2017 | 0.9502 | 0.9706 | 0.9716 | 0.7644
Alom [38] | 2018 | 0.9634 | 0.9815 | 0.9820 | 0.7756
Yan [39] | 2018 | 0.9610 | 0.9781 | 0.9809 | 0.7633
Li [12] | 2019 | 0.9635 | 0.9810 | 0.9819 | 0.7818
Wang [35] | 2020 | 0.9670 | 0.9871 | 0.9813 | 0.8239
Lu [41] | 2020 | 0.9617 | 0.9782 | 0.9762 | 0.8135
Wang [42] | 2020 | 0.9706 | 0.9824 | 0.9836 | 0.8427
Yang [47] | 2021 | 0.9632 | 0.9827 | 0.9776 | 0.8176
Tan [36] | 2022 | 0.9561 | — | 0.9794 | 0.7816
Liu [44] | 2023 | 0.9664 | — | 0.9821 | 0.8284
Du [45] | 2023 | 0.9672 | 0.9714 | 0.9733 | 0.8781
Li [46] | 2023 | 0.9669 | 0.9867 | 0.9817 | 0.8194
Upadhyay [48] | 2023 | 0.9710 | 0.9810 | 0.9830 | 0.7990
Proposed | 2023 | 0.9757 | 0.9808 | 0.9858 | 0.8263
Table 3. Comparison results on the STARE dataset.

Method | Year | Acc | AUC | Spe | Sen
Zhang [37] | 2017 | 0.9547 | 0.9740 | 0.9729 | 0.7882
Yan [39] | 2018 | 0.9612 | 0.9801 | 0.9846 | 0.7581
Oliveira [40] | 2018 | 0.9694 | 0.9905 | 0.9858 | 0.8315
Li [12] | 2019 | 0.9678 | 0.9875 | 0.9823 | 0.8352
Wang [35] | 2020 | 0.9673 | 0.9881 | 0.9844 | 0.8186
Lu [41] | 2020 | 0.9593 | 0.9788 | 0.9784 | 0.8308
Wang [42] | 2020 | 0.9702 | 0.9825 | 0.9845 | 0.8432
Yang [47] | 2021 | 0.9626 | 0.9827 | 0.9821 | 0.7964
Tan [36] | 2022 | 0.9622 | — | 0.9848 | 0.8514
Liu [44] | 2023 | 0.9641 | — | 0.9836 | 0.7902
Du [45] | 2023 | 0.9591 | 0.9726 | 0.9661 | 0.8743
Li [46] | 2023 | 0.9712 | 0.9902 | 0.9779 | 0.7972
Upadhyay [48] | 2023 | 0.9640 | 0.9780 | 0.9780 | 0.8190
Proposed | 2023 | 0.9765 | 0.9811 | 0.9862 | 0.8296

5.2. Ablation Study

To validate the effectiveness of our designed MSFE module and DR reconstruction branch, we conducted a series of ablation experiments on the three datasets. The baseline was set as a U-shaped network with a four-layer structure. We compared our UNet-MSFE-DR approach with (1) the baseline, (2) the baseline with the MSFE, and (3) the baseline with DR. The statistical results of the experiments are presented in Table 4, Table 5 and Table 6.

5.2.1. Efficacy of MSFE

To demonstrate the efficacy of the MSFE, it was integrated with the baseline network to analyze the improvements. According to Table 4, for the DRIVE dataset, the baseline with the MSFE achieved significantly superior outcomes across the four assessment criteria, reaching up to 0.9680, 0.9810, 0.9807, and 0.8213. Compared with the baseline, accuracy, AUC, specificity, and sensitivity improved by 0.0054, 0.0059, 0.0036, and 0.0161, respectively, with sensitivity exhibiting the most noticeable enhancement. Sensitivity, which measures the proportion of correctly identified vessel pixels over all vessel pixels, is a key assessment indicator. A higher sensitivity indicates a better ability to correctly classify the vessel pixels. In our network, the purpose of the MSFE is to enhance the network’s ability to capture multi-scale information and improve the accuracy of vessel segmentation. Therefore, by introducing the MSFE, more vessel pixels are correctly classified and thus the accuracy and sensitivity are improved, which is consistent with our experimental results. For the CHASE_DB1 dataset, the experimental outcomes in Table 5 illustrate the efficiency of the MSFE. The accuracy, AUC, specificity, and sensitivity values achieved by the baseline were 0.9743, 0.9723, 0.9836, and 0.8090, while the evaluation indicators of the baseline with the MSFE were 0.9754, 0.9752, 0.9849, and 0.8254. Similar to the results obtained for the DRIVE dataset, each performance metric improved, with sensitivity exhibiting the largest increase of 0.0164. For the STARE dataset, the baseline with the MSFE improved compared to the baseline in three indicators, including accuracy, AUC, and sensitivity, reaching 0.9765, 0.9774, and 0.8198. On all three datasets, the baseline with the MSFE achieved higher AUC and accuracy values, which measure the network’s overall ability to correctly identify pixels. This demonstrates that the MSFE comprehensively enhances the network’s segmentation performance while capturing multi-scale information.

5.2.2. Efficacy of DR Reconstruction Branch

To illustrate the efficacy of the proposed DR reconstruction branch, we integrated it with the baseline in the experiment. As shown in Table 4, the addition of the DR reconstruction branch led to improvements across all evaluation metrics on the DRIVE dataset (accuracy: 0.9626/0.9680; AUC: 0.9751/0.9829; specificity: 0.9771/0.9802; and sensitivity: 0.8052/0.8207). The most significant increases were observed in the AUC and sensitivity values, with the AUC increasing by 0.0078 and sensitivity increasing by 0.0155. In our network, the purpose of the DR reconstruction branch is to encourage the segmentation branch to focus more on spatial information and suppress redundant information, which is beneficial to the segmentation task and thus can effectively improve the network’s segmentation performance. Therefore, the accuracy and AUC values should increase after the introduction of the DR reconstruction branch, which is fully supported by the experimental results on the DRIVE dataset. Similarly, the results of the ablation experiments on the CHASE_DB1 and STARE datasets also validate the effectiveness of the DR reconstruction branch. As shown in Table 5, adding the DR reconstruction branch increased the accuracy, AUC, and sensitivity values. The sensitivity improved from 0.8090 to 0.8179, while the specificity remained essentially unchanged (0.9836 vs. 0.9839). This outcome suggests that the DR reconstruction branch effectively enhances the model’s capacity to accurately segment vessel pixels without compromising its ability to detect non-vessel pixels. For the STARE dataset, the network combined with the DR reconstruction branch outperformed the baseline across all four metrics, reaching up to 0.9778, 0.9783, 0.9885, and 0.8169. As depicted in Table 4, Table 5 and Table 6, the adoption of the DR reconstruction branch resulted in improved values for accuracy, AUC, specificity, and sensitivity across all datasets, which indicates that the performance gain achieved by the DR reconstruction branch is significant.

5.2.3. Efficacy of UNet-MSFE-DR

After proving the effectiveness of the MSFE and the DR reconstruction branch separately, we combined the two components with the baseline network to form our proposed network. As shown in Table 4, Table 5 and Table 6, the UNet-MSFE-DR showed significant improvements across all evaluation metrics on all datasets compared to the baseline. For the DRIVE dataset, the UNet-MSFE-DR obtained the highest values in accuracy, AUC, specificity, and sensitivity. For the CHASE_DB1 dataset, the UNet-MSFE-DR also obtained the highest values across all metrics. For the STARE dataset, although the UNet-MSFE-DR only achieved the highest AUC and sensitivity values, the gaps between the values of the other two indicators and the highest values were particularly small and negligible. Overall, the UNet-MSFE-DR achieved the best segmentation performance compared to the baseline network with just one component.

5.3. Qualitative Analysis

In addition to the quantitative evaluation, we also conducted an extensive qualitative assessment to provide a more visually indicative evaluation of the segmentation performance. Referring to the method in [49], we compared our approach’s performance with other typical prominent retinal vessel segmentation models, including R2U-Net [38], AttU-Net [50], and LadderNet [51]. Figure 4 shows the visualization segmentation results of the comparison models, providing local images to better illustrate the details. R2U-Net exhibited the poorest segmentation results, with a substantial number of vascular pixels inaccurately identified, including both thick and thin vascular pixels. AttU-Net included an attention mechanism to better capture important information, demonstrating slightly better segmentation performance than R2U-Net. LadderNet, with multiple pairs of encoder–decoder branches, allowed for more information flow between layers, leading to a significant improvement over R2U-Net and AttU-Net. However, LadderNet failed to accurately capture thin blood vessels. Our proposed network utilizes the MSFE to extract multi-scale information and employs an auxiliary branch to further improve segmentation performance. Compared with other networks, our network achieved the most accurate prediction results.
Furthermore, we also carried out a qualitative analysis of the ablation experiments. Figure 5 provides the visualization results of the ablation experiments. Figure 6 displays some local image patches, demonstrating the segmentation results more clearly and emphatically. As observed in Figure 5, the baseline with the MSFE captured more thin vessels compared to the baseline network, effectively enhancing the network’s ability to identify vessel pixels. Similarly, the baseline with DR also accurately identified more vascular pixels and correctly classified more non-vessel pixels. When both components were integrated into the network, our proposed network demonstrated stronger segmentation capability, leading to more continuous segmentation results and achieving superior segmentation performance compared to the baseline network or the network with only one additional component. Detailed observations can be made from Figure 6, which indicates that the addition of the MSFE resulted in the correct classification of more vessel pixels, enhancing vessel connectivity for both coarse and fine vessels. Similar effects were observed with the addition of the DR reconstruction branch.
The visualization results from the comparison and ablation experiments intuitively demonstrate that our model outperforms other models and prove that our designed MSFE and DR reconstruction branch can effectively improve segmentation performance.

6. Conclusions

In this paper, we proposed a novel model for retinal vessel segmentation, which consists of a multi-scale feature extraction module and a disentangled representation reconstruction branch. Firstly, to enhance the ability of the network to capture multi-scale information, we designed an MSFE module at the skip connections. Secondly, we designed an image reconstruction auxiliary branch to assist in the segmentation task, using disentangled representation to encourage the segmentation branch to focus more on spatial information and suppress redundant information, thereby enhancing segmentation performance. We conducted experiments on three publicly available datasets. The comparison experiments indicated that the performance of our proposed UNet-MSFE-DR was superior to existing methods, and the ablation experiments demonstrated the effectiveness of the MSFE and the DR reconstruction branch. Our network is not limited to retinal vessel segmentation, as it can also be applied to other image segmentation tasks. In the future, we will validate the effectiveness of our proposed method on other datasets.

Author Contributions

Methodology, Y.Z. and T.C.; Software, T.C.; Writing—original draft, T.C.; Supervision, D.Z. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Key Research and Development Program of China (Grant no: 2021YFC2009200), the National Key Research and Development Program of China (Grant no: 2020YFC2007200) and the special project of Technological Innovation and Application Development of Chongqing (Grant no: cstc2019jscx-msxmX0167).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dai, L.; Wu, L.; Li, H.; Cai, C.; Wu, Q.; Kong, H.; Liu, R.; Wang, X.; Hou, X.; Liu, Y.; et al. A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nat. Commun. 2021, 12, 3242. [Google Scholar] [CrossRef] [PubMed]
  2. Liu, R.; Wang, X.; Wu, Q.; Dai, L.; Fang, X.; Yan, T.; Son, J.; Tang, S.; Li, J.; Gao, Z.; et al. DeepDRiD: Diabetic retinopathy—Grading and image quality estimation challenge. Patterns 2022, 3, 100512. [Google Scholar] [CrossRef] [PubMed]
  3. Soares, J.V.; Leandro, J.J.; Cesar, R.M.; Jelinek, H.F.; Cree, M.J. Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification. IEEE Trans. Med. Imaging 2006, 25, 1214–1222. [Google Scholar] [CrossRef] [PubMed]
  4. Wu, H.; Wang, W.; Zhong, J.; Lei, B.; Wen, Z.; Qin, J. Scs-net: A scale and context sensitive network for retinal vessel segmentation. Med. Image Anal. 2021, 70, 102025. [Google Scholar] [CrossRef] [PubMed]
  5. Mookiah, M.R.K.; Hogg, S.; MacGillivray, T.J.; Prathiba, V.; Pradeepa, R.; Mohan, V.; Anjana, R.M.; Doney, A.S.; Palmer, C.N.; Trucco, E. A review of machine learning methods for retinal blood vessel segmentation and artery/vein classification. Med. Image Anal. 2021, 68, 101905. [Google Scholar] [CrossRef] [PubMed]
  6. Mo, J.; Zhang, L. Multi-level deep supervised networks for retinal vessel segmentation. Int. J. Comput. Assist. Radiol. 2017, 12, 2181–2193. [Google Scholar] [CrossRef]
  7. Liu, R.; Wang, T.; Li, H.; Zhang, P.; Li, J.; Yang, X.; Shen, D.; Sheng, B. TMM-Nets: Transferred Multi-to Mono-Modal Generation for Lupus Retinopathy Diagnosis. IEEE Trans. Med. Imaging 2022, 42, 1083–1094. [Google Scholar] [CrossRef]
  8. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  9. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
  10. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Abingdon, UK, 2015; pp. 234–241. [Google Scholar]
  11. Nazir, A.; Cheema, M.N.; Sheng, B.; Li, P.; Li, H.; Xue, G.; Qin, J.; Kim, J.; Feng, D.D. ECSU-net: An embedded clustering sliced U-net coupled with fusing strategy for efficient intervertebral disc segmentation and classification. IEEE Trans. Image Process. 2021, 31, 880–893. [Google Scholar] [CrossRef]
  12. Li, X.; Jiang, Y.; Li, M.; Yin, S. Lightweight attention convolutional neural network for retinal vessel image segmentation. IEEE Trans. Ind. Inform. 2020, 17, 1958–1967. [Google Scholar] [CrossRef]
  13. Liu, X.; Sanchez, P.; Thermos, S.; O’Neil, A.Q.; Tsaftaris, S.A. Learning disentangled representations in the imaging domain. Med. Image Anal. 2022, 80, 102516. [Google Scholar] [CrossRef] [PubMed]
  14. Liu, X.; Thermos, S.; Chartsias, A.; O’Neil, A.; Tsaftaris, S.A. Disentangled representations for domain-generalized cardiac segmentation. In The Statistical Atlases and Computational Models of the Heart; Springer: Berlin/Heidelberg, Germany, 2021; pp. 187–195. [Google Scholar]
  15. Lin, X.; Sun, S.; Huang, W.; Sheng, B.; Li, P.; Feng, D.D. EAPT: Efficient attention pyramid transformer for image processing. IEEE Trans. Multimed. 2021, 25, 50–61. [Google Scholar] [CrossRef]
  16. Cheema, M.N.; Nazir, A.; Yang, P.; Sheng, B.; Li, P.; Li, H.; Wei, X.; Qin, J.; Kim, J.; Feng, D.D. Modified GAN-cAED to minimize risk of unintentional liver major vessels cutting by controlled segmentation using CTA/SPET-CT. IEEE Trans. Ind. Inform. 2021, 17, 7991–8002. [Google Scholar] [CrossRef]
  17. Li, Y.; Zhang, Y.; Cui, W.; Lei, B.; Kuang, X.; Zhang, T. Dual encoder-based dynamic-channel graph convolutional network with edge enhancement for retinal vessel segmentation. IEEE Trans. Med. Imaging 2022, 41, 1975–1989. [Google Scholar] [CrossRef] [PubMed]
  18. Li, Y.; Zhang, Y.; Liu, J.Y.; Wang, K.; Zhang, K.; Zhang, G.S.; Liao, X.F.; Yang, G. Global Transformer and Dual Local Attention Network via Deep-Shallow Hierarchical Feature Fusion for Retinal Vessel Segmentation. IEEE Trans. Cybern. 2022, 53, 5826–5839. [Google Scholar] [CrossRef] [PubMed]
  19. Li, M.; Zhou, S.; Chen, C.; Zhang, Y.; Liu, D.; Xiong, Z. Retinal vessel segmentation with pixel-wise adaptive filters. In Proceedings of the IEEE 19th International Symposium on Biomedical Imaging, Kolkata, India, 28–31 March 2022; pp. 1–5. [Google Scholar]
  20. Xu, W.; Yang, H.; Zhang, M.; Pan, X.; Liu, W.; Yan, S. Retinal Vessel Segmentation with VAE Reconstruction and Multi-Scale Context Extractor. In Proceedings of the IEEE 19th International Symposium on Biomedical Imaging, Kolkata, India, 28–31 March 2022; pp. 1–5. [Google Scholar]
  21. Huang, X.; Liu, M.Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 172–189. [Google Scholar]
  22. Chartsias, A.; Joyce, T.; Papanastasiou, G.; Semple, S.; Williams, M.; Newby, D.E.; Dharmakumar, R.; Tsaftaris, S.A. Disentangled representation learning in cardiac image analysis. Med. Image Anal. 2019, 58, 101535. [Google Scholar] [CrossRef] [PubMed]
  23. Meng, Q.; Pawlowski, N.; Rueckert, D.; Kainz, B. Representation disentanglement for multi-task learning with application to fetal ultrasound. In The Smart Ultrasound Imaging and Perinatal, Preterm and Paediatric Image Analysis; Springer: Berlin/Heidelberg, Germany, 2019; pp. 47–55. [Google Scholar]
  24. Liu, Y.; Carass, A.; Zuo, L.; He, Y.; Han, S.; Gregori, L.; Murray, S.; Mishra, R.; Lei, J.; Calabresi, P.A.; et al. Disentangled representation learning for octa vessel segmentation with limited training data. IEEE Trans. Med. Imaging 2022, 41, 3686–3698. [Google Scholar] [CrossRef] [PubMed]
  25. Esser, P.; Sutter, E.; Ommer, B. A variational u-net for conditional appearance and shape generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8857–8866. [Google Scholar]
  26. Jiang, K.; Quan, L.; Gong, T. Disentangled representation and cross-modality image translation based unsupervised domain adaptation method for abdominal organ segmentation. Int. J. Comput. Assist. Radiol. 2022, 17, 1101–1113. [Google Scholar] [CrossRef] [PubMed]
  27. Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1501–1510. [Google Scholar]
  28. Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Jorge Cardoso, M. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In The Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Berlin/Heidelberg, Germany, 2017; pp. 240–248. [Google Scholar]
  29. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  30. Staal, J.; Abràmoff, M.D.; Niemeijer, M.; Viergever, M.A.; Van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 2004, 23, 501–509. [Google Scholar] [CrossRef]
  31. Owen, C.G.; Rudnicka, A.R.; Mullen, R.; Barman, S.A.; Monekosso, D.; Whincup, P.H.; Ng, J.; Paterson, C. Measuring retinal vessel tortuosity in 10-year-old children: Validation of the computer-assisted image analysis of the retina (CAIAR) program. Investig. Ophthalmol. Vis. Sci 2009, 50, 2004–2010. [Google Scholar] [CrossRef]
  32. Li, Q.; Feng, B.; Xie, L.; Liang, P.; Zhang, H.; Wang, T. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans. Med. Imaging 2015, 35, 109–118. [Google Scholar] [CrossRef]
  33. Hoover, A.; Kouznetsova, V.; Goldbaum, M. Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans. Med. Imaging 2000, 19, 203–210. [Google Scholar] [CrossRef] [PubMed]
  34. Li, L.; Verma, M.; Nakashima, Y.; Nagahara, H.; Kawasaki, R. Iternet: Retinal image segmentation utilizing structural redundancy in vessel networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 3656–3665. [Google Scholar]
  35. Wang, D.; Haytham, A.; Pottenburgh, J.; Saeedi, O.; Tao, Y. Hard attention net for automatic retinal vessel segmentation. IEEE J. Biomed. Health Inform. 2020, 24, 3384–3396. [Google Scholar] [CrossRef]
  36. Tan, Y.; Yang, K.F.; Zhao, S.X.; Li, Y.J. Retinal vessel segmentation with skeletal prior and contrastive loss. IEEE Trans. Med. Imaging 2022, 41, 2238–2251. [Google Scholar] [CrossRef] [PubMed]
  37. Zhang, J.; Chen, Y.; Bekkers, E.; Wang, M.; Dashtbozorg, B.; ter Haar Romeny, B.M. Retinal vessel delineation using a brain-inspired wavelet transform and random forest. Pattern Recognit. 2017, 69, 107–123. [Google Scholar] [CrossRef]
  38. Alom, M.Z.; Hasan, M.; Yakopcic, C.; Taha, T.M.; Asari, V.K. Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation. arXiv 2018, arXiv:1802.06955. [Google Scholar]
  39. Yan, Z.; Yang, X.; Cheng, K.T. Joint segment-level and pixel-wise losses for deep learning based retinal vessel segmentation. IEEE Trans. Biomed. Eng. 2018, 65, 1912–1923. [Google Scholar] [CrossRef]
  40. Oliveira, A.; Pereira, S.; Silva, C.A. Retinal vessel segmentation based on fully convolutional neural networks. Expert Syst. Appl. 2018, 112, 229–242. [Google Scholar] [CrossRef]
  41. Lü, X.; Shao, F.; Xiong, Y.; Yang, W. Retinal vessel segmentation method based on two-stream networks. Acta Opt. Sin. 2020, 40, 0410002. [Google Scholar]
  42. Wang, B.; Wang, S.; Qiu, S.; Wei, W.; Wang, H.; He, H. CSU-Net: A context spatial U-Net for accurate blood vessel segmentation in fundus images. IEEE J. Biomed. Health Inform. 2020, 25, 1128–1138. [Google Scholar] [CrossRef]
  43. Ye, Y.; Pan, C.; Wu, Y.; Wang, S.; Xia, Y. MFI-Net: Multiscale Feature Interaction Network for Retinal Vessel Segmentation. IEEE J. Biomed. Health Inform. 2022, 26, 4551–4562. [Google Scholar] [CrossRef] [PubMed]
  44. Liu, Y.; Shen, J.; Yang, L.; Yu, H.; Bian, G. Wave-Net: A lightweight deep network for retinal vessel segmentation from fundus images. Comput. Biol. Med. 2023, 152, 106341. [Google Scholar] [CrossRef]
  45. Du, H.; Zhang, X.; Song, G.; Bao, F.; Zhang, Y.; Wu, W.; Liu, P. Retinal blood vessel segmentation by using the MS-LSDNet network and geometric skeleton reconnection method. Comput. Biol. Med. 2023, 153, 106416. [Google Scholar] [CrossRef] [PubMed]
  46. Li, D.; Peng, L.; Peng, S.; Xiao, H.; Zhang, Y. Retinal vessel segmentation by using AFNet. Vis. Comput. 2023, 39, 1929–1941. [Google Scholar] [CrossRef]
  47. Yang, L.; Wang, H.; Zeng, Q.; Liu, Y.; Bian, G. A hybrid deep segmentation network for fundus vessels via deep-learning framework. Neurocomputing 2021, 448, 168–178. [Google Scholar] [CrossRef]
  48. Upadhyay, K.; Agrawal, M.; Vashist, P. Learning multi-scale deep fusion for retinal blood vessel extraction in fundus images. Vis. Comput. 2023, 39, 4445–4457. [Google Scholar] [CrossRef]
  49. Li, J.; Gao, G.; Yang, L.; Liu, Y. GDF-Net: A multi-task symmetrical network for retinal vessel segmentation. Biomed. Signal Process. Control 2023, 81, 104426. [Google Scholar] [CrossRef]
  50. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
  51. Zhuang, J. LadderNet: Multi-path networks based on U-Net for medical image segmentation. arXiv 2018, arXiv:1810.07810. [Google Scholar]
Figure 1. Framework of the proposed UNet-MSFE-DR.
Figure 2. The structure of the proposed UNet-MSFE-DR.
Figure 3. The structure of the designed MSFE.
Figure 4. Visualization of comparative experimental results.
Figure 5. Visualization of the ablation experiments.
Figure 6. Detailed visualization results of the ablation experiments.
Table 4. Ablation results on the DRIVE dataset.

Method | Acc | AUC | Spe | Sen
Baseline | 0.9626 | 0.9751 | 0.9771 | 0.8052
Baseline + MSFE | 0.9680 | 0.9810 | 0.9807 | 0.8213
Baseline + DR | 0.9680 | 0.9829 | 0.9802 | 0.8207
Baseline + MSFE + DR | 0.9690 | 0.9858 | 0.9829 | 0.8230
Table 5. Ablation results on the CHASE_DB1 dataset.

Method | Acc | AUC | Spe | Sen
Baseline | 0.9743 | 0.9723 | 0.9836 | 0.8090
Baseline + MSFE | 0.9754 | 0.9752 | 0.9849 | 0.8254
Baseline + DR | 0.9747 | 0.9775 | 0.9839 | 0.8179
Baseline + MSFE + DR | 0.9757 | 0.9808 | 0.9858 | 0.8263
Table 6. Ablation results on the STARE dataset.

Method | Acc | AUC | Spe | Sen
Baseline | 0.9746 | 0.9689 | 0.9884 | 0.7656
Baseline + MSFE | 0.9765 | 0.9774 | 0.9868 | 0.8198
Baseline + DR | 0.9778 | 0.9783 | 0.9885 | 0.8169
Baseline + MSFE + DR | 0.9765 | 0.9811 | 0.9862 | 0.8296
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

