Article

Lightweight Frequency Recalibration Network for Diabetic Retinopathy Multi-Lesion Segmentation

1 School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
2 Department of Artificial Intelligence and Manufacturing, Hechi University, Hechi 546300, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 6941; https://doi.org/10.3390/app14166941
Submission received: 21 June 2024 / Revised: 5 August 2024 / Accepted: 5 August 2024 / Published: 8 August 2024

Abstract

Automated segmentation of diabetic retinopathy (DR) lesions is crucial for assessing DR severity and for diagnosis. Most previous segmentation methods overlook the detrimental impact of texture information bias, resulting in suboptimal segmentation results. Additionally, the role of lesion shape is not thoroughly considered. In this paper, we propose a lightweight frequency recalibration network (LFRC-Net) for simultaneous multi-lesion DR segmentation, which integrates a frequency recalibration module into the bottleneck layers of the encoder to analyze texture information and shape features together. The module utilizes a Gaussian pyramid to generate features at different scales, constructs a Laplacian pyramid using a difference of Gaussian filter, and then analyzes object features in different frequency domains with the Laplacian pyramid. The high-frequency component handles texture information, while the low-frequency area focuses on learning the shape features of DR lesions. By adaptively recalibrating these frequency representations, our method can differentiate the objects of interest. In the decoder, we introduce a residual attention module (RAM) to enhance lesion feature extraction and efficiently suppress irrelevant information. We evaluate the proposed model's segmentation performance on two public datasets, IDRiD and DDR, and a private ultra-wide-field fundus image dataset. Extensive comparative experiments and ablation studies are conducted across multiple datasets. With minimal model parameters, our approach achieves an mAP_PR of 60.51%, 34.83%, and 14.35% for the segmentation of EX, HE, and MA on the DDR dataset and also obtains excellent results for EX and SE on the IDRiD dataset, which validates the effectiveness of our network.

1. Introduction

Diabetic retinopathy (DR), now a prevalent medical issue among diabetics globally, stands as a primary cause of blindness in the working-age population [1]. It is estimated that 93 million individuals worldwide are affected by DR [2,3]. The condition is characterized by symptoms such as microaneurysms (MAs), hemorrhages (HEs), soft exudates (SEs), and hard exudates (EXs) [4,5], as illustrated in Figure 1, which are fundamental in ophthalmological diagnoses.
At present, there is no absolute cure for this condition. The most efficient approach entails early detection and intervention to control its advancement [6]. In the clinic, ophthalmologists must manually examine lesions in fundus images to screen for DR, which is not only time-intensive but also influenced by the subjective judgment of doctors, posing a challenge to the reliability of detection. Therefore, developing an automated method for lesion segmentation is pivotal in the diagnosis of DR.
There are considerable variations in the shape, size, and appearance of each lesion type among different individuals, and different lesions may also manifest similar features. MAs and HEs typically present lower intensities in images, while EXs and SEs tend to be brighter. The intra-class differences and inter-class similarities make the segmentation of DR an exceptionally complex endeavor. The diminished contrast and clarity between a lesion and normal areas challenge segmentation and detection approaches.
A variety of image segmentation methods have emerged, and among them, those based on deep learning outperform conventional machine learning techniques. Convolutional neural networks (CNNs) have gained widespread adoption due to their proficiency in extracting intricate layers of features from datasets. U-Net and related models [7,8,9,10] are frequently utilized in medical image segmentation. Most of them focus on augmenting the architecture of the original U-Net by embedding attention mechanisms and diverse nonlinear functions within the convolutional layers. When applied to retinal fundus image segmentation, they are often hampered by inadequate feature extraction. To address this issue, some approaches propose utilizing different domains to extract distinct features. Li et al. [11] proposed using the high-frequency domain to guide retinal vessel segmentation, which addressed the issue of previous networks being too sensitive to low-frequency noise in fundus images. HGC-Net [12] enhanced vessel segmentation in fundus images by extracting high-frequency components to highlight vascular structures. Li et al. [13] proposed an annotation-free restoration network for cataractous fundus images, which utilized high-frequency components extracted from fundus images in place of segmentation maps to preserve the retinal structure. However, these methods overlooked the influence of texture information bias in high-frequency domains on segmentation performance.
To address these challenges, and inspired by [14], we introduce a model named LFRC-Net to simultaneously segment four types of DR lesions. First, an improved frequency recalibration module (FRCM) is incorporated at the top of the encoder to capture a wider range of texture and shape characteristics of lesions. The Laplacian pyramid is integrated into the frequency recalibration module, where the low-frequency domain is used to learn shape information and the high-frequency area to learn texture-based features, so that noise is diminished in the final segmentation results. Subsequently, a weighted combination is introduced to aggregate features across all levels of the Laplacian pyramid, with the weights reflecting the significance of each level to the segmentation result; this focuses more on features with rich information and suppresses noisy attributes through the global embedding of the channels. Finally, a residual attention module comprising channel attention and spatial attention is incorporated into the decoder, where the channel attention is used to identify features of interest and the spatial attention to pinpoint their locations. By doing this, the interference of irrelevant features is further mitigated, while the pertinent lesions are enhanced.

2. Methods

The proposed LFRC-Net, illustrated in Figure 2, adopts an encoder–decoder architecture as its backbone. A frequency recalibration module (FRCM) is placed at the bottleneck layer of the encoder, and residual attention modules are integrated into the decoder.
The process begins by randomly choosing patch centers within regions of interest in the original image to form 256 × 256 patches, which are then fed into the encoder. The encoded feature maps are subsequently refined by the residual attention module (RAM) and the FRCM to filter out extraneous features.
The refined feature maps are directed to the decoder, which involves three stages, each with a residual attention module. Up-sampling of the feature maps is performed by bilinear interpolation, while the skip connections pass through 1 × 1 convolutions, batch normalization (BN) layers, and ReLU activation, where the 1 × 1 convolutions reduce the channel size of the feature maps from the encoder. The number of convolution kernels is denoted by K and can be adjusted to keep the model's parameter size appropriate. The segmentation of the four types of DR lesions is obtained through a 1 × 1 convolution layer followed by softmax activation.
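To make the decoder stage concrete, the following minimal Keras sketch shows one up-sampling stage and the segmentation head; the layer width k, the tensor names, and the background class are illustrative assumptions rather than the exact configuration used in the paper.

```python
# Sketch of one decoder stage: bilinear up-sampling, a 1x1 conv + BN + ReLU on the
# skip connection, and concatenation, followed by the softmax segmentation head.
from tensorflow.keras import layers

def decoder_stage(x, skip, k=64):
    """Up-sample decoder features and merge them with an encoder skip connection."""
    x = layers.UpSampling2D(size=2, interpolation="bilinear")(x)   # bilinear up-sampling
    s = layers.Conv2D(k, kernel_size=1, use_bias=False)(skip)      # 1x1 conv reduces skip channels to k
    s = layers.BatchNormalization()(s)
    s = layers.Activation("relu")(s)
    return layers.Concatenate()([x, s])                            # fuse decoder and skip features

def segmentation_head(x, num_classes=5):
    """Final 1x1 convolution with softmax giving per-pixel class probabilities (4 lesions + background assumed)."""
    return layers.Conv2D(num_classes, kernel_size=1, activation="softmax")(x)
```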

2.1. Encoder Module

MobileNet V2 [15], pretrained on the ImageNet dataset [16], is taken as the backbone of the encoder of our model and consists of two modules: an inverted residual module with stride 1 and a down-sampling module with stride 2. The inverted residual structure is the most notable innovation of MobileNet V2 over MobileNet V1: it expands and then reduces the dimensionality through 1 × 1 convolutions, preserving richer channel information while reducing the number of learnable parameters. The inverted residual structure also removes the nonlinear function during the dimension-reduction phase, retaining only the 1 × 1 convolution, which not only preserves feature diversity but also enhances the expressive capability of the model. Unlike standard convolution, depth-wise convolution applies a filter to each input channel separately and then combines the results, which significantly reduces the computational cost and the number of parameters. MobileNet V2 is illustrated in Figure 3, where c, n, s, and Dwise represent the channel number, module number, stride, and depth-wise convolution, respectively, while t denotes the scaling factor of the channel number within the inverted residual module.
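As a rough illustration of this design choice, the encoder can be instantiated from the ImageNet-pretrained MobileNetV2 shipped with Keras; the specific layers tapped for skip connections below are common choices for MobileNetV2-based U-Nets and are assumptions here, not the paper's exact tap points.

```python
# A hedged sketch: ImageNet-pretrained MobileNetV2 as a lightweight encoder.
import tensorflow as tf

def build_encoder(input_shape=(256, 256, 3)):
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    layer_names = [              # feature maps at decreasing resolution (assumed tap points)
        "block_1_expand_relu",   # 1/2
        "block_3_expand_relu",   # 1/4
        "block_6_expand_relu",   # 1/8
        "block_13_expand_relu",  # 1/16, bottleneck features fed to the FRCM
    ]
    outputs = [base.get_layer(name).output for name in layer_names]
    return tf.keras.Model(inputs=base.input, outputs=outputs, name="mobilenetv2_encoder")
```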

2.2. Frequency Recalibration Module (FRCM)

Retinal fundus images are more sensitive to variations in lighting than images of natural scenes, and the morphological features of lesions are degraded under different lighting conditions. Traditional convolutional neural networks (CNNs) tend to extract features based on texture rather than shape [17,18], which limits their ability to utilize useful low-frequency shape information [14]. The frequency recalibration module (FRCM) is proposed to extract both texture and shape features, as illustrated in Figure 4.
The FRCM has three components: a Gaussian pyramid, a Laplacian pyramid, and a frequency attention module. The input feature map is transformed into a Laplacian pyramid using the difference of Gaussians (DoG): two-dimensional Gaussian kernels with different variances are generated from Gaussian functions to form a Gaussian pyramid, and the Laplacian pyramid is obtained by subtracting different levels of the Gaussian pyramid. The computational formulas for the Gaussian and Laplacian pyramids are presented in Equations (1) and (2), respectively:
$$G_l(x) = x \ast \frac{1}{\sigma_l \sqrt{2\pi}}\, e^{-\frac{i^2 + j^2}{2\sigma_l^2}} \quad (1)$$
$$LP_l = \begin{cases} G_0, & l = 1 \\ G_{l-2} - G_{l-1}, & 2 \le l \le 5 \end{cases} \quad (2)$$
where $H$, $W$, and $C$ denote the height, width, and number of channels of the feature map, respectively, $x \in \mathbb{R}^{H \times W \times C}$ the input feature map of the FRCM, $l$ the $l$-th level of the Gaussian pyramid ranging from $G_0$ to $G_4$, $\sigma_l$ the variance of the Gaussian pyramid in the $l$-th layer, $i$ and $j$ the two-dimensional Gaussian kernel coordinates, $\ast$ the convolution operation, and $LP_l$ the $l$-th layer of the Laplacian pyramid.
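A minimal NumPy/SciPy sketch of Equations (1) and (2) is given below; the sigma values are illustrative assumptions, and scipy.ndimage.gaussian_filter plays the role of the two-dimensional Gaussian convolution.

```python
# Gaussian pyramid G_0..G_4 and Laplacian (DoG) pyramid LP_1..LP_5, all kept at the
# input resolution, following Equations (1)-(2). Sigma values are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_pyramid(x, sigmas=(1.0, 2.0, 4.0, 8.0, 16.0)):
    """x: feature map of shape (H, W, C); returns a list of five frequency bands."""
    gauss = [gaussian_filter(x, sigma=(s, s, 0)) for s in sigmas]  # blur spatial dims only
    lap = [gauss[0]]                                               # LP_1 = G_0
    for l in range(1, len(gauss)):
        lap.append(gauss[l - 1] - gauss[l])                        # LP_l = G_{l-2} - G_{l-1}
    return lap

levels = laplacian_pyramid(np.random.rand(64, 64, 32).astype(np.float32))  # toy example
```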
The Laplacian pyramid consists of layers at different frequencies, and a frequency attention module inspired by ECA-Net [19] is introduced in this paper to recalibrate them. First, each layer of the Laplacian pyramid undergoes global max pooling to acquire global spatial information and generate weights for all input channels. Then, different from [14], a one-dimensional convolution with shared weights is used to capture the channel dependencies of each layer instead of a fully connected layer, which reduces the number of model parameters and enhances inter-channel information interaction. The calculations of the global max pooling and the inter-channel dependencies of the feature maps in each layer are shown in Equations (3) and (4), respectively. The recalibrated feature map for each channel is computed by multiplying the learned weight with the input channel features: $\widetilde{LP}_l^f = w_l^f \cdot LP_l^f$.
$$GMP_l^f = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} LP_l^f(i, j) \quad (3)$$
$$w_l^f = \sigma\left(\mathrm{1DConv}\left(GMP_l^f\right)\right) \quad (4)$$
where $H$ and $W$ represent the height and width of the feature map, respectively, $LP_l^f$ the $f$-th channel of the feature map in the $l$-th layer, $GMP_l^f$ the output of $LP_l^f$ after global max pooling, $w_l^f$ the learned weight for the $f$-th channel of the $l$-th layer, $\mathrm{1DConv}$ and $\sigma$ the one-dimensional convolution and sigmoid functions, respectively, and $\widetilde{LP}_l^f$ the output obtained from $LP_l^f$.
After recalibrating the feature maps of each layer, the frequency attention module fuses the features from all layers through 3D convolutional operations to produce the final output $\hat{x}$ of the FRCM, which can be formulated as follows:
$$\hat{x} = \sigma\left(\sum_{l=1}^{L} w_l \widetilde{LP}_l^f\right)$$
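The recalibration step can be sketched as follows in TensorFlow (written for eager execution): each Laplacian level is globally pooled as in Equation (3), passed through a shared-weight one-dimensional convolution and sigmoid as in Equation (4), and the recalibrated levels are fused into the FRCM output. The kernel size of 3 and the simple summation fusion below are assumptions for illustration, not the paper's exact configuration.

```python
# A hedged sketch of the frequency attention module (Eqs. (3)-(4) and the fusion step).
import tensorflow as tf
from tensorflow.keras import layers

def frequency_attention(levels):
    """levels: list of Laplacian-pyramid tensors, each of shape (B, H, W, C)."""
    shared_conv1d = layers.Conv1D(1, kernel_size=3, padding="same")  # shared across levels
    recalibrated = []
    for lp in levels:
        pooled = tf.reduce_mean(lp, axis=[1, 2])                 # global pooling per channel, Eq. (3)
        w = shared_conv1d(tf.expand_dims(pooled, axis=-1))       # cross-channel interaction, (B, C, 1)
        w = tf.sigmoid(tf.reshape(w, (-1, 1, 1, lp.shape[-1])))  # per-channel weights w_l, Eq. (4)
        recalibrated.append(lp * w)                              # recalibrated level: w_l * LP_l
    return tf.sigmoid(tf.add_n(recalibrated))                    # fuse all frequency bands
```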

2.3. Residual Attention Module (RAM)

This paper introduces residual attention modules into the decoder to improve the segmentation performance of our LFRC-Net, as illustrated in Figure 5. Each module combines a residual block with a hybrid channel–spatial attention block. The residual block is incorporated to mitigate the issue of gradient vanishing, the channel attention block efficiently identifies and utilizes pivotal feature channels, and the spatial attention block concentrates on prominent local area features.
The features $x \in \mathbb{R}^{H \times W \times C}$ are input into a residual block comprising a depthwise separable convolution layer, a DropBlock layer, a BN layer, and a ReLU activation function to obtain the output $CL(x)$. Notably, the kernel size of both depthwise separable convolution layers in the decoder is set to 3. Then $CL(x)$ proceeds through a hybrid channel–spatial attention module with a channel attention mechanism and a spatial attention mechanism, as depicted in Figure 6.
In the channel attention mechanism, $CL(x)$ is fed into average pooling and max pooling to capture the global information of each channel. The channel attention weight vector is then obtained by a shared-weight one-dimensional convolution and a sigmoid function and is multiplied with $CL(x)$ to obtain the output of the channel attention block, $CA(x)$.
In the spatial attention mechanism, the spatial location features of $CA(x)$ are obtained by summing the results of average pooling and max pooling. The spatial attention weight vector is acquired through a 7 × 7 convolution and a sigmoid function and is multiplied with $CA(x)$ to obtain the final output of the RAM, $x_1 \in \mathbb{R}^{H \times W \times C}$. The residual attention module is calculated by Equations (5)–(7):
$$CL(x) = \mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{DropBlock}\left(\mathrm{Sep}(x)\right)\right)\right) \quad (5)$$
$$CA(x) = CL \times \sigma\left(\mathrm{1DConv}\left(\mathrm{MP}(CL)\right) + \mathrm{1DConv}\left(\mathrm{AP}(CL)\right)\right) \quad (6)$$
$$x_1 = CA \times \sigma\left(\mathrm{Conv}\left(\mathrm{MP}(CA) + \mathrm{AP}(CA)\right)\right) \quad (7)$$
where $\mathrm{Sep}$ denotes depthwise separable convolution, $CA$ channel attention, $\mathrm{BN}$ batch normalization, $CL$ the output of the first convolution layer, $\mathrm{MP}$ and $\mathrm{AP}$ max pooling and average pooling, $\mathrm{1DConv}$ one-dimensional convolution, and $\mathrm{Conv}$ the $7 \times 7$ convolution. The input and output features are $x \in \mathbb{R}^{H \times W \times C}$ and $x_1 \in \mathbb{R}^{H \times W \times C}$, respectively.
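A hedged Keras sketch of the RAM, written for eager execution, is shown below. DropBlock is not a built-in Keras layer, so SpatialDropout2D stands in for it; the residual shortcut is implemented with an assumed 1 × 1 projection, and the drop rate is likewise an assumption.

```python
# Residual attention module following Eqs. (5)-(7): residual block, then channel
# attention (shared 1D conv over pooled channel vectors), then spatial attention
# (7x7 conv over the summed channel-wise max/average maps).
import tensorflow as tf
from tensorflow.keras import layers

def residual_attention_module(x, filters):
    # Residual block, Eq. (5); SpatialDropout2D stands in for DropBlock here.
    cl = layers.SeparableConv2D(filters, 3, padding="same")(x)
    cl = layers.SpatialDropout2D(0.1)(cl)
    cl = layers.BatchNormalization()(cl)
    cl = layers.Activation("relu")(cl)
    cl = cl + layers.Conv2D(filters, 1)(x)                       # assumed 1x1-projected shortcut

    # Channel attention, Eq. (6): shared 1D conv over max- and average-pooled vectors.
    conv1d = layers.Conv1D(1, 3, padding="same")
    mp = tf.expand_dims(tf.reduce_max(cl, axis=[1, 2]), -1)      # (B, C, 1)
    ap = tf.expand_dims(tf.reduce_mean(cl, axis=[1, 2]), -1)
    ca_w = tf.sigmoid(tf.reshape(conv1d(mp) + conv1d(ap), (-1, 1, 1, filters)))
    ca = cl * ca_w

    # Spatial attention, Eq. (7): 7x7 conv over summed channel-wise max and average maps.
    sp = tf.reduce_max(ca, axis=-1, keepdims=True) + tf.reduce_mean(ca, axis=-1, keepdims=True)
    sa_w = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(sp)
    return ca * sa_w

x1 = residual_attention_module(tf.random.normal((1, 64, 64, 32)), filters=32)  # toy usage
```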

2.4. Loss Function

The cross-entropy loss is used to measure the pixel-level segmentation error for the four DR lesions, and the formula is shown in Equation (8):
$$L_{CE}(p, q) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} p_{ij} \log q_{ij} \quad (8)$$
where $N$ represents the number of samples, $M$ the number of classes, $p_{ij}$ the ground-truth label for the $j$-th class of the $i$-th sample (1 indicates belonging to the class and 0 otherwise), and $q_{ij}$ the predicted probability for the $j$-th class of the $i$-th sample in the model's output.
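A minimal sketch of Equation (8) in TensorFlow, treating each pixel as a sample with one-hot labels p and softmax outputs q; the epsilon clip is a standard numerical-stability assumption.

```python
import tensorflow as tf

def cross_entropy_loss(p, q, eps=1e-7):
    """p, q: tensors of shape (N, H, W, M) with one-hot labels and predicted probabilities."""
    q = tf.clip_by_value(q, eps, 1.0)
    per_pixel = -tf.reduce_sum(p * tf.math.log(q), axis=-1)  # sum over the M classes
    return tf.reduce_mean(per_pixel)                          # average over samples and pixels
```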

3. Results and Discussion

We validate the proposed architecture on two public datasets, IDRID and DDR, as well as one local dataset of ultra-wide-field fundus images. All experiments are performed on an NVIDIA GeForce RTX 2080 Ti GPU (NVIDIA, Santa Clara, CA, USA) with 12 GB of video memory.

3.1. Datasets

The IDRID dataset [20] was launched at the 2018 Biomedical Retinal Image Challenge International Conference for segmentation and grading. For the segmentation of EX, HE, MA, and SE, it provides 81 color fundus images of size 4288 × 2848, of which 54 are used for training and 27 for testing. In total, there are 81 EX annotations, 81 MA annotations, 80 HE annotations, and 40 SE annotations. In this paper, these images undergo cropping, zero-padding, and resizing to 640 × 640. Considering the limited number of images in this dataset, data augmentation methods such as horizontal flipping, vertical flipping, random rotation, and contrast limited adaptive histogram equalization (CLAHE) are applied to prevent overfitting. Throughout the experiments, these images are randomly cropped into patches of size 256 × 256 to better extract local information.
The DDR dataset [21] contains 757 Chinese DR images with sizes from 1380 × 1382 to 2736 × 1824 , with 383 images for training, 225 for testing, and 149 for validation. These fundus images are provided with pixel-level annotation for EX, HE, MA, and SE if the image has this type of lesion. In total, there are 486 EX annotations, 570 MA annotations, 601 HE annotations, and 239 SE annotations. In this paper, all images are cropped and resized to 512 × 512 , and CLAHE and random cropping into patches of size 112 × 112 are performed to enhance the data.
The ultra-wide-field fundus images dataset is a local dataset collected and organized by the Xinhua Hospital affiliated with Shanghai Jiao Tong University School of Medicine and captured under a 200-degree field of view (FOV) with 261 fundus images of 3900 × 3072 and pixel-level annotations of EX and HE. In the experiments, all images are resized to 1300 × 1024 . A total of 165 images are used for training and 96 for testing. During training and testing, these images are randomly cropped into patches of size 256 × 256, and 20% of the patches from the training set are randomly selected to serve as the validation set in each iteration.
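The common preprocessing steps across the three datasets (CLAHE enhancement and random patch extraction) can be sketched as below with OpenCV; the CLAHE clip limit and tile grid are typical defaults, and the uniform patch sampling is a simplification of the ROI-centered cropping described above; all of these settings are assumptions.

```python
# Hedged preprocessing sketch: CLAHE on the lightness channel plus random patches.
import cv2
import numpy as np

def apply_clahe(bgr_image, clip_limit=2.0, tile_grid=(8, 8)):
    """Contrast limited adaptive histogram equalization on the L channel of LAB."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

def random_patch(image, mask, size=256, rng=None):
    """Crop an aligned image/mask patch at a uniformly random location."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    return image[y:y + size, x:x + size], mask[y:y + size, x:x + size]
```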

3.2. Training Parameter and Evaluation

The experiments are conducted in Keras and optimized using the Adam (adaptive moment estimation) optimizer. Considering the stability and speed of the training process, the initial learning rate is set to 0.001. Based on the sizes and complexities of the IDRID, DDR, and ultra-wide-field fundus image datasets and the need to reduce the risk of overfitting, the training epochs for the three datasets are set to 100, 150, and 100, respectively. Considering the scales of the datasets, the convergence speed of the model, and the hardware memory usage, the batch sizes for the three datasets are 8, 8, and 16, respectively. During training, the model weights with the lowest validation loss are retained for testing.
In this paper, quantitative evaluation metrics include accuracy, recall, specificity, precision, and F1, and ROC (receiver operating characteristic) and PR (precision–recall) curves are plotted. For fundus image segmentation, we primarily focus on the area under the ROC and PR curves. The ROC curve has the false-positive rate (FPR) on the x axis and the true-positive rate (TPR) on the y axis. The larger the area under the curve (AUC-ROC), the better the model’s performance at different thresholds. The PR curve has recall on the horizontal axis and precision on the vertical axis. The larger the area under the curve (mAP), the better the model’s performance in predicting the positive class.
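These pixel-level curve metrics can be computed per lesion class with scikit-learn, as in the brief sketch below (flattened probability maps against binary ground-truth masks).

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def lesion_metrics(prob_map, gt_mask):
    """prob_map, gt_mask: arrays of shape (H, W) for a single lesion class."""
    y_true = gt_mask.reshape(-1).astype(int)
    y_score = prob_map.reshape(-1)
    return {
        "AUC_ROC": roc_auc_score(y_true, y_score),
        "mAP": average_precision_score(y_true, y_score),  # area under the PR curve
    }
```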

3.3. Experimental Results

Table 1 gives the quantitative analysis results on IDRID. The ROC and PR curves on IDRID given by DRNet, FRCU-Net, and our network are illustrated in Figure 7. The best results are indicated in bold. The results of [22,23,24,25,26,27,28,29,30,31] are taken from the original papers, while those of DRNet [32] and FRCU-Net [14] are given by our own implementations on this dataset. Our method achieves the highest AUC_ROC for HE and SE among all methods. The AUC_ROC for MA is 0.51% lower than that of DRNet [32]. Additionally, the mAP of our method for both EX and SE ranks first among all methods. The mAP for EX is 9.26% higher than [31] and 0.46% higher than [29]. We also obtain the highest accuracy and specificity for the segmentation of all four types of DR lesions. Compared to the second-ranked [29], the mAPs of EX and SE increase by 0.46% and 0.41%, respectively. At the same time, the F1 score for EX ranks second among all methods, only 0.06% lower than [22].
Table 2 presents the segmentation results of the four DR lesions on DDR. The experimental results for HED [33], DeepLab v3+ [34], U-Net [35], and L-seg [36] come from RTNet [37], while those of [28,29,37,38,39] are taken from their own publications. Most of the compared methods only report the mAP and AUC_ROC of the four types of DR lesions, so we mainly discuss these two evaluation metrics on this dataset. For the segmentation of EX and MA, the mAP (60.51% and 14.35%) and AUC_ROC (97.74% and 95.64%) of our model are the best among all models; the mAP of HE segmentation is 0.84% higher than that of the second-ranked U-Net [35]. Regarding the segmentation of SE, the AUC_ROC is 8.18% higher than that of RTNet [37]. The ROC and PR curves of our network on DDR are shown in Figure 8.
Table 3 shows the quantitative analysis results on the ultra-wide-field fundus images. For the segmentation of EX, the mAP, F1, recall, and other metrics of LFRC-Net are the best among all the compared methods: the mAP is 57.90%, F1 is 52.77%, and recall is 41.95%. Regarding HE segmentation, our approach achieves the best AUC_ROC, mAP, and recall and the second-best F1 (2.27% lower than SA_Unet [40]). The ROC and PR curves of the different models on this dataset are shown in Figure 9.
We employ the lightweight MobileNetV2 as the encoder of our network to reduce training time and memory usage. Table 4 shows a comparison of the different lightweight model parameters on IDRID. As illustrated in the comparative results, our model boasts only 0.95 M parameters, the smallest number among all the methods evaluated.

3.4. Ablation

Ablation experiments are conducted to validate the effectiveness of the proposed RAM and FRCM on the IDRID, DDR, and ultra-wide-field fundus images datasets, which are presented separately in Table 5, Table 6 and Table 7.
For Table 5 on IDRID, using only the baseline network, the mAP, F1, accuracy, and sensitivity of the four DR lesions are the worst. After introducing RAM, the mAP and F1 of MA increase by 15.02% and 14.92%, respectively, indicating that RAM focuses more on the areas of interest and better locates lesion regions. However, compared to solely introducing RAM, the F1 score of MA decreases by 2.45% after the implementation of the FRCM. This decline is partly due to the uneven distribution of MA lesions and their very small size, which poses challenges for accurately locating MA and segmenting based on their shapes. In contrast to solely introducing the FRCM, simultaneously introducing RAM and the FRCM increases the F1 score of MA by 1.22%, which indicates that RAM can assist in accurately locating the positions of small lesions, thereby enhancing segmentation performance. Although the simultaneous introduction of RAM and the FRCM leads to a 2.48% decrease in the F1 score of SE, the mAP increases by 12.16%; this suggests a significant improvement in prediction accuracy, making the performance loss acceptable. Furthermore, compared to the baseline network, the mAP of EX, HE, MA, and SE increases by 5.17%, 4.93%, 10.57%, and 21.45%, respectively, and the F1 improves by 6.63%, 7.86%, 13.69%, and 17.27% after introducing the two modules, which demonstrates that the low-frequency domain is more effective in segmenting lesion areas and reducing the impact of texture information bias. Overall, our network achieves the highest AUC_ROC and mAP for EX and SE, while the mAP and F1 of HE also reach optimal levels, which indicates the beneficial effects of introducing RAM and the FRCM on enhancing lesion segmentation.
For Table 6 on DDR, the baseline network exhibits relatively low performance, with F1 scores for the four DR lesions of 43.97%, 21.57%, 7.98%, and 25.71%. After the introduction of RAM, the F1 of EX, MA, and SE increases by 2.34%, 0.32%, and 0.68%, respectively. After solely introducing the FRCM, the F1 score of HE decreases by 3.36%. This is because of the lower image quality of the DDR dataset and the influence of lighting conditions, which lead to an incomplete presentation of the shape of HE lesions in some images and thereby affect the segmentation of HE. However, following the adoption of the FRCM, the AUC_ROC, mAP, and accuracy of the four types of DR lesions are all optimal, and the F1 scores of EX, MA, and SE are also the highest. These analyses indicate that both modules help mitigate interference from background information and enhance the segmentation ability of our network.
As is shown in Table 7, after introducing RAM, the AUC_ROC, mAP, and F1 scores for both EX and HE improve. Following the introduction of the FRCM and compared to solely introducing RAM, although the F1 score of EX decreases by 0.81%, the mAP increases by 5.47%. Such performance loss is deemed acceptable. Overall, with the introduction of both RAM and the FRCM, the mAP, accuracy, and recall for EX are optimal, and similarly for HE, the AUC_ROC, mAP, F1, accuracy, and recall are also the highest. This result reinforces the conclusion we mentioned above and demonstrates the effectiveness of our network in segmenting multiple DR lesions on different datasets.

3.5. Visualization

The visualization results of DRNet, FRCU-Net, and our network for the four types of DR lesions on the IDRID dataset are shown in Figure 10. Red boxes denote the labelled lesion areas, while blue boxes indicate the segmentation results. The first and second rows display the segmentation of EX, the third and fourth rows HE, the fifth and sixth rows SE, and the seventh and eighth rows MA. In the segmentation of EX, it is obvious that our network can segment lesion areas where DRNet [32] and FRCU-Net [14] fail. For HE, the third and fourth rows show that our network achieves more precise segmentation than the other networks, reducing the possibility of mis-segmentation. For SE, our algorithm is also more robust and obtains better segmentation results. This demonstrates that our network improves segmentation performance by simultaneously utilizing the shape and texture information of DR lesions, reducing interference from irrelevant features. As far as MA is concerned, there is little difference between the three algorithms, because MA occupies only a few pixels, which challenges all the compared segmentation methods.
For the visualization on the ultra-wide-field fundus images in Figure 11, the first and second rows show the segmentation results of EX and the third and fourth rows those of HE. Green boxes represent lesion areas and, for clarity, red boxes provide magnified views of the green ones, while blue boxes denote the segmentation results of the DR lesions. For EX, it is evident that our model accurately identifies the lesion areas. The third row shows that our network is also superior to the other models even when the lesion areas are relatively scattered, as with HE.
Figure 12, Figure 13 and Figure 14 demonstrate the segmentation results of our network on IDRID, DDR, and ultra-wide-field fundus image datasets separately. The first row shows the original images, the second row presents the ground truth, and the third row showcases the segmentation achieved by our model. In the segmentations depicted in Figure 12 and Figure 13, our network simultaneously segments four DR lesions. Blue, red, green, and yellow represent the segmentation results of EX, HE, SE, and MA, respectively. Due to the inconsistent image quality and significant impact of lighting in the DDR dataset, its segmentation performance is not as good as that of the IDRID dataset.
Figure 14 illustrates our network’s segmentation on ultra-wide-field fundus images. Our network simultaneously segments both EX and HE with blue and red, respectively. The low contrast of these fundus images and the sparsity of lesions make the segmentation full of challenges. Furthermore, there is a lot of interference in the image, such as eyelids and eyelashes, while the region of interest is only a small part. Our approach achieves higher-precision segmentation and a more comprehensive delineation of regions by utilizing attention mechanisms despite these factors.

4. Potential Clinical Implications

DR is a leading cause of adult blindness. This research contributes to the early detection and intervention of DR to prevent the progression of the disease. By utilizing LFRC-Net for automatic segmentation and analysis of retinal images, our network can locate the lesion areas, which improves diagnostic accuracy and reduces the workload of ophthalmologists. Furthermore, in this paper, multiple experiments validate the effectiveness of LFRC-Net in DR lesion segmentation. Thus, LFRC-Net can be considered for integration into existing diagnostic workflows, which primarily include data acquisition and preprocessing, lesion detection and segmentation, report generation, and clinical decision support.

5. Conclusions

In this paper, we propose a novel architecture called LFRC-Net that combines a frequency recalibration module with a residual attention module to simultaneously segment four types of DR lesions. The outstanding performance of our network is attributed to the frequency recalibration module to reduce texture information bias and effectively leverage the shape information of lesions for precise DR lesion segmentation. Additionally, the residual attention module concentrates on the lesion regions, which further improves the segmentation performance. The experiments on three datasets demonstrate the effectiveness of our network on segmenting multiple DR lesions.
Despite the remarkable achievements of LFRC-Net on regular field-of-view retinal images, it still faces significant challenges in multi-lesion segmentation on ultra-wide-field fundus images, which are characterized by poor quality, low contrast, sparse lesion distribution, and small lesion shapes. This will be a focal point of our future work.

Author Contributions

Methodology, Y.F., M.L. and G.Z.; software, M.L. and G.Z.; validation, Y.F., M.L. and G.Z.; formal analysis, M.L.; investigation and resources, M.L. and G.Z.; data curation, Y.F., M.L. and J.P.; writing—original draft preparation, Y.F. and M.L.; writing—review and editing, Y.F. and M.L.; visualization, M.L. and G.Z.; supervision, J.P.; project administration, Y.F. and J.P.; funding acquisition, Y.F. and J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by National Natural Science Foundation of China (62176244, U2241245 and 62276167) and Key Laboratory of AI and Information Processing, Education Department of Guangxi Zhuang Autonomous Region (Hechi University), No. 2024GXZDSY009.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the first author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ciulla, T.A.; Amador, A.G.; Zinman, B. Diabetic retinopathy and diabetic macular edema: Pathophysiology, screening, and novel therapies. Diabetes Care 2003, 26, 2653–2664. [Google Scholar] [CrossRef] [PubMed]
  2. Singh, R.P. Managing Diabetic Eye Disease in Clinical Practice; Springer: Berlin/Heidelberg, Germany, 2015; Volume 26, pp. 2653–2664. [Google Scholar]
  3. Wild, S.; Roglic, G.; Green, A.; Sicree, R.; King, H. Global prevalence of diabetes: Estimates for the year 2000 and projections for 2030. Diabetes Care 2004, 27, 1047–1053. [Google Scholar] [CrossRef] [PubMed]
  4. Salamat, N.; Missen, M.M.S.; Rashid, A. Diabetic retinopathy techniques in retinal images: A review. Artif. Intell. Med. 2019, 97, 168–188. [Google Scholar] [CrossRef] [PubMed]
  5. Stolte, S.; Fang, R. A survey on medical image analysis in diabetic retinopathy. Med. Image Anal. 2020, 64, 101742. [Google Scholar] [CrossRef] [PubMed]
  6. Wong, T.Y.; Sun, J.; Kawasaki, R.G.; Ruamviboonsuk, P.; Gupta, N.; Lansingh, V.C.; Maia, M.; Mathenge, W.; Moreker, S.; Muqit, M.M. Guidelines on diabetic eye care: The international council of ophthalmology recommendations for screening, follow-up, referral, and treatment based on resource settings. Ophthalmology 2018, 125, 1608–1622. [Google Scholar] [CrossRef] [PubMed]
  7. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference; Nassir, N., Joachim, H., William, M.W., Alejandro, F.F., Eds.; Springer: Munich, Germany, 2015; pp. 234–241. [Google Scholar]
  8. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop; Danail, S., Zeike, T., Gustavo, C., Tanveer, S.M., Eds.; Springer: Granada, Spain, 2018; pp. 3–11. [Google Scholar]
  9. Fan, D.P.; Ji, G.P.; Zhou, T.; Chen, G.; Fu, H.Z.; Shen, J.B.; Shao, L. Pranet: Parallel reverse attention network for polyp segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Anne, L.M., Purang, A., Danail, S., Dinana, M., Maria, A.Z.S., Kevin, K.Z., Daniel, R., Leo, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; pp. 263–273. [Google Scholar]
  10. Fu, Y.H.; Liu, J.F.; Shi, J. TSCA-Net: Transformer based spatial-channel attention segmentation network for medical images. Comput. Biol. Med. 2024, 170, 107938. [Google Scholar] [CrossRef] [PubMed]
  11. Li, H.J.; Li, H.; Qiu, Z.X.; Hu, Y.; Liu, J. Domain adaptive retinal vessel segmentation guided by high-frequency component. In Ophthalmic Medical Image Analysis; Bhavna, A., Huazhu, F., Cecilia, S.L., Tom, M., Yanwu, X., Yalin, Z., Eds.; Springer: Singapore, 2022; pp. 115–124. [Google Scholar]
  12. Li, H.J.; Li, H.; Shu, H.; Chen, J.Y.; Hu, Y.; Liu, J. Self-supervision boosted retinal vessel segmentation for cross-domain data. In Proceedings of the 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), Cartagena, Colombia, 18–21 April 2023; IEEE: Cartagena, Colombia, 2023; pp. 1–5. [Google Scholar]
  13. Li, H.; Liu, H.F.; Hu, Y.; Fu, H.Z.; Zhao, Y.T.; Miao, H.P.; Liu, J. An annotation-free restoration network for cataractous fundus images. IEEE Trans. Med. Imaging 2022, 41, 1699–1710. [Google Scholar] [CrossRef] [PubMed]
  14. Azad, R.; Bozorgpour, A.; Asadi-Aghbolaghi, M.; Merhof, D.; Escalera, S. Deep frequency re-calibration u-net for medical image segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; IEEE Computer Society: Piscataway, NJ, USA, 2021; pp. 3274–3283. [Google Scholar]
  15. Sandler, M.; Howard, A.; Zhu, M.L.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; IEEE Computer Society: Piscataway, NJ, USA, 2018; pp. 4510–4520. [Google Scholar]
  16. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  17. Azad, R.; Fayjie, A.R.; Kauffmann, C.; Ben Ayed, I.; Pedersoli, M.; Dolz, J. On the texture bias for few-shot cnn segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; IEEE Computer Society: Piscataway, NJ, USA, 2021; pp. 2674–2683. [Google Scholar]
  18. Hermann, K.; Chen, T.; Kornblith, S. The origins and prevalence of texture bias in convolutional neural networks. NIPS 2020, 33, 19000–19015. [Google Scholar]
  19. Wang, Q.L.; Wu, B.G.; Zhu, P.F.; Li, P.H.; Zuo, W.M.; Hu, Q.H. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seattle, WA, USA, 14–19 June 2020; IEEE Computer Society: Piscataway, NJ, USA, 2020; pp. 11534–11542. [Google Scholar]
  20. Porwal, P.; Pachade, S.; Kamble, R.; Kokare, M.; Deshmukh, G.; Sahasrabuddhe, V.; Meriaudeau, F. Indian diabetic retinopathy image dataset (IDRiD): A database for diabetic retinopathy screening research. Data 2018, 3, 25. [Google Scholar] [CrossRef]
  21. Li, T.; Gao, Y.Q.; Wang, K.; Guo, S.; Liu, H.R.; Kang, H. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Inf. Sci. 2019, 501, 511–522. [Google Scholar] [CrossRef]
  22. Guo, S.; Li, T.; Wang, K.; Zhang, C.; Kang, H. A lightweight neural network for hard exudate segmentation of fundus image. In Artificial Neural Networks and Machine Learning—ICANN 2019: Image Processing: 28th International Conference on Artificial Neural Networks; Igor, V.T., Vera, K., Pavel, K., Fabian, T., Eds.; Springer: Munich, Germany, 2019; pp. 189–199. [Google Scholar]
  23. Zong, Y.S.; Chen, J.L.; Yang, L.Q.; Tao, S.Y.; Aoma, C.Y.Z.; Zhao, J.S.; Wang, S.H. U-net based method for automatic hard exudates segmentation in fundus images using inception module and residual connection. IEEE Access 2020, 8, 167225–167235. [Google Scholar] [CrossRef]
  24. Lee, C.H.; Ke, Y.H. Fundus images classification for diabetic retinopathy using deep learning. In Proceedings of the 13th International Conference on Computer Modeling and Simulation, Melbourne, VIC, Australia, 25–27 June 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 264–270. [Google Scholar]
  25. Ameri, N.; Shoeibi, N.; Abrishami, M. Segmentation of hard exudates in retina fundus images using BCDU-Net. In Proceedings of the 2022 12th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 17–18 November 2022; IEEE: Mashhad, Iran, 2022; pp. 123–128. [Google Scholar]
  26. Skouta, A.; Elmoufidi, A.; Jai-Andaloussi, S.; Ouchetto, O. Hemorrhage semantic segmentation in fundus images for the diagnosis of diabetic retinopathy by using a convolutional neural network. J. Big Data 2022, 9, 78. [Google Scholar] [CrossRef]
  27. Xiao, Q.Q.; Zou, J.X.; Yang, M.Q.; Gaudio, A.; Kitani, K.; Smailagic, A.; Costa, P.; Xu, M. Improving lesion segmentation for diabetic retinopathy using adversarial learning. In International Conference on Image Analysis and Recognition; Fakhri, K., Aurelio, C., Alfred, Y., Eds.; Springer: Waterloo, ON, Canada, 2019; pp. 333–344. [Google Scholar]
  28. Liu, Q.; Liu, H.T.; Zhao, Y.; Liang, Y.X. Dual-branch network with dual-sampling modulated dice loss for hard exudate segmentation in color fundus images. IEEE J. BioMed 2021, 26, 1091–1102. [Google Scholar] [CrossRef] [PubMed]
  29. Wang, Z.J.; Lu, H.M.; Yan, H.X.; Kan, H.X.; Jin, L. Vision transformer adapter-based hyperbolic embeddings for multi-lesion segmentation in diabetic retinopathy. Sci. Rep. 2023, 13, 11178. [Google Scholar] [CrossRef] [PubMed]
  30. Van Do, Q.; Hoang, H.T.; Van Vu, N.; De Jesus, D.A.; Brea, L.S.; Nguyen, H.X. Segmentation of hard exudate lesions in color fundus image using two-stage CNN-based methods. Expert Syst. Appl. 2024, 241, 122742. [Google Scholar]
  31. Wang, Y.Q.; Hou, Q.S.; Cao, P.; Yang, J.Z.; Zaiane, O.R. Lesion-aware knowledge distillation for diabetic retinopathy lesion segmentation. Appl. Intell. 2024, 54, 1937–1956. [Google Scholar] [CrossRef]
  32. Guo, C.L.; Szemenyei, M.; Yi, Y.G.; Xue, Y.; Zhou, W.; Li, Y.Y. Dense residual network for retinal vessel segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Barcelona, Spain, 2020; pp. 1374–1378. [Google Scholar]
  33. Xie, S.N.; Tu, Z.W. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; IEEE: Santiago, Chile, 2015; pp. 1395–1403. [Google Scholar]
  34. Chen, L.C.; Zhu, Y.K.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV); Fakhri, K., Aurelio, C., Alfred, Y., Eds.; Springer: Munich, Germany, 2018; pp. 801–818. [Google Scholar]
  35. Guan, S.; Khan, A.A.; Sikdar, S.; Chitnis, P.V. Fully dense UNet for 2-D sparse photoacoustic tomography artifact removal. IEEE JBHI 2019, 24, 568–576. [Google Scholar] [CrossRef] [PubMed]
  36. Guo, S.; Li, T.; Kang, H.; Li, N.; Zhang, Y.; Wang, K. L-Seg: An end-to-end unified framework for multi-lesion segmentation of fundus images. Neurocomputing 2019, 349, 52–63. [Google Scholar] [CrossRef]
  37. Huang, S.Q.; Li, J.N.; Xiao, Y.Z.; Shen, N.; Xu, T.F. RTNet: Relation transformer network for diabetic retinopathy multi-lesion segmentation. IEEE Trans. Med. Imaging 2022, 41, 1596–1607. [Google Scholar] [CrossRef]
  38. Chen, Y.; Xu, S.B.; Long, J.; Xie, Y.N. DR-Net: Diabetic Retinopathy Detection with Fusion Multi-lesion Segmentation and Classification. Multimed. Tools Appl. 2023, 82, 26919–26935. [Google Scholar] [CrossRef]
  39. Guo, T.J.; Yang, J.; Yu, Q. Diabetic retinopathy lesion segmentation using deep multi-scale framework. Biomed. Signal Process. Control 2024, 88, 105050. [Google Scholar] [CrossRef]
  40. Guo, C.L.; Szemenyei, M.; Yi, Y.G.; Wang, W.L.; Chen, B.E.; Fan, C.Q. Sa-unet: Spatial attention u-net for retinal vessel segmentation. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: Milan, Italy, 2021; pp. 1236–1242. [Google Scholar]
  41. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
  42. Guo, C.L.; Szemenyei, M.; Hu, Y.T.; Wang, W.L.; Zhou, W.; Yi, Y.G. Channel Attention Residual u-Net for Retinal Vessel Segmentation. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; IEEE: Toronto, ON, Canada, 2021; pp. 1185–1189. [Google Scholar]
Figure 1. Four categories of DR lesions.
Figure 2. The framework of LFRC-Net.
Figure 3. Structure of the MobileNetV2 and its components.
Figure 4. Frequency recalibration module.
Figure 5. Residual attention module (RAM).
Figure 6. Channel spatial attention module (CSAM).
Figure 7. The ROC and PR curves on IDRID of DRNet and our network.
Figure 8. The ROC and PR curves on DDR of our network.
Figure 9. The ROC and PR curves on ultra-wide-field fundus images of different models.
Figure 10. Visualized results of LFRC-Net and DRNet on IDRID.
Figure 11. Visualized result of different models on ultra-wide-field fundus images.
Figure 12. Visualized results of our LFRC-Net on IDRID.
Figure 13. Visualized results of our LFRC-Net on DDR.
Figure 14. Visualized results of our LFRC-Net on ultra-wide-field fundus images.
Table 1. Comparative segmentation of EX, HE, MA, and SE on IDRID.

Method | Type of Lesion | AUC_ROC | mAP | F1 | Accuracy | Recall | Specificity | Precision
Guo et al. [22] | EX | - | 78.26% | 78.15% | - | - | - | -
Zong et al. [23] | EX | - | - | - | 97.95% | 96.38% | 97.14% | -
Lee et al. [24] | EX | - | - | - | 97.21% | 70.21% | 92.09% | -
Ameri et al. [25] | EX | 98.93% | 83.15% | 76.81% | 99.30% | - | 99.74% | -
Liu et al. [28] | EX | - | 77.67% | - | - | - | - | -
Van et al. [30] | EX | - | 85.00% | - | - | 79.60% | - | -
Skouta et al. [26] | HE | - | - | - | 98.68% | 80.49% | 99.68% | 99.98%
Wang et al. [31] | EX | - | 77.60% | 71.10% | - | - | - | -
 | HE | - | 63.60% | 59.40% | - | - | - | -
 | MA | - | 53.20% | 43.80% | - | - | - | -
Xiao et al. [27] | EX | - | - | 69.08% | - | - | - | -
 | HE | - | - | 45.76% | - | - | - | -
 | MA | - | - | 42.98% | - | - | - | -
 | SE | - | - | 43.98% | - | - | - | -
FRCU-Net [14] | EX | 98.98% | 84.66% | 74.70% | 99.30% | 66.30% | 99.82% | 85.54%
 | HE | 95.69% | 58.09% | 45.36% | 98.81% | 34.83% | 99.94% | 68.73%
 | MA | 98.01% | 38.64% | 30.36% | 99.87% | 21.55% | 99.96% | 56.01%
 | SE | 97.75% | 55.79% | 46.98% | 98.88% | 38.72% | 99.80% | 75.18%
DRNet [32] | EX | 99.00% | 84.20% | 73.82% | 99.31% | 64.30% | 99.79% | 86.64%
 | HE | 95.81% | 65.52% | 53.80% | 98.97% | 39.58% | 99.78% | 83.96%
 | MA | 98.45% | 42.72% | 36.91% | 99.87% | 25.79% | 99.98% | 64.89%
 | SE | 98.64% | 55.60% | 48.80% | 99.80% | 37.42% | 99.96% | 70.12%
Wang et al. [29] | EX | 99.43% | 86.40% | - | - | - | - | -
 | HE | 94.87% | 64.94% | - | - | - | - | -
 | MA | 98.39% | 40.63% | - | - | - | - | -
 | SE | 94.63% | 59.78% | - | - | - | - | -
Ours | EX | 99.24% | 86.86% | 78.09% | 99.37% | 71.97% | 99.81% | 85.34%
 | HE | 96.34% | 62.97% | 55.85% | 98.95% | 44.05% | 99.79% | 76.27%
 | MA | 97.94% | 35.67% | 28.75% | 99.87% | 28.73% | 99.98% | 61.81%
 | SE | 98.86% | 60.19% | 47.31% | 99.80% | 34.48% | 99.97% | 75.36%
Table 2. Comparative segmentation of EX, HE, MA, and SE on DDR.

Method | Type of Lesion | AUC_ROC | AUC_PR | F1 | Accuracy | Recall | Specificity | Precision
Liu et al. [28] | EX | - | 50.68% | - | - | - | - | -
HED [33] | EX | 96.12% | 42.52% | - | - | - | - | -
 | HE | 88.78% | 20.14% | - | - | - | - | -
 | MA | 92.99% | 6.52% | - | - | - | - | -
 | SE | 82.15% | 13.01% | - | - | - | - | -
DeepLab v3+ [34] | EX | 96.41% | 54.05% | - | - | - | - | -
 | HE | 93.08% | 37.89% | - | - | - | - | -
 | MA | 92.45% | 3.16% | - | - | - | - | -
 | SE | 86.42% | 21.85% | - | - | - | - | -
U-Net [35] | EX | 97.41% | 55.05% | - | - | - | - | -
 | HE | 93.87% | 38.99% | - | - | - | - | -
 | MA | 93.66% | 3.34% | - | - | - | - | -
 | SE | 87.78% | 24.55% | - | - | - | - | -
L-seg [36] | EX | 97.26% | 56.45% | - | - | - | - | -
 | HE | 92.98% | 35.88% | - | - | - | - | -
 | MA | 94.23% | 11.74% | - | - | - | - | -
 | SE | 87.95% | 26.54% | - | - | - | - | -
RTNet [37] | EX | 97.51% | 56.71% | - | - | - | - | -
 | HE | 93.21% | 36.56% | - | - | - | - | -
 | MA | 94.52% | 11.76% | - | - | - | - | -
 | SE | 88.45% | 29.43% | - | - | - | - | -
DR-Net [38] | EX | - | - | - | 95.24% | - | - | -
 | HE | - | - | - | 96.91% | - | - | -
 | MA | - | - | - | 94.83% | - | - | -
 | SE | - | - | - | 97.89% | - | - | -
Wang et al. [29] | EX | 97.45% | 56.07% | - | - | - | - | -
 | HE | 94.03% | 37.78% | - | - | - | - | -
 | MA | 94.09% | 10.66% | - | - | - | - | -
 | SE | 88.57% | 26.94% | - | - | - | - | -
Guo et al. [39] | EX | - | 60.14% | 54.06% | - | - | - | -
 | HE | - | 35.52% | 33.17% | - | - | - | -
 | MA | - | 11.94% | 15.22% | - | - | - | -
 | SE | - | 32.60% | 38.76% | - | - | - | -
Ours | EX | 97.74% | 60.51% | 59.24% | 99.62% | 34.75% | 99.97% | 86.24%
 | HE | 91.78% | 39.83% | 30.22% | 99.14% | 12.01% | 99.98% | 87.17%
 | MA | 95.64% | 14.35% | 17.64% | 99.97% | 12.49% | 99.99% | 40.56%
 | SE | 96.63% | 18.99% | 25.24% | 99.95% | 17.41% | 99.99% | 45.83%
Table 3. Comparative segmentation of EX and HE on ultra-wide-field fundus images.

Method | Type of Lesion | AUC_ROC | mAP | F1 | Accuracy | Recall | Specificity | Precision
U-Net [7] | EX | 98.80% | 43.59% | 45.99% | 99.91% | 41.71% | 99.96% | 51.26%
 | HE | 94.26% | 49.99% | 47.37% | 99.60% | 35.67% | 99.92% | 70.48%
AttentionUnet [41] | EX | 99.03% | 55.90% | 48.85% | 99.93% | 38.18% | 99.98% | 67.81%
 | HE | 93.70% | 52.87% | 49.81% | 99.61% | 38.27% | 99.92% | 71.30%
DRNet [32] | EX | 99.15% | 41.3% | 44.58% | 99.91% | 40.18% | 99.96% | 50.06%
 | HE | 95.54% | 50.55% | 48.95% | 99.59% | 39.11% | 99.89% | 65.40%
SA_Unet [40] | EX | 99.59% | 53.67% | 49.59% | 99.92% | 41.07% | 99.98% | 62.59%
 | HE | 95.25% | 52.54% | 53.13% | 99.57% | 47.68% | 99.84% | 59.99%
CAR_UNet [42] | EX | 99.60% | 47.19% | 44.95% | 99.92% | 35.04% | 99.97% | 62.67%
 | HE | 96.07% | 48.78% | 45.03% | 99.58% | 33.63% | 99.92% | 68.13%
Ours | EX | 99.51% | 57.90% | 52.77% | 99.93% | 41.95% | 99.98% | 71.10%
 | HE | 97.34% | 53.63% | 50.86% | 99.61% | 40.35% | 99.91% | 68.74%
Table 4. Comparison of the different lightweight model parameters.

Method | U-Net | AttentionUnet | DRNet | CAR_UNet | Ours
Parameters | 1.95 M | 2.34 M | 1.62 M | 1.05 M | 0.95 M
Table 5. Performance comparison of different components of our network on IDRID.

Method | Type of Lesion | AUC_ROC | mAP | F1 | Accuracy | Recall | Specificity | Precision
Baseline | EX | 99.22% | 81.69% | 71.46% | 99.23% | 62.09% | 99.82% | 84.17%
 | HE | 96.53% | 58.14% | 47.99% | 98.87% | 34.56% | 99.85% | 78.50%
 | MA | 97.18% | 24.92% | 15.06% | 99.86% | 8.61% | 99.99% | 61.47%
 | SE | 98.74% | 38.74% | 30.04% | 99.76% | 19.88% | 99.97% | 61.47%
Baseline+RAM | EX | 99.12% | 85.68% | 77.19% | 99.34% | 71.98% | 99.77% | 83.22%
 | HE | 96.67% | 61.51% | 54.58% | 98.93% | 42.74% | 99.79% | 75.48%
 | MA | 98.69% | 39.94% | 29.98% | 99.87% | 19.24% | 99.99% | 67.80%
 | SE | 98.28% | 53.97% | 46.89% | 99.79% | 35.37% | 99.96% | 69.52%
Baseline+FRCM | EX | 99.24% | 82.82% | 68.55% | 99.21% | 55.84% | 99.89% | 88.77%
 | HE | 94.88% | 56.39% | 49.36% | 98.88% | 36.30% | 99.83% | 77.10%
 | MA | 98.17% | 34.99% | 27.53% | 99.87% | 17.49% | 99.99% | 64.64%
 | SE | 97.64% | 48.03% | 49.79% | 99.80% | 38.25% | 99.96% | 71.31%
Baseline+RAM+FRCM | EX | 99.24% | 86.86% | 78.09% | 99.37% | 71.97% | 99.81% | 85.34%
 | HE | 96.34% | 62.97% | 55.85% | 98.95% | 44.05% | 99.79% | 76.27%
 | MA | 97.94% | 35.67% | 28.75% | 99.87% | 18.73% | 99.98% | 61.81%
 | SE | 98.86% | 60.19% | 47.31% | 99.80% | 34.48% | 99.97% | 75.36%
Table 6. Performance comparison of different components of our network on DDR.

Method | Type of Lesion | AUC_ROC | mAP | F1 | Accuracy | Recall | Specificity | Precision
Baseline | EX | 96.60% | 53.10% | 43.97% | 99.60% | 30.04% | 99.96% | 81.98%
 | HE | 87.48% | 34.70% | 21.57% | 99.13% | 13.13% | 99.97% | 80.31%
 | MA | 95.15% | 7.77% | 7.98% | 99.95% | 4.51% | 99.99% | 34.31%
 | SE | 95.79% | 17.91% | 21.71% | 99.93% | 25.71% | 99.96% | 25.71%
Baseline+RAM | EX | 96.54% | 53.68% | 46.31% | 99.61% | 32.20% | 99.96% | 82.45%
 | HE | 90.30% | 35.06% | 22.00% | 99.13% | 10.47% | 99.97% | 86.41%
 | MA | 92.26% | 7.95% | 8.30% | 99.96% | 4.71% | 99.99% | 35.30%
 | SE | 94.21% | 13.32% | 22.39% | 99.93% | 22.16% | 99.96% | 22.63%
Baseline+FRCM | EX | 97.52% | 57.07% | 48.81% | 99.62% | 34.69% | 99.97% | 82.32%
 | HE | 88.63% | 36.29% | 18.64% | 99.14% | 12.68% | 99.97% | 83.02%
 | MA | 95.08% | 10.70% | 9.94% | 99.97% | 5.76% | 99.99% | 36.42%
 | SE | 95.65% | 14.65% | 23.72% | 99.92% | 26.96% | 99.95% | 21.18%
Baseline+RAM+FRCM | EX | 97.74% | 60.51% | 49.13% | 99.62% | 34.75% | 99.97% | 86.24%
 | HE | 91.78% | 39.83% | 21.61% | 99.14% | 12.01% | 99.98% | 87.17%
 | MA | 95.64% | 14.35% | 12.64% | 99.97% | 7.49% | 99.99% | 40.56%
 | SE | 96.63% | 18.99% | 25.24% | 99.95% | 17.41% | 99.99% | 45.83%
Table 7. Performance comparison of different components of our network on ultra-wide-field fundus images.

Method | Type of Lesion | AUC_ROC | mAP | F1 | Accuracy | Recall | Specificity | Precision
Baseline | EX | 99.42% | 47.63% | 43.78% | 99.92% | 33.14% | 99.97% | 64.48%
 | HE | 96.62% | 48.81% | 43.11% | 99.59% | 30.94% | 99.94% | 71.05%
Baseline+RAM | EX | 99.53% | 52.43% | 53.58% | 99.92% | 41.95% | 99.97% | 57.78%
 | HE | 94.81% | 50.02% | 46.00% | 99.60% | 33.96% | 99.93% | 71.30%
Baseline+FRCM | EX | 99.61% | 52.25% | 51.45% | 99.91% | 48.81% | 99.96% | 54.40%
 | HE | 96.58% | 50.32% | 43.33% | 99.59% | 30.98% | 99.94% | 72.08%
Baseline+RAM+FRCM | EX | 99.51% | 57.90% | 52.77% | 99.93% | 49.95% | 99.98% | 71.10%
 | HE | 97.34% | 53.63% | 50.86% | 99.61% | 40.35% | 99.91% | 68.75%
