Article
Peer-Review Record

Lightweight Denoising Diffusion Implicit Model for Medical Segmentation

Electronics 2025, 14(4), 676; https://doi.org/10.3390/electronics14040676
by Rina Oh * and Tad Gonsalves
Reviewer 2: Anonymous
Submission received: 17 January 2025 / Revised: 5 February 2025 / Accepted: 9 February 2025 / Published: 10 February 2025
(This article belongs to the Special Issue Artificial Intelligence in Image and Video Processing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors design a system for segmentation of medical images. Their approach is based on diffusion deep-learning models that use input noise images to generate a segmentation image.

A more detailed explanation of Figure 1 should be provided for a proper understanding of what follows.

DETAILS:

1.-Lines 55-58: What is DICE?

Please define: DiceScore = 2*(number of common elements) / (number of elements in set A + number of elements in set B), is it so?

Recall can also be defined, perhaps in the results section, with a reference to it here.

You state improvements in model size; improvements in runtime should also be commented on here.

2.- Line 103: The sentence "Medical hardware is extremely expensive..." is a bit surprising, as medical equipment is normally very expensive. Can you give more data about the specific needs?

3.- Figure 1: Is this a model for training? It should be, as the target is an input, isn't it? I don't find how you compute the embedded time emb(t).

4.- Equation 1: The subindex in the error term is not visible.

5.- Figure 3: Please define or reference SiLU (i.e., the Sigmoid Linear Unit). GroupNorm is Group Normalization; can you describe this?

6.- Lines 220-222: How is the threshold applied? Is Dice > th considered a true positive for the recall computation?

 

Author Response

Comment1: Lines 55-58: What is DICE? Please define: DiceScore = 2*(number of common elements) / (number of elements in set A + number of elements in set B), is it so? Recall can also be defined, perhaps in the results section, with a reference to it here. You state improvements in model size; improvements in runtime should also be commented on here.

Response1: We have added a detailed explanation of the Dice score calculation in Lines 228-234. Additionally, we have included comments on the improvements in runtime achieved by our proposed model in the Introduction section (Lines 56-57).
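For readers unfamiliar with the metric, the set-based Dice definition the reviewer proposes can be sketched as follows. This is an illustration only, not code from the manuscript; the function name and the toy masks are our own assumptions.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    return 2.0 * intersection / total

# Toy 2x2 example: 1 common pixel, |A| = 2, |B| = 1 -> Dice = 2*1/(2+1)
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(round(dice_score(pred, target), 4))  # 0.6667
```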

Comment2: Line 103: The sentence "Medical hardware is extremely expensive..." is a bit surprising, as medical equipment is normally very expensive. Can you give more data about the specific needs?

Response2: We have clarified the statement by explaining that hospitals often operate under strict budget constraints, making it difficult to invest in additional high-performance hardware (Lines 103-108).

Comment3: Figure 1: Is this a model for training? It should be, as the target is an input, isn't it? I don't find how you compute the embedded time emb(t).

Response3: We have revised the figure caption to explicitly state that the model overview represents the training phase. Regarding emb(t), it is computed by passing the single scalar value t through multiple linear layers.
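A minimal sketch of such a time embedding, under our own assumptions (two linear layers with a SiLU nonlinearity in between; the weights here are random placeholders standing in for learned parameters, and the embedding width is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim = 8  # hypothetical embedding width

# Placeholder weights; in a real model these would be learned.
W1, b1 = rng.normal(size=(emb_dim, 1)), np.zeros(emb_dim)
W2, b2 = rng.normal(size=(emb_dim, emb_dim)), np.zeros(emb_dim)

def silu(x: np.ndarray) -> np.ndarray:
    """SiLU(x) = x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def time_embedding(t: float) -> np.ndarray:
    """Map a scalar diffusion timestep t to an emb_dim-dimensional vector."""
    h = silu(W1 @ np.array([float(t)]) + b1)  # linear -> SiLU
    return W2 @ h + b2                        # second linear layer

print(time_embedding(10.0).shape)  # (8,)
```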

Comment4: Equation 1: Subindex in the error part is not visible.

Response4: We have carefully reviewed Equation 1 but could not identify any rendering issues in the generated PDF. If the issue persists, could you provide more details on where the visibility problem occurs? We will further verify and adjust the formatting if necessary.

Comment5: Figure 3: Please define or reference SiLU (i.e., the Sigmoid Linear Unit). GroupNorm is Group Normalization; can you describe this?

Response5: We have added the formal names and brief descriptions of SiLU and GroupNorm in Figure 3 and its caption.
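For reference, both operations can be stated compactly. The sketch below is our own illustration of the standard definitions (SiLU as x·sigmoid(x); Group Normalization as per-group zero-mean, unit-variance normalization over a (C, H, W) tensor), not the manuscript's implementation:

```python
import numpy as np

def silu(x: np.ndarray) -> np.ndarray:
    """SiLU (Sigmoid Linear Unit): x * sigmoid(x)."""
    return x * (1.0 / (1.0 + np.exp(-x)))

def group_norm(x: np.ndarray, num_groups: int, eps: float = 1e-5) -> np.ndarray:
    """Group Normalization for a single (C, H, W) sample: channels are
    split into num_groups groups, and each group is normalized to zero
    mean and unit variance over its channels and spatial positions."""
    c, h, w = x.shape
    g = x.reshape(num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(1, 2, 3), keepdims=True)
    var = g.var(axis=(1, 2, 3), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(c, h, w)

x = np.random.default_rng(1).normal(size=(4, 2, 2))  # 4 channels, 2x2
y = group_norm(x, num_groups=2)
print(y.shape)  # (4, 2, 2)
```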

Comment6: Lines 220-222: How is the threshold applied? Is Dice > th considered a true positive for the recall computation?

Response6: We have provided a more detailed explanation of how the threshold is applied and its role in segmentation evaluation in Lines 225-228.
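One common way a threshold enters segmentation evaluation is shown below; this is our assumption of the general pattern (binarizing a continuous probability map at a pixel threshold before computing pixel-level metrics), not necessarily the authors' exact procedure:

```python
import numpy as np

def binarize(prob_map: np.ndarray, th: float = 0.5) -> np.ndarray:
    """Binarize a probability map at threshold th before computing metrics."""
    return (prob_map > th).astype(np.uint8)

def recall(pred: np.ndarray, target: np.ndarray) -> float:
    """Pixel-level recall: TP / (TP + FN)."""
    tp = np.logical_and(pred == 1, target == 1).sum()
    fn = np.logical_and(pred == 0, target == 1).sum()
    return tp / (tp + fn) if (tp + fn) > 0 else 1.0

prob = np.array([[0.9, 0.4], [0.6, 0.1]])
target = np.array([[1, 1], [1, 0]])
pred = binarize(prob, th=0.5)  # [[1, 0], [1, 0]]
print(recall(pred, target))    # 2 of 3 positive pixels recovered
```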

Reviewer 2 Report

Comments and Suggestions for Authors

Summary

The paper introduces a lightweight Denoising Diffusion Implicit Model (DDIM) tailored for medical image segmentation tasks. By incorporating grouped convolutions and streamlined self-attention mechanisms into a U-Net-based framework, the proposed model significantly reduces computational and storage demands without compromising performance. The paper demonstrates the model's effectiveness on diverse datasets, achieving competitive segmentation quality, reduced memory usage, and faster inference times compared to standard DDIM.

 

Strengths

  1. The use of grouped convolutions and simplified self-attention layers effectively addresses the computational challenges of diffusion models, making the method suitable for resource-constrained environments.
  2. The model's performance is validated on diverse datasets, including lung, melanoma, and polyp segmentation, showcasing its adaptability to varying medical imaging tasks.
  3. The lightweight design, which reduces both memory and computational costs, is practical for deployment in real-world medical scenarios, particularly in low-resource settings.

 

Weaknesses and Suggestions

  1. The paper does not examine the impact of image blurriness (i.e., MC-Blur dataset) and low resolution (i.e., DIV2k dataset) on segmentation performance. Although experiments may not be necessary, a qualitative discussion of these factors could enhance understanding of the model's limitations and potential refinements.
  2. The Lightweight DDIM struggles with the diverse features of PolypDB, which include variable shapes, lighting conditions, and positions. Future work should consider incorporating dual-attention mechanisms to capture both spatial and channel-wise relationships for better performance on complex datasets.
  3. The paper could explore robustness against adversarial inputs or noise to better assess real-world deployment in medical environments.
  4. The reliance on paired image-mask datasets restricts scalability. Future work could explore unpaired training methods to broaden the applicability of the proposed model.
  5. While the model demonstrates strong computational results, clinical validation with radiologists or practitioners would substantiate its real-world utility.

 

Recommendation

I recommend this paper be accepted with major revision. The paper presents a promising lightweight diffusion model for medical segmentation, effectively balancing computational efficiency with segmentation accuracy. However, the limitations highlighted should be addressed to enhance the paper's scientific and practical value.

 

Author Response

Comment: 

The paper does not examine the impact of image blurriness (i.e., MC-Blur dataset) and low resolution (i.e., DIV2k dataset) on segmentation performance. Although experiments may not be necessary, a qualitative discussion of these factors could enhance understanding of the model's limitations and potential refinements.

The Lightweight DDIM struggles with the diverse features of PolypDB, which include variable shapes, lighting conditions, and positions. Future work should consider incorporating dual-attention mechanisms to capture both spatial and channel-wise relationships for better performance on complex datasets.

The paper could explore robustness against adversarial inputs or noise to better assess real-world deployment in medical environments.

The reliance on paired image-mask datasets restricts scalability. Future work could explore unpaired training methods to broaden the applicability of the proposed model.

While the model demonstrates strong computational results, clinical validation with radiologists or practitioners would substantiate its real-world utility.

Response: Thank you for your valuable feedback. We have incorporated a discussion on enhancing model robustness against noisy inputs and adversarial attacks in Lines 345-354. Additionally, we acknowledge the importance of analyzing the effects of image blurriness and low resolution on segmentation performance. Lastly, we would like to suggest future collaborations with radiologists or practitioners to assess the model's real-world applicability.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Issues have been addressed. Please check the equation at line 229 (is 2TP in the denominator correct?).

In lines 225 and following, you should specify how many coincident pixels must be found to consider a segmentation correct (TP).

Author Response

Comment1: Issues have been addressed. Please check the equation at line 229 (is 2TP in the denominator correct?).

Response1: Thank you for your advice. We have corrected the equation for the Dice score to ensure its accuracy.
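For context, 2TP in the denominator is the standard confusion-matrix form of the Dice score, Dice = 2TP / (2TP + FP + FN), which is algebraically identical to the set-based form 2|A∩B| / (|A| + |B|). A quick numeric check (our own illustration, with arbitrary toy masks) confirms the equivalence:

```python
import numpy as np

pred   = np.array([1, 1, 0, 1, 0, 0])  # predicted binary mask (flattened)
target = np.array([1, 0, 0, 1, 1, 0])  # ground-truth binary mask

tp = int(np.sum((pred == 1) & (target == 1)))  # 2
fp = int(np.sum((pred == 1) & (target == 0)))  # 1
fn = int(np.sum((pred == 0) & (target == 1)))  # 1

dice_confusion = 2 * tp / (2 * tp + fp + fn)
dice_sets = 2 * np.sum(pred & target) / (pred.sum() + target.sum())
print(dice_confusion == dice_sets)  # True: the two forms coincide
```

The equivalence holds because |A∩B| = TP, |A| = TP + FP, and |B| = TP + FN, so |A| + |B| = 2TP + FP + FN.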

Comment2: In lines 225 and following you should specify how many coincident pixels you must find to consider a correct (TP) segmentation.

Response2: Since the target segmentation images vary in shape, size, and other properties, it is difficult to define a fixed number of TP pixels. Instead, we have added a detailed explanation of how TP is counted in Lines 234-236.

Reviewer 2 Report

Comments and Suggestions for Authors

In this revised version, the authors have addressed several of the previously raised concerns. However, there appears to be no explicit discussion of how the proposed method performs on blurry or low-resolution images. Therefore, I recommend that the authors read the previous comments, discuss relevant LR and blurry datasets in the next version, and analyze the robustness of their method. It would greatly enhance the completeness, generalizability, and practical relevance of the proposed approach.

Author Response

Comment1: 

In this revised version, the authors have addressed several of the previously raised concerns. However, there appears to be no explicit discussion of how the proposed method performs on blurry or low-resolution images. Therefore, I recommend that the authors read the previous comments, discuss relevant LR and blurry datasets in the next version, and analyze the robustness of their method. It would greatly enhance the completeness, generalizability, and practical relevance of the proposed approach.

Response1: In Lines 351-363, we have addressed the issue of handling degraded images and proposed a solution by incorporating a pre-trained super-resolution model to reconstruct high-resolution images before performing segmentation. This approach aims to improve robustness against low-quality inputs.
