Article

Blurred Lesion Image Segmentation via an Adaptive Scale Thresholding Network

by
Qi Chen
,
Wenmin Wang
*,
Zhibing Wang
,
Haomei Jia
and
Minglu Zhao
School of Computer Science and Engineering, Macau University of Science and Technology, Macau, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(17), 9259; https://doi.org/10.3390/app15179259
Submission received: 28 July 2025 / Revised: 19 August 2025 / Accepted: 21 August 2025 / Published: 22 August 2025

Abstract

Medical image segmentation is crucial for disease diagnosis, as precise results aid clinicians in locating lesion regions. However, lesions often have blurred boundaries and complex shapes, challenging traditional methods in capturing clear edges and impacting accurate localization and complete excision. Small lesions are also critical but prone to detail loss during downsampling, reducing segmentation accuracy. To address these issues, we propose a novel adaptive scale thresholding network (AdSTNet) that acts as a post-processing lightweight network for enhancing sensitivity to lesion edges and cores through a dual-threshold adaptive mechanism. The dual-threshold adaptive mechanism is a key architectural component that includes a main threshold map for core localization and an edge threshold map for more precise boundary detection. AdSTNet is compatible with any segmentation network and introduces only a small computational and parameter cost. Additionally, Spatial Attention and Channel Attention (SACA), the Laplacian operator, and the Fusion Enhancement module are introduced to improve feature processing. SACA enhances spatial and channel attention for core localization; the Laplacian operator retains edge details without added complexity; and the Fusion Enhancement module combines a concatenation operation with a Convolutional Gated Linear Unit (ConvGLU) to strengthen feature intensities for edge and small lesion segmentation. Experiments show that AdSTNet achieves notable performance gains on the ISIC 2018, BUSI, and Kvasir-SEG datasets. Compared with the original U-Net, our method attains mIoU/mDice of 83.40%/90.24% on ISIC, 71.66%/80.32% on BUSI, and 73.08%/81.91% on Kvasir-SEG. Moreover, similar improvements are observed for the other networks.

1. Introduction

A blurred lesion image refers to an image of a lesion (abnormal or damaged tissue) where the boundaries or edges of the lesion are unclear or the size of the lesion is small, making it difficult to accurately distinguish the lesion from surrounding healthy tissue. This blurring can occur due to various factors such as motion artifacts, insufficient resolution, or the natural characteristics of specific lesions, especially those with poorly defined edges. Therefore, blurred lesion image segmentation is one of the hardest tasks in disease diagnosis. In current clinical medicine, segmentation results are mainly manually annotated by experts with professional backgrounds, which is time-consuming and subjective [1]. Automated and precise segmentation methods are essential, enabling faster and more accurate identification of lesion regions, particularly in clinical applications such as case analysis and surgical planning. As illustrated in Figure 1a for skin lesions, lesion regions often exhibit blurred boundaries and complex shapes, where accurate edge segmentation assists clinicians in determining lesion size, shape, and growth patterns, facilitating optimized treatment plans. Beyond cases where lesion edges are blurred, the segmentation of small-object lesion areas [2,3,4,5] is also crucial in treatment and diagnosis; one example is the colorectal polyp endoscopic images in Figure 1b. Moreover, in early-stage diseases like glaucoma [6], skin cancer [7], colorectal cancer [8], and hepatocellular carcinoma [9], lesion regions are typically small and have indistinct boundaries, reflecting disease risks and progression. Accurate delineation of lesion locations and boundaries is thus critical for effective excision.
The rapid advance of deep learning has markedly improved the reliability, performance, and accuracy of diagnostic systems. U-Net [10] and its variants—built on encoder–decoder designs—have excelled in medical image segmentation by capturing edges, textures, and semantic context. Yet the limited receptive field of CNNs biases them toward local features and weakens global reasoning. To mitigate this, Transformers have been introduced into vision, and hybrid CNN–Transformer models such as TransUNet [11] and UNETR [12] report strong results. Nevertheless, downsampling still erodes fine structures, hindering precise localization of small lesions and their boundaries. Because boundary information is clinically critical, accurate segmentation of small lesions and blurred edges remains a central need, especially for early disease detection.
In small-object segmentation for blurred lesion images (e.g., polyps or glaucoma), lesions are tiny yet carry crucial pathological cues, making reliable separation from healthy tissue essential for diagnosis and treatment. Multi-scale fusion and adaptive enhancement are widely used to enlarge the effective receptive field and capture fine detail. For example, CLD-Net [13] combines local edge extraction with global–local fusion to reduce detail loss from downsampling and achieves notable gains in polyp segmentation. Similarly, CFANet [14] integrates context feature fusion (CFF) with effective channel–spatial attention (ECSA) to emphasize key regions and improve accuracy in detecting small objects against complex backgrounds. However, these benefits often come with substantial computational and storage overhead.
Enhancing blurred lesion boundaries is equally important for research and clinical use. In skin disease segmentation, lesions frequently abut normal tissue, creating low-contrast borders that complicate delineation. Although boundary-oriented objectives such as Boundary-IoU [15] strengthen foreground–background separation, loss functions alone are insufficient for robust edge recovery. To better exploit boundary cues, Edge U-Net [16] fuses boundary-related MRI with raw MRI during decoding. Despite progress, indistinct boundaries remain challenging. Recently, the large-scale MedSAM [17]—trained on over one million medical images—has shown strong potential for accurate and efficient segmentation.
To address these challenges, we propose AdSTNet, a training-time, lightweight channel-feature module that increases sensitivity to lesion edges and small targets without changing the base architecture. Downsampling in existing networks often blurs boundaries and erases fine details; AdSTNet mitigates this with a dual-threshold mechanism—incorporating core and edge maps—that guides learning toward lesion centers and boundary transitions. Its adaptive scale thresholding enlarges the effective context for small objects, improving recall while avoiding missed or spurious regions. AdSTNet is plug-and-play with common medical segmentation backbones and introduces only a small computational and parameter cost in practice. By adaptively modeling boundary-pixel probabilities, it preserves edge detail and clarifies fuzzy contours, yielding more precise and robust segmentation in complex clinical settings. Therefore, our main contributions are as follows:
  • AdSTNet, standing for adaptive scale thresholding network, is a novel channel-feature post-processing lightweight network designed to improve blurred lesion image segmentation, particularly for small lesions and blurred edges in medical images. AdSTNet is only applied during the training phase and is compatible with any segmentation network, demonstrating its strong compatibility and efficiency.
  • AdSTNet adaptively generates body and edge threshold maps based on lesion characteristics. To further refine feature processing, a Spatial and Channel Attention (SACA) module is introduced for the body threshold map, while a Laplacian operator is applied to the edge threshold map. Additionally, a Fusion Enhancement module is incorporated to enhance lesion localization and edge detection.
  • Beyond binary segmentation, AdSTNet generates lesion visualization maps using the adaptive scale threshold map. These maps enhance interpretability, bridging the gap between raw segmentation and clinical diagnosis and aiding accurate medical decision making.
  • We tested AdSTNet on three public datasets: ISIC 2018, BUSI, and Kvasir-SEG. The results demonstrate its robustness, effectiveness, and high segmentation accuracy with lower parameter increases and segmentation time.

2. Related Work

This section reviews prior work on medical image segmentation (Section 2.1), the Vatti Clipping Algorithm (Section 2.2), and the Laplacian of Gaussian (Section 2.3).

2.1. Medical Image Segmentation Methods

CNN/U-Net family: U-Net established the encoder–decoder paradigm with skip connections. UNet++ narrows the semantic gap via densely nested skips [18]; ResUNet integrates residual and SE modules to strengthen salient cues [19]; and nnU-Net auto-configures architectures and training to diverse datasets and tasks [20]. Further variants improve feature reuse, stability, and volumetric modeling, including DenseUNet [21], ResUNet++ [22], and 3D U-Net [23].
Transformers and hybrids: With ViT bringing Transformers to vision [24], TransUNet first coupled CNNs and Transformers for medical segmentation [11]. UNETR and Swin UNETR leverage Transformer encoders for global context with CNN decoders for precise reconstruction [12,25]. UNeXt targets efficiency with fewer parameters and lower complexity while maintaining accuracy [26]. Dual-branch designs such as TransFuse, DS-TransUNet, and HiFormer fuse local textures (CNN) with long-range dependencies (Transformer) [27,28,29]. Beyond Transformers, HCMUNet combines CNN-Mamba with attention-enhanced skips and multi-scale fusion [30], while MA-DenseUNet augments U-Net with attention, Bi-LSTM, and GAN for improved skin lesion segmentation and generalization [31].
Small lesions are easily lost during downsampling and suffer from limited receptive fields at low resolutions. EFCNet performs layer-wise fusion for fine-scale targets [32]; APAUNet uses axial projection attention to map 3D features to multiple 2D planes for robust small-object segmentation [33]; recent advances—GravityNet, SvANet, IHA-Net—further mitigate detail loss and background interference [34,35,36].
Explicit boundary modeling improves contour fidelity. ETNet learns edge attention early and integrates it into multi-scale decoding [37]; EANet adapts receptive fields and combines complementary edge/region cues [38]; and PATrans emphasizes critical regions via adaptive pixel relationships [39].

2.2. Vatti Clipping Algorithm

The Vatti algorithm provides a generic and robust framework for polygon boolean operations—union, intersection, difference, and XOR—on arbitrary polygons, including concave shapes, holes, and self-intersections [40]. In practice, Vatti-style processing underpins polygon offsetting that yields topology-preserving parallel contours with controllable thickness. Contour-parallel toolpaths in additive manufacturing explicitly rely on inward/outward polygon offsets [41], and parallel polygon overlay studies adopt GPC (a widely used implementation of Vatti-style clipping/offsetting) as a performance baseline in GPU/MPI comparisons [42]. In this work, such offsetting is used to transform polygonal annotations into dilating and shrinking masks.

2.3. Laplacian of Gaussian

The Laplacian computes second-order spatial derivatives, producing strong responses at rapid intensity changes and identifying edges at zero-crossings. Because second derivatives amplify noise, it is commonly combined with smoothing; the Laplacian-of-Gaussian (LoG) integrates Gaussian filtering and yields a natural multi-scale control via its σ parameter (DoG is a practical approximation). Unlike first-order gradient operators such as Sobel [43] and Prewitt [44]—which provide orientation-specific responses at very low cost—the (smoothed) Laplacian is isotropic (rotation-invariant) and detects edges independently of direction, facilitating closed, thin, and highly curved boundary extraction with accurate edge-center localization. In deep models, applying LoG in feature space further suppresses background clutter while preserving high-frequency cues, making it well suited for refining lesion boundaries.

3. Methodology

As shown in Figure 2, the input image undergoes feature extraction via a neural network, resulting in a feature map F of size H × W × 1. The feature map F is then used to predict a probability map P_pred as well as two adaptive threshold maps: the dilating threshold map T_edge and the shrinking threshold map T_body. The SACA block is applied in the T_body branch to strengthen interior cues, while a Laplacian/LoG operator is applied in the T_edge branch to emphasize boundary details. The two threshold maps are then fused into T_out, which modulates P_pred during the training phase to obtain the final result P_out (Formula (1)). Accordingly, both the probability and threshold branches are active during training (black + blue paths in Figure 2); only the probability path is used during inference (black path).
P_{out} = T_{out} \odot P_{pred}
Guided by the above description, the proposed adaptive scale threshold map is detailed in Section 3.1. The Center and Edge Attention Modules are explained in Section 3.2, and the Fusion Enhancement module is detailed in Section 3.3.
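To make the overall data flow concrete, the following minimal PyTorch-style sketch shows the training-time output described above, reading Formula (1) as element-wise modulation of the probability map by the fused threshold map; the function and argument names are illustrative rather than taken from the released implementation.

```python
def adstnet_training_output(p_pred, t_body, t_edge, fuse):
    # p_pred, t_body, t_edge: (B, 1, H, W) maps predicted from the shared feature map F.
    # fuse: the Fusion Enhancement module of Section 3.3. At inference, only p_pred is kept.
    t_out = fuse(t_body, t_edge)
    return t_out * p_pred  # Formula (1), read here as element-wise modulation
```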

3.1. Adaptive Scale Threshold Maps

In medical image segmentation, networks typically output a probability map, which is binarized with a fixed threshold (e.g., p > 0.5 ⇒ 1, otherwise 0). However, this one-step binarization is often unfriendly to clinical use, leading to lesion mislocalization and loss of boundary detail. Inspired by soft segmentation, we evaluate the probability distribution along lesion boundaries and, instead of a fixed cutoff, adaptively expand and contract the lesion region to produce proportional threshold maps that flexibly and precisely handle small targets and blurred edges.
In Figure 2, probability map P_pred is generated from the feature map F extracted by the neural network. To address the information loss caused by upsampling, the Vatti clipping algorithm [40] is used to dilate and shrink polygons representing the ground-truth labels, generating two new label images. The offset D for dilating and shrinking is calculated based on the perimeter L and area A of the original polygon:
D = \frac{A(1 - r^{2})}{L}
where r is the scaling rate, empirically set to 0.4.
Two threshold map labels are generated by offsetting the ground-truth polygon G of each annotated lesion. The body map T_body is obtained by a shrinking offset with distance D, and the edge map T_edge is obtained by a dilating offset with the same D. Figure 3 illustrates the ground truth (a) together with the shrinking (b) and dilated (c) polygons; varying r yields adaptive proportional changes in band thickness. For visualization, threshold values are rendered with color gradients in RGB; the yellow curve denotes the original polygon, while the inner and outer blue polygons indicate the offset results. Higher intensities near the yellow contour correspond to higher boundary certainty; T_body emphasizes the lesion core, whereas T_edge highlights the peripheral boundary.
This label construction is performed offline only for the training set prior to optimization and does not run during training or inference; the resulting T_body and T_edge serve solely as supervision targets. In practice, the outward band encourages recovery of missing boundary segments, and the inward band focuses the model on the core region to suppress spurious responses. Learning from these adaptive, scale-aware threshold maps enables more effective parameter updates around lesion boundaries and improves both localization and edge fidelity.
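As a concrete illustration of this offline label construction, the sketch below offsets one ground-truth lesion polygon with the Vatti-style pyclipper library, using the offset distance D from Formula (2). The function name is hypothetical, and the subsequent rendering of the offset polygons into gradient-valued threshold maps is omitted.

```python
import numpy as np
import pyclipper  # Vatti-style polygon clipping/offsetting

def offset_lesion_polygon(polygon, r=0.4, shrink=True):
    # polygon: (N, 2) contour of one annotated lesion in pixel coordinates.
    poly = np.asarray(polygon, dtype=np.float64)
    x, y = poly[:, 0], poly[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))     # A (shoelace)
    perimeter = np.linalg.norm(np.roll(poly, -1, axis=0) - poly, axis=1).sum()  # L
    d = area * (1 - r ** 2) / max(perimeter, 1e-6)                              # D, Formula (2)
    pco = pyclipper.PyclipperOffset()
    pco.AddPath(poly.astype(np.int64).tolist(), pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
    # A negative offset shrinks toward the lesion core (T_body label); a positive
    # offset dilates outward (T_edge label).
    return [np.asarray(p) for p in pco.Execute(-d if shrink else d)]
```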

3.2. Center and Edge Attention Modules

We aim for the threshold maps to exhibit more distinct probability differences, thereby improving lesion segmentation. To this end, we incorporate SACA and the Laplacian operator into the generation of T_body and T_edge.

3.2.1. Spatial Attention and Channel Attention

We propose a novel spatial attention and channel attention module, called SACA, as shown in Figure 4a. The input feature map ( H × W × 1 ) first undergoes a 1 × 1 convolution to adjust the channel dimensions (Formula (3)) followed by sequential processing through spatial and channel attention units. Finally, another 1 × 1 convolution restores the channel count, and a residual connection is added. This design reduces feature redundancy in both spatial and channel dimensions, improves model efficiency, and ensures that the input X and output Y are of the same size ( H × W × 1 ).
X = \mathrm{Conv}_{1 \times 1}(F)
In the SA module (shown in Figure 4b), we exploit the scaling factors of Group Normalization (GN) [45]—rather than convolution or global-pooling heuristics—to estimate spatial information content in the feature map X. This yields two complementary spatial maps, X_1 (high-information) and X_2 (low-information). To further reduce spatial redundancy, a lightweight cross-interaction is applied: the high-information responses produce (X_11, X_12), the low-information responses produce (X_21, X_22), and complementary pairs are combined to form the refined spatial output X'. This interaction requires no extra components, is memory-efficient, and strengthens informative structures while suppressing redundant regions. However, input features may still contain redundancy in the channel dimension. The Formula (4) of the SA module is
X' = \mathrm{Concat}(X_{11} + X_{22},\ X_{12} + X_{21})
In the CA module (Figure 4b), we address the redundancy in the channel dimension. Given the channel count C, we introduce a split ratio λ to split the input features X' into two parts, λC and (1 − λ)C, with λ ∈ [0, 1] and empirically set to 0.5. Then, we adopt a dual-branch architecture that uses group-wise convolution (GWC) and point-wise convolution (PWC), instead of standard convolution or global pooling, to extract rich, high-level representative features Y_1. With GWC, parameters and computation are reduced, but cross-channel interaction is suppressed; PWC is introduced to restore channel information. The second branch uses PWC alone to produce the complementary, shallow-detail feature Y_2. We then concatenate Y_1 and Y_2, apply global average pooling + Softmax to obtain channel weights, and reweight the concatenated features. A residual skip is added to preserve information. The resulting output X_out highlights critical channels and suppresses redundancy, while preserving the spatial size and overall efficiency. The Formulas (5) and (6) of the CA module are
Y_1 = \mathrm{GWC}(X'_{\lambda C}) + \mathrm{PWC}(X'_{\lambda C})
Y_2 = \mathrm{Concat}\big(\mathrm{PWC}(X'_{(1-\lambda)C}),\ X'_{(1-\lambda)C}\big)
Overall, combining SA and CA enables the model to capture critical features with fewer parameters, as summarized in the Formula (7). During adaptive shrinking of lesion edges, this design improves lesion localization while reducing both false and missed segmentations. Unlike conventional attention that passively reweights responses, SACA actively suppresses redundancy and reconstructs informative features through structured spatial–channel operations.
Y = \mathrm{Softmax}\big(\mathrm{Pooling}(Y_1 + Y_2)\big) \odot X
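The following PyTorch sketch gives one possible reading of the SACA description and Formulas (3)–(7); the gating threshold, group counts, and the hidden width of 16 (chosen to match the hidden-layer setting in Section 4.2) are assumptions rather than reported design details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    # SA unit: scores spatial informativeness with GroupNorm scaling factors, splits the
    # responses into high-/low-information parts, and cross-combines them (Formula (4)).
    def __init__(self, channels, groups=4, gate_threshold=0.5):
        super().__init__()
        self.gn = nn.GroupNorm(groups, channels)
        self.gate_threshold = gate_threshold

    def forward(self, x):
        w_gamma = self.gn.weight / self.gn.weight.sum()              # normalized GN scale factors
        weights = torch.sigmoid(self.gn(x) * w_gamma.view(1, -1, 1, 1))
        info = (weights >= self.gate_threshold).float()
        x1, x2 = info * x, (1.0 - info) * x                          # high- / low-information maps
        x11, x12 = torch.chunk(x1, 2, dim=1)
        x21, x22 = torch.chunk(x2, 2, dim=1)
        return torch.cat([x11 + x22, x12 + x21], dim=1)              # Concat(X11+X22, X12+X21)

class ChannelAttention(nn.Module):
    # CA unit: splits channels by ratio lam; branch one uses group-wise + point-wise
    # convolution (Formula (5)), branch two point-wise convolution with a skip
    # (Formula (6)); channel weights come from pooling + softmax (Formula (7)).
    def __init__(self, channels, lam=0.5, groups=2):
        super().__init__()
        self.c_up = int(lam * channels)
        self.c_low = channels - self.c_up
        self.gwc = nn.Conv2d(self.c_up, channels, 3, padding=1, groups=groups)
        self.pwc_up = nn.Conv2d(self.c_up, channels, 1)
        self.pwc_low = nn.Conv2d(self.c_low, channels - self.c_low, 1)

    def forward(self, x):
        x_up, x_low = torch.split(x, [self.c_up, self.c_low], dim=1)
        y1 = self.gwc(x_up) + self.pwc_up(x_up)
        y2 = torch.cat([self.pwc_low(x_low), x_low], dim=1)
        y = torch.stack([y1, y2], dim=1)                             # (B, 2, C, H, W)
        w = F.softmax(y.mean(dim=(-2, -1), keepdim=True), dim=1)     # pooling + softmax weights
        return (w * y).sum(dim=1)

class SACA(nn.Module):
    # 1x1 conv to expand the single-channel map (Formula (3)), SA then CA,
    # 1x1 conv to restore the channel count, plus a residual connection.
    def __init__(self, hidden=16):
        super().__init__()
        self.proj_in = nn.Conv2d(1, hidden, 1)
        self.sa = SpatialAttention(hidden)
        self.ca = ChannelAttention(hidden)
        self.proj_out = nn.Conv2d(hidden, 1, 1)

    def forward(self, f):                                            # f: (B, 1, H, W)
        return self.proj_out(self.ca(self.sa(self.proj_in(f)))) + f
```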

3.2.2. Laplacian Operator

To better capture the edge features of lesion areas, we adopted the classical Laplacian of Gaussian (LoG) operator, which performs Gaussian smoothing followed by Laplacian edge detection. The mathematical Formulation (8) is as follows:
\mathrm{LoG}(x, y) = -\frac{1}{\pi \sigma^{4}} \left( 1 - \frac{x^{2} + y^{2}}{2 \sigma^{2}} \right) e^{-\frac{x^{2} + y^{2}}{2 \sigma^{2}}}
where x and y represent the image’s pixel coordinates, determining the position and center of the filter kernel. σ is the standard deviation of the Gaussian filter, which controls the degree of Gaussian smoothing. A larger σ results in stronger smoothing (blurrier edges), while a smaller σ retains sharper edge details.
The Laplacian targets high-frequency detail, especially edges. As a second-order derivative, it often yields more meaningful edge structures than first-order operators such as Sobel [43] and Prewitt [44]. Moreover, the computational cost is similar. Typically, a Gaussian smoothing filter is applied before the Laplacian operation (LoG) to reduce sensitivity to noise, enabling the preservation of the most salient image features. Moreover, LoG is parameter-free and needs no training, so the overhead is minimal. This boundary-aware attention emphasizes the lesion area’s edges, helping the model to focus on the transitions between distinct regions during segmentation (for example, the edges of a polyp).
In AdSTNet, the Laplacian operator does not directly perform edge detection on the original image but instead operates on the feature map F extracted by the backbone network. The features output by the backbone are first lightly projected to obtain edge-branch features, and then fixed LoG filtering is applied to these features to enhance boundary details, resulting in T_edge. Two benefits motivate this choice. First, feature maps already suppress background, so applying LoG to them yields cleaner, boundary-focused edges. Second, applying LoG to original images often amplifies texture noise and creates spurious edges; although information-dense, these responses are less focused than probability maps derived from feature maps.
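A minimal sketch of the fixed LoG filtering on the projected edge-branch features is shown below, with the kernel sampled from Formula (8); the 7 × 7 kernel size and the default σ are illustrative choices, not reported settings.

```python
import math
import torch
import torch.nn.functional as F

def log_kernel(sigma=1.0, ksize=7):
    # Sample Formula (8) on a ksize x ksize grid; zero-mean so flat regions give no response.
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    r2 = xx ** 2 + yy ** 2
    k = -(1.0 / (math.pi * sigma ** 4)) * (1 - r2 / (2 * sigma ** 2)) * torch.exp(-r2 / (2 * sigma ** 2))
    return (k - k.mean()).view(1, 1, ksize, ksize)

def apply_log(edge_features, sigma=1.0):
    # edge_features: (B, 1, H, W) projected edge-branch features; the filter is fixed,
    # so this step adds no trainable parameters.
    kernel = log_kernel(sigma).to(edge_features.device, edge_features.dtype)
    return F.conv2d(edge_features, kernel, padding=kernel.shape[-1] // 2)
```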

3.3. Fusion Enhancement Module

As shown in Figure 5, we obtain two adaptive scale threshold maps for the lesion region after applying the adaptive scale operations. To better showcase the entire process, we choose a sample visualization instead of labels. These maps are first fused using a concatenation operation. The fused threshold map T_out enhances the segmentation accuracy of P_pred to obtain P_out in the training phase. Although the threshold map T_out already plays a significant role, we aim to maximize its utility. Inspired by SE attention [46], gating can guide the network to select relevant outputs, but a single shared gate across channels weakens per-channel discrimination in channel attention. Following TransNeXt and GLU [47,48,49], we adopt a ConvGLU mixer that applies fine-grained, convolutional gating conditioned on local context. This yields more expressive channel modulation than a shared gate and enables adaptive emphasis on boundary and small lesion cues, thus improving edge fidelity and small target segmentation.
The two threshold maps T_body (H × W × 1) and T_edge (H × W × 1) are concatenated into T_in (H × W × 2) and passed through a ConvGLU fusion module with three branches. The first branch directly propagates the main features, applying a 1 × 1 convolution followed by batch normalization to generate the value features V. This branch preserves the core information from the concatenated input. The second branch generates the gating signal W. A depth-wise convolution (DWConv) [50] with kernel size 3 × 3 is applied to the input to capture local spatial context within each channel independently. The output is then passed through a 1 × 1 convolution and a sigmoid activation, ensuring the gating weights are in the range (0, 1) for fine-grained modulation. The third branch retains T_in to be added back later, ensuring that the fused output maintains essential low-level details. The element-wise product V ⊙ W adaptively modulates the intensity of features in different regions, enhancing important areas while suppressing less relevant ones. The modulated features are then added to the residual branch output to form the final fused map T_out (H × W × 1):
T_{out} = (V \odot W) + T_{in}
In medical image segmentation, particularly for lesion regions with blurry edges and small targets, this approach helps maintain the integrity of the input features. It avoids excessive operations that may lead to information loss. Consequently, it ensures more precise and robust segmentation results.
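The sketch below follows the three-branch ConvGLU fusion described above. Because T_in has two channels while T_out has one, a 1 × 1 projection is assumed for the residual branch; this is our reading of "added back later" rather than a detail stated in the text.

```python
import torch
import torch.nn as nn

class ConvGLUFusion(nn.Module):
    # Value branch: 1x1 conv + BN -> V.  Gating branch: 3x3 depth-wise conv -> 1x1 conv
    # -> sigmoid -> W.  Residual branch: 1x1 projection of T_in (assumption, see lead-in).
    def __init__(self, in_ch=2, out_ch=1):
        super().__init__()
        self.value = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), nn.BatchNorm2d(out_ch))
        self.gate = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),  # DWConv, per-channel context
            nn.Conv2d(in_ch, out_ch, 1),
            nn.Sigmoid(),
        )
        self.residual = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, t_body, t_edge):
        t_in = torch.cat([t_body, t_edge], dim=1)                 # (B, 2, H, W)
        return self.value(t_in) * self.gate(t_in) + self.residual(t_in)
```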

3.4. Loss Function

The loss function can be expressed as the weighted sum of the losses on the predicted probability map P_pred, the dilated threshold map T_edge, and the shrinking threshold map T_body:
L = \alpha L_{P_{pred}} + \beta L_{T_{body}} + \gamma L_{T_{edge}}
where L_{P_pred}, L_{T_body}, and L_{T_edge} represent the losses for the predicted probability map P_pred, the shrinking threshold map T_body, and the dilated threshold map T_edge, respectively. Additionally, α, β, and γ are hyperparameters, all initialized to 1. Moreover, they can be automatically optimized by the network during training. For L_{P_pred}, L_{T_body}, and L_{T_edge}, we employ the commonly used binary cross-entropy (BCE) loss in image segmentation tasks:
L_{P_{pred}} = L_{T_{body}} = L_{T_{edge}} = \mathrm{BCE}(y, \hat{y}) = -\left( y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y}) \right)
where y is the ground-truth label, with values of either 0 or 1, and ŷ is the model's predicted output, typically passed through a Sigmoid activation function to constrain the output values between 0 and 1.
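A compact sketch of this training objective is given below, assuming sigmoid-activated predictions in (0, 1). Here α, β, and γ are kept as plain hyperparameters initialized to 1; the paper additionally allows them to be optimized during training, which is not modeled in this sketch.

```python
import torch.nn as nn

class AdSTLoss(nn.Module):
    # Weighted sum of three BCE terms on P_pred, T_body, and T_edge.
    def __init__(self, alpha=1.0, beta=1.0, gamma=1.0):
        super().__init__()
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.bce = nn.BCELoss()  # expects sigmoid-activated predictions

    def forward(self, p_pred, t_body, t_edge, gt, gt_body, gt_edge):
        return (self.alpha * self.bce(p_pred, gt)
                + self.beta * self.bce(t_body, gt_body)
                + self.gamma * self.bce(t_edge, gt_edge))
```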

4. Experimental Results

The experimental setup is presented next, including Section 4.1, Datasets and Evaluation Metrics; Section 4.2, Implementation Details; Section 4.3, Performance Comparison; Section 4.4, Ablation Study; Section 4.5, Evaluation of Gradient Operators; Section 4.6, Evaluation of Hidden Layers; and Section 4.7, Enhanced Visualization for Clinical Diagnosis.

4.1. Datasets and Evaluation Metrics

Datasets: We evaluated our proposed AdSTNet on three datasets—International Skin Imaging Collaboration (ISIC 2018) [7], Breast Ultrasound Image (BUSI) [51], and Kvasir-SEG [52]—conducting comparative and ablation experiments to validate our results. We performed a random 8:2 split on all datasets, using 80% for training and validation and 20% for testing. The images were uniformly resized to dimensions of 256 × 256 for ISIC 2018, BUSI, and Kvasir-SEG to ensure consistent evaluation and generalization across datasets. The summary of data information is shown in Table 1.
Evaluation Metrics: To comprehensively assess our model and compare it with other methods, we followed state-of-the-art (SOTA) practices and utilized four metrics: mean Dice (mDice), mean IoU (mIoU), Recall (Rec.), and Precision (Pre.).
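For reference, the sketch below computes per-image Dice and IoU from binarized predictions; averaging these over the test set yields mDice and mIoU. The 0.5 binarization threshold and the per-batch averaging are assumptions about the evaluation protocol rather than stated details.

```python
import torch

def dice_iou(pred, target, threshold=0.5, eps=1e-6):
    # pred: (B, 1, H, W) probabilities; target: binary ground truth of the same shape.
    pred_bin = (pred > threshold).float()
    inter = (pred_bin * target).sum(dim=(-2, -1))
    p_sum, t_sum = pred_bin.sum(dim=(-2, -1)), target.sum(dim=(-2, -1))
    dice = (2 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (p_sum + t_sum - inter + eps)
    return dice.mean(), iou.mean()
```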

4.2. Implementation Details

We used Python 3.10 and PyTorch 2.1.2 to implement our AdSTNet. All experiments were trained on an NVIDIA RTX 4090D with CUDA version 12.1. The batch size was set to 8. We used the Adam optimizer with a learning rate starting from 1 × 10−4 and gradually decreasing to 1 × 10−6, with a momentum of 0.9. The number of epochs was set to 400. The hidden layer of all networks was set to 16. The code is available at https://github.com/chenq4/AdSTNet (accessed on 28 December 2024).
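The snippet below mirrors the stated configuration (Adam with β1 = 0.9, i.e., the quoted momentum; learning rate decayed from 1 × 10−4 to 1 × 10−6; 400 epochs; batch size 8). The exact decay schedule is not specified, so cosine annealing is shown only as one plausible choice, and the model is a placeholder.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, 3, padding=1)  # placeholder for a real backbone + AdSTNet head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=400, eta_min=1e-6)

for epoch in range(400):
    # ... iterate over the training set with batch size 8, compute AdSTLoss, and step the optimizer ...
    scheduler.step()
```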

4.3. Performance Comparison

4.3.1. Quantitative Experiments

To verify the effectiveness and robustness of AdSTNet, we evaluated it with seven different backbones: UNet, UNet++, UNeXt, HiFormer, MedSAM, GobleNet, and AHF-UNet. These backbones were selected for their significance and strong performance in the field, as well as for the distinct architectures and purposes they represent. Specifically, UNet and UNet++ are classical CNN-based architectures, UNeXt is a lightweight network combining CNN and MLP, HiFormer represents a recent hybrid CNN-Transformer architecture, MedSAM is a trending large-scale model architecture for medical image segmentation, AHF-UNet employs attention modules and spatial channel attention mechanisms to improve the accuracy of medical image segmentation, and GobletNet utilizes wavelet transforms and attention mechanisms to enhance segmentation performance. The experimental results for all networks with and without AdSTNet are presented in Table 2, Table 3 and Table 4.
Parameter Quantity and Average Segmentation Time: Table 2 demonstrates the impact of integrating AdSTNet on the model parameters. Since AdSTNet operates only during the training phase and is designed with a lightweight architecture, its inclusion introduces minimal additional computational and storage overhead. For instance, the parameter count for UNet increased from 2.1621 M to 2.1660 M, while for UNet++, it rose from 9.1633 M to 9.1672 M. Other models also show the same trend. The negligible increase in parameter quantity can be attributed to AdSTNet’s architectural design, where most operations are performed during the training phase. Modules like SACA, the Laplacian operator, and ConvGLU enhance feature extraction and segmentation accuracy without significantly increasing model size. This characteristic makes AdSTNet particularly suitable for scenarios where memory efficiency is essential, such as deployment on devices with limited computational resources.
The average segmentation time, measured per image during inference, slightly increased across all models after integrating AdSTNet. Except for MedSAM, which requires more time, other models generally have a slight increase in the average segmentation time. The increased time is mainly due to additional processing by the Laplacian operator and ConvGLU, enhancing edge detection and feature refinement. However, the overhead remains minimal owing to module efficiency and AdSTNet’s lightweight design, resulting in only small increases in runtime and parameters across backbones.
ISIC: Experimental results on the ISIC dataset in Table 2 demonstrate that integrating AdSTNet consistently improves segmentation performance by enhancing both lesion localization and boundary detection. The adaptive scale threshold method is a critical component that boosts segmentation accuracy using two distinct threshold maps: the body map and the edge map. The body map focuses on the core regions of lesions, enhancing feature representation for small or indistinct targets, while the edge map emphasizes boundary details, aiding in the detection of blurred or complex edges. The network can effectively reduce false positives and missed detections by employing this dual-threshold strategy.
The improvements in mIoU and mDice illustrate the significant impact of AdSTNet on the segmentation capability of baseline networks. For instance, the mIoU of UNet increased from 81.84% to 83.40%, and mDice improved from 88.84% to 90.24%, indicating better region localization and segmentation precision. Similarly, HiFormer produced an increase in mIoU from 82.99% to 85.01%, with mDice reaching 91.37%. Even the robust MedSAM benefited from improved edge recognition and small lesion detection. Moreover, GobleNet’s mIoU improved from 83.63% to 84.92%, and mDice from 90.36% to 91.27%. In comparison, AHF-UNet showed an mIoU increase from 81.96% to 83.82% and an mDice improvement from 88.89% to 90.35%, further confirming AdSTNet’s adaptability in boosting high-performing models.
Regarding Precision and Recall, AdSTNet effectively balances these metrics by enhancing sensitivity to low-contrast areas and reducing false positives. For instance, UNet’s Precision improved from 90.90% to 92.29%, indicating reduced over-prediction in non-lesion regions. Additionally, HiFormer achieved Precision and Recall scores of 93.69% and 90.85%, demonstrating AdSTNet’s robustness in complex backgrounds. Although GobleNet’s Recall slightly decreased by 1.06%, GobleNet’s Precision increased from 91.57% to 93.81%.
BUSI: Table 3 highlights the performance differences of various segmentation models on the BUSI dataset before and after incorporating AdSTNet. AdSTNet’s inclusion effectively balances recognition performance by enhancing feature extraction and edge detection mechanisms. Due to the small amount of BUSI data and unclear pathological features, accurate localization of the lesion area is crucial. The SACA module greatly enhances this process by optimizing feature representation because it employs Group Normalization and cross-operation mechanisms to better capture spatial and channel-wise information. This enhanced attention mechanism distinguishes lesion features from background noise, particularly when dealing with small or low-contrast targets.
The mIoU and mDice metrics show consistent improvements with AdSTNet across all models. For example, UNet's mIoU increased from 69.93% to 71.66%, and mDice rose from 78.53% to 80.32%, demonstrating enhanced segmentation performance. HiFormer's mIoU improved from 72.37% to 76.35%, with mDice increasing by approximately 3%, indicating AdSTNet's effectiveness in enhancing high-performance networks. Integrating AdSTNet with GobleNet resulted in an mIoU increase from 74.67% to 77.32% and an mDice improvement from 82.45% to 85.41%. Even UNeXt and MedSAM, despite their strong baselines, achieved further improvements in edge and small target recognition with AdSTNet integration. For AHF-UNet, all metrics showed noticeable improvements.
In terms of Precision and Recall, AdSTNet effectively balances recognition performance. Notably, UNeXt's Recall improved from 72.04% to 83.99%, and HiFormer's Recall increased to 85.31%, highlighting AdSTNet's ability to capture blurred edges and small lesions while reducing missed detections. Meanwhile, the growth in Precision highlighted AdSTNet's role in improving localization accuracy. Furthermore, the improvement of GobleNet and AHF-UNet demonstrated AdSTNet's robustness in complex backgrounds and better localization accuracy.
Kvasir-SEG: Experimental results on the Kvasir-SEG dataset in Table 4 demonstrate AdSTNet’s ability to improve segmentation model performance, particularly in edge processing and overall segmentation accuracy. The data indicates that AdSTNet effectively strengthens lesion region recognition, enhances segmentation robustness, and maintains adaptability across different network architectures.
Experimental results on the Kvasir-SEG dataset demonstrate that AdSTNet significantly improved edge processing and region localization by integrating the Laplacian operator to enhance edge detection. The boundary of polyps in the Kvasir-SEG dataset is often similar to the background, so obtaining edge information is important. The Laplacian operator captures high-frequency information related to lesion boundaries, allowing the model to better distinguish fine edge details that are often blurred or poorly defined.
The mIoU and mDice metrics reveal substantial improvements with AdSTNet across all models. For example, UNet’s mIoU increased from 71.99% to 73.08%, and mDice improved from 80.41% to 81.91%, demonstrating enhanced segmentation accuracy. HiFormer’s mIoU increased from 81.26% to 83.47%, and mDice reached 90.30%, highlighting AdSTNet’s effectiveness in enhancing complex models. Integrating AdSTNet with GobleNet improved the mIoU from 76.52% to 77.32% and the mDice from 84.22% to 85.17%. For AHF-UNet, the mIoU rose from 77.88% to 79.19%, and the mDice increased from 85.58% to 86.50%, demonstrating enhanced segmentation accuracy. Furthermore, MedSAM also benefited from improved segmentation performance, confirming AdSTNet’s compatibility with advanced architectures.
For Precision and Recall, AdSTNet effectively improved edge processing and region localization. HiFormer’s Recall increased from 89.93% to 93.50%, highlighting its strong sensitivity to complex boundaries and small regions. However, Precision slightly decreased from 90.29% to 88.97%, suggesting a trade-off where greater sensitivity to positive samples may introduce minor false positives. In contrast, UNet++ achieved balanced improvements, with Precision and Recall increasing by 1.66% and 2.36%, respectively, indicating balanced performance enhancement. GobleNet’s Recall improved from 84.76% to 85.29%, but Precision slightly decreased from 88.36% to 88.06%, likely due to increased sensitivity causing minor false positives. AHF-UNet also showed consistent improvements, demonstrating AdSTNet’s ability to enhance edge information while maintaining stable detection performance.
Multi-Dataset Experimental Analysis: From the perspective of dataset characteristics, ISIC primarily focuses on skin lesion segmentation, where target edges are often blurred and small lesions are prominent. AdSTNet demonstrates exceptional performance in enhancing blurred edges and improving small target detection, with significant improvements in mIoU and mDice across various models. Notably, in networks like HiFormer and UNet++, Recall showed a substantial increase, reducing missed detections in blurred edge regions. BUSI comprises breast ultrasound images with complex target boundaries and low contrast between small lesions and the background. The results show marked improvements in both Precision and Recall. For example, HiFormer’s Recall increased to 85.31%. Kvasir-SEG focuses on gastrointestinal lesion segmentation, where lesion size and shape vary significantly. For instance, UNet++ achieved an mIoU of 78.26% and an mDice of 86.50%, validating AdSTNet’s adaptability to diverse lesion shapes and sizes.
From a model perspective, AdSTNet consistently enhances performance across various architectures. For classical convolutional networks like UNet and UNet++, AdSTNet significantly improves edge feature representation, resulting in better segmentation performance across all datasets. For the lightweight network UNeXt, AdSTNet effectively boosts Recall and mIoU, addressing its limitations in detecting edge regions and small targets. Even for advanced architectures like HiFormer and MedSAM, which already have high baseline performance, AdSTNet provides additional optimization, particularly in enhancing edge recognition and small lesion detection.
AdSTNet employs the adaptive scale threshold mechanism, the SACA module, the Laplacian operator, and ConvGLU to enhance segmentation accuracy. The adaptive scale threshold mechanism provides dual-threshold maps for lesion cores and edges, improving localization and boundary detection. The SACA module enhances feature representation through efficient attention mechanisms, while the Laplacian operator improves edge clarity by emphasizing high-frequency details. Then, ConvGLU is introduced to enhance feature refinement by selectively controlling information flow within the network. This synergistic combination allows AdSTNet to improve performance significantly across diverse segmentation tasks. However, AdSTNet’s lightweight design introduces minimal parameters during training, making it a versatile post-processing module compatible with various segmentation networks.
From Table 2, Table 3 and Table 4, we may observe slight decreases in Recall and Precision for some models after integrating AdSTNet. This is due to the adaptive scale threshold mechanism, which enhances edge detection and small lesion recognition by generating body and edge maps. While improving sensitivity to low-contrast and blurred regions, the mechanism may also introduce false positives, especially in complex backgrounds, leading to lower Precision. Conversely, if the mechanism is too conservative in suppressing noise or enhancing edges, it may overlook small or unclear lesions, reducing Recall. These findings highlight the need to carefully tune the adaptive scale threshold mechanism to balance lesion localization and boundary detection.
Furthermore, the improvements in some models were marginal due to their high accuracy and effective edge handling, leaving little room for optimization. High-precision models like MedSAM already excel in edge awareness and small target detection, limiting AdSTNet’s impact. However, AdSTNet’s flexibility and adaptability offer efficient, low-cost solutions for small lesion detection and complex edge processing, enhancing performance across various segmentation tasks.

4.3.2. Qualitative Experiments

We visually compare segmentation results before and after adding AdSTNet, using ROI boxes to highlight significant differences. In the ISIC dataset (Figure 6), lesions typically exhibit blurry and irregular boundaries with low contrast to the background, which increases segmentation difficulty. After incorporating AdSTNet, the network more clearly captures the lesion boundary, especially demonstrating significant improvement in regions with blurry or transitional edges. For instance, in Section I, common edge breaks or blurriness in the original segmentation results are effectively corrected after adding AdSTNet, with enhanced boundary continuity. The incorrect segmentation of (f) MedSAM is also correctly identified in Section II.
Figure 7 shows results on the BUSI dataset, primarily consisting of breast ultrasound images with varied shapes, blurred boundaries, and significant background noise. Integrating AdSTNet significantly improves the network’s ability to handle small lesions and complex backgrounds. When comparing the detailed boundaries of the original segmentation, Sections I and II show that AdSTNet effectively reduces false detections in background areas, resulting in more precise lesion contours and improved segmentation details, preventing missed lesions and excessive boundary expansion. Furthermore, in Section II, after adding AdSTNet, both UNet and UNet++ avoid incorrect segmentation.
The Kvasir-SEG dataset primarily consists of gastrointestinal endoscopic images, where lesion boundaries are typically well defined, but the shapes are complex, and the sizes vary. The results in Figure 8 demonstrate that after introducing AdSTNet, the network can handle small and complex lesions more effectively, maintaining boundary integrity even in challenging cases. In Section I, compared to segmentation results without AdSTNet, AdSTNet prevents over-segmentation of the lesion area. In Section II, UNet, UNet++, UNeXt and AHF-UNet show large areas of missing segmentation, whereas HiFormer, GobleNet, and MedSAM have fewer missing segments. AdSTNet effectively fills these gaps, providing more accurate localization of lesion core areas.
The experimental data confirms that AdSTNet’s edge threshold map is pivotal in improving segmentation accuracy for fuzzy boundaries. Additionally, the introduction of the body threshold map enhances the model’s ability to localize the core areas of lesions, resulting in a more precise and stable segmentation overall.

4.4. Ablation Study

To comprehensively evaluate the effectiveness of different components in the AdSTNet, we selected UNet and HiFormer as the backbone and conducted ablation experiments on the ISIC 2018, BUSI, and Kvasir-SEG datasets. UNet is strong in local feature extraction but has limitations in modeling global information and handling complex scenarios. On the other hand, HiFormer combines Transformer and CNN architectures, providing strong global context-capturing abilities, but it may be weaker in capturing edge details and small target regions. These two models are suited to different task scenarios; therefore, the ablation experiments can validate the effectiveness of each AdSTNet module. Moreover, the experiments were conducted in the same environment to ensure fairness.
In addition, we provide the visualizations of the ablation experiments in Figure 9, showing intuitive evidence of the effectiveness of each AdSTNet component. By comparing segmentation results with and without specific modules, it is evident that the SACA module, Laplacian operator, and ConvGLU contribute significantly to improving boundary clarity and lesion localization. These visual results demonstrate that integrating all components in AdSTNet effectively enhances feature representation, improves boundary detection, and enhances the model’s robustness against challenging conditions such as low contrast and complex backgrounds.

4.4.1. Evaluation of the Adaptive Scale Threshold Method

In our study, the adaptive scale threshold method involves generating two threshold maps, i.e., body and edge threshold maps, derived from the ground truth. These adaptive threshold maps are utilized to recalibrate each pixel’s probability, thereby enhancing lesion localization and edge delineation, collectively improving segmentation accuracy.
To verify the overall effectiveness of the adaptive scale thresholding method, we replaced the SACA and the Laplacian operator with standard convolutional layers, creating a control setup for comparison. The outcomes of "TwoConvs" are detailed in Table 5 and Table 6. Compared with the results of the complete AdSTNet in the last line of each method in Table 5, Table 6, and Table 7, both UNet and HiFormer exhibit improvement on the ISIC, BUSI, and Kvasir-SEG datasets. This performance highlights the core importance of the adaptive scale thresholding method, which effectively refines fuzzy edges and focuses on lesion centers by dynamically shrinking and expanding the feature map. It improves segmentation performance even without dynamic attention mechanisms or explicit edge enhancement.

4.4.2. Evaluation of SACA

To evaluate the effectiveness of the SACA module, we conducted ablation experiments by replacing it with standard convolution while keeping other components of AdSTNet unchanged. The comparison was made using the experimental results in Table 5 and Table 6 (the third line of each method).
On the ISIC 2018 dataset, replacing SACA with standard convolution led to a decline in performance for both models: UNet’s mIoU and mDice decreased to 82.47% and 89.51%, respectively, while HiFormer’s mIoU and mDice dropped to 84.14% and 90.72%. Similar performance degradation was observed on the BUSI dataset, indicating that removing SACA consistently weakened segmentation accuracy across different networks.
The SACA module dynamically focuses on relevant features by enhancing feature representation through its spatial channel attention mechanism. This mechanism improves the model’s robustness in handling complex backgrounds and blurry boundaries by selectively enhancing important spatial and channel-wise information. In contrast, standard convolution lacks this adaptive weighting capability, resulting in less effective feature extraction and degraded segmentation performance, particularly for edge and small lesion detection.
These results confirm that SACA is essential for optimal AdSTNet performance, especially in scenarios with unclear boundaries and small targets.

4.4.3. Evaluation of the Laplacian Operator

To evaluate the effectiveness of the Laplacian operator in extracting high-frequency edge information, we replaced it with regular convolution in AdSTNet while keeping the other modules unchanged. The comparison is based on the experimental results in Table 5 and Table 6 (the fourth line in each method).
On the ISIC 2018 dataset, adding the Laplacian operator improved UNet's mIoU and mDice by 0.96% and 0.74%, respectively, compared with the configuration without it (82.44% to 83.40% and 89.50% to 90.24%). For HiFormer, the improvements were 0.64% in mIoU and 0.4% in mDice (84.37% to 85.01% and 90.97% to 91.37%). Similar trends were observed on the BUSI dataset, confirming the significant contribution of the Laplacian operator to segmentation performance.
The Laplacian operator enhances edge clarity by detecting high-frequency features related to pixel gradient changes along lesion boundaries. It can effectively capture fine-edged details, especially in blurred or unclear regions. It is particularly beneficial for Transformer-based models like HiFormer, which may overlook subtle edges due to their focus on global context. Removing the Laplacian operator leads to a more significant performance drop in HiFormer than UNet, highlighting its importance in precise edge detection.
These findings confirm that the Laplacian operator effectively complements AdSTNet by providing critical boundary information for accurately segmenting blurry edges and small lesions.

4.4.4. Evaluation of ConvGLU

To assess the effectiveness of the ConvGLU module in feature fusion, we replaced it with an MLP (multi-layer perceptron), keeping other components unchanged. As shown in Table 5 and Table 6 (the fifth line in each method), removing ConvGLU reduced UNet’s mIoU and mDice to 81.96% and 89.16% and HiFormer’s mIoU and mDice to 83.49% and 90.23% in the ISIC dataset, with similar performance drops on the BUSI dataset.
The ConvGLU module enhances feature fusion by dynamically regulating feature flow through a gating mechanism, effectively highlighting and enhancing the representation of critical features in both edge and body regions. This selective gating improves feature fusion by emphasizing essential information while suppressing irrelevant details, which is beneficial for segmenting small targets and handling complex boundaries. Unlike ConvGLU, the MLP struggles to preserve local spatial features, as its fully connected structure tends to over-smooth fine details, especially in small lesion segmentation.
These results demonstrate that ConvGLU significantly improves feature representation and segmentation accuracy, particularly for precise edge detection and small target localization.

4.5. Evaluation of Gradient Operators

To assess the effectiveness of the Laplacian operator in AdSTNet, we performed comparative experiments with commonly used gradient operators, including Sobel, Prewitt, and Roberts. Using UNet as the backbone, we replaced only the Laplacian operator in AdSTNet for comparison. Each operator’s performance was measured using mIoU and mDice metrics on the ISIC dataset, as shown in Table 8.
The results show that the Laplacian operator achieves the best performance, with an mIoU of 83.40% and an mDice of 90.24%. Its ability to capture high-frequency details through second-order derivatives makes it particularly effective in enhancing blurry or unclear boundaries. This strength is crucial for medical images with poorly defined lesion edges.
Among first-order operators, Sobel performs best due to its noise suppression and effective detection of horizontal and vertical edges (mIoU: 83.28%, mDice: 90.12%), but its averaging reduces sensitivity to fine details. Prewitt (mIoU: 82.98%, mDice: 90.03%) and Roberts perform worse because of weaker noise resistance and difficulty with smooth transitions. Although Roberts detects sharp edges well, it struggles with blurry or gradual intensity changes.
Overall, the Laplacian operator provides superior boundary enhancement, significantly improving AdSTNet’s ability to detect fine details and complex edges.

4.6. Evaluation of Hidden Layers

We selected UNet as the backbone for testing on the ISIC and BUSI datasets. The results in Table 9 show that changes in the number of hidden layers in ConvGLU significantly impacted segmentation performance. The hidden-layer depth is 8 for the Small configuration, 16 for the Base, and 32 for the Large. In the ISIC dataset, increasing the number of hidden layers notably improved segmentation performance in areas with fuzzy edges and complex lesions, with the mIoU rising from 82.53% in the Small configuration to 84.31% in the Large configuration. Similarly, in the BUSI dataset, the mIoU for the Large configuration reached 72.94%, significantly outperforming the 70.78% of the Small configuration. This trend indicates that more hidden layers enhance the model's feature representation capability, enabling it to capture more complex edge details and small lesion regions, thereby reducing the probability of missed or incorrect segmentation.

4.7. Enhanced Visualization for Clinical Diagnosis

To better serve clinical diagnosis, we further visualized the segmentation results of the RGB input images, referred to as the lesion visualization map. It is a representation generated by the adaptive scale threshold network, designed to highlight lesion regions with precision while enhancing edge details. Figure 10 shows the results of (a) ISIC 2018 and (b) Kvasir-SEG with UNet as the backbone network. Lesion areas appear brighter in this map, with a gradient transition marking the edges, while the background is dimmed to improve contrast. This visualization intuitively delineates the boundaries of the lesion and highlights the structural features of the lesion, addressing challenges such as blurred edges and small target regions in medical imaging.
This map bridges the gap between raw segmentation outputs and clinical interpretability, offering clinicians a clearer, more interpretable visualization. Integrating lesion enhancement and boundary contrast supports accurate diagnosis and decision making in medical scenarios.
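A simple sketch of how such a lesion visualization map can be rendered from the fused adaptive threshold map: lesion pixels are kept bright in proportion to the map, while the background is dimmed, producing a gradient transition along the edges. The blending weights are illustrative, not the exact rendering used for Figure 10.

```python
import numpy as np

def lesion_visualization_map(image_rgb, t_out, dim=0.35):
    # image_rgb: (H, W, 3) float image in [0, 1]; t_out: (H, W) fused adaptive threshold map in [0, 1].
    weight = dim + (1.0 - dim) * t_out[..., None]   # background kept near `dim`, lesion core near 1
    return np.clip(image_rgb * weight, 0.0, 1.0)
```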

5. Discussion

AdSTNet is particularly suited for precisely segmenting small lesions with blurred boundaries or irregular shapes, such as tumor detection, organ boundary delineation, or inflammation localization. Its lightweight and modular design ensures broad compatibility with various backbones, offering an efficient and scalable solution for medical image segmentation. Although the average gains in mIoU/Dice are on the order of 1–2 percentage points, they are consistent across datasets and backbones. Furthermore, AdSTNet concentrates on the hardest small targets and blurred boundaries, where a small numerical margin translates into fewer missed lesions and clearer margins for planning or follow-up. In addition, AdSTNet is used only during training as a supervisory plug-in. The architecture of AdSTNet introduces only a small computational and parameter cost. In practice, this lightweight design provides a favorable balance between accuracy and cost, particularly in scenarios prioritizing boundary fidelity and sensitivity to small or low-contrast lesions (e.g., early screening, margin delineation, or resource-limited deployments).
However, comparative experiments and visualizations reveal limitations in some cases, where segmentation results differ from the ground truth in complex lesion areas. This is primarily due to the reliance on backbone networks for feature extraction because AdSTNet is a refinement network without independent feature extraction capabilities. As a result, its performance is inherently constrained by the quality of the backbone.
Future research can focus on combining AdSTNet with a backbone designed explicitly for medical image optimization, such as adding to the backbone network structure that combines domain-specific prior knowledge or multi-scale feature fusion. Additionally, refining the adaptive scale threshold map’s generation could improve performance in noisy or highly imbalanced scenarios. These improvements would expand AdSTNet’s applicability to broader datasets and strengthen its potential for deployment in clinical settings, enhancing diagnostic accuracy and efficiency.

6. Conclusions

In this paper, we propose a novel AdSTNet that acts as a post-processing lightweight network for enhancing small lesion and edge segmentation. During training, AdSTNet performs shrinking and dilating operations on lesion regions, generating body and edge threshold map labels according to the ground-truth label. Based on pixel-level probability analysis, AdSTNet effectively improves the model's localization and edge perception capabilities during inference, reducing missed and false detections. In addition, AdSTNet integrates SACA, the Laplacian operator, and ConvGLU to refine feature representation while maintaining computational efficiency. Experimental results demonstrate its effectiveness across multiple datasets. In the future, we will explore multi-scale fusion, deep attention mechanisms, and real-time applications across modalities like CT and MRI, ensuring AdSTNet's scalability as a universal medical image segmentation enhancement module.

Author Contributions

Conceptualization, Q.C. and W.W.; methodology, Q.C. and M.Z.; software, Q.C.; validation, Q.C. and H.J.; formal analysis, Z.W.; investigation, Q.C.; resources, W.W.; data curation, H.J.; writing—original draft preparation, Q.C.; writing—review and editing, Q.C., W.W. and M.Z.; visualization, Q.C. and H.J.; supervision, W.W.; project administration, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We clarify that our research findings are based on the analysis of publicly available datasets: International Skin Imaging Collaboration (ISIC): https://challenge2018.isic-archive.com/ (accessed on 3 March 2025); Breast Ultrasound Image (BUSI): https://scholar.cu.edu.eg/?q=afahmy/pages/dataset (accessed on 5 March 2025), and Kvasir-SEG: https://datasets.simula.no/kvasir-seg/ (accessed on 5 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AdSTNet: Adaptive Scale Thresholding Network
SACA: Spatial Attention and Channel Attention
ConvGLU: Convolutional Gated Linear Unit
ESABL: Edge Spatial Attention Block
EM: Electron Microscopy
GPC: Graphics Processing Cluster
GPU: Graphics Processing Unit
MPI: Message Passing Interface
DoG: Difference of Gaussians
GN: Group Normalization
GWC: Group-wise Convolution
PWC: Point-wise Convolution
LoG: Laplacian of Gaussian
GLU: Gated Linear Unit
DWConv: Depth-wise Convolution
BCE: Binary Cross-entropy
ROI: Region of Interest

References

  1. Park, H.; Lee, H.J.; Kim, H.G.; Ro, Y.M.; Shin, D.; Lee, S.R.; Kim, S.H.; Kong, M. Endometrium segmentation on transvaginal ultrasound image using key-point discriminator. Med. Phys. 2019, 46, 3974–3984. [Google Scholar] [CrossRef]
  2. Arthi, M.; Sindal, M.D.; Rashmita, R. Hyperreflective foci as biomarkers for inflammation in diabetic macular edema: Retrospective analysis of treatment naïve eyes from south India. Indian J. Ophthalmol. 2021, 69, 1197–1202. [Google Scholar] [CrossRef] [PubMed]
  3. Chung, Y.R.; Kim, Y.H.; Ha, S.J.; Byeon, H.E.; Cho, C.H.; Kim, J.H.; Lee, K. Role of inflammation in classification of diabetic macular edema by optical coherence tomography. J. Diabetes Res. 2019, 2019, 8164250. [Google Scholar] [CrossRef] [PubMed]
  4. Huang, H.; Zhu, L.; Zhu, W.; Lin, T.; Los, L.I.; Yao, C.; Chen, X.; Chen, H. Algorithm for detection and quantification of hyperreflective dots on optical coherence tomography in diabetic macular edema. Front. Med. 2021, 8, 688986. [Google Scholar] [CrossRef] [PubMed]
  5. Qin, S.; Zhang, C.; Qin, H.; Xie, H.; Luo, D.; Qiu, Q.; Liu, K.; Zhang, J.; Xu, G.; Zhang, J. Hyperreflective foci and subretinal fluid are potential imaging biomarkers to evaluate anti-VEGF effect in diabetic macular edema. Front. Physiol. 2021, 12, 791442. [Google Scholar] [CrossRef]
  6. Jin, K.; Huang, X.; Zhou, J.; Li, Y.; Yan, Y.; Sun, Y.; Zhang, Q.; Wang, Y.; Ye, J. Fives: A fundus image dataset for artificial Intelligence based vessel segmentation. Sci. Data 2022, 9, 475. [Google Scholar] [CrossRef]
  7. Codella, N.; Rotemberg, V.; Tschandl, P.; Celebi, M.E.; Dusza, S.; Gutman, D.; Helba, B.; Kalloo, A.; Liopyris, K.; Marchetti, M.; et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic). arXiv 2019, arXiv:1902.03368. [Google Scholar] [CrossRef]
  8. Ali, S.; Jha, D.; Ghatwary, N.; Realdon, S.; Cannizzaro, R.; Salem, O.E.; Lamarque, D.; Daul, C.; Riegler, M.A.; Anonsen, K.V.; et al. A multi-centre polyp detection and segmentation dataset for generalisability assessment. Sci. Data 2023, 10, 75. [Google Scholar] [CrossRef]
  9. Quinton, F.; Popoff, R.; Presles, B.; Leclerc, S.; Meriaudeau, F.; Nodari, G.; Lopez, O.; Pellegrinelli, J.; Chevallier, O.; Ginhac, D.; et al. A tumour and liver automatic segmentation (atlas) dataset on contrast-enhanced magnetic resonance imaging for hepatocellular carcinoma. Data 2023, 8, 79. [Google Scholar] [CrossRef]
  10. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  11. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar] [CrossRef]
  12. Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. Unetr: Transformers for 3d medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 574–584. [Google Scholar]
  13. Chen, R.; Wang, X.; Jin, B.; Tu, J.; Zhu, F.; Li, Y. CLD-Net: Complement local detail for medical small-object segmentation. In Proceedings of the 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 6–8 December 2022; pp. 942–947. [Google Scholar]
  14. Cao, R.; Ning, L.; Zhou, C.; Wei, P.; Ding, Y.; Tan, D.; Zheng, C. Cfanet: Context feature fusion and attention mechanism based network for small target segmentation in medical images. Sensors 2023, 23, 8739. [Google Scholar] [CrossRef]
  15. Cheng, B.; Girshick, R.; Dollár, P.; Berg, A.C.; Kirillov, A. Boundary IoU: Improving object-centric image segmentation evaluation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15334–15342. [Google Scholar]
  16. Allah, A.M.G.; Sarhan, A.M.; Elshennawy, N.M. Edge U-Net: Brain tumor segmentation using MRI based on deep U-Net model with boundary information. Expert Syst. Appl. 2023, 213, 118833. [Google Scholar] [CrossRef]
  17. Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654. [Google Scholar] [CrossRef] [PubMed]
  18. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018; Proceedings 4. Springer: Cham, Switzerland, 2018; pp. 3–11. [Google Scholar]
  19. Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual u-net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753. [Google Scholar] [CrossRef]
  20. Isensee, F.; Jaeger, P.F.; Kohl, S.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211. [Google Scholar] [CrossRef] [PubMed]
  21. Guan, S.; Khan, A.A.; Sikdar, S.; Chitnis, P.V. Fully dense UNet for 2-D sparse photoacoustic tomography artifact removal. IEEE J. Biomed. Health Inform. 2019, 24, 568–576. [Google Scholar] [CrossRef]
  22. Jha, D.; Smedsrud, P.H.; Riegler, M.A.; Johansen, D.; De Lange, T.; Halvorsen, P.; Johansen, H.D. Resunet++: An advanced architecture for medical image segmentation. In Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 9–11 December 2019; pp. 225–2255. [Google Scholar]
  23. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; Proceedings, Part II 19. Springer: Cham, Switzerland, 2016; pp. 424–432. [Google Scholar]
  24. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  25. He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
  26. Valanarasu, J.M.J.; Patel, V.M. Unext: Mlp-based rapid medical image segmentation network. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; Springer: Cham, Switzerland, 2022; pp. 23–33. [Google Scholar]
  27. Zhang, Y.; Liu, H.; Hu, Q. Transfuse: Fusing transformers and cnns for medical image segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Proceedings, Part I 24. Springer: Cham, Switzerland, 2021; pp. 14–24. [Google Scholar]
  28. Lin, A.; Chen, B.; Xu, J.; Zhang, Z.; Lu, G.; Zhang, D. Ds-transunet: Dual swin transformer u-net for medical image segmentation. IEEE Trans. Instrum. Meas. 2022, 71, 1–15. [Google Scholar] [CrossRef]
  29. Heidari, M.; Kazerouni, A.; Soltany, M.; Azad, R.; Aghdam, E.K.; Cohen-Adad, J.; Merhof, D. Hiformer: Hierarchical multi-scale representations using transformers for medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 6202–6212. [Google Scholar]
  30. Ma, X.; Du, Y.; Sui, D. A U-Shaped Architecture Based on Hybrid CNN and Mamba for Medical Image Segmentation. Appl. Sci. 2025, 15, 7821. [Google Scholar] [CrossRef]
  31. Huang, W.; Cai, X.; Yan, Y.; Kang, Y. MA-DenseUNet: A Skin Lesion Segmentation Method Based on Multi-Scale Attention and Bidirectional LSTM. Appl. Sci. 2025, 15, 6538. [Google Scholar] [CrossRef]
  32. Kong, L.; Wei, Q.; Xu, C.; Chen, H.; Fu, Y. EFCNet: Every Feature Counts for Small Medical Object Segmentation. arXiv 2024, arXiv:2406.18201. [Google Scholar] [CrossRef]
  33. Jiang, Y.; Zhang, Z.; Qin, S.; Guo, Y.; Li, Z.; Cui, S. APAUNet: Axis projection attention UNet for small target in 3D medical segmentation. In Proceedings of the Asian Conference on Computer Vision, Macao, China, 4–8 December 2022; pp. 283–298. [Google Scholar]
  34. Russo, C.; Bria, A.; Marrocco, C. GravityNet for end-to-end small lesion detection. Artif. Intell. Med. 2024, 150, 102842. [Google Scholar] [CrossRef]
  35. Dai, W.; Liu, R.; Wu, Z.; Wu, T.; Wang, M.; Zhou, J.; Yuan, Y.; Liu, J. Exploiting Scale-Variant Attention for Segmenting Small Medical Objects. arXiv 2024, arXiv:2407.07720. [Google Scholar]
  36. Ma, Y.; Ren, F.; Li, W.; Yu, N.; Zhang, D.; Li, Y.; Ke, M. IHA-Net: An automatic segmentation framework for computer-tomography of tiny intracerebral hemorrhage based on improved attention U-net. Biomed. Signal Process. Control 2023, 80, 104320. [Google Scholar] [CrossRef]
  37. Zhang, Z.; Fu, H.; Dai, H.; Shen, J.; Pang, Y.; Shao, L. Et-net: A generic edge-attention guidance network for medical image segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, 13–17 October 2019; Proceedings, Part I 22. Springer: Cham, Switzerland, 2019; pp. 442–450. [Google Scholar]
  38. Wang, K.; Zhang, X.; Zhang, X.; Lu, Y.; Huang, S.; Yang, D. EANet: Iterative edge attention network for medical image segmentation. Pattern Recognit. 2022, 127, 108636. [Google Scholar] [CrossRef]
  39. Hu, H.; Zhang, J.; Yang, T.; Hu, Q.; Yu, Y.; Huang, Q. PATrans: Pixel-Adaptive Transformer for edge segmentation of cervical nuclei on small-scale datasets. Comput. Biol. Med. 2024, 168, 107823. [Google Scholar] [CrossRef]
  40. Vatti, B.R. A generic solution to polygon clipping. Commun. ACM 1992, 35, 56–63. [Google Scholar] [CrossRef]
  41. Kuipers, T.; Doubrovski, E.L.; Wu, J.; Wang, C.C. A Framework for Adaptive Width Control of Dense Contour-Parallel Toolpaths in Fused Deposition Modeling. Comput.-Aided Des. 2020, 128, 102907. [Google Scholar] [CrossRef]
  42. Puri, S.; Prasad, S. A Parallel Algorithm for Clipping Polygons with Improved Bounds and a Distributed Overlay Processing System Using MPI. In Proceedings of the 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015, Shenzhen, China, 4–7 May 2015; pp. 576–585. [Google Scholar] [CrossRef]
  43. Sobel, I.; Feldman, G. A 3 × 3 Isotropic Gradient Operator for Image Processing. A Talk at the Stanford Artificial Project. 1968, pp. 271–272. Available online: https://www.researchgate.net/publication/285159837_A_33_isotropic_gradient_operator_for_image_processing (accessed on 28 December 2024).
  44. Prewitt, J.M. Object enhancement and extraction. Pict. Process. Psychopictorics 1970, 10, 15–19. [Google Scholar]
  45. Wu, Y.; He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  46. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  47. Shi, D. Transnext: Robust foveal visual perception for vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 17773–17783. [Google Scholar]
  48. Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 933–941. [Google Scholar]
  49. Shazeer, N. Glu variants improve transformer. arXiv 2020, arXiv:2002.05202. [Google Scholar] [CrossRef]
  50. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  51. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863. [Google Scholar] [CrossRef] [PubMed]
  52. Jha, D.; Smedsrud, P.H.; Riegler, M.A.; Halvorsen, P.; De Lange, T.; Johansen, D.; Johansen, H.D. Kvasir-seg: A segmented polyp dataset. In Proceedings of the MultiMedia Modeling: 26th International Conference, MMM 2020, Daejeon, Republic of Korea, 5–8 January 2020; Proceedings, Part II 26. Springer: Cham, Switzerland, 2020; pp. 451–462. [Google Scholar]
  53. Zhou, Y.; Li, L.; Wang, C.; Song, L.; Yang, G. GobletNet: Wavelet-Based High-Frequency Fusion Network for Semantic Segmentation of Electron Microscopy Images. IEEE Trans. Med. Imaging 2024, 44, 1058–1069. [Google Scholar] [CrossRef]
  54. Munia, A.A.; Abdar, M.; Hasan, M.; Jalali, M.S.; Banerjee, B.; Khosravi, A.; Hossain, I.; Fu, H.; Frangi, A.F. Attention-guided hierarchical fusion U-Net for uncertainty-driven medical image segmentation. Inf. Fusion 2025, 115, 102719. [Google Scholar] [CrossRef]
  55. Valanarasu, J.M.J.; Oza, P.; Hacihaliloglu, I.; Patel, V.M. Medical transformer: Gated axial-attention for medical image segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Proceedings, Part I 24. Springer: Cham, Switzerland, 2021; pp. 36–46. [Google Scholar]
Figure 1. Two representative datasets of medical images are provided. The first row (a) represents images of skin lesions in dermoscopy. The second row (b) shows colorectal polyp endoscopic images. In each image, the green line represents the edge of the target area, which is manually labeled.
Figure 2. The architecture of AdSTNet. The input image first passes through a backbone network with feature extraction capabilities, such as a U-shaped encoder–decoder structure. After obtaining the feature map F, it undergoes a process of shrinking and dilating to obtain two new threshold maps, which are fused through the Fusion Enhancement module. Finally, the fused feature map assists the probability map P_pred in predicting the final segmentation result. In addition, the training phase includes the processes marked by black and blue lines, while the inference phase includes only the processes marked by black lines.
Figure 3. The labels required for training. (a) is the original ground-truth label. (b,c) are new labels obtained by shrinking and dilating the external polygon of the original label by the same offset D, corresponding to the T_body and T_edge threshold maps, respectively.
Figure 4. (a) The complete architecture of SACA integrating spatial attention (SA) and channel attention (CA) and the exact location of our SACA module in the network. (b) The architecture of the spatial attention module and channel attention module.
Figure 5. A block diagram of the Fusion Enhancement module.
Figure 6. A visualized comparison of different methods on the ISIC 2018 dataset. We present the results of two samples in Sections I and II, respectively. Specifically, the first column is the input image and ground truth. The remaining columns show the results of UNet, UNet++, UNeXt, HiFormer, MedSAM, GobletNet, and AHF-UNet, respectively. The first and third rows show the methods without AdSTNet; the second and fourth rows show the methods with AdSTNet.
Figure 7. A visualized comparison of different methods on the BUSI dataset. We present the results of two samples in Sections I and II, respectively. Specifically, the first column is the input image and ground truth. The remaining columns show the results of UNet, UNet++, UNeXt, HiFormer, MedSAM, GobletNet, and AHF-UNet, respectively. The first and third rows show the methods without AdSTNet; the second and fourth rows show the methods with AdSTNet.
Figure 8. A visualized comparison of different methods on the Kvasir-SEG dataset. We present the results of two samples in Sections I and II, respectively. Specifically, the first column is the input image and ground truth. The remaining columns show the results of UNet, UNet++, UNeXt, HiFormer, MedSAM, GobletNet, and AHF-UNet, respectively. The first and third rows show the methods without AdSTNet; the second and fourth rows show the methods with AdSTNet.
Figure 9. The visualizations of the ablation experiments. Section I shows the results on BUSI, Section II the results on ISIC, and Section III the results on Kvasir-SEG. The first column of each part is the input image and ground truth. The remaining columns display the ablation results of the different modules.
Figure 10. The visualization results on (a) ISIC 2018 and (b) Kvasir-SEG. They include the input image, ground truth, adaptive scale threshold map, and lesion visualization map.
Table 1. Summary of medical image segmentation datasets.
Dataset | Domain | Number of Images | Image Resolution | Focus Area
ISIC 2018 | Dermoscopy | 2594 | Varies | Skin lesion segmentation
BUSI | Ultrasound | 780 | 500 × 500 | Breast tumor segmentation
Kvasir-SEG | Endoscopy | 1000 | 332 × 487 to 1920 × 1072 | Polyp segmentation
Table 2. Experiments on ISIC 2018. “Param” and “Time” are short for parameter count and average segmentation time. “Rec.” and “Pre.” are short for Recall and Precision. The best outcomes for each method are indicated in bold.
Methods | Without AdSTNet | With AdSTNet
  | Param (M) | Time (s) | mIoU (%) | mDice (%) | Rec. (%) | Pre. (%) | Param (M) | Time (s) | mIoU (%) | mDice (%) | Rec. (%) | Pre. (%)
UNet [10] | 2.1621 | 2.97 | 81.84 | 88.84 | 90.90 | 90.21 | 2.1660 | 3.26 | 83.40 | 90.24 | 92.29 | 90.58
UNet++ [18] | 9.1633 | 4.55 | 81.92 | 88.97 | 90.88 | 90.33 | 9.1672 | 5.09 | 84.35 | 90.87 | 92.99 | 90.91
UNeXt [26] | 1.4719 | 3.32 | 80.87 | 88.19 | 90.76 | 89.38 | 1.4758 | 4.01 | 82.96 | 89.96 | 90.37 | 91.99
HiFormer [29] | 25.5109 | 11.62 | 82.99 | 89.55 | 90.41 | 91.50 | 25.5148 | 12.93 | 85.01 | 91.37 | 93.69 | 90.85
MedSAM [17] | 93.7354 | 31.94 | 86.51 | 92.43 | 94.69 | 90.65 | 93.7394 | 43.39 | 86.81 | 92.71 | 96.55 | 90.88
GobletNet [53] | 133.2482 | 9.30 | 83.63 | 90.36 | 91.59 | 91.57 | 133.2521 | 10.11 | 84.92 | 91.27 | 90.53 | 93.81
AHF-UNet [54] | 37.6676 | 5.60 | 81.96 | 88.89 | 89.17 | 92.05 | 37.6716 | 6.10 | 83.82 | 90.35 | 92.14 | 90.81
Table 3. Experiments on BUSI. “Rec.” and “Pre.” are short for Recall and Precision. The best outcomes for each method are indicated in bold.
Methods | Without AdSTNet | With AdSTNet
  | mIoU (%) | mDice (%) | Rec. (%) | Pre. (%) | mIoU (%) | mDice (%) | Rec. (%) | Pre. (%)
UNet [10] | 69.93 | 78.53 | 83.02 | 77.36 | 71.66 | 80.32 | 84.32 | 80.56
UNet++ [18] | 70.45 | 79.32 | 85.20 | 77.65 | 72.03 | 80.03 | 83.07 | 80.99
UNeXt [26] | 66.47 | 75.87 | 83.54 | 72.04 | 71.32 | 81.01 | 82.20 | 83.99
HiFormer [29] | 72.37 | 81.24 | 84.14 | 81.84 | 76.35 | 84.52 | 86.70 | 85.31
MedSAM [17] | 77.78 | 86.96 | 89.86 | 84.71 | 78.11 | 87.25 | 90.69 | 88.29
GobletNet [53] | 74.67 | 82.45 | 82.08 | 85.63 | 77.32 | 85.41 | 82.72 | 91.00
AHF-UNet [54] | 72.47 | 80.96 | 82.84 | 82.60 | 76.41 | 84.97 | 83.73 | 89.08
Table 4. Experiments on Kvasir-SEG. “Rec.” and “Pre.” are short for Recall and Precision. The best outcomes for each method are indicated in bold.
Methods | Without AdSTNet | With AdSTNet
  | mIoU (%) | mDice (%) | Rec. (%) | Pre. (%) | mIoU (%) | mDice (%) | Rec. (%) | Pre. (%)
UNet [10] | 71.99 | 80.41 | 84.79 | 81.30 | 73.08 | 81.91 | 84.53 | 82.93
UNet++ [18] | 75.80 | 83.65 | 87.22 | 84.90 | 78.26 | 86.50 | 88.88 | 87.26
UNeXt [26] | 66.72 | 76.01 | 83.11 | 75.57 | 67.98 | 77.73 | 85.32 | 75.44
HiFormer [29] | 81.26 | 88.44 | 90.29 | 89.93 | 83.47 | 90.30 | 88.97 | 93.50
MedSAM [17] | 74.92 | 84.57 | 86.76 | 83.43 | 75.12 | 84.95 | 89.79 | 83.89
GobletNet [53] | 76.52 | 84.22 | 84.76 | 88.36 | 77.32 | 85.17 | 85.29 | 88.06
AHF-UNet [54] | 77.88 | 85.58 | 86.03 | 89.37 | 79.19 | 86.50 | 86.51 | 88.24
Table 5. Ablation experiment results of UNet and HiFormer on ISIC. “✓” means “with”.
Backbone | TwoConvs | SACA | Laplacian | ConvGLU | mIoU (%) | mDice (%)
UNet | - | - | - | - | 81.84 | 88.84
UNet | ✓ | - | - | - | 82.05 | 89.19
UNet | ✓ | ✓ | - | - | 82.47 | 89.51
UNet | ✓ | - | ✓ | - | 82.44 | 89.50
UNet | ✓ | - | - | ✓ | 81.96 | 89.16
UNet | - | ✓ | ✓ | ✓ | 83.40 | 90.24
HiFormer | - | - | - | - | 82.29 | 89.55
HiFormer | ✓ | - | - | - | 83.40 | 90.20
HiFormer | ✓ | ✓ | - | - | 84.14 | 90.72
HiFormer | ✓ | - | ✓ | - | 84.37 | 90.97
HiFormer | ✓ | - | - | ✓ | 83.49 | 90.23
HiFormer | - | ✓ | ✓ | ✓ | 85.01 | 91.37
Table 6. Ablation experiment results of UNet and HiFormer on BUSI. “✓” means “with”.
Backbone | TwoConvs | SACA | Laplacian | ConvGLU | mIoU (%) | mDice (%)
UNet | - | - | - | - | 69.93 | 78.53
UNet | ✓ | - | - | - | 69.97 | 79.17
UNet | ✓ | ✓ | - | - | 70.54 | 79.51
UNet | ✓ | - | ✓ | - | 71.27 | 80.76
UNet | ✓ | - | - | ✓ | 70.30 | 79.44
UNet | - | ✓ | ✓ | ✓ | 71.66 | 80.32
HiFormer | - | - | - | - | 72.37 | 81.24
HiFormer | ✓ | - | - | - | 73.05 | 82.11
HiFormer | ✓ | ✓ | - | - | 75.27 | 84.55
HiFormer | ✓ | - | ✓ | - | 75.18 | 84.51
HiFormer | ✓ | - | - | ✓ | 74.02 | 83.17
HiFormer | - | ✓ | ✓ | ✓ | 76.35 | 84.52
Table 7. Ablation experiment results of UNet and HiFormer on Kvasir-SEG. “✓” means “with”.
Backbone | TwoConvs | SACA | Laplacian | ConvGLU | mIoU (%) | mDice (%)
UNet | - | - | - | - | 71.99 | 80.41
UNet | ✓ | - | - | - | 72.15 | 81.26
UNet | ✓ | ✓ | - | - | 72.43 | 81.24
UNet | ✓ | - | ✓ | - | 72.61 | 81.54
UNet | ✓ | - | - | ✓ | 72.28 | 81.44
UNet | - | ✓ | ✓ | ✓ | 73.08 | 81.91
HiFormer | - | - | - | - | 81.26 | 88.44
HiFormer | ✓ | - | - | - | 82.81 | 89.88
HiFormer | ✓ | ✓ | - | - | 83.07 | 89.51
HiFormer | ✓ | - | ✓ | - | 81.58 | 88.96
HiFormer | ✓ | - | - | ✓ | 83.13 | 90.19
HiFormer | - | ✓ | ✓ | ✓ | 83.47 | 90.30
Table 8. Comparison of gradient operators on the ISIC dataset.
Methods (ISIC) | mIoU (%) | mDice (%)
UNet | 81.84 | 88.84
Sobel | 83.28 | 90.12
Prewitt | 82.94 | 89.96
Roberts | 82.98 | 90.03
UNet-AdSTNet | 83.40 | 90.24
Table 9. Evaluation of hidden layers. UNet is the baseline. “w/” is short for “with”.
Method | ISIC mIoU (%) | ISIC mDice (%) | BUSI mIoU (%) | BUSI mDice (%)
UNet [10] | 81.84 | 88.84 | 69.93 | 78.53
w/AdSTNet-Small | 82.53 | 89.54 | 70.78 | 80.11
w/AdSTNet-Base | 83.40 | 90.24 | 71.66 | 80.32
w/AdSTNet-Large | 84.31 | 90.89 | 72.94 | 81.92
