4.3.1. Quantitative Experiments
To verify the effectiveness and robustness of AdSTNet, we evaluated it with seven different backbones: UNet, UNet++, UNeXt, HiFormer, MedSAM, GobletNet, and AHF-UNet. These backbones were selected for their significance and strong performance in the field, as well as for the distinct architectures and purposes they represent. Specifically, UNet and UNet++ are classical CNN-based architectures, UNeXt is a lightweight network combining CNN and MLP, HiFormer represents a recent hybrid CNN-Transformer architecture, MedSAM is a widely used large-scale model for medical image segmentation, AHF-UNet employs attention modules and spatial channel attention mechanisms to improve the accuracy of medical image segmentation, and GobletNet uses wavelet transforms and attention mechanisms to enhance segmentation performance. The experimental results for all networks with and without AdSTNet are presented in Table 2, Table 3 and Table 4.
Parameter Quantity and Average Segmentation Time:
Table 2 shows the impact of integrating AdSTNet on the number of model parameters. Because AdSTNet has a lightweight architecture and most of its operations are performed during the training phase, it adds little computational and storage overhead. For instance, the parameter count of UNet increased from 2.1621 M to 2.1660 M, and that of UNet++ rose from 9.1633 M to 9.1672 M; the other models show the same trend. Modules such as SACA, the Laplacian operator, and ConvGLU enhance feature extraction and segmentation accuracy without significantly increasing model size. This makes AdSTNet particularly suitable for scenarios where memory efficiency is essential, such as deployment on devices with limited computational resources.
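To put the two quoted parameter counts in perspective, the short sketch below computes the absolute and relative increase for UNet and UNet++. It uses only the four values stated above; the function name `overhead` is illustrative and not part of AdSTNet. Because the counts are reported to four decimal places in millions, the absolute figures are accurate only to about a hundred parameters.

```python
# Parameter counts in millions (M), copied from the text above:
# (without AdSTNet, with AdSTNet).
params_m = {
    "UNet": (2.1621, 2.1660),
    "UNet++": (9.1633, 9.1672),
}


def overhead(before_m, after_m):
    """Return (added parameters, relative increase in percent)."""
    added = round((after_m - before_m) * 1e6)
    relative = round((after_m - before_m) / before_m * 100.0, 3)
    return added, relative


for name, (before_m, after_m) in params_m.items():
    added, relative = overhead(before_m, after_m)
    print(f"{name}: +{added} parameters ({relative}% of the baseline)")
```

For both backbones the increase is about 3,900 parameters, which is below 0.2% of the baseline size in each case.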
The average segmentation time, measured per image during inference, increased after integrating AdSTNet. Except for MedSAM, which requires more time, the increase is slight for all models. The added time is mainly due to the additional processing by the Laplacian operator and ConvGLU, which enhance edge detection and feature refinement. The overhead nevertheless remains small owing to the efficiency of these modules and AdSTNet’s lightweight design.
ISIC: Experimental results on the ISIC dataset in Table 2 demonstrate that integrating AdSTNet consistently improves segmentation performance by enhancing both lesion localization and boundary detection. The adaptive scale threshold method is a critical component: it boosts segmentation accuracy using two distinct threshold maps, the body map and the edge map. The body map focuses on the core regions of lesions, enhancing feature representation for small or indistinct targets, while the edge map emphasizes boundary details, aiding the detection of blurred or complex edges. This dual-threshold strategy helps the network reduce false positives and missed detections.
The improvements in mIoU and mDice illustrate the impact of AdSTNet on the segmentation capability of the baseline networks. For instance, UNet’s mIoU increased from 81.84% to 83.40% and its mDice from 88.84% to 90.24%, indicating better region localization and segmentation precision. Similarly, HiFormer’s mIoU increased from 82.99% to 85.01%, with mDice reaching 91.37%. Even the robust MedSAM benefited from improved edge recognition and small lesion detection. GobletNet’s mIoU improved from 83.63% to 84.92% and its mDice from 90.36% to 91.27%, while AHF-UNet’s mIoU increased from 81.96% to 83.82% and its mDice from 88.89% to 90.35%, further confirming AdSTNet’s adaptability in boosting high-performing models.
Regarding Precision and Recall, AdSTNet balances these metrics by enhancing sensitivity to low-contrast areas and reducing false positives. For instance, UNet’s Precision improved from 90.90% to 92.29%, indicating reduced over-prediction in non-lesion regions. HiFormer achieved Precision and Recall scores of 93.69% and 90.85%, demonstrating AdSTNet’s robustness in complex backgrounds. Although GobletNet’s Recall decreased slightly, by 1.06%, its Precision increased from 91.57% to 93.81%.
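For reference, the four metrics discussed here can be computed from pixel-level true positives (TP), false positives (FP), and false negatives (FN). The sketch below uses the standard definitions for a single pair of binary masks. It is not the evaluation code behind the tables: this section does not state how the per-image scores are averaged, so reading mIoU and mDice as means of the per-image `iou` and `dice` values is an assumption, and the function names are illustrative.

```python
def confusion_counts(pred, target):
    """Count TP, FP and FN for two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def segmentation_metrics(pred, target, eps=1e-7):
    """Standard IoU, Dice, Precision and Recall for one predicted mask."""
    tp, fp, fn = confusion_counts(pred, target)
    return {
        "iou": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),  # penalizes false positives
        "recall": tp / (tp + fn + eps),     # penalizes missed lesion pixels
    }


# Toy example: 2 correct lesion pixels, 1 false positive, 1 missed pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(segmentation_metrics(pred, target))
```

In the toy example, TP = 2, FP = 1, and FN = 1, so IoU is 2/4 and Dice, Precision, and Recall are each 2/3. A method that raises Recall by predicting more lesion pixels lowers Precision whenever some of the added pixels are false positives, which is the trade-off discussed in this section.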
BUSI: Table 3 highlights the performance differences of the segmentation models on the BUSI dataset before and after incorporating AdSTNet. AdSTNet’s inclusion balances recognition performance by enhancing the feature extraction and edge detection mechanisms. Because the BUSI dataset is small and its pathological features are unclear, accurate localization of the lesion area is crucial. The SACA module supports this by optimizing feature representation: it employs Group Normalization and cross-operation mechanisms to better capture spatial and channel-wise information. This enhanced attention mechanism distinguishes lesion features from background noise, particularly for small or low-contrast targets.
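As background for the normalization step mentioned above, the following is a minimal sketch of standard Group Normalization on a small feature map. It is not the SACA module, whose internals are not described in this section. The learnable scale and shift parameters are omitted for brevity, and the data layout (a list of channels, each a flat list of spatial values) is an assumption made to keep the example self-contained.

```python
import math


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize each group of channels to zero mean and unit variance.

    `channels` is a list of channels, each a flat list of floats.
    Statistics are shared by all channels and positions within a group.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    group_size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * group_size:(g + 1) * group_size]
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in channel] for channel in group])
    return out


# Four channels with two spatial positions each, split into two groups.
features = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [20.0, 20.0]]
normalized = group_norm(features, num_groups=2)
print(normalized)
```

Because the statistics are computed per sample over groups of channels rather than over the batch, the result does not depend on batch size. This is a general property of Group Normalization and is not a claim made by the source.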
The mIoU and mDice metrics show consistent improvements with AdSTNet across all models. For example, UNet’s mIoU increased from 69.93% to 71.66% and its mDice rose from 78.53% to 80.32%. HiFormer’s mIoU improved from 72.37% to 76.35%, with mDice increasing by approximately 3%, indicating AdSTNet’s effectiveness in enhancing high-performance networks. Integrating AdSTNet with GobletNet raised the mIoU from 74.67% to 77.32% and the mDice from 82.45% to 85.41%. Even UNeXt and MedSAM, despite their strong baselines, achieved further improvements in edge and small target recognition with AdSTNet integration. For AHF-UNet, all metrics showed noticeable improvements.
In terms of Precision and Recall, AdSTNet again balances recognition performance. Notably, UNeXt’s Recall improved from 72.04% to 83.99%, and HiFormer’s Recall increased to 85.31%, highlighting AdSTNet’s ability to capture blurred edges and small lesions while reducing missed detections. Meanwhile, the growth in Precision reflects AdSTNet’s role in improving localization accuracy. The improvements for GobletNet and AHF-UNet further demonstrate AdSTNet’s robustness in complex backgrounds.
Kvasir-SEG: Experimental results on the Kvasir-SEG dataset in Table 4 demonstrate AdSTNet’s ability to improve segmentation model performance, particularly in edge processing and overall segmentation accuracy. The data indicate that AdSTNet strengthens lesion region recognition, enhances segmentation robustness, and maintains adaptability across different network architectures.
On Kvasir-SEG, AdSTNet significantly improved edge processing and region localization, because integrating the Laplacian operator enhances edge detection. Polyp boundaries in the Kvasir-SEG dataset often resemble the background, so obtaining edge information is important. The Laplacian operator captures high-frequency information related to lesion boundaries, allowing the model to better distinguish fine edge details that are often blurred or poorly defined.
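The way a Laplacian filter isolates high-frequency boundary information can be illustrated with a small example. The sketch below applies the common 4-neighbour 3x3 Laplacian kernel to a toy image containing a vertical step edge. The kernel variant, the zero-valued border handling, and the function name are assumptions for illustration; this section does not specify which kernel AdSTNet uses or where in the network it is applied.

```python
# 4-neighbour discrete Laplacian kernel (one common choice; an assumption here).
LAPLACIAN_4 = [[0, 1, 0],
               [1, -4, 1],
               [0, 1, 0]]


def laplacian(image):
    """Apply the 3x3 Laplacian to a 2D list; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = float(sum(
                LAPLACIAN_4[dy + 1][dx + 1] * image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ))
    return out


# Toy image: dark background (0) on the left, bright region (1) on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]
for row in laplacian(image):
    print(row)
```

In the interior rows, the response is zero in the flat region and non-zero only in the two columns adjacent to the step, with opposite signs on either side. This zero-crossing behaviour is what makes the operator useful for locating boundaries whose intensity is close to that of the background.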
The mIoU and mDice metrics improve with AdSTNet across all models. For example, UNet’s mIoU increased from 71.99% to 73.08% and its mDice from 80.41% to 81.91%. HiFormer’s mIoU increased from 81.26% to 83.47%, and its mDice reached 90.30%, highlighting AdSTNet’s effectiveness in enhancing complex models. Integrating AdSTNet with GobletNet improved the mIoU from 76.52% to 77.32% and the mDice from 84.22% to 85.17%. For AHF-UNet, the mIoU rose from 77.88% to 79.19% and the mDice from 85.58% to 86.50%. MedSAM also benefited from improved segmentation performance, confirming AdSTNet’s compatibility with advanced architectures.
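To make the size of these gains explicit, the sketch below computes the mIoU improvement in percentage points for the four backbones whose before and after values are quoted in this paragraph. The numbers are copied from the text; the dictionary layout and function name are illustrative.

```python
# Kvasir-SEG mIoU in percent: (without AdSTNet, with AdSTNet), from the text.
kvasir_miou = {
    "UNet": (71.99, 73.08),
    "HiFormer": (81.26, 83.47),
    "GobletNet": (76.52, 77.32),
    "AHF-UNet": (77.88, 79.19),
}


def gains(results):
    """Return the gain in percentage points for each model."""
    return {name: round(after - before, 2)
            for name, (before, after) in results.items()}


for name, gain in gains(kvasir_miou).items():
    print(f"{name}: +{gain} points mIoU")
```

The quoted mIoU gains range from 0.80 points (GobletNet) to 2.21 points (HiFormer).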
For Precision and Recall, the effect of AdSTNet varies by model. HiFormer’s Recall increased from 89.93% to 93.50%, highlighting its strong sensitivity to complex boundaries and small regions. However, its Precision decreased slightly from 90.29% to 88.97%, suggesting a trade-off where greater sensitivity to positive samples may introduce minor false positives. In contrast, UNet++ improved on both metrics, with Precision and Recall increasing by 1.66% and 2.36%, respectively. GobletNet’s Recall improved from 84.76% to 85.29%, but its Precision decreased slightly from 88.36% to 88.06%, likely because increased sensitivity caused minor false positives. AHF-UNet also showed consistent improvements, demonstrating AdSTNet’s ability to enhance edge information while maintaining stable detection performance.
Multi-Dataset Experimental Analysis: From the perspective of dataset characteristics, ISIC primarily focuses on skin lesion segmentation, where target edges are often blurred and small lesions are prominent. AdSTNet demonstrates exceptional performance in enhancing blurred edges and improving small target detection, with significant improvements in mIoU and mDice across various models. Notably, in networks like HiFormer and UNet++, Recall showed a substantial increase, reducing missed detections in blurred edge regions. BUSI comprises breast ultrasound images with complex target boundaries and low contrast between small lesions and the background. The results show marked improvements in both Precision and Recall. For example, HiFormer’s Recall increased to 85.31%. Kvasir-SEG focuses on gastrointestinal lesion segmentation, where lesion size and shape vary significantly. For instance, UNet++ achieved an mIoU of 78.26% and an mDice of 86.50%, validating AdSTNet’s adaptability to diverse lesion shapes and sizes.
From a model perspective, AdSTNet consistently enhances performance across various architectures. For classical convolutional networks like UNet and UNet++, AdSTNet significantly improves edge feature representation, resulting in better segmentation performance across all datasets. For the lightweight network UNeXt, AdSTNet effectively boosts Recall and mIoU, addressing its limitations in detecting edge regions and small targets. Even for advanced architectures like HiFormer and MedSAM, which already have high baseline performance, AdSTNet provides additional optimization, particularly in enhancing edge recognition and small lesion detection.
AdSTNet employs the adaptive scale threshold mechanism, the SACA module, the Laplacian operator, and ConvGLU to enhance segmentation accuracy. The adaptive scale threshold mechanism provides dual-threshold maps for lesion cores and edges, improving localization and boundary detection. The SACA module enhances feature representation through efficient attention mechanisms, while the Laplacian operator improves edge clarity by emphasizing high-frequency details. ConvGLU then refines features by selectively controlling information flow within the network. This combination allows AdSTNet to improve performance significantly across diverse segmentation tasks. Moreover, AdSTNet’s lightweight design introduces minimal parameters during training, making it a versatile post-processing module compatible with various segmentation networks.
From Table 2, Table 3 and Table 4, we may observe slight decreases in Recall and Precision for some models after integrating AdSTNet. This is due to the adaptive scale threshold mechanism, which enhances edge detection and small lesion recognition by generating body and edge maps. While improving sensitivity to low-contrast and blurred regions, the mechanism may also introduce false positives, especially in complex backgrounds, leading to lower Precision. Conversely, if the mechanism is too conservative in suppressing noise or enhancing edges, it may overlook small or unclear lesions, reducing Recall. These findings highlight the need to carefully tune the adaptive scale threshold mechanism to balance lesion localization and boundary detection.
Furthermore, the improvements in some models were marginal due to their high accuracy and effective edge handling, leaving little room for optimization. High-precision models like MedSAM already excel in edge awareness and small target detection, limiting AdSTNet’s impact. However, AdSTNet’s flexibility and adaptability offer efficient, low-cost solutions for small lesion detection and complex edge processing, enhancing performance across various segmentation tasks.