Search Results (609)

Search Parameters:
Keywords = Swin transformer

43 pages, 8627 KB  
Article
Fault Diagnosis of Rolling Bearings Based on HFMD and Dual-Branch Parallel Network Under Acoustic Signals
by Hengdi Wang, Haokui Wang and Jizhan Xie
Sensors 2025, 25(17), 5338; https://doi.org/10.3390/s25175338 - 28 Aug 2025
Abstract
This paper proposes a rolling bearing fault diagnosis method based on HFMD and a dual-branch parallel network, aiming to address the loss of diagnostic accuracy caused by disparities in data quality across source domains due to sparse feature separation in rolling bearing acoustic signals. Traditional methods struggle with feature extraction, sensitivity to noise, and coupled multi-fault conditions. To overcome these challenges, this study first employs the HawkFish Optimization Algorithm to optimize Feature Mode Decomposition (HFMD) parameters, improving modal decomposition accuracy. The optimal modal components are selected by the minimum Residual Energy Index (REI) criterion, and their time-domain graphs and Continuous Wavelet Transform (CWT) time-frequency diagrams are extracted as network inputs. A dual-branch parallel network is then constructed: a multi-scale residual structure (Res2Net) incorporating the Efficient Channel Attention (ECA) mechanism serves as the temporal branch to extract key features and suppress noise interference, while a Swin Transformer integrating multi-stage cross-scale attention (MSCSA) acts as the time-frequency branch to overcome local-perception bottlenecks and enhance classification performance under limited resources. Finally, the time-domain and time-frequency graphs are input into Res2Net and the Swin Transformer, respectively, and the features from both branches are fused through a fully connected layer to obtain comprehensive fault diagnosis results. The proposed method achieves 100% accuracy on open-source datasets; on the experimental data it reaches 98.5%, a clear advantage over the other diagnostic models compared. Under few-shot conditions, accuracy stays at or above 95%, with only a 2.34% variation. HFMD and the dual-branch parallel network thus exhibit remarkable stability and superiority in rolling bearing fault diagnosis.
(This article belongs to the Section Fault Diagnosis & Sensors)
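A minimal sketch of the dual-branch fusion described above (PyTorch; the branch modules, feature dimension, and class count are placeholder assumptions, not the authors' code): each branch embeds its input image, and the concatenated embeddings go through a fully connected classification head.

```python
import torch
import torch.nn as nn

class DualBranchNet(nn.Module):
    """Hypothetical sketch: a temporal branch (e.g., Res2Net+ECA) consumes
    time-domain graphs, a time-frequency branch (e.g., Swin+MSCSA) consumes
    CWT diagrams; features are fused by a fully connected layer."""
    def __init__(self, temporal_branch, timefreq_branch, feat_dim=512, n_classes=4):
        super().__init__()
        self.temporal = temporal_branch   # maps (B, C, H, W) -> (B, feat_dim)
        self.timefreq = timefreq_branch   # maps (B, C, H, W) -> (B, feat_dim)
        self.head = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, x_time, x_tf):
        f = torch.cat([self.temporal(x_time), self.timefreq(x_tf)], dim=1)
        return self.head(f)
```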
13 pages, 2141 KB  
Article
Transformer-Based Semantic Segmentation of Japanese Knotweed in High-Resolution UAV Imagery Using Twins-SVT
by Sruthi Keerthi Valicharla, Roghaiyeh Karimzadeh, Xin Li and Yong-Lak Park
Information 2025, 16(9), 741; https://doi.org/10.3390/info16090741 - 28 Aug 2025
Abstract
Japanese knotweed (Fallopia japonica) is a noxious invasive plant species that requires scalable and precise monitoring methods. Current visual ground surveys are resource-intensive and inefficient for detecting Japanese knotweed across landscapes. This study presents a transformer-based semantic segmentation framework for the automated detection of Japanese knotweed patches in high-resolution RGB imagery acquired with unmanned aerial vehicles (UAVs). We used the Twins Spatially Separable Vision Transformer (Twins-SVT), whose hierarchical architecture with spatially separable self-attention effectively models long-range dependencies and multiscale contextual features. The model was trained on 6945 annotated aerial images collected at three sites infested with Japanese knotweed in West Virginia, USA. The proposed framework achieved superior performance compared to other transformer-based baselines: the Twins-SVT model reached a mean Intersection over Union (mIoU) of 94.94% and an Average Accuracy (AAcc) of 97.50%, outperforming SegFormer, Swin-T, and ViT. These findings highlight the model's ability to accurately distinguish Japanese knotweed patches from surrounding vegetation. The method and protocol presented here provide a robust, scalable solution for mapping Japanese knotweed from aerial imagery and demonstrate the successful use of advanced vision transformers in ecological and geospatial information analysis.
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence with Applications)
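For readers checking the reported segmentation numbers, here is a minimal sketch of the mIoU computation (NumPy; the confusion-matrix layout, rows as ground truth, is an assumption):

```python
import numpy as np

def mean_iou(conf):
    """conf: (K, K) confusion matrix, rows = ground truth, cols = prediction."""
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    iou = inter / np.where(union == 0, np.nan, union)  # skip absent classes
    return np.nanmean(iou)
```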
43 pages, 3603 KB  
Article
Fault Diagnosis of Rolling Bearing Acoustic Signal Under Strong Noise Based on WAA-FMD and LGAF-Swin Transformer
by Hengdi Wang, Haokui Wang, Jizhan Xie and Zikui Ma
Processes 2025, 13(9), 2742; https://doi.org/10.3390/pr13092742 - 27 Aug 2025
Abstract
To address the low diagnostic accuracy arising from the non-stationary and nonlinear time-varying characteristics of acoustic signals in rolling bearing fault diagnosis, as well as their susceptibility to noise interference, this paper proposes a fault diagnosis method based on Weighted Average Algorithm–Feature Mode Decomposition (WAA-FMD) and a Local–Global Adaptive Multi-scale Attention Mechanism (LGAF)–Swin Transformer. First, the WAA is utilized to optimize the key parameters of FMD, enhancing its signal decomposition performance while minimizing noise interference. Next, a bilateral expansion strategy extends both the time window and the frequency band of the signal, improving the temporal locality and frequency globality of the time–frequency diagram and significantly enhancing the ability to capture signal features. Finally, depthwise separable convolution optimizes the receptive field and improves the computational efficiency of the shallow network layers. Combined with a Swin Transformer that incorporates LGAF and adaptive feature selection modules, the model further improves its perceptual capability and feature extraction accuracy through dynamic kernel adjustment and deep feature aggregation. The experimental results indicate that the signal denoising performance of WAA-FMD significantly outperforms traditional denoising techniques. On the KAIST dataset (NSK 6205: inner raceway fault and outer raceway fault) and the experimental dataset (FAG 30205: inner raceway fault, outer raceway fault, and rolling element fault), the accuracies of the proposed model reach 100% and 98.62%, respectively, both exceeding those of other deep learning models. In summary, the proposed method demonstrates substantial advantages in noise reduction performance and fault diagnosis accuracy, providing valuable theoretical insights for practical applications.
(This article belongs to the Section Process Control and Monitoring)
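The CWT-based time-frequency inputs mentioned above can be generated as follows; a minimal sketch with PyWavelets (the Morlet wavelet and scale range are assumptions, the paper's decomposition is FMD-based):

```python
import numpy as np
import pywt

def cwt_image(signal, fs, n_scales=128):
    """Return the |CWT| magnitude as a 2D time-frequency map of a 1D signal."""
    scales = np.arange(1, n_scales + 1)
    coeffs, freqs = pywt.cwt(signal, scales, 'morl', sampling_period=1.0 / fs)
    return np.abs(coeffs)  # shape: (n_scales, len(signal))
```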
37 pages, 3806 KB  
Article
Comparative Evaluation of CNN and Transformer Architectures for Flowering Phase Classification of Tilia cordata Mill. with Automated Image Quality Filtering
by Bogdan Arct, Bartosz Świderski, Monika A. Różańska, Bogdan H. Chojnicki, Tomasz Wojciechowski, Gniewko Niedbała, Michał Kruk, Krzysztof Bobran and Jarosław Kurek
Sensors 2025, 25(17), 5326; https://doi.org/10.3390/s25175326 - 27 Aug 2025
Abstract
Understanding and monitoring the phenological phases of trees is essential for ecological research and climate change studies. In this work, we present a comprehensive evaluation of state-of-the-art convolutional neural networks (CNNs) and transformer architectures for the automated classification of the flowering phase of Tilia cordata Mill. (small-leaved lime) based on a large set of real-world images acquired under natural field conditions. The study introduces a novel, automated image quality filtering approach using an XGBoost classifier trained on diverse exposure and sharpness features to ensure robust input data for subsequent deep learning models. Seven modern neural network architectures, including VGG16, ResNet50, EfficientNetB3, MobileNetV3 Large, ConvNeXt Tiny, Vision Transformer (ViT-B/16), and Swin Transformer Tiny, were fine-tuned and evaluated under a rigorous cross-validation protocol. All models achieved excellent performance, with cross-validated F1-scores exceeding 0.97 and balanced accuracy up to 0.993. The best results were obtained for ResNet50 and ConvNeXt Tiny (F1-score: 0.9879 ± 0.0077 and 0.9860 ± 0.0073, balanced accuracy: 0.9922 ± 0.0054 and 0.9927 ± 0.0042, respectively), indicating outstanding sensitivity and specificity for both flowering and non-flowering classes. Classical CNNs (VGG16, ResNet50, and ConvNeXt Tiny) demonstrated slightly superior robustness compared to transformer-based models, though all architectures maintained high generalization and minimal variance across folds. The integrated quality assessment and classification pipeline enables scalable, high-throughput monitoring of flowering phases in natural environments. The proposed methodology is adaptable to other plant species and locations, supporting future ecological monitoring and climate studies. Our key contributions are as follows: (i) introducing an automated exposure-quality filtering stage for field imagery; (ii) publishing a curated, season-long dataset of Tilia cordata images; and (iii) providing the first systematic cross-validated benchmark that contrasts classical CNNs with transformer architectures for phenological phase recognition.
(This article belongs to the Special Issue Application of UAV and Sensing in Precision Agriculture)
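The quality-filtering stage can be approximated as below; a hedged sketch with OpenCV features and XGBoost (the specific feature set and hyperparameters are assumptions, not the authors' exact pipeline):

```python
import cv2
import numpy as np
from xgboost import XGBClassifier

def quality_features(path):
    """Crude exposure/sharpness descriptors for one image."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    return [
        cv2.Laplacian(gray, cv2.CV_64F).var(),  # sharpness proxy
        gray.mean(), gray.std(),                # exposure level and contrast
        float((gray > 250).mean()),             # over-exposed pixel fraction
        float((gray < 5).mean()),               # under-exposed pixel fraction
    ]

# X: feature rows for labeled frames, y: 0 = reject, 1 = keep (placeholders)
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
# clf.fit(np.array(X), np.array(y))
```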
16 pages, 3757 KB  
Article
Seismic Channel Characterization Based on 3D DS-TransUnet
by Jiaqi Zhao, Binpeng Yan, Mutian Li and Rui Pan
Appl. Sci. 2025, 15(17), 9387; https://doi.org/10.3390/app15179387 - 27 Aug 2025
Abstract
The structure and geomorphology of channel systems play a critical role in interpreting sedimentary processes and characterizing subsurface reservoir capacity. This study presents an innovative 3D DS-TransUnet model for seismic channel interpretation. The model incorporates a multi-scale Swin Transformer architecture capable of processing 3D data in both the encoder and decoder, and integrates a feature fusion module into the skip connections to effectively combine shallow detail features with deep semantic features, thereby enhancing the detectability of weak reflection signals. This design not only enables the network to capture global dependencies but also preserves fine-grained local details, allowing for more robust feature learning under complex geological conditions. In addition, a complete synthetic data generation workflow is proposed, through which 300 pairs of high-quality synthetic data were constructed for model training. During training, the proposed model achieved a significantly faster convergence speed compared with other selected models. Experimental results on both synthetic and field seismic datasets demonstrate that the proposed method yields substantial improvements in channel boundary delineation accuracy and interference suppression, providing an efficient and reliable approach for intelligent channel recognition.
(This article belongs to the Section Earth Sciences)
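A hedged sketch of the kind of skip-connection fusion module the abstract describes (PyTorch; equal channel counts, instance normalization, and trilinear upsampling are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipFusion3D(nn.Module):
    """Fuses a shallow detail volume with an upsampled deep semantic volume."""
    def __init__(self, ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv3d(2 * ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(ch),
            nn.GELU(),
        )

    def forward(self, shallow, deep):
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode='trilinear', align_corners=False)
        return self.fuse(torch.cat([shallow, deep], dim=1))
```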
21 pages, 6790 KB  
Article
MGFormer: Super-Resolution Reconstruction of Retinal OCT Images Based on a Multi-Granularity Transformer
by Jingmin Luan, Zhe Jiao, Yutian Li, Yanru Si, Jian Liu, Yao Yu, Dongni Yang, Jia Sun, Zehao Wei and Zhenhe Ma
Photonics 2025, 12(9), 850; https://doi.org/10.3390/photonics12090850 - 25 Aug 2025
Viewed by 134
Abstract
Optical coherence tomography (OCT) acquisitions often reduce lateral sampling density to shorten scan time and suppress motion artifacts, but this strategy degrades the signal-to-noise ratio and obscures fine retinal microstructures. To recover these details without hardware modifications, we propose MGFormer, a lightweight Transformer for OCT super-resolution (SR) that integrates a multi-granularity attention mechanism with tensor distillation. A feature-enhancing convolution first sharpens edges; stacked multi-granularity attention blocks then fuse coarse-to-fine context, while a row-wise top-k operator retains the most informative tokens and preserves their positional order. We trained and evaluated MGFormer on B-scans from the Duke SD-OCT dataset at 2×, 4×, and 8× scaling factors. Relative to seven recent CNN- and Transformer-based SR models, MGFormer achieves the highest quantitative fidelity; at 4× it reaches 34.39 dB PSNR and 0.8399 SSIM, surpassing SwinIR by +0.52 dB and +0.026 SSIM, and reduces LPIPS by 21.4%. Compared with the same backbone without tensor distillation, FLOPs drop from 289G to 233G (−19.4%), and per-B-scan latency at 4× falls from 166.43 ms to 98.17 ms (−41.01%); the model size remains compact (105.68 MB). A blinded reader study shows higher scores for boundary sharpness (4.2 ± 0.3), pathology discernibility (4.1 ± 0.3), and diagnostic confidence (4.3 ± 0.2), exceeding SwinIR by 0.3–0.5 points. These results suggest that MGFormer can provide fast, high-fidelity OCT SR suitable for routine clinical workflows.
(This article belongs to the Section Biophotonics and Biomedical Optics)
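The row-wise top-k token selection can be sketched in a few lines (PyTorch; using the token norm as the salience score is an assumption, the paper's scoring may differ). Sorting the selected indices restores the original positional order, as the abstract requires:

```python
import torch

def rowwise_topk(tokens, k):
    """tokens: (B, N, D). Keep the k most salient tokens per row,
    preserving their original order."""
    scores = tokens.norm(dim=-1)              # (B, N) salience proxy
    idx = scores.topk(k, dim=1).indices       # (B, k) selected positions
    idx, _ = idx.sort(dim=1)                  # restore positional order
    return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
```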
24 pages, 4538 KB  
Article
CNN–Transformer-Based Model for Maritime Blurred Target Recognition
by Tianyu Huang, Chao Pan, Jin Liu and Zhiwei Kang
Electronics 2025, 14(17), 3354; https://doi.org/10.3390/electronics14173354 - 23 Aug 2025
Viewed by 162
Abstract
In maritime blurred image recognition, ship collision accidents frequently result from three primary blur types: (1) motion blur from vessel movement in complex sea conditions, (2) defocus blur due to water vapor refraction, and (3) scattering blur caused by sea fog interference. This paper proposes a dual-branch recognition method designed for motion blur, the most prevalent blur type in maritime scenarios. Conventional approaches exhibit constrained computational efficiency and limited adaptability across modalities. To overcome these limitations, we propose a hybrid CNN–Transformer architecture: the CNN branch captures local blur characteristics, while the enhanced Transformer module models long-range dependencies via attention mechanisms. The CNN branch employs a lightweight ResNet variant in which conventional residual blocks are replaced with Multi-Scale Gradient-Aware Residual Blocks (MSG-ARB); it uses learnable gradient convolution for explicit local gradient feature extraction and gradient content gating to strengthen blur-sensitive region representation, significantly improving computational efficiency compared to conventional CNNs. The Transformer branch incorporates a Hierarchical Swin Transformer (HST) framework with shifted-window multi-head self-attention for global context modeling. The method adds blur-invariant positional encoding (PE) to enhance blur spectrum modeling and employs a DyT (Dynamic Tanh) module with learnable α parameters in place of traditional normalization layers, markedly reducing computational costs while preserving feature representation quality; long-range image dependencies are computed efficiently with a compact 16 × 16 window configuration. The proposed feature fusion module integrates CNN-based local feature extraction with Transformer-based global representation learning, achieving comprehensive feature modeling across scales. To evaluate the model's performance and generalization ability, we conducted comprehensive experiments on four benchmark datasets: VAIS, GoPro, Mini-ImageNet, and Open Images V4. Experimental results show that our method achieves superior classification accuracy compared to state-of-the-art approaches, while simultaneously improving inference speed and reducing GPU memory consumption. Ablation studies confirm that the DyT module effectively suppresses outliers and improves computational efficiency, particularly when processing low-quality input data.
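The DyT (Dynamic Tanh) normalization replacement mentioned above is simple to sketch (PyTorch; the initial α value is an assumption):

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: y = gamma * tanh(alpha * x) + beta, used in place of a
    normalization layer; alpha is a learnable scalar."""
    def __init__(self, dim, alpha0=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha0))
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.gamma * torch.tanh(self.alpha * x) + self.beta
```

Because tanh saturates, large activations are squashed, which is consistent with the abstract's observation that DyT suppresses outliers.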
33 pages, 8494 KB  
Article
Enhanced Multi-Class Brain Tumor Classification in MRI Using Pre-Trained CNNs and Transformer Architectures
by Marco Antonio Gómez-Guzmán, Laura Jiménez-Beristain, Enrique Efren García-Guerrero, Oscar Adrian Aguirre-Castro, José Jaime Esqueda-Elizondo, Edgar Rene Ramos-Acosta, Gilberto Manuel Galindo-Aldana, Cynthia Torres-Gonzalez and Everardo Inzunza-Gonzalez
Technologies 2025, 13(9), 379; https://doi.org/10.3390/technologies13090379 - 22 Aug 2025
Viewed by 252
Abstract
Early and accurate identification of brain tumors is essential for determining effective treatment strategies and improving patient outcomes. Artificial intelligence (AI) and deep learning (DL) techniques have shown promise in automating diagnostic tasks based on magnetic resonance imaging (MRI). This study evaluates the performance of four pre-trained deep neural network architectures for the automatic multi-class classification of brain tumors into four categories: Glioma, Meningioma, Pituitary, and No Tumor. The proposed approach utilizes the publicly accessible Brain Tumor MRI Msoud dataset, consisting of 7023 images, with 5712 provided for training and 1311 for testing. To assess the impact of data availability, subsets containing 25%, 50%, 75%, and 100% of the training data were used, and a stratified five-fold cross-validation technique was applied. The architectures evaluated, DeiT3_base_patch16_224, Xception41, Inception_v4, and Swin_Tiny_Patch4_Window7_224, were all fine-tuned using transfer learning. The training pipeline incorporated advanced preprocessing and image data augmentation techniques to enhance robustness and mitigate overfitting. Among the models tested, Swin_Tiny_Patch4_Window7_224 achieved the highest classification accuracy of 99.24% on the test set using 75% of the training data; it demonstrated superior generalization across all tumor classes and effectively addressed class imbalance. Furthermore, we deployed and benchmarked the best-performing DL model on embedded AI platforms (Jetson AGX Xavier and Orin Nano), demonstrating their capability for real-time inference and their feasibility for edge-based clinical deployment. The results highlight the strong potential of pre-trained CNN and transformer-based architectures in medical image analysis. The proposed approach provides a scalable and energy-efficient solution for automated brain tumor diagnosis, facilitating the integration of AI into clinical workflows.
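A hedged sketch of the stratified five-fold protocol with a timm backbone (the model name is a real timm identifier; the file paths and labels below are placeholders, and the training loop is omitted):

```python
import numpy as np
import timm
from sklearn.model_selection import StratifiedKFold

# placeholders standing in for the real image paths and 4-class labels
image_paths = np.array([f"mri_{i:04d}.png" for i in range(40)])
labels = np.array([0, 1, 2, 3] * 10)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (tr, va) in enumerate(skf.split(image_paths, labels)):
    model = timm.create_model('swin_tiny_patch4_window7_224',
                              pretrained=True, num_classes=4)
    # fine-tune on image_paths[tr]; evaluate on image_paths[va]
```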
20 pages, 5304 KB  
Article
Deep Learning with UAV Imagery for Subtropical Sphagnum Peatland Vegetation Mapping
by Zhengshun Liu and Xianyu Huang
Remote Sens. 2025, 17(17), 2920; https://doi.org/10.3390/rs17172920 - 22 Aug 2025
Viewed by 365
Abstract
Peatlands are vital for global carbon cycling, and their ecological functions are influenced by vegetation composition. Accurate vegetation mapping is crucial for peatland management and conservation, but traditional methods face limitations such as low spatial resolution and labor-intensive fieldwork. We used ultra-high-resolution UAV imagery captured across seasonal and topographic gradients, assessed the impact of phenology and topography on classification accuracy, and evaluated the performance of four deep learning models (ResNet, Swin Transformer, ConvNeXt, and EfficientNet) for mapping vegetation in a subtropical Sphagnum peatland. ConvNeXt achieved a peak accuracy of 87% during non-growing seasons through its large-kernel feature extraction capability, while ResNet served as the optimal efficient alternative for growing-season applications. Non-growing seasons facilitated superior identification of Sphagnum and monocotyledons, whereas growing seasons enhanced dicotyledon distinction through clearer morphological features. Overall accuracy in low-lying humid areas was 12–15% lower than in elevated terrain due to severe spectral confusion among vegetation. SHapley Additive exPlanations (SHAP) analysis of the ConvNeXt model identified key vegetation indices, the digital surface model, and select textural features as the primary performance drivers. This study concludes that combining deep learning with UAV imagery provides a powerful tool for peatland vegetation mapping and highlights the importance of considering phenological and topographical factors.
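A minimal sketch of gradient-based SHAP attribution of the kind the study reports (the GradientExplainer call pattern is real SHAP API; the torchvision stand-in model, class count, and random batches are assumptions, not the paper's trained weights or data):

```python
import shap
import torch
import torchvision

# stand-in for the trained ConvNeXt classifier (assumption)
model = torchvision.models.convnext_tiny(num_classes=6).eval()

background = torch.randn(8, 3, 224, 224)   # reference images for the explainer
explainer = shap.GradientExplainer(model, background)
shap_values = explainer.shap_values(torch.randn(2, 3, 224, 224))
# shap_values: per-class attribution maps for the two query images
```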
21 pages, 9325 KB  
Article
Lightweight Model Improvement and Application for Rice Disease Classification
by Tonglai Liu, Mingguang Liu, Chengcheng Yang, Ancong Wu, Xiaodong Li and Wenzhao Wei
Electronics 2025, 14(16), 3331; https://doi.org/10.3390/electronics14163331 - 21 Aug 2025
Viewed by 220
Abstract
The timely and correct identification of rice diseases is essential to ensuring rice productivity. However, many methods suffer from slow recognition speed, low recognition accuracy, and overly complex models that hinder portability. To address these difficulties, this study proposes an improved model for accurately classifying rice diseases based on a two-level routing attention mechanism and dynamic convolution. The model employs Alterable Kernel Convolution, with dynamic, irregularly shaped convolutional kernels, and Bi-level Routing Attention, which exploits sparsity to reduce parameters and uses GPU-friendly dense matrix multiplication, achieving high-precision rice disease recognition while remaining lightweight and fast. The model successfully classified 10 classes, nine diseases plus healthy rice, with 97.31% accuracy and a 97.18% F1-score. Our proposed method outperforms MobileNetV3-large, EfficientNet-b0, Swin Transformer-tiny, and ResNet-50 by 1.73%, 1.82%, 1.25%, and 0.67%, respectively. Meanwhile, the model contains only 4.453 × 10⁶ parameters and achieves an inference time of 6.13 s, which facilitates deployment on mobile devices. The proposed MobileViT_BiAK method effectively identifies rice diseases while providing a lightweight, high-performance classification solution.
(This article belongs to the Special Issue Target Tracking and Recognition Techniques and Their Applications)
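The lightweight claims above are easy to verify for any PyTorch model; a minimal sketch (the input shape is an assumption):

```python
import time
import torch

def profile(model, input_shape=(1, 3, 224, 224)):
    """Return parameter count and a rough single-input inference time."""
    n_params = sum(p.numel() for p in model.parameters())  # paper reports ~4.45M
    x = torch.randn(*input_shape)
    model.eval()
    with torch.no_grad():
        t0 = time.perf_counter()
        model(x)
        dt = time.perf_counter() - t0
    return n_params, dt
```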
10 pages, 511 KB  
Article
Improving Benign and Malignant Classifications in Mammography with ROI-Stratified Deep Learning
by Kenji Yoshitsugu, Kazumasa Kishimoto and Tadamasa Takemura
Bioengineering 2025, 12(8), 885; https://doi.org/10.3390/bioengineering12080885 - 20 Aug 2025
Viewed by 266
Abstract
Deep learning has achieved widespread adoption for medical image diagnosis, with extensive research dedicated to mammographic image analysis for breast cancer screening. This study investigates the hypothesis that incorporating region-of-interest (ROI) mask information for individual mammographic images during deep learning can improve the accuracy of benign/malignant diagnoses. Swin Transformer and ConvNeXtV2 deep learning models were evaluated on the public VinDr and CDD-CESM datasets. Our approach stratifies mammographic images by the presence or absence of ROI masks, performs independent training and prediction for each subgroup, and subsequently merges the results. Baseline prediction metrics (sensitivity, specificity, F-score, and accuracy) without ROI stratification were: VinDr/Swin Transformer (0.00, 1.00, 0.00, 0.85), VinDr/ConvNeXtV2 (0.00, 1.00, 0.00, 0.85), CDD-CESM/Swin Transformer (0.29, 0.68, 0.41, 0.48), and CDD-CESM/ConvNeXtV2 (0.65, 0.65, 0.65, 0.65). With ROI stratification, these metrics improved markedly: VinDr/Swin Transformer (0.93, 0.87, 0.90, 0.87), VinDr/ConvNeXtV2 (0.90, 0.86, 0.88, 0.87), CDD-CESM/Swin Transformer (0.65, 0.65, 0.65, 0.65), and CDD-CESM/ConvNeXtV2 (0.74, 0.61, 0.67, 0.68). These findings validate our hypothesis and affirm the utility of considering ROI mask information for enhanced diagnostic accuracy in mammography.
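The ROI-stratified protocol reduces to training one model per subgroup and routing each image by mask availability at prediction time; a hedged sketch (function and variable names are hypothetical):

```python
import torch

def predict_stratified(images, has_roi_mask, model_with_roi, model_without_roi):
    """Route each mammogram to the model trained on its subgroup, then merge."""
    preds = []
    for img, flag in zip(images, has_roi_mask):
        model = model_with_roi if flag else model_without_roi
        with torch.no_grad():
            preds.append(model(img.unsqueeze(0)).argmax(dim=1))
    return torch.cat(preds)  # merged benign/malignant predictions
```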
26 pages, 7726 KB  
Article
Multi-Branch Channel-Gated Swin Network for Wetland Hyperspectral Image Classification
by Ruopu Liu, Jie Zhao, Shufang Tian, Guohao Li and Jingshu Chen
Remote Sens. 2025, 17(16), 2862; https://doi.org/10.3390/rs17162862 - 17 Aug 2025
Viewed by 337
Abstract
Hyperspectral classification of wetland environments remains challenging due to high spectral similarity, class imbalance, and blurred boundaries. To address these issues, we propose a novel Multi-Branch Channel-Gated Swin Transformer network (MBCG-SwinNet). In contrast to previous CNN-based designs, our model introduces a Swin Transformer spectral branch to enhance global contextual modeling, enabling improved spectral discrimination. To effectively fuse spatial and spectral features, we design a residual feature interaction chain comprising a Residual Spatial Fusion (RSF) module, a channel-wise gating mechanism, and a multi-scale feature fusion (MFF) module, which together enhance spatial adaptivity and feature integration. Additionally, a DenseCRF-based post-processing step refines classification boundaries and suppresses salt-and-pepper noise. Experimental results on three UAV-based hyperspectral wetland datasets from the Yellow River Delta (Shandong, China), NC12, NC13, and NC16, demonstrate that MBCG-SwinNet achieves superior classification performance, with overall accuracies of 97.62%, 82.37%, and 97.32%, respectively, surpassing state-of-the-art methods. The proposed architecture offers a robust and scalable solution for hyperspectral image classification in complex ecological settings.
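A sketch of a DenseCRF refinement step using the common pydensecrf package (the pairwise parameter values are assumptions; the paper's exact settings may differ):

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(probs, rgb, n_iter=5):
    """probs: (K, H, W) softmax output; rgb: (H, W, 3) uint8 image."""
    K, H, W = probs.shape
    d = dcrf.DenseCRF2D(W, H, K)
    d.setUnaryEnergy(unary_from_softmax(
        np.ascontiguousarray(probs, dtype=np.float32)))
    d.addPairwiseGaussian(sxy=3, compat=3)                         # smoothness
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=rgb, compat=10)  # appearance
    q = d.inference(n_iter)
    return np.argmax(q, axis=0).reshape(H, W)  # refined label map
```

The bilateral term is what removes salt-and-pepper noise: isolated labels inconsistent with nearby colors are smoothed away.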
18 pages, 897 KB  
Article
Self-Supervised Cloud Classification with Patch Rotation Tasks (SSCC-PR)
by Wuyang Yan, Xiong Xiong, Xinyuan Xia, Yanchao Zhang and Xiaojie Guo
Appl. Sci. 2025, 15(16), 9051; https://doi.org/10.3390/app15169051 - 16 Aug 2025
Viewed by 266
Abstract
Solar irradiance, which is closely influenced by cloud cover, significantly affects photovoltaic (PV) power generation efficiency. To improve cloud type recognition without relying on labeled data, this paper proposes a self-supervised cloud classification method based on patch rotation prediction. In the pre-training stage, unlabeled ground-based cloud images are augmented through blockwise rotation, and high-level semantic representations are learned via a Swin Transformer encoder. In the fine-tuning stage, these representations are adapted to the cloud classification task using labeled data. Experimental results show that our method achieves 96.61% accuracy on the RCCD and 90.18% on the SWIMCAT dataset, outperforming existing supervised and self-supervised baselines by a clear margin. These results demonstrate the effectiveness and robustness of the proposed approach, especially in data-scarce scenarios. This research provides valuable technical support for improving the prediction of solar irradiance and optimizing PV power generation efficiency.
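The pretext task amounts to rotating image patches and predicting each patch's rotation; a minimal sketch (the patch size and grid layout are assumptions, and the image sides are assumed divisible by the patch size):

```python
import torch

def patch_rotation_task(img, patch=56):
    """img: (C, H, W), H and W divisible by patch. Rotate each grid patch by a
    random multiple of 90 degrees; the rotation indices are the labels."""
    out, labels = img.clone(), []
    _, H, W = img.shape
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            k = int(torch.randint(0, 4, (1,)))
            out[:, i:i + patch, j:j + patch] = torch.rot90(
                img[:, i:i + patch, j:j + patch], k, dims=(1, 2))
            labels.append(k)
    return out, torch.tensor(labels)
```

The encoder is then pre-trained to recover these labels, which forces it to learn cloud texture and structure without any human annotation.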
20 pages, 3718 KB  
Article
The Lightweight Swin Transformer for Salinity Degree Classification in a Natural Saline Soil Environment
by Ruoxi Wang, Ling Yang, Qiliang Yang and Chunhao Cao
Agronomy 2025, 15(8), 1958; https://doi.org/10.3390/agronomy15081958 - 14 Aug 2025
Viewed by 223
Abstract
Excessive salt in soil can significantly reduce crop yield and quality by hindering nutrient absorption, so accurate classification of the degree of soil salinization is important for developing effective management strategies. In this paper, we propose a novel deep learning method that identifies the degree of soil salinization using a Swin Transformer model enhanced with token-based knowledge distillation. The Swin Transformer backbone provides comprehensive contextual information and a large receptive field, ensuring efficient feature extraction, while token-based distillation substantially reduces model size and inference time, overcoming the large parameter counts typical of Transformer models. Our model achieves a classification accuracy of 96.3% on a saline soil dataset spanning three salinity degree categories, outperforming existing methods, and reduces the number of parameters by 80.8% compared with the baseline model, enabling faster and more efficient salinity detection. This method not only enhances the accuracy of soil salinity classification but also offers a cost-effective solution, providing valuable data to guide agricultural practitioners in improving soil quality and optimizing land resource management.
(This article belongs to the Section Precision and Digital Agriculture)
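A standard soft-distillation loss of the kind used in token-based knowledge distillation can be sketched as follows (the temperature and weighting are assumptions, and the paper's exact token formulation may differ):

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with softened teacher agreement."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```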
20 pages, 6359 KB  
Article
Symmetry in Explainable AI: A Morphometric Deep Learning Analysis for Skin Lesion Classification
by Rafael Fernandez, Angélica Guzmán-Ponce, Ruben Fernandez-Beltran and Ginés García-Mateos
Symmetry 2025, 17(8), 1264; https://doi.org/10.3390/sym17081264 - 7 Aug 2025
Viewed by 322
Abstract
Deep learning has achieved remarkable performance in skin lesion classification, but its lack of interpretability remains a critical barrier to clinical adoption. In this study, we investigate the spatial properties of saliency-based model explanations, focusing on symmetry and other morphometric features. We benchmark five deep learning architectures (ResNet-50, EfficientNetV2-S, ConvNeXt-Tiny, Swin-Tiny, and MaxViT-Tiny) on a nine-class skin lesion dataset from the International Skin Imaging Collaboration (ISIC) archive, generating saliency maps with Grad-CAM++ and LayerCAM. The best-performing model, Swin-Tiny, achieved an accuracy of 78.2% and a macro-F1 score of 71.2%. Our morphometric analysis reveals statistically significant differences in the explanation maps between correct and incorrect predictions. Notably, the transformer-based models exhibit highly significant differences (p < 0.001) in metrics related to attentional focus (entropy and Gini), indicating that their correct predictions are associated with more concentrated saliency maps. In contrast, convolutional models show less consistent differences, and only at a standard significance level (p < 0.05). These findings suggest that the quantitative morphometric properties of saliency maps could serve as valuable indicators of predictive reliability in medical AI.
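The attentional-focus metrics named above reduce to treating the saliency map as a probability distribution; a minimal sketch (the normalization details are assumptions):

```python
import numpy as np

def focus_metrics(saliency):
    """Entropy and Gini of a non-negative saliency map: lower entropy and
    higher Gini indicate more concentrated attention."""
    p = saliency.ravel().astype(float)
    p /= p.sum() + 1e-12
    entropy = -np.sum(p * np.log(p + 1e-12))
    q = np.sort(p)
    n = q.size
    gini = np.sum((2 * np.arange(1, n + 1) - n - 1) * q) / (n * np.sum(q) + 1e-12)
    return entropy, gini
```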