Search Results (143)

Search Parameters:
Keywords = mobile vision transformer

22 pages, 5732 KB  
Article
Explainable Transformer-Based Framework for Glaucoma Detection from Fundus Images Using Multi-Backbone Segmentation and vCDR-Based Classification
by Hind Alasmari, Ghada Amoudi and Hanan Alghamdi
Diagnostics 2025, 15(18), 2301; https://doi.org/10.3390/diagnostics15182301 - 10 Sep 2025
Abstract
Glaucoma is an eye disease caused by increased intraocular pressure (IOP) that affects the optic nerve head (ONH), leading to vision problems and irreversible blindness. Background/Objectives: Glaucoma is the second leading cause of blindness worldwide, and the number of people affected is increasing each year, with the number expected to reach 111.8 million by 2040. This escalating trend is alarming due to the lack of ophthalmology specialists relative to the population. This study proposes an explainable end-to-end pipeline for automated glaucoma diagnosis from fundus images. It also evaluates the performance of Vision Transformers (ViTs) relative to traditional CNN-based models. Methods: The proposed system uses three datasets: REFUGE, ORIGA, and G1020. It begins with YOLOv11 for object detection of the optic disc. Then, the optic disc (OD) and optic cup (OC) are segmented using U-Net with ResNet50, VGG16, and MobileNetV2 backbones, as well as MaskFormer with a Swin-Base backbone. Glaucoma is classified based on the vertical cup-to-disc ratio (vCDR). Results: MaskFormer outperforms all models in segmentation in all aspects, including IoU OD, IoU OC, DSC OD, and DSC OC, with scores of 88.29%, 91.09%, 93.83%, and 93.71%. For classification, it achieved accuracy and F1-scores of 84.03% and 84.56%. Conclusions: By relying on the interpretable features of the vCDR, the proposed framework enhances transparency and aligns well with the principles of explainable AI, thus offering a trustworthy solution for glaucoma screening. Our findings show that Vision Transformers offer a promising approach for achieving high segmentation performance with explainable, biomarker-driven diagnosis. Full article
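The vCDR decision rule described above can be illustrated with a short sketch; the 0.6 cutoff and the mask/function names below are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def vertical_extent(mask: np.ndarray) -> int:
    """Return the vertical span (in pixels) of a binary mask."""
    rows = np.where(mask.any(axis=1))[0]
    return int(rows.max() - rows.min() + 1) if rows.size else 0

def classify_vcdr(disc_mask: np.ndarray, cup_mask: np.ndarray,
                  threshold: float = 0.6):
    """Compute the vertical cup-to-disc ratio from OD/OC masks and apply a
    simple cutoff. The 0.6 value is a commonly cited screening threshold and
    is an assumption here, not the threshold reported in the paper."""
    disc_h = vertical_extent(disc_mask)
    cup_h = vertical_extent(cup_mask)
    vcdr = cup_h / disc_h if disc_h else 0.0
    return vcdr, ("glaucoma suspect" if vcdr >= threshold else "normal")
```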
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

17 pages, 2874 KB  
Article
Emulating Hyperspectral and Narrow-Band Imaging for Deep-Learning-Driven Gastrointestinal Disorder Detection in Wireless Capsule Endoscopy
by Chu-Kuang Chou, Kun-Hua Lee, Riya Karmakar, Arvind Mukundan, Pratham Chandraskhar Gade, Devansh Gupta, Chang-Chao Su, Tsung-Hsien Chen, Chou-Yuan Ko and Hsiang-Chen Wang
Bioengineering 2025, 12(9), 953; https://doi.org/10.3390/bioengineering12090953 - 4 Sep 2025
Viewed by 338
Abstract
Diagnosing gastrointestinal disorders (GIDs) remains a significant challenge, particularly when relying on wireless capsule endoscopy (WCE), which lacks advanced imaging enhancements like Narrow Band Imaging (NBI). To address this, we propose a novel framework, the Spectrum-Aided Vision Enhancer (SAVE), especially designed to transform standard white light (WLI) endoscopic images into spectrally enriched representations that emulate both hyperspectral imaging (HSI) and NBI formats. By leveraging color calibration through the Macbeth Color Checker, gamma correction, CIE 1931 XYZ transformation, and principal component analysis (PCA), SAVE reconstructs detailed spectral information from conventional RGB inputs. Performance was evaluated using the Kvasir-v2 dataset, which includes 6490 annotated images spanning eight GI-related categories. Deep learning models like Inception-Net V3, MobileNetV2, MobileNetV3, and AlexNet were trained on both original WLI- and SAVE-enhanced images. Among these, MobileNetV2 achieved an F1-score of 96% for polyp classification using SAVE, and AlexNet saw a notable increase in average accuracy to 84% when applied to enhanced images. Image quality assessment showed high structural similarity (SSIM scores of 93.99% for Olympus endoscopy and 90.68% for WCE), confirming the fidelity of the spectral transformations. Overall, the SAVE framework offers a practical, software-based enhancement strategy that significantly improves diagnostic accuracy in GI imaging, with strong implications for low-cost, non-invasive diagnostics using capsule endoscopy systems. Full article
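A minimal sketch of the colour-space steps named above (gamma decoding followed by the CIE 1931 XYZ transform); the standard sRGB/D65 matrix stands in for the paper's Macbeth-chart calibration, and the PCA compression step is only noted in a comment.

```python
import numpy as np

# Standard sRGB (D65) to CIE 1931 XYZ matrix; the paper calibrates against a
# Macbeth Color Checker, which this fixed matrix merely approximates.
SRGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                        [0.2126, 0.7152, 0.0722],
                        [0.0193, 0.1192, 0.9505]])

def srgb_to_xyz(rgb: np.ndarray) -> np.ndarray:
    """Gamma-decode an sRGB image with values in [0, 1] and project it
    into CIE 1931 XYZ."""
    linear = np.where(rgb <= 0.04045, rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    return linear @ SRGB_TO_XYZ.T

# The SAVE pipeline then compresses the reconstructed spectra with PCA;
# with real band data this would be e.g. sklearn.decomposition.PCA(n_components=k).
```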

23 pages, 33339 KB  
Article
Identification of Botanical Origin from Pollen Grains in Honey Using Computer Vision-Based Techniques
by Thi-Nhung Le, Duc-Manh Nguyen, A-Cong Giang, Hong-Thai Pham, Thi-Lan Le and Hai Vu
AgriEngineering 2025, 7(9), 282; https://doi.org/10.3390/agriengineering7090282 - 1 Sep 2025
Viewed by 403
Abstract
Identifying the botanical origin of honey is essential for ensuring its quality, preventing adulteration, and protecting consumers. Traditional techniques, such as melissopalynology, physicochemical analysis, and PCR, are often labor-intensive, time-consuming, or limited to the detection of only known species, while advanced DNA sequencing remains prohibitively costly. In this study, we aim to develop a deep learning-based approach for identifying pollen grains extracted from honey and captured through microscopic imaging. To achieve this, we first constructed a dataset named VNUA-Pollen52, which consists of microscopic images of pollen grains collected from flowers of plant species cultivated in the surveyed area in Hanoi, Vietnam. Second, we evaluated the classification performance of advanced deep learning models, including MobileNet, YOLOv11, and Vision Transformer, on pollen grain images. To improve the performance of these models, we proposed data augmentation and hybrid fusion strategies that increase the identification accuracy of pollen grains extracted from honey. Third, we developed an online platform to support experts in identifying these pollen grains and to gather expert consensus, ensuring accurate determination of the plant species and providing a basis for evaluating the proposed identification strategy. Experimental results on 93 images of pollen grains extracted from honey samples demonstrated the effectiveness of the proposed hybrid fusion strategy, achieving 70.21% accuracy at rank 1 and 92.47% at rank 5. This study demonstrates the capability of recent advances in computer vision to identify pollen grains using their microscopic images, thereby opening up opportunities for the development of automated systems that support plant traceability and quality control of honey. Full article
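One common way to realize a hybrid (late) fusion strategy and the rank-1/rank-5 metrics reported above is score-level averaging of per-model probabilities; the equal weights and function names are assumptions for illustration.

```python
import numpy as np

def fuse_scores(prob_list, weights=None):
    """Late fusion: weighted average of per-model class-probability matrices
    of shape (n_samples, n_classes). Equal weights are an assumption."""
    probs = np.stack(prob_list)                       # (n_models, n, c)
    w = np.ones(len(prob_list)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    return np.tensordot(w, probs, axes=1)             # (n, c)

def rank_k_accuracy(fused: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of samples whose true class appears among the top-k fused scores."""
    topk = np.argsort(-fused, axis=1)[:, :k]
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))
```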

29 pages, 11689 KB  
Article
Enhanced Breast Cancer Diagnosis Using Multimodal Feature Fusion with Radiomics and Transfer Learning
by Nazmul Ahasan Maruf, Abdullah Basuhail and Muhammad Umair Ramzan
Diagnostics 2025, 15(17), 2170; https://doi.org/10.3390/diagnostics15172170 - 28 Aug 2025
Viewed by 592
Abstract
Background: Breast cancer remains a critical public health problem worldwide and is a leading cause of cancer-related mortality. Optimizing clinical outcomes is contingent upon the early and precise detection of malignancies. Advances in medical imaging and artificial intelligence (AI), particularly in the fields of radiomics and deep learning (DL), have contributed to improvements in early detection methodologies. Nonetheless, persistent challenges, including limited data availability, model overfitting, and restricted generalization, continue to hinder performance. Methods: This study aims to overcome existing challenges by improving model accuracy and robustness through enhanced data augmentation and the integration of radiomics and deep learning features from the CBIS-DDSM dataset. To mitigate overfitting and improve model generalization, data augmentation techniques were applied. The PyRadiomics library was used to extract radiomics features, while transfer learning models were employed to derive deep learning features from the augmented training dataset. For radiomics feature selection, we compared multiple supervised feature selection methods, including RFE with random forest and logistic regression, ANOVA F-test, LASSO, and mutual information. Embedded methods with XGBoost, LightGBM, and CatBoost for GPUs were also explored. Finally, we integrated radiomics and deep features to build a unified multimodal feature space for improved classification performance. Based on this integrated set of radiomics and deep learning features, 13 pre-trained transfer learning models were trained and evaluated, including various versions of ResNet (50, 50V2, 101, 101V2, 152, 152V2), DenseNet (121, 169, 201), InceptionV3, MobileNet, and VGG (16, 19). Results: Among the evaluated models, ResNet152 achieved the highest classification accuracy of 97%, demonstrating the potential of this approach to enhance diagnostic precision. Other models, including VGG19, ResNet101V2, and ResNet101, achieved 96% accuracy, emphasizing the importance of the selected feature set in achieving robust detection. Conclusions: Future research could build on this work by incorporating Vision Transformer (ViT) architectures and leveraging multimodal data (e.g., clinical data, genomic information, and patient history). This could improve predictive performance and make the model more robust and adaptable to diverse data types. Ultimately, this approach has the potential to transform breast cancer detection, making it more accurate and interpretable. Full article
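The multimodal feature space described above amounts to concatenating radiomics and deep feature vectors per image; the sketch below uses a logistic-regression stand-in rather than the paper's transfer-learning classifiers, and all array names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_multimodal_features(radiomics_feats: np.ndarray,
                              deep_feats: np.ndarray) -> np.ndarray:
    """Concatenate radiomics and deep-learning feature vectors row-wise
    (one row per image) into a single multimodal feature space."""
    return np.concatenate([radiomics_feats, deep_feats], axis=1)

# Hypothetical arrays standing in for PyRadiomics output and a CNN embedding.
X = build_multimodal_features(np.random.rand(100, 120), np.random.rand(100, 2048))
y = np.random.randint(0, 2, 100)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
```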
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

37 pages, 3806 KB  
Article
Comparative Evaluation of CNN and Transformer Architectures for Flowering Phase Classification of Tilia cordata Mill. with Automated Image Quality Filtering
by Bogdan Arct, Bartosz Świderski, Monika A. Różańska, Bogdan H. Chojnicki, Tomasz Wojciechowski, Gniewko Niedbała, Michał Kruk, Krzysztof Bobran and Jarosław Kurek
Sensors 2025, 25(17), 5326; https://doi.org/10.3390/s25175326 - 27 Aug 2025
Viewed by 628
Abstract
Understanding and monitoring the phenological phases of trees is essential for ecological research and climate change studies. In this work, we present a comprehensive evaluation of state-of-the-art convolutional neural networks (CNNs) and transformer architectures for the automated classification of the flowering phase of Tilia cordata Mill. (small-leaved lime) based on a large set of real-world images acquired under natural field conditions. The study introduces a novel, automated image quality filtering approach using an XGBoost classifier trained on diverse exposure and sharpness features to ensure robust input data for subsequent deep learning models. Seven modern neural network architectures, including VGG16, ResNet50, EfficientNetB3, MobileNetV3 Large, ConvNeXt Tiny, Vision Transformer (ViT-B/16), and Swin Transformer Tiny, were fine-tuned and evaluated under a rigorous cross-validation protocol. All models achieved excellent performance, with cross-validated F1-scores exceeding 0.97 and balanced accuracy up to 0.993. The best results were obtained for ResNet50 and ConvNeXt Tiny (F1-score: 0.9879 ± 0.0077 and 0.9860 ± 0.0073, balanced accuracy: 0.9922 ± 0.0054 and 0.9927 ± 0.0042, respectively), indicating outstanding sensitivity and specificity for both flowering and non-flowering classes. Classical CNNs (VGG16, ResNet50, and ConvNeXt Tiny) demonstrated slightly superior robustness compared to transformer-based models, though all architectures maintained high generalization and minimal variance across folds. The integrated quality assessment and classification pipeline enables scalable, high-throughput monitoring of flowering phases in natural environments. The proposed methodology is adaptable to other plant species and locations, supporting future ecological monitoring and climate studies. Our key contributions are as follows: (i) introducing an automated exposure-quality filtering stage for field imagery; (ii) publishing a curated, season-long dataset of Tilia cordata images; and (iii) providing the first systematic cross-validated benchmark that contrasts classical CNNs with transformer architectures for phenological phase recognition. Full article
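The automated quality-filtering stage could be approximated as hand-computed exposure and sharpness descriptors fed to an XGBoost classifier, as sketched below; the specific features (Laplacian variance, brightness statistics) are assumptions, not the paper's feature set.

```python
import cv2
import numpy as np
from xgboost import XGBClassifier

def exposure_sharpness_features(gray: np.ndarray) -> np.ndarray:
    """Simple exposure/sharpness descriptors for one 8-bit grayscale frame."""
    lap_var = cv2.Laplacian(gray, cv2.CV_64F).var()   # sharpness proxy
    return np.array([lap_var, gray.mean(), gray.std(),
                     (gray < 10).mean(),               # under-exposed fraction
                     (gray > 245).mean()])             # over-exposed fraction

def fit_quality_filter(feats: np.ndarray, labels: np.ndarray) -> XGBClassifier:
    """feats: (n_images, 5); labels: 1 = usable frame, 0 = rejected frame."""
    return XGBClassifier(n_estimators=200, max_depth=4).fit(feats, labels)
```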
(This article belongs to the Special Issue Application of UAV and Sensing in Precision Agriculture)

22 pages, 2117 KB  
Article
Deep Learning-Powered Down Syndrome Detection Using Facial Images
by Mujeeb Ahmed Shaikh, Hazim Saleh Al-Rawashdeh and Abdul Rahaman Wahab Sait
Life 2025, 15(9), 1361; https://doi.org/10.3390/life15091361 - 27 Aug 2025
Viewed by 472
Abstract
Down syndrome (DS) is one of the most prevalent chromosomal disorders, presenting with distinctive craniofacial features and a range of developmental and medical challenges. Due to the lack of clinical expertise and high infrastructure costs, access to genetic testing is restricted in resource-constrained clinical settings. There is a demand for a non-invasive and equitable DS screening tool that facilitates DS diagnosis for a wide range of populations. In this study, we develop and validate a robust, interpretable deep learning model for the early detection of DS using facial images of infants. A hybrid feature extraction architecture combining RegNet X–MobileNet V3 and vision transformer (ViT)-Linformer is developed for effective feature representation. We use adaptive attention-based feature fusion to enhance the proposed model’s focus on diagnostically relevant facial regions. Bayesian optimization with hyperband (BOHB) fine-tuned extremely randomized trees (ExtraTrees) is employed to classify the features. To ensure the model’s generalizability, stratified five-fold cross-validation is performed. Compared to recent DS classification approaches, the proposed model demonstrates outstanding performance, achieving an accuracy of 99.10%, precision of 98.80%, recall of 98.87%, F1-score of 98.83%, and specificity of 98.81% on unseen data. The findings underscore the strengths of the proposed model as a reliable screening tool for identifying DS in its early stages from facial images. This study lays the foundation for building equitable, scalable, and trustworthy digital solutions for effective pediatric care across the globe. Full article
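The stratified five-fold evaluation with ExtraTrees can be sketched as follows; hyperparameters are illustrative and the BOHB tuning loop is omitted.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

def cross_validate_extratrees(X: np.ndarray, y: np.ndarray, folds: int = 5) -> float:
    """Stratified k-fold evaluation of an ExtraTrees classifier on fused
    facial features. Hyperparameters here are illustrative; the paper tunes
    them with BOHB, which this sketch omits."""
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = ExtraTreesClassifier(n_estimators=300, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))
    return float(np.mean(scores))
```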
(This article belongs to the Section Medical Research)

32 pages, 25342 KB  
Article
An End-to-End Computationally Lightweight Vision-Based Grasping System for Grocery Items
by Thanavin Mansakul, Gilbert Tang, Phil Webb, Jamie Rice, Daniel Oakley and James Fowler
Sensors 2025, 25(17), 5309; https://doi.org/10.3390/s25175309 - 26 Aug 2025
Viewed by 611
Abstract
Vision-based grasping for mobile manipulators poses significant challenges in machine perception, computational efficiency, and real-world deployment. This study presents a computationally lightweight, end-to-end grasp detection framework that integrates object detection, object pose estimation, and grasp point prediction for a mobile manipulator equipped with a parallel gripper. A transformation model is developed to map coordinates from the image frame to the robot frame, enabling accurate manipulation. To evaluate system performance, a benchmark and a dataset tailored to pick-and-pack grocery tasks are introduced. Experimental validation demonstrates an average execution time of under 5 s on an edge device, achieving a 100% success rate on Level 1 and 96% on Level 2 of the benchmark. Additionally, the system achieves an average compute-to-speed ratio of 0.0130, highlighting its energy efficiency. The proposed framework offers a practical, robust, and efficient solution for lightweight robotic applications in real-world environments. Full article
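A minimal sketch of an image-frame-to-robot-frame transformation of the kind described above, assuming a pinhole camera model with known intrinsics and a calibrated camera-to-base transform; all numeric values are made up.

```python
import numpy as np

def pixel_to_robot(u: float, v: float, depth: float,
                   K: np.ndarray, T_base_cam: np.ndarray) -> np.ndarray:
    """Back-project pixel (u, v) with metric depth into the camera frame,
    then map it into the robot base frame with a 4x4 homogeneous transform.
    K and T_base_cam are placeholders for calibrated values."""
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    p_cam = np.array([x, y, depth, 1.0])
    return (T_base_cam @ p_cam)[:3]

# Example with made-up calibration values.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
T = np.eye(4); T[:3, 3] = [0.2, 0.0, 0.5]   # camera 20 cm forward, 50 cm up
print(pixel_to_robot(350, 260, 0.8, K, T))
```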

22 pages, 23322 KB  
Article
MS-PreTE: A Multi-Scale Pre-Training Encoder for Mobile Encrypted Traffic Classification
by Ziqi Wang, Yufan Qiu, Yaping Liu, Shuo Zhang and Xinyi Liu
Big Data Cogn. Comput. 2025, 9(8), 216; https://doi.org/10.3390/bdcc9080216 - 21 Aug 2025
Viewed by 530
Abstract
Mobile traffic classification serves as a fundamental component in network security systems. In recent years, pre-training methods have significantly advanced this field. However, as mobile traffic is typically mixed with third-party services, the deep integration of such shared services results in highly similar TCP flow characteristics across different applications. This makes it challenging for existing traffic classification methods to effectively identify mobile traffic. To address the challenge, we propose MS-PreTE, a two-phase pre-training framework for mobile traffic classification. MS-PreTE introduces a novel multi-level representation model to preserve traffic information from diverse perspectives and hierarchical levels. Furthermore, MS-PreTE incorporates a focal-attention mechanism to enhance the model’s capability in discerning subtle differences among similar traffic flows. Evaluations demonstrate that MS-PreTE achieves state-of-the-art performance on three mobile application datasets, boosting the F1 score for Cross-platform (iOS) to 99.34% (up by 2.1%), Cross-platform (Android) to 98.61% (up by 1.6%), and NUDT-Mobile-Traffic to 87.70% (up by 2.47%). Moreover, MS-PreTE exhibits strong generalization capabilities across four real-world traffic datasets. Full article
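The abstract does not spell out the focal-attention formulation, so the sketch below is only one plausible reading: scaled dot-product attention whose weights are re-weighted by a focal term before renormalization. It is not the MS-PreTE implementation.

```python
import torch

def focal_weighted_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                             gamma: float = 2.0) -> torch.Tensor:
    """Scaled dot-product attention whose weights are modulated by a focal
    term (1 - p)^gamma and renormalized, so near-saturated attention entries
    are de-emphasized. Purely an illustrative guess at a 'focal-attention'
    mechanism."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    p = scores.softmax(dim=-1)
    w = p * (1.0 - p).pow(gamma)
    w = w / w.sum(dim=-1, keepdim=True)
    return w @ v
```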

30 pages, 4741 KB  
Article
TriViT-Lite: A Compact Vision Transformer–MobileNet Model with Texture-Aware Attention for Real-Time Facial Emotion Recognition in Healthcare
by Waqar Riaz, Jiancheng (Charles) Ji and Asif Ullah
Electronics 2025, 14(16), 3256; https://doi.org/10.3390/electronics14163256 - 16 Aug 2025
Cited by 1 | Viewed by 369
Abstract
Facial emotion recognition has become increasingly important in healthcare, where understanding delicate cues like pain, discomfort, or unconsciousness can support more timely and responsive care. Yet, recognizing facial expressions in real-world settings remains challenging due to varying lighting, facial occlusions, and hardware limitations in clinical environments. To address this, we propose TriViT-Lite, a lightweight yet powerful model that blends three complementary components: MobileNet, for capturing fine-grained local features efficiently; Vision Transformers (ViT), for modeling global facial patterns; and handcrafted texture descriptors, such as Local Binary Patterns (LBP) and Histograms of Oriented Gradients (HOG), for added robustness. These multi-scale features are brought together through a texture-aware cross-attention fusion mechanism that helps the model focus on the most relevant facial regions dynamically. TriViT-Lite is evaluated on both benchmark datasets (FER2013, AffectNet) and a custom healthcare-oriented dataset covering seven critical emotional states, including pain and unconsciousness. It achieves a competitive accuracy of 91.8% on FER2013 and of 87.5% on the custom dataset while maintaining real-time performance (~15 FPS) on resource-constrained edge devices. Our results show that TriViT-Lite offers a practical and accurate solution for real-time emotion recognition, particularly in healthcare settings. It strikes a balance between performance, interpretability, and efficiency, making it a strong candidate for machine-learning-driven pattern recognition in patient-monitoring applications. Full article
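The handcrafted branch (LBP and HOG descriptors) can be reproduced with scikit-image as below; parameter values are common defaults rather than the paper's settings.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern

def texture_descriptor(gray: np.ndarray) -> np.ndarray:
    """Concatenate an LBP histogram and HOG features for one grayscale face
    crop, mirroring the handcrafted branch described above. P=8, R=1 and
    9 orientations are common defaults, not the paper's values."""
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    hog_feat = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), feature_vector=True)
    return np.concatenate([lbp_hist, hog_feat])
```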

14 pages, 841 KB  
Article
Enhanced Deep Learning for Robust Stress Classification in Sows from Facial Images
by Syed U. Yunas, Ajmal Shahbaz, Emma M. Baxter, Mark F. Hansen, Melvyn L. Smith and Lyndon N. Smith
Agriculture 2025, 15(15), 1675; https://doi.org/10.3390/agriculture15151675 - 2 Aug 2025
Viewed by 411
Abstract
Stress in pigs poses significant challenges to animal welfare and productivity in modern pig farming, contributing to increased antimicrobial use and the rise of antimicrobial resistance (AMR). This study involves stress classification in pregnant sows by exploring five deep learning models: ConvNeXt, EfficientNet_V2, MobileNet_V3, RegNet, and Vision Transformer (ViT). These models are used for stress detection from facial images, leveraging an expanded dataset. A facial image dataset of sows was collected at Scotland’s Rural College (SRUC) and the images were categorized into primiparous Low-Stressed (LS) and High-Stress (HS) groups based on expert behavioural assessments and cortisol level analysis. The selected deep learning models were then trained on this enriched dataset and their performance was evaluated using cross-validation on unseen data. The Vision Transformer (ViT) model outperformed the others across the dataset of annotated facial images, achieving an average accuracy of 0.75, an F1 score of 0.78 for high-stress detection, and consistent batch-level performance (up to 0.88 F1 score). These findings highlight the efficacy of transformer-based models for automated stress detection in sows, supporting early intervention strategies to enhance welfare, optimize productivity, and mitigate AMR risks in livestock production. Full article
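A minimal fine-tuning setup for a ViT of the kind evaluated above, assuming torchvision's ViT-B/16 with a replaced two-class head; the backbone variant and optimizer settings are assumptions, not the authors' configuration.

```python
import torch
from torch import nn
from torchvision.models import ViT_B_16_Weights, vit_b_16

def build_sow_stress_vit(num_classes: int = 2) -> nn.Module:
    """Load an ImageNet-pretrained ViT-B/16 and replace its classification
    head for the two stress classes (LS vs. HS)."""
    model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
    model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)
    return model

model = build_sow_stress_vit()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```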

15 pages, 2123 KB  
Article
Multi-Class Visual Cyberbullying Detection Using Deep Neural Networks and the CVID Dataset
by Muhammad Asad Arshed, Zunera Samreen, Arslan Ahmad, Laiba Amjad, Hasnain Muavia, Christine Dewi and Muhammad Kabir
Information 2025, 16(8), 630; https://doi.org/10.3390/info16080630 - 24 Jul 2025
Viewed by 1527
Abstract
In an era where online interactions increasingly shape social dynamics, the pervasive issue of cyberbullying poses a significant threat to the well-being of individuals, particularly among vulnerable groups. Despite extensive research on text-based cyberbullying detection, the rise of visual content on social media platforms necessitates new approaches to address cyberbullying conveyed through images, a domain that has been largely overlooked. In this paper, we present a novel dataset specifically designed for the detection of visual cyberbullying, encompassing four distinct classes: abuse, curse, discourage, and threat. The initial version of the cyberbullying visual indicators dataset (CVID) comprised 664 samples for training and validation, expanded through data augmentation techniques to ensure balanced and accurate results across all classes. We analyzed this dataset using several advanced deep learning models, including VGG16, VGG19, MobileNetV2, and Vision Transformer. The proposed model, based on DenseNet201, achieved the highest test accuracy of 99%, demonstrating its efficacy in identifying the visual cues associated with cyberbullying. To assess the proposed model’s generalizability, stratified 5-fold cross-validation was also performed, and the model achieved an average test accuracy of 99%. This work introduces a dataset and highlights the potential of leveraging deep learning models to address the multifaceted challenges of detecting cyberbullying in visual content. Full article
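A sketch of a DenseNet201 transfer-learning setup with augmentation of the kind described above; the exact transforms and training details are not given in the abstract, so these values are placeholders.

```python
import torch
from torch import nn
from torchvision import models, transforms

# Augmentations of the kind used to balance the four classes; the paper's
# exact augmentation pipeline is not specified, so these are placeholders.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

def build_cvid_classifier(num_classes: int = 4) -> nn.Module:
    """DenseNet201 backbone with a new 4-way head for the CVID classes
    (abuse, curse, discourage, threat)."""
    model = models.densenet201(weights=models.DenseNet201_Weights.DEFAULT)
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model
```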
(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)

24 pages, 8015 KB  
Article
Innovative Multi-View Strategies for AI-Assisted Breast Cancer Detection in Mammography
by Beibit Abdikenov, Tomiris Zhaksylyk, Aruzhan Imasheva, Yerzhan Orazayev and Temirlan Karibekov
J. Imaging 2025, 11(8), 247; https://doi.org/10.3390/jimaging11080247 - 22 Jul 2025
Viewed by 1016
Abstract
Mammography is the main method for early detection of breast cancer, which remains a major global health concern. However, inter-reader variability and the inherent difficulty of interpreting subtle radiographic features frequently limit the accuracy of diagnosis. A thorough assessment of deep convolutional neural networks (CNNs) for automated mammogram classification is presented in this work, along with the introduction of two innovative multi-view integration techniques: Dual-Branch Ensemble (DBE) and Merged Dual-View (MDV). By setting aside two datasets for out-of-sample testing, we evaluate the generalizability of the model using six different mammography datasets that represent various populations and imaging systems. We compare a number of cutting-edge architectures on both individual and combined datasets, including ResNet, DenseNet, EfficientNet, MobileNet, Vision Transformers, and VGG19. Both MDV and DBE strategies improve classification performance, according to experimental results. Under the MDV approach, VGG19 and DenseNet obtained ROC AUC scores of 0.9051 and 0.7960, respectively. DenseNet demonstrated strong performance in the DBE setting, achieving a ROC AUC of 0.8033, while ResNet50 recorded a ROC AUC of 0.8042. These enhancements demonstrate how beneficial multi-view fusion is for boosting model robustness. The impact of domain shift is further highlighted by generalization tests, which emphasize the need for diverse datasets in training. These results offer practical advice for improving CNN architectures and integration tactics, which will aid in the creation of trustworthy, broadly applicable AI-assisted breast cancer screening tools. Full article
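One way to realize the Dual-Branch Ensemble idea is a two-backbone network whose pooled per-view features are concatenated before a shared classifier, as sketched below; this is an illustrative reading, not the authors' exact architecture.

```python
import torch
from torch import nn
from torchvision import models

class DualBranchEnsemble(nn.Module):
    """Two ResNet50 branches (one per mammographic view, e.g. CC and MLO)
    whose pooled features are concatenated before a shared classifier."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        def backbone() -> nn.Module:
            m = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
            m.fc = nn.Identity()          # keep the 2048-d pooled features
            return m
        self.branch_cc, self.branch_mlo = backbone(), backbone()
        self.classifier = nn.Linear(2048 * 2, num_classes)

    def forward(self, x_cc: torch.Tensor, x_mlo: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.branch_cc(x_cc), self.branch_mlo(x_mlo)], dim=1)
        return self.classifier(fused)
```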
(This article belongs to the Section Medical Imaging)

30 pages, 10173 KB  
Article
Integrated Robust Optimization for Lightweight Transformer Models in Low-Resource Scenarios
by Hui Huang, Hengyu Zhang, Yusen Wang, Haibin Liu, Xiaojie Chen, Yiling Chen and Yuan Liang
Symmetry 2025, 17(7), 1162; https://doi.org/10.3390/sym17071162 - 21 Jul 2025
Viewed by 752
Abstract
With the rapid proliferation of artificial intelligence (AI) applications, an increasing number of edge devices—such as smartphones, cameras, and embedded controllers—are being tasked with performing AI-based inference. Due to constraints in storage capacity, computational power, and network connectivity, these devices are often categorized as operating in resource-constrained environments. In such scenarios, deploying powerful Transformer-based models like ChatGPT and Vision Transformers is highly impractical because of their large parameter sizes and intensive computational requirements. While lightweight Transformer models, such as MobileViT, offer a promising solution to meet storage and computational limitations, their robustness remains insufficient. This poses a significant security risk for AI applications, particularly in critical edge environments. To address this challenge, our research focuses on enhancing the robustness of lightweight Transformer models under resource-constrained conditions. First, we propose a comprehensive robustness evaluation framework tailored for lightweight Transformer inference. This framework assesses model robustness across three key dimensions: noise robustness, distributional robustness, and adversarial robustness. It further investigates how model size and hardware limitations affect robustness, thereby providing valuable insights for robustness-aware model design. Second, we introduce a novel adversarial robustness enhancement strategy that integrates lightweight modeling techniques. This approach leverages methods such as gradient clipping and layer-wise unfreezing, as well as decision boundary optimization techniques like TRADES and SMART. Together, these strategies effectively address challenges related to training instability and decision boundary smoothness, significantly improving model robustness. Finally, we deploy the robust lightweight Transformer models in real-world resource-constrained environments and empirically validate their inference robustness. The results confirm the effectiveness of our proposed methods in enhancing the robustness and reliability of lightweight Transformers for edge AI applications. Full article
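Two of the training techniques named above, layer-wise unfreezing and gradient clipping, are easy to sketch in PyTorch; the TRADES/SMART adversarial terms are omitted from this sketch.

```python
import torch
from torch import nn

def unfreeze_top_blocks(model: nn.Module, blocks) -> None:
    """Freeze every parameter, then re-enable gradients only for the given
    (typically topmost) blocks: a simple layer-wise unfreezing schedule."""
    for p in model.parameters():
        p.requires_grad = False
    for block in blocks:
        for p in block.parameters():
            p.requires_grad = True

def training_step(model, batch, labels, optimizer, max_grad_norm: float = 1.0):
    """One training step with gradient clipping for stability; the paper's
    adversarial objectives are not reproduced here."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(batch), labels)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```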
(This article belongs to the Section Mathematics)

23 pages, 10698 KB  
Article
Unmanned Aerial Vehicle-Based RGB Imaging and Lightweight Deep Learning for Downy Mildew Detection in Kimchi Cabbage
by Yang Lyu, Xiongzhe Han, Pingan Wang, Jae-Yeong Shin and Min-Woong Ju
Remote Sens. 2025, 17(14), 2388; https://doi.org/10.3390/rs17142388 - 10 Jul 2025
Viewed by 626
Abstract
Downy mildew is a highly destructive fungal disease that significantly reduces both the yield and quality of kimchi cabbage. Conventional detection methods rely on manual scouting, which is labor-intensive and prone to subjectivity. This study proposes an automated detection approach using RGB imagery acquired by an unmanned aerial vehicle (UAV), integrated with lightweight deep learning models for leaf-level identification of downy mildew. To improve disease feature extraction, Simple Linear Iterative Clustering (SLIC) segmentation was applied to the images. Among the evaluated models, Vision Transformer (ViT)-based architectures outperformed Convolutional Neural Network (CNN)-based models in terms of classification accuracy and generalization capability. For late-stage disease detection, DeiT-Tiny recorded the highest test accuracy (0.948) and macro F1-score (0.913), while MobileViT-S achieved the highest diseased recall (0.931). In early-stage detection, TinyViT-5M achieved the highest test accuracy (0.970) and macro F1-score (0.918); however, all models demonstrated reduced diseased recall under early-stage conditions, with DeiT-Tiny achieving the highest recall at 0.774. These findings underscore the challenges of identifying early symptoms using RGB imagery. Based on the classification results, prescription maps were generated to facilitate variable-rate pesticide application. Overall, this study demonstrates the potential of UAV-based RGB imaging for precision agriculture, while highlighting the importance of integrating multispectral data and utilizing domain adaptation techniques to enhance early-stage disease detection. Full article
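The SLIC preprocessing step can be reproduced with scikit-image as below; the segment count and compactness are illustrative choices rather than the paper's settings.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.util import img_as_float

def slic_leaf_patches(rgb: np.ndarray, n_segments: int = 400):
    """Split a UAV RGB frame into SLIC superpixels and return the label map
    plus the mean colour of each segment."""
    image = img_as_float(rgb)
    labels = slic(image, n_segments=n_segments, compactness=10, start_label=1)
    means = np.array([image[labels == s].mean(axis=0)
                      for s in np.unique(labels)])
    return labels, means
```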
(This article belongs to the Special Issue Advances in Remote Sensing for Crop Monitoring and Food Security)

28 pages, 3267 KB  
Article
Alzheimer’s Disease Detection in Various Brain Anatomies Based on Optimized Vision Transformer
by Faisal Mehmood, Asif Mehmood and Taeg Keun Whangbo
Mathematics 2025, 13(12), 1927; https://doi.org/10.3390/math13121927 - 10 Jun 2025
Viewed by 692
Abstract
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and a growing public health concern. Despite significant advances in deep learning for medical image analysis, early and accurate diagnosis of AD remains challenging. In this study, we focused on optimizing the training process of deep learning models by proposing an enhanced version of the Adam optimizer. The proposed optimizer introduces adaptive learning rate scaling, momentum correction, and decay modulation to improve convergence speed, training stability, and classification accuracy. We integrated the enhanced optimizer with Vision Transformer (ViT) and Convolutional Neural Network (CNN) architectures. The ViT-based model comprises a linear projection of image patches, positional encoding, a transformer encoder, and a Multi-Layer Perceptron (MLP) head with a Softmax classifier for multiclass AD classification. Experiments on publicly available Alzheimer’s disease datasets (ADNI-1 and ADNI-2) showed that the enhanced optimizer enabled the ViT model to achieve a 99.84% classification accuracy on Dataset-1 and 95.75% on Dataset-2, outperforming Adam, RMSProp, and SGD. Moreover, the optimizer reduced entropy loss and improved convergence stability by 0.8–2.1% across various architectures, including ResNet, RegNet, and MobileNet. This work contributes a robust optimizer-centric framework that enhances training efficiency and diagnostic accuracy for automated Alzheimer’s disease detection. Full article
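The paper's optimizer modifications are only summarized above, so the sketch below shows a plain Adam update with a hook (lr_scale) where an adaptive learning-rate scaling could enter; it does not reproduce the authors' momentum correction or decay modulation.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, lr_scale=1.0):
    """One standard Adam update. `lr_scale` is an extra hook standing in for
    the adaptive learning-rate scaling described above; it is an assumption,
    not the paper's rule."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)            # bias-corrected second moment
    param = param - lr * lr_scale * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```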
(This article belongs to the Special Issue The Application of Deep Neural Networks in Image Processing)