Search Results (397)

Search Parameters:
Keywords = vision transformers (ViTs)

15 pages, 2054 KiB  
Article
Deep-Learning Approaches for Cervical Cytology Nuclei Segmentation in Whole Slide Images
by Andrés Mosquera-Zamudio, Sandra Cancino, Guillermo Cárdenas-Montoya, Juan D. Garcia-Arteaga, Carlos Zambrano-Betancourt and Rafael Parra-Medina
J. Imaging 2025, 11(5), 137; https://doi.org/10.3390/jimaging11050137 - 29 Apr 2025
Viewed by 254
Abstract
Whole-slide imaging (WSI) in cytopathology poses challenges related to segmentation accuracy, computational efficiency, and image acquisition artifacts. This study aims to evaluate the performance of deep-learning models for instance segmentation in cervical cytology, benchmarking them against state-of-the-art methods on both public and institutional datasets. We tested three architectures—U-Net, vision transformer (ViT), and Detectron2—and evaluated their performance on the ISBI 2014 and CNseg datasets using panoptic quality (PQ), Dice similarity coefficient (DSC), and intersection over union (IoU). All models were trained on CNseg and tested on an independent institutional dataset. Data preprocessing involved manual annotation using QuPath, patch extraction guided by GeoJSON files, and exclusion of regions containing less than 60% cytologic material. Our models achieved superior segmentation performance on public datasets, reaching up to 98% PQ. Performance decreased on the institutional dataset, likely due to differences in image acquisition and the presence of blurred nuclei. Nevertheless, the models were able to detect blurred nuclei, highlighting their robustness in suboptimal imaging conditions. In conclusion, the proposed models offer an accurate and efficient solution for instance segmentation in cytology WSI. These results support the development of reliable AI-powered tools for digital cytology, with potential applications in automated screening and diagnostic workflows.
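
For readers unfamiliar with the overlap metrics cited above, here is a minimal NumPy sketch of how Dice and IoU are computed on a pair of binary masks. It is an illustration only, not the authors' evaluation code; the function name and toy masks are invented for the example.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Compute the Dice similarity coefficient and IoU for two binary masks.

    `pred` and `target` are arrays of the same shape, e.g. one nucleus
    instance rasterized from a QuPath/GeoJSON annotation.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    iou = (intersection + eps) / (np.logical_or(pred, target).sum() + eps)
    return float(dice), float(iou)

# Example: two overlapping square "nuclei"
a = np.zeros((64, 64), dtype=bool); a[10:30, 10:30] = True
b = np.zeros((64, 64), dtype=bool); b[15:35, 15:35] = True
print(dice_and_iou(a, b))  # roughly (0.56, 0.39)
```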

37 pages, 59030 KiB  
Review
Integration of Hyperspectral Imaging and AI Techniques for Crop Type Mapping: Present Status, Trends, and Challenges
by Mohamed Bourriz, Hicham Hajji, Ahmed Laamrani, Nadir Elbouanani, Hamd Ait Abdelali, François Bourzeix, Ali El-Battay, Abdelhakim Amazirh and Abdelghani Chehbouni
Remote Sens. 2025, 17(9), 1574; https://doi.org/10.3390/rs17091574 - 29 Apr 2025
Viewed by 432
Abstract
Accurate and efficient crop maps are essential for decision-makers to improve agricultural monitoring and management, thereby ensuring food security. The integration of advanced artificial intelligence (AI) models with hyperspectral remote sensing data, which provide richer spectral information than multispectral imaging, has proven highly effective in the precise discrimination of crop types. This systematic review examines the evolution of hyperspectral platforms, from Unmanned Aerial Vehicle (UAV)-mounted sensors to space-borne satellites (e.g., EnMAP, PRISMA), and explores recent scientific advances in AI methodologies for crop mapping. A review protocol was applied to identify 47 studies from databases of peer-reviewed scientific publications, focusing on hyperspectral sensors, input features, and classification architectures. The analysis highlights the significant contributions of Deep Learning (DL) models, particularly Vision Transformers (ViTs) and hybrid architectures, in improving classification accuracy. However, the review also identifies critical gaps, including the under-utilization of hyperspectral space-borne imaging, the limited integration of multi-sensor data, and the need for advanced modeling approaches such as Graph Neural Networks (GNNs)-based methods and geospatial foundation models (GFMs) for large-scale crop type mapping. Furthermore, the findings highlight the importance of developing scalable, interpretable, and transparent models to maximize the potential of hyperspectral imaging (HSI), particularly in underrepresented regions such as Africa, where research remains limited. This review provides valuable insights to guide future researchers in adopting HSI and advanced AI models for reliable large-scale crop mapping, contributing to sustainable agriculture and global food security.

38 pages, 2033 KiB  
Article
DCAT: A Novel Transformer-Based Approach for Dynamic Context-Aware Image Captioning in the Tamil Language
by Jothi Prakash Venugopal, Arul Antran Vijay Subramanian, Manikandan Murugan, Gopikrishnan Sundaram, Marco Rivera and Patrick Wheeler
Appl. Sci. 2025, 15(9), 4909; https://doi.org/10.3390/app15094909 - 28 Apr 2025
Viewed by 110
Abstract
The task of image captioning in low-resource languages like Tamil is fraught with challenges due to limited linguistic resources and complex semantic structures. This paper addresses the problem of generating contextually and linguistically coherent captions in Tamil. We introduce the Dynamic Context-Aware Transformer (DCAT), a novel approach that synergizes the Vision Transformer (ViT) with the Generative Pre-trained Transformer (GPT-3), reinforced by a unique Context Embedding Layer. The DCAT model, tailored for Tamil, innovatively employs dynamic attention mechanisms during its Initialization, Training, and Inference phases to focus on pertinent visual and textual elements. Our method distinctively leverages the nuances of Tamil syntax and semantics, a novelty in the realm of low-resource language image captioning. Comparative evaluations against established models on datasets like Flickr8k, Flickr30k, and MSCOCO reveal DCAT’s superiority, with a notable 12% increase in BLEU score (0.7425) and a 15% enhancement in METEOR score (0.4391) over leading models. Despite its computational demands, DCAT sets a new benchmark for image captioning in Tamil, demonstrating potential applicability to other similar languages.

13 pages, 354 KiB  
Article
Enhanced Cleft Lip and Palate Classification Using SigLIP 2: A Comparative Study with Vision Transformers and Siamese Networks
by Oraphan Nantha, Benjaporn Sathanarugsawait and Prasong Praneetpolgrang
Appl. Sci. 2025, 15(9), 4766; https://doi.org/10.3390/app15094766 - 25 Apr 2025
Viewed by 152
Abstract
This paper extends our previous work on cleft lip and/or palate (CL/P) classification, which employed vision transformers (ViTs) and Siamese neural networks. We now integrate SigLIP 2, a state-of-the-art multilingual vision–language model, for feature extraction, replacing the previously utilized BiomedCLIP. SigLIP 2 offers enhanced semantic understanding, improved localization capabilities, and multilingual support, potentially leading to more robust feature representations for CL/P classification. We hypothesize that SigLIP 2’s superior feature extraction will improve the classification accuracy of CL/P types (bilateral, unilateral, and palate-only) from the UltraSuite CLEFT dataset, a collection of ultrasound video sequences capturing tongue movements during speech with synchronized audio recordings. A comparative analysis is conducted, evaluating the performance of our original ViT-Siamese network model (using BiomedCLIP) against a new model leveraging SigLIP 2 for feature extraction. Performance is assessed using accuracy, precision, recall, and F1 score, demonstrating the impact of SigLIP 2 on CL/P classification. The new model achieves statistically significant improvements in overall accuracy (86.6% vs. 82.76%) and F1 scores for all cleft types. We discuss the computational efficiency and practical implications of employing SigLIP 2 in a clinical setting, highlighting its potential for earlier and more accurate diagnosis, personalized treatment planning, and broader applicability across diverse populations. The results demonstrate the significant potential of advanced vision–language models, such as SigLIP 2, to enhance AI-powered medical diagnostics.

33 pages, 2777 KiB  
Review
Developments in Deep Learning Artificial Neural Network Techniques for Medical Image Analysis and Interpretation
by Olamilekan Shobayo and Reza Saatchi
Diagnostics 2025, 15(9), 1072; https://doi.org/10.3390/diagnostics15091072 - 23 Apr 2025
Viewed by 303
Abstract
Deep learning has revolutionised medical image analysis, offering the possibility of automated, efficient, and highly accurate diagnostic solutions. This article explores recent developments in deep learning techniques applied to medical imaging, including convolutional neural networks (CNNs) for classification and segmentation, recurrent neural networks (RNNs) for temporal analysis, autoencoders for feature extraction, and generative adversarial networks (GANs) for image synthesis and augmentation. Additionally, U-Net models for segmentation, vision transformers (ViTs) for global feature extraction, and hybrid models integrating multiple architectures are explored. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) process was used, and searches on PubMed, Google Scholar, and Scopus databases were conducted. The findings highlight key challenges such as data availability, interpretability, overfitting, and computational requirements. While deep learning has demonstrated significant potential in enhancing diagnostic accuracy across multiple medical imaging modalities—including MRI, CT, US, and X-ray—factors such as model trust, data privacy, and ethical considerations remain ongoing concerns. The study underscores the importance of integrating multimodal data, improving computational efficiency, and advancing explainability to facilitate broader clinical adoption. Future research directions emphasize optimising deep learning models for real-time applications, enhancing interpretability, and integrating deep learning with existing healthcare frameworks for improved patient outcomes.
(This article belongs to the Special Issue Artificial Intelligence in Biomedical Imaging and Signal Processing)

24 pages, 4056 KiB  
Article
Unveiling the Ultimate Meme Recipe: Image Embeddings for Identifying Top Meme Templates from r/Memes
by Jan Sawicki
J. Imaging 2025, 11(5), 132; https://doi.org/10.3390/jimaging11050132 - 23 Apr 2025
Viewed by 344
Abstract
Meme analysis, particularly identifying top meme templates, is crucial for understanding digital culture, communication trends, and the spread of online humor, as memes serve as units of cultural transmission that shape public discourse. Tracking popular templates enables researchers to examine their role in social engagement, ideological framing, and viral dynamics within digital ecosystems. This study explored the viral nature of memes by analyzing a large dataset of over 1.5 million meme submissions from Reddit’s r/memes subreddit, spanning from January 2021 to July 2024. The focus was on uncovering the most popular meme templates by applying advanced image processing techniques. Apart from building an overall understanding of the memesphere, the main contribution was a selection of top meme templates providing a recipe for the best meme template for the meme creators (memesters). Using Vision Transformer (ViT) models, visual features of memes were analyzed without the influence of text, and memes were grouped into 1000 clusters that represented distinct templates. By combining image captioning and keyword extraction methods, key characteristics of the templates were identified, highlighting those with the most visual consistency. A deeper examination of the most popular memes revealed that factors like timing, cultural relevance, and references to current events played a significant role in their virality. Although user identity had limited influence on meme success, a closer look at contributors revealed an interesting pattern of a bot account and two prominent users. Ultimately, the study pinpointed the ten most popular meme templates, many of which were based on pop culture, offering insights into what makes a meme likely to go viral in today’s digital culture.
(This article belongs to the Section Image and Video Processing)
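
A minimal sketch of the embed-then-cluster approach described in the abstract, assuming a standard ViT checkpoint from the Hugging Face `transformers` library; the model id, file names, and cluster count below are illustrative, not the author's exact pipeline.

```python
import numpy as np
import torch
from PIL import Image
from sklearn.cluster import KMeans
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k").eval()

@torch.no_grad()
def embed(paths):
    """Return one ViT [CLS] embedding per meme image."""
    vecs = []
    for p in paths:
        inputs = processor(images=Image.open(p).convert("RGB"), return_tensors="pt")
        out = model(**inputs)
        vecs.append(out.last_hidden_state[:, 0, :].squeeze(0).numpy())  # CLS token
    return np.stack(vecs)

# Cluster embeddings into candidate "templates" (1000 clusters in the paper;
# a tiny number here just to keep the toy example cheap).
features = embed(["meme_001.jpg", "meme_002.jpg", "meme_003.jpg"])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)
```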

15 pages, 960 KiB  
Technical Note
ViT–KAN Synergistic Fusion: A Novel Framework for Parameter-Efficient Multi-Band PolSAR Land Cover Classification
by Songli Han, Dawei Ren, Fan Gao, Jian Yang and Hui Ma
Remote Sens. 2025, 17(8), 1470; https://doi.org/10.3390/rs17081470 - 20 Apr 2025
Viewed by 144
Abstract
Deep learning has shown significant potential in multi-band Polarimetric Synthetic Aperture Radar (PolSAR) land cover classification. However, the existing methods face two main challenges: accurately modeling the complex nonlinear relationships between multiple bands and balancing classifier parameter efficiency with classification accuracy. To address these challenges, this paper proposes a novel decision-level multi-band fusion framework that leverages the synergistic optimization of the Vision Transformer (ViT) and Kolmogorov–Arnold Network (KAN). This innovative architecture effectively captures global spatial–spectral correlations through ViT’s cross-band self-attention mechanism and achieves parameter-efficient decision-level probability space mapping using KAN’s spline basis functions. The proposed method significantly enhances the model’s generalization capability across different band combinations. The experimental results on the quad-band (P/L/C/X) Hainan PolSAR dataset, acquired by the Aerial Remote Sensing System of the Chinese Academy of Sciences, show that the proposed framework achieves an overall accuracy of 96.24%, outperforming conventional methods in both accuracy and parameter efficiency. These results demonstrate the practical potential of the proposed method for high-performance and efficient multi-band PolSAR land cover classification.
(This article belongs to the Special Issue Big Data Era: AI Technology for SAR and PolSAR Image)
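
To make the decision-level fusion idea concrete, here is a toy sketch that simply averages per-band class probabilities; the paper replaces this naive average with a learned KAN mapping, so the names and shapes below are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def fuse_band_logits(band_logits: dict) -> torch.Tensor:
    """Naive decision-level fusion: average per-band class probabilities.

    `band_logits` maps a band name (e.g. "P", "L", "C", "X") to a tensor of
    shape (batch, num_classes) produced by that band's ViT classifier. The
    paper maps the concatenated probability vectors through a KAN instead
    of taking this plain average.
    """
    probs = [F.softmax(logits, dim=-1) for logits in band_logits.values()]
    return torch.stack(probs, dim=0).mean(dim=0)

# Toy example: random logits for 4 bands, 2 samples, 6 land-cover classes
bands = {b: torch.randn(2, 6) for b in ["P", "L", "C", "X"]}
fused = fuse_band_logits(bands)
print(fused.argmax(dim=-1))
```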

17 pages, 1797 KiB  
Article
Advanced Diagnosis of Cardiac and Respiratory Diseases from Chest X-Ray Imagery Using Deep Learning Ensembles
by Hemal Nakrani, Essa Q. Shahra, Shadi Basurra, Rasheed Mohammad, Edlira Vakaj and Waheb A. Jabbar
J. Sens. Actuator Netw. 2025, 14(2), 44; https://doi.org/10.3390/jsan14020044 - 18 Apr 2025
Viewed by 281
Abstract
Chest X-ray interpretation is essential for diagnosing cardiac and respiratory diseases. This study introduces a deep learning ensemble approach that integrates Convolutional Neural Networks (CNNs), including ResNet-152, VGG19, EfficientNet, and a Vision Transformer (ViT), to enhance diagnostic accuracy. Using the NIH Chest X-ray dataset, the methodology involved comprehensive preprocessing, data augmentation, and model optimization techniques to address challenges such as label imbalance and feature variability. Among the individual models, VGG19 exhibited strong performance with a Hamming Loss of 0.1335 and high accuracy in detecting Edema, while ViT excelled in classifying certain conditions like Hernia. Despite the strengths of individual models, the ensemble meta-model achieved the best overall performance, with a Hamming Loss of 0.1408 and consistently higher ROC-AUC values across multiple diseases, demonstrating its superior capability to handle complex classification tasks. This robust ensemble learning framework underscores its potential for reliable and precise disease detection, offering significant improvements over traditional methods. The findings highlight the value of integrating diverse model architectures to address the complexities of multi-label chest X-ray classification, providing a pathway for more accurate, scalable, and accessible diagnostic tools in clinical practice.
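
As a rough illustration of multi-label soft voting and Hamming-loss evaluation of the kind described above (not the authors' meta-model; the threshold and array shapes are assumptions):

```python
import numpy as np
from sklearn.metrics import hamming_loss

def soft_vote(prob_list, threshold=0.5):
    """Average per-model sigmoid outputs, then threshold each label.

    Each element of `prob_list` has shape (n_samples, n_labels), e.g. one
    array per backbone (ResNet-152, VGG19, EfficientNet, ViT).
    """
    mean_probs = np.mean(prob_list, axis=0)
    return (mean_probs >= threshold).astype(int)

# Toy example: 3 models, 4 X-rays, 5 findings
rng = np.random.default_rng(0)
probs = [rng.random((4, 5)) for _ in range(3)]
y_true = rng.integers(0, 2, size=(4, 5))
y_pred = soft_vote(probs)
print("Hamming loss:", hamming_loss(y_true, y_pred))
```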

18 pages, 3263 KiB  
Article
Boosting Skin Cancer Classification: A Multi-Scale Attention and Ensemble Approach with Vision Transformers
by Guang Yang, Suhuai Luo and Peter Greer
Sensors 2025, 25(8), 2479; https://doi.org/10.3390/s25082479 - 15 Apr 2025
Viewed by 364
Abstract
Skin cancer is a significant global health concern, with melanoma being the most dangerous form, responsible for the majority of skin cancer-related deaths. Early detection of skin cancer is critical, as it can drastically improve survival rates. While deep learning models have achieved impressive results in skin cancer classification, there remain challenges in accurately distinguishing between benign and malignant lesions. In this study, we introduce a novel multi-scale attention-based performance booster inspired by the Vision Transformer (ViT) architecture, which enhances the accuracy of both ViT and convolutional neural network (CNN) models. By leveraging attention maps to identify discriminative regions within skin lesion images, our method improves the models’ focus on diagnostically relevant areas. Additionally, we employ ensemble learning techniques to combine the outputs of several deep learning models using majority voting. Our skin cancer classifier, consisting of ViT and EfficientNet models, achieved a classification accuracy of 95.05% on the ISIC2018 dataset, outperforming individual models. The results demonstrate the effectiveness of integrating attention-based multi-scale learning and ensemble methods in skin cancer classification.

27 pages, 41478 KiB  
Article
LO-MLPRNN: A Classification Algorithm for Multispectral Remote Sensing Images by Fusing Selective Convolution
by Xiangsuo Fan, Yan Zhang, Yong Peng, Qi Li, Xianqiang Wei, Jiabin Wang and Fadong Zou
Sensors 2025, 25(8), 2472; https://doi.org/10.3390/s25082472 - 14 Apr 2025
Viewed by 202
Abstract
To address the limitation of traditional deep learning algorithms in fully utilizing contextual information in multispectral remote sensing (RS) images, this paper proposes an improved vegetation cover classification algorithm called LO-MLPRNN, which integrates Large Selective Kernel Network (LSK) and Omni-Dimensional Dynamic Convolution (ODC) with a Multi-Layer Perceptron Recurrent Neural Network (MLPRNN). The algorithm employs parallel-connected ODC and LSK modules to adaptively adjust convolution kernel parameters across multiple dimensions and dynamically optimize spatial receptive fields, enabling multi-perspective feature fusion for efficient processing of multispectral band information. The extracted features are mapped to a high-dimensional space through a Gated Recurrent Unit (GRU) and fully connected layers, with nonlinear characteristics enhanced by activation functions, ultimately achieving pixel-level land cover classification. Experiments conducted on GF-2 (0.75 m) and Sentinel-2 (10 m) multispectral RS images from Liucheng County, Liuzhou City, Guangxi Province, demonstrate that LO-MLPRNN achieves overall accuracies of 99.11% and 99.43%, outperforming Vision Transformer (ViT) by 2.61% and 3.98%, respectively. Notably, the classification accuracy for sugarcane reaches 99.70% and 99.67%, showcasing its superior performance.
(This article belongs to the Special Issue Smart Image Recognition and Detection Sensors)

10 pages, 864 KiB  
Review
Role of Artificial Intelligence in Thyroid Cancer Diagnosis
by Alessio Cece, Massimo Agresti, Nadia De Falco, Pasquale Sperlongano, Giancarlo Moccia, Pasquale Luongo, Francesco Miele, Alfredo Allaria, Francesco Torelli, Paola Bassi, Antonella Sciarra, Stefano Avenia, Paola Della Monica, Federica Colapietra, Marina Di Domenico, Ludovico Docimo and Domenico Parmeggiani
J. Clin. Med. 2025, 14(7), 2422; https://doi.org/10.3390/jcm14072422 - 2 Apr 2025
Viewed by 580
Abstract
The progress of artificial intelligence (AI), particularly its core algorithms—machine learning (ML) and deep learning (DL)—has been significant in the medical field, impacting both scientific research and clinical practice. These algorithms are now capable of analyzing ultrasound images, processing them, and providing outcomes, such as determining the benignity or malignancy of thyroid nodules. This integration into ultrasound machines is referred to as computer-aided diagnosis (CAD). The use of such software extends beyond ultrasound to include cytopathological and molecular assessments, enhancing the estimation of malignancy risk. AI’s considerable potential in cancer diagnosis and prevention is evident. This article provides an overview of AI models based on ML and DL algorithms used in thyroid diagnostics. Recent studies demonstrate their effectiveness and diagnostic role in ultrasound, pathology, and molecular fields. Notable advancements include content-based image retrieval (CBIR), enhanced saliency CBIR (SE-CBIR), Restore-Generative Adversarial Networks (GANs), and Vision Transformers (ViTs). These new algorithms show remarkable results, indicating their potential as diagnostic and prognostic tools for thyroid pathology. The future trend points to these AI systems becoming the preferred choice for thyroid diagnostics.
(This article belongs to the Section Oncology)

18 pages, 4714 KiB  
Article
Integrating Hyperspectral Images and LiDAR Data Using Vision Transformers for Enhanced Vegetation Classification
by Xingquan Shu, Limin Ma and Fengqin Chang
Forests 2025, 16(4), 620; https://doi.org/10.3390/f16040620 - 2 Apr 2025
Viewed by 398
Abstract
This study proposes PlantViT, a Vision Transformer (ViT)-based framework for high-precision vegetation classification by integrating hyperspectral imaging (HSI) and Light Detection and Ranging (LiDAR) data. The dual-branch architecture optimizes feature fusion across spectral and spatial dimensions, where the LiDAR branch extracts elevation and structural features while minimizing information loss and the HSI branch applies involution-based feature extraction to enhance spectral discrimination. By leveraging involution-based feature extraction and a Lightweight ViT (LightViT), the proposed method demonstrates superior classification performance. Experimental results on the Houston 2013 and Trento datasets show that PlantViT achieves an overall accuracy of 99.0% and 97.4%, respectively, with strong agreement indicated by Kappa coefficients of 98.7% and 97.2%. These results highlight PlantViT’s robust capability in classifying heterogeneous vegetation, outperforming conventional CNN-based and other ViT-based models. This study advances Unmanned Aerial Vehicle (UAV)-based remote sensing (RS) for environmental monitoring by providing a scalable and efficient solution for wetland and forest ecosystem assessment.
(This article belongs to the Special Issue Remote Sensing Approach for Early Detection of Forest Disturbance)
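
A bare-bones sketch of a dual-branch fusion network in the spirit described above, with plain convolutions standing in for PlantViT's involution and LightViT blocks; all layer sizes and names are placeholders.

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    """Toy two-branch fusion: one branch for HSI patches, one for LiDAR rasters."""

    def __init__(self, hsi_bands=144, lidar_channels=1, feat_dim=64, num_classes=15):
        super().__init__()
        self.hsi_branch = nn.Sequential(
            nn.Conv2d(hsi_bands, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lidar_branch = nn.Sequential(
            nn.Conv2d(lidar_channels, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, hsi, lidar):
        h = self.hsi_branch(hsi).flatten(1)     # (B, feat_dim) from spectral branch
        l = self.lidar_branch(lidar).flatten(1)  # (B, feat_dim) from elevation branch
        return self.head(torch.cat([h, l], dim=1))

model = DualBranchFusion()
logits = model(torch.randn(2, 144, 11, 11), torch.randn(2, 1, 11, 11))
print(logits.shape)  # torch.Size([2, 15])
```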

22 pages, 2288 KiB  
Article
Central Pixel-Based Dual-Branch Network for Hyperspectral Image Classification
by Dandan Ma, Shijie Xu, Zhiyu Jiang and Yuan Yuan
Remote Sens. 2025, 17(7), 1255; https://doi.org/10.3390/rs17071255 - 2 Apr 2025
Viewed by 422
Abstract
Hyperspectral image classification faces significant challenges in effectively extracting and integrating spectral-spatial features from high-dimensional data. Recent deep learning (DL) methods combining Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have demonstrated exceptional performance. However, two critical challenges may cause degradation in the classification accuracy of these methods: interference from irrelevant information within the observed region, and the potential loss of useful information due to local spectral variability within the same class. To address these issues, we propose a central pixel-based dual-branch network (CPDB-Net) that synergistically integrates CNN and ViT for robust feature extraction. Specifically, the central spectral feature extraction branch based on CNN serves as a strong prior to reinforce the importance of central pixel features in classification. Additionally, the spatial branch based on ViT incorporates a novel frequency-aware HiLo attention, which can effectively separate high and low frequencies, alleviating the problem of local spectral variability and enhancing the ability to extract global features. Extensive experiments on widely used HSI datasets demonstrate the superiority of our method. Our CPDB-Net achieves the highest overall accuracies of 92.67%, 97.48%, and 95.02% on the Indian Pines, Pavia University, and Houston 2013 datasets, respectively, outperforming recent representative methods and confirming its effectiveness.
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)

14 pages, 1479 KiB  
Article
Rosette Trajectory MRI Reconstruction with Vision Transformers
by Muhammed Fikret Yalcinbas, Cengizhan Ozturk, Onur Ozyurt, Uzay E. Emir and Ulas Bagci
Tomography 2025, 11(4), 41; https://doi.org/10.3390/tomography11040041 - 1 Apr 2025
Viewed by 378
Abstract
Introduction: An efficient pipeline for rosette trajectory magnetic resonance imaging reconstruction is proposed, combining the inverse Fourier transform with a vision transformer (ViT) network enhanced with a convolutional layer. This method addresses the challenges of reconstructing high-quality images from non-Cartesian data by leveraging the ViT’s ability to handle complex spatial dependencies without extensive preprocessing. Materials and Methods: The inverse fast Fourier transform provides a robust initial approximation, which is refined by the ViT network to produce high-fidelity images. Results and Discussion: This approach outperforms established deep learning techniques for normalized root mean squared error, peak signal-to-noise ratio, and entropy-based image quality scores; offers better runtime performance; and remains competitive with respect to other metrics.
(This article belongs to the Topic AI in Medical Imaging and Image Processing)
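
To illustrate the two-stage pipeline (a Fourier-based initial image followed by a learned refinement), here is a schematic sketch on Cartesian data; a real rosette acquisition would first require gridding or a NUFFT onto a Cartesian grid, and the refinement module below is a small stand-in rather than the paper's ViT.

```python
import numpy as np
import torch
import torch.nn as nn

def ifft_initial_image(kspace: np.ndarray) -> np.ndarray:
    """Initial reconstruction: centered 2D inverse FFT of (gridded) k-space."""
    img = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))
    return np.abs(img)

class Refiner(nn.Module):
    """Stand-in refinement network (the paper uses a ViT with a convolutional layer)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )
    def forward(self, x):          # x: (B, 1, H, W) initial magnitude image
        return x + self.net(x)     # residual refinement of the iFFT estimate

kspace = np.fft.fft2(np.random.rand(64, 64))   # synthetic "acquired" data
init = ifft_initial_image(kspace)
refined = Refiner()(torch.from_numpy(init).float()[None, None])
print(refined.shape)  # torch.Size([1, 1, 64, 64])
```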

19 pages, 3770 KiB  
Article
A New Pes Planus Automatic Diagnosis Method: ViT-OELM Hybrid Modeling
by Derya Avcı
Diagnostics 2025, 15(7), 867; https://doi.org/10.3390/diagnostics15070867 - 28 Mar 2025
Viewed by 227
Abstract
Background/Objectives: Pes planus (flat feet) is a condition characterized by flatter than normal soles of the foot. In this study, a Vision Transformer (ViT)-based deep learning architecture is proposed to automate the diagnosis of pes planus. The model analyzes foot images and classifies them into two classes: "pes planus" and "not pes planus". In the literature, models based on convolutional neural networks (CNNs) can automatically perform such classification, regression, and prediction tasks, but they cannot capture long-range dependencies and global context. Methods: In this study, the pes planus dataset, which is openly available on Kaggle, was used. This paper suggests a ViT-OELM hybrid model for automatic diagnosis from the obtained pes planus images. The suggested ViT-OELM hybrid model includes an attention mechanism for feature extraction from the pes planus images. A total of 1000 features obtained for each sample image from this attention mechanism are used as inputs for an Optimum Extreme Learning Machine (OELM) classifier using various activation functions, and are classified. Results: The performance of the suggested ViT-OELM hybrid model is compared with that of other studies that used the same pes planus database. The model was trained for binary classification, and the performance metrics were computed in the testing phase. It showed 98.04% accuracy, 98.04% recall, 98.05% precision, and an F-1 score of 98.03%. Conclusions: Our suggested ViT-OELM hybrid model demonstrates superior performance compared to other studies in the literature that used the same dataset.
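
A minimal sketch of the extreme-learning-machine step: feature vectors (1000 per image in the paper) pass through a fixed random hidden layer and only the output weights are solved in closed form. This illustrates a plain ELM rather than the paper's optimised OELM variant, and all sizes below are placeholders.

```python
import numpy as np

class SimpleELM:
    """Basic extreme learning machine: random hidden layer + least-squares output."""

    def __init__(self, n_features=1000, n_hidden=256, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_features, n_hidden))  # fixed, never trained
        self.b = rng.normal(size=n_hidden)
        self.beta = None

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y_onehot):
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y_onehot   # closed-form output weights
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)

# Toy example: 20 "ViT feature" vectors, binary labels (pes planus vs. not)
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 1000))
y = rng.integers(0, 2, size=20)
print(SimpleELM().fit(X, np.eye(2)[y]).predict(X[:5]))
```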
