Search Results (73)

Search Parameters:
Keywords = UAV geolocation

19 pages, 2598 KB  
Article
DOCB: A Dynamic Online Cross-Batch Hard Exemplar Recall for Cross-View Geo-Localization
by Wenchao Fan, Xuetao Tian, Long Huang, Xiuwei Zhang and Fang Wang
ISPRS Int. J. Geo-Inf. 2025, 14(11), 418; https://doi.org/10.3390/ijgi14110418 - 26 Oct 2025
Viewed by 165
Abstract
Image-based geo-localization is a challenging task that aims to determine the geographic location of a ground-level query image captured by an Unmanned Ground Vehicle (UGV) by matching it to geo-tagged nadir-view (top-down) images from an Unmanned Aerial Vehicle (UAV) stored in a reference database. The challenge comes from the perspective inconsistency between matched objects. In this work, we propose a novel metric learning scheme for hard exemplar mining to improve the performance of cross-view geo-localization. Specifically, we introduce a Dynamic Online Cross-Batch (DOCB) hard exemplar mining scheme that solves the problem of the lack of hard exemplars in mini-batches in the middle and late stages of training, which leads to training stagnation. It mines cross-batch hard negative exemplars according to the current network state and reloads them into the network so that the gradients of negative exemplars participate in back-propagation. Since the feature representation of cross-batch negative examples adapts to the current network state, the triplet loss calculation becomes more accurate. Compared with methods that consider only the gradients of anchors and positives, adding the gradients of negative exemplars helps obtain the correct gradient direction. Therefore, our DOCB scheme can better guide the network to learn valuable metric information. Moreover, we design a simple Siamese-like network called multi-scale feature aggregation (MSFA), which generates multi-scale feature aggregations by learning and fusing multiple local spatial embeddings. The experimental results demonstrate that our DOCB scheme and MSFA network achieve an accuracy of 95.78% on the CVUSA dataset and 86.34% on the CVACT_val dataset, outperforming other existing methods in the field. Full article
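For readers unfamiliar with the mechanism, hard-negative mining under a triplet loss can be sketched in a few lines. This is a minimal illustration, not the DOCB scheme itself: the negative pool here is a fixed list of vectors, whereas DOCB spans mini-batches and re-embeds negatives with the current network state.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hardest_negative_triplet_loss(anchor, positive, negatives, margin=0.3):
    """Triplet loss using the hardest (closest-to-anchor) negative from a pool.

    loss = max(0, d(anchor, positive) - d(anchor, hardest negative) + margin)
    """
    d_ap = euclidean(anchor, positive)
    d_an = min(euclidean(anchor, n) for n in negatives)  # hardest negative
    return max(0.0, d_ap - d_an + margin)

anchor = [0.0, 0.0]
positive = [0.1, 0.0]                  # close to the anchor, as it should be
negatives = [[1.0, 0.0], [0.2, 0.0]]   # the second is "hard": it sits near the anchor

loss = hardest_negative_triplet_loss(anchor, positive, negatives, margin=0.3)
```

Easy negatives yield zero loss (no gradient), which is exactly the mid-training stagnation the abstract describes; mining keeps hard negatives in play.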
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)

17 pages, 4493 KB  
Article
VimGeo: An Efficient Visual Model for Cross-View Geo-Localization
by Kaiqian Yang, Yujin Zhang, Li Wang, A. A. M. Muzahid, Ferdous Sohel, Fei Wu and Qiong Wu
Electronics 2025, 14(19), 3906; https://doi.org/10.3390/electronics14193906 - 30 Sep 2025
Viewed by 332
Abstract
Cross-view geo-localization is a challenging task due to the significant changes in the appearance of target scenes from variable perspectives. Most existing methods primarily adopt Transformers or ConvNeXt as backbone models but often face high computational costs and accuracy degradation in complex scenarios. Therefore, this paper proposes a visual Mamba framework based on the state-space model (SSM) for cross-view geo-localization. Compared with the existing methods, Vision Mamba is more efficient in modeling and memory usage and achieves more efficient cross-view matching by combining the twin architecture of shared weights with multiple mixed losses. Additionally, this paper introduces Dice Loss to handle scale differences and imbalance issues in cross-view images. Extensive experiments on the public cross-view dataset University-1652 demonstrate that Vision Mamba not only achieves excellent performance in UAV target localization tasks but also attains the highest efficiency with lower memory consumption. This work provides a novel solution for cross-view geo-localization tasks and shows great potential to become the backbone model for the next generation of cross-view geo-localization. Full article

25 pages, 29114 KB  
Article
Towards UAV Localization in GNSS-Denied Environments: The SatLoc Dataset and a Hierarchical Adaptive Fusion Framework
by Xiang Zhou, Xiangkai Zhang, Xu Yang, Jiannan Zhao, Zhiyong Liu and Feng Shuang
Remote Sens. 2025, 17(17), 3048; https://doi.org/10.3390/rs17173048 - 2 Sep 2025
Viewed by 1598
Abstract
Precise and robust localization for micro Unmanned Aerial Vehicles (UAVs) in GNSS-denied environments is hindered by the lack of diverse datasets and the limited real-world performance of existing visual matching methods. To address these gaps, we introduce two contributions: (1) the SatLoc dataset, a new benchmark featuring synchronized, multi-source data from varied real-world scenarios tailored for UAV-to-satellite image matching, and (2) SatLoc-Fusion, a hierarchical localization framework. Our proposed pipeline integrates three complementary layers: absolute geo-localization via satellite imagery using DinoV2, high-frequency relative motion tracking from visual odometry with XFeat, and velocity estimation using optical flow. An adaptive fusion strategy dynamically weights the output of each layer based on real-time confidence metrics, ensuring an accurate and self-consistent state estimate. Deployed on a 6 TFLOPS edge computer, our system achieves real-time operation at over 2 Hz, with an absolute localization error below 15 m and effective trajectory coverage exceeding 90%, demonstrating state-of-the-art performance. The SatLoc dataset and fusion pipeline provide a robust and comprehensive baseline for advancing UAV navigation in challenging environments. Full article
(This article belongs to the Section Remote Sensing Image Processing)

25 pages, 24334 KB  
Article
Unsupervised Knowledge Extraction of Distinctive Landmarks from Earth Imagery Using Deep Feature Outliers for Robust UAV Geo-Localization
by Zakhar Ostrovskyi, Oleksander Barmak, Pavlo Radiuk and Iurii Krak
Mach. Learn. Knowl. Extr. 2025, 7(3), 81; https://doi.org/10.3390/make7030081 - 13 Aug 2025
Viewed by 751
Abstract
Vision-based navigation is a common solution for the critical challenge of GPS-denied Unmanned Aerial Vehicle (UAV) operation, but a research gap remains in the autonomous discovery of robust landmarks from aerial survey imagery needed for such systems. In this work, we propose a framework to fill this gap by identifying visually distinctive urban buildings from aerial survey imagery and curating them into a landmark database for GPS-free UAV localization. The proposed framework constructs semantically rich embeddings using intermediate layers from a pre-trained YOLOv11n-seg segmentation network. This novel technique requires no additional training. An unsupervised landmark selection strategy, based on the Isolation Forest algorithm, then identifies objects with statistically unique embeddings. Experimental validation on the VPAIR aerial-to-aerial benchmark shows that the proposed max-pooled embeddings, assembled from selected layers, significantly improve retrieval performance. The top-1 retrieval accuracy for landmarks more than doubled compared to typical buildings (0.53 vs. 0.31), and a Recall@5 of 0.70 is achieved for landmarks. Overall, this study demonstrates that unsupervised outlier selection in a carefully constructed embedding space yields a highly discriminative, computation-friendly set of landmarks suitable for real-time, robust UAV navigation. Full article
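The selection principle — a landmark is an embedding that is a statistical outlier among its peers — can be shown with a simple stand-in. The paper uses an Isolation Forest; below we substitute a z-score on distance-to-centroid, named plainly as a simplification, since the "distinctive = outlier in embedding space" idea survives the swap.

```python
import math

def select_outlier_landmarks(embeddings, z_thresh=1.5):
    """Return indices of embeddings unusually far from the centroid.

    Simplified stand-in for Isolation Forest outlier selection:
    z-score of each embedding's distance to the mean embedding.
    """
    dim = len(embeddings[0])
    centroid = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    dists = [math.dist(e, centroid) for e in embeddings]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [i for i, d in enumerate(dists) if std > 0 and (d - mean) / std > z_thresh]

# nine "typical building" embeddings near the origin, one distinctive outlier
embs = [[0.1 * k, 0.0] for k in range(9)] + [[5.0, 5.0]]
landmarks = select_outlier_landmarks(embs)
```

Only the distinctive embedding is kept, mirroring the retrieval gap the abstract reports between landmarks and typical buildings.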
(This article belongs to the Special Issue Deep Learning in Image Analysis and Pattern Recognition, 2nd Edition)

23 pages, 3858 KB  
Article
MCFA: Multi-Scale Cascade and Feature Adaptive Alignment Network for Cross-View Geo-Localization
by Kaiji Hou, Qiang Tong, Na Yan, Xiulei Liu and Shoulu Hou
Sensors 2025, 25(14), 4519; https://doi.org/10.3390/s25144519 - 21 Jul 2025
Viewed by 850
Abstract
Cross-view geo-localization (CVGL) presents significant challenges due to the drastic variations in perspective and scene layout between unmanned aerial vehicle (UAV) and satellite images. Existing methods have made certain advancements in extracting local features from images. However, they exhibit limitations in modeling the interactions among local features and fall short in aligning cross-view representations accurately. To address these issues, we propose a Multi-Scale Cascade and Feature Adaptive Alignment (MCFA) network, which consists of a Multi-Scale Cascade Module (MSCM) and a Feature Adaptive Alignment Module (FAAM). The MSCM captures the features of the target’s adjacent regions and enhances the model’s robustness by learning key region information through association and fusion. The FAAM, with its dynamically weighted feature alignment module, adaptively adjusts feature differences across different viewpoints, achieving feature alignment between drone and satellite images. Our method achieves state-of-the-art (SOTA) performance on two public datasets, University-1652 and SUES-200. In generalization experiments, our model outperforms existing SOTA methods, with an average improvement of 1.52% in R@1 and 2.09% in AP, demonstrating its effectiveness and strong generalization in cross-view geo-localization tasks. Full article
(This article belongs to the Section Remote Sensors)

23 pages, 29759 KB  
Article
UAV-Satellite Cross-View Image Matching Based on Adaptive Threshold-Guided Ring Partitioning Framework
by Yushi Liao, Juan Su, Decao Ma and Chao Niu
Remote Sens. 2025, 17(14), 2448; https://doi.org/10.3390/rs17142448 - 15 Jul 2025
Viewed by 2252
Abstract
Cross-view image matching between UAV and satellite platforms is critical for geographic localization but remains challenging due to domain gaps caused by disparities in imaging sensors, viewpoints, and illumination conditions. To address these challenges, this paper proposes an Adaptive Threshold-guided Ring Partitioning Framework (ATRPF) for UAV–satellite cross-view image matching. Unlike conventional ring-based methods with fixed partitioning rules, ATRPF innovatively incorporates heatmap-guided adaptive thresholds and learnable hyperparameters to dynamically adjust ring-wise feature extraction regions, significantly enhancing cross-domain representation learning through context-aware adaptability. The framework synergizes three core components: brightness-aligned preprocessing to reduce illumination-induced domain shifts, hybrid loss functions to improve feature discriminability across domains, and keypoint-aware re-ranking to refine retrieval results by compensating for neural networks’ localization uncertainty. Comprehensive evaluations on the University-1652 benchmark demonstrate the framework’s superiority; it achieves 82.50% Recall@1 and 84.28% AP for UAV→Satellite geo-localization, along with 90.87% Recall@1 and 80.25% AP for Satellite→UAV navigation. These results validate the framework’s capability to bridge UAV–satellite domain gaps while maintaining robust matching precision under heterogeneous imaging conditions, providing a viable solution for practical applications such as UAV navigation in GNSS-denied environments. Full article
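Ring partitioning itself is easy to picture: pool features over concentric rings around the image centre. The sketch below implements only the fixed, evenly-spaced baseline that ATRPF improves on — the paper's heatmap-guided adaptive thresholds and learnable hyperparameters are not reproduced here.

```python
import math

def ring_pool(image, n_rings=3):
    """Average-pool pixel values over concentric rings around the centre.

    Ring edges are evenly spaced in radius (the fixed-partition baseline);
    ATRPF would instead place them adaptively from a heatmap.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    max_r = math.hypot(cy, cx)
    sums, counts = [0.0] * n_rings, [0] * n_rings
    for y in range(h):
        for x in range(w):
            r = math.hypot(y - cy, x - cx)
            idx = min(int(r / max_r * n_rings), n_rings - 1)
            sums[idx] += image[y][x]
            counts[idx] += 1
    return [s / c for s, c in zip(sums, counts)]

# toy 4x4 "feature map": bright target at the centre, dark periphery
img = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
rings = ring_pool(img, n_rings=2)
```

Each ring becomes one pooled descriptor entry, which is what makes ring features rotation-tolerant across UAV and satellite views.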
(This article belongs to the Special Issue Temporal and Spatial Analysis of Multi-Source Remote Sensing Images)

22 pages, 23971 KB  
Article
Remote Target High-Precision Global Geolocalization of UAV Based on Multimodal Visual Servo
by Xuyang Zhou, Ruofei He, Wei Jia, Hongjuan Liu, Yuanchao Ma and Wei Sun
Remote Sens. 2025, 17(14), 2426; https://doi.org/10.3390/rs17142426 - 12 Jul 2025
Viewed by 603
Abstract
In this work, we propose a geolocation framework for distant ground targets integrating laser rangefinder sensors with multimodal visual servo control. By simulating binocular visual servo measurements through monocular visual servo tracking at fixed time intervals, our approach requires only single-session sensor attitude correction calibration to accurately geolocalize multiple targets during a single flight, which significantly enhances operational efficiency in multi-target geolocation scenarios. We design a step-convergent target geolocation optimization algorithm. By adjusting the step size and the scale factor of the cost function, we achieve fast accuracy convergence for different UAV reconnaissance modes, while maintaining the geolocation accuracy without divergence even when the laser ranging sensor is turned off for a short period. The experimental results show that through the UAV’s continuous reconnaissance measurements, the geolocalization error of remote ground targets based on our algorithm is less than 7 m for 3000 m, and less than 3.5 m for 1500 m. We have realized the fast and high-precision geolocalization of remote targets on the ground under the high-altitude reconnaissance of UAVs. Full article
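The geometric core of laser-rangefinder geolocation — project the measured slant range along the sensor line of sight from the UAV's known position — can be sketched under strong simplifying assumptions (flat earth, local ENU frame, attitude already corrected). The function and angle conventions below are ours, not the paper's optimization algorithm.

```python
import math

def geolocate_target(uav_enu, yaw_deg, pitch_deg, slant_range_m):
    """Flat-earth ENU geolocation of a ground target from a laser range.

    uav_enu:  (east, north, up) of the UAV in metres.
    yaw_deg:  line-of-sight azimuth from north, east positive.
    pitch_deg: depression angle below the horizon.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    horiz = slant_range_m * math.cos(pitch)   # ground-plane distance
    east = uav_enu[0] + horiz * math.sin(yaw)
    north = uav_enu[1] + horiz * math.cos(yaw)
    down = slant_range_m * math.sin(pitch)    # vertical drop to the target
    return east, north, uav_enu[2] - down

# UAV at 3000 m altitude, looking due east, 45 deg down, range 3000*sqrt(2) m
e, n, alt = geolocate_target((0.0, 0.0, 3000.0), 90.0, 45.0, 3000.0 * math.sqrt(2))
```

Small attitude errors scale with range, which is why the paper's sensor attitude correction matters most for the 3000 m case.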

31 pages, 31711 KB  
Article
On the Usage of Deep Learning Techniques for Unmanned Aerial Vehicle-Based Citrus Crop Health Assessment
by Ana I. Gálvez-Gutiérrez, Frederico Afonso and Juana M. Martínez-Heredia
Remote Sens. 2025, 17(13), 2253; https://doi.org/10.3390/rs17132253 - 30 Jun 2025
Cited by 2 | Viewed by 963
Abstract
This work proposes an end-to-end solution for leaf segmentation, disease detection, and damage quantification, specifically focusing on citrus crops. The primary motivation behind this research is to enable the early detection of phytosanitary problems, which directly impact the productivity and profitability of Spanish and Portuguese agricultural developments, while ensuring environmentally safe management practices. It integrates an onboard computing module for Unmanned Aerial Vehicles (UAVs) using a Raspberry Pi 4 with Global Positioning System (GPS) and camera modules, allowing the real-time geolocation of images in citrus croplands. To address the lack of public data, a comprehensive database was created and manually labelled at the pixel level to provide accurate training data for a deep learning approach. To reduce annotation effort, we developed a custom automation algorithm for pixel-wise labelling in complex natural backgrounds. A SegNet architecture with a Visual Geometry Group 16 (VGG16) backbone was trained for the semantic, pixel-wise segmentation of citrus foliage. The model was successfully integrated as a modular component within a broader system architecture and was tested with UAV-acquired images, demonstrating accurate disease detection and quantification, even under varied conditions. The developed system provides a robust tool for the efficient monitoring of citrus crops in precision agriculture. Full article
(This article belongs to the Special Issue Application of Satellite and UAV Data in Precision Agriculture)

19 pages, 6772 KB  
Article
A Cross-Mamba Interaction Network for UAV-to-Satellite Geolocalization
by Lingyun Tian, Qiang Shen, Yang Gao, Simiao Wang, Yunan Liu and Zilong Deng
Drones 2025, 9(6), 427; https://doi.org/10.3390/drones9060427 - 12 Jun 2025
Cited by 1 | Viewed by 1332
Abstract
The geolocalization of unmanned aerial vehicles (UAVs) in satellite-denied environments has emerged as a key research focus. Recent advancements in this area have been largely driven by learning-based frameworks that utilize convolutional neural networks (CNNs) and Transformers. However, both CNNs and Transformers face challenges in capturing global feature dependencies due to their restricted receptive fields. Inspired by state-space models (SSMs), which have demonstrated efficacy in modeling long sequences, we propose a pure Mamba-based method called the Cross-Mamba Interaction Network (CMIN) for UAV geolocalization. CMIN consists of three key components: feature extraction, information interaction, and feature fusion. It leverages Mamba’s strengths in global information modeling to effectively capture feature correlations between UAV and satellite images over a larger receptive field. For feature extraction, we design a Siamese Feature Extraction Module (SFEM) based on two basic vision Mamba blocks, enabling the model to capture the correlation between UAV and satellite image features. In terms of information interaction, we introduce a Local Cross-Attention Module (LCAM) to fuse cross-Mamba features, providing a solution for feature matching via deep learning. By aggregating features from various layers of SFEMs, we generate heatmaps for the satellite image that help determine the UAV’s geographical coordinates. Additionally, we propose a Center Masking strategy for data augmentation, which promotes the model’s ability to learn richer contextual information from UAV images. Experimental results on benchmark datasets show that our method achieves state-of-the-art performance. Ablation studies further validate the effectiveness of each component of CMIN. Full article

21 pages, 1560 KB  
Article
Energy-Efficient Deployment Simulator of UAV-Mounted Base Stations Under Dynamic Weather Conditions
by Gyeonghyeon Min and Jaewoo So
Sensors 2025, 25(12), 3648; https://doi.org/10.3390/s25123648 - 11 Jun 2025
Viewed by 639
Abstract
In unmanned aerial vehicle (UAV)-mounted base station (MBS) networks, user equipment (UE) experiences dynamic channel variations because of the mobility of the UAV and the changing weather conditions. In order to overcome the degradation in the quality of service (QoS) of the UE due to channel variations, it is important to appropriately determine the three-dimensional (3D) position and transmission power of the base station (BS) mounted on the UAV. Moreover, it is also important to account for both geographical and meteorological factors when deploying UAV-MBSs because they service ground UE in various regions and atmospheric environments. In this paper, we propose an energy-efficient UAV-MBS deployment scheme in multi-UAV-MBS networks using a hybrid improved simulated annealing–particle swarm optimization (ISA-PSO) algorithm to find the 3D position and transmission power of each UAV-MBS. Moreover, we developed a simulator for deploying UAV-MBSs, which took the dynamic weather conditions into consideration. The proposed scheme for deploying UAV-MBSs demonstrated superior performance, where it achieved faster convergence and higher stability compared with conventional approaches, making it well suited for practical deployment. The developed simulator integrates terrain data based on geolocation and real-time weather information to produce more practical results. Full article
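The PSO half of the hybrid ISA-PSO scheme is compact enough to sketch. This is plain PSO on a toy placement objective — the paper's simulated-annealing improvement, power variable, and weather-aware channel model are omitted, and all parameter values below are ours.

```python
import random

def pso(objective, dim, bounds, n_particles=30, iters=200, seed=0):
    """Minimal particle swarm optimization minimizing `objective`."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # per-particle best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm-wide best
    w, c1, c2 = 0.7, 1.5, 1.5                      # inertia, cognitive, social
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# toy objective: place one UAV-MBS (x, y) to minimize the sum of squared
# distances to three ground users; the optimum is their centroid (1, 1)
users = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
cost = lambda p: sum((p[0] - ux) ** 2 + (p[1] - uy) ** 2 for ux, uy in users)
best, best_cost = pso(cost, dim=2, bounds=(-10.0, 10.0))
```

Hybridizing with simulated annealing, as the paper does, mainly helps escape the local optima that a realistic (non-convex) coverage objective introduces.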
(This article belongs to the Special Issue Energy-Efficient Communication Networks and Systems: 2nd Edition)

21 pages, 10875 KB  
Article
FIM-JFF: Lightweight and Fine-Grained Visual UAV Localization Algorithms in Complex Urban Electromagnetic Environments
by Faming Gong, Junjie Hao, Chengze Du, Hao Wang, Yanpu Zhao, Yi Yu and Xiaofeng Ji
Information 2025, 16(6), 452; https://doi.org/10.3390/info16060452 - 27 May 2025
Cited by 1 | Viewed by 741
Abstract
Unmanned aerial vehicles (UAVs) are a key driver of the low-altitude economy, where precise localization is critical for autonomous flight and complex task execution. However, conventional global positioning system (GPS) methods suffer from signal instability and degraded accuracy in dense urban areas. This paper proposes a lightweight and fine-grained visual UAV localization algorithm (FIM-JFF) suitable for complex electromagnetic environments. FIM-JFF integrates both shallow and global image features to leverage contextual information from satellite and UAV imagery. Specifically, a local feature extraction module (LFE) is designed to capture rotation, scale, and illumination-invariant features. Additionally, an environment-adaptive lightweight network (EnvNet-Lite) is developed to extract global semantic features while adapting to lighting, texture, and contrast variations. Finally, UAV geolocation is determined by matching feature points and their spatial distributions across multi-source images. To validate the proposed method, a real-world dataset UAVs-1100 was constructed in complex urban electromagnetic environments. The experimental results demonstrate that FIM-JFF achieves an average localization error of 4.03 m with a processing time of 2.89 s, outperforming state-of-the-art methods by improving localization accuracy by 14.9% while reducing processing time by 0.76 s. Full article
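The final matching step — pair feature points across UAV and satellite images — is conventionally done with nearest-neighbour descriptor matching plus Lowe's ratio test. The sketch below shows that generic mechanism only; it does not reproduce the paper's LFE or EnvNet-Lite descriptors.

```python
import math

def match_ratio(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test.

    A match (i, j) is accepted only if desc_b[j] is clearly closer to
    desc_a[i] than the second-best candidate is.
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        best, second = dists[0], dists[1]
        if best[0] < ratio * second[0]:
            matches.append((i, best[1]))
    return matches

# toy 2-D descriptors: each UAV feature has one clear satellite counterpart
uav_desc = [[0.0, 0.0], [1.0, 1.0]]
sat_desc = [[0.05, 0.0], [1.0, 0.95], [5.0, 5.0]]
m = match_ratio(uav_desc, sat_desc)
```

The ratio test discards ambiguous matches, which is what keeps the spatial-distribution check in the next stage from being swamped by outliers.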

29 pages, 6039 KB  
Article
Tree Species Detection and Enhancing Semantic Segmentation Using Machine Learning Models with Integrated Multispectral Channels from PlanetScope and Digital Aerial Photogrammetry in Young Boreal Forest
by Arun Gyawali, Mika Aalto and Tapio Ranta
Remote Sens. 2025, 17(11), 1811; https://doi.org/10.3390/rs17111811 - 22 May 2025
Cited by 1 | Viewed by 1798
Abstract
The precise identification and classification of tree species in young forests during their early development stages are vital for forest management and silvicultural efforts that support their growth and renewal. However, achieving accurate geolocation and species classification through field-based surveys is often a labor-intensive and complicated task. Remote sensing technologies combined with machine learning techniques present an encouraging solution, offering a more efficient alternative to conventional field-based methods. This study aimed to detect and classify young forest tree species using remote sensing imagery and machine learning techniques. The study involved two objectives: first, tree species detection using the latest version of You Only Look Once (YOLOv12), and second, semantic segmentation (classification) using random forest, Categorical Boosting (CatBoost), and a Convolutional Neural Network (CNN). To the best of our knowledge, this is the first exploration utilizing YOLOv12 for tree species identification, and the first study to integrate digital aerial photogrammetry with Planet imagery for semantic segmentation in young forests. The study used two remote sensing datasets: RGB imagery from unmanned aerial vehicle (UAV) ortho photography and RGB-NIR from PlanetScope. For YOLOv12-based tree species detection, only RGB from ortho photography was used, while semantic segmentation was performed with three sets of data: (1) ortho RGB (3 bands), (2) ortho RGB + canopy height model (CHM) + Planet RGB-NIR (8 bands), and (3) ortho RGB + CHM + Planet RGB-NIR + 12 vegetation indices (20 bands). With three models applied to these datasets, nine machine learning models were trained and tested using 57 images (1024 × 1024 pixels) and their corresponding mask tiles.
The YOLOv12 model achieved 79% overall accuracy, with Scots pine performing best (precision: 97%, recall: 92%, mAP50: 97%, mAP75: 80%) and Norway spruce showing slightly lower accuracy (precision: 94%, recall: 82%, mAP50: 90%, mAP75: 71%). For semantic segmentation, the CatBoost model with 20 bands outperformed other models, achieving 85% accuracy, 80% Kappa, and 81% MCC, with CHM, EVI, NIRPlanet, GreenPlanet, NDGI, GNDVI, and NDVI being the most influential variables. These results indicate that a simple boosting model like CatBoost can outperform more complex CNNs for semantic segmentation in young forests. Full article
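Two of the influential variables named above, NDVI and GNDVI, are simple band ratios; the sketch shows the standard formulas, assuming reflectance inputs in [0, 1] (the paper's other ten indices follow the same band-math pattern).

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def gndvi(nir, green):
    """Green NDVI: (NIR - Green) / (NIR + Green), the green-band analogue."""
    return (nir - green) / (nir + green)

# healthy vegetation reflects strongly in NIR and absorbs red
v = ndvi(0.6, 0.2)
g = gndvi(0.6, 0.3)
```

Stacking such indices as extra channels is how the 8-band input grows to the 20-band input that CatBoost performed best on.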

24 pages, 9161 KB  
Article
An Efficient Pyramid Transformer Network for Cross-View Geo-Localization in Complex Terrains
by Chengjie Ju, Wangping Xu, Nanxing Chen and Enhui Zheng
Drones 2025, 9(5), 379; https://doi.org/10.3390/drones9050379 - 17 May 2025
Viewed by 1343
Abstract
Unmanned aerial vehicle (UAV) self-localization in complex environments is critical when global navigation satellite systems (GNSSs) are unreliable. Existing datasets, often limited to low-altitude urban scenes, hinder generalization. This study introduces Multi-UAV, a novel dataset with 17.4 k high-resolution UAV–satellite image pairs from diverse terrains (urban, rural, mountainous, farmland, coastal) and altitudes across China, enhancing cross-view geolocalization research. We propose a lightweight value reduction pyramid transformer (VRPT) for efficient feature extraction and a residual feature pyramid network (RFPN) for multi-scale feature fusion. Using meter-level accuracy (MA@K) and relative distance score (RDS), VRPT achieves robust, high-precision localization across varied terrains, offering significant potential for resource-constrained UAV deployment. Full article

22 pages, 20558 KB  
Article
Long-Duration UAV Localization Across Day and Night by Fusing Dual-Vision Geo-Registration with Inertial Measurements
by Xuehui Xing, Xiaofeng He, Ke Liu, Zhizhong Chen, Guofeng Song, Qikai Hao, Lilian Zhang and Jun Mao
Drones 2025, 9(5), 373; https://doi.org/10.3390/drones9050373 - 15 May 2025
Cited by 1 | Viewed by 1050
Abstract
Remote sensing visual-light spectral (VIS) maps provide stable and rich features for geo-localization. However, it remains a challenge to make use of VIS map features as localization references at night. To construct a cross-day-and-night localization system for long-duration UAVs, this study proposes a visual–inertial integrated localization system, whose visual component can register both RGB and infrared camera images in one unified VIS map. To deal with the large differences between visible and thermal images, we inspected various visual features and utilized a pre-trained network for cross-domain feature extraction and matching. To obtain an accurate position from visual geo-localization, we demonstrate a localization error compensation algorithm that accounts for the camera attitude, flight height, and terrain height. The inertial and dual-vision information is then fused with a State Transformation Extended Kalman Filter (ST-EKF) to generate long-term, drift-free localization performance. Finally, we conducted actual long-duration flight experiments with altitudes ranging from 700 to 2400 m and flight distances longer than 344.6 km. The experimental results demonstrate that the proposed method's localization error is less than 50 m in RMSE. Full article
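The fusion pattern — propagate with inertial data, correct with absolute geo-registration fixes — can be reduced to a scalar Kalman filter on one axis. This is a deliberately tiny stand-in for the paper's ST-EKF: one state, INS velocity treated as a known input, and all noise values chosen for illustration.

```python
def kalman_1d(z_fixes, v_ins, dt=1.0, q=0.04, r=25.0, x0=0.0, p0=100.0):
    """Scalar Kalman filter fusing dead-reckoning with absolute fixes.

    z_fixes: absolute position fixes from visual geo-registration (m).
    v_ins:   velocity from the inertial side (m/s).
    q, r:    process and measurement noise variances (illustrative).
    """
    x, p = x0, p0
    out = []
    for z in z_fixes:
        # predict: dead-reckon with inertial velocity; uncertainty grows
        x += v_ins * dt
        p += q
        # update: blend in the absolute fix, weighted by relative trust
        k = p / (p + r)
        x += k * (z - x)
        p *= 1 - k
        out.append(x)
    return out

# true motion 10 m/s; fixes agree with dead-reckoning for a deterministic demo
est = kalman_1d([10.0, 20.0, 30.0], v_ins=10.0)
```

The absolute fixes are what bound the drift: without the update step, the inertial error would grow without limit over a 344.6 km flight.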

26 pages, 5752 KB  
Review
Towards a Holistic Approach for UAV-Based Large-Scale Photovoltaic Inspection: A Review on Deep Learning and Image Processing Techniques
by Zoubir Barraz, Imane Sebari, Kenza Ait El Kadi and Ibtihal Ait Abdelmoula
Technologies 2025, 13(3), 117; https://doi.org/10.3390/technologies13030117 - 14 Mar 2025
Cited by 2 | Viewed by 2222
Abstract
This paper provides an in-depth literature review on image processing techniques, focusing on deep learning approaches for anomaly detection and classification in photovoltaics. It examines key components of UAV-based PV inspection, including data acquisition protocols, panel segmentation and geolocation, anomaly classification, and optimizations for model generalization. Furthermore, challenges related to domain adaptation, dataset limitations, and multimodal fusion of RGB and thermal data are also discussed. Finally, research gaps and opportunities are analyzed to create a holistic, scalable, and real-time inspection workflow for large-scale installation. This review serves as a reference for researchers and industry professionals to advance UAV-based PV inspection. Full article
(This article belongs to the Section Environmental Technology)
