Towards UAV Localization in GNSS-Denied Environments: The SatLoc Dataset and a Hierarchical Adaptive Fusion Framework
Abstract
1. Introduction
2. Materials and Methods
2.1. SatLoc Dataset
- Targeted Platform Selection: Data acquisition using representative commercial DJI Mavic UAVs and custom-built small-scale rotorcraft UAVs to reflect the flight and imaging characteristics of platforms in practical applications.
- Geomorphological Scene Diversity: Covering typical geographical environments for low-altitude operations of small-scale UAVs, including urban, suburban, rural, water bodies, and industrial areas. Efforts were made to include different lighting (noon/morning/dusk) and weather conditions to ensure sufficient testing of algorithm generalization.
- Dynamic Diversity: Incorporating diverse flight modes, such as different altitudes, speeds, and maneuvers (hovering, straight-line flight, turns, etc.), to approximate the flight conditions of real mission scenarios as closely as possible.
- Data Quality and Completeness: Providing high-precision trajectories as ground truth. In addition to the core UAV images and satellite images, necessary sensor intrinsic and extrinsic parameters, as well as other sensor data that may be used for fusion (including IMU and barometer), are provided, ensuring precise temporal synchronization of all sensor data with the ground truth trajectory.
2.1.1. Data Acquisition
UAV Platforms
- An off-the-shelf DJI Mavic Air 2 (take-off weight 520 g, wheelbase 320 mm), equipped with a three-axis stabilized gimbal that provides high-quality, stable imagery, and with a built-in GNSS time-synchronization function. Its visible-light camera uses a 1/2-inch CMOS sensor with a 24 mm equivalent focal length, an 84-degree wide field of view, and an f/2.8 aperture; it is operated in a downward-facing shooting mode and captures original images at 1920 × 1080.
- A custom-built multi-rotor flight platform running the PX4 flight control system and carrying a miniature three-axis electro-optical pod. Its visible-light camera has a focal length of 2.4–9.1 mm and an original image resolution of 1920 × 1080.
Sensors
- GNSS device: The Mavic Air 2 uses a built-in GNSS positioning module with a horizontal positioning accuracy of 1.5 m. The custom-built flight platform uses a u-blox M8030-KT with a positioning accuracy of 1.5 m. Note that GNSS module data is only used to record ground truth trajectories and is not used in the evaluation algorithms.
- IMU: The Mavic Air 2 uses its built-in IMU. The custom-built flight platform uses a BMI-055 as the onboard IMU, providing three-axis gyroscope and three-axis accelerometer data at a sampling frequency of 500 Hz.
- Barometer/Altimeter: The custom-built flight platform uses an MS5611 as a pressure sensor, and its data is fused with IMU data to calculate relative altitude, with an elevation measurement accuracy of 1.5 m.
Time Synchronization
Route Planning
Environment
Satellite Imagery
2.1.2. Dataset Characteristics
2.2. Methodology: Three-Layer Adaptive Fusion Pipeline
2.2.1. System Overview
2.2.2. Layer 1: Absolute Localization (Aerial-Satellite Image Matching)
- Objective: Provide global absolute pose estimation by matching the current UAV view with a satellite map to correct accumulated drift and anchor the localization result in a global coordinate system.
- Input: UAV downward-facing camera image at current time t, and a set of corresponding satellite map tiles retrieved based on a coarse prior position or the estimated position from the previous time step.
- Output: Estimated position in the global coordinate system.
- Processing Flow: Use a pre-trained DinoV2 model (ViT architecture [8]) to extract features from the UAV image and the candidate satellite tiles, respectively. Similarity is determined through cosine similarity and a neighborhood spatial-consistency criterion. Finally, the uncertainty of this layer is calculated from the distribution of similarity scores and the pixel-scale similarity.
- Matching Strategy: Our localization is achieved through a hierarchical matching process that begins with global image retrieval and concludes with local patch-level refinement, as shown in Figure 5. The process leverages a database of reference satellite image tiles prepared offline from a large-scale reference image. For each tile, a global descriptor is pre-computed using a backbone network and optimal transport aggregation [25]. During online operation, a query image is captured by the UAV, preprocessed for size and resolution consistency, and encoded into an equivalent descriptor. Coarse localization is then performed by retrieving the best-matching satellite tile, i.e., the tile whose descriptor maximizes the cosine similarity with the query descriptor, as defined in (1).
- Uncertainty Metric: The reliability of this absolute localization result is quantified using the following two indicators:
- Distribution Concentration of Matching Scores: The candidate matching scores are normalized, and confidence is determined by the distinctiveness of the best score relative to the mean of a set of M sub-optimal scores; a more concentrated distribution indicates higher confidence. The score-distribution confidence is therefore a function of the gap between the best score and this sub-optimal mean, compared against a score-difference threshold (one plausible form is sketched after this list).
- Pixel Scale Similarity: When the Ground Sampling Distance (GSD) of the UAV image is much smaller than that of the satellite image, the UAV’s field of view is too small, making global feature extraction difficult. A smaller ratio between the UAV GSD and the satellite GSD (both in meters/pixel) therefore indicates lower pixel-scale similarity and thus lower confidence. Because the UAV’s flight altitude directly determines its GSD, this metric allows our framework to dynamically assess and adapt to the challenges of scale variation during matching. The acceptable ratio threshold is determined empirically from the matching algorithm’s performance and is set to 0.8 in this paper; the resulting confidence term is sketched after this list.
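To make the retrieval criterion and the two confidence terms concrete, one plausible formulation consistent with the description above is sketched below. The symbols ($d_q$, $d_i$, $s^{*}$, $\bar{s}_{\mathrm{sub}}$, $\tau_s$, $\rho$, $\tau_\rho$) are our own shorthand rather than the paper's original notation, and the exact functional forms used by the authors may differ.

```latex
% Coarse retrieval: the tile whose global descriptor maximizes cosine similarity with the query
T^{*} = \arg\max_{T_i \in \mathcal{T}} \; \frac{\langle d_q,\, d_i \rangle}{\lVert d_q \rVert\,\lVert d_i \rVert}

% Score-distribution confidence: gap between the best score s^{*} and the mean
% \bar{s}_{\mathrm{sub}} of the M sub-optimal scores, clipped by the threshold \tau_s
c_{\mathrm{score}} = \min\!\left(1,\; \frac{s^{*} - \bar{s}_{\mathrm{sub}}}{\tau_s}\right)

% Pixel-scale confidence: GSD ratio \rho = \mathrm{GSD}_{\mathrm{uav}} / \mathrm{GSD}_{\mathrm{sat}},
% compared against the acceptable threshold \tau_\rho = 0.8
c_{\mathrm{scale}} = \min\!\left(1,\; \frac{\rho}{\tau_\rho}\right)
```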
2.2.3. Layer 2: Relative Pose Estimation (Visual Odometry)
- Objective: To accurately estimate the high-frequency relative motion (pose change) of the UAV between consecutive frames.
- Input: Consecutive UAV camera images at time steps t−1 and t.
- Output: Inter-frame relative displacement.
- Processing Flow: Feature matching and relative displacement estimation are performed using XFeat. XFeat is a lightweight and efficient feature detector and matcher, particularly suitable for edge computing platforms, and it performs well in pose estimation and homography estimation tasks [26]. The relative displacement estimation process is as follows:
- Use XFeat to detect keypoints and extract descriptors on both consecutive frames.
- Match the descriptors between the two image frames and use the Direct Linear Transform (DLT) algorithm with RANSAC to determine the homography matrix $H$.
- Decompose the normalized homography matrix according to the standard form $H = R + \tfrac{1}{d}\,\mathbf{t}\,\mathbf{n}^{\top}$, where $R$ and $\mathbf{t}$ are the inter-frame rotation and translation, $\mathbf{n}$ is the unit normal of the dominant plane, and $d$ is the distance to that plane, from which the relative displacement is recovered.
- Confidence/Uncertainty Metric: Referencing common principles in multi-view geometry [27], the reliability of the relative pose estimation is quantified using the following metrics:
- Number/Proportion of Inliers: The number (or proportion) of matched point pairs retained after RANSAC estimation; more inliers imply higher confidence. The inlier-count confidence is obtained by comparing the inlier count against an inlier-count threshold.
- Distribution of Matched Points: The spatial distribution of matched points across the image; a more uniform and widespread distribution indicates a more reliable estimate. To evaluate this, the image is divided into k grid cells, and the ratio of non-empty cells (those containing inliers) to the total number of cells is computed.
The final confidence for this layer combines these two terms (a minimal sketch follows).
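A minimal sketch of this Layer-2 step, assuming XFeat keypoints have already been matched into two (N, 2) pixel-coordinate arrays `pts_prev` and `pts_curr`; the grid size, the inlier threshold, and the product used to combine the two confidence terms are illustrative assumptions rather than the paper's exact settings. The subsequent plane-induced decomposition can be performed with OpenCV's `cv2.decomposeHomographyMat` given the camera intrinsics.

```python
import numpy as np
import cv2

def relative_pose_confidence(pts_prev, pts_curr, img_shape,
                             inlier_thresh=30, grid=(8, 8)):
    """Estimate the inter-frame homography with RANSAC and score its reliability."""
    # Homography via DLT + RANSAC, as in the pipeline description.
    H, inlier_mask = cv2.findHomography(pts_prev, pts_curr, cv2.RANSAC, 3.0)
    if H is None:
        return None, 0.0
    inliers = pts_curr[inlier_mask.ravel() == 1]

    # Confidence term 1: inlier count relative to a threshold.
    c_inliers = min(1.0, len(inliers) / inlier_thresh)

    # Confidence term 2: spatial coverage, i.e. fraction of grid cells containing inliers.
    h, w = img_shape[:2]
    rows = np.clip((inliers[:, 1] / h * grid[0]).astype(int), 0, grid[0] - 1)
    cols = np.clip((inliers[:, 0] / w * grid[1]).astype(int), 0, grid[1] - 1)
    occupied = len(set(zip(rows.tolist(), cols.tolist())))
    c_spread = occupied / (grid[0] * grid[1])

    # Combine the two terms (here: a simple product) into the layer confidence.
    # The decomposition step would follow, e.g. cv2.decomposeHomographyMat(H, K).
    return H, c_inliers * c_spread
```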
2.2.4. Layer 3: Velocity Estimation
- Objective: To provide a direct estimate of the UAV’s velocity vector.
- Input: Consecutive UAV camera images at time steps t−1 and t, and the synchronized relative altitude measurement (from a barometer or altimeter).
- Output: Estimated velocity vector (in the world frame).
- Processing Flow: Fuse optical flow with altitude information to estimate the UAV’s motion velocity. Optical flow directly estimates pixel motion [28,29], and when combined with altitude information, it can be used to derive the UAV’s velocity. Considering the computational constraints of edge computing hardware [30,31], the pyramidal Lucas-Kanade (LK) sparse optical flow method is adopted [32,33]. The processing flow is as follows:
- Calculate the sparse optical flow field from the previous image to the current one. Combined with Harris corner detection, optical flow is computed only for highly distinctive feature points to reduce redundant computation.
- Using the camera intrinsics and the current altitude, convert the pixel velocities into motion velocity on the dominant plane (assumed here to be the ground), thereby estimating the UAV’s horizontal velocity.
- Confidence/Uncertainty Metric: The reliability of the velocity estimation is quantified using the following metrics:
- Forward–Backward Consistency Error: For each feature point, the forward optical flow from the previous frame to the current frame and the backward optical flow from the current frame back to the previous one are computed. The discrepancy between the original point position and the position recovered after this forward–backward round trip is taken as the forward–backward consistency error; a smaller value indicates higher confidence.
- Photometric Consistency Error: The average Sum of Squared Differences (SSD) in brightness between the neighborhood of a feature point in the previous frame and its corresponding tracked neighborhood in the current frame; a smaller error indicates higher confidence.
The final confidence for this layer combines these two terms (a minimal sketch follows).
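A minimal sketch of the Layer-3 velocity estimate under the flat-ground assumption described above. The focal length in pixels `fx_px`, the thresholds, and the mapping from image axes to the world frame (which depends on camera mounting and heading) are illustrative assumptions; the photometric SSD check is omitted for brevity.

```python
import numpy as np
import cv2

def optical_flow_velocity(img_prev, img_curr, altitude_m, fx_px, dt,
                          fb_thresh_px=1.0):
    """Pyramidal LK flow + forward-backward check -> horizontal velocity estimate.
    img_prev / img_curr are grayscale 8-bit frames."""
    # Harris-style corners so flow is only computed at distinctive points.
    pts0 = cv2.goodFeaturesToTrack(img_prev, maxCorners=300, qualityLevel=0.01,
                                   minDistance=10, useHarrisDetector=True)
    if pts0 is None:
        return np.zeros(2), 0.0

    # Forward and backward pyramidal LK tracking.
    pts1, st1, _ = cv2.calcOpticalFlowPyrLK(img_prev, img_curr, pts0, None)
    pts0_back, st2, _ = cv2.calcOpticalFlowPyrLK(img_curr, img_prev, pts1, None)

    # Forward-backward consistency error; keep only well-tracked, consistent points.
    fb_err = np.linalg.norm((pts0 - pts0_back).reshape(-1, 2), axis=1)
    good = (st1.ravel() == 1) & (st2.ravel() == 1) & (fb_err < fb_thresh_px)
    if not good.any():
        return np.zeros(2), 0.0

    # Median pixel displacement of the retained points (pixels per frame).
    flow_px = np.median((pts1 - pts0).reshape(-1, 2)[good], axis=0)

    # Pinhole model over the dominant ground plane: metres/pixel ~= altitude / fx.
    # Mapping these camera-frame axes into the world frame is omitted here.
    v_xy = flow_px * (altitude_m / fx_px) / dt
    c_fb = float(np.mean(fb_err[good] < 0.5 * fb_thresh_px))
    return v_xy, c_fb
```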
2.2.5. Adaptive Fusion Strategy
- State Vector Definition: As previously mentioned, the core outputs of the three localization modules all relate to horizontal displacement. The state vector is therefore defined as the horizontal position of the UAV at time step k, $\mathbf{x}_k = [p_{x,k},\; p_{y,k}]^{\top}$.
- Auxiliary Velocity Estimation: Since the Layer 2 and Layer 3 outputs relate to relative displacement or its derivative (i.e., relative velocity), a weighted fusion of the two is used as the velocity estimate for the current time step. This drives the process model, with the absolute localization from Layer 1 subsequently used as the observation in the measurement update. Specifically, at each time step k we first attempt to estimate an instantaneous velocity (the average velocity from k−1 to k) from Layers 2 and 3:
- Layer 2 (XFeat Relative Displacement): the velocity is obtained by dividing the measured inter-frame displacement by the frame interval $\Delta t$.
- Layer 3 (Optical Flow Velocity): the measured velocity between time steps k−1 and k is used directly.
The fused velocity $\bar{\mathbf{v}}_k$ is then estimated as follows:
- If at least one of the two sources is available and its confidence exceeds a threshold, the available estimates are fused by a confidence-weighted average.
- If both are unavailable, $\bar{\mathbf{v}}_k$ can be set to zero, or the fused velocity from the previous time step can be reused.
- Process Model (Prediction Step): Using a constant-velocity model driven by the fused velocity $\bar{\mathbf{v}}_k$, the state transition equation is $\hat{\mathbf{x}}^{-}_{k} = F\,\mathbf{x}_{k-1} + B\,\mathbf{u}_k$, where the state transition matrix is $F = I_2$, the control input matrix is $B = \Delta t\, I_2$, and the control input is $\mathbf{u}_k = \bar{\mathbf{v}}_k$. The a priori covariance is predicted as $P^{-}_{k} = F P_{k-1} F^{\top} + Q_k$.
- Measurement Model (Update Step): After completing the prediction step using data sources 2 and 3, the measurement update step mainly relies on data source 1 (absolute localization) to correct the position.
- Measurement of Layer 1: the absolute position estimate $\mathbf{z}_k$ from the aerial-satellite matching layer.
- Measurement Matrix: $H = I_2$, since the measurement observes the position states directly.
- Measurement Noise Covariance $R_k$: calculated as before from the Layer 1 uncertainty metric.
- Kalman Filter Update Steps (Standard):
- Measurement residual (innovation): $\mathbf{y}_k = \mathbf{z}_k - H\,\hat{\mathbf{x}}^{-}_{k}$
- Residual (innovation) covariance: $S_k = H P^{-}_{k} H^{\top} + R_k$
- Kalman gain: $K_k = P^{-}_{k} H^{\top} S_k^{-1}$
- State update: $\hat{\mathbf{x}}_{k} = \hat{\mathbf{x}}^{-}_{k} + K_k\,\mathbf{y}_k$
- Covariance update: $P_k = (I - K_k H)\, P^{-}_{k}$
A worked sketch of one full prediction–update cycle is given after this list.
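In this NumPy sketch, the confidence-weighted velocity fusion and the confidence-to-covariance mapping follow the text qualitatively; the specific constants (`q_std`, `r_base`, `c_min`) and the exact form of the mapping are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def fuse_step(x, P, dt, v2=None, c2=0.0, v3=None, c3=0.0,
              z_abs=None, c1=0.0, q_std=0.5, r_base=5.0, c_min=0.2):
    """One KF cycle: predict with fused velocity (Layers 2/3), update with Layer 1."""
    F = np.eye(2)                      # state = horizontal position [x, y]
    B = dt * np.eye(2)
    Q = (q_std * dt) ** 2 * np.eye(2)

    # Confidence-weighted fusion of the two velocity sources (zero if both unusable).
    num, den = np.zeros(2), 0.0
    for v, c in ((v2, c2), (v3, c3)):
        if v is not None and c > c_min:
            num += c * np.asarray(v, dtype=float)
            den += c
    v_fused = num / den if den > 0 else np.zeros(2)

    # Prediction with the constant-velocity process model.
    x_pred = F @ x + B @ v_fused
    P_pred = F @ P @ F.T + Q

    # Measurement update with the absolute (Layer 1) position, if confident enough.
    if z_abs is not None and c1 > c_min:
        H = np.eye(2)
        R = (r_base / max(c1, 1e-3)) ** 2 * np.eye(2)    # low confidence -> large R
        y = np.asarray(z_abs, dtype=float) - H @ x_pred  # innovation
        S = H @ P_pred @ H.T + R                         # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)              # Kalman gain
        x_pred = x_pred + K @ y
        P_pred = (np.eye(2) - K @ H) @ P_pred
    return x_pred, P_pred
```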
3. Results
3.1. Implementation Details
- Hardware Platform: The edge computing platform used for experiments is the RockChip RK3588Pro, which features up to 6 TOPS of AI performance and 32 GB of memory. The platform ran the SatLoc-Fusion pipeline algorithm in both a ground-based hardware-in-the-loop simulation (based on playback of pre-collected data) and an actual drone flight environment.
- Software Environment: The operating system is Ubuntu-20.04, equipped with the SDK corresponding to the RockChip RK3588 platform. Core libraries include PyTorch-1.13.1, OpenCV-4.10, and RKNN-toolkit2-1.3.0 for model acceleration. The main programming languages are Python/C++.
- Algorithm Parameters:
- DinoV2: The model is initialized with weights pre-trained on the UAV-VisLoc dataset. For our training process, positive pairs are constructed by matching each UAV image from UAV-VisLoc with its corresponding satellite patch via GPS coordinates. Negative pairs are formed by randomly associating UAV images with non-corresponding satellite regions.
- XFeat: We fine-tune the model starting from its publicly available pre-trained weights. To do this, we generate an augmented dataset by applying random geometric and photometric transformations to the UAV images from the UAV-VisLoc dataset.
- Optimization: To achieve real-time performance on edge devices, model quantization and compression were employed. After INT8 quantization in the RKNN processing pipeline, the mean inference time for DinoV2 is 482 ms, and XFeat reaches 30 FPS (at the resized input resolution) after acceleration with the RKNN library (an outline of the conversion flow is sketched below).
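For reference, a hedged sketch of how INT8 quantization for the RK3588 is typically driven with RKNN-Toolkit2, assuming an ONNX export of the backbone (`backbone.onnx`) and a calibration-image list `dataset.txt`; preprocessing values, argument names, and defaults vary between toolkit versions, so this is an outline rather than the exact conversion script used here.

```python
from rknn.api import RKNN

rknn = RKNN()
# Preprocessing and target platform (the mean/std values here are placeholders).
rknn.config(mean_values=[[123.675, 116.28, 103.53]],
            std_values=[[58.395, 57.12, 57.375]],
            target_platform='rk3588')
rknn.load_onnx(model='backbone.onnx')
# INT8 quantization calibrated on a small list of representative images.
rknn.build(do_quantization=True, dataset='./dataset.txt')
rknn.export_rknn('backbone_int8.rknn')
rknn.release()
```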
3.2. Evaluation Metrics
- Mean Localization Error (MLE): The Root Mean Square Error (RMSE) between the estimated positions over the entire trajectory and the ground truth, in meters (m).
- Trajectory Localization Success Rate: The percentage of the total trajectory length over which the localization error remains below a preset threshold (25 m), after excluding segments that have drifted (see the sketch below).
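A small sketch of the two metrics as we read them, assuming the estimated and ground-truth trajectories are time-aligned (N, 2) arrays of horizontal positions; the drift-exclusion step mentioned above is omitted for brevity.

```python
import numpy as np

def mean_localization_error(est_xy, gt_xy):
    """MLE: RMSE of the horizontal position error over the whole trajectory (m)."""
    err = np.linalg.norm(est_xy - gt_xy, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def success_rate(est_xy, gt_xy, thresh_m=25.0):
    """Share of the trajectory where the localization error stays below the threshold (%)."""
    err = np.linalg.norm(est_xy - gt_xy, axis=1)
    return float(np.mean(err < thresh_m) * 100.0)
```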
3.3. Experimental Setup
3.4. Experimental Results
- For quantitative analysis, where consistency and the elimination of uncontrollable environmental variables are paramount for fair comparison, we primarily employ the HIL simulation platform. In this setup, pre-recorded image sequences and sensor data streams from our SatLoc dataset are fed into the edge computing hardware running our algorithm. This approach is crucial as it facilitates benchmarking in a highly controlled and repeatable environment, ensuring that performance comparisons against baseline methods are consistent and unbiased.
- All other evaluations, particularly qualitative analysis and system robustness assessments, are predominantly conducted through real-world flight tests. These field experiments allow for the validation of the system’s practical efficacy and utility under authentic and unpredictable operational conditions.
3.4.1. Quantitative Analysis
3.4.2. Qualitative Analysis
3.4.3. Analysis of System Robustness and Component Failure
4. Discussion
4.1. Results Interpretation
4.2. In-Depth Comparison with SOTA Methods
- Compared to pure CVGL methods: Many CVGL methods primarily focus on one-time place recognition or retrieval and may not provide continuous, high-frequency position-velocity output, making them difficult to use directly for UAV control. SatLoc-Fusion, by combining high-frequency VO and velocity estimation, provides continuous state estimates that meet control requirements.
- Compared to pure VO/VIO methods: Pure VO/VIO methods inevitably suffer from long-term cumulative drift [3]. SatLoc-Fusion, by introducing an absolute localization layer based on satellite maps (DinoV2), can periodically eliminate cumulative errors, significantly improving global localization accuracy.
4.3. Limitations Analysis
- Environmental Factors: (1) Extreme Weather/Lighting: The current system has been primarily validated during daytime and under good weather conditions. In adverse weather (heavy rain, heavy snow, dense fog) or nighttime conditions, the performance of visual sensors will severely degrade, potentially leading to system failure. (2) Seasonal Changes: The appearance of ground and satellite images can change significantly with seasons (vegetation, snow cover, etc.), which may affect the matching performance of DinoV2.
- A quantitative analysis of the system’s error characteristics and failure modes can be derived from our ablation study (Table 5). The results demonstrate the framework’s capacity for graceful degradation rather than catastrophic failure when individual components are compromised. For instance, in challenging scenarios characterized by a lack of distinct global features or sparse local textures, the absolute localization (Layer 1) may become unreliable. The ablation study simulates this condition by removing Layer 1 entirely, which significantly degrades system performance: the Mean Localization Error (MLE) increases from 14.05 m to 27.84 m, and the trajectory localization success rate drops from 94% to 55%. This provides a concrete, quantitative measure of how a single component’s failure impacts the overall error rate. In such situations, the adaptive fusion core correctly shifts its reliance to the relative pose estimation (Layer 2) and velocity estimation (Layer 3). While this strategy prevents total localization failure, these layers are inherently susceptible to cumulative drift. This explains why trajectories through these difficult areas, while often remaining below the 50 m failure threshold, are the primary contributors to the increase in the overall MLE, thereby defining the operational performance boundaries of the system under the most adverse conditions.
- Sensor Failure: Although adaptive fusion can handle performance degradation in a single layer, if a critical sensor (like the downward-facing camera) completely fails or is occluded for an extended period, the system performance will severely degrade or even fail. Detection of sensor failures and more graceful degradation handling need to be strengthened.
- Computational Resources and System Latency: A notable limitation of the current framework is the significant processing latency of the absolute localization layer, approximately 485 ms. This bottleneck stems mainly from the power and computing-resource limits of the edge hardware. The latency introduces a temporal misalignment: the position measurement derived from an image captured at time t only becomes available for fusion after the processing delay has elapsed. Applying this delayed measurement directly to the state estimate can introduce significant errors, particularly during dynamic maneuvers. However, this effect can be effectively mitigated. By leveraging the high-frequency data from the onboard Inertial Measurement Unit (IMU), the UAV’s displacement during the delay interval can be estimated through inertial integration. This estimated displacement can then be used to propagate the delayed position measurement forward in time, yielding a corrected observation that is temporally consistent with the current filter state. While such measurement compensation is a standard technique in advanced sensor fusion systems and represents a clear direction for future work, it was considered beyond the scope of this baseline study.
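One way to write the compensation described above, with symbols of our own choosing: if the Layer-1 measurement arrives with latency $\tau$, the IMU-integrated displacement over the delay interval propagates it to the current time before it enters the filter.

```latex
\mathbf{z}_{\mathrm{corr}}(t) \;=\; \mathbf{z}(t-\tau)
\;+\; \underbrace{\int_{t-\tau}^{t} \hat{\mathbf{v}}(s)\,\mathrm{d}s}_{\text{IMU-propagated displacement}}
```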
4.4. Future Work
- Improving Environmental Adaptability: Research domain adaptation techniques or continual learning methods to enhance the system’s robustness to changes in lighting, weather, and seasons. Explore the fusion of other sensors (such as thermal imaging cameras) to enhance all-weather operational capabilities.
- Enhancing Sensor Fault Tolerance: Develop more sophisticated sensor fault detection and isolation mechanisms, and achieve smoother performance degradation when sensors fail.
- Online Map Updating and Validation: Investigate methods for online updating or validating reference maps using real-time UAV observation data to cope with environmental changes.
- Uncertainty Quantification and Fusion Optimization: Explore more advanced uncertainty representation methods (such as non-Gaussian distributions) and fusion algorithms (such as particle filters, more complex factor graph optimization) to further improve accuracy and robustness.
- Multi-UAV Cooperative Localization: Extend this framework to multi-UAV systems, utilizing inter-vehicle communication and relative measurements to achieve more accurate and robust cooperative localization [36].
- Broader Platform and Scene Testing: Conduct tests on more types of small-scale UAV platforms with different payloads, and validate system performance in broader and more challenging real-world environments.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
VO | Visual Odometry |
UAV | Unmanned Aerial Vehicle |
CVGL | Cross-View Geo-Localization |
SLAM | Simultaneous Localization and Mapping |
EKF/KF | (Extended) Kalman Filter |
GSD | Ground Sampling Distance (in meters/pixel) |
IMU | Inertial Measurement Unit |
GNSS | Global Navigation Satellite System |
References
- Jarraya, I.; Al-Batati, A.; Kadri, M.B.; Abdelkader, M.; Ammar, A.; Boulila, W.; Koubaa, A. Gnss-Denied Unmanned Aerial Vehicle Navigation: Analyzing Computational Complexity, Sensor Fusion, and Localization Methodologies. Satell. Navig. 2025, 6, 9.
- Yao, Y.; Sun, C.; Wang, T.; Yang, J.; Zheng, E. UAV Geo-Localization Dataset and Method Based on Cross-View Matching. Sensors 2024, 24, 6905.
- He, M.; Chen, C.; Liu, J.; Li, C.; Lyu, X.; Huang, G.; Meng, Z. AerialVL: A Dataset, Baseline and Algorithm Framework for Aerial-Based Visual Localization with Reference Map. IEEE Robot. Autom. Lett. 2024, 9, 8210–8217.
- Akhihiero, D.; Olawoye, U.; Das, S.; Gross, J. Cooperative Localization for GNSS-Denied Subterranean Navigation: A UAV–UGV Team Approach. NAVIGATION J. Inst. Navig. 2024, 71, navi.677.
- Durgam, A.; Paheding, S.; Dhiman, V.; Devabhaktuni, V. Cross-View Geo-Localization: A Survey. IEEE Access 2024, 12, 192028–192050.
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5297–5307.
- Chen, Z.; Yang, Z.X.; Rong, H.J. Without Paired Labeled Data: An End-to-End Self-Supervised Paradigm for UAV-View Geo-Localization. arXiv 2025, arXiv:2502.11381.
- Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. Dinov2: Learning Robust Visual Features without Supervision. arXiv 2023, arXiv:2304.07193.
- Fan, J.; Zheng, E.; He, Y.; Yang, J. A Cross-View Geo-Localization Algorithm Using UAV Image and Satellite Image. Sensors 2024, 24, 3719.
- Wei, G.; Liu, Y.; Yuan, X.; Xue, X.; Guo, L.; Yang, Y.; Zhao, C.; Bai, Z.; Zhang, H.; Xiao, R. From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection. arXiv 2025, arXiv:2505.03334.
- Huang, G.; Zhou, Y.; Zhao, L.; Gan, W. Cv-Cities: Advancing Cross-View Geo-Localization in Global Cities. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 18, 1592–1606.
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 224–236.
- Huang, Z.; Xu, Q.; Sun, M.; Zhu, X.; Fan, S. Adaptive Kalman Filtering Localization Calibration Method Based on Dynamic Mutation Perception and Collaborative Correction. Entropy 2025, 27, 380.
- Zhan, Q.; Shen, R.; Mao, Y.; Shu, Y.; Shen, L.; Yang, L.; Zhang, J.; Sun, C.; Guo, F.; Lu, Y. Adaptive Federated Kalman Filtering with Dimensional Isolation for Unmanned Aerial Vehicle Navigation in Degraded Industrial Environments. Drones 2025, 9, 168.
- Dellaert, F.; Kaess, M. Factor Graphs for Robot Perception. In Foundations and Trends® in Robotics; Now Foundations and Trends: Norwell, MA, USA, 2017; Volume 6, pp. 1–139.
- Warburg, F.; Hauberg, S.; Lopez-Antequera, M.; Gargallo, P.; Kuang, Y.; Civera, J. Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2626–2635.
- Weyand, T.; Araujo, A.; Cao, B.; Sim, J. Google Landmarks Dataset V2—A Large-Scale Benchmark for Instance-Level Recognition and Retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2575–2584.
- Ji, Y.; He, B.; Tan, Z.; Wu, L. Game4loc: A Uav Geo-Localization Benchmark from Game Data. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 20–27 February 2024; Volume 39, pp. 3913–3921.
- Chu, M.; Zheng, Z.; Ji, W.; Wang, T.; Chua, T.S. Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2024; pp. 213–231.
- Deuser, F.; Mansour, W.; Li, H.; Habel, K.; Werner, M.; Oswald, N. Temporal Resilience in Geo-Localization: Adapting to the Continuous Evolution of Urban and Rural Environments. In Proceedings of the Winter Conference on Applications of Computer Vision, Tucson, AZ, USA, 28 February–4 March 2025; pp. 479–488.
- Li, H.; Xu, C.; Yang, W.; Mi, L.; Yu, H.; Zhang, H.; Xia, G.S. Unsupervised Multi-View UAV Image Geo-Localization via Iterative Rendering. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5625015.
- Guan, F.; Zhao, N.; Fang, Z.; Jiang, L.; Zhang, J.; Yu, Y.; Huang, H. Multi-Level Representation Learning via ConvNeXt-Based Network for Unaligned Cross-View Matching. Geo-Spat. Inf. Sci. 2025, 1–14.
- Zheng, Z.; Wei, Y.; Yang, Y. University-1652: A Multi-View Multi-Source Benchmark for Drone-Based Geo-Localization. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1395–1403.
- Zhao, D.; Andrews, J.; Papakyriakopoulos, O.; Xiang, A. Position: Measure Dataset Diversity, Don’t Just Claim It. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; pp. 60644–60673.
- Izquierdo, S.; Civera, J. Optimal Transport Aggregation for Visual Place Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 17658–17668.
- Potje, G.; Cadar, F.; Araujo, A.; Martins, R.; Nascimento, E.R. Xfeat: Accelerated Features for Lightweight Image Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 2682–2691.
- Qiu, Y.; Chen, Y.; Zhang, Z.; Wang, W.; Scherer, S. MAC-VO: Metrics-Aware Covariance for Learning-Based Stereo Visual Odometry. arXiv 2024, arXiv:2409.09479.
- Dong, Q.; Cao, C.; Fu, Y. Rethinking Optical Flow from Geometric Matching Consistent Perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1337–1347.
- Chen, Y.H.; Wu, C.T. Reynoldsflow: Exquisite Flow Estimation via Reynolds Transport Theorem. arXiv 2025, arXiv:2503.04500.
- Sun, D.; Yang, X.; Liu, M.Y.; Kautz, J. Pwc-Net: Cnns for Optical Flow Using Pyramid, Warping, and Cost Volume. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8934–8943.
- Teed, Z.; Deng, J. Raft: Recurrent All-Pairs Field Transforms for Optical Flow. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23 August 2020; pp. 402–419.
- Ma, D.; Imamura, K.; Gao, Z.; Wang, X.; Yamane, S. Hierarchical Motion Field Alignment for Robust Optical Flow Estimation. Sensors 2025, 25, 2653.
- Shi, K.; Miao, Y.; Li, X.; Li, W.; Nie, S.; Wang, X.; Li, D.; Sheng, Y. Fast Recurrent Field Transforms for Optical Flow on Edge GPUs. Meas. Sci. Technol. 2025, 36, 035409.
- Wikipedia. Kalman Filter. Available online: https://en.wikipedia.org/wiki/Kalman_filter (accessed on 17 June 2025).
- Wikipedia. Extended Kalman Filter. Available online: https://en.wikipedia.org/wiki/Extended_Kalman_filter (accessed on 17 June 2025).
- PX4 Development Team. PX4 Autopilot Software, Version 1.14.0. 2024. Available online: https://github.com/PX4/PX4-Autopilot (accessed on 17 June 2025).
- Teed, Z.; Deng, J. DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras. In Proceedings of the 35th International Conference on Neural Information Processing Systems, Virtual, 6–14 December 2021; pp. 16558–16569.
- Hoshino, Y.; Rathnayake, N.; Dang, T.L.; Rathnayake, U. Flow Velocity Analysis of Rivers Using Farneback Optical Flow and STIV Techniques with Drone Data. In Proceedings of the International Symposium on Information and Communication Technology; Springer: Singapore, 2024; pp. 17–26.
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
- Engel, J.; Koltun, V.; Cremers, D. Direct Sparse Odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Xia, Z.; Alahi, A. FG2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 6362–6372.
Feature | Specification |
---|---|
Platform and Sensors | |
Primary Platform | DJI Mavic Air 2 |
Secondary Platform | Custom PX4-based Quadrotor |
UAV Image Resolution | 1920 × 1080 pixels |
UAV Image Format | JPEG |
Satellite Imagery Source | Google Earth/Siwei Earth (Level-18 Tiles) |
Dataset Scale and Scope | |
Total Trajectories | 50 |
Total Trajectory Length | 395 km |
Total UAV Images | 48,162 |
Total Dataset Size | ∼529.1 GB |
Geographic Coverage | 136.3 sq km |
Flight and Imaging Parameters | |
Flight Altitude Range | 100 m–300 m |
UAV GSD Range | ∼0.38 m/px–∼0.57 m/px |
Satellite GSD | ∼0.52 m/px (for Level-18) |
Temporal Variation Analysis | |
UAV Data Collection Period | September 2024–February 2025 (Example) |
Satellite Imagery Vintage | 2022–2024 |
Typical Temporal Gap | 3–18 months |
Seasonal Mismatches | Yes (∼40% of trajectories feature significant seasonal differences, e.g., green foliage vs. bare trees, to test robustness) |
Feature | SatLoc (This Paper) | VDUAV | University-1652 | AerialVL |
---|---|---|---|---|
Platform Focus | Small-scale rotorcraft UAV | General UAV (virtual) | UAV/Ground/Satellite | UAV (rotorcraft) |
Data Source | Real-world | Digital Twin (simulation) [2] | Real-world/Simulated [23] | Real-world [3] |
Environment Diversity | High (urban/rural/lake and wetland/mountainous forest/road network, etc., different times, different weather) | High (city/plain/hills, etc.) [2] | Medium (mainly campus) [23] | Medium (urban/farmland/road network, different times) [3] |
Scale | 395 km | 12.4k Images (Virtual Reality Scene) | 1652 location types [23] | ~70 km [3] |
Ground Truth | GPS/Fusion (accuracy 1.5 m) | Virtual coordinate mapping (sub-meter) [2] | GPS tags/Simulated [23] | GPS/Fusion (accuracy 1.5 m) [3] |
Satellite Map | Provides corresponding tiles and reference full map | Provides corresponding tiles [2] | Provides corresponding images [23] | Provides reference map database [3] |
Main Limitations Addressed | Real small-scale platform data, diverse real scenes | Low simulation cost, easy scene expansion [2] | Multi-view data [23] | Large-scale real trajectory framework comparison [3] |
Method | Metrics | Traj.1 | Traj.2 | Traj.3 | Avg. |
---|---|---|---|---|---|
SatLoc-Fusion (ours) | MLE (m) | 14.05 | 14.12 | 16.57 | 14.91 |
SatLoc-Fusion (ours) | Succ. Rate (%) | 96 | 97 | 87 | 93 |
ORB-SLAM3 [41] | MLE (m) | 23.57 | 23.51 | 24.72 | 23.93 |
ORB-SLAM3 [41] | Succ. Rate (%) | 60 | 55 | 54 | 56 |
DSO [40] | MLE (m) | 24.44 | 23.98 | 24.78 | 24.4 |
DSO [40] | Succ. Rate (%) | 58 | 54 | 51 | 54 |
Farneback [42] | MLE (m) | 30.08 | 29.86 | 30.56 | 30.17 |
Farneback [42] | Succ. Rate (%) | 32 | 34 | 31 | 32 |
Method | Short Traj. MLE (m) ↓ | Short Traj. Succ. Rate (%) ↑ | Long Traj. MLE (m) ↓ | Long Traj. Succ. Rate (%) ↑ |
---|---|---|---|---|
SatLoc-Fusion (ours) | 18.32 | 92.1 | 14.05 | 90.5 |
ORB-SLAM3 [41] | 28.93 | 38.4 | 27.39 | 42.5 |
DSO [40] | 27.25 | 46.4 | 25.72 | 48.5 |
Farneback [42] | 26.53 | 50.4 | 27.31 | 46 |
AerialVL Comb. Method 1 [3] | 20.01 | 71 | 22.41 | 55.5 |
AerialVL Comb. Method 2 [3] | 22.27 | 80 | 15.86 | 85.5 |
Configuration | ATE (m) ↓ | Success Rate (%) ↑ | Frequency (Hz) ↑ |
---|---|---|---|
Full Pipeline | 14.05 | 94 | 2.03 |
Remove Layer 1 (Absolute Loc.) | 27.84 | 55 | 24.5 |
Remove Layer 2 (Relative Loc.) | 18.81 | 57 | 2.8 |
Remove Layer 3 (Velocity Est.) | 16.42 | 74 | 2.23 |
Remove Adaptive Fusion (Static Fusion) | 17.85 | 85 | 2.11 |
Configuration | Time Cost (ms) | Remark |
---|---|---|
Layer 1 (Absolute Loc.) | 482 | - |
Layer 2 (Relative Loc.) | 28 | Input image resized |
Layer 3 (Velocity Est.) | 20 | Uses pyramidal Lucas-Kanade (LK) optical flow |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).