**5. Conclusions**

In this paper, we focus on the challenge of detecting tiny targets with blurred edges in UAV visual localization. We propose M-O SiamRPN with a weight-adaptive joint multiple intersection-over-union (MIoU) loss, together with a stretched Wallis shadow compensation method. The pre-processed aerial images significantly reduce the effect of shadows and form the basis for the subsequent image matching procedure. M-O SiamRPN with the weight-adaptive joint MIoU loss consists of three parts: a dual-stream Siamese first-order framework based on ResNet50; a covariance-based second-order information generation module; and an RPN module under weight-adaptive multiple constraints. To exploit the first-order features adequately, we used ResNet50 as the backbone and proposed the concept of spatial continuity to rank the convolution kernels and select the features with richer local information for 2-O feature generation. For the blurred edges of tiny targets, we presented M-O features, which incorporate second-order information obtained by normalized covariance calculation of the selected 1-O features to enhance the network's descriptive capability. To address the imbalance between positive and negative samples in the detection task, we designed a weight-adaptive cross-entropy loss and the MIoU loss to improve regression precision. The former automatically adjusts the penalty coefficient of the loss function according to the number of positive and negative samples, while the latter constrains the anchor box of interest from multiple perspectives. We built a consumer-grade UAV acquisition platform to construct an aerial image dataset for experimental validation. The results show that our framework achieves excellent performance on all quantitative and qualitative metrics.
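The second-order information described above can be illustrated with a minimal sketch: a normalized covariance matrix computed over the spatial positions of selected first-order channel responses. The function name, input shape, and Frobenius normalization below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def second_order_features(feat, eps=1e-5):
    """Normalized covariance descriptor of a first-order feature map.

    feat: (C, H, W) array of selected first-order channel responses.
    Returns a (C, C) second-order descriptor. The unit-Frobenius-norm
    normalization is an illustrative choice.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)               # flatten spatial positions
    x = x - x.mean(axis=1, keepdims=True)    # center each channel
    cov = x @ x.T / (h * w - 1)              # (C, C) channel covariance
    # normalize so the scale of the 1-O features does not dominate
    return cov / (np.linalg.norm(cov) + eps)
```

The resulting symmetric matrix captures pairwise channel correlations, which is the kind of second-order statistic that can sharpen responses around weak, blurred edges.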
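The weight-adaptive cross-entropy idea, adjusting the penalty coefficient by the positive/negative sample counts, can be sketched as follows. The inverse-frequency weighting used here is an assumed form for illustration; the paper's exact penalty coefficient may differ.

```python
import numpy as np

def weight_adaptive_ce(probs, labels, eps=1e-7):
    """Cross-entropy whose class penalties adapt to the batch's
    positive/negative sample counts.

    probs:  (N,) predicted foreground probabilities
    labels: (N,) 1 for positive anchors, 0 for negative
    """
    n = len(labels)
    n_pos = max(labels.sum(), 1)
    n_neg = max(n - labels.sum(), 1)
    # the rarer class receives the larger penalty coefficient
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * n_neg)
    loss = -(w_pos * labels * np.log(probs + eps)
             + w_neg * (1 - labels) * np.log(1 - probs + eps))
    return loss.mean()
```

With many negative anchors and few positive ones, `w_pos` grows, so the few positives are not drowned out during training.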

The proposed visual framework relies only on the UAV platform itself and can autonomously achieve high-precision, real-time localization in the event of GNSS or other navigation module failure. This provides a basis for higher-level extended functionality, which is important for safe, full-scene applications of UAVs.

**Author Contributions:** Conceptualization, K.W. and J.C. (Jie Chu); methodology, K.W., J.C. (Jie Chu) and J.C. (Jueping Cai); software, J.C. (Jiayan Chen) and Y.C.; experiment validation: K.W., Y.C. and J.C. (Jiayan Chen); writing—review and editing, K.W. and J.C. (Jie Chu). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Shaanxi Province Key Research and Development Program (grant number 2021ZDLGY02-01), the Wuhu-Xidian University Industry-University-Research Cooperation Special Fund (XWYCXY-012021003), and the National 111 Center (B12026).

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.
