*3.2. Performance Evaluation*

To validate the proposed framework fairly, we conduct comprehensive comparison experiments. The baselines are Siamese networks structurally similar to our approach, i.e., SiamFC [18], SiamRPN [24], and SiamRPN++ [42], together with other current best-performing localization networks such as CFNet [43] and Global Tracker [44]. We record the precision and success rate of every network, and the comparison results are detailed below. As Figure 11 shows, M-O SiamRPN with multi-optimization achieves higher precision and success rate than all other frameworks, improving on the previous best, SiamRPN++, by 0.016 and 0.019, respectively. These results indicate that the multi-order features proposed in this work enhance the texture representation, and that the weight-adaptive multiple optimization reduces the influence of non-equilibrium information.

**Figure 11.** (**a**) Precision and (**b**) success plots of M-O SiamRPN with multi-optimization and state-of-the-art frameworks. The mean precision and AUC scores are reported for each framework.
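
The two metrics in Figure 11 can be sketched as follows. This is a minimal illustration, not the authors' evaluation code, and it assumes the OTB-style definitions commonly used for such plots, since the section does not state its thresholds: precision is the fraction of frames whose center location error is within 20 pixels, and the success score is the mean, over 21 IoU thresholds evenly spaced in [0, 1], of the fraction of frames whose overlap exceeds the threshold (an approximation of the AUC). Boxes are assumed to be (x, y, w, h) rows with positive width and height; all function names are hypothetical.

```python
import numpy as np


def center_location_error(pred, gt):
    """Euclidean distance between predicted and ground-truth box centres.

    Boxes are (x, y, w, h) rows; this layout is an assumption of the sketch.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred_centres = pred[:, :2] + pred[:, 2:] / 2.0
    gt_centres = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_centres - gt_centres, axis=1)


def iou(pred, gt):
    """Per-frame intersection-over-union of (x, y, w, h) boxes."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    # Clip negative extents to zero so non-overlapping boxes give zero area.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union


def precision_at(pred, gt, threshold=20.0):
    """Fraction of frames whose centre error is within `threshold` pixels."""
    return float(np.mean(center_location_error(pred, gt) <= threshold))


def success_auc(pred, gt, num_thresholds=21):
    """Mean success rate over IoU thresholds evenly spaced in [0, 1]."""
    overlaps = iou(pred, gt)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    success = np.array([np.mean(overlaps > t) for t in thresholds])
    return float(success.mean())


if __name__ == "__main__":
    gt = [[10, 10, 40, 40], [50, 50, 40, 40]]
    pred = [[12, 10, 40, 40], [90, 90, 40, 40]]  # second frame drifts off target
    print(precision_at(pred, gt))  # 0.5
    print(success_auc(pred, gt))
```

Averaging the success curve over a fixed threshold grid is a simple way to approximate its area; a denser grid or a different pixel threshold would shift the absolute numbers, so values from this sketch are not directly comparable with those in Figure 11.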

Performance gains in a network usually come with increased time complexity. However, UAV visual localization has a strict real-time requirement, so the framework must complete inference quickly to keep localization timely. We therefore compare the frames per second (FPS) and center location error (CLE) of different localization frameworks to evaluate processing speed and robustness; the results are shown in Table 1. Because the proposed spatial continuity criterion selects only a small number of significant first-order features, the processing time of M-O SiamRPN with multi-optimization does not increase markedly despite the additional injection of second-order information. Compared with the structurally similar SiamRPN, our method is slightly slower, owing to the replacement of the first-order feature extraction network and the generation of second-order information. The proposed framework processes 35 frames per second, which is faster than SiamRPN++ with the same ResNet50 backbone.
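
One way to measure the two quantities reported in Table 1 is sketched below. It is an illustrative harness under stated assumptions, not the protocol used in this work: `track_frame` is a hypothetical callable that runs inference on one frame and returns an (x, y, w, h) box, FPS is taken as wall-clock throughput over the whole sequence using `time.perf_counter`, and CLE is the Euclidean distance between predicted and ground-truth box centers, averaged over frames.

```python
import time

import numpy as np


def evaluate_speed_and_cle(track_frame, frames, gt_boxes):
    """Return (fps, mean_cle) for a hypothetical per-frame tracker callable.

    `track_frame(frame)` is assumed to return an (x, y, w, h) box; the
    ground-truth boxes use the same layout.
    """
    if len(frames) == 0:
        raise ValueError("need at least one frame")

    preds = []
    start = time.perf_counter()
    for frame in frames:
        preds.append(track_frame(frame))  # only inference is inside the timed loop
    elapsed = time.perf_counter() - start

    preds = np.asarray(preds, dtype=float)
    gt = np.asarray(gt_boxes, dtype=float)
    # Centre of an (x, y, w, h) box is (x + w/2, y + h/2).
    pred_centres = preds[:, :2] + preds[:, 2:] / 2.0
    gt_centres = gt[:, :2] + gt[:, 2:] / 2.0
    cle = np.linalg.norm(pred_centres - gt_centres, axis=1)

    fps = len(frames) / elapsed if elapsed > 0 else float("inf")
    return fps, float(cle.mean())
```

As a point of reference for the real-time argument, a throughput of 35 FPS corresponds to a per-frame budget of 1000 / 35 ≈ 28.6 ms, which must cover both first-order feature extraction and second-order feature generation.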


**Table 1.** The FPS and CLE of different localization frameworks.
