**2. Materials and Methods**

### *2.1. Pre-Processing: Stretched Wallis Shadow Compensation Method*

Due to the light angle, terrain undulation, and building occlusion, light is blocked in some regions of UAV aerial images. The resulting shadows cover parts of the image, introducing unavoidable pseudo-information [30]. Recovering aerial images with shadow compensation methods to remove this shadow information can improve subsequent matching accuracy.

Here, a stretched Wallis shadow compensation method is proposed. A pixel contrast-based stretching factor is introduced into the Wallis filtering method to adaptively compensate for different degrees of shadow occlusion and to improve the contrast within the shadow region. The Wallis filter relies on the property that the mean and variance of different regions of a shadow-free image are essentially constant. Filter coefficients are constructed from the image mean and variance so that the shaded areas are restored to the normal lighting condition [31]. With the shaded regions restored, the difference between the two parts is reduced and the brightness of the shaded regions is increased. The Wallis filter is defined as:

$$\begin{cases} I^t(x, y) = \alpha I(x, y) + \beta \\ \alpha = \dfrac{\sigma^t}{\sigma^t + \sigma / a^2} \\ \beta = b\, m^t + (1 - b - \alpha)\, m \end{cases} \tag{1}$$

where *I* denotes the pixel value of the original image at (*x*, *y*), *I<sup>t</sup>* is the target pixel value, *a* and *b* are hyperparameters, and *σ*, *m*, *σ<sup>t</sup>*, and *m<sup>t</sup>* are the neighborhood variance, neighborhood pixel mean, target variance, and target pixel mean, respectively. Different values of *α* are obtained by adjusting *a*, which modifies the variance of the shaded areas relative to the whole image and thereby increases the contrast in the shaded areas. Similarly, changing *β* through *b* increases the mean value of the shaded regions and improves their brightness. The result of shadow recovery is shown in Figure 1. It can be observed that recovery based on the overall image does not account for the different lighting levels within the shadow regions and suffers from inadequate compensation.
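As a minimal sketch of Eq. (1), the plain Wallis compensation can be written as follows (the function name, the use of a single grayscale region, and the default hyperparameter values are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def wallis_compensate(region, sigma_t, m_t, a=1.0, b=0.5):
    """Plain Wallis compensation, Eq. (1): I_t = alpha * I + beta.

    region  -- 2-D array of pixel values from the shaded area
    sigma_t -- target variance, m_t -- target mean (e.g. of a lit area)
    a, b    -- hyperparameters controlling contrast and brightness
    """
    m = region.mean()        # neighborhood pixel mean
    sigma = region.var()     # neighborhood variance
    alpha = sigma_t / (sigma_t + sigma / a**2)
    beta = b * m_t + (1 - b - alpha) * m
    return alpha * region + beta
```

For a flat shaded region (zero variance) with *b* = 1, the output is pulled exactly to the target mean, which reflects the role of *β* as a brightness correction.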

**Figure 1.** Shadow compensation results using Stretched Wallis.

Considering the diversity of shadow intensities in different regions, a pixel contrast-based stretching factor is proposed and integrated into *α*. The pixel value *I* is calculated as:

$$I(x, y) = R \cdot V(x, y) \tag{2}$$

where *R* is the reflectance and *V* is the light intensity at (*x*, *y*), which is determined by the direct light intensity *V<sup>d</sup>* and the environmental light intensity *V<sup>e</sup>*:

$$V = V^d + V^e \tag{3}$$

Within the obscured part of the image, the shadow regions can be approximated as receiving only the environmental light. Thus, the mean pixel value *I* can be written separately for the shaded and unshaded areas as:

$$I^{\text{shadow-free}} = R \cdot (V^d + V^e) = \frac{\sum_{j=1}^{m} I_j^{s\text{-}f}(x, y)}{m} \tag{4}$$

$$I^{\text{shadow}} = R \cdot V^e = \frac{\sum_{i=1}^{n} I_i(x, y)}{n} \tag{5}$$

Therefore, the proportional relationship between the environmental light intensity and the direct light intensity in an aerial image can be defined as:

$$r = \frac{I^{\text{shadow-free}} - I^{\text{shadow}}}{I^{\text{shadow}}} = \frac{1}{3} \sum_{R,G,B} \frac{\frac{1}{m}\sum_{j=1}^{m} I_j^{s\text{-}f} - \frac{1}{n}\sum_{i=1}^{n} I_i}{\frac{1}{n}\sum_{i=1}^{n} I_i} \tag{6}$$

It can be seen that a larger *r* means that the contrast between the shadow-free region and shaded region is larger and more information needs to be recovered. Most of the *r* values in the shaded regions are in the range of 2 to 6. To ensure adequate compensation for strong shading and smoothness of stretching, the stretching factor *S* is defined as:

$$S = \log(1 + e^r) \tag{7}$$
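Equations (6) and (7) can be sketched as follows (a minimal Python version; the (N, 3) RGB array layout is an assumption, since the paper does not fix a data format):

```python
import numpy as np

def stretch_factor(shadow_free, shadow):
    """Eqs. (6)-(7): per-channel intensity ratio r averaged over R, G, B,
    then the smooth (softplus-style) stretching factor S = log(1 + e^r)."""
    mean_sf = np.asarray(shadow_free, float).reshape(-1, 3).mean(axis=0)
    mean_sh = np.asarray(shadow, float).reshape(-1, 3).mean(axis=0)
    r = ((mean_sf - mean_sh) / mean_sh).mean()  # (1/3) * sum over channels
    S = np.log1p(np.exp(r))                     # Eq. (7)
    return r, S
```

Because *S* grows smoothly with *r*, strong shadows (large *r*) receive a larger stretch while the factor stays well behaved for weak shadows.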

Then, with the addition of the stretching factor *S*, *α* can be expressed as:

$$\alpha = \frac{\sigma^t}{\sigma^t + \frac{\sigma}{(Sa)^2}} = \frac{\sigma^t}{\sigma^t + \frac{\sigma}{(\log(1 + e^r)\, a)^2}} \tag{8}$$

The region of width *K* surrounding each shadow detected by the method of [32], obtained by morphological dilation, is taken as the non-shaded region associated with that shadow. The mean and variance of the shaded and non-shaded regions are then computed to solve for the parameters *α* and *β* of the stretched Wallis shadow compensation. The recovery of the shaded area is shown in Figure 1: the overall brightness and contrast of the shadows are significantly enhanced by the stretched Wallis compensation method, and the pixel contrast-based stretching factor makes the compensation more adaptive and the contrast enhancement more targeted.
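Putting the pieces together, the stretched Wallis compensation of Eq. (8) can be sketched as below. The masks are assumed to be given (the shadow mask from the detector of [32] and its width-*K* dilation ring); the function name and defaults are illustrative:

```python
import numpy as np

def stretched_wallis(image, shadow_mask, ring_mask, r, a=1.0, b=0.5):
    """Stretched Wallis compensation, Eq. (8).

    image       -- 2-D grayscale array
    shadow_mask -- boolean mask of the shaded region
    ring_mask   -- boolean mask of the width-K non-shaded surround
    r           -- environmental/direct light ratio from Eq. (6)
    """
    S = np.log1p(np.exp(r))                  # stretching factor, Eq. (7)
    shaded, lit = image[shadow_mask], image[ring_mask]
    m, sigma = shaded.mean(), shaded.var()   # shaded-region statistics
    m_t, sigma_t = lit.mean(), lit.var()     # targets from the lit surround
    alpha = sigma_t / (sigma_t + sigma / (S * a) ** 2)  # Eq. (8)
    beta = b * m_t + (1 - b - alpha) * m
    out = image.astype(float).copy()
    out[shadow_mask] = alpha * shaded + beta # restore only the shadow pixels
    return out
```

Note that the compensated shadow mean equals *b·m<sup>t</sup>* + (1 − *b*)·*m* regardless of *α*, so *b* directly controls how far the shadow brightness is lifted toward the lit surround, while *α* (and hence *S*) controls the contrast stretch.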

### *2.2. M-O SiamRPN with Weight Adaptive Joint Multiple Intersection over Union for Visual Localization*

Visual localization is accomplished by comparing real-time aerial imagery with pre-stored satellite images, matching the region of the satellite image most similar to the target, and marking the coordinate position. Considering the non-ideal effects of small target size, blurred edges, and imbalanced information in aerial images, we propose an M-O SiamRPN and a weight adaptive joint multiple intersection over union loss function. The former treats the localization task as a classification and detection problem in the satellite map, using the real-time aerial image as a template for a SiamRPN backbone, where multi-order, multi-resolution features improve the expressive ability and the sensitivity to edge textures. The latter balances the contributions of a large number of negative samples and a small number of positive samples by simultaneously constraining the anchor box with a weight adaptive scale and multiple intersection over union terms.
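The loss above constrains anchor boxes through intersection-over-union terms. As background only (this is the standard IoU computation, not the paper's joint loss), the IoU of two axis-aligned boxes can be computed as:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

Multiple IoU variants in the literature extend this base quantity with penalty terms for center distance and aspect ratio, which is the kind of geometric constraint the joint loss places on the anchor boxes.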
