1. Introduction
In recent years, unmanned aerial vehicles (UAVs) have been utilized in various fields, such as transportation, construction management, urban planning, agriculture, and disaster response, owing to their relatively low acquisition costs and ability to target diverse regions [1,2,3,4,5]. To maximize the utility of UAV images, the localization of each piece of information is imperative, and precise calculation of the UAV's positional data is required for accurate localization [3,6,7]. Photogrammetric processes such as bundle adjustment are essential for accurately determining the location and orientation of UAV image acquisition [8].
Selecting match pairs from overlapping UAV images to form image sets for bundle adjustment is a critical step that affects the accuracy and efficiency of UAV image localization. Localization methods such as structure from motion (SfM) and visual SLAM, which are based on bundle adjustment, derive positional information from overlapping images [9,10]. The composition of an image set with a high proportion of overlaps significantly affects the precision of the bundle adjustment. Hence, identifying and composing overlapping images into image sets is vital to ensuring the accuracy of UAV localization. Moreover, the method used for determining image sets also affects the efficiency of image localization because selecting overlapping images can pose computational challenges [11,12,13,14].
1.1. Previous Studies
The methods for selecting overlapping UAV images can be broadly categorized into two types: prior knowledge-based methods and visual similarity-based methods [12]. Prior knowledge-based methods utilize data from the positioning and orientation system (POS), which comprises GNSS and IMU devices mounted on a UAV, to select overlapping images. By contrast, visual similarity-based methods estimate image similarity based on feature information to select overlapping images. A detailed explanation of each methodology is provided below.
Prior knowledge-based methods select overlapping images based on the observation that images with close POS locations typically exhibit a higher spatial overlap. This approach, widely used to determine overlapping UAV image sets, offers simplicity and high computational efficiency [15]. For example, Liang et al. (2021) utilized the POS and elevation data to select overlapping images based on their footprints [16]. However, it is challenging to adapt prior knowledge-based methods when the GNSS positional accuracy is insufficient or when GNSS information is unavailable. Consequently, many researchers have shifted their focus towards visual similarity-based overlapping image selection.
Visual similarity-based methods estimate the similarity between UAV images based on the features extracted from the images and select the overlapping images. Therefore, they offer the advantage of constructing overlapping image sets without relying on prior knowledge, which makes them applicable to various data acquisition environments. These methods can be further subdivided into number of matches (NoM) approaches, which utilize the count of matched keypoints, and clustering-based image-retrieval methods [12].
NoM methods evaluate similarity based on the number of matched keypoints, with a higher count indicating greater similarity. This approach is based on the principle that images with a high degree of overlap share numerous common features, resulting in a larger number of matched keypoints between them. The methodology involves conducting feature matching across all images using algorithms such as the scale-invariant feature transform (SIFT) or speeded-up robust features (SURF). Images with a substantial number of matches are identified as highly overlapping. NoM-based methodologies have been utilized in various studies because of their relatively straightforward application and the expectation of highly accurate overlapping image results. Verykokou and Ioannidis (2016, 2018) proposed a methodology for enhancing the efficiency of NoM-based methods by conducting stepwise feature matching through down-sampling and original image processing to select overlapping images [17,18]. Wu (2013) suggested sorting the features of each image by size, performing matching only for the top-scale features, and considering an image set with matches above a certain threshold as overlapping [19]. Furthermore, he applied the proposed methodology as an overlapping image-set determination method in the incremental SfM software VisualSfM (version 0.5.26) [20].
Image retrieval is a process involving extracting features from images and clustering them into high-dimensional vectors, which are then utilized to identify overlapping images. The vocabulary tree is a method that converts features into bag of visual words (BoVW) vectors and is widely used in selecting image sets for localization in SfM software such as COLMAP, AliceVision, and Pix4Dmapper [21,22,23]. BoVW is an approach that represents an image as a histogram of visual words, which are quantized representations of local image features. This method facilitates efficient image comparison and retrieval by transforming complex image data into a more manageable form. The efficiency and flexibility of the vocabulary tree have been leveraged in research to achieve efficient overlapping image set selection in large-scale oblique aerial datasets [9,21,22,23,24,25]. Kato et al. (2017a, b, 2022) improved the method of evaluating image similarity by combining set-based similarity calculation methods with BoVW-based vector similarity calculation methods to enhance the accuracy of overlapping image set selection from large-scale datasets collected from the web [26,27]. Havlena et al. (2010, 2013) enhanced the efficacy of 3D reconstruction by reducing the size of the overlapping image set through BoVW-based image retrieval when dealing with extremely large datasets containing a multitude of duplicate or similar viewpoints [28,29]. When searching for overlapping images, fixing the number of images to be searched can influence the completeness of subsequent processes [30,31]. To address this issue, Jiang and Jiang (2020) proposed an adaptive threshold selection method that dynamically determines the number of images to be searched based on similarity scores [31].
1.2. Research Objectives
Research has been conducted to improve the efficiency and accuracy of visual similarity-based methods. However, certain limitations remain. NoM methods achieve high accuracy by matching features extracted directly from each image to determine similarity. However, despite strategies such as scaling or matching constraints, the execution time grows quadratically with the number of images because all image pairs must be matched, thus limiting their efficiency [12]. By contrast, BoVW vector-based image retrieval methods demonstrate high efficiency and apply to large datasets; however, their performance varies depending on the selection and representation of visual words [32]. Clustering image features can lead to significant information loss, potentially resulting in the omission of detailed texture information and reduced accuracy, particularly in scenarios involving repetitive patterns.
This study aimed to develop an enhanced visual similarity-based method for determining overlapping image sets and improving the efficiency of image localization. The approach synergistically combines the BoVW and NoM techniques to leverage their complementary strengths. Initially, a BoVW vector-based image search rapidly extracts candidate image sets, significantly reducing the computational load. Subsequently, NoM-based similarity is applied exclusively to these candidates to derive the final image set, achieving high accuracy without relying on additional information such as GNSS data. This hybrid methodology effectively balances precision and computational efficiency, overcoming the limitations inherent in utilizing either method independently. Consequently, the proposed approach demonstrates superior accuracy compared to BoVW alone while simultaneously achieving improved computational efficiency relative to the exclusive use of NoM.
The remainder of this paper is organized as follows. Section 2 provides a comprehensive explanation of each step and the evaluation method used in the proposed methodology. In Section 3, the proposed methodology and conventional NoM- and BoVW-based approaches are applied to UAV image datasets for overlapping image-set determination, and the outcomes are evaluated against the actual overlapping images; bundle adjustment is then performed on the image sets derived from each method to assess the impact of the proposed methodology on the accuracy of estimating the image positional information. Section 4 presents the conclusions of this study.
2. Proposed Methodology
2.1. Test Sites and Datasets
We validated the performance of the proposed overlapping image-set determination method using two UAV image datasets captured under different conditions, comparing it with conventional NoM- and BoVW-based approaches. Two target images were selected from each dataset to evaluate whether the proposed method effectively selects overlapping images; for each target image, the actual overlapping images and their corresponding overlap ratios were measured to assess how well the method constructs image sets from highly overlapping images.

The two datasets were obtained using different camera models and flight altitudes, resulting in different resolutions (Table 1). Dataset 1 comprises 386 images captured at an altitude of approximately 126 m over an urban area, each with a resolution of 5472 × 3080 pixels. Dataset 2 consists of 394 images captured at an altitude of approximately 178 m over a construction site, each with a resolution of 8192 × 5460 pixels. An overview of both datasets is presented in Figure 1. Dataset 1 was acquired through a linear flight predominantly along arterial roads in an urban area, resulting in insufficient overlap between images in certain sections along the flight path. The target areas included parking lots, vegetation, roads, and buildings of varying heights. In contrast, Dataset 2 was collected in a rural area with a grid flight pattern to survey construction sites, covering several roads, construction sites, and bare land. Dataset 2 exhibits repetitive visual characteristics due to the nature of the surveyed areas.
In each dataset, two images with distinct shooting paths and visual characteristics were selected as targets to assess the image-set results. As shown in Figure 1, each target was chosen from flight paths with varying geometric structures. Figure 2 and Figure 3 display the individual target images. Multiple images overlapped each target with varying ratios, as shown in Figure 4. Constructing image sets comprising highly overlapping target images can enhance localization accuracy. Therefore, the performance of the proposed method was evaluated by comparing the actual overlapping images and their overlap ratios for each target and assessing its ability to select overlapping images compared with conventional methods.
As the targets were captured from different flight paths, the number and proportion of overlapping images varied for each target. Table 2 presents the dataset to which each target belongs, along with the maximum and minimum overlap area ratios. Target 1 was captured at the point where two straight paths intersected in Dataset 1 and includes roads and parking lots; it has 34 overlapping images, with overlap ratios ranging from 90% to 8%. Target 2 was captured along a single path in Dataset 1, depicting roads and vegetation, with 18 overlapping images and overlap ratios ranging from 90% to 8%. Target 3 was captured at a point where multiple straight paths intersected in Dataset 2, which features a construction site; it has 53 overlapping images, with overlap ratios ranging from 80% to 5%. Finally, Target 4 was captured at a bend in the path on the outskirts of Dataset 2, showing bare land, with 31 overlapping images and overlap ratios varying from 90% to 1%.
2.2. Hybrid BoVW-NoM Technique
This study proposes an image similarity discrimination technique that sequentially applies BoVW- and NoM-based methodologies to achieve an accurate and efficient determination of overlapping images. Specifically, the proposed method integrates the relatively fast processing speed of BoVW with the higher accuracy of NoM, ensuring both speed and accuracy in the image similarity assessment. The proposed method comprises three main stages: (i) image preprocessing, (ii) identification of image set candidates, and (iii) determination of the overlapping image set (Figure 5). First, in the image preprocessing stage, down-sampling is applied to ensure efficiency throughout the entire process. Subsequently, in the candidate selection stage, BoVW-based rapid exploration of image similarity is used to select the image set candidates. Finally, the similarities between the selected candidate images and the target are assessed using NoM to determine the overlapping image set.
In the preprocessing stage, down-sampling was applied to the input images to reduce their resolution, thereby enhancing the efficiency of the overall image-set selection. UAV images typically have high resolution, resulting in a large number of extracted features. This, in turn, increases the computational workload required for constructing the BoVW codebook and performing feature matching. Moreover, an excessive number of features may lead to redundancy among the features included in the codebook, thereby adversely affecting the efficiency of the similarity search [32]. Therefore, in this study, we aimed to improve the efficiency of image-set determination by decreasing the resolution of the original UAV images. However, excessively low resolutions may result in insufficient features for a similarity search, necessitating judicious determination of the down-sampling level.
In this study, the optimal down-sampling level for the dataset was determined to be 25%, based on its impact on processing time and the quality of the overlapping image set results. To identify this optimal level, we conducted preliminary experiments comparing the computational efficiency and accuracy of the proposed overlapping image set selection method using the original images and images down-sampled to 50%, 25%, and 10% of their original dimensions. As shown in Table 3, the total execution time decreased significantly as the image size decreased; however, the number of extracted features and the image similarity scores also dropped sharply, and identifying accurate overlapping images at the 10% scale proved challenging. Consequently, a down-sampling ratio of 25% was chosen to maintain accuracy while optimizing execution time.
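For illustration, this preprocessing step can be implemented in a few lines with OpenCV; the following is a minimal sketch assuming the 25% ratio adopted above, and the function name and file handling are illustrative rather than taken from the authors' implementation.

```python
import cv2

def downsample(image_path, scale=0.25):
    """Load a UAV image and shrink it to the chosen fraction of its
    original dimensions (25% is the level selected in this study)."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    # INTER_AREA is a common interpolation choice when reducing resolution.
    return cv2.resize(img, (int(w * scale), int(h * scale)),
                      interpolation=cv2.INTER_AREA)
```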
In the second stage of the proposed method, feature extraction was conducted on the down-sampled drone images. SURF, a method widely utilized in BoVW-based image retrieval and demonstrated in numerous previous studies [33,34,35,36], was employed to extract image features. SURF represents the local features of images efficiently and reliably, making it widely adopted for BoVW-based image retrieval. Specifically, SURF was implemented with the following parameters: a Hessian threshold of 1000, 3 octaves, and 4 scale levels per octave. As depicted in Figure 6, the features extracted by SURF were used in both BoVW codebook generation and NoM feature matching to enhance efficiency.
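A minimal sketch of this feature-extraction step is shown below, assuming an OpenCV build that includes the non-free xfeatures2d module; the parameter values match those stated above, while the wrapper function is illustrative.

```python
import cv2

# SURF is provided by the contrib "xfeatures2d" module and requires an
# OpenCV build with the non-free algorithms enabled.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=1000,
                                   nOctaves=3,
                                   nOctaveLayers=4)

def extract_features(gray_image):
    """Detect SURF keypoints and compute descriptors on a down-sampled,
    grayscale UAV image; the same descriptors are later reused for both
    BoVW codebook generation and NoM matching."""
    keypoints, descriptors = surf.detectAndCompute(gray_image, None)
    return keypoints, descriptors
```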
In the third stage of the proposed methodology, image similarity is computed based on the BoVW vectors. To calculate image similarity, the feature descriptors extracted from the images are first clustered, and the cluster centroids form the codebook. Subsequently, all images in the dataset are represented as visual-word vectors using the generated codebook. During a similarity search, image similarity is computed as the cosine similarity between the vector of the target image and the vectors of all images in the dataset. This value ranges from 0 to 1, with higher values indicating greater similarity.
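The codebook construction and cosine-similarity search described here could be sketched as follows; the vocabulary size, the use of k-means clustering, and the histogram normalization are assumptions for illustration, since these details are not reported in the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def build_codebook(all_descriptors, n_words=500):
    """Cluster SURF descriptors pooled from all dataset images; the cluster
    centroids serve as the visual words of the codebook."""
    return KMeans(n_clusters=n_words, random_state=0).fit(all_descriptors)

def bovw_vector(descriptors, codebook):
    """Represent one image as a normalized histogram of visual words."""
    words = codebook.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(codebook.n_clusters + 1))
    return hist / max(hist.sum(), 1)

def similarity_to_target(target_vec, dataset_vecs):
    """Cosine similarity (0 to 1) between the target image vector and the
    vectors of all images in the dataset."""
    return cosine_similarity(target_vec.reshape(1, -1),
                             np.asarray(dataset_vecs))[0]
```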
In the fourth stage, the sizes of the candidate and image sets are determined based on the characteristics of each target image. This approach addresses the variability in the number of overlapping images depending on the capture path of the target image. To achieve this, we utilized the adaptive threshold determination method proposed by Jiang and Jiang (2020) [31]. This method leverages the fact that overlapping images have higher similarity scores than non-overlapping images and calculates similarity thresholds using statistical estimates of the similarity scores. To improve the accuracy of the image set, we deliberately selected a greater number of candidate images than the final set size. This approach ensures the inclusion of clearly overlapping images, which is essential for precise image location estimation [28]. The expanded candidate pool compensates for potential limitations in the BoVW codebook generation process, where visually similar but non-overlapping images might receive high similarity scores.
Figure 7a,b illustrates the application of this threshold determination method to Targets 1 and 2, respectively. The thresholds for both candidate and image set sizes were dynamically determined based on the characteristics of the targets and the distribution of similarity scores. Once the candidate size was established, images were selected based on their similarity scores and sorted in descending order. This approach effectively balances the need for comprehensive image sets while avoiding the inclusion of non-overlapping images, thereby optimizing the subsequent image localization process.
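As a rough sketch, the adaptive sizing could follow the pattern below; the mean-plus-k-standard-deviations cut-offs are an assumed simplification for illustration and are not the exact formulation of Jiang and Jiang (2020).

```python
import numpy as np

def adaptive_sizes(similarities, k_set=1.0, k_candidate=0.5):
    """Derive candidate-pool and final-set sizes from the distribution of
    BoVW similarity scores for one target image.

    Overlapping images tend to score higher than non-overlapping ones, so a
    statistical cut separates the two groups; the candidate cut is placed
    lower than the final-set cut, making the candidate pool deliberately
    larger than the final image set.
    """
    scores = np.asarray(similarities, dtype=float)
    mu, sigma = scores.mean(), scores.std()
    set_size = int((scores >= mu + k_set * sigma).sum())
    candidate_size = int((scores >= mu + k_candidate * sigma).sum())
    return candidate_size, set_size
```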
In the final stage of the proposed method, the candidate images were matched with the target image based on NoM to determine the overlapping image set. This method utilizes features directly extracted from images for matching, assuming that images with many matched features exhibit high similarity and overlap. To enhance efficiency, the features extracted in the previous stage were reused for matching. The image set was selected from the candidates based on a predetermined number of images, prioritizing images with a high number of matched features.
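The candidate re-ranking by NoM could be sketched as follows; the brute-force matcher and Lowe's ratio test (0.75) are conventional choices assumed here, and reusing the SURF descriptors extracted earlier mirrors the efficiency measure described above.

```python
import cv2

def count_matches(desc_target, desc_candidate, ratio=0.75):
    """Count putative feature matches between the target and one candidate,
    using brute-force kNN matching with Lowe's ratio test."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc_target, desc_candidate, k=2)
    return sum(1 for pair in knn
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

def select_image_set(desc_target, candidates, set_size):
    """Rank candidates by their number of matches (NoM) with the target and
    keep the `set_size` highest-ranked images.

    `candidates` is a list of (image_id, descriptors) pairs reusing the
    SURF descriptors computed in the earlier stage."""
    scored = [(img_id, count_matches(desc_target, desc))
              for img_id, desc in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:set_size]
```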
Figure 8a,b depicts the candidate images based on the BoVW codebook for Target 4 and the final selected image set using NoM. Images with yellow borders represent images overlapping with the target image, whereas those with red borders do not overlap. Although candidate selection based solely on a BoVW-based image search may include many non-overlapping images, the results of feature matching effectively exclude non-overlapping images. Consequently, even overlapping images with low similarity during candidate exploration can be correctly matched through the feature-matching process and selected as results.
2.3. Evaluation Method
In this study, the performance of the proposed methodology was validated by comparing and analyzing its results against those of the BoVW- and NoM-based approaches. The performance was evaluated based on the results of deriving overlapping image sets and on bundle adjustment performed using the derived overlapping image sets. The evaluation was conducted using the selected target images as references. Specifically, the similarity was assessed between the four selected target images and the entire corresponding dataset. Based on this assessment, bundle adjustment was performed for the four resulting overlapping image sets to evaluate the performance of the methodologies. The final image set size determined by the thresholds of the proposed methodology was applied to the BoVW and NoM methods so that the final overlapping image set of each method contained the same number of images. The specific details of each evaluation item are presented in Table 4.
The performance of the proposed method was evaluated in comparison with NoM- and BoVW-based approaches, focusing on both accuracy and computational efficiency. Methodologies that exhibit faster processing speeds while maintaining accuracy are considered superior in terms of performance. To quantify the accuracy of the overlapping image set results, three key metrics were employed.
1. Precision of Overlap: This metric measures the ratio of images in the set that genuinely overlap with the target image. True negatives are not considered in this calculation, as precision is prioritized over recall in image localization accuracy [28].
2. Average Overlap Area Ratio: This metric evaluates whether an image set consists of images with relatively high overlap, calculated as the mean ratio of overlap between the target image and the images in the set. Unlike precision, which assesses the presence of overlap, the average overlap area ratio measures the degree of overlap. An image set composed of images with a higher overlap tends to form more robust image networks, which is advantageous for precise bundle adjustment.
3. Correlation between Similarity and Overlap: This metric assesses the relationship between the similarity scores of images in the set and their actual overlap with the target image. A strong correlation indicates that the similarity scores accurately reflect the overlap ratio, enabling more effective thresholding and image selection. Consequently, a high correlation enables fine-tuning of thresholds to prevent the exclusion of images with high actual overlap but unexpectedly low similarity scores from the image set. A computation sketch of these three metrics is given below.
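Under assumed data structures (dictionaries of ground-truth overlap ratios and per-method similarity scores keyed by image ID), the three metrics could be computed as follows.

```python
import numpy as np

def evaluate_image_set(selected_ids, true_overlap, similarity):
    """Compute the three image-set accuracy metrics for one target.

    selected_ids : image IDs in the resulting image set
    true_overlap : dict image ID -> measured overlap ratio with the target
                   (0.0 for non-overlapping images)
    similarity   : dict image ID -> similarity score assigned by the method
    """
    overlaps = np.array([true_overlap.get(i, 0.0) for i in selected_ids])
    scores = np.array([similarity[i] for i in selected_ids])

    precision = float((overlaps > 0).mean())                   # 1. precision of overlap
    avg_overlap = float(overlaps.mean())                       # 2. average overlap area ratio
    correlation = float(np.corrcoef(scores, overlaps)[0, 1])   # 3. similarity-overlap correlation
    return precision, avg_overlap, correlation
```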
The evaluation of the bundle adjustment results using an image set involves assessing the precision of the exterior orientation parameters (EOPs) and the root mean square (RMS) residuals of the image coordinates. Bundle adjustment was performed individually for each target image and its corresponding set. Because it was assumed that some GNSS information might be missing, only half of the available GNSS information from the images was used. The number and arrangement of ground control points (GCPs) vary depending on the images included in the set; therefore, GCPs were not utilized, so that the impact of the image set obtained from the proposed methodology on bundle adjustment could be evaluated without this confounding factor. Furthermore, images that could potentially compromise the stability or accuracy of bundle adjustment owing to localization failures were excluded. An image set was judged superior when it yielded smaller deviations of the computed EOPs and smaller reprojection errors of the tie points in the bundle adjustment results.
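The exact formulations used by the bundle-adjustment software are not given here; the sketch below assumes the conventional definitions, namely EOP standard deviations taken from the adjustment covariance matrix and the RMS of tie-point reprojection residuals.

```python
import numpy as np

def eop_std(covariance):
    """Per-parameter standard deviations of the estimated EOPs, taken as
    the square roots of the diagonal of the adjustment covariance matrix."""
    return np.sqrt(np.diag(np.asarray(covariance)))

def rms_residual(observed_xy, reprojected_xy):
    """Root-mean-square reprojection residual of tie-point image
    coordinates (per-point Euclidean residuals, averaged in quadrature)."""
    diff = np.asarray(observed_xy) - np.asarray(reprojected_xy)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```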
3. Results
In this study, we conducted a comparative analysis between the proposed overlapping image-set determination methodology and the NoM- and BoVW-based approaches using four target images selected from two datasets. The image-set results of each method were evaluated based on accuracy and processing time, and the accuracy of the bundle adjustment results obtained with each image set was also compared. Specifically, we examined the precision, average overlap area ratio, and correlation between similarity and overlap from the perspective of overlapping image set selection accuracy, followed by a comparison of processing times; we then investigated the precision and reprojection error of the bundle adjustment results to further evaluate the performance of the methods.
The accuracy of the overlapping image set selection, evaluated based on precision, average overlap area ratio, and the correlation between similarity and overlap, was relatively higher for the proposed methodology and the NoM-based approach than for the BoVW-based approach (Table 5). Specifically, both the proposed methodology and the NoM-based approach exhibited an average precision of 96%, whereas the BoVW-based approach showed a lower precision of 62%. In particular, for Targets 3 and 4, the proposed methodology and the NoM-based approach each achieved 100% precision, whereas the BoVW-based approach exhibited precisions of 60% and 50%, respectively. This indicates that the image sets selected by the proposed methodology and NoM for Targets 3 and 4 were correctly composed of genuinely overlapping images, whereas 40% and 50% of the image sets selected by BoVW consisted of incorrectly selected non-overlapping images. This discrepancy may be attributed to the uniform textures of Dataset 2, from which Targets 3 and 4 were chosen; such uniformity reduces the diversity of the BoVW codebook and results in relatively lower accuracy.
The average overlap area ratio was highest for the NoM-based method, followed by the proposed method and the BoVW-based method, with average values of 0.57, 0.55, and 0.40, respectively (Table 6).
Figure 9, Figure 10 and Figure 11 illustrate the actual overlapping areas of the image set selected for Target 2. In these figures, non-overlapping images are delineated with a red border, whereas overlapping images are demarcated with a yellow border indicating their respective overlapping areas. Both the proposed method (Figure 9) and the NoM-based approach (Figure 10) yielded an average overlap area ratio of 0.53, suggesting that more than half of the images constituting the image set overlapped by more than 50% with Target 2. By contrast, the BoVW-based approach (Figure 11) demonstrated an average overlap area ratio of 0.40, signifying a relatively low overlap ratio among the images forming the image set. Consequently, it can be inferred that the proposed method and the NoM-based approach generate image sets with a significantly higher overlap ratio than the BoVW-based approach.

Finally, the correlation between similarity and overlap was highest for the proposed methodology, followed by the NoM-based approach and the BoVW-based approach. The proposed methodology and the NoM-based approach exhibited average correlations of 0.88 and 0.84, respectively, while BoVW showed a relatively lower correlation of 0.53 (Table 7). In particular, for Target 4, the correlation of the proposed methodology was very high at 0.97, whereas that of the BoVW-based approach was low at 0.21. Conversely, the BoVW-based approach exhibited a higher correlation than the other methods for Target 1; however, the differences in the correlation values among the methodologies were not significant. Overall, considering the precision, average overlap area ratio, and the correlation between similarity and overlap, both the proposed methodology and the NoM-based approach consistently outperformed the BoVW-based approach (Figure 12). Therefore, in terms of accuracy, the proposed methodology and the NoM-based approach are relatively superior.
In terms of processing speed, the BoVW-based approach was the most efficient, followed by the proposed methodology and then the NoM-based approach. The processing speed was measured as the average time required to determine the overlapping image set for each target image in the dataset, with Dataset 1 including Targets 1 and 2 and Dataset 2 including Targets 3 and 4. In Dataset 1, the NoM-based approach required an average of 15.76 s per image, approximately ten times longer than the BoVW-based approach (1.51 s) and the proposed methodology (1.72 s) (Table 8, Figure 13). In Dataset 2, which had a relatively higher resolution, all three methods exhibited increased average processing times compared with Dataset 1. The BoVW-based approach and the proposed methodology demonstrated average processing times of 4.13 s and 7.78 s per image, respectively, corresponding to only 0.06 and 0.11 times the NoM-based approach's average processing time of 73.40 s. Overall, the processing times of the BoVW-based approach and the proposed methodology were significantly lower than those of the NoM-based approach, indicating superior performance in terms of processing speed.
The bundle adjustment results for the overlapping image sets indicate that the proposed method and the NoM-based approach achieve relatively higher accuracy than the BoVW-based approach. Table 9 presents the bundle adjustment accuracy results for the overlapping image sets of each method. As the size of the image sets was standardized, the processing time for bundle adjustment was consistent across all methods. The standard deviation of the EOPs, representing precision, ranged from a maximum of 0.200 m to a minimum of 0.053 m for the proposed method and from a maximum of 0.217 m to a minimum of 0.044 m for the NoM-based approach, indicating comparable levels; both were considerably lower, and thus more precise, than those of the BoVW-based approach, which ranged from a maximum of 0.693 m to a minimum of 0.153 m. Similarly, the RMS residual averaged approximately 0.750 for the proposed method and 0.830 for the NoM-based approach, both lower than the BoVW-based approach's average of 1.558.
During the bundle adjustment process, some non-overlapping images included in the image set were automatically excluded. Generally, a greater number of images were successfully utilized in bundle adjustment with the proposed methodology and the NoM-based approach than with the BoVW-based approach. However, in the case of Target 2, the BoVW-based approach yielded 18 successful images, more than the 16 images obtained with the proposed methodology and the NoM-based approach. This suggests the formation of two separate bundles rather than a single coherent network, indicating potential dataset fragmentation; when bundles are formed from less robust networks, such fragmentation often leads to decreased accuracy. Overall, the accuracy of bundle adjustment for the image sets selected using the proposed methodology and the NoM-based approach was relatively higher. Consequently, it is inferred that the proposed overlapping image-set determination method and the NoM-based method contribute more to improving localization accuracy than the BoVW-based method.
4. Conclusions
In this study, we proposed an enhanced method for determining overlapping image sets to improve the accuracy and efficiency of image localization. Our approach integrates the NoM-based approach, which is known for its relatively high accuracy but low efficiency, with the BoVW-based method, which exhibits a relatively fast processing speed but lower accuracy. The method sequentially applies image-similarity searches based on BoVW and NoM to construct overlapping image sets and involves the following steps. First, we applied down-sampling to the input images to improve the efficiency of the overall process. We then extracted feature points using the SURF algorithm and conducted a BoVW-based image search using these feature points. Based on the BoVW-based image similarity, we determined the sizes of the candidate and image sets and selected candidates accordingly. Finally, we determined the image sets by evaluating the similarity of the selected candidates based on the NoM.
The proposed method demonstrated improved performance compared with both the NoM- and BoVW-based approaches. A comparative analysis of the four target images revealed that the proposed method achieved higher accuracy than the BoVW-based approach while also achieving faster processing speeds than the NoM-based approach. The average precision of the proposed method matched that of the NoM-based approach at 96%, surpassing the 62% achieved by the BoVW-based approach. Additionally, metrics such as the overlap area ratio and the correlation between similarity and overlap showed higher values for the proposed method, averaging 0.55 and 0.88, respectively, compared with 0.40 and 0.53 for the BoVW-based approach. Furthermore, the processing time of the proposed method was only approximately 0.11 times that of the NoM-based approach, achieving a similar level of accuracy at a faster pace. Moreover, in the bundle adjustment results using the derived image sets, the proposed method exhibited higher precision (smaller EOP deviations) and lower RMS residuals than the BoVW-based approach, indicating its potential for more accurate estimation of image positions.
Overall, the proposed method for determining overlapping image sets is expected to enhance the accuracy and efficiency of image localization, particularly for UAV image datasets that lack prior location information, such as GNSS data. In future research, we will focus on evaluating the proposed method across diverse environmental conditions and seasons to assess its robustness in various scenarios. We aim to incorporate deep learning-based feature extraction techniques to improve performance under challenging conditions. Ultimately, by developing high-efficiency and high-precision image-set determination methodologies, we seek to further enhance the accuracy and efficiency of localization, thereby increasing the applicability of this study.