Article

Overlapping Image-Set Determination Method Based on Hybrid BoVW-NoM Approach for UAV Image Localization

Juyeon Lee and Kanghyeok Choi *
Department of Geoinformatic Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon 22212, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(13), 5839; https://doi.org/10.3390/app14135839
Submission received: 7 May 2024 / Revised: 26 June 2024 / Accepted: 28 June 2024 / Published: 4 July 2024

Abstract
With the increasing use of unmanned aerial vehicles (UAVs) in various fields, achieving the precise localization of UAV images is crucial for enhancing their utility. Photogrammetry-based techniques, particularly bundle adjustment, serve as foundational methods for accurately determining the spatial coordinates of UAV images. The effectiveness of bundle adjustment is significantly influenced by its input data, particularly the composition of overlapping image sets, which affects both the accuracy of spatial coordinate determination and the computational efficiency of UAV image localization. A strategic approach to selecting overlapping images is therefore crucial for optimizing the performance of bundle adjustment in UAV image processing. In this context, we propose an efficient methodology for determining overlapping image sets. The proposed method selects overlapping images based on image similarity, leveraging the complementary strengths of the bag of visual words (BoVW) and number of matches (NoM) techniques: BoVW is used for fast candidate selection, and NoM provides an additional, more accurate similarity assessment for the final image-set determination, so that the method achieves both high accuracy and high speed. We compared the performance of the proposed methodology with conventional NoM- and BoVW-based methods for overlapping image-set determination. In the comparative evaluation, the proposed method demonstrated an average precision of 96%, comparable to that of the NoM-based approach and well above the 62% precision of the BoVW-based method. Moreover, its processing time was approximately 0.11 times that of the NoM-based method (a reduction of roughly 89%), demonstrating relatively high efficiency. Furthermore, in the bundle adjustment results using the derived image sets, the proposed method, along with the NoM-based method, showed reprojection errors of less than 1 pixel, indicating relatively high accuracy and contributing to improved accuracy in estimating image positions.

1. Introduction

In recent years, unmanned aerial vehicles (UAVs) have been utilized in various fields, such as transportation, construction management, urban planning, agriculture, and disaster response, owing to their relatively low acquisition costs and ability to cover diverse regions [1,2,3,4,5]. To maximize the utility of UAV images, the information they capture must be localized, which requires the precise calculation of each image's position [3,6,7]. Photogrammetric processes such as bundle adjustment are essential for accurately determining the location and orientation at which each UAV image was acquired [8].
Selecting match pairs from overlapping UAV images to form image sets for bundle adjustment is a critical step that affects the accuracy and efficiency of UAV image localization. Localization methods such as structure from motion (SfM) and visual-SLAM, which are based on bundle adjustment, derive positional information from overlapping images [9,10]. The composition of an image set with a high proportion of overlaps significantly affects the precision of the bundle adjustment. Hence, identifying and composing overlapping images into image sets is vital to ensuring the accuracy of UAV localization. Moreover, the method used for determining image sets also affects the efficiency of image localization because selecting overlapping images can pose computational challenges [11,12,13,14].

1.1. Previous Studies

The methods for selecting overlapping UAV images can be broadly categorized into two types: prior knowledge-based methods and visual similarity-based methods [12]. Prior knowledge-based methods utilize data from the positioning and orientation system (POS), which comprises GNSS and IMU devices mounted on a UAV, to select overlapping images. By contrast, visual similarity-based methods estimate image similarity based on feature information to select overlapping images. A detailed explanation of each methodology is provided below.
Prior knowledge-based methods select overlapping images based on the observation that images with close POS locations typically exhibit a higher spatial overlap. This approach, widely used to determine overlapping UAV image sets, offers simplicity and high computational efficiency [15]. For example, Liang et al. (2021) utilized POS and elevation data to select overlapping images based on their footprints [16]. However, prior knowledge-based methods are difficult to apply when the GNSS positional accuracy is insufficient or when GNSS information is unavailable. Consequently, many researchers have shifted their focus towards visual similarity-based overlapping image selection.
Visual similarity-based methods estimate the similarity between UAV images based on the features extracted from the images and select the overlapping images. Therefore, they offer the advantage of constructing overlapping image sets without relying on prior knowledge, which makes them applicable to various data acquisition environments. These methods can be further subdivided into number of matches (NoM) approaches, which utilize the count of matched keypoints, and clustering-based image-retrieval methods [12].
NoM methods evaluate similarity based on the number of matched keypoints, with a higher count indicating greater similarity. This approach is based on the principle that images with a high degree of overlap share numerous common features, resulting in a larger number of matched keypoints between them. The methodology involves conducting feature matching across all images using algorithms such as the scale-invariant feature transform (SIFT) or speeded-up robust features (SURF); images with a substantial number of matches are identified as highly overlapping. NoM-based methodologies have been utilized in various studies because of their straightforward application and highly accurate overlapping-image results. Verykokou and Ioannidis (2016, 2018) proposed a methodology for enhancing the efficiency of NoM-based methods by conducting stepwise feature matching through down-sampling and original-image processing to select overlapping images [17,18]. Wu (2013) suggested sorting the features of each image by size, performing matching only for the top-scale features, and considering an image pair with matches above a certain threshold as overlapping [19]. Furthermore, he applied the proposed methodology as the overlapping image-set determination method in the incremental structure-from-motion software VisualSFM (version 0.5.26) [20].
Image retrieval involves extracting features from images and clustering them into high-dimensional vectors, which are then used to identify overlapping images. The vocabulary tree converts features into bag of visual words (BoVW) vectors and is widely used for selecting image sets for localization in SfM software such as COLMAP, AliceVision, and Pix4Dmapper [21,22,23]. BoVW represents an image as a histogram of visual words, which are quantized representations of local image features; this transformation of complex image data into a more manageable form facilitates efficient image comparison and retrieval. The efficiency and flexibility of the vocabulary tree have been leveraged to achieve efficient overlapping image-set selection in large-scale oblique aerial datasets [9,21,22,23,24,25]. Kato et al. (2017, 2022) improved the evaluation of image similarity by combining set-based similarity calculation methods with BoVW-based vector similarity calculation methods, enhancing the accuracy of overlapping image-set selection from large-scale datasets collected from the web [26,27]. Havlena et al. (2010, 2013) enhanced the efficacy of 3D reconstruction by reducing the size of the overlapping image set through BoVW-based image retrieval when dealing with extremely large datasets containing a multitude of duplicate or similar viewpoints [28,29]. When searching for overlapping images, fixing the number of images to be retrieved can compromise the completeness of subsequent processes [30,31]. To address this issue, Jiang and Jiang (2020) proposed an adaptive threshold selection method that dynamically determines the number of images to be retrieved based on similarity scores [31].

1.2. Research Objectives

Research has been conducted to improve the efficiency and accuracy of visual similarity-based methods; however, certain limitations remain. NoM methods achieve high accuracy by matching features extracted directly from each image to determine similarity. However, despite strategies such as scaling or matching constraints, the execution time grows rapidly with the number of images, because the number of image pairs to be matched grows quadratically, which limits efficiency [12]. By contrast, BoVW vector-based image-retrieval methods demonstrate high efficiency and are applicable to large datasets; however, their performance varies depending on the selection and representation of visual words [32]. Clustering image features can lead to significant information loss, potentially resulting in the omission of detailed texture information and reduced accuracy, particularly in scenarios involving repetitive patterns.
This study aimed to develop an enhanced visual similarity-based method for determining overlapping image sets and improving the efficiency of image localization. The approach synergistically combines the BoVW and NoM techniques to leverage their complementary strengths. Initially, a BoVW vector-based image search rapidly extracts candidate image sets, significantly reducing the computational load. Subsequently, NoM-based similarity is applied exclusively to these candidates to derive the final image set, achieving high accuracy without relying on additional information such as GNSS data. This hybrid methodology effectively balances precision and computational efficiency, overcoming the limitations inherent in utilizing either method independently. Consequently, the proposed approach demonstrates superior accuracy compared to BoVW alone while simultaneously achieving improved computational efficiency relative to the exclusive use of NoM.
The remainder of this paper is organized as follows. Section 2 provides a comprehensive explanation of each step of the proposed methodology and of the evaluation method. In Section 3, the proposed methodology and conventional NoM- and BoVW-based approaches are applied to UAV image datasets for overlapping image-set determination, and the outcomes are evaluated against the actual overlapping images. Bundle adjustment is then performed on the image sets derived from each method to assess the impact of the proposed methodology on the accuracy of estimating image positional information. Section 4 presents the conclusions of this study.

2. Proposed Methodology

2.1. Test Sites and Datasets

We validated the proposed overlapping image-set determination method against conventional NoM- and BoVW-based approaches using two UAV image datasets captured under different conditions. The two datasets were obtained with different camera models and flight altitudes, resulting in different resolutions (Table 1). Dataset 1 comprises 386 images captured at an altitude of approximately 126 m over an urban area, each with a resolution of 5472 × 3080 pixels. Dataset 2 consists of 394 images captured at an altitude of approximately 178 m over a construction site, each with a resolution of 8192 × 5460 pixels. An overview of both datasets is presented in Figure 1. Dataset 1 was acquired through a linear flight predominantly along arterial roads in an urban area, resulting in insufficient overlap between images in certain sections along the flight path; the target areas included parking lots, vegetation, roads, and buildings of varying heights. In contrast, Dataset 2 was collected in a rural area with a grid flight pattern to survey construction sites, covering several roads, construction sites, and bare land, and it exhibits repetitive characteristics owing to the nature of the surveyed areas. From each dataset, two target images were selected to evaluate whether the proposed method effectively selects overlapping images; for each target, the actual overlapping images and their overlap ratios were measured to assess how well each method constructs image sets from highly overlapping images.
In each dataset, two images with distinct shooting paths and visual characteristics were selected as targets to assess the image-set results. As shown in Figure 1, each target was chosen from flight paths with varying geometric structures. Figure 2 and Figure 3 display the individual target images. Each target overlapped with multiple images at varying ratios, as shown in Figure 4. Constructing image sets from highly overlapping images can enhance localization accuracy; therefore, the performance of the proposed method was evaluated by comparing the actual overlapping images and their overlap ratios for each target and assessing its ability to select overlapping images relative to the conventional methods.
As the targets were captured from different flight paths, the number and proportion of overlapping images varied for each target. Table 2 presents the dataset to which each target belongs, along with the maximum and minimum overlap area ratios. The first target was captured at the point where two straight paths intersected in Dataset 1, which included roads and parking lots. There were 34 overlapping images, with overlap ratios ranging from 90% to 8%. The second target was captured along a single path in Dataset 1, depicting roads and vegetation with 18 overlapping images and overlap ratios ranging from 90% to 8%. Target 3 was captured at a point where multiple straight paths intersected in Dataset 2, which featured a construction site. There were 53 overlapping images, with overlap ratios ranging from 80% to 5%. Finally, Target 4 was captured at a bend in the path on the outskirts of Dataset 2, showing bare land. There were 31 overlapping images, with overlap ratios varying from 90% to 1%.

2.2. Hybrid BoVW-NoM Technique

This study proposes an image-similarity discrimination technique that sequentially applies BoVW- and NoM-based methodologies to achieve an accurate and efficient determination of overlapping images. Specifically, the proposed method combines the relatively fast processing of BoVW with the higher accuracy of NoM, ensuring both speed and accuracy in the image-similarity assessment. The method comprises three main stages, implemented as five sequential steps: (i) image preprocessing, (ii) identification of image-set candidates, and (iii) determination of the overlapping image set (Figure 5). First, in the preprocessing stage, down-sampling was applied to ensure efficiency throughout the entire process. Subsequently, in the candidate-selection stage, features were extracted, a rapid BoVW-based similarity search was performed, and the image-set candidates were selected. Finally, similarities between the selected candidate images and the target were assessed using NoM to determine the overlapping image set.
In the preprocessing stage, down-sampling was applied to the input images to reduce their resolution, thereby enhancing the efficiency of the overall image-set selection. UAV images typically have high resolution, resulting in a large number of extracted features. This, in turn, increases the computational workload required for constructing the BoVW codebook and performing feature matching. Moreover, an excessive number of features may lead to redundancy among the features included in the codebook, thereby adversely affecting the efficiency of the similarity search [32]. Therefore, in this study, we aimed to improve the efficiency of image-set determination by decreasing the resolution of the original UAV images. However, excessively low resolutions may result in insufficient features for a similarity search, necessitating judicious determination of the down-sampling level.
In this study, the optimal down-sampling level for the dataset was determined to be 25%, based on its impact on processing time and the quality of the overlapping image set results. To identify this optimal level, we conducted preliminary experiments comparing the computational efficiency and accuracy of the proposed overlapping image set selection method using the original images and those down-sampled to 50%, 25%, and 10% of their original dimensions. As shown in Table 3, the total execution time decreased significantly as the size of the images decreased. However, the number of extracted features and image similarity scores decreased significantly. Therefore, identifying accurate overlapping images at a 10% image scale proved to be challenging. Consequently, a down-sampling ratio of 25% was chosen to maintain accuracy while optimizing execution time.
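A minimal sketch of this preprocessing step, assuming OpenCV (the function name, grayscale read, and INTER_AREA interpolation are our assumptions; the paper does not publish its implementation):

```python
import cv2

def downsample_images(image_paths, scale=0.25):
    """Down-sample UAV images before feature extraction.

    scale=0.25 matches the 25% level selected above; INTER_AREA averages
    source pixels when shrinking, which limits aliasing artifacts.
    """
    small_images = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # grayscale suffices for SURF
        small = cv2.resize(img, None, fx=scale, fy=scale,
                           interpolation=cv2.INTER_AREA)
        small_images.append(small)
    return small_images
```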
In the second step of the proposed method, feature extraction was conducted on the down-sampled drone images using SURF, a method widely adopted in BoVW-based image retrieval because it represents the local features of images efficiently and reliably [33,34,35,36]. Specifically, SURF was implemented with the following parameters: a Hessian threshold of 1000, 3 octaves, and 4 scale levels per octave. As depicted in Figure 6, the features extracted by SURF were utilized in both BoVW codebook generation and NoM feature matching to enhance efficiency.
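A minimal sketch of this feature-extraction step, assuming the opencv-contrib build of OpenCV with the non-free xfeatures2d module enabled (the parameters follow the text; the helper name is ours):

```python
import cv2

# SURF parameters as stated in the text: Hessian threshold 1000,
# 3 octaves, 4 scale levels per octave.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=1000,
                                   nOctaves=3, nOctaveLayers=4)

def extract_surf_features(image):
    """Return SURF keypoints and 64-D descriptors for one down-sampled image."""
    keypoints, descriptors = surf.detectAndCompute(image, None)
    return keypoints, descriptors
```

The same descriptors feed both the codebook generation and the later NoM matching, which is the reuse shown in Figure 6.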
In the third step of the proposed methodology, image similarity was computed based on BoVW vectors. Descriptors of the features extracted from the images were first clustered, and the cluster centroids formed the codebook. Subsequently, all images in the dataset were represented as word vectors using the generated codebook. During a similarity search, the image similarity was computed as the cosine similarity between the BoVW vector of each image in the dataset and that of the target image. This value ranges from 0 to 1, with higher values indicating greater similarity.
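The codebook construction and cosine-similarity search can be sketched as follows (a hypothetical implementation: the clustering backend, the vocabulary size of 1024, and the function names are our assumptions, as the paper does not specify them):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_codebook(descriptor_list, n_words=1024, seed=0):
    """Cluster all SURF descriptors; the cluster centroids form the codebook."""
    kmeans = MiniBatchKMeans(n_clusters=n_words, random_state=seed)
    kmeans.fit(np.vstack(descriptor_list))
    return kmeans

def bovw_vector(kmeans, descriptors):
    """Represent one image as an L2-normalized histogram of visual words."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def cosine_similarity(u, v):
    """Cosine similarity of two normalized BoVW vectors; for non-negative
    histograms the value lies in [0, 1], as stated in the text."""
    return float(np.dot(u, v))
```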
In the fourth step, the sizes of the candidate set and the final image set were determined based on the characteristics of each target image, addressing the variability in the number of overlapping images depending on the capture path of the target. To achieve this, we utilized the adaptive threshold determination method proposed by Jiang and Jiang (2020) [31], which leverages the fact that overlapping images have higher similarity scores than non-overlapping images and calculates similarity thresholds using statistical estimates of the similarity scores. To improve the accuracy of the image set, we deliberately selected more candidate images than the final set size. This ensures the inclusion of clearly overlapping images, which is essential for precise image-location estimation [28], and compensates for a limitation of BoVW codebook generation, in which visually similar but non-overlapping images may receive high similarity scores.
Figure 7a,b illustrates the application of this threshold determination method to Targets 1 and 2, respectively. The thresholds for both candidate and image set sizes were dynamically determined based on the characteristics of the targets and the distribution of similarity scores. Once the candidate size was established, images were selected based on their similarity scores and sorted in descending order. This approach effectively balances the need for comprehensive image sets while avoiding the inclusion of non-overlapping images, thereby optimizing the subsequent image localization process.
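The following is a simplified stand-in for this adaptive sizing step; the exact statistical estimate used by Jiang and Jiang (2020) [31] differs, and the mean-plus-standard-deviation threshold and the expansion factor shown here are illustrative assumptions:

```python
import numpy as np

def select_candidates(similarities, expand=2.0, k=1.0):
    """Adaptively size the candidate pool and final image set.

    Scores above mean + k * std are treated as likely overlaps and define
    the final image-set size; the candidate pool passed to NoM matching is
    then deliberately enlarged by `expand` so that overlapping images with
    modest BoVW scores are not discarded prematurely.
    """
    scores = np.asarray(similarities, dtype=float)
    threshold = scores.mean() + k * scores.std()
    set_size = max(1, int((scores > threshold).sum()))
    candidate_size = min(len(scores), int(round(expand * set_size)))
    ranked = np.argsort(scores)[::-1]   # indices sorted by descending similarity
    return ranked[:candidate_size], set_size
```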
In the final step of the proposed method, the candidate images were matched with the target image based on NoM to determine the overlapping image set. This step uses features extracted directly from the images for matching, under the assumption that images with many matched features exhibit high similarity and overlap. To enhance efficiency, the features extracted in the second step were reused for matching. The image set was then selected from the candidates according to the predetermined set size, prioritizing images with a high number of matched features. Figure 8a,b depicts the BoVW-based candidate images for Target 4 and the final image set selected using NoM; images with yellow borders overlap with the target image, whereas those with red borders do not. Although candidate selection based solely on a BoVW image search may include many non-overlapping images, feature matching effectively excludes them. Consequently, even overlapping images with low similarity scores during candidate exploration can be correctly matched through feature matching and selected for the final set.
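A sketch of this final NoM step, reusing the SURF descriptors from the earlier step (brute-force matching with Lowe's ratio test is our assumption; the paper does not specify its match filter):

```python
import cv2

def nom_image_set(target_desc, candidates, set_size, ratio=0.7):
    """Rank candidate images by number of matches (NoM) against the target.

    candidates: list of (image_id, descriptor_array) pairs, reusing the
    SURF descriptors extracted earlier.
    """
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    counts = []
    for image_id, desc in candidates:
        pairs = matcher.knnMatch(target_desc, desc, k=2)
        good = [p[0] for p in pairs
                if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        counts.append((image_id, len(good)))
    counts.sort(key=lambda t: t[1], reverse=True)  # most matched features first
    return [image_id for image_id, _ in counts[:set_size]]
```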

2.3. Evaluation Method

In this study, the performance of the proposed methodology was validated by comparing its results with those of the BoVW- and NoM-based approaches. Performance was evaluated both on the derived overlapping image sets and on bundle adjustments performed using those sets. The evaluation used the four selected target images as references: similarity was assessed between each target and the entire corresponding dataset, and bundle adjustment was then performed for the four resulting overlapping image sets. The final image-set size determined by the thresholds of the proposed methodology was also applied to the BoVW and NoM methods so that the final overlapping image set of each method contained the same number of images. The specific details of each evaluation item are presented in Table 4.
The performance of the proposed method was evaluated in comparison with NoM- and BoVW-based approaches, focusing on both accuracy and computational efficiency. Methodologies that exhibit faster processing speeds while maintaining accuracy are considered superior in terms of performance. To quantify the accuracy of the overlapping image set results, three key metrics were employed.
1. Precision of overlap: This metric measures the ratio of images in the set that genuinely overlap with the target image. True negatives are not considered in this calculation, as precision is prioritized over recall in image localization accuracy [28].
2. Average overlap area ratio: This metric evaluates whether an image set consists of images with relatively high overlap, calculated as the mean ratio of overlap between the target image and the images in the set. Unlike precision, which assesses the presence of overlap, the average overlap area ratio measures its degree. An image set composed of images with higher overlap tends to form a more robust image network, which is advantageous for precise bundle adjustment.
3. Correlation between similarity and overlap: This metric assesses the relationship between the similarity scores of images in the set and their actual overlap with the target image. A strong correlation indicates that the similarity scores accurately reflect the overlap ratio, enabling more effective thresholding and image selection. Consequently, a high correlation enables fine-tuning of thresholds to prevent the exclusion of images with high actual overlap but unexpectedly low similarity scores.
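As a minimal computational sketch of these three metrics (the data structures and names are illustrative assumptions, not from the paper):

```python
import numpy as np

def evaluate_image_set(selected_ids, true_overlap, similarity):
    """Compute precision, average overlap area ratio, and |correlation|.

    true_overlap: image id -> measured overlap ratio with the target
                  (0.0 for non-overlapping images).
    similarity:   image id -> similarity score used during selection.
    """
    overlaps = np.array([true_overlap.get(i, 0.0) for i in selected_ids])
    scores = np.array([similarity[i] for i in selected_ids])

    precision = float((overlaps > 0).mean())                       # metric 1
    avg_overlap_ratio = float(overlaps.mean())                     # metric 2
    correlation = float(abs(np.corrcoef(scores, overlaps)[0, 1]))  # metric 3
    return precision, avg_overlap_ratio, correlation
```

The absolute value in the correlation matches the reporting convention of Table 7.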
The bundle adjustment results using each image set were evaluated in terms of the precision of the exterior orientation parameters (EOPs) and the root mean square (RMS) residuals of the image coordinates. Bundle adjustment was performed individually for each target image and its corresponding set. Because it was assumed that some GNSS information might be missing, only half of the available GNSS information from the images was used. The number and arrangement of ground control points (GCPs) vary depending on the images included in the set; therefore, GCPs were not utilized, so that the impact of the image set obtained from each methodology on bundle adjustment could be evaluated in isolation. Furthermore, images that could compromise the stability or accuracy of bundle adjustment owing to localization failures were excluded. An image set was judged superior when it yielded smaller deviations of the computed EOPs and smaller reprojection errors of the tie points in the bundle adjustment results.
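For reference, the RMS residual of the image coordinates follows the conventional definition over the N tie-point observations, where (x_i, y_i) are the measured image coordinates and (x̂_i, ŷ_i) their reprojections from the adjusted parameters (the paper does not state the formula explicitly; this is the standard form):

```latex
\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2\right]}
```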

3. Results

In this study, we conducted a comparative analysis between the proposed overlapping image-set determination methodology and the NoM- and BoVW-based approaches using four target images selected from two datasets. The image-set results of each method were evaluated in terms of accuracy and processing time, and the accuracy of the bundle adjustment results obtained with each image set was also compared. Specifically, we examined the precision, average overlap area ratio, and correlation between similarity and overlap from the perspective of image-set selection accuracy, followed by a comparison of processing times. We then investigated the precision and reprojection error of the bundle adjustment results to further evaluate the performance of the methods.
The accuracy of overlapping image-set selection, evaluated based on precision, average overlap area ratio, and the correlation between similarity and overlap, was relatively higher for the proposed methodology and the NoM-based approach than for the BoVW-based approach (Table 5). Specifically, both the proposed methodology and the NoM-based approach exhibited an average precision of 96%, whereas the BoVW-based approach showed a lower precision of 62%. In particular, for Targets 3 and 4, the proposed methodology and the NoM-based approach each demonstrated 100% precision, whereas the BoVW-based approach exhibited precisions of 60% and 50%, respectively. This indicates that the image sets selected by the proposed methodology and NoM for Targets 3 and 4 were composed entirely of genuinely overlapping images, whereas 40% and 50% of the image sets selected by BoVW consisted of incorrectly selected non-overlapping images. This discrepancy may be attributed to a lack of diversity in the BoVW codebook: Dataset 2, from which Targets 3 and 4 were chosen, consists of uniform textures, which limits codebook diversity and lowers accuracy.
The average overlap area ratio was the highest in the following order: the NoM-based method, proposed method, and BoVW-based method, with average values of 0.57, 0.55, and 0.40, respectively (Table 6). Figure 9, Figure 10 and Figure 11 illustrate the actual overlapping areas of the image set selected for Target 2. In these figures, non-overlapping images are delineated with a red border, whereas overlapping images are demarcated with a yellow border, indicating their respective overlapping areas. Both the proposed method (Figure 9) and the NoM-based approach (Figure 10) yielded an average overlap area ratio of 0.53, suggesting that more than half of the images constituting the image set overlapped by more than 50% with Target 2. By contrast, the BoVW-based approach (Figure 11) demonstrated an average overlap area ratio of 0.40, signifying a relatively low overlap ratio between the images forming an image set. Consequently, it can be inferred that the proposed method and NoM-based approach generate an image set with a significantly higher overlap ratio than the BoVW-based approach.
Finally, the correlation between similarity and overlap was the highest in the following order: the proposed methodology, NoM-based approach, and BoVW-based approach. The proposed methodology and the NoM-based approach exhibited average correlations of 0.88 and 0.84, respectively, while BoVW showed a relatively lower correlation of 0.53 (Table 7). In particular, for Target 4, the correlation of the proposed methodology was very high at 0.97, whereas that of the BoVW-based approach was low at 0.21. Conversely, the BoVW-based approach exhibited a higher correlation than the other methods for Target 1; however, the differences in the correlation values among the methodologies were not significant. Overall, considering the precision, average overlap area ratio, and the correlation between similarity and overlap, both the proposed methodology and the NoM-based approach consistently outperformed the BoVW-based approach (Figure 12). Therefore, in terms of accuracy, the proposed methodology and the NoM-based approach appear to be relatively superior.
In terms of processing speed, the BoVW-based approach was the most efficient, followed by the proposed methodology and then the NoM-based approach. Processing speed was measured as the average time required to determine the overlapping image set for each target image in the dataset, with Dataset 1 including Targets 1 and 2, and Dataset 2 including Targets 3 and 4. In Dataset 1, the NoM-based approach required approximately ten times more processing time, averaging 15.76 s per image, than the BoVW-based approach (1.51 s) and the proposed methodology (1.72 s) (Table 8, Figure 13). In Dataset 2, which had a relatively higher resolution, all three methods exhibited increased average processing times compared with Dataset 1. The BoVW-based approach and the proposed methodology demonstrated average processing times of 4.13 s and 7.78 s per image, respectively, i.e., approximately 0.06 and 0.11 times the NoM-based approach's average of 73.40 s. Overall, the processing times of the BoVW-based approach and the proposed methodology were significantly lower than those of the NoM-based approach, indicating superior performance in terms of processing speed.
The bundle adjustment results for the overlapping image sets indicate that the proposed method and the NoM-based approach achieve relatively higher accuracy than the BoVW-based approach. Table 9 presents the bundle adjustment accuracy for the overlapping image sets of each method. As the size of the image sets was standardized, the processing time for bundle adjustment was consistent across all methods. The mean standard deviations of the EOPs, representing precision, ranged from a minimum of 0.053 m (position) to a maximum of 0.200° (attitude) for the proposed method and from 0.044 m to 0.217° for the NoM-based approach, indicating comparable levels; both were smaller, and thus more precise, than those of the BoVW-based approach, which ranged from 0.153 m to 0.693°. Similarly, the RMS residual averaged approximately 0.750 pixels for the proposed method and 0.830 pixels for the NoM-based approach, lower than the 1.558-pixel average of the BoVW-based approach.
During the bundle adjustment process, some non-overlapping images included in the image set were automatically excluded. In general, more images were successfully utilized in bundle adjustment with the proposed methodology and the NoM-based approach than with the BoVW-based approach. However, in the case of Target 2, the BoVW approach yielded 18 successful images, more than the 16 obtained using the proposed methodology and the NoM-based approach; these 18 images, however, formed two separate blocks rather than a single coherent network, indicating fragmentation of the dataset. When bundles are formed with less robust networks, such fragmentation often decreases accuracy. Overall, the accuracy of bundle adjustment for the image sets selected using the proposed methodology and the NoM-based approach was relatively higher. Consequently, it is inferred that the proposed overlapping image-set determination method and the NoM-based method contribute more to improving localization accuracy than the BoVW-based method.

4. Conclusions

In this study, we proposed an enhanced method for determining overlapping image sets to improve the accuracy and efficiency of image localization. Our approach integrates the NoM-based approach, known for its relatively high accuracy but low efficiency, with the BoVW-based method, which exhibits a relatively faster processing speed but lower accuracy, by sequentially applying BoVW- and NoM-based image-similarity searches to construct overlapping image sets. The method involves the following steps. First, down-sampling is applied to the input images to improve the efficiency of the overall process. Feature points are then extracted using the SURF algorithm and used for a BoVW-based image search. Based on the BoVW image similarity, the sizes of the candidate and final image sets are determined, and candidates are selected accordingly. Finally, the image set is determined by evaluating the similarity of the selected candidates based on NoM.
The proposed method demonstrated improved performance compared with both the NoM- and BoVW-based approaches. A comparative analysis of the four target images revealed that the proposed method achieved higher accuracy than the BoVW-based approach while processing faster than the NoM-based approach. The average precision of the proposed method matched that of the NoM-based approach at 96%, compared with 62% for the BoVW-based approach. Additionally, the average overlap area ratio and the correlation between similarity and overlap were higher for the proposed method, averaging 0.55 and 0.88, respectively, compared with 0.40 and 0.53 for the BoVW-based approach. Furthermore, the processing time of the proposed method was approximately 0.11 times that of the NoM-based approach (a reduction of roughly 89%), achieving a similar level of accuracy at a much faster pace. Moreover, in the bundle adjustment results using the derived image sets, the proposed method exhibited better EOP precision and lower RMS residuals than the BoVW-based approach, indicating its potential for enhanced accuracy in estimating image positions.
Overall, the proposed method for determining overlapping image sets is expected to enhance the accuracy and efficiency of image localization, particularly for UAV image datasets that lack prior location information, such as GNSS data. In future research, we will focus on evaluating the proposed method across diverse environmental conditions and seasons to assess its robustness in various scenarios. We aim to incorporate deep learning-based feature extraction techniques to improve performance under challenging conditions. Ultimately, by developing high-efficiency and high-precision image-set determination methodologies, we seek to further enhance the accuracy and efficiency of localization, thereby increasing the applicability of this study.

Author Contributions

Conceptualization, J.L. and K.C.; methodology, J.L. and K.C.; software, J.L. and K.C.; validation, J.L. and K.C.; formal analysis, J.L. and K.C.; investigation, J.L. and K.C.; resources, J.L. and K.C.; writing—original draft preparation, J.L.; writing—review and editing, K.C.; visualization, J.L.; supervision, K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00210493).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the following reasons: The datasets are part of an ongoing multi-institutional collaborative study supported by national funding. The image data are jointly owned by the participating institutions, and sharing with external parties is restricted until the project’s conclusion.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Huang, Y.P.; Sithole, L.; Lee, T.T. Structure from motion technique for scene detection using autonomous drone navigation. IEEE Trans. Syst. Man Cybern. Syst. 2017, 49, 2559–2570. [Google Scholar] [CrossRef]
  2. Cucci, D.A.; Rehak, M.; Skaloud, J. Bundle adjustment with raw inertial observations in UAV applications. ISPRS J. Photogramm. Remote Sens. 2017, 130, 1–12. [Google Scholar] [CrossRef]
  3. Liu, P.; Chen, A.Y.; Huang, Y.N.; Han, J.Y.; Lai, J.S.; Kang, S.C.; Tsai, M.H. A review of rotorcraft unmanned aerial vehicle (UAV) developments and applications in civil engineering. Smart Struct. Syst. 2014, 13, 1065–1094. [Google Scholar] [CrossRef]
  4. Gupta, S.K.; Shukla, D.P. Application of drone for landslide mapping, dimension estimation and its 3D reconstruction. J. Indian Soc. Remote Sens. 2018, 46, 903–914. [Google Scholar] [CrossRef]
  5. Budiharto, W.; Irwansyah, E.; Suroso, J.S.; Chowanda, A.; Ngarianto, H.; Gunawan, A.A.S. Mapping and 3D modelling using quadrotor drone and GIS software. J. Big Data 2021, 8, 48. [Google Scholar] [CrossRef]
  6. James, M.R.; Robson, S.; d’Oleire-Oltmanns, S.; Niethammer, U. Optimising UAV topographic surveys processed with structure-from-motion: Ground control quality, quantity and bundle adjustment. Geomorphology 2017, 280, 51–66. [Google Scholar] [CrossRef]
  7. Daftry, S.; Hoppe, C.; Bischof, H. Building with drones: Accurate 3D facade reconstruction using MAVs. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 1–6. [Google Scholar]
  8. Zhang, X.; Xie, Z. Reconstructing 3D Scenes from UAV Images Using a Structure-from-Motion Pipeline. In Proceedings of the 2018 26th International Conference on Geoinformatics, Kunming, China, 28–30 June 2018; pp. 1–6. [Google Scholar]
  9. Liu, J.; Ma, Y.; Jiang, S.; Wang, L.; Li, Q.; Jiang, W. Matchable image retrieval for large-scale UAV images: An evaluation of SfM-based reconstruction. Int. J. Remote Sens. 2024, 45, 692–718. [Google Scholar] [CrossRef]
  10. Lin, W.Y.; Liu, S.; Jiang, N.; Do, M.N.; Tan, P.; Lu, J. Repmatch: Robust feature matching and pose for reconstructing modern cities. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 562–579. [Google Scholar]
  11. Cefalu, A.; Haala, N.; Fritsch, D. Hierarchical structure from motion combining global image orientation and structureless bundle adjustment. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 535–542. [Google Scholar] [CrossRef]
  12. Jiang, S.; Jiang, C.; Jiang, W. Efficient structure from motion for large-scale UAV images: A review and a comparison of SfM tools. ISPRS J. Photogramm. Remote Sens. 2020, 167, 230–251. [Google Scholar] [CrossRef]
  13. Jiang, S.; Ma, Y.; Liu, J.; Li, Q.; Jiang, W.; Guo, B.; Wang, L. Efficient Match Pair Retrieval for Large-scale UAV Images via Graph Indexed Global Descriptor. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 9874–9887. [Google Scholar] [CrossRef]
  14. Hartmann, W.; Havlena, M.; Schindler, K. Recent developments in large-scale tie-point matching. ISPRS J. Photogramm. Remote Sens. 2016, 115, 47–62. [Google Scholar] [CrossRef]
  15. Rupnik, E.; Nex, F.; Remondino, F. Oblique multi-camera systems-orientation and dense matching issues. EuroCOW; 2014. Available online: https://hal.science/hal-02369314/ (accessed on 27 June 2024).
  16. Liang, Y.; Li, D.; Feng, C.; Mao, J.; Wang, Q.; Cui, T. Efficient match pair selection for matching large-scale oblique UAV images using spatial priors. Int. J. Remote Sens. 2021, 42, 8878–8905. [Google Scholar] [CrossRef]
  17. Verykokou, S.; Ioannidis, C. Automatic rough georeferencing of multiview oblique and vertical aerial image datasets of urban scenes. Photogramm. Rec. 2016, 31, 281–303. [Google Scholar] [CrossRef]
  18. Verykokou, S.; Ioannidis, C. A photogrammetry-based structure from motion algorithm using robust iterative bundle adjustment techniques. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 4, 73–80. [Google Scholar] [CrossRef]
  19. Wu, C. Towards linear-time incremental structure from motion. In Proceedings of the 2013 International Conference on 3D Vision-3DV 2013, Seattle, WA, USA, 16 September 2013; pp. 127–134. [Google Scholar]
  20. Wu, C. VisualSFM: A Visual Structure from Motion System. 2011. Available online: https://ccwu.me/vsfm (accessed on 2 February 2023).
  21. Schonberger, J.L.; Frahm, J.M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113. [Google Scholar]
  22. Griwodz, C.; Gasparini, S.; Calvet, L.; Gurdjos, P.; Castan, F.; Maujean, B.; Lanthony, Y. AliceVision Meshroom: An open-source 3D reconstruction pipeline. In Proceedings of the 12th ACM Multimedia Systems Conference, Istanbul, Turkey, 28 September–1 October 2021; pp. 241–247. [Google Scholar]
  23. Pix4Dmapper. Available online: https://www.pix4d.com (accessed on 7 May 2024).
  24. Nister, D.; Stewenius, H. Scalable recognition with a vocabulary tree. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 2, pp. 2161–2168. [Google Scholar]
  25. Jiang, S.; Jiang, W. Leveraging vocabulary tree for simultaneous match pair selection and guided feature matching of UAV images. ISPRS J. Photogramm. Remote Sens. 2022, 187, 273–293. [Google Scholar] [CrossRef]
  26. Kato, T.; Shimizu, I.; Pajdla, T. Selecting match pairs for SfM by introducing Jaccard Similarity. IPSJ Trans. Comput. Vis. Appl. 2017, 9, 12. [Google Scholar] [CrossRef]
  27. Kato, T.; Shimizu, I.; Pajdla, T. Improving match pair selection for large scale Structure from Motion by introducing modified Simpson coefficient. IEICE Trans. Inf. Syst. 2022, 105, 1590–1599. [Google Scholar] [CrossRef]
  28. Havlena, M.; Torii, A.; Pajdla, T. Efficient structure from motion by graph optimization. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Greece, 5–11 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 100–113. [Google Scholar]
  29. Havlena, M.; Hartmann, W.; Schindler, K. Optimal reduction of large image databases for location recognition. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2–8 December 2013; pp. 676–683. [Google Scholar]
  30. Cui, H.; Shen, S.; Gao, W.; Wang, Z. Progressive large-scale structure-from-motion with orthogonal MSTs. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 79–88. [Google Scholar]
  31. Jiang, S.; Jiang, W. Efficient match pair selection for oblique UAV images based on adaptive vocabulary tree. ISPRS J. Photogramm. Remote Sens. 2020, 161, 61–75. [Google Scholar] [CrossRef]
  32. Duan, H.; Peng, Y.; Min, G.; Xiang, X.; Zhan, W.; Zou, H. Distributed in-memory vocabulary tree for real-time retrieval of big data images. Ad Hoc Netw. 2015, 35, 137–148. [Google Scholar] [CrossRef]
  33. Baig, F.; Mehmood, Z.; Rashid, M.; Javid, M.A.; Rehman, A.; Saba, T.; Adnan, A. Boosting the performance of the BoVW model using SURF–CoHOG-based sparse features with relevance feedback for CBIR. Iran. J. Sci. Technol. Trans. Electr. Eng. 2020, 44, 99–118. [Google Scholar] [CrossRef]
  34. Ali, R.; Maheshwari, M. Implementation and Analyzing SURF Feature Detection and Extraction on WANG Images Using Custom Bag of Features Model. In Data, Engineering and Applications; Sharma, S., Peng, S.L., Agrawal, J., Shukla, R.K., Le, D.N., Eds.; Springer: Singapore, 2022; Volume 907, pp. 154–196. [Google Scholar]
  35. Alkhawlani, M.; Elmogy, M.; Elbakry, H. Content-based image retrieval using local features descriptors and bag-of-visual words. Int. J. Adv. Comput. Sci. Appl. 2015, 6, 212–219. [Google Scholar] [CrossRef]
  36. Vimina, E.R.; Jacob, K.P. Feature fusion method using BoVW framework for enhancing image retrieval. IET Image Process. 2019, 13, 1979–1985. [Google Scholar] [CrossRef]
Figure 1. Localization of targets on structure from motion (SfM)-derived orthomosaics from the UAV datasets: (a) Dataset 1 with Targets 1 and 2; (b) Dataset 2 with Targets 3 and 4.
Figure 2. Target images: (a) Target 1; (b) Target 2.
Figure 3. Target images: (a) Target 3; (b) Target 4.
Figure 4. Overlapping images for Target 1.
Figure 5. Workflow of the proposed methodology.
Figure 6. Utilization of extracted features in codebook generation and image matching.
Figure 7. Threshold determination method for candidate and image sets: (a) Target 1; (b) Target 3.
Figure 8. Visualization of candidate selection and final image-set determination for Target 4: (a) BoVW-based candidates; (b) final image set determined using NoM (red borders: images that do not overlap; yellow borders: images that overlap with the target image).
Figure 9. Image-set selection results for Target 2 using the proposed methodology (red borders: images that do not overlap; yellow borders: images that overlap with the target image).
Figure 10. Image-set selection results for Target 2 using the NoM-based method (red borders: images that do not overlap; yellow borders: images that overlap with the target image).
Figure 11. Image-set selection results for Target 2 using the BoVW-based method (red borders: images that do not overlap; yellow borders: images that overlap with the target image).
Figure 12. Comparison among the proposed, NoM-based, and BoVW-based methods: (a) precision; (b) average overlap area ratio; (c) correlation.
Figure 13. Comparison of average processing times for the proposed, NoM-based, and BoVW-based approaches.
Table 1. Characteristics of the UAV datasets.

| Item | Dataset 1 | Dataset 2 |
|---|---|---|
| Flight height (m) | 126 | 178 |
| Survey area | Urban, arterial roads | Rural, construction sites |
| Camera model | Yuneec E90X, 23 mm | DJI Zenmuse P1 |
| Focal length (mm) | 8.3 | 35.0 |
| Number of images | 386 | 394 |
| Image resolution (pixels) | 5472 × 3080 | 8192 × 5460 |
| Distinctive features | Limited image overlap in certain sections | Repetitive terrain features |
Table 2. Overlapping image data for each target.

| Item | Target 1 | Target 2 | Target 3 | Target 4 |
|---|---|---|---|---|
| Dataset | Dataset 1 | Dataset 1 | Dataset 2 | Dataset 2 |
| Number of overlapping images | 34 | 18 | 53 | 31 |
| Maximum overlap ratio (%) | 90 | 90 | 80 | 90 |
| Minimum overlap ratio (%) | 8 | 8 | 5 | 1 |
Table 3. Efficiency comparison of image scales (seconds).

| Step | Original | 50% | 25% | 10% |
|---|---|---|---|---|
| Down-sampling | 0 | 120 | 97 | 91 |
| Codebook generation | 4031 | 1385 | 562 | 93 |
| Image-set determination | 5546 | 973 | 170 | 70 |
| Total | 9578 | 2478 | 829 | 253 |
Table 4. Performance evaluation metrics for image-set determination and bundle adjustment.

| Category | Metric | Description |
|---|---|---|
| Image-set determination | Processing time | Total duration spent on the selection of the image set |
| Image-set determination | Precision | Proportion of overlapping images relative to the total |
| Image-set determination | Average overlap area ratio | Mean value representing the total overlap area |
| Image-set determination | Correlation | Relationship between overlap ratios and similarity scores |
| Bundle adjustment | Precision | Mean standard deviation of the EOPs |
| Bundle adjustment | Reprojection error | Root mean square residual of the image coordinates |
Table 5. Comparison of precision among the proposed, NoM-based, and BoVW-based approaches.

| Target | NoM (%) | BoVW (%) | Proposed (%) |
|---|---|---|---|
| 1 | 94 | 83 | 94 |
| 2 | 89 | 56 | 89 |
| 3 | 100 | 60 | 100 |
| 4 | 100 | 50 | 100 |
| Average | 96 | 62 | 96 |
Table 6. Comparison of average overlap area ratio among the proposed, NoM-based, and BoVW-based approaches.

| Target | NoM | BoVW | Proposed |
|---|---|---|---|
| 1 | 0.54 | 0.48 | 0.54 |
| 2 | 0.53 | 0.40 | 0.53 |
| 3 | 0.62 | 0.40 | 0.61 |
| 4 | 0.56 | 0.33 | 0.50 |
| Average | 0.57 | 0.40 | 0.55 |
Table 7. Comparison of correlation (absolute value) among the proposed, NoM-based, and BoVW-based approaches.

| Target | NoM | BoVW | Proposed |
|---|---|---|---|
| 1 | 0.81 | 0.85 | 0.80 |
| 2 | 0.87 | 0.50 | 0.87 |
| 3 | 0.83 | 0.58 | 0.87 |
| 4 | 0.87 | 0.21 | 0.97 |
| Average | 0.84 | 0.53 | 0.88 |
Table 8. Average time required to determine the image set for the proposed, NoM-based, and BoVW-based approaches (seconds). The image retrieval, number-of-images determination, and image matching rows together constitute the image-set determination step.

| Step | Dataset 1: NoM | Dataset 1: BoVW | Dataset 1: Proposed | Dataset 2: NoM | Dataset 2: BoVW | Dataset 2: Proposed |
|---|---|---|---|---|---|---|
| Codebook generation | – | 1.46 | 1.46 | – | 3.81 | 3.81 |
| Image retrieval | – | 0.05 | 0.05 | – | 0.32 | 0.32 |
| Number-of-images determination | – | – | 0.04 | – | – | 0.03 |
| Image matching | 15.76 | – | 0.17 | 73.40 | – | 3.62 |
| Total | 15.76 | 1.51 | 1.72 | 73.40 | 4.13 | 7.78 |
Table 9. Bundle adjustment results for the image sets constructed by the proposed, NoM-based, and BoVW-based approaches. Columns X0 through κ give the mean standard deviations of the EOPs (precision).

| Target | Method | X0 (m) | Y0 (m) | Z0 (m) | ω (°) | φ (°) | κ (°) | RMS Residual (pixel) | Successful Images | Excluded Images |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NoM | 0.088 | 0.075 | 0.013 | 0.042 | 0.041 | 0.076 | 0.66 | 18 | 1 |
| 1 | BoVW | 0.092 | 0.091 | 0.201 | 0.372 | 0.339 | 0.100 | 1.11 | 16 | 3 |
| 1 | Proposed | 0.004 | 0.004 | 0.001 | 0.003 | 0.004 | 0.006 | 0.55 | 18 | 1 |
| 2 | NoM | 0.053 | 0.067 | 0.037 | 0.443 | 0.391 | 0.086 | 0.43 | 16 | 2 |
| 2 | BoVW | 0.292 | 0.494 | 0.062 | 0.729 | 1.190 | 0.212 | 0.90 | 18 (2 blocks) | 0 |
| 2 | Proposed | 0.056 | 0.069 | 0.036 | 0.456 | 0.403 | 0.101 | 0.52 | 16 | 2 |
| 3 | NoM | 0.030 | 0.019 | 0.026 | 0.075 | 0.091 | 0.010 | 0.43 | 11 | 0 |
| 3 | BoVW | 0.147 | 0.169 | 0.174 | 0.784 | 0.800 | 0.231 | 0.90 | 7 | 4 |
| 3 | Proposed | 0.044 | 0.025 | 0.123 | 0.057 | 0.068 | 0.058 | 0.52 | 11 | 0 |
| 4 | NoM | 0.129 | 0.103 | 0.100 | 0.307 | 0.336 | 0.055 | 1.10 | 19 | 0 |
| 4 | BoVW | 0.271 | 0.246 | 0.175 | 0.623 | 0.441 | 0.246 | 1.81 | 17 (2 blocks) | 2 |
| 4 | Proposed | 0.149 | 0.114 | 0.090 | 0.229 | 0.326 | 0.073 | 0.99 | 19 | 0 |
| Mean | NoM | 0.075 | 0.066 | 0.044 | 0.217 | 0.215 | 0.057 | 0.830 | 16 | 0.75 |
| Mean | BoVW | 0.201 | 0.250 | 0.153 | 0.627 | 0.693 | 0.197 | 1.558 | 14.5 | 2.25 |
| Mean | Proposed | 0.063 | 0.053 | 0.063 | 0.186 | 0.200 | 0.060 | 0.750 | 16 | 0.75 |
