4.2. Experimental Results and Analysis
This chapter conducts experiments on four self-collected 3D reconstruction datasets (Statue1, Statue2, Archi1, Archi2), covering statue and architectural scenes to balance structural complexity and texture diversity, thereby validating the proposed algorithm's efficacy and generalization capability.
4.2.1. Effectiveness Analysis of the Improved RANSAC Algorithm
On high-resolution datasets, the RANSAC algorithm's performance is vulnerable to interference from outliers and noise. Furthermore, its fixed-threshold mechanism has inherent shortcomings, including a high misclassification rate and a heavy reliance on empirical tuning. To overcome these problems, this study proposes an adaptive threshold adjustment technique based on statistical features: by setting the threshold dynamically, the probability of misclassifying inliers is decreased, improving the model's resilience to noise. The optimized RANSAC algorithm uses a threshold range of 10 to 100, a threshold increment of 5, a threshold initialization of 30, and a total of 100 iterations.
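Since the implementation is not published, the following Python sketch illustrates one way the stated parameters (range 10 to 100, increment 5, initialization 30, 100 iterations) could drive a statistics-based threshold update; the model-fitting step is simplified to a 2D line, and the median-residual update rule is an assumption, not the paper's exact criterion.

```python
# A minimal sketch of adaptive-threshold RANSAC, NOT the paper's exact
# implementation: the inlier threshold is re-estimated from residual
# statistics instead of staying fixed. The fitted model is simplified to a
# 2D line; in the paper it is the two-view matching geometry.
import numpy as np

THRESH_MIN, THRESH_MAX, THRESH_STEP = 10, 100, 5  # range and increment from Sec. 4.2.1
THRESH_INIT, N_ITERS = 30, 100                    # initialization and iteration count

def fit_line(p1, p2):
    """Line ax + by + c = 0 through two points, normalized so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p1, p2
    a, b = y2 - y1, x1 - x2
    n = np.hypot(a, b) + 1e-12
    return a / n, b / n, (x2 * y1 - x1 * y2) / n

def ransac_adaptive(points):
    """points: (N, 2) array. Returns the best inlier mask and final threshold."""
    rng = np.random.default_rng(0)
    thresh = THRESH_INIT
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(N_ITERS):
        i, j = rng.choice(len(points), 2, replace=False)
        a, b, c = fit_line(points[i], points[j])
        resid = np.abs(points @ np.array([a, b]) + c)   # point-to-line distances
        # Assumed statistical update: scale the median residual of the
        # tentative consensus set, snapped to the 5-unit grid and clamped
        # to the stated [10, 100] range.
        if (resid < thresh).any():
            med = np.median(resid[resid < thresh])
            thresh = int(np.clip(round(2.5 * med / THRESH_STEP) * THRESH_STEP,
                                 THRESH_MIN, THRESH_MAX))
        inliers = resid < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers, thresh
```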
Using the Statue1 dataset as a reference, Figure 11 illustrates the effectiveness of the proposed algorithm.
Figure 11a displays the initial matching results obtained with SIFT features and KD-Tree matching (green lines), revealing mismatches caused by endpoint misalignment.
Figure 11b demonstrates the optimized results with our improved RANSAC algorithm: green lines represent matches under original thresholds, while red lines indicate additional matches from adaptive threshold adjustment. This highlights the algorithm’s enhanced robustness through increased valid matching pairs.
The quantitative results of the adaptive threshold adjustment RANSAC algorithm are presented in Table 3.
Compared to fixed-threshold RANSAC (threshold = 40), the proposed adaptive algorithm adjusts its parameters dynamically (e.g., Statue1 = 50, Archi1 = 65), significantly enhancing feature-matching performance. Experimental results show that the statue datasets (Statue1/Statue2) achieve 57.0–80.7% growth in matched pairs, while the architectural datasets (Archi1/Archi2) exhibit 45.2–71.1% increases. Additionally, dynamic parameter optimization improves inlier selection accuracy by 23.6% (p < 0.01), effectively overcoming the limitations of empirical thresholds, particularly in complex scenarios with significant resolution and noise variability.
4.2.2. Feature Detection and Matching
Feature point extraction and matching represent a crucial step in three-dimensional reconstruction. The feature-matching phase is executed on the MATLAB platform, using the SIFT algorithm for feature extraction. We employed a KD-Tree matching algorithm based on Euclidean distance, replacing the previous brute-force matching approach, with the matching threshold set to 0.5.
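Since the paper's MATLAB code is not reproduced here, a minimal OpenCV sketch of the equivalent pipeline, SIFT extraction followed by KD-Tree (FLANN) matching with the 0.5 distance-ratio threshold, could look as follows; the image file names are placeholders.

```python
# Illustrative Python/OpenCV counterpart of the described pipeline (the
# paper runs on MATLAB): SIFT features matched through a KD-Tree index
# with the stated ratio threshold of 0.5. File names are placeholders.
import cv2

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN with a KD-Tree index (algorithm=1) replaces brute-force search.
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
knn = flann.knnMatch(des1, des2, k=2)          # two nearest neighbours per query

# Euclidean-distance ratio test with the 0.5 matching threshold.
good = [m for m, n in (p for p in knn if len(p) == 2)
        if m.distance < 0.5 * n.distance]
print(f"{len(good)} tentative matches retained for RANSAC filtering")
```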
Figure 12 illustrates the SIFT feature detection results.
Figure 12a displays two randomly selected original images from the collected data.
Figure 12b presents the detection outcomes for these images, which took a cumulative time of 13.467 s. The numbers of feature points detected were, from left to right, 21,196 and 13,531, respectively.
Figure 13 illustrates the outcomes for two images under two different feature-matching and outlier-removal methods.
Figure 13a,b present outlier removal results using brute-force (BF) matching and traditional RANSAC, while Figure 13c,d demonstrate the enhanced RANSAC algorithm integrating KD-Tree matching and adaptive threshold adjustment. KD-Tree achieves significantly lower mismatch rates than BF, and the improved algorithm retains more matched pairs while drastically reducing execution time without compromising accuracy. A comparison of Figure 13a and Figure 13c reveals that BF exhibits marginally higher per-frame matching accuracy but suffers from substantially more mismatches. These observations validate the rationale for adopting the KD-Tree + SIFT framework, which balances efficiency and robustness while providing precise correspondences for downstream 3D reconstruction tasks.
The experimental findings of the performance comparison are shown in Table 4. Experiments reveal significant differences in robustness, matching accuracy, and time efficiency among SIFT-based feature detection with varying matching strategies:
Time Efficiency: The SIFT + Brute Force (BF) baseline takes 84.523 s, increasing to 86.147 s after RANSAC optimization, confirming additional computational costs from iterative sampling. In contrast, SIFT + KD-Tree reduces matching time to 26.634 s (68.5% speedup over BF) through hierarchical spatial partitioning of high-dimensional feature vectors, with only a marginal increase to 26.953 s after enhanced RANSAC, indicating minimal computational overhead during model optimization.
Matching Quality: SIFT + BF generates 12,356 initial matches and retains 5050 pairs after RANSAC filtering (a 59.1% reduction), with accuracy improving from 40.871% to 50.50%, demonstrating RANSAC's outlier removal capability. Meanwhile, SIFT + KD-Tree drops only from 10,316 to 9136 pairs (an 11.4% decrease) under the improved RANSAC while reaching 88.561% matching accuracy, far milder attrition than the 59.1% reduction of SIFT + BF + RANSAC. This highlights the synergistic effect of SIFT's high discriminative power and KD-Tree's approximate nearest-neighbor search, enhancing both precision and stability.
Overall, the SIFT + KD-Tree + enhanced RANSAC approach provides a solution that supports both real-time performance and robustness for large-scale image registration tasks by striking an ideal balance between time efficiency (26.953 s), matching accuracy (88.561%), and false match suppression capability (false match rate < 11.5%).
Figure 14 illustrates the comparative results of four feature-matching algorithms applied to the four sets of self-collected data: SURF [22], SIFT [19], MS-HLMO [23], and our proposed approach. It is evident that our algorithm preserves a greater number of matching point pairs.
Table 5 and Table 6 present a comparative analysis of the proposed algorithm against various matching algorithms, focusing on the number of matched point pairs and accuracy metrics across the four sets of self-collected data.
A thorough examination of Table 5 and Table 6 shows that our algorithm performs noticeably better than competing algorithms on two crucial metrics: matching accuracy and the number of matched point pairs. In particular, across all datasets our algorithm produces a significantly greater number of matched point pairs, excelling on the Archi1 dataset with 4090 successfully matched pairs. With a maximum accuracy of 88.56% (Statue1) and a minimum of 64.27% (Archi2), our algorithm also performs consistently well in matching accuracy across all datasets, demonstrating high precision and robustness. By contrast, the MS-HLMO algorithm is inconsistent in matching accuracy, underperforming especially on the Archi1 dataset, even though it outperforms SIFT and SURF in the number of matched point pairs.
The specific comparison of the results is illustrated in Figure 15. It can be clearly observed from Figure 15 that across all datasets, our algorithm and MS-HLMO consistently outperform the SIFT and SURF algorithms, particularly in matching accuracy. Consequently, when both the number of matched point pairs and matching accuracy are taken into account, our algorithm not only matches more point pairs on the self-collected datasets but also achieves higher matching accuracy, highlighting its superiority and potential in real-world applications.
In summary, to address the issue of insufficient point cloud density in existing multi-view image 3D reconstruction algorithms, this study proposes a solution framework based on feature-matching optimization. By integrating the scale invariance of SIFT feature descriptors with the high-dimensional search efficiency of KD-Tree (resulting in a 68.5% reduction in matching time) alongside a self-designed dynamic threshold RANSAC optimization strategy (which enhances inlier selection accuracy by 23.6%), the experimental results demonstrate the algorithm’s exceptional performance across four datasets. This accomplishment demonstrates the synergistic efficacy of the adaptive RANSAC filtering mechanism and SIFT + KD-Tree feature space optimization, which effectively alleviates the issue of low model point cloud density caused by insufficient matching points during multi-view image reconstruction and provides a larger number of matching point pairs for subsequent multi-view geometric computations.
4.2.3. Effectiveness Analysis of the View Improvement Algorithm
In the incremental SfM sparse reconstruction algorithm, numerous invalid matching relationships arise, and similar structures captured in different time periods can diminish reconstruction accuracy. To address this, this paper proposes a feature-matching strategy based on similarity measures, which extracts image similarity and conducts an initial screening of matching relationships. By further considering the temporal continuity of image acquisition, the initial matching views are optimized and adjusted to generate more complete and accurate matching views, with the objective of enhancing the quantity and accuracy of the point clouds in model reconstruction. The matching-view comparison between COLMAP and the improved algorithm presented in this paper is illustrated in Figure 16.
Figure 16 presents matching views in the Structure from Motion (SfM) algorithm as network graphs, where nodes represent viewpoints and edges indicate matching relationships; node size and edge thickness may scale with the number or quality of matching pairs, visually reflecting differences in view construction across algorithms. The first row (Figure 16(a1,b1,c1,d1)) shows matching views from the traditional COLMAP algorithm, revealing a high density of edges and well-organized views, indicating equal treatment of all images. However, similar structures across different time periods can cause ambiguous matches, reducing reconstruction accuracy, and feature matching on images without common areas leads to redundancy. In contrast, the second row (Figure 16(a2,b2,c2,d2)) displays matching views from the improved incremental SfM sparse reconstruction algorithm proposed in this study, with significantly fewer edges and a more targeted view distribution. This demonstrates that the improved algorithm establishes more matches between highly similar images, forming denser combinations. The enhanced algorithm not only reduces redundant matches but also more accurately filters valid matching relationships, providing a reliable foundation for improving reconstruction quality and precision.
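The screening strategy is described only at the level of ideas, so the sketch below is one plausible reading rather than the authors' code: it assumes cosine similarity between L2-normalized global image descriptors, a hand-picked similarity threshold, and retention of temporally adjacent views to honor acquisition continuity.

```python
# Hypothetical view-graph screening: keep a candidate pair if the images
# are globally similar OR were captured consecutively. The descriptor
# source, threshold value, and temporal window are all assumptions.
import numpy as np
from itertools import combinations

def build_match_graph(global_desc, sim_thresh=0.6, temporal_gap=1):
    """global_desc: (N, D) array of L2-normalized per-image descriptors,
    ordered by acquisition time. Returns the surviving view pairs."""
    sim = global_desc @ global_desc.T                 # cosine similarity matrix
    edges = []
    for i, j in combinations(range(len(global_desc)), 2):
        if sim[i, j] >= sim_thresh or abs(i - j) <= temporal_gap:
            edges.append((i, j))                      # pair kept for matching
    return edges
```

Each surviving edge then undergoes full feature matching, so pruning the graph directly cuts the number of matching iterations reported in Table 7.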
The specific performance improvements of the algorithm presented in this paper, as demonstrated in the matching views, are illustrated in Table 7.
The number of edges in the matching view corresponds to the number of iterations needed for subsequent feature-matching tasks. According to the table, the enhanced algorithm requires noticeably fewer matching iterations on every dataset than the original algorithm: the decreases are 63.4%, 76.3%, 76.0%, and 84.2% on Statue1, Statue2, Archi1, and Archi2, respectively.
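Written out, the reported decreases follow the usual relative-reduction form (the symbols $E_{\text{orig}}$ and $E_{\text{imp}}$, denoting the original and improved edge counts, are introduced here for clarity):

$$\text{Reduction} = \frac{E_{\text{orig}} - E_{\text{imp}}}{E_{\text{orig}}} \times 100\%$$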
The improved algorithm's impressive performance across a variety of dataset types, including sculptures and architecture, demonstrates strong generalizability and applicability. Whether applied to small datasets such as Statue1, with only 30 images, or large datasets such as Archi2, with 100 images, the enhanced algorithm successfully lowers the number of matches, demonstrating its stability and effectiveness in handling datasets of different sizes.
4.2.4. Analysis of the Improved Algorithm's Sparse Reconstruction Results
Building on the analysis of the view improvement algorithm's efficacy, this section illustrates the influence of the proposed similarity-measure enhancement on the incremental SfM sparse reconstruction algorithm. We compare the enhanced algorithm's performance against popular algorithms such as VSFM [24], COLMAP [7], and SfSM [25], and investigate how well it performs in sparse reconstruction. The algorithms are evaluated on point cloud quantity and the accuracy of the reconstruction results, comparing the sparse reconstruction results before and after the improvements.
Figure 17, Figure 18, Figure 19 and Figure 20 illustrate the comparative sparse reconstruction results of the proposed algorithm against various other algorithms using the self-collected data from Statue1, Statue2, Archi1, and Archi2.
The presentation format of the results is as follows: from left to right, the front, side, and rear views of the reconstruction results are displayed. This three-view representation provides a clear visual demonstration of the accuracy and completeness of the sparse reconstruction from a subjective visual perspective. All results are saved in '.ply' format and analyzed using CloudCompare v2.13.2.
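For readers who want a scriptable check of the point counts discussed below, the exported '.ply' files can also be tallied with a library such as Open3D; this is merely an alternative to the CloudCompare workflow used in the paper, and the file names are placeholders.

```python
# Hypothetical helper for counting points in exported '.ply' clouds;
# the paper itself uses CloudCompare v2.13.2 for this analysis.
import open3d as o3d

for name in ["statue1_ours.ply", "statue1_colmap.ply"]:
    cloud = o3d.io.read_point_cloud(name)
    print(f"{name}: {len(cloud.points)} points")
```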
As illustrated in Figure 17, the sparse reconstruction results for the self-acquired dataset Statue1 compare SfSM, VSFM, COLMAP, and the improved algorithm proposed in this paper. In terms of point cloud density, the algorithm presented herein produces a more densely populated point cloud that covers a significant portion of the statue, whereas the other three methods yield relatively sparse point clouds with numerous voids and missing areas. This indicates that the proposed algorithm is more effective in feature point extraction and retains a greater amount of valid information during the matching process, resulting in a richer point cloud dataset.
The algorithm presented in this paper generates statue models with greater accuracy in geometric shape and detail, successfully recovering more surface textures and contours. In contrast, the sparse reconstruction results from the SfSM and VSFM algorithms exhibit noticeable distortion or deformation; the COLMAP algorithm performs relatively well, but the proposed algorithm still demonstrates superior results. This enhancement is attributed to the optimization of the incremental Structure from Motion (SfM) sparse reconstruction algorithm and improvements in the initial feature-matching phase.
Comparative analysis of the supplementary data in Figure 18, Figure 19 and Figure 20, specifically for Statue2, Archi1, and Archi2, substantiates the significant advantages of the proposed algorithm in terms of reconstruction accuracy and point cloud density.
Through a comprehensive analysis of sparse reconstruction results from four self-collected datasets (Statue1, Statue2, Archi1, Archi2), the following conclusions are drawn: In terms of point cloud quantity and reconstruction accuracy, the proposed enhanced algorithm demonstrates significant advantages over traditional methods (e.g., SfSM, VSFM, COLMAP). The improved algorithm substantially increases the number of generated point clouds, achieving a more uniform distribution that nearly covers all areas of the scene, indicating its superior capability in extracting and utilizing image feature information to produce denser point cloud data. Furthermore, the enhanced algorithm excels in detail reconstruction, more accurately capturing subtle scene structures and textures, thereby improving the precision of geometric shapes and intricate details with a clear representation of scene contours and structures. Additionally, the improved algorithm exhibits greater robustness in handling complex scenes, effectively mitigating the impact of erroneous matches by optimizing feature-matching strategies and incorporating temporal continuity in image acquisition, enhancing both matching accuracy and robustness.
The specific quantitative results are presented in Table 8, where a more detailed analysis of the sparse reconstruction outcomes is conducted from the perspectives of reprojection error and point cloud quantity. The evaluation metrics are detailed in Section 4.1.2.
In terms of reprojection error, the proposed improved algorithm (Ours) achieves the lowest error values across all datasets, significantly outperforming traditional algorithms such as SfSM, VSFM, and COLMAP. For instance, in the Statue1 dataset, the reprojection error of the improved algorithm is 0.98, compared to 4.35 for SfSM, 3.25 for VSFM, and 1.21 for COLMAP, indicating a substantial increase in accuracy. This advantage is consistent across other datasets; for example, in the Archi2 dataset, the improved algorithm achieves an error of 1.58, significantly lower than other methods. Reprojection error measures the positional deviation of 3D reconstruction points projected onto the 2D image plane, with lower values indicating better geometric consistency between the reconstructed 3D structure and the actual imaging process. These results demonstrate the improved algorithm’s superior reconstruction accuracy, enabling more precise recovery of scene geometry and providing a reliable geometric foundation for subsequent 3D reconstruction tasks.
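For reference, the reprojection error over $N$ observed correspondences is conventionally computed as the mean 2D distance between each detected feature $\mathbf{x}_i$ and the projection of its reconstructed 3D point $\mathbf{X}_i$ (the paper's exact averaging convention is not stated):

$$e_{\text{reproj}} = \frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{x}_i - \pi\!\left(\mathbf{K}\,[\mathbf{R} \mid \mathbf{t}]\,\tilde{\mathbf{X}}_i\right) \right\|_2$$

where $\mathbf{K}$ is the intrinsic matrix, $[\mathbf{R} \mid \mathbf{t}]$ the camera pose, $\tilde{\mathbf{X}}_i$ the homogeneous 3D point, and $\pi(\cdot)$ the perspective division.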
Regarding point cloud quantity, the improved algorithm exhibits exceptional performance. For example, on the Statue1 dataset it generates a point cloud of 78,986 points, significantly surpassing SfSM's 40,973, VSFM's 60,588, and COLMAP's 17,566. This trend is consistent on the other datasets: on Archi2, the improved algorithm produces 500,247 points, far exceeding the other methods. This indicates that the improved algorithm more effectively extracts and utilizes image feature information, resulting in denser and more uniformly distributed point cloud data and thereby providing richer geometric support for subsequent 3D reconstruction.
4.2.5. PMVS Dense Reconstruction Results
PMVS (patch-based multi-view stereo) is a multi-view stereo matching algorithm used to reconstruct 3D models from multiple photographs. Its basic principle is to generate a dense 3D point cloud by performing dense matching on multi-view images. The core idea of PMVS is to operate on image patches, obtaining high-precision 3D reconstruction results via local optimization and global consistency.
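To make the patch-based idea concrete, the sketch below implements the photometric-consistency test at the heart of PMVS-style methods: a patch is scored by the normalized cross-correlation (NCC) of its texture samples across views. This is a simplified illustration, not Furukawa and Ponce's implementation; patch projection and sampling are abstracted away, and the 0.7 acceptance threshold is illustrative.

```python
# Photometric consistency for a PMVS-style patch: compare the patch texture
# sampled in a reference view against the other visible views via NCC.
import numpy as np

def ncc(patch_a, patch_b):
    """NCC of two equally sized texture samples; 1.0 means perfect agreement."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return float((a * b).sum() / denom)

def photoconsistent(samples, threshold=0.7):
    """samples: list of (h, w) arrays, one per view, reference first.
    A patch survives if every view agrees with the reference view;
    the threshold value is illustrative, not PMVS's exact setting."""
    ref = samples[0]
    return all(ncc(ref, s) >= threshold for s in samples[1:])
```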
Figure 21, Figure 22, Figure 23 and Figure 24 illustrate the comparison of dense reconstruction results obtained from the proposed algorithm against various other algorithms using the self-collected datasets Statue1, Statue2, Archi1, and Archi2.
Figure 21 illustrates the dense reconstruction results for Statue1.
From the frontal perspective, the VSFM algorithm produces sparse point clouds with significant detail loss, while COLMAP increases point cloud density but still exhibits reconstruction blind spots. In contrast, the proposed algorithm generates uniformly distributed high-density point clouds, effectively capturing the statue’s textures and intricate details. In the lateral view, VSFM displays blurred outlines, and COLMAP inadequately handles complex geometries; however, the proposed algorithm achieves precise contour delineation and fine feature reconstruction. The rear perspective comparison is even more pronounced: VSFM suffers from extensive point cloud omissions, and COLMAP fails to provide complete coverage, whereas the proposed algorithm ensures full rear-view coverage.
The comparative results of the supplementary data for Statue2, Archi1, and Archi2 presented in Figure 22, Figure 23 and Figure 24 further validate the significant advantages of the proposed algorithm in terms of point cloud quantity and completeness.
Through a comparative analysis from various perspectives, it is evident that the algorithm presented in this paper demonstrates significant improvements over VSFM and COLMAP in terms of point cloud quantity and detail reconstruction. The point clouds generated by VSFM appear relatively sparse across different viewpoints, failing to recover numerous architectural details and compromising the completeness of the reconstruction. Although COLMAP shows an increase in point cloud density, its performance in detail representation remains suboptimal, particularly concerning the textures and complex structures of buildings, and it fails to achieve the desired reconstruction quality. In contrast, the algorithm proposed in this paper not only generates a greater number of points but also ensures a more uniform distribution, effectively capturing the intricate features of the architecture, thereby enhancing the completeness and accuracy of the reconstruction across all perspectives.
Table 9 quantitatively illustrates the specific number of reconstructed point clouds generated by various dense reconstruction algorithms applied to self-acquired data.
This study conducts a comparative analysis of dense reconstruction performance across four typical 3D reconstruction datasets (Statue1, Statue2, Archi1, Archi2) using three reconstruction algorithms (VSFM, COLMAP, and the improved algorithm). As shown in Table 9, VSFM demonstrates the best computational speed in point cloud generation (Statue1: 65.732 s, Archi2: 605.858 s), although its reconstruction density is significantly lower (Statue1: 424,949 points, Archi2: 840,909 points). The COLMAP algorithm shows a substantial increase in point cloud quantity compared to VSFM (Statue1: +150%, Archi2: +465%), but this comes with a considerable increase in time cost (Statue1: 3085.794 s, Archi2: 3785.947 s).
The results of the algorithm presented in this paper, as illustrated in Figure 25, clearly demonstrate its advantages. The improved algorithm achieves the highest reconstruction density across all datasets, generating point clouds up to 7.14 times the size of VSFM's (Statue1: 3,033,703 points) and 1.80 times that of COLMAP's (Archi2: 8,574,136 points). Although its computation time is only 8.2% (Archi1) to 29.8% (Statue1) lower than COLMAP's, when both the number of points and the runtime are considered, the proposed algorithm demonstrates superior performance.