Article

Automatic Registration of Multi-Temporal 3D Models Based on Phase Congruency Method

1 College of Geological Engineering and Geomatics, Chang’an University, Xi’an 710054, China
2 Key Laboratory of Western China’s Mineral Resources and Geological Engineering, Ministry of Education, Xi’an 710054, China
3 Northwest Engineering Corporation Limited, Power China Group, Xi’an 710065, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(8), 1328; https://doi.org/10.3390/rs17081328
Submission received: 17 January 2025 / Revised: 4 April 2025 / Accepted: 5 April 2025 / Published: 9 April 2025
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)

Abstract

Multi-temporal 3D models have broad application prospects, but it is difficult to ensure that they share a consistent spatial reference. In this study, a method for the automatic registration of multi-temporal 3D models based on phase congruency (PC) matching is proposed. Firstly, the texture images of the multi-temporal 3D models are obtained, and key points are extracted from the texture images. Secondly, the affine model between the plane of the key point and its corresponding tile triangle is established, and the 2D coordinates of the key point are mapped to 3D spatial coordinates. Thirdly, multi-temporal 3D model matching is completed based on PC to obtain a large number of evenly distributed corresponding points. Finally, the parameters of the 3D transformation model are estimated from the multi-temporal corresponding points, and the vertices of the 3D model are updated. The first experiment demonstrates that the proposed method performs remarkably well in improving the positioning accuracy of feature point coordinates, effectively reducing the mean systematic error to below 0.001 m. The second experiment further reveals the significant impact of different 3D transformation models on registration accuracy. The experimental results show that coordinates obtained from position and orientation system (POS) data contain significant positioning errors, while the proposed method reduces the coordinate errors between the two-period models. Because the method requires neither ground control points (GCPs) nor manual measurements for 3D geometric registration, it can provide a high-precision, consistent spatial reference for multi-temporal 3D models while streamlining the workflow, reducing resource intensity, and improving economic efficiency.

1. Introduction

Unmanned Aerial Vehicle (UAV) photogrammetry is an advanced technology that integrates drones, sensors, telemetry and remote control, communications, a position and orientation system (POS), a Global Navigation Satellite System (GNSS), and remote sensing applications. It is characterized by high precision, intelligence, and automation [1]. Traditional terrestrial photogrammetry is often time consuming and labor intensive, requiring extensive fieldwork and manual setup. Additionally, it exposes surveyors to potential safety hazards in difficult-to-reach or hazardous areas, whereas UAV photogrammetry can avoid these issues. UAV photogrammetry can rapidly acquire high-resolution image data over large areas, offering advantages that include safety, reliability, real-time efficiency, cost effectiveness, and intuitive results [2]. The acquired multi-temporal image data are used in 3D reconstruction based on Structure from Motion (SfM) and Multi-View Stereo (MVS) algorithms, which can generate digital terrain products of the scene, such as the Digital Orthophoto Map (DOM), Digital Elevation Model (DEM), and Digital Surface Model (DSM) [3,4]. Owing to their ability to represent fine details and the high accuracy of their geometric measurements, 3D models are widely used in many fields, such as dam health monitoring [5], archaeology and cultural heritage [6], landslide monitoring [7,8,9], geological hazard investigation [10], and change detection [11]. However, extracting corresponding points from multi-temporal 3D models is a challenging task.
In conventional UAV photogrammetry, a certain number of ground control points (GCPs) must be measured in the study area before acquiring UAV images in order to ensure a consistent spatial reference for multi-temporal 3D models [12]. UAV imagery is typically georeferenced using GCPs so that the resulting 3D models have high geometric accuracy. Current 3D models generated with GNSS-RTK methods without GCPs can reach centimeter-level accuracy for direct ground positioning; nevertheless, UAV photogrammetry using conventional GCPs can achieve higher geometric accuracy [13,14]. However, because each epoch is reconstructed separately, multi-temporal models contain systematic positioning errors and cannot be used directly for applications that detect multi-temporal changes. For historical 3D model data in particular, it is no longer possible to measure GCPs retrospectively [15,16]. Thus, a new registration method for multi-temporal 3D models is of considerable value.
The key to the automatic registration of multi-temporal 3D models lies in obtaining corresponding points and their spatial position information. Once a large number of corresponding points is obtained, they can be used to estimate the relative geometric transformation parameters between models and to perform automatic registration. Obtaining corresponding points in multi-temporal 3D models consists of two steps: feature extraction and feature matching. There have been many studies on extracting feature points from images, such as Harris [17], FAST [18], SIFT [19] and SURF [20]. The corners and edges of an image are extracted as key points, and the local information around each key point is encoded as a feature vector. The similarity of the feature vectors is then evaluated to determine whether two key points correspond. Feature vectors are usually built from image intensity or gradient information; however, both are sensitive to nonlinear radiation distortion (NRD), which can easily cause key point matching to fail. In recent years, deep learning methods such as SuperGlue [21] and LoFTR [22] have achieved significant advantages in image-matching tasks. However, further research is needed to determine whether deep learning methods are suitable for matching the texture images of 3D models.
Morrone and Owens proposed a detection method based on local energy features in 1987, which provided a new approach to this problem by using phase congruency (PC) to detect features [23]. Ye et al. implemented the matching of heterogeneous images based on the Histogram of Orientated Phase Congruency (HOPC), exploiting the structural characteristics of the images [24]. Other researchers have extended the algorithm and proposed the Local Histogram of Orientated Phase Congruency (LHOPC) [25], the Phase Congruency Structural Descriptor (PCSD) [26], and the Radiation-Variation Insensitive Feature Transform (RIFT) [27]. PC methods can effectively resist NRD between images but require a large amount of computing time, resulting in low processing efficiency. Multi-temporal 3D models span a long time period, and the radiometric information of their images differs significantly. Therefore, further research is needed to apply PC algorithms to corresponding point matching for 3D models.
Although existing matching algorithms have made some progress in theory and practice, they still cannot ensure the complete correctness of corresponding points. To enhance the accuracy of matching outcomes, identifying and eliminating erroneously matched points is crucial. Fischler [28] proposed the RANSAC algorithm, which can estimate the parameters of mathematical models in datasets containing outliers. Torr [29] further distinguished between inliers and outliers by introducing a mixed probability model, achieving maximum likelihood estimation of the parameter model. Wu [30] proposed the FSC algorithm, which effectively improves the reliability and efficiency of the registration algorithm by dividing the sample set into a high-accuracy sample set and a consensus set. Although these methods have achieved certain successes in the processing of 2D images, they are not applicable to the problem of erroneous corresponding point elimination in 3D models. The complexity of 3D models makes it difficult to represent the positional relationship of corresponding points through simple transformations. Therefore, further research is needed to eliminate errors in the corresponding points of multi-temporal 3D models.
In this study, a matching method based on PC is proposed to realize the automatic registration of multi-temporal 3D models by estimating the parameters of a 3D transformation model from the acquired corresponding points. The method is robust to illumination differences, contrast differences, and nonlinear radiation distortions in multi-temporal 3D models, and it improves the reliability of the automatic registration of multi-temporal 3D models.

2. Study Area and Data

2.1. Study Areas

The first study area (S1), the Taibai region (33°50′–34°08′N, 107°41′–107°52′E), is located in the western part of the Guanzhong Plain, at the foot of the main peak of the Qinling Mountains, Taibai Mountain, and is under the jurisdiction of Meixian County, Baoji City, Shaanxi Province, as shown in Figure 1. The terrain and landforms in this area are complex, with the Qinling Mountains, shallow hills, loess plateau, and Weibei Plain from south to north [31]. The study area includes buildings, roads, farmland, forests, and wastelands, providing favorable conditions for feature extraction.
The second study area (S2) is located in the Alxa Plateau (37°24′–42°47′N, 97°10′–106°53′E) in the westernmost part of Alxa League, Inner Mongolia Autonomous Region, as shown in Figure 2. The terrain is generally high in the southwest and low in the northeast, with an average altitude of 900 to 1400 m. The surface is mainly covered by deserts, including the Gobi, followed by grasslands, while forests, cultivated land, wetlands, and artificial surfaces occupy relatively small areas. There are three major deserts, Badain Jaran, Tengger, and Ulan Buh, as well as the Ejinago Desert [32]. Because the study area is a desert that lacks sufficient texture information and contains many similar local image patches, traditional matching algorithms tend to produce numerous erroneous matches, struggle to identify corresponding points, and show poor robustness, which can affect the registration of 3D models.

2.2. UAV Data Acquisition

For the S1 study area, the UAV platform was a four-rotor UAV, the FeiMa D2000, equipped with a SONY ILCE-7RM4 full-frame camera, as shown in Figure 3a. The UAV was flown at a relative altitude of 81 m, corresponding to a ground sample distance (GSD) of 1.0 cm. Data acquisition was completed on 25 February 2023 and 27 February 2023, with 430 and 427 UAV images obtained, respectively. For the S2 study area, two types of UAVs were used for data collection. The UAV platform for the first data collection was a DJI M300 RTK equipped with a Zenmuse P1 full-frame camera, as shown in Figure 3b. The relative flight height of the UAV was 119 m, and the corresponding GSD was 2.9 cm. The flight was completed on 25 January 2023, and a total of 705 UAV images were collected. The UAV platform for the second data acquisition was a DJI Mavic 2 Pro equipped with an APS-C camera, the Hasselblad L1D-20C, as shown in Figure 3c. The UAV’s relative flight height was 107 m, and the corresponding GSD was 2.8 cm. The flight was completed on 26 February 2023, and 538 UAV images were collected. The information on the three camera sensors used in the S1 and S2 study areas is shown in Table 1.
After acquiring the UAV image data, the 3D models of the two study areas were obtained based on the 3D reconstruction software “Context Capture” (https://geo-matching.com/products/contextcapture). The 3D models were generated using the POS data of the images to complete the georeferencing, and the results are shown in Figure 4a and Figure 4b, respectively.

3. Methodology

The automatic alignment method for multi-temporal 3D models proposed in this study consists of four key steps. Firstly, the texture image of multi-temporal 3D model is obtained, and the key points are extracted from the texture image. Secondly, the affine model between the plane of the key point and its corresponding tile triangle is established, and the 2D coordinates of the key point are mapped to 3D spatial coordinates. This step ensures geometric consistency and avoids reliance on GCPs. Thirdly, multi-temporal 3D model matching is completed based on PC to obtain a large number of evenly distributed corresponding points. Finally, the parameters of the 3D transformation model are estimated based on the multi-temporal corresponding points, and the vertex update of the 3D model is completed. The detailed diagram is shown in Figure 5.

3.1. Feature Point Extraction

The research objects of this article are real-scene 3D models, and obtaining feature points in 3D space essentially amounts to extracting features from the texture images that make up the 3D models. The 3D models are stored as OBJ files containing their geometric structure and material information, and each model consists of three parts: the coordinates of the 3D vertices, the triangles formed by the 3D vertices, and the texture image covering the triangles. The composition of the 3D model is shown in Figure 6.
Conventional feature extraction methods operate directly on ordinary 2D images, whereas for 3D models, feature points can only be extracted from texture images. As shown in Figure 6, a texture image has an irregular size and shape. Meanwhile, the texture images extracted from multi-temporal 3D models show obvious differences in radiation and brightness. Compared to gradient-based methods, PC is invariant to illumination and contrast variations. This property is critical for multi-temporal models, where radiometric distortions are common. For instance, in desert regions (S2), SIFT fails due to homogeneous textures, while PC captures structural features (edges and corners) even under low-texture conditions. Thus, a PC model based on the Log Gabor Filter (LGF) is applied to feature extraction from texture images in this study.

3.1.1. Phase Congruency (PC) Map Generation

Early PC theory was only applicable to 1D signals, while images are 2D signals; it therefore needs to be extended to 2D. Kovesi [33] proposed a theory of 2D PC feature detection in 1995, which uses a multi-scale and multi-directional LGF to calculate the local phase features of images, overcoming problems such as noise. Compared to the standard Gabor filter, the Log Gabor Filter has no DC component, which enhances its robustness to illumination changes and makes it well suited to multi-scale and multi-directional analysis. The expression of the 2D Log Gabor Filter (2D-LGF) is shown in Equation (1).
$$G(\rho,\theta,s,o)=\exp\!\left(-\frac{1}{2}\left(\frac{\rho-\rho_{s}}{\sigma_{\rho}}\right)^{2}\right)\exp\!\left(-\frac{1}{2}\left(\frac{\theta-\theta_{so}}{\sigma_{\theta}}\right)^{2}\right)$$
where (ρ, θ) are the log-polar coordinates; s and o are the scale and the orientation of 2D-LGF, respectively; (ρs, θso) are the coordinates of the center of 2D-LGF; and (σρ, σθ) are the bandwidths in ρ and θ, respectively, controlling radial and directional bandwidth separately and balancing frequency coverage with computational efficiency.
The LGF is initially defined in the frequency domain due to its superior flexibility in designing precise frequency responses, enabling multi-scale and multi-directional analysis. However, practical image processing tasks necessitate the application of filters in the spatial domain, where direct interaction with pixel values facilitates efficient local feature extraction, such as edges and textures. Consequently, the LGF is transformed from the frequency domain to the corresponding spatial domain by the inverse Fourier transform [34]. This transition allows for the utilization of LGF’s frequency-specific design while leveraging the computational efficiency and physical interpretability of spatial-domain operations. In the spatial domain, 2D-LGF can be represented as Equation (2).
$$G(x,y,s,o)=G_{\mathrm{even}}(x,y,s,o)+iG_{\mathrm{odd}}(x,y,s,o)$$
where Geven(x, y, s, o) and Godd(x, y, s, o) stand for the even-symmetric and the odd-symmetric log-Gabor wavelets in the scale s and the orientation o, respectively.
Let I(x, y) denote a 2D image signal. Using 2D-LGF to filter the input image I involves convolving the even-symmetric and odd-symmetric filters of 2D-LGF with the input image. The response components can be represented as Equation (3).
$$\left[E_{so}(x,y),\,O_{so}(x,y)\right]=\left[I(x,y)*G_{\mathrm{even}}(x,y,s,o),\,I(x,y)*G_{\mathrm{odd}}(x,y,s,o)\right]$$
where Eso(x, y) and Oso(x, y) are the even and odd response components.
Then, the amplitude component Aso(x, y) and the phase component ϕso(x, y) of I(x, y) at scale s and orientation o can be obtained by Equations (4) and (5).
$$A_{so}(x,y)=\sqrt{E_{so}(x,y)^{2}+O_{so}(x,y)^{2}}$$
$$\phi_{so}(x,y)=\arctan\left(O_{so}(x,y)/E_{so}(x,y)\right)$$
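To make Equations (1)–(5) concrete, the following Python sketch builds a bank of 2D log-Gabor filters in the frequency domain and obtains the even/odd responses, amplitudes, and phases by inverse FFT. It is a minimal illustration, not the authors' implementation; the scale and orientation counts and the bandwidth parameters (min_wave, mult, sigma_rho, sigma_theta) are illustrative defaults.

```python
import numpy as np

def log_gabor_responses(img, scales=4, orients=6, min_wave=3.0, mult=2.1,
                        sigma_rho=0.55, sigma_theta=0.45):
    """Even/odd log-Gabor responses E_so, O_so and the derived amplitude A_so
    and phase phi_so of Equations (2)-(5). Filters are built in the frequency
    domain (Eq. (1)) and applied by multiplication, i.e., spatial convolution."""
    rows, cols = img.shape
    u, v = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                      # avoid log(0) at the DC term
    theta = np.arctan2(-v, u)               # orientation of each frequency sample
    F = np.fft.fft2(img)

    E, O, A, Phi = {}, {}, {}, {}
    for o in range(orients):
        angle = o * np.pi / orients
        # wrapped angular distance to the filter orientation
        dtheta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
        spread = np.exp(-dtheta ** 2 / (2 * sigma_theta ** 2))
        for s in range(scales):
            f0 = 1.0 / (min_wave * mult ** s)              # centre frequency rho_s
            radial = np.exp(-np.log(radius / f0) ** 2 /
                            (2 * np.log(sigma_rho) ** 2))
            radial[0, 0] = 0.0                             # no DC component
            resp = np.fft.ifft2(F * radial * spread)       # complex response
            E[s, o], O[s, o] = resp.real, resp.imag        # even / odd parts
            A[s, o] = np.abs(resp)                         # Eq. (4)
            Phi[s, o] = np.angle(resp)                     # Eq. (5)
    return E, O, A, Phi
```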
Mikolajczyk and Schmid improved the PC model based on LGF by applying 1D analysis in multiple directions and combining the results in some way to analyze 2D images [35]. The PC model is shown in Equation (6).
$$PC(x,y)=\frac{\displaystyle\sum_{o}\sum_{s}W_{o}(x,y)\left\lfloor A_{so}(x,y)\,\Delta\Phi_{so}(x,y)-T_{o}\right\rfloor}{\displaystyle\sum_{o}\sum_{s}A_{so}(x,y)+\xi}$$
where PC(x, y) represents the phase congruency result of the PC model at the image point (x, y); Wo(x, y) is the weight factor for frequency spread; ξ is a small positive number that prevents the denominator from being zero; ⌊·⌋ prevents values from becoming negative; To is the noise threshold; Aso(x, y) is the amplitude of the image point (x, y) at scale s and orientation o of the LGF; and ΔΦso(x, y) is the 2D phase deviation function, defined as
$$\Delta\Phi_{so}(x,y)=\cos\left(\phi_{so}(x,y)-\bar{\phi}_{so}(x,y)\right)-\left|\sin\left(\phi_{so}(x,y)-\bar{\phi}_{so}(x,y)\right)\right|$$
where $\bar{\phi}_{so}(x, y)$ is the average phase angle.
PC detects features by quantifying the degree of consistency of the local phases. Where the local phases are highly consistent (such as at edges or corners), the cosine term approaches 1 and the sine term approaches 0, so ΔΦso(x, y) is approximately equal to 1. In contrast, in areas of phase disorder (such as noise), ΔΦso(x, y) approaches 0. The absolute value is applied only to the sine term, because its odd symmetry makes it more sensitive to phase shifts.
Based on Equation (6), the PC map can be obtained by evaluating the PC model. In order to better describe edge features, an independent phase map PC(θo) is calculated for each orientation o, where θo is the angle of orientation o.
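A compact sketch of Equations (6) and (7) is given below, producing one PC map per orientation as used later in Equation (8). For brevity the frequency-spread weight W_o is taken as 1 and the noise threshold T_o is treated as a single constant subtracted from the summed energy, so this is an illustrative simplification rather than the exact formulation; it consumes the E, O, A dictionaries returned by the previous sketch.

```python
import numpy as np

def pc_orientation_maps(E, O, A, scales, orients, noise_T=2.0, xi=1e-4):
    """One phase-congruency map PC(theta_o) per orientation (Eqs. (6)-(7)),
    with W_o = 1 and a single noise threshold noise_T for simplicity."""
    pc_maps = []
    for o in range(orients):
        sumE = sum(E[s, o] for s in range(scales))
        sumO = sum(O[s, o] for s in range(scales))
        sumA = sum(A[s, o] for s in range(scales))
        # cos / sin of the mean phase angle phi_bar
        norm = np.sqrt(sumE ** 2 + sumO ** 2) + xi
        mcos, msin = sumE / norm, sumO / norm
        # A*cos(phi - phi_bar) - |A*sin(phi - phi_bar)| expanded with E and O
        energy = sum(E[s, o] * mcos + O[s, o] * msin
                     - np.abs(O[s, o] * mcos - E[s, o] * msin)
                     for s in range(scales))
        pc_maps.append(np.maximum(energy - noise_T, 0) / (sumA + xi))
    return pc_maps
```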

3.1.2. Feature Detection

According to the moment analysis algorithm, the axis corresponding to the minimum moment is designated as the principal axis, which represents the direction information of the features [36]. The axis corresponding to the maximum moment is perpendicular to the main axis, reflecting the distinctiveness of the feature. Before calculation of the minimum and maximum moments, three intermediate quantities a, b and c should be calculated.
$$\begin{cases}a=\sum_{o}\left(PC(\theta_{o})\cos\theta_{o}\right)^{2}\\[2pt] b=2\sum_{o}\left(PC(\theta_{o})\cos\theta_{o}\right)\cdot\left(PC(\theta_{o})\sin\theta_{o}\right)\\[2pt] c=\sum_{o}\left(PC(\theta_{o})\sin\theta_{o}\right)^{2}\end{cases}$$
The maximum moment Mmax and minimum moment Mmin are given by Equation (9).
$$\begin{cases}M_{\max}=\dfrac{1}{2}\left(c+a+\sqrt{b^{2}+(a-c)^{2}}\right)\\[4pt] M_{\min}=\dfrac{1}{2}\left(c+a-\sqrt{b^{2}+(a-c)^{2}}\right)\end{cases}$$
Mmax is the edge map of the image, reflecting the amount of feature information in the edge structure, and Mmin is the corner map, which can be used for corner detection. After computing Mmax and Mmin from the PC maps, feature points can be obtained by performing Harris feature detection.
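The moment analysis of Equations (8) and (9) can then be applied to the per-orientation PC maps; the sketch below follows the equations directly and returns the edge map Mmax and the corner map Mmin on which the subsequent Harris detection operates. It assumes evenly spaced orientations, consistent with the filter bank above.

```python
import numpy as np

def moment_maps(pc_maps):
    """Edge map M_max and corner map M_min from PC(theta_o) maps (Eqs. (8)-(9))."""
    orients = len(pc_maps)
    a = b = c = 0.0
    for o, pc in enumerate(pc_maps):
        th = o * np.pi / orients
        a = a + (pc * np.cos(th)) ** 2
        c = c + (pc * np.sin(th)) ** 2
        b = b + 2.0 * (pc * np.cos(th)) * (pc * np.sin(th))
    root = np.sqrt(b ** 2 + (a - c) ** 2)
    m_max = 0.5 * (c + a + root)        # edge strength
    m_min = 0.5 * (c + a - root)        # corner strength, used for detection
    return m_max, m_min
```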
Figure 7 shows an example of using PC method for feature detection. Figure 7a,d show a pair of UAV images from two different periods. Figure 7b,e represent the normalized maximum moment maps. Figure 7c,f display detected feature points using the PC method, respectively.

3.1.3. Feature Description

The PC map generated by the PC model of Equation (6) is insufficient on its own to construct feature descriptors. The direction information of PC, calculated with the odd-symmetric filters of the 2D-LGF, represents the direction in which the image features change most drastically.
Considering that PC calculation requires multi-directional odd-symmetric filters of 2D-LGF, the convolution results of each orientation can be projected onto the horizontal and vertical directions to calculate the horizontal energy and vertical energy. The direction feature of PC is calculated as follows.
$$\begin{cases}A=\sum_{\theta}\left(C_{so}(\theta)\cos\theta\right)\\[2pt] B=\sum_{\theta}\left(C_{so}(\theta)\sin\theta\right)\\[2pt] F=\arctan\left(B,\,A+\xi\right)\end{cases}$$
where A and B represent horizontal and vertical energy; F represents the direction feature of PC; Cso(θ) represents the convolution result of the odd-symmetric filter in the direction θ; and ξ is a minimal constant that avoids the denominator being zero.
In order to resist the differences in illumination, contrast and nonlinear radiation distortion, a method of feature point extraction and feature vector construction considering anisotropic weighted torque and absolute phase direction is proposed based on the Histogram of Absolute Phase Consistency Gradients (HAPCG) [37]. The study selects a circular local image patch centered at the feature points and calculates its absolute PC gradient and direction feature. By selecting the peak direction of the histogram of absolute phase consistency gradients as the main direction of the feature points, a logarithmic polar coordinate framework is used to construct descriptors. Finally, a descriptor containing 248 dimensional feature vectors is generated. In this study, the weighting value of the anisotropic moment map was set to 1, the feature point extraction threshold was set to 0.5, and the descriptor neighborhood window size was set to 50. Other parameters were kept consistent with the HAPCG code.

3.2. Feature Point Mapping

The vertices of texture images in 3D models are normalized coordinates. The feature point coordinates extracted based on the method in Section 3.1 are texture image point coordinates. In order to facilitate use, the extracted texture image point coordinates are converted to normalized coordinates. The conversion process is shown in Figure 8:
The 2D texture image is partitioned into a triangular mesh, where each 2D triangle corresponds to a specific surface triangle on the 3D model. By normalizing the coordinates of feature points within the 2D texture space, a mapping between the 2D texture images and their associated 3D surface triangles can be established. This mapping enables the transformation of normalized 2D texture coordinates to 3D spatial coordinates through affine transformation, as illustrated in Figure 8.
It should be noted that the origin of pixel coordinates is the upper left corner of the texture image, and the origin of normalized coordinates is the lower left corner of the texture coordinates, as shown in Figure 8. Therefore, the normalization of image point coordinates is shown in Equation (11).
$$\begin{cases}x=u\cdot wid\\ y=(1-v)\cdot hei\end{cases}$$
where wid and hei, respectively, represent the width and height (in pixels) of the texture image; x and y represent the coordinates of the feature points in the pixel coordinate system; and u and v represent the normalized texture coordinates of the feature points.
As shown in Figure 9, the normalized image point coordinates (x1, y1), (x2, y2) and (x3, y3) form a 2D triangle in the texture image, and the space points (X1, Y1, Z1), (X2, Y2, Z2) and (X3, Y3, Z3) form a 3D triangle on the surface of the 3D model. The vertices of the 2D triangle correspond one-to-one to those of the 3D triangle. Ignoring the Z coordinates of the 3D triangle, the mapping relationship can be defined as a 2D affine transformation, as shown in Equation (12).
$$\begin{bmatrix}x_{des}\\ y_{des}\end{bmatrix}=\begin{bmatrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}x_{src}\\ y_{src}\end{bmatrix}+\begin{bmatrix}a_{13}\\ a_{23}\end{bmatrix}$$
where (xsrc, ysrc) is the normalized image point coordinates; (xdes, ydes) is the spatial plane coordinate corresponding to (xsrc, ysrc); a11, a12, a21, a22 are scaling and rotation parameters; and a13, a23 are translation parameters.
Based on the vertex coordinates of the 2D texture triangle and the vertex coordinates of the 3D model triangle, the affine transformation parameters of Equation (12) can be calculated. After that, any point within the 2D texture triangle can be mapped into the 3D triangle, as shown by the mapping relationship between xi and Xi in Figure 8. Since point Xi lies in the plane formed by the vertices (X1, Y1, Z1), (X2, Y2, Z2) and (X3, Y3, Z3), the Z coordinate of point Xi can be calculated under the coplanarity condition, as shown in Equation (13).
$$Z_{i}=Z_{1}-\frac{\left(X_{i}-X_{1}\right)\left(Y_{2}Z_{3}-Y_{3}Z_{2}\right)+\left(Y_{i}-Y_{1}\right)\left(Z_{2}X_{3}-Z_{3}X_{2}\right)}{X_{2}Y_{3}-X_{3}Y_{2}}$$
where (Xi, Yi, Zi) are the spatial 3D coordinates of the feature points.
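The mapping of Section 3.2 can be summarized in a few lines of code. The sketch below, under the assumption that the tile triangle is non-degenerate and not vertical, converts a pixel position to normalized texture coordinates (Eq. (11)), solves the affine model of Eq. (12) from the three vertex correspondences, and recovers Z from the coplanarity condition (Eq. (13)); the function and argument names are illustrative.

```python
import numpy as np

def texture_point_to_3d(px, py, wid, hei, tri_uv, tri_xyz):
    """Map a texture-image feature point (px, py) to 3D model coordinates.

    tri_uv  : 3x2 array, normalized texture coordinates of the tile triangle
    tri_xyz : 3x3 array, corresponding 3D vertices of the model triangle
    """
    tri_uv = np.asarray(tri_uv, float)
    tri_xyz = np.asarray(tri_xyz, float)
    # Eq. (11): pixel coordinates -> normalized texture coordinates
    u, v = px / wid, 1.0 - py / hei
    # Eq. (12): solve the affine transform from (u, v) to the (X, Y) plane
    A = np.hstack([tri_uv, np.ones((3, 1))])          # rows [u_k, v_k, 1]
    ax = np.linalg.solve(A, tri_xyz[:, 0])            # a11, a12, a13
    ay = np.linalg.solve(A, tri_xyz[:, 1])            # a21, a22, a23
    X = ax @ [u, v, 1.0]
    Y = ay @ [u, v, 1.0]
    # Eq. (13): Z from the plane through the three 3D vertices
    p1, p2, p3 = tri_xyz
    n = np.cross(p2 - p1, p3 - p1)                    # plane normal (n_z != 0 assumed)
    Z = p1[2] - (n[0] * (X - p1[0]) + n[1] * (Y - p1[1])) / n[2]
    return np.array([X, Y, Z])
```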

3.3. Feature Point Matching

After the feature extraction described in Section 3.1, a large number of evenly distributed feature points and feature vectors can be obtained. To match the feature points extracted from the two-period models, a region-matching strategy is adopted. Specifically, in order to facilitate the storage and retrieval of 3D model data, the 3D model is subdivided into tiles during generation. For each tile of the base model, the tile is set as the reference region, as shown in Figure 10a. Then, the matching region in the model to be registered is determined based on the positional error between the two models. As shown in Figure 10b, the matching region should be larger than the reference region. By determining the overlapping areas of the two models in advance and implementing a partition matching strategy based on them, the matching efficiency can be effectively improved, and unnecessary matching operations in unrelated areas can be reduced.
The spatial coordinates of the feature points can be obtained as described in Section 3.2. In this study, the Euclidean distance is used as the similarity measure between feature vectors, as shown in Equation (14).
$$d(x,y)=\sqrt{\sum_{j=1}^{n}\left(x_{j}-y_{j}\right)^{2}}$$
where d(x, y) is the Euclidean distance between two feature descriptors; xj and yj are the jth components of the two feature vectors; and n is the dimension of the feature vectors.
After calculating the Euclidean distance between the feature descriptors of the two 3D models, the nearest neighbor algorithm is used to obtain preliminary matching results.
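The preliminary matching step can be sketched as a nearest-neighbor search over the descriptors using the Euclidean distance of Eq. (14). The implementation below uses a k-d tree for efficiency; the optional distance threshold max_dist is an illustrative parameter, not part of the original description.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_match(desc_ref, desc_tgt, max_dist=None):
    """For every descriptor of the base model, return the index of its nearest
    descriptor in the model to be registered (Euclidean distance, Eq. (14))."""
    desc_ref = np.asarray(desc_ref, float)
    desc_tgt = np.asarray(desc_tgt, float)
    dist, idx = cKDTree(desc_tgt).query(desc_ref, k=1)
    matches = [(i, int(j)) for i, (j, d) in enumerate(zip(idx, dist))
               if max_dist is None or d <= max_dist]
    return matches
```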

3.4. Estimate 3D Transformation Model

After obtaining the corresponding points of the two-period 3D models, the systematic spatial position error between them can be compensated by establishing a 3D transformation model. The common form of the 3D transformation model is shown in Equation (15).
$$\begin{bmatrix}X'\\ Y'\\ Z'\end{bmatrix}=\begin{bmatrix}\Delta X\\ \Delta Y\\ \Delta Z\end{bmatrix}+MR\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}$$
where (X, Y, Z) are the coordinates of the original coordinate system; (X′, Y′, Z′) are the coordinates of the target coordinate system; (ΔX, ΔY, ΔZ) are three translation parameters that represent the difference between the coordinate origins of two spatial coordinate systems; M is the scale transformation matrix, which represents the scaling degree of two spatial coordinate systems; and R is the coordinate rotation matrix. Refer to Equation (16) for details.
$$R=R_{\Psi}R_{\varphi}R_{\theta}=\begin{bmatrix}\cos\Psi\cos\theta-\sin\Psi\sin\varphi\sin\theta & \cos\Psi\sin\theta+\sin\Psi\sin\varphi\cos\theta & -\sin\Psi\cos\varphi\\ -\cos\varphi\sin\theta & \cos\varphi\cos\theta & \sin\varphi\\ \sin\Psi\cos\theta+\cos\Psi\sin\varphi\sin\theta & \sin\Psi\sin\theta-\cos\Psi\sin\varphi\cos\theta & \cos\Psi\cos\varphi\end{bmatrix}$$
where Rθ, Rφ, and RΨ are three rotation matrices. The coordinates are rotated sequentially around the Z-axis by θ, the new X-axis by φ, and the new Y-axis by Ψ to obtain the rotation matrices Rθ, Rφ, and RΨ, which are then combined to form the overall rotation matrix R.
As shown in Table 2, different types of 3D transformation models are obtained for different parameter choices. Because many factors influence the accuracy of 3D models, scale differences along the three axes are inevitable. The 9P method therefore involves three separate scale parameters, as shown in Equation (17).
$$\begin{bmatrix}X'\\ Y'\\ Z'\end{bmatrix}=\begin{bmatrix}\Delta X\\ \Delta Y\\ \Delta Z\end{bmatrix}+\begin{bmatrix}1+m_{x}&0&0\\ 0&1+m_{y}&0\\ 0&0&1+m_{z}\end{bmatrix}R\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}$$
where mx, my, and mz represent scale factors for the X, Y and Z axes, respectively.
If the scale transformation of the two models is consistent along the three axes, 7P can effectively represent the transformation relationship, as shown in Equation (18), which corresponds to the case where mx, my, and mz in Equation (17) are all equal.
$$\begin{bmatrix}X'\\ Y'\\ Z'\end{bmatrix}=\begin{bmatrix}\Delta X\\ \Delta Y\\ \Delta Z\end{bmatrix}+(1+m)R\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}$$
where m is the scale factor for each axis direction.
Scale variations along different axes necessitate independent scaling parameters mx, my, and mz. Changes in UAV altitude (causing Z-axis scaling), lens distortion (causing X/Y-axis asymmetric scaling), and terrain undulations can introduce scale differences in various axes. The 9P model addresses this through Equation (17), whereas 7P assumes uniform scaling, leading to residual errors. These errors are particularly evident when there are differences in conditions, such as drone equipment and flight routes.
However, the additional parameters of 9P increase the computational complexity. Therefore, if prior knowledge indicates that the scaling along each axis is consistent (e.g., the same sensor and flight route), 7P is preferred to improve efficiency; otherwise, 9P is recommended to ensure accuracy.
When the scale factor is equal to 1, the model reduces to a 3D rigid transformation, as shown in Equation (19).
$$\begin{bmatrix}X'\\ Y'\\ Z'\end{bmatrix}=\begin{bmatrix}\Delta X\\ \Delta Y\\ \Delta Z\end{bmatrix}+R\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}$$
If the two-period models neither undergo scale transformation nor rotation, but only shift the coordinate origin, the transformation model can be represented by 3P as shown in Equation (20).
$$\begin{bmatrix}X'\\ Y'\\ Z'\end{bmatrix}=\begin{bmatrix}\Delta X\\ \Delta Y\\ \Delta Z\end{bmatrix}+\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}$$
When the rotation angles between the two 3D coordinate systems are small, the rotation matrix can be linearized and approximated according to the principle of differential similarity transformation [38], and Equation (21) can be obtained by combining Equations (16) and (17).
L = B X
where L is the observation vector matrix, B is the coefficient matrix, and X is the estimated vector matrix. The specific form of Equation (21) refers to the Equation (22).
$$\begin{bmatrix}X'-X\\ Y'-Y\\ Z'-Z\end{bmatrix}=\begin{bmatrix}1&0&0&Y&0&-Z&X&0&0\\ 0&1&0&-X&Z&0&0&Y&0\\ 0&0&1&0&-Y&X&0&0&Z\end{bmatrix}\begin{bmatrix}\Delta X\\ \Delta Y\\ \Delta Z\\ \theta\\ \varphi\\ \Psi\\ m_{x}\\ m_{y}\\ m_{z}\end{bmatrix}$$
When a sufficient number of corresponding points of the two-period models are obtained, their spatial point coordinates are introduced into Equation (22), and the 3D transformation model parameters of the two-period 3D model can be obtained according to the adjustment estimation of the least squares principle. As shown in Equation (23):
$$X=\left(B^{T}PB\right)^{-1}B^{T}PL$$
where P is the weight matrix. As the feature points obtained are considered to have equal weights, the weight matrix P is an identity matrix.
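Under the small-rotation assumption of Eq. (22), estimating the nine parameters reduces to an ordinary least-squares problem (Eq. (23) with P equal to the identity). The sketch below illustrates this; the function names and the choice of NumPy's lstsq solver are assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_9p(src, dst):
    """Least-squares estimate of the linearized 9P transform (Eqs. (21)-(23)).

    src, dst : Nx3 arrays of corresponding 3D points (base model / model to be
    registered). Returns [dX, dY, dZ, theta, phi, psi, mx, my, mz]."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    B = np.zeros((3 * len(src), 9))
    L = (dst - src).reshape(-1)                       # observation vector of Eq. (22)
    for k, (X, Y, Z) in enumerate(src):
        B[3 * k]     = [1, 0, 0,  Y,  0, -Z, X, 0, 0]
        B[3 * k + 1] = [0, 1, 0, -X,  Z,  0, 0, Y, 0]
        B[3 * k + 2] = [0, 0, 1,  0, -Y,  X, 0, 0, Z]
    params, *_ = np.linalg.lstsq(B, L, rcond=None)    # Eq. (23) with P = I
    return params

def apply_9p(pts, p):
    """Apply the linearized 9P model to Nx3 points (small-angle form of Eq. (17))."""
    dX, dY, dZ, th, ph, ps, mx, my, mz = p
    X, Y, Z = pts[:, 0], pts[:, 1], pts[:, 2]
    return np.column_stack([X + dX + th * Y - ps * Z + mx * X,
                            Y + dY - th * X + ph * Z + my * Y,
                            Z + dZ - ph * Y + ps * X + mz * Z])
```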

3.5. Incorrect Point Removal

After completing the two-period 3D model matching described in Section 3.3, the matched feature points contain a large number of outliers, which makes it very difficult to accurately estimate the 3D transformation model by randomly selecting feature points. Therefore, when estimating the parameters of the 3D transformation model based on Equation (21), an algorithm based on RANSAC is adopted in this study to remove the outliers and obtain a robust estimate [28]. The modified RANSAC algorithm eliminates outliers in 3D space, accounting for the complex geometric transformations that simple 2D RANSAC cannot handle. Specifically, a large error threshold is first set to identify inliers. Three pairs of matched points are randomly selected to compute the transformation parameters, and the number of inliers is then compared against a predefined threshold; if it exceeds this threshold, the hypothesis is considered valid. This process is repeated by randomly selecting new sets of three matched points until the dataset is reduced below a certain size or a specified number of iterations is reached. Finally, the error threshold is progressively reduced and the process is iterated to refine the matching results. The corresponding points obtained in this way may still contain a small number of outliers; after these remaining incorrect points have been eliminated by RANSAC, the correct points are obtained from the corresponding points.
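The outlier-removal idea can be illustrated with a basic RANSAC loop over the 3D correspondences; this is a simplified sketch in the spirit of the procedure above (it omits the progressive tightening of the error threshold) and reuses estimate_9p and apply_9p from the previous sketch. The iteration count and threshold are illustrative.

```python
import numpy as np

def ransac_correct_points(src, dst, n_iter=2000, thresh=1.0):
    """Keep the correspondences consistent with a 9P transform estimated from
    random minimal samples of three matched point pairs."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(0)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)   # minimal sample
        p = estimate_9p(src[idx], dst[idx])
        err = np.linalg.norm(apply_9p(src, p) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # refit on the retained correct points for the final parameters
    return best, estimate_9p(src[best], dst[best])
```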

3.6. Analysis of Feature Points

3.6.1. Analysis of Feature Point Distribution

In order to analyze the uniformity of the distribution of feature points extracted by different methods, the average nearest neighbor distance Davg and the metric K are used. First, the Euclidean distances between each extracted feature point and all other points are computed, and the distance to its nearest neighbor is identified. Then, the nearest neighbor distances of all feature points are aggregated and their mean value Davg is computed, as shown in Equation (24).
$$D_{avg}=\frac{1}{\nu}\sum_{\mu=1}^{\nu}d_{\min}\left(X_{\mu},Y_{\mu}\right)$$
where dmin(Xμ, Yμ) is the Euclidean distance from the feature point (Xμ, Yμ) to its nearest neighboring feature point, and ν is the number of extracted feature points.
Finally, the ratio of feature points whose nearest neighbor distances are less than the average nearest neighbor distance is determined to derive the metric K, as shown in Equation (25).
$$K=\frac{\lambda}{\nu}$$
where λ is the number of feature points whose nearest neighbor distances are less than Davg.

3.6.2. Analysis of Feature Point Positioning Accuracy

In order to analyze the positioning accuracy of the extracted feature points, the mean error (ME) and the root mean square error (RMSE) of the registered points are calculated. Taking the X direction as an example, ME and RMSE are given by Equation (26).
$$\begin{cases}ME=\dfrac{1}{\nu}\displaystyle\sum_{\mu=1}^{\nu}\left(X'_{\mu}-X_{\mu}\right)\\[4pt] RMSE=\sqrt{\dfrac{1}{\nu}\displaystyle\sum_{\mu=1}^{\nu}\left(X'_{\mu}-X_{\mu}\right)^{2}}\end{cases}$$
where X′μ is the X coordinate of a point in the registered model, and Xμ is the X coordinate of the corresponding point in the base model.
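The three evaluation quantities of Section 3.6 are straightforward to compute; the sketch below evaluates Davg and K for a set of feature points and the per-axis ME and RMSE for corresponding points of the registered and base models. The use of a k-d tree for the nearest-neighbor distances is an implementation choice, not part of the original description.

```python
import numpy as np
from scipy.spatial import cKDTree

def distribution_and_accuracy(points, registered, base):
    """D_avg and K (Eqs. (24)-(25)) for feature points, and per-axis ME / RMSE
    (Eq. (26)) for registered-vs-base corresponding points."""
    points = np.asarray(points, float)
    d, _ = cKDTree(points).query(points, k=2)   # k=2: first hit is the point itself
    d_min = d[:, 1]                             # nearest-neighbour distance
    d_avg = d_min.mean()                        # Eq. (24)
    K = float((d_min < d_avg).mean())           # Eq. (25)
    diff = np.asarray(registered, float) - np.asarray(base, float)
    ME = diff.mean(axis=0)                      # Eq. (26), one value per axis
    RMSE = np.sqrt((diff ** 2).mean(axis=0))
    return d_avg, K, ME, RMSE
```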

4. Experiment Results and Discussion

4.1. Quality Analysis of 3D Models

To reveal the different terrain features and regional variation patterns between the two study areas, a comparative analysis was conducted utilizing the multi-temporal DSM and the elevation difference results derived from the two-period DSM.
As shown in Figure 11, the DSM and the elevation difference of the two-period DSMs based on the two-period 3D models in S1 were obtained. The terrain of S1 slopes from high in the west to low in the east, with an overall elevation difference of only 36 m between its highest and lowest points, as shown in Figure 11a. As shown in Figure 11b, the elevation difference between the two-period DSMs is not significant. Most of the changes were caused by inconsistencies in the locations of land features. For example, a change in vehicle position resulted in a significant elevation difference in the red box.
As shown in Figure 12, the DSM and elevation difference of the two-period DSM based on the two-period 3D model in S2 were obtained. The terrain of S2 is characterized by a low central area and higher elevations in the north and south. The terrain at S2 has significant undulations, with a total elevation change of 372 m, as shown in Figure 12a. As shown in Figure 12b, the elevation difference of the two-period DSM is significant. For example, significant changes have occurred in the two-period models, resulting in a large elevation difference in the red box. In addition, there were inconsistencies in the local deformations of the two-period models, according to the elevation difference in S2.
As shown in Figure 13, the fusion of the two 3D models is shown, and it can be observed that the two-period 3D models cannot be directly fused. The difference between the two-period models in S1 is small; however, it is significant in S2, which is consistent with the results in Figure 12.

4.2. Distribution and Accuracy Analysis of Feature Points Matching

In order to verify the effectiveness of the phase congruency feature extraction method proposed in this study, four groups of texture images were selected for feature point extraction and matching, as shown in Figure 14a, Figure 14e, Figure 14i and Figure 14m, respectively. Figure 14a shows the texture image in the study area of S1, which has continuous texture content and a large image size. Figure 14e,i are both texture images in the study area of S2, where Figure 14e has low texture content and granular texture size, while Figure 14i has obvious differences in radiation distortion. Figure 14m shows the lighting differences in the same area of the texture image.
The results extracted using the SIFT method and the SuperGlue method are also compared and analyzed. Based on the SIFT method, feature points are extracted for Figure 14a, Figure 14e, Figure 14i and Figure 14m, with the results shown in Figure 14b, Figure 14f, Figure 14j and Figure 14n, respectively. For the SuperGlue method, the results are shown in Figure 14c, Figure 14g, Figure 14k and Figure 14o, respectively. It should be noted that the SuperGlue method used the original outdoor model from the published code without adjusting any parameters. Figure 14d,h,l,p show the distribution of feature points obtained based on the method of this study. From the results of Figure 14b,d, it can be seen that both SIFT and PC can extract a large number of feature points. The SuperGlue method extracts fewer feature points in the continuous texture area (Figure 14c) than the other two methods. The results indicate that the applicability of SuperGlue to texture image matching requires further investigation. In the fragmented texture area, PC outperforms both SIFT and SuperGlue. In the image pair with significant radiometric distortion, both SIFT and SuperGlue struggle to extract effective feature points, while PC maintains better performance. In the area with substantial lighting differences, both SuperGlue and PC can extract evenly distributed feature points: SIFT is sensitive to changes in lighting, while SuperGlue and PC are robust to them. In addition, the LoFTR algorithm, an end-to-end matching method without a feature detection step, cannot meet the needs of feature point coordinate mapping. It should be noted that all methods obtain a certain number of feature points in non-textured image areas; these outliers need to be eliminated.
To ensure optimal performance of the SIFT algorithm during feature extraction, data were collected on clear, sunny days to minimize the impact of variable illumination conditions. The information on the feature points extracted by SIFT and PC in the S1 and S2 study areas is shown in Table 3. The experimental results reveal significant regional variations in feature extraction performance. In the vegetation-covered S1 area, both SIFT and PC demonstrate effective feature detection capabilities, with the PC method extracting 42,225 correct points compared to 8506 points from SIFT. However, in the desert-dominated S2 area, a marked performance divergence emerges: the SIFT method exhibits limited robustness, successfully detecting only 79 correct points. The Davg values of S1 and S2 obtained using SIFT are smaller than those obtained using PC, and the K values obtained using SIFT are larger than those obtained using PC. Therefore, the method proposed in this study can obtain a larger number of uniformly distributed feature points. However, the computational efficiency of feature extraction with the PC method is much lower than that of SIFT: the time required to extract feature points from each texture image increases from around 0.1 s with SIFT to about 3 s with PC.
The feature points, corresponding points, and correct points extracted by SIFT in S1 are shown in Figure 15a, Figure 15b and Figure 15c, respectively. Figure 15d–f show the distribution of the feature points, corresponding points, and correct points extracted by PC in S1.
The feature points, corresponding points, and correct points extracted by SIFT in S2 are shown in Figure 16a, Figure 16b, and Figure 16c, respectively. Figure 16d–f show the distribution of the feature points, corresponding points, and correct points extracted by PC in S2. The corresponding points extracted by SIFT are distributed on the upper side of the model, while the corresponding points extracted by PC method have a better distribution.
As shown in Figure 17, both SIFT and PC can extract a large number of correct points in S1, but the point distribution extracted by the PC method is denser. However, the correct points extracted by SIFT in S2 are concentrated on the left side of the model, while the correct points extracted by PC are well distributed in the model.

4.3. Accuracy Analysis of 3D Model Registration

In this study, two experiments were designed to evaluate the geometric registration accuracy of a multi-temporal 3D model. The first experiment, based on the same geometric registration model, is used to evaluate the reliability of the corresponding points obtained by automatic matching. The second experiment, using different types of geometric registration models with the same correct points, is used to assess the adaptability of the registration models.

4.3.1. Reliability Evaluation of Points

For the first experiment, the manual measurement (MM) method was used to obtain 14 GCPs in S1 and 12 GCPs in S2, and their distributions are shown in Figure 18. For each method, the same geometric registration model, 7P, was uniformly used.
As shown in Figure 19, DX, DY, and DZ represent the coordinate differences of the points in the X, Y, and Z directions in S1. From Figure 19a, it can be seen that the error of the corresponding points generated directly using POS information in the two-period models of S1 is obvious. When the methods of MM, SIFT, and PC are used, the positioning accuracy of the corresponding points is improved, as shown in Figure 19b–d. Compared to the positioning accuracy of the corresponding points, the accuracy of the correct points obtained by SIFT and PC is better, as shown in Figure 19e,f.
According to the results in Table 4, the overall ME of the corresponding point coordinates of the 3D model generated directly from the airborne POS data is 0.863 m. The positioning accuracy is low, so geometric registration of the 3D models is necessary. The overall ME with the MM method is reduced from 0.863 m to 0.046 m; however, a small systematic error remains. Compared with the MM method, the overall ME of the corresponding and correct point coordinates obtained with the method proposed in this study is further reduced. The overall ME of the correct point coordinates obtained with SIFT is lower than that of its corresponding point coordinates. Compared with the overall RMSE of the corresponding point coordinates obtained with SIFT and PC (0.122 m and 0.152 m), that of the correct point coordinates is reduced to 0.102 m and 0.110 m, respectively, indicating that the further removal of erroneous corresponding points is necessary.
As shown in Figure 20, the coordinate differences in S2 in the three directions are presented. From Figure 20a, it can be seen that the error of the corresponding points generated directly using POS information in the two-period models of S2 is relatively large, especially in the X direction, where it lies in the range of 80–120 m. This indicates that there is a significant error in the coordinate information of the UAV POS data; therefore, additional coordinate transformation data are required for the two-period models to achieve accurate registration. When the MM method is used, the positioning accuracy of the corresponding points is improved, as shown in Figure 20b. The corresponding points obtained by SIFT and PC still have significant systematic errors, as shown in Figure 20c,d. Compared to the positioning accuracy of the corresponding points, the accuracy of the correct points obtained by SIFT and PC is better, as shown in Figure 20e,f. However, the number of points has decreased significantly, especially for those obtained by SIFT.
According to the results in Table 5, the overall ME and RMSE of the corresponding point coordinates of the 3D model generated directly from the airborne POS data are 107.277 m and 108.211 m, which indicates that there are significant differences between the two-period 3D models. The overall ME with the MM method is reduced to 3.105 m, but large systematic errors remain. Compared with SIFT, the overall ME and RMSE of the corresponding and correct point coordinates obtained with PC are further reduced. Compared with the overall RMSE of the corresponding point coordinates obtained with SIFT and PC (13.326 m and 10.485 m), that of the correct point coordinates is reduced to 9.484 m and 8.360 m, respectively, which still indicates significant differences between the two-period models.
To enhance the credibility of the experimental results, statistical significance tests were conducted. Specifically, paired t-tests were performed to compare the ME between PC and the POS-based method in both study areas (S1 and S2). The results demonstrate statistically significant reductions in ME for both regions, with the calculated t-statistics far exceeding the critical values (p < 0.001). The 95% confidence intervals for the RMSE are (0.098 m, 0.122 m) in S1 and (7.8 m, 8.9 m) in S2. These results confirm that the PC method achieves significantly higher accuracy than the POS-based approach, with narrow confidence intervals reflecting the robustness of the proposed method.
Comparing the experiments conducted in S1 and S2, it was found that both MM and the method proposed in this study achieve good results in S1. However, there is a significant difference between the results obtained by MM and by the proposed method in S2, with the error of automatic registration decreasing from 3.105 m to within 1 m. Overall, the registration accuracy of S2 is much lower than that of S1. The significantly lower registration accuracy of the S2 area compared to S1 is attributed to two main factors. First, the two-period 3D models of the S2 region were acquired over a prolonged time interval using different UAV platforms and following distinct flight routes. This variation in data acquisition conditions, including differences in flight altitude, speed, and sensor specifications, introduces inconsistencies in data quality. Second, S2 experienced substantial topographical changes during the time interval. Such alterations in terrain lead to discrepancies in the feature points and the overall spatial structure between the two-period models, complicating the registration process and increasing the errors. As shown in Figure 21, owing to the influence of camera parameters, flight altitude, flight route, weather, wind speed, wind direction, camera attitude and other factors at the time of the shooting, image quality problems lead to model errors in the 3D models, resulting in varying degrees of model deformation in different regions of a single model or in the same region of the two-period models.
As shown in Figure 22, the corresponding points of the two-period models have been successfully obtained using PC. However, the coordinate differences between the two models are still significant after registration. Therefore, these points cannot be used for estimating the 3D transformation model and need to be removed. Due to the significant differences in the two-period models of S2, a large number of corresponding points were removed.
From the first experiment, the geometric registration of 3D models is crucial for establishing a unified spatial reference frame across multi-temporal 3D models. This ensures consistency and comparability in spatial analysis and measurements. The method proposed in this study enables the acquisition of a 3D transformation model with higher precision than MM. This reduces human dependency and enhances the efficiency of the registration process.

4.3.2. Reliability Evaluation of Geometric Registration Models

For the second experiment, different types of geometric registration models with the same correct points in S1 and S2 are used to assess the adaptability of the registration models. Three geometric registration models, 3P, 7P, and 9P, are used for comparison.
In Table 6, the ME and RMSE of the correct points improve progressively from 3P to 7P to 9P. The overall ME of the 3P method in S1 reaches 0.036 m, indicating that there is no significant coordinate rotation or scale deformation between the two-period models. Both the 7P and 9P methods can be used for estimating the 3D transformation model. The overall ME in S2 is less than 1 m, but the overall RMSE is greater than 8 m. This indicates that there is no significant systematic bias in the correct points of the two-period models; however, the dispersion is relatively large, which is due to the significant differences between the two-period models.
From the second experiment, the accuracy of the 3P geometric registration model is insufficient. The 7P model can correct most of the systematic errors, and 9P is the optimal choice.

5. Conclusions

This study proposes an automatic registration method for multi-temporal 3D models. The method leverages the phase congruency (PC) algorithm to extract robust feature points from the texture images of 3D models, even under challenging conditions such as low texture, nonlinear radiation distortion, and illumination differences. By mapping these 2D texture coordinates to 3D spatial coordinates and estimating a 3D transformation model, the proposed method ensures geometric consistency and avoids reliance on GCPs. In addition, an incorrect point removal method for 3D geometric registration is proposed to improve the reliability of the feature points used for estimating the transformation models. The first experiment demonstrates that the proposed method performs remarkably well in improving the positioning accuracy of feature point coordinates. Under conditions of relatively flat terrain, abundant texture, consistent flight conditions, and an unaltered study area, the systematic error can be effectively reduced to below 0.001 m through the use of high-precision instrumentation, careful data processing, and independent validation. Even in areas where the 3D model undergoes significant changes, the error remains as low as 0.5 m, which is far superior to the 3.105 m of the MM method. The second experiment further compares four transformation models and demonstrates that the 9P model achieves the highest accuracy for multi-temporal 3D models with non-uniform scaling. Specifically, 3P is not applicable for 3D geometric registration, 7P can meet the general requirements of 3D geometric registration, and 9P stands out as the optimal choice among all tested models.
The computational time for feature extraction using PC is significantly higher than traditional methods like SIFT. Future work will explore further optimizations, such as parallel computing, adaptive parameter tuning and hardware acceleration, to enhance computational efficiency while preserving the robustness of PC. In desert areas, the current method still has certain limitations. For example, the lack of distinct features and the dynamic nature of the terrain pose challenges for robust feature point detection and precise 3D model alignment. To enhance the extraction effect and improve the registration accuracy of 3D models in desert areas, further research is needed. This could involve developing specialized algorithms that can better handle the unique challenges of desert environments. Additionally, incorporating additional data sources, such as LiDAR or multispectral imagery, may help in overcoming the limitations of texture-based extraction.
The experimental results show that the coordinates obtained from POS data have significant positioning errors, and that the method proposed in this study can reduce the coordinate errors between the two-period models. This method requires neither measuring GCPs before data collection nor manual measurements for 3D geometric registration, ensuring that the resulting 3D models share a high-precision, unified spatial reference, streamlining the workflow, reducing resource intensity, and improving economic efficiency.

Author Contributions

Conceptualization, C.R.; methodology, C.R.; software, C.R.; validation, C.R. and K.F.; formal analysis, K.F.; investigation, H.S.; resources, H.S.; data curation, S.L.; writing—original draft preparation, K.F.; writing—review and editing, K.F.; visualization, S.L.; supervision, C.R.; project administration, K.F.; funding acquisition, C.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Research on the Key Technologies of Drone Remote Sensing Patrol and Monitoring in Clean Energy Engineering, grant number XBY-YBKJ-2023-23, the Research and Application of AI-Integrated Monitoring and Early Warning Technology in Hydropower and Water Conservancy Engineering, grant number XBY-KJ-2024-37, and the National Natural Science Foundation of China, grant number 42474029.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Author Haixing Shang was employed by the company Northwest Engineering Corporation Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. The Digital Orthophoto Map (DOM) of S1.
Figure 2. The DOM of S2.
Figure 3. UAV platforms and camera lenses. (a) FEIMA D2000S. (b) DJI M300 RTK. (c) DJI Mavic 2 Pro.
Figure 4. The 3D models. (a) The 3D model in S1. (b) The 3D model in S2.
Figure 5. The procedure diagram of the proposed multi-temporal 3D model registration method.
Figure 6. The composition of the 3D model. (a) The 3D model. (b) The triangles formed by the 3D vertices. (c) The texture image of the 3D model and the texture coordinates corresponding to the vertices.
Figure 7. Feature detection. (a) The original image. (b) The normalized maximum moment map. (c) The feature points. (d–f) correspond to (a–c) and show the second-period images.
Figure 8. Coordinate normalization of feature points.
Figure 9. The mapping relationship between texture triangle and model triangle.
Figure 10. The matching strategy. (a) The base model. (b) The model to be registered.
Figure 11. The Digital Surface Model (DSM) and the elevation difference of the two-period DSMs in S1. (a) The DSM. (b) The elevation difference. The red box marks an area with significant local changes.
Figure 12. The DSM and the elevation difference of the two-period DSMs in S2. (a) The DSM. (b) The elevation difference. The red box marks an area with significant local changes.
Figure 13. The fusion effect of the two-period models. (a) The models in S1. (b) The models in S2.
Figure 14. The texture images and the distribution of feature points. (a) The continuous texture image. (e) The fragmented texture image. (i) The image with radiation differences. (m) The image with lighting differences. (b,f,j,n), (c,g,k,o), and (d,h,l,p) are the feature points extracted from (a,e,i,m) by SIFT, SuperGlue, and PC, respectively.
Figure 15. Distribution of feature points in S1. (a) The feature points extracted by SIFT are shown in red. (b) The corresponding points extracted by SIFT are shown in cyan. (c) The correct points extracted by SIFT are shown in pink. (d–f) correspond to (a–c) and are the points extracted by PC.
Figure 16. Distribution of feature points in S2. (a) The feature points extracted by SIFT are shown in red. (b) The corresponding points extracted by SIFT are shown in cyan. (c) The correct points extracted by SIFT are shown in pink. (d–f) correspond to (a–c) and are the points extracted by PC.
Figure 17. Distribution of feature points in the local 3D model. (a) The correct points extracted by SIFT in S1. (b) The correct points extracted by PC in S1. (c,d) correspond to (a,b) and are the points extracted in S2.
Figure 18. Distribution of GCPs in the 3D model. (a) The GCPs in S1. (b) The GCPs in S2.
Figure 19. The point coordinate differences in S1. (a) The difference obtained from POS data. (b) The difference obtained by MM. (c) The difference of the corresponding point coordinates obtained by SIFT. (d) The difference of the correct point coordinates obtained by SIFT. (e) The difference of the corresponding point coordinates obtained by PC. (f) The difference of the correct point coordinates obtained by PC.
Figure 20. The point coordinate differences in S2. (a) The difference obtained from POS data. (b) The difference obtained by MM. (c) The difference of the corresponding point coordinates obtained by SIFT. (d) The difference of the correct point coordinates obtained by SIFT. (e) The difference of the corresponding point coordinates obtained by PC. (f) The difference of the correct point coordinates obtained by PC.
Figure 21. Inconsistent local deformation of 3D model. The red box represents the area where changes have occurred. (a) Region before the change. (b) Region after the change.
Figure 22. The corresponding points in areas of inconsistent local deformation of the models. The red and green points represent the corresponding points extracted from the first and second period models, respectively. (a) The first region in the first period model. (b) The first region in the second period model. (c) The second region in the first period model. (d) The second region in the second period model.
Table 1. Camera parameters.

UAV Platform | Hovering Time (min) | Camera | Focal Length (mm) | Sensor Size (pixel)
FEIMA D2000 | 60 | SONY ILCE-7RM4 | 25 | 9504 × 6336
DJI M300 RTK | 55 | DJI Zenmuse P1 | 35 | 4096 × 2730
DJI Mavic 2 Pro | 20 | Hasselblad L1D-20c | 10 | 5742 × 3648
Table 2. 3D Transformation model.

Model Type | Transformation | Degrees of Freedom | Geometric Characteristics Preserved
three parameters (3P) | translation | 3 | orientation
six parameters (6P) | rigid | 6 | lengths
seven parameters (7P) | similarity | 7 | angles
nine parameters (9P) | affine | 9 | parallelism
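To make the 7P (similarity) row of Table 2 concrete, the sketch below estimates scale, rotation, and translation from corresponding 3D points with a standard closed-form (Umeyama-style) solution. It is an illustrative sketch only; the function name, the array layout, and the closed-form solver are assumptions for this example and may differ from the adjustment actually used in the paper.

```python
import numpy as np

def fit_similarity_7p(src, dst):
    """Closed-form (Umeyama-style) 7-parameter similarity fit so that
    dst_i ≈ s * R @ src_i + t for matched 3D points.
    src, dst: (N, 3) arrays of corresponding points from the two models."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(U) * np.linalg.det(Vt))  # keep det(R) = +1
    S = np.diag([1.0, 1.0, sign])
    R = U @ S @ Vt                                        # rotation matrix
    s = np.trace(np.diag(D) @ S) / np.mean(np.sum(src_c**2, axis=1))  # scale
    t = mu_d - s * R @ mu_s                               # translation
    return s, R, t

# Apply the fitted transform to the vertices of the model to be registered:
# registered = s * (vertices @ R.T) + t
```

Setting s = 1 reduces this to the 6P rigid case, and keeping only the translation t gives the 3P case listed in Table 2.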
Table 3. Feature point extraction.

Study Area | Match Method | Feature Points | Corresponding Points | Correct Points | Davg (m) | K | Computational Efficiency (s)
S1 | SIFT | 26,833 | 8506 | 7248 | 0.260 | 0.701 | 0.099
S1 | PC | 72,501 | 42,225 | 20,116 | 0.132 | 0.795 | 3.232
S2 | SIFT | 16,784 | 2996 | 79 | 8.417 | 0.709 | 0.116
S2 | PC | 20,122 | 3369 | 637 | 2.149 | 0.903 | 3.893
Table 4. Comparison of feature point accuracy of different methods in S1.

Metric | POS | MM | SIFT Corresponding Points | SIFT Correct Points | PC Corresponding Points | PC Correct Points
ME X (m) | −0.432 | 0.008 | −0.003 | −1.821 × 10⁻⁵ | −2.346 × 10⁻⁵ | −1.920 × 10⁻⁵
ME Y (m) | −0.130 | 0.045 | 0.003 | −6.091 × 10⁻⁶ | 5.936 × 10⁻⁷ | 1.802 × 10⁻⁷
ME Z (m) | 0.736 | 0.001 | 2.786 × 10⁻⁴ | −2.182 × 10⁻⁵ | −1.921 × 10⁻⁵ | −2.478 × 10⁻⁵
ME Overall (m) | 0.863 | 0.046 | 0.005 | 2.906 × 10⁻⁵ | 3.033 × 10⁻⁵ | 3.135 × 10⁻⁵
RMSE X (m) | 0.433 | 0.022 | 0.022 | 0.009 | 0.022 | 0.016
RMSE Y (m) | 0.196 | 0.156 | 0.120 | 0.102 | 0.149 | 0.109
RMSE Z (m) | 0.737 | 0.016 | 0.010 | 0.009 | 0.016 | 0.010
RMSE Overall (m) | 0.877 | 0.159 | 0.122 | 0.102 | 0.152 | 0.110
Table 5. Comparison of feature point accuracy of different methods in S2.

Metric | POS | MM | SIFT Corresponding Points | SIFT Correct Points | PC Corresponding Points | PC Correct Points
ME X (m) | 101.426 | −1.624 | 0.642 | 0.424 | 0.369 | 0.427
ME Y (m) | −31.851 | 2.634 | 0.667 | 0.837 | −0.179 | −0.074
ME Z (m) | 14.372 | −0.250 | 0.018 | −0.044 | 0.175 | 0.145
ME Overall (m) | 107.277 | 3.105 | 0.926 | 0.939 | 0.446 | 0.457
RMSE X (m) | 101.879 | 13.155 | 8.727 | 4.068 | 6.712 | 6.024
RMSE Y (m) | 33.007 | 8.345 | 8.371 | 7.003 | 6.976 | 3.535
RMSE Z (m) | 15.522 | 5.608 | 5.599 | 4.934 | 3.640 | 0.457
RMSE Overall (m) | 108.211 | 16.557 | 13.326 | 9.484 | 10.485 | 8.360
Table 6. Comparison of correct point accuracy of different geometric registration models.

Metric | S1 3P | S1 7P | S1 9P | S2 3P | S2 7P | S2 9P
ME X (m) | 0.004 | −1.920 × 10⁻⁵ | 8.648 × 10⁻⁸ | 2.839 | 0.427 | 0.192
ME Y (m) | −0.035 | 1.802 × 10⁻⁷ | 2.878 × 10⁻⁶ | 4.367 | −0.074 | 0.227
ME Z (m) | 0.001 | −2.478 × 10⁻⁵ | 1.780 × 10⁻⁶ | −0.540 | 0.145 | 0.126
ME Overall (m) | 0.036 | 3.135 × 10⁻⁵ | 6.026 × 10⁻⁷ | 5.237 | 0.457 | 0.323
RMSE X (m) | 0.016 | 0.016 | 0.016 | 7.484 | 4.594 | 4.511
RMSE Y (m) | 0.127 | 0.109 | 0.084 | 11.881 | 6.024 | 6.115
RMSE Z (m) | 0.011 | 0.010 | 0.008 | 4.242 | 3.536 | 3.129
RMSE Overall (m) | 0.129 | 0.110 | 0.086 | 14.668 | 8.360 | 8.218
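The ME and RMSE values reported in Tables 4–6 can be recomputed from the per-point coordinate differences at the check points. The snippet below is a minimal sketch under the assumption that the differences are stored as an (N, 3) NumPy array and that the "Overall" figures combine the three axes in quadrature; the function name is hypothetical.

```python
import numpy as np

def error_statistics(diff):
    """ME and RMSE per axis and overall for an (N, 3) array of coordinate
    differences (registered minus reference coordinates)."""
    me_xyz = diff.mean(axis=0)                    # mean error per axis (X, Y, Z)
    rmse_xyz = np.sqrt((diff ** 2).mean(axis=0))  # RMSE per axis
    me_overall = np.linalg.norm(me_xyz)           # quadrature combination
    rmse_overall = np.linalg.norm(rmse_xyz)
    return me_xyz, me_overall, rmse_xyz, rmse_overall
```

Combining the per-axis RMSEs in quadrature is mathematically equivalent to taking the root mean square of the 3D point-to-point distances, which is how the overall values above appear to have been computed.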
