Article

Coarse-to-Fine Image Registration for Multi-Temporal High Resolution Remote Sensing Based on a Low-Rank Constraint

1 School for Informatics and Cyber Security, People’s Public Security University of China, Beijing 100038, China
2 School of Mechanical Electronic & Information Engineering, China University of Mining and Technology, Beijing 100083, China
3 College of Geoscience and Surveying Engineering, China University of Mining and Technology, Beijing 100083, China
4 School of Astronautics, Beihang University, Beijing 102206, China
5 Beijing Engineering Research Center of Smart Mechanical Innovation Design Service, Beijing Union University, Beijing 100101, China
6 College of Robotics, Beijing Union University, Beijing 100101, China
7 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
8 Remote Sensing Center of Public Security, People’s Public Security University of China, Beijing 100038, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(3), 573; https://doi.org/10.3390/rs14030573
Submission received: 16 November 2021 / Revised: 7 January 2022 / Accepted: 20 January 2022 / Published: 25 January 2022

Abstract

For multi-temporal high resolution remote sensing images, image registration is important but difficult due to the high resolution and low-stability land-cover. In particular, changes in land-cover, solar altitude angle, radiation intensity, and terrain fluctuation distortion in the overlapping areas produce different image characteristics. These time-varying properties cause traditional registration methods relying on known reference information to fail. Therefore, in this paper we propose a comprehensive coarse-to-fine registration (CCFR) algorithm. First, we design a low-rank constraint-based batch reference extraction (LRC-BRE) method. Under the low-rank constraint, the stable features with high spatial co-occurrence can be reconstructed via matrix decomposition and are set as reference images for batch registration. Second, we improve general feature-based registration with block feature matching and local linear transformation (BFM-LLT) operators, including match outlier filtering (MOF) based on regional mutual information and dual-weighted block fitting (DWBF). Finally, CCFR is formed by combining LRC-BRE and BFM-LLT. Experimental results show that the proposed method has a good batch alignment effect, especially in the registration of image pairs with large differences. The proposed CCFR achieves a significant performance improvement over many state-of-the-art registration algorithms.

Graphical Abstract

1. Introduction

Multi-temporal remote sensing image registration is a key preprocessing step for remote sensing image fusion [1,2], change detection [3,4], super-resolution reconstruction [5], and collaborative analysis [6]. In recent years, research on remote sensing image registration has mostly focused on mitigating the effects of radiation differences and geometric distortions by constructing robust feature descriptors [7,8], better screening of mismatched features [9,10], integrating multiple methods [11,12], and so on. Although multi-temporal image registration in complex terrain areas has been studied [13,14], few studies are devoted to accurate registration in complex terrain areas with low-stability land-cover. The low stability of features in this paper mainly refers to land-cover changes caused by seasonal rotation, geological alteration, human activities, and other factors, which lead to large imaging differences of the same plot across time series. Under high resolution imaging conditions, common low-stability features include mining areas, cultivated farmland, and seasonal deciduous woody vegetation, while complex terrain mainly refers to undulating mountains and large-scale artificial elevation features (such as dams and buildings). For multi-temporal remote sensing images, radiation differences, including differences of radiation intensity and changes of solar altitude angle, are also important factors leading to differences in ground object imaging; in particular, changes of solar altitude angle lead to different degrees of shadow occlusion and contour diversity of mountainous and elevated ground objects. In addition, differences of shooting inclination lead to complex geometric deformations of mountainous and elevated ground objects.
Mountainous areas with varying seasonal vegetation coverage and irregular mining and beneficiation activities are typical of low-stability land-cover and complex terrain, as shown in Figure 1. Due to the mixed influence of the above factors, the remote sensing images exhibit different gray-level, texture, and contour characteristics across periods, which brings challenges to accurate temporal registration.
Currently, remote sensing images are registered based on the geocoding generated from satellite orbit parameters [15]. Limited by the accuracy of satellite positioning and control, these technologies can largely eliminate global geometric distortion but still produce images with offsets of ten to dozens of pixels. For mountainous areas with complex terrain, improving the registration accuracy based on geocoding requires correcting the image point offsets caused by terrain fluctuation with the help of a high-precision digital elevation model (DEM) [16] and a reference image map corrected by ground control points. Limited by the accuracy of available DEMs and the lack of reference images corrected by sufficient ground control points, different degrees of alignment error remain after ortho-rectification in severely distorted local areas of mountainous regions [17,18]. Higher image alignment accuracy therefore requires further correction by an automatic image registration method. In addition, the low-stability surface makes it difficult to automatically match control points between the sensed image and the reference image, and even accurate manual auxiliary matching is a challenging task.
The performance of an image registration algorithm depends on its image matching step and the transformation function. Common image matching drivers include intensity matching, feature matching, density matching, and other similarity matching algorithms. According to different domains of transformation, spatial transformation estimation methods include global mapping models, local mapping models, and elastic registration [19,20].
The intensity-based method usually uses the gray information of the image to measure similarity and realize a global spatial transformation over the area. Statistical similarity measures such as normalized cross-correlation (NCC) [21,22] and mutual information (MI) [23,24] are highly applicable to remote sensing image registration without prominent detail features, but they cannot handle complex area deformation. To address the registration of large-scale geometric deformation and complex terrain, most relevant studies combine regional intensity similarity with local features. Usually, a feature-based method serves as coarse registration to calculate parameters, which are then used as the initial condition for fine registration with an intensity-based method [11,25]. Some researchers improve registration performance by applying gradient features [26] and wavelet-like features [27] to image descriptors.
The feature-based methods separate image feature matching from spatial transformation, and can therefore be used with a variety of local transformation models to deal with complex deformation. SIFT [28] and its variants (SURF [29], ASIFT [30], and PCA-SIFT [31]) are widely used feature extractors; the extracted features are robust to changes in brightness, rotation, and scale. The ORB [32] and BRISK [33] algorithms are corner detection methods based on FAST [34], which improve computational efficiency by using binary feature descriptors. In recent years, for the complex nonlinear radiation intensity differences between multimodal remote sensing images, feature algorithms such as the dense oriented gradient histogram (DOGH) [7] and the histogram of orientated phase congruency (HOPC) [35] have been proposed and have achieved good results in feature matching of optical and SAR image pairs. To address the significant radiation differences between multi-temporal optical remote sensing images, the PSO-SIFT [8] algorithm defines a new gradient for each pixel in Gaussian scale space to increase robustness to gradient direction and magnitude differences in the same region. Aiming at the significant geometric distortion and radiation difference between multi-source optical remote sensing images, Chen et al. [10] proposed an iterative scale-invariant feature transformation method to continuously improve the registration accuracy of remote sensing images. For local geometric distortion, Sedaghat and Ebadi [36] used an adaptive histogram quantization strategy to calculate feature position and gradient direction, making them robust to local view distortion.
However, the above feature matching methods, designed to improve robustness to significant radiation differences and local geometric distortion, achieve good results only in medium-to-low resolution multi-source remote sensing image registration and pay little attention to the low stability of ground objects. In view of the impact of clouds, noise, and land-cover change on local registration accuracy, Feng et al. [12] proposed a step-by-step registration method combining feature-based and region-based methods. In this method, the feature matching results of the first step are geometrically corrected by a region-based linear polynomial transformation model, but it is robust only to small and simple radiation differences.
The density-based methods mostly use optical flow estimation. Optical flow [37,38] can achieve per-pixel displacement estimation and is suitable for accurate alignment of remote sensing images over complex terrain. However, it has a limited ability to deal with large deformations between registered images, and it is sensitive to low-stability surface noise, which causes texture distortion in the resampled image. Liu et al. [39] employed mesh segmentation to distribute the optical flow field of feature points in order to handle complex non-rigid deformation and improve the robustness of optical flow registration. Xu et al. [40] used the optical flow field to handle complicated non-rigid changes by matching dense SIFT features. However, when the image texture is inconsistent because of local land-cover change, these methods can cause local anomalies in the computed optical flow. To preserve high fidelity of the registered image in complex terrain areas, Feng et al. [13] added detection and correction of abnormal displacement fields after the optical flow estimation.
For high resolution multi-temporal remote sensing images with low-stability land-cover and complex terrain (HMR-LC images), it is difficult to align images facing severe radiation differences and complex geometric deformation with any single registration method; multi-level, combined registration methods are needed. The feature-based method adapts well to large-scale rotation, displacement, and deformation, and can be used together with a local transformation model to handle inconsistent local deformation in complex terrain. Experience in solving complex registration problems shows that registration performance can be improved by comprehensively applying feature methods together with regional intensity, phase, and structure information [10,11,25,40]. However, due to the combined influence of land-cover changes, terrain fluctuations, mountain shadows, radiation changes, and other factors, the probability of feature matching errors for HMR-LC images is high, which seriously affects the registration results. On the one hand, affected by low-stability land-cover changes, the texture features of unstable objects in an image pair differ significantly, and the texture of some overlapping areas changes completely. On the other hand, due to radiation differences and shooting inclination differences, serious geometric deformations and complex radiation differences appear in the local feature windows of elevated objects and complex terrain, exceeding the invariance of features and the comparability of intensity information, and it is difficult to accurately express the complex geometric relationships of different positions using surrounding feature information [41]. These factors greatly reduce the possibility of detecting consistent features in image pairs, causing mainstream feature methods to fail on the studied target images.
Different from previous research aimed at mitigating the influence of surface change, radiation difference, and complex terrain on registration, we no longer focus on improving the registration of single image pairs; instead, we introduce batch reference extraction within a comprehensive coarse-to-fine registration (CCFR) framework. This framework not only realizes robust registration of significantly different HMR-LC images, but also solves the lack of a reliable baseline image for registration.
The main contributions of this paper are as follows:
(1)
Inspired by the excellent image denoising and restoration ability of low-rank decomposition, we design a low-rank constraint-based batch reference extraction (LRC-BRE) method to restore the stable features with high spatial co-occurrence in the image sequence and construct a corresponding batch reference, in which each original image has its own reference.
(2)
To match the original image and the reference image, regional mutual information is used to filter match outliers, a step named match outlier filtering (MOF). Additionally, a dual-weighted block fitting (DWBF) is developed based on a feature inverse distance weight and a feature regional similarity weight. These two operators are integrated into the block feature matching and local linear transformation (BFM-LLT) registration process, which has good robustness and alignment accuracy both for coarse registration of multi-temporal remote sensing images with medium and low texture differences and for registration of original images against the stable feature images restored by LRC-BRE.
(3)
A new comprehensive coarse-to-fine registration (CCFR) framework integrating LRC-BRE and BFM-LLT is proposed for HMR-LC images. By taking the recovered stable feature image as the reference baseline, the framework transforms the direct registration of large-difference HMR-LC image pairs into the indirect registration of small- and medium-difference image pairs, making mainstream image registration methods applicable.
(4)
On GF-2 and GF-1 satellite remote sensing image datasets with low-stability land-cover and complex terrain, experimental results show that the CCFR framework outperforms state-of-the-art registration algorithms in both visual quality and quantitative evaluation, owing to the combination of LRC-BRE, which provides a good batch reference, and BFM-LLT, which improves the alignment effect.
The rest of the paper is organized as follows: The details of the proposed method are described in Section 2. The experimental results and verification from visual and quantitative perspectives are provided in Section 3. The influencing factors of CCFR indirect registration and the work of CCFR in different optical satellite images are discussed in Section 4. Finally, our conclusions are summarized in Section 5.

2. Methods

Image feature detection and matching performance is key to accurate image registration; the number, accuracy, and distribution of matched inner points directly affect registration accuracy. In this section, we propose a comprehensive coarse-to-fine registration (CCFR) framework for multi-temporal high resolution remote sensing images, which can both generate the stable feature reference sequence automatically and register images robustly. A simple flowchart of the CCFR framework is shown in Figure 2. To solve the stable feature reference problem in batch registration, we introduce a low-rank constraint-based batch reference extraction (LRC-BRE) method based on matrix decomposition under a low-rank constraint; in CCFR, LRC-BRE is executed block-wise. We also develop a block feature matching and local linear transformation (BFM-LLT) registration process integrating the match outlier filtering (MOF) and dual-weighted block fitting (DWBF) operators. BFM-LLT is applicable to both batch coarse registration and batch fine registration.

2.1. LRC-BRE: A Low-Rank Constraint-Based Batch Reference Extraction

With the rapid development of matrix-rank-related optimization techniques, minimum-rank constraints have been used in place of image similarity evaluation to realize batch alignment of natural images [42,43]. Matrix rank optimization originated from low-rank matrix factorization (LRMF) [44], an effective family of reliable, scalable, and robust algorithms for estimating a low-rank matrix of interest from potentially noisy, nonlinear, and highly incomplete observations [45]. In view of this ability, we use the LRMF model to restore the stable feature images of a multi-temporal image sequence, capturing reference images with stable surface features and small radiation differences for multi-temporal remote sensing image registration. A larger number of feature pairs with higher accuracy are thus obtained than by direct feature detection and matching on the original image pair.
The texture differences of multi-temporal low-stability images are treated as noise and, based on the relevant hypotheses and inferences of stable feature image restoration, can be removed by solving an LRMF problem. The related theorems, hypotheses, and inferences are given in Appendix A.
The result of feature matching is directly related to the similarity of texture features between image pairs: the higher the texture similarity, the more matching features and the higher the matching accuracy. Let $\mathrm{diff}_{\Omega}(I_i, I_j)$ represent the texture difference degree of the overlapping area $\Omega$ of a locally aligned image pair $(I_i, I_j)$; the index $\mathrm{diff}_{\Omega}(I_i, I_j)$ is an inverse index of the texture similarity of the image pair and is negatively correlated with the success rate of feature matching. Let $T$ denote a spatial transformation with $g(I_i^{\Omega}) = g(T(I_j^{\Omega}))$, $I_i^{\Omega} \subseteq I_i$, $I_j^{\Omega} \subseteq I_j$; then $\mathrm{diff}_{\Omega}(I_i, I_j) = \mathrm{diff}(I_i^{\Omega}, T(I_j^{\Omega}))$. When the image pair $(I_i, I_j)$ is fully aligned, the comprehensive difference of the pair is equivalent to the texture difference, that is, $\mathrm{diff}(I_i, I_j) = \mathrm{diff}_{\Omega}(I_i, I_j)$; when not aligned, $\mathrm{diff}(I_i, I_j) \geq \mathrm{diff}_{\Omega}(I_i, I_j)$. When there is only a local offset in the image pair, it can be considered that $\mathrm{diff}(I_i, I_j) \approx \mathrm{diff}_{\Omega}(I_i, I_j)$. We assume that there is a threshold $\beta$ such that when $\mathrm{diff}_{\Omega}(I_i, I_j) > \beta$, commonly used image feature matching methods fail to obtain enough correct matches. For high resolution multi-temporal remote sensing images with low-stability surfaces and complex terrain, due to the combined influence of ground feature changes, terrain fluctuations, mountain shadows, radiation changes, and other factors, the texture difference of some image pairs exceeds the threshold $\beta$.
According to Inference A3 in Appendix A, for the image sequence $\{I_i\}_{i=1}^{N}$ and its restored stable feature image sequence $\{Z_i\}_{i=1}^{N}$, $\mathrm{diff}(I_i, I_r) > \mathrm{diff}(I_i, Z_i)$ $(1 \leq i \leq N)$ holds with the probability bound given there. According to Inference A1.1 in Appendix A, for an image sequence $\{I_i\}_{i=1}^{N}$ conforming to Hypothesis A2 and its restored stable feature image sequence $\{Z_i\}_{i=1}^{N}$, we have $Z_1(x, y) \approx \cdots \approx Z_N(x, y)$ and $g(Z_1) = \cdots = g(Z_N)$. Then we only need to take $Z_i$ as the registration reference image and register the image pairs $(I_i, Z_i)$ $(1 \leq i \leq N)$ one by one, which approximately realizes the registration of any image pair $(I_i, I_j)$ $(1 \leq i, j \leq N)$ in the sequence. Let $I'_i = T(I_i, Z_i)$ with $T(I_i, Z_i): I_i \rightarrow Z_i$; then $I'_1(x, y) \approx \cdots \approx I'_N(x, y)$ and $g(I'_1) = \cdots = g(I'_N)$. By using the stable feature image recovered by LRMF as the reference, the registration between image pairs $(I_i, I_j)$ with large differences $\mathrm{diff}_{\Omega}(I_i, I_j) > \beta$ is replaced by the registration of the pairs $(I_i, Z_i)$ and $(I_j, Z_j)$ with medium differences. When $\mathrm{diff}_{\Omega}(I_i, Z_i) < \beta$ and $\mathrm{diff}_{\Omega}(I_j, Z_j) < \beta$, a feature-based registration method can be effectively applied. Thus, registration of low-stability, large-difference multi-temporal images is realized by indirect registration.
Through the indirect registration of each original image with its corresponding stable feature image, accurate registration between low-stability original images with obvious feature differences is achieved. For a few severely outlying images $I_o$ in the original sequence $\{I_i\}_{i=1}^{N}$, the quality of the restored stable feature image is greatly affected. In this case, a stable feature image $Z_o^*$ with high restoration quality and texture close to the original image $I_o$ is selected as the reference image; that is, $I'_o = g(I_o, Z_o^*)$ is used instead of $I'_o = g(I_o, Z_o)$ to improve the robustness of registration for individual outlier images.
The flowchart of LRC-BRE and batch registration is shown in Figure 3. First, coarse registration brings the image sequence to an alignment degree meeting the requirements of Hypothesis A2 in Appendix A; moreover, images with different resolutions are interpolated to a common resolution. On this basis, the matrix $M = [\mathrm{vec}(I_1), \ldots, \mathrm{vec}(I_N)]$ is constructed by vectorizing the coarsely registered image sequence, and the stable feature image sequence $\{Z_i\}_{i=1}^{N}$ is restored by Formula (A3) in Appendix A. Then, from the sequence $\{Z_i\}_{i=1}^{N}$, the most suitable stable feature image $Z_i^*$ is selected for each image $I_i$ as its reference. Finally, batch registration of the image sequence is realized by registering the image pairs $(I_i, Z_i^*)$ one by one.
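The low-rank restoration step can be illustrated with a standard robust PCA solver (the inexact augmented Lagrange multiplier scheme), which splits $M$ into a low-rank part, whose columns play the role of the stable feature images $Z_i$, and a sparse error part absorbing the low-stability texture differences. This is a generic sketch standing in for Formula (A3); the function name and parameter defaults are ours, not the paper's.

```python
import numpy as np

def robust_pca(M, lam=None, tol=1e-7, max_iter=500):
    """Decompose M into low-rank Z plus sparse E via inexact ALM.

    Solves  min ||Z||_* + lam * ||E||_1  s.t.  M = Z + E.
    Columns of M are vectorized, coarsely aligned images; columns of the
    recovered Z are the restored stable feature images.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))            # classic RPCA default
    norm_fro = np.linalg.norm(M)
    norm_two = np.linalg.norm(M, 2)
    Y = M / max(norm_two, np.abs(M).max() / lam)  # dual variable init
    mu, rho = 1.25 / norm_two, 1.5
    E = np.zeros_like(M)
    for _ in range(max_iter):
        # low-rank update: singular value thresholding at 1/mu
        U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        Z = (U * np.maximum(s - 1.0 / mu, 0)) @ Vt
        # sparse update: elementwise soft thresholding at lam/mu
        R = M - Z + Y / mu
        E = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y = Y + mu * (M - Z - E)                  # dual ascent step
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(M - Z - E) / norm_fro < tol:
            break
    return Z, E
```

In practice each column of $Z$ is reshaped back to image size after the decomposition.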
According to Inference A2 in Appendix A, the recovery quality of $Z_o$ is related to the initial outlier degree of $I_o$: the lower the initial outlier degree, the sparser the separated noise matrix $e$ and the smaller the comprehensive difference $\mathrm{diff}(I_o, Z_o)$ of the corresponding stable feature image $Z_o$. When $\mathrm{diff}(I_o, Z_o) > \min_{1 \leq i \leq N,\, i \neq o} \mathrm{diff}(I_o, Z_i)$, there are stable feature images $Z_o^*$ closer to $I_o$ than $Z_o$. Due to the low-rank constraint, the texture difference between the pairs $(Z_o, Z_i)$ is small, so the difference between $\mathrm{diff}(I_o, Z_o)$ and $\mathrm{diff}(I_o, Z_i)$ is not obvious. Therefore, we add a constraint on the texture similarity of the original images $(I_o, I_i)$. Finally, to ensure that the substituted stable feature image $Z_o^*$ has clear texture, we add the constraint $\min\, \mathrm{diff}(I_i, I_{mean})$ so that the image $I_i$ has a low outlier degree in the set $\{I_i\}_{i=1}^{N}$. The most suitable stable feature image $Z_o^*$ can thus be determined by the following formula:
$$Z_o^* = \mathrm{Best}\big(I_o, \{(I_i, Z_i)\}_{i=1}^{N}\big) = \underset{(I_i, Z_i) \in \{(I_i, Z_i)\}_{i=1}^{N}}{\arg\min}\ \mathrm{diff}(I_o, Z_i) + \gamma_1\, \mathrm{diff}(I_o, I_i) + \gamma_2\, \mathrm{diff}(I_i, I_{mean})$$
where $\gamma_1$ and $\gamma_2$ are compromise coefficients and $I_{mean}(x, y) = \sum_{i=1}^{N} I_i(x, y) / N$. Here, we use the 1-norm of the pixel intensity difference of the image pair to calculate $\mathrm{diff}(\cdot, \cdot)$. Since differences in radiation intensity do not change the geometric characteristics of image texture, in order to avoid excessive interference of radiation intensity on the evaluation index, we uniformly normalize the radiation intensity of the image set in the preprocessing step. Radiation normalization of the image sequence is realized by matching the statistics of the image intensity probability distributions. Assuming that the radiation intensity of each image in $\{I_i\}_{i=1}^{N}$ obeys a Gaussian distribution, we have:
$$I'_i(x, y) = \mathrm{Histo}\big(I_i(x, y)\big) = \frac{I_i(x, y) - \mu_{I_i}}{\sigma_{I_i}} \times \sigma_{I_{mean}} + \mu_{I_{mean}}$$
where $I'_i(x, y)$ is the pixel gray value after radiation intensity normalization, $\mu_{I_{mean}}$ and $\sigma_{I_{mean}}$ are the gray mean and standard deviation of the image $I_{mean}$, respectively, and $I_i(x, y)$, $\mu_{I_i}$, $\sigma_{I_i}$ are the pixel gray value, gray mean, and standard deviation of the image $I_i$, respectively.
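Under the Gaussian assumption, this normalization amounts to matching each image's gray mean and standard deviation to those of the sequence mean image; a minimal sketch (the function name is ours):

```python
import numpy as np

def normalize_radiation(images):
    """Match each image's gray mean/std to those of the sequence mean image.

    Implements  I'_i = (I_i - mu_i) / sigma_i * sigma_mean + mu_mean
    for a list of grayscale arrays of identical shape.
    """
    stack = np.stack([im.astype(np.float64) for im in images])
    i_mean = stack.mean(axis=0)                # pixelwise mean image I_mean
    mu_m, sigma_m = i_mean.mean(), i_mean.std()
    return [(im - im.mean()) / im.std() * sigma_m + mu_m for im in stack]
```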
When the image set is not normalized to a unified radiation intensity, we can also calculate $\mathrm{diff}(\cdot, \cdot)$ by constructing the normalized difference 1-norm of the image pair:
$$\mathrm{diff}(I_s, I_r) = \|I_s - I_r\|_1 = \frac{1}{\sigma_{I_s} \sigma_{I_r}} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \Big| \big(I_s(x_i, y_j) - \mu_{I_s}\big)\, \sigma_{I_r} - \big(I_r(x_i, y_j) - \mu_{I_r}\big)\, \sigma_{I_s} \Big|$$
where $\mu_{I_s} = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} I_s(x_i, y_j)$, $\mu_{I_r} = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} I_r(x_i, y_j)$, $\sigma_{I_s} = \sqrt{\frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \big(I_s(x_i, y_j) - \mu_{I_s}\big)^2}$, and $\sigma_{I_r} = \sqrt{\frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \big(I_r(x_i, y_j) - \mu_{I_r}\big)^2}$.
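A direct implementation of this normalized difference 1-norm, together with the $\mathrm{Best}(\cdot)$ reference-selection rule above, might look as follows (function names and the default $\gamma_1 = \gamma_2 = 0.5$ are ours, chosen only for illustration):

```python
import numpy as np

def diff_norm(i_s, i_r):
    """Normalized difference 1-norm of an aligned image pair."""
    i_s = i_s.astype(np.float64)
    i_r = i_r.astype(np.float64)
    sig_s, sig_r = i_s.std(), i_r.std()
    num = np.abs((i_s - i_s.mean()) * sig_r - (i_r - i_r.mean()) * sig_s)
    return num.sum() / (sig_s * sig_r)

def best_reference(i_o, images, stables, g1=0.5, g2=0.5):
    """Index i minimizing diff(I_o,Z_i) + g1*diff(I_o,I_i) + g2*diff(I_i,I_mean)."""
    i_mean = sum(im.astype(np.float64) for im in images) / len(images)
    scores = [diff_norm(i_o, z) + g1 * diff_norm(i_o, im) + g2 * diff_norm(im, i_mean)
              for im, z in zip(images, stables)]
    return int(np.argmin(scores))
```

Note that the measure is symmetric and invariant to affine intensity changes, which is exactly why it tolerates unnormalized radiation.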

2.2. BFM-LLT: Block Feature Matching and Local Linear Transformation

Referring to the stable feature image $Z_i^*$, the texture difference of the pair $(I_i, Z_i^*)$ is relatively small, but traditional feature extraction methods still produce matching and localization errors due to the interference of complex terrain and noisy backgrounds, which affect the final registration accuracy. To improve feature-based registration, we propose the BFM-LLT method. In CCFR, BFM-LLT is also used for coarse registration of image sets and is suitable for image pairs whose texture difference is less than the threshold β.
The main steps of BFM-LLT are shown in Figure 4. First, BFM-LLT executes the feature matching algorithm in block mode, and uses MOF to filter the matching inner points and optimize the location. Then, the local homography matrix is calculated according to the dual-weighted block fitting (DWBF) method, and each block region executes homography transformation. The sizes of the matching block and fitting block do not need to be consistent, and it is recommended that the former be larger than the latter. BFM-LLT can better take into account the global alignment and local alignment of complex terrain image registration.

2.2.1. Match Outlier Filtering (MOF)

In the block feature matching phase, to deal with the high resolution, the sensed image and the reference image are divided into C1 × C2 blocks at equal intervals, respectively, and then the features are extracted and matched within the block. Block feature matching can improve the performance in two aspects: (1) through block feature matching, the feature area to be matched is reduced, so as to reduce the impact of low stable surface background noise and high self-similarity of vegetation texture, and improve the number and accuracy of feature-based matching. (2) In terms of computational performance, the block feature matching method reduces the overall computational complexity of the matching algorithm. In addition, the block feature matching method is also convenient for parallel calculation of each block and further improves the execution speed.
To improve the robustness of feature matching for images with complex texture backgrounds, after consistency screening we use MOF to filter the inner point pairs again and optimize their positions, recording the RMI value of each pair as the feature weight for the subsequent local fitting. After feature distance matching (FDM) and random sample consensus (RANSAC) screening [46], the matching inner point pairs are filtered by an intensity similarity threshold over the local area (taking the feature point location as the geometric center of the local area). When the number of matching inner points is small, template matching optimization of the inner points is carried out based on the local area similarity index to further refine their locations. As shown in Figure 5, this method is suitable for same-resolution image registration with complex texture backgrounds.
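The block partitioning and MOF filtering logic can be sketched as below; `match_fn` and `sim_fn` are hypothetical pluggable callables standing in for the paper's per-block feature matcher (e.g. SIFT + FDM + RANSAC) and the RMI similarity measure, and are not part of the original method's code:

```python
import numpy as np

def block_feature_match(sensed, ref, match_fn, c1=4, c2=4,
                        sim_fn=None, sim_thresh=0.3, win=8):
    """Run a feature matcher block-by-block, then apply MOF filtering.

    match_fn(a, b) -> two (K, 2) arrays of matched (row, col) points inside a
    block pair; sim_fn(pa, pb) scores two local windows (e.g. with RMI).
    Matches scoring below sim_thresh are dropped; kept scores become the
    per-feature weights used later in the block fitting step.
    """
    h, w = sensed.shape
    kept_s, kept_r, sims = [], [], []
    for bi in range(c1):
        for bj in range(c2):
            top, bot = bi * h // c1, (bi + 1) * h // c1
            left, right = bj * w // c2, (bj + 1) * w // c2
            ms, mr = match_fn(sensed[top:bot, left:right],
                              ref[top:bot, left:right])
            for (ys, xs), (yr, xr) in zip(ms, mr):
                # shift block-local coordinates back to the full-image frame
                ys, xs, yr, xr = ys + top, xs + left, yr + top, xr + left
                if sim_fn is not None:
                    ps = sensed[max(ys - win, 0):ys + win + 1,
                                max(xs - win, 0):xs + win + 1]
                    pr = ref[max(yr - win, 0):yr + win + 1,
                             max(xr - win, 0):xr + win + 1]
                    s = sim_fn(ps, pr)
                    if s < sim_thresh:   # MOF: drop low-similarity inliers
                        continue
                else:
                    s = 1.0
                kept_s.append((ys, xs))
                kept_r.append((yr, xr))
                sims.append(s)
    return np.array(kept_s), np.array(kept_r), np.array(sims)
```

Restricting matching to blocks also makes the loop trivially parallelizable, matching the computational argument above.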
Regional mutual information (RMI) is an extension of mutual information [47]: it combines spatial information in the form of an energy function and uses eight-neighborhood information instead of raw gray values to calculate the joint probability distribution. Since RMI is robust to nonlinear intensity differences and has been successfully applied to multispectral and multi-sensor image registration [23,24], we adopt it directly as the local area similarity measure:
$$P_0 = P - \frac{1}{N} \sum_{i=1}^{N} p_i$$
$$C = \frac{1}{N} P_0 P_0^{T}$$
$$RMI = H_g(C_A) + H_g(C_B) - H_g(C)$$
where $P = [p_1, p_2, \ldots, p_N]$, $N = (m - 2r)(n - 2r)$ is the number of pixels counted in the region, $m \times n$ is the size of the overlapping region, $r$ is the radius of the local window, and $p_i$ is the stacked neighborhood vector of the $i$-th pixel from both images. The size of $P$ is $d \times N$ with $d = 2(2r + 1)^2$. $H_g(C)$ is the joint entropy, and $H_g(C_A)$ and $H_g(C_B)$ are the marginal entropies, where $C_A$ is the $(d/2) \times (d/2)$ matrix in the upper left of $C$ and $C_B$ is the $(d/2) \times (d/2)$ matrix in the lower right of $C$.
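Under the usual Gaussian-entropy formulation of RMI ($H_g = \frac{1}{2}\log\big((2\pi e)^d \det C\big)$), the three equations above can be sketched as follows (the function name is ours; the small diagonal loading guards the log-determinant of near-singular covariances):

```python
import numpy as np

def regional_mutual_info(a, b, r=1):
    """Regional mutual information of two aligned patches, a sketch.

    Stacks the (2r+1)^2 neighborhood of every interior pixel from both
    images into d-dimensional column vectors (d = 2*(2r+1)^2), forms the
    covariance C, and applies the Gaussian entropy identity.
    """
    k = 2 * r + 1
    cols = []
    for img in (a.astype(np.float64), b.astype(np.float64)):
        # each column: the k*k neighborhood of one interior pixel
        patches = np.lib.stride_tricks.sliding_window_view(img, (k, k))
        cols.append(patches.reshape(-1, k * k).T)
    P = np.vstack(cols)                          # d x N, N = (m-2r)(n-2r)
    P0 = P - P.mean(axis=1, keepdims=True)       # subtract the mean vector
    C = P0 @ P0.T / P.shape[1]
    d = C.shape[0]

    def gauss_entropy(cov):
        dd = cov.shape[0]
        _, logdet = np.linalg.slogdet(cov + 1e-9 * np.eye(dd))
        return 0.5 * (dd * np.log(2 * np.pi * np.e) + logdet)

    c_a = C[:d // 2, :d // 2]                    # marginal block of image a
    c_b = C[d // 2:, d // 2:]                    # marginal block of image b
    return gauss_entropy(c_a) + gauss_entropy(c_b) - gauss_entropy(C)
```

As expected of a mutual-information measure, identical patches score far higher than independent ones.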

2.2.2. Dual-Weighted Block Fitting (DWBF)

Due to low-stability land-cover and complex terrain, the accuracy of multi-temporal image feature matching is uncertain, and uneven distribution of matched features is common. We therefore propose a block homography estimation based on a feature inverse distance weight (IDW) and a feature region similarity weight (RSW), called the dual-weighted block fitting (DWBF) method. First, the sensed image and the reference image are divided into C3 × C4 blocks at equal intervals, and each block corresponds to a separate homography transformation matrix $H \in \mathbb{R}^{3 \times 3}$ in homogeneous space. Then, on the basis of homography matrix estimation from inner point pairs, the feature inverse distance weight [48] and feature similarity weight are applied to each inner point pair. The inverse distance weight of each block is computed with respect to the block center point $\mathbf{x}_*$: for any matched inner point $\mathbf{x}$, the farther it is from $\mathbf{x}_*$, the lower its weight in estimating the block homography. The feature similarity weight is determined by the RMI of the region where the inner point is located, and inner points with high similarity have higher influence.
Let $\mathbf{x} = (x, y)^{T}$ and $\mathbf{x}' = (x', y')^{T}$ be matching points across overlapping images $I$ and $I'$. A basic method to estimate $H$ from a set of point matches $\{(\mathbf{x}_i, \mathbf{x}'_i)\}_{i=1}^{K}$ across $I$ and $I'$ can be expressed as:
$$\hat{\mathbf{h}} = \underset{\mathbf{h}}{\arg\min} \sum_{i=1}^{K} \|a_i \mathbf{h}\|^2 = \underset{\mathbf{h}}{\arg\min}\ \|A \mathbf{h}\|^2 \quad \text{s.t.}\ \|\mathbf{h}\| = 1$$
where $a_i \in \mathbb{R}^{2 \times 9}$ is computed from the $i$-th datum $(\mathbf{x}_i, \mathbf{x}'_i)$, and the matrix $A \in \mathbb{R}^{2K \times 9}$ is obtained by stacking the $a_i$ vertically for all $i$. After introducing the IDW and RSW, the block homography $\mathbf{h}^*$ is estimated from the problem:
H * = arg   min h i = 1 K | | w * i a i H | | 2 = arg   min   h | | W * A H | | 2
w * i = e x p | | x * x i | | 2 η i σ 2
η i = R M I a r e a x i , a r e a x i
where, x * is the coordinate of the center point of the block, σ is a scale parameter and the weight matrix W * = diag w * 1 w * 1 w * N w * N . The problem in (8) is a weighted SVD problem, and the solution is simply the least significant right singular vector of W * A.

2.3. Algorithm of CCFR

Based on the integration of LRC-BRE and BFM-LLT, CCFR can improve the overall restoration quality of a locally stable feature image sequence, and can select a more suitable local reference image for each local area of the sensed image rather than applying LRC-BRE globally. The specific process of CCFR comprises three parts, as shown in Algorithm 1: (1) Batch coarse registration (steps 1–3). First, in order to reduce the complexity of feature matching and minimize the overall cost of the spatial transformation of batch coarse registration, the image with the lowest outlier degree is selected as the common reference image of the whole image sequence G. Then, the BFM-LLT method is used to improve the alignment degree of coarse registration and realize the global alignment of most images. In the BFM step of coarse registration, the matching results of each block are recorded as guidance on whether the registered block participates in LRC-BRE. (2) Block-based LRC-BRE (steps 4 and 5). The LRC-BRE method is applied longitudinally to restore the block stable feature images for each block sequence after coarse alignment, and the results are then synthesized horizontally into the global stable feature image. (3) Batch fine registration (step 6). With a globally stable feature image as a reference, the images are registered one by one using the BFM-LLT method.
Algorithm 1: CCFR
Input: image set {I_i^0}_{i=1}^{N}
 Step 1: {I_i^0}_{i=1}^{N} is aligned by geocoding, cropped, and interpolated (optional) to obtain a group of image sequences G with the same size and the same resolution:
G = {I_i}_{i=1}^{N}, I_i ∈ R^(w×h)
 Step 2: The lowest-outlier image I_φ in the image sequence G is calculated by:
I_φ = arg min_{1≤i≤N} diff(I_i, I_mean), where I_mean = (1/N) Σ_{j=1}^{N} I_j
 Step 3: With I_φ as the reference image, based on the BFM-LLT method, the image pairs (I_i, I_φ), i ∈ {1, …, N} and i ≠ φ, are coarsely registered in turn to obtain a new image sequence G′ = {I′_i}_{i=1}^{N}, and the matching results are recorded in the matrix set Λ = {A_i}_{i=1}^{N} ⊂ bool^(C1×C2):
(I′_i, A_i) = BFM-LLT(I_i, I_φ, C1, C2)
where I′_i ∈ G′, I′_i = [Ω_i^{1,1} … Ω_i^{1,C2}; … ; Ω_i^{C1,1} … Ω_i^{C1,C2}], A_i = [a_i^{1,1} … a_i^{1,C2}; … ; a_i^{C1,1} … a_i^{C1,C2}], Ω_i^{j,k} ∈ R^(w_Ω×h_Ω) represents the block region in row j and column k of the image I′_i, and a_i^{j,k} represents the feature matching result of that block region. When the number of matching inner points in a region is insufficient, a_i^{j,k} = false.
 Step 4: Generate a stable feature image block sequence for each block sequence:
{Ψ_i^{j,k}, Γ_i^{j,k}}_{i=1}^{N} = arg min_{L_Ω^{j,k}, S_Ω^{j,k}} ||L_Ω^{j,k}||_* + (1/√(w_Ω × h_Ω)) ||S_Ω^{j,k}||_1  s.t. ||M_Ω^{j,k} − L_Ω^{j,k} − S_Ω^{j,k}||_F ≤ ϵ
where M_Ω^{j,k} = [vec(Ω_1^{j,k}), …, vec(Ω_q^{j,k})] ∈ R^(m_Ω×q), m_Ω = w_Ω × h_Ω, Ω_i^{j,k} ∈ G_Ω^{j,k}, G_Ω^{j,k} = {Ω_i^{j,k}}_{i=1}^{q} (q ≤ N) = {Ω_i^{j,k} | a_i^{j,k} == true}_{i=1}^{N}, Ψ_i^{j,k} and Γ_i^{j,k} respectively represent the block stable feature image and the block sparse matrix restored corresponding to Ω_i^{j,k}, and ϵ is the error tolerance.
 Step 5: For each I′_i ∈ G′, its globally most suitable stable feature image Z*_i is synthesized by:
Z*_i = [Ψ*_i^{1,1} … Ψ*_i^{1,C2}; … ; Ψ*_i^{C1,1} … Ψ*_i^{C1,C2}]
where Ψ*_i^{j,k} = Best(Ω_i^{j,k}, {Ω_l^{j,k}, Ψ_l^{j,k}}_{l=1}^{q}) is the most appropriate block stable feature image for Ω_i^{j,k}. Best() is given in Formula (1).
 Step 6: Block feature matching and local linear transformation are carried out on the image pairs (I′_i, Z*_i), i ∈ {1, …, N}, in turn to obtain the accurately registered image sequence:
(I″_i, A_i) = BFM-LLT(I′_i, Z*_i, C1, C2)
Output: {I″_i}_{i=1}^{N}
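Step 4 of Algorithm 1 is a robust principal component analysis (RPCA) problem solved per block sequence. The sketch below uses a generic inexact augmented Lagrangian scheme, not the authors' solver; the parameter schedule (mu, rho) and function names are illustrative:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    """Elementwise soft thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(M, lam=None, max_iter=200, tol=1e-7):
    """Decompose M ~ L + S with L low-rank (stable features) and S sparse
    (changing features and noise), by inexact augmented Lagrangian iterations."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    sigma1 = np.linalg.svd(M, compute_uv=False)[0]
    mu, rho = 1.25 / sigma1, 1.5
    Y = np.zeros_like(M); L = np.zeros_like(M); S = np.zeros_like(M)
    normM = np.linalg.norm(M, 'fro')
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)       # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)    # sparse update
        R = M - L - S                           # residual
        Y = Y + mu * R                          # dual ascent
        mu = min(mu * rho, 1e7 / sigma1)
        if np.linalg.norm(R, 'fro') <= tol * normM:
            break
    return L, S
```

Each column of M is a vectorized block image; the columns of the recovered L correspond to the block stable feature images Ψ_i^{j,k}, and S collects the sparse residuals Γ_i^{j,k}.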

3. Results and Evaluations

In this section, three representative groups of HMR-LC images in Table 1 were used to evaluate the proposed algorithm. Through visual judgment and quantitative evaluation, the superior performance and broad applicability of the proposed method in batch registration of HMR-LC images were verified.

3.1. Experimental Data and Related Algorithms

The high resolution satellite images used in the experiments are from the GF-2 satellite launched by the China Aerospace Science and Technology Corporation (CASC). The satellite is a sun-synchronous return-orbit satellite with a short revisit cycle. It has been in use since 2014, and has accumulated a large number of multi-temporal high resolution images of the target areas. We selected the satellite's high resolution (0.8 m) panchromatic images for the registration experiments. The three groups of images were aligned and cropped using the geocoding information of the satellite images. Among them, the images of the ecological reserve and the mine production area were ortho-rectified with a 30 m precision DEM. Due to the limitations of positioning and model accuracy, there were still different degrees of alignment error in the image sequences after coarse registration.
Three image sequence sets covering complex terrain areas with different types of low-stability features were selected for the experiments. Among them, the ecological reserve is an ecological protection area dominated by natural landscape, and the unstable features are mainly the seasonal changes of vegetation such as deciduous trees and shrubs. The mine production area belongs to an iron mine under production, and the unstable features are mainly concentrated in the mining area and the waste discharge area. The mine environmental treatment area experienced large-scale limestone surface mining before 2010. During the imaging period, there were only sporadic mining activities in this area, and greening treatment was started in some areas. Its unstable features are mainly small-scale surface excavation, manual treatment, and seasonal changes of vegetation. The three experimental areas are characterized by complex terrain, a large imaging time span, and large land-cover changes across the time sequence.
In the visual evaluation part, we display and compare the results of stable feature image recovery, feature matching, and image alignment in the three types of regions. In the quantitative evaluation part, the number of feature matching inner points of the global image sequence in each type of region is compared and analyzed. Using the normalized correlation coefficient (NCC), mutual information (MI), and root mean square error (RMSE), the registration results for the different types of low-stability remote sensing images are compared and analyzed.
The SIFT, SURF, and ORB algorithms are used for feature extraction and matching. After matching based on the sum of squared differences (SSD) of the features, all three algorithms use RANSAC consistency screening to purify the matching points. For all the experiments, the RANSAC threshold was 0.1.
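The consistency screening step can be illustrated with a minimal RANSAC purifier. This sketch fits an affine model from 3-point samples for simplicity (the actual experiments use homography-based fitting); the function name, iteration count, and seed are illustrative:

```python
import numpy as np

def ransac_affine(src, dst, thresh=0.1, iters=500, seed=0):
    """Purify putative matches: keep inliers of the best RANSAC affine model.
    thresh is the reprojection-error threshold (0.1 in the experiments)."""
    rng = np.random.default_rng(seed)
    src_h = np.hstack([src, np.ones((len(src), 1))])   # homogeneous, N x 3
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        try:  # affine T from a minimal 3-point sample: dst ~ src_h @ T
            T = np.linalg.solve(src_h[idx], dst[idx])
        except np.linalg.LinAlgError:
            continue  # degenerate (collinear) sample
        err = np.linalg.norm(src_h @ T - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

Only the matches flagged by the returned mask are kept as "inner points" for the subsequent block fitting.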
In the registration display, we compared the proposed method with the following four registration algorithms: The registration method by rank minimization (RRM) [49], piecewise linear mapping (PLM) [50], optical flow modification (OFM) [13], and as-projective-as-possible (APAP) [48]. RRM uses the low-rank structure as an effective constraint for registration tasks. In the experiment, the RRM patch size is set to the same size as the feature matching block. The PLM integrated into ENVI estimates the local affine transformation model for every triangle with three feature points. In PLM, the auto tie point generation matching method is cross-correlation. OFM uses optical flow field estimation and abnormal optical flow modification for remote sensing image registration for complex terrain areas. Due to the weak ability of the optical flow algorithm to deal with large displacement, we first used SIFT to obtain global parameters in the OFM experiment. The APAP algorithm uses the moving direct linear transformation (MDLT) method to estimate the local parameters, by providing higher weights to the closer feature points and lower weights to the farther ones. The SIFT algorithm was used in APAP. The feature method used by CCFR, as well as the parameters for generating block stable feature images, are shown in Table 1.
In order to verify the effectiveness of the LRC-BRE method, we designed ablation experiments in the registration experiments of the ecological reserve and mine production areas. We assume that the images after geocoding alignment and ortho-rectification with the 30 m precision DEM can effectively reduce the local distortion of complex terrain and better satisfy the basic alignment requirement of Hypothesis 2. On this basis, we directly carried out LRC-BRE and batch registration according to the process shown in Figure 3, denoted LRC-BRE-BR in the following. The feature extraction and fitting methods used by LRC-BRE-BR are the same as those of CCFR. Because the initial alignment of the mine environmental treatment area images, which were not ortho-rectified, was low, the ablation experiment performed poorly there.

3.2. Visual Quality

3.2.1. Ecological Reserve

Image restoration result: we used LRC-BRE and CCFR to restore stable feature images. Figure 6 shows the global mean image and local areas of the relevant results. The distortion of ground features in local areas is serious, and the mountains and rocks are raised. There are obvious differences in the contour, shadow, and texture of ground features in different seasons. Figure 6a is the original image sequence after ortho-rectification and geocoding alignment; it shows a few images with large offsets (marked with a red box), with random offset directions. Figure 6b is the stable feature image sequence restored by LRC-BRE. The results show that the stable feature images are clearly restored, and the unstable features and temporary noise are removed. Including the images with large offsets in the initial alignment, all of the restored stable feature images are aligned in spatial position. In addition, the comparison between the red boxes in Figure 6a,b shows that the texture difference between an original image with low alignment and its restored stable feature image is relatively large. Figure 6c is the stable feature image restored by CCFR, which is closer to the original image texture than Figure 6b; the difference is most significant in area V. Due to the failure of block feature matching in the coarse registration stage of CCFR, Figure 6cI (marked with a blue box) did not generate a block stable feature image. Figure 6d is the final result of batch alignment using CCFR, showing that all local images have good spatial alignment.
Feature matching result: In the experiment, we selected two cross-seasonal local area images from the original image of the ecological reserve. Figure 7a–c are the results of features extracted and matching of image pairs (Figure 6aI imaged in August 2017 and Figure 6aII imaged in November 2019) through the SURF, SIFT, and ORB algorithms, respectively, showing that the number of matching inner points (yellow line segment connection points) obtained by the three feature matching algorithms was small and the error rate was high. Figure 7d–f are respectively the results of feature matching by the SURF algorithm between Figure 6aI with its corresponding stable feature image restored by LRC-BRE (Figure 6bI), the best alternative stable feature image extracted by LRC-BRE (Figure 6bIV), and the best alternative stable feature image extracted by CCFR (Figure 6cIV). The results show that the three groups of image pairs achieved correct feature matching, and the number of matching inner points increased in turn. Figure 7g–i respectively show the feature matching by SURF between Figure 6aII and the corresponding stable feature image restored by LRC-BRE (Figure 6bII), the best alternative stable feature image restored by LRC-BRE (Figure 6bIII), and the corresponding stable feature image restored by CCFR (Figure 6cII). The results show that the three groups of image pairs achieved correct feature matching, and the number of matching inner points increased in turn.
Image registration result: Taking the global image in Figure 8a of Figure 6aI as the reference image and the global image in Figure 8b of Figure 6aII as the sensed image, comparative experiments were carried out using different registration methods. Figure 8c is the overlay of Figure 8a,b, which is fuzzy and shows that there is still much ghosting caused by the offset after geocoding registration and ortho-rectification. Figure 8d is made up of the reference image and the aligned results by CCFR, in which the ghosting is basically eliminated. In order to further evaluate the registration results of different algorithms, the yellow rectangular region of the reference image and the results of the RRM, PLM, APAP, OFM, LRC-BRE-BR, and CCFR algorithms are enlarged, respectively. Figure 8e is an enlarged view of the yellow area of Figure 8c, with a large position offset in both vertical and horizontal directions. Figure 8f–i show that the results of the relevant registration algorithm have different degrees of offset in both the vertical and horizontal directions. Among them, the RRM method (Figure 8f) has the largest alignment error. Due to the obvious difference between images and textures, the tie point automatically generated by PLM using regional cross-correlation is not ideal, and the alignment result (Figure 8g) has a large offset. As the area is dominated by a natural landscape with few significant features, the correctly matched inner points obtained by SIFT are very few and unevenly distributed, and are mainly concentrated in the upper left area with roads (Figure 8b, red dotted line area). Therefore, the alignment offset of the APAP method (Figure 8h) in area #II is much larger than that in area #I. The OFM algorithm (Figure 8i) corrected and weakened the alignment error of area #II to some extent. After batch registration using the CCFR method (Figure 8k), the alignment of the target image pairs performed well. 
The LRC-BRE-BR alignment results (Figure 8j) rank second. For example, the stones in the central area (at the junction of gray and green) in Figure 8jII have a small lateral offset to the right relative to Figure 8kII.

3.2.2. Mine Production Area

Image restoration result: In the right local areas of the original image sequence shown in Figure 9a, the ground features are mainly a tailings pond (marked by a red circle) with smelting waste. With the discharge of iron ore production waste, the edge area of the tailings pond is constantly changing, and the water content of the deposits in the tailings pond is also changing. Figure 9a shows that there are a few large offset images (marked with a red box), and the offset direction is random. Figure 9b is the stable feature image sequence restored by LRC-BRE, which shows that the topographic features of the overlapping area are recovered and the temporary interference noise (such as the clouds and shadows in Figure 9bIV) is removed. The textures of some stable feature images in Figure 9b restored from the original large offset images (marked with a red box) are fuzzy. However, the textures of the stable feature images in Figure 9c restored by CCFR from the original large offset images (marked with a red box) are clear, and their spatial positions are aligned. Figure 9d shows that all images after batch registration by CCFR are spatially well aligned.
Feature matching result: In the experiment, we selected two local area images with a large time span from the original image sequence of the mine production area. Figure 10a–c are the results of feature extraction and matching of the image pair (Figure 9aI imaged in June 2020 and Figure 9aII imaged in December 2016) through the SURF, SIFT, and ORB algorithms, respectively, showing that the number of matching inner points is small and the distribution is very uneven, with most concentrated in the tailings dam area at the upper right (marked by a purple box). Due to the continuous discharge of semi-fluid slag waste in the edge area of the tailings pond (marked by a red circle), the image texture difference there is obvious and the number of matching inner points is small. The shadow occlusion area on the right side of the image (marked by a magenta circle) also yields no matching feature inner points. Figure 10d shows that, owing to the good initial alignment and the lower vegetation noise in the winter image, the SIFT feature matching inner points of Figure 9aII and its corresponding stable feature image (Figure 9bII) restored by LRC-BRE are dense. Figure 10e–g are the SIFT feature matching of Figure 9aI with its corresponding stable feature image (Figure 9bI) restored by LRC-BRE, the best alternative stable feature image (Figure 9bIII) restored by LRC-BRE, and its corresponding stable feature image (Figure 9cI) restored by CCFR, respectively. The results show that the matching inner points obtained by the three groups of image pairs have a uniform distribution, with the number increasing in turn. Figure 10h,i show the results of purifying the inner points in Figure 10f,g, respectively, by MOF; the inner points with a large offset in the red circle area are eliminated.
Image registration result: Different registration methods were compared with the global image in Figure 11a of Figure 9aI as the reference image and the global image in Figure 11b of Figure 9aII as the sensed image. Figure 11c is the overlay of Figure 11a,b, which shows that there is still much ghosting caused by residual offsets after geocoding registration and ortho-rectification. Figure 11d shows that the ghosting of the global image pair after registration using the CCFR algorithm is weakened and the texture of the yellow dotted line area is clearer, although the ground surface in the lower left area changes significantly due to surface mining, tailings discharge, land cultivation, and other causes. Figure 11e is an enlarged view of the yellow area of Figure 11c, with a large position offset in both the vertical and horizontal directions. Figure 11f–i show that the registration results of the RRM, PLM, APAP, and OFM algorithms have different degrees of offset, and the offset in region #I is generally larger than that in region #II. Figure 11f shows that the registration result of the RRM method has a large alignment error. Because some artificial features with high feature significance are distributed in the overlapping area, the alignment result of the APAP algorithm (Figure 11h) is relatively good. The alignment of the OFM algorithm (Figure 11i) is also relatively good, but the registered image on the right side (marked by a red box) of region #II is over-stretched. After batch registration by LRC-BRE-BR (Figure 11j) and the CCFR method (Figure 11k), the alignment of the target image pair is good, and the image texture is not excessively stretched. The offset of the tailings dam section in the red box in area #I of Figure 11j,k corresponds to the actual displacement of ground objects, which is related to the continuous leftward heightening of the dam body as the tailings discharge increases.

3.2.3. Mine Environmental Treatment Area

Image restoration result: The images of the mine environmental treatment area were not ortho-rectified, and the initial alignment degree is low. CCFR was used for stable feature image restoration and batch registration in this area, while LRC-BRE was only used to restore the stable feature image. The right side of Figure 12 shows the local areas of the original image sequence and the stable feature image sequences recovered by the different methods. The upper left corner of these local areas (for example, the red box in Figure 12a) is a quarry still being excavated during the imaging period, whose surface changed significantly over different periods. The right side of Figure 12a shows the local areas of the original image sequence aligned by geocoding: the initial alignment degree is low, and there are generally large offsets in the locations of roads and quarries. The corresponding global mean image on the left is fuzzy and contains heavy ghosting. The right side of Figure 12b shows the local areas of the stable feature image recovered by the LRC-BRE method; the image texture is blurred and the distortion is high. The right side of Figure 12c shows the local areas of the stable feature image recovered by the CCFR method; most areas recover the stable feature image, with clear ground objects and consistent spatial positions, and only a few regions fail to generate stable feature images. Figure 12d shows that all images after batch alignment using CCFR are spatially well aligned, and the global mean image texture on the left is clear.
Feature matching result: In the experiment, we selected two local area images with a large time span but similar seasons from the original image sequence of the mine environmental treatment area. Figure 13a–c are the results of feature extraction and matching of the image pair (Figure 12aI imaged in late April 2020 and Figure 12aII imaged in early April 2016) through the SURF, SIFT, and ORB algorithms, respectively, showing that the number of matching inner points is small and the error rate is high. This is mainly because the quarry area (marked by a red circle) in the image pair changed significantly, even though the seasonal change of vegetation and the change of solar altitude angle are small. Figure 13d shows that the SIFT feature matching inner points between Figure 12aI and the best alternative stable feature image (Figure 12aIV) recovered by CCFR are numerous and evenly distributed. In Figure 13d, there are still a few inner points with a position offset, which are effectively filtered out by MOF in Figure 13e (marked by a red circle). Figure 13f shows that the SIFT feature matching inner points between Figure 12aII and its corresponding stable feature image recovered by CCFR are densely distributed.
Image registration result: Taking the global image in Figure 14a of Figure 12aI as the reference image and the global image in Figure 14b of Figure 12aII as the sensed image, comparative experiments were carried out using the different registration methods. Figure 14c is the overlay of Figure 14a,b, which shows that there is much ghosting caused by the offset after geocoding registration. Figure 14d shows that the ghosting of the global image pair registered by the CCFR algorithm is significantly weakened, except that the image edge and some mining areas still show color superposition. Figure 14f–i show, respectively, that the registration results of RRM, PLM, APAP, and OFM have different degrees of offset in area #I, which is related to the proximity of the mining area with obvious changes. The registration results of RRM (Figure 14f) and PLM (Figure 14g) also have obvious offsets in area #II. Since a small number of artificial objects with high feature significance are distributed near area #II, APAP (Figure 14h) and OFM (Figure 14i) achieve good alignment of the trunk roads in area #II. However, the right resampled image (marked by a red box) of OFM (Figure 14i) in area #II has obvious tensile deformation. After batch registration by CCFR, the alignment of the target image pair (Figure 14j) is good, and the image is not excessively stretched.
From the visualization results of the above three groups of experiments, it can be seen that LRC-BRE and the block-based LRC-BRE (CCFR) are effective for extracting stable feature images as reference images. For the registration between the original images and the extracted stable feature images, the BFM-LLT method gives good results in feature matching and local alignment. The CCFR method can realize the batch alignment of three different types of HMR-LC images and achieves better performance.

3.3. Evaluation of Registration Results

In order to evaluate the accuracy of the image registration, we used the normalized correlation coefficient (NCC), mutual information (MI), and root-mean-square error (RMSE) metrics to conduct the quantitative assessment. Both NCC and MI can reflect the degree of similarity between two images, and are popular evaluation indexes used in the quality assessment of image registration.
The NCC metric is used to provide an overall judgement as to whether the sensed image is well-aligned with the reference image [22,51]:
NCC(I_1, I_2) = Σ_{i=1}^{N} [I_1(i) − Ī_1][I_2(i) − Ī_2] / √( Σ_{i=1}^{N} [I_1(i) − Ī_1]² · Σ_{i=1}^{N} [I_2(i) − Ī_2]² )
where I_1(i) and I_2(i) are the intensity values of the i-th pixel in images I_1 and I_2, respectively, and Ī_1 and Ī_2 are the corresponding average intensity values. N is the number of pixels. The available range of NCC is [−1, 1], and in practice NCC generally falls in (0, 1). A larger value of NCC indicates a more accurate registration result.
The MI metric is used as a cost function in similarity measure-based approaches to registration [52]:
MI(I_1, I_2) = Σ_{g_1∈I_1} Σ_{g_2∈I_2} p_{I_1,I_2}(g_1, g_2) log( p_{I_1,I_2}(g_1, g_2) / (p_{I_1}(g_1) p_{I_2}(g_2)) )
The available range of MI is (0, 2). A larger value of MI represents a more precise registration result.
The RMSE metric focuses on evaluating the registration result by calculating the average distance of the corresponding points in the reference and aligned image:
RMSE = √( (1/N) Σ_{i=1}^{N} (Δx_i² + Δy_i²) )
where N is the number of evaluated points, and Δx_i and Δy_i are the residual differences of the i-th checkpoint pair in the x and y directions. A smaller RMSE indicates a better result.
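The three metrics can be computed directly from the definitions above. The following is a minimal sketch (function names and the histogram bin count are illustrative choices, not from the paper):

```python
import numpy as np

def ncc(a, b):
    """Normalized correlation coefficient in [-1, 1]; larger = better aligned."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def mi(a, b, bins=64):
    """Mutual information from a joint gray-level histogram; larger = better."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()                                   # joint distribution
    px = p.sum(axis=1, keepdims=True)                 # marginal of a
    py = p.sum(axis=0, keepdims=True)                 # marginal of b
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def rmse(dx, dy):
    """RMSE over checkpoint residuals (dx_i, dy_i); smaller = better."""
    return float(np.sqrt(np.mean(dx ** 2 + dy ** 2)))
```

Note that the MI estimate depends on the bin count, so the same binning should be used when comparing registration methods.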
The three groups of experiments correspond to Figure 8, Figure 11 and Figure 14 of the visual evaluation experiment, respectively. The quantitative evaluation results are shown in Table 2.
According to Table 2, we can draw the following conclusions: (1) Quantitative evaluation across registration methods within the same type of area: Due to the global offset in the original image alignment of the three areas, the NCC and MI indicators are low, and the RMSE/pixels indicators are high. The registration result of the RRM algorithm is the worst, with generally large offsets. Only in local areas with significant features such as artificial objects does the feature-based APAP method achieve a relatively good alignment result. The registration indicators of the OFM algorithm are better than those of RRM and APAP. The proposed CCFR method achieves the best quantitative evaluation. In the ecological reserve and mine production area, which underwent geocoding registration and ortho-rectification, the quantitative evaluation of LRC-BRE-BR is second only to CCFR. (2) Quantitative evaluation of the same registration method across different types of areas: Since there are only seasonal vegetation changes in land-cover, the NCC and MI indicators of the registration methods in the ecological reserve are overall higher than those in the other two areas. Due to the large surface changes caused by mining and waste discharge, the NCC and MI indicators in the mine production area are lower than in the other two areas. The overall pattern of the RMSE/pixels indicators across regions is slightly inconsistent with NCC and MI; for example, the RMSE/pixels indicators of the ecological reserve are the highest. However, within the same area and the same registration method, the trend of RMSE/pixels is consistent with NCC and MI: the larger the NCC and MI, the smaller the RMSE/pixels.
In general, the CCFR method is effective for the registration of HMR-LC images. Especially when the land-cover changes and the influence of radiation differences are significant, the robustness of CCFR is superior to that of the other registration methods compared.

4. Discussion

The previous section validates the effect of the proposed method in image registration. This section discusses the advantages and influencing factors of indirect registration with stable feature images restored by LRC-BRE as the reference, and introduces the work of CCFR in different satellite images.

4.1. Quantitative Comparison of Feature Matching Results

In order to objectively verify the influence of using the stable feature image as the reference image on feature matching, Figure 15 shows the results of direct feature matching and of indirect feature matching with a stable feature image as the reference, for the multi-temporal global image sequences of the three experimental areas in Section 3.2. For image pairs with a small texture difference from the specified image I_φ in the sequence, direct feature matching performs better; however, in HMR-LC image sequences, image pairs with small texture differences are relatively rare. Details are as follows:
For the ecological reserve image sequence, the statistics of matched inner points with BFM(SURF) are shown in Figure 15a. The results show that the number of inner points obtained by direct feature matching of the image pairs (I_i, I_φ) fluctuates greatly. The number of inner points between image pairs (I_i, I_φ) imaged in the same or similar seasons is the largest among the three broken lines, but that between image pairs (I_i, I_φ) imaged across seasons is lower than those of (I_i, Z*_i^L) and (I_i, Z*_i^C), or even close to 0. This is related to the large seasonal variation of deciduous woody vegetation in the area. The fluctuation of the number of inner points for the (I_i, Z*_i^L) and (I_i, Z*_i^C) sequences is small, and their broken-line valleys are higher than those of (I_i, I_φ).
For the mine production area image sequence, the statistics of feature matching inner points with BFM(SIFT) are shown in Figure 15b. The results show that only a few (I_i, I_φ) pairs with similar seasons have more inner points than the corresponding (I_i, Z*_i^L) and (I_i, Z*_i^C) pairs. In most cases, the number of inner points for (I_i, Z*_i^C) is the largest, and that for (I_i, Z*_i^L) is second. In addition to the seasonal fluctuation of the number of inner points, due to the surface changes caused by continuous large-scale mine production in the area during the imaging period, the number of inner points of (I_i, I_φ) shows an overall downward trend as the time interval from I_φ (red dot) increases. However, the number of inner points of (I_i, Z*_i^L) and (I_i, Z*_i^C) is relatively stable.
For the mine environmental treatment area image sequence, the statistics of feature matching inner points with BFM(SIFT) are shown in Figure 15c. The results show that the number of inner points between (I_i, I_φ) pairs with a large seasonal span is smaller than that between (I_i, Z*_i^C) pairs. In addition, the number of inner points of both (I_i, I_φ) and (I_i, Z*_i^C) decreases as the time interval from I_φ (red dot) increases. This is related to the land-cover changes caused by the strengthening of mine ecological restoration in the region since 2018.

4.2. Correlation Factors of Restored Stable Feature Image Quality

According to Theorem A1 in Appendix A, the restored stable feature image is composed of the first R principal component vectors {u_i}_{i=1}^{R} generated by low-rank decomposition of the initial image sequence. Moreover, the coefficients {σ_i}_{i=1}^{R} corresponding to the principal component vectors decay exponentially. Therefore, the restoration quality of the stable feature image sequence is closely related to the leading principal component vectors. In addition, according to Inference A2 in Appendix A, when the texture difference noise remains unchanged, the restoration quality of the stable feature image sequence is positively correlated with the spatial consistency of the initial image sequence.
To facilitate the discussion of the correlation factors of restored stable feature image quality, taking the local image sequences of the ecological reserve in Section 3.2.1 as an example, we compare the texture states of the images $D_i$ corresponding to the principal component vectors $u_i$ extracted by low-rank decomposition from different initial image sequences. The image $D_i$ is generated by the following:

$$D_i = \mathrm{vec}^{-1}\big(\mathrm{abs}(u_i)\big)$$

where $\mathrm{abs}(\cdot)$ takes the absolute value of each element of the vector, and $\mathrm{vec}^{-1}$ denotes the inverse of the stacking operation. Figure 16a shows the first four principal component images generated by low-rank decomposition from the nine local images in the first row of Figure 6a, in which the proportion of large-offset images (marked by a red box) is 2/9. Figure 16b was generated by low-rank decomposition from all 18 local images in Figure 6a, in which the proportion of large-offset images (marked by a red box) is also 2/9. Figure 16c was generated from 16 local images in Figure 6a (excluding severely outlying block images after coarse registration by CCFR).
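This construction of the principal component images $D_i$ can be sketched in a few lines of NumPy. The sketch below uses a synthetic stack of nine images sharing one stable texture; the names and sizes are illustrative, not the authors' implementation. The images are stacked as columns of $M$, the left singular vectors $u_i$ come from the SVD, and each $D_i$ is $|u_i|$ reshaped back to image form:

```python
import numpy as np

rng = np.random.default_rng(0)
w, h, N = 32, 32, 9                        # image size and sequence length

base = rng.random((w, h))                  # shared stable texture
stack = [base + 0.05 * rng.random((w, h)) for _ in range(N)]

# "vec": stack each image as one column of M.
M = np.stack([img.ravel() for img in stack], axis=1)   # shape (w*h, N)

# Left singular vectors u_i are the principal component directions.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# D_i = vec^{-1}(abs(u_i)): reshape |u_i| back to image form.
D = [np.abs(U[:, i]).reshape(w, h) for i in range(4)]
```

Because the stack shares one stable texture, the first singular value dominates the rest, mirroring the exponential decay of the coefficients $\sigma_i$ noted above.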
Among the three groups of results in Figure 16, the principal component images generated by the CCFR method (Figure 16c) have the clearest texture features (especially the first two), owing to block-based coarse registration and the elimination of severely outlying block images before low-rank decomposition. When the proportion of large-offset images in the initial image set is the same, the principal component images generated with more images participating in the low-rank decomposition (Figure 16b) have clearer texture and less ghosting than those generated with fewer images (Figure 16a). This shows that, under the same initial alignment, the more initial images are involved in batch alignment, the better the spatial texture consistency of the extracted stable feature image.

4.3. Registration with Different Optical Satellite Images

For the registration of remote sensing image sets with different resolutions, CCFR can still be applied after the low-resolution images are interpolated and resampled to match the high-resolution images. To check how the proposed algorithm works on different optical satellite images, we collected images (Figure 17a) from the GF-1 and GF-2 optical satellites, which have different resolutions.
The dataset and pretreatment are described in Table 3. The imaging area belongs to the mine area, with mainly underground mining. The surface changes of multi-temporal images are mainly reflected by the continuous discharge and leakage of mine waste, as well as radiation differences. These images were preliminarily aligned based on geographic information, but there are still different degrees of offset. Because the images have different resolutions, we up-sampled the GF-1 satellite images according to the resolution of GF-2 satellite images.
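The up-sampling step can be sketched with a plain bilinear resampler. In the sketch below, `resample_to` is a hypothetical helper written for illustration (not part of the CCFR implementation), and the small 4×4 array stands in for a coarser GF-1 tile being brought onto a finer GF-2-like pixel grid:

```python
import numpy as np

def resample_to(img, out_shape):
    """Bilinear resampling of a 2-D image to out_shape = (rows, cols)."""
    rows_in, cols_in = img.shape
    # Target sample coordinates expressed on the input pixel grid.
    r = np.linspace(0, rows_in - 1, out_shape[0])
    c = np.linspace(0, cols_in - 1, out_shape[1])
    r0 = np.floor(r).astype(int); r1 = np.minimum(r0 + 1, rows_in - 1)
    c0 = np.floor(c).astype(int); c1 = np.minimum(c0 + 1, cols_in - 1)
    fr = (r - r0)[:, None]; fc = (c - c0)[None, :]
    # Interpolate along columns, then along rows.
    top = img[np.ix_(r0, c0)] * (1 - fc) + img[np.ix_(r0, c1)] * fc
    bot = img[np.ix_(r1, c0)] * (1 - fc) + img[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr

# Bring a coarser image onto a 2x finer pixel grid.
low = np.arange(16.0).reshape(4, 4)
high = resample_to(low, (8, 8))
```

In practice a production pipeline would use a library resampler with the geolocation metadata; the point here is only that, once both images share one pixel grid, the rest of the registration chain is unchanged.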
For comparison, Figure 18 shows the results of different methods registering the GF-1 image (Figure 18b) to the GF-2 reference image (Figure 18a). Figure 18a was imaged in winter, when parts of the tailings pond, road, and ore storage yard were covered with snow, so it differs significantly from the summer image in Figure 18b. Figure 18c is the overlay of Figure 18a,b, which shows considerable ghosting caused by offset. Figure 18d overlays the reference image and the result aligned by CCFR, in which the ghosting is eliminated. Figure 18e is an enlarged view of the yellow area of Figure 18c, with a large position offset in both the vertical and horizontal directions. Figure 18f–i show the results of RRM, PLM, APAP, and OFM, respectively. There is a large offset in areas #I and #II of Figure 18f, and the registered image is overstretched (marked with a red box). Due to the influence of complex terrain and low-stability land-cover, the feature points extracted by PLM and APAP have small location errors; in Figure 18g,h, the offset is reduced but not completely eliminated. OFM (Figure 18i) can eliminate the offset, but the registered image shows obvious over-stretching (marked with a red box). CCFR (Figure 18j) achieves a precisely aligned result without excessive stretching.
Table 4 lists the quantitative evaluations of the above registration results in terms of NCC, MI, and RMSE. The experimental results show that CCFR achieved good alignment in both visual and quantitative aspects.
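The three evaluation measures named in Table 4 can be computed with standard textbook definitions. The sketch below (our own helper names, not the paper's code) shows one common formulation, with mutual information estimated from a joint intensity histogram:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-size images."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

def rmse(a, b):
    """Root-mean-square intensity error."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def mutual_information(a, b, bins=32):
    """Histogram-based mutual information (in nats)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)       # marginal of a
    py = p.sum(axis=0, keepdims=True)       # marginal of b
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

rng = np.random.default_rng(1)
img = rng.random((64, 64))
noisy = img + 0.1 * rng.random((64, 64))    # a slightly perturbed copy
```

Higher NCC and MI and lower RMSE indicate better alignment; a perfectly aligned identical pair gives NCC = 1, RMSE = 0, and maximal MI.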
Therefore, when the resolution difference between images is less than four-fold, an image set with a common pixel size can be obtained by interpolation, and the CCFR method can be used effectively.

5. Conclusions

To decrease the influence of low-stability land-cover and large radiation differences on conventional image registration methods, we constructed a new comprehensive registration framework for robust registration of HMR-LC images, combined with a stable feature image restoration strategy. With the proposed LRC-BRE method, a set of stable feature images can be restored from the multi-temporal image sequence, and the most suitable stable feature image can be selected as the reference for each sensed image. The proposed BFM-LLT method is effective for coarse registration of HMR-LC images with medium and low texture differences and for registration with stable feature images as references. In the BFM step, the MOF operator reduces the impact of high background noise and of the inner-point position offsets caused by distorted terrain on feature matching accuracy. In the LLT step, the DWBF operator mitigates the influence of uneven feature distribution and of similarity differences in the inner-point areas on the accuracy of complex terrain fitting. By integrating BFM-LLT and block-based LRC-BRE, the CCFR method improves the quality of the extracted stable feature images used as references, and it is robust for the registration of large-format HMR-LC images with significant differences. For the three types of realistic HMR-LC image sequences used in the experiments, the results qualitatively and quantitatively show that the proposed algorithm achieves reliable registration accuracy. For registration across different optical satellites, the CCFR method can be used effectively after data pre-processing; for example, low-resolution remote sensing images should be interpolated and resampled to match the high-resolution images in the set.
In future work, the CCFR framework can set C1 = 1 and C2 = 1 as the initial parameters of the feature matching blocks to align image sequences that have not undergone geocoding registration, and it can introduce an iterative strategy that continuously improves the alignment by increasing the number of blocks and executing CCFR multiple times.

Author Contributions

Conceptualization, P.Z. and X.L.; methodology, P.Z.; validation, P.Z. and X.L.; formal analysis, P.Z., X.L. and C.W.; investigation, P.Z. and W.W.; resources, P.Z. and W.W.; data curation, P.Z. and W.W.; writing—original draft preparation, P.Z.; writing—review and editing, P.Z., X.L. and C.W.; funding acquisition, P.Z. and Y.M.; supervision, X.Q. and C.W.; project administration, P.Z. and Y.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the R&D Program of the Beijing Municipal Education Commission (under grant number: KM202111417007).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Hypothesis A1.
It is assumed that $\{I_i\}_{i=1}^N \subset \mathbb{R}^{w \times h}$ is a set of fully aligned low-stability multi-temporal remote sensing images of the same region. The texture difference noise of image $I_i$ is recorded as $e_i$, which is mainly the superposition of the noise $e_i^c$ caused by land-cover change and the noise $e_i^r$ caused by radiation difference, $e_i = e_i^c \oplus e_i^r$. $\{E_i^0\}_{i=1}^N \subset \mathbb{R}^{w \times h}$ denotes the set of matrices obtained by padding the vacant area of the surface texture noise $e_i$ with zeros to the size of $I_i$; the elements of $E_i^0$ may be positive or negative. $\{Z_i^0\}_{i=1}^N \subset \mathbb{R}^{w \times h}$ is the set of images, with the texture noise $e_i$ removed, that we expect to recover from the sequence $\{I_i\}_{i=1}^N$. Because the noise caused by the unstable surface is eliminated, we call the image $Z_i^0$ a stable feature image. We have $I_i = Z_i^0 + E_i^0$, $E_i^0 = E_i^c \oplus E_i^r$, $g(I_1) = \cdots = g(I_N)$, $g(Z_1^0) = \cdots = g(Z_N^0)$, and $Z_1^0(x, y) = \cdots = Z_N^0(x, y)$, where $i \in \{1, \dots, N\}$, "+" denotes matrix addition, "$\oplus$" denotes noise superposition, $g(Z)$ denotes the two-dimensional spatial position of image $Z$, and $Z(x, y)$ denotes the pixel gray value of image $Z$ at two-dimensional spatial coordinates $(x, y)$.
If $\mathrm{vec}: \mathbb{R}^{w \times h} \to \mathbb{R}^m$ denotes the operator that selects an m-pixel region of interest (typically $m \gg N$) and stacks it as a vector, then $M = [\mathrm{vec}(I_1), \dots, \mathrm{vec}(I_N)] \in \mathbb{R}^{m \times N}$, $L^0 = [\mathrm{vec}(Z_1^0), \dots, \mathrm{vec}(Z_N^0)] \in \mathbb{R}^{m \times N}$, and $S^0 = [\mathrm{vec}(E_1^0), \dots, \mathrm{vec}(E_N^0)] \in \mathbb{R}^{m \times N}$. Because $I_i = Z_i^0 + E_i^0$, we have $M = L^0 + S^0$. In the case of Hypothesis A1, since $\{Z_i^0\}_{i=1}^N$ is the image set expected to be restored after removing the noise in the fully aligned region, the matrix $L^0$ constructed from it is a low-rank matrix with rank close to 1. For large-format global images, $e^c$ is local image noise; after radiation normalization, $e^r$ is mainly reflected in the local imaging noise caused by changes of the solar altitude angle. Therefore, $S^0$ is a sparse matrix, and M is an approximately low-rank matrix. Restoring the low-rank structure of M is a typical low-rank matrix factorization (LRMF) problem, which can be described as:

$$\min_{L, S} \; \|S\|_2 \qquad \mathrm{s.t.} \quad \mathrm{rank}(L) \le r, \; M = L + S \tag{A1}$$

where $\|\cdot\|_2$ denotes the 2-norm and $1 \le r \ll \min(m, N)$. Then, from the columns of the matrices L and S, the stable feature image sequence estimate $\{Z_i\}_{i=1}^N$ and the noise matrix sequence estimate $\{E_i\}_{i=1}^N$ are recovered.
Due to the influence of multiple factors, a low-stability complex-terrain multi-temporal image sequence coarsely aligned by geocoding or image registration usually contains alignment errors of different degrees. When the image alignment errors conform to a local random distribution and are globally occasional, we call this situation basic alignment. Differently from RASL [42] and RRM [49], we regard the imaging difference caused by the offset between the sensed image I and the stable feature image Z expected to be restored as noise. We can express this situation as follows:
Hypothesis A2.
It is assumed that $\{I_i\}_{i=1}^N \subset \mathbb{R}^{w \times h}$ is a set of low-stability multi-temporal remote sensing images near the same region with basic alignment. Taking the spatial position of the expected restored stable feature image $Z_i^0$ as a reference, the pseudo texture noise caused by alignment error is recorded as $e_i^d$. $e_i^d$ includes and is limited to the following two cases: (1) there is offset noise caused by alignment errors in local areas randomly distributed over the multi-temporal images; for complex-terrain remote sensing images, local offset noise caused by terrain distortion is more common; (2) a few images $I_i$ ($1 \le i \le N_d \ll N$) have offset noise caused by global (or large-range) alignment errors. If the texture difference noise of the image sequence conforms to Hypothesis A1, the comprehensive noise of image $I_i$ is $e_i = e_i^c \oplus e_i^r \oplus e_i^d$.
In the case of Hypothesis A2, since $\{Z_i^0\}_{i=1}^N$ is the image set expected to be restored after removing noise and alignment errors, the matrix $L^0$ constructed from it is still a low-rank matrix over the basically aligned region. According to the two limiting conditions on $e_i^d$ in Hypothesis A2, the matrix $S^0 = [\mathrm{vec}(E_1^0), \dots, \mathrm{vec}(E_N^0)] \in \mathbb{R}^{m \times N}$ can still be considered sparse. When S is a sparse large-noise matrix, Problem (A1) is transformed into a bi-objective optimization problem:

$$\min_{L, S} \; \mathrm{rank}(L), \; \|S\|_0 \qquad \mathrm{s.t.} \quad M = L + S \tag{A2}$$

where $\|\cdot\|_0$ is the 0-norm. In Formula (A2), $\mathrm{rank}(L)$ and $\|S\|_0$ are nonlinear and nonconvex, so the problem is difficult to optimize. Using the robust principal component analysis (RPCA) [44] method, Problem (A2) is converted into a tractable convex optimization program:

$$\min_{L, S} \; \|L\|_* + \lambda \|S\|_1 \qquad \mathrm{s.t.} \quad M = L + S \tag{A3}$$

where $\|\cdot\|_*$ is the nuclear norm, $\|\cdot\|_1$ is the 1-norm, and $\lambda$ (>0) is a compromise factor. By solving the optimization problem (A3), we can recover the low-rank matrix estimate L and the sparse matrix estimate S. Then the stable feature image sequence $\{Z_i\}_{i=1}^N$ and the noise matrix sequence $\{E_i\}_{i=1}^N$ are recovered.
Next, combined with the algorithmic implementation of low-rank decomposition, we analyze the composition and characteristics of the restored low-rank matrix and the stable feature images. The accelerated proximal gradient (APG) [53], augmented Lagrangian method (ALM) [54], and inexact ALM [55] algorithms can be used to solve the optimization problem (A3). These algorithms generally adopt singular value thresholding (SVT) to iteratively optimize the nuclear norm of the low-rank matrix, and use a soft-threshold operator to optimize the 1-norm of the sparse matrix. Through alternating iterative calculation, they gradually approach the low-rank matrix L and the sparse matrix S. The SVT operator is calculated by partial SVD. The final recovered low-rank matrix can be expressed as:

$$L^* = \sum_{i=1}^{N} (\sigma_i - \delta)_+ \, u_i v_i^{T} = \sum_{i=1}^{R} \sigma_i \, u_i v_i^{T} \tag{A4}$$

where $\delta$ is the threshold of the penalty term, $(x)_+ = \max(x, 0)$, and $u_i$, $v_i$, and $\sigma_i$ are the left singular vectors, right singular vectors, and singular values, respectively. N is the number of columns of the matrix L, and R is the number of indices with $(\sigma_i - \delta)_+ > 0$. From the composition and low rank of L, it is concluded that:
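This alternating SVT/soft-threshold scheme fits in a few lines of NumPy. The following is a didactic inexact-ALM sketch for Problem (A3) on a synthetic rank-1-plus-sparse matrix; the parameter heuristics ($\lambda = 1/\sqrt{\max(m,n)}$, the $\mu$ schedule) follow the common RPCA literature and are not the paper's tuned implementation:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Entrywise soft thresholding: proximal operator of the 1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_ialm(M, lam=None, max_iter=200, tol=1e-7):
    """Inexact-ALM sketch for min ||L||_* + lam*||S||_1  s.t.  M = L + S."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M)
    mu = 1.25 / np.linalg.norm(M, 2)        # common step-size heuristic
    Y = np.zeros_like(M)                    # Lagrange multiplier
    S = np.zeros_like(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)   # nuclear-norm (SVT) step
        S = soft(M - L + Y / mu, lam / mu)  # 1-norm (soft-threshold) step
        R = M - L - S
        Y += mu * R                         # dual ascent on M = L + S
        mu = min(mu * 1.5, 1e7)
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S

# Toy example: rank-1 "stable" component plus sparse "noise".
rng = np.random.default_rng(0)
L0 = rng.random((60, 1)) @ rng.random((1, 20))
S0 = np.zeros_like(L0)
S0[rng.random(L0.shape) < 0.05] = 2.0
L_hat, S_hat = rpca_ialm(L0 + S0)
```

Here each column of `L_hat` plays the role of a recovered stable feature image $\mathrm{vec}(Z_i)$, and each column of `S_hat` the separated noise $\mathrm{vec}(E_i)$.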
Theorem A1.
For an approximately low-rank data matrix M, through the column-vector-based low-rank decomposition, any column j of the recovered low-rank matrix L can be expressed as $L_j = \sum_{i=1}^{R} \sigma_i v_i^j u_i$, where $\{u_i\}_{i=1}^R \subset \mathbb{R}^m$ and $\{v_i\}_{i=1}^R \subset \mathbb{R}^N$ are unit vectors, $v_i^j$ denotes the j-th element of the unit vector $v_i$, and $R = \mathrm{rank}(L)$. Because the matrix L is low-rank, the columns of L are highly correlated and approximately linearly dependent.
According to the sparsity requirement of Formula (A3) on the recovered matrix S, it is concluded that:
Theorem A2.
For an approximately low-rank data matrix M, through the column-vector-based low-rank decomposition, the recovered sparse matrix S must have a partial region $S_\Omega$ whose values are equal or close to 0. According to $M = L + S$, the matrices M and L can be considered equal over the region $\Omega$, that is, $M_\Omega = L_\Omega$. At the same time, in order to achieve sparse optimization, the iterative process makes a sparse approximation over the region with the maximum column co-occurrence probability of M.
In the low-rank decomposition process, each iterative update is an overall optimization of the estimates of L and S. This process does not take differences of spatial location into account; it is only a principal component pursuit (PCP) solution based on the matrix values, that is, the intensity information of the images. Therefore, according to Theorems A1 and A2 and Hypothesis A2, we draw Inferences A1.1, A1.2, and A2.
Inference A1.1.
For a group of multi-temporal remote sensing images $\{I_i\}_{i=1}^N$ satisfying Hypothesis A2, a group of stable feature images $\{Z_i\}_{i=1}^N$ and a group of corresponding sparse noise matrices $\{E_i\}_{i=1}^N$ can be recovered by low-rank decomposition of the matrix M composed of the column-stacked images. We have $I_i = Z_i + E_i$ and $E_i = E_i^c \oplus E_i^r \oplus E_i^d$, where the images $\{Z_i\}_{i=1}^N \subset \mathbb{R}^{w \times h}$ are highly correlated and have similar texture features at the same spatial locations. For any coordinate $(x, y)$ in the global region, $Z_1(x, y) \approx \cdots \approx Z_N(x, y)$; in addition, $g(Z_1) \approx \cdots \approx g(Z_N)$ in spatial position.
Inference A1.2.
For the stable feature image sequence $\{Z_i\}_{i=1}^N$ recovered from a set of multi-temporal images $\{I_i\}_{i=1}^N$ conforming to Hypothesis A2, there must be some stable feature images $Z_i$ ($1 \le i \le N_{s1} \le N$) in which several local regions $\Omega_j$ ($j \ge 1$) are completely aligned. For any coordinate $(x, y)$ in the region $\Omega_j$, $Z_1^{\Omega_j}(x, y) = \cdots = Z_N^{\Omega_j}(x, y)$; in addition, $g(Z_1^{\Omega_j}) = \cdots = g(Z_N^{\Omega_j})$ in spatial position.
According to Theorem A1, each column of the low-rank matrix L is a different linear combination of the same unit vector group $\{u_i\}_{i=1}^R$. Therefore, the outlier degree of the initial image sequence directly affects the generation of the principal component vectors $\{u_i\}_{i=1}^R$ in the low-rank decomposition, the overall restoration quality of the stable feature image sequence $\{Z_i\}_{i=1}^N$, and the restoration of the local offset regions of individual images. Outliers include texture differences and spatial offsets. We define the overall alignment degree of an image sequence as its spatial consistency, and draw the following conclusion:
Inference A2.
When the texture difference noise remains unchanged, the restoration quality of the stable feature image sequence $\{Z_i\}_{i=1}^N$ is positively correlated with the spatial consistency of the initial image sequence $\{I_i\}_{i=1}^N$. The higher the spatial consistency of the initial sequence, the smaller the overall distortion of the stable feature images restored by low-rank decomposition, and the sparser the separated noise matrix. For any image $I_i \in \{I_i\}_{i=1}^N$, the restoration quality of the stable feature image $Z_i$ is inversely correlated with the outlier degree of the initial image $I_i$. The lower the initial outlier degree of $I_i$ in the sequence, the smaller the distortion of the restored stable feature image $Z_i$, the clearer its texture, and the closer $Z_i$ is to the expected stable feature image $Z_i^0$. At the same time, the sparser the separated noise matrix $E_i$ is, the smaller the value of $\|E_i\|_1 = \|I_i - Z_i\|_1$ is.
According to the principle of low-rank sparse decomposition, $\sum_{i=1}^{N} \|I_i - Z_i\|_1 = \sum_{i=1}^{N} \|E_i\|_1 = \|S\|_1$. According to Theorem A2, as determined by the sparsity of S, the matrix sequence $\{E_i\}_{i=1}^N$ is sparse overall. Therefore, the image pair $(I_i, Z_i)$ has a small comprehensive difference. Let $\mathrm{diff}(\cdot, \cdot)$ denote the comprehensive difference of an image pair; the following inference can be drawn:
Inference A3.
For a set of multi-temporal remote sensing images $\{I_i\}_{i=1}^N$ conforming to Hypothesis A2 and its restored stable feature image sequence $\{Z_i\}_{i=1}^N$, when noise is widespread in the original sequence, for any selected image $I_r \in \{I_i\}_{i=1}^N$, we have $\sum_{i=1}^{N} \mathrm{diff}(I_i, I_r) > \sum_{i=1}^{N} \mathrm{diff}(I_i, Z_i)$; when N is large, there must be a subset of images $\{I_i\}_{i=1}^{N_{s2}} \subseteq \{I_i\}_{i=1}^N$ such that $\mathrm{diff}(I_i, I_r) > \mathrm{diff}(I_i, Z_i)$ for $1 \le i \le N_{s2} \le N$.
The derivation process of Inference A3 is as follows:
Here, we use the 1-norm of the pixel intensity difference of an image pair, $\|I_i - I_j\|_1$, to approximately calculate $\mathrm{diff}(I_i, I_j)$. It is assumed that the image set $\{I_i\}_{i=1}^N$ conforms to Hypothesis A2, that the image subset $\varphi = \{I_1, I_2, \dots, I_t\} \subseteq \{I_i\}_{i=1}^N$ has comprehensive noises $\{e_1, e_2, \dots, e_t\}$, and that low-rank decomposition can eliminate these noises, that is, $I_i = Z_i + e_i$. For any image $I_r \in \{I_i\}_{i=1}^N$, ignoring coincident cancellation in the noise areas, we have:

$$\sum_{i=1}^{N} \mathrm{diff}(I_i, Z_i) = \|S\|_1 = \sum_{i=1}^{t} \|e_i\|_1 \tag{A5}$$

$$\sum_{i=1}^{N} \mathrm{diff}(I_i, I_r) = \begin{cases} \displaystyle\sum_{i=1}^{t} \|e_i\|_1 = \sum_{i=1}^{N} \mathrm{diff}(I_i, Z_i), & I_r \notin \varphi \\[2ex] \displaystyle\sum_{i=1, i \ne r}^{t} \|e_i\|_1 + (N-1)\|e_r\|_1 = \sum_{i=1}^{t} \|e_i\|_1 + (N-2)\|e_r\|_1, & I_r \in \varphi \end{cases} \tag{A6}$$

When comprehensive noise is widespread, $I_r \in \{I_1, I_2, \dots, I_t\}$ with $t \approx N$. Then:

$$\sum_{i=1}^{N} \mathrm{diff}(I_i, I_r) = \sum_{i=1}^{N} \mathrm{diff}(I_i, Z_i) + (N-2)\|e_r\|_1 \tag{A7}$$

According to Equation (A7), when N > 2, $\sum_{i=1}^{N} \mathrm{diff}(I_i, I_r) > \sum_{i=1}^{N} \mathrm{diff}(I_i, Z_i)$; when N is large, $\sum_{i=1}^{N} \mathrm{diff}(I_i, I_r) \gg \sum_{i=1}^{N} \mathrm{diff}(I_i, Z_i)$.

Because $\mathrm{diff}(I_r, I_r) = 0$, the mean difference between the image sequence $\{I_i\}_{i=1}^N$ and the image $I_r$ is $\frac{1}{N-1}\sum_{i=1}^{N} \mathrm{diff}(I_i, I_r)$, while the mean difference of the image pairs $\{(I_i, Z_i)\}_{i=1}^N$ is $\frac{1}{N}\sum_{i=1}^{N} \mathrm{diff}(I_i, Z_i)$. When N is large, so that $N-1 \approx N$, we have $\frac{1}{N-1}\sum_{i=1}^{N} \mathrm{diff}(I_i, I_r) \approx \frac{1}{N}\sum_{i=1}^{N} \mathrm{diff}(I_i, I_r) > \frac{1}{N}\sum_{i=1}^{N} \mathrm{diff}(I_i, Z_i)$. Thus, we can draw Inference A3.
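Under the idealized assumptions of this derivation (every image is a shared stable image Z plus its own sparse noise, and low-rank recovery is taken to be exact, $Z_i = Z$), the inequality of Inference A3 can be checked numerically. All names and sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
w, h, N = 24, 24, 10
Z = rng.random((w, h))                       # expected stable feature image

images = []
for _ in range(N):
    e = np.zeros((w, h))
    mask = rng.random((w, h)) < 0.1          # sparse, widespread noise support
    e[mask] = rng.random(mask.sum())
    images.append(Z + e)                     # I_i = Z + e_i

def diff(a, b):
    """1-norm intensity difference, the diff(.,.) used in the derivation."""
    return np.abs(a - b).sum()

I_r = images[0]                              # any fixed reference image
sum_ref = sum(diff(I, I_r) for I in images)      # sum_i diff(I_i, I_r)
sum_stable = sum(diff(I, Z) for I in images)     # sum_i diff(I_i, Z_i)
print(sum_ref > sum_stable)
```

Because the reference image carries its own noise $e_r$, which is counted against every other image, the left-hand sum exceeds the right-hand sum, exactly as Equation (A7) predicts for N > 2.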

References

  1. Shen, H.; Meng, X.; Zhang, L. An Integrated Framework for the Spatio–Temporal–Spectral Fusion of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7135–7148.
  2. Zhou, Y.; Rangarajan, A.; Gader, P.D. An Integrated Approach to Registration and Fusion of Hyperspectral and Multispectral Images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3020–3033.
  3. Tang, Y.; Wang, L.-J.; Ma, G.-C.; Jia, H.-J.; Jin, X. Emergency monitoring of high-level landslide disasters in Jinsha River using domestic remote sensing satellites. J. Remote Sens. 2019, 23, 252–261.
  4. Song, F.; Yang, Z.; Gao, X.; Dan, T.; Yang, Y.; Zhao, W.; Yu, R. Multi-Scale Feature Based Land Cover Change Detection in Mountainous Terrain Using Multi-Temporal and Multi-Sensor Remote Sensing Images. IEEE Access 2018, 6, 77494–77508.
  5. Bordone Molini, A.; Valsesia, D.; Fracastoro, G.; Magli, E. DeepSUM: Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3644–3656.
  6. Guan, X.-B.; Shen, H.-F.; Gan, W.-X.; Zhang, L.-P. Estimation and spatiotemporal analysis of winter NPP in Wuhan based on Landsat TM/ETM+ Images. Remote Sens. Technol. Appl. 2015, 30, 884–890.
  7. Ye, Y. Fast and Robust Registration of Multimodal Remote Sensing Images via Dense Orientated Gradient Feature. ISPRS—Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, XLII-2/W7, 1009–1015.
  8. Ma, W.; Wen, Z.; Wu, Y.; Jiao, L.; Gong, M.; Zheng, Y.; Liu, L. Remote Sensing Image Registration with Modified SIFT and Enhanced Feature Matching. IEEE Geosci. Remote Sens. Lett. 2017, 14, 3–7.
  9. Chang, H.-H.; Wu, G.-L.; Chiang, M.-H. Remote Sensing Image Registration Based on Modified SIFT and Feature Slope Grouping. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1363–1367.
  10. Chen, S.; Zhong, S.; Xue, B.; Li, X.; Zhao, L.; Chang, C.-I. Iterative Scale-Invariant Feature Transform for Remote Sensing Image Registration. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3244–3265.
  11. Chen, S.; Li, X.; Zhao, L.; Yang, H. Medium-Low Resolution Multisource Remote Sensing Image Registration Based on SIFT and Robust Regional Mutual Information. Int. J. Remote Sens. 2018, 39, 3215–3242.
  12. Feng, R.; Du, Q.; Li, X.; Shen, H. Robust Registration for Remote Sensing Images by Combining and Localizing Feature- and Area-Based Methods. ISPRS J. Photogramm. Remote Sens. 2019, 151, 15–26.
  13. Feng, R.; Du, Q.; Luo, H.; Shen, H.; Li, X.; Liu, B. A Registration Algorithm Based on Optical Flow Modification for Multi-temporal Remote Sensing Images Covering the Complex-terrain Region. J. Remote Sens. 2021, 25, 630.
  14. Feng, R.; Du, Q.; Shen, H.; Li, X. Region-by-Region Registration Combining Feature-Based and Optical Flow Methods for Remote Sensing Images. Remote Sens. 2021, 13, 1475.
  15. Schubert, A.; Small, D.; Jehle, M.; Meier, E. COSMO-Skymed, TerraSAR-X, and RADARSAT-2 Geolocation Accuracy after Compensation for Earth-System Effects. In Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany, 22–27 July 2012.
  16. Aguilar, M.A.; del Mar Saldana, M.; Aguilar, F.J. Assessing Geometric Accuracy of the Orthorectification Process from Geoeye-1 and Worldview-2 Panchromatic Images. Int. J. Appl. Earth Obs. Geoinf. 2013, 21, 427–435.
  17. Chen, L.-C.; Li, S.-H.; Chen, J.-J.; Rau, J.-Y. Method of Ortho-Rectification for High-Resolution Remote Sensing Image, 9 October 2008. Available online: https://www.patentsencyclopedia.com/app/20080247669 (accessed on 25 October 2021).
  18. Hasan, R.H. Evaluation of the Accuracy of Digital Elevation Model Produced from Different Open Source Data. J. Eng. 2019, 25, 100–112.
  19. Nag, S. Image Registration Techniques: A Survey. engrXiv 2017.
  20. Zitová, B.; Flusser, J. Image Registration Methods: A Survey. Image Vis. Comput. 2003, 21, 977–1000.
  21. Martinez, A.; Garcia-Consuegra, J.; Abad, F. A Correlation-Symbolic Approach to Automatic Remotely Sensed Image Rectification. In Proceedings of the IEEE 1999 International Geoscience and Remote Sensing Symposium, IGARSS'99 (Cat. No.99CH36293), Hamburg, Germany, 28 June–2 July 1999.
  22. Hel-Or, Y.; Hel-Or, H.; David, E. Fast Template Matching in Non-Linear Tone-Mapped Images. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
  23. Chen, H.-M.; Varshney, P.K.; Arora, M.K. Performance of Mutual Information Similarity Measure for Registration of Multitemporal Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2003, 41, 2445–2454.
  24. Kern, J.P.; Pattichis, M.S. Robust Multispectral Image Registration Using Mutual-Information Models. IEEE Trans. Geosci. Remote Sens. 2007, 45, 1494–1505.
  25. Zhao, L.-Y.; Lü, B.-Y.; Li, X.-R.; Chen, S.-H. Multi-source remote sensing image registration based on scale-invariant feature transform and optimization of regional mutual information. Acta Phys. Sin. 2015, 64, 124204.
  26. Ravanbakhsh, M.; Fraser, C.S. A Comparative Study of DEM Registration Approaches. J. Spat. Sci. 2013, 58, 79–89.
  27. Murphy, J.M.; Le Moigne, J.; Harding, D.J. Automatic Image Registration of Multimodal Remotely Sensed Data with Global Shearlet Features. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1685–1704.
  28. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
  29. Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In Computer Vision—ECCV 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417.
  30. Morel, J.-M.; Yu, G. ASIFT: A New Framework for Fully Affine Invariant Image Comparison. SIAM J. Imaging Sci. 2009, 2, 438–469.
  31. Ke, Y.; Sukthankar, R. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004, Washington, DC, USA, 27 June–2 July 2004.
  32. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An Efficient Alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
  33. Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary Robust Invariant Scalable Keypoints. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
  34. Rosten, E.; Porter, R.; Drummond, T. Faster and Better: A Machine Learning Approach to Corner Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 105–119.
  35. Ye, Y.; Shan, J.; Bruzzone, L.; Shen, L. Robust Registration of Multimodal Remote Sensing Images Based on Structural Similarity. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2941–2958.
  36. Sedaghat, A.; Ebadi, H. Remote Sensing Image Matching Based on Adaptive Binning SIFT Descriptor. IEEE Trans. Geosci. Remote Sens. 2015, 53, 5283–5293.
  37. Brox, T.; Bruhn, A.; Papenberg, N.; Weickert, J. High Accuracy Optical Flow Estimation Based on a Theory for Warping. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2004; pp. 25–36.
  38. Ren, Z.; Li, J.; Liu, S.; Zeng, B. Meshflow Video Denoising. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017.
  39. Liu, L.; Chen, F.; Liu, J.-B. Optical Flow and Feature Constrains Algorithm for Remote Sensing Image Registration. Comput. Eng. Des. 2014, 35, 3127–3131.
  40. Xu, F.; Yu, H.; Wang, J.; Yang, W. Accurate Registration of Multitemporal UAV Images Based on Detection of Major Changes. In Proceedings of the 2018 21st International Conference on Information Fusion (FUSION), Cambridge, UK, 10–13 July 2018.
  41. Brigot, G.; Colin-Koeniguer, E.; Plyer, A.; Janez, F. Adaptation and Evaluation of an Optical Flow Method Applied to Coregistration of Forest Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 2923–2939.
  42. Peng, Y.; Ganesh, A.; Wright, J.; Xu, W.; Ma, Y. RASL: Robust Alignment by Sparse and Low-Rank Decomposition for Linearly Correlated Images. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2233–2246.
  43. Wu, Y.; Shen, B.; Ling, H. Online Robust Image Alignment via Iterative Convex Optimization. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012.
  44. Candes, E.; Li, X.; Ma, Y.; Wright, J. Robust Principal Component Analysis?: Recovering Low-Rank Matrices from Sparse Errors. In Proceedings of the 2010 IEEE Sensor Array and Multichannel Signal Processing Workshop, Jerusalem, Israel, 4–7 October 2010.
  45. Chi, Y.; Lu, Y.M.; Chen, Y. Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview. IEEE Trans. Signal Process. 2019, 67, 5239–5269.
  46. Torr, P.H.S.; Murray, D.W. The Development and Comparison of Robust Methods for Estimating the Fundamental Matrix. Int. J. Comput. Vis. 1997, 24, 271–300.
  47. Russakoff, D.B.; Tomasi, C.; Rohlfing, T.; Maurer, C.R., Jr. Image Similarity Using Mutual Information of Regions. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2004; pp. 596–607.
  48. Zaragoza, J.; Tat-Jun, C.; Tran, Q.-H.; Brown, M.S.; Suter, D. As-Projective-as-Possible Image Stitching with Moving DLT. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1285–1298.
  49. Hu, T.; Zhang, H.; Shen, H.; Zhang, L. Robust Registration by Rank Minimization for Multiangle Hyper/Multispectral Remotely Sensed Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2443–2457.
  50. Goshtasby, A. Piecewise Linear Mapping Functions for Image Registration. Pattern Recognit. 1986, 19, 459–466.
  51. Han, Y.; Bovolo, F.; Bruzzone, L. An Approach to Fine Coregistration between Very High Resolution Multispectral Images Based on Registration Noise Distribution. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6650–6662.
  52. Gharbia, R.; Ahmed, S.A.; Hassanien, A.E. Remote Sensing Image Registration Based on Particle Swarm Optimization and Mutual Information. In Advances in Intelligent Systems and Computing; Springer: New Delhi, India, 2015; pp. 399–408.
  53. Lin, Z.; Ganesh, A.; Wright, J.; Wu, L.; Chen, M.; Ma, Y. Fast Convex Optimization Algorithms for Exact Recovery of a Corrupted Low-Rank Matrix. 2009. Available online: https://www.ideals.illinois.edu/handle/2142/74352 (accessed on 20 January 2021).
  54. Lin, Z.; Chen, M.; Ma, Y. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices. arXiv 2010, arXiv:1009.5055.
  55. Yuan, X.-M.; Yang, J.-F. Sparse and Low Rank Matrix Decomposition via Alternating Direction Method. Preprint 2009, 12.
Figure 1. Examples of multi-temporal satellite remote sensing panchromatic images with low-stability and complex terrain: (a) mountainous area covered by seasonal vegetation (700 × 850 pixels); (b) mine tailings pond (950 × 1100 pixels).
Figure 2. Simple flowchart of the CCFR framework.
Figure 3. Flowchart of the LRC-BRE (note: the batch registration step is not required for batch reference extraction itself; it is included mainly for the ablation experiments).
Figure 4. Block feature matching and local linear transformation method (BFM-LLT).
Figure 5. Template matching optimization based on the similarity of the local region where the matching inner point is located. (a) Local area of the reference image (taking a stable feature image as an example; 90 × 120 pixels). (b) Local area of the sensed image (90 × 120 pixels).
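The template matching optimization of Figure 5 scores candidate positions by the similarity of the local region around each matching inner point; the paper's MOF step uses regional mutual information [47] for this. A common, simpler similarity score on panchromatic patches is normalized cross-correlation (NCC). The brute-force sketch below is illustrative only (not the authors' implementation):

```python
import numpy as np

def ncc_match(region, template):
    """Exhaustive template matching: slide `template` over `region` and return
    the top-left (row, col) of the window with maximal normalized cross-correlation."""
    H, W = region.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(H - th + 1):
        for c in range(W - tw + 1):
            w = region[r:r + th, c:c + tw]
            wc = w - w.mean()
            denom = np.sqrt((wc * wc).sum()) * t_norm
            if denom == 0.0:
                continue                      # flat window: NCC undefined, skip
            score = (wc * t).sum() / denom
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

In practice a library routine (e.g., an FFT-based normalized correlation) would replace the double loop; the loop form is kept here only to make the score explicit.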
Figure 6. Global mean image (3600 × 3900 pixels) and local areas (600 × 650 pixels) of the remote sensing image sequence and the corresponding results of LRC-BRE and CCFR in an ecological reserve. The leftmost images of (a–d) are the global mean images of the original image sequence, the stable feature image sequence restored by LRC-BRE, the stable feature image sequence restored by CCFR, and the image sequence registered by CCFR, respectively. The right images of (a–d) show the local area of each sequence, whose spatial position corresponds to the yellow box in the left image.
Figure 7. Feature extraction and matching results for the local area image (600 × 650 pixels) and the restored stable feature image of the ecological reserve. The feature algorithms used are (a) SURF, (b) SIFT, (c) ORB, and (d–i) SURF. The Roman numeral in the lower right corner is the image number, which corresponds to the image with the same number and background color in Figure 6. Green crosses mark the detected key points, red circles mark the filtered-out outer points, and green circles with their associated yellow connecting lines mark the retained inner point pairs.
Figure 8. Registration results of image pairs in the ecological reserve. (a) Reference image (3600 × 3900 pixels). (b) Sensed image (3600 × 3900 pixels). (c) The overlap of the original images (shown in pseudo color). (d) The overlap of the reference image and the result of the proposed algorithm. Magnified images (160 × 160 pixels) from (e) the original images, (f) RRM, (g) PLM, (h) APAP, (i) OFM, (j) LRC-BRE-BR, and (k) CCFR. In the magnified images, the green area is the reference image and the gray area is the registered image; a red point indicates that the processed image is well aligned with the reference image.
Figure 9. Global mean image (2850 × 3300 pixels) and local areas (950 × 1100 pixels) of the remote sensing image sequence and the corresponding results of LRC-BRE and CCFR in the mine production area. The leftmost images of (a–d) are the global mean images of the original image sequence, the stable feature image sequence restored by LRC-BRE, the stable feature image sequence restored by CCFR, and the image sequence registered by CCFR, respectively. The right images of (a–d) show the local areas of each sequence, whose spatial position corresponds to the yellow box in the left image.
Figure 10. Feature extraction and matching results for the local original image (950 × 1100 pixels) and the stable feature image of the mine production area. The feature algorithms used are (a) SURF, (b) SIFT, (c) ORB, and (d–i) SIFT. In (h) and (i), the inner points were filtered again by MOF. The Roman numeral in the lower right corner is the image number, which corresponds to the image with the same number and background color in Figure 9.
Figure 11. Registration results of image pairs in the mine production area. (a) Reference image (2850 × 3300 pixels). (b) Sensed image (2850 × 3300 pixels). (c) The overlap of the original images (shown in pseudo color). (d) The overlap of the reference image and the result of the proposed algorithm. Magnified images (180 × 180 pixels) from (e) the original images, (f) RRM, (g) PLM, (h) APAP, (i) OFM, (j) LRC-BRE-BR, and (k) CCFR. In the magnified images, the green area is the reference image and the gray area is the registered image; a red point indicates that the processed image is well aligned with the reference image.
Figure 12. Global mean image (3900 × 4000 pixels) and local areas (780 × 800 pixels) of the remote sensing image sequence and the corresponding results of LRC-BRE and CCFR in the mine environmental treatment area. The leftmost images of (a–d) are the global mean images of the original image sequence, the stable feature image sequence restored by LRC-BRE, the stable feature image sequence restored by CCFR, and the image sequence registered by CCFR, respectively. The right images of (a–d) show the local areas of each sequence, whose spatial position corresponds to the yellow box in the left image.
Figure 13. Feature extraction and matching results for the local original image (780 × 800 pixels) and the stable feature image of the mine treatment area. The feature algorithms used are (a) SURF, (b) SIFT, (c) ORB, and (d–f) SIFT. In (e), the inner points were filtered again by MOF. The Roman numeral in the lower right corner is the image number, which corresponds to the image with the same number and background color in Figure 12.
Figure 14. Registration results of image pairs in the mine treatment area. (a) Reference image (3900 × 4000 pixels). (b) Sensed image (3900 × 4000 pixels). (c) The overlap of the original images (shown in pseudo color). (d) The overlap of the reference image and the result of the proposed algorithm. Magnified images (180 × 180 pixels) from (e) the original images, (f) RRM, (g) PLM, (h) APAP, (i) OFM, and (j) the CCFR method.
Figure 15. Quantitative feature matching results for the global multi-temporal image sequences in the three experiments. (a) Ecological reserve. (b) Mine production area. (c) Mine environmental treatment area. The abscissa is the imaging date of the sensed image, and the ordinate is the number of feature matching inner points. The blue, gray, and yellow polylines show, respectively, the number of inner points matched against the lowest-outlier image I_φ, the stable feature image Z_i^{*L} extracted by LRC-BRE, and the stable feature image Z_i^{*C} extracted by CCFR.
Figure 16. Rows (a–c) correspond to the first four principal component vectors extracted by low-rank decomposition from, respectively, the 9 local images in the first row of Figure 6a, all 18 local images in Figure 6a, and 17 local images in Figure 6a (excluding severely outlying block images after coarse registration by CCFR). Each block is 600 × 650 pixels.
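The principal component vectors shown in Figure 16 can be reproduced in spirit by an SVD of the stacked image sequence: vectorize the images, center each pixel across the sequence, and take the leading right singular vectors as spatial modes ordered by explained variance. A minimal sketch (a plain SVD stand-in for the paper's low-rank decomposition, not the exact pipeline):

```python
import numpy as np

def principal_component_images(stack, k=4):
    """First k principal component images of an (n_images, H, W) stack.

    Returns (components, singular_values), where components[j] is the j-th
    spatial mode reshaped back to (H, W)."""
    n, h, w = stack.shape
    X = stack.reshape(n, -1).astype(float)
    X = X - X.mean(axis=0)                    # remove the per-pixel temporal mean
    # Rows of Vt are orthonormal spatial modes, ordered by singular value.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].reshape(k, h, w), s[:k]
```

Stable, co-occurring structure concentrates in the first few modes, which is why excluding severely outlying blocks (row (c) of Figure 16) sharpens them.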
Figure 17. The mine areas in remote sensing images from the GF-2 and GF-1 optical satellites. (a) The original image set. (b) A local magnification (32 × 40 pixels) of the GF-1 image. (c) The result of up-sampling (b). (d) A local magnification (80 × 100 pixels) of the GF-2 image.
Figure 18. Registration results of image pairs in the mine production area from the GF-1 and GF-2 satellites. (a) Reference image from GF-2 (2400 × 3000 pixels). (b) Sensed image from GF-1 (original size 960 × 1200 pixels). (c) The overlap of the original images (shown in pseudo color). (d) The overlap of the reference image and the result of the proposed algorithm. Magnified images (200 × 200 pixels) from (e) the original images, (f) RRM, (g) PLM, (h) APAP, (i) OFM, and (j) CCFR. In the magnified images, the green area is the reference image and the gray area is the registered image; a red point indicates that the processed image is well aligned with the reference image.
Table 1. Overview of the experimental data.
| Type | Time | No. | Size | Res 1 | Characteristics | Preprocessing | BFM Method |
|---|---|---|---|---|---|---|---|
| Ecological reserve | 2015–2020 | 18 | 3600 × 3900 pixels | 0.8 m | Mountainous areas with high vegetation coverage; deciduous vegetation is widely distributed. | Geocoding alignment + ortho-rectification (30 m precision DEM) | SURF, 6 × 6 blocks |
| Mine production | 2015–2021 | 18 | 2850 × 3300 pixels | 0.8 m | Large-scale mining and waste discharge; evergreen and deciduous vegetation are interspersed. | Geocoding alignment + ortho-rectification (30 m precision DEM) | SIFT, 3 × 3 blocks |
| Mine environmental treatment | 2014–2020 | 22 | 3900 × 4000 pixels | 0.8 m | Small-scale mining activities and greening treatment; deciduous vegetation is widely distributed. | Geocoding alignment | SIFT, 5 × 5 blocks |

1 “Res” represents the spatial resolution.
Table 2. Quantitative evaluation of the registration results in the three experiments.
Columns are grouped by experiment: Ecological Reserve, Mine Production, and Mine Environmental Treatment (each group: NCC, MI, RMSE/pixels).

| Method | NCC | MI | RMSE/pixels | NCC | MI | RMSE/pixels | NCC | MI | RMSE/pixels |
|---|---|---|---|---|---|---|---|---|---|
| Original | 0.28136 | 0.059138 | 0.37336 | 0.14756 | 0.026849 | 0.28831 | 0.28807 | 0.058247 | 0.27997 |
| RRM | 0.40549 | 0.10822 | 0.3533 | 0.27153 | 0.054601 | 0.26371 | 0.45068 | 0.13519 | 0.25229 |
| PLM | 0.50274 | 0.2535 | 0.31822 | 0.29541 | 0.093957 | 0.27233 | 0.49631 | 0.16344 | 0.24489 |
| APAP | 0.63317 | 0.32688 | 0.3533 | 0.34195 | 0.097451 | 0.25461 | 0.53916 | 0.19488 | 0.23318 |
| OFM | 0.6786 | 0.34853 | 0.30167 | 0.33511 | 0.09357 | 0.2557 | 0.50141 | 0.17111 | 0.24879 |
| LRC-BRE-BR | 0.68687 | 0.35448 | 0.31131 | 0.36022 | 0.10722 | 0.25249 | null | null | null |
| CCFR | 0.70928 | 0.39685 | 0.28171 | 0.38415 | 0.11548 | 0.24826 | 0.6413 | 0.30463 | 0.22609 |
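The quantitative evaluation uses NCC, MI, and RMSE over the overlapping areas. For reference, NCC and a histogram-based MI between two equally sized grayscale images can be computed as below — a generic sketch; the bin count and any normalization details used in the paper may differ:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized grayscale images."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def mutual_information(a, b, bins=64):
    """Plug-in mutual information (in nats) from a joint gray-level histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal distribution of a
    py = p.sum(axis=0, keepdims=True)   # marginal distribution of b
    nz = p > 0                          # only nonzero joint cells contribute
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())
```

Higher NCC and MI and lower RMSE indicate better alignment, which matches the direction of improvement from the Original rows to CCFR in Tables 2 and 4.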
Table 3. Overview of the experimental data of GF-1 and GF-2 satellites.
| Type | Time | Sensor | No. | Size | Res 1 | Characteristics | Preprocessing |
|---|---|---|---|---|---|---|---|
| Mine production | 2014–2021 | GF-1 | 8 | 960 × 1200 pixels | 2 m | Large-scale underground mining; continuous discharge and leakage of mine waste. | Geocoding alignment, up-sampling |
| | | GF-2 | 16 | 2400 × 3000 pixels | 0.8 m | | Geocoding alignment |

1 “Res” represents the spatial resolution.
Table 4. Quantitative evaluation of the registration results in the experiment with the GF-1 and GF-2 satellites.
Mine Production (GF-1/GF-2)

| Method | NCC | MI | RMSE/pixels |
|---|---|---|---|
| Original | 0.28861 | 0.095284 | 0.39303 |
| RRM | 0.30568 | 0.09869 | 0.39306 |
| PLM | 0.47723 | 0.29056 | 0.36538 |
| APAP | 0.50373 | 0.44417 | 0.39686 |
| OFM | 0.50852 | 0.41378 | 0.37514 |
| CCFR | 0.51342 | 0.44967 | 0.3638 |
Zhang, P.; Luo, X.; Ma, Y.; Wang, C.; Wang, W.; Qian, X. Coarse-to-Fine Image Registration for Multi-Temporal High Resolution Remote Sensing Based on a Low-Rank Constraint. Remote Sens. 2022, 14, 573. https://doi.org/10.3390/rs14030573
