Article

Matching Multi-Source Optical Satellite Imagery Exploiting a Multi-Stage Approach

1 School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
2 Satellite Surveying and Mapping Application Center, NASG, Beijing 100048, China
* Author to whom correspondence should be addressed.
Remote Sens. 2017, 9(12), 1249; https://doi.org/10.3390/rs9121249
Submission received: 22 October 2017 / Revised: 29 November 2017 / Accepted: 30 November 2017 / Published: 1 December 2017
(This article belongs to the Section Remote Sensing Image Processing)

Abstract

Geometric distortions and intensity differences always exist in multi-source optical satellite imagery, seriously reducing the similarity between images and making it difficult to obtain adequate, accurate, stable, and well-distributed matches for image registration. To address these problems, an effective image matching method for multi-source optical satellite imagery is presented in this study. The proposed method includes three steps: feature extraction, initial matching, and matching propagation. Firstly, a uniform robust scale invariant feature transform (UR-SIFT) detector was used to extract adequate and well-distributed feature points. Secondly, initial matching was conducted based on the Euclidean distance to obtain a few correct matches and the initial projective transformation between the image pair. Finally, two matching strategies were used to propagate matches and produce more reliable matching results. By using the geometric relationship between the image pair, geometric correspondence matching found more matches among the initial UR-SIFT feature points. Probability relaxation matching then propagated some new matches around the initial UR-SIFT feature points. Comprehensive experiments on Chinese ZY3 and GaoFen (GF) satellite images revealed that the proposed algorithm performs well in terms of the number of correct matches, correct matching rate, spatial distribution, and matching accuracy, compared to the standard UR-SIFT and a triangulation-based propagation method.

1. Introduction

Image matching is the process of finding corresponding points on multi-view images of the same region; these images can be acquired by different sensors at different times [1]. Image matching is not only the precondition for automatic image registration [2,3], but also the foundation for three-dimensional (3D) modeling [4], change detection [5], image fusion [6], and environmental monitoring [7,8]. The need for multi-source optical satellite imagery matching has quickly increased along with the increasing diversity of available optical satellite image data. For example, China launched the ZY3-02 satellite [9] on 30 May 2016, the second satellite in the ZY3 series; the first, ZY3-01 [10], was launched on 9 January 2012 and, with excellent performance, plays an important role in stereo mapping. ZY3-02 follows a similar design to ZY3-01 but includes several improvements: higher-resolution optical cameras and a laser range finder that improves the vertical positional accuracy achievable without ground control points. In addition, China initiated its high resolution earth observation program in 2010 [11]. This program plans to launch seven high resolution satellites, of which the GF1, GF2, GF3, and GF4 satellites have been launched. GF1 and GF2 are optical remote sensing satellites; the panchromatic and multi-spectral resolutions of GF1 are 2 m and 8 m, respectively, and those of GF2 are 1 m and 4 m. Determining how to fully use these optical satellite images from a variety of remote sensors has become imperative, and the matching of multi-source optical satellite images is one of the critical aspects in this field.
Matching corresponding points between multi-source optical satellite images is difficult due to the geometric deformation caused by scale change, rotation, and view angle transformations, as well as the nonlinear intensity differences caused by different radiation resolutions and the effects of the atmosphere and radiation noise [12]. Image matching methods based on local invariant features are widely used for multi-source optical satellite images due to their robustness to image scale transformation. The most representative image matching method is the scale-invariant feature transform (SIFT) algorithm [13,14]. However, when the standard SIFT operator is applied to multi-source optical satellite image matching, the number of matched feature points is limited, a large percentage of mismatches occur, and the correctly matched points are not uniformly distributed, resulting in matching failure. To improve the performance of SIFT, researchers have reworked it mainly by: (1) improving the feature extraction operator [15,16,17,18]; (2) improving the feature descriptor [19,20,21]; and (3) improving the matching strategy [22,23,24,25,26]. For example, to improve the feature extraction ability in textureless areas, Sedaghat et al. [17] proposed the uniform robust SIFT (UR-SIFT) algorithm. According to the characteristics of the image scale pyramid, the expected number of feature points in each layer and octave is pre-defined; the layer images are then divided into grids, and the number of feature points retained in each grid is determined by the local image information entropy, the feature point contrast, and the number of feature points in the grid. This filtering strategy yields retained feature points that are highly robust and evenly distributed.
To improve the feature descriptor, Ye et al. [21] introduced gradient direction information to SIFT; the descriptor was then constructed based on the gradient orientation restriction SIFT (GOR-SIFT) and shape context, which describes the edge information, and the Euclidean distance and the chi-squared ($\chi^2$) similarity measure were used to enhance the matching success rate. Ling et al. [23] presented a matching method integrating global Shuttle Radar Topography Mission (SRTM) data and image segmentation. Before matching, the geometric epipolar lines were determined with the aid of SRTM, a few seed points were extracted by the Local Contrast (LC) operator, and the image was segmented, first by the marker-based watershed algorithm and then refined by a Region Adjacency Graph (RAG); good matching performance was achieved by combining the characteristics of distance, angle, and normalized correlation coefficient (DANCC). Triangulation-based methods, which are free of any external data, have also been proven effective for image matching [24,25,26]. First, a few points are matched to build the initial triangles. Next, a certain number of feature points are detected within the corresponding triangles, and these feature points are matched under the triangle constraint. Then, the newly matched points are inserted into the triangles, which are updated dynamically. Finally, the process is repeated until the termination conditions are met, namely that the triangles are small enough or no new points can be matched. For the matching of points in triangles, Geng [27] proposed using the nearest six matches around a candidate point to calculate the affine transformation of the local area for predicting the match; the normalized correlation coefficients (NCCs) of the candidate point with the points around the predicted location are then computed, and if the highest NCC is larger than a pre-defined value, the pair is identified as a matching point pair.
Based on previous studies, this paper presents a multi-stage matching approach for multi-source optical satellite images, called “EMMOSI” (Effective Matching of Multi-source Optical Satellite Imagery), which needs no data other than the images themselves. The framework of the matching method is shown in Figure 1. Firstly, well-distributed feature points at each scale were extracted by the UR-SIFT operator. Next, initial matching was conducted, and the transformation between the multi-source images was computed. Finally, a new matching propagation method was applied in two steps: (1) geometric correspondence matching was used to obtain new correct matches; and (2) probabilistic relaxation matching was used to match the feature points that were not successfully matched in the previous steps. The matches obtained in all steps constitute the final matches. Experimental results on a variety of optical satellite images showed that the proposed method can largely improve the use of feature points and increase the number of correct matches, which were also more evenly distributed.

2. Methodology

2.1. UR-SIFT Feature Extraction

UR-SIFT [17] is based on the SIFT algorithm; it improves the robustness and accuracy of feature points by controlling their number, quality, and distribution. The main goal of the algorithm is to retain a certain number of high-quality features across the entire extent of the image. The specific steps are as follows:
(1) Determine the approximate total number of feature points. The expected total number of feature points is set as 0.4% of the number of image pixels, with an upper limit of 5000; when the image is small, these parameters can be appropriately scaled to ensure that a sufficient number of feature points is obtained.
(2) Determine the number of feature points for each scale space image required by the SIFT algorithm. The initial feature points are local extrema, and the 10% with the lowest contrast are usually discarded. To ensure that the retained feature points are evenly distributed, the number of feature points for each scale is pre-defined according to the scale coefficient, which can be calculated as in SIFT [13]. Assuming that the total number of feature points is $N$, the number for layer $l$ of octave $o$ is:
$$N_{ol} = N \times F_{ol}, \qquad \sum_{o=1}^{O_N} \sum_{l=1}^{L_N} F_{ol} = 1 \qquad (1)$$
where $F_{ol}$ is the proportion value of layer $l$ of octave $o$. $F_{ol}$ is negatively correlated with the scale coefficient $SC_{ol}$. If we set $f_0 = F_{11}$ as the proportion parameter of the first layer of the first octave, whose scale factor is $SC_{11}$, then the proportion parameter of layer $l$ of octave $o$ can be calculated by Equation (2). In addition, the sum of all the proportion values is 1, so $f_0$ can be computed, and the other proportion values are obtained.
$$F_{ol} = \frac{SC_{11}}{SC_{ol}} f_0 \qquad (2)$$
(3) Determine the number of feature points in each grid of each layer image. To ensure that the feature points of each layer image are uniformly distributed, each layer image is divided into virtual grids. For a grid cell $Grid_k$ with index $k$ in layer $l$ of octave $o$, the number of feature points $N\_Grid_k$ retained in that grid is computed from the information entropy $E_k$ of the grid image, the number $n_k$ of feature points located in the grid, and their average contrast $C_k$.
$$N\_Grid_k = N_{ol} \left[ W_E \frac{E_k}{\sum_i E_i} + W_n \frac{n_k}{\sum_i n_i} + \left( 1 - W_E - W_n \right) \frac{C_k}{\sum_i C_i} \right] \qquad (3)$$
where $W_E$ and $W_n$ are the weight factors of entropy and feature number, respectively.
(4) Choose the feature points to be retained. Assuming that the number of feature points to be retained in a grid is $N_{cell}$, the detected feature points in the grid are ordered by contrast, the top $3 \times N_{cell}$ are retained, and the remainder are discarded. Then, the locations of the retained feature points are refined to sub-pixel accuracy, and edge responses are removed according to the standard SIFT algorithm. The local entropies of the remaining feature points are then computed, and the $N_{cell}$ feature points with the highest entropies are retained.
(5) Based on the standard SIFT, the dominant gradient orientations of the feature points are calculated, and the feature descriptors are constructed.
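The budget computations of Steps (2) and (3) (Equations (1)–(3)) can be sketched as follows. This is an illustrative Python sketch, not the authors' C++ implementation; the function names and toy inputs are our own:

```python
import numpy as np

def layer_budgets(n_total, scale_coeffs):
    # Eqs. (1)-(2): split the total budget N across pyramid layers.
    # scale_coeffs is a flat list of SC_ol over all octaves/layers,
    # with SC_11 (first layer of first octave) first.
    ratios = np.array([scale_coeffs[0] / sc for sc in scale_coeffs])
    f0 = 1.0 / ratios.sum()            # chosen so that the F_ol sum to 1
    proportions = ratios * f0          # F_ol = (SC_11 / SC_ol) * f0
    return np.round(n_total * proportions).astype(int)   # N_ol = N * F_ol

def grid_budgets(n_layer, entropies, counts, contrasts, w_e=0.2, w_n=0.5):
    # Eq. (3): share a layer's budget N_ol across grid cells by
    # entropy E_k, feature count n_k, and average contrast C_k.
    e, n, c = (np.asarray(a, dtype=float) for a in (entropies, counts, contrasts))
    share = w_e * e / e.sum() + w_n * n / n.sum() + (1.0 - w_e - w_n) * c / c.sum()
    return np.round(n_layer * share).astype(int)
```

Because the proportions are inversely proportional to the scale coefficients, coarser layers (larger $SC_{ol}$) receive smaller budgets, which matches the intent of Equation (2).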
Figure 2 displays an example comparison of the feature extraction results of standard SIFT and the UR-SIFT algorithm. The results reveal that UR-SIFT performs well with respect to both the number and the spatial distribution of the extracted feature points.

2.2. Initial Matching

Initial matching was conducted using the minimum Euclidean distance between descriptors. The ratio $R$ between the distance to the closest neighbor and that to the second-closest was used to eliminate wrong matches; we set $R = 0.6$. For optical satellite images, many mismatches remained after this constraint, so a cross-matching strategy was introduced: if $p_i \in P$ in the reference image is matched to $q_j \in Q$ in the input image, and $q_j$ is matched back to $p_i$ in the reverse direction, then the pair $(p_i, q_j)$ is temporarily retained. The Random Sample Consensus (RANSAC) [28] algorithm was used to refine the obtained matches, and the refined matches were used to compute the transformation matrix $H$ between the image pair. Since terrain elevation variations are much smaller than the distance from the ground to the sensor, a homography describes the transformation between two satellite images well. The homography relationship between a pair of corresponding satellite image feature points can be expressed by Equations (4) and (5).
$$\begin{cases} x_2 = \dfrac{x_1 h_{11} + y_1 h_{12} + h_{13}}{x_1 h_{31} + y_1 h_{32} + 1} \\[4pt] y_2 = \dfrac{x_1 h_{21} + y_1 h_{22} + h_{23}}{x_1 h_{31} + y_1 h_{32} + 1} \end{cases} \qquad (4)$$

$$\begin{cases} x_1 = \dfrac{x_2 h'_{11} + y_2 h'_{12} + h'_{13}}{x_2 h'_{31} + y_2 h'_{32} + 1} \\[4pt] y_1 = \dfrac{x_2 h'_{21} + y_2 h'_{22} + h'_{23}}{x_2 h'_{31} + y_2 h'_{32} + 1} \end{cases} \qquad (5)$$
where $(x_1, y_1)$ are the coordinates of a point on the reference image, $(x_2, y_2)$ are the coordinates of its corresponding point on the input image, $h_{ij}$ are the parameters of the homography matrix from the reference image to the input image, and $h'_{ij}$ are the parameters of its inverse.
To evaluate the calculated homography matrix, we used the cost function proposed in Navy et al. [29], expressed by Equation (6):
$$C(h) = \frac{1}{N} \left( \sum_{i=1}^{N} d\big( h(p_i),\, p_j \big) + \sum_{j=1}^{N} d\big( h^{-1}(p_j),\, p_i \big) \right) \qquad (6)$$
where $C$ is the cost function value, $d$ is the distance function between two image points, $N$ is the number of matches, and $(p_i, p_j)$ is a pair of matching points.
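As a sketch of how the homography mapping of Equation (4) and the symmetric cost of Equation (6) fit together (an illustrative Python version; `apply_h` and `symmetric_cost` are our own names, and the paper's implementation is in C++):

```python
import numpy as np

def apply_h(H, pts):
    # Eq. (4): map N x 2 points with a 3 x 3 homography.
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]      # divide by the homogeneous coordinate

def symmetric_cost(H, pts_ref, pts_in):
    # Eq. (6): mean symmetric transfer error over N matched pairs,
    # using H forward and its inverse backward (Eq. (5)).
    pts_ref = np.asarray(pts_ref, dtype=float)
    pts_in = np.asarray(pts_in, dtype=float)
    fwd = np.linalg.norm(apply_h(H, pts_ref) - pts_in, axis=1)
    bwd = np.linalg.norm(apply_h(np.linalg.inv(H), pts_in) - pts_ref, axis=1)
    return float((fwd.sum() + bwd.sum()) / len(pts_ref))
```

A perfectly consistent homography gives a cost of zero; residual offsets in either direction raise it, which is why the cost is suitable for evaluating the estimated matrix.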

2.3. Propagation Matching

Because the initial matches account for only a small percentage of all feature points in multi-source optical satellite image matching, most of the feature points are wasted. Therefore, to make full use of the remaining feature points, geometric correspondence matching and probability relaxation matching were used to complete the propagation process.

2.3.1. Geometric Correspondence Matching

Initial matching used only the local information of the feature points, without taking advantage of the geometric relationship between the image pair. Thus, we first used the geometric constraint to select a few candidate points, and then used the Modified NCC (M-NCC) method to conduct feature matching. The traditional NCC method, based on rectangular windows, is not invariant to rotation and scale changes, which often exist in multi-source optical satellite images. We therefore modified the traditional NCC method by warping the correlation windows to make it robust to such geometric distortions. Specifically, we opened a rectangular window in the reference image and projected it onto the input image using the homography obtained in initial matching; the projected region is generally an irregular quadrangle. We then resampled the irregular quadrangle to the same size as the rectangular window using bilinear interpolation. Finally, we calculated the correlation coefficient of the two rectangular windows according to the standard NCC. The specific steps of geometric correspondence matching are as follows:
(1) The feature points on the reference image that were not successfully matched in the initial matching were first selected. Since the transformation matrix $H$ was estimated in initial matching, their corresponding locations on the input image were predicted using Equation (4). As shown in Figure 3, $P$ is a seed point and $P'$ is its predicted corresponding point on the input image. Because $H$ may contain some offset, the feature point on the input image nearest to $P'$ may not be the correct match, so we retained all feature points whose distances to $P'$ were smaller than $n$ pixels. If points $\{Q_1, Q_2, \ldots, Q_n\}$ existed on the input image, we calculated their M-NCCs with $P$; after resampling the correlation window, the M-NCC value was calculated with Equation (7). If $Q_j$ had the highest M-NCC value with $P$, and the value was larger than the threshold $r$, then, in reverse, we calculated its corresponding point $Q'_j$ on the reference image using Equation (5). If the distance between $P$ and $Q'_j$ was smaller than $n$, their M-NCC value was the largest among all feature points on the reference image whose distances to $Q'_j$ were smaller than $n$, and the value was also larger than $r$, then $P$ and $Q_j$ were initially identified as a pair of matching points.
$$\rho(c, r) = \frac{\displaystyle \sum_{i=1}^{w} \sum_{j=1}^{s} g(i,j)\, g'(i+r, j+c) - \frac{1}{w \times s} \left( \sum_{i=1}^{w} \sum_{j=1}^{s} g(i,j) \right) \left( \sum_{i=1}^{w} \sum_{j=1}^{s} g'(i+r, j+c) \right)}{\sqrt{ \left[ \displaystyle \sum_{i=1}^{w} \sum_{j=1}^{s} g(i,j)^2 - \frac{1}{w \times s} \left( \sum_{i=1}^{w} \sum_{j=1}^{s} g(i,j) \right)^2 \right] \left[ \displaystyle \sum_{i=1}^{w} \sum_{j=1}^{s} g'(i+r, j+c)^2 - \frac{1}{w \times s} \left( \sum_{i=1}^{w} \sum_{j=1}^{s} g'(i+r, j+c) \right)^2 \right] }} \qquad (7)$$
where $w$ and $s$ are the dimensions of the matching window, $g(i,j)$ is the gray value at point $(i,j)$ on the reference image, $g'(i+r, j+c)$ is the gray value at the corresponding location in the resampled input-image window, and $\rho(c,r)$ is the correlation coefficient.
(2) Mismatch elimination: We re-calculated the homography matrix $H_1$ using the new matches together with the initial matches. Mismatch elimination was then performed using the global root mean square error (RMSE). If the RMSE was larger than $r_1$ pixels, we eliminated the point with the largest error, one at a time, until the RMSE fell below $r_1$ pixels. After that, the mean square errors $\sigma_x$ and $\sigma_y$ in the horizontal and vertical directions were calculated. If a point's horizontal error was larger than $E_x$, or its vertical error was larger than $E_y$, the point was deleted.
(3) Taking the retained matches and $H_1$ as input, we repeated Steps 1 and 2 until the number of correct matches no longer changed. In general, the number of matches increased only slightly after several repetitions, so, given the time cost involved, we capped the number of repetitions at three.
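The correlation coefficient at the heart of Equation (7) can be sketched as follows. The sketch assumes the input-image window has already been warped by the homography and resampled to the reference window's size, as described above; it is a plain Python illustration rather than the paper's code:

```python
import numpy as np

def ncc(ref_win, in_win):
    # Correlation coefficient of Eq. (7) for two equal-size windows.
    # For M-NCC, in_win is assumed to be the input-image window already
    # warped by the homography and resampled to ref_win's size.
    a = np.asarray(ref_win, dtype=float).ravel()
    b = np.asarray(in_win, dtype=float).ravel()
    a = a - a.mean()                   # subtract window means
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Mean subtraction and the normalizing denominator make the coefficient insensitive to linear intensity changes between the two images, which is what makes NCC attractive for multi-source data.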

2.3.2. Probabilistic Relaxation Matching

Due to the complexity of the local texture structures of optical satellite images, a homography cannot adequately express the transformation between the images in some local areas, so the offset of a correct match may be larger than $n$ pixels and the M-NCC of a correct match may be smaller than $r$; since strict values of $n$ and $r$ were used in geometric matching, some correct matches were eliminated. Additionally, a correct corresponding point might lie close to the predicted location but not have been extracted in the feature extraction stage. Given these problems, we used the probability relaxation method [30,31] to match those feature points. Probability relaxation matching takes advantage of the constraints from surrounding matches to enhance robustness against geometric distortion, and uses the points around the predicted point as candidate points to increase the probability of finding the correct matching point, which greatly improves the matching success rate. The specific steps are as follows:
(1) Select the feature points that may be correctly matched, and find their candidate matching points. If a feature point did not match successfully in geometric matching, but its transformation error was less than $n_1$ pixels, we considered that it might still be correctly matched. The larger $n_1$ is, the more correct matches are obtained, but the longer the matching takes. Taking the 16 feature points around the predicted corresponding point of a certain point $P$, their M-NCCs with $P$ were calculated point by point, and the candidate points were selected with a threshold $r_2$: if the M-NCC value of a point was larger than $r_2$, the point was taken as a candidate matching point. For example, if the initial point is $I_i$, $I'_j$ represents one of its candidate matching points, $t_i$ is the number of candidate matching points, and $\rho_{ij}$ is the corresponding M-NCC. The initial probability that $I'_j$ is the correct matching point of $I_i$ is:
$$P_{ij} = \rho_{ij} \Big/ \sum_{k=0}^{t_i} \rho_{ik}, \qquad j = 0, \ldots, t_i \qquad (8)$$
(2) Calculate the compatibility between the point pair to be matched and a neighboring match. $P(i,j)$ is the probability of the match $I_i \leftrightarrow I'_j$, $I_k$ is one of the matched points located in the neighborhood of $I_i$, and $I'_l$ is the corresponding point of $I_k$. To quantify the compatibility between the match $I_i \leftrightarrow I'_j$ and its neighbouring match $I_k \leftrightarrow I'_l$, the following compatibility coefficient function $C(i,j;k,l)$ was introduced:
$$C(i,j;k,l) = T \exp\left[ -\left( \Delta p_x^2 + \Delta p_y^2 \right) / \beta \right], \qquad \Delta p_x = (x'_j - x_i) - (x'_l - x_k), \quad \Delta p_y = (y'_j - y_i) - (y'_l - y_k) \qquad (9)$$
where $\Delta p_x$ is the difference between the displacement of $I_i$ and that of its neighbouring point $I_k$ in the $x$ direction, and $\Delta p_y$ is the corresponding difference in the $y$ direction. The larger $\Delta p$ is, the smaller the compatibility. $T$ and $\beta$ are constants.
(3) Calculate the match probability for each candidate point and find the correct matching point. The initial probability of each candidate was calculated with Equation (8); the probabilities $P(i,j)$ are then updated by the following rule:
$$P^{(o+1)}(i,j) = \frac{P^{(o)}(i,j)\, Q^{(o)}(i,j)}{\sum_{s=1}^{t_i} P^{(o)}(i,s)\, Q^{(o)}(i,s)} \qquad (10)$$
$$Q^{(o)}(i,j) = \sum_{I_k \in \Omega(I_i)} P^{(o)}(k,l)\, C(i,j;k,l) \qquad (11)$$
where $\Omega(I_i)$ represents the matches around point $I_i$, determined by Euclidean distance, and $o$ is the iteration number. $Q^{(o)}(i,j)$ is the support that the match $I_i \leftrightarrow I'_j$ receives at the $o$-th iteration from its neighboring matches $I_k \leftrightarrow I'_l$. The iteration stops when one of the following two conditions is met: (a) the pre-defined number of iterations is reached; or (b) for a candidate point, one of the match probabilities $P(i,j)$ $(j = 1, \ldots, t_i)$ exceeds a pre-defined threshold close to 1, in which case the match with that highest probability is retained. If the pre-defined number of iterations is reached and no probability meets the condition, point $I_i$ is regarded as a false match.
(4) Mismatch elimination: Although probabilistic relaxation matching makes full use of the information from neighboring matches, it cannot ensure that all the new matches are correct, so error elimination is needed. Because exceptions in local areas must be tolerated, the global RMSE could not be used, so we again applied the cross-matching idea: taking a matched point on the input image as the initial point, and the feature points around its corresponding location on the reference image as candidate points, we repeated Steps 1–3. If the located point coincided with the original point on the reference image, the match was retained; otherwise, it was deleted.
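Steps (1)–(3) of the relaxation can be sketched for a single point as follows. This is an illustrative simplification: the neighbouring matches are treated as certain, i.e., their probabilities are fixed at 1 in Equation (11), and all function names and inputs are our own:

```python
import numpy as np

def compatibility(pi, qj, pk, ql, T=1000.0, beta=10.0):
    # Eq. (9): two matches are compatible when their displacement
    # vectors (reference -> input) agree.
    dpx = (qj[0] - pi[0]) - (ql[0] - pk[0])
    dpy = (qj[1] - pi[1]) - (ql[1] - pk[1])
    return T * np.exp(-(dpx ** 2 + dpy ** 2) / beta)

def relax_point(pi, candidates, rhos, neighbour_matches, n_iter=5):
    # Eq. (8): initial probabilities from the candidates' M-NCC values.
    p = np.asarray(rhos, dtype=float)
    p = p / p.sum()
    for _ in range(n_iter):
        # Eq. (11), simplified: support from neighbouring matches
        # (pk, ql), here treated as certain (probability 1).
        q = np.array([sum(compatibility(pi, c, pk, ql)
                          for pk, ql in neighbour_matches)
                      for c in candidates])
        p = p * q / (p * q).sum()      # Eq. (10): normalised update
    return int(np.argmax(p)), p
```

With each iteration, probability mass shifts toward the candidate whose displacement best agrees with the surrounding matches, which is the mechanism that lets relaxation recover matches the stricter geometric test rejected.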

3. Experiments and Analysis

We compared EMMOSI with the standard UR-SIFT and the triangulation-based propagation method (TPM), which shares the same purpose as this paper. The concrete implementation of TPM was as follows: the matching points from standard UR-SIFT were taken as input to create the initial triangulations, and the centroids of the triangles in the reference image were then taken as candidate points for the subsequent matching, in order to improve the distribution uniformity of the propagated points. Finally, the method proposed in Geng [27] was used in the matching stage. All three methods were implemented in C++.

3.1. Description of Experimental Datasets

To evaluate the performance of EMMOSI, two series of satellite images, from ZY3 and GF, were used. The ZY3 experimental data included the forward-view and nadir-view images of the ZY3-01 satellite, whose ground sampling distances (GSDs) are 3.5 m and 2.1 m, respectively, and the nadir-view, backward-view, and multispectral images of the ZY3-02 satellite, whose GSDs are 2.1 m, 2.1 m, and 5.8 m, respectively. The GF experimental images included one GF1 panchromatic image with a GSD of 2.0 m, and GF2 panchromatic and multispectral images with GSDs of 0.8 m and 3.2 m, respectively.
All input image pairs were selected from different satellite sensors or different imaging years. The ZY3 satellite images covered mountainous topography and complex terrain with both monotonous and rich textures, and the GF satellite images covered built-up areas. Image pairs 1–5 are ZY3 images, and image pairs 6 and 7 are GF images. To comprehensively evaluate the performance of the proposed algorithm under different conditions, we used the seven imaging-condition combinations shown in Table 1:
Large illumination differences and scene changes existed. In addition, local geometric distortions and relief displacement existed in all image pairs. Their specific parameters are shown in Table 2, and the experimental images are displayed in Figure 4.

3.2. Setting of Parameters

3.2.1. Parameters in UR-SIFT Feature Extraction

The octave number of UR-SIFT is self-adaptive to the image size, and the number of layers was set to 3. The grid cells of each scale layer were set to approximately 100 × 100 pixels, and the values of $W_E$ and $W_n$ were set to 0.2 and 0.5, respectively, according to Sedaghat et al. [17].

3.2.2. Parameters in Geometric Correspondence Matching

To find proper values for $n$ and $r$ in the first step, $n$ was varied from 0.5 to 3 in increments of 0.5, while $r$ was varied from 0.6 to 0.9 in increments of 0.1. Extensive experiments showed that, with $n = 1$ and $r = 0.8$, the cost function value of the final homography is always within one pixel, and the obtained matching points can be considered correct matches. In the second step, $r_1$ was set to 1, which is reliable for eliminating wrong matches [17]. $E_x$ and $E_y$ were set to $3\sigma_x$ and $3\sigma_y$, respectively. The errors followed a Gaussian distribution, and the accuracy of a correct match was around 0.3–0.4 pixels; if the horizontal or vertical error of a match was larger than $3\sigma_x$ or $3\sigma_y$, it was probably a mismatch. This may have eliminated a few correct matches, but it ensured the high accuracy of the retained matches.

3.2.3. Parameters in Probabilistic Relaxation Matching

$n_1$ and $r_2$ were used to select candidate match points. The larger $n_1$ and the smaller $r_2$, the more candidate match points were found and the more matches were obtained, though possibly including some wrong matches. Extensive experiments revealed that, with $n_1 = 2$ and $r_2 = 0.7$, a good balance between the number and accuracy of the matches is achieved. $T$ and $\beta$ were set to 1000 and 10, according to Zheng et al. [31].

3.3. Evaluation Criteria and Implementation Details

The matching quality was evaluated from the following four factors: the number of correct matches, correct matching rate, distribution quality, and matching accuracy.
(1) The number of correct matches: For all the matches, if the residual error of a point was less than 1.2 pixels, it was taken as a correct match; otherwise, it was regarded as a false match [23].
(2) Correct matching ratio: Correct matching ratio was the proportion of correct matches compared to the total matches.
(3) Distribution quality: To evaluate the distribution quality of matched points, the statistical distribution quality factor $Scat$ proposed in Goncalves et al. [32] was used; the smaller $Scat$ is, the more uniform the distribution. $Scat$ evaluates the distribution of feature points by computing the distances between each match and the other matches and storing an intermediate value, which is used to evaluate the distribution quality. If this value is larger than 95% of the mean of the image dimensions, a good distribution is achieved.
(4) Matching accuracy: In general, matching accuracy assessment requires ground truth. For image matching, manually measured correspondences are commonly used as references. However, measuring large numbers of correspondences manually is laborious, and the measured accuracy depends on human observation, which is subjective and may vary from person to person; a more practical method was therefore required. In this paper, a matching accuracy assessment method based on the RMSE of a free network adjustment was used. Taking the matches as tie points between images, we conducted a free network adjustment for the image pair, and the RMSE of the adjustment was taken as the image matching accuracy. For satellite images, the imaging model used was the Rational Function Model (RFM). We performed the free network adjustment based on the RFM using the matches; each point then had a residual error after adjustment, from which we calculated the RMSE. Theoretically, the RMSE consists of three parts: the image matching error, the imaging model error, and the internal geometric distortion of the image. The image matching error is the chief component, whereas the other two are secondary and can be ignored for satellite images with good geometric quality. Thus, the RMSE can objectively reflect the image matching accuracy, although the actual positional accuracy of the matched points was better than the RMSE.
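As a rough illustration of the distribution-quality idea behind criterion (3): the sketch below is only a loose proxy with the same direction as $Scat$ (smaller meaning more uniform), not the exact factor defined in [32], and the function name is our own:

```python
import numpy as np

def nn_distance_spread(points):
    # Coefficient of variation of nearest-neighbour distances:
    # smaller values indicate a more even spatial spread, loosely
    # matching the direction of Scat (smaller = more uniform).
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # ignore self-distances
    nn = d.min(axis=1)                 # each point's nearest-neighbour distance
    return float(nn.std() / nn.mean())
```

A regular grid of matches gives identical nearest-neighbour distances and a spread of zero, while matches clumped in a few textured regions give a much larger spread.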

3.4. Comparative Results and Analysis

Table 3 displays the number of extracted features, number of correct matches, correct matching rate, distribution quality measure S c a t , and matching accuracy measured by RMSE for UR-SIFT, TPM, and EMMOSI.
From the results shown in Table 3, despite the large number of extracted features, the standard UR-SIFT achieved few matches, because it uses only the feature descriptors, which are computed from a small local area. Although TPM can increase the number of correct matches, the increase is limited. In contrast, the proposed method significantly increased the number of matches, by up to several orders of magnitude over the standard UR-SIFT, with the aid of the geometric correspondence between the images and the surrounding matched points. The matches of UR-SIFT were basically correct, and EMMOSI also had a high correct matching rate, higher than that of TPM in most cases.
Typically, the initial matches from UR-SIFT are unevenly distributed; they are mainly concentrated in regions with rich texture, whereas few matches exist in low-texture areas. The values of $Scat$ for TPM show that TPM reduces rather than increases the uniformity of the matches. The matching results of TPM are strongly dependent on the distribution of the initial feature points, because the newly inserted points all lie within the regions covered by the initial triangulations; if the initial points are not well-distributed, TPM will not improve the uniformity of the matches. Conversely, the value of $Scat$ for EMMOSI was much lower than that of standard UR-SIFT, indicating that a uniform distribution of the EMMOSI matches was achieved.
The RMSE values reveal that the matching results from UR-SIFT are accurate, so UR-SIFT matches can reliably be taken as initial points for TPM. The matching accuracies of TPM were lower than those of UR-SIFT because TPM relies heavily on accurate key points; if a few newly inserted points are false, these wrong matches seriously influence the final matching results. The results of the proposed method prove that EMMOSI consistently achieves high accuracy, higher than that achieved by TPM. In addition to the RMSE, Figure 5 shows the residual error histograms of the correct matches of EMMOSI for all the test image pairs. The residual errors were all within 1.2 pixels, and most were within 0.8 pixels, which also proves the acceptable matching accuracy of the proposed algorithm.
Figure 6 displays the matching results for the first pair, two ZY3-01 satellite images of mountainous topography acquired in different seasons, so the land cover changed considerably. In Figure 6, the UR-SIFT matches are well distributed, because the images have good texture in mountainous areas, but the matches are relatively sparse. TPM added a few matches in the first frame-selection area but, compared to UR-SIFT, added none in the second. Initial triangulations are present in the second selection area; no new matches appeared there because the candidate feature points selected inside those triangulations failed to match. However, regardless of the criterion used to select candidate matching points, some selected points will fail to match, while a few points that do not meet the criterion could have matched successfully. By using the geometric relationship between the image pair, EMMOSI matched more of the initial UR-SIFT feature points than UR-SIFT alone, and probability relaxation matching then propagated additional new matches. As a result, EMMOSI achieved more evenly distributed matches than UR-SIFT and TPM, indicating that the proposed method can be applied to mountainous areas.
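The spatial-consistency idea behind the probability relaxation propagation can be illustrated with a toy support function: a candidate correspondence gains support when its displacement vector agrees with those of already-matched neighbors. The function below is an illustrative sketch only; the paper's exact support and update formulas may differ, and all values here are invented.

```python
import numpy as np

def displacement_support(candidate, neighbors, sigma=2.0):
    """Support for a candidate correspondence based on how well its
    displacement agrees with already-matched neighboring pairs.
    candidate: ((x1, y1), (x2, y2)); neighbors: list of such pairs.
    Returns a value in [0, 1]; higher means more spatially consistent."""
    d_c = np.subtract(candidate[1], candidate[0])   # candidate displacement
    support = 0.0
    for p, q in neighbors:
        d_n = np.subtract(q, p)                     # neighbor displacement
        diff = np.linalg.norm(d_c - d_n)            # disagreement in pixels
        support += np.exp(-diff ** 2 / (2 * sigma ** 2))
    return support / max(len(neighbors), 1)

# Neighbors all shifted by (10, 4) between the two images.
neighbors = [((0, 0), (10, 4)), ((5, 5), (15, 9)), ((8, 2), (18, 6))]
good = ((3, 3), (13, 7))    # consistent displacement (10, 4)
bad = ((3, 3), (30, 40))    # inconsistent displacement
print(displacement_support(good, neighbors) > displacement_support(bad, neighbors))  # True
```

Iterating such an update over all candidates, keeping the highest-support assignment each pass, is the relaxation step that lets correct matches reinforce their neighbors.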
Figure 7 displays the matching results for the fourth image pair, a forward-view image from the ZY3-01 satellite and a nadir-view image from the ZY3-02 satellite covering complex terrain, where the two frame-selection areas have little texture. In Figure 7, UR-SIFT failed to match in both selection areas, because the feature points are not distinctive and cannot be matched by Euclidean distance alone in textureless areas. TPM failed to match in the first selection area and achieved only a few matches in the second; it failed in the first area because no initial triangulation existed there, so no candidate feature points were selected for matching. In contrast, EMMOSI found many matches in both selection areas, because our method effectively exploits the UR-SIFT feature points through geometric correspondence matching and adds matched points in the surroundings of the matched UR-SIFT points through probability relaxation matching. Overall, the matches of UR-SIFT and TPM were distributed unevenly; TPM added only a small number of new matches to UR-SIFT, whereas EMMOSI achieved many more matches than TPM, evenly distributed across the whole image, indicating that the proposed method can be applied to complex terrain, especially textureless regions.
Figure 8 displays the matching results for the seventh image pair, a GF1 panchromatic image and a GF2 multi-spectral image covering an urban area. UR-SIFT and TPM matched few points in the selected areas, while EMMOSI found many matches. Overall, EMMOSI produced more matches than UR-SIFT and TPM, and its matches were evenly distributed throughout the image, indicating that the proposed method can be applied to urban areas.

4. Discussion

In this study, we present an effective method, EMMOSI, for multi-source optical satellite imagery matching, and evaluate its performance on seven image pairs with different imaging combinations. Two widely used state-of-the-art methods were compared and analyzed in terms of the number of matches, correct matching rate, spatial distribution, and accuracy of the matched points. The experimental results indicate that EMMOSI is superior to UR-SIFT and TPM in both the number and the distribution of correct matches.
Although UR-SIFT can independently extract adequate and well-distributed feature points regardless of whether the image texture is dense or sparse, it achieves only a few matches using the Euclidean distance between descriptors, too few to meet the requirements of image registration. TPM uses the UR-SIFT matches as seed points to build initial triangulations, finds new feature points within those triangulations, and matches the new points under the triangulation constraints. However, the UR-SIFT matches are not always well distributed, and because the newly found feature points all lie within the initial triangulations, TPM loses the well-distributed property of the initial UR-SIFT feature points and adds extra feature-extraction work. The new matches are then inserted into the triangulations, which are updated dynamically; not all new matches are guaranteed to be correct, and if false matches are inserted, the subsequent triangulation building is considerably affected and may produce further false matches.
To acquire sufficient, well-distributed matches, we first used the UR-SIFT operator to extract adequate and well-distributed feature points, laying a good foundation for the subsequent feature matching stages. In the initial matching, we obtained a few matches and the transformation matrix between the image pair; the value of the cost function of the transformation matrix indicated that the matrix was reliable. To make full use of the UR-SIFT feature points, we first conducted geometric correspondence matching based on the transformation matrix, which uses the geometric relationship between the image pair rather than local image information. This constrains each corresponding point to a small search area, so it finds more matches, and these matches are well distributed and highly reliable. With many matches obtained, matches can be found around the feature points left unmatched by the previous steps, which is the precondition for probabilistic relaxation matching. Probabilistic relaxation matching uses spatial distance constraints, which are robust to local image deformation.
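The small search area used in geometric correspondence matching can be sketched as follows: each feature point is projected into the other image with the estimated transformation matrix, and only candidate points within n pixels of the prediction are considered as potential correspondences. This is an illustrative sketch; the variable names and the example transform are assumptions, and the final match among the candidates would still be chosen by descriptor similarity.

```python
import numpy as np

def geometric_candidates(pt, H, candidates, n=5):
    """Project pt through the 3x3 projective transform H, then return
    the indices of candidate points within n pixels of the predicted
    location -- the constrained search area used by geometric
    correspondence matching."""
    x, y = pt
    px, py, pw = H @ np.array([x, y, 1.0])
    pred = np.array([px / pw, py / pw])             # predicted location
    candidates = np.asarray(candidates, float)
    dist = np.linalg.norm(candidates - pred, axis=1)
    return np.flatnonzero(dist <= n), pred

# Hypothetical transform: a shift of (20, 10) pixels.
H = np.array([[1, 0, 20], [0, 1, 10], [0, 0, 1]], float)
cands = np.array([[120.5, 60.2], [300.0, 300.0], [123.0, 58.0]])
idx, pred = geometric_candidates((100, 50), H, cands, n=5)
print(idx.tolist())  # [0, 2]
```

Restricting matching to this window both reduces the search cost and rejects candidates that contradict the global geometry.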
To ensure high accuracy, several constraints were applied throughout the matching flow. In the initial matching, a strict NNDR threshold, cross matching, and RANSAC were used to ensure that the initial matches were correct. In the geometric correspondence matching, the global RMSE and the RMSEs in the x and y directions were used to eliminate wrong matches; in the probabilistic relaxation matching stage, the search area was limited to a small neighborhood, and cross matching was used to refine the matches.
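The NNDR test and cross (mutual) matching used in the initial stage can be sketched as follows. This is a minimal NumPy illustration with synthetic descriptors, not the experimental implementation; RANSAC on the coordinates of the surviving pairs would follow.

```python
import numpy as np

def nndr_cross_match(desc1, desc2, ratio=0.7):
    """Initial matching: keep pair (i, j) only if desc2[j] is the nearest
    neighbor of desc1[i], the nearest/second-nearest distance ratio is
    below `ratio` (NNDR test), and desc1[i] is in turn the nearest
    neighbor of desc2[j] (cross check)."""
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i in range(len(desc1)):
        order = np.argsort(d[i])
        best, second = order[0], order[1]
        if d[i, best] >= ratio * d[i, second]:  # NNDR test failed
            continue
        if np.argmin(d[:, best]) != i:          # cross check failed
            continue
        matches.append((i, int(best)))
    return matches

rng = np.random.default_rng(1)
desc2 = rng.normal(size=(6, 8))
desc1 = desc2[[2, 4]] + rng.normal(scale=0.01, size=(2, 8))  # noisy copies
print(nndr_cross_match(desc1, desc2))  # [(0, 2), (1, 4)]
```

A strict ratio (here 0.7) trades match count for reliability, which suits an initial stage whose only job is to estimate a trustworthy transformation.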
Our experimental image pairs covered diverse terrain, including mountainous topography, complex terrain with both rich-texture and textureless regions, and urban areas. EMMOSI performed well for all of these topographies. In summary, EMMOSI is effective and stable for multi-source optical satellite imagery matching.

5. Conclusions

In this paper, EMMOSI, a multi-stage matching approach, is proposed for multi-source optical satellite imagery. The algorithm comprises UR-SIFT feature extraction, initial matching, geometric correspondence matching, and probabilistic relaxation matching; its main contribution is the proposed matching propagation strategy. Geometric correspondence matching uses the geometric transformation between the image pair to find the matches that satisfy that transformation, while probabilistic relaxation matching uses spatial distance constraints from surrounding correct matches to find matches in local, distorted, and weakly textured areas. Comprehensive experiments on images from different sensors with different viewing directions, geometric resolutions, spectral modes, and acquisition times showed that our method increases the number and accuracy of correct matches and improves their spatial distribution.
The proposed method can be applied to a variety of remote sensing applications that require feature point matching, such as image registration and image mosaicking. In addition, the proposed propagation strategy can be combined with other matching methods, which broadens its applicability. In future work, a wider range of optical satellite images will be used to evaluate the robustness of the proposed method, and we will work to improve the algorithm's efficiency.

Acknowledgments

This paper was substantially supported by the National Key R&D Program of China (Grant No. 2017YFB0503004), the National Natural Science Foundation of China (Project Nos. 41301525 and 41571440), the Surveying and Mapping Public Welfare Project of China (No. 201512012), and the High Resolution Remote Sensing, Surveying and Mapping Application Demonstration System Research Program (Issue No. 1). We owe great appreciation to the anonymous reviewers for their critical, helpful and constructive comments and suggestions.

Author Contributions

Yuxuan Liu, Fan Mo and Pengjie Tao conceived and designed the experiments; Yuxuan Liu performed the experiments and analyzed the data; Fan Mo contributed experimental images; Pengjie Tao contributed analysis tools; and Yuxuan Liu and Pengjie Tao wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. The framework of Effective Matching of Multi-source Optical Satellite Imagery (EMMOSI).
Figure 2. Feature extraction: (a) ZY3-02 image: multispectral, 338 × 429 pixels; (b) scale invariant feature transform (SIFT) feature extraction result; and (c) uniform robust SIFT (UR-SIFT) feature extraction result.
Figure 3. Geometric correspondence matching. P is a seed point, P′ is its predicted point calculated by Equation (1), Qj is the matched point, and the other points marked by a yellow circle are those whose distance to P′ is within n pixels.
Figure 4. Experimental satellite images. (ac): nadir-view image of ZY3-01; (d) forward-view image of ZY3-01; (e) nadir-view image of ZY3-02; (f) backward-view image of ZY3-02; (g) multi-spectral image of ZY3-02; (h) panchromatic image of GF1; (i) panchromatic image of GF2; and (j) multi-spectral image of GF2.
Figure 5. Residual error histogram of correct matches of EMMOSI in all test image pairs: (a) Nadir-view image of ZY3-01 and Nadir-view image of ZY3-01; (b) Nadir-view image of ZY3-01 and Nadir-view image of ZY3-02; (c) forward-view image of ZY3-01 and backward-view image of ZY3-02; (d) forward-view image of ZY3-01 and Nadir-view image of ZY3-02; (e) Nadir-view image of ZY3-01 and multi-spectral image of ZY3-02; (f) panchromatic image of GF1 and panchromatic image of GF2; and (g) panchromatic image of GF1 and multi-spectral image of GF2.
Figure 6. Matching results for the first experimental image pair: (Left) Nadir-view image of ZY3-01; and (Right) Nadir-view image of ZY3-01.
Figure 7. Matching results for the fourth experimental image pair: (Left) forward-view image of ZY3-01; and (Right) Nadir-view image of ZY3-02.
Figure 8. Matching results for the seventh experimental image pair in an urban area: (Left) panchromatic image of GF1; and (Right) multi-spectral image of GF2.
Table 1. Seven different imaging condition combinations.
No. | Different Remote Sensors | Different Viewpoint Cameras | Different Resolutions | Different Spectral Modes | Different Acquisition Times
1 | | | | | ✓
2 | ✓ | | | | ✓
3 | ✓ | ✓ | | | ✓
4 | ✓ | ✓ | ✓ | | ✓
5 | ✓ | | ✓ | ✓ | ✓
6 | ✓ | | ✓ | | ✓
7 | ✓ | | ✓ | ✓ | ✓
Table 2. Experimental image pairs.
No. | Figure | Satellite | View/Spectral Mode | Image Size | GSD (m) | Acquisition Date | Location
1 | 4a | ZY3-01 | Nadir-view | 802 × 720 | 2.1 | 5 October 2013 | China-Guangzhou
  | 4b | ZY3-01 | Nadir-view | 803 × 718 | 2.1 | 21 January 2014 |
2 | 4c | ZY3-01 | Nadir-view | 932 × 1009 | 2.1 | 17 May 2016 | China-Fuxin
  | 4e | ZY3-02 | Nadir-view | 932 × 1027 | 2.1 | 5 June 2016 |
3 | 4d | ZY3-01 | Forward-view | 600 × 603 | 3.5 | 17 May 2016 | China-Fuxin
  | 4f | ZY3-02 | Backward-view | 856 × 828 | 3.5 | 5 June 2016 |
4 | 4d | ZY3-01 | Forward-view | 600 × 603 | 3.5 | 17 May 2016 | China-Fuxin
  | 4e | ZY3-02 | Nadir-view | 932 × 1027 | 2.1 | 5 June 2016 |
5 | 4c | ZY3-01 | Nadir-view | 932 × 1009 | 2.1 | 17 May 2016 | China-Fuxin
  | 4g | ZY3-02 | Multi-spectral | 344 × 374 | 5.8 | 5 June 2016 |
6 | 4h | GF1 | Panchromatic | 344 × 314 | 2.0 | 4 November 2016 | China-Guangzhou
  | 4i | GF2 | Panchromatic | 841 × 766 | 0.8 | 30 April 2017 |
7 | 4h | GF1 | Panchromatic | 344 × 314 | 2.0 | 4 November 2016 | China-Guangzhou
  | 4j | GF2 | Multi-spectral | 212 × 193 | 3.2 | 30 April 2017 |
Table 3. Experimental results of uniform robust scale invariant feature transform (UR-SIFT), the triangulation-based propagation method (TPM), and the proposed method, Effective Matching of Multi-Source Optical Satellite Imagery (EMMOSI).
No. | Algorithm | Extracted Features (Reference / Input Image) | Correct Matches | Correct Matching Rate | Scat | RMSE (pixels)
1 | UR-SIFT | 4617 / 4630 | 147 | 100.0% | 0.1117 | 0.314
  | TPM | 4617 / 4630 | 268 | 98.5% | 0.0354 | 0.389
  | EMMOSI | 4617 / 4630 | 887 | 100.0% | 0.0257 | 0.328
2 | UR-SIFT | 6548 / 6373 | 156 | 100.0% | 0.0467 | 0.266
  | TPM | 6548 / 6373 | 272 | 97.1% | 0.0541 | 0.448
  | EMMOSI | 6548 / 6373 | 696 | 99.1% | 0.0236 | 0.282
3 | UR-SIFT | 5167 / 6131 | 49 | 100.0% | 0.6680 | 0.258
  | TPM | 5167 / 6131 | 74 | 92.5% | 0.7430 | 0.399
  | EMMOSI | 5167 / 6131 | 289 | 97.0% | 0.5451 | 0.318
4 | UR-SIFT | 5167 / 6373 | 60 | 100.0% | 0.1129 | 0.225
  | TPM | 5167 / 6373 | 105 | 96.3% | 0.6389 | 0.395
  | EMMOSI | 5167 / 6373 | 302 | 97.4% | 0.0806 | 0.320
5 | UR-SIFT | 6548 / 2067 | 36 | 94.7% | 0.6780 | 0.343
  | TPM | 6548 / 2067 | 42 | 79.2% | 0.8062 | 0.363
  | EMMOSI | 6548 / 2067 | 80 | 83.5% | 0.0282 | 0.365
6 | UR-SIFT | 1553 / 6419 | 28 | 100.0% | 0.4094 | 0.170
  | TPM | 1553 / 6419 | 41 | 100.0% | 0.4908 | 0.332
  | EMMOSI | 1553 / 6419 | 77 | 87.5% | 0.3456 | 0.305
7 | UR-SIFT | 1553 / 643 | 26 | 96.3% | 0.0985 | 0.311
  | TPM | 1553 / 643 | 35 | 88.6% | 0.6443 | 0.358
  | EMMOSI | 1553 / 643 | 59 | 89.4% | 0.0206 | 0.304
