
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (

Automatic image registration (AIR) has been widely studied in the fields of medical imaging, computer vision, and remote sensing. In various cases, such as image fusion, high registration accuracy should be achieved to meet application requirements. For satellite images, the large image size and unstable positioning accuracy resulting from the limited manufacturing technology of the charge-coupled device, focal plane distortion, and unrecorded spacecraft jitter make it difficult to obtain agreeable corresponding points for registration using only area-based matching or feature-based matching. In this situation, a coarse-to-fine matching strategy integrating the two types of algorithms has proven feasible and effective. In this paper, an AIR method for application to the fusion of ZY-1-02C satellite imagery is proposed. First, the images are geometrically corrected. Coarse matching, based on the scale-invariant feature transform, is performed on the subsampled corrected images, and a rough global estimation is made with the matching results. Harris feature points are then extracted, and the coordinates of the corresponding points are calculated according to the global estimation results. Precise matching is conducted, based on normalized cross correlation and least squares matching. As complex image distortion cannot be precisely estimated, a local estimation using the structure of a triangulated irregular network is applied to eliminate the false matches. Finally, image resampling is conducted, based on local affine transformation, to achieve high-precision registration. Experiments with ZY-1-02C datasets demonstrate that the accuracy of the proposed method meets the requirements of fusion application, and its efficiency is also suitable for the commercial operation of the automatic satellite data processing system.

Image registration is a fundamental image processing technique in the areas of medical imaging, computer vision, and remote sensing (RS). In the field of RS, automatic image registration (AIR) has been a widely studied problem for decades. AIR has been achieved in many different datasets, including multi-platform [

Image matching is a technique that identifies corresponding structures such as point, line, and surface through certain criteria. The algorithms of this technique can be broadly classified into two categories: area-based matching (ABM) and feature-based matching (FBM). ABM calculates a certain measurement using gray data in the fixed-size window from two images and treats the center points of windows as corresponding points when the measured value exceeds the threshold. The most commonly used algorithms are methods based on mutual information [

The second step of image registration is to estimate the image transformation model with the results of image matching. The affine and polynomial transformation models are commonly adopted for RS image registration. However, under certain situations, such as when the image size is extremely large or when complex image distortion exists, the aforementioned two models can make only a rough estimation of the transformation, but the fitting precision is insufficient and usually uneven on the image. To implement high-precision registration of images with significant distortions, some local fitting methods, such as B-spline functions [

The final process of registration, that is, image resampling, can be conducted using the estimated parameters of the transformation model to warp the input image to the reference image. To achieve sub-pixel accuracy, image interpolation methods, such as bilinear interpolation and cubic convolution interpolation, are usually employed for resampling [

The ZY-1-02C satellite, launched on 22 December 2011, is equipped with a multispectral (MUX) camera (10 m resolution, including infrared, red, and green bands), a panchromatic (PAN) camera (5 m resolution), and a panchromatic high-resolution (HR) camera (2.36 m resolution). It is China’s first satellite customized specifically for the land resource department, and an automatic satellite data processing system (ASDPS) was established on the ground to ensure its commercial operation. In accordance with the ASDPS design, the PAN and MUX images captured from the same area are geometrically corrected first, and then image fusion processing is performed to obtain color infrared images with higher resolution. Theoretically, the geographical coordinates of the corrected images from different sensors should be extremely close, and the images can be fused directly. However, the positioning accuracy of different images is influenced by the limited manufacturing technology of the charge-coupled device, focal plane distortion, and unrecorded spacecraft jitter [

An AIR method applied to the fusion of ZY-1-02C satellite imagery is proposed in this paper. First, the images are geometrically corrected with equivalent ground sampling distance (GSD), a coarse matching based on SIFT for the subsampled corrected images is performed, and a rough global estimation is made with the matching results. Feature points are then extracted using a Harris detector, and the coordinates of their corresponding points are calculated according to the results of the global estimation. Precise matching is conducted based on normalized cross correlation (NCC) and least squares matching (LSM). To fit the complex image distortion, which cannot be precisely estimated, a local estimation using the structure of TIN is applied to eliminate the false matches. Finally, using the optimized TIN generated in the process of error elimination, image resampling is conducted based on local affine transformation to achieve high-precision registration. Experiments with ZY-1-02C datasets demonstrate that the accuracy of the proposed method meets the requirements of fusion application, and its efficiency is also suitable for the commercial operation of ASDPS.

The proposed approach can be divided into six steps and each step is specified in

Before the process of geometric correction, the long strip satellite images were segmented into scenes by ASDPS, and the rational polynomial coefficient parameters for each scene of the images were provided. Therefore, the rational function model (RFM) is adopted for geometric correction in our approach. As a result of the limited positioning accuracy of the satellite, the average elevation of the positioning range is used for image correction. SRTM-DEM [

In the RFM, image pixel coordinates are defined as the ratios of polynomials of ground coordinates [
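For reference, the RFM is commonly written in the generic form below. The symbols are assumptions chosen for illustration and may not match the notation of the original equations: (r_n, c_n) are the normalized image row and column, (X_n, Y_n, Z_n) are the normalized ground coordinates, and each P_k is a polynomial (usually cubic, with 20 terms) whose coefficients a_{k,i} are supplied as the rational polynomial coefficients.

```latex
r_n = \frac{P_1(X_n, Y_n, Z_n)}{P_2(X_n, Y_n, Z_n)}, \qquad
c_n = \frac{P_3(X_n, Y_n, Z_n)}{P_4(X_n, Y_n, Z_n)}

P_k(X_n, Y_n, Z_n) = \sum_{i=1}^{20} a_{k,i}\, \rho_i(X_n, Y_n, Z_n), \qquad k = 1, \dots, 4

X_n = \frac{X - X_0}{X_s}, \quad Y_n = \frac{Y - Y_0}{Y_s}, \quad Z_n = \frac{Z - Z_0}{Z_s}
```

Here each \rho_i is a monomial of total degree at most three in X_n, Y_n, and Z_n, and the offsets (X_0, Y_0, Z_0) and scales (X_s, Y_s, Z_s) normalize the ground coordinates to roughly the range [-1, 1]; the image coordinates are normalized in the same way.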

To eliminate the scale differences between images, the equivalent GSD is adopted for geometric correction so that precise matching can be implemented smoothly. Even after geometric correction, a significant accuracy difference remains between images of ZY-1-02C (see analysis in Section 3.1). In this situation, direct NCC matching results in a dramatic increase in time consumption and matching error. Therefore, a coarse matching for the subsampled corrected images is performed in advance. With the results of rough global estimation through coarse matching, the searching range of precise matching is sharply reduced, and precision and efficiency are evidently improved.

In our related work, the coarse-to-fine matching strategy is generally applied using a multi-level image pyramid created by methods that can preserve more image information, such as the wavelet-based pyramid approach [

To avoid repeated loading of the large image, interval sampling is conducted while the corrected images are being saved during geometric correction. Aside from its use in coarse matching, the subsampled image can also be filed in ASDPS as a browse map of the corrected image.
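A minimal sketch of interval sampling, assuming the image is held as a row-major list of rows. The sampling interval `step` and the helper names are hypothetical; the interval actually used in the paper is not reproduced here. The second helper shows how a point located on the subsampled image maps back to full-size pixel coordinates.

```python
def subsample(image, step):
    """Keep every `step`-th pixel in both directions (interval sampling)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return [row[::step] for row in image[::step]]


def to_full_resolution(point, step):
    """Map (x, y) on the subsampled image to full-size pixel coordinates."""
    x, y = point
    return (x * step, y * step)


# Toy 6 x 6 "image" whose pixel value encodes its own position (10 * row + col).
image = [[10 * r + c for c in range(6)] for r in range(6)]
small = subsample(image, 3)
full_xy = to_full_resolution((1, 1), 3)
```

Because no smoothing is applied, this is cheaper than building a filtered pyramid level, at the cost of possible aliasing in the subsampled image.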

SIFT matching is a highly successful FBM algorithm proposed by Lowe [

The subsampled MUX image has to be transformed to grayscale before SIFT matching. To minimize the information losses, principal component analysis (PCA), which was first proposed by Hotelling [

A number of corresponding points can be obtained via SIFT matching. Multiplying the coordinates of these points by the value of

The matching results inevitably contain false matches, which can be eliminated by comparing the residual error of each point pair with the root mean square error (RMSE). Because the overall accuracy of the coarse matching points is not very high, point pairs whose residual error exceeds twice the RMSE are removed. The remaining matching points are used to calculate a new RMSE, and the iteration continues until the residual errors of all the remaining point pairs are less than twice the RMSE.
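The iterative rejection described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the six-parameter least-squares affine model used as the global estimate, and the small tolerance floor `eps` (added so that floating-point noise on an otherwise exact fit is not mistaken for an outlier) are all hypothetical.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine fit; returns (px, py) with u = px . [1, x, y], v = py . [1, x, y]."""
    rows = [(1.0, x, y) for x, y in src]
    n = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    bx = [sum(r[i] * d[0] for r, d in zip(rows, dst)) for i in range(3)]
    by = [sum(r[i] * d[1] for r, d in zip(rows, dst)) for i in range(3)]
    return solve3(n, bx), solve3(n, by)


def residuals(params, src, dst):
    """Euclidean distance between each predicted and observed point."""
    px, py = params
    return [math.hypot(px[0] + px[1] * x + px[2] * y - u,
                       py[0] + py[1] * x + py[2] * y - v)
            for (x, y), (u, v) in zip(src, dst)]


def reject_outliers(src, dst, factor=2.0, eps=1e-6):
    """Refit and drop pairs whose residual exceeds factor * RMSE until none do.

    Needs at least three non-collinear pairs; `eps` (pixels) is an assumed floor.
    """
    keep = list(range(len(src)))
    while len(keep) > 3:
        s, d = [src[i] for i in keep], [dst[i] for i in keep]
        params = fit_affine(s, d)
        res = residuals(params, s, d)
        rmse = math.sqrt(sum(e * e for e in res) / len(res))
        good = [i for i, e in zip(keep, res) if e <= max(factor * rmse, eps)]
        if len(good) == len(keep):
            return params, keep
        keep = good
    return fit_affine([src[i] for i in keep], [dst[i] for i in keep]), keep


# Synthetic check: 25 grid points under a known affine map, one gross error.
src = [(10.0 * i, 10.0 * j) for i in range(5) for j in range(5)]
dst = [(3 + 1.01 * x - 0.02 * y, -2 + 0.03 * x + 0.99 * y) for x, y in src]
dst[12] = (dst[12][0] + 40.0, dst[12][1] - 30.0)
(px, py), kept = reject_outliers(src, dst)
```

In this synthetic case the displaced pair (index 12) is removed in the first pass and the second pass confirms that all remaining residuals are within the threshold.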

Approximate parameters of image transformation are obtained through global estimation. With the parameters, the position of the corresponding points can be predicted, and the point should be searched out in a relatively small window around the position. Similar to the process of coarse matching, the full-size MUX image should be converted into grayscale through PCA before precise matching. In our approach, the process of precise matching is divided into two steps: feature extraction and matching based on NCC and LSM.

In the featureless and textureless area, determining a reliable matching result is difficult. This situation can be avoided by matching after feature extraction.

The Harris detector adopted in our approach is widely known for its high speed and stable result [

After extracting feature points from MUX images, the coordinates of the corresponding points on HR/PAN images can be estimated through affine transformation. Around the estimated position, a large searching window need not be set to find the corresponding points, which ensures the high efficiency and reliability of matching results when an NCC-based approach is used.
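A minimal sketch of NCC matching inside a small search window around the predicted position. The function names and the list-of-rows image layout are assumptions made for illustration; the default threshold of 0.85 follows the value reported later in the paper, while the tiny default window sizes are chosen only to keep the example small (the paper reports a 13 × 13 template and a 51 × 51 search window).

```python
import math
import random


def ncc(a, b):
    """Normalized cross correlation of two equally sized patches (lists of rows)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den > 0 else 0.0


def patch(img, cx, cy, half):
    """Square patch of side 2 * half + 1 centred on column cx, row cy."""
    return [row[cx - half:cx + half + 1] for row in img[cy - half:cy + half + 1]]


def ncc_search(ref, inp, pt, predicted, half_tpl=1, half_search=2, threshold=0.85):
    """Best integer-pixel match for `pt` (in `ref`) near `predicted` (in `inp`).

    Returns ((x, y), score), or None if the best score is below the threshold.
    """
    tpl = patch(ref, pt[0], pt[1], half_tpl)
    h, w = len(inp), len(inp[0])
    best, best_xy = -1.0, None
    for cy in range(predicted[1] - half_search, predicted[1] + half_search + 1):
        for cx in range(predicted[0] - half_search, predicted[0] + half_search + 1):
            # Skip candidates whose template window would leave the image.
            if half_tpl <= cx < w - half_tpl and half_tpl <= cy < h - half_tpl:
                score = ncc(tpl, patch(inp, cx, cy, half_tpl))
                if score > best:
                    best, best_xy = score, (cx, cy)
    return (best_xy, best) if best >= threshold else None


# Synthetic check: `inp` is `ref` shifted by 2 columns and 1 row.
rng = random.Random(0)
ref = [[rng.random() for _ in range(16)] for _ in range(16)]
inp = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0.0 for x in range(16)]
       for y in range(16)]
result = ncc_search(ref, inp, (7, 7), (8, 8), half_tpl=2, half_search=2)
```

The prediction (8, 8) is deliberately one pixel off the true position (9, 8), mimicking the residual error of the coarse global estimate that the search window has to absorb.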

NCC matching can only achieve pixel-level precision. To achieve sub-pixel matching precision, the coordinates of corresponding points obtained by NCC are used as initial values to be refined using LSM. Proposed by Ackermann [

Numerous corresponding points can be obtained through precise image matching. False matches must be eliminated from the corresponding points before use in image registration. The complicated image distortion in PAN/HR and MUX images causes difficulty in employing a mathematical model for the direct and precise description of image transformation. The model cannot, therefore, serve as a criterion to eliminate false matches. However, influences of distortions can be minimized in most cases when the judging area is reduced to a relatively small one [

Using a TIN structure, the adjacency between points can be determined simply and efficiently. Wang

The divide-and-conquer approach proposed by Lee

Based on the preceding analysis, the proposed gross error elimination method proceeds as follows:

(1) A TIN is constructed using the coordinates of matching points, and the points in TIN are judged one by one in the following steps;

(2) Several nearest neighboring points around the current judging point are collected based on the TIN structure. The neighboring points are determined by an iterative method: first, all the points adjacently connected to the judging point are collected; then, more points that are adjacently connected to the collected points are found and gathered continually. In our approach, the recommended number of iteration times is 2, as shown in

(3) Based on the coordinates of the collected matching points, affine transformation parameters of the local distortion can be estimated;

(4) The residual error of the judging point is calculated using the affine parameters obtained previously. If the error is greater than a certain threshold (which is twice the RMSE in our approach), the judging point and its corresponding point are eliminated as a false match. Otherwise, we go to step (2) to judge the next point;

(5) After traversing all the points in the TIN, we return to step (1) to reconstruct a new TIN using the remaining points. The process continues until the residual errors of all points meet the requirements.
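Step (2) of the procedure above, the collection of neighboring points over the TIN, can be sketched as follows. The sketch assumes the TIN is already available as a list of triangles given by vertex indices (for example from a Delaunay triangulation); the function names are hypothetical, and `rings=2` mirrors the recommended two iterations. The local affine fit and the residual test of steps (3) and (4) would then be applied to the returned neighbors.

```python
def build_adjacency(triangles):
    """Map each vertex index to the set of vertices sharing a TIN edge with it."""
    adj = {}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def collect_neighbors(adj, point, rings=2):
    """Vertices reachable from `point` within `rings` edge steps, excluding itself."""
    collected = {point}
    frontier = {point}
    for _ in range(rings):
        # Expand by one ring: everything adjacent to the frontier not yet seen.
        frontier = {n for p in frontier for n in adj.get(p, ())} - collected
        collected |= frontier
    return collected - {point}


# Toy TIN: a strip of four triangles over vertices 0..5.
triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
adj = build_adjacency(triangles)
first_ring = collect_neighbors(adj, 0, rings=1)
two_rings = collect_neighbors(adj, 0, rings=2)
```

A second ring enlarges the support of the local affine fit so that it is over-determined, while keeping the judged area small enough for the local distortion to remain approximately affine.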

The accuracy of the geometrically corrected image of ZY-1-02C is influenced by numerous factors. Thus, transformation models, such as affine and geometric polynomials, cannot describe the distortions precisely. However, in the local areas of images, affine transformation is sufficient to estimate the distortion. Therefore, image resampling is implemented using local affine transformation based on TIN to achieve high precision.

An optimized TIN has been obtained in the process of error elimination. For every triangle in the optimized TIN, the affine transformation is calculated. Based on the parameters of the transformation, the gray data of the triangle in the input image can be rectified to the reference image. When all the triangles have been rectified, image registration between the input image and reference image is achieved. To achieve registration precision at the sub-pixel level, the bilinear interpolation method is used to resample the input image.
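The two ingredients of this resampling step, an exact affine transformation per triangle and bilinear interpolation of the grey value, can be sketched as follows. The function names are hypothetical, and the mapping direction (reference coordinates to input coordinates, so that each output pixel looks up its source position) is an assumption about how the rectification is organized.

```python
import math


def triangle_affine(src_tri, dst_tri):
    """Exact affine (a0, a1, a2, b0, b1, b2) mapping src vertices onto dst vertices.

    u = a0 + a1 * x + a2 * y,  v = b0 + b1 * x + b2 * y
    """
    (x1, y1), (x2, y2), (x3, y3) = src_tri
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("degenerate triangle")

    def solve(f1, f2, f3):
        c1 = ((f2 - f1) * (y3 - y1) - (f3 - f1) * (y2 - y1)) / det
        c2 = ((x2 - x1) * (f3 - f1) - (x3 - x1) * (f2 - f1)) / det
        return f1 - c1 * x1 - c2 * y1, c1, c2

    a = solve(*(p[0] for p in dst_tri))
    b = solve(*(p[1] for p in dst_tri))
    return a + b


def map_point(params, x, y):
    """Apply the affine parameters to one point."""
    a0, a1, a2, b0, b1, b2 = params
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y


def bilinear(img, x, y):
    """Bilinearly interpolated grey value at fractional column x, row y.

    Assumes 0 <= x <= width - 1 and 0 <= y <= height - 1.
    """
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# One triangle in reference coordinates and its matched triangle in the input image.
ref_tri = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
inp_tri = [(1.0, 2.0), (11.0, 2.5), (0.5, 12.0)]
params = triangle_affine(ref_tri, inp_tri)
mapped = map_point(params, 10.0, 0.0)

img = [[0.0, 10.0], [20.0, 30.0]]
value = bilinear(img, 0.5, 0.5)
```

Because each triangle's affine is determined exactly by its three vertex pairs, adjacent triangles agree along their shared edge, so the piecewise mapping is continuous across the TIN.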

To evaluate the performance of the proposed approach, three datasets from ZY-1-02C were used for the experiments. Among these datasets, two consist of a scene of a PAN image and a scene of an MUX image, whereas the third set consists of a scene of an HR image and a scene of an MUX image. The first dataset is in Mohe, a county in the northern border of China, with a mainly mountainous terrain. The second dataset is in Hangzhou, an eastern city of China, where the main terrain is urban and partly mountainous. The third dataset is in Tokyo, Japan, which consists of the main part of the city and a large sea area. Details of the datasets are described in

Before image registration, geometric correction with equivalent GSD was applied to the images. Meanwhile, subsampled corrected images were generated. According to

To clarify the necessity of image coarse matching before precise matching, a number of checkpoints evenly distributed on the images were manually measured, and the coordinate differences between the input and reference images were analyzed. Generally, corresponding points can be determined by using the geographic information of the images. Therefore, we first obtained the geographic coordinates of the checkpoints, which are (X_{ref}, Y_{ref}) in the reference image and (X_{input}, Y_{input}) in the input image, and then computed the coordinate differences ΔX_{geo} = X_{ref} − X_{input} and ΔY_{geo} = Y_{ref} − Y_{input}.

According to the proposed strategy, SIFT matching was conducted on the subsampled images, and a rough estimation of global affine transformation was made based on the matching results.

With the results of coarse matching, a large number of evenly distributed corresponding points can be acquired using the NCC-based matching method, and sub-pixel level matching accuracy can be achieved based on LSM.

Image registration was conducted by resampling the input image using local affine transformation based on TIN. In ASDPS, the images after registration are supposed to be fused by IHS algorithm [

All the algorithms mentioned in this paper were implemented in C++. To verify the efficiency of the proposed method, the experiments were run on a desktop computer with the Microsoft Windows 7 operating system, a quad-core 3.20 GHz CPU, and 4 GB of memory. The three datasets took 68.2 s, 83.2 s, and 561.8 s, respectively, to register, which fully meets the time requirements of the commercial operation of ASDPS.

The comparative experiments were conducted using ENVI 5.0. Similar to our approach, the Förstner operator was chosen to extract feature points, and the cross-correlation matching method was selected to obtain corresponding points. In ENVI, at least three pairs of seed tie points must be measured manually before matching points can be generated automatically, and false matches must also be removed from the matching points manually.

For each set of data, registration tests based on three methods provided in ENVI were conducted, including affine transformation, quadratic polynomial, and triangulation. The registration and fusion results are shown in

Because our method is fully automatic, its efficiency is significantly higher than that of the ENVI methods. To examine the registration accuracy, the RMSE of the images registered by the different methods, calculated from 50 pairs of checkpoints, was compared. More methods of accuracy assessment can be found in Reference [

As shown in

An AIR method with high precision and efficiency applied to fusion of ZY-1-02C imagery is proposed in this paper. A significant characteristic of the proposed method is the successful combination of the benefits from SIFT matching and NCC-based matching. As ASDPS is a commercial operation system with minimal human intervention, the proposed approach is a fully automatic procedure.

Our related studies have two representative approaches to realize a coarse-to-fine registration. One is to perform coarse matching on full-size images by means of FBM, such as SIFT or SURF, and perform precise matching on coarsely registered images by means of ABM [

To resolve the abovementioned issues, the corrected images were subsampled only once with a simple interval sampling method, and coarse matching based on SIFT was performed on the subsampled images. Relatively constant and suitable subsampled image sizes can be obtained with

The error elimination applied in this article differs from the approaches reported in most studies. Traditional error elimination is mostly based on global estimation. The false matches or outliers are excluded by estimation based on least squares [

The main purpose of this article is to obtain fused ZY-1-02C images in spite of significant difference in positioning accuracy among images captured from different sensors. Our method can also be applied to similar problems in satellite data sources. However, the proposed method also has limitations. First, geometric correction with equivalent GSD was adopted before registration. The correction requires RPC parameters for each scene of images. Thus, our method cannot be directly applied to images with rotation and scale differences without RPC parameters. Moreover, using a coarse-to-fine strategy to achieve registration is unnecessary for satellite images with very small difference in positioning accuracy; NCC-based matching can be directly performed, and the corresponding points can be easily searched on the basis of the geographic information of the images.

In the experiments, 50 evenly distributed point pairs were manually selected for each dataset to assess the registration accuracy. Sub-pixel accuracy was achieved for all the datasets (RMSE of 0.37, 0.43, and 0.66 pixels). Comparative analysis showed that our method was more accurate than all three ENVI methods on every dataset; the best ENVI method, triangulation, reached 2.61, 1.12, and 1.73 pixels.
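As an illustration of how such a checkpoint RMSE can be computed, the sketch below assumes the error of each pair is the Euclidean distance between the checkpoint in the reference image and the same checkpoint in the registered image. The paper does not spell out its exact formula, so this definition, the function name, and the sample coordinates are assumptions.

```python
import math


def checkpoint_rmse(ref_pts, reg_pts):
    """RMSE (in pixels) over checkpoint pairs, using 2-D Euclidean error per pair."""
    if len(ref_pts) != len(reg_pts) or not ref_pts:
        raise ValueError("need two non-empty point lists of equal length")
    sq = [(xr - xg) ** 2 + (yr - yg) ** 2
          for (xr, yr), (xg, yg) in zip(ref_pts, reg_pts)]
    return math.sqrt(sum(sq) / len(sq))


# Two made-up checkpoint pairs: one off by (0.3, 0.4) pixels, one exact.
ref_pts = [(0.0, 0.0), (10.0, 10.0)]
reg_pts = [(0.3, 0.4), (10.0, 10.0)]
rmse = checkpoint_rmse(ref_pts, reg_pts)
```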

The registration error in this article is mainly produced by three factors. First, the error may be generated in image matching. Theoretically, LSM can achieve sub-pixel level precision; however, the accuracy of individual point pairs may remain low because of influences such as noise. Second, the error may be caused by local affine estimation. In most cases, affine transformation based on the TIN structure can sufficiently estimate the local image distortion. However, in image areas where the distribution of matching points is relatively sparse, some triangles may span a larger part of the image than others, and an affine estimate may be insufficient under this condition. Third, the error may be caused by image resampling, because the bilinear interpolation adopted in our approach may also produce errors.

In the proposed method, the registration accuracy is largely determined by the results of precise matching. Parameters such as the size of the template and searching window and the threshold of the correlation coefficient are determinants of the matching results. In our experiments, the searching window size (51 × 51 pixels) is determined according to the coarse matching results, and the template window size (13 × 13 pixels) and the correlation coefficient threshold (0.85) are set from experience. Whether other parameter values could yield better results remains uncertain. Moreover, the maximum resolution discrepancy in our test datasets is a factor of about four. We are also uncertain whether the proposed approach can process images with larger differences in resolution.

Significant difference in positioning accuracy exists among images captured by different sensors of ZY-1-02C. A coarse-to-fine matching was applied to register images under this situation. Unlike other related works, instead of performing coarse matching directly on the full-size images or using a multi-level image pyramid to implement the coarse-to-fine matching, coarse matching based on scale invariant feature transform (SIFT) was performed for the subsampled images and a rough global estimation was made with the matching results. On the basis of the global estimation results, precise matching was conducted by means of normalized cross correlation and least squares matching. After eliminating the false matches through a local estimation by triangulated irregular network structure, image resampling was implemented based on local affine transformation to achieve registration.

The proposed method can achieve highly precise registration even when the sizes of the processed images are extremely large (maximum of 27,466 × 29,645 pixels). The experiments show that the three test datasets took 68.2 s, 83.2 s, and 561.8 s to register, with accuracies of 0.37 pixels, 0.43 pixels, and 0.66 pixels, respectively. These accuracies are better than those of all three ENVI methods tested. The accuracy and efficiency can fully meet the application requirements of ZY-1-02C. Our method has been successfully applied at the China Center for Resources Satellite Data and Application, and the satellite data can be processed efficiently in a parallel computing environment.

However, it is important to mention that some further improvements are required. The adopted SIFT-matching approach can be replaced by a more efficient feature-based matching algorithm, such as speeded up robust features or Gradient location-orientation histogram [

The authors would like to thank the reviewers and editors whose constructive comments and suggestions substantially improved the paper. This research was supported by National Natural Science Foundation of China with Project Number 41301519 and National Key Basic Research and Development Program with Project Number 2012CB719904. We also acknowledge the help of China Centre for Resources Satellite Data and Application for providing experimental datasets.

The authors declare no conflict of interest.

Flowchart of the entire process.

Analysis of false match identification. (

Coordinate difference between observed values of checkpoints. For (

Results of SIFT matching on subsampled images. Blue crosses indicate estimated points and red crosses indicate excluded points. (

Coordinate differences between estimated and observed values. (

Results of precise matching. Only correct matches are indicated on the images. (

Registration and fusion results of Dataset 1. (

Registration and fusion results of Dataset 2. (

Registration and fusion results of Dataset 3. (

Registration and fusion results of Dataset 1 using ENVI methods. (

Registration and fusion results of Dataset 2 using ENVI methods. (

Registration and fusion results of Dataset 3 using ENVI methods. (

Comparison of registration accuracy.

Overview of test datasets.

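Dataset | Location | | Longitude/Latitude (°) | Date | Sensors | Resolution (m) |
---|---|---|---|---|---|---|
1 | Mohe | 000589 | 121.8/53.3 | 1 February 2012 | PAN/MUX | 5.0/10.0 |
2 | Hangzhou | 000847 | 120.1/30.3 | 12 February 2012 | PAN/MUX | 5.0/10.0 |
3 | Tokyo | 000918 | 139.9/35.7 | 24 February 2012 | HR/MUX | 2.36/10.0 |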

Images after geometric correction.

Dataset | GSD (m) | Sensor | Size (pixels) | Sensor | Size (pixels) |
---|---|---|---|---|---|
1 | 5.0 | PAN | 15,119 × 14,318 | MUX | 15,136 × 14,505 |
2 | 5.0 | PAN | 14,534 × 13,882 | MUX | 14,528 × 13,926 |
3 | 2.5 | HR | 27,466 × 29,645 | MUX | 28,811 × 27,318 |

Statistics of coordinate differences between corresponding checkpoints.

| ||||
---|---|---|---|---|
Dataset 1 | Δ_{geo} | 195.34 | 255.67 | 226.36 |
| Δ_{geo} | 343.17 | 502.30 | 427.01 |
Dataset 2 | Δ_{geo} | 303.42 | 307.53 | 304.89 |
| Δ_{geo} | 882.04 | 927.03 | 906.54 |
Dataset 3 | Δ_{geo} | 235.05 | 255.15 | 242.09 |
| Δ_{geo} | 115.35 | 167.09 | 137.10 |

Statistics of coordinate differences between estimated and observed values.

| ||||
---|---|---|---|---|
Dataset 1 | Δ_{affine} | 5.10 | 21.24 | 14.62 |
| Δ_{affine} | 0.76 | 6.34 | 3.68 |
Dataset 2 | Δ_{affine} | 0.08 | 4.23 | 2.27 |
| Δ_{affine} | 0.03 | 6.74 | 1.73 |
Dataset 3 | Δ_{affine} | 0.64 | 21.03 | 9.91 |
| Δ_{affine} | 0.35 | 13.56 | 4.54 |

Results of registration accuracy.

Software | Method | Dataset 1 (pixels) | Dataset 2 (pixels) | Dataset 3 (pixels) |
---|---|---|---|---|
ENVI | Affine | 12.59 | 2.07 | 5.95 |
| Quadratic Polynomial | 9.51 | 1.75 | 5.26 |
| Triangulation | 2.61 | 1.12 | 1.73 |
Our method | | 0.37 | 0.43 | 0.66 |