1. Introduction
With the continuous advancement of remote sensing technology, both domestically and internationally, the variety of remote sensing images has become increasingly diverse. Optical images contain rich spectral information and distinctive structural features and are easy to interpret; however, they are restricted by adverse weather conditions such as fog, drizzle, and snow [1]. Synthetic aperture radar (SAR), by contrast, captures the structural details of reflective targets thanks to its strong penetration ability, but the application of SAR images is hindered mainly by speckle noise [2,3]. To overcome the respective limitations of optical and SAR images, their fusion has become an important research direction in the field of remote sensing [4]. The fusion of optical and SAR images aims to combine the advantages of both to achieve all-weather, high-precision remote sensing observations. Image fusion includes image preprocessing, image registration, fusion strategy selection, fusion result optimization, and evaluation. Among these steps, image registration plays a crucial role in the preprocessing stage of image fusion [5,6], as it ensures that images acquired by different sensors at different times are precisely aligned in spatial position before fusion [7,8]. Recently, remote sensing image registration has been extensively researched; existing methods are broadly categorized into area-based methods and feature-based methods [9].
Area-based methods create a template in the sensed image and then apply similarity measures to find the best correspondences in the reference image. Common similarity measures include mutual information [10] and normalized cross-correlation [11]. Note that area-based methods suffer from local extrema in the computation of local similarity and require a large amount of computation.
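The area-based search can be sketched in a few lines. The following minimal Python example (NumPy only; the function names are ours, not taken from the cited works) slides a template over a reference image and scores each window with normalized cross-correlation:

```python
import numpy as np

def ncc(template, window):
    """Normalized cross-correlation between two equally sized patches."""
    t = template - template.mean()
    w = window - window.mean()
    denom = np.sqrt((t ** 2).sum() * (w ** 2).sum())
    return 0.0 if denom == 0 else float((t * w).sum() / denom)

def match_template(reference, template):
    """Exhaustively slide the template over the reference image and
    return the top-left corner of the best-scoring window."""
    th, tw = template.shape
    rh, rw = reference.shape
    best, best_pos = -2.0, (0, 0)
    for y in range(rh - th + 1):
        for x in range(rw - tw + 1):
            s = ncc(template, reference[y:y + th, x:x + tw])
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos, best
```

The exhaustive double loop illustrates why area-based methods are computationally heavy: every candidate window is scored against the template.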
Compared with area-based methods, feature-based methods perform registration by extracting image features, thus avoiding a large number of pixel-level computations [12]. The feature-based approach consists of four main steps: feature detection, feature description, feature matching, and transform model estimation [13]. The scale-invariant feature transform (SIFT) algorithm is a classic feature-based approach that is widely applied for its ability to extract feature descriptors with scale, illumination, and rotation invariance [14]. However, when applied to optical and SAR image registration, SIFT still faces two main problems: the high noise level of SAR images undermines the stability of the algorithm, and the significant radiometric differences between optical and SAR images degrade its matching. To address these issues, researchers have proposed various SIFT extensions, including SAR-SIFT [15], PSO-SIFT [16], and OS-SIFT [17]. Furthermore, structural feature-based algorithms are gradually being applied to optical and SAR image registration owing to their low sensitivity to nonlinear radiometric variations [18]. Fan et al. [19] proposed a registration technique based on the phase congruency structural descriptor (PCSD) and nonlinear diffusion, which significantly improves the registration accuracy between SAR and optical images by reducing the interference of speckle noise and enhancing the robustness of the structure descriptor. Zhu et al. [20] proposed a Gabor structure-based registration method in which the geometric differences between images are first eliminated through coarse registration and a more accurate correspondence is then obtained via fine registration. Li et al. [21] proposed the radiation-variation insensitive feature transform (RIFT), which overcomes nonlinear radiation distortions by exploiting the maximum index map and phase congruency. Xiong et al. [22] defined a novel adjacent self-similarity (ASS) feature for efficiently capturing image structures and presented a robust optical and SAR registration algorithm based on it. Although the above methods mitigate, to some extent, the geometric and nonlinear radiometric differences between optical and SAR images as well as the speckle noise in SAR images, they still struggle to deliver both high registration accuracy and a sufficient number of correct correspondences.
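As a concrete illustration of the descriptor-matching step shared by the SIFT family, the sketch below implements the classic nearest-neighbor distance ratio (NNDR) screening: a putative match is kept only when its nearest neighbor is clearly closer than the second nearest. This is a generic textbook version, not the exact implementation of any cited method:

```python
import numpy as np

def nndr_match(desc_a, desc_b, ratio=0.8):
    """Match descriptors in desc_a against desc_b; keep a match only when
    the nearest neighbor is sufficiently closer than the second nearest."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if nearest < ratio * second:          # Lowe-style ratio test
            matches.append((i, int(order[0])))
    return matches
```

Ambiguous descriptors, whose first and second neighbors are similarly distant, are discarded; this is exactly the situation that radiometric differences between optical and SAR images tend to create.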
Besides classical registration methods, deep learning methods have shown significant advantages in image registration, mainly in the extraction of high-level semantic features [23]. These features accurately capture the structural information of images and are particularly suitable for SAR and optical image registration. Another major advantage of deep learning methods is their powerful learning and generalization capability: through training on large-scale datasets, a model learns the intrinsic patterns of image matching and applies them to new image data, making deep learning methods flexible and adaptable in complex and variable registration tasks. Two-channel networks, Siamese networks, and pseudo-Siamese networks are the common architectures [24]. Wu et al. [25] proposed a deep learning network based on U-net and Siamese networks for SAR and optical image registration. Pseudo-Siamese networks exhibit greater flexibility in feature extraction: Zhou et al. [26] generated multiscale convolutional gradient features (MCGFs) by extracting multidirectional gradient features with a shallow pseudo-Siamese network. Xiang et al. [27] combined a pseudo-Siamese network with a residual denoising network, outperforming traditional registration algorithms in both computational cost and matching accuracy. Li et al. [28] combined a dual-channel network with a multi-level key-point detector to improve registration performance by suppressing the geometric and nonlinear radiometric variation between optical and SAR images. Despite this progress, deep learning methods still face several challenges in optical and SAR image registration. First, the pronounced geometric and radiometric differences between optical and SAR images render the traditional Siamese network ineffective for multimodal matching; the pseudo-Siamese network is more flexible, but its two separately parameterized branches increase the risk of overfitting. Second, neural networks are highly data-dependent, and the quality and quantity of training data directly affect network performance. Therefore, traditional image registration methods remain the mainstream direction at this stage.
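The architectural distinction drawn above can be made concrete with a toy one-layer sketch (NumPy; purely illustrative, not any cited network): a Siamese design pushes both modalities through the same weights, while a pseudo-Siamese design gives each modality its own weights, doubling the parameter count and hence the overfitting risk mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, W):
    """A toy one-layer 'network' branch: linear map followed by ReLU."""
    return np.maximum(W @ x, 0.0)

# Siamese: both modalities pass through the SAME weight matrix.
W_shared = rng.standard_normal((8, 16))

# Pseudo-Siamese: each modality gets its OWN weights, which adds
# flexibility for optical-vs-SAR differences but doubles the parameters.
W_opt = rng.standard_normal((8, 16))
W_sar = rng.standard_normal((8, 16))

opt_patch = rng.standard_normal(16)
sar_patch = rng.standard_normal(16)

siamese_dist = np.linalg.norm(branch(opt_patch, W_shared) - branch(sar_patch, W_shared))
pseudo_dist = np.linalg.norm(branch(opt_patch, W_opt) - branch(sar_patch, W_sar))
n_params_siamese = W_shared.size
n_params_pseudo = W_opt.size + W_sar.size
```

The feature distance of matched patches would be driven toward zero during training; here the untrained branches merely demonstrate the weight-sharing difference.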
To increase the number of correct correspondences in image registration, this paper proposes an optical and SAR image registration algorithm, OS-PSO, built on the SIFT framework. In addition, we construct a novel dataset consisting of Sentinel-1 SAR images and Sentinel-2 optical images and combine it with the public WHU-OPT-SAR dataset to evaluate the performance of the proposed algorithm on images from different satellites.
The scientific contributions of this paper are as follows:
We propose a novel optical and SAR image registration approach based on OS-SIFT. A modified ratio of exponentially weighted averages (MROEWA) operator is designed for the gradient computation of SAR images. It eliminates the sharp increase in gradient magnitude at edge points caused by sudden dark patches, so that the gradient distribution of SAR images becomes more consistent with that of optical images computed using the multiscale Sobel operator.
An enhanced matching method is introduced that defines a novel matching distance based on the scale, position, and main orientation of the key-points for more accurate correspondences.
We construct an optical and SAR image dataset named BISTU-OPT-SAR and, together with the public WHU-OPT-SAR dataset [29], evaluate the performance of the OS-PSO algorithm on Sentinel and Gaofen images. Furthermore, the performance of the OS-PSO algorithm in different scenarios, such as urban, suburban, river, farmland, and lake, is thoroughly discussed and analyzed.
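To illustrate the idea behind the enhanced matching distance in the second contribution, the sketch below combines a descriptor distance with penalties on how far a candidate pair deviates from a globally dominant scale ratio, orientation difference, and position offset. The penalty form and the weights are our own illustrative assumptions, not the paper's exact definition:

```python
import numpy as np

def enhanced_distance(desc_d, scale_a, scale_b, ori_a, ori_b,
                      pos_a, pos_b, scale_ratio, d_ori, w=1.0):
    """Augment a descriptor (Euclidean) distance desc_d with penalties on
    deviation from the dominant scale ratio, orientation difference d_ori
    (degrees), and translation (assumed after coarse alignment)."""
    e_scale = abs(scale_b / scale_a - scale_ratio)
    # Wrap the orientation difference into [-180, 180) before penalizing.
    e_ori = abs(((ori_b - ori_a) - d_ori + 180.0) % 360.0 - 180.0) / 180.0
    e_pos = np.linalg.norm(np.asarray(pos_b, float) - np.asarray(pos_a, float))
    return desc_d * (1.0 + w * (e_scale + e_ori)) + 0.01 * e_pos
```

Pairs consistent with the dominant geometric relation keep (approximately) their plain descriptor distance, while inconsistent pairs are pushed away, which is the mechanism that lets a second matching round recover additional correct correspondences.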
4. Experimental Results and Discussion
This section uses two datasets, BISTU-OPT-SAR and WHU-OPT-SAR [29], to comprehensively evaluate the effectiveness of the proposed OS-PSO algorithm. First, we introduce the characteristics of the two experimental datasets, the parameter settings, and the evaluation criteria. Then, we evaluate the performance of the OS-PSO algorithm in terms of key-point detection and enhanced matching, aiming to find the parameter configuration with the optimal registration results. Next, the robustness of the OS-PSO algorithm to geometric differences is discussed. Finally, the adaptability of the OS-PSO algorithm and existing algorithms in various complex scenarios is compared and analyzed. All experiments are performed on a computer with an AMD R7-6800H processor and 16.0 GB of RAM using MATLAB R2018a.
4.1. Description of the Datasets and Parameter Settings
To fully evaluate the proposed OS-PSO algorithm, two datasets are selected for testing: the self-built BISTU-OPT-SAR dataset and the public WHU-OPT-SAR dataset [29]. The details of the datasets are shown in Table 1.
The BISTU-OPT-SAR dataset covers parts of Tianjin and Anhui Province in China; its data include optical images from the Sentinel-2 satellite and SAR images from the Sentinel-1 satellite. Before testing, the optical and SAR images of the BISTU-OPT-SAR dataset undergo the necessary preprocessing to ensure the accuracy and validity of the subsequent registration. For the Sentinel-2 optical images, we use the Sen2Cor tool to conduct radiometric calibration and atmospheric correction, removing aberrations caused by illumination and the atmosphere. For the Sentinel-1 SAR images, we apply a more complex preprocessing procedure, mainly including orbit correction [35], thermal noise removal [36], speckle filtering [37], radiometric calibration [38], and geocoding [39]; all steps are performed in the SNAP software. Orbit correction removes the effect of orbital errors on the SAR image by accurately measuring and correcting the orbital parameters. Thermal noise removal calibrates the backscatter interference caused by thermal noise in the input SAR data. Speckle filtering reduces the speckle noise caused by the coherence of the SAR system. Radiometric calibration converts the digital values of raw remote sensing images into physical quantities, ensuring that data from different sensors share a consistent scale for comparison and analysis. Geocoding converts the pixel information of the SAR images into an actual geographic coordinate system to achieve an accurate match between the SAR images and real geographic space. After preprocessing, both the optical and SAR images are in the WGS84 geographic coordinate system. The image size of the Tianjin scene in the BISTU-OPT-SAR dataset is 1024 × 1024 pixels, and that of the Anhui Province scene is 850 × 850 pixels. To ensure consistency between the optical and SAR images, the optical images are resampled to a resolution of 10 m to achieve pixel-by-pixel correspondence with the SAR images.
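The Sentinel-1 preprocessing chain above can be scripted with SNAP's command-line tool gpt, one operator per step. The sketch below only assembles the command lines; the file names are placeholders, operator parameters are left at SNAP defaults, and the operator names and flags should be verified against the installed SNAP version:

```python
# SNAP operators in the order used in the text: orbit correction,
# thermal noise removal, speckle filtering, radiometric calibration,
# and geocoding (terrain correction).
steps = [
    "Apply-Orbit-File",     # orbit correction
    "ThermalNoiseRemoval",  # thermal noise removal
    "Speckle-Filter",       # speckle filtering
    "Calibration",          # radiometric calibration
    "Terrain-Correction",   # geocoding to WGS84
]

def build_gpt_commands(src, steps):
    """Chain the operators by feeding each step's output to the next."""
    cmds, current = [], src
    for i, op in enumerate(steps, start=1):
        out = f"step{i}_{op}.dim"
        cmds.append(["gpt", op, f"-Ssource={current}", "-t", out])
        current = out
    return cmds, current

commands, final_product = build_gpt_commands("S1A_IW_GRDH.zip", steps)
```

Each command could then be run with `subprocess.run`; in practice the same chain is usually expressed as a single SNAP graph XML so that intermediate products stay in memory.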
The WHU-OPT-SAR dataset contains optical images taken by the GF-1 satellite and SAR images captured by the GF-3 satellite, both of which are in the WGS84 geographic coordinate system, with a resolution of 5 m after pre-processing. Considering the efficiency of data processing and analysis, the WHU-OPT-SAR dataset is divided into several smaller image patches, each with a size of 800 × 800 pixels.
All parameters of the proposed OS-PSO algorithm are set following the OS-SIFT and PSO-SIFT algorithms. For the Harris scale spaces, the scale of the first layer is set to 2, with a fixed ratio between adjacent scales; the number of scales is set to 8, and the Harris arbitrary parameter is 0.04. The linear processing parameter in the MROEWA operator is set on the basis of analyses and adjustments after experiments on several images. A matching threshold of 0.95 ensures that a sufficient number of correspondences are preserved, laying the foundation for further refinement. To obtain more precise correspondences, the ratio threshold for re-matching is optimized during the enhanced matching process.
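For reference, the Harris response underlying the scale spaces, with the arbitrary parameter k = 0.04 mentioned above, can be sketched as follows. This is a simplified single-scale version with crude box smoothing, not the paper's multiscale implementation:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    structure tensor built from finite-difference gradients."""
    gy, gx = np.gradient(img.astype(float))

    def smooth(a, r=2):
        # Crude uniform-window smoothing standing in for a Gaussian.
        out = np.zeros_like(a)
        h, w = a.shape
        for y in range(h):
            for x in range(w):
                out[y, x] = a[max(0, y - r):y + r + 1,
                              max(0, x - r):x + r + 1].mean()
        return out

    ixx, iyy, ixy = smooth(gx * gx), smooth(gy * gy), smooth(gx * gy)
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - k * trace ** 2
```

Local maxima of R above a threshold are taken as corner key-points; in the paper this response is computed at every layer of the Harris scale space.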
4.2. Evaluation Criteria
This paper adopts a dual evaluation system combining subjective and objective methods. The subjective method consists of observing matching images, checkerboard mosaic images, and enlarged sub-images to check the positional information of the matching points and the details of image registration. The objective method uses evaluation criteria, such as the correct match number (CMN), correct match rate (CMR), and root mean square error (RMSE). The registration is judged to have failed if the CMN is less than 4.
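The checkerboard mosaic used in the subjective evaluation can be generated as follows (a minimal sketch for single-channel images; the tile size is arbitrary):

```python
import numpy as np

def checkerboard_mosaic(img_a, img_b, tile=4):
    """Interleave two registered images in a checkerboard pattern so that
    misalignments show up as broken edges at tile boundaries."""
    assert img_a.shape == img_b.shape
    h, w = img_a.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    mask = ((yy // tile + xx // tile) % 2).astype(bool)
    out = img_a.copy()
    out[mask] = img_b[mask]
    return out
```

If the registration is accurate, linear structures such as roads and shorelines run continuously across tile boundaries in the mosaic.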
The CMR is defined as follows:

$$\mathrm{CMR} = \frac{N_c}{N_m},$$

where $N_c$ represents the number of correctly matched key-points and $N_m$ represents the number of key-points screened using the NNDR method.

The RMSE reflects the accuracy of the algorithm's registration and is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(x_i - x_i')^2 + (y_i - y_i')^2\right]},$$

where $N$ represents the total number of matching key-points, and $(x_i, y_i)$ and $(x_i', y_i')$ denote the $i$-th matching key-point pair: $(x_i, y_i)$ is the actual output in the reference image, and $(x_i', y_i')$ is the predicted output in the sensed image, respectively.
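The two objective criteria translate directly into code; the sketch below assumes the match counts and the corresponding key-point coordinates are already available:

```python
import numpy as np

def cmr(n_correct, n_nndr):
    """Correct match rate: correct matches over NNDR-screened matches."""
    return n_correct / n_nndr

def rmse(ref_pts, warped_pts):
    """Root mean square error between reference key-points and the
    transformed key-points of the sensed image."""
    ref = np.asarray(ref_pts, dtype=float)
    war = np.asarray(warped_pts, dtype=float)
    return float(np.sqrt(np.mean(np.sum((ref - war) ** 2, axis=1))))
```

Following the failure rule above, a result with a CMN below 4 would be reported as a failed registration regardless of these values.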
4.3. Experiments on Key-Point Detection
In this section, we use the repeatability rate [40,41] as an indicator to evaluate the performance of the proposed detector. Specifically, any two key-points $p_r$ and $p_s$ of a given pair for the reference and sensed images are judged as repeatable key-points if they satisfy the following:

$$\left\| p_r - T(p_s) \right\| \le \varepsilon,$$

where $T$ denotes the transformation mapping the sensed image to the reference image and $\varepsilon$ is the localization error, which is set from 1 to 5 in the proposed OS-PSO algorithm. The repeatability rate is the ratio between the number of repeatable key-points and the total number of key-points; a higher repeatability rate indicates more stable key-point detection.
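A minimal computation of the repeatability rate for already-paired key-points might look as follows (the transform argument stands for the known mapping from the sensed image to the reference frame):

```python
import numpy as np

def repeatability(ref_pts, sensed_pts, transform, eps=2.0):
    """Fraction of key-point pairs whose distance, after mapping the sensed
    points into the reference frame, falls within localization error eps."""
    ref = np.asarray(ref_pts, dtype=float)
    mapped = np.asarray([transform(p) for p in sensed_pts], dtype=float)
    d = np.linalg.norm(ref - mapped, axis=1)
    return float((d <= eps).mean())
```

Sweeping eps from 1 to 5, as in the experiments below, produces the repeatability curves plotted against the localization error.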
The proposed Harris detector based on the MROEWA operator suppresses the potential influence of sudden dark patches on key-point detection. We carry out an in-depth experimental analysis of the linear processing parameter in the MROEWA operator, aiming to find, through continuous adjustment and optimization, the value that makes the proposed detector most stable.
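To make the role of the linear processing parameter concrete, the following 1-D sketch adds a constant offset to both one-sided exponentially weighted means before taking the log-ratio gradient. This is our own simplified illustration of the idea, not the paper's exact MROEWA definition:

```python
import numpy as np

def roewa_gradient_1d(row, half_window=3, alpha=0.5):
    """Ratio-of-exponentially-weighted-averages gradient along one row.

    With alpha = 0 this reduces to a plain ROEWA-style ratio, where a
    sudden dark patch (one-sided mean near zero) makes the ratio, and
    hence the gradient magnitude, blow up; a positive alpha bounds it."""
    w = np.exp(-np.arange(1, half_window + 1) / 2.0)
    w /= w.sum()
    grads = np.zeros_like(row, dtype=float)
    for i in range(half_window, len(row) - half_window):
        left = float(w @ row[i - half_window:i][::-1])
        right = float(w @ row[i + 1:i + 1 + half_window])
        ratio = (right + alpha) / (left + alpha)
        grads[i] = abs(np.log(ratio))  # log-ratio is a standard SAR gradient
    return grads
```

At a boundary between bright terrain and a sudden dark patch, the offset keeps the gradient magnitude finite and comparable to that of an ordinary edge, which is the consistency with optical gradients that the MROEWA operator targets.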
We calculate the average repeatability rate of ten image pairs from the WHU-OPT-SAR dataset and the BISTU-OPT-SAR dataset with different localization errors ($\varepsilon$), and the experimental results are displayed in Figure 4. The proposed method achieves better detection performance than the SAR-Harris detector (the linear processing parameter set to 0), and adjusting this parameter has a stronger effect on the WHU-OPT-SAR dataset than on the BISTU-OPT-SAR dataset. When the parameter is set to 0.5, our proposed method reaches the highest key-point repeatability on both datasets. The average repeatability rate on the WHU-OPT-SAR dataset is close to 50% when the localization error $\varepsilon$ is set to 5.
4.4. Experiments on Enhanced Matching
Our experiments in this part aim to verify the performance of the enhanced matching method and to explore the impact of the ratio threshold for re-matching on the registration results. We first compare the enhanced matching method with the classical RANSAC method, focusing on its advantage in increasing the number of correct correspondences. We select ten image pairs and calculate the average CMN, average CMR, and average RMSE for the different matching methods.
Figure 5 illustrates the performance of the different feature matching methods. The enhanced matching algorithm outperforms the RANSAC method in terms of CMN, CMR, and RMSE. Further analysis reveals that the enhanced matching method performs best when the re-matching ratio threshold is set to 0.89.
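For completeness, the RANSAC baseline used in this comparison can be sketched as follows (a generic affine-model RANSAC, not tuned to any particular implementation):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src -> dst (needs >= 3 points)."""
    A = np.hstack([src, np.ones((len(src), 1))])
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return X  # 3x2 parameter matrix

def apply_affine(X, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ X

def ransac_affine(src, dst, n_iter=200, tol=1.0, seed=0):
    """Repeatedly fit an affine model on 3 random correspondences and keep
    the model with the largest inlier set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), 3, replace=False)
        X = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(apply_affine(X, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

RANSAC discards outliers but cannot recover correspondences that were never proposed, which is where the enhanced matching method, with its second matching round, gains its advantage in CMN.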
4.5. Discussion on the Robustness of Geometric Differences
In this section, given that the WHU-OPT-SAR and BISTU-OPT-SAR datasets contain no geometric differences between the optical and SAR images, we additionally select four image pairs with different geometric differences for the experiment. Among them, P1 exhibits slight scale differences and obvious rotation differences; P2 has obvious scale differences but no rotation differences; P3 has no obvious differences in either scale or rotation; and P4 has obvious translation and rotation differences. Specific information is provided in Table 2. For different image pairs, the re-matching parameters need to be fine-tuned to obtain similar RMSE values. The experimental results are shown in Figure 6.
The experimental results show that the proposed OS-PSO algorithm successfully matches all four pairs of optical and SAR images, demonstrating its strong adaptability to diverse geometric differences. Specifically, most of the correct correspondences for P1 are located in non-vegetated areas. P2 exhibits significant scale differences, resulting in the lowest number of correct correspondences among the four image pairs, concentrated primarily in areas with dense buildings. From the enlarged sub-images of Figure 6b, it can be observed that the errors are significantly smaller around the buildings than in the road areas. This is attributed to the shading effect in the building areas, which reduces the gray-scale difference between the optical and SAR images there. P3 has a sufficient number of evenly distributed correct correspondences because of its slight geometric differences and regular structure. Despite the obvious geometric differences in P4, the OS-PSO algorithm still achieves excellent registration results because of the minimal gray-scale differences; the algorithm is especially accurate in areas with salient features, such as airports. The precise alignment can be observed in the enlarged sub-images of Figure 6d. Overall, the OS-PSO algorithm exhibits superior robustness in handling image registration with different types of geometric differences.
4.6. Comparative Experiments in Different Scenarios
In this section, we select six image pairs from the BISTU-OPT-SAR and WHU-OPT-SAR datasets to evaluate the performance of the proposed OS-PSO algorithm. These image pairs cover different scenarios, including urban, suburban, river, farmland, and lake, as shown in Figure 7. Pairs 1, 2, 3, and 4 are all taken from the BISTU-OPT-SAR dataset. Pairs 1 and 2 show the urban and suburban areas of Tianjin, respectively. Pair 1 contains buildings of different structures with rich structural features and is more seriously affected by noise and sudden dark patches. Pair 2 consists mainly of vegetation interspersed with a river and is slightly disturbed by noise. Pairs 3 and 4 are from Anhui Province and mainly show river scenes; both are affected by noise and sudden dark patches to varying degrees. Pair 4 has a significant gray-scale difference between the optical and SAR images, with the water appearing lighter in the optical image. Pairs 5 and 6 depict the farmland and lake areas of Hubei Province from the WHU-OPT-SAR dataset. Pair 5 shows a crisscrossed terraced landscape; the area is flat and has a regular land-use pattern. Pair 6 presents a large lake with scattered villages around it and a relatively homogeneous structure.
Since the proposed OS-PSO algorithm is a feature-based technique, we choose five sophisticated feature-based algorithms for comparison. SAR-SIFT is an enhanced SIFT algorithm specifically tailored for SAR images; by introducing a new gradient definition, it generates directions and magnitudes that are robust to speckle noise, making it more suitable for feature extraction in SAR images. PSO-SIFT optimizes the computation of image gradients to enhance robustness to intensity differences and introduces an enhanced matching method to increase the number of correct correspondences. OS-SIFT combines the multiscale Sobel operator and the multiscale ROEWA operator to compute the optical and SAR image gradients. RIFT is a novel optical and SAR image registration algorithm that employs phase congruency for key-point detection and maximum index maps for descriptor construction, with rotation and radiation invariance. OSS is an image registration algorithm based on self-similarity features that is robust to the radiometric differences between optical and SAR images. In addition, we design two variants, MROEWA+FSC and OS-SIFT+PSO, to further analyze the specific impact of the proposed MROEWA operator and enhanced matching method.
Table 3 presents the registration results of the proposed OS-PSO algorithm and the seven comparison methods in different scenarios. The results demonstrate that the OS-PSO algorithm not only successfully achieves image registration in all types of complex scenarios but also outperforms the other algorithms in both the number of correct correspondences and the registration accuracy. The RMSE values of the OS-PSO algorithm on the six pairs of test images are maintained at about 0.7. This performance is attributed to the MROEWA operator and the enhanced matching method. The MROEWA operator suppresses the effect of sudden dark patches in the SAR image, thus generating more consistent gradients between the optical and SAR images. The enhanced matching method obtains more correct correspondences through two rounds of matching, as can be seen from the comparison between MROEWA+FSC and the OS-PSO algorithm. In contrast, the PSO-SIFT, SAR-SIFT, and OS-SIFT algorithms fail to register in certain scenarios owing to their sensitivity to speckle noise and radiometric differences. The OSS and RIFT algorithms, although successful in registration, are clearly inferior to the proposed OS-PSO algorithm, as evidenced by their lower CMN values and higher RMSE values. These two methods also suffer from an uneven distribution of correct correspondences, increasing the risk of local mismatch, a problem that is also present in the PSO-SIFT, SAR-SIFT, and OS-SIFT algorithms.
Figure 8 and Figure 9 display the matching images and checkerboard mosaic images of the proposed OS-PSO algorithm on the six pairs of test images. For Pair 1, the PSO-SIFT algorithm fails to achieve registration because of the interference of severe noise and sudden dark patches, while the other algorithms succeed with excellent noise immunity. Notably, most of the correct correspondences are concentrated in areas with dense buildings and around rivers, and Figure 9a shows that these areas are aligned more accurately. Among all methods, the OS-PSO algorithm delivers the best RMSE and CMN, which is attributed to the high repeatability of its key-points. For Pair 2, neither the PSO-SIFT nor the SAR-SIFT algorithm can detect correct correspondences because of the unclear structure of the large vegetated areas in the SAR image, and both thus fail to register the two images. Although the other algorithms succeed, there are only a few correct correspondences in the vegetated areas, and a slight deviation of the vegetation area can be seen in Figure 9b. Meanwhile, the CMN values of all methods except OS-SIFT+PSO are lower than those of Pair 1.
Pairs 3 and 4 are affected by noise and sudden dark patches to some extent, and the edge structures of the land areas are blurred. In this case, the OS-SIFT algorithm fails to register the optical and SAR images because of false correspondences. In Pair 4, there are also obvious gray-scale differences between the optical and SAR images, especially in the water area, which poses a great challenge to image registration. Nevertheless, the OS-PSO algorithm successfully registers the optical and SAR images owing to the stability of its key-point detector, although the RMSE value is the largest and the registration accuracy the lowest among all the image pairs. It is worth noting that most of the correct correspondences are located at the land-water boundary, and Figure 9d shows that more accurate alignment is obtained there. This is because the land-water boundary, as the intersection of two different textures or gray-scale areas, contains rich edge information that facilitates the detection of key-points.
For Pair 5, the regularly patterned farmland shows obvious edge and texture features in the remote sensing images, and there is no explicit gray-scale difference between the optical and SAR images, providing favorable conditions for the accurate identification and matching of corresponding key-points. Therefore, all algorithms successfully achieve registration. The OS-PSO algorithm identifies and matches 140 correct correspondences, significantly outperforming its results in all the other scenarios. Moreover, the correct correspondences of Pair 5 have the most uniform distribution among all image pairs, as shown in Figure 8h, with essentially no local mismatch. For Pair 6, the lake area presents smooth, uniform color and texture, reducing the distinctiveness of the key-points and making it difficult to capture sufficient stable key-points during registration. As a result, the OS-PSO algorithm has its lowest CMN value, 73, on this pair.
5. Conclusions
Aiming at the problem of insufficient correct correspondences caused by the different imaging principles of optical and SAR images, this paper proposes a remote sensing image registration algorithm, OS-PSO, on the basis of OS-SIFT. First, we design a modified gradient operator based on the ROEWA operator, which applies uniform linear processing to the local filtering results of SAR images; this effectively avoids the abnormal increase in gradient magnitude at the edge points of SAR images caused by sudden dark patches and thus maintains gradient consistency with optical images. Second, we establish Harris scale spaces for the optical and SAR images, respectively, obtain stable key-points by detecting local maxima, and construct descriptors using the GLOH method. Then, we introduce an enhanced matching method that incorporates the scale, position, and main orientation information of the key-points for re-matching, increasing the number of correspondences. Compared with state-of-the-art algorithms, the proposed OS-PSO algorithm significantly improves the number of correct correspondences and the registration accuracy and is applicable to various complex scenarios.