#### *4.1. Experimental Description*

The aim of the simulated experiments was to analyze the ability of SRIFT to resist different geometric distortions of the image, including scale, rotation, and combined distortions.

#### 4.1.1. Simulated Dataset Construction and Evaluation

The simulated dataset experiments involved two sets of images, in which the ground truth had been geometrically corrected and manually checked, so that the positioning accuracy was better than one pixel, as shown in Figures 4 and 5. Figure 4a,b show the first case of a SAR and optical image pair from Shanghai, China, which were acquired by the GF-3 and GF-2 remote sensing satellites, respectively. Figure 5a,b show the second case of a SAR and optical image pair from Leshan, Sichuan province, China, which were acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, respectively.

**Figure 5.** The original simulated data and the ground truth for the Sentinel-1 and Sentinel-2 images: (a) Sentinel-1 SAR image; (b) Sentinel-2 optical image; (c) ground truth.

The experimental parameters were set as follows. The simulated datasets were generated by applying scale and rotation transforms to the ground-truth images: the images were resampled by factors of 0.5, 0.3, and 0.25 for the scale transforms, and rotated by 10°, 30°, and 90° for the rotation transforms. The geometric distortion parameters were therefore known and could be used to calculate the accuracy of the registration results.

If the sensed image $I$ is regarded as the initial condition and the transform matrix of the simulated dataset is denoted as $T$, then image $I$ is transformed into an image $I^T$. When the image $I$ has $m$ keypoints $p_1, p_2, \ldots, p_m$, the corresponding keypoints in the image $I^T$ are $p_1^T, p_2^T, \ldots, p_m^T$, which are regarded as the ground truth.
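The mapping from keypoints in $I$ to their ground-truth positions in $I^T$ can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the simulated distortion is a similarity transform (uniform scale plus rotation) about a chosen center, expressed as a 3×3 homogeneous matrix. The function names `similarity_matrix` and `transform_points` are hypothetical.

```python
import numpy as np


def similarity_matrix(scale, angle_deg, center=(0.0, 0.0)):
    """Build a 3x3 homogeneous matrix that scales and rotates about `center`.

    Assumption: scale and rotation share the same fixed point `center`.
    """
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    cx, cy = center
    # Move the center to the origin, apply the scaled rotation, move back.
    to_origin = np.array([[1.0, 0.0, -cx], [0.0, 1.0, -cy], [0.0, 0.0, 1.0]])
    rot_scale = np.array(
        [[scale * c, -scale * s, 0.0], [scale * s, scale * c, 0.0], [0.0, 0.0, 1.0]]
    )
    back = np.array([[1.0, 0.0, cx], [0.0, 1.0, cy], [0.0, 0.0, 1.0]])
    return back @ rot_scale @ to_origin


def transform_points(T, pts):
    """Apply the homogeneous transform T to an (m, 2) array of keypoints."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (m, 3)
    mapped = homog @ T.T  # row i is (T @ p_i)^T
    return mapped[:, :2] / mapped[:, 2:3]


# Example: ground-truth keypoints for a 0.5x scale combined with a 30 degree rotation.
keypoints = np.array([[10.0, 20.0], [200.0, 150.0], [64.0, 64.0]])
T = similarity_matrix(0.5, 30.0, center=(128.0, 128.0))
ground_truth = transform_points(T, keypoints)
```

Because the transform is known exactly, `ground_truth` needs no manual annotation; in image coordinates with the y-axis pointing down, a positive angle appears as a clockwise rotation.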

The transform matrix calculated after registration is denoted as $\overleftarrow{T}$, and the root-mean-square error (RMSE) is used to evaluate the registration accuracy on the simulated images. The higher the RMSE, the worse the accuracy. The RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( p_i^{\overleftarrow{T}} - p_i^{T} \right)^2} \tag{25}$$

where $\left(p_i^{\overleftarrow{T}} - p_i^{T}\right)$ is the residual error, calculated between the estimated transformation parameters $\overleftarrow{T}$ and the ground-truth transformation parameters $T$. $\overleftarrow{T}$ is solved from the corresponding points after registration, while $T$ is given by the ground truth. The RMSE is measured in pixels. Keypoints whose residual error is less than two pixels are regarded as correctly matched. The number of correctly matched keypoints (*NCM*) is an important evaluation metric for image matching.
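As an illustration of Equation (25), the sketch below computes the RMSE and the NCM from an estimated and a ground-truth transform. It assumes that both transforms are 3×3 homogeneous matrices and that the residual of each keypoint is the Euclidean distance between its two mapped positions; the names `apply_transform` and `rmse_and_ncm` are hypothetical, and the two-pixel threshold follows the text.

```python
import numpy as np


def apply_transform(T, pts):
    """Map an (m, 2) array of points through a 3x3 homogeneous matrix."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ np.asarray(T, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]


def rmse_and_ncm(T_est, T_true, keypoints, threshold=2.0):
    """Return (RMSE in pixels, number of keypoints with residual < threshold)."""
    p_est = apply_transform(T_est, keypoints)    # positions under the estimate
    p_true = apply_transform(T_true, keypoints)  # ground-truth positions
    # Assumption: the residual is the Euclidean distance per keypoint.
    residuals = np.linalg.norm(p_est - p_true, axis=1)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    ncm = int(np.sum(residuals < threshold))
    return rmse, ncm
```

For example, if the estimated transform differs from the ground truth by a pure one-pixel translation, every residual equals one pixel, so the RMSE is one pixel and every keypoint counts toward the NCM.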

#### 4.1.2. The Overall Performance Comparison

Qualitative evaluation: the overall performances for the corresponding points obtained by the proposed SRIFT method on the simulated datasets are shown in Figures 6 and 7, in which the best results of the state-of-the-art comparison methods for each distortion case are selected for display. As can be seen, the corresponding points obtained by SRIFT are abundant, evenly distributed, and accurately located, and can thus be used to calculate the image transformation model, and then register the image with the calculated model to obtain the registration result.

From the experimental results, it can be seen that, during keypoint extraction, the SRIFT algorithm extracts a sufficient number of uniformly distributed points. After feature matching and error elimination, however, regions with rich shape and texture and high structural uniqueness retain more keypoints, whereas flat regions with little texture, or regions with low structural uniqueness and many repeated textures, retain fewer corresponding keypoints. Unlike registration methods based on intensity or gradient, the SRIFT descriptor is essentially structural. The description vector of each keypoint consists of the geometric statistics of an image patch of a specific size centered on that keypoint (the region sizes are described in Section 3.3.3). In other words, a SRIFT vector describes the structural information of the image block centered on the keypoint. When the structural information of two image blocks is highly similar, their center points are taken as matched keypoints.

Quantitative evaluation: Tables 1 and 2 list the RMSEs of the simulated data image registration results of the SRIFT method, as well as those of the other state-of-the-art image registration methods.

As can be seen from Tables 1 and 2, the registration accuracy of the SRIFT algorithm is consistently the highest. Compared with SIFT and ASIFT, the SRIFT algorithm has a better feature extraction and description ability for multimodal images. SAR-SIFT, PSO-SIFT, DLSS, HOPC, PCSD, and RIFT are specially designed for multimodal image registration. Compared with the template matching algorithms such as DLSS and HOPC, SRIFT can resist the various scale and rotation distortions. Compared with feature matching algorithms such as SAR-SIFT, PSO-SIFT, PCSD, and RIFT, SRIFT can overcome more complex image geometric distortions. For a more detailed analysis of the results of the state-of-the-art algorithms, see Section 4.2.3.

**Figure 6.** The simulated experiment results for the GF-3 and GF-2 images: (a) 0.5 times scale by the phase congruency-based descriptor (PCSD); (b) 0.5 times scale by SRIFT; (c) 0.3 times scale by PCSD; (d) 0.3 times scale by SRIFT; (e) 0.25 times scale by PCSD; (f) 0.25 times scale by SRIFT; (g) 10° rotation by the radiation-invariant feature transform (RIFT); (h) 10° rotation by SRIFT; (i) 30° rotation by RIFT; (j) 30° rotation by SRIFT; (k) 90° rotation by RIFT; (l) 90° rotation by SRIFT.

**Figure 7.** The simulated experiment results for the Sentinel-1 and Sentinel-2 images, including (k) 90° rotation by RIFT and (l) 90° rotation by SRIFT.


**Table 1.** Root-mean-square error (RMSE) comparison for simulated dataset 1 (GF-3 SAR image and GF-2 optical image).

**Table 2.** RMSE comparison for simulated dataset 2 (Sentinel-1 SAR image and Sentinel-2 optical image).


#### 4.1.3. The Ability of the Algorithm to Resist Scale and Rotation Distortion

To test the robustness of SRIFT to image scale and rotation distortion, simulated images were generated by resizing the image with scale factors from 1 to 10 at varying intervals, and by rotating the image through angles from 0° to 360° at intervals of 30°. The matching performance of the proposed method under scale and rotation distortion is shown in Figure 8.
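The structure of such a robustness sweep can be sketched as below. This is only an outline under stated assumptions: the scale factors are taken as the integers 1 to 10 because the exact intervals are not given in the text, and `count_correct_matches` is a hypothetical placeholder for whatever routine distorts the image, runs the matcher, and returns the NCM.

```python
def distortion_cases(scale_factors, angle_step_deg=30.0):
    """List every (kind, value) distortion case to evaluate."""
    cases = [("scale", float(s)) for s in scale_factors]
    n_steps = int(round(360.0 / angle_step_deg))
    # Rotation angles from 0 to 360 degrees inclusive, in equal steps.
    cases += [("rotation", k * angle_step_deg) for k in range(n_steps + 1)]
    return cases


def run_sweep(count_correct_matches, cases):
    """Evaluate a user-supplied matcher on each case; returns {case: NCM}."""
    return {case: count_correct_matches(*case) for case in cases}


# Assumed scale factors: integers 1..10 (the source does not state the intervals).
cases = distortion_cases(range(1, 11))
```

Passing the matcher in as a callable keeps the sweep independent of any particular registration method, so the same cases can be reused when comparing methods.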

**Figure 8.** SRIFT matching performance with scale and rotation distortion.

As shown in Figure 8a, when the scale difference between the images is between one and four times, the precision of the proposed method does not decrease significantly as the scale factor increases, so the proposed method can be considered robust to scale differences in this range. However, when the scale difference increases to more than five times, the number of correct matches drops sharply. As shown in Figure 8b, the proposed SRIFT algorithm extracts relatively abundant corresponding points for rotation distortion at any angle.

#### *4.2. Experiments with Real Images*

In the real-data experiments, the registration results obtained by the proposed method were compared to those obtained by eight state-of-the-art methods: SIFT, ASIFT, SAR-SIFT, PSO-SIFT, DLSS, HOPC, PCSD, and RIFT.
