*2.3. Feature Description*

After a set of keypoints is detected by the UMPC-Harris method, a feature descriptor needs to be designed for each keypoint to achieve image registration. The orientation and the maximum amplitude index of PC have been proven suitable for describing the similar local features of multi-sensor images [34,37]. In this subsection, we first introduce the construction of the multi-scale maximum index maps and the orientation map of phase congruency. Then, the HOSMI descriptor is established to increase the distinctiveness of the features.

#### 2.3.1. Multi-Scale Max Index Maps

Compared with the gradient amplitude map, the maximum index map (MIM) is more suitable for multimodal image registration and is more robust to nonlinear radiation distortions. Furthermore, in remote sensing images, many salient features appear at different scales [38]. Inspired by this, multi-scale MIMs are formed by calculating the index of the maximum amplitude at each scale to improve the significance of the descriptors. Figure 3 shows the construction process of the four-scale MIMs.

First, the input image *I*(*x*, *y*) is convolved with the LGF to obtain *s* × *o* PC amplitude maps in four scales and six orientations. Second, among the six amplitude maps of the same scale, for a pixel *p* with coordinates (*i*, *j*) we find the maximum amplitude max*<sub>o</sub>* *A<sub>s,o</sub>*(*i*, *j*), where the orientation index *o* ranges from 1 to 6, and record the orientation at which the maximum occurs. Third, the value of pixel *p* in the MIM is set to this orientation index; thus, the MIM is an image whose elements all lie between 1 and *o*. Finally, multi-scale MIMs are constructed by computing the index of the maximum amplitude at each of the four scales. In this way, both the fine-scale and the coarse-scale information of the input image is captured, which effectively enhances the saliency of the features in the image.
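The per-scale argmax step above can be sketched as follows. This is a minimal illustration, assuming the PC amplitude maps have already been computed and stacked into an array of shape (*s*, *o*, *H*, *W*); the log-Gabor filtering itself is not reproduced, and the function name is ours.

```python
import numpy as np

def build_multiscale_mims(amplitudes):
    """Build one maximum index map (MIM) per scale.

    amplitudes: array of shape (s, o, H, W) holding the PC amplitude
    maps for s scales and o orientations (here s = 4, o = 6).
    """
    # For each scale, the MIM value at a pixel is the index (1..o) of
    # the orientation with the largest amplitude at that pixel.
    return np.argmax(amplitudes, axis=1) + 1  # shape (s, H, W), values in 1..o

# Toy example: 4 scales, 6 orientations, 8x8 amplitude maps.
amps = np.random.rand(4, 6, 8, 8)
mims = build_multiscale_mims(amps)
assert mims.shape == (4, 8, 8)
assert mims.min() >= 1 and mims.max() <= 6
```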

**Figure 3.** Construction process of four-scale maximum index maps (MIMs).

#### 2.3.2. Orientation of Phase Congruency

The orientation of PC represents the important directions of feature variation, and it has been proven robust to nonlinear radiation distortions [32]. Therefore, similar to the gradient amplitudes and gradient orientations in the SIFT algorithm, we need orientation information in addition to the multi-scale MIMs.

The PC is calculated using the odd- and even-symmetric wavelets of the LGF, wherein the odd-symmetric wavelet is a smooth derivative filter that computes the image derivative in a single direction [34]. During the calculation of PC, the convolution results of the odd-symmetric wavelet are obtained for the six orientations. These six convolution results are then projected onto the *x* and *y* axes. The orientation of the PC can be calculated from these projections by the arctangent function defined as:

$$O\_x = \sum\_{o} \left( e\_{s,o}(\theta\_o) \cos(\theta\_o) \right),\tag{16}$$

$$O\_y = \sum\_{o} \left( e\_{s,o}(\theta\_o) \sin(\theta\_o) \right),\tag{17}$$

$$O\_{pc} = \arctan(O\_y, O\_x),\tag{18}$$

where θ*<sub>o</sub>* represents the angle corresponding to the orientation *o*, and *e<sub>s,o</sub>*(θ*<sub>o</sub>*) is the convolution result of the odd-symmetric wavelet at angle θ*<sub>o</sub>*. Further, *O<sub>x</sub>* and *O<sub>y</sub>* are the sums of the projections of the convolution results in the *x* and *y* directions, respectively. The PC orientation *O<sub>pc</sub>* is then obtained by the arctangent function. Notably, the PC orientation is limited to [0°, 180°), which allows it to handle the gradient inversion between optical and SAR images. Figure 4 shows the calculation process of the PC orientation.
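Equations (16)–(18) can be sketched as a short numerical routine. This is an illustrative stand-in, assuming the odd-symmetric responses at one scale are stacked into an array of shape (*o*, *H*, *W*); the fold into [0°, 180°) follows the range stated above.

```python
import numpy as np

def pc_orientation(odd_responses, thetas):
    """Compute the PC orientation map from odd-symmetric responses.

    odd_responses: array of shape (o, H, W) with the odd-symmetric
    convolution results e_{s,o}(theta_o) at one scale;
    thetas: the o filter angles in radians.
    """
    cos_t = np.cos(thetas)[:, None, None]
    sin_t = np.sin(thetas)[:, None, None]
    Ox = np.sum(odd_responses * cos_t, axis=0)   # Eq. (16)
    Oy = np.sum(odd_responses * sin_t, axis=0)   # Eq. (17)
    O = np.arctan2(Oy, Ox)                       # Eq. (18)
    # Fold into [0, pi) so that gradient inversion maps to the same angle.
    return np.mod(O, np.pi)
```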

**Figure 4.** Calculation process of PC orientation.

#### 2.3.3. The Proposed HOSMI Feature Descriptor

The proposed HOSMI descriptor is constructed using the histograms of the PC orientation on the multi-scale MIM. Figure 5 presents the main processing chain of the proposed HOSMI descriptor.

**Figure 5.** Main processing chain of proposed HOSMI descriptor.

As shown in Figure 5, HOSMI is calculated based on a grid of patches, where local histograms of PC orientation are formed on each scale MIM. The main steps of the feature descriptor are listed below:

- To calculate the feature vector of a patch, a histogram of PC orientation is formed using *n<sub>o</sub>* bins covering the 180° range of orientations. The sample added to the histogram is the element at the corresponding location of the MIM. To interpolate the peak position for better accuracy, a parabola is fitted to the three histogram values closest to each peak. The feature vector of a patch *P* is calculated on four scales; therefore, its dimension is *s* × *n<sub>o</sub>*. In Figure 5, we take the first patch *P*<sub>1</sub> as an example. The number of scales used in the PC method is set to 4, and the feature vector of the patch is constructed as *P*<sub>1</sub> = [*H*<sub>1</sub>, *H*<sub>2</sub>, *H*<sub>3</sub>, *H*<sub>4</sub>], where *H*<sub>1</sub>~*H*<sub>4</sub> are the histograms of the four scales.
- To obtain the feature descriptor of a keypoint, the feature vectors of all patches are concatenated into one feature vector. The feature descriptor is normalized by the *L*<sub>2</sub> norm to achieve better invariance to illumination and shadowing. The dimension of the feature descriptor of a keypoint is *s* × *n<sub>o</sub>* × *n<sub>p</sub>* × *n<sub>p</sub>*. As shown in Figure 5, if the local region of a keypoint is divided into 4 × 4 patches, the feature descriptor is constructed from the 16 patches as *HOSMI* = [*P*<sub>1</sub>, *P*<sub>2</sub>, ···, *P*<sub>16</sub>].
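The two steps above can be sketched as follows, under our reading that each pixel contributes its MIM value to the orientation bin it falls in. The parabolic peak interpolation is omitted, and the function name and weighting scheme are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def hosmi_descriptor(orientation, mims, n_patches=4, n_bins=6):
    """Sketch of the HOSMI descriptor for one keypoint region.

    orientation: (H, W) PC orientation map in [0, pi);
    mims: (s, H, W) multi-scale maximum index maps.
    The region is split into n_patches x n_patches patches; for each
    patch and each scale, an n_bins histogram of PC orientation is
    accumulated, each pixel contributing its MIM value as the sample.
    """
    s, H, W = mims.shape
    # Bin index of each pixel over the 180-degree orientation range.
    bins = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)
    ph, pw = H // n_patches, W // n_patches
    desc = np.zeros((n_patches, n_patches, s, n_bins))
    for i in range(n_patches):
        for j in range(n_patches):
            b = bins[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw].ravel()
            for k in range(s):
                w = mims[k, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw].ravel()
                desc[i, j, k] = np.bincount(b, weights=w, minlength=n_bins)
    desc = desc.ravel()                          # dimension s * n_o * n_p * n_p
    return desc / (np.linalg.norm(desc) + 1e-12)  # L2 normalization
```

With *s* = 4, *n<sub>o</sub>* = 6, and *n<sub>p</sub>* = 4, this yields a 384-dimensional unit-norm vector, matching the stated dimension.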

A pair of corresponding points in the optical and SAR images is selected to construct descriptors. To verify the similarity of the descriptors, we plot them as stem diagrams in Figure 6.

**Figure 6.** Comparison of similarity of feature vector between a pair of corresponding points in optical and SAR images. (**a**) Optical image; (**b**) Feature vector of the optical keypoint; (**c**) SAR image; (**d**) Feature vector of the SAR keypoint.

Figure 6 shows the HOSMI descriptors of a pair of corresponding keypoints in the optical and SAR images. This pair of images exhibits strong intensity differences and gradient inversion, and there is obvious scattering in the SAR image. These differences pose great challenges to the robustness of the descriptors. The square region represents the local region (96 × 96 pixels) around the keypoint, which is used for computing the feature descriptor. As shown in Figure 6, the similarity of the feature vectors is high, and the radiation changes have little effect on the proposed descriptor.

#### **3. Experimental Results and Discussion**

In this section, we evaluate the performance of the feature detector on simulated images with different SAR noise levels and radiometric (non-uniform intensity) changes. Then, eight pairs of optical and SAR images are used to test the ROS-PC and analyze the experimental results. The registration performance is evaluated via objective and subjective approaches: one uses the evaluation criteria, and the other uses a chessboard mosaic image and enlarged submaps. Finally, experiments are conducted to evaluate the tolerance of the ROS-PC method to rotation and scale changes. All experiments were conducted in MATLAB R2017b on a computer with an Intel Core i5-7200U CPU and 16.0 GB of memory.

#### *3.1. Performance Experiments of Proposed UMPC-Harris Detector*

We test the performance of the proposed UMPC-Harris detector on simulated images with different noise levels and radiometric (non-uniform intensity) changes. The UMPC-Harris is compared with three other state-of-the-art detectors: Harris, SAR-Harris, and m+M-Harris.

#### 3.1.1. Evaluation Criteria of Feature Detector

Repeatability rate: Given a pair of simulated optical and SAR images to be registered, keypoints are detected on the two images. Two points are regarded as a pair of corresponding keypoints only if their coordinates satisfy:

$$\left\| p\_{so}(x,y) - p\_{ss}(x,y) \right\|\_2 \leq T,\tag{19}$$

where *p<sub>so</sub>*(*x*, *y*) and *p<sub>ss</sub>*(*x*, *y*) denote the coordinates of the corresponding keypoints in the simulated optical and SAR images, respectively. The operator ‖·‖<sub>2</sub> denotes the Euclidean distance between *p<sub>so</sub>*(*x*, *y*) and *p<sub>ss</sub>*(*x*, *y*). *T* is the threshold of the Euclidean distance, which is set to 2 pixels in this experiment. The repeatability rate is defined as:

$$R\_{rep} = \frac{2N\_{cor}}{n\_{so} + n\_{ss}},\tag{20}$$

where *N<sub>cor</sub>* is the number of pairs of corresponding keypoints, and *n<sub>so</sub>* and *n<sub>ss</sub>* are the numbers of keypoints in the simulated optical and SAR images, respectively. The repeatability rate is a number between 0 and 1; the larger the repeatability rate, the better the robustness of the detector.
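Equations (19) and (20) can be sketched as a small scoring routine. The paper does not specify how multiple matches within the threshold are resolved, so the greedy one-to-one pairing below is our assumption; the keypoints are taken to be already in a common coordinate frame.

```python
import numpy as np

def repeatability_rate(pts_opt, pts_sar, T=2.0):
    """Repeatability rate of Eq. (20) under the matching rule of Eq. (19).

    pts_opt, pts_sar: (n, 2) keypoint coordinates detected in the
    simulated optical and SAR images.
    """
    # Pairwise Euclidean distances between the two keypoint sets.
    d = np.linalg.norm(pts_opt[:, None, :] - pts_sar[None, :, :], axis=2)
    n_cor = 0
    while d.size and d.min() <= T:
        i, j = np.unravel_index(np.argmin(d), d.shape)
        n_cor += 1
        d[i, :] = np.inf  # each keypoint may be matched at most once
        d[:, j] = np.inf
    return 2.0 * n_cor / (len(pts_opt) + len(pts_sar))
```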

#### 3.1.2. Experimental Data and Parameter Settings of Feature Detector

a. Experimental Data

We used high-resolution (HR) optical images from the official website of Changguang Satellite Technology Company as the experimental data. The resolution of these images is better than 1 m/pixel. We selected three images of 1000 × 1000 pixels, captured at the Kabul International Airport, Afghanistan, in June 2018; these images are named Group 1 to Group 3, as shown in Figure 7.

**Figure 7.** High-resolution (HR) optical images. (**a**) Group 1; (**b**) Group 2; (**c**) Group 3.
