#### *3.1. Accuracy of Image Registration*

The accuracy of image registration can be measured by RMSE of registration points, which is defined by

$$RMSE(f) = \sqrt{\frac{1}{k} \sum_{i=1}^{k} \left\| f(x_i) - x_i' \right\|^2} \tag{19}$$

where *xi* denotes the coordinates of the grid points *GA* in image *IA*, and *xi′* denotes the coordinates in image *IB* corresponding to *xi*. *f* represents the image registration model: the proposed algorithm and the APAP algorithm use local homography matrices, while the other algorithms use a global homography matrix. *f*(*xi*) denotes the coordinates obtained by transforming *xi* with the registration model *f*, i.e., the estimate of *xi′*. *k* is the total number of matching points in the pair of images, and it is set to 25 in the experiments.
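As a minimal sketch of Equation (19), the RMSE of registration points can be computed as follows; `homography_model` is a hypothetical helper that wraps a 3×3 homography matrix as a coordinate mapping *f* in homogeneous coordinates:

```python
import numpy as np

def rmse(f, points_a, points_b):
    """RMSE of registration points (Eq. 19).

    f        -- registration model mapping IA coordinates to IB coordinates
    points_a -- (k, 2) array of grid-point coordinates x_i in image IA
    points_b -- (k, 2) array of corresponding coordinates x_i' in image IB
    """
    pred = np.asarray([f(x) for x in points_a])    # f(x_i), estimates of x_i'
    err = np.linalg.norm(pred - points_b, axis=1)  # ||f(x_i) - x_i'||
    return float(np.sqrt(np.mean(err ** 2)))

def homography_model(H):
    """Wrap a 3x3 homography H as a point mapping f(x)."""
    def f(x):
        p = H @ np.array([x[0], x[1], 1.0])  # lift to homogeneous coordinates
        return p[:2] / p[2]                  # project back to the image plane
    return f
```

For a local-homography model such as the one used by the proposed algorithm or APAP, *f* would instead select the homography of the grid cell containing each *xi*.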

Table 1 shows the average RMSE of registration points achieved by several different image registration algorithms when implemented on the test set generated by the proposed method. To better present the performance of learning-based image registration algorithms, Table 1 gives in detail the registration accuracy of several deep learning-based image registration algorithms using VGG, Googlenet and Xception neural networks, respectively.


**Table 1.** RMSE comparison of different image registration algorithms.

From Table 1, it can be seen that the pixel-based ECC image registration algorithm has the lowest accuracy, while the feature-based SIFT algorithm performs better. The APAP algorithm takes the locality of image registration into account, so it achieves the best result among the pixel-based and feature-based algorithms. The performance of the learning-based algorithms depends on the CNN model used, and more advanced CNN models yield higher registration accuracy. The samples used by DeTone's and Nguyen's algorithms are relatively simple, so their accuracy differs little across the neural networks; moreover, these two algorithms do not fully consider the locality of image registration, which results in low registration accuracy. Compared with the other algorithms, the proposed algorithm achieves the highest registration accuracy when using the Xception network. In addition, Table 1 shows that the proposed algorithm performs better under Xception than under Googlenet and VGG. This is because the samples and labels used by the proposed algorithm are more complex, so there are obvious differences across neural networks; combined with more advanced CNN models, the proposed algorithm achieves higher registration accuracy.

#### *3.2. Running Time*

To compare the computational complexity of different image registration algorithms, Table 2 shows the average running time of each algorithm over 10 runs, where all algorithms are run on a computer with an Intel i7-6700 CPU, 32 GB of memory, and one NVIDIA GTX 1080 Ti GPU. The APAP algorithm runs slowest because it uses local homography matrices, while the ORB algorithm runs fastest among the traditional image registration algorithms. For the learning-based algorithms, Table 2 gives the running time both with one-GPU acceleration and without the GPU. The GPU significantly speeds up the learning-based algorithms, and different neural network models run at different speeds, among which Xception is the slowest and Googlenet the fastest. Because DeTone's and Nguyen's algorithms differ only in their loss functions and use essentially the same neural network model, their running time is the same under the same conditions. The proposed algorithm involves the estimation of local homography matrices, so it runs slower than DeTone's and Nguyen's algorithms under the same neural network.
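The averaging protocol above can be sketched as follows; `register` and `image_pairs` are hypothetical stand-ins for one registration algorithm and the test data:

```python
import time

def average_runtime(register, image_pairs, repeats=10):
    """Average wall-clock time of one registration algorithm over `repeats` runs."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for pair in image_pairs:
            register(*pair)                   # register one image pair
        total += time.perf_counter() - start  # accumulate one full pass
    return total / repeats
```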


**Table 2.** Running time comparison of different image registration algorithms.

#### *3.3. Robustness to Illumination, Color and Brightness*

In order to compare the robustness of different image registration algorithms to illumination, color, and brightness, the test set is augmented using the same image augmentation method as the training set. After augmentation, the registration accuracy and failure rate of each algorithm are compared. Only some of the images in the test set are randomly augmented, not all of them: the more images are augmented, the higher the image augmentation degree of the test set, and the more diversity the test set has in illumination, color, and brightness. The image augmentation degree can therefore be represented by the probability that an image in the test set is augmented. The test set used in this experiment contains 5000 pairs of test images. Each algorithm is run 10 times, during which the augmentation is randomly applied at a pre-specified image augmentation degree, and the average of the 10 runs is taken as the final result of the algorithm at that degree. The image augmentation degree thus also represents the degree to which the test set is affected by image augmentation.
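The sampling scheme above can be sketched as follows, where `augment_fn` is a hypothetical stand-in for the illumination/color/brightness augmentation applied to the training set, and `degree` is the probability that a pair is augmented:

```python
import random

def augment_test_set(image_pairs, degree, augment_fn, seed=None):
    """Randomly augment each test pair with probability `degree` (0.0-1.0)."""
    rng = random.Random(seed)
    out = []
    for ia, ib in image_pairs:
        if rng.random() < degree:             # augment this pair with probability `degree`
            ia, ib = augment_fn(ia), augment_fn(ib)
        out.append((ia, ib))
    return out
```

At `degree = 0.0` the test set is untouched; at `degree = 1.0` every pair is augmented, matching the abscissa range of Figures 2–5.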

The accuracy and failure rate of image registration can be used to measure the robustness of different image registration algorithms. Since the maximum perturbation values of each grid point in the sample image are $\rho_x$ in the horizontal direction and $\rho_y$ in the vertical direction, a pair of images whose registration error exceeds $\sqrt{\rho_x^2 + \rho_y^2}$ is regarded as a registration failure, and the failure rate on the test set can then be calculated. Since the RMSE values of test pairs that fail to register may be very large, and such extreme values can strongly skew the RMSE of the whole test set, the RMSE of the whole test set is defined as

$$\begin{aligned} RMSE'_i &= \min\left(RMSE_i, \sqrt{\rho_x^2 + \rho_y^2}\right) \\ RMSE &= \frac{1}{K} \sum_{i=1}^{K} RMSE'_i \end{aligned} \tag{20}$$

where *RMSEi* represents the RMSE value of the *i*th pair of images, and *K* denotes the total number of image pairs in the test set.
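A minimal sketch of Equation (20) and the accompanying failure rate, assuming the per-pair RMSE values have already been computed:

```python
import math

def clamped_rmse(rmse_values, rho_x, rho_y):
    """Test-set RMSE with per-pair clamping (Eq. 20).

    Pairs whose RMSE exceeds sqrt(rho_x^2 + rho_y^2) are counted as
    registration failures and clamped to that threshold, so extreme
    values do not dominate the test-set average.
    """
    threshold = math.sqrt(rho_x ** 2 + rho_y ** 2)
    clamped = [min(v, threshold) for v in rmse_values]          # RMSE'_i
    failure_rate = sum(v > threshold for v in rmse_values) / len(rmse_values)
    return sum(clamped) / len(clamped), failure_rate
```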

Figures 2–5 show the failure rate and RMSE achieved by different algorithms under different image augmentation degrees. The abscissa is the image augmentation degree of the test set, which changes from 0.0 to 1.0 with a step size of 0.1; the ordinate represents the registration failure rate or RMSE. Figure 2 shows the robustness comparison of seven image registration algorithms, in which the CNN model used by DeTone's and Nguyen's algorithms is VGG, while the model used by the proposed algorithm is Xception. As can be seen from Figure 2, the robustness of the traditional image registration algorithms to illumination, color, and brightness is very poor, and the robustness of the learning-based algorithms, especially the supervised learning-based algorithms, is better than that of the traditional ones. Figures 3–5 further analyze the robustness of the three learning-based image registration algorithms under three different CNN models: VGG, Googlenet, and Xception. It can be seen that under the same neural network model, the robustness of Nguyen's algorithm is inferior to the other two algorithms. Nguyen's algorithm uses the L1 norm as the loss function of its unsupervised learning scheme, which requires the same image augmentation parameters for *IA* and *IB* in each pair of samples during training; otherwise, the model does not converge normally, which results in the poor robustness of the unsupervised learning-based registration algorithm. In contrast, DeTone's algorithm and the proposed algorithm do not have this problem, because both adopt supervised learning; the label values supervise the training of the neural network well, so the models have better robustness.

**Figure 2.** Robustness of seven image registration algorithms under different image augmentation degrees: (**a**) Failure rate; (**b**) RMSE.

**Figure 3.** Robustness of DeTone's algorithm, Nguyen's algorithm and the proposed algorithm using VGG: (**a**) Failure rate; (**b**) RMSE.

**Figure 4.** Robustness of DeTone's algorithm, Nguyen's algorithm and the proposed algorithm using Googlenet: (**a**) Failure rate; (**b**) RMSE.

**Figure 5.** Robustness of DeTone's algorithm, Nguyen's algorithm and the proposed algorithm using Xception: (**a**) Failure rate; (**b**) RMSE.

In order to further analyze the influence of different perturbation values on the accuracy of the proposed algorithm, four maximum perturbation values ρ in Step 1, namely 24, 28, 32, and 36, are tested on test sets with different image augmentation degrees. The experimental results are shown in Figure 6, in which the abscissa and ordinate are the image augmentation degree of the test set and the RMSE achieved by different image registration algorithms, respectively. It can be seen that as the maximum perturbation value ρ decreases, the RMSE of image registration also decreases, i.e., the accuracy of image registration increases.

**Figure 6.** Robustness of the proposed algorithm under different perturbation values and CNNs: (**a**) ρ = 36; (**b**) ρ = 32; (**c**) ρ = 28; (**d**) ρ = 24.

Figure 7 gives the visualized homography estimation results. The red boxes in the left images are mapped to the red boxes in the right images. These red boxes are the labels, generated by the proposed method described in Section 2.3, while the yellow boxes in the right images indicate the results of homography estimation. The more closely the red and yellow boxes in the right images coincide, the higher the accuracy of image registration. From Figure 7, it can also be seen that the proposed algorithm with the Xception model is superior to the proposed algorithm with the Googlenet and VGG neural network models.
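The box comparison underlying Figure 7 can be sketched as follows: the four corners of a box are projected through the ground-truth and estimated homographies, and the corner-wise RMSE quantifies how closely the red and yellow boxes coincide. The function names here are illustrative, not from the paper:

```python
import numpy as np

def project_box(corners, H):
    """Map box corners (n, 2) through a 3x3 homography H."""
    pts = np.hstack([corners, np.ones((len(corners), 1))])  # to homogeneous coords
    proj = (H @ pts.T).T
    return proj[:, :2] / proj[:, 2:3]                       # back to image plane

def box_rmse(corners, H_true, H_est):
    """RMSE between ground-truth (red) and estimated (yellow) box corners."""
    gt, est = project_box(corners, H_true), project_box(corners, H_est)
    return float(np.sqrt(np.mean(np.sum((gt - est) ** 2, axis=1))))
```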

**Figure 7.** Visualization analysis of the proposed algorithm under different CNNs (The red boxes indicate the ground truth, and the yellow boxes are the estimation results): (**a**) accuracy of image registration under VGG (RMSE = 10.154711); (**b**) accuracy of image registration under VGG (RMSE = 2.240815); (**c**) accuracy of image registration under Googlenet (RMSE = 7.2284245); (**d**) accuracy of image registration under Googlenet (RMSE = 1.9681364); (**e**) accuracy of image registration under Xception (RMSE = 3.1798978); (**f**) accuracy of image registration under Xception (RMSE = 1.4085304).
