**3. Experimental Results and Analysis**

To test the performance of the proposed algorithm, it is compared with the Scale-Invariant Feature Transform (SIFT) algorithm [11], the Oriented FAST and Rotated BRIEF (ORB) algorithm [12], the Enhanced Correlation Coefficient (ECC) algorithm [7], APAP [19], DeTone's algorithm [14], and Nguyen's algorithm [15]. The experiments are run on a computer with an Intel i7-6700 CPU, 32 GB of memory, and an NVIDIA GTX 1080 Ti GPU, under Ubuntu 16.04 LTS.

The performance of the different image registration algorithms is compared in terms of accuracy, running time, and robustness. SIFT, ORB, and ECC are implemented with OpenCV in Python. The RANdom SAmple Consensus (RANSAC) reprojection threshold for SIFT and ORB is 5, and the maximum number of iterations of the ECC algorithm is 1000. The deep learning framework is TensorFlow [25]. APAP, DeTone's algorithm, and Nguyen's algorithm are implemented in Python on the same platform.

To facilitate comparison with DeTone's and Nguyen's algorithms, the sample images used in this paper have the same size as those used in their work. The perturbation values consist of horizontal and vertical components, whose range should be neither too small nor too large. If the perturbation range is too small, the generated perturbations will be small, which reduces the diversity of the samples and weakens the generalization ability of the model. If the range is too large, samples with extreme deformation are easily generated, which makes training more difficult and reduces the prediction accuracy of the model. The maximum perturbation values ρ<sup>*x*</sup> and ρ<sup>*y*</sup> of the corner points in Step 1 of the proposed image sample and label generation method should not exceed half the width or height of the original image, respectively. Generally, taking 1/3 to 1/10 of the image width or height ensures that the generated samples have good diversity and visual quality. Similarly, in Step 4, taking ρ′<sup>*x*</sup> as 1/3 to 1/10 of ρ<sup>*x*</sup> and ρ′<sup>*y*</sup> as 1/3 to 1/10 of ρ<sup>*y*</sup> achieves better results.

The original data sets used in the experiments are the MS-COCO 2014 and MS-COCO 2017 data sets [26]. First, all images in these two data sets are scaled to 320 × 240, and the proposed sample and label generation method is applied to obtain gray-scale sample images of size 128 × 128. The maximum perturbation values ρ<sup>*x*</sup> and ρ<sup>*y*</sup> of the corner points in Step 1 are set to 45, and the number of matching points for each pair of images in Step 2 is set to 5 × 5. The maximum perturbation values ρ′<sup>*x*</sup> and ρ′<sup>*y*</sup> in Step 4 are set to 11. In Step 5, *w*<sub>min</sub> and *h*<sub>min</sub> are both 5. In Step 8, the threshold of the overlap degree is 0.3; that is, when the overlap degree is lower than 0.3, the sample is discarded. To increase the robustness of the model and reduce the possibility of over-fitting, image augmentation [27] is also used when generating the training samples: the color and brightness of some sample images are randomly changed, and some are processed with a Gamma transformation. In total, 500,000 pairs of images are generated as the training set, 10,000 pairs as the validation set, and 5000 pairs as the test set.
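The brightness and Gamma augmentations mentioned above can be sketched as follows. The probabilities and parameter ranges here are illustrative assumptions; the paper does not specify them.

```python
import numpy as np

def augment(img, rng=None):
    """Randomly shift the brightness and/or apply a Gamma transform
    to a uint8 image. Probabilities and ranges are assumed values."""
    rng = rng or np.random.default_rng()
    out = img.astype(np.float32) / 255.0
    if rng.random() < 0.5:
        # Additive brightness shift, clipped back to the valid range.
        out = np.clip(out + rng.uniform(-0.2, 0.2), 0.0, 1.0)
    if rng.random() < 0.5:
        # Gamma transform: out = in ** gamma on normalized intensities.
        out = out ** rng.uniform(0.5, 2.0)
    return (out * 255.0).astype(np.uint8)
```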

To demonstrate the generality of the proposed algorithm, three CNNs, namely VGG, GoogLeNet, and Xception, are used to train and test each of the learning-based image registration algorithms. The optimization algorithm is Adam [28], with β<sub>1</sub> = 0.9, β<sub>2</sub> = 0.999, and ε = 10<sup>−8</sup>. The batch size is 128. The initial learning rate is 0.0005 for the proposed algorithm and the supervised learning of DeTone's algorithm, and 0.0001 for the unsupervised learning of Nguyen's algorithm. To prevent over-fitting, dropout [29] is applied before the output layer of all neural networks. During training, the error on the validation set is monitored; when it no longer decreases, training is stopped, so that the model is neither under-trained nor over-fitted.

When training the network models of DeTone's algorithm and Nguyen's algorithm, the perturbation values of their samples are also set to 45, and the same optimization techniques, image augmentation techniques, and CNN architectures are adopted. The number of generated training samples is the same as for the proposed algorithm, and the training and validation-monitoring procedures are also identical. All algorithms are tested on the test set generated by the proposed method to ensure an objective comparison.
