**1. Introduction**

Image registration is the process of matching and transforming two or more images of the same scene so that they are aligned in a common coordinate system. It is widely used in fields such as panoramic image stitching [1,2], high dynamic range imaging [3], simultaneous localization and mapping (SLAM) [4], and so on.

Traditional image registration algorithms are mainly classified into pixel-based and feature-based algorithms [5,6]. Pixel-based algorithms estimate the transformation between images directly from the raw pixel values [7,8]. First, the homography matrix between a pair of images is initialized. The homography matrix is then used to transform one image, and the pixel-value errors between the transformed image and the reference image are calculated. Finally, an optimization technique minimizes this error function to achieve registration. Pixel-based algorithms usually run slowly; they remain effective in low-texture scenes but have poor robustness to changes in scale, rotation, and brightness.
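The error function that such pixel-based methods minimize can be sketched as follows. This minimal NumPy example (the function name and the nearest-neighbour sampling are illustrative, not any cited implementation) evaluates the sum of squared pixel differences between a reference image and a target image warped by a candidate homography; an optimizer would then adjust the homography parameters to reduce this value.

```python
import numpy as np

def photometric_error(reference, target, H):
    """Sum of squared differences between `reference` and `target`
    sampled through H. H maps homogeneous reference coordinates
    (x, y, 1) into the target image; nearest-neighbour sampling,
    out-of-bounds pixels are ignored."""
    h, w = reference.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N
    q = H @ pts
    u = np.rint(q[0] / q[2]).astype(int)                      # target columns
    v = np.rint(q[1] / q[2]).astype(int)                      # target rows
    inside = (u >= 0) & (u < target.shape[1]) & (v >= 0) & (v < target.shape[0])
    diff = reference.ravel()[inside] - target[v[inside], u[inside]]
    return float(np.sum(diff.astype(float) ** 2))
```

With the identity homography and identical images the error is zero, and it grows as the candidate homography drifts away from the true alignment, which is what makes it a usable optimization objective.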

Feature-based image registration algorithms [9,10], such as SIFT [11] and ORB [12], generally first extract feature points from the images, establish correspondences between the feature points of the two images by feature matching, and then estimate the optimal homography matrix with algorithms such as RANSAC [13]. Feature-based algorithms are generally more accurate and faster than pixel-based ones, but they require enough matching points between the two images, high matching accuracy, and a uniform spatial distribution of the matched points; otherwise, the registration accuracy is greatly reduced. Feature-based algorithms are generally robust to scale and rotation and somewhat robust to brightness, but they are not suitable for low-texture images.

Recently, some deep learning-based image registration algorithms have been proposed. DeTone et al. [14] proposed a supervised homography matrix estimation algorithm. A 128 × 128 image *IA* was generated by randomly cropping an image *I*, and random perturbation values were added to the coordinates of the four corners of *IA* to generate four perturbation points, yielding four pairs of matching points. The homography matrix corresponding to these four pairs was calculated from the coordinates of the four corners of *IA* and their perturbed counterparts, and this matrix was used to transform image *IA* into image *IB*. The images *IA* and *IB*, converted to grayscale, served as samples, and the coordinate differences between the four corner points of *IA* and their corresponding perturbation points in *IB* served as labels, with which a 10-layer VGG (Visual Geometry Group) network was trained, finally yielding a homography matrix estimation model usable for image registration. The algorithm has good robustness to brightness, scale, rotation, and texture. Building on DeTone's work, Nguyen et al. [15] proposed an unsupervised homography matrix estimation algorithm to remove the need for artificially generated labels in supervised learning, but this algorithm had weak robustness to illumination. Both algorithms relied mainly on artificially generated samples. The artificial samples guaranteed sufficiently accurate samples and labels, which was a beneficial exploration of deep learning for the practical image registration problem. However, the artificial samples adopted by these two works assume no parallax between the images to be registered, so only four pairs of corresponding points are used to represent the registration relationship between the two images.
In practice, however, there is parallax between the images to be registered, and the relationship between such images is often not an exact homography transformation.
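The corner-perturbation scheme described above can be sketched as follows. This NumPy snippet generates only the corner pairs and the 8-value label of a training sample; the patch size and perturbation range follow the description above, while the function name and border margins are illustrative. In the full pipeline, the homography relating the two corner sets would be recovered with the DLT and used to warp *IA* into *IB*.

```python
import numpy as np

def make_homography_sample(image, patch=128, rho=32, rng=None):
    """DeTone-style training pair: crop a `patch` x `patch` region I_A,
    perturb its four corners by up to `rho` pixels, and use the eight
    corner offsets as the regression label."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    # Leave a `rho` margin so the perturbed corners stay inside the image.
    x = rng.integers(rho, w - patch - rho)
    y = rng.integers(rho, h - patch - rho)
    corners = np.array([[x, y], [x + patch, y],
                        [x + patch, y + patch], [x, y + patch]], float)
    offsets = rng.integers(-rho, rho + 1, size=(4, 2)).astype(float)
    perturbed = corners + offsets
    # The four (corner, perturbed corner) pairs determine a homography via
    # the DLT; warping I_A with it yields I_B. The label is the 8 offsets.
    label = offsets.ravel()
    return corners, perturbed, label
```

Because the label is the flat vector of corner offsets rather than the nine homography entries, the network regresses quantities with a uniform, bounded scale, which is the motivation for the 4-point parameterization in [14].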

In image registration, the homography matrix between the target image and the reference image must be estimated. This matrix is used to transform the target image so that it aligns with the reference image in spatial coordinates; the transformation process is called image mapping or image transformation. According to the scope over which the homography matrix is applied, image transformation can be divided into global homography transformation and local homography transformation. Global homography transformation [7,11,12,14,16] uses the same homography matrix to transform the whole image. It requires that the target and reference images contain essentially the same image information in the overlapping region, so it is suitable only for images with small or no parallax; when this condition is not satisfied, registration accuracy is reduced significantly. Local homography transformation algorithms [17–19] map different regions of an image using different transformation matrices, which better overcomes the shortcomings of the global approach. The As-Projective-As-Possible (APAP) algorithm [19] is a representative local homography transformation algorithm. It first extracts the feature matching points between the images and then divides the images into a uniform grid. The moving direct linear transform (MDLT) is used to estimate the homography matrix of each grid cell, and these per-cell matrices are then used to apply a local homography transformation to the image to be registered. For images that do not satisfy the global homography condition, the registration accuracy achieved by APAP is higher than that of global homography transformation [20]. APAP is, in essence, also a feature-based image registration algorithm.
It therefore shares the characteristics of feature-based algorithms while achieving higher accuracy than a typical feature-based method. A registration algorithm based on global homography transformation needs only one homography matrix estimation and one homography transformation, whereas APAP requires many estimations and transformations, so APAP is slower than a typical feature-based algorithm.
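The per-cell estimation step of MDLT can be illustrated with a small NumPy sketch: each feature correspondence is weighted by a Gaussian of its distance to the grid-cell centre, with a floor so that distant matches still constrain the solution, and the weighted DLT system is then solved by SVD. The values of σ and the weight floor γ here are illustrative, not those tuned in [19].

```python
import numpy as np

def mdlt_cell_homography(src, dst, center, sigma=8.0, gamma=0.05):
    """APAP-style moving DLT: a homography for one grid cell, in which
    each (src, dst) match is weighted by exp(-d^2 / sigma^2), d being
    the distance from the cell centre, floored at gamma."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    d = np.linalg.norm(src - np.asarray(center, float), axis=1)
    w = np.maximum(np.exp(-(d ** 2) / sigma ** 2), gamma)
    rows, weights = [], []
    for (x, y), (u, v), wi in zip(src, dst, w):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        weights += [wi, wi]
    A = np.asarray(rows) * np.asarray(weights)[:, None]  # weighted DLT system
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

When the matches are in fact related by a single global homography, every cell recovers that same matrix regardless of the weights; the per-cell matrices diverge from each other only where parallax makes a single homography inadequate, which is exactly the behaviour that gives APAP its advantage.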

The two deep learning-based image registration algorithms above both target global homography transformation, and their samples cannot be used to estimate local homography matrices. Therefore, building on the above research, an image registration algorithm based on deep learning and local homography transformation is proposed in this paper. A sample and label generation method suitable for local homography transformation is designed so that the image registration model can be trained effectively with a convolutional neural network (CNN). The resulting model effectively reduces the error of image registration and overcomes both the poor robustness of traditional image registration algorithms and the low accuracy of existing deep learning-based ones.

The main contributions of this paper are as follows: (1) an algorithm based on a CNN and local homography transformation is proposed to solve the image registration problem, which is a useful exploration of deep learning for this task; (2) a sample and label generation method suitable for local homography transformation is proposed, and the generated samples have good diversity and can simulate practical registration scenarios.

The rest of this paper is organized as follows. Section 2 introduces the basic theory of the proposed algorithm, focusing on image sample and label generation, the CNN model, and the loss function. Section 3 presents the experimental results, which verify the effectiveness of the proposed algorithm. Section 4 concludes the paper, summarizing the main work and analysing the shortcomings of the algorithm and possible improvements.

#### **2. Image Registration Algorithm Based on Deep Learning and Local Homography Transformation**

In supervised learning-based image registration, samples must first be labeled. However, manual labeling is too costly, its accuracy is hard to guarantee, and it is difficult to collect enough diverse images for registration. To solve this problem, an image registration algorithm based on deep learning and local homography transformation is proposed in this paper. First, a sample and label generation method for deep learning is designed, in which the direct linear transform (DLT) and moving direct linear transform (MDLT) are used to automatically generate reasonable and effective samples and labels. Supervised learning is then used to train a CNN, yielding an image registration model with which local homography transformation-based image registration can be achieved.
