*2.3. Sample and Label Generation Method Based on Local Homography Transformation*

In a homography matrix, the rotational and shear components are often much smaller in magnitude than the translation components, so a model converges with difficulty if the matrix is used directly as a label. DeTone et al. therefore proposed replacing the homography matrix with four pairs of corresponding points [14]. Their algorithm uses a global homography transformation and is suitable only for registering images without parallax. Real images, however, usually contain parallax.

To overcome the shortcomings of DeTone's method, an improved sample generation method based on local homography transformation is proposed to generate sample images with parallax, as illustrated in Figure 1. The sample and label generation process is described in detail as follows:

**Figure 1.** The process of the proposed sample and label generation method: (**a**) generate four pairs of points and obtain the corresponding homography matrix $\mathbf{H}^{AB}_{4pt}$; (**b**) randomly crop the original image and generate an $M \times N$ uniform grid $G_A$; (**c**) $M \times N$ points $G'_A$ transformed from $G_A$ by using $\mathbf{H}^{AB}_{4pt}$; (**d**) $M \times N$ perturbation points $G''_A$ generated from $G'_A$; (**e**) adaptively generated $m \times n$ uniform grid; (**f**) image $I_B$ transformed from $I_A$ using the local homography matrices $\mathbf{H}^{AB}_{L}$; (**g**) generated alternative samples; (**h**) calculation of the overlap degree of two sample images.

Step 1: First, add random perturbation values to the coordinates of the four corners $\{P_1, P_2, P_3, P_4\}$ of the original image $I_A$ to obtain four new points $\{P'_1, P'_2, P'_3, P'_4\}$, where the random perturbation values in the horizontal and vertical directions lie in $[-\rho_x, \rho_x]$ and $[-\rho_y, \rho_y]$, respectively. Each corner and its perturbed counterpart form a pair of corresponding points, so four pairs of corresponding points are obtained in total, as shown in Figure 1a. Then, calculate the homography matrix $\mathbf{H}^{AB}_{4pt}$ from these four pairs.
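Step 1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the image size, the perturbation ranges, and the helper names `solve_linear`, `homography_from_4pt`, `project_point`, and `perturb_corners` are all assumptions made for the example, and the homography is normalized by fixing its last entry to 1.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography_from_4pt(src, dst):
    """3x3 homography (last entry fixed to 1) mapping four src points onto dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def project_point(h, point):
    """Apply a homography to one (x, y) point."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def perturb_corners(width, height, rho_x, rho_y, rng):
    """Return the four image corners and their randomly perturbed counterparts."""
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    moved = [(x + rng.uniform(-rho_x, rho_x), y + rng.uniform(-rho_y, rho_y))
             for x, y in corners]
    return corners, moved

# Hypothetical values: a 320 x 240 image with rho_x = 32 and rho_y = 24.
rng = random.Random(0)
corners, moved = perturb_corners(320, 240, 32, 24, rng)
H_4pt = homography_from_4pt(corners, moved)
```

Four point pairs give eight linear equations for the eight free entries of the matrix, which is why an exact solve suffices here; the over-determined fit in Step 5 needs a least-squares method instead.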

Step 2: Randomly select a point $p$ in the original image $I_A$, cut out a block $I'_A$ of fixed size with $p$ as its upper-left corner, and divide the block into a uniform grid to get $M \times N$ grid points $G_A$, as illustrated in Figure 1b.

Step 3: According to Equations (1) and (2), transform the $M \times N$ grid points $G_A$ into the corresponding $M \times N$ points $G'_A$ by using the homography matrix $\mathbf{H}^{AB}_{4pt}$, as illustrated in Figure 1c.
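Steps 2 and 3 amount to building a regular point grid and pushing it through a homography. The sketch below assumes that Equations (1) and (2) are the usual projective mapping with division by the third homogeneous coordinate, and that $M$ counts grid rows and $N$ grid columns; the function names and the example matrix are hypothetical.

```python
def uniform_grid(x0, y0, block_w, block_h, rows, cols):
    """rows x cols points spread evenly over a block whose upper-left corner is (x0, y0)."""
    return [(x0 + c * (block_w - 1) / (cols - 1),
             y0 + r * (block_h - 1) / (rows - 1))
            for r in range(rows) for c in range(cols)]

def transform_points(h, points):
    """Apply a 3x3 homography to a list of (x, y) points."""
    out = []
    for x, y in points:
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        out.append(((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
                    (h[1][0] * x + h[1][1] * y + h[1][2]) / w))
    return out

# Hypothetical block: upper-left corner p = (10, 20), size 101 x 51, 2 x 3 grid.
grid_a = uniform_grid(10, 20, 101, 51, rows=2, cols=3)

# A pure translation by (5, -3) stands in for the matrix obtained in Step 1.
H_example = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
grid_a_prime = transform_points(H_example, grid_a)
```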

Step 4: Add random perturbation values to each of the $M \times N$ points $G'_A$ to get $M \times N$ perturbation points $G''_A$, as illustrated in Figure 1d. The random perturbation values in the horizontal and vertical directions lie in $[-\rho'_x, \rho'_x]$ and $[-\rho'_y, \rho'_y]$, respectively, with $\rho'_x < \rho_x/2$ and $\rho'_y < \rho_y/2$, so as to ensure the global consistency of these random perturbation points.
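The bounded perturbation of Step 4 might look as follows. The `ratio` parameter is an assumption introduced for the example: it sets the local range as a fraction of the corner range used in Step 1, and keeping it below 0.5 enforces the constraint that the local range is less than half of the corner range. The uniform distribution is also an assumption, since the text only specifies the interval.

```python
import random

def perturb_points(points, rho_x, rho_y, rng, ratio=0.4):
    """Jitter each point by at most ratio * rho in each direction (ratio < 0.5)."""
    if not 0 < ratio < 0.5:
        raise ValueError("ratio must lie in (0, 0.5) so the local range stays below rho / 2")
    local_x, local_y = ratio * rho_x, ratio * rho_y
    return [(x + rng.uniform(-local_x, local_x), y + rng.uniform(-local_y, local_y))
            for x, y in points]

# Hypothetical input: three transformed grid points and corner ranges 32 and 24.
rng = random.Random(1)
transformed = [(15.0, 17.0), (65.0, 17.0), (115.0, 17.0)]
perturbed = perturb_points(transformed, 32, 24, rng)
```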

Step 5: From the $M \times N$ uniform grid points $G_A$ generated in Step 2 and the $M \times N$ corresponding perturbation points $G''_A$ generated in Step 4, calculate the corresponding global homography matrix $\mathbf{H}^{AB}_{g}$ with the DLT algorithm. Then transform the grid points $G_A$ into new points $G'''_A$ by using $\mathbf{H}^{AB}_{g}$ and calculate the root mean square error (RMSE) between $G'''_A$ and $G''_A$. After that, divide the original image $I_A$ into an $m \times n$ uniform grid according to the RMSE, as shown in Figure 1e. A large RMSE means that the mapping between $G_A$ and $G''_A$ is strongly local, so the image should be partitioned into smaller blocks to improve the local accuracy; conversely, a small RMSE means that the local homography matrices are close to a single global one, so the image can be partitioned into larger blocks to speed up sample generation. The number of rows and columns of the uniform grid is determined by

$$m = \operatorname{int}\left(\min\left(1 + \frac{H \cdot y_{\text{rmse}}}{\rho'_y \, h_{\text{min}}}, \frac{H}{h_{\text{min}}}\right)\right), \quad n = \operatorname{int}\left(\min\left(1 + \frac{W \cdot x_{\text{rmse}}}{\rho'_x \, w_{\text{min}}}, \frac{W}{w_{\text{min}}}\right)\right) \tag{16}$$

where $m$ and $n$ are the numbers of rows and columns of the uniform grid, $W$ and $H$ are the width and height of the image $I_A$, $x_{\text{rmse}}$ and $y_{\text{rmse}}$ are the RMSE between $G'''_A$ and $G''_A$ in the horizontal and vertical directions, $\rho'_x$ and $\rho'_y$ are the perturbation ranges defined in Step 4, and $w_{\text{min}}$ and $h_{\text{min}}$ are the minimum width and minimum height of each image block, respectively. $w_{\text{min}}$ and $h_{\text{min}}$ should not be too small; otherwise, some samples are split into too many blocks, which slows down sample generation. They should not be too large either, because too few blocks produce an unnatural blocking effect in the transformed image.
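The adaptive grid size can be illustrated with a short sketch. It rests on one loudly stated assumption: the denominators of Equation (16) are read as the product of the Step 4 perturbation range and the minimum block size, which makes each ratio a dimensionless block count. The function and parameter names are hypothetical, and the fitted points are taken as given rather than computed by DLT.

```python
import math

def axis_rmse(fitted, target):
    """Per-axis RMSE between two equally long lists of (x, y) points."""
    n = len(fitted)
    x_rmse = math.sqrt(sum((a[0] - b[0]) ** 2 for a, b in zip(fitted, target)) / n)
    y_rmse = math.sqrt(sum((a[1] - b[1]) ** 2 for a, b in zip(fitted, target)) / n)
    return x_rmse, y_rmse

def grid_size(width, height, x_rmse, y_rmse, rho_x_local, rho_y_local, w_min, h_min):
    """Rows m and columns n under the assumed reading of Equation (16)."""
    m = int(min(1 + height * y_rmse / (rho_y_local * h_min), height / h_min))
    n = int(min(1 + width * x_rmse / (rho_x_local * w_min), width / w_min))
    return m, n

# Hypothetical setting: 320 x 240 image, local ranges of 8 px, blocks of at least 40 px.
print(grid_size(320, 240, 0.0, 0.0, 8, 8, 40, 40))  # zero residual: one block
print(grid_size(320, 240, 4.0, 4.0, 8, 8, 40, 40))  # moderate residual: finer grid
print(grid_size(320, 240, 8.0, 8.0, 8, 8, 40, 40))  # large residual: capped by block size
```

Under this reading, a zero residual yields a single block, that is, a purely global homography, and the second argument of the minimum keeps every block at least as large as the minimum block size.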

Step 6: Calculate the local homography matrix $\mathbf{H}^{AB}_{j}$ ($j = 1, 2, \cdots, m \times n$) for each block of the $m \times n$ uniform grid with the MDLT algorithm, using the $M \times N$ pairs of corresponding points between $G_A$ and $G''_A$ as the matching points, so that the $m \times n$ local homography matrices $\mathbf{H}^{AB}_{L} = \left\{ \mathbf{H}^{AB}_{j} \mid j = 1, 2, \cdots, m \times n \right\}$ are obtained. Then transform the original image $I_A$ into a new image $I_B$ with $\mathbf{H}^{AB}_{L}$ and calculate the coordinates of the points $G_B$ in image $I_B$ that correspond to $G_A$ in $I_A$ with $\mathbf{H}^{AB}_{L}$.

Figure 1f shows the image $I_B$ generated from the original image $I_A$ in Figure 1a by the local homography transformation; the grid points in Figure 1f are the new grid points that the local homography transformation produces from the $M \times N$ uniform grid points $G_A$ in Figure 1b.
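What makes each block's homography local in Step 6 is the per-block weighting of the matching points. The sketch below assumes that MDLT here denotes the Moving DLT formulation, in which each point pair is weighted by a Gaussian of its distance to the block center and floored at a small offset. The values of `sigma` and `gamma` are hypothetical, and the weighted homography solve itself is omitted.

```python
import math

def mdlt_weights(block_center, points, sigma, gamma):
    """Gaussian distance weights, floored at gamma, for one grid block."""
    cx, cy = block_center
    return [max(math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / sigma ** 2), gamma)
            for x, y in points]

# Hypothetical example: a block centred at the origin and three matching points.
weights = mdlt_weights((0.0, 0.0), [(0.0, 0.0), (10.0, 0.0), (1000.0, 0.0)],
                       sigma=10.0, gamma=0.05)
```

Points near the block dominate its estimate, while the floor keeps distant points from being ignored entirely, so neighbouring blocks receive similar matrices.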

Step 7: From image $I_B$, crop an image block $I'_B$ with the same size and coordinates as $I'_A$ in image $I_A$. Images $I'_A$ and $I'_B$ constitute an alternative sample for the neural network. The coordinate difference $G_{AB}$ between the points $G_B$ in image $I_B$ and their corresponding points $G_A$ in image $I_A$ forms the alternative label for the neural network.

Figure 1g gives a pair of alternative samples cropped from the images in Figure 1b,f.
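The label of Step 7 is a list of per-point offsets. The sketch below assumes the difference is taken as the points in the transformed image minus the points in the original image; the text does not state the sign, and the function name is hypothetical.

```python
def offset_label(grid_a, grid_b):
    """Per-point coordinate differences, taken here as grid_b minus grid_a."""
    return [(xb - xa, yb - ya) for (xa, ya), (xb, yb) in zip(grid_a, grid_b)]

# Hypothetical example with two grid points.
label = offset_label([(10.0, 20.0), (60.0, 20.0)], [(12.5, 19.0), (61.0, 23.0)])
```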

Step 8: If, during the generation of image $I_B$, the overlap degree of the two sample images is too low because of an extreme distribution of the perturbation points $G''_A$, the sample is regarded as invalid and is discarded.

The calculation of the overlap degree of two sample images is illustrated in Figure 1h. Let $I^{M}_{A}$ be the binary mask of the sample image $I'_A$ in the original image $I_A$. Transform the mask $I^{M}_{A}$ with the local homography matrices $\mathbf{H}^{AB}_{L}$ to obtain the corresponding binary mask $I^{M}_{B}$ in the image $I_B$. The binary masks $I^{M}_{A}$ and $I^{M}_{B}$ are then intersected to get the binary mask $I^{M}_{AB}$, whose non-zero pixels mark the overlap region of the two sample images, as shown in Figure 1h. The overlap degree of the two sample images is thus calculated as

$$
\partial = \frac{S_{AB}}{S_A} \tag{17}
$$

where $\partial$ denotes the overlap degree, $S_A$ denotes the number of non-zero pixels in $I^{M}_{A}$, and $S_{AB}$ denotes the number of non-zero pixels in $I^{M}_{AB}$. If $\partial$ is lower than a threshold, the two sample images are discarded.
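Equation (17) reduces to counting pixels in two binary masks. In the sketch below the masks are plain nested lists of 0 and 1, the warped mask is supplied directly rather than produced by the local homography warp, and the threshold of 0.7 is a hypothetical value, since the text does not give one.

```python
def overlap_degree(mask_a, mask_b):
    """Ratio of pixels set in both masks to pixels set in mask_a."""
    s_a = sum(1 for row in mask_a for v in row if v)
    s_ab = sum(1 for row_a, row_b in zip(mask_a, mask_b)
               for a, b in zip(row_a, row_b) if a and b)
    return s_ab / s_a if s_a else 0.0

# Hypothetical 4 x 4 masks: the warped mask is shifted one pixel to the right.
mask_a = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
mask_b = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]

degree = overlap_degree(mask_a, mask_b)
keep_sample = degree >= 0.7  # hypothetical threshold
```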
