*2.1. Direct Linear Transformation (DLT)*

If there is no parallax between the reference and target images, the mapping between the two images is a simple homography, which can be described by a homography matrix. Suppose that two points with coordinates **x**' = [*x*', *y*']<sup>T</sup> and **x** = [*x*, *y*]<sup>T</sup> are corresponding matching points on the reference image *I'* and the target image *I*, respectively. The relationship between these two points can be expressed as

$$
\widetilde{\mathbf{x}}' = \mathbf{H} \widetilde{\mathbf{x}} \tag{1}
$$


where **x̃**' and **x̃** are the homogeneous coordinates of the two points, respectively, and **H** is the homography matrix between the two images:

$$
\widetilde{\mathbf{x}}' = \begin{pmatrix} x'' \\ y'' \\ z'' \end{pmatrix}, \quad \widetilde{\mathbf{x}} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \quad \mathbf{H} = \begin{pmatrix} h\_{11} & h\_{12} & h\_{13} \\ h\_{21} & h\_{22} & h\_{23} \\ h\_{31} & h\_{32} & h\_{33} \end{pmatrix}
$$

In non-homogeneous coordinates, the relationship between the matching points **x**' and **x** can be expressed as

$$
\begin{aligned}
x' &= \frac{x''}{z''} = \frac{h\_{11}x + h\_{12}y + h\_{13}}{h\_{31}x + h\_{32}y + h\_{33}} \\
y' &= \frac{y''}{z''} = \frac{h\_{21}x + h\_{22}y + h\_{23}}{h\_{31}x + h\_{32}y + h\_{33}}
\end{aligned} \tag{2}
$$
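As a small illustration of Equation (2), the sketch below maps a single point through a homography in plain Python. The function name `apply_homography` and the translation matrix `H_shift` are illustrative assumptions, not part of the original method.

```python
def apply_homography(H, x, y):
    """Map (x, y) to (x', y') following Eq. (2); H is a 3x3 nested list."""
    x2 = H[0][0] * x + H[0][1] * y + H[0][2]  # x''
    y2 = H[1][0] * x + H[1][1] * y + H[1][2]  # y''
    z2 = H[2][0] * x + H[2][1] * y + H[2][2]  # z''
    if z2 == 0:
        raise ZeroDivisionError("point maps to infinity (z'' = 0)")
    return x2 / z2, y2 / z2


# Hypothetical homography: a pure translation by (2, 3)
H_shift = [[1.0, 0.0, 2.0],
           [0.0, 1.0, 3.0],
           [0.0, 0.0, 1.0]]

print(apply_homography(H_shift, 1.0, 1.0))  # (3.0, 4.0)
```

Because the division by *z''* removes any common scale factor, multiplying every entry of **H** by the same non-zero constant leaves the mapped point unchanged.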

Transforming Equation (1) into the form **0**<sub>3×1</sub> = **x̃**' × **Hx̃** gives

$$
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & -x & -y & -1 & xy' & yy' & y' \\ x & y & 1 & 0 & 0 & 0 & -xx' & -yx' & -x' \\ -xy' & -yy' & -y' & xx' & yx' & x' & 0 & 0 & 0 \end{pmatrix} \mathbf{h} \tag{3}
$$

where **h** = (*h*<sub>11</sub> *h*<sub>12</sub> *h*<sub>13</sub> *h*<sub>21</sub> *h*<sub>22</sub> *h*<sub>23</sub> *h*<sub>31</sub> *h*<sub>32</sub> *h*<sub>33</sub>)<sup>T</sup>.
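The coefficient matrix of Equation (3) can be checked numerically. The following sketch, assuming NumPy, builds the 3 × 9 matrix for one match generated from a made-up homography `H_true`, confirms that it annihilates the row-major vector **h**, and reports the rank of the matrix. The helper name `cross_product_rows` and all numeric values are hypothetical.

```python
import numpy as np


def cross_product_rows(x, y, xp, yp):
    """3x9 coefficient matrix of Eq. (3) for the match (x, y) <-> (x', y')."""
    return np.array([
        [0, 0, 0, -x, -y, -1, x * yp, y * yp, yp],
        [x, y, 1, 0, 0, 0, -x * xp, -y * xp, -xp],
        [-x * yp, -y * yp, -yp, x * xp, y * xp, xp, 0, 0, 0],
    ], dtype=float)


# Hypothetical ground-truth homography, used only to generate a match
H_true = np.array([[1.0, 0.2, 5.0],
                   [0.1, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])

x, y = 10.0, 20.0
xpp, ypp, zpp = H_true @ np.array([x, y, 1.0])  # x'', y'', z''
xp, yp = xpp / zpp, ypp / zpp                    # Eq. (2)

M = cross_product_rows(x, y, xp, yp)
residual = M @ H_true.reshape(9)                 # h stacks the rows of H
rank = np.linalg.matrix_rank(M, tol=1e-9)

print(np.allclose(residual, 0.0, atol=1e-9), rank)
```

The third row equals −(*x'* · row 1 + *y'* · row 2), so each match contributes only two independent equations.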

When estimating **H**, using more matching points reduces the estimation error. In Equation (3), only two rows of the 3 × 9 coefficient matrix on the right-hand side are linearly independent. Selecting the first two rows gives an independent 2 × 9 coefficient matrix **A**<sub>*i*</sub> for the *i*th match, and stacking these matrices for all *N* matching points forms a 2*N* × 9 coefficient matrix **A**. Using the least-squares method, the solution for **h** can be expressed as

$$\hat{\mathbf{h}} = \underset{\mathbf{h}}{\operatorname{argmin}} \sum\_{i=1}^{N} \left\| \mathbf{A}\_{i} \mathbf{h} \right\|^{2} = \underset{\mathbf{h}}{\operatorname{argmin}} \left\| \mathbf{A} \mathbf{h} \right\|^{2} \tag{4}$$

where **ĥ** is the estimate of **h**, ‖**Ah**‖ denotes the 2-norm of the vector **Ah**, **h** is constrained to be a unit vector, *N* denotes the total number of pairs of matching points, and **A**<sub>*i*</sub> denotes the independent coefficient matrix corresponding to the *i*th pair of matching points. Singular value decomposition (SVD) can be used to calculate **ĥ**: the solution is the right singular vector of **A** corresponding to its smallest singular value. The estimate of the homography matrix **H** is obtained by arranging the nine elements of **ĥ** row by row into a 3 × 3 matrix, following the ordering used in the definition of **h**.
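The SVD-based estimate can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `dlt_homography_svd`, the sample points, and `H_true` are all made up, the matches are noise-free, and no coordinate normalization is applied.

```python
import numpy as np


def dlt_homography_svd(src, dst):
    """Estimate H with dst ~ H @ src from N >= 4 matches given as (N, 2) arrays."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        # First two rows of the Eq. (3) coefficient matrix for this match
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp, -xp])
    A = np.asarray(rows, dtype=float)   # 2N x 9
    _, _, Vt = np.linalg.svd(A)         # singular values come sorted, largest first
    h = Vt[-1]                          # unit vector minimizing ||A h||
    return h.reshape(3, 3)              # row-major, matching the ordering of h


# Hypothetical test data: project five points through a known homography
H_true = np.array([[1.0, 0.2, 0.5],
                   [0.1, 0.9, -0.3],
                   [0.1, 0.2, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.4, 0.3]])
proj = (H_true @ np.column_stack([src, np.ones(len(src))]).T).T
dst = proj[:, :2] / proj[:, 2:3]

H_est = dlt_homography_svd(src, dst)
H_est = H_est / H_est[2, 2]             # remove the arbitrary scale and sign
print(np.allclose(H_est, H_true, atol=1e-6))
```

The singular vector is only defined up to sign and scale, which is why the sketch divides by the bottom-right entry before comparing with `H_true`.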

Because SVD is time-consuming and would slow down the training of the neural network, Equation (3) is transformed into a non-homogeneous linear least-squares problem. Setting *h*<sub>33</sub> = 1 yields two independent non-homogeneous linear equations:

$$\mathbf{A}^{'}\_{i}\mathbf{h}^{'} = \mathbf{b}^{'}\_{i} \tag{5}$$

$$\mathbf{A}'\_{i} = \begin{pmatrix} 0 & 0 & 0 & -x & -y & -1 & xy' & yy' \\ x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \end{pmatrix} \tag{6}$$

$$\mathbf{h}' = \begin{pmatrix} h\_{11} \ h\_{12} \ h\_{13} \ h\_{21} \ h\_{22} \ h\_{23} \ h\_{31} \ h\_{32} \end{pmatrix}^{\mathrm{T}} \tag{7}$$

$$\mathbf{b}'\_{i} = \begin{pmatrix} -y'\\ x' \end{pmatrix} \tag{8}$$

If all *N* matching points are included, then the least-squares problem of Equation (4) can be rewritten as

$$\hat{\mathbf{h}}^{'} = \underset{\mathbf{h}^{'}}{\operatorname{argmin}} \sum\_{i=1}^{N} \left\| \mathbf{A}^{'}\_{i} \mathbf{h}^{'} - \mathbf{b}^{'}\_{i} \right\|^{2} = \underset{\mathbf{h}^{'}}{\operatorname{argmin}} \left\| \mathbf{A}^{'} \mathbf{h}^{'} - \mathbf{b}^{'} \right\|^{2} \tag{9}$$

where **ĥ**' is the estimate of **h**', and **A**' is the 2*N* × 8 coefficient matrix obtained by stacking all coefficient matrices **A**'<sub>*i*</sub> vertically. **b**' is the 2*N* × 1 constant column vector obtained by stacking all constant column vectors **b**'<sub>*i*</sub> vertically.

Let *E* = ‖**A**'**h**' − **b**'‖<sup>2</sup>; **ĥ**' can be calculated by setting d*E*/d**h**' = 0, which gives

$$\hat{\mathbf{h}}' = \left(\mathbf{A}'^{\mathrm{T}}\mathbf{A}'\right)^{-1}\mathbf{A}'^{\mathrm{T}}\mathbf{b}' \tag{10}$$
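Equation (10) can be sketched with NumPy as follows. The sketch assumes noise-free matches generated from a made-up homography whose *h*<sub>33</sub> equals 1; the function name `homography_normal_equations` and the sample data are hypothetical. It solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly, which yields the same result whenever **A**'<sup>T</sup>**A**' is invertible.

```python
import numpy as np


def homography_normal_equations(src, dst):
    """Estimate H with h33 fixed to 1 by solving (A'^T A') h' = A'^T b'."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Eq. (6): the two rows of A'_i; Eq. (8): the two entries of b'_i
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp])
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        rhs.extend([-yp, xp])
    A = np.asarray(rows, dtype=float)   # 2N x 8
    b = np.asarray(rhs, dtype=float)    # length 2N
    h8 = np.linalg.solve(A.T @ A, A.T @ b)
    return np.append(h8, 1.0).reshape(3, 3)  # append h33 = 1, reshape row-major


# Hypothetical test data: project five points through a known homography
H_true = np.array([[1.0, 0.2, 0.5],
                   [0.1, 0.9, -0.3],
                   [0.1, 0.2, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.4, 0.3]])
proj = (H_true @ np.column_stack([src, np.ones(len(src))]).T).T
dst = proj[:, :2] / proj[:, 2:3]

H_est = homography_normal_equations(src, dst)
print(np.allclose(H_est, H_true, atol=1e-6))
```

Fixing *h*<sub>33</sub> = 1 assumes that the true *h*<sub>33</sub> is non-zero. Forming **A**'<sup>T</sup>**A**' also squares the condition number of **A**', so the sketch uses small coordinate values to keep the system well conditioned.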
