*2.4. Loss Function and Convolutional Neural Network*

The RMSE can be used as the loss function of the CNN. It is defined by

$$L_s = \sqrt{\frac{1}{k} \sum_{i=1}^{k} \left\| \mathbf{x}_i - \hat{\mathbf{x}}_i \right\|^2} \tag{18}$$

where *x<sub>i</sub>* is the label value of the *i*th pair of matching points, *x̂<sub>i</sub>* is the corresponding output value of the CNN, and *k* is the total number of pairs of matching points.
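As an illustration of Equation (18), the following minimal NumPy sketch computes the RMSE over *k* pairs of matching points. The function name `rmse_loss` and the array layout (one row per pair of matching points) are assumptions made for this example and are not taken from the paper's implementation.

```python
import numpy as np


def rmse_loss(labels, predictions):
    """RMSE of Equation (18) over k pairs of matching points.

    Assumed layout (hypothetical): both inputs have shape (k, d),
    one row per pair of matching points.
    """
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if labels.shape != predictions.shape:
        raise ValueError("labels and predictions must have the same shape")

    k = labels.shape[0]
    # Flatten each sample so the squared Euclidean norm is taken per pair.
    diff = labels.reshape(k, -1) - predictions.reshape(k, -1)
    squared_norms = np.sum(diff ** 2, axis=1)

    # Average the squared norms over the k pairs, then take the square root.
    return float(np.sqrt(np.mean(squared_norms)))


if __name__ == "__main__":
    x = [[0.0, 0.0], [1.0, 1.0]]      # label values
    x_hat = [[3.0, 4.0], [1.0, 1.0]]  # network outputs
    print(rmse_loss(x, x_hat))
```

When training a network, the same expression would typically be written with the differentiable tensor operations of the chosen deep learning framework. The NumPy version above is intended only to make the averaging and the norm explicit.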

A general-purpose CNN can be used to obtain the image registration model. In this paper, three network architectures are compared: VGG [22], GoogLeNet [23], and Xception [24]. The VGG network has a simple structure and its depth is easy to extend, but it trains slowly and requires substantial hardware resources. For simplicity, we adopted a 10-layer VGG network [14] in the experiments. GoogLeNet increases both the depth and the width of the network while speeding up training and reducing the hardware resources required. The Xception network converges quickly and also requires fewer hardware resources. Additionally, its convergence performance is generally better than that of the VGG and GoogLeNet networks.
