### 3.4. Postprocessing

#### 3.4.1. Generation of a Calibration Matrix from a Network

Postprocessing generates the calibration matrix *RTpred* shown in Equation (12). The rotation matrix *Rpred* and translation vector *Tpred* in Equation (12) are built from the quaternion parameters *q*0, *q*1, *q*2, and *q*3 and the translation parameters $\tau_x^p$, $\tau_y^p$, and $\tau_z^p$ inferred by the network we built, as shown in Equations (13) and (14).

$$RT\_{pred} = \begin{bmatrix} R\_{pred} & T\_{pred} \\ 0\_{1 \times 3} & 1 \end{bmatrix} \tag{12}$$

$$R\_{pred} = \begin{bmatrix} 1 - 2(q\_2^2 + q\_3^2) & 2(q\_1 q\_2 - q\_0 q\_3) & 2(q\_0 q\_2 + q\_1 q\_3) \\ 2(q\_1 q\_2 + q\_0 q\_3) & 1 - 2(q\_1^2 + q\_3^2) & 2(q\_2 q\_3 - q\_0 q\_1) \\ 2(q\_1 q\_3 - q\_0 q\_2) & 2(q\_0 q\_1 + q\_2 q\_3) & 1 - 2(q\_1^2 + q\_2^2) \end{bmatrix} \tag{13}$$

$$T\_{pred} = \begin{bmatrix} \tau\_x^p & \tau\_y^p & \tau\_z^p \end{bmatrix}^T \tag{14}$$
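As a minimal NumPy sketch, the assembly of *RTpred* from the network outputs in Equations (12)–(14) can be written as follows; the function name and the quaternion-normalization step are our additions, not part of the original method description:

```python
import numpy as np

def rt_from_network_outputs(q, t):
    """Build the 4x4 calibration matrix RT_pred (Eq. 12) from the inferred
    quaternion (q0, q1, q2, q3) and translation (tau_x, tau_y, tau_z)."""
    q0, q1, q2, q3 = q / np.linalg.norm(q)  # normalize; raw outputs may not be unit quaternions
    # Rotation matrix R_pred from the quaternion (Eq. 13)
    R = np.array([
        [1 - 2*(q2**2 + q3**2), 2*(q1*q2 - q0*q3),     2*(q0*q2 + q1*q3)],
        [2*(q1*q2 + q0*q3),     1 - 2*(q1**2 + q3**2), 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),     2*(q0*q1 + q2*q3),     1 - 2*(q1**2 + q2**2)],
    ])
    RT = np.eye(4)
    RT[:3, :3] = R
    RT[:3, 3] = t  # translation vector T_pred (Eq. 14)
    return RT
```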

$$\begin{aligned} \theta\_x^p &= \operatorname{atan2}\left(R\_{pred}(3,2), R\_{pred}(3,3)\right) \\ \theta\_y^p &= \operatorname{atan2}\left(-R\_{pred}(3,1), \sqrt{R\_{pred}(3,2)^2 + R\_{pred}(3,3)^2}\right) \\ \theta\_z^p &= \operatorname{atan2}\left(R\_{pred}(2,1), R\_{pred}(1,1)\right) \end{aligned} \tag{15}$$

Equation (15) shows how to calculate the rotation angle about each of the *x*-, *y*-, and *z*-axes from the rotation matrix *Rpred*. In Equation (15), (*r*, *c*) denotes the element of *Rpred* at row *r* and column *c*. The angle calculation in Equation (15) converts a given rotation matrix into Euler angles.
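The Euler-angle extraction of Equation (15) translates directly into code; note that the paper's 1-indexed (*r*, *c*) becomes 0-indexed in NumPy. The function name is ours:

```python
import numpy as np

def euler_from_rotation(R):
    """Recover the rotation angles about the x-, y-, and z-axes from a
    3x3 rotation matrix R, following Eq. (15); indices are 0-based here."""
    theta_x = np.arctan2(R[2, 1], R[2, 2])
    theta_y = np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2]))
    theta_z = np.arctan2(R[1, 0], R[0, 0])
    return theta_x, theta_y, theta_z
```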

#### 3.4.2. Calculation of Calibration Error

To evaluate the proposed calibration system, it is necessary to calculate the error of the predicted parameters. For this, we calculate the transformation matrix *RTerror*, which contains the errors of the predicted parameters, using Equation (16). *RTmis* and *RTonline* in Equation (16) are calculated by Equations (3) and (17), respectively. In Equation (17), each of *RT*1, *RT*2, *RT*3, *RT*4, and *RT*5 is a calibration matrix predicted by one of the five networks, Net1, Net2, Net3, Net4, and Net5. The calculation of these five matrices is described in detail in Section 3.4.3. From *RTerror*, we calculate the error of the rotation-related parameters using Equation (18) and the error of the translation-related parameters using Equation (19).

$$RT\_{error} = RT\_{online} \cdot RT\_{mis} \tag{16}$$

$$RT\_{online} = RT\_5 \cdot RT\_4 \cdot RT\_3 \cdot RT\_2 \cdot RT\_1 \tag{17}$$

$$\begin{aligned} \theta\_x^\varepsilon &= \operatorname{atan2}(\operatorname{RT}\_{\operatorname{error}}(3,2), \operatorname{RT}\_{\operatorname{error}}(3,3)) \\ \theta\_y^\varepsilon &= \operatorname{atan2}\left(-\operatorname{RT}\_{\operatorname{error}}(3,1), \sqrt{\operatorname{RT}\_{\operatorname{error}}(3,2)^2 + \operatorname{RT}\_{\operatorname{error}}(3,3)^2}\right) \\ \theta\_z^\varepsilon &= \operatorname{atan2}(\operatorname{RT}\_{\operatorname{error}}(2,1), \operatorname{RT}\_{\operatorname{error}}(1,1)) \end{aligned} \tag{18}$$

$$\begin{aligned} \tau\_x^\varepsilon &= RT\_{error}(1, 4) \\ \tau\_y^\varepsilon &= RT\_{error}(2, 4) \\ \tau\_z^\varepsilon &= RT\_{error}(3, 4) \end{aligned} \tag{19}$$

In Equations (18) and (19), (*r*, *c*) indicates the element at row *r* and column *c* of the matrix *RTerror*.
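Equations (16), (18), and (19) can be sketched together in NumPy; the function name is our choice, and the tuple return layout is an assumption for illustration:

```python
import numpy as np

def calibration_error(RT_online, RT_mis):
    """Compute RT_error = RT_online @ RT_mis (Eq. 16) and split it into
    rotation errors (Eq. 18) and translation errors (Eq. 19)."""
    E = RT_online @ RT_mis
    rot_err = (
        np.arctan2(E[2, 1], E[2, 2]),                      # error about x
        np.arctan2(-E[2, 0], np.hypot(E[2, 1], E[2, 2])),  # error about y
        np.arctan2(E[1, 0], E[0, 0]),                      # error about z
    )
    trans_err = (E[0, 3], E[1, 3], E[2, 3])                # last column of RT_error
    return rot_err, trans_err
```

With a perfect prediction, *RTonline* is the inverse of *RTmis*, so *RTerror* is the identity and all six errors are zero.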

In the KITTI dataset, the rotation angles about the *x*-, *y*-, and *z*-axes correspond to pitch, yaw, and roll, respectively. In contrast, in the Oxford dataset, they correspond to roll, pitch, and yaw, respectively.

#### 3.4.3. Iterative Refinement for Precise Calibration

The training uses all five deviation ranges, but the evaluation of the proposed method is performed with deviations randomly sampled only from Rg1, the largest deviation range. Using a sampled deviation, the transformation matrix *RTmis* is formed as shown in Equations (3), (7), and (8), and a point cloud prepared for evaluation is initially transformed using Equation (3). Feeding this transformed point cloud into the trained Net1 yields the inferred translation and rotation parameters, from which we obtain the *RTpred* of Equation (12); this *RTpred* becomes *RT*1. We multiply the initially transformed points by *RT*1 to obtain new transformed points and input them into the trained Net2, whose output *RTpred* becomes *RT*2. In this way, the input points of the current network are multiplied by that network's output *RTpred* to produce the input of the next network, and this process is repeated up to Net5. For each point cloud prepared for evaluation, a calibration matrix *RTi* (*i* = 1, ···, 5) is thus obtained from each of the five networks, and the final calibration matrix *RTonline* is obtained by multiplying these matrices as shown in Equation (17). The iterative transformation of the evaluation point cloud is expressed as follows:

$$\widehat{P}'\_1 = RT\_{mis} \cdot RT\_{init} \cdot \widehat{P}^T \tag{20}$$

$$\widehat{P}'\_i = RT\_{i-1} \cdot \widehat{P}'\_{i-1}, \quad i = 2, \dots, 5 \tag{21}$$
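The cascade of Equations (17), (20), and (21) can be sketched as follows. The `nets` interface (a list of callables that each return a 4×4 *RTpred*) is a hypothetical stand-in for the five trained networks:

```python
import numpy as np

def iterative_refinement(points_h, RT_mis, nets):
    """Cascade the five networks (Eqs. 20-21): each stage transforms the
    point cloud with its predicted RT_pred and hands the result to the
    next network. points_h is a (4, N) array of homogeneous points;
    nets is a list of callables [Net1, ..., Net5] returning a 4x4 matrix."""
    P = RT_mis @ points_h              # initially miscalibrated cloud (Eq. 20)
    RT_online = np.eye(4)
    for net in nets:                   # Net1, Net2, ..., Net5
        RT_i = net(P)                  # RT_pred inferred by the current network
        P = RT_i @ P                   # input for the next network (Eq. 21)
        RT_online = RT_i @ RT_online   # accumulates RT_5 ... RT_2 RT_1 (Eq. 17)
    return RT_online, P
```

If the first network predicted the miscalibration exactly and the rest predicted the identity, *RTonline* · *RTmis* would be the identity matrix, recovering the original points.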

#### 3.4.4. Temporal Filtering for Precise Calibration

Calibration performed with only a single frame can be vulnerable to various forms of noise. According to [25], this problem can be mitigated by analyzing the results over time. For this purpose, N. Schneider et al. [25] examine the distribution of the results over all evaluation frames while keeping the value of the sampled deviation used for the first frame fixed. They take the median over the whole sequence, which enables the best performance on the test set, and they sample the deviations from Rg1. They repeat this experiment for 100 runs, keeping the sampled deviations until all test frames have been processed and resampling the deviations at the start of each new run.

Analyzing the results obtained over multiple frames is beneficial. However, applying temporal filtering to all test frames has a drawback in the context of autonomous driving: in the case of the KITTI dataset, the calibration parameter values are inferred from the results of processing about 4500 frames, which takes a long time, and it is difficult to predict what will happen on the road during this interval. Therefore, we reduce the number of frames used for temporal filtering and randomly determine the start frame of the filtering window. We set the bundle size to 100 frames and performed the quantitative analysis by taking the median of the 100 results obtained from this bundle. The parameter values from *RTonline* for each frame are obtained using Equations (14) and (15). The basis for setting the bundle size is given in Section 4.3.3.
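The bundle-based median filtering described above can be sketched as follows; the function name and the (frames × parameters) array layout are assumptions for illustration:

```python
import numpy as np

def temporal_filter(per_frame_params, bundle_size=100, rng=None):
    """Median-filter per-frame calibration parameters over a bundle of
    frames whose start position is chosen at random (bundle_size = 100
    in the text). per_frame_params is an (n_frames, 6) array holding
    the three rotation and three translation parameters per frame."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(per_frame_params)
    start = rng.integers(0, n - bundle_size + 1)    # random start frame
    bundle = per_frame_params[start:start + bundle_size]
    return np.median(bundle, axis=0)                # per-parameter median
```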
