## **5. Experiments**

We conducted experiments to assess our method's efficacy and its advantage over existing methods. The convergence stability of our point-to-point distortion calibration method was verified by repeating the calibration 10 times, once on each of 10 groups of images. We also evaluated the accuracy of the distortion rectification map calculated from the results of 7 training processes, using a test set that was not used in the calibration, and investigated the influence of the number of calibration images on the calibration results. We compared the distortion calibration results of our method, Zhang's method [14], and Thomas S. et al.'s method [11], using a 1920 × 1080-pixel laparoscope, in terms of the reprojection error and the RMSE of the camera parameter estimation. The ablation experiment demonstrated that both the optimization with a novel objective function and the point-to-point calculation of lens distortion contributed to the improvement of the final result.

#### *5.1. Experimental Procedures*

The 2D targets employed in the experiments with Zhang's method were circular and checkerboard pattern targets. We also adopted the deltille grid target proposed by Ha et al. [30] and the speckle pattern target proposed by Chen et al. [17]. As depicted in Figure 7a, the speckle pattern was synthesized using Equation (1) with n = 1.5 × 10<sup>4</sup> and D = 60 pixels at a resolution of 4000 × 4000 pixels<sup>2</sup>. It was printed on adhesive matte paper by an HP Indigo 7600 and stuck on a piece of glass to serve as a calibration target of 6 × 6 cm<sup>2</sup>. The circular pattern calibration target consisted of circles with a 3 cm diameter and a 6 cm center distance, forming a 7 × 7 array, as depicted in Figure 7b. The deltille grid pattern calibration target was composed of equilateral triangles with a side length of 6 cm, arranged as shown in Figure 7c. The checkerboard pattern calibration target had 6 × 6 cm<sup>2</sup> squares, forming an 8 × 8 array, as in Figure 7d. The circular pattern, deltille grid pattern, and checkerboard pattern calibration targets were all printed on an alumina sheet with a glass substrate. To make a comparison under the same conditions, we used a 7 × 7 array of features extracted from the speckle pattern as the input of the method in [11]. Calibration images of each calibration target were captured by a 1920 × 1080-pixel binocular laparoscope. During image recording, we adjusted the lighting conditions to obtain the best imaging performance for each pattern.

**Figure 7.** Two-dimensional targets used in the experiment, containing: (**a**) speckle pattern calibration target; (**b**) circular pattern calibration target; (**c**) triangle pattern calibration target; (**d**) chessboard pattern calibration target.

The experimental equipment was arranged as displayed in Figure 8. The calibration target was mounted on a mechanical arm, which was programmed to change its pose by inclination from −24° to 24° in 6° intervals. We positioned the calibration target initially in such a way that its projection covered the entire image area. The calculations were performed on a server with 256 CPUs and 512 GB of memory.

**Figure 8.** Experimental setup.

#### *5.2. Validity under Different Initialization*

To investigate our method's performance for each kind of calibration target, we formed groups of 20 images with different poses. The poses within a group had to be diverse, and the selected images together had to cover the whole field of view. Figure 9 displays the poses of one group of selected images. We selected 10 such groups of images as training sets.

**Figure 9.** Poses of the target in a group of the training set.

Here, we verified the stability of our optimization's convergence under different initialization conditions using 10 training sets. The initial estimation was made by Zhang's calibration method. Then, optimization using our novel objective function was performed, and a convergence curve was recorded. Figure 10 displays the average value and range of the convergence curves over the 10 training processes. The vertical axis represents the value of the objective function described in Equations (11)–(13).
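The average and range of a set of convergence curves can be summarized per iteration as in the following minimal sketch. The function name `summarize_curves` and the objective values are hypothetical and serve only as an illustration; they are not taken from our experiments.

```python
from statistics import mean

def summarize_curves(curves):
    """Return per-iteration (mean, min, max) over several training runs.

    `curves` is a list of runs; each run is a list of objective values,
    one per iteration. Runs are truncated to the shortest run so that
    every iteration index has a value from every run.
    """
    n_iter = min(len(c) for c in curves)
    summary = []
    for k in range(n_iter):
        values = [c[k] for c in curves]
        summary.append((mean(values), min(values), max(values)))
    return summary

# Made-up objective values for three runs (illustration only).
runs = [
    [1.00, 0.50, 0.30],
    [1.20, 0.60, 0.30],
    [0.80, 0.40, 0.30],
]
for k, (avg, lo, hi) in enumerate(summarize_curves(runs)):
    print(f"iter {k}: mean={avg:.3f} range=[{lo:.2f}, {hi:.2f}]")
```

A narrow range that shrinks with the iteration count is what a stable convergence looks like in such a summary.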

**Figure 10.** Average value and range of convergence curve in 10 training processes.

Additionally, we examined the distortion calibration results when different numbers of calibration images were utilized. For this purpose, the camera was calibrated 16 times, using between 10 and 40 calibration images. Then, the training set images, undistorted by the distortion rectification map, were calibrated using Zhang's calibration method, assuming that no distortion remained. The resulting reprojection error was recorded as an indicator of the accuracy of the distortion rectification map. Figure 11 illustrates the reprojection error of the calibration for different numbers of calibration images. The reprojection error calculated from the training results was smaller than that of the initial estimation, even when only 10 images were utilized, and remained stable when more than 20 images were used.
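For reference, a reprojection error under the no-distortion assumption can be computed as in the sketch below. It assumes an ideal pinhole model with intrinsics fx, fy, cx, cy, points already expressed in camera coordinates, and the mean Euclidean pixel distance as the error measure. The function names are hypothetical, and the choice of a mean rather than an RMS value is an assumption of this sketch.

```python
import math

def project(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates with an ideal pinhole model."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)

def mean_reprojection_error(points_cam, observed_px, fx, fy, cx, cy):
    """Mean Euclidean distance (pixels) between projected and observed points."""
    total = 0.0
    for p, (u_obs, v_obs) in zip(points_cam, observed_px):
        u, v = project(p, fx, fy, cx, cy)
        total += math.hypot(u - u_obs, v - v_obs)
    return total / len(points_cam)

# Made-up example: two points on the optical axis and their detections.
points = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]
detections = [(960.0, 540.3), (960.4, 540.0)]
err = mean_reprojection_error(points, detections, 1000.0, 1000.0, 960.0, 540.0)
print(f"mean reprojection error: {err:.3f} px")
```

If the rectification map removed the distortion completely, the remaining error would reflect only feature localization noise and errors in the estimated pose and intrinsics.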

**Figure 11.** Reprojection error when different numbers of calibration images are used.

#### *5.3. Ablation Study*

Starting from the initial estimation, we added the parts of our method one by one and obtained a calibration result for each configuration to demonstrate how the individual parts influence the final performance. In the Map Extraction configuration, the parameters obtained from the initial estimation were directly employed to calculate the point-to-point distortion rectification map, and the calibration images were corrected by this map. Then, assuming no distortion remained, the camera parameters were estimated using Zhang's calibration method. Each configuration of the calibration process was repeated with 5 groups of 20 images.
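Correcting an image with a point-to-point rectification map amounts to a per-pixel lookup with interpolation. The sketch below assumes that the map stores, for every pixel of the undistorted image, the fractional source coordinates in the distorted image, and that bilinear interpolation is used. Both are assumptions of this illustration, and `bilinear`, `rectify`, `map_x`, and `map_y` are hypothetical names.

```python
def bilinear(img, x, y):
    """Sample a grayscale image (list of rows) at a fractional (x, y).

    Coordinates outside the image are clamped to the border.
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def rectify(img, map_x, map_y):
    """Build the corrected image: out[v][u] = img(map_x[v][u], map_y[v][u])."""
    return [
        [bilinear(img, map_x[v][u], map_y[v][u]) for u in range(len(map_x[0]))]
        for v in range(len(map_x))
    ]

# Made-up 2 x 2 image and a map that shifts every pixel by half a pixel in x.
image = [[0.0, 10.0], [20.0, 30.0]]
map_x = [[0.5, 1.5], [0.5, 1.5]]
map_y = [[0.0, 0.0], [1.0, 1.0]]
print(rectify(image, map_x, map_y))
```

Because every output pixel has its own source coordinate, such a map is not tied to any parametric distortion model.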

As listed in Table 1, the mean reprojection error of Map Extraction was reduced by 11.48% compared to the result of the Initial Estimation. The last configuration contained our complete calibration process, with a mean reprojection error reduction of 30.61% compared to the initial estimation result. The distribution of the reprojection errors in the 5 repeated ablation experiments is shown in Figure 12. It can therefore be inferred that, in our method, both the optimization with the novel objective function and the calculation of a point-to-point distortion rectification map are critical for improving calibration accuracy.

**Table 1.** The result of the ablation study.


**Figure 12.** Reprojection errors' distribution in ablation experiments.

#### *5.4. Benchmark Performance*

This section compares the reprojection error and the stability of the parameter estimation of previous methods with those of our novel method. The comparison includes Zhang's calibration method with the circle pattern target, the checkerboard pattern target, the deltille grid target proposed by Ha et al. [30], and the speckle pattern target proposed by Chen et al. [17]. For Zhang's method with each target, we repeated the calibration process 7 times using 7 groups of 20 pictures. A test set of 20 images was selected, excluding images in the training set. For the method of [11] and our method, we report the reprojection error on the test set obtained with the calibration results from 7 different groups of pictures. For our method, the distortion of the test set images was rectified using the point-to-point distortion rectification map calculated from the training result. Then, assuming no distortion remained, the camera parameters were estimated using Zhang's calibration method.

The reprojection errors are shown in Table 2. The methods in the top four rows of Table 2 are Zhang's calibration method with different calibration patterns. The reprojection errors of the chessboard, deltille grid, circle, and speckle calibration target methods were 0.34990613255, 0.115054 and 0.107224, respectively. Compared with Zhang's calibration method using any of these targets, our novel point-to-point distortion calibration method achieved the smallest reprojection error, which was 28.5% lower than that of Zhang's method using the same (speckle) pattern.

The reprojection error of our method was 0.075841 in the training result and 0.076663 in the test result, the latter reflecting the performance of the distortion rectification map obtained from the training result on new data. Although the reprojection error of the test set is slightly greater than that of the training set, it is still less than that of Zhang's calibration approach with any type of calibration target. This demonstrates that the distortion rectification map calculated by our point-to-point distortion calibration method can effectively correct new images captured with the same camera.
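The relative figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that 0.107224 is the reprojection error of Zhang's method with the speckle target; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline

zhang_speckle = 0.107224  # Zhang's method, speckle target (assumed assignment)
ours_train = 0.075841     # our method, training result
ours_test = 0.076663      # our method, test result

reduction = relative_reduction(zhang_speckle, ours_test)
gap = -relative_reduction(ours_train, ours_test)
print(f"test error vs. Zhang (speckle): {reduction:.1f}% lower")
print(f"train-to-test increase: {gap:.2f}%")
```

Under this assumption, the first value reproduces the 28.5% reduction stated in the text, and the second shows that the test error exceeds the training error by roughly 1%.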

**Table 2.** Reprojection error and RMSE of the internal parameters' estimation for different calibration methods, with a training set of 228 images for the method in row 6 and 20 images for the other methods.


To compare our point-to-point distortion calibration method with the method of [11], the performance on the test set for different numbers of calibration images is shown in Table 2 and Figure 13. With 20 images, the estimation result of [11] was inferior to that of our method with the same number of images because of overfitting, whereas when 228 images were used in [11], its estimation result was superior to that of our method with 20 images.

**Figure 13.** Reprojection error of Thomas S. et al.'s method and our point-to-point distortion calibration method on test set when different numbers of calibration images are used.

Table 2 and Figure 14 show the distributions of the internal parameters estimated using different calibration methods. The RMSE of the internal parameters' estimation is listed in Table 2. The small circle in Figure 14 represents the average value of the estimated internal parameters, and the upper and lower sides of the error bar represent the max and min value of the estimated internal parameters, respectively. It can be inferred that with the method of Chen et al. [17] and our novel method, the internal parameters' estimation in the repeated calibration is more stable than the other methods.
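The statistics plotted for each internal parameter (average, minimum, and maximum over repeated calibrations) and a spread measure can be computed as sketched below. Since no ground-truth intrinsics are available for a real camera, this sketch takes the RMSE about the mean of the repeated estimates; that choice, the function name `spread`, and the sample values are assumptions made for illustration only.

```python
import math

def spread(estimates):
    """Return (mean, min, max, rmse_about_mean) for repeated estimates."""
    m = sum(estimates) / len(estimates)
    rmse = math.sqrt(sum((e - m) ** 2 for e in estimates) / len(estimates))
    return m, min(estimates), max(estimates), rmse

# Made-up focal length estimates (pixels) from repeated calibrations.
fx_estimates = [1001.2, 999.8, 1000.5, 1000.1, 999.4, 1000.9, 1000.3]
m, lo, hi, rmse = spread(fx_estimates)
print(f"fx: mean={m:.2f} min={lo:.2f} max={hi:.2f} rmse={rmse:.3f}")
```

A smaller RMSE and a shorter min-max bar both indicate that the repeated calibrations agree more closely with one another.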

**Figure 14.** Distributions of internal parameters estimated using different calibration methods. (**a**–**d**) are the distributions of the estimated fx, fy, cx, and cy, respectively.

## **6. Discussion**

We considered both simulation and experimentation with a real camera when designing the experiment. In a simulation that uses the polynomial distortion model to generate the camera distortion, methods employing that model have an advantage. In contrast, if we add distortions that are not limited to the polynomial distortion model, the point-to-point distortion calibration method has the advantage. To keep the experimental conditions neutral between camera calibration methods based on the polynomial distortion model and the point-to-point distortion calibration method, we used real cameras for our experiments.

As can be seen from the results of the validity experiment under different initializations, the convergence curve of the optimization in our method is stable, and the reprojection error is satisfactory when the number of calibration images involved in the optimization is not smaller than 20. The ablation study illustrated that the novel objective function and the calculation of a point-to-point distortion rectification map both resulted in a significant reduction of the reprojection error. The benchmark performance shows that the reprojection error of our method is smaller than that of methods using the polynomial distortion model. The accuracy of methods using the polynomial distortion model depends on whether the calibration pattern allows accurate feature extraction and whether features near the image edges can be extracted. Our method not only uses the speckle pattern, which offers higher feature extraction accuracy, but also adopts a full-pixel distortion description and a specially designed objective function for optimization, so its reprojection error is lower than that of the methods in the top four rows of Table 2. The method using the raxel model can achieve a smaller reprojection error than our method. When 228 images were used with the raxel model, the estimation result was superior to that of our method with 20 images. The following conclusion can be drawn from the findings of our experiment:


The setting of hyperparameters is a component in our technique that was not disclosed previously. To achieve the best performance of our method, hyperparameters were searched before calibrating different cameras in different environments. One of the hyperparameters was the subset size of the DIC calculation. The other one was the correlation coefficient threshold that determined which feature points were used in the parameter optimization.

1. Subset Size

In DIC, a larger size subset usually leads to a higher feature matching accuracy. However, an oversized subset introduces other problems, such as the complexity of deformation in the subset region. In this case, the current subset shape function cannot appropriately fit the subset deformation, resulting in decreased accuracy or failure of DIC. After a test with various subset sizes in our experiment, we used a subset with a radius of 70 pixels for DIC in the initial estimate and final verification, and a subset with a radius of 65 pixels in the parameter optimization.

2. Correlation Coefficient Cutoff

The correlation coefficient cutoff is used to determine whether the DIC results are reliable. A cutoff that is set too high can introduce inaccurately matched features into the parameter optimization and reduce the accuracy of the parameter estimation. A cutoff that is set too low results in large invalid regions of a calibration image that lack any features suitable for parameter optimization, which can also decrease the accuracy of the parameter estimation. After testing with different cutoff values, we used 0.065 as the cutoff value of the correlation coefficient in our experiment. That is, features matched in the DIC with a correlation coefficient of less than 0.065 are used for parameter optimization, whereas features matched with a correlation coefficient of more than 0.065 are filtered out.
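The effect of the cutoff can be illustrated with a small filtering sketch. It assumes that a lower coefficient indicates a better match, as the rule above implies, and that each DIC match is stored as a tuple (x, y, coefficient). The function names and the sample matches are hypothetical.

```python
def select_features(matches, cutoff=0.065):
    """Keep matches whose correlation coefficient is below the cutoff.

    `matches` is a list of (x, y, coeff) tuples; a lower coeff is assumed
    to mean a more reliable match.
    """
    return [m for m in matches if m[2] < cutoff]

def valid_fraction(matches, cutoff=0.065):
    """Fraction of matches that survive the cutoff."""
    return len(select_features(matches, cutoff)) / len(matches)

# Made-up DIC matches: (x, y, correlation coefficient).
matches = [(10, 10, 0.012), (20, 10, 0.048), (30, 10, 0.091), (40, 10, 0.230)]
for cutoff in (0.02, 0.065, 0.30):
    print(f"cutoff={cutoff}: {valid_fraction(matches, cutoff):.0%} of features kept")
```

Sweeping the cutoff in this way exposes the trade-off described above: a low value discards most of the image, whereas a high value admits poorly matched features.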

Our method is devoted to the accurate calibration of camera parameters and lens distortion, which paves the way for a better performance of HAR. Building on the work of Gao et al. [18] and Chen et al. [17], our method can obtain a point-to-point distortion rectification map of the camera without establishing distortion models or strictly restricting experimental conditions.

## **7. Conclusions**

We propose a camera calibration method that requires only dozens of images to obtain point-by-point distortion calibration results and internal camera parameters. This approach extracts dense features using a speckle pattern calibration target and DIC, and uses a new objective function for parameter optimization. The distortion rectification map is calculated from the result of the parameter optimization and can be used to warp camera-captured images into undistorted ones. Compared with commonly used methods, this method is not limited to the polynomial distortion model and allows for the pixel-level calibration of the camera distortion. We designed experiments to validate our approach's stability under various initialization conditions and compared it to the method of [11], using the same calibration target, and to Zhang's calibration method, utilizing a variety of calibration targets. Experiments on a test set demonstrated that our method has a lower reprojection error than the compared methods with the same number of calibration images. This shows that our method can get a more accurate estimation of the camera distortion and camera parameters, and thus better describes the mapping between real space and image space. Therefore, our method is more advantageous than calibration methods using the polynomial distortion model in downstream tasks.

Despite the advantages above, our method is limited by its single optical center assumption, and its accuracy is inferior to that of methods using the raxel model. The accuracy of the distortion rectification map of our method is also limited by the number of images. As the DIC calculation at the edge of the speckle region is not accurate enough, there are some undesirable points that cannot be ruled out in the distortion rectification map. A possible solution is to exclude pixels at the edges of the speckle region during distortion rectification map extraction. Another problem is the computing resource consumption of the DIC, which increases with the size of the subset area and the number of calibration images. This can be addressed with GPU-accelerated computing [40]. These are the topics on which we should concentrate our future efforts.

**Author Contributions:** Conceptualization, X.Y., Z.J. and Z.L.; methodology, Z.J. and Z.L.; software, Z.L. and T.G.; validation, Z.J. and Z.L.; formal analysis, Z.J. and Z.L.; investigation, Z.J. and Z.L.; resources, Z.J., P.W. and J.L.; data curation, Z.L. and Z.F.; writing—original draft preparation, Z.J. and Z.L.; writing—review and editing, X.Y. and C.Z.; visualization, Z.L. and Z.H.; supervision, X.Y. and Z.J.; project administration, Z.J.; funding acquisition, H.Z. and X.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key Research and Development Project, grant number 2019YFC0117901; the National Major Scientific Research Instrument Development Project, grant number 81827804; the Robotics Institute of Zhejiang University, grant number K11806; the National Key Research and Development Project, grant number 2017YFC0110802; and the Key Research and Development Plan of the Zhejiang Province, grant number 2017C03036.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The source code, dataset, and result files are available at https://github.com/schcat (accessed on 22 March 2022).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
