*2.7. Robot's TCP Transformation*

The robotic arm used in the setup was an ABB IRB 4600 40/2.55. For every robot target, the TCP pose was recorded as the *x*, *y*, *z* position and the rotation as a quaternion (*q*1, *q*2, *q*3, *q*4). A transformation from the camera's origin to the TCP was then applied.

This transformation was calculated using the hand-eye calibration method proposed by Horaud and Dornaika [41]. A set of 30 different positions was defined for the calibration. The result was compared with the holder geometry, as seen in Figure 12.

The transformation found during calibration was a translation of −0.050 mm along the *x*-axis, −0.0442 along the *y*-axis, and 0.0780 along the *z*-axis, and rotations of 90.9179° around the *x*-axis, 1.1774° around the *y*-axis, and 0.4859° around the *z*-axis.

**Figure 12.** Camera holder CAD model with dimensions in TCP's *z*-axis.

#### Pair Matching

With the reference points ready, the transformation for each camera pose was solved based on the chosen methods explained above. The correspondent marker points were identified, and the metrics were calculated according to the explanation given in Section 2.1. Figure 13 shows arrows between the reference point and the board points, showing the correct identification of the pairs in which the error was calculated.

**Figure 13.** Arrows showing the correct correspondence between measured pair points based on the labelling method.

#### **3. Results**

The results of the reconstruction follow the metrics explained in Section 2.1. The first step was to identify the two most accurate methods for the 3D reconstruction of the scene. For this, each system used 1188 points to calculate the reconstruction error for the six camera views.

Table 1 shows the mean absolute error (MAE), the mean squared error (MSE), and the root-mean-square error (RMSE) of each system, expressed in millimetres. It is interesting to note that the MAE and the RMSE do not differ greatly, indicating that the individual errors were not widely spread.


**Table 1.** Summary results with MAE, MSE, and RMSE for one set of data.

Based on these results, a program was written to run 20 sets of data for each method, with each set calculating the error over approximately 1188 points, depending on how many markers were found by the charuco detection algorithm.

#### *3.1. Charuco Double-Sided Angled Board*

The double-sided board with an angle (to enable four cameras to see it simultaneously) performed worse than the other two methods. The angle applied to the board made the markers harder to identify, as shown in Figure 14a,b, which likely reduced its accuracy relative to the other methods.

Visually, the pose was estimated well in the 2D image, but the RMSE degraded, reaching 10.12 mm in this case.

**Figure 14.** Charuco double-sided board marker detection for pose estimation. (**a**) Camera 1. (**b**) Camera 2.

Figure 15 shows a detail of the image reconstructed with the double-sided charuco board, in which small gaps between the top and lateral panels can be observed.

**Figure 15.** Reconstruction view using charuco double-sided angled board.

Because this method produced a less accurate 3D reconstruction than the others, the analysis below focuses on the remaining two methods.

#### *3.2. Cuboid vs. TCP Reconstruction Accuracy*

The program was written in C++ using the Qt Library (https://www.qt.io, accessed on 8 February 2022), and it is available at GitHub (Please see "Data Availability Statement"), together with the dataset captured.

The program iterated over a directory of data files and calculated the RMSE, MSE, and MAE for both chosen methods. In addition, it assisted with the visualization of the point cloud.

The program was executed with both filtered and unfiltered data, as explained in Section 2.4.
