**3. Geometrical Calibration**

When a thermal camera is involved, the term "calibration" could refer to two different types of calibration—the geometrical calibration and the radiometric calibration. The latter (which was not the object of this work) consists of a procedure that models the relationship between the digital output of the camera and the incident radiation. Hereinafter, if the term "calibration" is used, "geometrical calibration" is intended.

The geometrical calibration is a fundamental process used to compute intrinsic and extrinsic parameters of the system components.

Intrinsic parameters model the optic imaging process and are divided into two groups—four parameters defining the camera matrix, based on the pinhole camera model, and other parameters defining radial and tangential camera distortions caused by the physical properties of lenses [23].

There are six extrinsic parameters which describe the relative pose of the sensors, i.e., the transformation from world coordinates to camera coordinates. In the vast majority of methods, the intrinsic parameters are computed first, and then they are exploited to determine extrinsic ones.

For the determination of both intrinsic and extrinsic parameters, the best-known techniques involve the use of a target with a chessboard pattern, a method first presented by Z. Zang in [24]. Note that, in the case of thermal cameras, the pattern has to be detected in the IR spectrum. Several solutions have been presented for this purpose, involving, for example, a heated chessboard pattern or marker detectable in IR; a more detailed state of art can be found in [10]. In the last few decades, however, some work has been done in order to develop the so-called automatic or targetless extrinsic calibration methods, that exploit natural scenes' features (see [11,25–27]).

Concerning the intrinsic calibration, we adopted the method developed by S. Vidas et al. in [28], which relies on a mask-based approach and does not require specialized equipment.

Regarding the extrinsic calibration, since our aim was to keep the two devices unlinked, the classical approach (consisting of acquiring several frames of the target in a fixed configuration of the two sensors) did not appear very suitable. For cases in which the identification of some homologous point is easily feasible, the extrinsic parameters are computed by means of a manual selection routine of homologous points, and then by exploiting the Matlab function "estimateWordCameraPose", which solves the perspective-*n*-point (PnP) [29]. However, this method might fail or achieve a very low accuracy in scenarios in which homologous points cannot be clearly identified. To address this limitation and to be able to consider more various scenarios, we developed an alternative method exploiting the detection of the object silhouette in the thermogram. In the literature, some silhouette-based calibration methods can be found, for instance in [30] and [31]. In [30], the pose of a known object is estimated through a hierarchical silhouette matching and unsupervised clustering, whereas in [31] the calibration of a stereo camera system was carried out by defining and minimizing a function that computes the distance between viewing cones, which are known from the silhouettes. The developed method has some similarities with the one presented in [11], in which the extrinsic parameters were obtained by minimizing an objective function that measures the alignment between edges extracted from RGB-D images (obtained by a depth camera) and the ones extracted from thermograms. Like that method, ours is suitable for thermograms in which the object contour can be easily identified, and so the object can be extracted from the background, a fairly common situation when dealing with thermograms. Di fferently from [11], however, the object contour detection is only needed for the thermogram, and the quantity evaluated is not the alignment between thermal and depth edges, but rather the "degree of filling" of the projected 3D points inside the thermal edge, which translates to a di fferent definition of the objective function to minimize. Another di fference is that this method takes as its input the full 3D representation of the object, and not the single view representation (often referred to as 2.5D [32]). This method can deal e fficaciously with initial parameter values significantly di fferent from the ones searched for, which makes it particularly suitable for a decoupled type of integration.

The extrinsic calibration method here developed can be divided into the following principal parts:

(1) extraction of the object contour from the thermogram;

=

(2) creation of a matrix *M*1 (with dimension equal to the resolution of the thermogram) obtained in this way: starting from the contour extracted in 1), a series of internal and external subsequent layers are created, and the same numerical value is associated to each pixel of each layer, following a specific function *fl* defined later in this section;

 -


$$f\_{obj} = f\_{obj}\left(\overline{\mathcal{R}}, \overline{\mathfrak{f}}\right) = -sum\{\overline{\mathcal{M}}\_1 \circ \overline{\mathcal{M}}\_2 \left(\overline{\mathcal{R}}, \overline{\mathfrak{f}}\right)\}\tag{1}$$

=

where ◦ is the Hadamard product, *sum* is the operator that sums all the elements in a matrix, *R* is the rotation matrix (computed as a function of the three Euler angles) and *t* is the translation vector.

The Euler angles and the components of the vector *t* that make the function *fobj* minimum are the searched values, namely the extrinsic parameters. It is to be outlined that the matrix = *M*1, once computed, remains fixed in the optimization process described in Equation (1) and so only one contour extraction operation is required.

The point (1) can be done either in an automatic way (exploiting some well-known edge detection or background extraction algorithm) or manually. The background extraction also permits us to avoid the possible gross error of assigning to the 3D points, if the corresponding pixels are very near to the edges, the background temperature.

For the point (2), the function *fl* has to be properly chosen, so to assure that the objective function minimum corresponds to the desired situation, namely that the 3D projected points completely fill up the zone inside the contour without overflowing outside the contour.

 3.

The function *fl* utilized is of Gaussian type:

*fl*(*x*) = *<sup>c</sup>*1*e*<sup>−</sup>*c*2*x*<sup>2</sup> if *x* ≥ 0 (layer *x* inside the contour or the contour itself if *x* = 0) *fl*(*x*) = *<sup>c</sup>*1*e*<sup>−</sup>*c*2*x*<sup>2</sup> –*c*3 if *x* < 0 (layer *x* outside the contour) (*x*Z,positiveconstants).

*c*1,*c*2,*c*3 Anexampleoftheapplication ofthefunction*fl*isshowninFigure

**Figure 3.** Example of the application of the function *fl* on the layers map (with *c1* = *c2* = 1, *c3* = *e*). The object contour is outlined in yellow.

The *c*3 constant is important, because it permits us to shift the curve of the external layers so that the function *fl* has negative values there. This is necessary to avoid situations in which the minimum of the *fobj* corresponds to the filling of zones incorporating the contour, but it is larger than the contour itself (in other words, to avoid that the projected points "overflow" the contour). As can be clearly seen in Figure 4, the function increases when approaching the object contour (*x* = 0) from both negative and positive values of *x*. This is another requirement, that the function *fl* has to meet to gran<sup>t</sup> the proper convergence of the function *fobj*. Apart from these requirements, one has a certain freedom

in the choice of the function *fl*. In fact, the aforementioned Gaussian type function is the result of several experimental evaluations of possible functions. However, different choices of *fl* can be further investigated in future works, in order to improve the speed and robustness of this calibration method.

**Figure 4.** Graph of the function *fl* (considering 500 internal and external layers) with the actual constant values utilized (*c1*= 0.01, *c2*= 0.001, *c3*= 0.011).

To minimize the objective function (1), with six independent variables (three Euler angles and three components of the translation vector *t*), the Matlab optimization toolbox is exploited, in particular the function "GlobalSearch".
