This section presents the results obtained from the research. Our experiments validate the need for routine calibration assessment, since environmental factors can degrade calibration quality over time. We conclude by underscoring the differences in calibration requirements between stereo and NIR systems, emphasising the need for specialised approaches tailored to each domain to guarantee optimal performance in their respective applications.
Application for the Evaluation of Stereo Systems
As mentioned in the Introduction, the evaluation of the stereo system is crucial because the system’s calibration level must be known before use; a good calibration yields a superior perception of object density. The application is therefore divided into several modules: checking the calibration, validating the distance in metres to the focused object, estimating the distance in metres using colours, and comparing the results of the application running on the Myriad with the application running on the Personal Computer (PC):
- (1)
Checking the calibration: before the calibration can be checked, the calibration process itself must be performed.
Camera calibration consists of computing the camera’s intrinsic and extrinsic parameters.
The first step in the camera calibration process is obtaining the set of images used to determine the intrinsic and extrinsic parameters of the camera. The calibration is performed using a tool from Continental Automotive (Hannover, Germany). For this, we need a pattern, which is a 9 × 6 chessboard. For a good calibration, it is essential to take the images from different distances and at different angles, as in Figure 3.
The next step is the determination of the camera’s intrinsic parameters by detecting the corners of the pattern. Figure 4 shows only two of the set of images needed to determine the intrinsic parameters.
The extrinsic parameters are determined by detecting corresponding point pairs in the images taken by the two cameras, left and right (Figure 5).
Step 3 is the most important. Now that we have the intrinsic and extrinsic parameters of the camera, the captured images can be rectified. This is achieved by bringing the epipolar lines to the same level, as can be seen in Figure 6: the epipolar lines are at the same level, indicating that we now have a calibrated camera. Once we have a calibrated camera, as indicated at the beginning of this section, its calibration level must be assessed. The following description therefore presents the application module through which the calibration verification is carried out. At this stage, the pattern must be shown to the camera once more, ensuring that all the corners of the chessboard are visible. The process relies on precisely determining the coordinates of each corner of every square of the chessboard along the x and y axes.
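As a point of reference, the rectification step described above can be reproduced with standard tools. The sketch below uses OpenCV rather than the Continental tool employed in this work, so the function names and flow are illustrative only; it assumes the intrinsic matrices (K1, K2), distortion coefficients (D1, D2), and extrinsic parameters (R, T) have already been estimated.

```cpp
// Illustrative rectification sketch (OpenCV stand-in for the calibration tool).
#include <opencv2/opencv.hpp>

void rectifyPair(const cv::Mat& leftRaw, const cv::Mat& rightRaw,
                 const cv::Mat& K1, const cv::Mat& D1,   // left intrinsics / distortion
                 const cv::Mat& K2, const cv::Mat& D2,   // right intrinsics / distortion
                 const cv::Mat& R,  const cv::Mat& T,    // extrinsics: rotation, translation
                 cv::Mat& leftRect, cv::Mat& rightRect)
{
    cv::Size size = leftRaw.size();
    cv::Mat R1, R2, P1, P2, Q;

    // Compute rectification transforms so that the epipolar lines become
    // horizontal and lie on the same rows in both views.
    cv::stereoRectify(K1, D1, K2, D2, size, R, T, R1, R2, P1, P2, Q);

    cv::Mat map1x, map1y, map2x, map2y;
    cv::initUndistortRectifyMap(K1, D1, R1, P1, size, CV_32FC1, map1x, map1y);
    cv::initUndistortRectifyMap(K2, D2, R2, P2, size, CV_32FC1, map2x, map2y);

    // Warp both captured images into the rectified geometry.
    cv::remap(leftRaw,  leftRect,  map1x, map1y, cv::INTER_LINEAR);
    cv::remap(rightRaw, rightRect, map2x, map2y, cv::INTER_LINEAR);
}
```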
First, it is necessary to obtain the first pair of pictures so that we can determine the coordinates xL and yL in the left picture and the coordinates xR and yR in the right picture. The first step within the calibration evaluation module (calibration check) is to detect the corners of every square in the left and right pictures as points PL(xL, yL) and PR(xR, yR). The coordinates of each square corner are saved in a vector of points for each image, left and right.
So, we find the coordinates of each corner of the pattern and save them in the vectors mentioned above. The next step is the conversion of the coordinate values of each corner into integer values. We then check the difference between the x and y coordinates of each pair of detected corners. The error is the absolute difference between the two points; the possible error values are 0, 1, 2, or a value greater than 2, which we denote by N. We declare a variable to store the count for each error value, so that we have err_0, err_1, err_2, and err_N. In err_0, we store the number of points with an error value of 0; i.e., the pixel in the right picture is at exactly the same coordinates as the pixel in the reference (left) picture and is not offset in any direction. err_1 stores the number of points with an error value of 1, meaning that the point in the right picture is offset by one position to the left, right, up, or down with respect to the reference picture. err_2 stores the number of points for which PR(xR, yR) is offset by two positions from PL(xL, yL). Any offset greater than 2 is counted as N, meaning that the point is offset by at least three positions.
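A minimal sketch of this error-counting step is given below. The text does not name the corner detector or state exactly how the x and y differences are combined, so the OpenCV chessboard detector and the choice of taking the larger of the two absolute offsets are assumptions.

```cpp
// Sketch of the per-pair corner-offset counting (err_0, err_1, err_2, err_N).
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cstdlib>

struct PairErrors { int err_0 = 0, err_1 = 0, err_2 = 0, err_N = 0; };

PairErrors countCornerErrors(const cv::Mat& left, const cv::Mat& right,
                             cv::Size board = cv::Size(9, 6))       // 9 x 6 chessboard
{
    std::vector<cv::Point2f> cornersL, cornersR;                    // P_L and P_R vectors
    bool okL = cv::findChessboardCorners(left,  board, cornersL);
    bool okR = cv::findChessboardCorners(right, board, cornersR);

    PairErrors e;
    if (!okL || !okR) return e;                                     // pattern not fully visible

    for (size_t i = 0; i < cornersL.size(); ++i) {
        // Convert the coordinates to whole values, then take the absolute
        // offsets between the right corner and its left (reference) corner.
        int dx = std::abs(cvRound(cornersR[i].x) - cvRound(cornersL[i].x));
        int dy = std::abs(cvRound(cornersR[i].y) - cvRound(cornersL[i].y));
        int err = std::max(dx, dy);          // assumption: combine offsets as the larger one

        if      (err == 0) ++e.err_0;        // same coordinates as the reference pixel
        else if (err == 1) ++e.err_1;        // offset by one position
        else if (err == 2) ++e.err_2;        // offset by two positions
        else               ++e.err_N;        // offset by three positions or more
    }
    return e;
}
```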
Figure 7 shows a schematic representation of a limited number of pixels from a picture, with a zero-position offset between the corner in the right picture and the one in the reference picture.
Figure 8 represents an offset of one position; the corner may lie at either end of the red lines.
Figure 9 represents an offset of two positions; the corner may be positioned at one of the red lines (i.e., at an offset equal to 2).
Figure 10 represents an offset of at least three positions in any of the four directions: up, down, left, or right.
The calibration evaluation is performed in real time; all that is needed is to point the pattern toward the camera and start the evaluation module. Because we want very high accuracy, we calculate the average of the errors over a set of 20 shots of two pictures each, left and right. The final result is the average of the errors calculated for each pair of images.
Then we sum the error values for each pair of pictures and, once the final results are available, we can decide on the calibration of the cameras. Upon completion of this module and the acquisition of the image collection, a report is automatically generated (refer to Figure 11) detailing the number of errors associated with each pair of images. The report concludes with a summary of the average errors across the 20 pairs of images, accompanied by an interpretation of the findings. Depending on the results, the messages can be “Very good calibration!”, “Good calibration!”, or “Weak calibration!”.
These messages are generated automatically by the code. We consider a calibration “Very good calibration!” if the average number of errors of 0, i.e., err_0, is greater than 50 and the average number of errors of 1, i.e., err_1, is greater than 4. “Good calibration!” is when the average of the zero errors and one errors, i.e., err_0 + err_1, is greater than 52. If the sum err_0 + err_1 is less than 52, we have the “Weak calibration!” type. A very good or good calibration leads to a clear perception of the densities of the objects in the picture, while a weak calibration leads to an ambiguous perception of the objects in the picture. To illustrate the importance of calibration, Figure 12 presents the results of experiments comparing a calibrated camera with a less calibrated one.
- (2)
Validating the distance from the camera to the object
The first step in developing the module for validating the distance from the camera to the focused object is the creation of the disparity map.
The disparity map represents the correlation between the pixel positions in the left picture and the right picture. The left picture is the reference picture, and the correlation between the reference (left) picture and the right picture is found as follows: we select a window from the left picture, the centre of the window being the pixel whose correspondence we want to find in the right picture. This window is compared with several windows from the right picture, starting with the window at the same location (disparity 0) and moving away to the left, one pixel at a time, until the best-matching position (the lowest matching cost) is found; the size of this shift represents the disparity value for that pixel. This process is performed for all the pixels in the left picture (see Figure 12).
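The window-matching procedure just described corresponds to a basic block-matching search; a simplified sketch is shown below, using a sum-of-absolute-differences cost. The window size and disparity range are illustrative values, not the settings used in the application.

```cpp
// Naive block-matching sketch: for each left-image pixel, slide a window to the
// left in the right image and keep the shift with the lowest SAD cost.
#include <opencv2/opencv.hpp>
#include <climits>
#include <cstdlib>

cv::Mat blockMatchDisparity(const cv::Mat& leftGray, const cv::Mat& rightGray,
                            int maxDisparity = 64, int halfWin = 3)
{
    cv::Mat disparity = cv::Mat::zeros(leftGray.size(), CV_8UC1);

    for (int y = halfWin; y < leftGray.rows - halfWin; ++y) {
        for (int x = halfWin; x < leftGray.cols - halfWin; ++x) {
            int bestDisp = 0;
            int bestCost = INT_MAX;

            // Start at the same location (disparity 0) and move left one pixel at a time.
            for (int d = 0; d <= maxDisparity && x - d >= halfWin; ++d) {
                int cost = 0;                       // sum of absolute differences
                for (int wy = -halfWin; wy <= halfWin; ++wy)
                    for (int wx = -halfWin; wx <= halfWin; ++wx)
                        cost += std::abs(leftGray.at<uchar>(y + wy, x + wx) -
                                         rightGray.at<uchar>(y + wy, x - d + wx));
                if (cost < bestCost) { bestCost = cost; bestDisp = d; }
            }
            disparity.at<uchar>(y, x) = static_cast<uchar>(bestDisp);
        }
    }
    return disparity;
}
```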
A frequent approach for real-time systems, given the limited hardware resources, is the Semi-Global Matching (SGM) algorithm proposed by Hirschmuller [15]. The original approach uses intensity differences, mutual information, and an optimisation method integrating different paths through the image (Figure 12 shows a disparity map before applying the Semi-Global (Block) Matching algorithm).
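For reference, the core of Hirschmuller's method is the cost-aggregation recurrence evaluated along several path directions r; its standard form is reproduced below (the exact variant implemented in this work may differ):

$$L_r(\mathbf{p},d) = C(\mathbf{p},d) + \min\Big(L_r(\mathbf{p}-\mathbf{r},d),\; L_r(\mathbf{p}-\mathbf{r},d-1)+P_1,\; L_r(\mathbf{p}-\mathbf{r},d+1)+P_1,\; \min_{i} L_r(\mathbf{p}-\mathbf{r},i)+P_2\Big) - \min_{k} L_r(\mathbf{p}-\mathbf{r},k)$$

where C(p, d) is the pixel-wise matching cost, P1 and P2 penalise small and large disparity changes, respectively, and the aggregated cost is obtained by summing L_r over all path directions.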
The Semi-Global Matching algorithm is applied to refine the result and obtain a much smoother density. To obtain, from the disparity map, the smooth object density that can be seen in Figure 13, it is necessary to transform the disparity map into a depth map. This is possible using Formula (1):
$$\text{depth} = \frac{\text{baseline} \cdot \text{HPixels}}{2\,\tan\!\left(\frac{\text{HFOV}}{2}\right)\cdot \text{disp}} \tag{1}$$
where baseline is the displacement between the left and right cameras;
disp is the current pixel value in the disparity map;
HFOV is the horizontal field of view;
HPixels is the width resolution (640 pixels).
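A direct transcription of Formula (1) is sketched below; the function name and placeholder parameter values are illustrative, and the baseline and HFOV must be those of the actual stereo head.

```cpp
// Sketch of Formula (1): converting a disparity value into a distance in metres.
#include <cmath>

double disparityToDepth(double disp,            // current pixel value in the disparity map
                        double baselineM,       // displacement between the two cameras [m]
                        double hfovDeg,         // horizontal field of view [degrees]
                        double hPixels = 640.0) // width resolution [pixels]
{
    if (disp <= 0.0) return 0.0;                // no valid match for this pixel

    const double kPi = 3.14159265358979323846;

    // Focal length expressed in pixels, derived from the horizontal field of view.
    double focalPx = (hPixels / 2.0) / std::tan((hfovDeg * kPi / 180.0) / 2.0);

    return baselineM * focalPx / disp;          // depth in metres
}
```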
With the disparity map generated and the procedure for forming the density map understood, the steps taken to complete this module can be outlined. A defined zone of interest was established, and the distance from the camera to the object contained within this zone was calculated. It is important to note that the distance was not computed over the entire map; although this would have allowed the depth to be calculated across the whole image, it did not align with our objective, as it would have yielded an average depth for the entire picture rather than the distance from the camera to a particular object.
We defined the zone of interest as a square added to the disparity map in order to determine whether the object is correctly framed and whether the distance is perceived accurately (Figure 13). An analysis of this module is discussed in the next section, which presents the results. Calculating the distance from the camera to the object is closely connected with the rest of the paper: if the calibration process is incorrectly executed, the object density in the image is not smooth enough, which leads to errors when calculating the distance to the object. The distance reported by the application is shown near the square representing the zone of interest in the disparity image.
Figure 14 provides an example of this module within the stereo system evaluation application.
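A minimal sketch of the zone-of-interest measurement is given below, assuming an 8-bit disparity map and the Formula (1) conversion; the ROI handling in the actual application is not detailed in the text, so averaging over the valid pixels of the zone is an assumption.

```cpp
// Sketch: average the disparity inside the square zone of interest and convert
// it to a distance in metres, which is then reported next to the square.
#include <opencv2/opencv.hpp>
#include <cmath>

double roiDistanceMetres(const cv::Mat& disparity8u,    // 8-bit disparity map
                         cv::Rect roi,                  // square zone of interest
                         double baselineM, double hfovDeg)
{
    cv::Mat zone = disparity8u(roi);
    double meanDisp = cv::mean(zone, zone > 0)[0];      // ignore invalid (zero) pixels
    if (meanDisp <= 0.0) return 0.0;

    const double kPi = 3.14159265358979323846;
    double focalPx = (disparity8u.cols / 2.0) /
                     std::tan((hfovDeg * kPi / 180.0) / 2.0);
    return baselineM * focalPx / meanDisp;              // distance shown near the square
}
```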
- (3)
Representing the distances as a range of colours according to the proximity of objects to the camera
The disparity map is based on intensities: objects close to the camera have lower intensities, i.e., darker values, and objects far from the camera have higher intensities, i.e., lighter values. Because the disparity map is in greyscale format, we apply colour to it, obtaining a coloured disparity map in which nearby objects are coloured red and, as the intensity increases, the palette shifts towards dark blue for distant objects. Therefore, the colours vary with the intensity (Figure 15).
Upon completing the disparity map representation, we linked it to the left image. This connection allowed us to examine both the actual image captured by the camera and the coloured disparity map, facilitating the analysis of how accurately the application reports its results. We also developed an information bar to facilitate the analysis of distances based on colour. This bar serves as a tool for evaluating the distances of objects according to their colour. We established an intensity bar covering the maximum intensity value found in the image, which is 255: in greyscale imaging, a pixel is represented using 8 bits, allowing values from 0 to 255. Consequently, our intensity bar spans the values from 0 to 255, with each column of the information bar representing a single intensity value. Moving from one column to the next, the intensity of the colour increases. This method enables a correlation between the disparity map, comprising the intensities, and the information bar formed from the equivalent intensity values. Thus, objects close to the camera appear as intensities in the beginning segment of the information bar; conversely, as the distance of the objects increases and the intensities rise, they appear toward the opposite end of the bar.
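The information bar described above can be built as a 256-column strip holding one intensity value per column; the sketch below is a minimal version, with the bar height chosen arbitrarily.

```cpp
// Sketch of the information bar: 256 columns, one greyscale intensity per column,
// later coloured with the same palette as the disparity map and resized to the
// width of the displayed images.
#include <opencv2/opencv.hpp>

cv::Mat buildInfoBar(int height = 30)
{
    cv::Mat bar(height, 256, CV_8UC1);
    for (int x = 0; x < 256; ++x)
        bar.col(x).setTo(static_cast<uchar>(x));   // intensity increases column by column
    return bar;
}
```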
In conclusion, we applied a consistent colour palette to the information bar, ensuring proper alignment between the coloured disparity map and the information bar, which allows a more accurate approximation of object distances based on colour (Figure 16). In order to more easily distinguish the colours and densities of the objects in the pictures, we applied a stretch to the disparity map, i.e., an intensification of the values in the disparity map according to a predetermined threshold: if a pixel value is smaller than the threshold, it is multiplied by a smaller scaling factor, and if it is bigger, it is multiplied by a bigger scaling factor. The stretch function was also applied to the information bar. The information bar was then concatenated to the bottom of the two pictures, the left picture and the coloured disparity map. In the lower part of the information bar, the distance in metres is added according to the corresponding colour, from blue for far away to red for very near. The final result, displayed after running the colour-based distance approximation module, can be seen in Figure 17.
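The stretch and colouring step can be sketched as below. The threshold and the two scaling factors are illustrative, and the JET palette is only a stand-in for the palette used in the application (its direction may need to be reversed so that near objects appear red).

```cpp
// Sketch of the stretch (threshold-dependent scaling) followed by colouring.
#include <opencv2/opencv.hpp>

cv::Mat stretchAndColour(const cv::Mat& disparity8u,
                         int threshold = 64,
                         double lowScale = 1.5, double highScale = 3.0)
{
    cv::Mat stretched = disparity8u.clone();
    for (int y = 0; y < stretched.rows; ++y)
        for (int x = 0; x < stretched.cols; ++x) {
            int v = stretched.at<uchar>(y, x);
            double s = (v < threshold) ? lowScale : highScale;    // smaller / bigger factor
            stretched.at<uchar>(y, x) = cv::saturate_cast<uchar>(v * s);
        }

    cv::Mat coloured;
    cv::applyColorMap(stretched, coloured, cv::COLORMAP_JET);     // greyscale -> colour palette
    return coloured;
}
```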
- (4)
Comparison of the Embedded Implementation with the PC Implementation
This application module draws a parallel between disparity maps computed at two distinct levels. One map is computed at chip level on the Myriad 2 platform, with the algorithm implemented in assembly language. In contrast, the second implementation uses the C programming language to compute the disparity map on the computer rather than on the chip. The primary distinction between the two implementations lies in the numeric representation: the algorithm executed on the Myriad 2 chip uses floating-point representation, whereas the algorithm executed on the computer does not. We created this module by placing the two disparity maps in parallel and producing, between them, a new image that displays the difference between the chip disparity map and the computer disparity map. We compared each pixel of the chip disparity map with the corresponding pixel of the computer disparity map and marked it in red if the difference exceeded the threshold set using the track bar, which allows real-time adjustment of the tolerance threshold.
Figure 18 illustrates the comparison with a tolerance of 25, meaning that pixel pairs whose values differ by 25 units or more between the two maps are marked as differences.
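The per-pixel comparison can be summarised by the sketch below; the tolerance is the value read from the track bar (25 in Figure 18), and the way non-differing pixels are rendered is an assumption.

```cpp
// Sketch of the chip-vs-PC difference image: pixels whose disparity values differ
// by at least the chosen tolerance are painted red; the others keep their grey value.
#include <opencv2/opencv.hpp>
#include <cstdlib>

cv::Mat buildDifferenceImage(const cv::Mat& chipDisp8u, const cv::Mat& pcDisp8u,
                             int tolerance)                 // e.g., 25, from the track bar
{
    cv::Mat diff(chipDisp8u.size(), CV_8UC3, cv::Scalar::all(0));

    for (int y = 0; y < chipDisp8u.rows; ++y)
        for (int x = 0; x < chipDisp8u.cols; ++x) {
            int d = std::abs(chipDisp8u.at<uchar>(y, x) - pcDisp8u.at<uchar>(y, x));
            if (d >= tolerance)
                diff.at<cv::Vec3b>(y, x) = cv::Vec3b(0, 0, 255);   // BGR red marker
            else {
                uchar v = chipDisp8u.at<uchar>(y, x);
                diff.at<cv::Vec3b>(y, x) = cv::Vec3b(v, v, v);     // unchanged pixel
            }
        }
    return diff;
}
```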