This section presents the algorithms that are used in the experiments further on. First, it describes the marker detection algorithms ARUco, ARUco3, and AprilTag2, then the image-improving algorithms contrast-limited adaptive histogram equalization (CLAHE), deblurring, white balancing, and marker-based underwater white balancing, and finally, our new marker detection algorithm Underwater ARUco (UWARUco).
2.1. ARUco, ARUco3, and AprilTag2
Marker detecting algorithms ARUco, ARUco3, and AprilTag2 follow the general structure described in
Figure 1. First, the incoming image is thresholded using global or local adaptive methods to obtain a simplified binary image. In the second step, this image is searched for square polygons that become candidates for markers. In the third step, an identification code is extracted from the inner area of each candidate, and in the fourth step, the candidate is discarded if its code does not belong to the set of valid codes. Finally, the position of the marker is computed from its corners or edges.
The ARUco detector [
9] was designed to run fast and to recognize marker codes reliably. It uses an adaptive algorithm to threshold the input image, which computes the threshold value for each pixel as an average of the surrounding pixels. To detect marker-like shapes, it finds all contours and filters out those that do not represent square polygons (small contours, non-polygonal contours, etc.). After the squares are detected, it unprojects them to remove perspective distortion, thresholds them again (this time using the Otsu threshold), and obtains the inner code, which is checked with a dictionary to remove errors. If correct, the corners of these squares are used to compute their relative position to the camera.
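The adaptive mean thresholding used by ARUco can be illustrated with a minimal numpy sketch. This is not the optimized OpenCV implementation: it uses a single window size and a naive per-pixel loop, with an illustrative constant of 7.

```python
import numpy as np

def adaptive_threshold(img, win=3, c=7):
    # Mark a pixel as foreground (255) if it is darker than the mean of its
    # (2*win+1) x (2*win+1) neighborhood minus a constant c; the constant
    # suppresses contours created by small noise.
    h, w = img.shape
    padded = np.pad(img.astype(float), win, mode="edge")
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            mean = padded[y:y + 2 * win + 1, x:x + 2 * win + 1].mean()
            out[y, x] = 255 if img[y, x] < mean - c else 0
    return out

# Uneven lighting: a horizontal gradient with a darker square on top of it.
base = np.tile(np.linspace(50, 200, 40), (40, 1))
img = base.copy()
img[15:25, 15:25] -= 80
binary = adaptive_threshold(img, win=3, c=7)
```

Because the threshold is local, the dark square's border is detected (pixels near it are darker than their neighborhood) even though no single global threshold separates the square from the darker end of the gradient.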
The ARUco3 detector [
10] is based on the ARUco detector and was improved to run fast on high-resolution images. There are two main differences between this algorithm and ARUco. The first is that it replaces adaptive thresholding with simpler global thresholding, which is faster, but less robust to uneven lighting. The second is that it scales the image down so that its size is still sufficient to detect and recognize markers, but the detection runs faster than on the original image. The implementation provided by the authors comes in three versions: Normal, Fast, and VideoFast. The Normal version uses adaptive thresholding like ARUco and therefore should be more robust to uneven illumination. The Fast version uses global thresholding and applies the Otsu method to an image region containing a marker to compute the threshold for the next frame (or, if no marker is available, it chooses the threshold randomly). The VideoFast version is like the Fast version, but it additionally assumes that markers in a frame have approximately the same size as the markers in the previous frame and optimizes the scaling accordingly.
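The Otsu method mentioned above selects a global threshold by maximizing the between-class variance of the two resulting intensity classes. A compact sketch, assuming an 8-bit grey-scale image:

```python
import numpy as np

def otsu_threshold(img):
    # Otsu's method: pick the threshold t that maximizes the between-class
    # variance w0 * w1 * (mean0 - mean1)^2 of the dark/bright classes.
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0, sum0 = 0.0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t  # pixels <= best_t belong to the dark class

# A clearly bimodal image: 100 dark pixels (40) and 100 bright pixels (200).
img = np.concatenate([np.full(100, 40), np.full(100, 200)])
img = img.astype(np.uint8).reshape(10, 20)
t = otsu_threshold(img)
```

For such a bimodal histogram, any threshold between the two modes separates the classes perfectly, which is why a single global Otsu threshold works well once a marker region dominates the image.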
The AprilTag2 detector [
12] also uses an adaptive algorithm for thresholding, but instead of using the average of all surrounding pixels, it searches these pixels for the lowest and highest intensities and chooses the threshold as the average of these two intensities. Then, it segments the binary result, fits quads, recovers codes, and uses a hash table to check whether the code is correct and the quad is a marker.
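The min/max variant of local thresholding can be sketched as follows. The window size, the contrast limit, and the "skip" value 127 for low-contrast neighborhoods are illustrative choices of ours, only loosely following AprilTag2's behavior:

```python
import numpy as np

def minmax_threshold(img, win=2, min_contrast=20):
    # Threshold each pixel at the midpoint between the lowest and highest
    # intensity in its neighborhood; neighborhoods with too little contrast
    # are marked 127 ("skip"), since no reliable edge is present there.
    h, w = img.shape
    padded = np.pad(img, win, mode="edge")
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + 2 * win + 1, x:x + 2 * win + 1]
            lo, hi = int(patch.min()), int(patch.max())
            if hi - lo < min_contrast:
                out[y, x] = 127
            elif img[y, x] > (lo + hi) / 2:
                out[y, x] = 255
    return out

# A vertical step edge between a dark (30) and a bright (220) region.
img = np.full((10, 10), 30, dtype=np.uint8)
img[:, 5:] = 220
out = minmax_threshold(img)
```

Near the edge, the min/max midpoint cleanly splits the two sides; far from any edge, the neighborhood has no contrast and the pixel is skipped rather than classified.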
2.2. Real-Time Algorithms Improving Underwater Images
In our previous work [
35], we discussed the possibility of using real-time image-improving algorithms to increase the number of detected markers. The paper compared four algorithms, CLAHE, deblurring, white balancing (WB), and marker-based underwater white balancing (MBUWWB), which are also compared in this paper.
Contrast-limited adaptive histogram equalization (CLAHE) [
50] is based on equalizing image histograms. At each pixel, it computes a histogram of its surroundings, rearranges it to avoid unnatural changes in contrast, and equalizes it to obtain the new intensity. Deblur [
51] (also known as deblurring or the unsharp mask filter) stresses edges by removing low frequencies from the original image, which can be described by the equation I_out = I + w · (I − G_σ ∗ I), where I is the input image, G_σ ∗ I is the image blurred by a Gaussian filter with standard deviation σ, and w is the weight of the edge-enhancing term.
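A minimal numpy sketch of this unsharp-mask operation, assuming the low-pass component is a separable Gaussian blur G_σ:

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    # 1-D normalized Gaussian kernel.
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def unsharp_mask(img, sigma=4.0, w=4.0):
    # I_out = I + w * (I - G_sigma * I): subtract the low-pass (blurred)
    # image to isolate high frequencies (edges), then add them back weighted.
    k = gaussian_kernel(sigma, radius=int(3 * sigma))
    pad = len(k) // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    # Separable blur: convolve rows, then columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, "valid"), 1, p)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, "valid"), 0, blurred)
    return img + w * (img - blurred)

# A step edge: sharpening produces the characteristic over/undershoot.
img = np.zeros((20, 20))
img[:, 10:] = 100.0
out = unsharp_mask(img, sigma=2.0, w=1.0)
```

The overshoot next to edges is what makes thin marker borders stand out more clearly before thresholding.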
White balancing algorithms change the colors in the image to look more natural. There are many white-balancing algorithms, but for performance reasons, the authors in [
35] chose a simple algorithm from [
52]. This algorithm computes a histogram of the input image, removes the pixels below the p_low-th percentile (the darkest pixels) and above the p_high-th percentile (the brightest pixels), and changes the colors to stretch the rest of the histogram linearly. Its adaptation for marker-based tracking, marker-based underwater white balancing, also described in [
35], does the same, but computes the initial histogram only over areas that contain markers.
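The stretch can be sketched for one color channel as follows; the parameter names p_low and p_high are ours, and np.percentile stands in for the histogram computation:

```python
import numpy as np

def percentile_stretch(channel, p_low=2, p_high=99):
    # Discard intensities below the p_low-th and above the p_high-th
    # percentile, then stretch the remaining range linearly to 0..255.
    lo = np.percentile(channel, p_low)
    hi = np.percentile(channel, p_high)
    if hi <= lo:
        return channel.astype(np.uint8)
    out = (channel.astype(float) - lo) * 255.0 / (hi - lo)
    return np.clip(out, 0, 255).astype(np.uint8)

# A channel with intensities 0..100 gets stretched to the full 0..255 range.
ch = np.arange(101)
out = percentile_stretch(ch, 2, 99)
```

For the marker-based variant, the same function would be fed only the pixels of image regions that contain markers, so the stretch is calibrated to the markers' black and white areas. Applying it per channel also compensates the color cast of water, which attenuates channels unevenly.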
Many of these algorithms have parameters that influence their behavior. In the experiments presented in this paper, we use the same parameters as in [
35]: a clip limit of 2 for CLAHE, σ = 4 and w = 4 for deblurring, and p_low = 2 and p_high = 99 for white balancing and marker-based underwater white balancing.
2.3. Detection of Markers under Water
Underwater ARUco (UWARUco) is an adaptation of the ARUco algorithm [
9] for underwater environments. The workflow of the original ARUco algorithm is shown in
Figure 2a. It starts with an input grey-scale image, which is thresholded in the
Threshold step and searched for contours in the
Find Contours step three times in parallel threads, each time with different parameters of adaptive thresholding. The original algorithm uses various sizes of the window over which the threshold is computed (by default, the sizes are 3, 13, and 23 pixels), and additionally, it decreases this threshold by a constant to suppress contours created by small noise (by default, this constant is seven). The contours found in each thresholded image are merged in the
Merge Contours step to represent each marker with only one contour no matter how many thresholded images the contour is found in, and finally, in the
Identify Marker step, the original image is thresholded again using the Otsu method to obtain the marker code that is identified. The algorithm is described in more detail in [
9].
We analyzed the results presented in [
35] to investigate which steps are influenced by image improving algorithms when the number of detected markers increases. In
Figure 3a, we see an image of a marker taken in bad visibility conditions under water. When it is thresholded by ARUco, the marker’s border may get disconnected (see
Figure 3b), and the marker is not recognized as a rectangular object. It was found that all four tested image-improving algorithms increased the contrast of the image, and additionally, deblur also acted as another blur applied to the image before thresholding. The same effect can be achieved by changing the parameters of the
Threshold step to lower the constant decreasing the threshold and to increase the threshold window size, as is shown in
Figure 3c, where the border stays connected when the threshold is not decreased. This change of parameters is not enough, since image-improving methods also affect the image that is thresholded by the Otsu method in the
Identify Marker step. To solve this issue, the
Identify Marker step is moved between the
Find Contours step and the
Merge Contours step (here renamed
Merge Markers), and the identification is based on the thresholded image from the
Threshold step instead of performing an additional thresholding. These changes in the ARUco workflow form the base of our UWARUco algorithm. We call this algorithm the
Base version of UWARUco and show its workflow in
Figure 2b. Preliminary experiments showed that optimal window sizes for thresholding were 10, 20, and 40 pixels, and the constant that decreased the threshold was lowered to zero, i.e., the best results were obtained when the threshold was not decreased.
The Base version of UWARUco detects more markers in underwater videos than the original ARUco, as will be shown in
Section 4 and
Section 5.1. However, its processing time is high due to the increased number of contours that are found in the thresholded images and must be processed by the detector. To improve the processing speed of the detection, the algorithm is extended with a binary mask and a filter that are applied on the thresholded image before the contours are detected, to simplify the image and reduce the number of contours. We call this algorithm the
Masked version of UWARUco; its workflow is shown in
Figure 2c. This algorithm computes the binary mask from the original image in the
Compute Mask step. This step requires parameters derived from the previous video frame and creates a mask that is applied on the thresholded images in the
Masking steps. Before contours are detected in the
Find Contours step, the masked images are filtered in the
Filtering steps to remove very small objects that appear in the image due to noise. After the results are merged in the
Merge Markers step, the parameters for the next frame are derived from the input image and the result of detection in the
Mask Feedback step.
The mask computed in the
Compute Mask step identifies parts of the image that do not contain markers and is composed of two submasks: the brightness mask (
Figure 4a) and the noise mask (
Figure 4b). The brightness mask detects parts of the image with very high intensity, since the brightest pixels are often the white areas of markers, while the noise mask removes areas that do not contain strong edges, keeping only those parts that may represent edges of markers. The final mask is created from parts that are in both submasks, i.e., parts that are both very bright and contain a strong edge; see
Figure 4c. The computation of the mask is described in Algorithm 1. First, the algorithm computes the minimum and maximum of intensities for blocks of 4 × 4 pixels, which results in images Min and Max that are in each dimension four times smaller than the original image. These images are blurred to spread the minimum and maximum of each pixel into its neighborhood by taking the minimum and maximum of a small region of surrounding pixels. This blur is very important for keeping the contours of markers connected: without it, the mask would not contain areas where 4 × 4 blocks are separated by marker contours, because in such cases, the blocks contain no strong edge, and only one of the blocks is very bright. After blurring, the Min image is subtracted from the Max image, and the difference is stored in the Diff image. Finally, the Max image is thresholded by comparing it with the value t_b to obtain the brightness mask, and similarly, the difference image Diff is compared with the value t_n to obtain the noise mask. At the end, both masks are ANDed together to obtain the final mask, which is applied in the
Masking step. The effect of masking is illustrated in
Figure 4d.
Algorithm 1: Pseudocode of the algorithm for computing the brightness mask, the noise mask, and the final mask.
Input: grey-scale image I whose mask is to be computed, threshold t_b for the brightness mask, threshold t_n for the noise mask
Output: brightness mask M_b, noise mask M_n, and final mask M
Min ← image of one fourth of the size of I with minimums of 4 × 4 blocks of pixels of I;
Max ← image of one fourth of the size of I with maximums of 4 × 4 blocks of pixels of I;
Min ← minimum of surrounding pixels of Min;
Max ← maximum of surrounding pixels of Max;
Diff ← Max − Min;
M_b ← (Max ≥ t_b);
M_n ← (Diff ≥ t_n);
M ← M_b AND M_n;
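Algorithm 1 can be sketched in numpy as follows; the 3 × 3 min/max blur neighborhood is an assumption of this sketch:

```python
import numpy as np

def compute_mask(img, t_b, t_n):
    # Block-wise min/max over 4x4 blocks -> quarter-size Min/Max images.
    h, w = img.shape
    blocks = img.reshape(h // 4, 4, w // 4, 4)
    mn = blocks.min(axis=(1, 3)).astype(int)
    mx = blocks.max(axis=(1, 3)).astype(int)

    def neighborhood(a, fn):
        # "Blur": take fn (min or max) over each pixel's 3x3 neighborhood,
        # so the mask also covers blocks split by a marker contour.
        p = np.pad(a, 1, mode="edge")
        return fn([p[y:y + a.shape[0], x:x + a.shape[1]]
                   for y in range(3) for x in range(3)], axis=0)

    mn = neighborhood(mn, np.min)
    mx = neighborhood(mx, np.max)
    diff = mx - mn
    brightness_mask = mx >= t_b  # very bright somewhere nearby
    noise_mask = diff >= t_n     # contains a strong edge
    return brightness_mask & noise_mask

# A bright/dark marker-like patch on a flat gray background.
img = np.full((16, 16), 100, dtype=np.uint8)
img[0:8, 0:4] = 250
img[0:8, 4:8] = 10
mask = compute_mask(img, t_b=200, t_n=100)
```

Note how the uniformly bright block and the uniformly dark block next to it both end up in the mask only because the blur spreads each block's extremes into its neighbors; without it, neither block alone would pass both tests.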
The computation of the thresholds t_b and t_n is based on areas with detected markers and is done after the detection in the Mask Feedback step. The procedure is described in Algorithm 2. The images Max and Diff that are used to compute the masks are inspected at the pixels of the contours of found markers, and the lowest values at these pixels are used as thresholds. Such values would be sufficient as thresholds for the current frame, but to handle small changes in lighting between frames, they are decreased by a small constant to obtain the actual thresholds for the following frame (this constant was chosen after several experiments with different values). If there are no contours, the thresholds are set to zero to disable masking in the following frame. Similarly, the thresholds are also set to zero before processing the very first frame of the video.
Algorithm 2: Pseudocode of the algorithm for computing thresholds for the brightness mask and the noise mask.
Input: images Max and Diff of the current frame, contours of detected markers
Output: thresholds t_b and t_n for the following frame
if no contours were found then
    t_b ← 0; t_n ← 0;
else
    t_b ← minimum of Max at the pixels of the contours, decreased by a small constant;
    t_n ← minimum of Diff at the pixels of the contours, decreased by a small constant;
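The feedback step can be sketched as follows; the decrement margin is an assumed value standing in for the experimentally chosen constant:

```python
import numpy as np

def mask_feedback(mx, diff, contour_pixels, margin=10):
    # Derive next-frame thresholds from the quarter-size Max and Diff
    # images at the pixels of detected marker contours. The margin absorbs
    # small lighting changes between consecutive frames.
    if len(contour_pixels) == 0:
        return 0, 0  # no markers found: disable masking in the next frame
    ys, xs = zip(*contour_pixels)
    t_b = int(mx[ys, xs].min()) - margin
    t_n = int(diff[ys, xs].min()) - margin
    return max(t_b, 0), max(t_n, 0)

# Two contour pixels in the top row; the darker bottom row is ignored.
mx = np.array([[250, 240], [100, 90]])
diff = np.array([[200, 180], [5, 3]])
thresholds = mask_feedback(mx, diff, [(0, 0), (0, 1)])
```

Because the thresholds are the minima observed on actual marker contours (minus the margin), the mask in the next frame is guaranteed to keep any region at least as bright and as edgy as the markers just seen.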
After applying the mask on the thresholded image, the result is filtered in the
Filtering step using a small median filter to remove small objects that appear due to the presence of noise. This filter removes pixel-sized objects and closes small holes, as illustrated in
Figure 4e. Since the input image is binary, the median can be computed quickly by taking the mean of the surrounding pixels and thresholding it at one half.
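Assuming a 3 × 3 filter window (the exact size is not fixed here), the median-via-mean trick looks like this: on a 0/1 image, the median of a window is simply the majority vote, which equals the window mean thresholded at one half.

```python
import numpy as np

def binary_median_3x3(binary):
    # Median of a 0/1 image over a 3x3 window == majority vote ==
    # window mean thresholded at 0.5.
    p = np.pad(binary.astype(float), 1, mode="edge")
    s = np.zeros(binary.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            s += p[dy:dy + binary.shape[0], dx:dx + binary.shape[1]]
    return (s / 9.0 > 0.5).astype(np.uint8)

# A lone noise pixel is removed; a one-pixel hole is closed.
speck = np.zeros((7, 7), dtype=np.uint8)
speck[3, 3] = 1
hole = np.ones((7, 7), dtype=np.uint8)
hole[3, 3] = 0
```

Accumulating nine shifted copies is just an unrolled box filter, so the whole operation is a handful of vectorized additions and one comparison, with no sorting at all.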