#### *3.1. Census Transform with Haar Wavelet (CTHW)*

In this section, a multi-resolution representation of the image, obtained with a Haar wavelet, is used to reduce the computational time of the census transform (CT) and of the Hamming distance operations. The flow of this method is shown in Figure 2. First, the left- and right-view images are input. A Haar wavelet transform is applied to both images to obtain the LL-, HL-, LH-, and HH-band sub-images. The LL-bands are converted by the CT, and the disparity is calculated using the Hamming distance. The HH-band of the left image is binarized and then used for path searching; a search path stops at pixels with high values in the HH-band. Finally, the disparity is modified to the disparity value that occurs most often along the search paths.
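As background for this pipeline, the basic CT and Hamming-distance matching can be sketched as follows. This is a minimal Python illustration, not the authors' implementation; the window size and the bit-packing order are assumptions.

```python
import numpy as np

def census_transform(img, win=3):
    """Census transform: compare each pixel with the other pixels in a
    win x win window and pack the comparison results into a bit string."""
    h, w = img.shape
    r = win // 2
    out = np.zeros((h, w), dtype=np.uint64)
    for y in range(r, h - r):
        for x in range(r, w - r):
            code = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the center pixel itself
                    # bit is 1 when the neighbor is darker than the center
                    code = (code << 1) | int(img[y + dy, x + dx] < img[y, x])
            out[y, x] = code
    return out

def hamming(a, b):
    """Hamming distance between two census codes (number of differing bits)."""
    return bin(int(a) ^ int(b)).count("1")
```

The matching cost between a left pixel and a candidate right pixel at some disparity is then the Hamming distance between their census codes, and the disparity with the minimum cost is kept.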

Since the LL-band is smaller than the original image, one main idea of this method is to apply the CT to the LL-bands instead of the original images, thereby reducing the computational load. However, because down-sampling may introduce errors, another main idea is to use the HH-band of the left image to modify the disparity. The path searches from a pixel in the HH-band proceed in four directions: up, down, left, and right. An example of an HH-band image is shown in Figure 3, with one point colored red. Since the stereo vision system uses two horizontally mounted cameras, the horizontal direction is the main search path. To reduce the effect of large areas without borders, the vertical direction is added to the search paths. The four search paths run from the red point in the four directions, denoted A, B, C, and D with green arrows. Each search path stops when it reaches an edge (white point). The disparity values (the output of the CT with the Hamming distance) on the paths are recorded and counted, and the modified disparity is set to the disparity value with the largest count.
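The two ingredients of this step can be sketched as follows: a one-level Haar decomposition into the four sub-bands, and the four-direction path search that replaces a pixel's disparity with the most frequent value along the paths. This is an illustrative sketch only; the Haar transform uses an unnormalized averaging form, and the binarization of the HH-band is assumed to be done beforehand (the `edge` map below).

```python
import numpy as np

def haar_bands(img):
    """One-level Haar decomposition into LL, HL, LH, and HH bands
    (unnormalized averaging form; image dimensions assumed even)."""
    a = img[0::2, 0::2].astype(float)  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2].astype(float)  # top-right
    c = img[1::2, 0::2].astype(float)  # bottom-left
    d = img[1::2, 1::2].astype(float)  # bottom-right
    ll = (a + b + c + d) / 4
    hl = (a - b + c - d) / 4
    lh = (a + b - c - d) / 4
    hh = (a - b - c + d) / 4
    return ll, hl, lh, hh

def modified_disparity(disp, edge, y, x):
    """Walk up, down, left, and right from (y, x) until an edge pixel
    (high binarized HH value) stops the path; return the disparity
    value seen most often on the paths."""
    h, w = disp.shape
    seen = [disp[y, x]]
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        cy, cx = y + dy, x + dx
        while 0 <= cy < h and 0 <= cx < w and not edge[cy, cx]:
            seen.append(disp[cy, cx])
            cy += dy
            cx += dx
    vals, counts = np.unique(seen, return_counts=True)
    return vals[np.argmax(counts)]
```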

**Figure 2.** The flowchart of the census transform with Haar Wavelet.

**Figure 3.** Example of the path searching of a (red) point.

#### *3.2. Adaptive Window Census Transform (AWCT)*

The size of the conversion window affects both the computational load and the accuracy of the CT. The larger the conversion window, the more accurate the results, but the more computational resources are consumed. In this section, the proposed method, AWCT, changes how the conversion window size is selected: instead of a fixed size for all pixels, an adaptive size is chosen for each pixel according to the boundaries around it, using edge information to select the window size.

The flowchart of AWCT is shown in Figure 4. First, edge information is obtained from the left-view image by edge detection. The conversion window size of each point is then determined from the edge information. The selected window sizes are applied to the CT, and the disparity is computed through a Hamming distance computation.

**Figure 4.** The flowchart of the adaptive window census transform (AWCT).

The conversion window sizes can be 3 × 3, 5 × 5, 7 × 7, ..., 21 × 21. After edge detection, each point becomes, in turn, the center of the candidate windows, and the number of edge points in each window is counted, together with the proportion of edge points in the window. The window size grows from small to large, and the size is selected according to the edge points the window contains. Window size selection is divided into two cases. The first case is no edge: there are no edge points in the window, so the image texture in this area is unclear or even absent, and the largest window size is used. In the other case, edge points are present in the window. Here, the proportion of edge points is recorded as the window grows from small to large; when the window size increases by one level and the proportion decreases, this is recorded as one occurrence of negative growth. The window size is selected when negative growth occurs *N* consecutive times. In this study, the value of *N* is set to 5 based on experience; this value can be set by the user for different cases.
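The selection rule above can be sketched as follows. This is one possible reading of the rule, under the assumption that the size at which the *N*-th consecutive decrease occurs is the one selected; `edge` is a boolean edge map from any edge detector.

```python
import numpy as np

def select_window(edge, y, x, sizes=range(3, 22, 2), n_neg=5):
    """Pick the conversion window size for pixel (y, x): the largest
    size if the edge-point proportion never drops n_neg times in a row
    (e.g., no edge points at all); otherwise the size at which the
    n_neg-th consecutive drop in the proportion occurs."""
    neg_run = 0
    prev = None
    for s in sizes:
        r = s // 2
        # clamp the window at the image borders
        win = edge[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
        ratio = win.sum() / win.size  # proportion of edge points
        if prev is not None and ratio < prev:
            neg_run += 1  # one more level of negative growth
            if neg_run == n_neg:
                return s
        else:
            neg_run = 0  # the run of decreases was broken
        prev = ratio
    return max(sizes)
```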

Since the conversion window size is adaptive, the window sizes of two pixels being compared may differ. When the sizes differ, the smaller window size is used to compute the Hamming distance. To make the Hamming distance comparison reasonable, the pixels are ordered counter-clockwise from the center outward. An example of this pixel order is shown in Figure 5. Even if the two windows differ in size, this order keeps the relative positions of the pixels the same, and the Hamming distance comparison runs only up to the length of the smaller window. This allows the Hamming distance to be computed as if the two windows had the same size.
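One way to realize such an ordering is to sort the window offsets by ring (distance from the center) and then by counter-clockwise angle within each ring, so that a smaller window's code is always a prefix of a larger window's code. This is an assumed realization, not necessarily the exact ordering of Figure 5 (the starting point of each ring may differ).

```python
import math

def spiral_order(win):
    """Offsets (dy, dx) of a win x win window, ordered ring by ring from
    the center outward and counter-clockwise within each ring (in image
    coordinates, where y grows downward)."""
    r = win // 2
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
               if (dy, dx) != (0, 0)]
    offsets.sort(key=lambda p: (max(abs(p[0]), abs(p[1])),        # ring
                                math.atan2(-p[0], p[1]) % (2 * math.pi)))
    return offsets

def truncated_hamming(code_a, code_b):
    """Hamming distance over only the shared prefix of two census codes
    (bit lists) that may come from different window sizes."""
    n = min(len(code_a), len(code_b))
    return sum(a != b for a, b in zip(code_a[:n], code_b[:n]))
```

Because the first eight offsets of every window are the same inner ring, truncating the longer code to the shorter length always compares pixels at identical relative positions.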


**Figure 5.** The order of pixel calculation (with a 7 × 7 window size).

#### *3.3. Adaptive Window Sparse Census Transform (AWSCT)*

According to Equation (3), the number of points to be computed increases as the window size increases. To reduce the number of operations, some points can be ignored: a sub-set of the points in the conversion window determines which points are used in the CT. This modified method is called the sparse census transform, and it is defined as [28]

$$C(p\_{xy}) = \underset{p\_{ij} \in w\_s}{\otimes} \xi(I(p\_{xy}), I(p\_{ij})) \tag{10}$$

where *w<sub>s</sub>* is the sub-set of points of the conversion window. The sparse census transform thus converts only a part of the points instead of all the points in the window. According to the results reported for the sparse census transform [28], the neighboring points are selected symmetrically, and sixteen points selected within a 7 × 7 window, as shown in Figure 6c, maximize the performance. In this paper, the same pattern as in Figure 6c is used and expanded to the other window sizes; the selected points for the different windows are shown in Figure 6. In this section, the sparse census transform is combined with AWCT. The flowchart of AWSCT is the same as that of AWCT (Figure 4), but the CT is replaced by the SCT (sparse census transform).
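The idea of Equation (10) can be sketched as follows. The 16-point pattern below is an illustrative symmetric sub-set of a 7 × 7 window, not a reproduction of the exact pattern of Figure 6c (which is defined in [28]).

```python
import numpy as np

# An illustrative symmetric 16-point sub-set w_s of a 7 x 7 window
# (a hypothetical pattern; the paper uses the pattern of Figure 6c).
OFFSETS_7x7 = [(dy, dx)
               for dy in (-3, -1, 1, 3)
               for dx in (-3, -1, 1, 3)]

def sparse_census(img, y, x, offsets):
    """Sparse census transform at (y, x): compare the center only against
    the points in `offsets` instead of every pixel in the window."""
    return [int(img[y + dy, x + dx] < img[y, x]) for dy, dx in offsets]
```

With 16 comparisons instead of 48, the per-pixel cost of a 7 × 7 CT drops to one third while the code stays symmetric around the center.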


**Figure 6.** The selected points in the windows with a sparse census transform. (**a**) AWSCT 3 × 3, (**b**) AWSCT 5 × 5, (**c**) AWSCT 7 × 7, (**d**) AWSCT 9 × 9, (**e**) AWSCT 11 × 11, (**f**) AWSCT 13 × 13, (**g**) AWSCT 15 × 15, (**h**) AWSCT 17 × 17, (**i**) AWSCT 19 × 19, and (**j**) AWSCT 21 × 21.


#### **4. Experiments and Results**

The results were compared with the ground truth data using the percentage of bad matching pixels (PoBMP) and the root-mean-squared (RMS) error [35]. The PoBMP is defined by

$$\text{PoBMP} = \frac{1}{N} \sum\_{(x, y)} \left( |d\_{\mathcal{C}}(x, y) - d\_{\mathcal{T}}(x, y)| > \delta\_d \right) \tag{11}$$

where *d<sub>C</sub>* is the disparity obtained by the proposed method, *d<sub>T</sub>* is the ground-truth disparity, *N* is the number of pixels, and *δ<sub>d</sub>* is the allowable error, which is set to 3 in this paper. The RMS can be obtained by

$$\text{RMS} = \left(\frac{1}{N} \sum\_{(x, y)} |d\_{\mathcal{C}}(x, y) - d\_{\mathcal{T}}(x, y)|^2\right)^{\frac{1}{2}} \tag{12}$$
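Both metrics are straightforward to compute from a disparity map and its ground truth; a minimal sketch, assuming both maps are float arrays of the same shape:

```python
import numpy as np

def pobmp(d_c, d_t, delta=3):
    """Percentage of bad matching pixels (Eq. 11): fraction of pixels
    whose absolute disparity error exceeds the allowable error delta."""
    return np.mean(np.abs(d_c - d_t) > delta)

def rms(d_c, d_t):
    """Root-mean-squared disparity error (Eq. 12)."""
    return np.sqrt(np.mean((d_c - d_t) ** 2))
```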

Six images (Moebius, Flowerpots, Reindeer, Cloth2, Midd1, and Baby1), provided by the Middlebury Stereo Datasets [36], were used to evaluate the performance of the proposed methods. These six images and their ground truths are shown in Figure 7.




**Figure 7.** The experimental images: sequentially, the right image, left image and ground truth. (**a**) Moebius, (**b**) Flowerpots, (**c**) Reindeer, (**d**) Cloth2, (**e**) Midd1, and (**f**) Baby1.
