1. Introduction
Three-dimensional reconstruction is one of the most important techniques in computer vision and has been widely applied in robotics, AR/VR, industrial inspection, human-computer interaction, and so on [1,2,3]. Different from temporal structured light (SL), spatial SL can achieve three-dimensional measurement with a single shot, which makes it an important branch in the field of dynamic reconstruction [4,5,6]. As spatial SL is a common reconstruction technique, its accuracy, robustness, and density are therefore of vital importance.
However, in spatial SL, there is a wide performance gap between dense but less accurate reconstruction (e.g., speckle-coded SL) and accurate but often sparser reconstruction (e.g., shape-coded SL). In speckle-coded SL, a speckle pattern is projected to attach textures onto object surfaces, and the reflected light is captured by one or two sensors [7,8,9]. Through block-by-block matching, such as with the well-known semi-global matching (SGM) algorithm, a high-density depth map can be obtained. Yet, the accuracy of speckle-coded SL is limited by the block matching itself. Some representative speckle patterns are shown in Figure 1a. Generally, these SL systems achieve only millimeter-level accuracy at a distance of 1.0 m. In contrast, in shape-coded SL [10,11,12,13,14,15], the speckles are substituted by different coding primitives, such as geometric shapes. The codeword of each feature point is determined by its own value together with those of all adjacent elements surrounding it. In this case, by designing easy-to-detect feature points, such as corners, the reconstruction accuracy can be greatly improved. Yet, only sparse feature points can be reconstructed, which greatly limits the spatial resolution. Some representative shape-coded patterns are shown in Figure 1b.
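The block matching step behind speckle-coded SL can be illustrated with a minimal sum-of-absolute-differences (SAD) search along a rectified scanline. This toy sketch only stands in for production matchers such as SGM; the function name and parameters are ours, not from the cited systems.

```python
import numpy as np

def block_match_row(left, right, y, x, block=7, max_disp=64):
    """Toy SAD block matching: find the disparity of pixel (y, x) by
    comparing a block in the left image against shifted blocks on the
    same row of the rectified right image."""
    h = block // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.float32)
    best_d, best_cost = 0, np.inf
    for d in range(min(max_disp, x - h) + 1):
        cand = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(np.float32)
        cost = np.abs(ref - cand).sum()  # sum of absolute differences
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# A synthetic "speckle" pair: the right view is the left view shifted
# by 5 pixels, so the recovered disparity should be 5.
left = np.random.default_rng(0).random((40, 60))
right = np.roll(left, -5, axis=1)
print(block_match_row(left, right, y=20, x=30))  # -> 5
```

The per-pixel exhaustive search also hints at why block matching limits accuracy: the disparity is only resolved to the nearest integer unless sub-pixel refinement is added.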
To ensure high accuracy and spatial resolution at the same time, the coding capacity of shape-coded SL needs to be greatly improved. Currently, 2D matrix-like coding strategies are widely adopted in encoding the projected pattern, e.g., the de Bruijn sequence [16], pseudorandom array [17], M-array [18], perfect map (PM) [19], and so on. Albitar et al. [12] generated a pattern based on the M-array with a coding capacity of 27 × 29, which used three different geometrical shapes, i.e., disc, circle, and dash, and the size of the coding window was 3 × 3. Yang et al. [18] improved the coding capacity to 27 × 36 with the same coding window by using six geometrical shapes, i.e., short lines, solid circles, hollow circles, triangles, stars, and diamonds. Tang et al. [13] generated a pattern based on the M-array with a coding capacity of 65 × 63, which used eight different geometrical shapes and a small coding window of 2 × 2. In our previous work [14], a pattern with a maximum coding capacity of 65 × 63 was also generated based on the M-array, which used eight different geometrical shapes, i.e., points with different arrangements, and a small coding window of 2 × 2. In another previous work of ours [15], a 36 × 48 pattern was designed based on the M-array to suit the VGA-level image capture device. Eight different geometrical shapes constituted by mahjong dots were designed, and the size of the coding window was 2 × 2.
From the above research, it can be seen that in these 2D matrix-like coding strategies, the maximum coding capacity is theoretically limited [10]. To be more specific, the maximum coding capacity C is determined by the number of coding elements q and the size r × r of the coding window: C = q^(r×r) − 1, as shown in Figure 2. In this 2D matrix, every r × r sub-matrix appears only once. In this case, a higher coding capacity can be achieved only by using more coding elements or by enlarging the coding window. However, both methods have side effects. On the one hand, incorporating more coding elements inevitably makes the decoding task more challenging. On the other hand, enlarging the coding window leads to lower decoding accuracy; additionally, occlusion and discontinuity of the scene will potentially lead to broken codewords. It is worth noting that this is also the reason why spatial SL has a much lower coding capacity than temporal SL. How to obtain a higher coding capacity with fewer coding elements and smaller coding windows is an interesting and meaningful research topic in the field of spatial SL.
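The window uniqueness property described above is easy to check by brute force. The sketch below uses our own helper names and a toy binary matrix rather than an actual M-array: it collects every r × r sub-window, verifies that none repeats, and evaluates the theoretical capacity for q = 4 elements and a 2 × 2 window.

```python
import numpy as np

def window_codewords(M, r=2):
    """Collect the codeword of every r x r sub-window of matrix M."""
    H, W = M.shape
    return [tuple(M[i:i + r, j:j + r].ravel())
            for i in range(H - r + 1) for j in range(W - r + 1)]

def is_valid_code_matrix(M, r=2):
    """In a valid 2D code matrix, every r x r sub-window appears once."""
    codes = window_codewords(M, r)
    return len(codes) == len(set(codes))

# Toy matrix whose six 2 x 2 windows are all distinct.
M = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 0]])
print(is_valid_code_matrix(M))  # -> True

# Theoretical ceiling for q elements and an r x r window: q**(r*r)
# codewords, i.e., 255 usable ones for q = 4, r = 2 once the all-zero
# word is excluded (as in an M-array).
q, r = 4, 2
print(q ** (r * r) - 1)  # -> 255
```

The exponential dependence on the window area is exactly why enlarging the window is tempting despite its side effects.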
To this end, this work developed a new pseudo-2D coding method and corresponding pattern decoding algorithms. The contributions of this paper are as follows. Firstly, a new pseudo-2D pattern generation strategy was developed, which greatly improves the coding capacity of shape-coded SL. Then, to extract the dense feature points robustly and accurately, an end-to-end corner detection method based on deep learning was developed. Finally, the pseudo-2D pattern was robustly decoded with the aid of the epipolar constraint. The paper is organized as follows. Section 2 describes the main algorithms, including the pattern design, corner detection, and feature decoding methods. Experimental results are given in Section 3, and discussions and conclusions are drawn in Section 4 and Section 5, respectively.
3. Experiments and Results
The prototype of the proposed SL system is displayed in Figure 10a. A PointGrey BFS-U3-120S4C-CS camera was used, which has a resolution of 4000 × 3000 and can work at a frame rate of up to 31 fps. The pixel size of the camera was 1.85 μm. The projector was a BENQ w1700s DLP projector, which has a resolution of 3840 × 2160 and a projection ratio (i.e., the ratio of the projector's projection distance to the projected picture's width) of 1.47~1.76:1. The calibration algorithm in [24] was applied to calibrate the SL system accurately, and the results are displayed in Table 2.
Two SL patterns with different geometric shapes were designed based on the proposed pseudo-2D coding strategy. Both patterns were generated by using the primitive polynomial h(x) = x⁴ + x² + Ax + A², with four coding elements and a coding window of size 2 × 2. In the first pattern, the coding elements were designed as square blocks with embedded geometrical shapes, and the square blocks were colored with black or white backgrounds, thereby comprising a typical checkerboard pattern. The embedded geometric shapes adopted a simple ‘L’ shape with four different rotation angles (0°, 90°, 180°, and 270°), as shown in Figure 10b. In the second pattern, speckle dots in different distributions were tucked into the blocks formed by a series of horizontal and vertical lines, as shown in Figure 10c. In both patterns, the grid corners formed by the horizontal and vertical lines were taken as the main feature points of the SL. One irreplaceable advantage of these feature corners is that they can be extracted with sub-pixel precision. Based on the coding strategy in Section 2.1, given the number of coding elements (q = 4) and the size of the coding window (2 × 2), the dimension of the pseudo-2D sequence S′ was 2 × 255. Afterward, repeating S′ W times resulted in a pattern with a theoretical maximum coding capacity of 254 × W. In this work, the value of W was empirically set to 65. Therefore, the whole coding capacity of both patterns is 16,510, which is greatly improved compared with all previously reported work, as far as we know.
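The capacity figures quoted above can be reproduced with a few lines of arithmetic. Our reading is that the 254 usable codewords per repetition come from sliding a two-column window along the 255-column M-sequence, so the variable names and the intermediate counts below are our assumptions, not the paper's notation.

```python
q = 4    # number of coding elements
r = 2    # the coding window is r x r
W = 65   # repetitions of the pseudo-2D sequence S'

seq_len = q ** (r * r) - 1     # M-sequence length: 255 columns in S'
codes_per_rep = seq_len - 1    # non-cyclic two-column windows: 254
capacity = codes_per_rep * W   # total coding capacity of the pattern

print(seq_len, codes_per_rep, capacity)  # -> 255 254 16510
```

The repetitions of S′ can share codewords because the epipolar constraint later disambiguates them, which is what makes the pseudo-2D construction exceed the 2D matrix ceiling.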
To test the proposed algorithms comprehensively, several experiments were conducted. First, the performance of the proposed end-to-end corner detection method was tested and compared with traditional methods. Then, the measurement accuracy of the system was evaluated at different distances and compared with other commercially available SL cameras. Afterward, surfaces with rich colors and complex textures were selected to test the robustness of our system.
3.1. Performance Evaluation of the Developed End-to-End Corner Detection Algorithm
One of the main contributions of this work is the development of an end-to-end corner detection method based on deep learning; therefore, its performance was evaluated first. Here, the traditional method based on the local template matching algorithm in [15] was used for comparison. The details of this algorithm can be found in [15].
3.1.1. Performance w.r.t. (with Respect to) the Noise Level
In this experiment, three different objects were used. One was a vase with a textured surface, another was a face model with large curvature changes, and the last was a ball with a smooth surface, as shown in Figure 11. To test the robustness of the developed end-to-end corner detection method, Gaussian noise with a mean value of 0 and a standard deviation of σ was added to the raw images. Considering the normal noise level in a practical measurement situation, we varied σ from 0 to 1.0. The corner detection results of the proposed method under different noise levels are shown in Figure 12. The results of the traditional local template matching algorithm [15] are provided for comparison.
As can be seen in Figure 12, for these three objects, the performance of the traditional image processing method [15] degraded as the noise level increased, whereas the proposed method demonstrated stable performance and robustness to noise. For the traditional local template matching method, when σ = 1, the number of successfully detected corners of the three objects decreased by 17.8%, 14.1%, and 12.5%, respectively, compared with the noise-free situation. For the proposed method, the results differed by only 1~3 corner points.
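The noise injection used in this experiment can be sketched as follows. Here `detect` is a placeholder for any corner detector (the paper's network or a template matcher), and the function names are ours.

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng=None):
    """Add zero-mean Gaussian noise with standard deviation sigma to an
    8-bit image, clipping the result back to [0, 255]."""
    rng = rng or np.random.default_rng()
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def detection_drop(detect, img, sigmas=(0.25, 0.5, 0.75, 1.0)):
    """Percentage drop in detected corners at each noise level relative
    to the noise-free image; `detect` maps an image to corner points."""
    base = len(detect(img))
    return {s: 100.0 * (base - len(detect(add_gaussian_noise(img, s)))) / base
            for s in sigmas}
```

A robust detector keeps the drop near zero across the whole σ range, which is the behavior reported for the proposed method.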
3.1.2. Performance w.r.t. the Density of the Feature Points
To further validate the performance of the proposed method, we compared it with the traditional image processing method [15] at different coding densities. We changed the coding density of the pseudo-2D pattern by changing the size of the pattern blocks. Here, pattern blocks of 51 × 51 pixels, 41 × 41 pixels, 31 × 31 pixels, and 21 × 21 pixels were chosen. The smaller the block size, the greater the coding density. Three target objects, a vase, a face model, and a ball, were used. The coding SL pattern in Figure 10b was used, and the captured images with different coding densities are displayed in Figure 13.
The quantitative corner detection results of the three objects based on the traditional image processing method and the proposed method are compared in Table 3. The number of successfully detected corners was used as an index to evaluate the performance of the two methods. In Table 3, the relative increase in the number of extracted corners of our method over the traditional image processing method is denoted as “Ratio”.
Several conclusions can be drawn from these results. First, on all the target surfaces, the number of detected corners improved greatly with the proposed method compared with the traditional method. For example, for the vase with a pattern density of 21 × 21 pixels, the number of corners detected by our method was 26.3% higher than that of the traditional method. Second, our method demonstrates superior robustness against changes in coding density. Taking the spherical data as an example, although the projected corners were very dense at a pattern density of 21 × 21 pixels, our method detected most of the features correctly. Last, the advantages of our method are more obvious in problem areas, such as edges and surfaces with rich textures or large curvature changes. For the ball with a pattern density of 21 × 21 pixels, the number of corners detected by our method was 17.1% higher than that of the traditional method; for the face model, this figure was as high as 27.8%.
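The percentages quoted above correspond to a simple relative increase; a short sketch (with hypothetical corner counts in the usage example, not values from Table 3) reads:

```python
def ratio(ours, traditional):
    """Relative increase (%) of corners detected by the proposed method
    over the traditional template matching method."""
    return 100.0 * (ours - traditional) / traditional

# Hypothetical counts: 480 vs. 380 detected corners -> 26.3%.
print(round(ratio(480, 380), 1))  # -> 26.3
```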
3.2. Accuracy Evaluation of the Developed System
The reconstruction accuracy of the developed spatial SL system with the high-capacity pattern was examined. A standard plane of size 50 × 50 cm was used as the target object; its machining accuracy was up to ±5 µm. The plane was placed at working distances from 90 cm to 130 cm from the projector in steps of 10 cm. The plane was reconstructed, and the standard deviation of the reconstructed plane was taken as the precision evaluation index. The results are displayed in Figure 14. At a working distance of 1.0 m, the reconstruction error was as low as 0.133 mm, which achieves submillimeter accuracy. When the working distance increased to 130 cm, the error rose markedly to 0.633 mm. Two main reasons may explain this phenomenon. The first is the triangulation structure of the SL system, in which accuracy decreases with increasing measurement distance. The second is the depth of field of the DLP projector, which is rather small once the focal length of the lens is fixed; when the distance reached 130 cm, the projected patterns were already blurred.
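Our reading of this flatness evaluation is a total-least-squares plane fit followed by the standard deviation of the point-to-plane residuals; the sketch below implements that interpretation with SVD (the helper name is ours).

```python
import numpy as np

def plane_fit_std(points):
    """Fit a total-least-squares plane to an (N, 3) point cloud via SVD
    and return the standard deviation of the signed point-to-plane
    distances, used here as a flatness (precision) index."""
    centered = points - points.mean(axis=0)
    # The plane normal is the right singular vector associated with the
    # smallest singular value of the centered cloud.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    distances = centered @ vt[-1]  # components along the plane normal
    return distances.std()
```

For a perfectly planar cloud the returned value is numerically zero; measurement noise and calibration error raise it, which is what the evaluation index captures.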
3.3. Reconstruction of Surfaces with Rich Textures
As mentioned above, objects’ textures, including both geometrical texture and color texture, bring great challenges to pattern decoding. In this part, two different target objects were selected. One was a face model with complex geometrical textures, and the other was a plane with complex color textures, as shown in Figure 15a,c. The corresponding corner detection results are shown in Figure 15b,d, respectively. The reconstructed point clouds and 3D models are displayed in Figure 16. It can be observed that the proposed method works well on complex surfaces.
3.4. Reconstruction of Surfaces with Large Mutations
To further validate the robustness of the proposed method, a piece of paper was shaken to generate various large morphological mutations. The proposed system shot the dynamic scenes and recovered the 3D morphology of the deformed surfaces. As the projected pattern was fixed, only the frame rate of the camera needed to be adjusted; here, it worked at a frame rate of 31 fps. We shook the paper to create different deformations and captured 35 frames in total. The captured images and reconstructed results can be found in the videos provided in the attached file. To save space, only one of the frames and its reconstruction result are displayed in Figure 17.
From Figure 17a,b, it can be seen that in the regions with an abrupt surface gradient, the blocks of the SL pattern were twisted or even broken. For small and moderate mutations, our system worked rather well. However, for serious and large mutations, the pattern blocks were broken, so they could not be successfully extracted and decoded, and the related 3D data were lost. It should be noted that this is a recognized defect of spatial SL. By adopting denser coding corners with our high-capacity pattern, this problem can be alleviated to some extent, but it still exists. The processing of broken codewords is an important research direction for spatial SL, which we plan to study in further depth in future work.
4. Discussions
Spatial SL technology can achieve high-precision reconstruction from a single image and is one of the important 3D vision technologies. However, compared with temporal SL, it has not been widely applied in practice, mainly because the traditional 2D matrix-like coding strategies yield few reconstructable feature points. This paper proposed a new pseudo-2D coding method that greatly increases the number of spatial coding points by reducing the coding dimension. Taking the common setting of q = 4 coding elements and a coding window of r × r = 2 × 2 as an example, the traditional 2D matrix-like coding strategies can only generate a pattern with a theoretical maximum coding capacity of 255, while the proposed pseudo-2D coding method can generate a pattern with a maximum coding capacity of n × 254 (where n is the number of repetitions of the pseudo-2D sequence). It is worth noting that the increase in coding density also increases the decoding difficulty, especially in the extraction of dense feature points. To address this issue, this paper proposed an end-to-end deep learning method that is more robust than traditional template matching methods, especially in problem areas, such as edges and surfaces with rich textures or large curvature changes.
However, several disadvantages and limitations of the proposed technique remain. For example, as the coded feature points are still discrete, code interruption issues may occur for objects with surface discontinuities or large morphological mutations. Additionally, the system is relatively bulky due to the use of a high-resolution DLP projector. To make it more portable and compact, we are considering using diffractive optical elements for projection in the future.