1. Introduction
In human–robot cooperation, deep reinforcement learning (DRL) is often used to train the robot to undertake a task. In a bolt screwing task, for instance, the human partner's arms may become obstacles for the robot during the working process (Figure 1). Training the robot's obstacle avoidance capability through DRL requires a huge number of samples, which are usually hard to generate. One way to obtain them is to reconstruct 3D images [1,2,3] of a human worker executing the task. The reconstructed sequence of the human's arms can then serve as moving obstacles for training the robot's obstacle avoidance capability in a virtual environment (Figure 2).
In the scenarios above, a common prerequisite is accurate pose information of the human or the robot. However, when an object is projected onto the camera plane, its depth along the optical axis is lost, which can make two objects that are actually far apart appear close to each other [4]. Without correct depth information, the pose is estimated incorrectly. Laser sensors can provide depth information, but they are usually expensive. Low-cost devices such as the Kinect are popular alternatives, but their accuracy is limited (the Kinect has an error of at least 4 mm and a dead zone within 0.5 m, and its measurements become less accurate as the distance increases).
Many previous studies propose different approaches to depth estimation from images. Among them, some researchers use optimization methods. Ranftl et al. propose a motion segmentation method that produces a dense depth map from two consecutive frames of a single monocular camera; they segment the optical flow field into a set of motion models, with which the scene is reconstructed by minimizing a convex program [5]. Smith et al. present a method that estimates depth from a single polarisation image by solving a large, sparse system of linear equations [6]. Karsch et al. propose a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling [7]. Optimization methods need manually defined constraints to guarantee the accuracy of the resulting depth estimation, so they rely on human experience to provide effective constraints.
Learning-based methods that model manually extracted features are another promising alternative. Saxena et al. propose a Markov Random Field (MRF) learning algorithm to handle monocular cues such as texture gradients and variations and defocus, and incorporate these cues into a stereo system to obtain depth estimates [8]. Ma et al. adapt the ResNet-50 network by transfer learning to tackle single-image depth estimation [9]. Haim et al. propose a phase-coded aperture camera for depth estimation; they equip the camera with an optical phase mask that produces unambiguous depth-related color characteristics in the captured image [10]. Gan et al. present a convolutional neural network architecture that pays more attention to the relationships among different image locations and incorporates both absolute and relative features [11]. These methods depend on the extracted features, which in some scenarios may not carry sufficient cues about depth.
Recently, approaches that use deep learning to generate depth maps from images have become prevalent. Fu et al. propose a spacing-increasing discretization (SID) strategy to discretize depth and recast depth network learning as an ordinal regression problem [12]. Jiao et al. propose an approach that handles depth estimation and semantic labeling simultaneously; they present an attention-driven loss for network supervision and a synergy network to learn the relevance between the two tasks [13]. Godard et al. train an unsupervised deep neural network on binocular stereo data to address the scarcity of ground-truth depth in traditional depth estimation methods, proposing a novel training loss that enables high-quality single-image depth estimation [14]. Inspired by the concept of autoencoders, Garg et al. train the first convolutional neural network end-to-end from scratch for single-view depth estimation in an unsupervised manner [15]. Deep learning methods are powerful regression tools, but they usually require expensive computing platforms and many training samples related to the target object for accurate depth estimation.
Different from the aforementioned approaches, the method proposed in this paper estimates depth information by means of 3D reconstruction. We first circle a camera around the target object and obtain its reconstruction. Afterwards, only a single monocular image is required to accurately estimate the depth of the object, no matter how it moves or deforms in front of the camera. The proposed method is therefore more applicable in scenarios where the target object is not rigid and where accurate depth information is necessary. With the point cloud of the target object (in a certain static pose) reconstructed beforehand, the proposed method can estimate the pose and reconstruct the point cloud of the object (in its other poses) from a single input RGB image. This approach applies not only to humans and humanoid objects, but to other deformable objects as well.
The remainder of this paper is structured as follows. Section 2, Section 3 and Section 4 each introduce one of the three modules of the proposed approach. Section 5 presents the overall flow chart of the proposed method. Experimental evaluations on a NAO robot and a human are provided in Section 6. Section 7 concludes the paper.
2. 3D Labeled Reconstruction
This section introduces the prior model of the target object. The prior model is the 3D reconstruction (stored in the form of a point cloud [16,17,18]) of the target object in its stationary status, with a SIFT feature vector attached to each cloud point. The prior model is built in two steps. First, we use a traditional static 3D reconstruction approach to reconstruct the target object from multiple images. Second, we use the SIFT algorithm [19] to extract feature vectors from the collected images and attach them to the corresponding 3D points of the reconstructed point cloud.
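As an illustration of this second step, the short sketch below extracts SIFT key points and descriptor vectors from a single image using OpenCV; the function name and the grayscale-loading choice are assumptions of this sketch rather than details given in the paper.

```python
import cv2

def extract_sift_features(image_path):
    """Illustrative helper: SIFT keypoints and 128-D descriptors for one image."""
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    # keypoints hold the 2D image locations; descriptors is an (n, 128) array
    keypoints, descriptors = sift.detectAndCompute(image, None)
    return keypoints, descriptors
```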
2.1. 3D Reconstruction with Multiple Images
Given a static object, we can use a single camera (whose intrinsic parameter is $f$) to circle around the object and reconstruct a point cloud. Let $N$ be the total number of images captured by the camera, and let $M$ be the total number of points on the object surface. The orientation and position of the camera with regard to the world frame at the $i$th instant are represented by a rotation matrix $R_i$ and a translation vector $t_i$, respectively. Denote $P_j^w$ as the $j$th point on the object surface with regard to the world frame, $P_{ij}^c$ as the same $j$th point with regard to the camera frame at the $i$th instant, and $p_{ij}$ as the image coordinate of the $j$th point at the $i$th instant (for simplicity of notation, we define $p_{ij} = \mathbf{0}$ if $P_j^w$ is occluded from the camera at the $i$th instant). The following can be given [20] (Equations (1) and (2)):
$$P_{ij}^c = R_i^{\top} \left( P_j^w - t_i \right), \qquad p_{ij} = \frac{f}{z_{ij}^c} \begin{bmatrix} x_{ij}^c \\ y_{ij}^c \end{bmatrix}, \quad \text{where } P_{ij}^c = \begin{bmatrix} x_{ij}^c & y_{ij}^c & z_{ij}^c \end{bmatrix}^{\top}.$$
Let $\mathcal{P}^w = \{ P_j^w \mid j = 1, \ldots, M \}$ denote the set of all surface points in the world frame. The desired result of this step is the 3D reconstruction in the form of the point cloud $\mathcal{P}^w$.
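To make the projection relations of Equations (1) and (2) concrete, the following is a minimal sketch of projecting one surface point onto the image plane, assuming the camera pose $(R_i, t_i)$ is expressed in the world frame and a focal-length-only intrinsic model $f$; all names are illustrative.

```python
import numpy as np

def project_point(P_w, R_i, t_i, f):
    """Project a 3D world point onto the image plane of the camera at instant i."""
    P_c = R_i.T @ (P_w - t_i)      # world frame -> camera frame (Equation (1))
    u = f * P_c[0] / P_c[2]        # perspective division along the optical axis (Equation (2))
    v = f * P_c[1] / P_c[2]
    return np.array([u, v])
```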
2.2. SIFT Features to Label the 3D Reconstruction
Given a two-dimensional image $I_i$, the SIFT algorithm [19] is able to extract effective key points through the LoG operator. By computing the gradients in the neighborhood of each key point, a corresponding descriptor vector can be obtained to distinguish the key point. We can therefore use the SIFT algorithm to find a set of feature points (denoted as $\{ u_i^{(k)} \}$) and their corresponding descriptor vectors (denoted as $\{ d_i^{(k)} \}$) for the image captured at the $i$th instant, which jointly yield a two-tuple set denoted as $F_i = \{ (u_i^{(k)}, d_i^{(k)}) \mid k = 1, \ldots, m_i \}$. Executing the same operation on all the images, we finally obtain $F_1, \ldots, F_N$ (where $m_i$ indicates the total number of feature points derived from the SIFT algorithm for the image captured at the $i$th instant).
Subsequently, we need to attach the descriptor vectors to the corresponding 3D points on the surface of the reconstructed point cloud. From Equations (1) and (2) it can be deduced (Equation (5)) which 3D point $P_j^w$ of the reconstructed point cloud corresponds to a given feature point $u_i^{(k)}$. Thus, we can acquire a two-tuple set of 3D points and descriptor vectors $\{ (P_j^w, d_{ij}) \}$, where $d_{ij}$ is the descriptor observed for $P_j^w$ in the image at the $i$th instant. For simplicity of notation, we define that any $P_j^w$ that is occluded from the camera view at the $i$th instant, or whose corresponding image point is not a key point, still has a descriptor vector $d_{ij} = \mathbf{0}$. Therefore, after the $N$th instant we obtain $\{ (P_j^w, d_{ij}) \mid i = 1, \ldots, N;\ j = 1, \ldots, M \}$. The required 3D labeled reconstruction is the following:
$$\mathcal{L} = \{ (P_j^w, \bar{d}_j) \mid j = 1, \ldots, M \},$$
where $\bar{d}_j$ represents the average vector over all the non-zero descriptor vectors (i.e., $d_{ij} \neq \mathbf{0}$) related to the 3D point $P_j^w$ within the $N$ sampling instants.
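The labeling step can be sketched as follows: each reconstructed 3D point collects the descriptor vectors observed for it over the $N$ instants, and its label is the mean of the non-zero descriptors. The data layout (a list of per-instant descriptors for every point) is an assumption of this sketch.

```python
import numpy as np

def label_point_cloud(points, descriptors_per_point):
    """Attach to every 3D point the average of its non-zero SIFT descriptors."""
    labeled = []
    for P, descs in zip(points, descriptors_per_point):
        valid = [d for d in descs if np.any(d)]        # drop zero vectors (occluded / not a key point)
        label = np.mean(valid, axis=0) if valid else np.zeros(128)
        labeled.append((P, label))
    return labeled
```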
3. Skeleton-Based Topological Segmentation
This section introduces how to provide the reconstructed point cloud with a robust topological segmentation, so as to deal with the case where the target object is not rigid. In this way, each sub-point-cloud derived from the topological segmentation is expected to be rigid. Topological segmentation based on the surface information of the target object is prone to being affected by surface noise, which weakens its robustness. Therefore, the proposed topological segmentation is executed in two steps. First, we extract the skeleton of the reconstructed point cloud [21] and segment the skeleton based on its curvature. Second, we dilate the sub-skeletons [22] to yield the sub-point-clouds, which are the results of the topological segmentation.
Skeleton extraction and segmentation. Given an object denoted as $\mathcal{O}$, we denote by $\mathcal{B}$ the set of all the maximally inscribed spheres in $\mathcal{O}$, none of which has common points of tangency with the noisy surface. Then, the skeleton of $\mathcal{O}$, formed by the centers of the spheres in $\mathcal{B}$, is denoted as $\mathcal{S}$.
After extracting the skeleton of the reconstructed point cloud, we segment the skeleton according to its curvature. Supposing $E \subset \mathcal{S}$ is a set of cut points, an equivalence relation $\sim_E$ implied by $E$ is defined as follows: $s_1 \sim_E s_2$ if and only if $s_1$ and $s_2$ lie on a curve segment of $\mathcal{S}$ whose ends are two points in $E$ and no other points in $E$ lie on the same curve segment.
Thus, the curve segments determined by $E$ are the equivalence classes [23] of $\sim_E$, i.e., the elements of the quotient set $\mathcal{S}/\!\sim_E$ defined in Equation (7). It is clear that the curve segments in $\mathcal{S}/\!\sim_E$ are separated from each other. In this paper, we propose two categories of points, respectively denoted as $E_1$ and $E_2$, so that $E = E_1 \cup E_2$ determines $\sim_E$ in Equation (7).
3.1. The First Category
Suppose all the points on the skeleton constitute a set $\mathcal{S} = \{ s_1, \ldots, s_K \}$ ($K$ is the total number of elements in $\mathcal{S}$). We use a matrix $A = [a_{kl}]$ to represent the connectivity of each pair of points from $\mathcal{S}$: specifically, $a_{kl} = 1$ if $s_k$ and $s_l$ are adjacent to each other, and $a_{kl} = 0$ otherwise. Subsequently, the first category of points $E_1$ is defined in terms of this connectivity.
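A small sketch of this first category, under the assumption that it consists of skeleton points whose number of adjacent points differs from two (i.e., endpoints and branching points); the adjacency matrix layout is also an assumption of the sketch.

```python
import numpy as np

def first_category_points(A):
    """A: symmetric 0/1 adjacency matrix of the skeleton points."""
    degrees = A.sum(axis=1)                    # number of neighbours of each skeleton point
    return set(np.where(degrees != 2)[0])      # endpoints (degree 1) and branch points (degree >= 3)
```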
3.2. The Second Category
Similar to $\mathcal{S}/\!\sim_E$ (Equation (7)), the skeleton segmented by $E_1$ is $\mathcal{S}/\!\sim_{E_1}$. We define the function of the $c$th curve segment (supposing there are $C$ curve segments in total) in $\mathcal{S}/\!\sim_{E_1}$ as $\mathbf{r}_c(\ell)$, where $\ell$ is the arc length parameter of the function $\mathbf{r}_c$ and $\ell \in [0, L_c]$ ($L_c$ is the length of the $c$th curve segment). Then, the Frenet formulas [24] of the $c$th curve are
$$\frac{d\mathbf{T}}{d\ell} = \kappa \mathbf{N}, \qquad \frac{d\mathbf{N}}{d\ell} = -\kappa \mathbf{T} + \tau \mathbf{B}, \qquad \frac{d\mathbf{B}}{d\ell} = -\tau \mathbf{N},$$
where $\mathbf{T}$, $\mathbf{N}$ and $\mathbf{B}$ are, respectively, the unit tangent vector, the unit normal vector and the unit binormal vector of $\mathbf{r}_c$; $\kappa$ and $\tau$ are the curvature and torsion.
We construct a curvature-related quantity on each curve segment (its discrete computation is described in Appendix B), together with a threshold on this quantity. Subsequently, the second category of points $E_2$ is defined as the set of skeleton points at which the quantity exceeds the threshold.
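A rough sketch of marking such points, using the discrete cosine-based curvature proxy described in Appendix B; the window size, the threshold, and the direction of the comparison are assumptions of this sketch.

```python
import numpy as np

def second_category_points(curve, threshold, window=5):
    """curve: ordered array of 3D skeleton points along one curve segment."""
    marked = []
    for k in range(window, len(curve) - window):
        v1 = curve[k - window] - curve[k]          # vector towards one side of the curve
        v2 = curve[k + window] - curve[k]          # vector towards the other side
        cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        if cos_angle > threshold:                  # near -1 on a straight run, grows towards +1 at sharp bends
            marked.append(k)
    return marked
```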
3.3. Skeleton Dilation with Constraints
Supposing $B$ is a structuring element [20] in the form of a subset of $\mathbb{R}^3$, the dilation of a sub-skeleton by $B$ after $T$ iterations is defined as the $T$-fold repetition of the morphological dilation with $B$ (Equation (12)). The dilation operation is executed to segment the reconstructed point cloud by the sub-skeletons. Therefore, the resulting sub-point-clouds should be separated from each other, and the dilation should stop when it reaches the surface of the reconstructed point cloud. Thus, in each iteration $t \in \{1, \ldots, T\}$, we remove the points that violate the following two constraints (Equations (13) and (14)): the dilated regions of two distinct sub-skeletons in $\mathcal{S}/\!\sim_E$ must remain disjoint, and the dilated region must not enter the external space of the object $\mathcal{O}$. Moreover, for the dilation of each sub-skeleton under the constraints of Equations (13) and (14), the total number of iterations $T$ satisfies the stopping condition of Equation (15). Then, the dilation result of the sub-skeletons in $\mathcal{S}/\!\sim_E$ under the constraints of Equations (13)–(15) forms an equivalence relation for the topological segmentation of the point cloud.
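The constrained dilation can be sketched as a region-growing loop over the cloud points: each sub-skeleton claims nearby unassigned points, growth is restricted to the cloud itself (which enforces the external-space constraint), and distinct parts never overwrite each other. The radius step and the KD-tree neighbourhood query are implementation assumptions, not the paper's exact scheme.

```python
import numpy as np
from scipy.spatial import cKDTree

def dilate_sub_skeletons(cloud, sub_skeletons, step=0.01, max_iters=100):
    """cloud: (M, 3) point cloud; sub_skeletons: list of (K_c, 3) arrays of skeleton points."""
    labels = np.full(len(cloud), -1)                       # -1: not yet assigned to any part
    tree = cKDTree(cloud)
    for t in range(1, max_iters + 1):
        for part_id, skeleton in enumerate(sub_skeletons):
            for s in skeleton:
                for idx in tree.query_ball_point(s, r=t * step):
                    if labels[idx] == -1:                  # parts stay disjoint (Equation (13))
                        labels[idx] = part_id
        if np.all(labels >= 0):                            # every cloud point claimed: the dilation
            break                                          # has reached the object surface everywhere
    return labels                                          # labels define the sub-point-clouds
```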
4. 3D Reconstruction at the ith Time
This section introduces how to quickly reconstruct the dynamic object with a single RGB camera. For each frame captured by the camera, we first extract all the feature points. By matching these feature points to those attached to the reconstructed point cloud, we determine to which sub-point-cloud each feature point corresponds. Finally, the poses of the sub-point-clouds are estimated from the correspondences between the cloud feature points and the image feature points. This pose estimation problem can be handled by solving a nonlinear optimization. The reconstruction is therefore reduced to reorganizing the sub-point-clouds with updated poses.
Denote the image captured at the $i$th instant ($i > N$) as $I_i$. Through the SIFT algorithm, we can extract its feature points and their corresponding descriptor vectors. Through Section 2.2, a labeled 3D point cloud has been reconstructed, to which descriptor vectors are attached. Thus, we can use the descriptors from the captured image and from the cloud to find their correspondence, which can be denoted as a two-tuple set $\mathcal{C}_i = \{ (u_i^{(k)}, P_{j_k}^w) \}$. In this set, $u_i^{(k)}$ and $P_{j_k}^w$ respectively represent a feature point from the image and a cloud point whose descriptor vectors are similar. In addition, $\mathcal{C}_i$ yields a bijective map between the matched image feature points and the matched cloud points.
Supposing the sub-point-clouds obtained from the topological segmentation form a basis for the topology of the space occupied by the object, we can obtain from it the basis for the topology (denoted as $\mathcal{T}_1$) of the subset formed by the reconstructed point cloud, and we can further acquire the basis for the topology (denoted as $\mathcal{T}_2$) of the set of matched cloud points in $\mathcal{C}_i$.
Based on our design, each element $V \in \mathcal{T}_2$ is a rigid component of the object. Thus, when the object moves stochastically, all the points in a given element $V$ undergo the same rigid transformation; i.e., there exists a single pair $(R_V, t_V)$ for $V$ such that every point $p \in V$ is mapped to $R_V\, p + t_V$ (Equation (18)), where $R_V$ is a three-dimensional rotation matrix, $t_V$ is a three-dimensional translation vector, and $p$ is denoted in the form of a column vector.
For each element $V \in \mathcal{T}_2$, define $V'$ as the set of matched cloud points belonging to $V$; since $V' \subseteq V$, based on Equation (18) we can get the actual coordinates of these points at the $i$th instant (Equation (20)). Then, we can get another expression of the corresponding image feature points (Equation (21)), where $\pi_i$ (based on Equation (2)) is the operator that transforms a three-dimensional point into a two-dimensional image coordinate of the camera at the $i$th instant, defined in Equation (22). Thus, we can compute a rigid transformation $(R_V, t_V)$ for each $V$ by solving the following optimization problem (Equation (23)), which minimizes the reprojection error over the matched points:
$$\min_{R_V,\, t_V} \; \sum_{(u,\, P) \in \mathcal{C}_i \,:\, P \in V'} \big\| u - \pi_i \left( R_V\, P + t_V \right) \big\|^2 .$$
Finally, the raw result of the 3D reconstruction at the $i$th time (Equation (24)) is obtained by applying each estimated $(R_V, t_V)$ to its corresponding sub-point-cloud and collecting the transformed parts.
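A compact sketch of solving this optimization for one topological part with off-the-shelf tools: the rotation is parameterized by a quaternion (scalar-last, as SciPy expects), and the residual is the reprojection error of Equation (23) under the simple focal-length projection of Equation (22). All variable and function names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def estimate_part_pose(points_3d, points_2d, f):
    """points_3d: (n, 3) matched cloud points; points_2d: (n, 2) matched image points."""
    def residuals(x):
        q = x[:4] / np.linalg.norm(x[:4])      # keep the quaternion on the unit sphere
        R = Rotation.from_quat(q).as_matrix()  # SciPy uses the scalar-last (x, y, z, w) convention
        t = x[4:]
        P = points_3d @ R.T + t                # candidate rigid transform of the part
        proj = f * P[:, :2] / P[:, 2:3]        # perspective projection onto the image plane
        return (proj - points_2d).ravel()      # reprojection error
    x0 = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])  # identity rotation, zero translation
    return least_squares(residuals, x0).x
```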
5. Approach Overview
Specifically, we utilize a single camera to solve the dynamic 3D object reconstruction problem. The problem can be formulated as follows: given the images captured within the first $N$ instants (the camera pose and position change at each instant so that a satisfactory static 3D reconstruction of the object can be achieved) and an image captured at a later instant $i$, the expected result is the dense reconstruction of the object at that same instant $i$.
Accordingly, we fully utilize the 3D information acquired from the foregoing frames and thereby reduce the dynamic object reconstruction problem to a re-organization problem. The proposed approach mainly incorporates three steps.
In the first step, we obtain the static 3D reconstruction of the target object in its stationary status. Specifically, the point cloud of the target object is acquired in this first phase through existing static 3D reconstruction methods. Meanwhile, we obtain the SIFT features of each image (among the images utilized for the 3D reconstruction) and attach the feature descriptors to the corresponding points on the reconstructed point cloud.
In the second step, we find an appropriate topological segmentation of the reconstructed point cloud such that each topological part moves rigidly during the object motion. Point cloud topological segmentation is itself an open problem, because it is difficult to define the standard for a satisfactory segmentation. In this paper, we transform the point cloud segmentation problem into a skeleton segmentation problem, i.e., the segmentation of the point cloud results from the segmentation of its skeleton. This is based on the observation that the object skeleton is much more stable against perturbation than the object surface. Thus, we simply segment the skeleton into several sub-skeletons based on its curvature and torsion properties, and then dilate each sub-skeleton to determine a corresponding topological part of the point cloud.
In the third step, when a new image is captured, we extract its SIFT features and match them to those attached to the point cloud in the first phase. These matched features establish the correspondence between the image and the point cloud. Based on the topological segmentation of the point cloud, the correspondence between the image and each topological part can also be computed. Then, the pose and position of each topological part can be deduced, and the reconstruction result can be obtained through a simple re-organization of these topological parts. The whole flow chart of the proposed approach is illustrated in
Figure 3.
Construction of the rotation matrix. Since an analytic expression of the rotation matrix in Equations (18), (20), (23) and (24) is required by the optimization program (following Equation (23)), we utilize a quaternion to construct the rotation matrix. Specifically, a quaternion has the form $q = w + x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$. Then, the corresponding rotation matrix in the optimization problem of Equation (23) is
$$R = \begin{bmatrix} 1 - 2(y^2 + z^2) & 2(xy - wz) & 2(xz + wy) \\ 2(xy + wz) & 1 - 2(x^2 + z^2) & 2(yz - wx) \\ 2(xz - wy) & 2(yz + wx) & 1 - 2(x^2 + y^2) \end{bmatrix},$$
with the constraint
$$w^2 + x^2 + y^2 + z^2 = 1,$$
which constitutes a simple constrained nonlinear optimization problem.
6. Experiments and Discussion
This section provides the experimental results of the proposed approach. We evaluate the proposed approach with a set of experiments on a NAO robot (Figure 4) and a human being.
6.1. Experiments on a NAO Robot
The NAO robot is an autonomous, programmable humanoid robot developed by Aldebaran Robotics (France), with a height of 58 centimetres and 25 degrees of freedom. We use a NAO robot as the target object for 3D reconstruction. During the experiment, the NAO robot continuously changes its poses, so it behaves as a deformable object. We reconstruct the NAO robot to test the depth accuracy of the proposed algorithm.
This experiment uses a single monocular camera to reconstruct the NAO robot in its dynamic status. To guarantee the effectiveness of the proposed method, several specific processing procedures are listed in Appendix A, Appendix B and Appendix C.
We compare the proposed method with approaches in [
14,
25]. The depth estimation result by the proposed method is shown in
Figure 5. Note that the proposed method only estimates the depth information of the target object while the previous approaches estimate the depth of the whole image. Therefore, we only compare the depth estimation accuracies corresponding to the image regions where the target object appears. Four accuracy metrics are used [
26] as shown in
Table 1.
The depth accuracies of the proposed method and the compared approaches are shown in Table 2.
6.2. Experiments on a Human
Experiments are also undertaken on a human to verify the proposed approach. We first produce the 3D reconstruction of the human in a still pose. Subsequently, the human changes his pose and the camera captures the corresponding monocular image for each pose. We use these images to test the proposed approach and the previous algorithms [14,25]. The program implementing this experiment is the same as that for the NAO robot experiment.
The depth estimation results of the proposed method and the other approaches are shown in Figure 6. Additionally, the depth accuracies of the proposed method and the compared approaches are shown in Table 3.
As in the experiments on the NAO robot, the results validate the effectiveness of the proposed method. The proposed method uses the 3D point cloud reconstructed ahead of time as the prior model and then relies on image features to estimate the pose changes of that model, which enables it to estimate depth information accurately.
7. Conclusions
In this paper, we propose a feature-based approach to accurately estimate the depth of a deformable object via a monocular camera. The proposed approach first reconstructs the target object in its initial pose as a prior model. Afterwards, only one monocular image is required to accurately estimate the depth of the target object, no matter how it changes its pose. Experiments are undertaken on a NAO robot and a human to evaluate the accuracy of the proposed approach. In future work, we aim to accurately estimate the depth of deformable objects of the same kind by reconstructing only a single instance of that kind as the prior model.
Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. Conceptualization, methodology and writing—original draft preparation: G.J.; software, validation and writing—review and editing: S.J.; investigation, resources, software and supervision: Y.O.; investigation, resources and validation: S.Z.
Funding
This work was supported by the National Natural Science Foundation of China (Grant Nos. U1613210 and U1813208), the Shenzhen Fundamental Research Programs (JCYJ2016428154842603 and JCYJ20170413165528221), and the Shenzhen Engineering Laboratory for Integration of Interventional Diagnosis and Treatment.
Conflicts of Interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.
Appendix A. SIFT Matching Accuracy
The SIFT algorithm, in spite of its effectiveness validated in [19], cannot match features from two images accurately enough, especially when the corresponding object deforms between the two images. Thus, to improve the feature matching result (a crucial prerequisite of our approach), we propose the following engineering improvements:
For a certain feature point, we collect the 10 other feature points closest to it on the image. The 11 feature points form a group. In this way, each feature point on the image constitutes a group, with itself as the center of the group. It is reasonable to assume that, if two feature points from two images match each other, their corresponding groups match each other as well.
When judging whether two feature points from different images match, we compare their groups. Specifically, we first look for the pair of points, one from each group, that match each other best (the matching degree is determined by the SIFT descriptor vectors of the feature points). After recording the matching degree, we remove the two points from their groups and look for the next best-matched pair in the same way. This is done iteratively until either group is empty. We accumulate the matching degrees of the matched pairs as the similarity of the original two feature points.
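A possible implementation of this group comparison is sketched below: the two groups are greedily paired by smallest descriptor distance, matched pairs are removed, and the (negated) distances are accumulated as the similarity of the two centre points. The greedy pairing and the distance-based matching degree are assumptions of this sketch.

```python
import numpy as np

def group_similarity(group_a, group_b):
    """group_a, group_b: lists of SIFT descriptor vectors (centre point plus its 10 neighbours)."""
    a, b = list(group_a), list(group_b)
    total = 0.0
    while a and b:
        # find the best-matching pair across the two groups
        dists = np.array([[np.linalg.norm(da - db) for db in b] for da in a])
        i, j = np.unravel_index(np.argmin(dists), dists.shape)
        total += -dists[i, j]          # smaller distance -> higher accumulated similarity
        del a[i], b[j]                 # remove the matched pair and continue
    return total
```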
Appendix B. Discrete Computation of Curvature and Torsion
Since the quantity in Equation (11) must satisfy its defining conditions, we compute it in the discrete setting when programming as follows: for any voxel (namely, a 3D point stored in the computer), we choose the 5th point away from it along each side of the skeleton curve, and construct two vectors pointing from the voxel to the two chosen voxels. Then, the cosine of the angle between the two vectors is taken as the quantity, for simplicity.
Appendix C. Result Optimization for Smoothness
The result from Equation (23) can basically represent the pose of the target object, but the joint between two connected topological parts may be too coarse for the final result. Thus, we propose a simple algorithm to smooth the joint of two connected topological parts. The procedure is as follows:
When the local point cloud is segmented into two topological parts, we know the correspondence between the points on the two borders. After solving the optimization problem of Equation (23), the relative pose of the two topological parts can also be acquired, and the new correspondence between the points on the two borders can be deduced. We then interpolate points evenly along the line segment whose ends are each pair of corresponding border points. The interpolated points, together with the pairs of border points, form several included angles when neighboring points are connected. Regarding these included angles as variables and solving an optimization problem that minimizes their variance, we obtain a smoothed reconstruction with smoothly connected topological parts.
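A very rough sketch of this smoothing idea: treat the interior points of the interpolated border polyline as variables and minimize the variance of the bend angles (here, their cosines) so that the joint bends evenly; the fixed end points and the cosine proxy are assumptions of this sketch, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import minimize

def smooth_joint(polyline):
    """polyline: (n, 3) array of interpolated points between two corresponding border points."""
    polyline = np.asarray(polyline, dtype=float)
    p0, p_end = polyline[0], polyline[-1]                  # end points stay fixed
    def objective(flat):
        pts = np.vstack([p0, flat.reshape(-1, 3), p_end])
        v = np.diff(pts, axis=0)                           # consecutive edge vectors
        cos_angles = np.einsum('ij,ij->i', v[:-1], v[1:]) / (
            np.linalg.norm(v[:-1], axis=1) * np.linalg.norm(v[1:], axis=1))
        return np.var(cos_angles)                          # make the bends as uniform as possible
    res = minimize(objective, polyline[1:-1].ravel())
    return np.vstack([p0, res.x.reshape(-1, 3), p_end])
```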
References
- Xu, G.; Chen, J.Y.; Li, X.T. 3-D Reconstruction of Binocular Vision Using Distance Objective Generated From Two Pairs of Skew Projection Lines. IEEE Access 2017, 5, 27272–27280. [Google Scholar] [CrossRef]
- Chu, P.M.; Cho, S.; Fong, S.; Park, Y.W.; Cho, K. 3D Reconstruction Framework for Multiple Remote Robots on Cloud System. Symmetry 2017, 9, 55. [Google Scholar] [CrossRef]
- Xu, G.; Yuan, J.; Li, X.T.; Su, J. 3D reconstruction of laser projective point with projection invariant generated from five points on 2D target. Sci. Rep. 2017, 7, 7049. [Google Scholar] [CrossRef] [PubMed]
- Xu, G.; Zhang, X.Y.; Li, X.T.; Su, J.; Hao, Z.B. Global Calibration Method of a Camera Using the Constraint of Line Features and 3D World Points. Meas. Sci. Rev. 2016, 16, 190–196. [Google Scholar] [CrossRef] [Green Version]
- Ranftl, R.; Vineet, V.; Chen, Q.; Koltun, V. Dense Monocular Depth Estimation in Complex Dynamic Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Smith, W.A.P.; Ramamoorthi, R.; Tozza, S. Linear Depth Estimation from an Uncalibrated, Monocular Polarisation Image. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016. [Google Scholar]
- Karsch, K.; Liu, C.; Kang, S.B. Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 2144–2158. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Saxena, A. Depth estimation using monocular and stereo cues. In Proceedings of the International Joint Conference on Artifical Intelligence, Hyderabad, India, 6–12 January 2007. [Google Scholar]
- Depth Estimation from Single Image Using CNN-Residual Network. Available online: http://cs231n.stanford.edu/reports/2017/pdfs/203.pdf (accessed on 30 August 2017).
- Haim, H.; Elmalem, S.; Giryes, R.; Bronstein, A.M.; Marom, E. Depth Estimation From a Single Image Using Deep Learned Phase Coded Mask. IEEE Trans. Comput. Imaging 2018, 4, 298–310. [Google Scholar] [CrossRef]
- Gan, Y.; Xu, X.; Sun, W.; Lin, L. Monocular Depth Estimation with Affinity, Vertical Pooling, and Label Enhancement. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Tao, D. Deep Ordinal Regression Network for Monocular Depth Estimation. arXiv, 2018; arXiv:1806.02446. [Google Scholar]
- Jiao, J.; Cao, Y.; Song, Y.; Lau, R.W.H. Look Deeper into Depth: Monocular Depth Estimation with Semantic Booster and Attention-Driven Loss. In Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Godard, C.; Mac Aodha, O.; Brostow, G.J. Unsupervised Monocular Depth Estimation with Left-Right Consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Garg, R.; Bg, V.K.; Carneiro, G.; Reid, L. Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue. arXiv, 2016; arXiv:1603.04992. [Google Scholar]
- Wang, G.H.; Chu, Y.B. A New Oren-Nayar Shape-from-Shading Approach for 3D Reconstruction Using High-Order Godunov-Based Scheme. Algorithms 2018, 11, 75. [Google Scholar] [CrossRef]
- Zhu, W.; Chang, X.; Wang, Y.B.; Zhai, H.Y.; Yao, Z.X. Reconstruction of Hydraulic Fractures Using Passive Ultrasonic Travel-Time Tomography. Energies 2018, 11, 1321. [Google Scholar] [CrossRef]
- Xu, G.; Yuan, J.; Li, X.T.; Su, J. Optimization reconstruction method of object profile using flexible laser plane and bi-planar references. Sci. Rep. 2018, 8, 1526. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef] [Green Version]
- Stockman, G.C. Computer Vision; Prentice Hall: Upper Saddle River, NJ, USA, 2001. [Google Scholar]
- Jalba, A.; Sobiecki, A.; Telea, A. An Unified Multiscale Framework for Planar, Surface, and Curve Skeletonization. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 30–45. [Google Scholar] [CrossRef] [PubMed]
- Rodriguez, J.; Ayala, D. Erosion and Dilation on 2D and 3D Digital Images: A new size-independent approach. In Proceedings of the Vision Modeling & Visualization Conference, Stuttgart, Germany, 21–23 November 2001. [Google Scholar]
- Munkres, J. Introduction to Topology; Saunders College Pub: Philadelphia, PA, USA, 1983. [Google Scholar]
- Kreyszig, E. Differential Geometry; University of Toronto Press: Toronto, ON, Canada, 1959. [Google Scholar]
- Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper Depth Prediction with Fully Convolutional Residual Networks. arXiv, 2016; arXiv:1606.00373. [Google Scholar]
- Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. Available online: https://papers.nips.cc/paper/5539-depth-map-prediction-from-a-single-image-using-a-multi-scale-deep-network.pdf (accessed on 10 September 2014).
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).