1. Introduction
Freehand Design Sketching (FDS) in architecture and industry refers to a creative activity in which designers visually express and convey their ideas through two-dimensional media (e.g., paper and digital drawing boards) during the early stages of the design process. It provides an effective platform for exploration and communication within the design team or with the client [1]. FDS not only aids the design process but also captures the unique concepts envisioned by the designer, translating them into a tangible visual form. In architectural design, FDS is commonly used to explore architectural forms, spatial layouts, and material selections, providing essential references for initial design proposals [2]. In industrial design, FDS allows designers to experiment with different product appearances and functional structures, helping them find the most suitable design solution [3]. FDS therefore plays an irreplaceable role in the design process, connecting the initial creative idea with its subsequent concrete implementation and infusing the entire process with inspiration and vitality.
Figure 1 illustrates that converting FDS in architectural and industrial design into Digital 3D Models (D3DM) is crucial for verifying, refining, and further developing design proposals [4]. This method enhances the depth and breadth of the design process and also facilitates effective communication and understanding among the various parties involved in a project [5].
Traditional CAD systems are limited by issues such as manual input errors, design rigidity, scalability challenges, lack of predictive insights, and difficulties in collaboration [6]. In contrast, deep learning enhances design workflows by enabling automation, flexibility, scalability, predictive capabilities, and improved collaboration [7]. Compared to Computer-Aided Design (CAD) software, Artificial Intelligence (AI) based on deep learning techniques offers an innovative yet challenging technological approach for 3D shape reconstruction from FDS. Manually transforming FDS into D3DM faces at least three main challenges. First, FDS is often spontaneous and unstructured, lacking precise dimensions and detailed information, which makes the transformation a modeling task with high communication costs [8]. The related sketches may be incomplete, ambiguous, or even distorted, making it difficult for modelers to accurately capture the intended shapes and structures [9]. Second, the limited views and scale references that FDS offers constrain the exact replication of item proportions during 3D reconstruction [10]. Third, fidelity and precision may be negatively impacted by inaccuracies, stains, and flaws introduced during the designer's sketching [11].
To address the challenges in interpreting and reconstructing 3D models, sophisticated algorithms and techniques can be leveraged to effectively capture and express the design intent. Relevant methods that use general filtering [12] or consolidation [13] to transform rough sketches into clear line drawings often require manual intervention and lack precision, leading to discrepancies between the original sketches and the reconstructed models. These issues hinder the seamless transition from FDS to D3DM and call for more efficient and accurate technologies in this field. In recent years, advancements in machine learning, computer vision, and computational geometry have enabled greater automation and accuracy in 3D shape reconstruction. Technologies such as deep learning-based image recognition and reconstruction [14], CAD-based automated modeling tools [2], and virtual reality [15] have introduced new technical possibilities for transforming FDS into D3DM.
In architectural and industrial design, the transformation of FDS into D3DM must meet high product standards for (i) restoring the modeling characteristics, (ii) ensuring symmetry, and (iii) maintaining smooth surfaces. However, 3D shape reconstruction based on deep learning techniques must still address these challenges. These difficulties highlight the existing research gaps, namely the need for robust algorithms that effectively improve the quality of 3D shape reconstruction [16]. Early methods relied on predefined rules to infer local geometric features and generate 3D shapes; however, these rules often limited the diversity of the resulting reconstructions.
The advent of deep learning has revolutionized this field. Trained on large datasets of synthetic sketches, deep neural networks can predict 3D geometries from freehand sketches. For example, conditional Generative Adversarial Networks (cGANs) can process free-form sketches without strict structural requirements. Multi-view integration methods reduce ambiguity in single-view sketches, enhancing depth perception and structural coherence. Direct shape optimization methods reconstruct 3D shapes from multiple sketches, providing greater detail and alignment with artistic intent. In our work, we use the marching cubes algorithm to convert implicit functions into surface meshes and render any view of the reconstructed surface through rasterization. This method supports iterative model optimization based on user input, making it ideal for design and prototyping applications.
This study selected eyeglass frame design as the subject for 3D shape reconstruction from FDS due to the distinct modeling characteristics of its rims, bridges, and temples. These components define the product’s design and feature absolute symmetry and smooth surfaces, which are important criteria for the D3DM outcomes. The transformation from an eyeglass frame’s FDS to its D3DM requires 3D shape reconstruction to solve challenges such as surface fitting, edge detection, and geometric shape matching. Therefore, this research explores the essential technological application of 3D reconstruction algorithms to capture the modeling characteristics of the target product and ensure symmetry and smoothness standards. Innovative solutions in this field can help promote the broad application prospects of 3D shape reconstruction from FDS based on deep learning techniques. The research results of the essential technologies can directly drive the progress of computer-aided design and digital manufacturing, opening up new possibilities for intuitive design conceptualization and efficient scheme adjustment in architectural and industrial design. Overall, this research introduces a transformative approach to 3D shape reconstruction from FDS, leveraging deep learning techniques to overcome traditional challenges in fidelity, symmetry, and geometric precision, setting a new benchmark in the integration of AI and creative design workflows.
3. Methods
3.1. Problem Statement
As shown in Figure 3, our method takes an eyeglass frame sketch as input, and our goal is to reconstruct the corresponding D3DM of the eyeglass frame. We assume that the sketch represents a shape drawn in perspective. We define a binary sketch as $S \in \{0, 1\}^{H \times W}$, where zero denotes pixels covered by pen strokes and one denotes uncovered pixels.

An encoder $E$ and a decoder $D$ are trained, whose composite function, $D \circ E$, constructs a mesh $M = (V, F)$, where $V$ denotes vertex positions in $\mathbb{R}^3$ and $F$ denotes faces. In this formulation, $M = D(\mathbf{z})$, where $\mathbf{z} = E(S)$ is the latent vector that defines the 3D geometry of the mesh. We describe the encoder and the decoder in Section 3.2 and Section 3.3, respectively.
The main objective in obtaining the 3D eyeglass frame mesh model is to optimize the latent vector $\mathbf{z}$ so that the projection of the 3D mesh $M$ aligns with the 2D sketch $S$. This is achieved by minimizing the 2D Chamfer distance between the projected mesh and the input sketch (Section 3.4). Section 3.5 details an optimization based on the symmetry of the eyeglass frame, emphasizing that this symmetrical property improves the accuracy of 3D reconstructions derived from FDS. Finally, we present implementation details in Section 3.6.
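To make this formulation concrete, the following is a minimal PyTorch sketch of the resulting optimization loop. The helper names (`encode`, `decode_mesh`, `project_contours`, `chamfer_2d`) are hypothetical stand-ins for the components detailed in Sections 3.2 through 3.5, not a verbatim excerpt of our implementation, and the renderer is assumed to be differentiable.

```python
import torch

def reconstruct(sketch, encode, decode_mesh, project_contours, chamfer_2d,
                steps=200, lr=1e-2):
    """Illustrative latent-space optimization loop (helper names are hypothetical).

    sketch: binary image S, 0 = pen strokes, 1 = background.
    """
    z = encode(sketch).detach().requires_grad_(True)  # initial latent z = E(S)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        mesh = decode_mesh(z)                    # mesh M = D(z), assumed differentiable in z
        contours_2d = project_contours(mesh)     # projected 2D contour points
        loss = chamfer_2d(contours_2d, sketch)   # bidirectional 2D Chamfer distance
        optimizer.zero_grad()
        loss.backward()                          # gradients flow back to z
        optimizer.step()
    return decode_mesh(z)
```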
Figure 4 illustrates the architecture of the DINOv2 model, providing a comprehensive view of its layered structure and the flow of data through various network components. The model consists of multiple layers, including convolutional layers (Conv), normalization layers (Norm), and activation functions such as ReLU and Tanh, each playing a pivotal role in feature extraction and data processing. These components work collaboratively to progressively refine extracted features, enhancing the model’s ability to handle complex structures in image data. The network is divided into three primary layers, each with distinct functionality that contributes to the model’s performance. Layer 1 serves as the initial stage of feature extraction, where convolutional and normalization layers process raw input data to capture fundamental image features, with ReLU further enhancing the network’s capacity to learn nonlinear relationships. Layer 2 builds upon this foundation, incorporating additional convolutional and normalization layers to refine the features learned in Layer 1, enabling the model to discern more intricate patterns. Layer 3 continues this refinement process, using further convolutional and normalization layers to extract highly abstract and complex features. The data flows through the network in a clear progression, with each layer extracting progressively more sophisticated features essential for accurate predictions or specific tasks.
Regarding computational efficiency and resource usage, we recognize the importance of these factors in evaluating the practicality of the DINOv2 framework. While powerful, the model demands significant computational resources due to the use of multiple convolutional layers, normalization, and activation functions, which increase both memory and processing requirements. However, the modular design allows for a flexible trade-off between performance and resource consumption, depending on the application and available hardware. For example, increasing the number of convolution layers in Layers 2 and 3 enhances feature extraction but also raises the computational load. This trade-off is critical in real-time applications, where balancing accuracy and efficiency is essential. Additionally, techniques like batch normalization and activation functions improve convergence speed and model stability, enhancing training efficiency.
3.2. Feature Extraction Using DINO
DINOv2 [44] is an advanced self-supervised learning model that improves the training of robust visual features without supervision. In this study, we selected the DINOv2 self-supervised learning framework for feature extraction due to its distinct advantages. First, DINOv2 eliminates the need for complex preprocessing, such as extensive data augmentation or label processing, enabling the direct use of raw images for training, thus simplifying the workflow and reducing dataset preparation time. Second, DINOv2 excels in feature representation by maximizing consistency between homologous images, allowing the model to learn robust representations essential for tasks like sketch feature extraction. Third, its self-supervised approach enables feature extraction from large-scale image datasets without manual annotations, making it ideal for handling vast amounts of unlabeled data. Finally, DINOv2 demonstrates strong robustness in recognizing key elements in images, particularly for precise sketch feature extraction. Overall, its simplicity, robustness, and self-supervised capabilities make DINOv2 an optimal choice for our task, offering efficiency and strong performance without relying on extensive labeled datasets. It was trained on a diverse collection of data, including ImageNet-22k, the train split of ImageNet-1k, Google Landmarks, and an assortment of fine-grained datasets, comprising 1.2 billion unique images and providing a broad perspective. DINOv2 showcases significant advancements in handling various computer vision tasks without fine-tuning.
The DINO framework employs a dual-network architecture consisting of a student and a teacher network. The student network learns by attempting to replicate the output of the teacher network, which in turn is an exponential moving average of the student's parameters. The core process involves generating multiple augmented views ($\tilde{x}$) of a given input image ($I$), which these networks then process. The resultant feature vectors from the student ($f_s$) and teacher ($f_t$) networks are utilized to compute the distillation loss as follows:

$$\mathcal{L}_{\mathrm{distill}} = -\sum_{i} P_t(\tilde{x})^{(i)} \log P_s(\tilde{x})^{(i)}, \qquad P_*(\tilde{x}) = \operatorname{softmax}\big(f_*(\tilde{x}) / \tau\big),$$

where $\tau$ represents the temperature scaling parameter. The output function of DINOv2 can be denoted as $f_{\mathrm{DINO}} \colon \mathbb{R}^{H \times W} \to \mathbb{R}^{384}$, converting an input image $I$ of size $H \times W$ into a 384-dimensional feature vector.
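For illustration, a simplified PyTorch sketch of this distillation objective follows. It omits details of the full DINO recipe, such as centering of the teacher outputs and the multi-crop strategy, and the temperature values shown are assumptions.

```python
import torch.nn.functional as F

def dino_distillation_loss(student_out, teacher_out, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between teacher and student softmax distributions.

    student_out, teacher_out: raw output logits of shape (batch, dim).
    The teacher branch is detached: it is updated only as an EMA of the student.
    """
    p_teacher = F.softmax(teacher_out.detach() / tau_t, dim=-1)
    log_p_student = F.log_softmax(student_out / tau_s, dim=-1)
    return -(p_teacher * log_p_student).sum(dim=-1).mean()
```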
Since eyeglass frames often showcase intricate curves, sharp edges, and sophisticated geometric shapes, we utilize the DINOv2 model based on the Vision Transformer (ViT) architecture as our encoder to extract features from the FDS of the eyeglass frame, allowing us to effectively identify a diverse range of geometric features.
Given an input sketch $S$, we employ the DINOv2 model to extract features and project them to a fixed dimension using a linear layer,

$$\mathbf{z} = W f_{\mathrm{DINO}}(S) + b, \qquad \mathbf{z} \in \mathbb{R}^{256},$$

where $W$ and $b$ are the trainable weights and bias of the projection.
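As a concrete illustration, pretrained DINOv2 backbones are distributed via torch.hub; the sketch below loads a ViT-S/14 backbone, whose 384-dimensional output matches the projection from 384 to 256 dimensions described in Section 3.6. The preprocessing and wrapper names are illustrative rather than a verbatim excerpt of our code.

```python
import torch
import torch.nn as nn

# Load a pretrained DINOv2 ViT-S/14 backbone (384-dim features) and freeze it.
backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

# Trainable projection from the 384-dim DINOv2 feature to the 256-dim latent z.
projection = nn.Linear(384, 256)

def encode_sketch(sketch_rgb):
    """sketch_rgb: (batch, 3, 224, 224); sides must be divisible by the 14-px patch."""
    with torch.no_grad():
        features = backbone(sketch_rgb)  # (batch, 384) CLS-token features
    return projection(features)          # (batch, 256) latent vector z
```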
3.3. Implicit Representation
We want to learn a generalized representation of 3D eyeglass frames of different shapes. In this section, we learn a Signed Distance Function (SDF), which is a continuous function that maps a 3D point $\mathbf{x} \in \mathbb{R}^3$ to a signed distance $s \in \mathbb{R}$,

$$SDF(\mathbf{x}) = s.$$

The sign represents whether the point is inside (negative) or outside (positive) the watertight surface, and the magnitude represents the distance to that surface. Therefore, we further express the SDF as follows:

$$SDF(\mathbf{x}) = \begin{cases} -\,d(\mathbf{x}, \partial\Omega), & \mathbf{x} \in \Omega, \\ \phantom{-}\,d(\mathbf{x}, \partial\Omega), & \mathbf{x} \notin \Omega, \end{cases}$$

where $\Omega$ denotes the interior of the object, $\partial\Omega$ its watertight surface, and $d(\mathbf{x}, \partial\Omega)$ the distance from $\mathbf{x}$ to that surface. We implicitly define the underlying surface of the object as the 0-level set of the neural function, denoted as $\{\mathbf{x} \in \mathbb{R}^3 \mid SDF(\mathbf{x}) = 0\}$. This implicit surface can be rendered through raycasting or through rasterization of a mesh obtained with, for example, marching cubes.
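For intuition, the sign convention can be checked against an analytic SDF; a minimal example for an origin-centered sphere is shown below.

```python
import torch

def sdf_sphere(points, radius=0.5):
    """Analytic SDF of an origin-centered sphere: negative inside, positive outside."""
    return points.norm(dim=-1) - radius

pts = torch.tensor([[0.0, 0.0, 0.0],   # center: SDF = -0.5 (inside)
                    [0.5, 0.0, 0.0],   # on the surface: SDF = 0.0
                    [1.0, 0.0, 0.0]])  # outside: SDF = +0.5
print(sdf_sphere(pts))  # tensor([-0.5000, 0.0000, 0.5000])
```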
Inspired by Park et al. [45], we represent the object geometry as an SDF field and directly regress the continuous SDF from point samples using deep neural networks. The resulting trained network can predict the SDF value of a given query position, from which we can extract the zero level-set surface by evaluating spatial samples. In practice, we use a Multi-Layer Perceptron (MLP) neural network $f_\theta$ as the decoder. It takes a 3D point $\mathbf{x}$ and the latent vector $\mathbf{z}$ as input and generates the point's distance to the closest surface. We train the parameters $\theta$ of $f_\theta$ to obtain a good approximator of the given SDF:

$$f_\theta(\mathbf{z}, \mathbf{x}) \approx SDF(\mathbf{x}).$$

The training is performed by minimizing the sum over losses between the predicted and real SDF values $s$ of points $\mathbf{x}$ under the following $L_1$ loss function:

$$\mathcal{L}\big(f_\theta(\mathbf{z}, \mathbf{x}), s\big) = \big|\operatorname{clamp}\big(f_\theta(\mathbf{z}, \mathbf{x}), \delta\big) - \operatorname{clamp}(s, \delta)\big|,$$

where $\operatorname{clamp}(x, \delta) := \min\big(\delta, \max(-\delta, x)\big)$ introduces the parameter $\delta$ to control the distance from the surface over which we expect to maintain a metric SDF. Larger values of $\delta$ allow for fast ray tracing, since each sample gives information about safe step sizes. Smaller values of $\delta$ can concentrate network capacity on details near the surface.
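A direct PyTorch transcription of this clamped $L_1$ objective might look as follows; the value of $\delta$ shown is a placeholder, not the setting used in our experiments.

```python
import torch

def clamped_sdf_l1(pred_sdf, gt_sdf, delta=0.1):  # delta: placeholder value
    """Clamped L1 loss between predicted and ground-truth SDF samples.

    Clamping both terms to [-delta, delta] focuses network capacity on the
    region near the surface, as in DeepSDF [45].
    """
    return torch.abs(pred_sdf.clamp(-delta, delta) -
                     gt_sdf.clamp(-delta, delta)).mean()
```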
Once trained, the network implicitly represents the surface as the zero iso-surface of $f_\theta(\mathbf{z}, \cdot)$, which we can visualize using raycasting or marching cubes.
3.4. Minimizing 2D Chamfer Distance
Our primary aim in this section is to enhance the precision of our 3D model of eyeglass frames through an optimization process that aligns it closely with the FDS. We use the 2D Chamfer distance as a metric to measure the alignment between the projection of the reconstructed 3D model and the corresponding FDS.
The optimization begins by identifying the essential points on our 3D mesh $M$ that should align with the contour of the FDS. These points are critical in minimizing the distance between the projected mesh contour and the sketch outline.
To execute this, we employ a mapping function that projects points from 3D space onto the 2D plane of the sketch. We first project the entire 3D mesh onto a binary image $R$. In this projection, pixels representing external contours have a value of zero, while all others have a value of one. We then identify the 3D points on the mesh that project onto zero-valued pixels of the contour image $R$, signifying the contour points $P$ of the mesh.
Similar to Guillard et al.'s [37] approach, we use PyTorch3D [46] to access both the facet IDs and the barycentric coordinates of $M$ that contribute to the contour regions indicated by the binary contour image $R$. The position of each contour point $p \in P$ is then interpolated using the vertices $v_1$, $v_2$, $v_3$ of the associated facet. Since the vertices of the facet are differentiable functions of the latent vector $\mathbf{z}$, the coordinates of $p$ are similarly differentiable.
Applying this calculation to all external contour points, we compile a set of 3D points characterized by

$$P_{3D} = \{\, p \in \mathbb{R}^3 \mid p = b_1 v_1 + b_2 v_2 + b_3 v_3 \,\},$$

where $(b_1, b_2, b_3)$ are the barycentric coordinates of $p$ within its facet. The corresponding 2D projections of $P_{3D}$ can be denoted as

$$P_{2D} = \pi(P_{3D}),$$

where $\pi$ is the perspective projection onto the sketch plane.
To enhance the alignment, we refine the outer contours of the target sketch $S$. This refinement employs a ray-shooting algorithm from all four borders of the image to precisely preserve the first encounter with black pixels, resulting in a refined sketch $S^*$. In this refined sketch, $S^*(p) = 0$ denotes pixels $p$ lying on the contour, while $S^*(p) = 1$ denotes background pixels. This process effectively isolates and highlights the outermost features of the sketch, disregarding interior details.
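One plausible realization of this border ray-shooting step, assuming the sketch is a NumPy array with 0 for strokes and 1 for background, is sketched below; it reflects our reading of the procedure rather than a verbatim excerpt of the implementation.

```python
import numpy as np

def refine_outer_contour(sketch):
    """Keep only the first stroke pixel hit by rays shot from each image border.

    sketch: 2D array, 0 = pen strokes, 1 = background.
    Returns S*: 0 on the outer contour, 1 elsewhere.
    """
    refined = np.ones_like(sketch)
    h, w = sketch.shape
    for i in range(h):                        # rays from the left and right borders
        cols = np.flatnonzero(sketch[i] == 0)
        if cols.size:
            refined[i, cols[0]] = 0           # first hit from the left
            refined[i, cols[-1]] = 0          # first hit from the right
    for j in range(w):                        # rays from the top and bottom borders
        rows = np.flatnonzero(sketch[:, j] == 0)
        if rows.size:
            refined[rows[0], j] = 0           # first hit from the top
            refined[rows[-1], j] = 0          # first hit from the bottom
    return refined
```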
Our objective is to align the filtered sketch $S^*$ and the external contours of the projected mesh, $P_{2D}$, as closely as possible. We define our objective function using a bidirectional 2D Chamfer loss,

$$\mathcal{L}_{\mathrm{CHD}} = \sum_{p \in P_{2D}} \min_{q \in C(S^*)} \lVert p - q \rVert_2^2 \;+\; \sum_{q \in C(S^*)} \min_{p \in P_{2D}} \lVert q - p \rVert_2^2,$$

where $C(S^*)$ denotes the set of contour pixels of $S^*$. Here, $P_{2D}$ is the projection of the 3D vertices in $P_{3D}$, and the coordinates of the 3D vertices in $P_{3D}$ are differentiable with respect to $\mathbf{z}$. Since $f_\theta$ is differentiable, so are their 2D projections in $P_{2D}$ and the loss $\mathcal{L}_{\mathrm{CHD}}$ as a whole.
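The loss reduces to nearest-neighbor squared distances in both directions. A dense pairwise-distance sketch is given below; averaging instead of summing the two terms is an illustrative normalization choice.

```python
import torch

def chamfer_2d(proj_points, contour_px):
    """Bidirectional 2D Chamfer loss.

    proj_points: (N, 2) differentiable projections P_2D of the mesh contour points.
    contour_px:  (M, 2) fixed coordinates of contour pixels extracted from S*.
    """
    d = torch.cdist(proj_points, contour_px)    # (N, M) pairwise distances
    return (d.min(dim=1).values ** 2).mean() + \
           (d.min(dim=0).values ** 2).mean()
```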
3.5. Optimizing Based on Symmetry
In this section, our goal is to refine the 3D eyeglass frame mesh model by leveraging the symmetrical properties of the object. To compensate for the missing information from unseen views, we assume the eyeglass frame exhibits perfect bilateral symmetry, where the left part mirrors the right. This assumption holds under the condition that deformations or asymmetrical designs are minimal. We exploit this symmetry by generating a mirrored version of the input sketch $S$, which we denote as $S'$, achieved by a horizontal flip transformation. We obtain the corresponding camera poses by applying a fixed transformation matrix, which reflects the original camera extrinsic parameters across the axis of symmetry; the intrinsic parameters are unchanged. The mirror image acts as a pseudo-projected image for a hypothetical viewpoint, enabling us to infer the appearance and geometry from angles not captured in the original view.
Using $S'$, we generate a filtered sketch $S'^*$, akin to the process described in Section 3.4 for the original sketch $S$. To optimize the model, we extend our bidirectional 2D Chamfer loss function to incorporate the symmetry,

$$\mathcal{L}_{\mathrm{sym}} = \mathcal{L}_{\mathrm{CHD}}\big(S^*, P_{2D}\big) + \mathcal{L}_{\mathrm{CHD}}\big(S'^*, P'_{2D}\big),$$

where $P'_{2D}$ denotes the external contour points of the mesh projected from the mirrored viewpoint.
This comprehensive approach helps minimize discrepancies between the 3D model and both the observed and the inferred 2D representations.
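Concretely, the symmetric term only requires flipping the sketch contour pixels, reflecting the camera extrinsics, and summing the two Chamfer terms. The sketch below assumes symmetry about the $x = 0$ plane and a hypothetical `render_contour_points` helper; it reuses the `chamfer_2d` sketch from Section 3.4.

```python
import torch

# Reflection across the x = 0 symmetry plane (homogeneous 4x4 matrix).
MIRROR = torch.diag(torch.tensor([-1.0, 1.0, 1.0, 1.0]))

def symmetric_chamfer_loss(mesh, contour_px, width, extrinsics, render_contour_points):
    """Bidirectional Chamfer loss on the original and mirrored views.

    contour_px: (M, 2) contour pixels of the filtered sketch S*.
    width: sketch width in pixels, used for the horizontal flip.
    render_contour_points: hypothetical helper returning projected contours P_2D.
    """
    # Original view.
    loss = chamfer_2d(render_contour_points(mesh, extrinsics), contour_px)

    # Mirrored view: flip the sketch horizontally and reflect the extrinsics.
    flipped_px = torch.stack([width - 1 - contour_px[:, 0], contour_px[:, 1]], dim=1)
    mirrored = render_contour_points(mesh, extrinsics @ MIRROR)
    return loss + chamfer_2d(mirrored, flipped_px)
```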
3.6. Implementation Details
Train. We implement our models using PyTorch Lightning [47]. For the encoder, we freeze the DINOv2 model and train only the final linear layer, which projects the feature dimension from 384 to 256. For the decoder, we set the clamping parameter $\delta$ (Section 3.3) and use a feed-forward network consisting of eight fully connected layers, each with dropout applied. All internal layers are 512-dimensional and utilize ReLU activations, while the output layer employs a tanh activation to regress the SDF values. We found batch normalization [48] unstable during training, so we applied weight normalization [49] instead. We use the Adam optimizer [50] with a fixed learning rate and train the network for 400,000 iterations across four NVIDIA RTX 3090 GPUs, with a batch size of 16 per GPU.
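A minimal decoder matching the stated configuration (eight weight-normalized, 512-dimensional fully connected layers with ReLU and dropout, and a tanh output) might look as follows; the dropout rate and the absence of skip connections are our assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class SDFDecoder(nn.Module):
    """Eight weight-normalized FC layers; input is [z, x], output is an SDF value."""

    def __init__(self, latent_dim=256, hidden_dim=512, num_layers=8, dropout=0.2):
        super().__init__()
        dims = [latent_dim + 3] + [hidden_dim] * (num_layers - 1) + [1]
        layers = []
        for i in range(num_layers):
            layers.append(weight_norm(nn.Linear(dims[i], dims[i + 1])))
            if i < num_layers - 1:              # hidden layers: ReLU + dropout
                layers += [nn.ReLU(), nn.Dropout(dropout)]
        self.net = nn.Sequential(*layers)

    def forward(self, z, points):
        """z: (B, 256) latent vectors; points: (B, 3) query positions."""
        return torch.tanh(self.net(torch.cat([z, points], dim=-1)))
```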
Figure 5 illustrates the training process of a model that takes an input sketch and employs an encoder–decoder network to predict SDF values. The training focuses on minimizing the discrepancy between the predicted and actual SDF values.
Inference. At inference time, we first retrieve the latent vector $\mathbf{z}$ from the encoder, as in the training process. Next, we voxelize the 3D space using a grid $G$ consisting of $N^3$ voxels and compute the 3D coordinates of each voxel $v$ based on the voxel size and origin. Finally, we process the entire voxel grid in batches, along with the latent vector $\mathbf{z}$, using the decoder to obtain the predicted SDF value of each voxel $v$.
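A sketch of this inference procedure, pairing the decoder above with `skimage.measure.marching_cubes` to extract the zero iso-surface, is given below; the grid resolution and spatial extent are placeholders rather than our experimental settings.

```python
import torch
from skimage import measure

@torch.no_grad()
def extract_mesh(decoder, z, n=128, bound=1.0, batch_size=65536):
    """Evaluate the decoder on an n^3 grid and run marching cubes on the SDF volume.

    z: (1, 256) latent vector from the encoder.
    """
    axis = torch.linspace(-bound, bound, n)
    grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing='ij'), dim=-1)
    points = grid.reshape(-1, 3)                   # (n^3, 3) voxel centers

    sdf = []
    for chunk in points.split(batch_size):         # batch the voxel queries
        z_rep = z.expand(chunk.shape[0], -1)       # broadcast latent to the batch
        sdf.append(decoder(z_rep, chunk).squeeze(-1))
    volume = torch.cat(sdf).reshape(n, n, n).numpy()

    # Zero level set -> triangle mesh; spacing converts voxel units to world units.
    verts, faces, _, _ = measure.marching_cubes(volume, level=0.0,
                                                spacing=(2 * bound / (n - 1),) * 3)
    return verts - bound, faces                    # recenter vertices around the origin
```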
Figure 6 outlines the inference process, which begins with the encoder’s latent vector. The 3D space is then voxelized, and the decoder is used to predict SDF values for each voxel, enabling the reconstruction of the 3D model. The process is further refined by incorporating symmetry, where a mirrored sketch and a symmetrical loss function are applied to enhance the accuracy of the 3D eyeglass frame model.
5. Discussion
5.1. Reconstructing Non-Symmetrical Designs
This study focuses on the 3D reconstruction of symmetrical designs, exemplified by objects such as eyeglass frames, which are prevalent in industrial and architectural domains. Symmetry is a defining feature in designs ranging from automobiles and monitors to iconic architectural structures like the Louvre Pyramid. However, non-symmetrical designs are equally pervasive in these fields, characterized by their unique geometries and deliberate asymmetry. Such designs often embody a designer's intention to create dynamic, organic forms that deviate from traditional symmetrical patterns [52]. In industrial design, non-symmetrical examples include gaming mice designed for one-handed use, while architectural masterpieces like the Guggenheim Museum illustrate the aesthetic and structural appeal of asymmetry.
Building on our approach for the 3D reconstruction of symmetrical designs, we propose extending deep learning techniques to address the challenges associated with reconstructing non-symmetrical 3D shapes. Specifically, our method for symmetrical designs can be adapted to handle non-symmetrical ones by incorporating advanced computational solutions. Deep learning algorithms, which have demonstrated robust performance in reconstructing symmetrical 3D shapes, can be further refined to manage the irregularities and complexities of non-symmetrical geometries. For example, adding specialized layers to the neural network architecture could enhance its capacity to process the variations intrinsic to non-symmetrical designs. Additionally, leveraging Generative Adversarial Networks (GANs) can improve the model’s ability to infer and reconstruct missing or ambiguous components, facilitating the creation of accurate 3D models even from incomplete or imprecise sketches. By integrating these strategies, our approach becomes more versatile, enabling the effective reconstruction of both symmetrical and non-symmetrical 3D shapes while accommodating diverse design features.
5.2. Limitations and Future Work
While our method has demonstrated significant promise in interpreting FDS for 3D reconstruction, it is important to acknowledge its current limitations, particularly in handling incomplete, distorted, or ambiguous sketches. Our primary objective is to enhance the interpretation of sketches created by designers during the early stages of conceptualization. However, a key challenge remains: existing approaches, including ours, face difficulties when dealing with incomplete sketches that exhibit extreme distortions, ambiguous contours, overlapping strokes, or missing elements. This is especially problematic when sketches contain support lines or partial details, as current methods often merge these lines with adjacent curves, thereby losing critical contextual information required for accurate 3D reconstruction.
To address these challenges, we propose several strategies aimed at enhancing the robustness of our method in handling incomplete, distorted, or ambiguous sketches. Previous works have already tackled similar issues. For instance, Bessmeltsev and Solomon [53] introduced an image-processing technique to extract clean vector curves from incomplete sketches; by analyzing edge features and line continuity, their method can infer missing elements, which is essential for reconstructing 3D shapes from incomplete sketches. Similarly, Favreau et al. [54] employed deep learning algorithms to iteratively process sketch images, refining blurred or incomplete parts to generate more accurate 3D models. Additionally, Liu et al. [13] proposed a machine learning-based approach for reasoning over sketches, which predicts missing or ambiguous parts by leveraging design patterns learned from a large dataset, enabling their system to infer plausible continuations of partial sketches. We believe that integrating such machine learning methods into our pipeline could significantly enhance our model's ability to handle missing components, ambiguous details, and extreme distortions, leading to more reliable 3D reconstructions even when the input sketch is incomplete, unclear, or contains overlapping strokes.
In addition to these technical improvements, we propose incorporating a user feedback loop into the system. This would allow the system to interact with the designer when encountering ambiguous or incomplete sketches, prompting for clarifications or additional input. This iterative feedback process would help ensure that the final 3D model aligns more closely with the designer’s intent. By adopting this collaborative approach, our system would not only become more robust in handling incomplete, distorted, or ambiguous sketches but also improve the overall user experience, making the process of converting freehand sketches into 3D models more intuitive and efficient.
6. Conclusions
This study investigated the potential of deep learning techniques for the 3D reconstruction of eyeglass frames from FDS. By employing advanced neural network architectures and integrating symmetry optimization, the proposed method effectively extracts detailed 3D information from FDS, achieving enhanced accuracy and reliability in the reconstruction process. This approach demonstrates significant advancements in capturing fine details and accommodating shape variations, offering a practical solution for industrial applications. Notably, this research contributes a novel framework that combines self-supervised learning and implicit representations, setting a new standard for AI-driven 3D modeling workflows in design contexts.
Despite these achievements, challenges remain in addressing the diversity and size of datasets, as well as the computational demands of the reconstruction process. These factors currently limit the scalability of the method for real-time and high-complexity applications. Future research should prioritize the development of more efficient models to reduce computational complexity, alongside expanding dataset diversity to enhance the generalizability of the approach. Additionally, exploring novel neural network architectures and integrating user-driven feedback mechanisms could further improve reconstruction accuracy and usability. These enhancements would bridge existing gaps, enabling broader adoption in design workflows and extending the applicability of AI-driven 3D modeling technologies.