Article

IndustrialNeRF: Accurate 3D Industrial Digital Twin Based on Integrating Neural Radiance Fields Using Unsupervised Learning

1 Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
2 Beijing Science and Technology Achievements Transformation Service Center, Beijing 100084, China
3 Department of Mechanical Engineering, Tsinghua University, Beijing 100084, China
4 College of Cyberspace Security, Guangzhou University, Guangzhou 510006, China
5 Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 5336; https://doi.org/10.3390/app14125336
Submission received: 5 May 2024 / Revised: 9 June 2024 / Accepted: 13 June 2024 / Published: 20 June 2024

Abstract

In the era of Industry 4.0, digital twin technology is revolutionizing traditional manufacturing paradigms. However, the adoption of this technology in modern manufacturing systems is fraught with challenges due to the scarcity of labeled data. Specifically, existing supervised machine learning algorithms rely on voluminous training data, which constrains their applicability in real-world production settings. This paper introduces an unsupervised 3D reconstruction approach tailored for industrial applications, aimed at bridging the data void in creating digital twin models. Our proposed model ingests high-resolution 2D images and autonomously reconstructs precise 3D digital twin models without manual annotations or prior knowledge. Through comparisons with multiple baseline models, we demonstrate the superiority of our method in terms of accuracy, speed, and generalization capabilities. This research not only offers an efficient approach to industrial 3D reconstruction but also paves the way for the widespread adoption of digital twin technology in manufacturing.

1. Introduction

Digital twins in the industrial realm serve as a vital bridge linking the physical and digital worlds within manufacturing systems. This innovative approach paves the way for real-time data exchange, monitoring, and advanced analytics, which are crucial for Industry 4.0 practices [1]. Among these, the three-dimensional (3D) digital twin models stand out for their significance in creating a holistic representation of products and processes.
These 3D models play an instrumental role in various industrial applications, especially in the areas of product design, simulation, testing, and production [2,3]. By providing a comprehensive digital representation of a physical asset, these models enable businesses to make better-informed decisions, reduce system downtime, and enhance product quality [4]. These models not only offer engineers and designers an intuitive tool that facilitates observing and refining designs in a virtual setting but also present the entire production team with a unified reference. Such references are invaluable in ensuring precision and consistency throughout the actual production process, leading to optimized workflows and improved product lifecycles [5,6]. Digital twin technology has also been applied to underground utility tunnels for disaster management and infrastructure maintenance, demonstrating its versatility and effectiveness in diverse applications [7,8].
Moreover, integrating digital twin models with other technologies like artificial intelligence and machine learning can further enhance predictive maintenance, resource optimization, and product customization [9]. In essence, 3D digital twin models herald immense opportunities for next-generation manufacturing systems, steering the industry towards a future characterized by smart, sustainable, and efficient practices [10,11].
However, the adoption of digital twin technology in current manufacturing systems encounters numerous challenges. One of the primary impediments stems from the conservative nature of manufacturing enterprises, which results in a scarcity of labeled data. This limitation hinders the application of state-of-the-art supervised machine learning algorithms known for their remarkable performance [12]. Additionally, the strong dependency of deep learning models on training datasets and their limited generalization pose significant problems. These models require the operational scenario to align closely with the data distribution from which they were trained [13]. In manufacturing systems, substantial data heterogeneity exists due to variations between different factories or even equipment, leading to exorbitant costs for data recollection. Such costs significantly obstruct the widespread adoption of deep learning models in real-world manufacturing applications [14].
Furthermore, with the advent of sophisticated technologies such as autonomous driving, robotics, and advanced automation systems in the manufacturing domain [15], there is an escalating demand for 3D digital twin models to achieve higher precision and real-time generation. Thus, there is an urgent need to develop models capable of automatic data labeling or those that can operate through unsupervised learning [16]. Enhancing the utility and generalization of models using unsupervised learning, and improving their accuracy and computational speed, remain focal points for both the academic and industrial communities.
In this paper, we introduce IndustrialNeRF, a novel approach that leverages neural radiance fields to achieve unsupervised 3D reconstruction from 2D images, significantly reducing the cost of generating digital twins in industrial settings. Our method addresses the critical challenge of data scarcity in manufacturing by eliminating the dependency on labeled datasets, thereby streamlining the creation of digital twin models. By autonomously reconstructing precise 3D models from high-resolution images, IndustrialNeRF enhances the efficiency and accuracy of digital twin generation. This capability is crucial for real-time monitoring, predictive maintenance, and optimizing production processes, which are fundamental components of Industry 4.0.
Our key contributions are as follows:
  • Unsupervised 3D Reconstruction: We develop an unsupervised learning framework that obviates the need for manual annotations, facilitating cost-effective 3D model generation.
  • Modified Neural Radiance Fields: By enhancing the NeRF architecture, we achieve accurate and robust 3D reconstructions suitable for diverse industrial applications.
  • High-Resolution Model Generation: Our approach efficiently reconstructs high-resolution 3D digital twin models from 2D images, enabling real-time monitoring and advanced analytics in manufacturing environments.

2. Related Work

2.1. Digital Twins

Digital twins, at their core, represent a fusion of the physical and digital realms, enabling real-time data analysis and system monitoring. The term “digital twin” was initially introduced by Grieves, emphasizing its role as a virtual representation of a physical product or process, facilitating iterative development, testing, and optimization in a digital environment before real-world implementation [17].
In various industrial sectors, the adoption of digital twins has proved revolutionary. These virtual models have found extensive applications in sectors ranging from aerospace to manufacturing, assisting in tasks such as predictive maintenance, real-time monitoring, and fault detection. The integration of digital twins with IoT technologies has amplified operational efficiency, reducing system downtimes and refining product quality [3]. The real power of digital twins is realized when they are used to simulate real-world scenarios, thereby enabling pre-emptive problem identification and solution implementation.
Despite their immense potential, the deployment of digital twins in large-scale industrial systems is not without challenges. One major hurdle is the significant initial investment required in terms of infrastructure and expertise [5,18]. Furthermore, achieving synchronization between the physical and digital entities in real-time environments remains a technical challenge. Addressing data security concerns and ensuring seamless integration with existing systems are other challenges faced by industries. However, ongoing research and technological advancements promise solutions to these limitations, ensuring the wider adoption of digital twins in the future [19].

2.2. Unsupervised Learning

Unsupervised learning stands as one of the fundamental paradigms in machine learning, primarily focusing on deriving patterns and structures from data without labeled responses [20]. In contrast to supervised learning, which relies on labeled datasets to make predictions or classifications, unsupervised learning delves into the intrinsic structure of data. Its ability to work with vast amounts of unlabeled data makes it particularly suitable for exploratory data analysis and feature discovery [21]. Figure 1 shows the process of unsupervised learning.
Unsupervised learning has found myriad applications across industries, especially in scenarios with vast unlabeled datasets. In the manufacturing realm, it is utilized for tasks like anomaly detection in machinery, where normal operations create a pattern and deviations from this pattern signal potential issues [22]. Additionally, it aids in segmenting market data for targeted product releases, allowing companies to better understand customer behaviors and preferences. Clustering methods, a subset of unsupervised learning, can be instrumental in grouping similar products or processes in manufacturing, optimizing resource allocation and production strategies [23].
While unsupervised learning presents novel opportunities, its deployment is not without challenges. A predominant concern is the interpretability of results, especially when complex algorithms like deep neural networks are employed [24]. The absence of labeled data can sometimes lead to spurious patterns or associations that may not have practical relevance. Ensuring data quality, handling high-dimensional data, and choosing the appropriate algorithm for specific tasks are further hurdles in the practical application of unsupervised learning. Researchers and practitioners are exploring ways to integrate domain knowledge to enhance the reliability and relevance of unsupervised models.
Recent advances in unsupervised learning have been driven by deep learning, with techniques like autoencoders and generative adversarial networks (GANs) leading the charge [25]. Variational autoencoders (VAEs) and transformers [26,27], especially in the context of natural language processing, have showcased the potential of unsupervised techniques in handling complex datasets and tasks. These state-of-the-art methods are continually being refined, promising even more potent applications in the future.

2.3. Neural Radiance Fields (NeRFs)

Neural Radiance Fields, commonly referred to as NeRFs, represent a novel approach in the realm of 3D reconstruction, employing deep neural networks to model volumetric scenes using sparse sets of 2D images [28]. This method capitalizes on the capability of neural networks to encode complex data relationships, thereby allowing for the generation of intricate 3D scenes. One of the main attractions of NeRFs is their ability to produce high-fidelity and continuously viewable scenes without the necessity for mesh-based representations, setting them apart from conventional 3D modeling techniques.
NeRFs have heralded significant advancements in the domain of 3D modeling and reconstruction. By leveraging a scene’s sparse radiance samples and optimizing over viewing angles and light directions, NeRFs synthesize novel views with impressive accuracy. Their applications span a diverse range, from virtual reality and augmented reality to film production and architectural visualization. Within an industrial context, NeRFs can be especially advantageous for product modeling and prototyping, offering a more detailed and adjustable representation compared to traditional methods [29].
While NeRFs present remarkable capabilities, their implementation, particularly in large-scale industrial scenarios, poses challenges. The computational intensity of NeRF algorithms, given their reliance on deep neural networks, can lead to long rendering times, which may be infeasible for real-time applications [30]. Additionally, the quality of the reconstruction can be contingent on the diversity and number of 2D input images. Researchers are actively exploring methods to enhance the efficiency of NeRF implementations and to ensure consistent quality across diverse input conditions. Recent research has aimed at improving the computational efficiency of NeRFs, leading to variants like FastNeRF and MicroNeRF that target real-time and embedded applications [31]. There is also ongoing exploration into combining NeRFs with other 3D reconstruction techniques, aiming to harness the strengths of multiple methods. Given the rapid advancements in the field, NeRFs and their derivatives are poised to redefine the landscape of 3D modeling and visualization in the coming years.

3. Methodology

3.1. Initialization of Point Cloud and Color Discrimination Using NeRF

As shown in Figure 2, given the input view–camera pose pairs $(I, P)$ and a resolution hyper-parameter $r$, an initial cubic point set $X = S(r)$ of point count $r^3$ is generated, which facilitates axis-based indexing. Subsequently, based on the camera pose $P$, the initial point set $X$ is transformed from the world coordinate system to the camera coordinate system of the view.
The points in the camera coordinate system undergo position encoding as per the formula referenced in [32] given by the following:

$$\mathrm{PosE}(p_i) = \mathrm{Concat}(p_i^x, p_i^y, p_i^z)$$

where

$$p_i^x[2j] = \sin\!\left(\frac{\alpha x_i}{\beta^{6j/C}}\right)$$

and

$$p_i^x[2j+1] = \cos\!\left(\frac{\alpha x_i}{\beta^{6j/C}}\right)$$

Here, $C$ represents the output dimension of the position encoding, which is a multiple of 6, and $j \in [0, C/6]$.
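For illustration, a minimal PyTorch sketch of this per-axis sinusoidal encoding follows. The scale parameters `alpha` and `beta` and the default dimension `C` are placeholder assumptions; only the functional form is taken from the equations above.

```python
import torch

def positional_encoding(points: torch.Tensor, C: int = 36,
                        alpha: float = 100.0, beta: float = 1000.0) -> torch.Tensor:
    """Sinusoidal position encoding of (N, 3) camera-space points.

    C is the total output dimension (a multiple of 6): each axis contributes
    C/3 entries, i.e. C/6 sin/cos pairs indexed by j. alpha and beta are
    assumed magnitude/wavelength scales (placeholders, not from the paper).
    """
    j = torch.arange(C // 6, dtype=points.dtype)
    freq = beta ** (6.0 * j / C)                 # beta^(6j/C)
    phase = alpha * points.unsqueeze(-1) / freq  # (N, 3, C/6)
    enc = torch.stack((phase.sin(), phase.cos()), dim=-1)
    return enc.reshape(points.shape[0], -1)      # (N, C): Concat over x, y, z
```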
Using a CNN encoder $E$, features from the view are extracted into a feature map $F = E(I)$. Each point from the point set in the camera coordinate system is then projected onto this feature map, and the feature channels of the corresponding pixel, $\pi(F, P, X)$, serve as the view-associated feature of the point.

Finally, these image-associated features, combined with the position encoding features, are input to the NeRF network $N$, yielding voxel density and color values:

$$(\sigma, \mathrm{rgb}) = N\big(\pi(F, P, X),\ \mathrm{PosE}(X)\big)$$

3.2. Three-Dimensional Coarse-to-Fine Sampling

The initial point set, once voxel densities are discerned, can be perceived as a 3D probability distribution $PDF(x, y, z)$ proportional to $\sigma$. Inverse Transform Sampling (ITS) can be applied over this distribution. Given the ease of axis-based indexing of the initial point cloud, the distribution can be formulated as follows:

$$PDF(x, y, z) = \frac{\sigma_i}{\sum_j \sigma_j}$$

where $p_i = (x, y, z)$, and

$$PDF(x, y) = \sum_z PDF(x, y, z)$$

$$PDF(x) = \sum_y PDF(x, y)$$

$$PDF(y \mid x) = \frac{PDF(x, y)}{PDF(x)}$$

$$PDF(z \mid x, y) = \frac{PDF(x, y, z)}{PDF(x, y)}$$
Using the ITS process, refined point cloud densities and colors are obtained.
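For concreteness, a minimal NumPy sketch of this axis-factorized inverse transform sampling over the $r \times r \times r$ density grid is shown below; function and variable names are illustrative rather than taken from our implementation.

```python
import numpy as np

def its_sample(sigma: np.ndarray, n_samples: int, rng=None) -> np.ndarray:
    """Inverse Transform Sampling over an (r, r, r) density grid.

    Factorizes PDF(x, y, z) = PDF(x) * PDF(y|x) * PDF(z|x,y) and samples
    axis by axis. Returns integer grid indices; mapping them back to world
    coordinates (via the grid spacing) is omitted here.
    """
    rng = rng or np.random.default_rng()
    pdf = sigma / sigma.sum()        # PDF(x, y, z) = sigma_i / sum_j sigma_j
    pdf_xy = pdf.sum(axis=2)         # PDF(x, y), marginal over z
    pdf_x = pdf_xy.sum(axis=1)       # PDF(x), marginal over y and z

    xs = rng.choice(len(pdf_x), size=n_samples, p=pdf_x)
    out = np.empty((n_samples, 3), dtype=np.int64)
    for k, x in enumerate(xs):
        y = rng.choice(pdf.shape[1], p=pdf_xy[x] / pdf_xy[x].sum())  # PDF(y|x)
        z = rng.choice(pdf.shape[2], p=pdf[x, y] / pdf[x, y].sum())  # PDF(z|x,y)
        out[k] = (x, y, z)
    return out
```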

3.3. Two-Dimensional Neural Radiance Rendering

Points from Section 3.1 and Section 3.2 with $\sigma > 0$ are termed cloud points. Using the target camera pose $P$, neural radiance points $R = R(P)$ are sampled and the cloud points are transformed into the target camera coordinate system. For each neural radiance point, the k-Nearest Neighbor algorithm captures at most $k$ points within its radius $r$. These points are then aggregated through inverse distance weighting to obtain voxel density–color pairs. Using the neural rendering formula cited in [28], all neural points on the neural radiance ray are aggregated to retrieve the target pixel color.
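A per-pixel sketch of this procedure is given below, combining the kNN gathering, inverse-distance weighting, and the standard NeRF compositing from [28]. The names, shapes, and the way points outside the radius are masked are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def render_pixel(ray_pts, deltas, cloud_xyz, cloud_sigma, cloud_rgb,
                 k: int = 8, radius: float = 0.05) -> torch.Tensor:
    """Render one pixel from (M, 3) radiance points sampled along its ray.

    deltas: (M,) step sizes between consecutive samples. For each radiance
    point, gather up to k cloud points within `radius`, blend their
    (sigma, rgb) by inverse-distance weights, then alpha-composite along
    the ray as in NeRF [28].
    """
    dist = torch.cdist(ray_pts, cloud_xyz)               # (M, N) distances
    knn_d, knn_i = dist.topk(k, dim=-1, largest=False)   # k nearest neighbors
    w = (knn_d < radius).float() / (knn_d + 1e-8)        # inverse-distance weights
    w = w / w.sum(dim=-1, keepdim=True).clamp(min=1e-8)  # normalize per point
    sigma = (w * cloud_sigma[knn_i]).sum(dim=-1)         # (M,) blended density
    rgb = (w.unsqueeze(-1) * cloud_rgb[knn_i]).sum(dim=-2)  # (M, 3) blended color
    alpha = 1.0 - torch.exp(-sigma * deltas)             # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    return ((trans * alpha).unsqueeze(-1) * rgb).sum(dim=0)  # composited color
```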

3.4. Point Cloud Confidence Based on Rerendering

For direct generation of a 3D point cloud instead of 2D images, the algorithm mentioned in Section 3.3 is typically not used. Our methodology facilitates the inclusion of multiple source views to generate a more precise point cloud, necessitating the introduction and aggregation of confidences.
By leveraging the principle of the algorithm from Section 3.3 for introducing and aggregating confidence across multiple views, our work possesses two unique characteristics:
  • The precision of the point cloud augments with an increase in the number of input views.
  • Different input views contribute varying confidences to the same point in the point cloud, and these contributions are interpretable.
This implies that our point cloud can achieve high precision given sufficient input and computational resources.
Given the exponential decrease in light intensity when a ray passes through a 3D object's surface, this intensity, represented as follows:

$$T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$$

should be lower for parts of the 3D point cloud reconstruction that are obscured from view and higher for visible portions.
The voxel confidence contributed by the neural radiance point $i$ to the point cloud point $j$ is given by the following:

$$T_{ij} = T_i \, \frac{W_{ij}}{\max(W_i)}$$

where $W_{ij}$ denotes the inverse distance weight.
By performing 2D neural radiance rendering on the point cloud using the camera poses from the source views, voxel confidences for these viewpoints are derived, alongside the neural rendered images of the source views. The color confidence compares the rendered pixel $O_i$ with the source pixel $I_i$:

$$S_i = \max\big(1 - \lVert I_i - O_i \rVert_L,\ 0\big)$$

The aggregated point cloud confidence, combining voxel and color confidences, is given by the following:

$$t_{ij} = S_i \, T_i \, \frac{W_{ij}}{\max(W_i)}$$
In the case of multi-view aggregation, the confidence for each point is aggregated using the maximum value.
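As a sketch of how these quantities might be computed for a single source-view ray, consider the following; the choice of the L1 norm for the color term and all tensor names are assumptions for illustration.

```python
import torch

def point_confidences(sigma, delta, W, I_src, O_render):
    """Voxel, color, and aggregated confidences along one source-view ray.

    sigma, delta: (M,) densities and step sizes at the radiance points.
    W:            (M, K) inverse-distance weights linking radiance point i
                  to its K neighboring cloud points j.
    I_src, O_render: (M, 3) source pixel colors and their rerendered colors.
    """
    # Transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j)
    acc = torch.cumsum(sigma * delta, dim=0)
    T = torch.exp(-torch.cat([torch.zeros(1), acc[:-1]]))
    # Voxel confidence T_ij = T_i * W_ij / max(W_i)
    T_ij = T.unsqueeze(-1) * W / W.max(dim=-1, keepdim=True).values
    # Color confidence S_i = max(1 - ||I_i - O_i||, 0); L1 norm assumed here
    S = (1.0 - (I_src - O_render).abs().sum(dim=-1)).clamp(min=0.0)
    # Aggregated confidence t_ij; multi-view fusion takes the maximum over views
    return S.unsqueeze(-1) * T_ij
```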

4. Experiment

4.1. Dataset Description

The primary dataset used in our experiments is a custom-assembled collection of industrial 3D models, which we shall refer to as the “Industrial 3D Twin Dataset”. This dataset comprises approximately 100 distinct 3D models. Unlike conventional datasets available in the public domain, our collection has been meticulously curated and sourced through various online platforms (https://3dwarehouse.sketchup.com/). A significant preprocessing step involved in the preparation of this dataset was denoising, ensuring that our 3D models maintained a high level of fidelity and were devoid of any artifacts or anomalies often seen in publicly sourced data.
To facilitate the training process without the direct need for the 3D models, high-resolution images were captured from various angles of each 3D model. These images are of 512 × 512 resolution, serving as the primary input modality for our network, thereby simulating a real-world scenario where 3D models might not be readily available, but images can be easily sourced.
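As an illustration of how such multi-view captures can be arranged, the following sketch builds look-at camera poses on a ring around a model. The radius, elevation, and view count are arbitrary placeholders, not the values used for our dataset.

```python
import numpy as np

def look_at_pose(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Build a 4x4 camera-to-world pose looking from cam_pos toward target."""
    forward = (target - cam_pos) / np.linalg.norm(target - cam_pos)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2] = right, down, forward
    pose[:3, 3] = cam_pos
    return pose

# Example: 24 viewpoints on a ring at 30 degrees elevation around the origin
radius, elev = 2.0, np.deg2rad(30.0)
poses = [look_at_pose(radius * np.array([np.cos(a) * np.cos(elev),
                                         np.sin(a) * np.cos(elev),
                                         np.sin(elev)]))
         for a in np.linspace(0.0, 2.0 * np.pi, 24, endpoint=False)]
```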

4.2. Experimental Environment and Model Implementation

All experiments were conducted on a workstation equipped with an NVIDIA RTX 3090 GPU. This high-performance GPU ensured swift training times and efficient resource utilization.
Our network architecture was designed and implemented using PyTorch v2.3.0. The choice of PyTorch was motivated by its extensive library of pre-built modules and its capability to handle complex neural network designs. The hyperparameters and other pertinent training details are discussed in the subsequent sections.

4.3. Evaluation Metric: Earth Mover’s Distance (EMD)

The Earth Mover’s Distance (EMD), also known as the Wasserstein distance, is an effective measure for comparing two probability distributions. In the context of our study, where we aim to evaluate the similarity between the reconstructed 3D point cloud and the ground truth, EMD provides a nuanced understanding of the differences in terms of both geometry and density.
The primary reason behind selecting EMD as our evaluation metric is its ability to provide a more holistic view of the discrepancies between distributions, as opposed to simpler metrics that might only measure point-wise differences. EMD measures the minimum cost to transform one distribution into the other, which resonates well with the geometric nature of our task.
Mathematically, for two discrete distributions $P$ and $Q$, the EMD is defined as follows:

$$EMD(P, Q) = \frac{\sum_{i,j} w_{i,j}\, d(p_i, q_j)}{\sum_{i,j} w_{i,j}}$$

where $w_{i,j}$ represents the optimal flow between points $p_i$ from $P$ and $q_j$ from $Q$, and $d(p_i, q_j)$ is the Euclidean distance between these points.
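For equal-size point clouds with uniform weights, the optimal flow reduces to a one-to-one matching, so the EMD can be computed with a minimum-cost assignment. The SciPy-based sketch below illustrates this; exact solvers scale poorly, so evaluations on large clouds typically rely on approximate solvers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd(P: np.ndarray, Q: np.ndarray) -> float:
    """EMD between two equal-size point clouds of shape (N, 3).

    With uniform weights and |P| == |Q|, the optimal flow is a one-to-one
    assignment, and the EMD is the mean matched Euclidean distance.
    """
    cost = cdist(P, Q)                        # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)  # minimum-cost perfect matching
    return float(cost[rows, cols].mean())

# Sanity check: a cloud compared with itself has zero distance
rng = np.random.default_rng(0)
P = rng.random((128, 3))
assert emd(P, P) == 0.0
```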

4.4. Baseline Models

To ensure a comprehensive evaluation of the proposed methodology, the results are compared with three widely recognized supervised 3D reconstruction algorithms that serve as baseline models:
  • 3D-R2N2: A recurrent neural network (the 3D Recurrent Reconstruction Neural Network) that combines convolutional encoders with an LSTM-based fusion module to predict the 3D structure of an object from one or more 2D images. Developed by researchers at Stanford, this model has become a benchmark in 3D reconstruction from 2D images.
  • AtlasNet: Proposed by Groueix et al., this model reconstructs the 3D geometry of objects as a collection of parametric 2D patches, or atlases. It leverages a PointNet-style encoder, demonstrating high proficiency in generating detailed 3D shapes.
  • Occupancy Networks: This model represents 3D geometry implicitly, as the continuous decision boundary of a neural classifier that predicts, for any point in space, the probability that it lies inside the object.
The rationale behind selecting these specific models as baselines is their prevalence in the domain of 3D reconstruction and their known efficacy in various scenarios. By juxtaposing our results with these established models, we aim to provide a clear benchmark for the capabilities of our proposed approach.

4.5. Experimental Results and Analysis

4.5.1. Model Performance Comparison

One of the primary metrics chosen for evaluating the performance of the proposed method against baseline models is Earth Mover’s Distance (EMD). This section provides a comparative analysis based on the EMD metric for all the models.
The results, as presented in Table 1, indicate that the proposed model outperforms the baseline models in terms of the EMD score. Notably, the EMD score for the proposed model is lower than that of the 3D-R2N2, AtlasNet, and Occupancy Networks, suggesting a more accurate 3D reconstruction from the 2D images.
It is essential to understand that, while the EMD score offers valuable insights into the performance of the models, the specific application context and other qualitative factors also play a crucial role in determining the effectiveness of the 3D reconstruction.

4.5.2. Qualitative Analysis

Further to the quantitative results, visual inspections of the reconstructed 3D models also revealed the superiority of our proposed model. The baseline models, especially in the context of complex industrial objects, occasionally showed artifacts or lacked some minor details. In contrast, our proposed model maintained a consistent quality across diverse object categories. A sample of the visual results of our method is presented in Figure 3, and the corresponding loss scores are shown in Table 2.

4.5.3. Discussion

The proposed model’s unique architecture and robustness, developed through training on high-resolution industrial images, set it apart from the baseline models. Unlike these models, which may produce artifacts or overlook minor details, our approach consistently delivers high-quality outputs across a wide range of object categories.
A comparative analysis with baseline models highlights the advantages of our method. As shown in Table 1, our model achieved a significantly lower EMD score (0.0121665) compared to 3D-R2N2 (0.0482132), AtlasNet (0.0759724), and Occupancy Networks (0.0618715). This indicates a more precise 3D reconstruction from 2D images. Our model’s superior performance in RGB 2D (MSE) and Depth 2D (MSE) further underscores its effectiveness.
Visual inspections further validate these findings. The reconstructed 3D models from our approach exhibited fewer artifacts and more accurately preserved intricate details than those from baseline models. This qualitative assessment confirms the quantitative results, showcasing the robustness and consistency of our method across different industrial objects.
Importantly, our methodology goes beyond generating Digital Shadows by creating true Digital Twin models. These comprehensive virtual representations of physical assets encompass real-time data integration and dynamic updating capabilities. Our model reconstructs accurate 3D geometries and embeds detailed color and texture information, providing a holistic and interactive digital counterpart of physical objects. This comprehensive approach instills confidence in the accuracy and utility of the Digital Twin models.
Our model’s ability to significantly reduce the cost and effort of generating Digital Twins makes advanced manufacturing and maintenance solutions more accessible and practical. The rapid generation of accurate and high-resolution Digital Twin models will enhance operational efficiencies and support decision-making processes in real-world industrial applications. By comparing our results with other studies, the contributions and improvements offered by our model become even more evident, instilling optimism about its potential impact in the field of 3D reconstruction and digital twin technology.

5. Conclusions

This study introduced a novel approach to 3D reconstruction using high-resolution industrial images, leveraging the principles of neural radiance fields in an unsupervised learning framework. Through extensive experimentation, our proposed model, IndustrialNeRF, demonstrated superior performance against notable baseline models such as 3D-R2N2, AtlasNet, and Occupancy Networks, as measured by the Earth Mover’s Distance (EMD) metric. The results highlight the potential of our method in transforming 2D images into detailed 3D reconstructions, particularly for complex industrial models.

Limitations and Future Work

While our model has shown promising results, it is essential to recognize its limitations. In terms of computational efficiency, although our model exhibits high accuracy, the computational demand, particularly in terms of memory usage, might be challenging for real-time applications or systems with limited resources. Additionally, the model was primarily trained and tested on industrial datasets, and its performance on diverse and more generic datasets remains to be explored. Handling larger or more intricate 3D models might require further optimizations, as the current architecture might not scale linearly with increasing complexity. Moreover, the model, though trained with denoised data, might be sensitive to noisy or imperfect input images, and robustness against such imperfections is crucial for real-world deployments.
Given these limitations, future work can focus on enhancing computational efficiency through model pruning or adopting more lightweight architectures to facilitate real-time applications. Expanding the dataset scope to ensure the model’s adaptability across various scenarios and object types is essential for improving its generalizability. Introducing noise augmentation during training can enhance robustness against imperfect input images, making the model more resilient to real-world conditions. Investigating multimodal input fusion techniques to leverage diverse data types (e.g., combining visual, depth, and thermal data) could result in more detailed and accurate 3D reconstructions.
In conclusion, while our proposed IndustrialNeRF model has successfully demonstrated the feasibility of unsupervised 3D reconstruction from high-resolution industrial images, it opens avenues for further research and development. Addressing the identified limitations will enhance the practical applicability of our approach, making it a more robust and scalable solution for various industrial and real-world applications. As technology progresses, we anticipate that 3D reconstruction will become increasingly integral to numerous fields, driving continuous innovation and improvement in this domain.

Author Contributions

Conceptualization, Z.N. and L.Z.; methodology, Z.N.; software, J.X. and H.L.; validation, H.Z., J.X. and H.L.; formal analysis, H.Z.; investigation, H.Z.; resources, L.Z.; data curation, H.Z.; writing—original draft preparation, H.Z.; writing—review and editing, Z.N. and L.Z.; visualization, J.X.; supervision, Z.N.; project administration, L.Z.; funding acquisition, H.Z. and Z.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China under Grant No. 2020YFB1713200, and the National Natural Science Foundation of China under Grant number 72188101.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors have no relevant financial or non-financial interests to disclose.

References

  1. Pang, T.Y.; Pelaez Restrepo, J.D.; Cheng, C.T.; Yasin, A.; Lim, H.; Miletic, M. Developing a digital twin and digital thread framework for an ‘Industry 4.0’ Shipyard. Appl. Sci. 2021, 11, 1097. [Google Scholar] [CrossRef]
  2. Liu, M.; Fang, S.; Dong, H.; Xu, C. Review of digital twin about concepts, technologies, and industrial applications. J. Manuf. Syst. 2021, 58, 346–361. [Google Scholar] [CrossRef]
  3. Qi, Q.; Tao, F. Digital twin and big data towards smart manufacturing and industry 4.0: 360 degree comparison. IEEE Access 2018, 6, 3585–3593. [Google Scholar] [CrossRef]
  4. Lattanzi, L.; Raffaeli, R.; Peruzzini, M.; Pellicciari, M. Digital twin for smart manufacturing: A review of concepts towards a practical industrial implementation. Int. J. Comput. Integr. Manuf. 2021, 34, 567–597. [Google Scholar] [CrossRef]
  5. Haw, J.; Sing, S.L.; Liu, Z.H. Digital twins in design for additive manufacturing. Mater. Today Proc. 2022, 70, 352–357. [Google Scholar] [CrossRef]
  6. Schleich, B.; Anwer, N.; Mathieu, L.; Wartzack, S. Shaping the digital twin for design and production engineering. CIRP Ann. 2017, 66, 141–144. [Google Scholar] [CrossRef]
  7. Lee, J.; Lee, Y.; Park, S.; Hong, C. Implementing a Digital Twin of an Underground Utility Tunnel for Geospatial Feature Extraction Using a Multimodal Image Sensor. Appl. Sci. 2023, 13, 9137. [Google Scholar] [CrossRef]
  8. Lee, J.; Lee, Y.; Hong, C. Development of Geospatial Data Acquisition, Modeling, and Service Technology for Digital Twin Implementation of Underground Utility Tunnel. Appl. Sci. 2023, 13, 4343. [Google Scholar] [CrossRef]
  9. Feng, K.; Ji, J.; Zhang, Y.; Ni, Q.; Liu, Z.; Beer, M. Digital twin-driven intelligent assessment of gear surface degradation. Mech. Syst. Signal Process. 2023, 186, 109896. [Google Scholar] [CrossRef]
  10. Kenett, R.S.; Bortman, J. The digital twin in Industry 4.0: A wide-angle perspective. Qual. Reliab. Eng. Int. 2022, 38, 1357–1366. [Google Scholar] [CrossRef]
  11. Yao, X.; Zhou, J.; Zhang, J.; Boër, C.R. From intelligent manufacturing to smart manufacturing for industry 4.0 driven by next generation artificial intelligence and further on. In Proceedings of the 2017 5th International Conference on Enterprise Systems (ES), Beijing, China, 22–24 September 2017; pp. 311–318. [Google Scholar]
  12. Saufi, S.R.; Ahmad, Z.A.B.; Leong, M.S.; Lim, M.H. Challenges and opportunities of deep learning models for machinery fault detection and diagnosis: A review. IEEE Access 2019, 7, 122644–122662. [Google Scholar] [CrossRef]
  13. Arjovsky, M.; Bottou, L.; Gulrajani, I.; Lopez-Paz, D. Invariant risk minimization. arXiv 2019, arXiv:1907.02893. [Google Scholar]
  14. Zhao, J.; Papapetrou, P.; Asker, L.; Boström, H. Learning from heterogeneous temporal data in electronic health records. J. Biomed. Inform. 2017, 65, 105–119. [Google Scholar] [CrossRef] [PubMed]
  15. Leng, J.; Sha, W.; Lin, Z.; Jing, J.; Liu, Q.; Chen, X. Blockchained smart contract pyramid-driven multi-agent autonomous process control for resilient individualised manufacturing towards Industry 5.0. Int. J. Prod. Res. 2023, 61, 4302–4321. [Google Scholar] [CrossRef]
  16. Bojanowski, P.; Joulin, A. Unsupervised learning by predicting noise. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 517–526. [Google Scholar]
  17. Grieves, M. Intelligent digital twins and the development and management of complex systems. Digit. Twin 2022, 2, 8. [Google Scholar] [CrossRef]
  18. Roy, R.B.; Mishra, D.; Pal, S.K.; Chakravarty, T.; Panda, S.; Chandra, M.G.; Pal, A.; Misra, P.; Chakravarty, D.; Misra, S. Digital twin: Current scenario and a case study on a manufacturing process. Int. J. Adv. Manuf. Technol. 2020, 107, 3691–3714. [Google Scholar] [CrossRef]
  19. Aheleroff, S.; Xu, X.; Zhong, R.Y.; Lu, Y. Digital twin as a service (DTaaS) in industry 4.0: An architecture reference model. Adv. Eng. Inform. 2021, 47, 101225. [Google Scholar] [CrossRef]
  20. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
  21. Dike, H.U.; Zhou, Y.; Deveerasetty, K.K.; Wu, Q. Unsupervised learning based on artificial neural network: A review. In Proceedings of the 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS), Shenzhen, China, 25–27 October 2018; pp. 322–327. [Google Scholar]
  22. Borghesi, A.; Bartolini, A.; Lombardi, M.; Milano, M.; Benini, L. Anomaly detection using autoencoders in high performance computing systems. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 9428–9433. [Google Scholar]
  23. García-Escudero, L.A.; Gordaliza, A.; Matrán, C.; Mayo-Iscar, A. A review of robust clustering methods. Adv. Data Anal. Classif. 2010, 4, 89–109. [Google Scholar] [CrossRef]
  24. Chakraborty, S.; Tomsett, R.; Raghavendra, R.; Harborne, D.; Alzantot, M.; Cerutti, F.; Srivastava, M.; Preece, A.; Julier, S.; Rao, R.M.; et al. Interpretability of deep learning models: A survey of results. In Proceedings of the 2017 IEEE Smartworld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (Smartworld/SCALCOM/UIC/ATC/CBDcom/IOP/SCI), San Francisco, CA, USA, 4–8 August 2017; pp. 1–6. [Google Scholar]
  25. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar]
  26. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
  27. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  28. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
  29. Šlapak, E.; Pardo, E.; Dopiriak, M.; Maksymyuk, T.; Gazda, J. Neural radiance fields in the industrial and robotics domain: Applications, research opportunities and use cases. arXiv 2023, arXiv:2308.07118. [Google Scholar]
  30. Sengupta, S.; Gu, J.; Kim, K.; Liu, G.; Jacobs, D.W.; Kautz, J. Neural inverse rendering of an indoor scene from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8598–8607. [Google Scholar]
  31. Garbin, S.J.; Kowalski, M.; Johnson, M.; Shotton, J.; Valentin, J. FastNeRF: High-fidelity neural rendering at 200FPS. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 14346–14355. [Google Scholar]
  32. Zhang, R.; Wang, L.; Wang, Y.; Gao, P.; Li, H.; Shi, J. Starting from Non-Parametric Networks for 3D Point Cloud Analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5344–5353. [Google Scholar]
Figure 1. Unsupervised learning process.
Figure 2. The pipeline of our IndustrialNeRF model from image to digital twin generation. The process begins with capturing images and extracting point cloud data. These inputs are encoded through the View Encoder and processed by the MLP coarse and MLP fine networks. Multiple loss functions are used to optimize the 3D reconstruction. The resulting high-fidelity digital twin is suitable for stress analysis and object detection applications.
Figure 3. Experiment result samples.
Table 1. Comparison of EMD scores among various models. Lower EMD values indicate better performance.

Model                 EMD Score
Ours                  0.0121665
3D-R2N2               0.0482132
AtlasNet              0.0759724
Occupancy Networks    0.0618715
Table 2. Loss scores of the visual results.

Loss                  Score
Point Cloud (EMD)     0.0121665
RGB 2D (MSE)          0.0986291
Depth 2D (MSE)        0.0117503