Dimensionality Reduction for the Real-Time Light-Field View Synthesis of Kernel-Based Models
Abstract
1. Introduction
- Plenoptic abilities: A unique advantage of light-field recordings is that they can capture direction-dependent and complex light phenomena, such as refractions, smoke, fire, and reflections off curved surfaces.
- Camera-captured material: The system should be able to create representations from real-world camera-captured images or videos. This comes with a few challenges, such as the lack of accurate depth information and dealing with calibration information (e.g., regular sampling grids cannot be assumed).
- Six degrees of freedom: The representation should be fit for allowing a user to move in three translational and three rotational degrees of freedom, synthesizing views accordingly.
- Large immersive volumes: The representation should allow the user to explore a large volume containing significant occluders (e.g., entering different rooms).
- Panoramic: As the user has complete freedom to look around, content needs to be available in an immersive way in all viewing directions.
- Low latency and low decoding complexity: Due to the interactive 6DoF nature, it is not known a priori how the user will consume the content. For example, the user may behave unpredictably, wander off, walk far away, or make abrupt movements. The representation and system have to keep up with these non-predetermined ways of consuming content.
- Streamable representation: Ideally, one would like to deliver these interactive experiences remotely. This requires the representation at hand to be streamable while respecting the interactive nature of content consumption.
- Real-time playback: The system and representation should be fast enough to deliver all of the above aspects in real time and on commodity hardware. For VR applications, the system should ideally render at 90 frames per second with dual (left- and right-eye) output; a back-of-the-envelope time budget is sketched after this list.
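To make the real-time requirement concrete, the sketch below turns the 90 fps dual-output target into a per-frame and per-view time budget. It is an illustrative calculation only; the constants and variable names are ours and are not part of the proposed system.

```python
# Minimal sketch: per-frame time budget implied by the real-time VR requirement.
# Assumes the 90 fps target mentioned in the list above; names are illustrative.

TARGET_FPS = 90          # VR refresh-rate target
VIEWS_PER_FRAME = 2      # dual (left/right eye) output

frame_budget_ms = 1000.0 / TARGET_FPS                    # ~11.1 ms per stereo frame
per_view_budget_ms = frame_budget_ms / VIEWS_PER_FRAME   # ~5.6 ms per synthesized view

print(f"Frame budget: {frame_budget_ms:.1f} ms, "
      f"per-view budget: {per_view_budget_ms:.1f} ms")
```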
2. Related Works
2.1. Light-Field-Related Works
2.1.1. Three-Dimensional Graphics
2.1.2. Depth + Image-Based Techniques
2.1.3. Fourier Disparity Layers
2.1.4. Neural Radiance Fields (NeRFs)
2.1.5. Kernel-Based Methods
2.2. Summary of SMoE for Light Fields
2.2.1. Preliminaries
2.2.2. Related Works in SMoE for Light Fields
3. Proposed Rendering Method
3.1. Constructing a View-Specific 2D Representation
3.2. GPU Implementation
4. Evaluation
4.1. Experiment Setup
4.1.1. Dataset
- barbershop (SILVR, cuboid/right face): The “barbershop” scene from the Blender Institute short movie Agent 327: Operation Barbershop [32] (licensed CC-BY). Camera grid: , spaced 11 cm apart.
- lone monk (SILVR, cuboid/front face): This scene was created by Carlo Bergonzini from Monorender (licensed CC-BY). Camera grid: , spaced 20 cm apart.
- zen garden (SILVR, cuboid/back face): This scene was created by Julie Artois and Martijn Courteaux from IDLab MEDIA (licensed CC-BY 4.0). Camera grid: , spaced 10 cm apart.
- kitchen (MPEG, frame 20): This dataset by Orange Labs was software-generated and features a kitchen with a table set for breakfast, an owl, and a spider [31]. This dataset covers a rather small capture area. Camera grid: . Resolution .
- painter (MPEG, frame 246): This dataset was captured with real cameras and features a man in front of a canvas [30]. This dataset covers a rather small capture area and contains few specularities. Camera grid: . Resolution .
4.1.2. Reconstruction Tasks
- ‘spin’: A trajectory where the camera moves from left to right while panning from right to left.
- ‘push–pull’: A trajectory where the camera moves several meters backward while the field of view narrows, similar to the cinematic push–pull or dolly zoom effect.
- ‘zoom-to-xyz’: A trajectory where the camera both moves forward and slightly narrows the field of view, towards a certain object of interest in the scene. (A simple parameterization of these trajectories is sketched after this list.)
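The following sketch illustrates how such evaluation trajectories can be parameterized as simple interpolations of camera position, yaw, and field of view. The function names, the 3 m travel distance, and the field-of-view values are illustrative assumptions; they are not taken from the paper's actual trajectory definitions.

```python
import numpy as np

def lerp(a, b, t):
    """Linearly interpolate between a and b for t in [0, 1]."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (1.0 - t) * a + t * b

def spin(t, start_pos, end_pos, start_yaw_deg, end_yaw_deg):
    """'spin': translate left-to-right while panning right-to-left."""
    return lerp(start_pos, end_pos, t), float(lerp(start_yaw_deg, end_yaw_deg, t))

def push_pull(t, start_pos, back_dir, distance_m, start_fov_deg, end_fov_deg):
    """'push-pull': move backward while narrowing the field of view (dolly zoom)."""
    pos = np.asarray(start_pos, float) + t * distance_m * np.asarray(back_dir, float)
    return pos, float(lerp(start_fov_deg, end_fov_deg, t))

# Example: sample 5 poses of a push-pull over 3 m, narrowing the FoV from 90 to 60 degrees
# (all values are assumptions for illustration).
for t in np.linspace(0.0, 1.0, 5):
    pos, fov = push_pull(t, start_pos=[0, 0, 0], back_dir=[0, 0, -1],
                         distance_m=3.0, start_fov_deg=90.0, end_fov_deg=60.0)
    print(f"t={t:.2f}  pos={pos}  fov={fov:.1f} deg")
```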
4.1.3. SMoE CPU Implementation Details
4.1.4. SMoE GPU Implementation Details
4.1.5. NeRF Experiment Details
4.2. Results and Discussion
4.2.1. Speed Versus Quality Trade-off
4.2.2. Correctness
4.2.3. Comparison with NeRFs
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proposed Reduction Method
- Appendix A.1 describes the used coordinate spaces and notational conventions.
- Appendix A.2 gives a high-level overview and the motivation behind the reduction method.
- Appendix A.3 formalizes the concepts introduced in the overview.
- Appendix A.4 derives a new two-dimensional SMoE model which approximates the function derived in Appendix A.3.
Appendix A.1. Definitions, Conventions and Notations
- Cameras look in the direction , such that corresponds to the world-coordinate-space direction in which the camera looks. Their up direction is the direction .
- The camera sensor coordinate space is two-dimensional (after dividing away the third homogeneous component). The sensor x and y coordinates both range from to . (A sketch of this projection under an assumed convention follows this list.)
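Because the exact direction symbols and coordinate ranges are not legible in this extracted text, the sketch below only illustrates one common pinhole convention: the camera looks along a `forward` vector with a given `up` vector, and sensor x/y coordinates are normalized to [−1, 1] at the edges of the field of view. The names and the chosen convention are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def world_to_sensor(p_world, cam_pos, forward, up, fov_deg):
    """Project a world-space point to 2D normalized sensor coordinates.

    Assumed convention (illustrative, not taken from the paper): the camera looks
    along `forward`, `up` is the camera's up direction, and sensor x/y are in
    [-1, 1] at the edges of the field of view.
    """
    f = np.asarray(forward, float); f /= np.linalg.norm(f)
    u = np.asarray(up, float)
    r = np.cross(f, u); r /= np.linalg.norm(r)      # camera "right" axis
    u = np.cross(r, f)                               # re-orthogonalized up

    d = np.asarray(p_world, float) - np.asarray(cam_pos, float)
    x_cam, y_cam, z_cam = d @ r, d @ u, d @ f        # camera-space coordinates
    if z_cam <= 0:
        raise ValueError("point is behind the camera")

    scale = 1.0 / np.tan(np.radians(fov_deg) / 2.0)  # maps the FoV edge to +/-1
    return np.array([scale * x_cam / z_cam, scale * y_cam / z_cam])

# Example: a point 2 m straight in front of the camera projects to the sensor center.
print(world_to_sensor([0, 0, 2], cam_pos=[0, 0, 0], forward=[0, 0, 1],
                      up=[0, 1, 0], fov_deg=90.0))   # -> [0. 0.]
```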
Appendix A.2. Overview and Motivation of the Reduction
Appendix A.3. Intersecting the Camera Plane
Appendix A.4. Reducing to a 2D Model
Appendix A.4.1. Solving the Minimization to Obtain
Appendix A.4.2. Calculating the Derivative G
Appendix A.4.3. Transforming a Component
Appendix A.5. Numerical Stability
Appendix A.6. Approximation Compensation
Appendix A.7. Obtaining z-Depth
Appendix A.8. Dropping Components
References
- Gao, R.; Qi, Y. A Brief Review on Differentiable Rendering: Recent Advances and Challenges. Electronics 2024, 13, 3546.
- Müller, T.; Evans, A.; Schied, C.; Keller, A. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. ACM Trans. Graph. 2022, 41, 102:1–102:15.
- Wen, C.; Zhang, Y.; Li, Z.; Fu, Y. Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
- Lin, C.H.; Wang, O.; Russell, B.C.; Shechtman, E.; Kim, V.G.; Fisher, M.; Lucey, S. Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Rogge, S.; Schiopu, I.; Munteanu, A. Depth Estimation for Light-Field Images Using Stereo Matching and Convolutional Neural Networks. Sensors 2020, 20, 6188.
- Zerman, E.; Ozcinar, C.; Gao, P.; Smolic, A. Textured Mesh vs Coloured Point Cloud: A Subjective Study for Volumetric Video Compression. In Proceedings of the 12th International Conference on Quality of Multimedia Experience (QoMEX 2020), Athlone, Ireland, 26–28 May 2020.
- Microsoft. Microsoft Mixed Reality Capture Studio. 2022. Available online: https://news.microsoft.com/source/features/work-life/microsoft-mixed-reality-capture-studios-create-holograms-to-educate-and-entertain/ (accessed on 14 October 2024).
- 8i. 8i Studio. 2022. Available online: https://8i.com (accessed on 14 October 2024).
- Buehler, C.; Bosse, M.; McMillan, L.; Gortler, S.; Cohen, M. Unstructured Lumigraph Rendering. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’01), New York, NY, USA, 12–17 August 2001; pp. 425–432.
- Kellnhofer, P.; Jebe, L.; Jones, A.; Spicer, R.; Pulli, K.; Wetzstein, G. Neural Lumigraph Rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
- Overbeck, R.S.; Erickson, D.; Evangelakos, D.; Pharr, M.; Debevec, P. A system for acquiring, processing, and rendering panoramic light field stills for virtual reality. In Proceedings of the SIGGRAPH Asia 2018 Technical Papers, Tokyo, Japan, 4–7 December 2018; Volume 37, p. 15.
- Broxton, M.; Flynn, J.; Overbeck, R.; Erickson, D.; Hedman, P.; Duvall, M.; Dourgarian, J.; Busch, J.; Whalen, M.; Debevec, P. Immersive light field video with a layered mesh representation. ACM Trans. Graph. 2020, 39, 15.
- Boyce, J.M.; Dore, R.; Dziembowski, A.; Fleureau, J.; Jung, J.; Kroon, B.; Salahieh, B.; Vadakital, V.K.M.; Yu, L. MPEG Immersive Video Coding Standard. Proc. IEEE 2021, 109, 1521–1536.
- Le Pendu, M.; Guillemot, C.; Smolic, A. A Fourier Disparity Layer Representation for Light Fields. IEEE Trans. Image Process. 2019, 28, 5740–5753.
- Dib, E.; Le Pendu, M.; Guillemot, C. Light Field Compression Using Fourier Disparity Layers. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019.
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Lecture Notes in Computer Science, Volume 12346; pp. 405–421.
- Qin, S.; Xiao, J.; Ge, J. Dip-NeRF: Depth-Based Anti-Aliased Neural Radiance Fields. Electronics 2024, 13, 1527.
- Dong, B.; Chen, K.; Wang, Z.; Yan, M.; Gu, J.; Sun, X. MM-NeRF: Large-Scale Scene Representation with Multi-Resolution Hash Grid and Multi-View Priors Features. Electronics 2024, 13, 844.
- Barron, J.T.; Mildenhall, B.; Tancik, M.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P.P. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. arXiv 2021, arXiv:2103.13415.
- Barron, J.T.; Mildenhall, B.; Verbin, D.; Srinivasan, P.P.; Hedman, P. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. arXiv 2022, arXiv:2111.12077.
- Hu, W.; Wang, Y.; Ma, L.; Yang, B.; Gao, L.; Liu, X.; Ma, Y. Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023.
- Verhack, R.; Sikora, T.; Lange, L.; Jongebloed, R.; Van Wallendael, G.; Lambert, P. Steered mixture-of-experts for light field coding, depth estimation, and processing. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 1183–1188.
- Verhack, R.; Sikora, T.; Van Wallendael, G.; Lambert, P. Steered Mixture-of-Experts for Light Field Images and Video: Representation and Coding. IEEE Trans. Multimed. 2020, 22, 579–593.
- Bochinski, E.; Jongebloed, R.; Tok, M.; Sikora, T. Regularized gradient descent training of steered mixture of experts for sparse image representation. In Proceedings of the 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 3873–3877.
- Liu, B.; Zhao, Y.; Jiang, X.; Wang, S. Three-dimensional Epanechnikov mixture regression in image coding. Signal Process. 2021, 185, 108090.
- Verhack, R.; Sikora, T.; Lange, L.; Van Wallendael, G.; Lambert, P. A universal image coding approach using sparse steered Mixture-of-Experts regression. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 25–28 September 2016; pp. 2142–2146.
- Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. 2023, 42, 139.
- Huang, B.; Yu, Z.; Chen, A.; Geiger, A.; Gao, S. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields. In Proceedings of the SIGGRAPH 2024 Conference Papers, Denver, CO, USA, 28 July–1 August 2024.
- Courteaux, M.; Artois, J.; De Pauw, S.; Lambert, P.; Van Wallendael, G. SILVR: A Synthetic Immersive Large-Volume Plenoptic Dataset. In Proceedings of the 13th ACM Multimedia Systems Conference (MMSys ’22), New York, NY, USA, 14–17 June 2022.
- Doyden, D.; Boisson, G.; Gendrot, R. [MPEG-I Visual] New Version of the Pseudo-Rectified Technicolor Painter Content; Document ISO/IEC JTC1/SC29/WG11 MPEG/M43366; Technicolor-Armand Langlois: Ljubljana, Slovenia, 2018.
- Jung, J.; Boissonade, P. [MPEG-I Visual] Proposition of New Sequences for Windowed-6DoF Experiments on Compression, Synthesis, and Depth Estimation; Document ISO/IEC JTC1/SC29/WG11 MPEG/M43318; Orange Labs: Anaheim, CA, USA, 2018.
- Blender Institute. Agent 327: Operation Barbershop. 2017. Available online: https://studio.blender.org/films/agent-327/ (accessed on 14 October 2024).
- Davis, A.; Levoy, M.; Durand, F. Unstructured Light Fields. Comput. Graph. Forum 2012, 31, 305–314.
| Scene | Task | PSNR | GPU Time |
|---|---|---|---|
| barbershop | push–pull | dB | ms |
| barbershop | spin | dB | ms |
| lone monk | push–pull | dB | ms |
| lone monk | spin | dB | ms |
| zen garden | push–pull | dB | ms |
| zen garden | spin | dB | ms |
| kitchen | push–pull | dB | ms |
| kitchen | spin | dB | ms |
| kitchen | zoom-to-sink | dB | ms |
| kitchen | zoom-to-table | dB | ms |
| painter | push–pull | dB | ms |
| painter | spin | dB | ms |
| painter | zoom-to-painting-1 | dB | ms |
| painter | zoom-to-painting-2 | dB | ms |
| Scene | Resolution | SMoE PSNR | SMoE SSIM | SMoE Time | SMoE Speed | NeRF Base PSNR | NeRF Base SSIM | NeRF Base Time | NeRF Base Speed | NeRF Small PSNR | NeRF Small SSIM | NeRF Small Time | NeRF Small Speed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| barbershop | | 26.6 dB | 0.85 | 1.2 ms | 222 MP/s | 31.1 dB | 0.95 | 65 ms | 4.2 MP/s | 26.3 dB | 0.81 | 21 ms | 12.6 MP/s |
| lone monk | | 24.9 dB | 0.78 | 1.1 ms | 234 MP/s | 27.9 dB | 0.88 | 52 ms | 5.5 MP/s | 23.6 dB | 0.70 | 16 ms | 16.4 MP/s |
| zen garden | | 29.2 dB | 0.78 | 1.0 ms | 259 MP/s | 32.0 dB | 0.88 | 112 ms | 2.4 MP/s | 28.4 dB | 0.73 | 43 ms | 6.2 MP/s |
| kitchen | | 28.6 dB | 0.82 | 2.4 ms | 865 MP/s | 36.5 dB | 0.95 | 799 ms | 2.6 MP/s | 27.0 dB | 0.75 | 229 ms | 9.1 MP/s |
| painter | | 29.6 dB | 0.82 | 2.4 ms | 935 MP/s | 36.2 dB | 0.93 | 1425 ms | 1.6 MP/s | 29.2 dB | 0.78 | 650 ms | 3.4 MP/s |
| Scene | ΔPSNR vs. NeRF Base | Speedup vs. NeRF Base | ΔPSNR vs. NeRF Small | Speedup vs. NeRF Small |
|---|---|---|---|---|
| barbershop | −4.5 dB | | +0.3 dB | |
| lone monk | −3.0 dB | | +1.3 dB | |
| zen garden | −2.9 dB | | +0.8 dB | |
| kitchen | −7.9 dB | | +1.6 dB | |
| painter | −6.6 dB | | +0.4 dB | |
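The speedup values in the table above did not survive extraction, but the ΔPSNR column follows directly from the quality and timing numbers in the preceding results table, and a speedup can be estimated the same way. The sketch below assumes that "speedup" means the ratio of NeRF render time to SMoE render time for the same scene; small deviations (e.g., zen garden) stem from the rounded values shown in the tables.

```python
# Minimal sketch: deriving Delta-PSNR and an estimated speedup from the per-scene
# results table above. The values below are copied from that table; interpreting
# "speedup" as a render-time ratio is our assumption.

results = {
    # scene: (SMoE PSNR dB, SMoE time ms, NeRF-Base PSNR dB, NeRF-Base time ms,
    #         NeRF-Small PSNR dB, NeRF-Small time ms)
    "barbershop": (26.6, 1.2, 31.1, 65.0, 26.3, 21.0),
    "lone monk":  (24.9, 1.1, 27.9, 52.0, 23.6, 16.0),
    "zen garden": (29.2, 1.0, 32.0, 112.0, 28.4, 43.0),
    "kitchen":    (28.6, 2.4, 36.5, 799.0, 27.0, 229.0),
    "painter":    (29.6, 2.4, 36.2, 1425.0, 29.2, 650.0),
}

for scene, (p_smoe, t_smoe, p_base, t_base, p_small, t_small) in results.items():
    d_base, speedup_base = p_smoe - p_base, t_base / t_smoe
    d_small, speedup_small = p_smoe - p_small, t_small / t_smoe
    print(f"{scene:11s}  vs Base: {d_base:+.1f} dB, {speedup_base:.0f}x   "
          f"vs Small: {d_small:+.1f} dB, {speedup_small:.0f}x")
```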
| Scene | SMoE | NeRF Base | NeRF Small |
|---|---|---|---|
| barbershop | 3.6 MB | 24 MB | 2.8 MB |
| lone monk | 3.0 MB | 24 MB | 1.8 MB |
| zen garden | 3.1 MB | 24 MB | 2.5 MB |
| kitchen | 3.6 MB | 24 MB | 1.6 MB |
| painter | 3.6 MB | 23 MB | 1.1 MB |