Article

TGA-GS: Thermal Geometrically Accurate Gaussian Splatting

1 State Key Laboratory of Information Photonics and Optical Communications, School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
3 School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
4 State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing 100871, China
5 School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(9), 4666; https://doi.org/10.3390/app15094666
Submission received: 14 March 2025 / Revised: 21 April 2025 / Accepted: 22 April 2025 / Published: 23 April 2025

Abstract

Novel view synthesis and 3D reconstruction have been extensively studied. Three-dimensional Gaussian Splatting (3DGS) has gained popularity due to its rapid training and real-time rendering capabilities. However, RGB imaging is highly dependent on ideal illumination conditions. In low-light situations such as at night or in the presence of occlusions, RGB images often suffer from blurred contours or even complete failure in imaging, which severely restricts the application of 3DGS in such scenarios. Thermal imaging technology, on the other hand, serves as an effective complement. Thermal images are solely influenced by heat sources and are immune to illumination conditions. This unique property enables them to clearly identify the contour information of objects in low-light environments. Nevertheless, thermal images exhibit significant limitations in presenting texture details due to their sensitivity to temperature variations rather than surface texture features. To capitalize on the strengths of both, we propose thermal geometrically accurate Gaussian Splatting (TGA-GS), a novel Gaussian Splatting model. TGA-GS is designed to leverage RGB and thermal information to generate high-quality meshes in low-light conditions. Meanwhile, given low-resolution thermal images and low-light RGB images as inputs, our method can generate high-resolution thermal and RGB images from novel viewpoints. Moreover, we also provide a real thermal imaging dataset captured with a handheld thermal infrared camera. This not only enriches the information content of the images but also provides a more reliable data basis for subsequent computer vision tasks in low-light scenarios.

1. Introduction

The field of scene reconstruction and novel view synthesis based on visible light has achieved remarkable progress, with applications spanning various domains. These applications include creating immersive experiences for users in virtual reality (VR), aiding vehicles in environmental perception and path planning for autonomous driving, assisting robots in better understanding and interacting with their environments, supporting architectural design and monitoring in the construction industry, digital twinning, and even playing crucial roles in specialized fields such as materials research and electronic component inspection.
The advent of Neural Radiance Fields (NeRFs) [1] brings new perspectives and methods to traditional 3D reconstruction and novel view synthesis. NeRFs utilize multi-layer perceptrons (MLPs) to simulate scene radiance fields through differentiable volume rendering, enabling the learning of 3D scene representations from a set of 2D images and achieving realistic view synthesis. However, NeRFs suffer from slow training and rendering speeds, which to some extent limit their applications in scenarios demanding real-time performance.
This limitation spurs the development of 3D Gaussian Splatting (3DGS) [2], which uniquely transforms the existing technical framework. By representing scenes using a multitude of 3D Gaussians, 3DGS can achieve real-time rendering while maintaining visual fidelity comparable to NeRFs, leading to a significant leap in the efficiency of 3D reconstruction and novel view synthesis. Nevertheless, most current 3DGS methods primarily process RGB image data under ideal lighting conditions. In real-world scenarios such as nighttime, low-light indoor environments, or situations with significant occlusions, RGB images face numerous challenges. Due to the photon-starved effect [3], performance significantly deteriorates under low-light conditions, leading to increased image noise, reduced contrast, blurred object contours, and sometimes the inability to acquire sufficient effective image information, greatly limiting the effectiveness and reconstruction accuracy of 3DGS technology in these special scenarios.
At the same time, thermal imaging, as a crucial imaging modality, offers irreplaceable advantages. Thermal images primarily reflect the temperature distribution on the surface of objects, capturing mid-wave infrared (MWIR, 3–5 μm) and long-wave infrared (LWIR, 8–14 μm) radiation emitted by objects according to their temperature and emissivity [4]. Unlike passive RGB sensors, thermal sensors provide illumination-invariant detection, enabling clear identification of object contours and approximate shapes even in complete darkness, smoke/fog [5], and extreme weather conditions [6], and thus providing crucial complementary information for robust scene understanding in low-light environments. With these advantages, thermal imaging finds wide application in many fields. In military and defense [7,8,9,10], it aids in target detection and recognition; in post-disaster rescue scenarios [11,12,13], it helps rescue teams quickly locate survivors; in environmental protection [14,15,16,17,18], it monitors pollution in water bodies and the atmosphere; in the construction industry [19,20,21,22,23,24,25], it detects thermal defects in building structures; in agriculture [18,26,27,28,29,30,31,32,33], it helps monitor crop growth conditions; in industry [6,34], it is used for equipment fault diagnosis; in the medical field [35,36,37], it assists in disease detection and diagnosis; and in autonomous driving [38,39], it provides additional perceptual information for vehicles under adverse lighting conditions. Additionally, thermal imaging also plays a significant role in computer vision tasks such as object detection and tracking [40,41,42], recognition and segmentation [43,44,45], motion estimation [46], and SLAM [47].
Despite these benefits, thermal imaging poses inherent challenges for 3D reconstruction: (1) lower spatial resolution (usually 640 × 480) compared to RGB sensors [48], (2) lack of photometric texture details due to thermal diffusion effects [49,50,51], and (3) spectral mismatch between thermal radiation peaks (8–14 μm) and visible light (380–750 nm), which complicates the fusion of cross-modal data [52].
In recent years, although some thermal image methods based on NeRFs have been proposed [53,54,55,56,57,58,59,60], attempting to leverage the strong modeling capabilities of NeRFs to handle thermal image data, the inherently slow rendering speed and implicit scene representation of NeRFs still constrain the practical effectiveness of these methods. With the widespread application of 3DGS methods, thermal image methods based on Gaussian Splatting (GS) models have also emerged [61,62,63,64]. However, most of these methods mainly focus on novel view synthesis under normal lighting conditions and fail to fully exploit the potential of thermal imaging in low-light environments.
Given the current research status, our study, based on an in-depth examination and analysis of existing technologies, introduces an innovative GS-based method called thermal geometrically accurate Gaussian Splatting (TGA-GS). TGA-GS simultaneously processes thermal and RGB data from thermal sensors, leveraging the complementary information of the two modalities to reconstruct high-quality scenes in low-light environments. Notably, our method possesses a unique advantage: even when provided with low-resolution thermal and RGB images under low-light conditions, it can generate high-resolution thermal and RGB images, offering a high-quality data foundation for subsequent image analysis and processing. Furthermore, to better support and validate our research, we construct a new dataset. This dataset includes RGB and thermal images captured in low-light environments, providing rich and valuable data resources for 3D reconstruction and novel view synthesis research.
In summary, the main contributions of our study are as follows:
  • We propose the TGA-GS method, focusing on 3D thermal model reconstruction in low-light conditions. Extensive experimental results demonstrate that TGA-GS outperforms existing relevant methods in multiple evaluation metrics, showcasing its outstanding performance in 3D reconstruction under low-light environments.
  • Our method exhibits robust novel view synthesis capabilities, generating high-resolution images from low-resolution inputs, including high-resolution thermal and RGB images, effectively enhancing image quality and information richness.
  • We create a novel dataset that provides essential data support for 3D reconstruction and novel view synthesis research. This dataset comprises RGB and thermal images captured in low-light environments, offering a wealth of samples for studying image features and reconstruction algorithms under low lighting conditions and facilitating further advancements in related fields.
The rest of this paper is organized as follows: Section 2 reviews related work on thermal imaging and 3DGS. Section 3 provides a detailed introduction to the theoretical derivation and specific implementation of the proposed method, mainly including multimodal calibration, training phase, and loss function construction. Section 4 presents experimental results comparing TGA-GS with state-of-the-art methods on thermal images, RGB images, and 3D reconstruction. Finally, in Section 5, we summarize the work in this paper and look ahead to future work.

2. Related Work

2.1. Thermal Imaging

Thermal imaging has long been an important complement to visible light sensors. Based on Planck’s blackbody radiation law [65] and the Stefan–Boltzmann equation [66], it fundamentally overcomes the limitation of visible light sensors that rely on environmental lighting. Following the technological breakthrough of uncooled microbolometer arrays [67], the technology has gradually expanded from early military target recognition to civilian scenarios [6]. Its unique physical sensing mechanism enables thermal imaging systems to operate stably in complete darkness, through smoke/fog [5] and other obstructions, and under extreme weather conditions.
This feature gives thermal imaging a unique advantage in areas such as security and surveillance, preventive maintenance, building inspections [68], monitoring rock formations [69], and more. Due to its non-contact nature, thermal imaging technology has further expanded its applications in medical diagnostics, agricultural monitoring, and archaeological research.
The hardware design of thermal imaging systems poses inherent challenges; the high cost of germanium lenses [70] and limitations in the detector unit size result in significantly lower spatial resolution of thermal sensors compared to visible light sensors. Additionally, thermal images are susceptible to the ghosting effect, where the environmental thermal radiation reflected or scattered from the object’s surface can blur the actual temperature distribution, leading to texture loss and reduced contrast [71]. To overcome these limitations, some studies propose thermal imaging super-resolution [72], while others propose multimodal fusion solutions. For instance, Maset et al. [25] and De Luis-Ruiz et al. [20] combine RGB images with thermal images using photogrammetric techniques to build high-precision 3D models, albeit relying on dense RGB-thermal image pairs for acquisition. Chen et al. [73] attempt direct 3D reconstruction using thermal images but find it only suitable for high-contrast scenes and challenging to generalize to complex environments.
In the field of sensor fusion, early methods often rely on lidar or depth cameras to provide geometric information, projecting thermal data onto 3D models, but these methods require cumbersome manual calibration and are susceptible to errors in depth measurements. In recent years, advancements in deep learning have propelled multimodal fusion. Ma et al. [74] enhance thermal image features through RGB image augmentation to improve depth perception, while Lang et al. [75] fuse inertial measurement unit (IMU) data to optimize attitude estimation. X-NeRF [53] is the first to extend Neural Radiance Fields (NeRFs) to the infrared spectrum, achieving scene modeling through joint learning of RGB–thermal cross-spectral representations. Subsequent works expand into the thermal NeRF [54,55,56,57,58,59,60] and thermal 3DGS [61,62,63,64] domains. However, current thermal GS methods focus on normal lighting conditions, fail to exploit the advantages of thermal imaging under low-light conditions, and cannot achieve super-resolution output of thermal images.

2.2. Three-Dimensional Reconstruction and Novel View Synthesis

Three-dimensional reconstruction technology, which retrieves the geometric information of a 3D scene from 2D images, holds significant value in the fields of visualization, measurement, and analysis. Traditional methods are mainly divided into two categories: point cloud reconstruction based on Structure from Motion (SfM) [76,77] and Multi-View Stereo (MVS) [78], and volume modeling based on implicit representations such as radiance fields or signed distance fields. SfM algorithms like COLMAP recover camera poses and sparse point clouds by matching feature points across multiple views, followed by dense reconstruction using MVS. However, these methods are prone to a decrease in reconstruction quality due to feature matching failures in low-texture scenes or with low-resolution inputs [79], and they rely on rigid scene assumptions [80,81].
The advent of Neural Radiance Fields (NeRFs) [1] marks a breakthrough in implicit 3D representation, using Multi-Layer Perceptrons (MLPs) to model scene radiance and density fields, enabling the generation of high-quality novel views from sparse RGB images. Subsequent research focuses on various directions of improvement. RawNeRF [82] and Aleth-NeRF [83] enhance reconstruction capabilities in low-light environments, W-NeRF [84] handles dynamic lighting, and NeuS [85] achieves high-fidelity surface reconstruction. Zeng et al. [86] propose a method for reconstructing power transmission lines (PTLs) based on images, and some works combine LiDAR point clouds [87] or near-infrared images [53] to enhance representation capabilities, while others use depth information to enhance effects [88].
The emergence of 3D Gaussian Splatting (3DGS) [2] revolutionizes the explicit 3D reconstruction paradigm. Due to its real-time rendering capabilities and impressive performance, 3DGS garners increasing attention. Subsequent works further explore rendering effects [89,90,91], compact representations [92], and reconstruction capabilities [93,94,95]. Additionally, Rangelov et al. [96] discuss the influence of different camera settings on the quality of 3D reconstruction, and Ye et al. [97] design a camera response module to compensate for inconsistencies across multiple views. However, current GS reconstruction methods are not ideal for 3D reconstruction from RGB or thermal images obtained in low-light environments and cannot fully utilize the complementary information of the two modalities.

3. Methods

3.1. Preliminaries

We first review 3DGS [2] and 2DGS [94]. 3DGS uses explicit 3D Gaussian points as its main rendering entity. A 3D Gaussian point, with mean vector $\boldsymbol{\mu} \in \mathbb{R}^{3}$ and covariance matrix $\Sigma \in \mathbb{R}^{3 \times 3}$, is mathematically defined as:
$$G(\mathbf{x}) = e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})}$$
where $\mathbf{x}$ is an arbitrary position within the 3D scene and the covariance matrix of the 3D Gaussian $\Sigma = R S S^{T} R^{T}$ is factorized into a scaling matrix $S$ and a rotation matrix $R$.
2DGS adopts flat 2D Gaussians embedded in 3D space for scene representation, which is different from 3DGS. Each 2D Gaussian primitive has opacity $\alpha$ and view-dependent appearance $c$ represented with spherical harmonics. For volume rendering, 2DGS sorts the projected 2D Gaussians by their depth and computes the color at each pixel by front-to-back alpha blending:
$$C(\mathbf{x}) = \sum_{i=1} c_i \alpha_i G_i(\mathbf{u}(\mathbf{x})) \prod_{j=1}^{i-1} \left(1 - \alpha_j G_j(\mathbf{u}(\mathbf{x}))\right)$$
where $\mathbf{x}$ represents a homogeneous ray emitted from the camera and passing through $uv$ space, while $G(\mathbf{u}) = e^{-\frac{u^{2}+v^{2}}{2}}$ is the 2D Gaussian value at the point $\mathbf{u} = (u, v)$ in $uv$ space.
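As a minimal illustration of the alpha-blending rule above, the following sketch evaluates depth-sorted 2D Gaussian primitives front to back at a single pixel. It assumes NumPy, and the data layout (a list of color/opacity/uv tuples) is our own simplification rather than the released 3DGS/2DGS code.

```python
import numpy as np

def gaussian_value(u, v):
    """Unnormalized 2D Gaussian value G(u) = exp(-(u^2 + v^2) / 2) in uv space."""
    return np.exp(-(u**2 + v**2) / 2.0)

def alpha_blend_pixel(gaussians):
    """Front-to-back alpha blending for one pixel.

    `gaussians` is a list of (color, opacity, u, v) tuples already sorted by depth,
    where (u, v) is the pixel position expressed in each primitive's local uv frame.
    """
    color = np.zeros(3)
    transmittance = 1.0  # running product of (1 - alpha_j * G_j)
    for c_i, alpha_i, u, v in gaussians:
        g = gaussian_value(u, v)
        color += alpha_i * g * transmittance * np.asarray(c_i, dtype=float)
        transmittance *= 1.0 - alpha_i * g
    return color
```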

3.2. Multimodal Calibration

To address the spatial pose deviations between thermal imaging sensors and RGB sensors, as well as inherent camera distortions, we propose a multimodal joint calibration strategy based on thermal radiation modulation. Specifically, this process can be divided into three steps: calibration plate preparation, thermal gradient induction, and multimodal calibration.
Calibration Plate Preparation: We utilize a standard checkerboard calibration plate typically employed for RGB camera calibration, as shown in Figure 1. The black regions of the plate are uniformly coated with a pre-configured AgNWs/PVB ethanol solution, forming a composite coating characterized by high visible-light transmittance (~83.0%), high mid-infrared reflectivity (~69.8%), and low emissivity [98]. The thermal absorption capability of the composite structure primarily depends on the emissivity of the outermost layer [99]. Consequently, the AgNWs coating effectively suppresses thermal radiation emission in the black regions.
Thermal Gradient Induction: After allowing the prepared calibration plate to reach thermal equilibrium in a room-temperature environment (25 °C), it is rapidly transferred to a low-temperature environment of −20 °C. The uncoated white regions, due to their high emissivity, rapidly radiate heat to the environment and cool down, while the coated regions exhibit slower temperature decline owing to the low emissivity of the AgNWs/PVB composite coating, which suppresses thermal radiation, thereby creating significant temperature differences between the two areas.
Multimodal Calibration: Once substantial temperature differences are established between the black and white regions of the calibration plate, we simultaneously capture color and thermal images using thermal imaging equipment. Conventional camera calibration methods are then applied to achieve precise multimodal sensor alignment.
This methodology ensures spatiotemporal synchronization between thermal and RGB modalities while addressing intrinsic sensor distortions, providing a robust foundation for subsequent cross-modal data fusion.
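Once the thermal gradient makes the checkerboard visible to both sensors, the final calibration step is conventional. The sketch below, assuming OpenCV, a 9×6 inner-corner board, and hypothetical file names for the synchronized captures, recovers per-sensor intrinsics and the rigid transform between the RGB and thermal cameras; it is an illustration of the standard pipeline, not our exact implementation. The thermal frames are assumed to be contrast-normalized to 8-bit so that corners are detectable.

```python
import cv2
import numpy as np

PATTERN = (9, 6)        # inner-corner grid of the checkerboard (assumed)
SQUARE_SIZE = 0.025     # square edge length in meters (assumed)

# Planar checkerboard points in the board's own coordinate frame.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_SIZE

# Hypothetical file names for synchronized RGB/thermal captures of the heated board.
image_pairs = [(f"calib/rgb_{i:03d}.png", f"calib/thermal_{i:03d}.png") for i in range(20)]

obj_pts, rgb_pts, thr_pts = [], [], []
rgb_size = thr_size = None
for rgb_path, thr_path in image_pairs:
    rgb = cv2.imread(rgb_path, cv2.IMREAD_GRAYSCALE)
    thr = cv2.imread(thr_path, cv2.IMREAD_GRAYSCALE)   # 8-bit contrast-normalized thermal
    ok_rgb, c_rgb = cv2.findChessboardCorners(rgb, PATTERN)
    ok_thr, c_thr = cv2.findChessboardCorners(thr, PATTERN)
    if ok_rgb and ok_thr:
        obj_pts.append(objp)
        rgb_pts.append(c_rgb)
        thr_pts.append(c_thr)
        rgb_size, thr_size = rgb.shape[::-1], thr.shape[::-1]

# Per-sensor intrinsics, then the rigid transform (R, T) from the RGB to the thermal camera.
_, K_rgb, d_rgb, _, _ = cv2.calibrateCamera(obj_pts, rgb_pts, rgb_size, None, None)
_, K_thr, d_thr, _, _ = cv2.calibrateCamera(obj_pts, thr_pts, thr_size, None, None)
_, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, rgb_pts, thr_pts, K_rgb, d_rgb, K_thr, d_thr, rgb_size,
    flags=cv2.CALIB_FIX_INTRINSIC)
```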

3.3. TGA-GS

We designed a multimodal Gaussian training methodology, as shown in Figure 2. By deeply integrating thermal radiation physics constraints with geometric priors, our approach achieves high-precision 3D reconstruction and cross-modal super-resolution generation in low-light scenarios. Specifically, we first train a foundational Gaussian Splatting (GS) model using low-light RGB images, then fine-tune it with thermal maps. The entire training process consists of two phases: pre-training and end-to-end training.
Pre-training Phase: During this phase, we train a baseline low-light GS model using low-light RGB images from the same scenes intended for end-to-end training. Although pre-training with low-light RGB images may lead to texture detail loss and inaccurate depth estimation during training, it enables initialization based on partially observable information, thereby significantly reducing convergence time in subsequent end-to-end training.
End-to-End Training Phase: As illustrated in Figure 2, this phase fine-tunes the pre-trained low-light GS network using raw thermal maps and low-light RGB images. The outputs of the GS network are then fed into thermal SR and RGB SR modules to generate high-resolution thermal and RGB images. This stage comprises two key steps. (1) The first is the multimodal Gaussian network. As illustrated in Figure 2a, we extend the pre-trained 3DGS by jointly optimizing it with low-resolution low-light RGB and thermal images. The thermal modality provides robust geometric priors through thermal gradients (e.g., object boundaries in darkness), while the RGB modality guides texture recovery. This mutual constraint enhances the model’s ability to resolve ambiguities in low-light 3D perception. (2) The second is the thermal and RGB SR networks. As depicted in Figure 2b, the thermal maps and low-light RGB outputs from the multimodal Gaussian network are processed through dedicated thermal SR and RGB SR networks. These modules not only enhance their respective modalities but also exchange cross-modal guidance signals—thermal gradients inform texture recovery in RGB SR, while RGB edges refine thermal detail reconstruction.
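The two-phase schedule can be summarized by the structural sketch below (PyTorch-style). Here `gaussians`, `render`, and `views` are placeholders for the Gaussian model, the differentiable rasterizer, and the calibrated training views; the super-resolution heads are reduced to a stand-in module; the pre-training length is illustrative; and the loss helpers are sketched in Section 3.4. None of this is the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySR(nn.Module):
    """Stand-in for the thermal / RGB super-resolution heads (ours are learned networks)."""
    def __init__(self, channels=3, scale=2):
        super().__init__()
        self.scale = scale
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        up = F.interpolate(x, scale_factor=self.scale, mode="bilinear", align_corners=False)
        return self.refine(up)

def train_tga_gs(gaussians, render, views, pretrain_iters=7_000, total_iters=30_000):
    """Structural sketch of the two training phases, folded into one loop for brevity.

    `gaussians` holds the optimizable Gaussian parameters, `render(gaussians, view)` is
    assumed to return (rgb, thermal, normals) for a camera `view`, and each `view` carries
    its low-resolution and high-resolution ground truth images.
    """
    # Thermal treated as a 3-channel map here for simplicity.
    thermal_sr, rgb_sr = TinySR(), TinySR()
    params = (list(gaussians.parameters())
              + list(thermal_sr.parameters()) + list(rgb_sr.parameters()))
    opt = torch.optim.Adam(params, lr=1e-3)

    for it in range(total_iters):
        view = views[it % len(views)]
        rgb, thermal, normals = render(gaussians, view)

        if it < pretrain_iters:
            # Phase 1: baseline low-light GS model fitted to low-light RGB only.
            loss = l1_low_loss(rgb, view.rgb_gt)
        else:
            # Phase 2: joint RGB + thermal fine-tuning with SR heads and alignment loss.
            loss = (l1_low_loss(rgb, view.rgb_gt)
                    + l1_low_loss(thermal, view.thermal_gt)
                    + enhancement_loss(rgb_sr(rgb), view.rgb_gt_hr)
                    + enhancement_loss(thermal_sr(thermal), view.thermal_gt_hr)
                    + thermal_alignment_loss(thermal, normals))

        opt.zero_grad()
        loss.backward()
        opt.step()
```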

3.4. Loss

For the entire training process, we designed multiple loss functions to impose constraints. Based on the characteristics of input images, we adapted the original L1 and Structural Similarity Index (SSIM) losses from the Gaussian Splatting (GS) network. When processing low-light RGB inputs, inspired by the Lit-MSE loss in Aleth-NeRF [83], we modified the standard L1 loss into an $L_{1\text{-}low}$ loss to enhance adaptability to dark environments. The formulation is given by:
$$L_{1\text{-}low} = \left\| \Phi(\hat{C} + \epsilon) - \Phi(C + \epsilon) \right\|_{1}$$
where $\epsilon = 10^{-3}$ follows the configuration in Aleth-NeRF, $\Phi$ denotes the inverse tone mapping curve, $C$ represents ground truth pixel values, and $\hat{C}$ denotes predicted pixel values.
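A direct sketch of this term, assuming PyTorch and taking $\Phi$ to be a simple gamma-style inverse tone curve (the exact curve follows the Aleth-NeRF configuration, which we only approximate here):

```python
import torch

def phi(x, gamma=2.2):
    """Placeholder inverse tone-mapping curve; the exact curve follows Aleth-NeRF."""
    return x.clamp(min=0) ** gamma

def l1_low_loss(pred, gt, eps=1e-3):
    """L1-low = || phi(pred + eps) - phi(gt + eps) ||_1, mean-reduced over pixels."""
    return torch.abs(phi(pred + eps) - phi(gt + eps)).mean()
```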
The SSIM loss evaluates image distortion through three components: luminance (estimated via the means $\mu_x$, $\mu_y$), contrast (estimated via the standard deviations $\sigma_x$, $\sigma_y$), and structure (measured via the covariance $\sigma_{xy}$). The SSIM formula remains unchanged for thermal images, as temperature variations can still be assessed through luminance patterns at the image level.
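For reference, the standard single-scale SSIM [100] that combines these three components can be written as:
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where $C_1$ and $C_2$ are small constants that stabilize the division, and the SSIM loss is typically taken as $1 - \mathrm{SSIM}$.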
For the final SR outputs, we designed an enhancement loss $L_{en}$, combining perceptual and L1 losses:
$$L_{en} = \lambda_{per} L_{per} + \lambda_{L1} L_{L1}$$
where $L_{per}$ is computed using a pretrained VGG19 network $\phi$ to measure feature-level similarity between outputs $O$ and ground truth $GT$, while $\lambda_{per}$ and $\lambda_{L1}$ are weighting hyperparameters.
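A minimal sketch of this enhancement loss, assuming PyTorch/torchvision and 3-channel images in [0, 1]; the VGG19 feature layer, the loss weights, and the use of an L1 distance on features are illustrative placeholders rather than our exact settings.

```python
import torch
from torchvision.models import vgg19, VGG19_Weights

# Frozen VGG19 feature extractor for the perceptual term (layer choice is illustrative).
_vgg_features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:16].eval()
for p in _vgg_features.parameters():
    p.requires_grad_(False)

def enhancement_loss(output, gt, lambda_per=0.1, lambda_l1=1.0):
    """L_en = lambda_per * L_per + lambda_L1 * L_L1 for (B, 3, H, W) tensors."""
    l1 = torch.abs(output - gt).mean()
    per = torch.abs(_vgg_features(output) - _vgg_features(gt)).mean()
    return lambda_per * per + lambda_l1 * l1
```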
To address geometric ambiguity in low-light 3D reconstruction, we propose a Thermal Gradient Alignment Loss. This loss leverages thermal imaging gradients to guide surface geometry optimization. Specifically, thermal edges (corresponding to temperature transitions) provide robust geometric priors under low-light conditions. The loss enforces consistency between the thermal gradient $\nabla T$ and the geometric surface normal $\mathbf{n}$ derived from the GS model:
$$L_{thermal\text{-}align} = \sum_{i} \left\| \nabla T_i \cdot \mathbf{n}_i \right\|_{1}$$
where $\nabla T_i$ is the thermal gradient at position $i$, and $\mathbf{n}_i$ is the normalized surface normal. By aligning thermal and geometric boundaries, this loss mitigates depth estimation errors caused by low-contrast RGB inputs, significantly improving reconstruction fidelity for complex scenes.
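One plausible reading of this term is sketched below in PyTorch, taking the image-plane gradient of the rendered thermal map against the in-plane components of the rendered camera-space normal map. That pairing, the finite-difference gradient, and the mean reduction are our assumptions for illustration, not necessarily the exact implementation.

```python
import torch
import torch.nn.functional as F

def thermal_alignment_loss(thermal, normals, eps=1e-6):
    """Sketch of the thermal gradient alignment term.

    thermal:  (B, 1, H, W) rendered thermal map.
    normals:  (B, 3, H, W) rendered camera-space surface normals.
    """
    # Forward finite differences of the thermal map, padded back to (H, W).
    gx = F.pad(thermal[:, :, :, 1:] - thermal[:, :, :, :-1], (0, 1, 0, 0))
    gy = F.pad(thermal[:, :, 1:, :] - thermal[:, :, :-1, :], (0, 0, 0, 1))
    grad = torch.cat([gx, gy], dim=1)                    # (B, 2, H, W)
    grad = grad / (grad.norm(dim=1, keepdim=True) + eps)  # normalized gradient direction

    n_xy = F.normalize(normals, dim=1, eps=eps)[:, :2]    # in-plane normal components
    return (grad * n_xy).sum(dim=1).abs().mean()          # | grad(T) . n |, mean-reduced
```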

4. Experiments

4.1. Datasets and Baselines

To evaluate the performance of 3D reconstruction and novel view synthesis under low-light conditions using multimodal RGB-thermal inputs, we constructed a specialized dataset captured by the commercial-grade FLIR E6 PRO handheld thermal imaging system. This advanced device simultaneously acquires aligned RGB and thermal imagery, featuring a thermal sensor with an operational range of −20 °C to 550 °C and measurement accuracy of ±2% of reading. The dataset includes five distinct indoor environments, specifically designed to simulate challenging low-light search-and-rescue scenarios. For each scene, we provide raw thermal images captured by the thermal camera, RGB images, thermal imaging data, MSX (Multi-Spectral Dynamic Imaging) images (edge-enhanced thermal–visual fusion), and camera pose data.
For comprehensive performance evaluation, we select two types of baselines for our experiments: (1) High-rendering-quality methods, used exclusively in novel view synthesis experiments, including 3DGS [2] and ThermalGaussian [62]. For ThermalGaussian, we opt for the OMMG method from the original paper due to its superior rendering quality within the ThermalGaussian framework. (2) High-reconstruction-quality methods, serving as baselines for both novel view synthesis and reconstruction experiments, including 2DGS [94] and RaDe-GS [95].

4.2. Evaluation Metrics

To evaluate the performance of our methods, we employ a set of metrics that assess quality, memory usage, and efficiency.
Quality: To evaluate the quality of novel view synthesis, we compare the generated images against ground truth images captured from the corresponding target viewpoints. Specifically, the Peak Signal-to-Noise Ratio (PSNR) quantifies pixel-level differences between the synthesized novel views and the real images of those views. Higher PSNR values indicate smaller deviations from the ground truth. Additionally, the Structural Similarity Index (SSIM) [100] measures luminance, contrast, and structural consistency between the generated and real images, where higher SSIM reflects closer perceptual alignment. Finally, the Learned Perceptual Image Patch Similarity (LPIPS) [101] evaluates human-perceived visual discrepancies by leveraging deep features, emphasizing differences more relevant to human observers. Importantly, the ground truth images used for comparison are real-world captures from novel viewpoints not included in the training set, ensuring unbiased validation of our method’s ability to generalize to unseen perspectives.
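As a concrete reference for the pixel-level metric, PSNR over images scaled to [0, 1] can be computed as in the following NumPy sketch (SSIM and LPIPS follow their reference implementations [100,101]):

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_val ** 2) / mse)
```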
Memory Usage: To assess the memory requirements of our method, we measure the memory usage in megabytes (MB). This metric provides insights into the resource demands of our approach, allowing us to optimize memory consumption and ensure efficient utilization of computational resources.
Efficiency: We measure efficiency by calculating the number of frames processed per second (FPS). This metric provides insight into the real-time performance of our approach and its ability to handle high frame rates. Higher values indicate faster processing times, which are desirable for applications requiring real-time or near-real-time performance.

4.3. Implementation Details

Our method builds upon the 3DGS framework [2], with all experimental settings strictly aligned with the reference implementation. All baseline methods and our proposed TGA-GS are trained under identical conditions on the same dataset to ensure fair comparison. Each experiment, including those for baselines, underwent 30K iterations of training on an NVIDIA V100 GPU (NVIDIA, Santa Clara, CA, USA). For TGA-GS, while a pre-training phase using low-light RGB images is employed to accelerate convergence in subsequent end-to-end training, the total number of iterations (30K) remains consistent with the other methods, eliminating potential bias from extended training. The rendered RGB and thermal images share a resolution of 640 × 480, matching the input specifications of the evaluation pipeline.

4.4. Thermal View Synthesis

To compare the quality of novel view synthesis for thermal images across various Gaussian Splatting (GS) methods, we conducted quantitative and qualitative evaluations. As shown in Table 1, our method outperforms existing approaches in visual metrics such as PSNR and LPIPS while consuming significantly less memory. Its running speed is comparable to that of ThermalGaussian and higher than that of the other methods. Specifically, our method achieves the best performance with a minimal memory consumption of 540 MB, attaining a PSNR of 32.78 dB and an LPIPS of 0.092. In comparison, the second-best method (3DGS [2]) exhibits similar visual metrics—with a marginal difference of 0.27 dB in PSNR and 0.01 in LPIPS—but requires nearly triple the memory footprint at 1658 MB and more inference time.
For qualitative comparison, Figure 3 visually contrasts our method with baseline approaches across three representative scenes: Fruits2 (top row), Hairdryer (middle row), and Laptop (bottom row). The results highlight our method’s improved visual fidelity. For instance, in the Hairdryer scene, baseline methods exhibit noticeable thermal distortions such as warped nozzle geometry and blurred temperature gradients, whereas our approach preserves structural integrity and radiometric consistency. Similarly, in the Laptop scene, our method accurately reconstructs fine thermal edges including keyboard gaps that are either missing or misaligned in baseline outputs. Notably, this capability relies on the presence of temperature differences between objects. For example, the keyboard gaps in the Laptop scene are discernible only because the active electronics generate localized heat, creating measurable thermal gradients. If all objects in the scene were at equilibrium with ambient temperature (e.g., a static laptop in a thermally uniform environment), thermal imaging would fail to capture structural details such as key boundaries, as no thermal contrast would exist. These visual improvements align with the quantitative metrics, validating our method’s robustness in low-light multimodal reconstruction tasks.

4.5. RGB View Synthesis

To assess the performance of GS methods for novel RGB view synthesis under low-light conditions, we performed rigorous quantitative assessments and qualitative visual comparisons. As summarized in Table 2, our approach surpasses existing methods in critical visual metrics, including PSNR and LPIPS, while maintaining substantially lower computational resource requirements and high computational efficiency. The quantitative analysis reveals that our method achieves state-of-the-art performance with a memory footprint of 540 MB, delivering a PSNR of 40.77 dB and an LPIPS of 0.082. In contrast, the second-best performer (3DGS [2]) demonstrates comparable but inferior visual metrics (2.64 dB lower PSNR and 0.02 higher LPIPS) while consuming 1496 MB of memory (approximately three times our model's requirement) and running at 349 FPS (approximately 30% slower than ours).
Figure 4 provides qualitative comparisons across three representative scenarios: Fruits2 (top row), Hairdryer (middle row), and Laptop (bottom row). These visualizations underscore our method’s enhanced fidelity in preserving fine details under illumination-starved conditions. Notably, in the Laptop scene, our approach maintains crisp structural details such as labels, whereas competing methods exhibit blurred textures and loss of high-frequency features. This performance gap aligns with the quantitative metrics, confirming our framework’s superiority in balancing reconstruction accuracy with computational efficiency.

4.6. Three-Dimensional Object Reconstruction

To evaluate the effectiveness of TGA-GS for 3D reconstruction under low-light conditions, we conducted experiments on two scenes, Fruits1 and Cup, using 2DGS [94] and RaDe-GS [95] methods to reconstruct 3D models from low-light RGB and thermal inputs. As illustrated in Figure 5 and Figure 6, the leftmost column displays the input low-light RGB and thermal images for each scene. The middle columns show reconstruction results from 2DGS and RaDe-GS, with the upper row representing RGB-only reconstructions and the lower row thermal-only reconstructions. The rightmost column presents TGA-GS’s multi-modal (RGB+thermal) reconstruction results.
The results demonstrate that TGA-GS achieves more accurate reconstructions compared to 2DGS and RaDe-GS. In the Fruits1 scene, TGA-GS successfully distinguishes individual fruits such as bananas, while other methods struggle to resolve fine-grained structures. For the Cup scene, despite using a cup filled with hot water, thermal-only reconstructions exhibit inferior performance compared to RGB-only results. This degradation may stem from the low contrast of thermal images, which introduces ambiguity in depth estimation. In contrast, TGA-GS leverages complementary information from both modalities to generate higher-quality 3D models. The fused approach effectively mitigates the limitations of individual modalities, highlighting the advantages of multimodal integration in low-light 3D reconstruction.

4.7. Ablation Study

In our ablation study, we evaluate two core components of TGA-GS: the Thermal Gradient Alignment Loss and the SR module, as demonstrated in Table 3 and Table 4 and Figure 7. Table 3 and Table 4 reveal that the SR module significantly impacts rendering quality due to its direct influence on the final output. While the Thermal Gradient Alignment Loss, as an intermediate training constraint, also affects rendering results, its performance gaps can be mitigated by the SR module during post-processing. Conversely, Figure 7 illustrates that the SR module has minimal effect on reconstruction quality, whereas the Thermal Gradient Alignment Loss plays a critical role in determining geometric fidelity. This discrepancy arises because the SR module operates as a post-processing component, leaving the core reconstruction process of TGA-GS largely unaffected. In contrast, the Thermal Gradient Alignment Loss imposes structural constraints during training, directly shaping the 3D geometry optimization.

4.8. Practical Implications

Search and Rescue Teams: In post-disaster search and rescue operations (such as locating survivors in smoke-filled environments) where visibility is low, TGA-GS can use RGB and thermal images collected from limited viewpoints to reconstruct the search and rescue environment. Thermal gradients can reliably outline people and structural hazards, while the RGB images and the SR module can further recover key details.
Industrial Practitioners: TGA-GS enhances the perception of autonomous vehicles, unmanned aerial vehicles, and similar platforms in low-light or adverse conditions. Its cross-modal alignment ensures synchronized perception of thermal obstacles (such as pedestrians) and RGB traffic signs, while super-resolution of low-resolution thermal input helps detect distant obstacles without adding latency.
Industrial and Building Inspectors: The multimodal calibration method proposed in this paper better aligns the information in thermal and RGB images, associating thermal anomaly regions (such as heat leaks) with RGB texture details (such as rust or cracks). Combined with the guided super-resolution module of TGA-GS, the quality of inspection can be further improved.

5. Conclusions

In this paper, we present TGA-GS, a method for novel view synthesis and 3D reconstruction utilizing thermal and low-light RGB images. Our approach not only produces high-quality renderings from low-resolution RGB and thermal inputs but also enhances reconstruction accuracy in low-light environments while significantly reducing the memory footprint at runtime. However, it is important to acknowledge certain limitations of our method. Although TGA-GS generates improved RGB outputs and 3D reconstructions, the rendered images remain low-light in appearance, and reconstruction quality degrades under low-thermal-contrast conditions. Future work will focus on developing an extended framework capable of generating high-quality natural-light RGB images and further improved reconstruction results from thermal and low-light RGB inputs.

Author Contributions

Conceptualization, C.Z.; methodology, C.Z.; software, C.Z. and Q.M.; validation, C.Z. and R.L.; formal analysis, M.L. and Z.Q.; data curation, C.Z.; writing—original draft preparation, C.Z.; writing—review and editing, M.L. and J.W.; visualization, C.Z. and J.W.; supervision, M.L.; funding acquisition, Z.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61976025.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
  2. Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 139:1–139:14. [Google Scholar] [CrossRef]
  3. Nayar, S.; Narasimhan, S. Vision in bad weather. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 820–827. [Google Scholar]
  4. Flir. The Ultimate Infrared Handbook for R&D Professionals. 2013. Available online: www.flir.com/thg (accessed on 21 April 2025).
  5. Tsai, P.F.; Liao, C.H.; Yuan, S.M. Using Deep Learning with Thermal Imaging for Human Detection in Heavy Smoke Scenarios. Sensors 2022, 22, 5351. [Google Scholar] [CrossRef] [PubMed]
  6. Gade, R.; Moeslund, T. Thermal cameras and applications: A survey. Mach. Vis. Appl. 2014, 25, 245–262. [Google Scholar] [CrossRef]
  7. He, Y.; Deng, B.; Wang, H.; Cheng, L.; Zhou, K.; Cai, S.; Ciampa, F. Infrared machine vision and infrared thermography with deep learning: A review. Infrared Phys. Technol. 2021, 116, 103754. [Google Scholar] [CrossRef]
  8. Torresan, H.; Turgeon, B.; Ibarra-Castanedo, C.; Hebert, P.; Maldague, X.P. Advanced surveillance systems: Combining video and thermal imagery for pedestrian detection. In Proceedings of the SPIE Thermosense XXVI, Orlando, FL, USA, 12 April 2004; Volume 5405, pp. 506–515. [Google Scholar]
  9. Akula, A.; Ghosh, R.; Sardana, H. Thermal imaging and its application in defence systems. Proc. AIP Conf. Proc. 2011, 1391, 333–335. [Google Scholar]
  10. Wong, W.K.; Tan, P.N.; Loo, C.K.; Lim, W.S. An effective surveillance system using thermal camera. In Proceedings of the International Conference on Signal Acquisition and Processing, Kuala Lumpur, Malaysia, 3–5 April 2009; pp. 13–17. [Google Scholar]
  11. Yeom, S. Thermal image tracking for search and rescue missions with a drone. Drones 2024, 8, 53. [Google Scholar] [CrossRef]
  12. Rudol, P.; Doherty, P. Human body detection and geolocalization for UAV search and rescue missions using color and thermal imagery. In Proceedings of the IEEE Aerospace Conference, Big Sky, Montana, 1–8 March 2008; pp. 1–8. [Google Scholar]
  13. Rodin, C.; de Lima, L.; de Alcantara Andrade, F.; Haddad, D.; Johansen, T.; Storvold, R. Object classification in thermal images using convolutional neural networks for search and rescue missions with unmanned aerial systems. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8. [Google Scholar]
  14. Iwasaki, K.; Fukushima, K.; Nagasaka, Y.; Ishiyama, N.; Sakai, M.; Nagasaka, A. Real-time monitoring and postprocessing of thermal infrared video images for sampling and mapping groundwater discharge. Water Resour. Res. 2023, 59, e2022WR033630. [Google Scholar] [CrossRef]
  15. Fuentes, S.; Tongson, E.; Gonzalez Viejo, C. Urban green infrastructure monitoring using remote sensing from integrated visible and thermal infrared cameras mounted on a moving vehicle. Sensors 2021, 21, 295. [Google Scholar] [CrossRef]
  16. Lega, M.; Napoli, R.M. Aerial infrared thermography in the surface waters contamination monitoring. Desalination Water Treat. 2010, 23, 141–151. [Google Scholar] [CrossRef]
  17. Pyykonen, P.; Peussa, P.; Kutila, M.; Fong, K.W. Multi-camera-based smoke detection and traffic pollution analysis system. In Proceedings of the IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, 8–10 September 2016; pp. 233–238. [Google Scholar]
  18. Fuentes, S.; Tongson, E.J.; De Bei, R.; Gonzalez Viejo, C.; Ristic, R.; Tyerman, S.; Wilkinson, K. Non-invasive tools to detect smoke contamination in grapevine canopies, berries and wine: A remote sensing and machine learning modeling approach. Sensors 2019, 19, 3335. [Google Scholar] [CrossRef] [PubMed]
  19. Kim, H.; Lamichhane, N.; Kim, C.; Shrestha, R. Innovations in Building Diagnostics and Condition Monitoring: A Comprehensive Review of Infrared Thermography Applications. Buildings 2023, 13, 2829. [Google Scholar] [CrossRef]
  20. De Luis-Ruiz, J.; Sedano-Cibrian, J.; Perez-Alvarez, R.; Pereda-Garcia, R.; Salas-Menocal, R. Generation of 3D Thermal Models for the Analysis of Energy Efficiency in Buildings. In Advances in Design Engineering III; Springer: Berlin/Heidelberg, Germany, 2023; pp. 741–754. [Google Scholar]
  21. Martin, M.; Chong, A.; Biljecki, F.; Miller, C. Infrared thermography in the built environment: A multi-scale review. Renew. Sustain. Energy Rev. 2022, 165, 112540. [Google Scholar] [CrossRef]
  22. Masri, Y.; Rakha, T. A scoping review of non-destructive testing (NDT) techniques in building performance diagnostic inspections. Constr. Build. Mater. 2020, 265, 120542. [Google Scholar] [CrossRef]
  23. Malhotra, V.; Carino, N. CRC Handbook on Nondestructive Testing of Concrete; CRC Press: Boca Raton, FL, USA, 2004. [Google Scholar]
  24. Jackson, C.; Sherlock, C.; Moore, P. Leak Testing. In Nondestructive Testing Handbook; American Society for Nondestructive Testing: Columbus, OH, USA, 1998. [Google Scholar]
  25. Maset, E.; Fusiello, A.; Crosilla, F.; Toldo, R.; Zorzetto, D. Photogrammetric 3D building reconstruction from thermal images. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, IV-2-W3, 25–32. [Google Scholar] [CrossRef]
  26. Zhou, Z.; Majeed, Y.; Naranjo, G.D.; Gambacorta, E.M.T. Assessment for crop water stress with infrared thermal imagery in precision agriculture: A review and future prospects for deep learning applications. Comput. Electron. Agric. 2021, 182, 106019. [Google Scholar] [CrossRef]
  27. Jurado, J.M.; Lopez, A.; Padua, L.; Sousa, J.J. Remote sensing image fusion on 3D scenarios: A review of applications for agriculture and forestry. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102856. [Google Scholar] [CrossRef]
  28. Nasi, R.; Honkavaara, E.; Blomqvist, M.; Lyytikainen-Saarenmaa, P.; Hakala, T.; Viljanen, N.; Kantola, T.; Holopainen, M. Remote sensing of bark beetle damage in urban forests at individual tree level using a novel hyperspectral camera from UAV and aircraft. Urban For. Urban Green. 2018, 30, 72–83. [Google Scholar] [CrossRef]
  29. Miyoshi, G.T.; Arruda, M.S.; Osco, L.P.; Marcato Junior, J.; Goncalves, D.N.; Imai, N.N.; Tommaselli, A.M.G.; Honkavaara, E.; Goncalves, W.N. A novel deep learning method to identify single tree species in UAV-based hyperspectral images. Remote Sens. 2020, 12, 1294. [Google Scholar] [CrossRef]
  30. Lee, S.; Moon, H.; Choi, Y.; Yoon, D.K. Analyzing thermal characteristics of urban streets using a thermal imaging camera: A case study on commercial streets in Seoul, Korea. Sustainability 2018, 10, 519. [Google Scholar] [CrossRef]
  31. Carrasco-Benavides, M.; Antunez-Quilobran, J.; Baffico-Hernandez, A.; Avila-Sanchez, C.; Ortega-Farias, S.; Espinoza, S.; Gajardo, J.; Mora, M.; Fuentes, S. Performance assessment of thermal infrared cameras of different resolutions to estimate tree water status from two cherry cultivars: An alternative to midday stem water potential and stomatal conductance. Sensors 2020, 20, 3596. [Google Scholar] [CrossRef] [PubMed]
  32. Hernandez-Clemente, R.; Hornero, A.; Mottus, M.; Penuelas, J.; Gonzalez-Dugo, V.; Jimenez, J.; Suarez, L.; Alonso, L.; Zarco-Tejada, P. Early diagnosis of vegetation health from high-resolution hyperspectral and thermal imagery: Lessons learned from empirical relationships and radiative transfer modelling. Curr. For. Rep. 2019, 5, 169–183. [Google Scholar] [CrossRef]
  33. Parihar, G.; Saha, S.; Giri, L.I. Application of infrared thermography for irrigation scheduling of horticulture plants. Smart Agric. Technol. 2021, 1, 100021. [Google Scholar] [CrossRef]
  34. Glowacz, A. Fault diagnosis of electric impact drills using thermal imaging. Measurement 2021, 171, 108815. [Google Scholar] [CrossRef]
  35. Lahiri, B.; Bagavathiappan, S.; Jayakumar, T.; Philip, J. Medical applications of infrared thermography: A review. Infrared Phys. Technol. 2012, 55, 221–235. [Google Scholar] [CrossRef]
  36. Goel, J.; Nizamoglu, M.; Tan, A.; Gerrish, H.; Cranmer, K.; ElMuttardi, N.; Barnes, D.; Dziewulski, P. A prospective study comparing the FLIR ONE with laser Doppler imaging in the assessment of burn depth by a tertiary burns unit in the United Kingdom. Scars Burn. Heal. 2020, 6, 2059513120974261. [Google Scholar] [CrossRef]
  37. Jaspers, M.E.; Carriere, M.; Meij-de Vries, A.; Klaessens, J.; Van Zuijlen, P. The FLIR ONE thermal imager for the assessment of burn wounds: Reliability and validity study. Burns 2017, 43, 1516–1523. [Google Scholar] [CrossRef]
  38. Choi, Y.; Kim, N.; Hwang, S.; Park, K.; Yoon, J.; An, K.; Kweon, I. KAIST Multi-Spectral Day/Night Data Set for Autonomous and Assisted Driving. IEEE Trans. Intell. Transp. Syst. 2018, 19, 934–948. [Google Scholar] [CrossRef]
  39. Lee, A.; Cho, Y.; Shin, Y.S.; Kim, A.; Myung, H. ViViD++: Vision for visibility dataset. IEEE Robot. Autom. Lett. 2022, 7, 6282–6289. [Google Scholar] [CrossRef]
  40. Farooq, M.A.; Shariff, W.; Khan, F.; Corcoran, P.; Rotariu, C. C3I Thermal Automotive Dataset. 2022. Available online: https://ieee-dataport.org/documents (accessed on 21 April 2025).
  41. Hwang, S.; Park, J.; Kim, N.; Choi, Y.; Kweon, I. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1237–1244. [Google Scholar]
  42. Berg, A.; Ahlberg, J.; Felsberg, M. A thermal infrared dataset for evaluation of short-term tracking methods. In Proceedings of the Swedish Symposium on Image Analysis, Copenhagen, Denmark, 12–14 June 2015. [Google Scholar]
  43. Kopaczka, M.; Kolk, R.; Merhof, D. A fully annotated thermal face database and its application for thermal facial expression recognition. In Proceedings of the IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Houston, TX, USA, 14–17 May 2018; pp. 1–6. [Google Scholar]
  44. Cho, Y.; Bianchi-Berthouze, N.; Marquardt, N.; Julier, S. Deep Thermal Imaging: Proximate Material Type Recognition in the Wild through Deep Learning of Spatial Surface Temperature Patterns. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, 21–26 April 2018; pp. 1–12. [Google Scholar]
  45. Vertens, J.; Zürn, J.; Burgard, W. HeatNet: Bridging the Day-Night Domain Gap in Semantic Segmentation with Thermal Images. arXiv 2020, arXiv:2003.04645. [Google Scholar]
  46. Dai, W.; Zhang, Y.; Chen, S.; Sun, D.; Kong, D. A Multi-spectral Dataset for Evaluating Motion Estimation Systems. arXiv 2021, arXiv:2007.00622. [Google Scholar]
  47. Yin, J.; Li, A.; Li, T.; Yu, W.; Zou, D. M2DGR: A Multi-sensor and Multi-scenario SLAM Dataset for Ground Robots. IEEE Robot. Autom. Lett. 2021, 7, 2266–2273. [Google Scholar] [CrossRef]
  48. Vidas, S.; Lakemond, R.; Denman, S.; Fookes, C.; Sridharan, S.; Wark, T. A mask-based approach for the geometric calibration of thermal-infrared cameras. IEEE Trans. Instrum. Meas. 2012, 61, 1625–1635. [Google Scholar] [CrossRef]
  49. Ko, K.; Shim, K.; Lee, K.; Kim, C. Large-scale benchmark for uncooled infrared image deblurring. IEEE Sens. J. 2023, 23, 30119–30128. [Google Scholar] [CrossRef]
  50. Kuang, X.; Sui, X.; Liu, Y.; Chen, Q.; Gu, G. Single infrared image enhancement using a deep convolutional neural network. Neurocomputing 2019, 332, 119–128. [Google Scholar] [CrossRef]
  51. Liu, Y.; Liu, S.; Wang, Z. A general framework for image fusion based on multi-scale transform and sparse representation. Inf. Fusion 2015, 24, 147–164. [Google Scholar] [CrossRef]
  52. Huang, N.; Liu, K.; Liu, Y.; Zhang, Q.; Han, J. Cross-modality person re-identification via multi-task learning. Pattern Recognit. 2022, 128, 108653. [Google Scholar] [CrossRef]
  53. Poggi, M.; Ramirez, P.Z.; Tosi, F.; Salti, S.; Mattoccia, S.; Stefano, L.D. Cross-spectral Neural Radiance Fields. In Proceedings of the International Conference on 3D Vision (3DV), Prague, Czech Republic, 12–15 September 2022; pp. 606–616. [Google Scholar]
  54. Ye, T.; Wu, Q.; Deng, J.; Liu, G.; Liu, L.; Xia, S.; Pei, L. Thermal-nerf: Neural radiance fields from an infrared camera. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; pp. 1046–1053. [Google Scholar]
  55. Hassan, M.; Forest, F.; Fink, O.; Mielle, M. Thermonerf: Multimodal neural radiance fields for thermal novel view synthesis. arXiv 2024, arXiv:2403.12154. [Google Scholar]
  56. Lin, Y.; Pan, X.; Fridovich-Keil, S.; Wetzstein, G. ThermalNeRF: Thermal Radiance Fields. In Proceedings of the 2024 IEEE International Conference on Computational Photography (ICCP), Lausanne, Switzerland, 22–24 July 2024; pp. 1–12. [Google Scholar]
  57. Zhong, C.; Xu, C. TeX-NeRF: Neural Radiance Fields from Pseudo-TeX Vision. arXiv 2024, arXiv:2410.04873. [Google Scholar]
  58. Xu, J.; Liao, M.; Kathirvel, R.; Patel, V. Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 321–339. [Google Scholar]
  59. Chopra, S.; Cladera, F.; Murali, V.; Kumar, V. AgriNeRF: Neural Radiance Fields for Agriculture in Challenging Lighting Conditions. arXiv 2024, arXiv:2409.15487. [Google Scholar]
  60. Özer, M.; Weiherer, M.; Hundhausen, M.; Egger, B. Exploring Multi-modal Neural Scene Representations With Applications on Thermal Imaging. arXiv 2024, arXiv:2403.11865. [Google Scholar]
  61. Chen, Q.; Shu, S.; Bai, X. Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 253–269. [Google Scholar]
  62. Lu, R.; Chen, H.; Zhu, Z.; Qin, Y.; Lu, M.; Zhang, L.; Yan, C.; Xue, A. ThermalGaussian: Thermal 3D Gaussian Splatting. arXiv 2024, arXiv:2409.07200. [Google Scholar]
  63. Liu, Y.; Chen, X.; Yan, S.; Cui, Z.; Xiao, H.; Liu, Y.; Zhang, M. ThermalGS: Dynamic 3D Thermal Reconstruction with Gaussian Splatting. Remote Sens. 2025, 17, 335. [Google Scholar] [CrossRef]
  64. Yang, K.; Liu, Y.; Cui, Z.; Liu, Y.; Zhang, M.; Yan, S.; Wang, Q. NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics. arXiv 2025, arXiv:2503.03115. [Google Scholar]
  65. Planck, M. On the Law of Distribution of Energy in the Normal Spectrum. Ann. Der Phys. 1901, 4, 1. [Google Scholar]
  66. Stefan, J.; Ebel, J. Kommentierter Neusatz von Uber die Beziehung Zwischen der Wärmestrahlung und der Temperatur. 2014. Available online: https://scholar.google.com.hk/scholar?hl=zh-CN&as_sdt=0%2C5&q=Kommentierter+Neusatz+von+Uber+die+Beziehung+Zwischen+der+Wärmestrahlung+und+der+Temperatur.&btnG= (accessed on 21 April 2025).
  67. Rogalski, A. Infrared Detectors; CRC Press: Boca Raton, FL, USA, 2000. [Google Scholar]
  68. Adan, A.; Quintana, B.; Aguilar, J.G.; Pérez, V.; Castilla, F.J. Towards the Use of 3D Thermal Models in Constructions. Sustainability 2020, 12, 8521. [Google Scholar] [CrossRef]
  69. Grechi, G.; Fiorucci, M.; Marmoni, G.M.; Martino, S. 3D Thermal Monitoring of Jointed Rock Masses Through Infrared Thermography and Photogrammetry. Remote Sens. 2021, 13, 957. [Google Scholar] [CrossRef]
  70. Schmidt, R. How Patent-Pending Technology Blends Thermal and Visible Light. Available online: https://www.fluke.com/en/learn/blog/thermal-imaging/how-patent-pending-technology-blends-thermal-and-visible-light (accessed on 13 March 2025).
  71. Bao, F.; Jape, S.; Schramka, A.; Wang, J.; McGraw, T.E.; Jacob, Z. Why Are Thermal Images Blurry. arXiv 2023, arXiv:2307.15800. [Google Scholar] [CrossRef]
  72. Zhang, H.; Hu, Y.; Yan, M. Thermal Image Super-Resolution Based on Lightweight Dynamic Attention Network for Infrared Sensors. Sensors 2023, 23, 8717. [Google Scholar] [CrossRef]
  73. Chen, C.; Yeh, C.; Chang, B.; Pan, J. 3D Reconstruction from IR Thermal Images and Reprojective Evaluations. Math. Probl. Eng. 2015, 2015, e520534. [Google Scholar] [CrossRef]
  74. Ma, Y.; Wang, Y.; Mei, X.; Liu, C.; Dai, X.; Fan, F.; Huang, J. Visible/Infrared Combined 3D Reconstruction Scheme Based on Nonrigid Registration of Multi-Modality Images with Mixed Features. IEEE Access 2019, 7, 19199–19211. [Google Scholar] [CrossRef]
  75. Lang, S.; Jager, K. 3D Scene Reconstruction from IR Image Sequences for Image-Based Navigation Update and Target Detection of an Autonomous Airborne System. Infrared Technol. Appl. XXXIV 2008, 6940, 535–543. [Google Scholar]
  76. Schonberger, J.; Frahm, J. Structure-from-Motion Revisited. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  77. Robertson, D.; Cipolla, R. Practical Image Processing and Computer Vision. In Chapter Structure from Motion; John Wiley & Sons: New York, NY, USA, 2009. [Google Scholar]
  78. Schonberger, J.; Zheng, E.; Pollefeys, M.; Frahm, J. Pixelwise view selection for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
  79. Cheng, Z.; Esteves, C.; Jampani, V.; Kar, A.; Maji, S.; Makadia, A. Lu-Nerf: Scene and Pose Estimation by Synchronizing Local Unposed Nerfs. arXiv 2023, arXiv:2306.05410. [Google Scholar]
  80. Furukawa, Y.; Hernandez, C. Multi-View Stereo: A Tutorial. Found. Trends Comput. Graph. Vis. 2015, 9, 1–148. [Google Scholar] [CrossRef]
  81. Wang, S.; Jiang, H.; Xiang, L. CT-MVSnet: Efficient Multi-View Stereo with Cross-Scale Transformer. In Proceedings of the International Conference on Multimedia Modeling, Amsterdam, The Netherlands, 29 January–2 February 2024; pp. 394–408. [Google Scholar]
  82. Mildenhall, B.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P.; Barron, J. NeRF in the dark: High dynamic range view synthesis from noisy raw images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16190–16199. [Google Scholar]
  83. Cui, Z.T.; Gu, L.; Sun, X.; Ma, X.Z.; Qiao, Y.; Harada, T. Aleth-Nerf: Low-Light Condition View Synthesis with Concealing Fields. arXiv 2023, arXiv:2303.05807. [Google Scholar]
  84. Martin-Brualla, R.; Radwan, N.; Sajjadi, M.; Barron, J.; Dosovitskiy, A.; Duckworth, D. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7210–7219. [Google Scholar]
  85. Wang, P.; Liu, L.; Liu, Y.; Theobalt, C.; Komura, T.; Wang, W. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv 2021, arXiv:2106.10689. [Google Scholar]
  86. Zeng, Y.J.; Lei, J.; Feng, T.M.; Qin, X.Y.; Li, B.; Wang, Y.Q.; Wang, D.X.; Song, J. Neural Radiance Fields-Based 3D Reconstruction of Power Transmission Lines Using Progressive Motion Sequence Images. Sensors 2023, 23, 9537. [Google Scholar] [CrossRef]
  87. Zhu, H.; Sun, Y.; Liu, C.; Xia, L.; Luo, J.; Qiao, N.; Nevatia, R.; Kuo, C. Multimodal Neural Radiance Field. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 9393–9399. [Google Scholar]
  88. Wang, B.J.; Zhang, D.H.; Su, Y.X.; Zhang, H.J. Enhancing View Synthesis with Depth-Guided Neural Radiance Fields and Improved Depth Completion. Sensors 2024, 24, 1919. [Google Scholar] [CrossRef]
  89. Yu, Z.; Chen, A.; Huang, B.; Sattler, T.; Geiger, A. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 19447–19456. [Google Scholar]
  90. Zou, C.; Ma, Q.; Wang, J.; Lu, M.; Zhang, S.; He, Z. Gaussianenhancer: A General Rendering Enhancer for Gaussian Splatting. In Proceedings of the ICASSP 2025–2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; pp. 1–5. [Google Scholar]
  91. Zou, C.; Ma, Q.; Wang, J.; Lu, M.; Zhang, S.; Qu, Z.; He, Z. Gaussianenhancer++: A General GS-Agnostic Rendering Enhancer. Symmetry 2025, 17, 442. [Google Scholar] [CrossRef]
  92. Lu, T.; Yu, M.; Xu, L.; Xiangli, Y.; Wang, L.; Lin, D.; Dai, B. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 20654–20664. [Google Scholar]
  93. Guédon, A.; Lepetit, V. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5354–5363. [Google Scholar]
  94. Huang, B.; Yu, Z.; Chen, A.; Geiger, A.; Gao, S. 2d gaussian splatting for geometrically accurate radiance fields. In SIGGRAPH 2024 Conference Papers; Association for Computing Machinery: Denver, CO, USA, 2024. [Google Scholar]
  95. Zhang, B.; Fang, C.; Shrestha, R.; Liang, Y.; Long, X.; Tan, P. Rade-gs: Rasterizing depth in gaussian splatting. arXiv 2024, arXiv:2406.01467. [Google Scholar]
  96. Rangelov, D.; Waanders, S.; Waanders, K.; van Keulen, M.; Miltchev, R. Impact of camera settings on 3D Reconstruction quality: Insights from NeRF and Gaussian Splatting. Sensors 2024, 24, 7594. [Google Scholar] [CrossRef] [PubMed]
  97. Ye, S.; Dong, Z.; Hu, Y.; Wen, Y.; Liu, Y. Gaussian in the Dark: Real-Time View Synthesis From Inconsistent Dark Images Using Gaussian Splatting. Comput. Graph. Forum 2024, 43, e15213. [Google Scholar] [CrossRef]
  98. Lin, S.; Wang, H.; Zhang, X.; Wang, D.; Zu, D.; Song, J.; Liu, Z.; Huang, Y.; Huang, K.; Tao, N.; et al. Direct spray-coating of highly robust and transparent Ag nanowires for energy saving windows. Nano Energy 2019, 62, 111–116. [Google Scholar] [CrossRef]
  99. Hsu, P.; Liu, C.; Song, A.Y.; Zhang, Z.; Peng, Y.; Xie, J.; Liu, K.; Wu, C.; Catrysse, P.; Cai, L.; et al. A dual-mode textile for human body radiative heating and cooling. Sci. Adv. 2017, 3, e1700895. [Google Scholar] [CrossRef]
  100. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  101. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
Figure 1. AgNWs/PVB ethanol solution and standard checkerboard calibration.
Figure 2. Illustration of our proposed TGA-GS: (a) Multimodal Gaussian Network for joint RGB-thermal training and (b) Thermal and RGB SR Networks with cross-modal guidance.
Figure 3. Qualitative evaluation of thermal view synthesis.
Figure 4. Qualitative evaluation of RGB view synthesis.
Figure 5. Qualitative evaluation of mesh reconstruction on Fruits1 scene.
Figure 6. Qualitative evaluation of mesh reconstruction on Cup scene.
Figure 7. Ablation study of mesh reconstruction on Cup scene.
Table 1. Quantitative results of thermal view synthesis (best, second best).

Metric        Method                Fruits1  Fruits2  Cup     Laptop  Hairdryer  Avg.
PSNR (dB) ↑   3DGS [2]              37.34    29.29    29.47   35.55   30.90      32.51
              ThermalGaussian [62]  18.82    18.65    17.01   10.31   17.64      16.49
              2DGS [94]             28.52    22.79    27.77   36.12   28.48      28.74
              RaDe-GS [95]          32.56    28.46    30.24   37.25   27.75      31.25
              Ours                  35.51    30.25    29.72   38.94   29.50      32.78
SSIM ↑        3DGS                  0.967    0.935    0.948   0.975   0.951      0.955
              ThermalGaussian       0.743    0.781    0.782   0.554   0.744      0.721
              2DGS                  0.922    0.872    0.952   0.972   0.948      0.933
              RaDe-GS               0.952    0.932    0.954   0.974   0.943      0.951
              Ours                  0.952    0.933    0.949   0.972   0.899      0.941
LPIPS ↓       3DGS                  0.108    0.156    0.099   0.058   0.089      0.102
              ThermalGaussian       0.500    0.361    0.284   0.628   0.299      0.414
              2DGS                  0.233    0.229    0.104   0.076   0.100      0.148
              RaDe-GS               0.162    0.168    0.095   0.065   0.108      0.120
              Ours                  0.145    0.095    0.087   0.021   0.111      0.092
Mem. (MB) ↓   3DGS                  1467     1733     1743    1783    1563       1658
              ThermalGaussian       945      1363     1179    1147    1075       1142
              2DGS                  659      1069     711     711     729        776
              RaDe-GS               939      1309     1027    1859    1049       1237
              Ours                  551      571      502     596     480        540
FPS ↑         3DGS                  292      351      230     234     218        265
              ThermalGaussian       390      432      312     347     314        359
              2DGS                  260      251      212     173     223        224
              RaDe-GS               201      298      209     140     184        206
              Ours                  387      441      301     315     318        352
Table 2. Quantitative results of RGB view synthesis (best, second best).

Metric        Method                Fruits1  Fruits2  Cup     Laptop  Hairdryer  Avg.
PSNR (dB) ↑   3DGS [2]              37.26    36.37    38.43   41.79   36.77      38.13
              ThermalGaussian [62]  23.47    25.98    26.74   23.53   26.42      25.23
              2DGS [94]             37.55    36.21    38.11   39.85   37.18      37.78
              RaDe-GS [95]          38.58    35.75    38.37   39.70   36.49      37.78
              Ours                  39.98    39.66    42.12   44.05   38.09      40.77
SSIM ↑        3DGS                  0.941    0.925    0.953   0.970   0.941      0.946
              ThermalGaussian       0.727    0.695    0.743   0.635   0.715      0.703
              2DGS                  0.938    0.918    0.945   0.952   0.943      0.940
              RaDe-GS               0.929    0.924    0.948   0.955   0.936      0.939
              Ours                  0.937    0.939    0.960   0.963   0.852      0.930
LPIPS ↓       3DGS                  0.078    0.115    0.131   0.055   0.130      0.102
              ThermalGaussian       0.249    0.264    0.319   0.240   0.338      0.282
              2DGS                  0.080    0.122    0.147   0.076   0.134      0.112
              RaDe-GS               0.087    0.126    0.153   0.077   0.149      0.118
              Ours                  0.075    0.075    0.110   0.028   0.124      0.082
Mem. (MB) ↓   3DGS                  1433     1685     1447    1511    1403       1496
              ThermalGaussian       945      1363     1179    1147    1075       1142
              2DGS                  827      1089     951     1177    717        952
              RaDe-GS               865      1135     887     941     817        929
              Ours                  551      571      502     596     480        540
FPS ↑         3DGS                  307      428      342     333     334        349
              ThermalGaussian       442      487      358     365     380        406
              2DGS                  215      239      220     152     225        210
              RaDe-GS               205      283      225     228     228        232
              Ours                  439      544      505     447     445        476
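For readers reproducing the comparison above, the reported image-quality metrics, PSNR, SSIM [100], and LPIPS [101], can be computed with standard open-source packages. The snippet below is a minimal sketch rather than the evaluation code used in this work; it assumes 8-bit rendered and ground-truth images saved to disk, uses scikit-image for PSNR/SSIM and the lpips package with an AlexNet backbone (a common default), and the file names are placeholders.

```python
# Minimal sketch of per-image PSNR / SSIM / LPIPS evaluation (illustrative only).
# Assumes 8-bit RGB (or colorized thermal) renderings and ground truth of equal size.
import numpy as np
import torch
import lpips                      # pip install lpips
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def to_lpips_tensor(img_uint8: np.ndarray) -> torch.Tensor:
    """Convert an HxWx3 uint8 image to a 1x3xHxW float tensor in [-1, 1]."""
    t = torch.from_numpy(img_uint8).float() / 255.0
    return t.permute(2, 0, 1).unsqueeze(0) * 2.0 - 1.0

# Hypothetical file names; replace with a rendered / ground-truth pair.
pred = io.imread("render_0001.png")[..., :3]
gt   = io.imread("gt_0001.png")[..., :3]

psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)

lpips_net = lpips.LPIPS(net="alex")           # perceptual metric of [101]
with torch.no_grad():
    lpips_val = lpips_net(to_lpips_tensor(pred), to_lpips_tensor(gt)).item()

print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.3f}  LPIPS: {lpips_val:.3f}")
```

Per-scene scores such as those in Tables 1 and 2 are typically obtained by averaging these per-image values over a scene's held-out test views.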
Table 3. Ablation study of thermal view synthesis (best, second best).

Variant                               PSNR ↑  SSIM ↑  LPIPS ↓
w/o Thermal Gradient Alignment Loss   32.22   0.937   0.144
w/o SR module                         31.54   0.930   0.340
Ours                                  32.78   0.941   0.092
Table 4. Ablation study of RGB view synthesis (best, second best).

Variant                               PSNR ↑  SSIM ↑  LPIPS ↓
w/o Thermal Gradient Alignment Loss   40.71   0.924   0.100
w/o SR module                         39.26   0.904   0.218
Ours                                  40.77   0.930   0.082
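The ablation variants above remove the thermal gradient alignment loss and the SR (super-resolution) module, respectively. Purely as an illustration, and not the exact formulation used in TGA-GS, a gradient-alignment term between a rendered image and a reference is often written as an L1 penalty on the difference of their finite-difference image gradients; the PyTorch sketch below shows such a generic term, with all tensor shapes and names being assumptions.

```python
# Illustrative sketch (not the paper's definition) of a gradient-alignment loss:
# penalize the L1 difference between spatial gradients of two single-channel images,
# e.g. a rendered thermal view and its reference. Tensors: (N, 1, H, W), float.
import torch
import torch.nn.functional as F

def image_gradients(x: torch.Tensor):
    """Forward finite differences along width and height."""
    dx = x[..., :, 1:] - x[..., :, :-1]   # horizontal gradient, (N, 1, H, W-1)
    dy = x[..., 1:, :] - x[..., :-1, :]   # vertical gradient,   (N, 1, H-1, W)
    return dx, dy

def gradient_alignment_loss(pred: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """L1 distance between the image gradients of prediction and reference."""
    pdx, pdy = image_gradients(pred)
    rdx, rdy = image_gradients(ref)
    return F.l1_loss(pdx, rdx) + F.l1_loss(pdy, rdy)

# Usage with random stand-in data.
pred = torch.rand(1, 1, 240, 320, requires_grad=True)
ref  = torch.rand(1, 1, 240, 320)
loss = gradient_alignment_loss(pred, ref)
loss.backward()
```

Penalizing gradient differences rather than raw intensities keeps object contours aligned across modalities while tolerating the global intensity offsets that are typical of thermal imagery.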
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
