1. Introduction
Space-based lidar systems provide critical information about the vertical distributions of aerosols and clouds that greatly improves our understanding of the Earth's air quality, weather, and climate system [1]. Since 2006, three elastic backscatter lidar sensors have provided vertical profiles of the atmosphere from space. The Cloud-Aerosol Transport System (CATS) was an elastic backscatter lidar that operated on the International Space Station (ISS) for 33 months (February 2015 to October 2017), primarily at the 1064 nm wavelength [2]. The Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP), aboard the CALIPSO satellite, provided vertical profiles of clouds and aerosols at 532 and 1064 nm from 2006 to 2023 [3]. The Advanced Topographic Laser Altimeter System (ATLAS), which launched in September 2018 and currently flies on the ICESat-2 mission, also includes an atmospheric channel [4]. These lidar systems fundamentally measure attenuated backscatter at two polarization states (parallel and perpendicular), except for ATLAS, which measures only the total signal without polarization discrimination. These measurements are then used to derive the top/base heights and geometrical thickness of aerosol and cloud features in the atmosphere, often referred to as a "vertical feature mask" data product, which has been utilized in numerous applications such as cloud detection frequencies [5,6,7], aerosol plume tracking [8,9], vertical proximity of clouds and aerosols [10,11], planetary boundary layer height [12,13], and aerosol spatiotemporal variability [14,15]. However, noise in daytime observations presents challenges for analyzing the data from spaceborne lidar systems like CATS and CALIOP [16,17].
Daytime signals from space-based lidars are degraded by noise from solar background light that can dwarf the atmospheric signal. The magnitude of the solar background depends on the properties of the scatterer (brighter targets like ice or clouds reflect more sunlight into the lidar telescope) and on instrument specifications (telescope field of view, spectral width of bandpass filters, etc.). The signal-to-noise ratio (SNR), a metric quantifying how accurately a lidar can measure backscatter, is significantly lower during daytime than at night for CALIOP [16] and CATS [17] because solar contributions to the total signal increase noise in the detector response. This lower SNR inhibits the ability to detect atmospheric features during the daytime. Yorks et al. (2016) [2] showed that the minimum detectable backscatter (MDB), defined as the lowest attenuated backscatter value within cloud and aerosol layers identified by layer detection algorithms, is higher during daytime than at night for both CALIOP and CATS. Traditionally, vertical and horizontal averaging has been used to improve the daytime SNR and the accuracy of feature detection, but at the expense of resolution. Despite averaging the signal as much as 60 to 80 km horizontally, both the CALIOP and CATS algorithms struggle to detect the optically thinnest cloud and aerosol features during the daytime [18,19]. This limits the ability of researchers to confidently use daytime spaceborne lidar data for applications such as radiative forcing [19,20].
Advanced statistical and machine learning techniques, such as principal component analysis (PCA) and wavelet decomposition, have been developed for image processing and applied to Earth Science data. PCA is a widely used statistical technique for dimensionality reduction. For Earth Science applications, PCA has been used to determine burned areas in imagery [21], detect aerosols using Raman lidar data [22], and reduce noise in lidar observations [23]. Wavelet decomposition is a well-known filter-based approach to noise reduction. Inspired by the Fourier series, which represents signals as sums of infinitely oscillating functions (sines and cosines), the wavelet model represents signals as sums of brief oscillations. The Aerosol and Carbon Detection Lidar (ACDL), currently flying on the Chinese atmospheric environment monitoring satellite DQ-1, uses a form of wavelet denoising in its data processing [24]. Yorks et al. (2021) [25] applied PCA and wavelet decomposition denoising techniques to simulated lidar data and found that wavelet decomposition yielded a larger improvement in SNR and less distortion of the signal than PCA. However, the SNR improvement was modest: between a factor of 1.5 and 2.0. Marais et al. (2016) [26] developed an approach to simultaneously denoise and invert backscatter and extinction from lidar data, but this technique lacks the computational efficiency required for near-real-time (NRT) operational data product production [27] (i.e., the process of producing the Level 1 and Level 2 standard NASA data products from spaceborne remote sensing instruments for use by the research and applications communities) from a spaceborne lidar. New, state-of-the-art deep learning image denoising algorithms can achieve high performance with the computational efficiency required for NRT operational data product production. In this paper, one such deep learning algorithm is applied and evaluated using both CATS daytime and simulated lidar data. Signal metrics, such as SNR and distortion, are estimated for this deep learning algorithm and compared to traditional averaging and wavelet decomposition techniques. Errors in layer detection are quantified using the truth in simulated datasets, and a comparison is performed to the standard CATS vertical feature mask data products that employ horizontal averaging to detect faint atmospheric features.
2. Methods
Recent advances in deep learning have revolutionized image-based denoising. The first deep learning architecture to outperform classical denoising algorithms was "DnCNN", introduced by Zhang et al. in 2017 [28]. Since then, various deep learning image denoising methods have continued to make incremental improvements due to concomitant advances in hardware, model architectures, and learning strategies [29,30,31].
Lidar curtains are images. Furthermore, most CATS raw background-subtracted photon counts fall roughly between 0 and 255, like standard image pixels. It is therefore fairly straightforward to adapt these algorithms to denoise raw lidar data.
For this study, two image denoising deep learning model architectures were tested: "DnCNN" [28] and "DDUNet" [30]. DnCNN is a series of two-dimensional convolutional blocks (Conv2D) with the now standard batch normalization (BatchNorm) and rectified linear unit (ReLU) activations (Figure 1 in Zhang et al., 2017 [28]). The authors define the network's depth as the number of Conv2D blocks, with depths between 17 and 20 tested in the paper. DDUNet has a more complex architecture of cascading U-Net variants (DUNets, called "Dense U-Net Blocks"), with each subsequent DUNet connected to the output of all levels of all previous DUNets. To limit the number of parameters in the model, the number of channels remains the same after each downsampling in each DUNet, in contrast to the classic U-Net [32]. Jia et al. (2021) [30] tested their model with between 2 and 5 DUNet blocks, with performance increasing as more DUNets were added; however, because they were testing on a single GPU, they ran out of memory beyond 5.
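Of the two, DnCNN is the simpler to illustrate. The sketch below shows a minimal DnCNN-style residual denoiser in TensorFlow/Keras for single-channel lidar curtain patches; the depth and filter counts here are illustrative defaults from Zhang et al. (2017), not the exact configuration used in this study.

```python
# Minimal DnCNN-style residual denoiser (after Zhang et al., 2017) for
# single-channel lidar curtain patches. Depth/filters are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

def build_dncnn(depth: int = 17, filters: int = 64,
                patch_size: int = 256) -> tf.keras.Model:
    inputs = layers.Input(shape=(patch_size, patch_size, 1))
    # First block: Conv2D + ReLU (no batch normalization).
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(inputs)
    # Middle blocks: Conv2D + BatchNorm + ReLU.
    for _ in range(depth - 2):
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    # Final block predicts the residual noise, not the clean image.
    residual = layers.Conv2D(1, 3, padding="same")(x)
    return tf.keras.Model(inputs, residual)
```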
A training dataset of pairs of noisy and clean lidar curtain images was created. To create the pairs, artificial Poisson noise was added to nighttime data to emulate the CATS detector response under daytime solar conditions. CATS 2015 Mode 7.2 [25] 1064 nm nighttime raw photon count data were used as the training data source. The input for training was the noised night data, but the output was the residual noise rather than the clean image (Figure 1). This follows Zhang et al. (2017) [28], who argue that for denoising, deep learning models with residual learning (also called "skip connections") are easier to train when learning a residual mapping. Once a model is trained, the denoised image is obtained by subtracting the output (residual noise) from the input (noisy image). Following Zhang et al. (2017) [28] and Jia et al. (2021) [30], three different noise levels were added, spanning up to the median of CATS solar background levels: 40, 80, and 160 solar photon counts (Table 1). Noisy input samples were background-subtracted and normalized to lie approximately within the range 0 to 1.
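One plausible construction of such a training pair is sketched below, assuming the nighttime counts can be treated as the noise-free Poisson rate and that dividing by 255 maps counts to roughly [0, 1]; the function and normalization constant are illustrative assumptions.

```python
# Plausible (noisy input, residual target) pair construction, assuming the
# daytime detector response is Poisson in (signal + solar background) and
# pairs are background-subtracted. Normalizing by 255 is an assumption.
import numpy as np

rng = np.random.default_rng(0)

def make_training_pair(night_counts: np.ndarray, background: float):
    """night_counts: clean nighttime photon counts (bins x profiles)."""
    assert background in (40.0, 80.0, 160.0)  # solar levels from Table 1
    # Treat night counts as the Poisson rate, add solar background, resample,
    # then background-subtract, as the daytime processing would.
    noisy = rng.poisson(night_counts + background) - background
    noisy_n = noisy / 255.0                       # normalize to ~[0, 1]
    residual = noisy_n - night_counts / 255.0     # target: residual noise
    return noisy_n.astype("float32"), residual.astype("float32")
```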
All of the CATS 2015 Mode 7.2 1064 nm data (March–December) with a below-surface standard deviation of 0.2–0.6 background-subtracted counts in the middle third of the granule were used for training. The standard deviation limits were set to select the night data with the least amount of background. At night, the primary source of noise is detector dark current noise, not solar background. The lower limit of 0.2 ensures that no data captured during boresighting or other engineering procedures were included, and to ensure only the darkest night data were retained, only the middle third of each granule was used. The result was that 1.77 months' worth of data were selected, all from 2015. Most of these data are from the months April through October, as seen in Figure 2.
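A sketch of this granule screening follows, assuming below-surface, background-subtracted counts have already been extracted; the function name and array layout are assumptions.

```python
# Screening for quiet nighttime granules: keep a granule only if the standard
# deviation of below-surface, background-subtracted counts in its middle
# third falls between 0.2 and 0.6 counts. Names are illustrative.
import numpy as np

def is_dark_night_granule(subsurface_counts: np.ndarray) -> bool:
    """subsurface_counts: below-surface counts, shape (bins, profiles)."""
    n = subsurface_counts.shape[-1]
    middle = subsurface_counts[..., n // 3 : 2 * n // 3]  # middle third
    sigma = float(np.std(middle))
    # Lower bound rejects boresighting/engineering data; upper bound rejects
    # granules with appreciable background noise.
    return 0.2 <= sigma <= 0.6
```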
The Python implementation of TensorFlow (version 2.2.0) [33] was used to train models with the L2 loss function [30]. Model architectures were coded for both DnCNN and DDUNet. After initial comparisons, it was clear that DDUNet produced more realistic-looking features. This, combined with Jia et al.'s (2021) [30] evidence of better performance, led to simply applying as much computing power as possible toward training a model with the DDUNet architecture. This meant selecting the deepest (and therefore largest) DDUNet that could fit into GPU memory. The available hardware was a maximum of four GPUs with sufficient supporting CPUs and RAM. Further constraining the size of the network was the input patch size. Training image processing neural networks requires a fixed input size, typically necessitating breaking large images up into "patches" (Figure 3). Generally, a larger patch size is better, but it comes at the expense of making training slower, more difficult, or impossible given hardware limitations [34]. Conversely, choosing too small a patch size loses context and results in poorer model performance. For CATS, the smallest patch size that incorporated most of the troposphere was 256 × 256 (bins × profiles); the altitude range of 0–20 km is approximately covered by 256 raw CATS bins of 78 m each. In summary, given a patch size of 256 × 256 and Jia et al.'s (2021) [30] 64 filters, a network depth of 7 was experimentally determined to be the deepest possible on the available hardware.
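The patching step (Figure 3) amounts to tiling each curtain into fixed 256 × 256 windows. A minimal sketch, assuming a (bins × profiles) array:

```python
# Tile a lidar curtain into non-overlapping, fixed-size training patches.
import numpy as np

def extract_patches(curtain: np.ndarray, size: int = 256) -> np.ndarray:
    """curtain: 2-D array of (bins x profiles); returns (n, size, size, 1)."""
    n_bins, n_prof = curtain.shape
    patches = [
        curtain[i : i + size, j : j + size]
        for i in range(0, n_bins - size + 1, size)
        for j in range(0, n_prof - size + 1, size)
    ]
    return np.stack(patches)[..., np.newaxis]  # add channel axis for Conv2D
```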
Once the data were split into fixed-size patches and the largest trainable version of a DDUNet was determined, new models were repeatedly trained, randomly shuffling the sample order each time. Samples were augmented using random flips and translations (both horizontal and vertical) but not rotations, to allow the model to exploit the natural anisotropy of the atmosphere. The atmosphere is generally vertically stratified, with high horizontal persistence of features. While typical image processing algorithms must anticipate a wide variety of shapes and orientations, for the specific case of an atmospheric lidar signal, using only the natural horizontal orientation simplifies the learning task. After approximately 20 versions of the same model were trained, the best one was chosen by visual assessment against a few reference cases; the only difference between training runs was the randomly shuffled sample order. A summary of the best model is shown in Table 1.
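A minimal sketch of this augmentation, with translation implemented as a circular shift for simplicity (an assumption; the study's exact translation scheme is not specified):

```python
# Random flips and translations (no rotations), applied identically to a
# (noisy, residual) pair so the target stays aligned with the input.
import numpy as np

rng = np.random.default_rng(1)

def augment(noisy: np.ndarray, residual: np.ndarray):
    for axis in (0, 1):                      # 0 = vertical, 1 = horizontal
        if rng.random() < 0.5:               # random flip along this axis
            noisy = np.flip(noisy, axis=axis)
            residual = np.flip(residual, axis=axis)
        shift = int(rng.integers(-32, 33))   # translation in bins/profiles
        # np.roll wraps around; a hypothetical simplification of translation.
        noisy = np.roll(noisy, shift, axis=axis)
        residual = np.roll(residual, shift, axis=axis)
    return noisy, residual
```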
When using the trained model to denoise data (called "inference"), the input photon count curtain data are first background-subtracted and normalized to the same range used during training. The two-dimensional array of data is then broken up into multiple overlapping input patches of 256 bins by 256 profiles, so every data point in the original array receives multiple predictions. For each 256 × 256 input patch, the denoised signal is the difference between the input and the model output (residual noise; see Figure 1). To reduce edge effects, the mean denoised value at each data point is taken as the final denoised result; this was observed to improve denoising quality, as demonstrated in Figure 4. There may be better ways to aggregate overlapping patch predictions, but the simple approach of aggregating by mean was effective enough for this study. The degree of patch overlap is determined by the stride and has a profound effect on runtime, since it changes the number of samples used to denoise a particular array of data. Diminishing improvements were observed when lowering the stride below 16; therefore, a stride of 16 was used.
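A sketch of this overlapping-patch inference with mean aggregation, assuming a Keras model trained to predict the residual noise; per-patch prediction is shown for clarity, though batching patches would be much faster in practice.

```python
# Tile the curtain with overlapping patches (stride 16), denoise each, and
# average all predictions covering a given point to reduce edge effects.
import numpy as np
import tensorflow as tf

def denoise_curtain(model: tf.keras.Model, curtain: np.ndarray,
                    size: int = 256, stride: int = 16) -> np.ndarray:
    n_bins, n_prof = curtain.shape
    acc = np.zeros_like(curtain, dtype="float64")  # sum of predictions
    cnt = np.zeros_like(curtain, dtype="float64")  # number of predictions
    for i in range(0, n_bins - size + 1, stride):
        for j in range(0, n_prof - size + 1, stride):
            patch = curtain[i:i + size, j:j + size][np.newaxis, ..., np.newaxis]
            residual = model.predict(patch, verbose=0)[0, ..., 0]
            # Denoised patch = input minus predicted residual noise.
            acc[i:i + size, j:j + size] += patch[0, ..., 0] - residual
            cnt[i:i + size, j:j + size] += 1.0
    # Assumes dimensions align with the stride; trailing edges left uncovered
    # here would need padding in practice.
    return (acc / np.maximum(cnt, 1.0)).astype("float32")
```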
3. Performance Assessment Using Simulated Data
Simulated data were generated to assess the performance of the DDUNet denoising. Particulate backscatter and extinction coefficients at 1064 nm retrieved from six different high-SNR Cloud Physics Lidar (CPL) [37] cases were input into the GSFC lidar simulator [15] to generate a CATS-like signal under noisy, yet typical, daytime conditions. The six CPL cases, outlined in Table 2, were chosen because they provide multiple types of clouds (ice, liquid water, stratus, etc.) and aerosol layers (dust, smoke, etc.), representing a good cross-section of realistic scenes and totaling 18,385 profiles of simulated spaceborne lidar data at a 1 km horizontal resolution. The DDUNet model was then used to denoise the raw signal (photon counts) of the simulated data.
3.1. Signal Quality Metrics
For the six simulated cases in Table 2, the CPL data used as input to the lidar simulation model are considered the "true" signal. Using the known "true" or expected signal, the following metrics were computed: SNR (signal-to-noise ratio), D (distortion), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). SNR and D were computed as defined in Yorks et al. (2021) [25] and are referenced here as Equations (1) and (2), respectively, where E is the expected signal, N is the noise, and A in Equation (2) is the observed or "actual" value. The signal used to compute all of these metrics was the background-subtracted photon counts.
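For reference, forms consistent with these definitions are sketched below; this is a hedged reconstruction, assuming SNR is the ratio of the mean expected signal to the standard deviation of the noise and D is the signed relative difference between the actual and expected signals (the exact expressions are given in [25]):

$$\mathrm{SNR} = \frac{\overline{E}}{\sigma_{N}} \qquad (1)$$

$$D = \frac{\sum \left( A - E \right)}{\sum E} \qquad (2)$$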
Table 3 shows the metrics computed within layers across all six simulated cases for DDUNet, wavelet, averaging, and baselines ("Day" and "Night"), sorted by decreasing PSNR. The daytime PSNR (32.87) is only about 58% of the nighttime PSNR (57.03), demonstrating the large disparity between daytime and nighttime performance. Various averaging levels (6, 30, and 120 km) were mapped back to the raw-resolution signal to compute the SNR, ensuring a comparison of all methods at the same (2 km) resolution. The mapping of a large, averaged block pixel back to many smaller pixels (i.e., "smearing") causes D to become increasingly negative with decreasing resolution. This effect likely also caps the SNRs, as an increasingly coarse representation of the signal asymptotes to a mean value. The wavelet decomposition method chosen was the best of the ensemble tested in Yorks et al. (2021) [25] and outperformed all the averaging resolutions. Furthermore, DDUNet outperforms the wavelet method for SNR, PSNR, and SSIM. Most notably, DDUNet increases the SNR by a factor of 2.5 compared to the baseline daytime data ("Day" in Table 3). The distortion is a factor of 2–4 lower than averaging the data to 30 or 120 km, but slightly higher than the wavelet decomposition method, likely due to areas where the signal is too low for DDUNet to recover (see the layer at 15 km in Supplemental Figure S13). Finally, DDUNet improves the PSNR to 49.46, roughly 87% of the nighttime value.
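PSNR and SSIM can be computed with standard image-metric implementations; a sketch using scikit-image, assuming `expected` and `denoised` are background-subtracted count arrays on the same grid:

```python
# PSNR and SSIM against the expected ("true") signal via scikit-image.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_metrics(expected: np.ndarray, denoised: np.ndarray):
    drange = float(expected.max() - expected.min())  # dynamic range of truth
    psnr = peak_signal_noise_ratio(expected, denoised, data_range=drange)
    ssim = structural_similarity(expected, denoised, data_range=drange)
    return psnr, ssim
```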
The 18 August 2015 case contains a coincident segment where CPL flew under the expected ISS/CATS ground track. A zoomed-in image of the cirrus cloud layer observed in this segment is shown in Figure 5. Due to differences in platform speeds, CATS took approximately 1 min to travel the 300 km segment shown in Figure 5, while CPL took 30 min. Given the cloud evolution over those 30 min, the coincidence in time and space is not precise enough to enable a comparison of the simulated true/expected signal to the CATS observation. The vertical structure of the clouds differs between the real CATS data (Figure 5E,G) and the simulated data from CPL (Figure 5A).
Despite the cloud evolution over the scene, horizontally integrating the data provides an opportunity to assess how well the simulated noisy daytime spaceborne lidar data represent true CATS daytime signals. For this comparison, the SNR was computed differently from Equation (1): according to Equation (3) (with "e" meaning empirical), the mean and standard deviation were taken across the horizontal extent of the segment within layers (either true or detected), so each vertical level (i.e., bin) had its own mean and standard deviation. Only bins with a minimum of 25 within-layer data points were considered in the analysis to ensure statistical robustness.
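Consistent with this description, the empirical SNR at vertical bin $z$ takes the form

$$\mathrm{SNR}_{e}(z) = \frac{\mu(z)}{\sigma(z)} \qquad (3)$$

where $\mu(z)$ and $\sigma(z)$ are the mean and standard deviation of the within-layer signal across the horizontal extent of the segment at bin $z$.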
Figure 6 shows a similar change in SNR after denoising for real versus simulated data. It should be noted that the actual CATS data are at 1.8 km horizontal resolution, whereas the simulation is at 2 km resolution. Even so, the simulation is apparently a bit noisier than the real data (Figure 5, panels F vs. G), likely a consequence of the solar zenith angle and surface reflectance assumed for the simulation.
3.2. Layer Detection Enhancement
The standard CATS layer detection algorithm, which applies a threshold profile to the scattering ratio of each individual profile [25], was run on both the noisy and denoised simulated data. The denoised detection was run at 2 km; however, as in the operational CATS processing, the noisy data required additional averaging to 60 km to detect a comparable number of layers. The layer detection algorithm requires attenuated total backscatter (ATB) as its primary input. The transformation from denoised photon counts into ATB was straightforward, as it used the same instrument specifications (laser energy, telescope area, detector efficiencies, etc.) used to create the simulation. As with real CATS data, the Rayleigh signal is too weak to be recovered by the DDUNet model; however, this is not a barrier to the CATS standard layer detection algorithm, since a theoretical Rayleigh signal is used to create the detection threshold [25,38].
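A minimal sketch of a Rayleigh-threshold detection in the spirit of the approach described above; the scale factor is a hypothetical placeholder, not the operational CATS value.

```python
# Flag bins as within-layer where the denoised ATB exceeds a threshold
# profile derived from the theoretical Rayleigh (molecular) signal.
import numpy as np

def detect_layers(atb: np.ndarray, rayleigh_atb: np.ndarray,
                  scale: float = 3.0) -> np.ndarray:
    """atb: (bins x profiles) attenuated total backscatter, km^-1 sr^-1.
    rayleigh_atb: theoretical molecular ATB profile, shape (bins,)."""
    threshold = scale * rayleigh_atb[:, np.newaxis]  # broadcast over profiles
    return atb > threshold                           # boolean layer mask
```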
Figure 7 shows the 1064 nm ATB (left) and the results of the layer detection algorithm (right) for the 18 August 2015 case. Layers detected from the denoised data (Figure 7F) are visually much better aligned with the truth (Figure 7B) than the layers detected via averaging, herein called "standard processing" or "standard layers" (Figure 7D). This is evident in Figure 8, which visually compares detected layers to true layers. For the denoised data, the layer detection algorithm does very well in identifying aerosol and cloud bins with stronger backscatter signals, referred to as true positives (Figure 8, top panel, purple). However, there are false negatives: bins within true layers that are not detected in the denoised ATB (Figure 8, top panel, teal). The majority of these bins have lower true ATB. False negatives are even more prevalent in the standard processing (Figure 8, middle panel, teal). Additionally, there are more false positives (Figure 8, middle panel, red) due to layer smearing from averaging the data to 60 km, especially near the edges of the cloud and aerosol layers. Figure 8 also directly compares the denoised and standard detected layers (bottom panel), where a large number of bins were detected as within layers in the true and denoised data (neon green) but not in the standard L2 processing.
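The confusion matrices summarized next reduce to counting bin-by-bin agreement between boolean within-layer masks; a minimal sketch:

```python
# 2 x 2 detection confusion matrix (within-layer vs. clear air) between a
# detected mask and the truth mask, as in Figures 9-10.
import numpy as np

def detection_confusion(detected: np.ndarray, truth: np.ndarray) -> np.ndarray:
    tp = np.sum(detected & truth)     # true positives: both within-layer
    fp = np.sum(detected & ~truth)    # false positives: detected, not true
    fn = np.sum(~detected & truth)    # false negatives: missed true bins
    tn = np.sum(~detected & ~truth)   # true negatives: both clear air
    return np.array([[tp, fp], [fn, tn]])
```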
The confusion matrices for the 18 August 2015 case and for all cases in Table 2 are shown in Figure 9 and Figure 10, respectively, with the standard averaging versus the true data (A), the denoised versus the true data (B), and the standard averaging versus the denoised data (C). The confusion matrix for the 18 August 2015 case provides quantitative confirmation of the visual assessment from Figure 7 and Figure 8. These results are borne out across all the simulated data cases, as shown in Figure 10, with a few key results discussed below.
The minimum detectable backscatter (MDB, in km⁻¹ sr⁻¹) for the denoised data above 10 km (the cirrus zone) was inferred from these detections by taking the 10th percentile of all true positive values above 10 km in the 18 August 2015 case. This daytime MDB after denoising is lower than the nighttime CALIOP 1064 nm value [40]. Confidence in this assessment comes from three facts:
The high level of simulated solar noise;
The estimate is made at 2 km horizontal resolution vs. 5 km for CALIOP;
There were no clouds above 15 km, whereas CALIOP’s value was estimated at 15 km.
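A minimal sketch of this MDB estimate, assuming the true ATB, the detection masks, and the bin altitudes share a common grid; names are illustrative.

```python
# MDB estimate: 10th percentile of true-positive ATB values above 10 km.
import numpy as np

def estimate_mdb(atb_true: np.ndarray, detected: np.ndarray,
                 truth: np.ndarray, altitude_km: np.ndarray) -> float:
    high = altitude_km > 10.0                      # cirrus zone
    tp = detected & truth & high[:, np.newaxis]    # true positives above 10 km
    return float(np.percentile(atb_true[tp], 10))  # km^-1 sr^-1
```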
The DDUNet's ability to recover features is a function of both SNR and physical size. This is illustrated in Supplemental Figure S13, where a thin (both optically and geometrically) stratospheric aerosol layer at 15 km altitude has ATB values similar to those of the geometrically thick aerosol layer near the surface; the near-surface aerosol feature is recovered, while the stratospheric aerosol layer is not. The sensitivity of DDUNet to SNR and physical size will be explored in a future paper.
3.3. Cloud Top Height Accuracy
Cloud top height (CTH) is a critical variable provided by backscatter lidars, especially for faint clouds and cloud tops below the detection limits of passive sensors and cloud radars. CTH detection for ice clouds was performed using both the standard averaging and deep learning denoised data and compared to the truth dataset using error statistics. In general, clouds with a base height above 6 km are classified as ice clouds (validated using the CPL cloud phase data product) for the 18 August 2015, 23 February 2020, 13 February 2022, and 5 January 2023 cases in which such clouds (both transparent and opaque) were present (Table 2). For the 13 February 2022 case, a cloud top height above 7.1 km was chosen empirically as a threshold separating ice clouds from some mixed-phase clouds (identified based on the CPL 1064 nm depolarization ratios) at lower altitudes.
Comparisons of CTH are made between vertical profiles in which both the "analysis" and truth datasets detected a layer. Analysis datasets refer to the detected layers in the 2 km averaged, 60 km averaged, and 2 km denoised data. CTH is defined here as the altitude of the highest bin within a vertically contiguous cloud layer for each profile, and the CTH difference is the analysis height minus the truth height. A complication to these comparisons is the scenario in which the analysis and truth datasets do not detect the same number of cloud layers. To reduce the occurrence of comparing two distinctly different cloud layers between datasets, CTH differences over 1000 m are analyzed further to determine whether they fall into this scenario. If only one layer is detected in the analysis profile but two are detected in the truth profile (or vice versa), the truth layer yielding the smallest absolute CTH difference is chosen for comparison. When both profiles include two or more detected layers, all combinations are compared to select the pair yielding the smallest absolute CTH difference.
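A minimal sketch of this pairing logic, applied when profiles disagree on layer count; the inputs are per-profile lists of CTHs, and the function name is illustrative.

```python
# When layer counts disagree (and the CTH difference exceeds 1000 m), compare
# all candidate pairings and keep the pair with the smallest |CTH difference|.
import itertools

def match_cth(analysis_cths: list, truth_cths: list):
    """Each argument: cloud top heights (m) detected in one profile."""
    pairs = itertools.product(analysis_cths, truth_cths)
    return min(pairs, key=lambda p: abs(p[0] - p[1]))  # (analysis, truth)
```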
Figure 11 shows the CTH comparison between the analysis data on the y-axis and the truth data on the x-axis (top row). The number of data points and correlation coefficient are reported. There are over twice as many data points for the 60 km averaged (top middle) and 2 km denoised (top right) datasets as compared to the 2 km averaged data (top left). However, many CTHs from the 60 km averaged dataset are duplicated and reported at 2 km resolution, whereas the denoised dataset provides unique CTHs at a true 2 km resolution. Correlation with the truth dataset improves from the 2 km averaged to 60 km averaged to denoised dataset with correlation coefficients of 0.94, 0.98, and 0.99, respectively.
Figure 11 also shows histograms of the frequency of CTH differences between the analysis and truth datasets in meters (bottom row). The mean CTH difference, mean absolute error (MAE), and root mean squared error (RMSE) are reported for each analysis dataset. The 2 km averaged data have a low bias of nearly 275 m with an MAE of 278 m and an RMSE of 531 m (bottom left). This low bias and high MAE/RMSE are likely the result of fainter cloud tops that were not detected in the 2 km averaged dataset, as shown in Figure 8 (middle panel, teal colors near cloud top). At the 60 km averaged resolution (Figure 11, bottom middle), the bias improves to +43 m with a lower MAE (234 m) and RMSE (368 m). These CTH differences at 2 and 60 km averaging are similar to those reported by Thorsen et al. (2011) [41] for CALIOP 5–80 km ice cloud detections, which were within 510 m of the mean ice cloud top heights at the Manus and Nauru Atmospheric Radiation Measurement (ARM) sites. The best CTH detection performance comes from the denoised dataset (bottom right), with a mean difference of 22 m, an MAE of 170 m, and an RMSE of 285 m. These CTH errors are closer to the CALIOP 5–80 km CTH errors for a nighttime opaque ice cloud case reported by Yorks et al. (2011) [42], further emphasizing the robust performance of the denoised dataset for CTH retrievals.
4. Layer Detection Using Real CATS Daytime Data
All CATS August 2015 daytime data were denoised, and layer detection was performed to quantify the potential improvement in layer detection when applying this denoising technique to current spaceborne lidar datasets. The raw CATS photon counts were first averaged every five profiles, corresponding to a horizontal resolution of 1.8 km. This was experimentally determined to be the minimum amount of averaging required for the DDUNet model to work in the noisiest CATS daytime scenes; when the SNR was too low, rather than removing the noise, the model returned the output virtually unmodified, likely because the solar background counts were limited to 160 in the training data (Table 1). Once denoised, the 1.8 km resolution photon counts were converted to ATB using the CATS operational Level 1 processing code [2,17,38]. With the Rayleigh signal effectively removed by the DDUNet model, pre-computed calibration coefficients from the original data were used.
The denoised detected layers were compared against the layers identified in the operational CATS Level 2 (L2) data product, hereafter referred to as "L2 layers". Unlike the simulated detection comparison (Section 3.2 and Section 3.3), the denoised layer detection was not run with the same code as the CATS L2 processing, to avoid the complications of running the real CATS L2 processing at the higher 1.8 km resolution. Instead, a simplified Rayleigh threshold profile method was run on the denoised ATB without the refined gap-filling, feature size constraints, and false positive rejection analysis of the CATS detection algorithm [25,38]. The L2 layers were detected using both 5 km and 60 km averaged ATB but are reported at 5 km resolution. In Figure 12, curtain plots of the CATS 1064 nm ATB at various averaging levels, compared with the 1.8 km denoised data for a PBL-focused daytime scene on 4 August 2015, highlight the detail preserved by denoising and the detail lost by averaging. The L2 layers were upsampled to the denoised 1.8 km resolution for comparison: L2 bins were copied to all 1.8 km profiles whose timestamps were within the bounds of the L2 profile timespan (given by the "profile_UTC_Time" variable in the CATS L2 product). The L2 and denoised layers were vertically aligned because their ATB inputs were both processed by the same L1 code.
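A minimal sketch of this timestamp-based upsampling, assuming per-profile start/end times for the L2 product and a boolean within-layer mask; variable names are illustrative.

```python
# Copy each L2 profile's bins to every 1.8 km profile whose timestamp falls
# within that L2 profile's timespan ("profile_UTC_Time" in the L2 product).
import numpy as np

def upsample_l2(l2_mask: np.ndarray, l2_t_start: np.ndarray,
                l2_t_end: np.ndarray, hires_times: np.ndarray) -> np.ndarray:
    """l2_mask: (bins x n_l2) boolean; times are sorted seconds-of-day."""
    n_bins, _ = l2_mask.shape
    out = np.zeros((n_bins, hires_times.size), dtype=bool)
    for k, t in enumerate(hires_times):
        idx = np.searchsorted(l2_t_start, t, side="right") - 1  # last start <= t
        if idx >= 0 and t <= l2_t_end[idx]:
            out[:, k] = l2_mask[:, idx]   # copy L2 bins to this 1.8 km profile
    return out
```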
Figure 13 shows a case over the western United States with ample lofted aerosol. The L2 detection of this aerosol requires 60 km averaging, and even then, the detection is sparse (panel C). In addition to being at 33 times finer resolution, the detection using the denoised data is more comprehensive and accurate. The fact that the detected aerosol region in the denoised data (purple in panel C) generally encompasses the L2 detections suggests the DDUNet model is recovering real signal shape and magnitude. Looking at panel D, the L2 detections (red) appear unnatural and blocky within the clouds between 23:03 and 23:04 UTC. The small detected features along the surface in the denoised layers between ~23:04 and 23:05 UTC are likely due to ground contamination: the denoised detection relied on adding a fixed margin to the DEM to avoid surface detections, whereas the CATS L2 surface filter was more complex, using a DEM-constrained analysis of the lidar signal similar to Vaughan et al. (2005) [43]. It is noteworthy that the edges of strong features, like the surface or small liquid water clouds, are well preserved by DDUNet. This can be observed in all the example cases shown (Figure 13, Figure 14 and Figure 15) and is well illustrated in the zoomed-in boundary layer plot in Figure 12.
Figure 14 features a prominent Saharan dust plume through the middle of the image. Between 15:02 and 15:03 UTC, both detections capture much of the plume, and the top boundary matches well. However, outside of this segment, the denoised detection significantly outperforms L2, assuming the denoised signal is representative of the truth. Blocky, 60 km layers once again stand out like artifacts on the high-resolution grid; again, this is more pronounced in the clouds. Interestingly, vertical streaks of apparent aerosol signal stand out from the left edge of the denoised ATB (panel B) to about 15:01 UTC. These are indeed detected as layers (panel D) but are missed by L2 (panel C). Vertical signal streaks like this are common when a cloud layer of varying optical depth causes the laser beam to go in and out of attenuation. This phenomenon is easy to find in CATS night data but is not visually apparent in daytime data with this level of solar noise.
A final case, highlighting smoke aerosols of various altitudes and strengths throughout the troposphere, is shown in Figure 15. All the aerosol in this scene is either marine or smoke. Most of the smoke originated in western North America; however, the tenuous lofted layer on the left side of the image between 12.5 and 15 km is likely lofted smoke from wildfires in Siberia [44]. This scene does depict some weaknesses of the denoising and the simplistic detection. First, there are vertically oriented streaks of false positives in places with especially reflective low-level water clouds; the SNR for these narrow horizontal segments briefly eclipses what the current DDUNet model can handle. Second, the tenuous, high-altitude Siberian smoke layer (12.5–15 km) is better captured by the L2 layers, though it is still apparent in both the denoised ATB and layers. With additional effort, the denoised layer detection could be improved. Overall, however, the denoised detection is superior (again, under the assumption that the denoised signal is a faithful representation of the truth). The lofted smoke plume straddling 7.5 km in the center of the image is much fuller in the denoised detection. Finally, the highest-altitude feature at 00:15 UTC does not appear in L2.
The layer detection statistics for the entire month of August 2015, compared to the CATS L2 layer product, are shown in Table 4. There are several important conclusions to draw from Table 4:
1. The denoised data enabled the detection of 85% of CATS L2 within-layer bins (i.e., bins where a layer was detected) at the finer horizontal resolution of 1.8 km, an improvement in resolution by a factor of 2.7–33.
2. There are many bins that were identified as being within a layer in the CATS L2 data products but not in the denoised layer detections. This is especially true for layers uniquely detected at 60 km in the CATS data products, for which ~25% of bins were not within layers in the denoised data. Most of these are likely due to 60 km detections being laid upon a 1.8 km horizontal grid, causing the smearing effect. Examples of this are evident in Figure 13, Figure 14 and Figure 15 (red boxes sticking off the finer-resolution edges of features).
3. There are times when the denoising model cannot resolve the signal associated with faint, small-scale features. This may explain some of the 25% of CATS L2 layers detected uniquely at 60 km that are not detected using the denoised data. Examples of such scenarios are the faint atmospheric layer above 15 km in Figure 8 (top left) and above 12.5 km in Figure 15.
4. Using the denoised data for layer detection significantly increased the number of bins detected as within-layer: a factor of 2.33 more bins were detected using the denoised data than in the CATS L2 data products. This is evident in Figure 13C and Figure 14C, where bins with weaker backscatter intensities within faint aerosol layers are uniquely detected using the denoised data (purple).
Based on Table 4 and Figure 13, Figure 14 and Figure 15, there is strong evidence that performing layer detection on the denoised ATB yields more, higher-quality layers at a much finer horizontal resolution compared to the CATS L2 data products. Additionally, the model could be further optimized to improve the denoising, or the denoising could be performed at multiple horizontal averaging scales, such as 1.8 and 5 km, to help address the issue outlined in list item #3 above and improve the detection of faint atmospheric layers that were missed in Figure 8, Figure 9 and Figure 10.
5. Conclusions
Deep learning-based denoising algorithms, such as the DDUNet technique presented in this paper, offer a significant advancement for improving spaceborne lidar datasets compared to traditional approaches. When applied to simulated CATS 1064 nm daytime data, the DDUNet denoising algorithm increased the daytime SNR by a factor of 2.5 compared to the baseline daytime data. This led to a factor of 2.33 more bins detected at a 1.8 km horizontal resolution for all of the August 2015 daytime CATS data when compared with the layers in the L2 product. Additionally, the denoised data enabled the detection of CATS within-layer bins at the finer horizontal resolution of 1.8 km, an improvement in horizontal resolution by a factor of 2.7 to 33 compared to traditional CATS daytime detection. This finer resolution reduced the number of false positives by 60% relative to the simulated "truth" by limiting the layer smearing that occurs when averaging the data to 60 or 80 km. However, this DDUNet denoising technique is not without limitations. Compared to the true bins within atmospheric layers in the simulated data, the layer detection algorithm applied to the DDUNet-denoised data missed ~39% of within-layer bins, especially small-scale or faint cloud and aerosol layers with weaker backscatter intensities. There are several ways to overcome these limitations when applying this technique to current or previous spaceborne lidar datasets, such as using a multi-tiered layer detection algorithm that includes DDUNet denoising, convolutional neural network approaches [25], or traditional averaging techniques [45]. The DDUNet model used in this paper could also be further optimized by experimenting with different loss functions [46], hyperparameters, and noise levels, and by using more data. CATS night data, which were used as the "clean" images for training, have a non-negligible level of noise (Figure 12A). Denoising the night data before training might facilitate better learning; this could be accomplished with a traditional filtering technique (like those in Yorks et al., 2021 [25]), the Poisson signal likelihood optimization technique outlined in Marais et al. (2016) [26], or perhaps even the DDUNet model used in this study.
The more accurate detection of clouds and aerosols in the atmosphere at finer resolutions has significant benefits for climate science as well as weather and air quality applications. The daytime SNR issues with CATS, CALIOP, and ICESat-2 data present challenges for quantifying aerosol and cloud impacts on climate [19,20]. The DDUNet denoising decreased the MDB of the simulated lidar data below the CALIOP 1064 nm nighttime MDB value [2,40], providing nighttime-quality data during daytime viewing conditions. This has positive ramifications for daytime atmospheric feature detection, which has traditionally presented challenges for quantifying the boundaries of cirrus clouds [47], aerosol optical depth [48], and consequently direct aerosol radiative effects [10], often limiting spaceborne lidar analysis to nighttime only. Our analysis shows that the application of the DDUNet denoising algorithm vastly improves daytime feature detection and creates the potential to use daytime observations on an equal footing with nighttime observations for scientific analysis. It should be noted that DDUNet does not eliminate the need for traditional normalized relative backscatter (NRB), since DDUNet removes the molecular signal, which is needed for calibration; a future processing stream would include both denoised and standard NRB. Applying the DDUNet algorithm described in this paper to CALIOP data may not yield the same performance, given that the model assumes the Poisson-distributed random noise associated with photon-counting lidar systems, whereas CALIOP is an analog detection system with more complex noise characteristics. The DDUNet denoising also likely increases the accuracy of CATS and ICESat-2 retrievals of variables such as particulate backscatter, linear depolarization ratio, and particulate extinction through a decrease in random error and averaging artifacts. While retrieval accuracy is outside the scope of this paper, the authors plan to address this topic in a future paper.
Looking to the future, this DDUNet denoising algorithm could enable smaller and more affordable spaceborne lidar systems. To overcome the lower daytime SNR due to the solar background signal, previous spaceborne lidar missions increased size, weight, and power (SWaP) via laser power, telescope size, and receiver complexity. Laser specifications such as repetition rate and pulse energy drive the power consumption and thermal requirements and, consequently, the mass of the instrument, while the telescope area drives both its mass and volume. Thus, recent lidar instruments such as CATS (494 kg) and ATLAS (300 kg) are too large to fit affordable SmallSat buses and rideshare launch opportunities. ATLAS required a large spacecraft bus, so the ICESat-2 observatory [49] weighs 1390 kg, and the total mission cost was about $1B. SmallSat lidar designs already exist that reduce the SWaP and cost compared to CATS and ATLAS [1], and the DDUNet denoising technique can further reduce the SWaP to roughly 20–25% of the ATLAS and CATS specifications while providing similar data products with better sensitivity to atmospheric features.