## **1. Introduction**

Integral imaging (II) is a passive three-dimensional (3D) imaging technique invented by Gabriel Lippmann in 1908 [1]. It has received wide attention because its applications span several research problems in optical engineering [2–4], including biomedicine, security, autonomous vehicles, and remote sensing, to name a few [5].

Advanced machine learning (ML) and deep learning (DL) algorithms have been shown to produce superior results in computer-vision applications, and such approaches have since been extended to problems in various other scientific research areas. In particular, DL has proven to be an important tool for automatic decision making, as it solves numerous image-based problems without much human intervention. Convolutional neural networks (CNNs) are widely used DL models for problems such as image classification [6] and autonomous driving [7]. Furthermore, a CNN framework for 3D face recognition and classification in a photon-starved environment has also been demonstrated [2,8].

## **2. Integral Imaging**

**Citation:** Dodda, V.C.; Muniraj, I. Roles of Deep Learning in Optical Imaging. *Eng. Proc.* **2023**, *34*, 6. https://doi.org/10.3390/HMAM2-14123

Academic Editor: Vijayakumar Anand

Published: 6 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Integral imaging (II) captures a 3D scene as a set of two-dimensional (2D) elemental images (EIs) that also encode directional information (i.e., the angle of propagation). 3D scene reconstruction can be achieved in two ways: (i) optical methods and (ii) computational methods [9]. In computational integral imaging (CII), a geometric ray back-propagation method is employed, which magnifies and superimposes the EIs onto each other to reconstruct 3D sectional images [10]. Consequently, the objects or 3D points located at the corresponding depth position in an imaging plane overlap properly and appear in focus, while points at other depth locations do not overlap properly and hence appear off-focus or defocused. The defocused points in the 3D sectional image do not convey any valuable information and are therefore redundant. Recently, we demonstrated a way to manually identify and remove the off-focus points from a 3D sectional image [11]. Furthermore, under some special imaging scenarios (e.g., biomedical imaging and night vision), low light levels or photon-starved illumination conditions may be encountered. In such cases, since image capture happens in much darker conditions, the recorded image is degraded by noise [8,10]. Nevertheless, this system has been shown to provide a better 3D reconstruction in terms of the PSNR even with fewer photons, e.g., 100 photons [10].
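The superposition step of computational reconstruction can be illustrated with a simplified shift-and-sum sketch. It is not the authors' implementation: it assumes grayscale EIs on a regular grid, unit magnification, and a single integer pixel shift per depth plane. The function name `reconstruct_plane` and the parameter `shift_px` are hypothetical; in a real system the shift would be derived from the lens pitch, focal length, sensor geometry, and reconstruction depth.

```python
import numpy as np

def reconstruct_plane(elemental_images, shift_px):
    """Shift-and-sum reconstruction of one depth plane (illustrative only).

    elemental_images: array of shape (K, L, H, W), a K x L grid of grayscale EIs.
    shift_px: non-negative integer shift between neighbouring EIs for the
              chosen depth (hypothetical parameter, assumed known here).
    """
    K, L, H, W = elemental_images.shape
    out_h = H + (K - 1) * shift_px
    out_w = W + (L - 1) * shift_px
    accum = np.zeros((out_h, out_w), dtype=np.float64)
    overlap = np.zeros((out_h, out_w), dtype=np.float64)
    for k in range(K):
        for l in range(L):
            r, c = k * shift_px, l * shift_px
            # Superimpose each EI at its shifted position and count contributions.
            accum[r:r + H, c:c + W] += elemental_images[k, l]
            overlap[r:r + H, c:c + W] += 1.0
    # Normalise by the number of overlapping EIs (guarding uncovered pixels).
    return accum / np.maximum(overlap, 1.0)

# Toy scene: one bright point whose disparity between neighbouring EIs is 1 pixel.
eis = np.zeros((2, 2, 4, 4))
for k in range(2):
    for l in range(2):
        eis[k, l, 2 - k, 2 - l] = 1.0

in_focus = reconstruct_plane(eis, shift_px=1)   # point contributions coincide
off_focus = reconstruct_plane(eis, shift_px=2)  # point contributions spread out
print(in_focus.max(), off_focus.max())
```

With the matching shift, all four EIs place the point on the same output pixel, so it stays sharp; with the wrong shift, the contributions land on different pixels and the point is smeared, which is the off-focus behaviour described above.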

### *2.1. Denoising*

Various image denoising methods have been proposed in the literature, such as prediction filtering, transformation-based methods, rank reduction methods, and dictionary learning methods, to name a few. DL algorithms have also been applied to the image denoising problem [12]. Two approaches are commonly followed to train the DL network: (i) supervised and (ii) unsupervised. We first discuss supervised learning, where an under-complete autoencoder is used to denoise the noisy 3D integral (sectional) images with a patch-based approach. In this process, the noisy input 3D sectional image is divided into multiple patches, which are then used to train the neural network in a supervised manner (with clean data as labels). The patch-based approach greatly reduces the time required to prepare the labeled training data. After denoising, the denoised patches are recombined via an unpatching process. Figure 1 depicts the supervised denoising technique used on our dataset [13]. The network was trained for 20 epochs with a learning rate of 0.001.
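The patching and unpatching steps can be sketched as follows. This is a minimal illustration, assuming non-overlapping square patches and an image size that is a multiple of the patch size; the text does not state the patch size or whether patches overlap, so the helper names `to_patches` and `from_patches` and the patch size of 2 are hypothetical.

```python
import numpy as np

def to_patches(image, patch):
    """Split a 2D image into non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, patch * patch), i.e., each
    tile is flattened into a 1D vector suitable as a network input.
    """
    H, W = image.shape
    if H % patch or W % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles = image.reshape(H // patch, patch, W // patch, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch * patch)

def from_patches(patches, shape, patch):
    """Inverse of to_patches: reassemble flattened tiles into an image."""
    H, W = shape
    tiles = patches.reshape(H // patch, W // patch, patch, patch).swapaxes(1, 2)
    return tiles.reshape(H, W)

image = np.arange(48, dtype=np.float64).reshape(6, 8)
patches = to_patches(image, patch=2)            # 12 patches of 4 values each
restored = from_patches(patches, image.shape, patch=2)
```

In a denoising pipeline, the rows of `patches` would be passed through the trained network before `from_patches` restores the original image size.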

**Figure 1.** Denoised results for supervised learning.

Figure 1c shows the denoised 3D sectional image. We analyzed the performance of the proposed method quantitatively in terms of the peak signal-to-noise ratio (PSNR). For instance, the PSNR value given in Figure 1c is estimated from Figure 1a and Figure 1c. It is evident from Figure 1c that the proposed denoising method performs better in terms of the PSNR.

Second, we proposed an unsupervised learning method for 3D image denoising, for which we opted for a U-Net architecture [8]. This is an end-to-end, fully unsupervised denoising approach in which the noisy photon-counted 3D sectional image is fed as the input to the network. The major components of the U-Net are encoder and decoder blocks with skip connection layers [14–16]. In addition, skip blocks (SBs) were added to the skip connections of the U-Net architecture to avoid the vanishing gradient problem. During training, the 3D input image is given to the network in the form of patches: each patch is converted to a 1D vector and fed as an input to the network. After removing the noise, we unpatch the 1D vectors and convert them back to the size of the input data.

To test the performance of the proposed method, we used two 3D objects: a tri-colored ball (Object 1) and a toy bird (Object 2). Figure 2(a1,a2) shows the noisy photon-counted 3D sectional images with Object 1 and Object 2 in focus, respectively. Figure 2(b1,b2) shows the results of the total variation (TV) denoising method, and Figure 2(c1,c2) shows the results of the proposed method. We used 60% of the PCSI patches for training and 20% for validation. The network was trained for 15 epochs with a learning rate of 0.001. The PSNR values are shown in Figure 2.

**Figure 2.** Denoising results: (**a1**,**b1**,**c1**) represent the noisy photon-counted 3D sectional image, the TV denoised image, and the result of our proposed denoising method when object 1 is in focus, respectively, and (**a2**,**b2**,**c2**) represent the noisy photon-counted 3D sectional image, the TV denoised image, and the result of our proposed denoising method when object 2 is in focus, respectively.
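For reference, the PSNR used as the quality metric can be computed as 10·log10(MAX² / MSE), where MSE is the mean squared error between the two images. The sketch below assumes intensities normalised to [0, 1] (so `peak=1.0`); the normalisation used for the reported values is not stated in the text, and the function name `psnr` is only illustrative.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    peak is the maximum possible intensity (assumed 1.0 for normalised images).
    """
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

clean = np.zeros((4, 4))
noisy = clean + 0.1          # uniform error of 0.1, so MSE = 0.01
value = psnr(clean, noisy)   # 10 * log10(1 / 0.01), i.e., about 20 dB
```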

### *2.2. Off-Focus Removal*

Several studies have demonstrated the feasibility of combining photon detection imaging, or photon counting imaging (PCI), with conventional 3D integral imaging systems; the combination is known as photon-counted integral imaging (PCII) [2,9–11,17]. In such systems, the reconstructed depth images contain both focused and off-focus (or out-of-focus) voxels simultaneously (see, for instance, Figure 3). Off-focus pixels often look blurred and therefore do not convey clear information about the scene. Several approaches have been proposed to efficiently remove the off-focus points from the reconstructed 3D images [4,11]. However, the existing approaches are subjective, as they involve manual calculation of algorithm parameters such as the variance and threshold, which is time-consuming.

**Figure 3.** Reconstructed 3D CII sectional images at various depth locations.
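Photon counting imaging is commonly modelled in the literature as a Poisson process whose mean at each pixel is the expected number of photons in the scene multiplied by the normalised irradiance. The text does not spell out the model used here, so the sketch below is an assumption for illustration; the function name `photon_count` and the parameter `n_photons` are hypothetical, and the input image is assumed to be non-negative.

```python
import numpy as np

def photon_count(image, n_photons, seed=None):
    """Simulate a photon-counted image under an assumed Poisson model.

    image: non-negative 2D irradiance image.
    n_photons: expected total number of photons in the image.
    """
    rng = np.random.default_rng(seed)
    img = np.asarray(image, dtype=np.float64)
    total = img.sum()
    if total <= 0:
        return np.zeros(img.shape, dtype=np.int64)
    # Normalised irradiance sums to one, so the rates sum to n_photons.
    rate = n_photons * img / total
    return rng.poisson(rate)

scene = np.array([[0.0, 1.0], [3.0, 0.0]])
counts = photon_count(scene, n_photons=100, seed=0)
```

Pixels with zero irradiance receive no photons, while brighter pixels receive counts that fluctuate around their share of `n_photons`; this fluctuation is the noise that the denoising networks are meant to suppress.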

Here, we propose a new ensemble Dense Neural Network (DNN) model composed of six different DNN models, each trained with its own training dataset, for removing off-focus points from 3D sectional images. Since data pre-processing is known to enhance the accuracy of the network, we used the Otsu thresholding algorithm [18] to remove the unwanted (and obvious) background from the 3D sectional images. We employed the Adam optimizer to update the weights and biases [13], and the standard mean squared error (MSE) was used as the cost function in our training process. The proposed ensemble deep neural network was trained in a supervised manner using the conventional 3D sectional images from various depth locations and the corresponding focused images (labels). We tested the method on a 3D scene that contains two toy cars and one toy helicopter (see Figure 4) [13]. All scenarios were simulated on an Intel® Xeon® Silver 4216 CPU @ 2.10 GHz (two processors) with 256 GB RAM and a 64-bit operating system.
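The Otsu pre-processing step can be sketched in NumPy as follows. This is a generic implementation of Otsu's method (choosing the threshold that maximises the between-class variance of the histogram), not the authors' code; the helper names `otsu_threshold` and `remove_background`, the 256-bin histogram, and the choice to zero out pixels at or below the threshold are assumptions.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the Otsu threshold of a grayscale image."""
    img = np.asarray(image, dtype=np.float64).ravel()
    hist, edges = np.histogram(img, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    w0 = np.cumsum(p)             # probability of the class below each bin
    w1 = 1.0 - w0                 # probability of the class above each bin
    mu = np.cumsum(p * centers)   # cumulative first moment
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - mu) ** 2 / (w0 * w1)
    # Bins where one class is empty give 0/0 or x/0; treat them as zero.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return centers[np.argmax(between)]

def remove_background(image, bins=256):
    """Zero out pixels at or below the Otsu threshold (assumed background)."""
    img = np.asarray(image, dtype=np.float64)
    t = otsu_threshold(img, bins)
    return np.where(img > t, img, 0.0)

# Toy bimodal image: dark background (0.1) and a bright object (0.9).
section = np.full((8, 8), 0.1)
section[2:6, 2:6] = 0.9
threshold = otsu_threshold(section)
cleaned = remove_background(section)
```

After this step only the bright object survives, so the networks are trained on sectional images without the obvious background.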

**Figure 4.** Reconstructed focus-only CII sectional images obtained using the proposed DL network.

## **3. Conclusions**

In summary, we demonstrated that deep learning networks can be used to solve some of the inherent problems of 3D optical imaging systems. In particular, we tackled two important problems in 3D integral imaging systems, denoising and off-focus removal, using two different datasets. Our study shows that DL can solve problems that are too complex to carry out manually. We expect to extend our analysis to other imaging modalities such as holography and microscopy.

**Author Contributions:** V.C.D. and I.M. contributed equally to manuscript preparation. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Department of Science and Technology (DST) under the Science and Engineering Research Board (SERB) grant number SRG/2021/001464.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data for this paper is not publicly available but shall be provided upon reasonable request to the corresponding author.

**Acknowledgments:** The authors thank Suchit Patel of Poornima College of Engineering, India, for his support with the simulations, and sincerely thank Bahram Javidi of the University of Connecticut and Moon Inkyu of DGIST, Korea, for providing the dataset.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
