1. Introduction
With the development of holography [1,2,3], a vital requirement remains: separating the original object image from the twin image and the autocorrelation terms during holographic reconstruction. Highly coherent lasers facilitate the recording of off-axis holograms and have thus boosted the practical application of off-axis holography [4,5,6]. Off-axis holography can recover a high-quality original image, but only at the cost of the system's space-bandwidth product. Because current CCD recording devices have limited resolution, digital holography imposes strict limits on the reference tilt [6]. Phase-shifting digital holography (PSDH) recovers the original object wave by computing the complex amplitude of the object wave on the recording plane from two or more holograms recorded with different reference phases, together with the phase-shift values between adjacent frames generated during recording [7].
In particular, the precision of the phase-shift values is crucial for high-quality reconstruction. However, because of environmental disturbances and the limited precision of the phase shifter, the phase shift actually applied to the reference beam differs from the value set before recording. In addition, a reference tilt in practice introduces a phase error into the reconstructed complex wavefront. To obtain the actual phase-shift values, many generalized phase-shifting digital holography (GPSDH) algorithms have been proposed [8,9,10]. One approach obtains the unknown phase shifts, iteratively or non-iteratively, from statistical averages [8,9], reconstructs the object wave on the recording plane, and finally obtains the original object wavefront on the object plane by inverse Fresnel diffraction. This GPSDH approach ensures high reconstruction quality while placing only low demands on CCD resolution, but storing multiple holograms and the resulting large data sets places a heavy burden on computer storage and processing, which hinders the dynamic display of stereoscopic video. Another approach employs a weak off-axis recording setup [9]: Fourier transforms of the holograms yield the unknown phase shift, from which the wavefront is restored and the object image is rebuilt. Although this weak-reference-tilt method reduces the number of holograms to the extreme case of two and requires no additional measurement for phase-shift extraction, the recovered wave needs further steps to remove the phase error caused by the reference tilt, which complicates the whole recovery process, especially for dynamic holographic display.
Additionally, these algorithms have obvious drawbacks: they depend on high acquisition quality and accurate physical constraints across multiple holograms, and iterative algorithms require long computation times.
Here, a robust holographic reconstruction by deep learning (RHRDL) method is proposed that can quickly complete holographic reconstruction from a single coaxial hologram or a single frame with a weak reference tilt. This method avoids both the complex phase-shift extraction and object wave reconstruction of traditional PSDH [8] and the phase correction required by algorithms with a weak reference tilt [9]. The feasibility and efficiency of the method were verified in optical experiments.
2. Establishment of Data Set and Model for Network Training
Deep learning (DL) is essentially a process of training a neural network on data sets to obtain a general fitting function [11,12,13,14,15]. A training image is fed into the neural network, the network output is obtained by forward propagation, the loss (difference) between the output image and the ground-truth image is computed, and the loss is propagated back through the network [16]. During training, the gradient of the loss function guides the optimization direction and the update of the network parameters, as shown in Figure 1. DL repeats this process until an optimal or locally optimal solution is reached [17]. Because of its excellent performance, DL is increasingly employed in optical information processing, including the rapid generation of computer-generated holograms (CGH) and digital holographic reconstruction [18,19].
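The forward-pass, loss, back-propagation, and update cycle described above can be illustrated with a deliberately tiny example. The sketch below is not the network used in this work: it assumes a linear model with a mean-squared-error loss and plain gradient descent in NumPy, so the gradients can be written out by hand; `train_linear` and its parameters are hypothetical names.

```python
import numpy as np


def train_linear(x, y, lr=0.1, epochs=500):
    """Fit y ~ x @ w + b by plain gradient descent on an MSE loss."""
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    loss = float("nan")
    for _ in range(epochs):
        pred = x @ w + b                  # forward pass
        err = pred - y
        loss = float(np.mean(err ** 2))   # loss between output and ground truth
        grad_w = 2.0 * x.T @ err / n      # backward pass: dLoss/dw
        grad_b = 2.0 * err.mean()         # dLoss/db
        w -= lr * grad_w                  # parameter update against the gradient
        b -= lr * grad_b
    return w, b, loss


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.3   # noise-free synthetic targets
w, b, final_loss = train_linear(x, y)
```

A deep network replaces the hand-written gradients with automatic differentiation, but the loop structure is the same.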
The quality of the data set determines the upper limit of what the network can learn. High-quality, low-repetition, and complex data sets generally yield better-trained parameters [20,21,22,23,24]. Therefore, a high-quality two-step PSDH algorithm is used to reconstruct the object wave [9].
The interferometric configuration shown in Figure 2 is used to record the holograms. The target object is a resolution plate. A laser beam with a wavelength of 532 nm, emitted by a semiconductor laser (MSL-FN-532, CNI, Changchun, China), is divided into two beams by a beam splitter (BS1). A uniform plane wave is obtained after passing through a micro-objective, a pinhole, and a collimating lens (a convex lens with a focal length of 15 cm). The reference wave is reflected by BS1 and by a mirror mounted on a piezoelectric transducer (PZT), which generates the phase shifts. After diffraction, the object wave, which carries the object information, overlaps with the reference wave to form a hologram, which is recorded by a CCD (DH-SV1410FM, IMAVISION, Beijing, China).
For the collected holograms, the spectral analysis method is used to extract the phase-shift value between the two holograms and to reconstruct the object wave. The two holograms recorded with the reference tilt phase φxy are represented by Equations (1) and (2):
In the two equations, Ao and Ar are the amplitudes of the object wave and the reference wave, respectively; the first two terms on the right-hand sides are the intensities of the object wave and the reference wave; φo and φxy are the phases of the object wave and the reference wave; and δ is the reference phase difference between the two holograms. The additional reference phase φxy arises from the angles α and β between the reference and object waves along the x and y axes, respectively, and is expressed in terms of these reference tilt angles as
where λ is the wavelength of the slightly tilted reference wave used during hologram recording. To separate the phase-shift parameter δ more easily, we can further rewrite Equations (1) and (2) as
Applying Fourier transforms to Equations (4) and (5) gives their distributions in the frequency domain:
Here, F1(u, v) and F2(u, v) are the Fourier transforms of I1 and I2 in Equations (4) and (5), respectively; FA(u, v) is the Fourier transform of the object intensity; Fo(u + up, v + vp) is the spatial spectrum of the object wave Ao exp(−iφo) in Equations (4) and (5); and δ(u, v) is the delta function produced by the Fourier transform of the constant (unit) term. In Equation (7), the last two terms on the right-hand side carry the factors exp(iδ) and exp(−iδ), where δ is the phase shift between the reference phases of the two holograms. Because δ is the same constant for every pixel of the holograms, it can be calculated by subtracting the argument angles of corresponding complex terms (either of the last two terms) in Equations (6) and (7).
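The subtraction step can be made explicit under an assumed form of the holograms. The sign conventions below are our assumption and may differ from those of Equations (1)–(7); only the structure of the argument matters. With θ = φo − φxy and a uniform reference amplitude Ar,

$$
\begin{aligned}
I_1 &= A_o^2 + A_r^2 + A_oA_r e^{i\theta} + A_oA_r e^{-i\theta},\\
I_2 &= A_o^2 + A_r^2 + A_oA_r e^{i(\theta-\delta)} + A_oA_r e^{-i\theta}\,e^{i\delta}.
\end{aligned}
$$

The term $A_oA_r e^{-i\theta}$ is modulated by the tilt carrier and therefore occupies one off-center sideband. In $I_2$ it is multiplied by the constant $e^{i\delta}$, and by linearity of the Fourier transform so is its spectrum. At any frequency $(u_s, v_s)$ where that sideband dominates,

$$
F_2(u_s, v_s) \approx e^{i\delta}\,F_1(u_s, v_s)
\quad\Longrightarrow\quad
\delta \approx \arg[F_2(u_s, v_s)] - \arg[F_1(u_s, v_s)] \pmod{2\pi},
$$

which is why a single subtraction of argument angles suffices once the sideband has been located.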
This phase-shift extraction method is simple and efficient, and it requires no additional measurements. Furthermore, it is suitable for PSDH with only two frames, the extreme case for PSDH, and it also applies to three [9] or more frames when a tilted reference wave is introduced. Most conventional phase-shift extraction methods compute the phase-shift values from three or more frames, and some even require iterations; these methods spend more time on complex calculations, especially the iterative ones. A recently reported conventional two-step method for phase-shift extraction [8] saves time, but it requires measurements of the object and reference intensities. Compared with these conventional methods, the present phase-shift calculation is convenient because it needs only two Fourier transforms of the holograms and one subtraction. The whole procedure of phase-shift extraction and object wave reconstruction is described briefly below.
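The two-Fourier-transform, one-subtraction procedure can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the simulated object (Gaussian amplitude, smooth quadratic phase), the carrier frequencies `fx`, `fy` in cycles per pixel, the cosine hologram model with the shift entering as θ − δ, and the rule of searching only the positive-u half-plane outside a radius `min_freq` around the zero order are all our assumptions; the function names are hypothetical.

```python
import numpy as np


def simulate_holograms(n=256, delta=0.4356, fx=0.25, fy=0.125):
    """Two holograms with a tilted plane reference; the second is shifted by delta."""
    y, x = np.mgrid[0:n, 0:n].astype(float)
    r2 = (x - n / 2) ** 2 + (y - n / 2) ** 2
    a_o = np.exp(-r2 / (2 * (n / 6) ** 2))      # Gaussian object amplitude
    phi_o = 2 * np.pi * r2 / n ** 2              # smooth, lens-like object phase
    phi_xy = 2 * np.pi * (fx * x + fy * y)       # reference tilt phase
    a_r = 1.0                                    # uniform plane reference
    theta = phi_o - phi_xy
    i1 = a_o ** 2 + a_r ** 2 + 2 * a_o * a_r * np.cos(theta)
    i2 = a_o ** 2 + a_r ** 2 + 2 * a_o * a_r * np.cos(theta - delta)
    return i1, i2


def extract_phase_shift(i1, i2, min_freq=0.05):
    """Return (delta, (up, vp)): phase shift and sideband carrier in cycles/pixel."""
    f1 = np.fft.fft2(i1)                          # first Fourier transform
    f2 = np.fft.fft2(i2)                          # second Fourier transform
    vy = np.fft.fftfreq(i1.shape[0])[:, None]     # vertical frequency axis
    ux = np.fft.fftfreq(i1.shape[1])[None, :]     # horizontal frequency axis
    # keep a single sideband: positive u, away from the zero-order term
    search = (ux > 0) & (np.hypot(ux, vy) > min_freq)
    peak = np.unravel_index(np.argmax(np.where(search, np.abs(f1), 0.0)), f1.shape)
    # arg F2 - arg F1 at the sideband peak, wrapped to (-pi, pi]
    delta = float(np.angle(f2[peak] * np.conj(f1[peak])))
    return delta, (float(ux[0, peak[1]]), float(vy[peak[0], 0]))


i1, i2 = simulate_holograms()
delta_est, (up, vp) = extract_phase_shift(i1, i2)
```

Taking the angle of F2·conj(F1) rather than subtracting two separate angles performs the same subtraction while wrapping the result into (−π, π] automatically.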
In Equations (6) and (7), the first two terms on the right-hand side are located at the center of the spectrum, and the third and fourth terms are spectra placed symmetrically about the origin. Using a similar algorithm [9], the reference tilt angles α and β can be obtained by determining up and vp from the position (u + up, v + vp) of the complex amplitude of the sideband spectrum,
where M and N are the numbers of pixels and dx and dy are the pixel pitches of the CCD chip along x and y, respectively. The phase-shift value δ can then be obtained by subtraction
where arg[·] denotes taking the argument (phase angle). Finally, the object wave on the recording plane is reconstructed by the formula
where Io and Ir are the intensities of the object wave and the reference wave, respectively. The object wave in Equation (10) clearly still contains the tilt phase φxy. This tilt phase produces a phase error in the object wave reconstructed on the recording plane and, after the inverse Fresnel transform, a corresponding phase error in the object wave on the image plane. The phase error caused by the reference tilt can be corrected by the equation
The original image on the object plane can then be obtained by inverse Fresnel diffraction of O.
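Given δ and the sideband carrier, the reconstruction and tilt-correction steps can be sketched as follows. The formulas in this block are our own derivation under an assumed model, I_k = Io + Ir + 2√(Io·Ir)·cos(θ − (k − 1)δ) with θ = φo − φxy, so they may differ in sign or normalization from Equations (8)–(11). The helper names, the pixel-domain carrier (up, vp in cycles per pixel), and the angle relation sin α = λ·up/dx (equivalently sin α = λp/(M·dx) for a bin index p) are assumptions made for illustration.

```python
import numpy as np


def tilt_angles(up, vp, wavelength, dx, dy):
    """Reference tilt angles from the sideband carrier given in cycles/pixel."""
    # a tilt alpha produces a carrier of sin(alpha) / wavelength cycles per metre
    return np.arcsin(wavelength * up / dx), np.arcsin(wavelength * vp / dy)


def tilt_phase(shape, up, vp):
    """phi_xy sampled on the pixel grid for a carrier (up, vp) in cycles/pixel."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    return 2 * np.pi * (up * x + vp * y)


def reconstruct_object_wave(i1, i2, io, ir, delta):
    """Return A_o * exp(i * theta) on the recording plane (still tilted by -phi_xy)."""
    c1 = i1 - io - ir                                # 2 A_o A_r cos(theta)
    c2 = i2 - io - ir                                # 2 A_o A_r cos(theta - delta)
    s1 = (c2 - c1 * np.cos(delta)) / np.sin(delta)   # 2 A_o A_r sin(theta)
    return (c1 + 1j * s1) / (2 * np.sqrt(ir))


def correct_tilt(wave, phi_xy):
    """Remove the tilt error: A_o exp(i(phi_o - phi_xy)) -> A_o exp(i phi_o)."""
    return wave * np.exp(1j * phi_xy)
```

The division by sin δ shows why δ must not be close to 0 or π for a two-frame reconstruction to be well conditioned.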
One computer simulation result is shown in Figure 3. The amplitude and phase of the object wave are assumed to have Gaussian and spheroidal distributions, respectively, and a plane wave is used as the reference wave. The interference fringes between the object wave and the reference wave are recorded as hologram I1; after the reference phase is shifted, a second hologram I2 is generated. In Figure 3e, the energy of the zero-frequency component has been suppressed to make the spectrum easier to see. Finally, the object wave on the recording plane is obtained and corrected by Equations (10) and (11), and the original image is obtained by inverse Fresnel diffraction of O.
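The inverse Fresnel diffraction step is not specified in detail here. One common discretization is the paraxial transfer-function form sketched below, in which back-propagation is simply propagation over −z. The function name, the grid spacings, and the choice of this form over the single-FFT Fresnel transform are our assumptions; the recording distance z is not given in this section.

```python
import numpy as np


def fresnel_propagate(u0, wavelength, z, dx, dy):
    """Fresnel propagation of the field u0 over distance z (z < 0 back-propagates).

    Uses the paraxial transfer function
    H(fx, fy) = exp(i k z) * exp(-i pi wavelength z (fx^2 + fy^2)).
    """
    ny, nx = u0.shape
    fx = np.fft.fftfreq(nx, d=dx)[None, :]   # spatial frequencies in 1/m
    fy = np.fft.fftfreq(ny, d=dy)[:, None]
    k = 2 * np.pi / wavelength
    h = np.exp(1j * k * z) * np.exp(-1j * np.pi * wavelength * z * (fx ** 2 + fy ** 2))
    return np.fft.ifft2(np.fft.fft2(u0) * h)
```

For a corrected recording-plane wave `o`, an image estimate would be `np.abs(fresnel_propagate(o, 532e-9, -z, dx, dy)) ** 2`. Because |H| = 1, forward and backward propagation are exact inverses of each other, which is convenient for checking an implementation.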
Figure 4 shows an experimental example of the two-step PSDH method. Figure 4a,b are the two holograms recorded with reference tilt and phase shift, and Figure 4c–e are the intensities of the background wave, object wave, and reference wave, respectively. Using the two-step PSDH algorithm, the reference phase difference δ between the two holograms is calculated to be 0.4356 rad. Figure 4f is the reconstructed image with 1392 × 1040 pixels.
To improve the efficiency of network training, the hologram images are divided into several parts, each containing 688 × 512 pixels; each group of holograms yields eight data samples. The holograms are the network input, and the reconstructed images are the label images for training. By repeating the entire recording and reconstruction process, a total of 1500 data samples were produced and divided at a 4:1 ratio into 1200 training samples and 300 test samples.
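The patch size and the data split can be illustrated with a short sketch. Tiling a 1392 × 1040 frame into a 2 × 2 grid of non-overlapping 688 × 512 patches (dropping the 16 leftover pixels in each direction) gives four patches per frame; how the eight samples per hologram group are formed is not stated, so the sketch stops at tiling one frame. The non-overlapping layout, the random shuffle, and the function names are assumptions.

```python
import numpy as np


def tile_image(img, tile_h=512, tile_w=688):
    """Cut non-overlapping tile_h x tile_w patches; leftover border pixels are dropped."""
    rows, cols = img.shape[0] // tile_h, img.shape[1] // tile_w
    return [img[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
            for r in range(rows) for c in range(cols)]


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and test sets (4:1 by default)."""
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]


hologram = np.zeros((1040, 1392))        # rows x columns of one CCD frame
patches = tile_image(hologram)           # 2 x 2 grid of 512 x 688 patches
train_idx, test_idx = split_indices(1500)
```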
As the neural network becomes deeper and more complex, with more hidden layers, network updates slow down and training becomes more difficult. To address this, the batch normalization approach of Sergey Ioffe et al. [25] is introduced: normalization is made part of the model architecture and is applied to each training mini-batch, which accelerates training. Batch normalization allows higher learning rates and makes training less sensitive to initialization; it can also act as a regularizer, reducing the demands on the training data. Network depth, on the other hand, is directly related to network performance and potential, but increasing the depth inevitably brings the problems of vanishing and exploding gradients. A network that is too deep may even train worse than a somewhat shallower one, which hinders convergence. Residual networks solve this problem by establishing residual mappings, which unlocks the potential of deeper network structures [26].
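The two ideas in the paragraph above can be written down in a few lines. This NumPy sketch shows training-mode batch normalization (statistics taken over the mini-batch axis, followed by a learnable scale and shift) and a residual mapping that outputs F(x) + x. It omits the running statistics that frameworks track for inference, and it is not the exact layer configuration used in this work; the function names are hypothetical.

```python
import numpy as np


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization: statistics over the mini-batch axis 0."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta               # learnable scale and shift


def residual(x, f):
    """Residual mapping: the layers learn F(x) and the block outputs F(x) + x."""
    return f(x) + x


rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=5.0, size=(64, 8))   # 64 samples, 8 features
normed = batch_norm(batch)
out = residual(batch, lambda t: 0.1 * batch_norm(t))
```

Because the identity path is always present, a residual block can fall back to passing its input through unchanged, which is what keeps very deep stacks trainable.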
The loss function of the model is the mean squared error between the reconstructed and ground-truth images (MSE loss). In our experiments, the MSE loss converged quickly for hologram reconstruction. If the output (reconstructed) image is denoted x1 and the real (ground-truth) image y1, both of size M × N pixels, with pixel values amn and bmn (1 ≤ m ≤ M, 1 ≤ n ≤ N), then the loss between the two images can be expressed as

$$
\mathrm{Loss}(x_1, y_1) = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\left(a_{mn} - b_{mn}\right)^2 .
$$
In addition, the ReLU function is used as the activation function and the Adam optimizer as the model optimizer [27]. Although some studies have reported that adaptive-learning-rate optimizers may reach a worse final result than stochastic gradient descent, they clearly converge faster, and the convergence can be improved by controlling the Adam learning rate [28].
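For reference, the Adam update keeps exponential moving averages of the gradient and of its square, corrects both for their zero initialization, and scales each step by their ratio. The NumPy sketch below applies the standard rule with its usual default coefficients to a toy quadratic; it is an illustration, not the framework implementation used for training, and `adam` is a hypothetical name.

```python
import numpy as np


def adam(grad_fn, x0, lr=0.01, steps=2000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function given its gradient using the Adam update rule."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment (mean) estimate of the gradient
    v = np.zeros_like(x)   # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)       # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


target = np.array([3.0, -1.0])
x_min = adam(lambda x: 2 * (x - target), x0=[0.0, 0.0])   # gradient of |x - target|^2
```

The step size is roughly bounded by `lr` regardless of the gradient scale, which is why lowering the learning rate, as done later in training, directly controls how finely the optimizer settles.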
The U-net model structure used here is shown in Figure 5; it mainly consists of down-sampling and up-sampling parts. The residual mapping structure was established symmetrically before up/down-sampling. The purple arrow represents a combination of two two-dimensional convolutions, batch normalization (BN), and ReLU activation functions. The green arrow denotes the same operation with an added max-pooling layer for down-sampling. The red arrow adds a two-dimensional deconvolution (transposed convolution) layer to the purple-arrow operation for up-sampling [29,30,31]. Both the input and output of the model are 1 × 512 × 688 pixels in size; the model contains 31,042,369 parameters and requires 2170.16 MB of storage.
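As a consistency check on the reported size, the parameter count of a standard U-Net can be tallied by hand. The sketch assumes channel widths of 64 to 1024, 3 × 3 convolutions with bias, two BN layers per block, 2 × 2 transposed convolutions for up-sampling, skip connections by concatenation (which add no parameters), and a final 1 × 1 convolution. These widths and kernel sizes are our assumption and are not stated in the text, but with one input and one output channel they reproduce the reported total. Four 2× poolings also divide 512 × 688 evenly.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Weights (and biases) of a k x k convolution or transposed convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def bn_params(c):
    """Batch normalization has a learnable scale and shift per channel."""
    return 2 * c


def double_conv_params(c_in, c_out):
    """Two (3x3 conv -> BN -> ReLU) stages; ReLU has no parameters."""
    return (conv_params(c_in, c_out, 3) + bn_params(c_out)
            + conv_params(c_out, c_out, 3) + bn_params(c_out))


def unet_params(in_ch=1, out_ch=1, widths=(64, 128, 256, 512, 1024)):
    total = double_conv_params(in_ch, widths[0])
    # encoder: 2x2 max pooling (no parameters) followed by a double conv
    for c_in, c_out in zip(widths[:-1], widths[1:]):
        total += double_conv_params(c_in, c_out)
    # decoder: 2x2 transposed conv halves the channels, concatenation with the
    # skip connection doubles them again, then a double conv
    for c_in, c_out in zip(widths[:0:-1], widths[-2::-1]):
        total += conv_params(c_in, c_out, 2)
        total += double_conv_params(2 * c_out, c_out)
    return total + conv_params(widths[0], out_ch, 1)   # final 1x1 convolution


def bottleneck_size(h=512, w=688, levels=4):
    """Spatial size after `levels` 2x2 poolings; raises if a size becomes odd."""
    for _ in range(levels):
        if h % 2 or w % 2:
            raise ValueError("feature map size is not divisible by 2")
        h, w = h // 2, w // 2
    return h, w


print(unet_params())        # 31042369
print(bottleneck_size())    # (32, 43)
```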
3. Deep Learning Network Training Results
The established data set was used to train the network parameters. At the beginning of training, to speed up convergence, the learning rate was set to Lr = 0.01 for 50 epochs (cycles), and a smaller learning rate of Lr = 0.001 was then used for the next 50 epochs. Figure 6 shows the loss curves: the black curve is the loss between the network output and the label images of the training set, and the red curve is the corresponding loss on the test set. The training gradually approaches the optimal (or a locally optimal) solution after 50 epochs, so training was stopped after 100 epochs.
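The two-stage learning rate amounts to a step schedule. Whether the authors used a scheduler object or restarted training with a new rate is not stated; the sketch below only encodes the schedule itself, with hypothetical names and 0-based epoch indices. In PyTorch, `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50], gamma=0.1)`, stepped once per epoch, produces the same sequence.

```python
def step_lr(epoch, base_lr=0.01, milestone=50, gamma=0.1):
    """Learning rate for a 0-based epoch: base_lr, then base_lr * gamma from `milestone` on."""
    return base_lr if epoch < milestone else base_lr * gamma


schedule = [step_lr(epoch) for epoch in range(100)]   # 100 epochs in total
```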
To test the reconstruction ability of the trained network, the results of two-step PSDH are compared with those of the RHRDL model in Figure 7, where Figure 7a,d,g are the recorded holograms, Figure 7b,e,h are the images reconstructed by the corresponding PSDH algorithm, and Figure 7c,f,i are those reconstructed by the RHRDL model. Comparing Figure 7a–c shows that although the RHRDL result is inferior to the PSDH reconstruction in some respects (for example, the image in the yellow circle in Figure 7b shows higher resolution), the RHRDL reconstruction has higher accuracy, excellent arc restoration (the arc of the character ‘5’ in the blue circle in Figure 7e is well recovered), and good noise resistance (the region in the red circle in Figure 7h is a clean white area with little noise). The trained network has therefore learned to reconstruct holograms with high quality, and its reconstruction ability is confirmed.
Pseudo-color images can reveal information that may be missed because of insufficient contrast. Here, the jet pseudo-color map is applied to the original grayscale images, as shown in Figure 8. Figure 8a shows the reconstruction result of the PSDH algorithm, and Figure 8b shows the result of RHRDL. The comparison shows that RHRDL suppresses ghost images: for example, blurred numbers “2, 3, 4, 5” appear to the left of the numbers “4, 5” in Figure 8a. In theory, the weak edge-diffraction information falling outside the recording chip of the CCD should also be reconstructed; because the recording chip is finite, the part beyond the right side of the hologram “turns back” to the left in the display.
Another concern about the performance of RHRDL is the running time required for reconstruction by the DL model. The timing tests were carried out on a personal computer with a Core i5-5490 CPU and a GTX 1050Ti GPU. For comparison, the same tests were performed on the same computer with the two- and three-step PSDH methods [9]. All results are shown in Table 1. The second column lists the time needed with the CPU only, the third column the time needed with GPU acceleration, and the last column the number of holograms required for reconstruction.
With GPU acceleration, the times of all three methods decreased significantly, but the RHRDL model benefited the most. With the GPU, it achieves the shortest reconstruction time, 0.335 s, and requires only a single hologram frame, which greatly shortens image acquisition. In fact, GPU acceleration through the CUDA parallel-computing platform has become an essential part of current DL practice. To test the method further, the reconstruction was also performed on an NVIDIA GeForce RTX 3090, where the time decreased to only 0.013 s, which is fast enough for dynamic holographic display. Because only one frame is used, the RHRDL method reduces not only the storage burden but also the reconstruction time, underlining its practical value.
4. Image Similarity Evaluation and Model Stability Analysis
Besides efficiency, model stability is also an important indicator of the practical value of RHRDL. To further characterize the performance of our U-net under different conditions and its robustness in object image reconstruction, we investigate two aspects: the structural similarity distribution over the test set and the noise resistance of hologram reconstruction [32,33].
Mean squared error (MSE) and peak signal-to-noise ratio (PSNR) are widely used in optimization and evaluation because they are easy to compute and have clear physical meaning [34,35,36]. However, these two metrics do not match human visual perception well, and more complex loss functions, in turn, cannot provide effective gradient guidance. To objectively evaluate the difference between images reconstructed by the RHRDL method and by the classical algorithm, we use the structural similarity index (SSIM) to evaluate image quality [35]. SSIM provides a more effective objective evaluation by comparing image luminance, contrast, and structure.
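For reference, the two pixel-wise metrics can be computed as below. This is a generic NumPy sketch; `peak` is the maximum possible pixel value (255 for 8-bit images, 1.0 for images normalized to [0, 1]), and the function names are hypothetical.

```python
import numpy as np


def mse(a, b):
    """Mean squared error over all pixels."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))


ref = np.full((4, 4), 100.0)
noisy = ref + 5.0            # every pixel off by 5 grey levels
```

Here `mse(ref, noisy)` is 25 and the PSNR is about 34.2 dB, whatever the spatial arrangement of the error, which illustrates why neither metric reflects image structure.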
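The SSIM is generally expressed as

$$
\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},
$$

where x and y are the two (normalized) images being compared and μx and μy are their mean values:

$$
\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \mu_y = \frac{1}{N}\sum_{i=1}^{N} y_i,
$$

where N is the number of pixels in each image. σx and σy are the standard deviations,

$$
\sigma_x = \left[\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2\right]^{1/2}, \qquad
\sigma_y = \left[\frac{1}{N-1}\sum_{i=1}^{N}(y_i - \mu_y)^2\right]^{1/2},
$$

and σxy is the covariance of the x and y images,

$$
\sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y).
$$

c1 = (k1L)² and c2 = (k2L)² are two constants, where k1 = 0.01, k2 = 0.03, and L = 2^B − 1 (B is the bit depth), so L = 255 for 8-bit images. SSIM is at most 1 and for typical image pairs lies between 0 and 1; the closer the value is to 1, the higher the similarity, and two identical images have an SSIM of exactly 1.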
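The whole-image form of the SSIM definitions can be written directly in NumPy. The sketch uses the sample (N − 1) normalization for the variances and covariance, which is an assumption. Library implementations such as `skimage.metrics.structural_similarity` instead average SSIM over local windows, so their values will generally differ from this global version; `global_ssim` is a hypothetical name.

```python
import numpy as np


def global_ssim(x, y, bit_depth=8, k1=0.01, k2=0.03):
    """SSIM computed from whole-image statistics (no sliding window)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    L = 2 ** bit_depth - 1                    # dynamic range, 255 for 8-bit images
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)
    cov_xy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
```

A brightness offset lowers only the luminance term (for example, `global_ssim(img, img + 10)` stays just below 1), while an inverted image gives a negative value because its covariance with the original is negative.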
Figure 9 shows the SSIM curve between the deep-learning reconstructions and the corresponding label images. The SSIM over the test set is higher than 0.998.
During hologram acquisition, the interferogram is highly sensitive to its environment: slight vibration of the experimental platform or strong air disturbance changes the fringe pattern, which reduces the hologram contrast and blurs the image.
Here, Gaussian blur is used to simulate hologram resolution degraded by such causes. Holograms with different degrees of blur are obtained by adjusting the size and variance of the Gaussian filter, as shown in Figure 10. Figure 10a is the original hologram, Figure 10b is the result of filtering with a 5 × 5 pixel Gaussian filter with a variance of 2, Figure 10c is the result with a 10 × 10 pixel filter and a variance of 2, and Figure 10d is the result with a 10 × 10 pixel filter and a variance of 4. Compared with Figure 10a, Figure 10d shows obvious information loss and reduced sharpness.
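The blur levels can be reproduced approximately with a kernel built directly from a size and a variance, as in the sketch below. Note that OpenCV's `cv2.GaussianBlur` takes a standard deviation rather than a variance and requires odd kernel sizes, so the 10 × 10 filters cannot be passed to it directly. The edge padding, the half-pixel offset that an even-sized kernel implies, and the function names are our assumptions.

```python
import numpy as np


def gaussian_kernel(size, variance):
    """size x size Gaussian kernel normalized to sum 1; odd or even size."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * variance))
    k = np.outer(g, g)
    return k / k.sum()


def gaussian_blur(img, size, variance):
    """Same-size blur with edge padding, accumulated as a sum of shifted copies."""
    img = np.asarray(img, dtype=float)
    k = gaussian_kernel(size, variance)
    before = (size - 1) // 2
    after = size - 1 - before
    padded = np.pad(img, ((before, after), (before, after)), mode="edge")
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(size):
        for j in range(size):
            out += k[i, j] * padded[i:i + h, j:j + w]
    return out
```

For example, `gaussian_blur(hologram, 10, 4.0)` would correspond to the strongest setting in Figure 10d, under these assumptions.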
The four holograms in Figure 10 are reconstructed using the RHRDL method, and the results are shown in Figure 11. Blurring the hologram does not degrade the reconstruction quality; it only slightly reduces the brightness. The mean squared errors between the four reconstructed images and the label images are 0.016%, 0.030%, 0.030%, and 0.046%, respectively. The network therefore shows strong noise resistance.