**Hoon-Seok Jang <sup>1,†</sup>, Mannan Saeed Muhammad <sup>2,†</sup>, Guhnoo Yun <sup>3</sup> and Dong Hwan Kim <sup>3,\*</sup>**


Received: 10 July 2019; Accepted: 7 August 2019; Published: 9 August 2019

**Abstract:** Recovering the three-dimensional (3D) shape of an object from two-dimensional (2D) information is one of the major domains of computer vision applications. Shape from Focus (SFF) is a passive optical technique that reconstructs the 3D shape of an object using 2D images with different focus settings. When a 2D image sequence is obtained with a constant step size in SFF, mechanical vibrations, referred to as jitter noise, occur in each step. Since the jitter noise changes the focus values of the 2D images, it causes erroneous recovery of the 3D shape. In this paper, a new filtering method for estimating optimal image positions is proposed. First, jitter noise is modeled as a Gaussian or speckle function. Second, the focus curves acquired by one of the focus measure operators are modeled as a Gaussian function for application of the filter. Finally, a Kalman filter is designed and applied as the proposed method for removing jitter noise. The proposed method is evaluated using image sequences of synthetic and real objects. The performance is assessed through various metrics to show the effectiveness of the proposed method in terms of reconstruction accuracy and computational complexity. Root Mean Square Error (RMSE), correlation, Peak Signal-to-Noise Ratio (PSNR), and computational time of the proposed method are improved on average by about 48%, 11%, 15%, and 5691%, respectively, compared with conventional filtering methods.

**Keywords:** shape from focus (SFF); jitter noise; focus curve; Kalman filter

#### **1. Introduction**

Inferring three-dimensional (3D) shape of an object from two-dimensional (2D) images is a fundamental problem in computer vision applications. Many 3D shape recovery techniques have been proposed in literature [1–5]. The methods can be categorized into two categories based on the optical reflective model. The first one includes active techniques which use projected light rays. The second category consists of passive techniques which utilize reflected light rays without projection. The passive methods can further be classified into Shape from *X*, where *X* denotes the cue used to reconstruct the 3D shape as *Stereo* [6], *Texture* [7], *Motion* [8], *Defocus* [9], and *Focus* [10]. Shape from Focus (SFF) is a passive optical method that utilizes a series of 2D images with different focus levels for estimating 3D information of an object [11]. For SFF, a focus measure is applied to each pixel of the image sequence, to evaluate the focus quantity at every point. The best focused position is acquired by maximizing the focus measure values along the optical axis.

Many focus measures have been reported in the literature [12–16]. The initial depth map, obtained through any of the focus measure operators, suffers from information loss between consecutive frames because of the discrete, predetermined sampling step size. To solve this problem, a refined depth map is acquired using approximation techniques, as reported in the literature [17–22]. An important issue in SFF is that, when images are obtained by translating the object plane with a constant step size, mechanical vibration, referred to as jitter noise, occurs in each step, as shown in Figure 1 [21].

**Figure 1.** Image acquisition for Shape from Focus.

Since this noise changes the focus values of images by oscillating along the optical axis, accuracy of 3D shape recovery is considerably degraded. Unlike any image noise [23,24], this noise is not detectable by simply observing images.

Many filtering methods for removing the jitter noise have been reported [25–27]. In [25,26], Kalman and Bayes filtering methods for removing Gaussian jitter noise have been proposed, respectively. In [27], a modified Kalman filtering method for removing Lévy noise has been presented.

In this paper, a new filtering method for removing the jitter noise is proposed as an extended version of [25]. First, jitter noise is modeled as a Gaussian or speckle function to reflect more types of noise that can occur in SFF. Second, the focus curves acquired by one of the focus measure operators are modeled as a Gaussian function for application of the filter and for a clearer performance comparison of various filters. Finally, a Kalman filter is designed and applied as the proposed method. The Kalman filter is a recursive filter that tracks the state of a linear dynamic system containing noise, and is used in many fields such as computer vision, robotics, and radar [28–33]. The algorithm is typically based on measurements made over time, so more precise results can be expected than from using only the measurement at a single moment. As the filter recursively processes input data, including noise, an optimal statistical prediction for the current state can be performed. The proposed method is evaluated using image sequences of synthetic and real objects. The performance of the proposed method is analyzed through various metrics to show its effectiveness in terms of reconstruction accuracy and computational complexity. Root Mean Square Error (RMSE), correlation, Peak Signal-to-Noise Ratio (PSNR), and computational time of the proposed method are improved by an average of about 48%, 11%, 15%, and 5691%, respectively, compared with conventional filtering methods.

In the remainder of this paper, Section 2 presents the concept of SFF and a summary of previously proposed focus measures as background. Sections 3 and 4 provide the modeling of jitter noise and focus curves, respectively. Section 5 explains the Kalman filter as the proposed method in detail. Experimental results and discussion are presented in Section 6. Finally, Section 7 concludes this paper.

#### **2. Related Work**

#### *2.1. Shape from Focus*

In SFF methods, images with different focus levels (such that some parts are well focused and the rest of the parts are defocused with some blur) are obtained by translating the object plane at a predetermined step size along the optical axis [11]. By applying a focus measure, the best focused frame for each object point is acquired to find the depth of the object with an unknown surface. The distance of the corresponding object point is computed by using the camera parameters for the frame, and utilizing the lens formula as follows:

$$\frac{1}{f} = \frac{1}{u} + \frac{1}{v} \tag{1}$$

where, *f* is the focal length, *u* and *v* are the distances of object and image from the lens, respectively. Figure 2 shows the image formation in the optical lens. The object point at the distance *u* is focused to the image point at the distance *v*.

**Figure 2.** Image formation in optical lens.
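As a quick numerical illustration of Equation (1), the sketch below solves the lens formula for the object distance *u* once the best-focused frame gives the image distance *v*. The function name and the numbers (a 50 mm focal length and a 60 mm image distance) are hypothetical and serve only as an example; they are not the parameters of the system used in this paper.

```python
def object_distance(focal_length: float, image_distance: float) -> float:
    """Solve 1/f = 1/u + 1/v (Eq. 1) for the object distance u."""
    if image_distance == focal_length:
        # v == f corresponds to an object at infinity, so u is undefined here.
        raise ValueError("image distance equal to focal length: object at infinity")
    return 1.0 / (1.0 / focal_length - 1.0 / image_distance)


# Hypothetical values in millimetres (not taken from the experiments).
u = object_distance(focal_length=50.0, image_distance=60.0)  # about 300 mm
```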

#### *2.2. Focus Measures*

A focus measure operator calculates the focus quality of each pixel in the image sequence, and is evaluated locally. As the image sharpness increases, the value of the focus measure increases. When the image sharpness is maximum, the best focused image is attained. Some of the popular gradient-based, statistical-based, and Laplacian-based operators are briefly given in [12].

First, there are Modified Laplacian (ML) and Sum of Modified Laplacian (SML) as Laplacian-based operators. When Laplacian is used in textured images, *x* and *y* components of the Laplacian operator may cancel out and provide no response. ML is calculated by adding the squared second derivatives for each pixel of the image *I* as:

$$F\_{ML}(\mathbf{x}, \mathbf{y}) = \left(\frac{\partial^2 I(\mathbf{x}, \mathbf{y})}{\partial \mathbf{x}^2}\right)^2 + \left(\frac{\partial^2 I(\mathbf{x}, \mathbf{y})}{\partial \mathbf{y}^2}\right)^2 \tag{2}$$

If the image has rich textures with high variability at each pixel, focus measure can be evaluated for each pixel. In order to improve robustness for weak-textured images, SML is computed by adding the ML values in a *W* × *W* window as:

$$F\_{\rm SML}(i,j) = \sum\_{\mathbf{x} \in \mathcal{W}} \sum\_{\mathbf{y} \in \mathcal{W}} \left\{ \left( \frac{\partial^2 I(\mathbf{x}, \mathbf{y})}{\partial \mathbf{x}^2} \right)^2 + \left( \frac{\partial^2 I(\mathbf{x}, \mathbf{y})}{\partial \mathbf{y}^2} \right)^2 \right\} \tag{3}$$

where, *i* and *j* are the *x* and *y* coordinates of center pixel in a *W* × *W* window, respectively.

Next, there is Tenenbaum (TEN) as a gradient-based operator. TEN is calculated by adding the squared responses of horizontal and vertical Sobel operators. For robustness, it is also computed by adding the TEN values in a *W* × *W* window as:

$$F\_{TEN}(i,j) = \sum\_{\mathbf{x} \in \mathcal{W}} \sum\_{\mathbf{y} \in \mathcal{W}} \left\{ (\mathbf{G}\_{\mathbf{x}}(\mathbf{x}, \mathbf{y}))^2 + \left( \mathbf{G}\_{\mathbf{y}}(\mathbf{x}, \mathbf{y}) \right)^2 \right\} \tag{4}$$

where, *Gx*(*x*, *y*) and *Gy*(*x*, *y*) are images acquired through convolution with the horizontal and vertical Sobel operators, respectively.

Finally, there is Gray-Level Variance (GLV) as a statistics-based operator. It has been proposed on the basis of the idea that the variance of gray level in a sharp image is higher than in a blurred image. GLV for a central pixel in a *W* × *W* window is calculated as:

$$F\_{\rm GLV}(i,j) = \frac{1}{N^2} \sum\_{\mathbf{x} \in \mathcal{W}} \sum\_{\mathbf{y} \in \mathcal{W}} \left\{ (I(\mathbf{x}, \mathbf{y}) - \mu)^2 \right\} \tag{5}$$

where, μ is the mean of the gray values in a *W* × *W* window and *N*<sup>2</sup> is the number of pixels in that window.
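The three window-based operators in Equations (3)–(5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: derivatives are approximated by central finite differences, the image is a plain list of rows indexed as `img[y][x]`, the window is given by a hypothetical `half` parameter (window size 2·`half` + 1), and the caller is responsible for keeping the window one pixel away from the image border. Plain Python lists keep the sketch dependency-free; an array library would be the practical choice for real images.

```python
def _ml(img, x, y):
    """Eq. (2): squared second differences along x and y at pixel (x, y)."""
    dxx = img[y][x - 1] - 2 * img[y][x] + img[y][x + 1]
    dyy = img[y - 1][x] - 2 * img[y][x] + img[y + 1][x]
    return dxx ** 2 + dyy ** 2


def _sobel_sq(img, x, y):
    """Squared horizontal and vertical Sobel responses at pixel (x, y)."""
    gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
    gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return gx ** 2 + gy ** 2


def _window(i, j, half):
    """Coordinates of the (2*half+1) x (2*half+1) window centred at (i, j)."""
    return [(x, y) for y in range(j - half, j + half + 1)
            for x in range(i - half, i + half + 1)]


def sml(img, i, j, half=1):
    """Eq. (3): sum of modified Laplacian values over the window."""
    return sum(_ml(img, x, y) for x, y in _window(i, j, half))


def ten(img, i, j, half=1):
    """Eq. (4): sum of squared Sobel responses over the window."""
    return sum(_sobel_sq(img, x, y) for x, y in _window(i, j, half))


def glv(img, i, j, half=1):
    """Eq. (5): gray-level variance over the window."""
    values = [img[y][x] for x, y in _window(i, j, half)]
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


# Toy 9x9 images: a sharp vertical step edge and a smooth ramp (a stand-in for blur).
sharp = [[0.0 if x < 4 else 255.0 for x in range(9)] for _ in range(9)]
ramp = [[255.0 * x / 8 for x in range(9)] for _ in range(9)]
scores = {name: (f(sharp, 4, 4), f(ramp, 4, 4))
          for name, f in (("SML", sml), ("TEN", ten), ("GLV", glv))}
```

On these toy images each operator returns a larger value for the step edge than for the ramp, which is the behavior a focus measure is expected to show as sharpness increases.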

#### **3. Noise Modeling**

When a sequence of 2D images is obtained by translating the object at a constant step size along the optical axis, mechanical vibrations, referred to as jitter noise, occur in each step. In this manuscript, two probability density functions are used for modeling the jitter noise. First, the jitter noise is modeled as a Gaussian function with mean μ<sub>*n*</sub> and standard deviation σ<sub>*n*</sub>, as shown in Figure 3.

**Figure 3.** Noise modeling through Gaussian function.

μ<sub>*n*</sub> represents the position of each image frame without the jitter noise, and σ<sub>*n*</sub> represents the amount of jitter noise that occurs in each image frame. σ<sub>*n*</sub> is determined by checking the depth of field and the corresponding image position. The depth of field is affected by magnification and other factors. σ<sub>*n*</sub> is selected as σ<sub>*n*</sub> ≤ 10 μm through repeated experiments with the real objects used in this manuscript. Second, the jitter noise is modeled as a speckle function as follows [34,35]:

$$f(\zeta) = \frac{1}{2\sigma\_n^2} \times e^{\frac{-\zeta}{2\sigma\_n^2}} \tag{6}$$

where, ζ is the amount of jitter noise before or after filtering.
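To make the two noise models concrete, the sketch below draws jitter samples from each of them with Python's standard `random` module. It rests on two assumptions that should be read as such: the Gaussian model is sampled directly as an observed position around μ<sub>*n*</sub>, and Equation (6) is read as an exponential density on ζ ≥ 0 with rate 1/(2σ<sub>*n*</sub><sup>2</sup>), so its mean is 2σ<sub>*n*</sub><sup>2</sup>. The function names and the numeric values are hypothetical.

```python
import random


def gaussian_jitter(mu_n, sigma_n, rng):
    """Observed frame position: true position mu_n disturbed by Gaussian jitter."""
    return rng.gauss(mu_n, sigma_n)


def speckle_jitter(sigma_n, rng):
    """Jitter amount zeta drawn from Eq. (6), read as an exponential density
    with rate 1 / (2 * sigma_n**2), i.e. mean 2 * sigma_n**2."""
    return rng.expovariate(1.0 / (2.0 * sigma_n ** 2))


rng = random.Random(0)          # fixed seed so the sketch is repeatable
sigma_n = 1.0                   # hypothetical jitter level
positions = [gaussian_jitter(5.0, sigma_n, rng) for _ in range(20000)]
zetas = [speckle_jitter(sigma_n, rng) for _ in range(20000)]

mean_position = sum(positions) / len(positions)   # close to mu_n = 5.0
mean_zeta = sum(zetas) / len(zetas)               # close to 2 * sigma_n**2 = 2.0
```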

#### **4. Focus Curve Modeling**

In order to filter out jitter noise, the focus curve obtained by one of the focus measure operators is modeled by Gaussian approximation with mean *z<sub>f</sub>* and standard deviation σ<sub>*f*</sub> [11]. This focus curve modeling is shown in Figure 4.

**Figure 4.** Gaussian fitting of the focus curve.

The Gaussian model of the focus curve is given as:

$$F(z) = F\_p \times e^{\left(-\frac{1}{2}\left(\frac{z-z\_f}{\sigma\_f}\right)^2\right)}\tag{7}$$

where, *z* is the position of each image frame, *F*(*z*) is the focus value at *z*, *z<sub>f</sub>* is the best-focused position of the object point, σ<sub>*f*</sub> is the standard deviation of the approximated (Gaussian) focus curve, and *F<sub>p</sub>* is the amplitude of the focus curve. Taking the natural logarithm of (7) gives (8):

$$\ln(F(z)) = \ln(F\_p) - \frac{1}{2} \left(\frac{z - z\_f}{\sigma\_f}\right)^2 \tag{8}$$

Using (8) with the initial best-focused position *z<sub>i</sub>* obtained through one of the focus measure operators, the positions immediately below and above it, *z<sub>i−1</sub>* and *z<sub>i+1</sub>*, and the corresponding focus values *F<sub>i</sub>*, *F<sub>i−1</sub>*, and *F<sub>i+1</sub>*, (9) and (10) are obtained:

$$\ln(F\_i) - \ln(F\_{i-1}) = -\frac{1}{2} \frac{(\left(z\_i - z\_f\right)^2 - \left(z\_{i-1} - z\_f\right)^2)}{\sigma\_f^2} \tag{9}$$

$$\ln(F\_i) - \ln(F\_{i+1}) = -\frac{1}{2} \frac{(\left(z\_i - z\_f\right)^2 - \left(z\_{i+1} - z\_f\right)^2)}{\sigma\_f^2} \tag{10}$$

Using (10), (11) is acquired as follows:

$$\frac{1}{\sigma\_f^2} = \frac{\ln(F\_i) - \ln(F\_{i+1})}{-\frac{1}{2}(\left(z\_i - z\_f\right)^2 - \left(z\_{i+1} - z\_f\right)^2)}\tag{11}$$

Applying (11) to (9), (12) is obtained:

$$\ln(F\_i) - \ln(F\_{i-1}) = \frac{\left(\left(z\_i - z\_f\right)^2 - \left(z\_{i-1} - z\_f\right)^2\right) \times \left(\ln(F\_i) - \ln(F\_{i+1})\right)}{\left(z\_i - z\_f\right)^2 - \left(z\_{i+1} - z\_f\right)^2} \tag{12}$$

Assuming Δ*z* = *z<sub>i+1</sub>* − *z<sub>i</sub>* = *z<sub>i</sub>* − *z<sub>i−1</sub>* = 1 and utilizing (12), (13) is acquired as:

$$z\_f = \frac{(\ln(F\_i) - \ln(F\_{i+1}))(z\_i^2 - z\_{i-1}^2) - (\ln(F\_i) - \ln(F\_{i-1}))(z\_i^2 - z\_{i+1}^2)}{2((\ln(F\_i) - \ln(F\_{i-1})) + (\ln(F\_i) - \ln(F\_{i+1})))} \tag{13}$$

Using (11) and (13), (14) is obtained as:

$$\sigma\_f^2 = -\frac{\left(z\_i^2 - z\_{i-1}\,^2\right) + \left(z\_i^2 - z\_{i+1}\,^2\right)}{2\left(\left(\ln(F\_i) - \ln(F\_{i-1})\right) + \left(\ln(F\_i) - \ln(F\_{i+1})\right)\right)}\tag{14}$$

Utilizing (7), (13), and (14), *F<sub>p</sub>* is obtained as follows:

$$F\_p = \frac{F\_i}{e^{\left(-\frac{1}{2}\left(\frac{z\_i - z\_f}{\sigma\_f}\right)^2\right)}}\tag{15}$$

Substituting (13), (14), and (15) into (7), final focus curve obtained by Gaussian approximation is acquired. Since jitter noise is considered in this paper, Equation (7) is modified as follows:

$$F\_n(z) = F\_p \times e^{\left(-\frac{1}{2} \left(\frac{(z+\zeta)-z\_f}{\sigma\_f}\right)^2\right)}\tag{16}$$

where, ζ is previously modeled (jitter) noise, approximated by Gaussian or speckle function. Using the proposed filter (in the next section), this noise is filtered to obtain a noise-free focus curve.
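The three-point Gaussian fit of Equations (13)–(15) and the noisy focus curve of Equation (16) can be checked numerically with the short sketch below. It assumes a unit step (Δ*z* = 1) as in the derivation; the function names and the test values (*z<sub>f</sub>* = 10.3, σ<sub>*f*</sub> = 2, *F<sub>p</sub>* = 5) are hypothetical and chosen only so that the recovered parameters can be compared with known ones.

```python
import math


def fit_focus_curve(z_i, f_prev, f_i, f_next):
    """Three-point Gaussian fit around the initial best-focused position z_i.

    f_prev, f_i, f_next are the focus values at z_i - 1, z_i, z_i + 1.
    Returns (z_f, var_f, f_p): best-focused position, sigma_f squared, amplitude.
    """
    a = math.log(f_i) - math.log(f_prev)    # ln F_i - ln F_{i-1}
    b = math.log(f_i) - math.log(f_next)    # ln F_i - ln F_{i+1}
    z_prev, z_next = z_i - 1.0, z_i + 1.0   # unit step size assumed
    z_f = (b * (z_i ** 2 - z_prev ** 2) - a * (z_i ** 2 - z_next ** 2)) / (2.0 * (a + b))  # Eq. (13)
    var_f = -((z_i ** 2 - z_prev ** 2) + (z_i ** 2 - z_next ** 2)) / (2.0 * (a + b))       # Eq. (14)
    f_p = f_i / math.exp(-0.5 * (z_i - z_f) ** 2 / var_f)                                  # Eq. (15)
    return z_f, var_f, f_p


def focus_value(z, z_f, var_f, f_p, zeta=0.0):
    """Eq. (16); with zeta = 0 this reduces to the noise-free curve of Eq. (7)."""
    return f_p * math.exp(-0.5 * ((z + zeta) - z_f) ** 2 / var_f)


# Hypothetical ground truth used to generate three noise-free samples.
true_zf, true_sigma, true_fp = 10.3, 2.0, 5.0
samples = [true_fp * math.exp(-0.5 * ((z - true_zf) / true_sigma) ** 2)
           for z in (9.0, 10.0, 11.0)]
z_f, var_f, f_p = fit_focus_curve(10.0, *samples)
shifted = focus_value(10.0, z_f, var_f, f_p, zeta=0.5)  # focus value seen under a 0.5 jitter
```

With Δ*z* = 1 the numerator of Equation (14) equals −2, so the fit reduces to σ<sub>*f*</sub><sup>2</sup> = 1 / ((ln *F<sub>i</sub>* − ln *F<sub>i−1</sub>*) + (ln *F<sub>i</sub>* − ln *F<sub>i+1</sub>*)), which is a convenient sanity check.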

#### **5. Proposed Method**

Various filters can be used for removing the jitter noise. In this manuscript, Kalman filter is used as an optimal estimator and is designed accordingly. It is a recursive filter, which tracks the state of a linear dynamic system that contains noise, and is based on measurements made over time. More accurate estimation results can be obtained than by using only measurements at that moment. The Kalman filter, which recursively processes input data including noise, can predict optimal current state statistically [36–39]. The application of the Kalman filter to the SFF system is shown in Figure 5.

**Figure 5.** Application of Kalman filter to SFF system.

The system is defined by the position of each image frame in a 2D image sequence. The system state is changed by the jitter noise, which is the measurement noise, in the microscope. The optimal estimate of the system state is obtained by removing the jitter noise through the Kalman filter.

The entire Kalman filter algorithm can be divided into two parts: prediction and update. Prediction refers to predicting the current state, and update means that a more accurate estimate is obtained by correcting the predicted state with the observed measurement. The prediction of the state and its variance is represented as follows:

$$S = T \times S + C \times U \tag{17}$$

$$V = T \times V \times T' + N\_p \tag{18}$$

where, *S* is "estimate of the system state", *T* is "transition coefficient of the state", *U* is "input", *C* is "control coefficient of the input", *V* is "variance of the state estimate", and *N<sub>p</sub>* is "variance of the process noise". In the SFF system, *S* is the position of each image frame in the 2D image sequence as estimated by the Kalman filter. *C*, *U*, and *N<sub>p</sub>* are all set to 0, since there is no control input in the SFF system and only jitter noise, as the measurement noise, is considered in this manuscript. Next, the Kalman gain used for updating the predicted state is computed as follows:

$$G = V \times A' \times inv(A \times V \times A' + N\_{\text{m}}) \tag{19}$$

where, *G* is "Kalman gain", *A* is "observation coefficient", and *Nm* is "variance of the measurement noise". In an SFF system, *Nm* is defined as the variance of the previously modeled jitter noise. Finally, the update of the predicted state on the basis of the observed measurement is provided by:

$$S = S + G \times (O - A \times S) \tag{20}$$

$$V = V - G \times A \times V \tag{21}$$

where, *O* is "observed measurement". In an SFF system, *O* is the position of each image frame in the image sequence before filtering. The remaining parameters, *T* and *A*, are both set to 1 for simplicity. To start the algorithm, *S* and *V* are initialized as:

$$S = \text{inv}(A) \times O \tag{22}$$

$$V = \text{inv}(A) \times N\_m \times \text{inv}(A') \tag{23}$$

Through the Kalman filter algorithm, the optimal position *S* of each image frame in the image sequence is estimated. The pseudo code for the Kalman filter algorithm is shown in Algorithm 1.
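The recursion in Equations (17)–(23) can be sketched for a single frame position as follows. This is a minimal scalar sketch and not a reproduction of Algorithm 1: it assumes that every iteration receives a fresh noisy observation *O* of the same frame position, and the function name, seed, and numeric values are hypothetical. The argument names mirror the symbols in the equations.

```python
import random


def kalman_position(observations, noise_var, T=1.0, A=1.0, C=0.0, U=0.0, Np=0.0):
    """Scalar Kalman filter for one frame position, following Eqs. (17)-(23).

    observations: noisy positions O of the frame, one per iteration (assumption).
    noise_var:    variance N_m of the measurement (jitter) noise.
    """
    S = observations[0] / A                      # Eq. (22): initial state
    V = noise_var / (A * A)                      # Eq. (23): initial variance
    for O in observations[1:]:
        S = T * S + C * U                        # Eq. (17): state prediction
        V = T * V * T + Np                       # Eq. (18): variance prediction
        G = V * A / (A * V * A + noise_var)      # Eq. (19): Kalman gain
        S = S + G * (O - A * S)                  # Eq. (20): state update
        V = V - G * A * V                        # Eq. (21): variance update
    return S, V


rng = random.Random(1)                           # fixed seed for repeatability
true_position, sigma_n = 10.0, 0.5               # hypothetical values
observations = [rng.gauss(true_position, sigma_n) for _ in range(100)]
S, V = kalman_position(observations, noise_var=sigma_n ** 2)
```

With *T* = *A* = 1 and *N<sub>p</sub>* = 0, the gain at the *k*-th observation is 1/*k*, so under these settings the estimate *S* is the running mean of the observations and *V* shrinks as *N<sub>m</sub>*/*k*.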


The difference between the true position μ<sub>*n*</sub>, which is the position of each image frame without the jitter noise, and the optimal position *S* is assigned to ζ. This algorithm is repeated for all image frames in the image sequence. After acquiring a filtered image sequence, a depth map is obtained by maximizing the focus measure obtained by using the previously modeled focus curve, for each pixel in the image sequence. A list of frequently used symbols and notations is shown in Table 1.



#### **6. Results and Discussion**

#### *6.1. Image Acquisition and Parameter Setting*

For experiments, four objects were used, as shown in Figure 6, consisting of one simulated and three real objects.

**Figure 6.** 10th frame of experimented objects: (**a**) Simulated cone, (**b**) Coin, (**c**) Liquid Crystal Display-Thin Film Transistor (LCD-TFT) filter, (**d**) Letter-I.

First, a simulated cone image sequence consisting of 97 images, with dimensions of 360 × 360 pixels, was acquired. These images were generated using camera simulation software [40].

The real objects used for experiments were: coin, Liquid Crystal Display-Thin Film Transistor (LCD-TFT) filter, and letter-I. The coin images were magnified images of Lincoln's head from the back of the US penny. The coin sequence consisted of 80 images, with dimensions of 300 × 300 pixels. The LCD-TFT filter images consisted of microscopic images of an LCD color filter. The image sequence of LCD-TFT filter had 60 images, with the dimensions of 300 × 300 pixels each. The third image sequence consisted of letter-I, engraved on the metallic surface. It consisted of 60 images, with dimensions of 300 × 300 pixels each. The real objects were acquired through a microscopic control system (MCS) [18]. The system consists of a personal computer integrated with a frame grabber board (Matrox Meteor-II) and a CCD camera (SAMSUNG CAMERA SCC-341) mounted on a microscope (NIKON OPTIPHOT-100S). Computer software obtains images by translating the object plane through a stepper motor driver (MAC 5000), possessing a 2.5 nm minimum step size. The coin and letter-I images were obtained under 10× magnification, while the LCD-TFT filter images were acquired under 50× magnification.

In parameter setting, the standard deviation of the jitter noise for each object was assumed to be ten times the sampling step size of each image sequence, i.e., 254 mm, 6.191 μm, 1.059 μm, and 1.529 μm for simulated cone, coin, LCD-TFT filter, and letter-I, respectively. For comparison of 3D shape recovery results, a 7 × 7 local window was used for the focus measure operators. The total number of iterations *N* of the Kalman filter was set to 100.

For performance comparison, Bayes filter and particle filter were employed [41–46]. The depth estimation through the Bayes filter is presented in Figure 7.

In Figure 7, *z*<sup>0</sup> is defined as a total number of 2D images obtained for SFF, and *pj*(*i*) is presented as follows:

$$p\_j(i) = \frac{1}{\sqrt{2\pi\sigma\_n^2}} e^{\frac{-\left(z(j) - r(i)\right)^2}{2\sigma\_n^2}}, \ 1 \le i \le M, \ 1 \le j \le N \tag{24}$$

where, *p<sub>j</sub>*(*i*) is the Gaussian probability density function, *z*(*j*) is the position of each image frame changed by the jitter noise, *r*(*i*) denotes the possible positions of each image frame in the presence of the jitter noise, σ<sub>*n*</sub> is the standard deviation of the previously modeled jitter noise, *M* is the total length of *r*(*i*) with intervals of 0.01, and *N* is the total number of iterations of the Bayes filter. The range of *r* is set to 3σ<sub>*n*</sub> because, under the Gaussian probability density function, *z*(*j*) then lies within the range of *r* with a probability of 99.7%. The recursive Bayesian estimation was applied to all 2D image frames obtained for SFF. After the filtered image sequence was acquired, an optimal depth map was obtained using the previously modeled focus curve.

**Figure 7.** Depth estimation through Bayes filter.
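A recursive Bayesian estimate built on Equation (24) might be sketched as below. Several details are assumptions rather than the procedure in Figure 7: the grid *r*(*i*) is centred on a given position `center` and spans ±3σ<sub>*n*</sub> in steps of 0.01, the prior is uniform, one observation *z*(*j*) is processed per iteration, and the final estimate is taken as the posterior mean. All names and values are hypothetical.

```python
import math
import random


def bayes_position(observations, center, sigma_n, step=0.01):
    """Grid-based recursive Bayesian estimate of one frame position."""
    half = 3.0 * sigma_n                                   # +/- 3 sigma range of r
    m = int(round(2.0 * half / step)) + 1                  # M grid points
    r = [center - half + k * step for k in range(m)]
    posterior = [1.0 / m] * m                              # uniform prior (assumption)
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma_n ** 2)
    for z in observations:
        # Multiply the current belief by the likelihood p_j(i) of Eq. (24) ...
        posterior = [p * norm * math.exp(-(z - ri) ** 2 / (2.0 * sigma_n ** 2))
                     for p, ri in zip(posterior, r)]
        total = sum(posterior)
        # ... and renormalise so the belief stays a probability distribution.
        posterior = [p / total for p in posterior]
    return sum(p * ri for p, ri in zip(posterior, r))      # posterior mean


rng = random.Random(2)                                     # fixed seed for repeatability
true_position, sigma_n = 10.0, 0.5                         # hypothetical values
observations = [rng.gauss(true_position, sigma_n) for _ in range(100)]
estimate = bayes_position(observations, center=true_position, sigma_n=sigma_n)
```

The cost per iteration grows with the grid length *M*, which is one plausible reason a grid-based Bayes filter takes longer than the scalar Kalman recursion.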

Particle filter algorithm is mainly divided into two steps: Generating the weights for each of particles and resampling for acquiring new estimated particles. In the first step, the weights are based on the probability of the given observation for each of the particles as:

$$p\_w(i) = \frac{1}{\sqrt{2\pi\sigma\_n^2}} e^{\frac{-\left(z - z\_p(i)\right)^2}{2\sigma\_n^2}}, \ 1 \le i \le P \tag{25}$$

where, *pw*(*i*) is Gaussian probability density function, *z* is the observed position of each image frame changed by the jitter noise, *zp*(*i*) is vector of particles, and *P* is the number of particles the SFF system generates. In this manuscript, *zp*(*i*) is initialized by randomly selecting the values on the x-axis from the previously modeled jitter noise, and *P* is set as 1000. After the weights are normalized, resampling, as the second step, is needed for acquiring new estimated particles. The new estimated particles are obtained by sampling the cumulative distribution of the normalized *pw*(*i*), randomly and uniformly. Through this sampling, the particles with the higher weights are selected. This particle filter algorithm is repeated *N* times, as the total number of iterations of the particle filter. The optimal position of each image frame is the mean of the final estimated particles *zp*(*i*) obtained through resampling in iteration *N*. After the filtered image sequence is acquired through application of the particle filter to all 2D image frames, optimal depth map is obtained in the same way as the depth estimation in Figure 7.
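The weighting and resampling steps described above can be sketched with the standard library as follows. The sketch makes assumptions that are not stated in the text: particles are initialized around the first observation, one observation is used per iteration, and resampling is done with `random.Random.choices`, which samples from the cumulative distribution of the weights. The constant factor in Equation (25) is omitted because it cancels when the weights are normalized. Names and values are hypothetical.

```python
import math
import random


def particle_position(observations, sigma_n, rng, n_particles=1000):
    """Bootstrap-style particle filter for one frame position."""
    # Initial particles drawn from the Gaussian jitter model (assumed centre: first observation).
    particles = [rng.gauss(observations[0], sigma_n) for _ in range(n_particles)]
    for z in observations:
        # Step 1: weight each particle by the Gaussian likelihood of Eq. (25).
        weights = [math.exp(-(z - p) ** 2 / (2.0 * sigma_n ** 2)) for p in particles]
        # Step 2: resample with replacement in proportion to the weights,
        # so that particles with higher weights are selected more often.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return sum(particles) / n_particles                    # mean of the final particles


rng = random.Random(3)                                     # fixed seed for repeatability
true_position, sigma_n = 10.0, 0.5                         # hypothetical values
observations = [rng.gauss(true_position, sigma_n) for _ in range(100)]
estimate = particle_position(observations, sigma_n, rng)
```

Because no new particles are generated after initialization, repeated resampling concentrates the population on a few surviving values, so the estimate depends on how well the initial particles cover the true position.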

#### *6.2. Experimental Results*

Figure 8 presents the performance comparison of the filters in the 97th frame of the simulated cone using various iterations in the presence of Gaussian jitter noise.

**Figure 8.** Performance of the filters: (**a**) Iteration—50, (**b**) Iteration—100, (**c**) Iteration—150, (**d**) Iteration—200.

Figure 9 provides performance comparison of the filters in the 100th iteration using various frames of the simulated cone in the presence of Gaussian jitter noise.

These figures are strongly magnified views of the last iteration. "Kalman output" is the estimated position through Kalman filter, "Bayesian output" is the estimated position through Bayes filter, "Particle output" is the estimated position through particle filter, and "True position" is the position without the jitter noise. It is clear from these figures that Kalman output converged better to True position than Bayesian output and Particle output. This indicates that the Kalman filter outperformed the other filters compared in the experiments.

**Figure 9.** Performance of the filters: (**a**) Frame number—10, (**b**) Frame number—30, (**c**) Frame number—50, (**d**) Frame number—70.

Figure 10 shows the Gaussian approximation of the focus curves using experimented objects in the presence of Gaussian jitter noise.

"Without Noise" is the Gaussian approximation of the focus curve without the jitter noise, "After Kalman Filtering" is the Gaussian approximation of the focus curve after Kalman filtering, "After Bayesian Filtering" is the Gaussian approximation of the focus curve after Bayes filtering, and "After Particle Filtering" is the Gaussian approximation of the focus curve after particle filtering. It is clear from Figure 10 that the optimal position with the highest focus value in After Kalman Filtering is closer to the optimal position in Without Noise than the optimal positions in the focus curves obtained after using other filtering techniques.

For performance evaluation of 3D shape recovery, three metrics were used in case of simulated cone, since the synthetic object had an actual depth map, as in Figure 11 [47].

**Figure 10.** Gaussian approximation of focus curves: (**a**) Simulated cone (60, 60), (**b**) Coin (120, 120), (**c**) LCD-TFT filter (180, 180), (**d**) Letter-I (240, 240).

**Figure 11.** Actual depth map of simulated cone.

The first one is Root Mean Square Error (RMSE), which is a commonly used measure when dealing with the difference between estimated and actual value, as follows:

$$RMSE = \sqrt{\frac{1}{XY} \sum\_{x=0}^{X-1} \sum\_{y=0}^{Y-1} \left( d(x, y) - \hat{d}(x, y) \right)^2} \tag{26}$$

where, $d(x, y)$ and $\hat{d}(x, y)$ are the actual and estimated depth maps, respectively, and *X* and *Y* are the width and height of the 2D images used for SFF, respectively.

The second one is correlation, which shows the linear relationship and strength between two variables as:

$$\text{Correlation} = \frac{\sum\_{x=0}^{X-1} \sum\_{y=0}^{Y-1} (d(x,y) - \overline{d})(\hat{d}(x,y) - \overline{\hat{d}})}{\sqrt{\left(\sum\_{x=0}^{X-1} \sum\_{y=0}^{Y-1} (d(x,y) - \overline{d})^2\right) \left(\sum\_{x=0}^{X-1} \sum\_{y=0}^{Y-1} (\hat{d}(x,y) - \overline{\hat{d}})^2\right)}} \tag{27}$$

where, $\overline{d}$ and $\overline{\hat{d}}$ are the means of the actual and estimated depth maps, respectively.

The third one is Peak Signal-to-Noise Ratio (PSNR), which is the ratio of the maximum power a signal can have to the power of the noise. It is usually represented in terms of the logarithmic decibel scale as:

$$PSNR = 10\log\_{10}(\frac{d\_{\text{max}}^2}{MSE}) \tag{28}$$

where, *dmax* is maximum depth value in the depth map and *MSE* is the Mean Square Error, which is the square of the RMSE. The lower the RMSE, the higher the correlation, and the higher the PSNR, the higher the accuracy of 3D shape reconstruction.
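The three metrics in Equations (26)–(28) can be computed as in the sketch below. Depth maps are represented as plain lists of rows to keep the example dependency-free, and *d<sub>max</sub>* is taken from the actual depth map, which is an assumption since the text does not say which map supplies it. The function names and the 2 × 2 toy maps are hypothetical.

```python
import math


def _flatten(depth_map):
    """Turn a list of rows into a flat list of depth values."""
    return [value for row in depth_map for value in row]


def rmse(actual, estimated):
    """Eq. (26): root mean square error between two depth maps."""
    a, e = _flatten(actual), _flatten(estimated)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, e)) / len(a))


def correlation(actual, estimated):
    """Eq. (27): correlation between actual and estimated depth maps."""
    a, e = _flatten(actual), _flatten(estimated)
    mean_a, mean_e = sum(a) / len(a), sum(e) / len(e)
    num = sum((x - mean_a) * (y - mean_e) for x, y in zip(a, e))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_e) ** 2 for y in e))
    return num / den


def psnr(actual, estimated):
    """Eq. (28): PSNR in decibels, with MSE equal to the squared RMSE."""
    d_max = max(_flatten(actual))          # assumption: d_max from the actual map
    return 10.0 * math.log10(d_max ** 2 / rmse(actual, estimated) ** 2)


# Toy depth maps: the estimate differs from the actual map in one pixel.
actual = [[1.0, 2.0], [3.0, 4.0]]
estimated = [[1.0, 2.0], [3.0, 5.0]]
metrics = (rmse(actual, estimated), correlation(actual, estimated), psnr(actual, estimated))
```

Note that `psnr` is undefined for a perfect estimate (zero MSE), so a caller would need to handle that case separately.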

Tables 2–4 provide the quantitative performance of 3D shape recovery of the simulated cone using three focus measures, SML, GLV, and TEN, before and after filtering in the presence of Gaussian jitter noise.

**Table 2.** Comparison of focus measure operators with proposed method for simulated cone in the presence of Gaussian noise by using RMSE (Root Mean Square Error). SML: Sum of Modified Laplacian; GLV: Gray-Level Variance; TEN: Tenenbaum.


**Table 3.** Comparison of focus measure operators with proposed method for simulated cone in the presence of Gaussian noise by using correlation.


**Table 4.** Comparison of focus measure operators with proposed method for simulated cone in the presence of Gaussian noise by using PSNR.


In general, SML performs best among the focus measures, followed by GLV and then TEN. Before filtering, it is difficult to distinguish the performance of the focus measures because of the jitter noise. However, after Bayesian filtering and after Kalman filtering, Tables 2–4 show that the performance order of the focus measures almost matches the order described above. The particle filter, which is suited to nonlinear systems, does not remove jitter noise well in a linear SFF system. Accordingly, Tables 2–4 show that the performance order of the focus measures after particle filtering differs slightly from the one presented above.

Tables 5–7 provide the quantitative performance of 3D shape recovery of the simulated cone using three focus measures, SML, GLV, and TEN, before and after filtering in the presence of speckle noise. The performance order of the focus measures for each filtering technique is almost the same as when Gaussian jitter noise is present. However, in the presence of speckle noise, Kalman filtering and Bayesian filtering perform poorly in terms of RMSE and PSNR, because these two filters estimate the position of each 2D image under the assumption that the jitter noise is Gaussian. It is evident from Tables 2–7 that the best overall performance is achieved by the Kalman filter, the proposed method, which provides the optimal estimation results in a linear system. The Bayes filter comes second, followed by the particle filter, which estimates the optimal value in a nonlinear system.

**Table 5.** Comparison of focus measure operators with proposed method for simulated cone in the presence of speckle noise by using RMSE.


**Table 6.** Comparison of focus measure operators with proposed method for simulated cone in the presence of speckle noise by using correlation.


**Table 7.** Comparison of focus measure operators with proposed method for simulated cone in the presence of speckle noise by using PSNR (Peak Signal-to-Noise Ratio).


Tables 8 and 9 present the time taken to estimate the position of one image frame by using the filters for the experimented objects in the presence of Gaussian and speckle noise, respectively.

**Table 8.** Computation time of filters for the experimented objects in the presence of Gaussian noise.



**Table 9.** Computation time of filters for the experimented objects in the presence of speckle noise.

The computation time in Tables 8 and 9 is expressed in seconds. The Kalman filter was about 14 times faster than the Bayes filter and about 80 times faster than the particle filter.

Figures 12–14 show the qualitative performance of 3D shape reconstruction of the experimented objects using three focus measures, SML, GLV, and TEN, before and after filtering in the presence of Gaussian noise. Figures 15 and 16 provide the corresponding qualitative performance in the presence of speckle noise. Before filtering and after particle filtering, the performance of 3D shape reconstruction was very poor because the jitter noise was not removed or was poorly removed. However, after Bayesian filtering and after Kalman filtering, the performance of the 3D shape recovery was greatly improved because most of the jitter noise was eliminated. These experimental results indicate that filtering the jitter noise with the Kalman filter makes 3D shape reconstruction both faster and more accurate.

**Figure 12.** 3D shape recovery of simulated cone, before and after filtering, using SML, GLV, and TEN in the presence of Gaussian noise. (**a**) Before filtering for SML; (**b**) Before filtering for GLV; (**c**) Before filtering for TEN; (**d**) After particle filtering for SML; (**e**) After particle filtering for GLV; (**f**) After particle filtering for TEN; (**g**) After Bayesian filtering for SML; (**h**) After Bayesian filtering for GLV; (**i**) After Bayesian filtering for TEN; (**j**) After Kalman filtering for SML; (**k**) After Kalman filtering for GLV; (**l**) After Kalman filtering for TEN.

**Figure 13.** 3D shape recovery of coin, before and after filtering, using SML, GLV, and TEN in the presence of Gaussian noise. (**a**) Before filtering for SML; (**b**) Before filtering for GLV; (**c**) Before filtering for TEN; (**d**) After particle filtering for SML; (**e**) After particle filtering for GLV; (**f**) After particle filtering for TEN; (**g**) After Bayesian filtering for SML; (**h**) After Bayesian filtering for GLV; (**i**) After Bayesian filtering for TEN; (**j**) After Kalman filtering for SML; (**k**) After Kalman filtering for GLV; (**l**) After Kalman filtering for TEN.

**Figure 14.** 3D shape recovery of LCD-TFT filter, before and after filtering, using SML, GLV, and TEN in the presence of Gaussian noise. (**a**) Before filtering for SML; (**b**) Before filtering for GLV; (**c**) Before filtering for TEN; (**d**) After particle filtering for SML; (**e**) After particle filtering for GLV; (**f**) After particle filtering for TEN; (**g**) After Bayesian filtering for SML; (**h**) After Bayesian filtering for GLV; (**i**) After Bayesian filtering for TEN; (**j**) After Kalman filtering for SML; (**k**) After Kalman filtering for GLV; (**l**) After Kalman filtering for TEN.

**Figure 15.** 3D shape recovery of simulated cone, before and after filtering, using SML, GLV, and TEN in the presence of speckle noise. (**a**) Before filtering for SML; (**b**) Before filtering for GLV; (**c**) Before filtering for TEN; (**d**) After particle filtering for SML; (**e**) After particle filtering for GLV; (**f**) After particle filtering for TEN; (**g**) After Bayesian filtering for SML; (**h**) After Bayesian filtering for GLV; (**i**) After Bayesian filtering for TEN; (**j**) After Kalman filtering for SML; (**k**) After Kalman filtering for GLV; (**l**) After Kalman filtering for TEN.

**Figure 16.** 3D shape recovery of letter-I, before and after filtering, using SML, GLV, and TEN in the presence of speckle noise. (**a**) Before filtering for SML; (**b**) Before filtering for GLV; (**c**) Before filtering for TEN; (**d**) After particle filtering for SML; (**e**) After particle filtering for GLV; (**f**) After particle filtering for TEN; (**g**) After Bayesian filtering for SML; (**h**) After Bayesian filtering for GLV; (**i**) After Bayesian filtering for TEN; (**j**) After Kalman filtering for SML; (**k**) After Kalman filtering for GLV; (**l**) After Kalman filtering for TEN.

#### **7. Conclusions**

For SFF, an object is translated at a constant step size along the optical axis. When an image of the object is captured in each step, mechanical vibrations occur, which are referred to as jitter noise. In this manuscript, jitter noise is modeled as a Gaussian function with mean μ<sub>*n*</sub> and standard deviation σ<sub>*n*</sub> for simplicity. Then, the focus curves obtained by one of the focus measure operators are also modeled as a Gaussian function, with mean *z<sub>f</sub>* and standard deviation σ<sub>*f*</sub>, for the application of the proposed method. Finally, a new filter is proposed to provide optimal estimation results in a linear SFF system with jitter noise, utilizing the Kalman filter to eliminate jitter noise in the modeled focus curves. The experimental results show that the Kalman filter provided significantly improved 3D reconstruction of the experimented objects compared with before filtering, and that the 3D shapes of the experimented objects were recovered more accurately and faster than with other existing filters, such as the Bayes filter and the particle filter.

**Author Contributions:** Conceptualization, H.-S.J. and M.S.M.; Methodology, H.-S.J. and M.S.M.; Software, H.-S.J. and G.Y.; Validation, H.-S.J.; Writing—original draft preparation, H.-S.J.; Writing—review and editing, M.S.M.; Supervision, D.H.K.; Funding acquisition, D.H.K.

**Funding:** This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No.2018-0-00677, Development of Robot Hand Manipulation Intelligence to Learn Methods and Procedures for Handling Various Objects with Tactile Robot Hands).

**Acknowledgments:** We thank Tae-Sun Choi for his assistance with useful discussion.

**Conflicts of Interest:** The authors declare no conflict of interest.
