Article

Learning-Based Proof of the State-of-the-Art Geometric Hypothesis on Depth-of-Field Scaling and Shifting Influence on Image Sharpness

Siamak Khatibi, Wei Wen and Sayyed Mohammad Emam
1 Department of Technology and Aesthetics, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden
2 Department of Mechanical Engineering, Faculty of Engineering, Ardakan University, Ardakan P.O. Box 184, Iran
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(7), 2748; https://doi.org/10.3390/app14072748
Submission received: 24 January 2024 / Revised: 11 March 2024 / Accepted: 22 March 2024 / Published: 25 March 2024

Abstract

Today, we capture and store images at a scale that was never possible before. However, huge numbers of degraded and blurred images are captured unintentionally or by mistake. In this paper, we propose a geometrical hypothesis stating that blurring occurs by shifting or scaling the depth of field (DOF). The validity of the hypothesis is proved by an independent method based on depth estimation from a single image. The image depth is modeled with respect to its edges to extract amplitude comparison ratios between the generated blurred images and the sharp/blurred images. Blurred images are generated by a stepwise variation in the standard deviation of the Gaussian filter estimate in the improved model. This process acts as a virtual image recording used to mimic the recording of several image instances. A historical documentation database is used to validate the hypothesis and to classify sharp images from blurred ones, as well as different blur types. The experimental results show that distinguishing unintentionally blurred images from non-blurred ones by a comparison of their depth of field is applicable.

1. Introduction

The availability and ease of use of cameras have led to the mass production of photos, videos, and multimedia. It is more interesting to capture the moment than to think about the quality of the captured content. Generally, post-sorting captured photos based on their quality seems affordable and accessible.
However, gathering hundreds and hundreds of images makes the task almost impossible, and a qualitative automatic selection method becomes an immediate necessity. The blurring effect is one of the conventional image quality degradations, and recent increasing interest [1,2,3] in classifying the captured images is evidence of such necessity.
The image quality degradation due to unintentional blurring is caused by (a) a sudden displacement of the camera or the object at the instant of image capture, known as the motion blur effect, or (b) a sudden change in the adjusted shooting distance, known as the defocus blur effect. Unfortunately, such conditions are far from rare and occur all too often; e.g., handshaking or object movement can happen at any time during capturing. Generally, motion or defocus blurring is caused by external or internal incidents related to the camera, respectively. The external incidents are physical happenings outside the camera, and the internal incidents are related to unexpected scene changes that conflict with the camera parameter settings.
In this paper, we study the unintentional blur problem. A hypothesis based on geometrical optics is put forward, explaining that blurring occurs either by shifting or by scaling the depth of field. We prove the validity of the hypothesis using historical documentation images in which there are two different depth surfaces. A method independent of geometrical optics is used to detect the depth surfaces from a single image. The DOF of each image is estimated from the position of the depth surfaces. We also show the feasibility of detecting unintentionally blurred images from non-blurred ones by comparing their depth of field. The paper is organized into eight sections, including the present section. We review previous works in Section 2. The relation between blurring and DOF based on geometrical optics is considered in Section 3. The hypothesis is presented in Section 4. Furthermore, the object depth modeling is presented in Section 5. The blur classification is investigated in Section 6. Experimental results are shown in Section 7. Finally, we discuss our approach and conclude the paper in Section 8.

2. Related Work

Measuring edge elongation caused by blurring effects has played a significant role in the analysis of blurred images. In previous approaches, the blurring effect was detected and estimated by measuring the blur extent of edges as in [4], or by fitting the gradient magnitude to a normal distribution along the edge direction, where the standard deviation of the distribution and the gradient magnitude were used as the blur measure, as in [5]. Zhao et al. [6] presented a defocus blur estimation using a transformer encoder and an edge detection method. They proposed a hybrid architecture of convolutional neural networks with an edge-guided aggregation module and a feature fusion module for defocus blur detection. Li et al. [7] investigated a defocus blur detection method to detect blurred areas in images. To address the problem of uneven pixel distribution at the edges of defocused regions, they deliberately separated the main labels into prior tokens, including a structure body region and an edge-transfer detail region. Almustofa et al. [8] investigated blur detection algorithms, including support vector machine filters, focus measure thresholding, and convolutional neural networks, on blurred images.
The estimation of the blur filter and the latent unblurred image relies on blind image deconvolution methods [9,10,11]. This type of estimation tries to solve a severely ill-posed problem. Most recently proposed image deblurring methods assume a spatially invariant blur, in which all pixels in the input image are blurred by the same PSF. The partial blur problem was considered in some methods by assuming a blur kernel or with the help of user interaction [12,13,14]. For all these methods, the deblurring is successful only if the PSF is correctly reconstructed. However, in practice, blind deconvolution usually performs unsatisfactorily, even when making restrictive assumptions on image and kernel structures. This problem becomes even more significant when the partial blurring effect in images is considered. Therefore, blind deconvolution methods are not appropriate for general blur detection in terms of efficiency and accuracy, especially for handling images in a large database.
In photography, the low-depth-of-field technique focuses the camera only on an object of interest. In the methods presented in [15,16], the object of interest was extracted automatically. However, the computed low-depth-of-field images contained out-of-focus backgrounds, making the methods inappropriate for general blur detection. Datta et al. [17] extended the previous idea to image autosegmentation as an application of blur analysis. They generated the low-depth-of-field images by calculating an indicator, defined by the ratio of the high-frequency wavelet coefficients in the central region to those of the whole image. Accordingly, the method simply assumed that low-depth-of-field images contained focused objects near the image center and out-of-focus objects in the surrounding pixels of the image. Thus, the method also did not suit general-purpose blur analysis.
The most relevant research to this work is probably related to depth recovery from motion and defocus blurring effects. Li et al. [18] proposed a learning framework for motion and defocus deblurring networks. Their networks were trained to remove object blur as a by-product. Keshri et al. [19] presented depth recovery with a single camera scanner by applying focus blur and changing the aperture number. Their proposed model performed well with both sharp and blurred images in computational depth estimation up to a range of 3.3 m, regardless of whether the image was in focus or out of focus. Nazir et al. [20] suggested a deep convolutional neural network to estimate the depth and perform image deblurring. Kumar et al. [21] presented a novel technique to generate a more accurate depth map for dynamic scenes using a combination of defocus and motion cues. The combination was performed by keeping the parameters of the defocus edge points aligned in the motion direction and estimating the camera parameters with the help of motion and defocus relations. The proposed technique rectified and corrected errors in the depth map caused by moving objects and by inaccurate defocus blur and motion estimation. Using a patch-pooled set of feature maps, Anwar, Hayder, and Porikli [22] presented a depth estimation method based on a novel deep convolutional neural framework from a single image. Moreover, they computationally reconstructed an all-focus image, removing the blur and achieving synthetic refocusing from the same image. The significant difference between their method and existing ones was the convolutional estimation of depth from a defocused image and the incorporation of the resulting depth map into deblurring. In contrast, in this work, we apply the knowledge of geometric optics to the edges of images, blurred or normal, to find an estimated depth net of edges as potential seeds to recover the depth in the whole image. Then, by analyzing the depth data, blurred and normal images are classified.

3. Relation between Blurring and DOF Based on Geometrical Optics

To understand the origin of the blurring problem, we need to focus on the most crucial part of the capturing process: the projection of a scene by a lens onto a camera sensor. Figure 1 shows the principle of such a projection using a thin lens model [23], where the camera has a focal length of f. When an object stands at the focus distance d_f, i.e., the shooting distance, the image sensor captures the object as it is at the distance d_0; see the top sketch in Figure 1. However, at any other distance, e.g., d_1 or d_2, the image sensor captures a deformed and blurred shape of the object.
In the thin lens model, a circular imprint of the blurring effect, known as the circle of confusion (CoC), is used to measure the blurriness. The distance C in Figure 1b,c represents the diameter of such a CoC, and it is calculated by
D_{CoC} = \frac{|d_i - d_f|}{d_i} \cdot \frac{f^2}{N (d_f - f)},   (1)
where d_i is the object distance, and N is the relative aperture or f-stop number. Figure 2 shows D_CoC as a function of the object distance d_i, where the distance d_f is 800 mm, the f-stop number N is 8, 16, or 22, and the focal length f is f_0 = 50 mm or f_1 = 52 mm. The figure shows that the diameter D_CoC is a non-linear function of the object distance. The function monotonically decreases or increases for object distances smaller or larger than the focus distance, respectively. The tolerated expansion of the CoC while capturing images is a subjective issue, and the conjugate of the CoC in object space is typically bigger than the CoC due to the lens magnification, as shown in Figure 3.
According to Gauss’s ray construction, there are two distance regions, around the CoC in image space and around the conjugate of the CoC in object space, where the image plane and the object can change their positions and still make it possible to capture a non-blurred image. These distance regions in image space and object space are asymmetrical and are known as the depth of focus and the depth of field (DOF), respectively; see the blue regions in Figure 3. The D_CoC is shown as C, and the diameter of the conjugate of the CoC is shown as C/M in Figure 3, where M is the lens magnification, i.e., M = (image plane distance)/(object distance). When the conjugate of the CoC is small in relation to the lens aperture size, it yields:
\mathrm{DOF} \approx \frac{2 N D_{CoC}\, d_i^2}{f^2}.   (2)
Using Equation (1) in Equation (2) yields:
\mathrm{DOF} \approx \frac{2 d_i\, |d_i - d_f|}{d_f - f}.   (3)
Thus, a successful non-blurred image capture is constrained by Equation (3); a more detailed calculation of the DOF can be found in Appendix A. In unintentional image degradation, i.e., degradation caused by motion or defocus blur, the focal length f does not change, i.e., the appropriate focal length is chosen intentionally and as part of the camera setting. However, the focus distance d_f, as a result of the internal camera setting, can conflict with scene changes. Thus, the DOF constraint for such degradation cases, expressed in Equation (3), is a function of two parameters: the focus distance d_f and the distance to the in-focus plane d_i.
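As a numerical illustration of Equations (1)–(3), the following Python sketch (our own illustration, not code from the paper; all distances are in millimetres, and the example values follow the Figure 2 setting) evaluates the CoC diameter and the resulting DOF approximation:

import numpy as np

def coc_diameter(d_i, d_f, f, N):
    # Equation (1): circle-of-confusion diameter for object distance d_i,
    # focus distance d_f, focal length f, and f-stop number N.
    return (np.abs(d_i - d_f) / d_i) * f**2 / (N * (d_f - f))

def dof(d_i, d_f, f, N):
    # Equation (2) with Equation (1) substituted, i.e., Equation (3):
    # DOF ~ 2*d_i*|d_i - d_f| / (d_f - f).
    return 2.0 * N * coc_diameter(d_i, d_f, f, N) * d_i**2 / f**2

print(coc_diameter(d_i=1000.0, d_f=800.0, f=50.0, N=8))  # about 0.083 mm
print(dof(d_i=1000.0, d_f=800.0, f=50.0, N=8))           # about 533 mm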

4. Hypothesis on the Cause of Blurring Effects

We discussed earlier how a successful non-blurred image is captured: the object must be within the range of the DOF, where the depth of field is calculated by Equation (3). We argued that the DOF for unintentional image degradation cases is a function of two parameters: the focus distance d_f and the distance to the in-focus plane d_i. Figure 4 demonstrates typical DOF values calculated using Equation (3) for a focal length of 200 mm, where d_f and d_i are varied between 700 and 1200 mm and between 300 and 3000 mm, respectively. In the figure, the relations of the DOF to d_i for d_f values of 850, 1000, and 1150 mm are shown in green, red, and black, respectively. These relations for a focal length of 200 mm are shown in Figure 5; in that figure, for a given d_i, the DOF changes with respect to d_f. The scaling concept can be found in Appendix A (Figure A2). The relation of the DOF to d_i for a d_f value of 1000 mm and different focal lengths is shown in Figure 6: as the distance d_i changes, the DOF value changes monotonically and non-linearly for any focal length. The translation (shifting) concept can be found in Appendix A (Figure A3).
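The scaling and shifting behaviour described above can be reproduced directly from Equation (3). The short Python sketch below (our own illustration; the 200 mm focal length and the distance ranges follow Figure 4, while the step size is an arbitrary choice) sweeps d_f with d_i fixed (scaling, cf. Figure A2) and d_i with d_f fixed (shifting, cf. Figure A3):

import numpy as np

def dof(d_i, d_f, f=200.0):
    # Equation (3), all distances in millimetres.
    return 2.0 * d_i * np.abs(d_i - d_f) / (d_f - f)

# Scaling (defocus): the distance to the in-focus plane is fixed at 1000 mm
# while the focus distance changes; the DOF value is rescaled.
for d_f in (850.0, 1000.0, 1150.0):
    print(d_f, dof(1000.0, d_f))

# Shifting (motion): the focus distance is fixed at 1000 mm while the distance
# to the in-focus plane varies; the whole DOF curve moves with d_i and
# collapses to zero where d_i = d_f.
d_i = np.arange(300.0, 3001.0, 10.0)
curve = dof(d_i, d_f=1000.0)
print(d_i[np.argmin(curve)])  # 1000.0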

5. Object Depth Modelling

The contours of objects play a significant role in discriminating objects in a scene with respect to their distance from the camera. The projection of such contours onto an image sensor results in edges in the captured digital image. The orthogonal projection of a contour, represented as an edge in an image, is generally modelled as:
f_1(x) = A\,u(x) + B,   (4)
where u(x) is a step function, and A and B are the amplitude and offset of the edge, respectively. This simplified model carries no information about the object distance from the camera. If we consider light to be represented by a complex plane wave travelling from any object towards the lens and then the image sensor, then each point on each of these object planes acts like a spherical wave source, interfering constructively or destructively with every other spherical wave in the planes beyond the object plane. According to the Huygens–Fresnel principle, the phases change in any given plane, i.e., an interference pattern is generated in any given plane; through a lens, the Fourier transform of the eventually captured image on the image sensor is generated, in which the phase contains the information of objects from their respective object planes. Here, it can be argued that, as long as object planes are at different distances from the lens, the phase of the Fourier transform is related to the object distances.
Blanchet et al. [24] showed that the sharpness of image edges is related to phases and their coherency, and that the blurring effect appears as such coherency decreases. They also showed that the degree of the blurring effect can be modeled by phase incoherency, which has a Gaussian distribution. Thus, to improve the model in Equation (4), we assume the step edge can undergo a variational blurring effect in relation to the variation in the phase incoherency, which in turn is related to the distance variation in the object plane. Let us assume such a function is:
g(x) = \frac{1}{\sqrt{2\pi\sigma_m^2}}\, e^{-\frac{x^2}{2\sigma_m^2}},   (5)
where σ_m = f(d_i, d_f) = |d_i - d_f|, i.e., the standard deviation varies with the two parameters d_i and d_f. Thus, the improved model can be written as:
f_2(x) = [A\,u(x) + B] \otimes g(x),   (6)
where ⊗ represents the convolution operation. Here, the origin of x is assumed to be at the focus distance position, i.e., d_f in Figure 1. f(d_i, d_f) represents the variation in the object distance from the focal plane. As the object distance from the focal plane increases, the blurring effect increases; i.e., σ_m in g(x) increases.
When a point of the object is not in focus (see Figure 1), its image on the image plane is no longer a point but a circular patch with a certain radius that defines the amount of defocus associated with the depth of the point in the scene. On the other hand, the defocusing process can be modeled as I(x) = \int f(\tau)\, h(x, \tau)\, d\tau, where x denotes the 2D space coordinates, f(x) is the focused image of the scene, and h is the space-varying PSF. Here, h(x) is given by a circularly symmetric 2D Gaussian function h(x) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{\|x\|^2}{2\sigma^2}\right), where σ is a function of the depth at a given point, f(d_i, d_f). Here, the depth is associated with the two possible variables d_i and d_f.
The increase in the blurring effect in turn causes a non-linear increase in D_CoC. Thus, it yields:
\sigma_m = f(D_{CoC}),   (7)
where f(\cdot) represents a mathematical function. The effect of the lens, assuming a Gaussian PSF p(x), on the captured contour can additionally improve the model as:
f(x) = [A\,u(x) + B] \otimes g(x) \otimes p(x), \qquad p(x) = \frac{1}{\sqrt{2\pi\sigma_l^2}}\, e^{-\frac{x^2}{2\sigma_l^2}}.   (8)
Generally, the edges in digital images are detected by gradient calculations. Thus, for captured edges as modeled in Equation (8), it yields:
f'(x) = \nabla\{[A\,u(x) + B] \otimes g(x) \otimes p(x)\} = A\, g(x) \otimes p(x),   (9)
f'(x) = A\,\frac{1}{\sqrt{2\pi\sigma_m^2}} e^{-\frac{x^2}{2\sigma_m^2}} \otimes \frac{1}{\sqrt{2\pi\sigma_l^2}} e^{-\frac{x^2}{2\sigma_l^2}} = \frac{A}{\sqrt{2\pi(\sigma_m^2 + \sigma_l^2)}}\, e^{-\frac{x^2}{2(\sigma_m^2 + \sigma_l^2)}}.   (10)
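Equations (10) and (11) can be checked numerically: blurring a synthetic step edge with two Gaussian filters and taking the gradient should reduce the peak amplitude by the factor √(σ_l²/(σ_m² + σ_l²)). The following Python sketch is our own verification with arbitrary example values of A, B, σ_m, and σ_l:

import numpy as np
from scipy.ndimage import gaussian_filter1d

A, B = 2.0, 0.5              # edge amplitude and offset, Equation (4)
sigma_m, sigma_l = 1.8, 1.2  # defocus and lens-PSF standard deviations (pixels)

x = np.arange(-200, 201)
edge = A * (x >= 0).astype(float) + B                 # f1(x) = A u(x) + B

blurred = gaussian_filter1d(gaussian_filter1d(edge, sigma_m), sigma_l)  # Equation (8)
focal = gaussian_filter1d(edge, sigma_l)                                # sigma_m = 0 case

measured = np.gradient(blurred).max() / np.gradient(focal).max()
predicted = np.sqrt(sigma_l**2 / (sigma_m**2 + sigma_l**2))
print(measured, predicted)   # the two ratios agree up to discretization error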
It is notable to consider the case when the object position is at the focal distance. In that case, σ_m = 0 and the convolution with g(x) in Equation (8) reduces to the identity. Then, the gradient of edges for such objects is calculated as
f'_{focal}(x) = \frac{A}{\sqrt{2\pi\sigma_l^2}}\, e^{-\frac{x^2}{2\sigma_l^2}}.   (11)
The detected edges that are contaminated by any tolerable blurring effect (i.e., when the object distance is still within the range of the DOF) or non-tolerable blurring effect (i.e., when the object distance is outside the range of the DOF due to unintentional motion or defocus blur) can be compared to the same detected edges in the focal plane as
\frac{f'(x)}{f'_{focal}(x)} = \sqrt{\frac{\sigma_l^2}{\sigma_m^2 + \sigma_l^2}}\; e^{-\frac{x^2}{2(\sigma_m^2 + \sigma_l^2)} + \frac{x^2}{2\sigma_l^2}}.   (12)
Equation (12) indicates an important result which can be used for detecting any blurring effect. The amplitude part of the equation gives the ratio of amplitudes (i.e., due to possible intensity changes) of an edge before and after any blurring effect. However, obtaining the equation assumes that the same edge is detected both in and out of the focal plane position, which is a difficult task in practice. Thus, to use the mentioned result and solve the practical issue, let us consider an estimation of g(x) in Equation (8) as
\hat{g}(x) = \frac{1}{\sqrt{2\pi\sigma_v^2}}\, e^{-\frac{x^2}{2\sigma_v^2}},   (13)
which is a variable Gaussian model that is modified by varying its standard deviation, σ_v. Let us assume the change in σ_v is linear, monotonic, and within a limited optional range, i.e., σ_v = f(d_v) = d_v, where the range of d_v is chosen freely.
Now, assume we have a database of images containing sharp and blurred images, the latter caused by unintentional motion or defocus blurring effects. If the database images are captured by the same capturing device and have different content, some pairs of sharp and blurred images in the database can be used in a training procedure to classify blurred images, even without the existence of their sharp counterparts, as follows. For the training set of images, the edges and the intensity values along the edges (i.e., the amplitude) of each image can be detected and computed, respectively. Using the amplitude part of Equation (12) and substituting \hat{g}(x) for g(x), for a sharp image, it yields:
\mathrm{amp}\!\left(\frac{f'_{gen.blurred}(x)}{f'_{sharp}(x)}\right) = \sqrt{\frac{\sigma_l^2}{\sigma_v^2 + \sigma_l^2}} = T(d_v),   (14)
where T(d_v) is a measurable value representing the ratio of the intensity values of the pixels along the edges in the sharp and the generated blurred images. As the T(d_v) values are obtained as a consequence of \hat{g}(x), they represent the relative distances (i.e., the relative depth) between the generated blurred and the sharp image.
Since even sharp images contain edges of objects at different distances from the lens, i.e., the object distances are within the range of the DOF, different depths are obtained for the edges. To obtain a maximum range of depth, d_v is varied until a certain generated blurred image is found where, e.g., σ_v² = σ_t² and T(d_v) = T, i.e., a maximum value among the computed T(d_v) values. Then, from Equation (14), we have:
\sigma_l^2 = \frac{T^2 \sigma_t^2}{1 - T^2}.   (15)
In the same way, a blurred image from the database can be compared to the generated blurred image as:
\mathrm{amp}\!\left(\frac{f'_{gen.blurred}(x)}{f'_{blurred}(x)}\right) = \sqrt{\frac{\sigma_m^2 + \sigma_l^2}{\sigma_v^2 + \sigma_l^2}} = P(d_v).   (16)
Here as well, different depths are detected using a maximum depth range by varying d_v until a certain generated blurred image is found where, e.g., σ_v² = σ_p² and P(d_v) = P, i.e., a maximum value among the computed P(d_v) values. Then, from Equation (16), we obtain:
\sigma_m^2 = P^2 (\sigma_p^2 + \sigma_l^2) - \sigma_l^2.   (17)
With σ_l² obtained from Equation (15), Equation (17) is used to obtain the standard deviation of the blurring kernel which causes the strongest blurring effect from the unintentional motion or defocus blur.
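The two-step estimation of σ_l and σ_m can be written compactly. The sketch below (our own illustration; the values of T, σ_t, P, and σ_p are placeholders, not measurements from the paper) implements Equations (15) and (17):

import numpy as np

def sigma_l_from_sharp(T, sigma_t):
    # Equation (15): lens-PSF standard deviation from the maximal sharp-image ratio T.
    return np.sqrt(T**2 * sigma_t**2 / (1.0 - T**2))

def sigma_m_from_blurred(P, sigma_p, sigma_l):
    # Equation (17): blur-kernel standard deviation from the maximal blurred-image ratio P.
    return np.sqrt(P**2 * (sigma_p**2 + sigma_l**2) - sigma_l**2)

sigma_l = sigma_l_from_sharp(T=0.8, sigma_t=0.6)                     # from a sharp image
sigma_m = sigma_m_from_blurred(P=0.9, sigma_p=0.7, sigma_l=sigma_l)  # from its blurred pair
print(sigma_l, sigma_m)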
The pseudocode of the presented algorithm that summarizes the main computational steps is shown in Algorithm 1.
Algorithm 1 Full depth map estimation
  Input: color image, Ic; sharp or blurred flag (either 0 or a value), ImSigma; range of σv values (range of blurring), [blMin, blMax]; blurring step, blStep
  Output: pseudo-color image (depth image), Id; max depth (related to Tdv or Pdv), maxDepthOut; Gaussian sigma of the sharp (Equation (15)) or blurred image (Equation (17)), ImSigmaOut
  Preprocessing: Ic -> Icp % mapping the intensity values of each of the three color channels of Ic to new values in Icp
  Color-to-grey conversion, Icp -> Ig
  Edge detection, Ig -> Ie % where Ie is a binary image
  Generate Gaussian filter kernels, σv (blMin, blMax, blStep) -> G1, G2, …, GN
  Calculate the Gaussian gradient magnitude image for each Gaussian kernel,
    Ie, G1 -> mgIm1
    Ie, G2 -> mgIm2
    …
    Ie, GN -> mgImN
  Calculate amplitude ratios (Equation (14)), Ie, mgIm1, mgIm2, …, mgImN -> TPdv1, TPdv2, …, TPdvN
  If ImSigma = 0 % sharp image
      max (TPdv1, TPdv2, …, TPdvN) -> Tdv
      If TPdvk = Tdv, find Gk related to TPdvk -> σt in Equation (15)
      calculate σl in Equation (15), σt, Tdv -> σl
      ImSigmaOut = σl
      maxDepthOut = Tdv
  else % ImSigma has a value, σl, from the related sharp image
      max (TPdv1, TPdv2, …, TPdvN) -> Pdv
      If TPdvk = Pdv, find Gk related to TPdvk -> σp in Equation (16)
      calculate σm in Equation (17), ImSigma (or σl), σp, Pdv -> σm
      ImSigmaOut = σm
      maxDepthOut = Pdv
  end
  Generate a sparse depth map using quantization of the depth range, mgIm1, mgImK (related to the found Tdv or Pdv) -> blsparseDk
  Apply joint bilateral filtering, blsparseDk -> blsparseDkF
  Calculate sparse depth propagation using the network of edges, blsparseDkF -> blsparseD
  Clean the current sparse depth map of outliers and noise, blsparseD -> blsparseD
  Full sparse depth propagation, blsparseD -> Id

5.1. Estimating the Full Depth Map of Images

A sharp or blurred image generally contains edges. A “Canny” edge detector is used to obtain the edge map. Equation (13) is implemented on the image as follows: (a) a variable Gaussian model \hat{g}(x) is generated with a stepwise variation in the standard deviation σ_v from 0.5 to 0.75 in steps of 0.05; (b) the image is filtered with each Gaussian model generated in (a). The amplitude ratio between the generated blurred images and the image is calculated according to Equation (14) for a sharp image or Equation (16) for an unintentionally blurred image, which results in different T(d_v) and P(d_v) values, respectively. By finding the maximum of the T(d_v) and P(d_v) values, the maximum depth ranges T and P are obtained, respectively. A sparse depth map of the edges is computed by quantizing the amplitude ratio values in the found depth range. Joint bilateral filtering is applied to the sparse depth map to refine inaccurate estimates [25]. The obtained sparse depth map of edges is then propagated to the entire image to obtain a full depth map using the matting Laplacian method [26].
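A condensed sketch of these steps is given below (Python, using SciPy and scikit-image as stand-ins for the tools used in the paper; the choice of the reference gradient image and the number of quantization levels are our own assumptions, and the joint bilateral filtering [25] and matting-Laplacian propagation [26] stages are only indicated as comments):

import numpy as np
from scipy.ndimage import gaussian_gradient_magnitude
from skimage.feature import canny

def sparse_depth_map(gray, sigmas=np.arange(0.50, 0.751, 0.05), n_levels=32):
    # gray: 2D float image in [0, 1].
    edges = canny(gray)                                    # binary edge map
    base = gaussian_gradient_magnitude(gray, sigmas[0])    # reference edge amplitude
    ratios = []
    for s in sigmas[1:]:                                   # stepwise generated blurs
        blurred_grad = gaussian_gradient_magnitude(gray, s)
        r = np.zeros_like(gray)
        r[edges] = blurred_grad[edges] / (base[edges] + 1e-8)   # Equation (14)/(16) per edge pixel
        ratios.append(r)
    best = np.stack(ratios).max(axis=0)                    # maximum ratio -> maximum depth range
    sparse = np.zeros_like(gray)
    if best[edges].size:
        levels = np.linspace(best[edges].min(), best[edges].max(), n_levels)
        sparse[edges] = np.digitize(best[edges], levels)   # quantized sparse depth on edges
    # Remaining steps of Section 5.1 (not sketched): joint bilateral filtering of the
    # sparse map [25] and propagation to a full depth map via the matting Laplacian [26].
    return sparse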

5.2. Object Depth Estimation from Images

A full depth map of an image is used to arrange a vector data array, v_0, of all depth values. Then, a new vector, v_1, is obtained by sorting the elements of v_0. The histogram of v_1 is calculated and used as a feature vector, v_Feature, which represents the relative estimated depth of the objects in the whole image.
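A minimal sketch of this step, assuming the full depth map is a NumPy array and with the number of histogram bins as our own choice, is:

import numpy as np

def depth_feature_vector(depth_map, n_bins=64):
    v0 = depth_map.ravel()                        # all depth values
    v1 = np.sort(v0)                              # sorted depth profile
    v_feature, _ = np.histogram(v1, bins=n_bins)  # histogram used as the feature vector
    return v1, v_feature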

6. Blur Classification

The classification between “sharp” and “blurred” images was achieved using a probabilistic RUSBoost classification approach. RUSBoost is an algorithm for handling the class imbalance problem in data with discrete class labels [27]. It combines RUS (random undersampling) with the standard boosting procedure AdaBoost [28] to model the minority class by removing majority-class samples. We used a support vector machine (SVM) as the weak learner for boosting in this approach. The SVM determines a hyperplane in the high-dimensional feature space of v_Feature, i.e., the relative estimated depth values in an image. The best hyperplane is derived by maximizing the margin, i.e., the smallest distance from the hyperplane to the data points. The trained RUSBoost model was then validated on the test set of images to find the class prediction score of each image. Following the method presented in [29] to obtain binary outputs, a sigmoid model was used to obtain a posterior probability P(class | input) from the prediction results of RUSBoost.
The classifications between “sharp” and “defocus-blurred” images and between “sharp” and “motion-blurred” images were achieved using the same probabilistic RUSBoost classification approach for each classification framework.
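A rough sketch of this classification stage using scikit-learn building blocks is shown below: random undersampling of the majority (sharp) class combined with AdaBoost and a linear SVM weak learner is used as a stand-in for RUSBoost [27,28], and Platt's sigmoid calibration [29] supplies the posterior probabilities. The hyperparameters are illustrative and not taken from the paper:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV

def train_blur_classifier(X, y, random_state=0):
    # X: n_samples x n_features array of depth-histogram features (v_Feature);
    # y: NumPy array with 1 for blurred, 0 for sharp.
    rng = np.random.default_rng(random_state)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    neg_sampled = rng.choice(neg, size=len(pos), replace=False)   # random undersampling
    idx = np.concatenate([pos, neg_sampled])

    boosted = AdaBoostClassifier(
        estimator=SVC(kernel="linear"),   # older scikit-learn releases use base_estimator=
        n_estimators=50,
        algorithm="SAMME",
    )
    # Platt scaling: fit a sigmoid on the boosted scores to obtain P(class | input).
    model = CalibratedClassifierCV(boosted, method="sigmoid", cv=3)
    model.fit(X[idx], y[idx])
    return model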

7. Experimental Results

In this section, our results are presented.

7.1. Database of Images

An image database of historical documents was used, which consisted of 874 high-resolution color images. When capturing the images, due to unintentional motion or defocus blur, some of the images were recorded as partially or fully blurred. There were 22 motion-blurred and 15 defocus-blurred images in the database. The database also sometimes contained both the sharp and the blurred image of the same document. An expert description of the type of degradation also existed in the database. Figure 7 shows a typical example of a sharp and a blurred image from the database.

7.2. Object Depth Detection from Images

An image from the database was resized, preprocessed by intensity adjustment of each color channel, and converted to a grayscale image. Then, object depths were estimated according to Section 5.2. Figure 8 shows a typical example of a full depth map of sharp and blurred images on the left and right sides of the figure, respectively.
It should be noted that the size of the document images was quite large. We used the original image size for training, and the time to obtain a depth image was around 7 s. After training, we did not need to use the original-size images. By reducing the image size (by a factor of 10, i.e., to 10% of the original size), the computation for obtaining a depth image took around 1.4 s.
In the figure, the Jet color map is used, where dark blue and dark red indicate the nearest and furthest distances from the camera, respectively. A typical example of a v_1 representation of depth data is shown in Figure 9, where the data related to the images in Figure 8 are used. A typical example of a histogram calculated on the v_1 depth data is shown in Figure 10, where the data related to Figure 8 are used. Figure 10 shows that the related images consist mainly of two dominant object planes. Since the image database consists of captured historical documents, our results for other images in the database also show that there are two dominant object planes in the images.

7.3. Verification of the Hypothesis

In the database, there are several pairs of sharp and blurred images for the same document. The feature vectors, v_Feature, of each such pair of images are estimated according to Section 5.2. Some examples of the two types of blurring classes (blurring effects caused by unintentional defocusing and motion blur) can be found in Appendix A.

7.4. Performance of Classification

The dataset included 827 sharp images and 89 blurred ones including 47 with mixed blur, 22 with motion blur, and 15 with defocus blur. Three classifications were performed: (a) between sharp and mixed blur images, (b) between sharp and motion blur images, and (c) between sharp and defocused images, where the level of imbalance in the classifications was 5.38%, 2.59%, and 1.78%, respectively. A single class was selected to be the positive class in the classifications, while the remaining classes were combined to make up the negative class. All classifications were performed using a tenfold cross-validation. For each classification, the data set was split into ten partitions, nine of which were used to train the model, while the remaining partition was used to test the model. This process was repeated ten times so that each partition acted as test data once. In addition, ten independent runs of this procedure were performed to eliminate any biasing that may have occurred during the random partitioning process.
We use four quantities to evaluate the results: the true positive rate (TPrate), which is the number of true positives divided by the total number of positives in the dataset; the true negative rate (TNrate), which is the number of true negatives divided by the total number of negatives in the dataset; the false positive rate (FPrate), which is the number of false positives divided by the total number of negatives in the dataset; and the false negative rate (FNrate), which is the number of false negatives divided by the total number of positives in the dataset. Table 1 shows the classification results. The classification accuracy was calculated as the mean of the true positive rate and the true negative rate. According to Table 1, the classification accuracies between sharp and mixed blur, between sharp and motion blur, and between sharp and defocused blur were 0.9506, 0.9464, and 0.9638, respectively.
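The evaluation protocol and the four rates can be expressed as follows (our own sketch; train_blur_classifier refers to the illustrative classifier from Section 6, and the random seed is arbitrary):

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix

def rates(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {"TPrate": tp / (tp + fn), "TNrate": tn / (tn + fp),
            "FPrate": fp / (tn + fp), "FNrate": fn / (tp + fn)}

def ten_fold_evaluation(X, y, random_state=0):
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=random_state)
    fold_rates = []
    for train_idx, test_idx in skf.split(X, y):
        model = train_blur_classifier(X[train_idx], y[train_idx])
        fold_rates.append(rates(y[test_idx], model.predict(X[test_idx])))
    # Accuracy as defined in the paper: mean of TPrate and TNrate.
    accuracy = 0.5 * (np.mean([r["TPrate"] for r in fold_rates]) +
                      np.mean([r["TNrate"] for r in fold_rates]))
    return fold_rates, accuracy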

8. Discussion and Conclusions

The use of image databases for different applications in industrial, educational, and medical problems is more relevant than ever due to the ease of image capturing and storage. Enormous numbers of degraded images are captured unintentionally or by mistake and in the long run, there is a need to clean a large quantity of data from unwanted images.
In this paper, we studied the unintentional blur problem, i.e., motion and defocus (out-of-focus) blur. The cause of the blurring effects was expressed as a hypothesis based on geometrical optics. We showed that unintentional blur causes a shifting or scaling of the DOF, which in turn results in motion or defocus blur in the captured images. We proved the validity of the hypothesis by an independent method used to compare the sharp and blurred images.
In Appendix A, Figure A4 shows the result of the independent method, and Figure A2 and Figure A3 show the hypothesis principles. In the independent method, we calculated the depth from a single image. We showed that an optimal range for a virtual DOF can be estimated by several virtual recaptures of the image. The recapturing of the image was achieved by using a Gaussian model for the lens. We argued that such a recapturing process had a strong effect on the amplitude of edges. Therefore, the ratio of the amplitudes of edges for two virtually recaptured images, within the range of the virtual DOF, was argued to be a significant parameter in the estimation of the relative depth on the edges. We showed how a network of such edges could be propagated to the whole image to generate a depth image. The histogram of depth values in the depth image was used as a feature parameter for blur classification. A database of historical documentation was used for the verification of the hypothesis and for blur classification. The use of documentation images was shown to be very useful in the simplification of a scene to its two major planes, the document and its background. The orientation of the two planes in the sharp and blurred images was easy to detect and compare; see Figure 10. In the classification of the blur effects in the database, we faced a common problem in big data: the imbalance of classes. The numbers of blurred images with mixed blur, motion blur, and defocused blur were 47, 22, and 15, whereas the number of sharp images was 827. We used a probabilistic RUSBoost classification approach to solve the classification problem. The results of the blur classification are presented in Table 1. It should be noted that, in the classification between sharp and mixed blur images, 44 of 47 possible blurred images were correctly classified; between sharp and motion blur, 20 of 22 possible images; and between sharp and defocused blur, 14 of 15 possible images. This indicates that the statistical results are sensitive to the number of blurred images, and there is a need to examine the methodology on more extensive databases which include a larger number of blurred images.

Author Contributions

Conceptualization, S.K.; Methodology, S.K. and W.W.; Software, W.W.; Validation, W.W.; Formal analysis, S.K. and S.M.E.; Investigation, S.K. and S.M.E.; Resources, S.K.; Writing—original draft, S.K.; Writing—review & editing, W.W. and S.M.E.; Supervision, S.K.; Project administration, S.K.; Funding acquisition, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

Part of this work was funded by the Knowledge Foundation (grant: 20140032) in Sweden for the research project of “Scalable resource-efficient systems for big data analytics”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data is unavailable due to privacy and ethical restrictions. The partner company (arkivdigital—www.arkivdigital.net, accessed on 23 January 2024) in the project provided the data.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

For a thin lens, the DOF extends over a limited distance Z, which is determined by the aperture stop diameter A. For an acceptable circle of confusion c, the front and rear depths of field are z_1 and z_2; see Figure A1.
Figure A1. Geometry of imaging using a thin lens.
Using similar triangles yields
\frac{z_1}{c/M} = \frac{x - z_1}{A}, \quad \text{and} \quad \frac{z_2}{c/M} = \frac{x + z_2}{A},   (A1)
where x is the distance to the in-focus plane, and M is the lens magnification. Using the definition of the f-number, N, the aperture diameter is A = f/N. By approximating the magnification as M ≈ f/x, z_1 and z_2 in Equation (A1) become
z_1 = \frac{N c x^2}{f^2 + N c x},   (A2)
z_2 = \frac{N c x^2}{f^2 - N c x}.   (A3)
According to Equation (1), c = \frac{x - y}{x} \cdot \frac{f^2}{N (y - f)}, where y is the focus distance. By replacing N c in Equations (A2) and (A3), we obtain
z_1 = \frac{x (x - y)}{(y - f) + (x - y)},   (A4)
z_2 = \frac{x (x - y)}{(y - f) - (x - y)},   (A5)
and subsequently the DOF is
Z = \frac{2 x (x - y)}{(y - f) - \frac{(x - y)^2}{y - f}}.   (A6)
According to Equation (A6), when the distance to the in-focus plane, x, is the same as the focus distance, y, the DOF, Z, is zero; when x remains within a limited range, the DOF is limited; otherwise, the DOF extends to infinity. The constraint to obtain a limited range of x is
(y - f) - \frac{(x - y)^2}{y - f} > 0,   (A7)
which gives
x > f \quad \text{and} \quad x < 2y - f.   (A8)
From similar triangles, we have
\frac{x - y}{y - f} = \frac{c/M}{A},   (A9)
when the conjugate of the circle of confusion, c/M, is smaller than the lens aperture, A, then
\frac{x - y}{y - f} < 1, \quad \text{and} \quad \frac{(x - y)^2}{(y - f)^2} \ll 1.   (A10)
Then, from Equation (A6), we obtain
Z \approx \frac{2 x (x - y)}{y - f}.   (A11)
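The algebra of Equations (A2)–(A6) and the approximation in Equation (A11) can be checked symbolically. The following sketch (our own verification, assuming x > y > f so that the absolute value in Equation (1) can be dropped) uses SymPy:

import sympy as sp

x, y, f, N, c = sp.symbols('x y f N c', positive=True)

z1 = N*c*x**2 / (f**2 + N*c*x)            # Equation (A2), front depth of field
z2 = N*c*x**2 / (f**2 - N*c*x)            # Equation (A3), rear depth of field

c_val = (x - y)/x * f**2 / (N*(y - f))    # Equation (1) with d_i = x, d_f = y
Z = sp.simplify((z1 + z2).subs(c, c_val))

# Equation (A6) written over a common denominator.
Z_expected = 2*x*(x - y)*(y - f) / ((y - f)**2 - (x - y)**2)
print(sp.simplify(Z - Z_expected))        # prints 0: the two forms agree

# Equation (A11): for (x - y) << (y - f), the ratio of Z to 2*x*(x - y)/(y - f) tends to 1.
print(sp.limit(Z / (2*x*(x - y)/(y - f)), x, y))   # prints 1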
According to Figure A2, an unintentional change in the focus distance develops a defocus blurring effect; as a result, when the distance to the in-focus plane remains the same, the DOF is scaled by a factor.
According to Figure A3, an unintentional change in the distance to the in-focus plane produces the motion blurring effect; consequently, motion blur is the result of shifting the DOF.
The feature vector represents the relative estimated depth of the objects in the related image. Two dominant object planes are detected in each feature vector, whose positions are related to the DOF. By comparison between the sharp and blurred feature vectors, the two types of blurring classes are detected, as shown in Figure A4. According to the expert description, our class I of blurring effects is caused by unintentional defocusing, i.e., a mistake in setting the right focus distance when capturing the image.
The blurring effect in this class has partially or entirely affected the image, i.e., the left or right side of the image is partially affected. Our class II is caused by motion blur according to the expert description. The comparison between sharp and blurred images verifies the hypothesis in Section 4. Thus, defocus and motion blurs are caused by a scaling and shifting of the DOF, respectively. It should be noted that these results are obtained by the methodology in Section 5, which is independent of the hypothesis.
Figure A2. Observation of DOF scaling when the focus distance, d_f, varies and the distance to the in-focus plane, d_i, remains the same.
Figure A3. Observation of DOF shifting when the distance to the in-focus plane, d_i, varies and the focus distance, d_f, remains the same.
Figure A4. Some examples of two types of blurring classes. The blue and red lines represent the data from the sharp and blurred image, respectively.

References

  1. Tiwari, S.; Shukla, V.P.; Biradar, S.R.; Singh, A.K. Blur classification using ridgelet transform and feed forward neural network. Int. J. Image Graph. Signal Process 2014, 6, 47–53. [Google Scholar] [CrossRef]
  2. Su, B.; Lu, S.; Tan, C.L. Blurred image region detection and classification. In Proceedings of the 19th ACM international conference on Multimedia, Scottsdale, AZ, USA, 28 November–1 December 2011; pp. 1397–1400. [Google Scholar]
  3. Liu, R.; Li, Z.; Jia, J. Image partial blur detection and classification. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar]
  4. Marziliano, P.; Dufaux, F.; Winkler, S.; Ebrahimi, T. A no-reference perceptual blur metric. In Proceedings of the International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002; pp. 57–60. [Google Scholar]
  5. Chung, Y.C.; Wang, J.M.; Bailey, R.R.; Chen, S.W.; Chang, S.L. A non-parametric blur measure based on edge analysis for image processing applications. In Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, Singapore, 1–3 December 2004; pp. 356–360. [Google Scholar]
  6. Zhao, Z.; Yang, H.; Luo, H. Defocus Blur detection via transformer encoder and edge guidance. Appl. Intell. 2022, 52, 14426–14439. [Google Scholar] [CrossRef]
  7. Li, H.; Qian, W.; Cao, J.; Liu, P. Improving defocus blur detection via adaptive supervision prior-tokens. Image Vision Comput. 2023, 140, 104842. [Google Scholar] [CrossRef]
  8. Almustofa, A.N.; Nugraha, Y.; Sulasikin, A.; Bhaswara, I.D.; Kanggrawan, J.I. Exploration of image blur detection methods on globally blur images. In Proceedings of the 2022 10th International Conference on Information and Communication Technology (ICoICT), Bandung, Indonesia, 2–3 August 2022; pp. 275–280. [Google Scholar]
  9. Yu, X.; Xie, W. Single image blind deblurring based on salient edge-structures and elastic-net regularization. J. Math. Imaging Vision. 2020, 62, 1049–1061. [Google Scholar] [CrossRef]
  10. Cao, S.; He, N.; Zhao, S.; Lu, K.; Zhou, X. Single image motion deblurring with reduced ringing effects using variational Bayesian estimation. Signal Process. 2018, 148, 260–271. [Google Scholar] [CrossRef]
  11. Zeng, T.; Diao, C. Single Image Motion Deblurring Based on Modified DenseNet. In Proceedings of the 2nd International Conference on Machine Learning, Big Data and Business Intelligence, Taiyuan, China, 23–25 October 2020; pp. 521–525. [Google Scholar]
  12. Tang, C.; Hou, C.; Song, Z. Defocus map estimation from a single image via spectrum contrast. Opt. Lett. 2013, 38, 1706–1708. [Google Scholar] [CrossRef] [PubMed]
  13. Shao, W.Z.; Ge, Q.; Deng, H.S.; Wei, Z.H.; Li, H.B. A unified optimization perspective to single/multi-observation blur-kernel estimation with applications to camera-shake deblurring and nonparametric blind super-resolution. J. Math. Imaging Vision. 2016, 54, 216–239. [Google Scholar] [CrossRef]
  14. Purohit, K.; Shah, A.B.; Rajagopalan, A.N. Learning based single image blur detection and segmentation. In Proceedings of the 25th IEEE International Conference on Image Processing, Athens, Greece, 7–10 October 2018; pp. 2202–2206. [Google Scholar]
  15. Kovacs, L.; Sziranyi, T. Focus area extraction by blind deconvolution for defining regions of interest. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1080–1085. [Google Scholar] [CrossRef] [PubMed]
  16. Rafiee, G.; Dlay, S.S.; Woo, W.L. Region-of-interest extraction in low depth of field images using ensemble clustering and difference of Gaussian approaches. Pattern Recognit. 2013, 46, 2685–2699. [Google Scholar] [CrossRef]
  17. Datta, R.; Joshi, D.; Li, J.; Wang, J.Z. Studying aesthetics in photographic images using a computational approach. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 288–301. [Google Scholar]
  18. Li, Y.; Shu, X.; Ren, D.; Li, Q.; Zuo, W. Joint learning of motion deblurring and defocus deblurring networks with a real-world dataset. Neurocomputing 2024, 565, 126996. [Google Scholar] [CrossRef]
  19. Keshri, D.; Sriharsha, K.V.; Alphonse, P.J.A. Depth perception in single camera system using focus blur and aperture number. Multimed. Tools Appl. 2023, 3, 595. [Google Scholar] [CrossRef]
  20. Nazir, S.; Vaquero, L.; Mucientes, M.; Brea, V.M.; Coltuc, D. Depth estimation and image restoration by deep learning from defocused images. IEEE Trans. Comput. Imaging 2023; arXiv 2023, arXiv:2302.10730. [Google Scholar]
  21. Kumar, H.; Yadav, A.S.; Gupta, S.; Venkatesh, K.S. Depth map estimation using defocus and motion cues. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 1365–1379. [Google Scholar] [CrossRef]
  22. Anwar, S.; Hayder, Z.; Porikli, F. Deblur and deep depth from single defocus image. Mach. Vision. Appl. 2021, 32, 34. [Google Scholar] [CrossRef]
  23. Hecht, E. Optics, 4th ed.; Addison Wesley: Boston, MA, USA, 2001. [Google Scholar]
  24. Blanchet, G.; Moisan, L.; Rougé, B. Measuring the global phase coherence of an image. In Proceedings of the 15th IEEE International Conference on Image Processing, San Diego, CA, USA, 12–15 October 2008; pp. 1176–1179. [Google Scholar]
  25. Petschnigg, G.; Szeliski, R.; Agrawala, M.; Cohen, M.; Hoppe, H.; Toyama, K. Digital photography with flash and no-flash image pairs. ACM Trans. Graph. (TOG) 2004, 23, 664–672. [Google Scholar] [CrossRef]
  26. Levin, A.; Lischinski, D.; Weiss, Y. A closed-form solution to natural image matting. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 228–242. [Google Scholar] [CrossRef] [PubMed]
  27. Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part. A Syst. Hum. 2009, 40, 185–197. [Google Scholar] [CrossRef]
  28. Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the 13th International Conference on Machine Learning, Bari, Italy, 3–6 July 1996. [Google Scholar]
  29. Platt, J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 1999, 10, 61–74. [Google Scholar]
Figure 1. Scene projection using a thin lens model with object distance (a) equal to d_f, (b) less than d_f, (c) greater than d_f.
Figure 2. Relation between D_CoC and object distance.
Figure 3. Relation between the CoC shown as C and the conjugate CoC shown as C/M, as well as distance regions in image space, around the CoC, and in object space, around the conjugate CoC.
Figure 4. The relation between depth of field, focus distance, and distance to the in-focus plane for the focal length of 200 mm. The green, red, and black lines are calculated for distances to the in-focus plane of 850, 1000, and 1150 mm.
Figure 5. Relations between depth of field and distance to the in-focus plane for certain focus distance values of 850, 1000, and 1150 mm.
Figure 6. Relations between depth of field and distance to the in-focus plane for different focal length values with a distance to the in-focus plane of 1000 mm.
Figure 7. A typical example of a sharp (left) image and blurred (right) image, in the used database.
Figure 8. A typical example of a full depth map of a sharp (left) image and blurred (right) image.
Figure 9. A typical example of a v_1 representation of depth data. The blue and red lines represent the data from the sharp and blurred image, respectively.
Figure 10. A typical example of histogram calculation on v_1 depth data. The blue and red lines represent the data from the sharp and blurred image, respectively.
Table 1. Classification results between sharp and mixed blur, motion blur, and defocused blur images.

Classification between      | TPrate          | TNrate           | FPrate          | FNrate
Sharp and mixed blur        | 44/47 = 0.9362  | 798/827 = 0.9649 | 29/827 = 0.0351 | 3/47 = 0.0638
Sharp and motion blur       | 20/22 = 0.9091  | 839/852 = 0.9836 | 13/852 = 0.0153 | 2/22 = 0.0909
Sharp and defocused blur    | 14/15 = 0.9333  | 854/859 = 0.9942 | 5/859 = 0.0058  | 1/15 = 0.0667
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

