1. Introduction
The reproducing kernel Hilbert space (RKHS, denoted as $\mathcal{H}$) has been widely studied [1,2,3,4,5,6,7]. Its most important property is that every function in an RKHS can be linearly represented by the reproducing kernel. In addition, many studies have analyzed the properties of univariate or bivariate functions in RKHS. Such functions can usually be regarded as signals or images in discrete form, which makes it possible to build optimization models and solve application problems such as image super-resolution and image restoration.
In general, the Hilbert space can be considered as $\mathcal{H} = L^2(\mathcal{X}, \mu)$, where $\mu$ is a probability measure on the subset $\mathcal{X}$ and $\mathcal{H}$ is complete for a class of real-valued functions $f$ with $\|f\|_{\mathcal{H}} < \infty$. Moreover, the reproducing kernel $\kappa(\cdot,\cdot)$ of $\mathcal{H}$ can be defined by two conditions: (1) for any $x \in \mathcal{X}$, the function $\kappa(\cdot, x)$ belongs to $\mathcal{H}$; (2) the function $\kappa$ has the so-called reproducing property, that is, $f(x) = \langle f, \kappa(\cdot, x)\rangle$ for all $f \in \mathcal{H}$, where $\langle \cdot, \cdot \rangle$ represents the associated inner product. By this relation, we obtain the Gram matrix $K = [\kappa(x_i, x_j)]_{i,j=1}^{N}$ by discretizing the reproducing kernel $\kappa$ on the sample points $\{x_i\}_{i=1}^{N}$; thus, considering the bias $b$, it is easy to get the following discrete formulation:
$$g = K\alpha + b, \qquad (1)$$
where $\alpha$ is the coefficient vector we need to estimate and $K$ is a real symmetric matrix.
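As a concrete illustration of (1), the following sketch builds a Gram matrix and evaluates the discrete model; the Gaussian kernel, the grid on $[0,1]$, and all numerical values are illustrative assumptions rather than choices fixed by the paper.

```python
import numpy as np

def gaussian_gram(x, sigma=0.5):
    """Gram matrix K[i, j] = kappa(x_i, x_j) for a Gaussian reproducing kernel."""
    d2 = (x[:, None] - x[None, :]) ** 2        # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))    # real symmetric, positive semidefinite

x = np.linspace(0.0, 1.0, 21)                  # assumed sample grid
K = gaussian_gram(x)

rng = np.random.default_rng(0)
alpha = rng.standard_normal(x.size)            # coefficient vector to be estimated
b = 0.1 * np.ones(x.size)                      # bias term
g = K @ alpha + b                              # discrete formulation (1)
```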
In the real world, the underlying function $g$ will generally be polluted by outliers and Gaussian perturbation, which yields the following ill-posed problem:
$$y = K\alpha + b + u + n, \qquad (2)$$
where $u$ can be viewed as the outliers and $n$ stands for the Gaussian perturbation. Our final goal is to accurately estimate the coefficients $\alpha$ and the bias $b$ from the known observation $y$ and Gram matrix $K$. After obtaining $\alpha$ and $b$, we can calculate the underlying function $g$ by (1). Note that solving problem (2) is quite a challenging task, since the variables $\alpha$, $b$, $u$, and $n$ in (2) are all unknown, which leads to an ill-posed problem.
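To make the degradation in (2) tangible, here is a minimal sketch that synthesizes an observation with a prescribed noise level and outlier ratio; the SNR convention, the outlier magnitude, and the toy signal are assumptions for illustration only.

```python
import numpy as np

def degrade(g, snr_db=20.0, outlier_ratio=0.10, outlier_scale=2.0, seed=1):
    """Return y = g + u + n: sparse outliers u plus Gaussian noise n at a target SNR."""
    rng = np.random.default_rng(seed)
    n = rng.standard_normal(g.size)
    n *= np.linalg.norm(g) / (np.linalg.norm(n) * 10 ** (snr_db / 20.0))  # scale noise to SNR
    u = np.zeros(g.size)
    idx = rng.choice(g.size, size=int(outlier_ratio * g.size), replace=False)
    u[idx] = outlier_scale * rng.standard_normal(idx.size)                # sparse outliers
    return g + u + n, u, n

y, u_true, n_true = degrade(np.sin(np.linspace(0.0, 4.0, 21)))            # toy signal
```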
In [8], Papageorgiou et al. considered that the function $g$ can be linearly represented by the coefficient vector $\alpha$ and a constant $c$ as $g = K\alpha + c\mathbf{1}$, and proposed a kernel regularized orthogonal matching pursuit (KROMP) method to solve the nonconvex problem. However, the KROMP method has two weaknesses: one is that the constant $c$ is not general and flexible; the other is that the convergence of its algorithm is not guaranteed theoretically. Therefore, this paper mainly establishes a nonconvex optimization model for the degraded problem (2) in RKHS, designs an algorithm whose convergence can be guaranteed, and finally shows the effectiveness of the proposed method in some simulation experiments.
Regularized modeling is a promising way to deal with ill-posed problems. Actually, the variable $u$ representing the outliers is generally sparse, which motivates us to formulate a sparsity-based regularization model. In particular, the $\ell_0$-quasi norm, which counts the non-zero elements of a vector, is an ideal metric to depict the sparse property. Therefore, the nonconvex minimization problem for solving the ill-posed problem (2) can be written as follows:
$$\min_{\alpha, b, u}\ \frac{1}{2}\|y - K\alpha - b - u\|_2^2 + \frac{\lambda_1}{2}\|\alpha\|_2^2 + \frac{\lambda_2}{2}\|b\|_2^2 + \lambda_3\|u\|_0, \qquad (3)$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are positive parameters. The first term in (3) is deduced from the Gaussian perturbation $n$ under the framework of maximum a posteriori (MAP) estimation. The second and third terms are regularization terms that depict the underlying priors for $\alpha$ and $b$. The last term is the $\ell_0$-quasi norm that depicts the sparse prior of the outlier $u$. Note that the regularization model (3) is a nonconvex minimization problem due to the nonconvexity of the $\ell_0$ term. In general, the $\ell_0$ term is replaced approximately by other terms (e.g., the convex $\ell_1$ term, or hard thresholding [9,10,11,12]) for simpler computation and convergence guarantees. However, such approximations lose accuracy in depicting sparsity, which may result in unreasonable outcomes.
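As a sketch of how (3) is evaluated (assuming the $\ell_2$-squared form of the two priors as written in (3) above), note that the $\ell_0$-quasi norm is simply a count of non-zero entries:

```python
import numpy as np

def objective3(y, K, alpha, b, u, lam1, lam2, lam3):
    """Nonconvex objective (3): Gaussian fidelity + priors on alpha, b + l0 term on u."""
    fidelity = 0.5 * np.linalg.norm(y - K @ alpha - b - u) ** 2
    priors = 0.5 * lam1 * np.linalg.norm(alpha) ** 2 + 0.5 * lam2 * np.linalg.norm(b) ** 2
    return fidelity + priors + lam3 * np.count_nonzero(u)   # ||u||_0 counts nonzeros
```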
However, nonconvex minimization problems usually raise the following difficulties: (1) Can the designed algorithm effectively solve the minimization model? (2) Can the convergence of the designed algorithm be guaranteed theoretically? (3) Does the initial value affect the convergence of the designed algorithm? Thus, many studies have been devoted to overcoming these weaknesses of nonconvex problems.
Recently, it has been shown that a nonconvex problem can be reformulated as an equivalent minimization problem based on a mathematical program with equilibrium constraints (MPEC), which can be effectively solved by classical algorithms [13,14,15]. For instance, Yuan et al. [14] proposed an equivalent biconvex MPEC formulation of the $\ell_0$-quasi norm for nonconvex minimization problems. Additionally, proximal alternating algorithms have been widely used to solve nonconvex and nonsmooth problems [16,17,18,19,20,21,22,23,24]. In [18], Bolte et al. propose the proximal alternating linearized minimization (PALM) framework to solve nonconvex and nonsmooth minimization problems, and give the convergence analysis of the algorithm.
In this paper, we mainly focus on the above-mentioned difficulties of the nonconvex minimization problem (3) and design an efficient algorithm with a theoretical convergence guarantee. Simple and representative examples are employed to verify the effectiveness of the proposed method. The contributions of this work can be summarized as: (1) a new nonconvex minimization model based on RKHS; (2) a convergence guarantee for the algorithm designed for this nonconvex problem.
The outline of this paper is as follows. Section 2 shows the detailed algorithm for the nonconvex minimization problem (3). In Section 3, the convergence analysis of the given algorithm is presented. Numerical results are reported in Section 4. Finally, conclusions are drawn in Section 5.
2. The Solution for the Nonconvex Minimization Problem
Based on the MPEC lemma for the $\ell_0$-quasi norm (see more details in [14]), the nonconvex minimization problem (3) can be equivalently reformulated as the following model:
$$\min_{\alpha, b, u, v}\ \frac{1}{2}\|y - K\alpha - b - u\|_2^2 + \frac{\lambda_1}{2}\|\alpha\|_2^2 + \frac{\lambda_2}{2}\|b\|_2^2 + \lambda_3\,\mathbf{1}^{T}(\mathbf{1} - v) \quad \text{s.t.}\ v \odot |u| = 0,\ v \in [0,1]^{N}, \qquad (4)$$
where $\odot$ represents point-wise multiplication, and the constraint on $v$ is enforced by the indicator function $\iota_{[0,1]}(v)$ projecting the elements of $v$ into $[0,1]$. The constrained minimization problem (4) can be rewritten as the following unconstrained minimization problem by the penalty strategy:
$$\min_{\alpha, b, u, v}\ H(\alpha, b, u, v) = \frac{1}{2}\|y - K\alpha - b - u\|_2^2 + \frac{\lambda_1}{2}\|\alpha\|_2^2 + \frac{\lambda_2}{2}\|b\|_2^2 + \lambda_3\,\mathbf{1}^{T}(\mathbf{1} - v) + \rho\,\big\|v \odot |u|\big\|_1 + \iota_{[0,1]}(v), \qquad (5)$$
where $\rho > 0$ is the penalty parameter, $\mathbf{1}$ is the all-ones vector, $|u|$ takes absolute values element-wise, and $\iota_{[0,1]}(v) = 0$ if $v \in [0,1]^{N}$ and $+\infty$ otherwise.
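The underlying variational identity from [14] is $\|u\|_0 = \min_{v \in [0,1]^N} \mathbf{1}^{T}(\mathbf{1} - v)$ subject to $v \odot |u| = 0$; a quick numerical sanity check (the optimal $v$ is the indicator of the zero pattern of $u$):

```python
import numpy as np

u = np.array([0.0, -1.3, 0.0, 2.1, 0.0])
v = (u == 0.0).astype(float)                  # optimal v: 1 on zeros of u, 0 elsewhere
assert np.all(v * np.abs(u) == 0.0)           # equilibrium constraint v . |u| = 0 holds
assert np.ones(u.size) @ (1.0 - v) == np.count_nonzero(u)   # 1^T(1 - v) equals ||u||_0
```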
We utilize a proximal-based algorithm to deal effectively with the unconstrained problem (5) by alternately solving for each variable, which leads to the following subproblems:
$$\alpha^{k+1} = \arg\min_{\alpha}\ \big\langle \nabla_{\alpha} H(\alpha^k, b^k, u^k, v^k), \alpha - \alpha^k \big\rangle + \frac{c_1}{2}\|\alpha - \alpha^k\|^2, \qquad (6)$$
$$b^{k+1} = \arg\min_{b}\ \big\langle \nabla_{b} H(\alpha^{k+1}, b^k, u^k, v^k), b - b^k \big\rangle + \frac{c_2}{2}\|b - b^k\|^2, \qquad (7)$$
$$u^{k+1} = \arg\min_{u}\ H(\alpha^{k+1}, b^{k+1}, u, v^k) + \frac{c_3}{2}\|u - u^k\|^2, \qquad (8)$$
$$v^{k+1} = \arg\min_{v}\ H(\alpha^{k+1}, b^{k+1}, u^{k+1}, v) + \frac{c_4}{2}\|v - v^k\|^2, \qquad (9)$$
where the related parameters are all positive, i.e., $c_1, c_2, c_3, c_4 > 0$, and $\nabla_{\alpha} H = (K^{T}K + \lambda_1 I)\alpha + K^{T}(b + u - y)$, $\nabla_{b} H = (1 + \lambda_2) b + K\alpha + u - y$; here $I$ is the identity matrix, and $\|\cdot\|$ represents the Frobenius norm for a matrix and the $\ell_2$-norm for a vector.
In particular, the above subproblems are all convex, and their closed-form solutions can be easily calculated as follows:
$$\alpha^{k+1} = \alpha^k - \frac{1}{c_1}\Big[(K^{T}K + \lambda_1 I)\alpha^k + K^{T}(b^k + u^k - y)\Big], \qquad (10)$$
$$b^{k+1} = b^k - \frac{1}{c_2}\Big[(1 + \lambda_2) b^k + K\alpha^{k+1} + u^k - y\Big], \qquad (11)$$
$$u^{k+1} = \mathrm{shrink}\left(\frac{y - K\alpha^{k+1} - b^{k+1} + c_3 u^k}{1 + c_3},\ \frac{\rho\, v^k}{1 + c_3}\right), \qquad (12)$$
$$v^{k+1} = P_{[0,1]}\left(v^k + \frac{\lambda_3 \mathbf{1} - \rho\,|u^{k+1}|}{c_4}\right), \qquad (13)$$
where $\mathrm{shrink}(x, t) = \mathrm{sign}(x) \odot \max\{|x| - t, 0\}$ is the element-wise soft-thresholding operator and $P_{[0,1]}$ denotes the projection onto $[0,1]^{N}$. We iteratively and alternately update $\alpha$, $b$, $u$, and $v$ according to (10)–(13). The final algorithm for the nonconvex minimization problem (3) is summarized in Algorithm 1.
In Algorithm 1, “MaxIter” denotes the maximum number of iterations and “tol” represents the tolerance on the relative error between adjacent iterations. When the iteration stops, the final underlying function can be estimated by the relation (1).
Algorithm 1: The algorithm to minimize problem (5)

Input: observation $y$, Gram matrix $K$, positive parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, $\rho$, and $c_1$, $c_2$, $c_3$, $c_4$.
Initialize: start with any $\alpha^0$, $b^0$, $u^0$, $v^0$; set $k = 0$.
While $k < \mathrm{MaxIter}$ or $\mathrm{ReErr} > \mathrm{tol}$
(1) Compute $\alpha^{k+1}$ by Equation (10).
(2) Compute $b^{k+1}$ by Equation (11).
(3) Compute $u^{k+1}$ by Equation (12).
(4) Compute $v^{k+1}$ by Equation (13).
(5) Update the penalty parameter $\rho$.
(6) Calculate the relative error $\mathrm{ReErr}$ between adjacent iterations.
(7) $k = k + 1$.
Endwhile
Output: $\alpha$, $b$, $u$, and $v$.
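A minimal Python sketch of one sweep of Algorithm 1, following the closed-form updates (10)–(13); the step-size defaults and helper names are illustrative assumptions rather than the authors' reference implementation.

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal map of the weighted l1 term."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def one_sweep(y, K, alpha, b, u, v, lam1, lam2, lam3, rho,
              c1=10.0, c2=10.0, c3=1.0, c4=1.0):
    """One pass of updates (10)-(13) on the penalized objective (5)."""
    n = y.size
    # (10) proximal gradient step on alpha
    alpha = alpha - ((K.T @ K + lam1 * np.eye(n)) @ alpha + K.T @ (b + u - y)) / c1
    # (11) proximal gradient step on b
    b = b - ((1.0 + lam2) * b + K @ alpha + u - y) / c2
    # (12) proximal step on u: weighted soft-thresholding
    u = soft_threshold((y - K @ alpha - b + c3 * u) / (1.0 + c3), rho * v / (1.0 + c3))
    # (13) proximal step on v: gradient step projected onto [0, 1]
    v = np.clip(v + (lam3 - rho * np.abs(u)) / c4, 0.0, 1.0)
    return alpha, b, u, v
```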
3. Convergence Analysis
For the sake of notational simplicity, we use the uniform expression $\|\cdot\|$ to denote: (1) the Frobenius norm if the argument is a matrix; (2) the $\ell_2$-norm if the argument is a vector. Denote $z = (\alpha, b, u, v)$, with iterates $z^k = (\alpha^k, b^k, u^k, v^k)$, and let the domain of $H$ be $\mathrm{dom}\,H$.
Lemma 1. Let the bounded sequence $\{z^k\}$ be generated by the designed algorithm. Then the sequence sufficiently decreases as follows:
$$H(z^{k+1}) \le H(z^k) - \gamma\,\|z^{k+1} - z^k\|^2, \qquad (14)$$
where $\gamma = \min\{(c_1 - L_1)/2,\ (c_2 - L_2)/2,\ c_3/2,\ c_4/2\}$. Note that $c_1 > L_1$ and $c_2 > L_2$, where $L_1$ is the Lipschitz constant of $\nabla_{\alpha} H$ with respect to $\alpha$, and $L_2$ is that of $\nabla_{b} H$ with respect to $b$.

Proof. Since $H$ is Lipschitz differentiable with respect to the variables $\alpha$ and $b$, respectively, there exist positive constants $L_1$ and $L_2$ which satisfy:
$$H(\alpha^{k+1}, b^k, u^k, v^k) \le H(z^k) + \big\langle \nabla_{\alpha} H(z^k), \alpha^{k+1} - \alpha^k \big\rangle + \frac{L_1}{2}\|\alpha^{k+1} - \alpha^k\|^2, \qquad (15)$$
$$H(\alpha^{k+1}, b^{k+1}, u^k, v^k) \le H(\alpha^{k+1}, b^k, u^k, v^k) + \big\langle \nabla_{b} H(\alpha^{k+1}, b^k, u^k, v^k), b^{k+1} - b^k \big\rangle + \frac{L_2}{2}\|b^{k+1} - b^k\|^2. \qquad (16)$$
$\alpha$-subproblem: Based on the designed algorithm, $\alpha^{k+1}$ is the minimum solution of the $\alpha$-subproblem (6) at the $k$-th iteration; comparing its objective value with that at $\alpha^k$, we have:
$$\big\langle \nabla_{\alpha} H(z^k), \alpha^{k+1} - \alpha^k \big\rangle + \frac{c_1}{2}\|\alpha^{k+1} - \alpha^k\|^2 \le 0. \qquad (17)$$
Combining (15) and (17), we have:
$$H(\alpha^{k+1}, b^k, u^k, v^k) \le H(z^k) - \frac{c_1 - L_1}{2}\|\alpha^{k+1} - \alpha^k\|^2. \qquad (18)$$
$b$-subproblem: Based on the designed algorithm, $b^{k+1}$ is the minimum solution of the $b$-subproblem (7) at the $k$-th iteration, then we have:
$$\big\langle \nabla_{b} H(\alpha^{k+1}, b^k, u^k, v^k), b^{k+1} - b^k \big\rangle + \frac{c_2}{2}\|b^{k+1} - b^k\|^2 \le 0. \qquad (19)$$
Combining (16) and (19), we have:
$$H(\alpha^{k+1}, b^{k+1}, u^k, v^k) \le H(\alpha^{k+1}, b^k, u^k, v^k) - \frac{c_2 - L_2}{2}\|b^{k+1} - b^k\|^2. \qquad (20)$$
$u$-subproblem: Based on the designed algorithm, $u^{k+1}$ is the minimum solution of the $u$-subproblem (8) at the $k$-th iteration, then we have:
$$H(\alpha^{k+1}, b^{k+1}, u^{k+1}, v^k) \le H(\alpha^{k+1}, b^{k+1}, u^k, v^k) - \frac{c_3}{2}\|u^{k+1} - u^k\|^2. \qquad (21)$$
$v$-subproblem: Based on the designed algorithm, $v^{k+1}$ is the minimum solution of the $v$-subproblem (9) at the $k$-th iteration, then we have:
$$H(z^{k+1}) \le H(\alpha^{k+1}, b^{k+1}, u^{k+1}, v^k) - \frac{c_4}{2}\|v^{k+1} - v^k\|^2. \qquad (22)$$
According to (18) and (20)–(22), we have:
$$H(z^{k+1}) \le H(z^k) - \gamma\,\|z^{k+1} - z^k\|^2, \qquad (23)$$
where $\gamma = \min\{(c_1 - L_1)/2,\ (c_2 - L_2)/2,\ c_3/2,\ c_4/2\}$. Thus, the consequence of Lemma 1 obviously holds. □
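The sufficient decrease property (14) is easy to monitor numerically; the following sketch assumes the objective values $H(z^k)$ and the step norms $\|z^{k+1} - z^k\|$ have been recorded during the iterations.

```python
def check_sufficient_decrease(H_values, step_norms, gamma):
    """Check H(z^{k+1}) <= H(z^k) - gamma * ||z^{k+1} - z^k||^2 along recorded iterates.

    H_values has one more entry than step_norms: H_values[k] pairs with step_norms[k].
    """
    return all(H_values[k + 1] <= H_values[k] - gamma * step_norms[k] ** 2
               for k in range(len(step_norms)))
```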
Lemma 2. Let the bounded sequence $\{z^k\}$ be generated by the designed algorithm. Then:
$$\mathrm{dist}\big(0, \partial H(z^{k+1})\big) \le \tau\,\|z^{k+1} - z^k\|, \qquad (24)$$
where $\tau > 0$ is a constant determined by the bound of the sequence $\{z^k\}$, and $\partial$ is the subdifferential operator.

Proof. Obviously, $\alpha^{k+1}$ satisfies the first-order optimality condition of the $\alpha$-subproblem, because $\alpha^{k+1}$ is the $k$-th solution of the $\alpha$-subproblem. Arguing similarly for the other variables, we have:
$$\begin{cases} \nabla_{\alpha} H(z^k) + c_1(\alpha^{k+1} - \alpha^k) = 0, \\ \nabla_{b} H(\alpha^{k+1}, b^k, u^k, v^k) + c_2(b^{k+1} - b^k) = 0, \\ 0 \in \partial_u H(\alpha^{k+1}, b^{k+1}, u^{k+1}, v^k) + c_3(u^{k+1} - u^k), \\ 0 \in \partial_v H(z^{k+1}) + c_4(v^{k+1} - v^k). \end{cases} \qquad (25)$$
Since the smooth part of the objective function $H$ is continuously differentiable with respect to each variable, and the sequence $\{z^k\}$ is bounded, there exists $M > 0$ such that:
$$\big\| \nabla H(z^{k+1}) - \nabla H(z^k) \big\| \le M\,\|z^{k+1} - z^k\|. \qquad (26)$$
Then, combining (25) and (26), we have:
$$\mathrm{dist}\big(0, \partial H(z^{k+1})\big) \le \tau\,\|z^{k+1} - z^k\|, \qquad (27)$$
where $\tau = M + \max\{c_1, c_2, c_3, c_4\}$. Thus, Equation (24) holds. □
Lemma 3. Let the sequence $\{z^k\}$ be generated by the designed algorithm and the initial variable $z^0$ be bounded. Then the sequence $\{z^k\}$ is bounded.

Proof. Obviously, the continuous function $H$ is proper and coercive, since $H(z) \to +\infty$ if and only if $\|z\| \to +\infty$. By Lemma 1, $H(z^k) \le H(z^0) < +\infty$; thus, the sequence $\{z^k\}$ generated by the designed algorithm is bounded. □
Lemma 4. The function $H$ in Equation (5) is a Kurdyka–Łojasiewicz (KŁ) function (the definition of a KŁ function and some examples can be found in [18]).

Proof. According to the definition and the examples of KŁ functions in [18,25], the component functions of $H$ are polynomial or semi-algebraic (the quadratic terms are real analytic, and the weighted $\ell_1$ term and the indicator of $[0,1]^{N}$ are semi-algebraic), all of which are KŁ functions. Thus, $H$ is a KŁ function. □
Theorem 1. Let the sequence $\{z^k\}$ generated by the designed algorithm be bounded. Then the sequence $\{z^k\}$ converges to a critical point $z^*$ of $H$.

Proof. Since the sequence $\{z^k\}$ is bounded, there must exist a subsequence $\{z^{k_j}\}$ which converges to a point $z^*$. Since the function $H$ is continuous, the sequence $\{H(z^{k_j})\}$ converges to $H(z^*)$, i.e., $\lim_{j \to \infty} H(z^{k_j}) = H(z^*)$.

According to Lemmas 1–3, the sequence $\{H(z^k)\}$ is nonincreasing and bounded, hence also convergent. Thus, the sequence $\{H(z^k)\}$ and the subsequence $\{H(z^{k_j})\}$ converge to the same function value as follows:
$$\lim_{k \to \infty} H(z^k) = H(z^*).$$
If there is an index $k_s$ such that $z^{k_s+1} = z^{k_s}$, then the sequence has reached a stationary point and, based on Lemma 1, the corresponding function value also no longer changes, i.e., $H(z^{k_s+1}) = H(z^{k_s})$. Thus, $z^{k_s}$ is a critical point of $H$, and the conclusion of Theorem 1 is established.

The next part will prove that Theorem 1 still holds when such an index does not exist.
Based on Lemma 1, it follows that for any $k$, we have:
$$H(z^{k+1}) \le H(z^k) - \gamma\,\|z^{k+1} - z^k\|^2 \le H(z^k). \qquad (28)$$
From (28), since $\{H(z^k)\}$ decreases monotonically to $H(z^*)$, for any $\eta > 0$ there exists $k_0$ such that the following inequality holds when $k > k_0$:
$$H(z^*) < H(z^k) < H(z^*) + \eta. \qquad (29)$$
Denote the set of limit points of the sequence $\{z^k\}$ as $\Omega$, and let $\mathrm{dist}(z, \Omega)$ express the minimum distance between a point $z$ and the set $\Omega$, i.e.,
$$\mathrm{dist}(z, \Omega) = \min_{\hat{z} \in \Omega} \|z - \hat{z}\|. \qquad (30)$$
Based on Lemma 2 and the fact that $\|z^{k+1} - z^k\| \to 0$ as $k \to \infty$, we have $\mathrm{dist}(z^k, \Omega) \to 0$. It implies that for any $\varepsilon > 0$, there exists a positive index $k_1$ such that, when $k > k_1$:
$$\mathrm{dist}(z^k, \Omega) < \varepsilon. \qquad (31)$$
Set $k_2 = \max\{k_0, k_1\}$; then, for each $k > k_2$, the iterate $z^k$ belongs to the set:
$$\big\{ z : \mathrm{dist}(z, \Omega) < \varepsilon \big\} \cap \big\{ z : H(z^*) < H(z) < H(z^*) + \eta \big\}. \qquad (32)$$
From Lemma 4, $H$ is a KŁ function on its domain $\mathrm{dom}\,H$. Thus, there is a concave function $\varphi$ such that:
$$\varphi'\big(H(z^k) - H(z^*)\big) \cdot \mathrm{dist}\big(0, \partial H(z^k)\big) \ge 1. \qquad (33)$$
Moreover, based on the concavity of $\varphi$, it holds that:
$$\varphi\big(H(z^k) - H(z^*)\big) - \varphi\big(H(z^{k+1}) - H(z^*)\big) \ge \varphi'\big(H(z^k) - H(z^*)\big)\big(H(z^k) - H(z^{k+1})\big). \qquad (34)$$
Denote $\Delta_p = \varphi\big(H(z^p) - H(z^*)\big)$ for all nonnegative integers $p$, with $\Delta_p \ge \Delta_{p+1} \ge 0$. Combining Equation (33) with Lemmas 1 and 2, we have:
$$\Delta_p - \Delta_{p+1} \ge \varphi'\big(H(z^p) - H(z^*)\big)\big(H(z^p) - H(z^{p+1})\big) \ge \frac{\gamma\,\|z^{p+1} - z^p\|^2}{\tau\,\|z^p - z^{p-1}\|}, \qquad (35)$$
which can be rewritten as follows:
$$\|z^{p+1} - z^p\|^2 \le \frac{\tau}{\gamma}\,(\Delta_p - \Delta_{p+1})\,\|z^p - z^{p-1}\|.$$
It is a well-known inequality that $2\sqrt{ab} \le a + b$ for any $a, b \ge 0$; thus, we have:
$$2\,\|z^{p+1} - z^p\| \le \|z^p - z^{p-1}\| + \frac{\tau}{\gamma}\,(\Delta_p - \Delta_{p+1}). \qquad (36)$$
Taking the summation of (36) for $p = k_2 + 1, \ldots, K$, we obtain the following inequality:
$$2\sum_{p=k_2+1}^{K} \|z^{p+1} - z^p\| \le \sum_{p=k_2+1}^{K} \|z^p - z^{p-1}\| + \frac{\tau}{\gamma}\,(\Delta_{k_2+1} - \Delta_{K+1}), \qquad (37)$$
that is:
$$\sum_{p=k_2+1}^{K} \|z^{p+1} - z^p\| \le \|z^{k_2+1} - z^{k_2}\| + \frac{\tau}{\gamma}\,(\Delta_{k_2+1} - \Delta_{K+1}). \qquad (38)$$
According to the definition of $\Delta_p$, we have $\Delta_{K+1} \ge 0$. Thus,
$$\sum_{p=k_2+1}^{\infty} \|z^{p+1} - z^p\| \le \|z^{k_2+1} - z^{k_2}\| + \frac{\tau}{\gamma}\,\Delta_{k_2+1} < +\infty. \qquad (39)$$
Thus, $\{z^k\}$ is a Cauchy sequence and has a finite length. By the completeness of the Hilbert space, a Cauchy sequence is certainly a convergent sequence. Therefore, the sequence $\{z^k\}$ generated by the designed algorithm converges to a critical point $z^*$. Moreover, the convergence of the sequence generated by the designed algorithm can be guaranteed for any initial value. □
4. Numerical Results
In this section, we conduct some simple simulation examples to show the effectiveness of the proposed method. We choose $f$ as the ground-truth function (with discrete form $g$) and add Gaussian noise and sparse outliers to generate the observation. The proposed method is compared with the kernel-based regression using orthogonal matching pursuit (KROMP) method [8], whose parameters are selected according to the ranges mentioned in the literature. The parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, $\rho$, and $c_1$–$c_4$ of the proposed method are set empirically in the experiments; it should be noted that better visual and numerical results could be obtained by fine-tuning the parameters more carefully.
The relative error (ReErr) is adopted for quantitative evaluation; it is a commonly used index to measure the quality of restoration and is defined as:
$$\mathrm{ReErr} = \frac{\|\hat{g} - g\|}{\|g\|},$$
where $\hat{g}$ is the restoration result estimated by each method. Experiments are implemented in MATLAB (R2016a) on a desktop with 16 GB RAM and an Intel(R) Core(TM) i5-4590 CPU @ 3.30 GHz.
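In code, ReErr is a one-line computation (the norm is Frobenius for matrix-shaped data, which is what `np.linalg.norm` returns by default for 2-D arrays):

```python
import numpy as np

def relative_error(g_hat, g_true):
    """ReErr = ||g_hat - g_true|| / ||g_true||."""
    return np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true)
```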
Example 1. The bivariate continuous function $f$ is given on a two-dimensional domain in which each dimension takes 21 discrete points. After the discretization of $f$, 20 dB Gaussian noise and 10% outlier noise were added to obtain the final degraded data. To show the experimental results of the different examples, we display the ground-truth data, the degraded data polluted by noise and outliers, and the restored outcomes calculated by the proposed method and the KROMP method, respectively.
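The bivariate test functions are discretized on a regular grid before degradation; the surface below is only an illustrative stand-in for the paper's test function $f$, using 21 points per dimension as in Example 1:

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 21)                      # 21 discrete points per dimension
X, Y = np.meshgrid(t, t)
F = np.exp(-(X ** 2 + Y ** 2)) * np.cos(np.pi * X)  # illustrative bivariate surface
g = F.ravel()                                        # vectorized ground truth, ready to degrade
```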
Example 2. The bivariate continuous function $f$ is given on a two-dimensional domain in which each dimension takes 31 discrete points. After the discretization of $f$, 10 dB Gaussian noise and 10% outlier noise were added to obtain the final degraded data.

From Figure 1 and Figure 2, although the shape of the ground truth in Example 2 is similar to that of Example 1 (see Figure 1a and Figure 2a), the degree of degradation in Example 2 is in fact much greater than in Example 1, since both the function values and the degree of noise pollution differ (see Figure 1b and Figure 2b). The proposed method produces clearly restored outcomes and can effectively recover the original data (see Figure 1c and Figure 2c); however, the restored outcomes of the KROMP method still show obvious noise residual in Figure 1d and Figure 2d. This also demonstrates the effectiveness of the proposed method.
Example 3. The bivariate continuous function $f$ is given on a two-dimensional domain in which each dimension takes 21 discrete points. After the discretization of $f$, 10 dB Gaussian noise and 5% outlier noise were added to obtain the final degraded data.

It can be seen that, even in the case of heavy external pollution, such as Example 3 in Figure 3b, the proposed method obtains more accurate recovered data (Figure 3c) than the KROMP method (Figure 3d). In addition, the relative error (ReErr) results for Examples 1 to 3 are shown in Table 1, with the better results marked in bold. It is obvious that the proposed method achieves smaller relative errors than the KROMP method, which verifies the effectiveness of the proposed method.