Next Article in Journal
A Novel Hierarchical Secret Image Sharing Scheme with Multi-Group Joint Management
Next Article in Special Issue
Strong Convergence of Modified Inertial Mann Algorithms for Nonexpansive Mappings
Previous Article in Journal
Synchronization of Butterfly Fractional Order Chaotic System
Previous Article in Special Issue
On Null-Continuity of Monotone Measures
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

A Hybrid Forward–Backward Algorithm and Its Optimization Application

1
School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
2
Department of Mathematics, Hangzhou Normal University, Hangzhou 31121, China
3
Research Center for Interneural Computing, China Medical University Hospital, Taichung 40447, Taiwan
*
Author to whom correspondence should be addressed.
Mathematics 2020, 8(3), 447; https://doi.org/10.3390/math8030447
Submission received: 25 February 2020 / Revised: 17 March 2020 / Accepted: 18 March 2020 / Published: 19 March 2020
(This article belongs to the Special Issue Nonlinear Analysis and Optimization)

Abstract

:
In this paper, we study a hybrid forward–backward algorithm for sparse reconstruction. Our algorithm involves descent, splitting and inertial ideas. Under suitable conditions on the algorithm parameters, we establish a strong convergence solution theorem in the framework of Hilbert spaces. Numerical experiments are also provided to illustrate the application in the field of signal processing.

1. Introduction

Let H be a real Hilbert space. · , · denotes the associated scalar product and · stands for the induced norm. Recall that a set-valued operator G : H 2 H is said to be maximally monotone iff x x , y y 0 , y G ( x ) , y G ( x ) , and its graph, denoted by gph G : = { ( x , y ) H × H | y G ( x ) } , is not properly contained in the graph of any other monotone operator on H.
A fundamental and classical problem is to find a zero of a maximally monotone operator G in a real Hilbert space H, namely
find x H such that 0 G ( x ) .
This problem includes, as special cases, bifunction equilibrium problems, convex-concave saddle-point problems, minimization problems, non-smooth variational inequalities, etc. Due to its diverse applications in economics, medicine, and engineering, the techniques and methods for solving (1) have received much attention in the optimization community, see [1,2,3,4]. Indeed, the interdisciplinary nature of Problem (1) is evident from the viewpoint of algorithmic developments; see, for instance, [5,6,7,8] and the references therein. Among them, a celebrated method for solving (1) is the following proximal point algorithm. It can be traced back to the early results obtained in [6,7,8,9]. Given the current iterate p n , calculate the next iterate p n + 1 via
p n + 1 = ( I d + γ n G ) 1 p n , n N ,
where I d stands for the identity operator and γ n is a positive real number. The operator ( I d + γ n G ) 1 is the so-called resolvent operator, which was introduced by Moreau in [8]. In the context of algorithms, the resolvent operator is often referred as the backward operator. In [7], Rockefeller studied the operator and proved that the sequence generated by algorithm (2) is weakly convergent via the inexact evaluation of the resolvent operator in the framework of infinite-dimensional real Hilbert spaces. However, the evaluation error has to satisfy a summability restriction. Indeed, this restriction essentially implies that the resolvent operator is computed with the increasing accuracy. In fact, due to the computation of the resolvent operator is often hard to control in practice situation, this is still somewhat limiting. Quite often, computing the resolvent of a maximally monotone operator, which is an inverse problem, is not easy in many cases. Thus, this limits the practicability of the proximal point algorithm in its plain form. Aiming at this, a favorable situation occurs when the operator G can be written as the sum of two operators. Precisely, let us define G = A + B such that the resolvent ( I d + α A ) 1 (implicit, backward step), and the evaluation of B (explicit, forward step) are much easier to compute than the full resolvent ( I d + α G ) 1 . In so doing, we only consider the following inclusion problem: find a zero of an additively structured operator A + B , acting on space H, namely
find x H such that 0 A x + B x ,
where B is a smooth operator for which we can use direct evaluations, and A is a proxfriendly operator for which we can compute the resolvent. For solving (3), Lions and Mercier [10] proposed the forward–backward splitting method. It is based on the recursive application of a forward step with respect to B, followed by a backward step with respect to A. For any initial data p 0 H ,
p n + 1 = ( I d + γ n A ) 1 ( p n γ n B p n ) , n N ,
where I d stands for the identity operator and γ n > 0 . Basically, the operator A : H 2 H is maximally monotone, and the operator B is κ -cocoercive (i.e., κ > 0 such that B x B y , x y ρ B x B y 2 , x , y H ) and ι -Lipschitz continuous (i.e., ι > 0 such that B x B y ι x y , x , y H ). It was proven that the generated sequence ( p n ) n 0 converges weakly to a solution of (3). Of course, the problem decomposition is not the only consideration, the convergence rate is another. Accelerating first-order method is a subject of active research, which has been extensively studied. Since Polyak [6] introduced the so-called heavy ball method for minimizing a smooth convex function, much has been done on the development of first-order accelerated methods; see [11,12,13,14,15]. The inertial nature of the first-order accelerated method can be exploited in numerical computations to accelerate the trajectories and speed up the rate of convergence. In the context of algorithms, it is often referred as the inertial extrapolation, which involves two iterative steps and the second iterative step is defined based on the previous two iterates. In [16], Alvarez and Attouch employed the first-order accelerated method to study an initial proximal point algorithm for solving the problem of finding zero of a maximally monotone operator. This iteration can be written as the following form: for any initial data p 0 , p 1 H ,
p n + 1 = ( I d + λ n G ) 1 ( p n + α n ( p n p n 1 ) ) ,
where I d stands for the identity operator, G is a maximally monotone operator, the parameters ( λ n ) n 0 ( 0 , ) , ( α n ) n 0 [ 0 , 1 ] , n = 1 α n p n p n 1 2 < . It was showed that the iterative sequence ( p n ) n 0 generated by (4) converges to a solution of inclusion Problem (1) weakly. An alternative modification to the inertial forward–backward splitting algorithm or its modifications is the following algorithm proposed by Lorenz and Pock [17]. The algorithm weakly converges to a zero of the sum of two maximally monotone operators, with one of two operators being Lipschitz continuous and single valued. Starting with any data p 0 , p 1 H , we define a sequence ( p k ) k 0 as
w n = p n + α n ( p n p n 1 ) , p n + 1 = ( M + γ n A ) 1 ( M γ n B ) w n ,
where M is a positive definite and linear self-adjoint operator that can be used as a preconditioner for the scheme, γ n is a step-size parameter and α n [ 0 , 1 ) is an extrapolation factor. The algorithm keeps the weak convergence property of the iterates. To motivate the so-called the inertial forward–backward splitting algorithm, we consider the following differential equation
p ¨ ( t ) + ι ( t ) p ˙ ( t ) + M A , B , γ ( p ( t ) ) = 0 ,
where A : H H and B : H H are operators, M A , B , γ : H H is the operator defined by
M A , B , γ ( p ) = 1 γ ( I d + γ A ) 1 ( p γ B ( p ) ) ) .
It comes naturally by discretizing the continuous dynamics (5) explicitly with a time step h n > 0 . Set t n = i = 1 n h i , ι n = ι ( t n ) , γ n = γ ( t n ) and p n = p ( t n ) . An explicit finite-difference scheme for (5) with centered second-order variation gives that
1 h n 2 ( p n + 1 2 p n + p n 1 ) + ι n h n ( p n p n 1 ) + M A , B , γ n ( w n ) = 0 ,
where w n is a point in the line passing through p n 1 and p n (we have some flexibility for the choice of w n ). The above equality (6) can be rewritten as
p n + 1 = 1 h n 2 γ n ( p n + ( 1 ι n h n ) ( p n p n 1 ) ) + h n 2 γ n ( I + γ n A ) 1 ( y n γ n B ( w n ) ) .
Set γ n = h n 2 and α n = 1 ι n h n in (7). With the aid of the classical Nesterov extrapolation choice for w n , continuous dynamics (5) leads to a special case of the algorithm as
w n = p n + α n ( p n p n 1 ) , p n + 1 = ( I + γ n A ) 1 ( w n γ n B ( w n ) ) ,
where γ n is a step-size parameter and α n is an extrapolation factor, the extrapolation term α n ( p n p n 1 ) is intended to speed up the rate of convergence. In so doing, this dynamical approach leads to a special case of the forward–backward algorithm of inertial type.
Consider the following monotone variational inequality problem (VIP, in short), which consists of finding z Ω such that
S ( z ) , x z 0 , x Ω ,
where S : Ω H is a monotone operator. We denote the solution set of the variational inequality problem by V I ( Ω , S ) .
Remark 1.
It is known that x solves the V I ( Ω , S ) iff x is an equilibrium point of the following dynamical system, i.e.,
x = P r o j Ω ( x μ S x ) , μ > 0 ,
where P r o j Ω is the nearest point (or metric) projection from H onto Ω.
Variational inequality problems, which serve as a powerful mathematical model, unify several important concepts in applied mathematics, such as systems of nonlinear equations, complementarity problems, and equilibrium problems under a general and unified framework [18,19,20]. Recently, spotlight has been shed on developing efficient and implementable iterative schemes for solving monotone variational inequality problems, see [21,22,23,24]. A significant body of the work on iteration methods for VIPs has accumulated in the literature recently. In 2001, Yamada [25] investigated a so-called hybrid steepest descent method. Given the current iterate p n , calculate the next iterate p n + 1 via
p n + 1 = ( I α χ n S ) T p n , n 0 ,
where the operator T : Ω H is nonexpansive, the operator S is ι -Lipschitz continuous for some ι 0 and κ -strongly monotone (i.e., κ > 0 such that S x S y , x y ρ x y 2 , x , y H ), while the parameters α ( 0 , 2 κ / ι 2 ) , ( χ n ) n 0 ( 0 , 1 ] , lim n χ n = 0 and n = 1 χ n = . In this paper, the set Ω can be regarded as the solution set of the inclusion problem. In continuous case, Attouch and Mainǵe [26] developed a second-order autonomous system with a Tikhonov-like regularizing term F , which reads
p ¨ ( t ) + α p ˙ ( t ) + ϕ ( p ( t ) ) + B ( p ( t ) ) + β ( t ) F ( p ( t ) ) = 0 , p ( 0 ) = p 0 , p ˙ ( 0 ) = q 0 ,
where α > 0 and ( p 0 , q 0 ) H 2 is an arbitrarily given initial data. With the assumptions: (i) F : H R is a convex, differentiable operator with F strongly monotone and Lipschitz continuous; (ii) ϕ : H R is a convex, differentiable operator with ϕ : H H Lipschitz continuous; (iii) B : H H is a maximally monotone and cocoercive operator; (iv) β : [ 0 , + ) ( 0 , + ) is a positive and decreasing function of class C 1 such that 0 + β ( t ) d t = + and β ( t ) 0 as t + , with β ˙ Lipschitz continuous and bounded, they proved that each trajectory of (10) strongly converges to p * as t , which solves the following variational inequality problem:
find p * ( B + ϕ ) 1 ( 0 ) such that F ( p * ) , q p * 0 , q ( B + ϕ ) 1 ( 0 ) .
In the spirit of the splitting forward–backward method and the hybrid steepest descent method, we present an iterative scheme as a new strategy, parallel to that of the autonomous system (10). We analyze the convergence with the underlying operator B cocoercive and the extension from condition (ii) to the general maximally monotone case, which is considered in Section 3. From this perspective, our study is the natural extension of the convergence results obtained by Attouch and Mainǵe [26] in the case of continuous dynamical systems.

2. Preliminaries

Lemma 1
([25]). Suppose that W : H R is Fr e ´ chet differentiable with W : H H being κ-strongly monotone and ι-Lipschitz continuous. Define S : = I d χ α W , where χ [ 0 , 1 ] and α ( 0 , 2 κ / ι 2 ) . Then S ( x ) S ( y ) ( 1 χ ϑ ) x y ( x , y H ) , where ϑ : = 1 1 α ( 2 κ α ι 2 ) ( 0 , 1 ] .
Lemma 2
([27]). If B : H H is a ρ-cocoercive operator, then
(i) 
B is a 1 ρ -Lipschitz continuous and monotone operator;
(ii) 
if ν is any constant in ( 0 , 2 ρ ] , then I d ν B is nonexpansive, where I d stands for the identity operator on H.
Lemma 3.
Let H be a Hilbert space, A : H 2 H be a maximally monotone operator and B : E E be a κ-cocoercive on H. Then
(i) 
( I d + s A ) 1 ( I d s B ) x x 2 ( I d + t A ) 1 ( I d t B ) x x , x H ,
where t and s are any positive real numbers with t s .
(ii) 
( A + B ) 1 ( 0 ) = F i x ( ( I d + λ A ) 1 ( I d λ B ) ) , λ > 0 .
Proof. 
Since A is maximally monotone, one has
( I d + s A ) 1 ( I d s B ) x ( I d + t A ) 1 ( I d t B ) x , x ( I d + s A ) 1 ( I d s B ) x s x ( I d + t A ) 1 ( I d t B ) x t 0 .
 □
Lemma 4
([28]). Let S be a nonexpansive mapping defined on a closed convex subset C of a Hilbert space H. Then I d S is demi-closed, i.e., whenever ( x n ) n 0 is a sequence in C weakly converging to some x C and the sequence ( ( I d S ) x n ) n 0 strongly converges to some y H , it follows that y = ( I d S ) x .
Lemma 5
([29]). Let ( μ n ) n 0 be a sequence of nonnegative real numbers such that
μ n + 1 ( 1 λ n ) μ n + λ n ν n + γ n ,
where ( λ n ) n 0 ( 0 , 1 ) and ( ν n ) n 0 , ( γ n ) n 0 satisfy the following conditions (i) lim n λ n = 0 , n = 0 λ n = ; (ii) lim sup n ν n 0 ; (iii) γ n 0 , n = 0 γ n < . Then lim n u n = 0 .
Lemma 6
([30]). Let ( ζ n ) n 0 be a sequence of nonnegative real numbers such that there exists a subsequence ( ζ n i ) of ( ζ n ) n 0 such that ζ n i + 1 > ζ n i for all i N . Then there exists a nondecreasing sequence ( m j ) of N such that lim j m j = and the following properties are satisfied by all (sufficiently large) number of j N , ζ m j + 1 ζ m j , ζ m j + 1 ζ k . In fact, m j is the largest number n in the set { 1 , 2 , , j } such that ζ n + 1 ζ n .

3. Main Results

Throughout this section, we make the following standing assumptions
Assumption 1.
(a)  I d stands for the identity operator. The operator S : H H is supposed to be κ-strongly monotone and ι-Lipschitz continuous for some κ , ι > 0 ;
(b) Let A : H 2 H be a maximally monotone operator, and B : H H be a ρ-cocoercive operator for some ρ > 0 , with the solution set Ω = ( A + B ) 1 ( 0 ) .
Our Algorithm 1 is formally designed as follows.
Algorithm 1: The hybrid forward–backward algorithm
Mathematics 08 00447 i001
We make the following assumption with respect to the algorithm parameters.
Assumption 2.
( C 1 ) 0 < lim inf n ν n lim sup n ν n < 2 ρ ; (C 2 ) ( η n ) n 0 ( 0 , 1 ) , lim n η n = 0 , η n + 1 = O ( η n ) , n = 0 η n = ; (C 3 ) lim n μ n η n p n p n 1 = 0 ; (C 4 ) δ ( 0 , 2 κ / ι 2 ) .
Remark 2.
Please note that the condition (C 3 ) of Assumption 2 can be easily implemented since the value of p n p n 1 is known before choosing μ n . Indeed, the parameter μ n can be chosen such that
μ n = ω , i f p n = p n 1 ; μ n = τ n p n p n 1 , i f   o t h e r w i s e ,
where 0 ω and ( τ n ) n 0 is a positive sequence such that τ n = ( η n ) .
Now we are in a position to state and prove the main result of this section.
Theorem 1.
Suppose that Assumptions 1, 2 hold. Then for any initial data p 0 , p 1 H , the weak sequential cluster point of sequences ( p n ) n 0 , ( w n ) n 0 and ( q n ) n 0 generated by Algorithm 1 belongs to the solution set of Problem 3. In addition, the three sequences converge strongly to the unique element of V I ( Ω , S ) .
Proof. 
First, we show that I δ R is a contraction operator. According to Lemma 1, we have
( I d δ S ) x ( I d δ S ) y 2 = x y 2 + δ 2 S x S y 2 2 δ x y , S x S y x y 2 + δ 2 ι 2 x y 2 2 δ κ x y 2 = ( 1 α ) 2 x y 2 ,
where α = 1 2 δ ( 2 κ δ ι 2 ) . Thus, we find that I d δ S is a contraction operator with the constant 1 α . In light of the nonexpansivity of P r o j Ω , we further obtain that P r o j Ω ( I d δ S ) is a contraction operator. From the Banach contraction principle, there exists a unique point a Λ such that a = P r o j Ω ( I d δ S ) a .
Now, it remains to prove that ( p n ) n 0 is bounded. Since B : H H is ρ -cocoercive and ( I d + ν n A ) 1 is firmly nonexpansive, one concludes from Lemma 2 that I d ν n B is nonexpansive. Let z Λ be arbitrarily chosen. Invoking Assumption 2 (C 1 ), one infers that
q n z 2 =   ( I d + ν n A ) 1 ( I d ν n B ) w n ( I d + ν n A ) 1 ( I d ν n B ) z 2   w n z + ν n B z ν n B w n 2   w n z 2 ν n ( 2 ρ ν n ) B z B w n 2   w n z 2 ,
which yields
ν n ( 2 ρ ν n ) B z B w n 2 w n z 2 q n z 2 ,
and
q n z 2 w n z 2 .
From the definition of w n , one reaches
w n z p n z + μ n p n p n 1 .
Invoking Assumption 2 (C 3 ), there exists a positive constant M 1 R such that
μ n η n p n p n 1 M 1 , n 0 .
From the definition of p n , which together with (14)–(16), one obtains
p n + 1 z =   ( I d δ η n S ) q n ( I d δ η n S ) z δ η n S z   ( 1 ξ η n ) w n z + δ η n S z   ( 1 ξ η n ) ( p n z + η n M 1 ) + δ η n S z   max p n z , 1 ξ ( M 1 + δ S z )   max p 1 z , 1 ξ ( M 1 + δ S z ) ,
where ξ = 1 1 δ ( 2 κ δ ι 2 ) ( 0 , 1 ] , due to Assumption 2 ( C 4 ) . This implies that sequence ( p n ) n 0 is bounded. At the same time, by putting together (14) and (15), one concludes that ( q n ) n 0 and ( w n ) n 0 are bounded. Once again, using the definition of w n , one concludes that
w n z 2   p n z 2 + μ n p n p n 1 ( μ n p n p n 1 + 2 p n z )   p n z 2 + μ n η n p n p n 1 M 2 ,
where
M 2 = sup n 0 η n ( μ n p n p n 1 + 2 p n z ) < .
By combining (12) with (18), one immediately concludes that
q n z 2   p n z 2 + μ n η n p n p n 1 M 2 ν n ( 2 κ ν n ) B z B w n 2 .
Invoking (20), which together with the definition of p n , one deduces that
p n + 1 z 2 = ( I d ξ η n ) q n z 2 2 δ η n p n + 1 z , S z p n z 2 + μ n η n p n p n 1 M 2 ν n ( 2 κ ν n ) B z B w n 2 + η n M 3 ,
where M 3 = sup n 0 2 δ z p n + 1 , S z < , due to the boundedness of ( p n ) n 0 . Let us rewrite (21) as
ν n ( 2 κ ν n ) B z B w n 2 p n z 2 p n + 1 z 2 + μ n η n p n p n 1 M 2 + η n M 3 .
By using the firmly nonexpansive property of ( I d + ν n A ) 1 , one arrives at
q n z 2 q n z , ( I d ν n B ) w n ( I d ν n B ) z = 1 2 q n z 2 + 1 2 ( I d ν n B ) w n ( I d ν n B ) z 2 1 2 ( q n z ) ( I d ν n B ) w n + ( I d ν n B ) z 2 1 2 q n z 2 + 1 2 w n z 2 1 2 q n w n 2 ν n 2 2 B w n B z 2 + ν n q n w n B w n B z ,
which can be equivalently rewritten as
q n z 2 w n z 2 q n w n 2 ν n 2 B w n B z 2 + 2 ν n q n w n B w n B z .
Returning to (14), (18) and (23), one concludes
p n + 1 z 2 = η n ( I d δ S ) q n + ( 1 η n ) q n z 2 = η n ( I d δ S ) q n z 2 + ( 1 η n ) q n z 2 η n ( 1 η n ) δ S q n 2 η n ( I d δ S ) q n z 2 + q n z 2 η n ( I d δ S ) q n z 2 + p n z 2 + μ n η n p n p n 1 M 2 q n w n 2 ν n 2 2 B w n B z 2 + 2 ν n q n w n B w n B z ,
that is,
q n w n 2   η n ( I d δ S ) q n z 2 + μ n η n p n p n 1 M 2 + 2 ν n q n w n B w n B z + p n z 2 p n + 1 z 2 .
Next, we show that ( p n z 2 ) n 0 converges to zero by considering two possible cases on the sequence ( p n z 2 ) n 0 .
Case 1. There exists N N such that p n + 1 z 2 p n z 2 , n N . Recalling that p n z 2 is lower bounded, one deduces that lim n p n z 2 exists. Using Assumption 2 (C 2 ), and letting n tend to infinity in (22), one finds that
lim n ν n B z B w n = 0 .
Taking account of Assumption 2 (C 2 ), (C 3 ), (25), and letting n tend to infinity in (24), one concludes that
lim n q n w n 2 = 0 .
It follows from Assumption 2 (C 2 ), (C 3 ) the definitions of w n and p n + 1 that
lim n w n p n = lim n μ n p n p n 1 = 0 ,
and
lim n p n + 1 q n = lim n δ η n R q n = 0 .
Resorting to (26)–(28), one finds that
lim n p n + 1 p n lim n ( p n + 1 q n + q n w n + w n p n ) = 0 .
Denote G ν n = ( I d + ν n A ) 1 ( I d ν n B ) . It is then immediately that
lim n G ν n w n w n = lim n q n w n = 0 .
For any γ 0 such that γ ν n , n 0 , it implies from Lemma 3 that
lim n G γ w n w n 2 lim n G ν n w n w n = 0 .
Since ( p n ) n 0 is a bounded sequence, there exists a subsequence ( p n i ) n 0 of ( p n ) n 0 and a weak sequential cluster point s H such that p n i s as i . Due to (26) and (27), one finds that w n i s and q n i s as i . Thus,
lim sup n S z , z w n = lim i S z , z w n i .
From the demiclosedness of I G γ and Lemma 4, one obtains that s F i x ( G γ ) . In view of F i x ( G γ ) = ( A + B ) 1 ( 0 ) , one concludes s Λ . From the fact that a = P r o j Λ ( I δ S ) a , one sees a ( I δ S ) a , s a 0 . As a straightforward consequence, one finds
S a , a s 0 .
Coming back to (29), (32) and (33), one has
lim sup n S a , a p n + 1 = lim sup n S a , a p n = lim i S a , a p n i = S a , a s 0 .
Returning to (11) and owing to the definitions of p n and q n , one finds that
p n + 1 a 2 = η n ( I d δ S ) q n + ( 1 η n ) q n a 2 ( 1 η n ) 2 q n a 2 + 2 η n ( I d δ S ) q n a , p n + 1 a = ( 1 η n ) 2 q n a 2 + 2 η n ( I d δ S ) q n ( I d δ S ) a + ( I d δ S ) a a , p n + 1 a ( 1 η n ) 2 q n a 2 + 2 η n ( 1 α ) q n a p n + 1 a + 2 δ η n S a , a p n + 1 ( 1 η n ) 2 ( p n a + μ n p n p n 1 ) 2 + 2 η n ( 1 α ) ( p n a 2 + μ n p n p n 1 p n a ) + 2 δ η n S a , a p n + 1 ( 1 2 η n α ) ( p n a 2 + 2 μ n p n p n 1 p n a ) + μ n 2 p n p n 1 2 + 2 δ η n S a , a p n + 1 ,
where α = 1 2 δ ( 2 κ δ ι 2 ) ( 0 , 1 ) . Let
Γ n = p n a 2 + 2 μ n p n p n 1 p n a
and
M 4 = sup n 0 η n + 1 η n α p n + 1 a .
Furthermore, due to Assumption 2 (C 2 ) and the boundedness of ( p n ) n 0 , we find that M 4 < . In so doing, it asserts that
Γ n + 1 ( 1 2 η n α ) Γ n + 2 η n α δ α S a , p n + 1 a + μ n + 1 η n + 1 p n + 1 p n M 4 + η n 2 α μ n η n p n p n 1 2 .
Invoking (32), we infer that
lim sup n δ α S a , p n + 1 a + μ n + 1 η n + 1 p n + 1 p n M 4 + η n 2 α μ n η n p n p n 1 2 0 .
According to Assumption 2 (C 2 ), one finds that 2 η n α ( 0 , 1 ) and lim n 2 η n α = 0 . So doing, Lemma 5 asserts that lim n Γ n = 0 , which further implies that lim n p n a = 0 . Coming back to (26) and (27), one concludes that ( p n ) n 0 , ( q n ) n 0 , ( w n ) n 0 converge strongly to a.
Case 2. There exists a subsequence ( p n j z 2 ) j 0 of ( p n z 2 ) n 0 such that
p n j z 2 < p n j + 1 z 2 , j N .
In this case, it is obvious that there exists a nondecreasing sequence ( m k ) k 0 N such that m k as k . Lemma 6 asserts that the following inequalities hold,
p m k z 2 p m k + 1 z 2 , p k z 2 p m k z 2 , k N .
Thanks to (21) and (36), one finds
ν n ( 2 κ ν n ) B z B w n 2 μ n M 1 p n p n 1 + M 2 η n .
By letting n tend to infinity in the above inequality, one infers that lim n ν n B z B w n = 0 . According to (24) and (36), one finds that
q n w n 2 η n ( I d δ S ) q n z 2 + μ n M 1 p n p n 1 + 2 ν n q n w n B w n B z .
Accordingly, one finds that lim n q n w n = 0 . A calculation similar to the proof in Case 1. guarantees that
lim n p m k + 1 p m k = 0 ,
and
lim sup k S a , a p m k + 1 0 .
Invoking (35), one finds
p m k + 1 a 2 ( 1 η m k ) 2 ( p m k a + μ m k p m k p m k 1 ) 2 + 2 η m k ( 1 α ) ( p m k a + μ m k p m k p m k 1 ) p m k + 1 a + 2 δ η m k S a , a p m k + 1 ( 1 η m k ) 2 ( p m k + 1 a + μ m k p m k p m k 1 ) 2 + 2 η m k ( 1 α ) ( p m k + 1 a 2 + μ m k p m k p m k 1 p m k + 1 a ) + 2 δ η m k S a , a p m k + 1 ( 1 2 η m k α ) ( p m k + 1 a 2 + 2 μ m k p m k p m k 1 p m k + 1 a ) + μ m k 2 p m k p m k 1 2 + 2 δ η m k S a , a p m k + 1 ( 1 2 η m k α ) p m k + 1 a 2 + 2 μ m k p m k p m k 1 p m k + 1 a + μ m k 2 p m k p m k 1 2 + 2 δ η m k S a , a p m k + 1 ,
which sends us to
p m k + 1 a 2 μ m k η m k p m k p m k 1 M 5 + μ m k η m k p m k p m k 1 2 M 6 + S a , a p m k + 1 M 7 ,
where M 5 = sup n 0 p n + 1 a α < , M 6 = sup n 0 η n 2 α < and M 7 = δ α . By applying Assumption 2 (C 3 ), which together with (39), we additionally derive that
lim sup k p m k + 1 a 2 0 .
From p k a p m k + 1 a , we obtain that lim sup k p k a = 0 . This further implies that ( p k ) k 0 converges strongly to a. Furthermore, one has
lim n q n w n 2 = 0 , lim n w n p n = 0 .
Recalling that the sequence ( p k ) k 0 converges strongly to a, which together with (41), one finds that the sequences ( q n ) n 0 and ( w n ) n 0 also converge strongly to a. From the above, one can conclude that the sequences generated by Algorithm 1 converge strongly to the unique solution a Ω such that a = P r o j Ω ( I d δ S ) a . In view of Remark 1, one sees that a V I ( Ω , S ) . Furthermore, the solution of such variational inequality is unique due to the properties of S. This completes the proof. □
If ( μ n ) n N : = 0 , we obtain the following Algorithm 2 without the inertial extrapolation as a special case of Algorithm 1.
Algorithm 2: The hybrid forward–backward algorithm without the inertial term
Mathematics 08 00447 i002
Recently, much attention has been paid to the relationship between continuous-time systems and neural network models, see [31,32]. One can views an iterative algorithm as a time discretized version of a continuous-time dynamic system. We propose a recurrent neural network to Problem (1). By taking constant parameters, we obtain the following consequence of the dynamical with respect to the time variable t
w ( t ) = p ( t ) + α p ˙ ( t ) ) , u ( t ) = ( I d + γ A ) 1 ( w ( t ) γ B w ( t ) ) , p ˙ ( t ) + p ¨ ( t ) = ( I d δ η ( t ) S ) u ( t ) , p ( 0 ) = p 0 , p ˙ ( 0 ) = z 0 .
Algorithm 1 is strongly connected with the damped inertial system (42). Indeed, Algorithm 1 comes naturally into play by performing an implicit discretization of the inertial system with respect to the time variable t. Without ambiguity, we omit to write the variable t to get simplified notations. In so doing, p stands for p ( t ) , and so on. By setting S : = I d and k = 1 δ η , the circuit architecture diagram of the model (42) is shown in Figure 1.
The operational amplifiers U 1 , U 2 are combined with capacitors C 1 , C 2 , resistors R 1 , R 2 to work as integrators to realize the transformations between p , p ˙ , p ¨ . The amplifier U 9 cooperated with resistors R 20 , R 21 brings about effectiveness in amplification and opposition. Hence p ˙ is translated into α p ˙ . The amplifier U 3 and resistors R 3 , R 4 , R 5 , R 6 are united as the subtractor block to achieve p ¨ . Also, in the same way, the amplifier U 8 and resistors R 17 , R 18 , R 19 are united as another subtractor block to achieve p + α p ˙ . One sees that A ( ( p ˙ + p ¨ ) / k ) and B ( p + α p ˙ ) are obtained, respectively, by “ A ( · ) realization” block and “ B ( · ) realization” block. The amplifier U 6 cooperated with resistors R 13 , R 14 brings about effectiveness in amplification and opposition, therefore, A ( ( p ˙ + p ¨ ) / k ) is translated into v A ( ( p ˙ + p ¨ ) / k ) . The amplifier U 7 cooperated with resistors R 15 , R 16 brings about effectiveness in amplification and opposition, and hence B ( p + α p ˙ ) is translated into v B ( p + α p ˙ ) . The amplifier U 5 and resistors R 9 , R 10 , R 11 , R 12 are united as the subtractor block to achieve v A ( p ˙ + p ¨ ) / k + v B ( p + α p ˙ ) ( p + α p ˙ ) , that is ( p ˙ + p ¨ ) / k . On the other hand, ( p ˙ + p ¨ ) / k is translated into ( p ˙ + p ¨ ) / k through the phase inverter composed by the operational amplifier U 4 and resistors R 7 , R 8 .

4. Numerical Experiment

In this section, we consider a computational experiment to illustrate the convergence properties of the proposed method. The experiment is performed on a PC with Intel (R) Core (TM) i5-8250U CPU @1.60GHz under the MATLAB computing environment.
Digital signal reconstruction is one of the earliest and most classical problem in the file restoration, the video and image coding, the medical, the astronomical imaging and some other applications. Many problems in signal processing can be formulated as inverting the linear system, which are modeled as
b = Q x + ν ,
where x R i is the original signal to be reconstructed, ν R j is the noise, b R j is the noisy measurement, Q R j × i is a bounded linear observation operator, often ill conditioned because it models a process with loss of information.
In our experiment, we consider a general compressed sensing scenario, where the goal is to reconstruct an i-length sparse signal x with exactly k nonzero components from j ( k j < i ) observations, i.e., the number of measurements is much larger than the sparsity level of x and at the same time smaller than the number of the signal length. Considering the storage limitation of the PC, we test a small size signal with i = 2 12 and the original signal contains k = 180 randomly nonzero elements. We reconstruct this signal from j = 2 10 observations. More precisely, the observation b = Q x + ν , where Q R j × i is the Gaussian matrix whose elements are randomly obtained from the standard normal distribution N ( 0 , 1 ) and ν is the Gaussian noise distributed as N ( 0 , σ 2 I ) with σ 2 = 10 4 .
A classical and significant approach to the problems of the signal processing is the regularization method, which has attracted a considerable amount of attention and revived much interest in the compressed sensing literature. We restrict our attention to the l 1 -regularized least squares model (43). Lasso framework is a particular instant of the linear problems of type (43) with the non-smooth l 1 -norm as a regularizer, in which minimizes a squared loss function, and seeks to find the sparse solutions of
minimize x R i 1 2 Q x b 2 2 + α x 1 ,
where the regularization parameter α ( α 0 ) provides a tradeoff between the noise sensitivity and the fidelity to the measurements. One knows that Problem (44) always has a solution. However, it needs not to be unique. Please note that the optimal solution tends to zero as α . As α 0 , the limiting point has the minimum l 1 norm among all points that satisfy Q T ( Q x b ) = 0 , i.e., x = arg min Q T ( Q x b ) = 0 x 1 . Since the objective function of Problem (44) is convex but not differentiable, one uses a first-order optimality condition based on the subdifferential calculus. For m = 1 , 2 , , one can obtain the necessary and sufficient condition for the optimal solution as follows
( Q T ( Q x b ) ) m { α } , x m > 0 ; { α } , x m < 0 ; α , α , x m = 0 .
From above, one sees that the optimal solution of Problem (44) is 0, for ( Q T b ) m [ α , α ] , ( m = 1 , 2 , ), i.e., α Q T b . Thus, one can now derive the formula α m a x = Q T b . For l 1 -regularized least squares; however, the convergence occurs for a finite value of α ( α α m a x ) . To avoid the optimal sparse solution is a zero vector, in this experiment, the regularization parameter was denoted by α : = 0.01 α m a x , where the value of α m a x is computed by the above formula.
The proximal methods give a better modeling of the sparsity structure in the dataset. The major step in proximal methods is to find a solution of arg min x ( g ( x ) + 1 2 α x v 2 2 ) with respect to the function g and the parameter α > Q T b . On the other hand, the proximity operator is defined as the resolvent operator of the subgradient p r o x α g = ( I + α g ) 1 . Furthermore, the proximity operator for l 1 -norm is described as the shrinkage operator, which is defined as p r o x α · ( x ) m = ( | · | α ) + s g n ( ( x ) m ) . Now one considers the deblurring Problem (43) via the definition of the iterative shrinkage algorithm. Resorting to g ( x ) = α x 1 and f ( x ) = 1 2 Q x b 2 2 , one can see that Problem (44) is a special instance of the problem
find x H such that 0 ( g + f ) x .
Under the class of regularized loss minimization problems, the function g is considered to be a non-smooth regularization function and the function f is viewed as a smooth loss function with gradient being Q T Q -cocoercive. Furthermore,
p r o x ν n g x = sgn ( ( x ) m ) max { | ( x ) m | α ν n , 0 } , ( I d ν n f ) x = x ν n Q T ( Q x b ) , ν n > 0 .
One solves the deblurring problem by using the definition of the iterative shrinkage algorithm. Setting A : = g , B : = f and S : = I d in Algorithm 1, one can obtain the proximal gradient type algorithm given as follows
q n = ( I d + ν n A ) 1 ( I d ν n B ) ( p n + μ n ( p n p n 1 ) ) , p n + 1 = ( I d δ η n S ) q n ,
where ( μ n ) n N R + . Meanwhile, we set δ : = 1 50 , η n : = 1 n and ν n = ν : = 10 8 × Q T Q , where n N . Next, we randomly choose the starting signal in the range of ( 0 , 1 ) 4096 and take the number of iterations n = 2000 as the stopping criterion.
In this experiment, the minimum norm solution is the point in the set { x R 4096 | Q T Q x = Q T b } , which is closest to the original sparse signal. Thus, we can see that the result of this experiment for a signal sparse reconstruction is showed in Figure 2. By comparing the last two plots in Figure 2 and the top plot in Figure 2, one finds that the original sparse signal is recovered almost exactly. We hence conclude that our algorithm is efficient for dealing with the signal deblurring problem.
To illustrate that Algorithm 1 has a competitive performance compared with Algorithm 2, we describe the following numerical results shown in Figure 3.
The left plot in Figure 3 shows that the behaviors of the term ( p n ) n N (the y-axis) with respect to the number of iterations (x-axis), where p n is generated by Algorithms 1 and 2. It can be observed from the test result reported in Figure 3 that ( p n ) n N converges to 12.8603 . Thus, we can use ( p n ) n N to study the convergence and the computational performance of Algorithms 1 and 2. We denote D n = p n x , where x is the original signal to be reconstructed, and p n is an estimated signal of x. The right plot in Figure 3 shows that the value of ( D n ) n N (the y-axis) when the execution time in second elapses (x-axis). The sequence ( D n ) n N converges to 0.4152 with the running time. The restoration accuracy can be measured by means of the mean squared error. We find that the mean squared error M S E = D n 2 / i converges to 4.2088 × 10 5 , which further implies that the iterative sequence converges to the original signal x in this experiment.
We make a comparison of the behaviors of ( p n ) n N , ( D n ) n N generated respectively by Algorithms 1 and 2. It shows that the bigger μ n is, the fewer the required number of iterations becomes, the faster the convergence rate becomes. Furthermore, we find that all cases in Algorithm 1 need less computer time and enjoy a faster rate of the convergence than Algorithm 2. It can be observed from the plots that the changing process in all cases of Algorithm 1 outperforms Algorithm 2. The inertial extrapolation of Algorithm 1 plays a key role in the acceleration. For μ k judiciously chosen, this inertial term improves the convergence speed of this algorithm.

5. Conclusions

In this paper, we introduced a variational inequality problem over the zero solution set of the inclusion problems. Since this problem has a double structure, it can be considered as a double-hierarchical optimization problem. The proposed algorithm uses the extrapolation term α n ( p n - p n - 1 ) , which can be viewed as the procedure of speeding up the rate of convergence. As an application, the algorithm was used to solve the signal deblurring problem to support our convergence result.

Author Contributions

Conceptualization, L.L., X.Q. and J.-C.Y.; methodology, L.L., X.Q. and J.-C.Y.; data curation, L.L., X.Q. and J.-C.Y.; visualization, L.L.; supervision, X.Q. and J.-C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Moudafi, A. Split monotone variational inclusions. J. Optim. Theory Appl. 2011, 150, 275–283. [Google Scholar] [CrossRef]
  2. Iiduka, H. Two stochastic optimization algorithms for convex optimization with fixed point constraints. Optim. Methods Softw. 2019, 34, 731–757. [Google Scholar] [CrossRef] [Green Version]
  3. Ansari, Q.H.; Bao, T.Q. A limiting subdifferential version of Ekeland’s variational principle in set optimization. Optim. Lett. 2019, 1–5. [Google Scholar] [CrossRef]
  4. Al-Homidan, S.; Ansari, Q.H.; Kassay, G. Vectorial form of Ekeland variational principle with applications to vector equilibrium problems. Optimization 2020, 69, 415–436. [Google Scholar] [CrossRef]
  5. Li, X.; Huang, N.; Ansari, Q.H.; Yao, J.C. Convergence rate of descent method with new inexact line-search on Riemannian manifolds. J. Optim. Theory Appl. 2019, 180, 830–854. [Google Scholar] [CrossRef]
  6. Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
  7. Rockafellar, R.T. Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 1976, 14, 877–898. [Google Scholar] [CrossRef] [Green Version]
  8. Moreau, J.J. Proximité et dualité dans un espace hilbertien. Bull. De La Société Mathématique De Fr. 1965, 93, 273–299. [Google Scholar] [CrossRef]
  9. Bruck, R.E.; Reich, S. Nonexpansive projections and resolvents of accretive operators in Banach spaces. Houst. J. Math. 1977, 3, 459–470. [Google Scholar]
  10. Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979. [Google Scholar] [CrossRef]
  11. Cho, S.Y.; Kang, S.M. Approximation of fixed points of pseudocontraction semigroups based on a viscosity iterative process. Appl. Math. Lett. 2011, 24, 224–228. [Google Scholar] [CrossRef]
  12. Bot, R.I.; Csetnek, E.R. Second order forward-backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 2016, 54, 1423–1443. [Google Scholar] [CrossRef]
  13. Cho, S.Y.; Kang, S.M. Approximation of common solutions of variational inequalities via strict pseudocontractions. Acta Math. Sci. 2012, 32, 1607–1618. [Google Scholar] [CrossRef]
  14. Cho, S.Y.; Li, W.; Kang, S.M. Convergence analysis of an iterative algorithm for monotone operators. J. Inequal. Appl. 2013, 2013, 199. [Google Scholar] [CrossRef] [Green Version]
  15. Luo, Y.; Shang, M.; Tan, B. A general inertial viscosity type method for nonexpansive mappings and its applications in signal processing. Mathematics 2020, 8, 288. [Google Scholar] [CrossRef] [Green Version]
  16. Alvarez, F.; Attouch, H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001, 9, 3–11. [Google Scholar] [CrossRef]
  17. Lorenz, D.A.; Pock, T. An inertial forward-backward algorithm for monotone inclusions. J. Math. Imaging Vis. 2015, 51, 311–325. [Google Scholar] [CrossRef] [Green Version]
  18. He, S.; Liu, L.; Xu, H.K. Optimal selections of stepsizes and blocks for the block-iterative ART. Appl. Anal. 2019, 1–14. [Google Scholar] [CrossRef]
  19. Cho, S.Y. Generalized mixed equilibrium and fixed point problems in a Banach space. J. Nonlinear Sci. Appl. 2016, 9, 1083–1092. [Google Scholar] [CrossRef] [Green Version]
  20. Liu, L. A hybrid steepest descent method for solving split feasibility problems involving nonexpansive mappings. J. Nonlinear Convex Anal. 2019, 20, 471–488. [Google Scholar]
  21. Kassay, G.; Reich, S.; Sabach, S. Iterative methods for solving systems of variational inequalities in reflexive Banach spaces. SIAM J. Optim. 2011, 21, 1319–1344. [Google Scholar] [CrossRef]
  22. Censor, Y.; Gibali, A.; Reich, S.; Sabach, S. Common solutions to variational inequalities. Set-Valued Var. Anal. 2012, 20, 229–247. [Google Scholar] [CrossRef]
  23. Gibali, A.; Liu, L.W.; Tang, Y.C. Note on the modified relaxation CQ algorithm for the split feasibility problem. Optim. Lett. 2018, 12, 817–830. [Google Scholar] [CrossRef]
  24. Cho, S.Y. Strong convergence analysis of a hybrid algorithm for nonlinear operators in a Banach space. J. Appl. Anal. Comput. 2018, 8, 19–31. [Google Scholar]
  25. Yamada, I. The hybrid steepest descent method for the variational inequality problem over the intersection of fixed point sets of nonexpansive mappings. Inherently Parallel Algorithms Feasibility Optim. Their Appl. 2001, 8, 473–504. [Google Scholar]
  26. Attouch, H.; Maingé, P.E. Asymptotic behavior of second-order dissipative evolution equations combining potential with non-potential effects. ESAIM Control Optim. Calc. Var. 2011, 17, 836–857. [Google Scholar] [CrossRef] [Green Version]
  27. Iiduka, H.; Takahashi, W. Strong convergence theorems for nonexpansive mappings and inverse-strongly monotone mappings. Nonlinear Anal. 2005, 61, 341–350. [Google Scholar] [CrossRef]
  28. Opial, Z. Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 1967, 73, 591–597. [Google Scholar] [CrossRef] [Green Version]
  29. Maingé, P.E. Inertial iterative process for fixed points of certain quasi-nonexpansive mappings. Set-Valued Anal. 2007, 15, 67–79. [Google Scholar] [CrossRef]
  30. Maingé, P.E. The viscosity approximation process for quasi-nonexpansive mappings in Hilbert spaces. Comput. Math. Appl. 2010, 59, 74–79. [Google Scholar] [CrossRef] [Green Version]
  31. Peypouquet, J.; Sorin, S. Evolution equations for maximal monotone operators: Asymptotic analysis in continuous and discrete time. arXiv 2009, arXiv:0905.1270. [Google Scholar]
  32. He, X.; Huang, T.; Yu, J.; Li, C.; Li, C. An inertial projection neural network for solving variational inequalities. IEEE Trans. Cybern. 2016, 47, 809–814. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Circuit architecture diagram of model (42), where e = p ˙ , f = p ¨ , w = p + α p ˙ , q = p ˙ + p ¨ , g = q / k .
Figure 1. Circuit architecture diagram of model (42), where e = p ˙ , f = p ¨ , w = p + α p ˙ , q = p ˙ + p ¨ , g = q / k .
Mathematics 08 00447 g001
Figure 2. From top to bottom: the original signal, the noisy measurement, the minimum norm solution, the reconstruction signals respectively by Algorithm 1 ( μ n = 0.5 ) and Algorithm 2 .
Figure 2. From top to bottom: the original signal, the noisy measurement, the minimum norm solution, the reconstruction signals respectively by Algorithm 1 ( μ n = 0.5 ) and Algorithm 2 .
Mathematics 08 00447 g002
Figure 3. The behavior of p n with the number of iterations (left), the behavior of D n with the running time (right). The number of iterations is 2000. The first CPU time to compute p 2000 is about 123.41 s; the second CPU time to compute p 2000 is about 116.64 s; the third CPU time to compute p 2000 is about 116.59 s; the fourth CPU time to compute p 2000 is about 138.18 s (from top to bottom in legend).
Figure 3. The behavior of p n with the number of iterations (left), the behavior of D n with the running time (right). The number of iterations is 2000. The first CPU time to compute p 2000 is about 123.41 s; the second CPU time to compute p 2000 is about 116.64 s; the third CPU time to compute p 2000 is about 116.59 s; the fourth CPU time to compute p 2000 is about 138.18 s (from top to bottom in legend).
Mathematics 08 00447 g003

Share and Cite

MDPI and ACS Style

Liu, L.; Qin, X.; Yao, J.-C. A Hybrid Forward–Backward Algorithm and Its Optimization Application. Mathematics 2020, 8, 447. https://doi.org/10.3390/math8030447

AMA Style

Liu L, Qin X, Yao J-C. A Hybrid Forward–Backward Algorithm and Its Optimization Application. Mathematics. 2020; 8(3):447. https://doi.org/10.3390/math8030447

Chicago/Turabian Style

Liu, Liya, Xiaolong Qin, and Jen-Chih Yao. 2020. "A Hybrid Forward–Backward Algorithm and Its Optimization Application" Mathematics 8, no. 3: 447. https://doi.org/10.3390/math8030447

APA Style

Liu, L., Qin, X., & Yao, J.-C. (2020). A Hybrid Forward–Backward Algorithm and Its Optimization Application. Mathematics, 8(3), 447. https://doi.org/10.3390/math8030447

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop