Article

Optimality Conditions for Group Sparse Constrained Optimization Problems

School of Mathematics and Statistics, Guizhou University, Guiyang 550025, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2021, 9(1), 84; https://doi.org/10.3390/math9010084
Submission received: 27 November 2020 / Revised: 21 December 2020 / Accepted: 29 December 2020 / Published: 1 January 2021

Abstract: In this paper, optimality conditions for group sparse constrained optimization (GSCO) problems are studied. Firstly, equivalent characterizations of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones of the group sparse set are derived. Secondly, using these tangent cones and normal cones, four types of stationary points for GSCO problems are given: $T_B$-stationary points, $N_B$-stationary points, $T_C$-stationary points and $N_C$-stationary points, which are used to characterize first-order optimality conditions for GSCO problems. Furthermore, both the relationship among the four types of stationary points and the relationship between stationary points and local minimizers are discussed. Finally, second-order necessary and sufficient optimality conditions for GSCO problems are provided.

1. Introduction

The sparsity of a vector means that few of its entries are non-zero, while the group sparsity of a vector means that its non-zero or zero entries may have some group structure, that is, they appear in blocks in certain areas. A vector can be grouped according to prior information about the group structure among its entries, and each group can then be examined to see whether it is entirely zero. For example, genes on the same biological path can be regarded as a group in gene expression analysis, so when they are described by a vector, the vector has group sparsity. Since it was first proposed by Yuan and Lin [1] in 2006, group sparse optimization has attracted much attention from researchers [2,3,4,5]. The aim of group sparse optimization is to seek a group sparse solution of a system. It is now known that group sparse optimization has broad applications in bioinformatics, pattern recognition, image restoration, neuroimaging and other fields [1,6,7,8]. For instance, we can restore a signal by group sparse optimization according to prior information about its group sparse structure; moreover, the stability of the recovery can be improved in the presence of noise, while the accuracy of the recovery can be improved in the absence of noise [2]. For practical problems with group sparse structure, it is more targeted to adopt a corresponding group sparse optimization model [9].
General sparse constrained optimization has been studied by many authors, with substantial results; here we mention a few of them. In [10], the authors proposed the concepts of restricted strong convexity and restricted strong smoothness to ensure the existence of a unique solution for sparse constrained optimization, and obtained the corresponding error bounds. In [11], the authors defined $N_B$-stationary points and $N_C$-stationary points for sparse constrained optimization. Beck and Eldar [12] put forward three types of first-order necessary optimality conditions for sparse constrained optimization. One of them is basic feasibility, which is a generalization of the zero-gradient necessary optimality condition in unconstrained optimization. Another is the L-stationary point, which is based on a fixed point condition and can be used to derive the iterative hard thresholding algorithm for solving sparse constrained optimization problems. As is well known, Calamai and Moré [13] introduced $T_B$-stationary points and $T_C$-stationary points to describe optimality conditions for general constrained optimization problems. Although N-stationary points, L-stationary points and T-stationary points are equivalent for convex optimization problems, they are not equivalent for sparse constrained optimization problems because of the non-convexity. In [14], the authors provided a description of the tangent cone and the normal cone of the sparse set, which they then used to describe the first-order and second-order optimality conditions; furthermore, they extended the results to optimization problems subject to both sparse and non-negative constraints. Chen, Pan, and Xiu [15] characterized the solutions of three kinds of sparse optimization problems and investigated the relationship among them. Recently, Bian and Chen [16] gave an exact continuous relaxation problem for the sparsity penalized optimization problem, and proposed a smoothing proximal gradient algorithm for the relaxation problem.
However, the above works mainly concern general sparse optimization problems. Due to the complexity of the group sparse structure, research on group sparse constrained optimization problems is still lacking. When the group sparsity appears as a penalty in the objective function, Peng and Chen [17] studied the first-order and second-order optimality conditions for relaxation problems of group sparse optimization problems, while Pan and Chen [18] used a capped folded concave function to approximate the group sparsity function and showed that the solution set of the continuous approximation problem and the set of group sparse solutions are the same.
This paper focuses on the following group sparse constrained optimization (GSCO) problem:
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad \|x\|_{2,0} \le k, \tag{1}$$
where $f: \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable (or twice continuously differentiable) function; $x \in \mathbb{R}^n$ is divided into $m$ disjoint groups, denoted by $x = (x_1, \ldots, x_m)$ with $x_i = (x_{i(1)}, \ldots, x_{i(n_i)}) \in \mathbb{R}^{n_i}$, $i = 1, \ldots, m$, where $\sum_{i=1}^m n_i = n$ and $n_i \ge 1$; and $\|x\|_{2,0} := \sum_{i=1}^m \mathbb{1}\{\|x_i\|_2 \ne 0\}$ counts the number of non-zero groups in $x$, where $\|x_i\|_2$ is the $\ell_2$ norm of the $i$th group $x_i$. Throughout this paper, for simplicity, $\|\cdot\|$ denotes the $\ell_2$ norm. Let $k$ be a positive integer with $k \le m \le n$, and let $S := \{x : \|x\|_{2,0} \le k\}$ be the group sparse set.
Problem (1) is called GSCO due to the group structure of its entries. When $m = n$ and $n_i = 1$, $i = 1, \ldots, m$, Problem (1) reduces to standard sparse constrained optimization.
Problem (1) is non-convex, non-smooth, and non-Lipschitz, so its optimality conditions are of theoretical importance: they are the basis for analyzing and solving the problem. The optimality conditions for constrained optimization are closely related to the tangent cones and normal cones of the constraint set. We will use the Bouligand tangent cone, the Clarke tangent cone and the corresponding normal cones of the group sparse set to describe optimality conditions for Problem (1).
This paper is organized as follows. In Section 2, some basic notations and definitions are introduced. In Section 3, the equivalent expressions of the Bouligand tangent cone, the Clarke tangent cone, and the corresponding normal cones of the group sparse constraint set $S$ are given. In Section 4, first-order optimality conditions for Problem (1) based on the tangent cones and normal cones of $S$ are provided; the relationship between stationary points and local minimizers of Problem (1) is also discussed. In Section 5, second-order necessary and sufficient optimality conditions for Problem (1) are given. Finally, a brief concluding remark is given in Section 6.

2. Notations and Definitions

In this section, we introduce some notations and preliminaries, including the definitions of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones.
For any $x = (x_1, x_2, \ldots, x_m) \in \mathbb{R}^n$ with $x_i \in \mathbb{R}^{n_i}$, the group support set of $x$ is denoted by
$$\Gamma(x) := \{i \in \{1, \ldots, m\} : x_i \ne 0\},$$
and $|\Gamma(x)|$ is the cardinality of $\Gamma(x)$. Then $\|x\|_{2,0} = |\Gamma(x)|$, which means $\|x\|_{2,0}$ is the number of groups in $x$ that have nonzero $\ell_2$-norm.
For the $n$-dimensional real space $\mathbb{R}^n$, $\mathbb{R}_{x_i}$ denotes the $x_i$ coordinate axis, and $\mathbb{R}^2_{x_i x_j}$ denotes the $x_i O x_j$ coordinate plane. Let $e_i \in \mathbb{R}^n$ denote the $n$-dimensional vector whose entries in the $i$th group are all ones and whose other entries are all zeros. Let $e_{ij}$ ($i = 1, \ldots, m$, $j = 1, \ldots, n_i$) denote the $n$-dimensional vector whose $j$th entry of the $i$th group is one and whose other entries are all zeros.
For a smooth function $f : \mathbb{R}^n \to \mathbb{R}$, let
$$[\nabla f(x)]_i := ([\nabla f(x)]_{i(1)}, \ldots, [\nabla f(x)]_{i(n_i)}), \qquad \nabla f(x) := ([\nabla f(x)]_1, \ldots, [\nabla f(x)]_m),$$
where $x_{i(j)} \in \mathbb{R}$ denotes the $j$th entry of $x_i$ and $[\nabla f(x)]_{i(j)}$ denotes the $j$th entry of $[\nabla f(x)]_i$.
The following example shows that the group sparse structure is different from the sparse structure.
Example 1.
Let $x = (x_1, x_2, x_3)$ be a 3-dimensional vector. We show different ways of grouping and the corresponding group sparsity of $x$ as follows.
(1)
When $x = (x_1, x_2, x_3)$ with $n_1 = n_2 = n_3 = 1$:
if $\|x\|_{2,0} = 0$, then $x = 0$;
if $\|x\|_{2,0} = 1$, then $x \in \{x : x_1 \in \mathbb{R} \setminus \{0\},\ x_2 = x_3 = 0\} \cup \{x : x_2 \in \mathbb{R} \setminus \{0\},\ x_1 = x_3 = 0\} \cup \{x : x_3 \in \mathbb{R} \setminus \{0\},\ x_1 = x_2 = 0\}$;
if $\|x\|_{2,0} = 2$, then $x \in \{x : x_1, x_2 \in \mathbb{R} \setminus \{0\},\ x_3 = 0\} \cup \{x : x_1, x_3 \in \mathbb{R} \setminus \{0\},\ x_2 = 0\} \cup \{x : x_2, x_3 \in \mathbb{R} \setminus \{0\},\ x_1 = 0\}$;
if $\|x\|_{2,0} = 3$, then $x \in \{x : x_1, x_2, x_3 \in \mathbb{R} \setminus \{0\}\}$.
(2)
When $x = (x_1, (x_2, x_3))$ with $n_1 = 1$, $n_2 = 2$:
if $\|x\|_{2,0} = 0$, then $x = 0$;
if $\|x\|_{2,0} = 1$, then $x \in \{x : x_1 \in \mathbb{R} \setminus \{0\},\ x_2 = x_3 = 0\} \cup \{x : x_1 = 0,\ (x_2, x_3) \in \mathbb{R}^2 \setminus \{0\}\}$;
if $\|x\|_{2,0} = 2$, then $x \in \{x : x_1 \in \mathbb{R} \setminus \{0\},\ (x_2, x_3) \in \mathbb{R}^2 \setminus \{0\}\}$.
(3)
When $x = ((x_1, x_2, x_3))$ with $n_1 = 3$:
if $\|x\|_{2,0} = 0$, then $x = 0$;
if $\|x\|_{2,0} = 1$, then $x \in \{x : (x_1, x_2, x_3) \ne 0\}$.
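The grouping dependence illustrated above is easy to check numerically. The following is a minimal NumPy sketch (the helper names `group_support` and `group_l20` are ours, not from the paper), applied to the sample vector $x = (1, 0, 2)$ under the three groupings of Example 1:

```python
import numpy as np

def group_support(x, groups):
    # Gamma(x): indices of groups whose l2-norm is nonzero
    return [i for i, g in enumerate(groups) if np.linalg.norm(x[g]) > 0]

def group_l20(x, groups):
    # ||x||_{2,0} = |Gamma(x)|: the number of nonzero groups
    return len(group_support(x, groups))

x = np.array([1.0, 0.0, 2.0])
print(group_l20(x, [[0], [1], [2]]))  # grouping (1): 2
print(group_l20(x, [[0], [1, 2]]))    # grouping (2): 2
print(group_l20(x, [[0, 1, 2]]))      # grouping (3): 1
```

The same vector can thus have different values of $\|x\|_{2,0}$ under different groupings, which is exactly the point of the example.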
At the end of this section, we introduce the definitions of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones [19].
Definition 1
([19]). Let $\Omega \subseteq \mathbb{R}^n$ be an arbitrary nonempty set. The Bouligand tangent cone $T_\Omega^B(\hat{x})$, the Clarke tangent cone $T_\Omega^C(\hat{x})$ and their corresponding normal cones $N_\Omega^B(\hat{x})$ and $N_\Omega^C(\hat{x})$ to the set $\Omega$ at the point $\hat{x} \in \Omega$ are defined as follows.
(1)
Bouligand tangent cone:
$$T_\Omega^B(\hat{x}) := \left\{ d \in \mathbb{R}^n : \exists \{x^t\} \subseteq \Omega,\ \lim_{t \to \infty} x^t = \hat{x},\ \exists \lambda_t \ge 0,\ t \in \mathbb{N},\ \text{s.t.}\ \lim_{t \to \infty} \lambda_t (x^t - \hat{x}) = d \right\};$$
(2)
Fréchet normal cone:
$$N_\Omega^B(\hat{x}) := [T_\Omega^B(\hat{x})]^\circ = \left\{ u \in \mathbb{R}^n : \langle u, z \rangle \le 0,\ \forall z \in T_\Omega^B(\hat{x}) \right\};$$
(3)
Clarke tangent cone:
$$T_\Omega^C(\hat{x}) := \left\{ d \in \mathbb{R}^n : \forall \{x^t\} \subseteq \Omega \text{ with } \lim_{t \to \infty} x^t = \hat{x},\ \forall \{\lambda_t\} \subseteq \mathbb{R}_+ \text{ with } \lim_{t \to \infty} \lambda_t = 0,\ \exists \{d^t\} \subseteq \mathbb{R}^n \text{ s.t. } \lim_{t \to \infty} d^t = d \text{ and } x^t + \lambda_t d^t \in \Omega,\ \forall t \in \mathbb{N} \right\};$$
(4)
Clarke normal cone:
$$N_\Omega^C(\hat{x}) := [T_\Omega^C(\hat{x})]^\circ = \left\{ u \in \mathbb{R}^n : \langle u, z \rangle \le 0,\ \forall z \in T_\Omega^C(\hat{x}) \right\}.$$

3. Tangent Cones and Normal Cones of the Group Sparse Set S

Tangent cones and normal cones are widely used to describe optimality conditions for constrained optimization problems [19]. The following two theorems give the equivalent characterizations of Bouligand tangent cone, Clarke tangent cone and their corresponding normal cones to the group sparse constraint set S.
Theorem 1.
For any $\hat{x} \in S$, the Bouligand tangent cone $T_S^B(\hat{x})$ and the Fréchet normal cone $N_S^B(\hat{x})$ to the group sparse set $S$ at the point $\hat{x}$ have the following equivalent expressions:
$$T_S^B(\hat{x}) = \{d \in \mathbb{R}^n : \|d\|_{2,0} \le k,\ \|\hat{x} + \gamma d\|_{2,0} \le k,\ \forall \gamma \in \mathbb{R}\} = \bigcup_{J \in \Theta(\hat{x})} \{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin J\} = \bigcup_{J \in \Theta(\hat{x})} \mathrm{span}\{e_{ij},\ i \in J,\ j = 1, \ldots, n_i\};$$
$$N_S^B(\hat{x}) = \begin{cases} \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(\hat{x})\} = \mathrm{span}\{e_{ij},\ i \notin \Gamma(\hat{x}),\ j = 1, \ldots, n_i\}, & \|\hat{x}\|_{2,0} = k, \\ \{0\}, & \|\hat{x}\|_{2,0} < k, \end{cases}$$
where $\Gamma(\hat{x}) = \{i \in \{1, \ldots, m\} : \hat{x}_i \ne 0\}$, $\Theta(\hat{x}) = \{J \subseteq \{1, \ldots, m\} : \Gamma(\hat{x}) \subseteq J,\ |J| = k\}$, $d_i \in \mathbb{R}^{n_i}$ is the $i$th group of $d \in \mathbb{R}^n$, and $u_i \in \mathbb{R}^{n_i}$ is the $i$th group of $u \in \mathbb{R}^n$.
Proof. 
(i) According to the definition of the Bouligand tangent cone, we have
$$T_S^B(\hat{x}) = \left\{ d \in \mathbb{R}^n : \exists \{x^t\} \subseteq S,\ \lim_{t \to \infty} x^t = \hat{x},\ \exists \lambda_t \ge 0,\ t \in \mathbb{N},\ \text{s.t.}\ \lim_{t \to \infty} \lambda_t (x^t - \hat{x}) = d \right\}.$$
Firstly, we prove that $T_S^B(\hat{x}) = H(\hat{x}) := \{d \in \mathbb{R}^n : \|d\|_{2,0} \le k,\ \|\hat{x} + \gamma d\|_{2,0} \le k,\ \forall \gamma \in \mathbb{R}\}$.
For any $d \in T_S^B(\hat{x})$, there exists $\{x^t\} \subseteq S$ such that $\lim_{t \to \infty} x^t = \hat{x}$; then
$$\Gamma(\hat{x}) \subseteq \Gamma(x^t) \quad \text{and} \quad |\Gamma(\hat{x})| \le |\Gamma(x^t)|$$
for any sufficiently large $t$. It follows from $x^t \in S$ that
$$|\Gamma(x^t)| = \|x^t\|_{2,0} \le k.$$
Since $d = \lim_{t \to \infty} \lambda_t (x^t - \hat{x})$ with $\lambda_t \ge 0$, we have $\Gamma(d) \subseteq \Gamma(x^t - \hat{x})$ for any sufficiently large $t$. Due to $\Gamma(\hat{x}) \subseteq \Gamma(x^t)$, we obtain
$$\Gamma(d) \subseteq \Gamma(x^t - \hat{x}) \subseteq \Gamma(x^t).$$
Therefore,
$$\|d\|_{2,0} = |\Gamma(d)| \le |\Gamma(x^t)| \le k. \tag{2}$$
According to $\Gamma(\hat{x}) \subseteq \Gamma(x^t)$ and $\Gamma(d) \subseteq \Gamma(x^t)$, we have $\Gamma(\hat{x} + \gamma d) \subseteq \Gamma(x^t)$ for any $\gamma \in \mathbb{R}$. Hence we get
$$\|\hat{x} + \gamma d\|_{2,0} = |\Gamma(\hat{x} + \gamma d)| \le |\Gamma(x^t)| \le k. \tag{3}$$
Combining (2) with (3), we get $T_S^B(\hat{x}) \subseteq H(\hat{x})$.
Conversely, for any $d \in H(\hat{x})$, take any sequence $\{\lambda_t\}$ such that $\lambda_t > 0$ and $\lambda_t \to \infty$, and let $x^t = \hat{x} + \frac{d}{\lambda_t}$; then $\lim_{t \to \infty} x^t = \hat{x}$. Since $\|\hat{x} + \gamma d\|_{2,0} \le k$ for any $\gamma \in \mathbb{R}$, we get
$$\|x^t\|_{2,0} = \left\| \hat{x} + \frac{d}{\lambda_t} \right\|_{2,0} \le k,$$
which means $\{x^t\} \subseteq S$. It follows from $\lim_{t \to \infty} x^t = \hat{x}$ that $\Gamma(\hat{x}) \subseteq \Gamma(x^t)$ for sufficiently large $t$. Hence we obtain
$$\|\hat{x}\|_{2,0} = |\Gamma(\hat{x})| \le |\Gamma(x^t)| = \|x^t\|_{2,0} \le k.$$
From $x^t = \hat{x} + \frac{d}{\lambda_t}$, we get
$$\lim_{t \to \infty} \lambda_t (x^t - \hat{x}) = d.$$
Hence $d \in T_S^B(\hat{x})$, which means $H(\hat{x}) \subseteq T_S^B(\hat{x})$.
The above proof yields $T_S^B(\hat{x}) = H(\hat{x})$.
It is easy to prove that
$$H(\hat{x}) = \bigcup_{J \in \Theta(\hat{x})} \{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin J\} = \bigcup_{J \in \Theta(\hat{x})} \mathrm{span}\{e_{ij},\ i \in J,\ j = 1, \ldots, n_i\}.$$
(ii) According to the definition of the Fréchet normal cone,
$$N_S^B(\hat{x}) = [T_S^B(\hat{x})]^\circ = \{u \in \mathbb{R}^n : \langle u, d \rangle \le 0,\ \forall d \in T_S^B(\hat{x})\}.$$
For any $u \in N_S^B(\hat{x})$ and any $d \in T_S^B(\hat{x})$, it must hold that $\langle u, d \rangle \le 0$.
If $\|\hat{x}\|_{2,0} = k$, we have
$$\langle u, d \rangle = \sum_{i \in \Gamma(\hat{x})} \langle u_i, d_i \rangle + \sum_{i \notin \Gamma(\hat{x})} \langle u_i, d_i \rangle.$$
Since $d \in T_S^B(\hat{x}) = \bigcup_{J \in \Theta(\hat{x})} \{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin J\}$, there exists $J \in \Theta(\hat{x})$ with $\Gamma(\hat{x}) \subseteq J$, $|J| = k$ and $d_i = 0$, $\forall i \notin J$; since $|\Gamma(\hat{x})| = k$, this forces $J = \Gamma(\hat{x})$. Thus $d_i = 0$, $\forall i \notin \Gamma(\hat{x})$, so $\sum_{i \notin \Gamma(\hat{x})} \langle u_i, d_i \rangle = 0$, and then
$$\langle u, d \rangle = \sum_{i \in \Gamma(\hat{x})} \langle u_i, d_i \rangle \le 0,$$
which, together with the arbitrariness of $d_i \in \mathbb{R}^{n_i}$ for $i \in \Gamma(\hat{x})$, implies $u_i = 0$, $\forall i \in \Gamma(\hat{x})$. Therefore, $N_S^B(\hat{x}) = \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(\hat{x})\}$. It is easy to prove that $\{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(\hat{x})\} = \mathrm{span}\{e_{ij},\ i \notin \Gamma(\hat{x}),\ j = 1, \ldots, n_i\}$.
If $\|\hat{x}\|_{2,0} < k$, then for any $J \in \Theta(\hat{x})$ and any $d \in T_S^B(\hat{x})$ supported on $J$, it holds that
$$\langle u, d \rangle = \sum_{i \in J} \langle u_i, d_i \rangle + \sum_{i \notin J} \langle u_i, d_i \rangle = \sum_{i \in J} \langle u_i, d_i \rangle \le 0,$$
which implies $u_i = 0$, $\forall i \in J$. Due to $\|\hat{x}\|_{2,0} < k$, $\Gamma(\hat{x}) \subseteq J$ and $|J| = k$, it must hold that $\bigcup_{J \in \Theta(\hat{x})} J = \{1, 2, \ldots, m\}$, and then $N_S^B(\hat{x}) = \{0\}$. □
Next, we give the equivalent characterizations of Clarke tangent cone and Clarke normal cone of the group sparse constraint set S.
Theorem 2.
For any $\hat{x} \in S$, the Clarke tangent cone and the Clarke normal cone of the group sparse set $S$ at $\hat{x}$ have the following equivalent expressions:
$$T_S^C(\hat{x}) = \{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\hat{x})\} = \{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin \Gamma(\hat{x})\} = \mathrm{span}\{e_{ij},\ i \in \Gamma(\hat{x}),\ j = 1, \ldots, n_i\};$$
$$N_S^C(\hat{x}) = \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(\hat{x})\} = \mathrm{span}\{e_{ij},\ i \notin \Gamma(\hat{x}),\ j = 1, \ldots, n_i\}.$$
Proof. 
(i) According to the definition of the Clarke tangent cone, we have
$$T_S^C(\hat{x}) = \left\{ d \in \mathbb{R}^n : \forall \{x^t\} \subseteq S \text{ with } \lim_{t \to \infty} x^t = \hat{x},\ \forall \{\lambda_t\} \subseteq \mathbb{R}_+ \text{ with } \lim_{t \to \infty} \lambda_t = 0,\ \exists \{y^t\} \subseteq \mathbb{R}^n \text{ s.t. } \lim_{t \to \infty} y^t = d \text{ and } \|x^t + \lambda_t y^t\|_{2,0} \le k,\ \forall t \in \mathbb{N} \right\}.$$
We first prove $T_S^C(\hat{x}) = \{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\hat{x})\}$.
To prove $T_S^C(\hat{x}) \subseteq \{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\hat{x})\}$, we assume, on the contrary, that there exists $d \in T_S^C(\hat{x})$ with $\Gamma(d) \not\subseteq \Gamma(\hat{x})$. Then there exists $i_0 \in \Gamma(d)$ with $i_0 \notin \Gamma(\hat{x})$, which implies that $\hat{x}_{i_0} = 0$ but $d_{i_0} \ne 0$.
Note that $|\Gamma(\hat{x})| \le k$. For any $t \in \mathbb{N}$, take $\Gamma_t \subseteq \{1, 2, \ldots, m\} \setminus (\Gamma(\hat{x}) \cup \{i_0\})$ such that
$$|\Gamma(\hat{x})| + |\Gamma_t| = k.$$
Let $\lambda_t = \frac{1}{t^2} \to 0$ and
$$x_i^t = \begin{cases} \hat{x}_i, & i \in \Gamma(\hat{x}), \\ \frac{1}{t} \mathbf{1}_{n_i}, & i \in \Gamma_t, \\ 0, & i \in \{1, 2, \ldots, m\} \setminus (\Gamma_t \cup \Gamma(\hat{x})), \end{cases}$$
where $\mathbf{1}_{n_i}$ is the $n_i$-dimensional vector of all ones. Then
$$\Gamma(x^t) = \Gamma(\hat{x}) \cup \Gamma_t, \qquad \|x^t\|_{2,0} = |\Gamma(\hat{x})| + |\Gamma_t| = k,$$
and thus $\{x^t\} \subseteq S$, $x_{i_0}^t = 0$, and $\lim_{t \to \infty} x^t = \hat{x}$. For any $y^t \to d$, we have
$$x_i^t + \lambda_t y_i^t = \begin{cases} x_i^t + \frac{1}{t^2} y_i^t \to \hat{x}_i, & i \in \Gamma(\hat{x}), \\ \frac{1}{t} \mathbf{1}_{n_i} + \frac{1}{t^2} y_i^t \ne 0, & i \in \Gamma_t, \\ 0 + \frac{1}{t^2} y_{i_0}^t \ne 0, & i = i_0, \\ 0 + \frac{1}{t^2} y_i^t, & i \in \{1, 2, \ldots, m\} \setminus (\Gamma_t \cup \Gamma(\hat{x}) \cup \{i_0\}), \end{cases}$$
where the second and third cases hold for any sufficiently large $t$, the third because $y_{i_0}^t \to d_{i_0} \ne 0$. Therefore, for any sufficiently large $t$,
$$\|x^t + \lambda_t y^t\|_{2,0} \ge |\Gamma(\hat{x}) \cup \Gamma_t \cup \{i_0\}| = k + 1,$$
so $x^t + \lambda_t y^t \notin S$, which means $d \notin T_S^C(\hat{x})$ according to the definition of $T_S^C(\hat{x})$. This contradiction shows that $T_S^C(\hat{x}) \subseteq \{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\hat{x})\}$.
To prove $\{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\hat{x})\} \subseteq T_S^C(\hat{x})$, let $d$ satisfy $\Gamma(d) \subseteq \Gamma(\hat{x})$. For any $\{x^t\} \subseteq S$ with $\lim_{t \to \infty} x^t = \hat{x}$ and any $\{\lambda_t\} \subseteq \mathbb{R}_+$ with $\lim_{t \to \infty} \lambda_t = 0$, we have, for sufficiently large $t$,
$$\Gamma(d) \subseteq \Gamma(\hat{x}) \subseteq \Gamma(x^t). \tag{4}$$
Let $y^t = x^t - \hat{x} + d$; then from (4) we get $\Gamma(y^t) \subseteq \Gamma(x^t)$ and
$$\|x^t + \lambda_t y^t\|_{2,0} = |\Gamma(x^t + \lambda_t y^t)| \le |\Gamma(x^t)| \le k.$$
In addition, $\lim_{t \to \infty} y^t = \lim_{t \to \infty} (x^t - \hat{x} + d) = d$. It follows that $d \in T_S^C(\hat{x})$ according to the definition of $T_S^C(\hat{x})$. From the arbitrariness of $d$, we have
$$\{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\hat{x})\} \subseteq T_S^C(\hat{x}).$$
Therefore, we have proved that $T_S^C(\hat{x}) = \{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\hat{x})\}$.
Since $d_i = 0$ for all $i \notin \Gamma(d)$ and $\Gamma(d) \subseteq \Gamma(\hat{x})$ for any $d \in \{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\hat{x})\}$, it must hold that $d_i = 0$, $\forall i \notin \Gamma(\hat{x})$. Hence we get
$$T_S^C(\hat{x}) = \{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\hat{x})\} = \{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin \Gamma(\hat{x})\}. \tag{5}$$
It is easy to prove that $\{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin \Gamma(\hat{x})\} = \mathrm{span}\{e_{ij} : i \in \Gamma(\hat{x}),\ j = 1, \ldots, n_i\}$; then
$$T_S^C(\hat{x}) = \mathrm{span}\{e_{ij} : i \in \Gamma(\hat{x}),\ j = 1, \ldots, n_i\}. \tag{6}$$
(ii) According to the definition of the Clarke normal cone, we have
$$N_S^C(\hat{x}) = [T_S^C(\hat{x})]^\circ = \{u \in \mathbb{R}^n : \langle d, u \rangle \le 0,\ \forall d \in T_S^C(\hat{x})\}.$$
For any $d \in T_S^C(\hat{x})$ and any $u \in N_S^C(\hat{x})$, we have
$$\langle d, u \rangle = \sum_{i \in \Gamma(\hat{x})} \langle d_i, u_i \rangle + \sum_{i \notin \Gamma(\hat{x})} \langle d_i, u_i \rangle \le 0.$$
From (5), $d_i = 0$, $\forall i \notin \Gamma(\hat{x})$; then $\sum_{i \notin \Gamma(\hat{x})} \langle d_i, u_i \rangle = 0$, and thus
$$\langle d, u \rangle = \sum_{i \in \Gamma(\hat{x})} \langle d_i, u_i \rangle \le 0,$$
which means $u_i = 0$, $\forall i \in \Gamma(\hat{x})$, due to the arbitrariness of $d_i \in \mathbb{R}^{n_i}$. Therefore, $N_S^C(\hat{x}) = \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(\hat{x})\}$. □
Obviously, the following relationships hold between the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones of the group sparse set $S$ at any point $\hat{x} \in S$:
$$T_S^C(\hat{x}) \subseteq T_S^B(\hat{x}), \qquad N_S^B(\hat{x}) \subseteq N_S^C(\hat{x}).$$
Remark 1.
In [14], the authors gave expressions for the tangent cones and normal cones of the sparse set $\{x \in \mathbb{R}^n : \|x\|_0 \le k\}$. Theorems 1 and 2 in this paper extend their results.
At the end of this section, we give an example of the tangent cones of $S$ in $\mathbb{R}^3$.
Example 2.
Consider the group sparse set
$$S = \{x = (x_1, (x_2, x_3)) \in \mathbb{R}^3 : \|x\|_{2,0} \le 1\},$$
where $x_1$ is the first group and $(x_2, x_3)$ is the second group. Consider its Bouligand tangent cone and Clarke tangent cone at the three points $x^1 = (0, (1, 1))$, $x^2 = (0, (1, 0))$ and $x^3 = (1, (0, 0))$. It is easy to get the following: $\Gamma(x^1) = \{2\}$, $\Gamma(x^2) = \{2\}$, $\Gamma(x^3) = \{1\}$; $\Theta(x^1) = \{\{2\}\}$, $\Theta(x^2) = \{\{2\}\}$, $\Theta(x^3) = \{\{1\}\}$;
$$T_S^B(x^1) = \{x \in \mathbb{R}^3 : x_1 = 0\} = \mathbb{R}^2_{x_2 x_3}; \qquad T_S^C(x^1) = \{x \in \mathbb{R}^3 : x_1 = 0\} = \mathbb{R}^2_{x_2 x_3};$$
$$T_S^B(x^2) = \{x \in \mathbb{R}^3 : x_1 = 0\} = \mathbb{R}^2_{x_2 x_3}; \qquad T_S^C(x^2) = \{x \in \mathbb{R}^3 : x_1 = 0\} = \mathbb{R}^2_{x_2 x_3};$$
$$T_S^B(x^3) = \{x \in \mathbb{R}^3 : x_2 = x_3 = 0\} = \mathbb{R}_{x_1}; \qquad T_S^C(x^3) = \{x \in \mathbb{R}^3 : x_2 = x_3 = 0\} = \mathbb{R}_{x_1}.$$
Therefore, $T_S^B(x^1) = T_S^C(x^1) = T_S^B(x^2) = T_S^C(x^2) = \mathbb{R}^2_{x_2 x_3}$ and $T_S^B(x^3) = T_S^C(x^3) = \mathbb{R}_{x_1}$.
Figure 1 depicts the above Bouligand tangent cones and Clarke tangent cones.
From Example 2, we can see that the key to group sparsity is to examine whether each group as a whole is zero, instead of checking whether each entry is zero.
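The support characterizations in Theorems 1 and 2 also give cheap membership tests for the two tangent cones: $d \in T_S^C(\hat{x})$ iff $\Gamma(d) \subseteq \Gamma(\hat{x})$, and $d \in T_S^B(\hat{x})$ iff $\Gamma(\hat{x}) \cup \Gamma(d)$ fits into some $J$ with $|J| = k$, i.e., iff $|\Gamma(\hat{x}) \cup \Gamma(d)| \le k$. The following is a hedged sketch reusing `group_support` from the Section 2 snippet (the function names are ours), checked against Example 2:

```python
def in_clarke_tangent(d, x_hat, groups):
    # Theorem 2: d in T_S^C(x_hat)  <=>  Gamma(d) is a subset of Gamma(x_hat)
    return set(group_support(d, groups)) <= set(group_support(x_hat, groups))

def in_bouligand_tangent(d, x_hat, groups, k):
    # Theorem 1: d in T_S^B(x_hat)  <=>  |Gamma(x_hat) union Gamma(d)| <= k
    union = set(group_support(x_hat, groups)) | set(group_support(d, groups))
    return len(union) <= k

groups, k = [[0], [1, 2]], 1              # Example 2: groups {x1} and {x2, x3}
x1 = np.array([0.0, 1.0, 1.0])            # x^1 = (0, (1, 1))
d = np.array([0.0, -3.0, 5.0])            # supported on the second group only
print(in_clarke_tangent(d, x1, groups))        # True
print(in_bouligand_tangent(d, x1, groups, k))  # True
e1 = np.array([1.0, 0.0, 0.0])                 # supported on the first group
print(in_bouligand_tangent(e1, x1, groups, k)) # False: two groups needed, k = 1
```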

4. First-Order Optimality Conditions for Problem (1)

The optimality conditions for optimization problems are usually closely related to their stationary points. In this section, we use the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones to describe the N-stationary points and T-stationary points of Problem (1); based on these descriptions, we then investigate the relationship among the stationary points and the relationship between stationary points and local minimizers.
Definition 2.
$x^* \in S$ is called an $N_\diamond$-stationary point or a $T_\diamond$-stationary point of Problem (1) if it satisfies the following conditions, respectively:
(i)
$N_\diamond$-stationary point: $0 \in \nabla f(x^*) + N_S^\diamond(x^*)$;
(ii)
$T_\diamond$-stationary point: $0 = \nabla_S^\diamond f(x^*)$;
where $\diamond \in \{B, C\}$ stands for the sense of Bouligand or Clarke, and
$$\nabla_S^\diamond f(x^*) := \arg\min \left\{ \|d + \nabla f(x^*)\| : d \in T_S^\diamond(x^*) \right\}$$
is the projected gradient onto the Bouligand or Clarke tangent cone.
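For the Clarke case, $T_S^C(x^*)$ is a subspace (Theorem 2), so the projected gradient has a closed form, which is worked out in the proof of Theorem 4 below: it equals $-\nabla f(x^*)$ on the groups in $\Gamma(x^*)$ and zero elsewhere. A minimal sketch, reusing `group_support` from the Section 2 snippet (the function name is ours):

```python
def clarke_projected_gradient(grad, x_star, groups):
    # argmin{ ||d + grad|| : d in T_S^C(x_star) }: since T_S^C(x_star) is the
    # subspace of vectors supported on Gamma(x_star), minimize group by group:
    # d_i = -grad_i for i in Gamma(x_star), and d_i = 0 otherwise.
    d = np.zeros_like(grad)
    for i in group_support(x_star, groups):
        d[groups[i]] = -grad[groups[i]]
    return d
```

$x^*$ is then a $T_C$-stationary point exactly when this vector vanishes, i.e., when $(\nabla f(x^*))_i = 0$ for every $i \in \Gamma(x^*)$.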
Next, we study the link between the $N_B$-stationary points and $T_B$-stationary points of Problem (1).
Theorem 3.
Suppose $x^* \in S$; then the following statements hold for Problem (1):
(i)
If $\|x^*\|_{2,0} = k$, then $x^*$ is an $N_B$-stationary point $\iff$ $x^*$ is a $T_B$-stationary point;
(ii)
If $\|x^*\|_{2,0} < k$, then $x^*$ is an $N_B$-stationary point $\iff$ $\nabla f(x^*) = 0$ $\iff$ $x^*$ is a $T_B$-stationary point.
Proof. 
(i) Let $\|x^*\|_{2,0} = k$.
On one hand, suppose $x^* \in S$ is an $N_B$-stationary point of Problem (1); then
$$0 \in \nabla f(x^*) + N_S^B(x^*),$$
that is, $-\nabla f(x^*) \in N_S^B(x^*)$. By Theorem 1, $N_S^B(x^*) = \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(x^*)\}$, so we have
$$-\nabla f(x^*) \in \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(x^*)\},$$
i.e.,
$$(\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*). \end{cases}$$
It is easy to check that the converse is also true. That is, when $\|x^*\|_{2,0} = k$, it holds that
$$0 \in \nabla f(x^*) + N_S^B(x^*) \iff (\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*). \end{cases} \tag{7}$$
On the other hand, suppose $x^* \in S$ is a $T_B$-stationary point of Problem (1); then
$$0 = \nabla_S^B f(x^*).$$
By Theorem 1, $T_S^B(x^*) = \{d \in \mathbb{R}^n : \|d\|_{2,0} \le k,\ \|x^* + \gamma d\|_{2,0} \le k,\ \forall \gamma \in \mathbb{R}\}$. Hence, in the case of $\|x^*\|_{2,0} = k$, we have
$$d \in T_S^B(x^*) \iff \Gamma(d) \subseteq \Gamma(x^*).$$
Accordingly, we have
$$\nabla_S^B f(x^*) = \arg\min \{\|d + \nabla f(x^*)\| : d \in T_S^B(x^*)\} = \arg\min \{\|d + \nabla f(x^*)\| : \Gamma(d) \subseteq \Gamma(x^*)\}.$$
For $i \notin \Gamma(x^*)$, $d_i = 0$, so $(\nabla_S^B f(x^*))_i = 0$; for $i \in \Gamma(x^*)$, obviously $(\nabla_S^B f(x^*))_i = -(\nabla f(x^*))_i$. Hence we get
$$(\nabla_S^B f(x^*))_i = \begin{cases} 0, & i \notin \Gamma(x^*), \\ -(\nabla f(x^*))_i, & i \in \Gamma(x^*). \end{cases}$$
According to $0 = \nabla_S^B f(x^*)$, we have
$$(\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*). \end{cases}$$
It is easy to check that the converse is also true. That is, in the case of $\|x^*\|_{2,0} = k$, the following equivalence holds:
$$0 = \nabla_S^B f(x^*) \iff (\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*). \end{cases} \tag{8}$$
Combining (7) with (8), we conclude that, when $\|x^*\|_{2,0} = k$, $x^*$ is an $N_B$-stationary point of Problem (1) if and only if it is a $T_B$-stationary point of Problem (1).
(ii) In the case of $\|x^*\|_{2,0} < k$, we first prove the equivalence between $x^*$ being an $N_B$-stationary point of Problem (1) and $\nabla f(x^*) = 0$.
On one hand, suppose $x^* \in S$ is an $N_B$-stationary point of Problem (1); then
$$0 \in \nabla f(x^*) + N_S^B(x^*),$$
that is, $-\nabla f(x^*) \in N_S^B(x^*)$. It follows from $N_S^B(x^*) = \{0\}$ that $\nabla f(x^*) = 0$. Hence the following implication holds:
$$0 \in \nabla f(x^*) + N_S^B(x^*) \implies \nabla f(x^*) = 0. \tag{9}$$
On the other hand, suppose $\nabla f(x^*) = 0$. In the case of $\|x^*\|_{2,0} < k$, by Theorem 1, $N_S^B(x^*) = \{0\}$. Therefore
$$-\nabla f(x^*) = 0 \in N_S^B(x^*),$$
i.e., $0 \in \nabla f(x^*) + N_S^B(x^*)$. Hence the following implication holds:
$$\nabla f(x^*) = 0 \implies 0 \in \nabla f(x^*) + N_S^B(x^*). \tag{10}$$
From (9) and (10), we get the equivalence
$$0 \in \nabla f(x^*) + N_S^B(x^*) \iff \nabla f(x^*) = 0, \tag{11}$$
that is, in the case of $\|x^*\|_{2,0} < k$, $x^*$ is an $N_B$-stationary point if and only if $\nabla f(x^*) = 0$.
In the following, we prove the equivalence between $x^*$ being a $T_B$-stationary point of Problem (1) and $\nabla f(x^*) = 0$ in the case of $\|x^*\|_{2,0} < k$.
Suppose $x^* \in S$ satisfies $\nabla f(x^*) = 0$; then by Theorem 1,
$$\nabla_S^B f(x^*) = \arg\min \{\|d + \nabla f(x^*)\| : d \in T_S^B(x^*)\} = \arg\min \{\|d\| : \|d\|_{2,0} \le k,\ \|x^* + \gamma d\|_{2,0} \le k,\ \forall \gamma \in \mathbb{R}\} = 0.$$
That is,
$$\nabla f(x^*) = 0 \implies 0 = \nabla_S^B f(x^*). \tag{12}$$
Conversely, suppose $x^*$ is a $T_B$-stationary point of Problem (1), i.e.,
$$0 = \nabla_S^B f(x^*);$$
then by Theorem 1,
$$0 = \nabla_S^B f(x^*) = \arg\min \{\|d + \nabla f(x^*)\| : d \in T_S^B(x^*)\} = \arg\min \{\|d + \nabla f(x^*)\| : \|d\|_{2,0} \le k,\ \|x^* + \gamma d\|_{2,0} \le k,\ \forall \gamma \in \mathbb{R}\}.$$
Hence we get that $\|\nabla f(x^*)\| = \|0 + \nabla f(x^*)\| \le \|d + \nabla f(x^*)\|$ for any $d \in \mathbb{R}^n$ satisfying $\|d\|_{2,0} \le k$ and $\|x^* + \gamma d\|_{2,0} \le k$, $\forall \gamma \in \mathbb{R}$.
For any $i_0 \in \{1, 2, \ldots, m\}$, take $\hat{d} \in \mathbb{R}^n$ such that $\Gamma(\hat{d}) = \{i_0\}$ and $\hat{d}_{i_0} = -(\nabla f(x^*))_{i_0}$. It follows from $|\Gamma(x^*)| = \|x^*\|_{2,0} < k$ that
$$\|x^* + \gamma \hat{d}\|_{2,0} \le |\Gamma(x^*) \cup \{i_0\}| \le |\Gamma(x^*)| + 1 \le k.$$
Comparing the $i_0$th groups on both sides of $\|\nabla f(x^*)\| \le \|\hat{d} + \nabla f(x^*)\|$ (the other groups coincide), we obtain $\|(\nabla f(x^*))_{i_0}\| \le \|\hat{d}_{i_0} + (\nabla f(x^*))_{i_0}\| = 0$, and then
$$(\nabla f(x^*))_{i_0} = 0.$$
According to the arbitrariness of $i_0$, we get $\nabla f(x^*) = 0$. That is,
$$0 = \nabla_S^B f(x^*) \implies \nabla f(x^*) = 0. \tag{13}$$
Combining (12) with (13), in the case of $\|x^*\|_{2,0} < k$, the following equivalence holds:
$$0 = \nabla_S^B f(x^*) \iff \nabla f(x^*) = 0.$$
The proof is thus finished. □
Furthermore, the $N_C$-stationary points and $T_C$-stationary points of Problem (1) have the following equivalent relationship.
Theorem 4.
For Problem (1), let $x^* \in S$; then $x^*$ is an $N_C$-stationary point if and only if it is a $T_C$-stationary point.
Proof. 
On one hand, by Theorem 2, $N_S^C(x^*) = \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(x^*)\}$. Then we have the following equivalences:
$$x^* \text{ is an } N_C\text{-stationary point of Problem (1)} \iff 0 \in \nabla f(x^*) + N_S^C(x^*) \iff -\nabla f(x^*) \in N_S^C(x^*) \iff (\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*). \end{cases} \tag{14}$$
On the other hand, by Theorem 2, $T_S^C(x^*) = \{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin \Gamma(x^*)\}$. Then, according to the definition of $\nabla_S^C f(x^*)$, we have
$$\begin{aligned} \nabla_S^C f(x^*) &= \arg\min \{\|d + \nabla f(x^*)\| : d \in T_S^C(x^*)\} \\ &= \arg\min \{\|d + \nabla f(x^*)\| : d_i = 0,\ \forall i \notin \Gamma(x^*)\} \\ &= \arg\min \{\|d + \nabla f(x^*)\|^2 : d_i = 0,\ \forall i \notin \Gamma(x^*)\} \\ &= \arg\min \Big\{ \sum_{i \in \Gamma(x^*)} \|d_i + (\nabla f(x^*))_i\|^2 + \sum_{i \notin \Gamma(x^*)} \|(\nabla f(x^*))_i\|^2 : d_i = 0,\ \forall i \notin \Gamma(x^*) \Big\} \\ &= \arg\min \Big\{ \sum_{i \in \Gamma(x^*)} \|d_i + (\nabla f(x^*))_i\|^2 : d_i \in \mathbb{R}^{n_i},\ i \in \Gamma(x^*);\ d_i = 0,\ \forall i \notin \Gamma(x^*) \Big\}. \end{aligned}$$
Thus, by direct computation, $\nabla_S^C f(x^*)$ satisfies
$$(\nabla_S^C f(x^*))_i = \begin{cases} -(\nabla f(x^*))_i, & i \in \Gamma(x^*), \\ 0, & i \notin \Gamma(x^*). \end{cases}$$
Therefore, the following equivalences hold:
$$x^* \text{ is a } T_C\text{-stationary point of Problem (1)} \iff 0 = \nabla_S^C f(x^*) \iff (\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*). \end{cases} \tag{15}$$
Combining (14) and (15), we get the following chain of equivalences:
$$0 \in \nabla f(x^*) + N_S^C(x^*) \iff (\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*) \end{cases} \iff \nabla_S^C f(x^*) = 0.$$
The proof is thus complete. □
Next, we investigate the relationship among the four types of stationary points of Problem (1).
Theorem 5.
Let $x^* \in S$; then the following statements hold for Problem (1):
(i)
If $x^*$ is an $N_B$-stationary point, then it must be an $N_C$-stationary point;
(ii)
If $x^*$ is a $T_B$-stationary point, then it must be a $T_C$-stationary point.
Proof. 
(i) Let $x^*$ be an $N_B$-stationary point of Problem (1). There are two cases: $\|x^*\|_{2,0} = k$ and $\|x^*\|_{2,0} < k$.
Case 1: $\|x^*\|_{2,0} = k$. In this case, by (7), $x^*$ is an $N_B$-stationary point if and only if
$$(\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*), \end{cases}$$
which, by (14), is equivalent to $x^*$ being an $N_C$-stationary point of Problem (1). Thus $N_B$-stationary points and $N_C$-stationary points are equivalent in the case of $\|x^*\|_{2,0} = k$.
Case 2: $\|x^*\|_{2,0} < k$. By (11), in this case, $x^*$ is an $N_B$-stationary point of Problem (1) if and only if
$$\nabla f(x^*) = 0.$$
By (14), $x^*$ is an $N_C$-stationary point of Problem (1) if and only if
$$(\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*). \end{cases}$$
Clearly, in the case of $\|x^*\|_{2,0} < k$, if $x^*$ is an $N_B$-stationary point of Problem (1), it must be an $N_C$-stationary point (the converse is not true). That is,
$$N_B\text{-stationary point} \implies N_C\text{-stationary point}. \tag{16}$$
(ii) According to Theorems 3 and 4, an $N_B$-stationary point of Problem (1) is equivalent to a $T_B$-stationary point, and an $N_C$-stationary point of Problem (1) is equivalent to a $T_C$-stationary point; that is,
$$N_B\text{-stationary point} \iff T_B\text{-stationary point};$$
$$N_C\text{-stationary point} \iff T_C\text{-stationary point}.$$
Moreover, from (16),
$$N_B\text{-stationary point} \implies N_C\text{-stationary point}.$$
Therefore,
$$T_B\text{-stationary point} \implies T_C\text{-stationary point}.$$
The proof is finished. □
For a clear presentation, based on the proofs of Theorems 3 and 4, Table 1 displays the characterizations of the four types of stationary points of Problem (1).
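The characterizations in Table 1 translate directly into code. Below is a hedged sketch of the corresponding stationarity tests, reusing the helpers from the earlier snippets (the function names and the tolerance are ours):

```python
def is_C_stationary(grad, x_star, groups, tol=1e-10):
    # T_C- and N_C-stationarity coincide (Theorem 4):
    # (grad f(x*))_i = 0 for all i in Gamma(x*)
    return all(np.linalg.norm(grad[groups[i]]) <= tol
               for i in group_support(x_star, groups))

def is_B_stationary(grad, x_star, groups, k, tol=1e-10):
    # T_B- and N_B-stationarity coincide (Theorem 3):
    # ||x*||_{2,0} = k: same condition as the Clarke case;
    # ||x*||_{2,0} < k: the full gradient must vanish.
    if group_l20(x_star, groups) < k:
        return np.linalg.norm(grad) <= tol
    return is_C_stationary(grad, x_star, groups, tol)
```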
At the end of this section, we discuss the relationship between the local minimizers of Problem (1) and its stationary points.
Theorem 6.
Let $x^* \in S$ be a local minimizer of Problem (1); then the following two statements hold:
(i)
$x^*$ is an $N_B$-stationary point and hence an $N_C$-stationary point;
(ii)
$x^*$ is a $T_B$-stationary point and hence a $T_C$-stationary point.
Proof. 
(i) Since $x^*$ is a local minimizer of Problem (1), for any $i \in J$ with $\Gamma(x^*) \subseteq J$ and $|J| = k$, and any $j = 1, \ldots, n_i$, the point $x^* + \alpha e_{ij}$ remains in $S$, and hence
$$f(x^*) \le f(x^* + \alpha e_{ij})$$
for all sufficiently small $|\alpha|$. Consequently, $\alpha = 0$ is a local minimizer of $h_{ij}(\alpha) := f(x^* + \alpha e_{ij})$ for all such $i$ and $j$.
Due to $x^* \in S$, there are two cases: $\|x^*\|_{2,0} < k$ and $\|x^*\|_{2,0} = k$.
Case 1: $\|x^*\|_{2,0} < k$. In this case, $\bigcup_{J \supseteq \Gamma(x^*),\, |J| = k} J = \{1, \ldots, m\}$, so $\alpha = 0$ is a local minimizer of $h_{ij}$ for all $i = 1, \ldots, m$ and $j = 1, \ldots, n_i$. By the first-order optimality condition for these one-dimensional problems, we have
$$(\nabla f(x^*))_{i(j)} = h'_{ij}(0) = 0, \quad i = 1, \ldots, m;\ j = 1, \ldots, n_i.$$
That is, $\nabla f(x^*) = 0$.
Case 2: $\|x^*\|_{2,0} = k$. In this case, the only admissible index set is $J = \Gamma(x^*)$, so $\alpha = 0$ is a local minimizer of $h_{ij}$ for all $i \in \Gamma(x^*)$ and $j = 1, \ldots, n_i$. It can be derived that $(\nabla f(x^*))_{i(j)} = 0$, $\forall i \in \Gamma(x^*)$, $j = 1, \ldots, n_i$; that is, $(\nabla f(x^*))_i = 0$, $\forall i \in \Gamma(x^*)$.
Combining the above two cases with (7) and (11), we know that $x^*$ is an $N_B$-stationary point of Problem (1). From Theorem 5, $x^*$ is also an $N_C$-stationary point of Problem (1).
(ii) From (i), $x^*$ is both an $N_B$-stationary point and an $N_C$-stationary point. According to Theorems 3 and 4, $x^*$ is both a $T_B$-stationary point and a $T_C$-stationary point. The proof is complete. □
As a summary of this section, the relationship among local minimizers and the four types of stationary points of Problem (1) is as follows:
$$\text{local minimizer} \implies N_B\text{-stationary point} \iff T_B\text{-stationary point} \implies N_C\text{-stationary point} \iff T_C\text{-stationary point}.$$
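To see that the middle implication in the above chain is strict, consider a hypothetical quadratic $f(x) = \frac{1}{2}\|x - c\|^2$ with $c = (1, 0, 3)$, three singleton groups and $k = 2$ (our own illustrative data, not from the paper). The point $x^* = (1, 0, 0)$ satisfies $\|x^*\|_{2,0} = 1 < k$, and $\nabla f(x^*) = x^* - c$ vanishes on $\Gamma(x^*) = \{1\}$ (the first group) but not globally, so $x^*$ is $N_C$-stationary (hence $T_C$-stationary) without being $N_B$-stationary:

```python
groups, k = [[0], [1], [2]], 2
c = np.array([1.0, 0.0, 3.0])
x_star = np.array([1.0, 0.0, 0.0])   # ||x*||_{2,0} = 1 < k
grad = x_star - c                    # gradient of f(x) = 0.5*||x - c||^2: (0, 0, -3)
print(is_C_stationary(grad, x_star, groups))     # True
print(is_B_stationary(grad, x_star, groups, k))  # False: full gradient is nonzero
```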

5. Second-Order Optimality Conditions for Problem (1)

In this section, we provide some second-order necessary and sufficient optimality conditions for Problem (1) by use of the Clarke tangent cone.
Theorem 7
(Second-order necessary condition). Let $x^* \in S$ be a local minimizer of Problem (1); then for any $d \in T_S^C(x^*)$, it must hold that $d^\top \nabla f(x^*) = 0$ and
$$d^\top \nabla^2 f(x^*) d \ge 0,$$
where $\nabla^2 f(x^*)$ is the Hessian matrix of $f$ at $x^*$.
Proof. 
Since $x^* \in S$ is a local minimizer of Problem (1), by Theorem 6, $x^*$ is also an $N_C$-stationary point. By (14),
$$(\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*). \end{cases}$$
According to (5), for any $d \in T_S^C(x^*)$,
$$d_i = 0, \quad \forall i \notin \Gamma(x^*).$$
Thus, for any $d \in T_S^C(x^*)$, it holds that
$$d^\top \nabla f(x^*) = 0. \tag{17}$$
In addition, since $\Gamma(x^* + \alpha d) \subseteq \Gamma(x^*)$ implies $x^* + \alpha d \in S$, and $x^*$ is a local minimizer of Problem (1), for sufficiently small $\alpha > 0$ and any $d \in T_S^C(x^*)$ we have
$$f(x^*) \le f(x^* + \alpha d). \tag{18}$$
By Taylor's Theorem,
$$f(x^* + \alpha d) = f(x^*) + \alpha d^\top \nabla f(x^*) + \frac{1}{2} \alpha^2 d^\top \nabla^2 f(x^*) d + o(\alpha^2). \tag{19}$$
Combining (17)–(19), we obtain
$$f(x^*) \le f(x^*) + \alpha d^\top \nabla f(x^*) + \frac{1}{2} \alpha^2 d^\top \nabla^2 f(x^*) d + o(\alpha^2) = f(x^*) + \frac{1}{2} \alpha^2 d^\top \nabla^2 f(x^*) d + o(\alpha^2).$$
Hence,
$$0 \le \frac{1}{2} \alpha^2 d^\top \nabla^2 f(x^*) d + o(\alpha^2),$$
which implies $d^\top \nabla^2 f(x^*) d \ge 0$, $\forall d \in T_S^C(x^*)$. The desired result is derived. □
Finally, we give a second-order sufficient condition for the optimality of Problem (1).
Theorem 8
(Second-order sufficient condition). Let $x^* \in S$ be an $N_C$-stationary point of Problem (1). If $d^\top \nabla^2 f(x^*) d > 0$ for any $d \in T_S^C(x^*) \setminus \{0\}$, then the following two statements hold:
(i)
$x^*$ is a strict local minimizer of Problem (1) on $\mathbb{R}^n_{\Gamma(x^*)}$;
(ii)
$x^*$ satisfies the second-order growth condition; that is, there exist $\omega > 0$ and $\delta > 0$ such that for any $x \in B(x^*, \delta) \cap \mathbb{R}^n_{\Gamma(x^*)}$,
$$f(x) \ge f(x^*) + \omega \|x - x^*\|^2,$$
where $\mathbb{R}^n_{\Gamma(x^*)} = \mathrm{span}\{e_{ij},\ i \in \Gamma(x^*),\ j = 1, \ldots, n_i\}$.
Proof. 
(i) Since $x^*$ is an $N_C$-stationary point of Problem (1), by (14) we have
$$(\nabla f(x^*))_i \begin{cases} = 0, & i \in \Gamma(x^*), \\ \in \mathbb{R}^{n_i}, & i \notin \Gamma(x^*). \end{cases}$$
For any $d \in T_S^C(x^*)$, by (5),
$$d_i = 0, \quad \forall i \notin \Gamma(x^*).$$
Then for any $d \in T_S^C(x^*) \setminus \{0\}$, it holds that
$$d^\top \nabla f(x^*) = 0.$$
By Taylor's Theorem, for any sufficiently small $\alpha > 0$,
$$f(x^* + \alpha d) = f(x^*) + \alpha d^\top \nabla f(x^*) + \frac{1}{2} \alpha^2 d^\top \nabla^2 f(x^*) d + o(\alpha^2) = f(x^*) + \frac{1}{2} \alpha^2 d^\top \nabla^2 f(x^*) d + o(\alpha^2).$$
Since $d^\top \nabla^2 f(x^*) d > 0$, $\forall d \in T_S^C(x^*) \setminus \{0\}$, for any sufficiently small $\alpha > 0$,
$$f(x^* + \alpha d) = f(x^*) + \frac{1}{2} \alpha^2 d^\top \nabla^2 f(x^*) d + o(\alpha^2) > f(x^*).$$
Therefore, $x^*$ is a strict local minimizer of Problem (1) on $\mathbb{R}^n_{\Gamma(x^*)}$.
(ii) Assume, on the contrary, that the second-order growth condition does not hold at $x^*$; then there is a sequence $\{x^t\}_{t \in \mathbb{N}} \subseteq \mathbb{R}^n_{\Gamma(x^*)}$ such that $x^t \to x^*$ but
$$f(x^t) < f(x^*) + \frac{1}{t} \|x^t - x^*\|^2.$$
Let $z^t = \frac{x^t - x^*}{\|x^t - x^*\|}$; then $\|z^t\| = 1$. Since $\{z^t\}_{t \in \mathbb{N}}$ is bounded, without loss of generality suppose $z^t \to z$; then $\|z\| = 1$.
It follows from $x^t \in \mathbb{R}^n_{\Gamma(x^*)}$ that $\Gamma(x^t) \subseteq \Gamma(x^*)$. Due to $\lim_{t \to \infty} x^t = x^*$, we have $\Gamma(x^*) \subseteq \Gamma(x^t)$, and then
$$\Gamma(x^t) = \Gamma(x^*)$$
for any sufficiently large $t$. From $z^t = \frac{x^t - x^*}{\|x^t - x^*\|}$, we get
$$\Gamma(z^t) \subseteq \Gamma(x^t) = \Gamma(x^*) \quad \text{and} \quad z^t \in \mathbb{R}^n_{\Gamma(x^*)} \setminus \{0\}.$$
Moreover, from $\lim_{t \to \infty} z^t = z$, it follows that
$$\Gamma(z) \subseteq \Gamma(z^t) \subseteq \Gamma(x^*) \quad \text{and} \quad z \in \mathbb{R}^n_{\Gamma(x^*)} \setminus \{0\}$$
for any sufficiently large $t$. According to (6), it holds that
$$\mathbb{R}^n_{\Gamma(x^*)} = \mathrm{span}\{e_{ij},\ i \in \Gamma(x^*),\ j = 1, \ldots, n_i\} = T_S^C(x^*).$$
Hence, for any $z^t \in \mathbb{R}^n_{\Gamma(x^*)} \setminus \{0\}$, we have $z^t \in T_S^C(x^*) \setminus \{0\}$, which together with (5) and the $N_C$-stationarity of $x^*$ yields
$$(z^t)^\top \nabla f(x^*) = 0.$$
By Taylor's Theorem,
$$f(x^t) - f(x^*) = (x^t - x^*)^\top \nabla f(x^*) + \frac{1}{2} (x^t - x^*)^\top \nabla^2 f(x^*) (x^t - x^*) + o(\|x^t - x^*\|^2).$$
Since $\frac{(x^t - x^*)^\top}{\|x^t - x^*\|} \nabla f(x^*) = (z^t)^\top \nabla f(x^*) = 0$, we have
$$\frac{f(x^t) - f(x^*)}{\|x^t - x^*\|^2} = \frac{1}{\|x^t - x^*\|^2} \left( (x^t - x^*)^\top \nabla f(x^*) + \frac{1}{2} (x^t - x^*)^\top \nabla^2 f(x^*) (x^t - x^*) + o(\|x^t - x^*\|^2) \right) = \frac{1}{2} (z^t)^\top \nabla^2 f(x^*) z^t + o(1).$$
Under the assumption that $f(x^t) < f(x^*) + \frac{1}{t} \|x^t - x^*\|^2$, we obtain
$$\frac{1}{t} > \frac{f(x^t) - f(x^*)}{\|x^t - x^*\|^2} = \frac{1}{2} (z^t)^\top \nabla^2 f(x^*) z^t + o(1).$$
Letting $t \to \infty$, we get
$$z^\top \nabla^2 f(x^*) z \le 0, \quad \text{where } z \in \mathbb{R}^n_{\Gamma(x^*)} \setminus \{0\} = T_S^C(x^*) \setminus \{0\},$$
which contradicts the condition that $d^\top \nabla^2 f(x^*) d > 0$ holds for any $d \in T_S^C(x^*) \setminus \{0\}$. Therefore, the second-order growth condition must hold at $x^*$. □
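Since $T_S^C(x^*)$ is the subspace of vectors supported on $\Gamma(x^*)$, the condition $d^\top \nabla^2 f(x^*) d > 0$ for all $d \in T_S^C(x^*) \setminus \{0\}$ amounts to positive definiteness of the principal block of the Hessian indexed by the coordinates of the groups in $\Gamma(x^*)$. A hedged numerical sketch of this check, reusing `group_support` from the Section 2 snippet (the function name and tolerance are ours):

```python
def second_order_sufficient(hess, x_star, groups, tol=1e-12):
    # d^T hess d > 0 for all d in T_S^C(x*)\{0} reduces to positive
    # definiteness of the Hessian block on the coordinates of Gamma(x*).
    idx = [j for i in group_support(x_star, groups) for j in groups[i]]
    if not idx:
        return True  # Gamma(x*) empty: T_S^C(x*) = {0}, condition holds vacuously
    block = hess[np.ix_(idx, idx)]
    return np.linalg.eigvalsh(block).min() > tol
```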

6. Concluding Remarks

In this paper, first-order optimality conditions are established for group sparse constrained optimization problems by use of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones, and the relationship among local minimizers and the four types of stationary points of Problem (1) is investigated. Furthermore, second-order necessary and sufficient optimality conditions for group sparse constrained optimization problems are provided. The results show that $N_C$-stationary points of Problem (1) may be strict local minimizers, and can even fulfill the second-order growth condition under some mild conditions. These results provide a theoretical basis for analyzing and solving group sparse constrained optimization problems. In the future, we will use the optimality conditions to design algorithms for solving such problems.

Author Contributions

Methodology, D.P.; Project administration, D.P.; Supervision, D.P.; Writing-original draft, W.W.; Writing-review and editing, D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by NSFC (11861020), the Growth Project of Education Department of Guizhou Province for Young Talents in Science and Technology ([2018]121), the Foundation for Selected Excellent Project of Guizhou Province for High-level Talents Back from Overseas ([2018]03), and the Science and Technology Planning Project of Guizhou Province ([2018]5781).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 2006, 68, 49–67.
2. Huang, J.; Breheny, P.; Ma, S. A selective review of group selection in high-dimensional models. Stat. Sci. 2012, 27, 481–499.
3. Huang, J.; Ma, S.; Xue, H.; Zhang, C.H. A group bridge approach for variable selection. Biometrika 2009, 96, 339–355.
4. Meier, L.; van de Geer, S.; Bühlmann, P. The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B 2008, 70, 53–71.
5. Yang, Y.; Zou, H. A fast unified algorithm for solving group-lasso penalized learning problems. Stat. Comput. 2015, 25, 1129–1141.
6. Beck, A.; Hallak, N. Optimization involving group sparsity terms. Math. Program. 2018, 178, 39–67.
7. Hu, Y.; Li, C.; Meng, K.; Qin, J.; Yang, X. Group sparse optimization via $\ell_{p,q}$ regularization. J. Mach. Learn. Res. 2017, 18, 1–52.
8. Jiao, Y.; Jin, B.; Lu, X. Group sparse recovery via the $\ell_0(\ell_2)$ penalty: Theory and algorithm. IEEE Trans. Signal Process. 2017, 65, 998–1012.
9. Huang, J.; Zhang, T. The benefit of group sparsity. Ann. Stat. 2010, 38, 1978–2004.
10. Agarwal, A.; Negahban, S.; Wainwright, M.J. Fast global convergence rates of gradient methods for high-dimensional statistical recovery. Int. Conf. Neural Inf. Process. Syst. 2010, 23, 37–45.
11. Attouch, H.; Bolte, J.; Svaiter, B.F. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 2013, 137, 91–129.
12. Beck, A.; Eldar, Y. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM J. Optim. 2013, 23, 1480–1509.
13. Calamai, P.H.; Moré, J.J. Projected gradient methods for linearly constrained problems. Math. Program. 1987, 39, 93–116.
14. Pan, L.L.; Xiu, N.H.; Zhou, S.L. On solutions of sparsity constrained optimization. J. Oper. Res. Soc. China 2015, 3, 421–439.
15. Chen, X.J.; Pan, L.L.; Xiu, N.H. Solution sets of three sparse optimization problems for multivariate regression. Appl. Comput. Harmon. Anal. 2020, revised.
16. Bian, W.; Chen, X.J. A smoothing proximal gradient algorithm for nonsmooth convex regression with cardinality penalty. SIAM J. Numer. Anal. 2020, 58, 858–883.
17. Peng, D.T.; Chen, X.J. Computation of second-order directional stationary points for group sparse optimization. Optim. Methods Softw. 2020, 35, 348–376.
18. Pan, L.L.; Chen, X.J. Group sparse optimization for images recovery using capped folded concave functions. SIAM J. Imaging Sci. 2021. Available online: https://www.polyu.edu.hk/ama/staff/xjchen/Re_gsparseAugust.pdf (accessed on 5 November 2020).
19. Rockafellar, R.T.; Wets, R.J. Variational Analysis; Springer: Berlin/Heidelberg, Germany, 2009.
Figure 1. Bouligand tangent cones and Clarke tangent cones of $S$ in $\mathbb{R}^3$, where $S = \{x \in \mathbb{R}^3 : \|x\|_{2,0} \le 1\}$, $x^1 = (0, (1, 1))$, $x^2 = (0, (1, 0))$ and $x^3 = (1, (0, 0))$.
Table 1. The characterizations of $T_B$-, $N_B$-, $T_C$-, $N_C$-stationary points for Problem (1).

| Stationary Point | $\Vert x^* \Vert_{2,0} = k$ | $\Vert x^* \Vert_{2,0} < k$ |
| --- | --- | --- |
| $T_B$-stationary point | $(\nabla f(x^*))_i = 0$, $\forall i \in \Gamma(x^*)$; $\in \mathbb{R}^{n_i}$ otherwise | $\nabla f(x^*) = 0$ |
| $N_B$-stationary point | $(\nabla f(x^*))_i = 0$, $\forall i \in \Gamma(x^*)$; $\in \mathbb{R}^{n_i}$ otherwise | $\nabla f(x^*) = 0$ |
| $T_C$-stationary point | $(\nabla f(x^*))_i = 0$, $\forall i \in \Gamma(x^*)$; $\in \mathbb{R}^{n_i}$ otherwise | $(\nabla f(x^*))_i = 0$, $\forall i \in \Gamma(x^*)$; $\in \mathbb{R}^{n_i}$ otherwise |
| $N_C$-stationary point | $(\nabla f(x^*))_i = 0$, $\forall i \in \Gamma(x^*)$; $\in \mathbb{R}^{n_i}$ otherwise | $(\nabla f(x^*))_i = 0$, $\forall i \in \Gamma(x^*)$; $\in \mathbb{R}^{n_i}$ otherwise |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
