Next Article in Journal
A Simplified Controller Design for Fixed/Preassigned-Time Synchronization of Stochastic Discontinuous Neural Networks
Previous Article in Journal
Multiplex Social Network Analysis to Understand the Social Engagement of Patients in Online Health Communities
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

A Customized ADMM Approach for Large-Scale Nonconvex Semidefinite Programming

by
Chuangchuang Sun
Mississippi State University, Starkville, MS 39762, USA
Current address: 75 B. S. Hood Rd, Mississippi State, MS 39762, USA.
Mathematics 2023, 11(21), 4413; https://doi.org/10.3390/math11214413
Submission received: 23 September 2023 / Revised: 16 October 2023 / Accepted: 16 October 2023 / Published: 24 October 2023
(This article belongs to the Section Computational and Applied Mathematics)

Abstract

:
We investigate a class of challenging general semidefinite programming problems with extra nonconvex constraints such as matrix rank constraints. This problem has extensive applications, including combinatorial graph problems, such as MAX-CUT and community detection, reformulated as quadratic objectives over nonconvex constraints. A customized approach based on the alternating direction method of multipliers (ADMM) is proposed to solve the general large-scale nonconvex semidefinite programming efficiently. We propose two reformulations: one using vector variables and constraints, and the other further reformulating the Burer–Monteiro form. Both formulations admit simple subproblems and can lead to significant improvement in scalability. Despite the nonconvex constraint, we prove that the ADMM iterates converge to a stationary point in both formulations, under mild assumptions. Additionally, recent work suggests that in this matrix form, when the matrix factors are wide enough, the local optimum with high probability is also the global optimum. To demonstrate the scalability of our algorithm, we include results for MAX-CUT, community detection, and image segmentation.

1. Introduction

We consider rank-constrained semidefinite optimization problems (SDPs) of the type:
min Z , X f ( Z ) , s . t . A ( Z ) = b , Z = X X T , X C ,
where the matrix variable Z S + n is an n × n symmetric semidefinite matrix, and X R n × r a low-rank symmetric factor. The linear constraints A ( Z ) = b constrain either the diagonal or trace of Z, and the set C controls desirable features of the factor—e.g., nonnegativity, integer, norm-1, etc. ( C may be nonconvex.) The objective function f ( x ) is convex, differentiable everywhere, with L f -Lipschitz gradient, but the overall problem (1) is nonconvex.
This problem is equivalent to many important nonconvex SDPs, such as the MAX-CUT problem and its related applications [1,2,3], rank-constrained nonnegative matrix factorization problem [4,5], and constrained eigenvalue problems [6,7,8]. It is known that exactly solving (1) globally is in general a very difficult problem, as it includes many NP-hard problems. Methods for heuristically solving (1) fall in three categories: (i) solving the convexified SDP, where (1) does not have the rank-r or X C constraint, using any convex optimization method [9,10,11], (ii) approximately solving (1) using an alternating minimization method [12,13] and relying on statistical arguments suggesting that the acquired local optimal = the global optimal [13], or (iii) using other application-specific approaches [2,14]. The methods investigated in this paper fall in the second category. Specifically, we investigate solving (1) using ADMM and linearized ADMM on two reformulations. We find that these flexible reformulations allow easy incorporation of low-rank and sparse structures, making the resulting algorithm extremely scalable in both memory and computation, which we demonstrate on a number of popular applications.
However, often nonconvex formulations of SDPs are not favored because the convergence behavior of standard algorithms is not well understood. Specifically, an iterative procedure can do one of four things: diverge, oscillate within a bounded interval, converge to an arbitrary point, or converge to a useful point. We show that linearized ADMM on a nonsymmetric reformulation of (1) can either converge to a stationary point or diverge to ± ; it cannot oscillate or converge to a non-stationary point. Additionally, for the case without linear constraints, vanilla ADMM is guaranteed to converge to a stationary point with a monotonically decreasing augmented Lagrangian term, and at a linear rate if the objective is strongly convex.

2. Applications

It is well known that many convex optimization problems can be reformulated as SDPs (e.g., [15]). In nonconvex optimization, SDPs are studied in several key areas, as tight convex relaxations of otherwise NP-hard problems.

2.1. Combinatorial Problems

A simple reparameterization of the constraint x R n , x i { 1 , 1 } is as X = x x T , diag ( X ) = 1 . This property has been heavily exploited for finding lower bounds in combinatorial optimization [9,16,17] and generalized further to polynomial optimization [18,19]. Of high interest is the MAX-CUT problem:
min x R n x T C x , s . t . x i { 1 , 1 } , i = 1 , , n ,
where C = ( A diag ( A 1 ) ) / 4 and A S n is the symmetric adjacency matrix of an undirected graph. Written in this way, the solution to (1) is exactly the maximum cut of an undirected graph with nonnegative weights A i j .
This seemingly simple framework appears in many other applications, such as community detection [20] and image segmentation [21], and is equivalent to the nonconvex SDP:
min Z Tr ( C Z ) , s . t . Z k k = 1 , Z 0 , rank ( Z ) = 1 .
Lifting x R n to a skinny matrix X R n × k generalizes this technique to partitioning [22] and graph coloring problems [23].

2.1.1. Related Works on MAX-CUT

More generally, combinatorial methods can be solved using branch-and-bound schemes, using a linear relaxation of (1) as a bound [24,25], where the binary constraint x { 1 , 1 } is relaxed to 0 ( x + 1 ) / 2 1 . Historically, these “polyhedral methods” were the main approach to finding exact solutions to the MAX-CUT problem. Though this is an NP(non-deterministic polynomial-time)-hard problem, if the graph is sparse enough, branch-and-bound converges quickly even for very large graphs [25]. However, when the graph is not very sparse, the linear relaxation is loose, and finding efficient branching mechanisms is challenging, causing the algorithm to run slowly. The MAX-CUT problem can also be approximated by one pass of the linear relaxation (with bound f relax f exact 2 × # edges) [26].
A tighter approximation can be found with the semidefinite relaxation, which is also used for better bounding in branch-and-bound techniques [27,28,29,30,31]. In particular, the rounding algorithm of [9] returns a feasible x ^ given optimal Z, and is shown in expectation to satisfy x T C x x ^ T C x ^ 0.878 . For this reason, the semidefinite relaxation for problems of type (1) is heavily studied (e.g., [11,32,33]).

2.1.2. Specialization to Community Detection

A small modification of the matrix C generalizes problems of form (2) and (3) to community detection in machine learning. Here, the problem is to identify node clusters in undirected graphs that are more likely to be connected with each other than with nodes outside the cluster. This prediction is useful in many graphical settings, such as interpreting online communities through social networks or linking behavior [34], interpreting biological ecosystems [35], finding disease sources in epidemiology [36], and many more. There are many varieties and methodologies in this field, and it would be impossible to list them all, though many comprehensive overviews exist (e.g., [2]).
The stochastic binary model [37] is one of the simplest generative models for this application. Given a graph with n nodes and parameters 0 < q < p < 1 , the model partitions the nodes into two communities and generates an edge between nodes in a community with probability p and nodes in two different communities with probability q. Following the analysis in [20], we can define C = p + q 2 1 1 T A , where A is the graph adjacency matrix, and the solution to (1) gives a solution to the community detection problem with sharp recovery guarantees.

2.2. Nonnegative Factorization

For a symmetric matrix C, the maximum eigenvalue/eigenvector pair of C is the solution to the nonconvex optimization problem:
max x R n x T C x , s . t . x 2 = 1 .
By inverting the sign of C, we can transform this into a minimization problem or equivalently acquire the minimum eigenvalue/eigenvector pair. Interestingly, despite the nonconvex nature of (4), we have many efficient globally optimal methods for finding x, e.g., Lanczos, Arnoldi, etc. However, adding any additional constraints, such as nonnegativity of x [38], and simple methods generally do not work without heavy data assumptions [39]. This is of interest in problems such as phase retrieval, recommender systems with positive-only observations, clustering and topic models, etc. Here, we discuss three variations of the nonnegative factorization problem appearing in the literature, all of which are special instances of (1).

2.2.1. Optimization over Spectrahedron

We can frame (4) as a linear objective over the spectrahedron:
min Z S n Tr ( C Z ) , s . t . Tr ( Z ) = 1 , Z 0 .
If additionally the maximum eigenvalue of C is isolated (corresponding only to one leading eigenvector), then Z = x x T and C x = λ max ( C ) x . To see this, by definition,
λ max ( C ) = max x : x 2 = 1 x T C x = max Z : Z = x x T , x 2 = 1 Tr ( C Z ) = max Z : Tr ( Z ) = 1 , X 0 Tr ( C Z ) .
As a consequence, note that though (5) is convex, the solution Z * will always have rank 1 when λ max ( C ) has multiplicity 1. A simple extension of (5) often used in nonnegative PCA [40] is:
min Z S n , x R n Tr ( C Z ) , s . t . Tr ( Z ) = 1 , Z 0 , Z = x x T , x 0 ,
which is an instance of (1) with C the nonnegative orthant.

2.2.2. Factorization with Partial Observations

An equivalent way of formulating the top-k nonnegative-eigenvector problem is as the nonnegative minimizer X to X X T C 2 where X is R n × k . However, in many applications, we may not have full view of the matrix C, (e.g., C is a rating matrix). Suppose that an index set Ω defines the observed entries, e.g., { i , j } Ω implies that C i j is known. Then, the nonnegative factorization problem can be written as:
min Z S n , x R n i , j Ω ( Z i j C i j ) 2 , s . t . Z = x x T , x 0 .
This formulation exists in [41].

2.2.3. Projective Nonnegative Matrix Factorization

A third method toward this goal is to optimize over the low-rank projection matrix itself [42], a variant of nonnegative matrix factorization, solving:
min Z S n , X R n × k B Z B 2 , s . t . Z = X X T , X 0 ,
Here, the data matrix may not even be symmetric, but 1 Tr ( Z ) Z B will approximate the projection of B to its top-k singular vectors.

3. Related Work

3.1. Convex Relaxations

If r = n and C = S n , then (1) is a convex problem, and can be solved using many conventional methods with strong convergence guarantees. However, even in this case, if n is large, traditional semidefinite solvers are computationally limiting. In the most general case, an interior point method solves at each iteration a KKT system of at least order n 6 , and most first-order methods for general SDPs require eigenvalue decompositions, which are of order O ( n 3 ) per iteration.

3.2. Low-Rank Convex Cases

In fact, assuming low-rank solutions often allows for the construction of faster SDP methods. In [43], it is noted that the rank of the primal PSD matrix variable is equal to the multiplicity of the matrix variable arising from the gauge dual formulation, and finding only those r corresponding eigenvectors can recover the primal solution. In [10], a similar observation is made of the Lagrange dual variable and thus the dual problem can be solved via a modified bundle method. More generally, the recently popularized conditional gradient algorithm (also called the Frank–Wolfe algorithm) efficiently solves norm-constrained problems for nonsymmetric matrices [44], exploiting the fact that the dual norm minimizer can be computed efficiently; see also [45,46,47].

3.3. Nonconvex Cases

In close connection with these observations, [12,48] proposed simply reformulating semidefinite matrix variables Z = X X T , solving the “standard” nonconvex SDP:
min X R n × r C , X X T , s . t . A ( X X T ) = b ,
by sequentially optimizing the Lagrangian. However, solving (1) is still numerically burdensome; in the augmented Lagrangian term, the objective is quartic in R, and is usually solved using an iterative numerical method, such as L-BFGS.

3.4. Global Optimality of a Nonconvex Problem with Linear Objective

The main motivation behind solving rank-constrained problems using convex optimization methods comes from key results in  [49,50] which show that for a linear SDP, when X * is the optimum and r = rank ( X * ) , then r ( r + 1 ) 2 m where m is the number of linear constraints. Furthermore, a recent work [13] shows that almost all local optima of FSDP are also global optima, suggesting that any stationary point of the FSDP is also a reasonable approximation of (1), if the constraint space of (10) is compact and sufficiently smooth, e.g., A i Y linearly independent whenever A i , Y Y T = b i for all i = 1 , , m . The MAX-CUT problem satisfies this constraint; an example of a linear SDP without this condition is the phase retrieval problem [51], when m > n .

3.5. Nonconvex Constraint C

Although there are many cases where the linear constraint in (1) serves a distinct purpose, largely it is introduced to tighten the convex relaxation. When working in the nonconvex formulation, for many applications, the linear constraint becomes superfluous, and a more useful reformulation may be:
min x , y g ( x ) , s . t . x = y , y C ,
for some nonconvex set C (e.g., C = { 1 , 1 } n ). Note that the projection on C is extremely easy, despite its nonconvexity. Although less explored, this idea is not new; see [52] (chapter 9).

3.6. ADMM for Nonconvex Problems

The alternating direction method of multipliers (ADMM) [53,54] is now a popular method [52] for convex large-scale distributed optimization problems with understood convergence rates [55] and variations [56,57,58]. It is closely related to dual decomposition methods, but alternates its subproblems, and makes use of augmented Lagrangians, which smooths the subproblems and reduces the influence of the dual ascent step size. Although there are extensions to many variable blocks, most ADMM implementations use two variable block decompositions, solving:
min x g ( x ) + h ( y ) , s . t . A x = B y ,
by alternatingly minimizing over each variable in the augmented Lagrangian:
L ρ ( x , y ; u ) = g ( x ) + h ( y ) + u T ( A x B y ) + ρ 2 A x B y 2 2 ,
and then incrementally updating the dual variable:
x + = arg min x L ρ ( x , y ; u ) , y + = arg min x L ρ ( x + , y ; u ) , u + = u + ρ ( A x + B y + ) .
Here, any ρ > 0 will achieve convergence.
In general, there is a lack of theoretical justification for ADMM on nonconvex problems despite its good numerical performance. Almost all works concerning ADMM on nonconvex problems investigate when nonconvexity is in the objective functions ([59,60,61,62,63], and also [64,65] for matrix factorization). Under a variety of assumptions (e.g., convergence or boundedness of dual objectives) they are shown to converge to a KKT stationary point.
In comparison, relatively fewer works deal with nonconvex constraints. Ref. [66] tackles polynomial optimization problems by minimizing a general objective over a spherical constraint x 2 = 1 , Ref. [67] solves general QCQPs, and Ref. [68] solves the low-rank-plus-sparse matrix separation problem. In all cases, they show that all limit points are also KKT stationary points, but do not show that their algorithms will actually converge to the limit points. In this work, we investigate a class of nonconvex constrained problems and show with much milder assumptions that the sequence always converges to a KKT stationary point.
We now present our main results, the algorithms, and convergence analysis for different formulations.

4. Linearized ADMM on Full SDP

We first investigate a reformulation of (1) as:
min Z , X , Y f ( Z ) + δ { 0 } ( A ( Z ) b ) + δ C ( Y ) , s . t . Z = ( X Y T ) Ω , X = Y ,
with variables Z S n × n , X R n × r , and Y R n × r . The affine and C constraints are lifted to the objective via an indicator function:
δ C ( x ) = 0 if x C , else .
The notation A Ω for a symmetric matrix A is the projection of A on the sparsity pattern  Ω :
( A Ω ) i j = A i j , if { i , j } Ω 0 , else ,
and we write A S Ω n if A Ω = A . Specifically, Ω captures the effective sparsity of the problem; that is, f ( Z ) = f ( Z Ω ) and A ( Z ) = A ( Z Ω ) . We assume { i , i } Ω for all i, so the second is trivially true.

4.1. Duality

As shown in [69], a notion of a dual problem can be established via the augmented Lagrangian of (11):
L ρ ( Z , X , Y ; S , U ) = f ( Z ) + δ C ( Y ) + U , X Y + S , Z X Y T + ρ 2 X Y F 2 + ρ 2 Z X Y T F 2 ,
where the dual problem is:
max S , U min Z , X , Y L ρ ( Z , X , Y ; S , U ) .
The minimization of L ρ over Z and X is the solution to:
f ( Z ) A * ( ν ) + S + ρ ( Z X Y T ) = 0 U S Y + ρ ( X Y T Y Z Y ) + ρ ( X Y ) = 0 A ( Z ) = b ,
where ν > 0 is a Lagrange dual variable for the local constraint A ( Z ) = b . The minimization of L ρ over Y is the solution to the generalized projection problem:
min Y C Y Y ^ , Y Y ^ H = Tr ( ( Y Y ^ ) H ( Y Y ^ ) T ) ,
where: Y ^ = U + S X + ρ ( X + Z T X ) , H = ρ ( I + X T X ) . For general nonconvex problems, it is difficult to guarantee global minimality. Here, we introduce two sought-after properties that are more reasonably attainable.
 Definition 1 
([70]). The tangent cone of a nonconvex set C at x is given by T C ( x ) = { d : for all t 0 , x ^ x , x ^ C , there exists d ^ d , x ^ + t d ^ C } . The normal cone of C at x ( : = N C ( x ) ) is the polar of the tangent cone.
 Definition 2. 
For a minimization of a smooth constrained function min x C f ( x ) we say that x * is a KKT-stationary point if f ( x * ) N C ( x * ) .
 Definition 3. 
For a function defined over M variables L ( X 1 , , X m ) , we say that X 1 * , , X m * are (block) coordinatewise minimum points if for each k = 1 , , m ,
X k * = argmin X L ( X 1 * , , X k 1 * , X , X k + 1 * , , X m * ) .
Note that it is not always the case that stationarity is stronger than coordinatewise minimum. A simple example is C = { 1 , 1 } n . Then, for all points x C , the tangent cone is { 0 } and the normal cone is R n . Then, every point in C is stationary, no matter the objective function.
 Proposition 1. 
If Algorithm 1 converges to coordinatewise minimum points ( ( X , Z ) * , Y * , S * , U * ) , then the primal points (i) satisfy (13) for some choice of ν 0 , (ii) minimize (14), (iii) and are primal-feasible, e.g., X * = Y * and ( X * ( Y T ) * ) Ω = Z * . Furthermore, ( X * , Y * , Z * , S * , U * ) are stationary points of (12).
 Proof. 
It is clear that the convergent points of Algorithm 1 exactly satisfy the three conditions. To show that these points are stationary, note that the augmented Lagrangian is convex with respect to X , Z jointly, and is a projection on a compact set C with respect to Y. Therefore:
X , Z , S , U L ρ ( Z * , X * , Y * ; S * , U * ) = 0 , Y L ¯ ρ ( Z * , X * , Y * ; S * , U * ) N C ( Y * ) ,
where L ¯ ρ ( Z , X , Y ; S , U ) = U , Y S , X Y T + ρ 2 X Y F 2 + ρ 2 Z X Y T F 2 , with all the differentiable terms of L ρ involving Y.    □

4.2. Linearized ADMM

We propose to solve (11) via the linearized ADMM, e.g., where at each iteration, the objective is replaced by its current linearization:
f ( Z ) f ^ k ( Z ) : = f ( Z k 1 ) + f ( Z k 1 ) , Z Z k 1 .
We then build the linearized augmented Lagrangian function as:
L ^ k ( Z , X , Y ; S , U ) = g k ( X , Z ) + h ( Y ) + U , X Y + S , Z X Y T + ρ 2 X Y F 2 + ρ 2 Z X Y T F 2
where g k ( X , Z ) = f ^ k ( Z ) + δ { 0 } ( A ( Z ) b ) , h ( Y ) = δ C ( Y ) and S R n × n and U R n × r are the dual variables corresponding to the two coupling constraints. The full algorithm is given in Algorithm 1.
Algorithm 1 ADMM for solving (11)
1:
Inputs:  ρ 0 > 0 , α > 1 , tol ϵ > 0
2:
Initialize:  Z 0 , X 0 ; S 0 , U 0 as random matrices
3:
Outputs: Z, X = Y
4:
for  k = 1 do
5:
    Update Y k + 1 the solution of:
min Y R n × k Z k X k Y T + S k ρ k F 2 + X k Y + U k ρ k F 2 , s . t . Y C
6:
    Update ( Z , X ) k + 1 as the solutions of:
min X , Z S Ω n L k + 1 ( Z , X , Y k + 1 ; S k , U k ; ρ k ) , s . t . A ( Z ) = b
where L is the linearized augmented Lagrangian as defined in (16).
7:
    Update S , U and ρ via:
S k + 1 = S k + ρ k ( Z k + 1 X k + 1 ( Y k + 1 ) T ) Ω U k + 1 = U k + ρ k ( X k + 1 Y k + 1 ) ρ k + 1 = α ρ k
8:
    if  max { X k Y k , ( Z k X k ( Y k ) T ) Ω } ϵ  then
9:
          break
10:
     end if
11:
end for

4.2.1. Minimizing over Y

The generalized projection (14) can be solved a number of ways. Note that if r = 1 , then H is a positive scalar, and the problem reduces to Y + = proj Y C 1 H Y ^ . When C = { 1 , 1 } n , this process reduces to recovering the signs of Y ^ i.e., Y i = sign C ( Y ^ i ) , and when C = { u : u 2 = 1 } the set of unit-norm vectors, Y is just a properly scaled version of Y ^ : Y = 1 Y ^ 2 Y ^ . However, in general, it is difficult to compute the generalized projection over a nonconvex set. When C is convex, the generalized projection problem (14) can be computed using projected gradient descent. Note that the objective of (14) is 1-strongly convex; thus, we expect fast convergence in this subproblem. In practice, we find that if r is not too large, often a few tens of iterations is enough.

4.2.2. Minimizing over X and Z

Using standard linear algebra techniques, the linear system (13) can be reduced to a few simple instructions. First, we solve for the Lagrange dual variable ν associated with the linear constraints (and localized to the minimization of X and Z):
A ( A * ( ν ) ( Y Y T + I ) ) = ρ ( b A ( D Y T + Y Y T ) ) + A ( ( G + S ) ( I + Y Y T ) ) ,
where D = 1 ρ ( S Y U ) + Y and G = f ( Z k 1 ) the local gradient estimate. When A = diag , (20) reduces to n scalar element-wise computations ν i = ρ ( b ( D Y T ) i i ) + ( ( G + S ) ( I + Y Y T ) ) i i ( Y Y T ) i i + 1 . When A = Tr , ν = ρ ( b Tr ( D Y T ) + Tr ( ( G + S ) ( I + Y Y T ) ) Tr ( Y Y T ) + 1 . Note that in both cases, no n × n matrix need ever be formed, so the memory requirement remains O ( n r ) . (See Appendix A for elaboration). Then, the primal variables are recovered via X = B Y + D , and Z = ( X Y T ) Ω + B , with B = 1 ρ ( C A * ( ν ) + S ) . In these cases, the complexity is dominated by multiplications between n × n and n × r matrices. Thus, the method is especially efficient when r n .

4.3. Convergence Analysis

 Theorem 1. 
Assume that f ( Z ) is L f -smooth. Assume the dual variables are bounded, e.g., max { S k F , U k F , Y k F } k B P < + , and L f σ max is bounded above, where σ max = 1 σ Y 4 + 4 σ Y 2 σ Y 2 2 , σ Y = Y k + 1 2 . Then, by running Algorithm 1 with ρ k = α ρ k 1 = α k ρ 0 , if L k is bounded below, then the sequence { P k , D k } converges to a stationary point of (12).
 Proof. 
See Appendix B.    □
 Corollary 1. 
If r 2 n and the stationary point of Algorithm 1 converges to a second-order critical point of (1), then it is globally optimal for the convex relaxation of (10) [13].
Unfortunately, the extension of KKT stationary points to global minima is not yet known when r ( r + 1 ) 2 < n (i.e., r = 1 ). However, our empirical results suggest that even when r = 1 , often a local solution to (10) well-approximates the global solution to (1).

5. ADMM on Simplified Nonconvex SDP

When the linear constraints are not present, (1) can be reformulated without Z, into:
min X , Y g ( X ) + δ C ( Y ) , s . t . X = Y ,
with matrix variables X R n × r , Y R n × r , and where g ( X ) = f ( X X T ) is smooth. We can also define an augmented Lagrangian of (21) as L ρ ( X , Y ; U ) = g ( X ) + δ C ( Y ) + U , X Y + ρ 2 X Y F 2 .
 Theorem 2. 
The coordinatewise minimum points X * = Y * satisfying:
0 = g ( X * ) + U + ρ ( X Y ) Y = proj C ( X + 1 ρ U ) X = Y ,
are the stationary points of the problem:
min X g ( X ) , s . t . X C .
 Proof. 
The KKT stationary points of (23) can be characterized in terms of the normal cone of C at X * ; specifically, X * is stationary if:
g ( X * ) , X X * 0 , X C N ϵ ( X * ) ,
where N ϵ ( X * ) is some small neighborhood containing X * . (This is an equivalent definition of the Clarke stationary point [70], since in a close enough neighborhood to X * , the subdifferential of δ C ( x ) is N C ( x ) ).
Combining terms in (22) gives X * = Y * satisfying X * = proj C X * 1 ρ g ( X * ) . The optimality condition of the projection is X ( X 1 ρ g ( X * ) ) , X X * 0 , X C N ϵ ( X * ) which reduces to the desired condition.    □

5.1. ADMM

The alternating steps in minimizing the augmented Lagrangian over the primal variables are extremely simple, compared with the previous matrix formulation. In general, we are considering f ( X ) linear (in which case the update of X involves only addition) or quadratic with strictly positive diagonal Hessian (which adds a small scaling step). C = { 1 , 1 } n , C = { x : x 2 = 1 } , even when r > 1 .

5.2. Convergence Analysis

 Definition 4. 
A differentiable convex function g ( X ) is L g -smooth and H g -strongly convex over R n if for any X, Y, g ( X ) g ( Y ) f ( X ) , X Y L g 2 X Y F 2 and g ( X ) g ( Y ) f ( X ) , X Y H g 2 X Y F 2 .
 Theorem 3. 
Assume g ( X ) is lower bounded over C , and is L g -smooth. Given a sequence { ρ k } such that:
ρ k 3 L g 2 L g 2 ρ k + 1 + ρ k 2 ( ρ k ) 2 > 0 , ρ k > L g
for all k, then under Algorithm 2 the augmented Lagrangian L ( X k , Y k ; U k ) is lower bounded and convergent, with { X k , Y k , U k } { X * , Y * , U * } a stationary and feasible solution of (23).
 Proof. 
See Appendix C.    □
 Remark 1. 
Convergence is guaranteed under a constant penalty coefficient ρ k ρ 0 3 + 17 2 L g , α = 1 . However, in implementation, we find empirically that increasing { ρ k } from a relatively small ρ 0 can encourage convergence to more useful global minima.
Algorithm 2 ADMM for solving (23)
1:
Inputs:  ρ 0 > 0 , α > 1 , tol ϵ > 0
2:
Initialize:  Z 0 , X 0 ; S 0 , U 0 as random matrices
3:
Outputs: Z, X = Y
4:
for  k = 1 do
5:
    Update Y k + 1 the solution of:
min Y R n × k X k Y + U k ρ k F 2 , s . t . Y C .
6:
    Update X k + 1 as the solution of:
0 = g ( X ) + U + ρ ( X Y ) .
7:
    Update U and ρ via:
U k + 1 = U k + ρ k ( X k + 1 Y k + 1 ) , ρ k + 1 = α ρ k .
8:
    if  X k Y k F     ϵ  then
9:
         break
10:
     end if
11:
end for
 Theorem 4. 
If g ( X ) is H g -strongly convex and ρ k = ρ constant, with ρ + H g 2 L g 2 ρ , ρ > L g then under Algorithm 2 the augmented Lagrangian L ( X k , Y k ; U k ) converges to L ( X * , Y * , U * ) at a linear rate.
 Proof. 
See Appendix C.1. □

6. Numerical Experiments

In this section, we give numerical results on the proposed methods for community detection, MAX-CUT, image segmentation, and symmetric matrix factorization. In each application, we evaluate and compare these four methods. (i) SD: the solution to a semidefinite relaxation of (1) (SDR), where C = R n , r . The binary vector factor x where x x T = Z is recovered using a Goemans–Williamson style rounding. [9] technique. This is our baseline method and is described in more detail below. (ii) MR1: Algorithm 1 with r = 1 . (iii) MRR: Algorithm 1 with r = 2 n , then rounded to a binary vector using a nonsymmetric version of the Goemans–Williamson style rounding [9] technique. Both MR1 and MRR have the following stopping criterion max { P ( k ) , D ( k ) } ϵ for some tolerance parameter ϵ > 0 , where: P ( k ) : = Z k Z k 1 2 Z k 2 , X k X k 1 2 X k 2 , Y k Y k 1 2 Y k 2 , D ( k ) : = max Z ( k ) X ( k ) ( Y ( k ) ) T 2 Z k 2 , X ( k ) Y ( k ) 2 X ( k ) 2 . (Here, D ( k ) is also proportional to the difference in dual iterates, and thus P ( k ) and D ( k ) can be interpreted as primal and dual residuals, respectively). (iv) V: Algorithm 2, with stopping criterion max { P ( k ) , D ( k ) } ϵ where P ( k ) : = x k x k 1 2 x k 2 , y k y k 1 2 y k 2 , D ( k ) : = x k y k 2 x k 2 . The same primal and dual residual interpretation can be used here as well. In all cases, we use the following scheme for ρ : ρ k = min { ρ max , ρ k 1 * γ } , where ρ max 10 , 000 and γ 1.05 (slightly larger than 1).

6.1. Solving the Baseline (SDR)

As a baseline, we compare against the solution of the semidefinite relaxed problem without factor variables X (e.g., C = R n , n ):
min Z f ( Z ) , s . t . A ( Z ) = b , Z 0 .
For a fair comparison, we use a first-order splitting method very similar to ADMM, which is the Douglas–Rachford Splitting (DRS) method ([71,72], see also [73,74]). We introduce dummy variables and solve the reformulation of (27):
min Z 1 , Z 2 , Z 3 g 1 ( Z 1 ) + g 2 ( Z 2 ) + g 3 ( Z 3 ) , s . t . Z 1 + Z 2 + Z 3 ,
where g 1 ( Z 1 ) = Tr ( C Z 1 ) , g 2 ( Z 2 ) = 0 , A ( Z 2 ) = b + , else , g 3 ( Z 3 ) = 0 , Z 3 0 + , else . . An application of the DRS on this reformulation (see also Algorithm 3.1 in [75]) is then the following iteration scheme: for i = 1 , 2 , 3 ,
X i ( k + 1 ) = prox t g i ( Z i ) , Y ^ i = 2 X i ( k + 1 ) Z i ( k ) , Y ( k + 1 ) = 1 3 ( X 1 ( k + 1 ) + X 2 ( k + 1 ) + X 3 ( k + 1 ) ) , Z i ( k + 1 ) = Z i ( k ) + ρ ( Y ( k + 1 ) X i ( k + 1 ) )
and for a convex function f  z = prox t f ( u ) argmin z f ( z ) + 1 2 t z u 2 2 .

6.2. Rounding

Following the technique in [9], we can estimate x from a rank r matrix X x x T by randomly projecting the main eigenspaces on the unit sphere. The exact procedure is as follows. (i) For the symmetric SDP solution X, we first perform an eigenvalue decomposition X = Q Λ Q T and form a factor F = Q Λ 1 / 2 where the diagonal elements of Λ are in decreasing magnitude order. Then, we scan k = 1 , , n and find x k , t = sign ( F k z t ) for trials t = 1 , 10 . Here, F k contains the first k columns of F, and each element of z t R k is drawn i.i.d from a normal Gaussian distribution. We report the values for x r = argmin x k , t { x r T C x r } . (ii) For the MRR method, we repeat the procedure using a factor F = U Σ 1 / 2 where X = U Σ V T is the SVD of X. (iii) For MR1 and V, we simply take x r = sign ( x ) as the binary solution.

6.3. Computer Information

The following simulations are performed on a standard desktop computer with an Intel Xeon processor (3.6 GHz), and 32 GB of RAM. It is running with Matlab R2017a.

6.4. MAX-CUT

Table 1 gives the best MAX-CUT values using best-of-random-guesses and our approaches over four examples from the seventh DIMACS Implementation Challenge in 2002 (see http://dimacs.rutgers.edu/Workshops/7thchallenge/, problems downloaded from http://www.optsicom.es/maxcut/). Often, we find the quality of our recovered solutions close to the best-known solutions and often achieve similar suboptimality as the rounded SDR solutions. However, the runtime comparison (Figure 1) suggests that the ADMM methods (especially MR1 and SDR) are much more computationally efficient and scalable. All experiments are performed with ϵ = 1 × 10 3 .

6.5. Image Segmentation

Both community detection and MAX-CUT can be used in image segmentation, where each pixel is a node and the similarity between pixels forms the weight of the edges. Generally, solving (1) for this application is not preferred, since the number of pixels in even a moderately sized image is extremely large. However, because of our fast methods, we successfully performed image segmentation on several thumbnail-sized images, as seen in Figure 2.
The C matrix is composed as follows. For each pixel, we compose two feature vectors: f c i j containing the RGB values and f p i j containing the pixel location. Scaling f c i j by some weight c, we form the concatenated feature vector f i j = [ f c i j , c f p i j ] , and form the weighted adjacency matrix as the squared distance matrix between each feature vector A ( i j ) , ( k l ) = f i j f k l 2 2 . For MAX-CUT, we again form C = A Diag ( A 1 ) as before. For community detection, since we do not have exact p and q values, we use an approximation as C = a 1 1 T A where a = 1 n 2 1 T A 1 the mean value of A. Sweeping C and ρ 0 , we give the best qualitative result in Figure 2.

6.6. Symmetric Factorization with Partial Observations

Recall the factorization with partial observations formulation as follows:
min Z S n , X R n × r i , j Ω ( Z i j C i j ) 2 , s . t . Z = X X T , X 0 .
Note that here we generalize the aforementioned formulation with r = 5 . In this setting, while the strongly convex Y update in the proposed algorithm can no longer be solved in closed form, projected gradient descent is applied to deal with it. The relative error defined as ( Z * C ) Ω / C Ω and CPU time with varying problem size and sparsity are demonstrated in Table 2.

7. Conclusions

We present two methods for solving quadratic combinatorial problems using ADMM on two reformulations. Though the problem has a nonconvex constraint, we give convergence results to KKT solutions under mild conditions. From this, we give empirical solutions to several graph-based combinatorial problems, specifically MAX-CUT and community detection; both can be used in additional downstream applications, like image segmentation.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Derivation of X, Z Update

In a linearized case, consider G = f ( Z k 1 ) = G Ω . Then, the optimality conditions are:
G A * ( ν ) + S + ρ ( Z ( X Y T ) Ω ) = 0 U S Y + ρ ( ( X Y T ) Ω Y Z Y ) + ρ ( X Y ) = 0 A ( Z ) = b .
Using D = ρ 1 ( S Y U ) + Y , B = ρ 1 ( G A * ( ν ) + S ) , we obtain:
B + Z ( X Y T ) Ω = 0 D + ( X Y T ) Ω Y Z Y + X = 0 A ( Z ) = b .
Substitute for Z: Z = ( X Y T ) Ω + B D + ( ( X Y T ) Ω + B ) Y = ( X Y T ) Ω Y + X D + B Y = X . Since we assume the diagonal is in Ω , A ( X Ω ) = A ( X ) , so to solve for ν :
A ( ( X Y T ) Ω + B ) = A ( X Y T + B ) = A ( ( D + B Y ) Y T + B ) = b ,
and therefore A ( B ( Y Y T + I ) ) = b A ( D Y T ) . Insert B and simplify:
b A ( D Y T ) = A ( ( ρ 1 ( G A * ( ν ) + S ) ) ( Y Y T + I ) ) = ρ 1 A ( ( G A * ( ν ) + S ) ( Y Y T + I ) ) ,
and thus:
b A ( D Y T ) + ρ 1 A ( ( G + S ) ( Y Y T + I ) ) = ρ 1 A ( A * ( ν ) ( Y Y T + I ) ) = ρ 1 H ν ,
where H is an m × m matrix with: H i j = A i , A j ( Y Y T + I ) . Thus this system reduces to ν = H 1 b A ( D Y T ) + ρ 1 A ( ( G + S ) ( Y Y T + I ) ) .

Implicit Inverse of H

When A = diag , (20) reduces to n scalar element-wise computations ν i = ρ ( b ( D Y T ) i i ) + ( ( G + S ) ( I + Y Y T ) ) i i ( Y Y T ) i i + 1 . When A = Tr , ν = ρ ( b Tr ( D Y T ) + Tr ( ( G + S ) ( I + Y Y T ) ) Tr ( Y Y T ) + 1 . Note that in both cases, the computation for ν can be done without ever forming an n × n matrix. For example, for A = diag , D Y i i T = ρ 1 ( S Y Y T ) i i ρ 1 ( U Y T ) i i + ( Y Y T ) i i Recall that for any two matrices A, B R n × r , ( A B T ) i i = A i T B i where A i , B i are the ith rows of A and B; thus, an efficient way of computing ν is (i) Compute more skinny matrices F 1 = S Y , F 2 = G Y . (ii) Compute the element-wise products G 1 = F 1 Y , G 2 = U Y , G 3 = F 2 Y , and G 4 = Y Y , where ( A B ) i j = A i j B i j (element-wise multiplication). (iii) Compute the row sums g i = G i 1 , i = 1 , , 4 . (iv) Compute the “numerator vector” h 1 = ρ ( b ( ρ 1 ( g 1 g 2 ) + g 4 ) + diag ( G ) + diag ( S ) + g 3 + g 1 and “denominator vector” h 2 = g 4 + 1 . (v) Then, ν i = ( h 1 ) i ( h 2 ) i .
A similar procedure can be executed for A = Tr to keep memory requirements low.

Appendix B. Convergence Analysis for Matrix Form

To simplify notation, we first collect the primal and dual variables P k = ( Z , X , Y ) k and D k = ( Λ 1 , Λ 2 ) k . We define the augmented Lagrangian at iteration k as:
L k : = L ( P k ; D k ; ρ k ) = f ( Z k ) + δ C ( Y ) + U , X Y + S , Z X Y T + ρ 2 X Y F 2 + ρ 2 Z X Y T F 2 ,
and its linearization at iteration k as:
L ¯ k : = L ¯ ( P k ; D k ; ρ k ; f ¯ k ) = f ¯ k + δ C ( Y ) + U , X Y + S , Z X Y T + ρ 2 X Y F 2 , + ρ 2 Z X Y T F 2
Here, f ¯ k : = f ( Z k 1 ) + G k 1 , Z Z k 1 such that f k is the linearization of f at Z k 1 .
 Lemma A1. 
2 L Y = 2 L ¯ Y ρ k I .
 Proof. 
Given the definition of L , we can see that the Hessian 2 L Y = ρ k M + I ρ k I where M = blkdiag ( X T X , X T X , . . . ) 0 .
 Lemma A2. 
2 L ¯ ( X , Z ) ρ k 1 λ N 2 + 4 λ N λ N 2 I .
 Proof. 
For ( X , Z ) , we have ( X , Z ) 2 L k = ρ k I + N N T N N T I where N = blkdiag ( Y T , , Y T ) R n r × n 2 . Note that for block diagonal matrices, N 2 = Y 2 . Note also that the determinant of 1 ρ k ( X , Z ) 2 L k is det ( ( I + N N T ) N N T ) = 1 0 , so ( X , Z ) 2 L ˜ k 0 and equivalently λ min ( ( X , Z ) 2 L k ) > 0 .
To find the smallest eigenvalue λ min ( ( X , Z ) 2 L k ) , it suffices to find the largest σ > 0 such that:
H 2 = ( ρ k ) 1 ( X , Z ) 2 L ˜ k σ I = ( 1 σ ) I + N N T N N T ( 1 σ ) I 0 .
Equivalently, we want to find the largest σ > 0 where ( 1 σ ) I 0 and the Schur complement of H 2 i.4., H 3 = ( 1 σ ) I + N N T ( 1 ( 1 σ ) 1 ) ) 0 . Defining σ Y = Y 2 the largest singular vector of Y k + 1 , and noting that λ min ( α I + A ) = α + λ min ( A ) for any positive semidefinite matrix A, we have λ min ( H 3 ) = ( 1 σ ) + ( σ Y ) 2 ( 1 ( 1 σ ) 1 ) . We can see that ( 1 σ ) λ min ( H 3 ) is a convex function in ( 1 σ ) , with two zeros at 1 σ = ± σ Y 4 + 4 σ Y 2 ( σ Y ) 2 2 . In between the two roots, λ min ( H 3 ) < 0 . Since the smaller root cannot satisfy 1 σ > 0 , we choose σ max = 1 σ Y 4 + 4 σ Y 2 ( σ Y ) 2 2 > 0 as the largest feasible σ that maintains λ min ( H 3 ) 0 . As a result, λ min ( ( X , Z ) 2 L ¯ ) = ρ k σ max = ρ k 1 σ Y 4 + 4 σ Y 2 σ Y 2 2 . Figure A1 shows how this term behaves according to the spectral norm of Y. □
Figure A1. Strong convexity with respect to X, Z. Smallest eigenvalue of X , Z 2 L as a function of the spectral norm of Y.
Figure A1. Strong convexity with respect to X, Z. Smallest eigenvalue of X , Z 2 L as a function of the spectral norm of Y.
Mathematics 11 04413 g0a1
We now prove the main theorem.
 Lemma A3. 
Consider the sequence:
L k : = L ( P k ; D k ) = f ( Z k 1 ) + f ( Z k 1 ) , Z Z k 1 + δ C ( Y ) + U , X Y + S , Z X Y T + ρ 2 X Y F 2 + ρ 2 Z X Y T F 2 .
If f ( Z ) is L f -Lipschitz smooth, then sequence L k generated from Algorithm 1 satisfies:
L k + 1 L k c 1 k X k + 1 X k F 2 c 2 k Z k + 1 Z k F 2 c 3 k Y k + 1 Y k F 2 + ρ k + 1 + ρ k 2 ( ρ k ) 2 S k + 1 S k F 2 + U k + 1 U k F 2 .
with c 1 k = ρ k 2 1 σ Y 4 + 4 σ Y 2 σ Y 2 2 , c 2 k = c 1 k L f 2 , and c 3 k = ρ k 2 > 0 .
 Proof. 
The proof outline of Lemma A3 is to show that each update step is a non-ascent step in the linearized augmented Lagrangian, and at least one update step is descent. We can describe the linearized ADMM in terms of four groups of updates: the primal variable Y, the primal variables X and Z, the dual variables U, S, and coefficient ρ . In other words, at iteration k, taking:
  • Δ X k = X k + 1 X k , Δ Y k = Y k + 1 Y k , and Δ Z k = Z k + 1 Z k .
  • L k = L ( Z k , X k , Y k ; D k ; ρ k ; G k ) ,
  • L Y = L ( Z k , X k , Y k + 1 ; D k ; ρ k ; G k ) ,
  • L X Z = L ( P k + 1 ; D k ; ρ k ; G k ) , and
  • L k + 1 = L ( P k + 1 ; D k + 1 ; ρ k + 1 ; G k )
and L k + 1 L k = ( L Y L k ) + ( L X Z L Y ) + ( L k + 1 L X Z ) . We now lower bound each term.
  • Update Y. For the update of Y in (17), taking L Y = L ( Z k , X k , Y k + 1 ; D k ; ρ k ; G k ) , we have:
    L Y L k ( a ) Y L Y , Y k + 1 Y k λ min ( vec Y 2 L Y ) 2 Y k + 1 Y k F 2 ( b ) ρ k 2 Y k + 1 Y k F 2 ,
    where (a) follows from the definition of strong convexity, and (b) the optimality of Y k + 1 .
  • Update X, Z. Similarly, the update of ( Z , X ) in (17), denoting L X Z = L ( P k + 1 ; D k ; ρ k ; G k ) , we have:
    L ¯ X Z L Y ( a ) Z L ¯ X Z , Z k + 1 Z k + X L ¯ X Z , X k + 1 X k λ min ( ( X , Z ) 2 L X Z ) 2 Δ Z k F 2 + Δ X k F 2 ( b ) λ min ( ( X , Z ) 2 L ¯ X Z ) 2 Δ Z k F 2 + Δ X k F 2 ,
    where (a) follows from the definition of strong convexity, and (b) the optimality of X k + 1 and Z k + 1 . To further bound L X Z L ¯ X Z , we use the linearization definitions:
    L X Z L ¯ X Z = f ( Z k + 1 ) f ( Z k ) f ( Z k ) , Δ Z k ( a ) L f 2 Z k + 1 Z k F 2 ,
    where (a) comes from the L f Lipschitz smooth property of f. For a function f with Lipschitz constant L f , the following holds f ( y ) f ( x ) + y x , f ( x ) + L f 2 y x 2 2 .
  • Update S, U, and ρ . For the update of the dual variables and the penalty coefficient, with L k = L ( P k ; D k ; ρ k ) , we have:
    L D L X Z = ( a ) S k + 1 S k , Z k + 1 X k + 1 ( Y k + 1 ) T + U k + 1 U k , X k + 1 Y k + 1 + ρ k + 1 ρ k 2 ( Z k + 1 X k + 1 ( Y k + 1 ) T F 2 ) + ρ k + 1 ρ k 2 ( X k + 1 Y k + 1 F 2 ) = ( b ) ρ k + 1 + ρ k 2 ( ρ k ) 2 S k + 1 S k F 2 , + U k + 1 U k F 2 ,
    where (a) follows the definition of L and (b) from the dual update procedure.
The lemma statement results by incorporating (A6)–(A9). □
 Lemma A4. 
If L k is unbounded below, then either problem (1) is unbounded below, or the sequence L f Z k Z k 1 F diverges.
 Proof. 
First, consider the case that L k is unbounded below. First, rewrite L k equivalently as:
L k = f ( Z k 1 ) + f ( Z k 1 ) , Z k Z k 1 + δ C ( Y k ) + ρ 2 X k Y k + 1 ρ k U k F 2 + ρ 2 Z k X k ( Y k ) T + 1 ρ k S k F 2 1 2 ρ k U k F 2 1 2 ρ k S k F 2 .
Since U k F and S k F are bounded above, this implies that the linearization g k : = f ( Z k 1 ) + f ( Z k 1 ) , Z k Z k 1 is unbounded below.
Note that:
g k f ( Z k ) = f ( Z k 1 ) f ( Z k ) f ( Z k 1 ) , Z k 1 Z k L f 2 Z k Z k 1 F 2 ,
which implies either f ( Z k ) or L f Z k Z k 1 F 2 + . □
 Corollary A1. 
If L k is unbounded below and the objective f ( Z ) = Tr ( C Z ) , then it must be that (1) is unbounded below. This follows immediately since L f = 0 .
 Theorem A1. 
Assume the dual variables are bounded, e.g., max { S k F , U k F , Y k F } k B P < + , and L f σ max is bounded above, where σ max = 1 σ Y 4 + 4 σ Y 2 σ Y 2 2 , σ Y = Y k + 1 2 . Then, by running Algorithm 1 with ρ k = α ρ k 1 = α k ρ 0 , if L k is bounded below, then the sequence { P k , D k } converges to a stationary point of (12).
 Proof. 
If f ( Z ) is linear, take K 0 = 0 . If f ( Z ) is L f > smooth, take K ^ large enough such that for all k > K 0 , α k ρ L f σ max . By assumption, K 0 is always finite.
Taking Δ X Y Z k = Δ Z k F 2 + Δ X k F 2 + Δ Y k F 2 and c k = min { c 1 , c 2 , c 3 } , the summation of (A5) leads to:
L K L K 0 = k = K 0 K 1 L k + 1 L k k = K 0 K 1 ρ k + 1 + ρ k 2 ( ρ k ) 2 S k + 1 S k F 2 + U k + 1 U k F 2 k = K 0 K 1 c k Δ X Y Z k ( a ) 4 B P k = K 0 K 1 ρ k + 1 + ρ k 2 ( ρ k ) 2 k = K 0 K 1 c k Δ X Y Z k ( b ) 4 B P k = K 0 K 1 ρ k + 1 + ρ k 2 ( ρ k ) 2 ,
where (a) follows from the boundedness assumption of the dual variables, and (b) follows from Lemmas A1 and A2, and careful construction of ρ with respect to L f and Y k + 1 2 . Further simplifying, we see that L K is thus bounded above, since:
L K L K 0 lim K 4 B P k = K 0 K 1 ρ k + 1 + ρ k 2 ( ρ k ) 2 = 4 B P 1 + α 2 α K 0 ρ 1 + 1 α + 1 α 2 + = 4 B P 2 α K 0 ρ < + .
If L k is not unbounded below, then:
0 k = K 0 K 1 ( c 1 Δ X k F 2 + c 2 Δ Z k F 2 + c 3 Δ Y k F 2 ) + .
Recall c 3 k = ρ k 2 , and by boundedness assumption on Y 2 k + 1 , for k > K 0 , c 1 k , c 2 k ρ k . Since additionally k ρ k = + , then this immediately yields Z k + 1 Z k 0 , X k + 1 X k 0 , Y k + 1 Y k 0 .
Therefore, since the primal variables are convergent, this implies that:
Z k + 1 ( X k + 1 ( Y k + 1 ) T ) Ω = 1 ρ k ( S k + 1 S k ) , X k + 1 Y k + 1 = 1 ρ k ( U k + 1 U k ) ,
converges to a constant. But since ρ k and the dual variables are all bounded, then it must be that: Z k + 1 ( X k + 1 ( Y k + 1 ) T ) Ω 0 , X k + 1 Y k + 1 0 . Therefore, the limit points X * , Y * , and Z * are all feasible, and simply checking the first optimality condition will verify that this accumulation point is a stationary point of (12). □

Appendix C. Convergence Analysis for Vector Form

 Lemma A5. 
For two adjacent iterations of Algorithm 2, we have:
U k + 1 U k 2 2 L g 2 X k + 1 X k 2 2 .
 Proof. 
From the first-order optimality conditions for the update of X:
g ( X k + 1 ) + U k + ρ k ( X k + 1 Y k + 1 ) = 0 .
Combining with the dual update, we obtain g ( X k + 1 ) + U k + 1 = 0 . Then, the result follows from the definition of L g . □
Next, we will show that the augmented Lagrangian is monotonically decreasing and lower bounded.
 Lemma A6. 
Each step in the augmented Lagrangian update is decreasing, e.g., for:
L ( X , Y ; U ; ρ ) : = g ( X ) + δ C ( Y ) + U , X Y + ρ 2 X Y F 2
we have:
L ( Y k + 1 , X k + 1 ; U k + 1 ; ρ k + 1 ) L ( Y k + 1 , X k + 1 ; U k ; ρ k ) L ( Y k + 1 , X k ; U k ; ρ k ) L ( Y k , X k ; U k ; ρ k ) .
Furthermore, the amount of decrease is:
L ( Y k + 1 , X k + 1 ; U k + 1 ; ρ k + 1 ) L ( Y k , X k ; U k ; ρ k ) ρ k Y k + 1 Y k F 2 c k X k + 1 X k F 2 .
Here,
  • if g ( X ) is H g -strongly convex (where H g = 0 if g is convex but not strongly convex) then c k = ρ k + H g 2 L g 2 ρ k + 1 + ρ k 2 ( ρ k ) 2 , and
  • if g ( X ) is nonconvex but L g -smooth, then c k = ρ k 3 L g 2 L g 2 ρ k + 1 + ρ k 2 ( ρ k ) 2 .
 Proof. 
Both the updates of Y and X globally minimize L with respect to those variables. To minimize Y at ( X , U ) = ( X k , U k ) :
L ( Y k + 1 , X ; U ; ρ ) L ( Y k , X ; U ; ρ ) ( a ) Y L ( Y k + 1 , X ; U ; ρ ) , Δ Y k ρ k 2 Δ Y k 2 2
( b ) ρ k 2 Δ Y k 2 2 .
To minimize X at ( Y , U ) = ( Y k + 1 , U k ) , we consider two cases. If g is H g -strongly convex, then:
L ( Y , X k + 1 ; U ; ρ ) L ( Y , X k ; U ; ρ ) ( a ) X L ( Y , X k + 1 ; U ; ρ ) , X k + 1 X k ρ k + H g 2 X k + 1 X k 2 2 ( b ) ρ k + H g 2 X k + 1 X k 2 2 ,
where (a) follows from the strong convexity of L ( Y , X ; U ; ρ ) with respect to X, and (b) follows from the optimality condition of the update. If g is nonconvex but L g -Lipschitz, then note that:
g ( X k + 1 ) g ( X k ) g ( X k ) , X k + 1 X k + L g 2 X k + 1 X k F 2 = ( a ) g ( X k ) g ( X k + 1 ) , X k + 1 X k + L g 2 X k + 1 X k F 2 + g ( X k + 1 ) , X k + 1 X k ( b ) g ( X k ) g ( X k + 1 ) F X k + 1 X k F + L g 2 X k + 1 X k F 2 + g ( X k + 1 ) , X k + 1 X k ( c ) 3 L g 2 X k + 1 X k F 2 + g ( X k + 1 ) , X k + 1 X k
where (a) follows from adding and subtracting a term, (b) from Cauchy–Schwartz, and (c) from the Lipschitz gradient condition on g. Therefore:
L ( Y , X k + 1 ; U ; ρ ) L ( Y , X k ; U ; ρ ) ( a ) X L ( Y , X k + 1 ; U ; ρ ) , X k + 1 X k ρ k 3 L g 2 X k + 1 X k 2 2 ( b ) ρ k 3 L g 2 X k + 1 X k 2 2 .
In the dual variables, using { X , Y } = { X k + 1 , Y k + 1 } we have:
L ( Y , X ; U k + 1 ; ρ k + 1 ) L ( Y , X ; μ k ; ρ k ) ( a ) U k + 1 U k , X Y + ρ k + 1 ρ k 2 X Y F 2 ( b ) ρ k + 1 + ρ k 2 ( ρ k ) 2 U k + 1 U k 2 2 ( c ) L g 2 ρ k + 1 + ρ k 2 ( ρ k ) 2 X k + 1 X k 2 2 ,
where (a) follows the definition of L , (b) follows from the update of U, and (c) follows from Lemma (A5) since ρ k > 0 for al k. Incorporating these observations completes the proof. □
 Lemma A7. 
If ρ k L g and the objective g ( X ) is lower-bounded over C , then the augmented Lagrangian (A14) is lower bounded.
 Proof. 
From the L g -Lipschitz continuity of g ( X ) , it follows that:
g ( X ) g ( Y ) + g ( X ) , X Y L g 2 X Y F 2
for any X and Y. By definition:
L ( Y k , X k ; U k ; ρ k ) = g ( X k ) + U k , X k Y k + ρ k 2 X k Y k F 2 = ( a ) g ( X k ) g ( X k ) , X k Y k + ρ k 2 X k Y k F 2 ( b ) g ( Y k ) + ρ k L g 2 X k Y k F 2 ,
where (a) follows from the optimality in updating X and (b) follows from (A20). Since L k is unbounded below, then g ( Y k ) is unbounded below. Since Y k C for all k, this implies that g is unbounded below over C . □
Thus, if g ( X ) is lower-bounded over C , since the sequence { L ( X k , Y k ; U k ) } is monotonically decreasing and lower bounded, the sequence { L ( X k , Y k ; U k ) } converges. Given the monotonic descent of each subproblem (Lemma A6) and strong convexity of L k with respect to X and Y, it is clear that X k X * , Y k Y * fixed points. Combining with Lemma A5 gives also U k U * .
The proof of Theorem 3 easily follows from Lemma A7.

Appendix C.1. Linear Rate of Convergence when g Is Strongly Convex

 Lemma A8. 
Consider Algorithm 2 with ρ k constant. Then, collecting the variables all vectorized x = ( X , Y , Y ) ,
L k + 1 L k c 3 x k + 1 x k 2 ,
where g is H g strongly convex and:
c 3 = max θ ( 0 , 1 ) min θ ρ + H g 2 L g 2 ρ , ( 1 θ ) ρ + H g 2 H g L g 2 ρ H g , ρ .
 Proof. 
From Lemma A6, we already have that:
L k + 1 L k ρ Y k + 1 Y k 2 c X k + 1 X k 2 ,
where for constant ρ , c = ρ + H g 2 L g 2 ρ . Moreover, when g ( X ) is H g -strongly convex,
U k + 1 U k 2 = g ( X k + 1 ) g ( X k ) 2 H g X k + 1 X k 2 .
Therefore:
L k + 1 L k θ c H g U k + 1 U k 2 2 ( 1 θ ) c X k + 1 X k 2 .
for any θ ( 0 , 1 ) , We thus have:
L k + 1 L k θ c Δ X k F 2 ( 1 θ ) c H g Δ U F 2 ρ Δ Y k F 2 min θ c , ( 1 θ ) c H g , ρ X k + 1 X k Y k + 1 Y k U k + 1 U k ,
with Δ U = U k + 1 U k . Note that this does not mean L is strong convex with respect to the collected variables x = ( X , Y , Z ) ( L is not even convex). But with respect to each variable X, Y, and Z, it is strongly convex. □
 Lemma A9. 
Again with ρ k > 1 constant and collecting x = ( X , Y , Z ) , we have:
L k + 1 L * c 4 x k + 1 x * 2 , c 4 = min { L g + ρ + 2 , 2 ρ , 1 }
whenever Y k + 1 and Y * are both in C .
 Proof. 
Over the domain C , the augmented Lagrangian can be written as:
L ( x ) = g ( X ) + U , X Y + ρ 2 X Y F 2 ,
with gradient L ( x ) = g ( X ) + U + ρ ( X Y ) Y + ρ ( Y X ) X Y and thus:
L ( x 1 ) L ( x 2 ) F 2 = X L ( x 1 ) X L ( x 2 ) F 2 + Y L ( x 1 ) Y L ( x 2 ) F 2 + U L ( x 1 ) U L ( x 2 ) F 2 ( L g + ρ + 2 ) X 1 X 2 F 2 + ( 2 ρ ) Y 1 Y 2 F 2 + U 1 U 2 F 2 min { L g + ρ + 2 , 2 ρ , 1 } x 2 x 1 2 2 ,
which reveals the Lipschitz smoothness constraint for L as c 4 = min { L g + ρ + 2 , 2 ρ , 1 } . Then, using first-order optimality conditions,
L k + 1 L * + L ( x * ) , x k + 1 x * + c 4 x k + 1 x * 2 2 ( a ) L * + c 4 x k + 1 x * 2 2 ,
where (a) follows from the optimality of L * . □
 Lemma A10. 
Consider g ( x ) H g -strongly convex in x, and ρ large enough so that c 3 > 0 . Then, the number of steps for | L k L 0 | ϵ is O ( log ( 1 / ϵ ) ) .
This proof is standard in the linear convergence of block coordinate descent when the objective is strongly convex. Note that L is not strongly convex or even convex, but still all the steps hold.
 Proof. 
Take x k = { X k , Y k , U k } and x * = { X * , Y * , U * } . Then:
L ( x k ) L ( x * ) = L ( x k ) L ( x k + 1 ) + L ( x k + 1 ) L ( x * ) c 3 x k + 1 x k 2 + L ( x k + 1 ) L ( x * ) c 3 c 4 + 1 ( L ( x k + 1 ) L ( x * ) )
Therefore:
L ( x k ) L ( x * ) L ( x 0 ) L ( x * ) c 4 c 4 + c 3 k
and so:
L ( x k ) L ( x * ) ϵ
if:
k D 1 log ( 1 / ϵ ) + D 2
where:
D 1 = log 1 c 4 + c 3 c 4 , D 2 = log ( L ( x 0 ) L ( x * ) ) log c 4 + c 3 c 4 .

References

  1. Bandeira, A.S.; Boumal, N.; Voroninski, V. On the low-rank approach for semidefinite programs arising in synchronization and community detection. In Proceedings of the Conference on Learning Theory, New York, NY, USA, 23–26 June 2016; pp. 361–382. [Google Scholar]
  2. Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1–44. [Google Scholar] [CrossRef]
  3. Javanmard, A.; Montanari, A.; Ricci-Tersenghi, F. Phase transitions in semidefinite relaxations. Proc. Natl. Acad. Sci. USA 2016, 113, E2218–E2223. [Google Scholar] [CrossRef] [PubMed]
  4. Gillis, N. Nonnegative Matrix Factorization: Complexity, Algorithms and Applications. Doctoral Dissertation, Université Catholique de Louvain (CORE), Louvain-La-Neuve, France, 2011. [Google Scholar]
  5. Ding, C.; He, X.; Simon, H.D. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the 2005 SIAM International Conference on Data Mining (SIAM), New Orleans, LA, USA, 11–15 July 2005; pp. 606–610. [Google Scholar]
  6. Da Costa, A.P.; Seeger, A. Cone-constrained eigenvalue problems: Theory and algorithms. Comput. Optim. Appl. 2010, 45, 25–57. [Google Scholar] [CrossRef]
  7. Gander, W.; Golub, G.H.; von Matt, U. A constrained eigenvalue problem. In Numerical Linear Algebra, Digital Signal Processing and Parallel Algorithms; Springer: Berlin/Heidelberg, Germany, 1991; pp. 677–686. [Google Scholar]
  8. Júdice, J.J.; Sherali, H.D.; Ribeiro, I.M. The eigenvalue complementarity problem. Comput. Optim. Appl. 2007, 37, 139–156. [Google Scholar] [CrossRef]
  9. Goemans, M.X.; Williamson, D.P. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 1995, 42, 1115–1145. [Google Scholar] [CrossRef]
  10. Helmberg, C.; Rendl, F. A spectral bundle method for semidefinite programming. SIAM J. Optim. 2000, 10, 673–696. [Google Scholar] [CrossRef]
  11. Fujie, T.; Kojima, M. Semidefinite programming relaxation for nonconvex quadratic programs. J. Glob. Optim. 1997, 10, 367–380. [Google Scholar] [CrossRef]
  12. Burer, S.; Monteiro, R.D. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 2003, 95, 329–357. [Google Scholar] [CrossRef]
  13. Boumal, N.; Voroninski, V.; Bandeira, A. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 2757–2765. [Google Scholar]
  14. Lee, D.D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Proceedings of the Advances in Neural Information Processing Systems Conference, Denver, CO, USA, 4 October 2001; pp. 556–562. [Google Scholar]
  15. Wolkowicz, H.; Saigal, R.; Vandenberghe, L. Handbook of Semidefinite Programming: Theory, Algorithms, and Applications; Springer Science & Business Media: New York, NY, USA, 2012; Volume 27. [Google Scholar]
  16. Laurent, M. Sums of squares, moment matrices and optimization over polynomials. In Emerging Applications of Algebraic Geometry; Springer: Berlin/Heidelberg, Germany, 2009; pp. 157–270. [Google Scholar]
  17. Rendl, F. Semidefinite relaxations for partitioning, assignment and ordering problems. 4OR 2012, 10, 321–346. [Google Scholar] [CrossRef]
  18. Blekherman, G.; Parrilo, P.A.; Thomas, R.R. Semidefinite optimization and convex algebraic geometry. In Proceedings of the 2012 Annual Meeting of the Society for Industrial and Applied Mathematics (SIAM), Minneapolis, MN, USA, 9–13 July 2012. [Google Scholar]
  19. Anjos, M.F.; Lasserre, J.B. Introduction to semidefinite, conic and polynomial optimization. In Handbook on Semidefinite, Conic and Polynomial Optimization; Springer: Berlin/Heidelberg, Germany, 2012; pp. 1–22. [Google Scholar]
  20. Abbe, E.; Bandeira, A.S.; Hall, G. Exact recovery in the stochastic block model. IEEE Trans. Inf. Theory 2016, 62, 471–487. [Google Scholar] [CrossRef]
  21. Shi, J.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905. [Google Scholar]
  22. Karisch, S.E.; Rendl, F. Semidefinite programming and graph equipartition. Top. Semidefin. Inter. Point Methods 1998, 18, 77–95. [Google Scholar]
  23. Karger, D.; Motwani, R.; Sudan, M. Approximate graph coloring by semidefinite programming. J. ACM 1998, 45, 246–265. [Google Scholar] [CrossRef]
  24. Barahona, F.; Grötschel, M.; Jünger, M.; Reinelt, G. An application of combinatorial optimization to statistical physics and circuit layout design. Oper. Res. 1988, 36, 493–513. [Google Scholar] [CrossRef]
  25. De Simone, C.; Diehl, M.; Jünger, M.; Mutzel, P.; Reinelt, G.; Rinaldi, G. Exact ground states of Ising spin glasses: New experimental results with a branch-and-cut algorithm. J. Stat. Phys. 1995, 80, 487–496. [Google Scholar] [CrossRef]
  26. Poljak, S.; Tuza, Z. The expected relative error of the polyhedral approximation of the MAX-CUT problem. Oper. Res. Lett. 1994, 16, 191–198. [Google Scholar] [CrossRef]
  27. Helmberg, C.; Rendl, F. Solving quadratic (0,1)-problems by semidefinite programs and cutting planes. Math. Program. 1998, 82, 291–315. [Google Scholar] [CrossRef]
  28. Rendl, F.; Rinaldi, G.; Wiegele, A. A branch and bound algorithm for MAX-CUT based on combining semidefinite and polyhedral relaxations. In Proceedings of the 12th International IPCO Conference, Ithaca, NY, USA, 25–27 June 2007; Volume 4513, pp. 295–309. [Google Scholar]
  29. Burer, S.; Vandenbussche, D. A finite branch-and-bound algorithm for nonconvex quadratic programming via semidefinite relaxations. Math. Program. 2008, 113, 259–282. [Google Scholar] [CrossRef]
  30. Bao, X.; Sahinidis, N.V.; Tawarmalani, M. Semidefinite relaxations for quadratically constrained quadratic programming: A review and comparisons. Math. Program. 2011, 129, 129–157. [Google Scholar] [CrossRef]
  31. Krislock, N.; Malick, J.; Roupin, F. Improved semidefinite branch-and-bound algorithm for k-cluster. HAL Open Sci. Prepr. 2012, hal-00717212. Available online: https://inria.hal.science/file/index/docid/717823/filename/krislock-malick-roupin-2012a.pdf (accessed on 22 September 2023).
  32. Poljak, S.; Rendl, F.; Wolkowicz, H. A recipe for semidefinite relaxation for (0, 1)-quadratic programming. J. Glob. Optim. 1995, 7, 51–73. [Google Scholar] [CrossRef]
  33. Helmberg, C. Semidefinite Programming for Combinatorial Optimization; Konrad-Zuse-Zentrum für Informationstechnik: Berlin, Germany, 2000. [Google Scholar]
  34. Papadopoulos, S.; Kompatsiaris, Y.; Vakali, A.; Spyridonos, P. Community detection in social media. Data Min. Knowl. Discov. 2012, 24, 515–554. [Google Scholar] [CrossRef]
  35. Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826. [Google Scholar] [CrossRef]
  36. Keeling, M. The implications of network structure for epidemic dynamics. Theor. Popul. Biol. 2005, 67, 1–8. [Google Scholar] [CrossRef]
  37. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137. [Google Scholar] [CrossRef]
  38. Queiroz, M.; Judice, J.; Humes, C., Jr. The symmetric eigenvalue complementarity problem. Math. Comput. 2004, 73, 1849–1863. [Google Scholar] [CrossRef]
  39. Deshpande, Y.; Montanari, A.; Richard, E. Cone-constrained principal component analysis. In Proceedings of the 28th Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2717–2725. [Google Scholar]
  40. Zass, R.; Shashua, A. Nonnegative sparse PCA. In Proceedings of the 20th Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–9 December 2007; pp. 1561–1568. [Google Scholar]
  41. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788. [Google Scholar] [CrossRef] [PubMed]
  42. Yuan, Z.; Oja, E. Projective nonnegative matrix factorization for image compression and feature extraction. In Proceedings of the Scandinavian Conference on Image Analysis, Joensuu, Finland, 19–22 June 2005; pp. 333–342. [Google Scholar]
  43. Friedlander, M.P.; Macedo, I. Low-rank spectral optimization via gauge duality. SIAM J. Sci. Comput. 2016, 38, A1616–A1638. [Google Scholar] [CrossRef]
  44. Jaggi, M.; Sulovsk, M. A simple algorithm for nuclear norm regularized problems. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 471–478. [Google Scholar]
  45. Candès, E.J.; Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math. 2009, 9, 717. [Google Scholar] [CrossRef]
  46. Recht, B.; Fazel, M.; Parrilo, P.A. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 2010, 52, 471–501. [Google Scholar] [CrossRef]
  47. Udell, M.; Horn, C.; Zadeh, R.; Boyd, S. Generalized low rank models. Found. Trends® Mach. Learn. 2016, 9, 1–118. [Google Scholar] [CrossRef]
  48. Burer, S.; Monteiro, R.D. Local minima and convergence in low-rank semidefinite programming. Math. Program. 2005, 103, 427–444. [Google Scholar] [CrossRef]
  49. Pataki, G. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Math. Oper. Res. 1998, 23, 339–358. [Google Scholar] [CrossRef]
  50. Barvinok, A.I. Problems of distance geometry and convex properties of quadratic maps. Discret. Comput. Geom. 1995, 13, 189–202. [Google Scholar] [CrossRef]
  51. Candes, E.J.; Eldar, Y.C.; Strohmer, T.; Voroninski, V. Phase retrieval via matrix completion. SIAM Rev. 2015, 57, 225–251. [Google Scholar] [CrossRef]
  52. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 2011, 3, 1–122. [Google Scholar]
  53. Glowinski, R.; Marroco, A. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Rev. Fr. Autom. Inform. Rech. Oper. Anal. Numer. 1975, 9, 41–76. [Google Scholar] [CrossRef]
  54. Gabay, D.; Mercier, B. A Dual Algorithm for the Solution of Non Linear Variational Problems via Finite Element Approximation; Institut de Recherche d’Informatique et d’Automatique: Rocquencourt, France, 1975. [Google Scholar]
  55. Eckstein, J.; Yao, W. Understanding the convergence of the alternating direction method of multipliers: Theoretical and computational perspectives. Pac. J. Optim. 2015, 11, 619–644. [Google Scholar]
  56. Sun, R.; Luo, Z.Q.; Ye, Y. On the expected convergence of randomly permuted ADMM. arXiv 2015, arXiv:1503.06387. [Google Scholar]
  57. Yin, W. Three-Operator Splitting and its Optimization Applications. Set-Valued Var. Anal. 2017, 25, 829–858. [Google Scholar]
  58. Goldstein, T.; O’Donoghue, B.; Setzer, S.; Baraniuk, R. Fast alternating direction optimization methods. SIAM J. Imaging Sci. 2014, 7, 1588–1623. [Google Scholar] [CrossRef]
  59. Hong, M.; Luo, Z.Q.; Razaviyayn, M. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 2016, 26, 337–364. [Google Scholar] [CrossRef]
  60. Wang, Y.; Yin, W.; Zeng, J. Global convergence of ADMM in nonconvex nonsmooth optimization. arXiv 2015, arXiv:1511.06324. [Google Scholar] [CrossRef]
  61. Li, G.; Pong, T.K. Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 2015, 25, 2434–2460. [Google Scholar] [CrossRef]
  62. Magnússon, S.; Weeraddana, P.C.; Rabbat, M.G.; Fischione, C. On the convergence of alternating direction Lagrangian methods for nonconvex structured optimization problems. IEEE Trans. Control. Netw. Syst. 2016, 3, 296–309. [Google Scholar] [CrossRef]
  63. Liu, Q.; Shen, X.; Gu, Y. Linearized admm for non-convex non-smooth optimization with convergence analysis. arXiv 2017, arXiv:1705.02502. [Google Scholar]
  64. Lu, S.; Hong, M.; Wang, Z. A nonconvex splitting method for symmetric nonnegative matrix factorization: Convergence analysis and optimality. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017. [Google Scholar]
  65. Xu, Y.; Yin, W.; Wen, Z.; Zhang, Y. An alternating direction algorithm for matrix completion with nonnegative factors. Front. Math. China 2012, 7, 365–384. [Google Scholar] [CrossRef]
  66. Jiang, B.; Ma, S.; Zhang, S. Alternating direction method of multipliers for real and complex polynomial optimization models. Optimization 2014, 63, 883–898. [Google Scholar] [CrossRef]
  67. Huang, K.; Sidiropoulos, N.D. Consensus-ADMM for general quadratically constrained quadratic programming. IEEE Trans. Signal Process. 2016, 64, 5297–5310. [Google Scholar] [CrossRef]
  68. Shen, Y.; Wen, Z.; Zhang, Y. Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization. Optim. Methods Softw. 2014, 29, 239–263. [Google Scholar] [CrossRef]
  69. Rockafellar, R.T. Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM J. Control 1974, 12, 268–285. [Google Scholar] [CrossRef]
  70. Clarke, F.H. Optimization and Nonsmooth Analysis; SIAM: Philadelphia, PA, USA, 1990; Volume 5. [Google Scholar]
  71. Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979. [Google Scholar] [CrossRef]
  72. Douglas, J.; Rachford, H.H. On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 1956, 82, 421–439. [Google Scholar] [CrossRef]
  73. Spingarn, J.E. Applications of the method of partial inverses to convex programming: Decomposition. Math. Program. 1985, 32, 199–223. [Google Scholar] [CrossRef]
  74. Eckstein, J.; Bertsekas, D.P. On the Douglas Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 1992, 55, 293–318. [Google Scholar] [CrossRef]
  75. Combettes, P.L.; Pesquet, J.C. A proximal decomposition method for solving convex variational inverse problems. Inverse Probl. 2008, 24, 065014. [Google Scholar] [CrossRef]
Figure 1. Time comparisons for DIMACS problems. (top): average runtime per iteration. (bottom): total runtime. We observe that both V and MRR converge in relatively few iterations, with MR1 taking slightly longer. However, as previously observed with splitting methods, the convergence rate is sensitive to the parameter choices ρ ( t ) . For best performance, we start with a relatively small initial penalty coefficient and increase it with the iteration until the upper bound is achieved.
Figure 1. Time comparisons for DIMACS problems. (top): average runtime per iteration. (bottom): total runtime. We observe that both V and MRR converge in relatively few iterations, with MR1 taking slightly longer. However, as previously observed with splitting methods, the convergence rate is sensitive to the parameter choices ρ ( t ) . For best performance, we start with a relatively small initial penalty coefficient and increase it with the iteration until the upper bound is achieved.
Mathematics 11 04413 g001
Figure 2. Image segmentation. The center and right columns are the best MAX-CUT and community detection results, respectively.
Figure 2. Image segmentation. The center and right columns are the best MAX-CUT and community detection results, respectively.
Mathematics 11 04413 g002
Table 1. MAX-CUT values for graphs from the 7th DIMACS Challenge. MRR = matrix formulation, r = 2 n . SDR = SDP relaxation + rounding technique.
Table 1. MAX-CUT values for graphs from the 7th DIMACS Challenge. MRR = matrix formulation, r = 2 n . SDR = SDP relaxation + rounding technique.
DatabasenSparsityBKVMR1MRRSDR
g3-85120.01241,684,81434,105,23136,780,18035,943,35033424095
g3-1533750.018281,029,888235,893,612255,681,256241,740,931212,669,181
pm3-8-505120.012454394346378416
pm3-15-5033750.01829642594196621402616
G18000.059911,62410,93811,04711,32111,360
G28000.059911,62010,83411,08211,14411,343
G38000.059911,62210,85810,89411,17411,367
G48000.059911,64610,84910,76011,19211,429
G58000.059911,63110,79610,78311,35211,394
G68000.059921781853182019491941
G78000.059920031694164417051774
G88000.059920031688164117281766
G98000.059920481771168118071830
G108000.059919941662164117371732
G118000.005564496460480506
G128000.005556486448480512
G138000.005580516476498528
G148000.014730602715276828612901
G158000.014630492625281028032884
G168000.014630452667273628622910
G178000.014630432638278928402920
G188000.0147988798768841858
G198000.0146903700641694780
G208000.0146941723691766788
G218000.0146931696713810794
G2220000.0113,34612,46112,54812,75112,926
G2320000.0113,31712,54012,52812,85312,889
G2420000.0113,31412,54012,44712,72312,904
G2520000.0113,32612,44712,55812,73312,874
G2620000.0113,31412,44512,47512,71812,847
G2720000.0133182824250828072909
G2820000.0132852753251827962845
G2920000.0133892864262829012896
G3020000.0134032887263929372971
G3120000.0132882833251829022825
G3220000.00213981220106612041254
G3320000.00213761202105411661250
G3420000.00213721208109611701222
G3520000.005976706605691467647209
G3620000.005976606564694365987228
G3720000.005976666478683967897183
G3820000.005976816486675967687212
G3920000.005923951616169718401997
G4020000.005923871617143819211890
G4120000.005923981606165617781899
G4220000.005924691707175618621971
G4310000.0266596222623663986475
G4410000.0266486275619264476458
G4510000.0266526243625564076454
G4610000.0266456217623363986407
G4710000.0266566221626664336454
G4830000.001360005882500654026000
G4930000.001360005844503853626000
G5030000.001358805814499454105880
G5110000.011838463317344635243642
G5210000.011838493360347134993662
G5310000.011838463323351035163660
G5410000.011838463306342835093651
Table 2. Result for nonnegative factorization with partial observations from linearized ADMM (5 trials). STD = standard deviation.
Table 2. Result for nonnegative factorization with partial observations from linearized ADMM (5 trials). STD = standard deviation.
n1000300050008000
| Ω | / n 2 0.10.50.80.10.50.80.10.50.80.10.50.8
CPU time/s9.7413.5313.9761.1578.9964.76117.5485.24131.64212.26220.42337.74
( Z * C ) Ω C Ω 0.860.850.860.890.890.890.890.880.870.880.900.89
STD0.0430.0200.0210.0100.0060.0080.0080.0120.0180.0040.0080.008
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sun, C. A Customized ADMM Approach for Large-Scale Nonconvex Semidefinite Programming. Mathematics 2023, 11, 4413. https://doi.org/10.3390/math11214413

AMA Style

Sun C. A Customized ADMM Approach for Large-Scale Nonconvex Semidefinite Programming. Mathematics. 2023; 11(21):4413. https://doi.org/10.3390/math11214413

Chicago/Turabian Style

Sun, Chuangchuang. 2023. "A Customized ADMM Approach for Large-Scale Nonconvex Semidefinite Programming" Mathematics 11, no. 21: 4413. https://doi.org/10.3390/math11214413

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop