Article

Distributed Sparse Precision Matrix Estimation via Alternating Block-Based Gradient Descent

1 School of Mathematics and Statistics, Zhengzhou University, Zhengzhou 450001, China
2 School of Physical Education (Main Campus), Zhengzhou University, Zhengzhou 450001, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(5), 646; https://doi.org/10.3390/math12050646
Submission received: 21 January 2024 / Revised: 17 February 2024 / Accepted: 21 February 2024 / Published: 22 February 2024

Abstract

Precision matrices efficiently encode the partial correlations between variables and have received much attention in recent years. When one encounters large datasets stored in different locations and data sharing is not allowed, high-dimensional precision matrix estimation can be numerically challenging or even infeasible. In this work, we studied distributed sparse precision matrix estimation via an alternating block-based gradient descent method. We obtained a global model by aggregating each machine’s information via a communication-efficient surrogate penalized likelihood. The procedure chooses the block coordinates using the local gradient to guide the global gradient updates, which efficiently accelerates precision estimation and lessens the communication load on each machine. The proposed method can efficiently achieve the correct selection of the non-zero elements of a sparse precision matrix. Under mild conditions, we show that the proposed estimator achieves a near-oracle convergence rate, as if the estimation had been conducted with a consolidated dataset on a single computer. The promising performance of the method is supported by both simulated and real data examples.

1. Introduction

Estimating an inverse covariance (or precision) matrix in high dimensions naturally arises in a wide variety of application domains, such as clustering analysis [1,2], discriminant analysis [3], and so on. When the dimension p is much larger than the sample size N, the precision matrix cannot be estimated by inverting the sample covariance matrix, because the sample covariance matrix is singular; moreover, estimating the precision matrix is ill-posed and time-consuming, as the number of parameters to be estimated is of the order $O(p^2)$. As an illustration, in the prostate cancer RNA-Seq dataset analyzed in this paper, genetic activity measurements were documented for 102 subjects, with 50 normal control subjects and 52 prostate cancer patients. Given that there are over $D = 6033^2$ parameters to estimate, the analytical challenges associated with simultaneous discriminant analysis and estimation are significantly amplified. Accurate and fast precision estimation is becoming increasingly important in statistical learning. Among the many high-dimensional inference problems, a variety of precision estimation methods have been proposed to enrich the theory of this field. Friedman et al. [4] developed an $\ell_1$-penalized likelihood approach to directly estimate the precision matrix, namely the graphical Lasso (GLasso); Cai et al. [5] proposed a constrained $\ell_1$-minimization procedure to seek a sparse precision matrix under a matrix inversion constraint; Liu and Luo [6] developed a penalized column-wise procedure for estimating a precision matrix; Zhang and Zou [7] advocated a new empirical loss termed the D-trace loss, which avoids computing the log-determinant term. For more details refer to [8,9].
However, the rapid emergence of massive datasets poses a serious challenge for high-dimensional precision estimation, where the dimensionality p and the sample size N are both huge. In addition, computing power, memory constraints, and privacy considerations often make it difficult to pool separate collections of massive data into a single dataset. Communication is prohibitively expensive due to limited bandwidth, and direct data sharing raises concerns about privacy and loss of ownership. For example, hospitals may collect information on tens of thousands of patients, and directly transferring the raw data can be inefficient due to storage bottlenecks. Moreover, in practice, hospitals are unwilling to share their raw data directly when scientists need to locate relevant genes corresponding to a certain disease from massive data, owing to privacy considerations. The accelerated growth of data sizes and the joint analysis of data collected by different parties mean that statistical inference on a single computer is no longer sufficient, which makes high-dimensional precision estimation an even more challenging task.
To resolve the above difficulties, one natural strategy is to adopt a “divide-and-conquer” approach, in which a large problem is first divided into smaller manageable subproblems and the final output is obtained by combining the corresponding sub-outputs. Following this idea, statisticians can improve computing efficiency and reduce privacy risks while obtaining a global method by aggregating the statistics of each machine. Many distributed statistical methods have been developed for processing massive datasets. Lee et al. [10] proposed a debiasing approach to allow aggregation of local estimates in a distributed setting; Jordan et al. [11] developed an approximate likelihood approach for solving distributed statistical inference problems; and Fan et al. [12] extended their idea and presented two communication-efficient accurate statistical estimators (CEASE). For more details refer to [13,14].
Due to the importance of estimating a precision matrix, some studies have begun to focus on distributed estimation of the precision matrix, where the datasets are distributed over multiple machines due to size limitations or privacy considerations. Arroyo and Hou [15] estimated the precision matrix for Gaussian graphical models via a simple averaging method; Wang and Cui [16] developed a distributed estimator of the sparse precision matrix by debiasing a D-trace Lasso-type estimator and aggregating the debiased local estimators by simple averaging. Under distributed data storage, one needs to carefully address two crucial questions for estimating the precision matrix: (a) Estimation effectiveness: local estimation suffers non-negligible information loss relative to the whole dataset, so one should design a distributed procedure that conducts an effective global high-dimensional precision matrix estimation, as if the data had been consolidated on a single computer; (b) Communication efficiency: estimating a precision matrix incurs high communication costs under a distributed setup, since the cost of transferring a full gradient or estimate grows as $O(p^2)$ with the dimensionality p for each machine, so one should design an efficient method to reduce the communication costs incurred by transferring $O(p^2)$ matrices.
To ease the implementation difficulties and communication costs of estimating a precision matrix, we propose an alternating block-based gradient descent (Bgd) method for distributed precision matrix estimation. In detail, we optimize a surrogate loss function, with all the machines participating to optimize their corresponding gradient-enhanced loss functions and evaluate gradients. In each iteration, we only update a block of coordinates of the precision matrix, and the block is chosen as the $\kappa$ largest entries (in absolute value) of the local gradient on a randomly chosen machine m. By setting $\kappa \ll p^2$, we can develop an accurate statistical estimation of the precision matrix under a distributed setup, which lessens the communication costs and computation budget by using a random machine to guide the choice of block. Under mild conditions, we show that Bgd leads to a consistent estimator; it can even achieve performance similar to the debiased lasso penalized D-trace estimation [7]. The promising performance of the method is supported by both simulated and real data examples.
The rest of this paper is organized as follows: In Section 2, we formulate the research problem and introduce the Bgd framework. In Section 3, we investigate the theoretical properties of Bgd. In Section 4, we demonstrate the promising performance of Bgd through Monte Carlo simulations and a real data example. Concluding remarks are given in Section 5. Technical details are presented in Appendix A.
Throughout this paper, we use c and C to represent certain positive constants, which may differ from line to line. Let $[p]$ denote the set $\{1, 2, \ldots, p\}$. For a vector $a = (a_1, \ldots, a_p)^{\top} \in \mathbb{R}^p$, we use $\|a\|_0 = \sum_{j=1}^{p} I(a_j \neq 0)$, $\|a\|_{\infty} = \max_j |a_j|$, and $\|a\|_2 = \sqrt{\sum_{j=1}^{p} a_j^2}$ to denote its $\ell_0$, $\ell_{\infty}$, and $\ell_2$ norms, respectively. For a matrix $A = (a_{ij}) \in \mathbb{R}^{p_1 \times p_2}$, let $\|A\|_{\max} = \max_{i,j} |a_{ij}|$, $\|A\|_2 = \sup_{\|x\|_2 \leq 1} \|Ax\|_2$, $\|A\|_{\infty} = \max_{i = 1, \ldots, p_1} \sum_{j=1}^{p_2} |a_{ij}|$, and $\|A\|_F = \sqrt{\sum_{i=1}^{p_1} \sum_{j=1}^{p_2} a_{ij}^2}$ be its max, spectral, $\ell_{\infty}$ (maximum absolute row sum), and Frobenius norms, respectively.
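These norms map directly onto base R, the language used for the numerical work in Section 4; the short snippet below is illustrative only (it is not taken from the paper) and computes them for an arbitrary matrix A.

# Illustrative only: the matrix norms used throughout the paper, in base R.
A <- matrix(rnorm(12), nrow = 3, ncol = 4)
max_norm  <- max(abs(A))              # ||A||_max: largest absolute entry
spec_norm <- norm(A, type = "2")      # ||A||_2: largest singular value
inf_norm  <- max(rowSums(abs(A)))     # ||A||_inf: maximum absolute row sum
frob_norm <- sqrt(sum(A^2))           # ||A||_F: Frobenius norm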

2. Distributed Sparse Precision Matrix Estimation

2.1. Model Setups

Assume that we have N independent and identically distributed p-dimensional random variables with covariance matrix $\Sigma$ or, equivalently, precision matrix $\Omega := \Sigma^{-1}$. Each nonzero entry of $\Omega$ corresponds to an edge in a Gaussian graphical model describing the conditional dependence structure of the observed variables. In particular, if a p-dimensional random vector $X \sim N_p(\mu, \Sigma)$, the conditional independence between $X_j$ and $X_l$ given the other features is equivalent to $\Omega_{jl} = 0$, $j, l \in [p]$. A sparse structure of the precision matrix $\Omega$ provides a concise relationship between features and also gives a meaningful interpretation of the conditional independence among the features; thus, one needs to achieve a sparse and stable estimation of the precision matrix $\Omega$.
Throughout this paper, we assume that the number of features p can be much larger than the total sample size N, but that the true precision matrix is sparse, with few non-zero entries in the high-dimensional setting. We use $s^* = \{(j_1, j_2) : \omega_{j_1 j_2} \neq 0 \ \text{for} \ j_1, j_2 \in [p]\}$ to denote the index set of the true nonzero components of the precision matrix $\Omega^*$. Given N independent observations $\{X_i\}_{i=1}^{N}$, we suppose the data are partitioned into M subsets $D_1, \ldots, D_M$ completely at random and stored on M local clients. Without loss of generality, assume that X is sub-Gaussian and that the data are equally partitioned across the M machines. In the high-dimensional setting, a common approach to obtaining a sparse estimator of $\Omega$ is to minimize the following $\ell_1$-regularized negative log-likelihood, known as the graphical lasso:
$$\mathrm{tr}(S\Omega) - \log \det(\Omega) + \lambda \|\Omega\|_1,$$
where S is the sample covariance matrix. Many algorithms have been developed to solve the above problem. However, eigendecomposition or calculation of the determinant of a p × p matrix is inevitable in these algorithms. Motivated by [6,7], under the distributed scenario, the global and the local loss functions can be written as
$$l_N(\Omega) = \frac{1}{M} \sum_{m=1}^{M} l^{(m)}(\Omega), \qquad l^{(m)}(\Omega) := \frac{1}{2}\mathrm{tr}\big(\Omega S_m \Omega\big) - \mathrm{tr}(\Omega),$$
where $S_m$ is the local sample covariance matrix on client m. For a single machine, many algorithms have been developed to solve the above problem, and some authors have shown that their estimators are asymptotically consistent. The goal of this study is to estimate the high-dimensional precision matrix $\Omega$ in a distributed system, where the communication cost and the accuracy of estimation are the major considerations.
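To make the setup concrete, a minimal R sketch of the local quantities is given below; the helper names (local_cov, local_loss, local_grad) are ours rather than the authors', and the data are assumed to have already been split into local matrices.

# Minimal sketch (not the authors' code): local sample covariance, the local
# D-trace-type loss l^(m)(Omega) = 0.5*tr(Omega S_m Omega) - tr(Omega),
# and its gradient S_m Omega - I used in the derivations below.
local_cov <- function(X) {                  # X: n_m x p data matrix on machine m
  Xc <- scale(X, center = TRUE, scale = FALSE)
  crossprod(Xc) / nrow(Xc)
}
local_loss <- function(Omega, S_m) 0.5 * sum(diag(Omega %*% S_m %*% Omega)) - sum(diag(Omega))
local_grad <- function(Omega, S_m) S_m %*% Omega - diag(nrow(Omega))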

2.2. Block-Gradient Descent Algorithm

To develop a communication-efficient method for learning a high-dimensional precision matrix, we first review the proposal of Jordan et al. [11]. Starting from an initial estimator, the gradient can be communicated and the parameters can be obtained based on a communication-efficient surrogate likelihood framework. Note that, in [11], only the first machine solved optimization problems, and the global Hessian matrix was replaced by the first local Hessian matrix. To fully utilize the information on each machine, we choose a random machine m to solve optimization problems in every iteration. In this strategy, we define the loss function for a random machine m as
$$\tilde{L}(\Omega) = l^{(m)}(\Omega) - \mathrm{tr}\Big[\big(\nabla_{\Omega} l^{(m)}(\bar{\Omega}) - \nabla_{\Omega} l_N(\bar{\Omega})\big)^{\top} \Omega\Big] + p(|\Omega|, \lambda),$$
where $\bar{\Omega}$ is an initial estimator of $\Omega$, $\nabla_{\Omega} l_N(\bar{\Omega}) = \frac{1}{M}\sum_{m=1}^{M} \nabla_{\Omega} l^{(m)}(\bar{\Omega})$, and $p(\cdot, \lambda)$ is a concave penalty function with a tuning parameter $\lambda > 0$. In high-dimensional regimes, it is impossible to derive a closed-form solution for $\Omega$. A natural remedy is to add a strictly convex quadratic regularization term $\frac{\nu}{2}\|\Omega - \bar{\Omega}\|_F^2$ and to use $g(\Omega \mid \bar{\Omega})$ to approximate the surrogate loss function $\tilde{L}(\Omega)$. Then, $g(\Omega \mid \bar{\Omega})$ can be defined as
$$g(\Omega \mid \bar{\Omega}) = l^{(m)}(\bar{\Omega}) + \mathrm{tr}\Big[\nabla_{\Omega} l^{(m)}(\bar{\Omega})^{\top}(\Omega - \bar{\Omega})\Big] - \mathrm{tr}\Big[\big(\nabla_{\Omega} l^{(m)}(\bar{\Omega}) - \nabla_{\Omega} l_N(\bar{\Omega})\big)^{\top} \Omega\Big] + \frac{\nu}{2}\|\Omega - \bar{\Omega}\|_F^2 + p(|\Omega|, \lambda).$$
Using (4), if we set $\bar{\Omega}$ to the current t-th iterate $\Omega^{(t)}$, an approximate solution to (3) is obtained by the following iterative procedure:
$$\Omega^{(t+1)} = \arg\min_{\Omega}\ g(\Omega \mid \Omega^{(t)}).$$
At each iteration, the regularization term prevents the minimizer from moving too far from $\Omega^{(t)}$. This feature yields a non-greedy update in the search for the high-dimensional precision matrix estimate [17]. We can use gradient descent to optimize $g(\Omega \mid \Omega^{(t)})$, and $g(\Omega \mid \Omega^{(t)})$ approximates $\tilde{L}(\Omega)$ well for $\Omega^{(t)}$ close to $\Omega$ when the stepsize $\nu$ is chosen appropriately using the local sample covariance matrix $S_m$ on machine m. However, plain gradient descent needs to transmit $O(p^2)$ entries from each machine, which results in high communication costs and a heavy computation burden per round. Intuitively, we should choose the block that best updates the global gradient under the communication constraints. In this paper, we use the local gradient to guide the choice of the block, where the block in the t-th iteration is chosen as
$$C_t = \big\{(j, l) : |B_{jl}| \geq \rho = \mathrm{vec}(|B|)_{(\kappa)}\big\},$$
where $\mathrm{vec}(|B|)_{(\kappa)}$ denotes the $\kappa$-th largest component of the vector $\mathrm{vec}(|B|)$, and $B = \nabla_{\Omega} l^{(k)}(\Omega^{(t)})$ is the local gradient on machine k, with k a random machine chosen to optimize the surrogate negative likelihood $l^{(k)}(\Omega) - \mathrm{tr}[(\nabla_{\Omega} l^{(k)}(\bar{\Omega}) - \nabla_{\Omega} l_N(\bar{\Omega}))^{\top}\Omega] + p(|\Omega|, \lambda)$. In every iteration, every machine transfers $O(\kappa)$ entries rather than an $O(p^2)$ gradient matrix, and we only update the gradient and precision matrix on the block $C_t$. Details can be found in Algorithm 1. With the aid of the surrogate negative likelihood, we can efficiently train a global model that aggregates information from the other machines. Thus, the procedure has good potential to provide more reliable estimation results while reducing the communication loads and costs.
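A minimal R sketch of this block-selection rule is given below (the function name select_block is illustrative); it returns the index pairs of the $\kappa$ largest absolute entries of the local gradient matrix B.

select_block <- function(B, kappa) {
  absB <- abs(B)
  rho  <- sort(as.vector(absB), decreasing = TRUE)[kappa]   # kappa-th largest |B_jl|
  which(absB >= rho, arr.ind = TRUE)   # (j, l) index pairs forming C_t (ties may add a few extra pairs)
}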
Regarding the update of $\Omega$, we need a distributed algorithm that respects the communication constraints. To update $\Omega$, we solve the following minimization problem:
$$\Omega^{(t+1)} = \arg\min_{\Omega}\ \frac{\nu}{2}\Big\|\Omega - \Omega^{(t)} + \nu^{-1}\,\mathcal{P}_{C_t}\big[\nabla_{\Omega} l_N(\Omega^{(t)})\big]\Big\|_F^2 + \sum_{(j,l) \in C_t} p(\Omega_{jl}, \lambda),$$
where
$$\mathcal{P}_{C_t}\big[\nabla_{\Omega} l_N(\Omega^{(t)})\big]_{jl} = \begin{cases}\nabla_{\Omega_{jl}} l_N(\Omega^{(t)}), & (j,l)\in C_t,\\ 0, & (j,l)\notin C_t.\end{cases}$$
The penalty function in (6) for updating $\Omega$ can be chosen as the Lasso, MCP, or SCAD penalty applied to the off-diagonal entries [18,19]. Closed-form solutions to (6) exist for the Lasso, SCAD, and MCP penalties. For example, let $S(a, \lambda) = \mathrm{sign}(a)(|a| - \lambda)_+$ be the soft-thresholding rule, where $(b)_+ = b$ if $b > 0$ and $(b)_+ = 0$ otherwise. Denote $\vartheta^{(t+1)} = \Omega^{(t)} - \nu^{-1}\mathcal{P}_{C_t}\big[\nabla_{\Omega} l_N(\Omega^{(t)})\big]$. The closed-form solution for the Lasso penalty is
$$\omega_{jl}^{(t+1)} = \begin{cases} S\big(\vartheta_{jl}^{(t+1)}, \lambda/\nu\big), & (j, l) \in C_t, \\ \omega_{jl}^{(t)}, & (j, l) \notin C_t. \end{cases}$$
For the MCP penalty with $\varsigma > 1/\nu$,
$$\omega_{jl}^{(t+1)} = \begin{cases} \dfrac{S\big(\vartheta_{jl}^{(t+1)}, \lambda/\nu\big)}{1 - 1/(\varsigma\nu)}, & |\vartheta_{jl}^{(t+1)}| \leq \varsigma\lambda,\ (j, l) \in C_t, \\ \vartheta_{jl}^{(t+1)}, & |\vartheta_{jl}^{(t+1)}| > \varsigma\lambda,\ (j, l) \in C_t, \\ \omega_{jl}^{(t)}, & (j, l) \notin C_t. \end{cases}$$
For the SCAD penalty with $\varsigma > 1/\nu + 1$, the closed-form solution can be written as
$$\omega_{jl}^{(t+1)} = \begin{cases} S\big(\vartheta_{jl}^{(t+1)}, \lambda/\nu\big), & |\vartheta_{jl}^{(t+1)}| \leq \lambda + \lambda/\nu,\ (j, l) \in C_t, \\ \dfrac{S\big(\vartheta_{jl}^{(t+1)}, \varsigma\lambda/((\varsigma - 1)\nu)\big)}{1 - 1/((\varsigma - 1)\nu)}, & \lambda + \lambda/\nu < |\vartheta_{jl}^{(t+1)}| \leq \varsigma\lambda,\ (j, l) \in C_t, \\ \vartheta_{jl}^{(t+1)}, & |\vartheta_{jl}^{(t+1)}| > \varsigma\lambda,\ (j, l) \in C_t, \\ \omega_{jl}^{(t)}, & (j, l) \notin C_t, \end{cases}$$
where $\Omega^{(t+1)} = (\omega_{jl}^{(t+1)})$ and $\varsigma$ is a parameter that controls the concavity of the MCP and SCAD penalties. In particular, the SCAD penalty converges to the Lasso penalty as $\varsigma \to \infty$. Following Fan and Li [20], we treat $\varsigma$ as a fixed constant, such as $\varsigma = 3.7$. The SCAD not only enjoys sparsity, as does the $\ell_1$ penalty, but also has the property of unbiasedness, in that it does not shrink large estimated parameters, so they remain unbiased throughout the iterations.
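As a concrete illustration, the three entry-wise updates can be coded in a few lines of R; the snippet below is a sketch under our own naming (soft, update_lasso, update_mcp, update_scad), not the authors' implementation, and operates on the entries of $\vartheta^{(t+1)}$ restricted to $C_t$.

soft <- function(a, lam) sign(a) * pmax(abs(a) - lam, 0)     # soft-thresholding S(a, lam)

update_lasso <- function(theta, lam, nu) soft(theta, lam / nu)

update_mcp <- function(theta, lam, nu, zeta = 3) {           # requires zeta > 1/nu
  out <- soft(theta, lam / nu) / (1 - 1 / (zeta * nu))
  big <- abs(theta) > zeta * lam
  out[big] <- theta[big]                                     # no shrinkage for large entries
  out
}

update_scad <- function(theta, lam, nu, zeta = 3.7) {        # requires zeta > 1/nu + 1
  out <- theta                                               # |theta| > zeta*lam: keep as is
  low <- abs(theta) <= lam + lam / nu
  mid <- abs(theta) > lam + lam / nu & abs(theta) <= zeta * lam
  out[low] <- soft(theta[low], lam / nu)
  out[mid] <- soft(theta[mid], zeta * lam / ((zeta - 1) * nu)) / (1 - 1 / ((zeta - 1) * nu))
  out
}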
Note that the solution $\Omega_1^{(t+1)}$ to (6) is not symmetric in general. To make the solution symmetric, following the symmetrization strategy of Cai et al. [5] and Cai et al. [21], the final estimator $\Omega^{(t+1)}$ is constructed by comparing the two entries of $\Omega_1^{(t+1)}$ at positions $(j, l)$ and $(l, j)$ and assigning the one with the smaller magnitude to both entries, that is,
$$\omega_{jl}^{(t+1)} = \omega_{1,jl}^{(t+1)}\, I\big(|\omega_{1,jl}^{(t+1)}| \leq |\omega_{1,lj}^{(t+1)}|\big) + \omega_{1,lj}^{(t+1)}\, I\big(|\omega_{1,lj}^{(t+1)}| < |\omega_{1,jl}^{(t+1)}|\big).$$
This symmetrization procedure is not ad hoc: it ensures that the final estimator $\Omega^{(t+1)}$ achieves the same entry-wise $\ell_{\infty}$ estimation error as $\Omega_1^{(t+1)}$. For more details refer to Section 3 of Cai et al. [21].
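A sketch of this symmetrization in R (function name ours) follows; it keeps, at each pair (j, l), the entry of $\Omega_1^{(t+1)}$ with smaller magnitude.

symmetrize <- function(Omega1) {
  keep <- abs(Omega1) <= abs(t(Omega1))     # TRUE where |omega_jl| <= |omega_lj|
  Omega1 * keep + t(Omega1) * (!keep)       # logical matrices coerce to 0/1
}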
We now discuss how to select the block size $\kappa$ and the stepsize $\nu$. Regarding $\kappa$, a larger value transmits more information from each local client in each iteration and leads to more accurate and faster convergence of Bgd. Nevertheless, a larger $\kappa$ also means higher communication loads and costs, so the choice of $\kappa$ involves a trade-off: its value should be neither too small nor too large. Fortunately, we found the performance of Bgd to be robust over a wide range of choices of $\kappa$ within a certain interval, which facilitates the use of Bgd by avoiding an elaborate specification of $\kappa$. In addition, many empirical studies have shown that a smaller value of $\nu$ often leads to faster convergence of the above algorithm. Theorem 1 indicates that only if $\nu$ is larger than $\Lambda_{\max}(S_m)$ is the objective function guaranteed to be non-increasing at every iteration. In practice, one can first use a tentatively small $\nu$ on local client m, and then check the following condition based on the data on client m:
$$\nu\,\|\Delta\Omega^{(t)}\|_F^2 \;\geq\; \mathrm{tr}\big((\Delta\Omega^{(t)})^{\top} S_m\, \Delta\Omega^{(t)}\big),$$
where $\Delta\Omega^{(t)} = \Omega^{(t+1)} - \Omega^{(t)}$. If (11) is not satisfied, we double the current value of $\nu$. The proposed Bgd algorithm is summarized in Algorithm 1.
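A minimal sketch of the check (11) on client m, with the doubling rule just described, is as follows (names illustrative).

step_ok <- function(Delta, S_m, nu) {
  nu * sum(Delta^2) >= sum(Delta * (S_m %*% Delta))   # nu*||Delta||_F^2 >= tr(Delta' S_m Delta)
}
# usage sketch: while (!step_ok(Omega_new - Omega_old, S_m, nu)) nu <- 2 * nu
# (the blocked update would then be recomputed with the enlarged nu)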
Algorithm 1 Distributed sparse precision matrix estimation via Bgd
  • Input: initial value $\Omega^{(0)}$, number of iterations T.
  • For $t = 0, 1, 2, \ldots, T-1$:
       • Choose a random machine m and update the block $C_t$;
       • Machine m sends $C_t$ to the other machines through the central processor;
       • Each machine k evaluates $\nabla_{\Omega} l^{(k)}(\Omega^{(t)})$ and sends the blocked gradient $\nabla_{\Omega_{C_t}} l^{(k)}(\Omega^{(t)})$ to the central processor;
       • The central processor transmits the blocked gradients to machine m, which forms $\mathcal{P}_{C_t}\big[\nabla_{\Omega} l_N(\Omega^{(t)})\big]$ and computes
         $$\Omega^{(t+1)} = \arg\min_{\Omega}\ \frac{\nu}{2}\Big\|\Omega - \Omega^{(t)} + \nu^{-1}\,\mathcal{P}_{C_t}\big[\nabla_{\Omega} l_N(\Omega^{(t)})\big]\Big\|_F^2 + \sum_{(j,l)\in C_t} p(\Omega_{jl}, \lambda),$$
         then broadcasts $\Omega^{(t+1)}$ to the other machines through the central processor, where the stepsize $\nu$ is chosen using the local covariance matrix $S_m$.
  • Output: $\Omega^{(T)}$.
Remark 1.
In the Bgd algorithm, we only transmit the gradient $\nabla_{\Omega_{C_t}} l^{(k)}(\Omega^{(t)})$ and the update $\Omega_{C_t}$ on the block $C_t$ to the central processor and client m. By setting $|C_t| = \kappa \ll p^2$, we can efficiently reduce the communication loads and costs by avoiding the transfer of $O(p^2)$ entries of the gradient matrix and the estimated precision matrix.
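To tie the pieces together, the following single-process R sketch simulates one Bgd round on one computer (no real networking); it reuses the illustrative helpers defined earlier (local_grad, select_block, update_lasso, symmetrize) and is our reading of Algorithm 1, not the authors' code.

bgd_iteration <- function(Omega, S_list, nu, lam, kappa) {
  p  <- nrow(Omega)
  m  <- sample(length(S_list), 1)                  # random machine guiding the block choice
  B  <- local_grad(Omega, S_list[[m]])             # local gradient on machine m
  Ct <- select_block(B, kappa)                     # indices of the kappa largest |B_jl|
  # each machine would transmit only its gradient entries on C_t; here we average directly
  G  <- Reduce(`+`, lapply(S_list, function(S_m) local_grad(Omega, S_m))) / length(S_list)
  Gb <- matrix(0, p, p)
  Gb[Ct] <- G[Ct]                                  # blocked global gradient P_{C_t}[grad l_N]
  theta <- Omega - Gb / nu                         # blocked gradient step
  Omega_new <- Omega                               # off-block entries keep their old values
  Omega_new[Ct] <- update_lasso(theta[Ct], lam, nu)
  symmetrize(Omega_new)
}
# e.g. kappa <- floor(p * log(p)), the choice used in the numerical studies of Section 4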

3. Theoretical Properties

We conducted a theoretical analysis to justify the proposed Bgd procedure. In particular, we studied the efficiency of the Bgd estimator in a distributed setup. To investigate the properties of the proposed Bgd algorithm, we required the following conditions:
(A1)
(Sparse matrix class) Suppose that $\Omega^* \in \mathcal{U}$ with
$$\mathcal{U} = \Big\{\Omega \succ 0 : \max_{1 \leq j \leq p} \sum_{l=1}^{p} 1\{\omega_{jl} \neq 0\} \leq s_p,\ \|\Omega\|_{L_1} \leq Q\Big\}.$$
(A2)
(Irrepresentability condition) Let $\delta = \{(j, l) : \Omega^*_{jl} \neq 0\}$ be the set of all non-zero entries of $\Omega^*$, with $\delta_i$ its restriction to the i-th column, and let $\delta^c$ be the complement of $\delta$. For some $0 < \alpha < 1$, the covariance matrix $\Sigma^*$ satisfies
$$\max_{1 \leq i \leq p}\big\|\Sigma^*_{\delta_i^c\,\delta_i}\big(\Sigma^*_{\delta_i\,\delta_i}\big)^{-1}\big\|_{\infty} \leq 1 - \alpha.$$
(A3)
(Bounded condition) There exists a constant $c_0 \geq 1$ such that $c_0^{-1} \leq \Lambda_{\min}(\Omega^*) \leq \Lambda_{\max}(\Omega^*) \leq c_0$, where $\Lambda_{\min}(A)$ and $\Lambda_{\max}(A)$ denote the smallest and largest eigenvalues of a matrix A, respectively.
(A4)
(Restricted strong convexity of the negative log-likelihood) There exist a positive constant $c_1$ and a matrix $\tilde{\Omega}$ such that
$$l_N(\Omega^* + \Delta) - l_N(\Omega^*) - \mathrm{tr}\big[\dot{l}_N(\Omega^*)^{\top}\Delta\big] = \mathrm{tr}\big[\Delta^{\top}\ddot{l}_N(\tilde{\Omega})\Delta\big] \geq c_1\|\Delta\|_F^2,$$
for any $\Delta \neq 0$ satisfying $\|\Delta_{s^{*c}}\|_1 \leq 3\|\Delta_{s^*}\|_1$.
Condition (A1) indicates that the precision matrix has a sparse structure; it has been widely used in the literature on Gaussian graphical model estimation [5,6]. Condition (A2) is in the same spirit as the mutual incoherence or irrepresentability condition of Liu and Luo [6]. Condition (A3) requires that the smallest eigenvalue of the precision matrix $\Omega^*$ is bounded away from zero and that its largest eigenvalue is finite. Condition (A3) also implies that $c_0^{-1} \leq \Lambda_{\min}(\Sigma^*) \leq \Lambda_{\max}(\Sigma^*) \leq c_0$. This assumption is commonly imposed in the literature on the analysis of Gaussian graphical models [22]. Condition (A4) states that the negative log-likelihood $l_N(\Omega)$ satisfies restricted strong convexity over the cone $\|\Delta_{s^{*c}}\|_1 \leq 3\|\Delta_{s^*}\|_1$.
Theorem 1.
Let $\{\Omega^{(t)}\}$ be the sequence generated by Algorithm 1. If we use the local client m to compute (6), set $\bar{\Omega} = \Omega^{(t)}$, and choose $\nu > \Lambda_{\max}(S_m)$, then
$$\tilde{L}(\Omega^{(t+1)}) \leq \tilde{L}(\Omega^{(t)}).$$
Theorem 1, proved in Appendix A, indicates that with an appropriate stepsize $\nu$ we can ensure $\tilde{L}(\Omega^{(t+1)}) \leq \tilde{L}(\Omega^{(t)})$ with limited communication costs in every iteration. Theorem 1 also provides guidance for choosing the stepsize $\nu$ from the local data under the distributed setting in a practical implementation.
Theorem 2.
Under the sub-Gaussian condition, suppose that assumptions (A1)–(A4) hold. If $s_p = o(N/\log p)$, $\lambda = C_1 Q\sqrt{\log p / N}$ for some $C_1 > 2$, and $\rho = C_2 Q\sqrt{\log p / N}$ for some $C_2 > 0$, then
$$\frac{1}{p}\|\hat{\Omega} - \Omega^*\|_F^2 = O_p\Big(\frac{Q^2 s_p \log p}{N}\Big) \to 0.$$
Theorem 2, proved in Appendix A, gives the convergence rate under the (scaled) Frobenius norm.

4. Numerical Studies

In this section, we present several simulation studies and a real data example to investigate the finite-sample performance of the proposed Bgd procedure in terms of its estimation accuracy. We compare the proposed method with several other distributed high-dimensional precision matrix estimation methods: naive estimation based on averaging the local estimates obtained from the R package “glasso” (Naive), using R version 4.1.0; debiased distributed D-trace loss penalized estimation (Dtrace, [16]); and debiased distributed graphical lasso estimation (Dglasso, [15]). As a benchmark, we used the debiased D-trace loss penalized estimation proposed by Zhang and Zou [7], applied to all the data in a non-distributed setting, as the global method. Each estimator was tuned by cross-validation, and all numerical experiments were conducted in R on a Microsoft Windows computer with a sixteen-core 4.50 GHz CPU and 32 GB RAM. In addition, for the penalty function in the objective function (3), we chose the lasso and SCAD penalties.
In our numerical studies, the Bgd model was implemented based on Algorithm 1 with $\Omega^{(0)} = I$. We terminated the iterations when $\|\Omega^{(t+1)}_{C_t} - \Omega^{(t)}_{C_t}\|_F / \|\Omega^{(t)}_{C_t}\|_F < 10^{-3}$ or the number of iterations reached $T_{\max}$, and we set $T_{\max} = 300$ in Bgd. We chose $\kappa = \lfloor p\log p \rfloor$; obviously, $\kappa \ll p^2$, which can efficiently reduce the communication loads and costs by avoiding the transfer of $O(p^2)$ entries of the gradient matrix and the estimated precision matrix. Here, $\lfloor a \rfloor$ denotes the integer part of a. We evaluated the estimation accuracy of each method using the Frobenius loss and the spectral loss, defined as follows:
$$L_F = \|\hat{\Omega} - \Omega^*\|_F, \qquad L_2 = \|\hat{\Omega} - \Omega^*\|_2.$$
Generally, the smaller $L_F$ and $L_2$ are, the higher the estimation accuracy. Moreover, to assess how accurately the sparsity pattern of the true precision matrix is recovered, we also evaluated the false negative (FN) and false positive (FP) rates, defined as follows:
$$FN = \frac{\sum_{i<j} 1(\hat{\omega}_{ij} = 0,\ \omega^*_{ij} \neq 0)}{\sum_{i<j} 1(\omega^*_{ij} \neq 0)}, \qquad FP = \frac{\sum_{i<j} 1(\hat{\omega}_{ij} \neq 0,\ \omega^*_{ij} = 0)}{\sum_{i<j} 1(\omega^*_{ij} = 0)}.$$
The false negative rate gives the percentage of nonzero elements that are wrongly estimated to be zero. In contrast, the false positive rate gives the percentage of zero elements that are wrongly estimated as nonzero. Both values should be as small as possible. For each model under study, we set $M \in \{5, 10, 20\}$. We specify the remaining parameter settings in the following scenarios, where the corresponding simulation results are also given.
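For concreteness, the accuracy measures can be computed as in the sketch below (illustrative function names, not the authors' code).

frob_loss <- function(Omega_hat, Omega_star) sqrt(sum((Omega_hat - Omega_star)^2))
spec_loss <- function(Omega_hat, Omega_star) norm(Omega_hat - Omega_star, type = "2")
fn_fp <- function(Omega_hat, Omega_star) {
  up   <- upper.tri(Omega_star)                 # off-diagonal entries with i < j
  nz   <- Omega_star[up] != 0
  est0 <- Omega_hat[up] == 0
  c(FN = mean(est0[nz]), FP = mean(!est0[!nz])) # proportions of missed / spurious edges
}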
(S1) We first assessed the performance of Bgd and its competitors based on their estimation accuracy across two different values of p (i.e., $p = 200$ and $p = 500$, which resulted in $D = 200^2$ and $D = 500^2$ parameters to be estimated, respectively). Here, we set $\Omega^*$ as a band matrix, i.e., $\omega^*_{jl} = 0.45$ for $|j - l| = 1$, $\omega^*_{jj} = 1.0$, and the other elements of $\Omega^*$ were set to zero, where $j, l = 1, \ldots, p$. The sample size was $N = 140$.
(S2) We evaluated the performance of Bgd and its competitors across two different values of the total sample size N. To this end, we considered the dimension $p = 350$, with sample sizes $N = 160$ and $N = 120$, respectively. Using a setting similar to Wang and Jiang [23], we set $\Sigma^*_{jl} = 0.4^{|j-l|}$, $j, l = 1, \ldots, p$. In this setting, $\Omega^*$ had a sparse structure.
(S3) We considered a case with varying sparsity levels of the precision matrix. To this end, let $\Omega_1 = (\omega_{jl})$, where $\omega_{jl} = u_{jl}\delta_{jl}$, $u_{jj} = 1.0$, and $u_{jl} = 0.5$ for $j \neq l$; here $\delta_{jl}$ is a Bernoulli random variable with success probability 0.01 or 0.02, respectively, and we chose $(N, p) = (300, 400)$. We set $\Omega^* = \Omega_1 + 1(\Lambda_{\min} \leq 0)(|\Lambda_{\min}| + 0.02)I_p$ to ensure that the precision matrix $\Omega^*$ was positive definite. (A sketch of how these three designs can be generated is given after this list.)
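The sketch below shows, under our own naming and purely for illustration, how the three precision-matrix designs can be generated in R; observations could then be drawn with, e.g., mvtnorm::rmvnorm(N, sigma = solve(Omega)).

make_band <- function(p) {                          # S1: band matrix
  Omega <- diag(p)
  Omega[abs(row(Omega) - col(Omega)) == 1] <- 0.45
  Omega
}
make_ar1 <- function(p, rho = 0.4) {                # S2: Sigma_jl = rho^|j-l|, Omega = solve(Sigma)
  solve(rho^abs(outer(1:p, 1:p, "-")))
}
make_sparse <- function(p, pr = 0.01) {             # S3: random sparse design, shifted to be PD
  U <- matrix(0.5 * rbinom(p * p, 1, pr), p, p)
  U[lower.tri(U)] <- t(U)[lower.tri(U)]             # mirror the upper triangle
  diag(U) <- 1
  lam_min <- min(eigen(U, symmetric = TRUE, only.values = TRUE)$values)
  if (lam_min <= 0) U <- U + (abs(lam_min) + 0.02) * diag(p)
  U
}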
For each of the cases presented above, we generated $T = 100$ datasets. For each of the 100 datasets, the aforementioned methods were adopted to perform high-dimensional distributed precision estimation. The average Frobenius loss $L_F$, spectral loss $L_2$, FN, and FP for each method are reported in Table A1. We investigated the effect of the number of machines M and the local sample size n on the estimation error. As shown in Table A1, (i) the naive method performed poorly in all cases; (ii) Dtrace and Dglasso improved over the naive estimate by debiasing the local lasso estimators and averaging the debiased local estimators; (iii) as the number of machines increased, the Dtrace and Dglasso methods deteriorated drastically; (iv) for more complex precision structures and varying numbers of machines, the proposed Bgd method with lasso and SCAD penalties still achieved smaller errors than the other methods, and the SCAD version generally had smaller $L_F$ values than the Lasso version, since SCAD had more accurate selection results and produced less biased estimates. In summary, the proposed Bgd method outperformed the other methods regardless of the number of machines M and the structure of the precision matrix, in that its values of $L_F$, $L_2$, and FN were smaller than those of the other methods, except the Global method, which served as the benchmark.
To investigate the effect of the number of machines M, we replicated the aforementioned simulation study for cases S1–S3, varying the number of machines M from 5 to 30, and plot $L_F$ in Figure A1. From Figure A1, the performance of Naive, Dtrace, and Dglasso deteriorated drastically as the number of machines increased. In contrast, the $L_F$ of Bgd versus M was almost flat and very close to that of the global method, even when M was large. The proposed Bgd thus achieved accuracy surpassing its competitors.

Real Data Analysis

In this subsection, we applied the proposed Bgd method to a real data example. The prostate cancer data are available at http://bioinformatics.mdanderson.org/ (accessed on 1 March 2023). The data consist of genetic expression levels for p = 6033 genes from 102 individuals (50 normal control subjects and 52 prostate cancer patients). This dataset has been analyzed in several articles on high-dimensional analysis [5,24].
To evaluate the performance of the proposed Bgd method for distributed precision estimation, we randomly partitioned $100\gamma\%$ of the samples as training data and the remaining $100(1-\gamma)\%$ as testing data, and the training data were equally partitioned into $M \in \{5, 10, 20\}$ data segments. Having more than 50% training data is often preferred [25], and we set $\gamma = 0.6$ or 0.8, which meant that every client had only 3 or 4 observations for training when the total number of machines was $M = 20$. For simplicity of calculation, we selected $p = 300$ genes from all 16,386 genes using the package “SIS” with the logistic model, which resulted in over 90,000 parameters to be estimated.
Our goal was to estimate the precision (inverse covariance) matrix in a distributed setting, and we could not use $L_2$ or $L_F$ to measure the estimation accuracy of each method, as the true $\Omega^*$ is unknown. Following the same analysis as [5], the normalized gene expression data were assumed to be normally distributed as $N(\mu_k, \Sigma)$, where the two groups were assumed to have the same covariance matrix $\Sigma$ but different means $\mu_k$, $k = 1, 2$. The estimated inverse covariance matrix $\hat{\Omega}$ produced by the different methods was used in the linear discriminant scores:
$$\delta_k(x) = x^{\top}\hat{\Omega}\hat{\mu}_k - \frac{1}{2}\hat{\mu}_k^{\top}\hat{\Omega}\hat{\mu}_k + \log\hat{\pi}_k.$$
The classification rule was $\hat{k}(x) = \arg\max_k \delta_k(x)$ for $k = 1, 2$. For simplicity, $\hat{\mu}_k$ and $\hat{\pi}_k$ in the linear discriminant scores were estimated from the training data in a non-distributed fashion, whereas $\hat{\Omega}$ was estimated from the training data under the distributed setup. The classification performance is clearly associated with the estimation accuracy of $\Omega$. The training dataset was used for parameter estimation, while the testing dataset was used to compute the classification error. We used the classification error on the testing dataset to assess the estimation performance and to compare the methods. A good estimation method for a precision matrix is expected to have a low misclassification rate (prediction error, Prr) and high sensitivity (Sen) and specificity (Spe) across all partitions.
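A sketch of this scoring and classification step in R follows (names illustrative; mu_hat is assumed to be a p x 2 matrix of class means and pi_hat the vector of class proportions).

lda_predict <- function(X_test, Omega_hat, mu_hat, pi_hat) {
  scores <- sapply(1:2, function(k) {
    drop(X_test %*% Omega_hat %*% mu_hat[, k]) -
      0.5 * drop(t(mu_hat[, k]) %*% Omega_hat %*% mu_hat[, k]) + log(pi_hat[k])
  })
  max.col(scores)                                   # predicted class (1 or 2) per test point
}
# prediction error: mean(lda_predict(X_test, Omega_hat, mu_hat, pi_hat) != y_test)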
We summarize the assessment based on T = 100 replications in terms of sensitivity (Sen), specificity (Spe), and the overall prediction error (Prr) in Table A2. The proposed Bgd method outperformed all the other methods and even performed similarly to the global method. The models chosen by Bgd had higher sensitivity and specificity and a lower misclassification error (Prr). From Table A2, the promising performance of Bgd is again observed.
Moreover, to demonstrate that Bgd is robust to a wide range of choices of $\kappa$ within a certain interval, we repeated the above procedure with $\gamma = 0.8$ and calculated the corresponding Prr values. Figure A2 plots the Prr values against $\tau$, where $\kappa = \lfloor \tau\, p\log p \rfloor$. Inspection of Figure A2 indicates that the proposed Bgd method was robust against a wide range of choices of $\kappa$ and still performed better than the other methods, in that the Prr of Bgd was lower than that of the other methods for various $\kappa$.

5. Discussion

This paper proposes a novel method for high-dimensional precision matrix estimation when the dataset is distributed across different machines. In this work, we studied distributed sparse precision matrix estimation via an alternating block-based gradient descent method, where the block is chosen using the local gradient. This procedure can reduce the communication loads and costs while maintaining reliable estimation. The proposed method showed good potential to improve estimation accuracy compared with the other distributed methods.
The current work focused on the analysis of homogeneous data. It would be an interesting topic for future research to extend the existing work to the joint estimation of multiple precision matrices.

Author Contributions

Validation, H.L.; Writing—original draft, W.D. The authors carried out this work collaboratively. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Social Science Foundation of China (23BTJ061).

Data Availability Statement

The real data in this paper consist of genetic expression levels for p = 6033 genes from 102 individuals. For simplicity of calculation and comparison, we selected p = 300 genes from the total 16,386 genes using the package “SIS” with the logistic model, which resulted in over 90,000 parameters to be estimated. The variables can be obtained through the R code SIS(prostate$x, prostate$y, family = "binomial", nsis = 300, iter = F)$ix0.

Acknowledgments

The authors are grateful to the two reviewers for the constructive comments and suggestions that led to significant improvements to the original manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Tables and Figures

Table A1. Performance of the Bgd and competing methods in the simulation study.
Each cell reports $L_F$ / $L_2$ / FN / FP / Time.

S1 (type 1), D = 200^2:
  Naive      M=5: 5.36 / 0.77 / 0 / 0.01 / 7.18    M=10: 7.03 / 0.81 / 0.01 / 0.03 / 5.42    M=20: 8.86 / 1.34 / 0.08 / 0.01 / 3.10
  Dtrace     M=5: 3.22 / 0.70 / 0 / 0 / 322        M=10: 5.14 / 0.80 / 0.01 / 0 / 207        M=20: 5.43 / 1.35 / 0.03 / 0 / 214
  Dglasso    M=5: 3.76 / 0.72 / 0.01 / 0 / 9.44    M=10: 4.97 / 0.97 / 0.05 / 0 / 7.74       M=20: 5.96 / 1.26 / 0.26 / 0 / 3.66
  Bgd-lasso  M=5: 3.22 / 0.70 / 0.01 / 0 / 32.1    M=10: 4.25 / 0.74 / 0.01 / 0 / 15.9       M=20: 4.26 / 0.75 / 0.02 / 0 / 13.8
  Bgd-scad   M=5: 3.20 / 0.68 / 0.01 / 0 / 45.1    M=10: 3.82 / 0.73 / 0.01 / 0 / 33.8       M=20: 3.90 / 0.74 / 0.02 / 0 / 29.2
  Global     2.80 / 0.68 / 0 / 0 / 62.6

S1 (type 2), D = 500^2:
  Naive      M=5: 9.30 / 0.77 / 0 / 0.03 / 81.8    M=10: 12.1 / 0.96 / 0.01 / 0.01 / 75.2    M=20: 14.5 / 1.30 / 0.90 / 0 / 22.8
  Dtrace     M=5: 7.56 / 0.90 / 0.04 / 0 / 923     M=10: 8.15 / 0.84 / 0.02 / 0 / 872        M=20: 9.03 / 1.68 / 0.03 / 0 / 864
  Dglasso    M=5: 6.04 / 0.85 / 0.05 / 0 / 108     M=10: 7.75 / 0.90 / 0.18 / 0 / 115        M=20: 10.2 / 1.39 / 0.38 / 0 / 122
  Bgd-lasso  M=5: 6.05 / 0.74 / 0.02 / 0 / 447     M=10: 6.50 / 0.75 / 0.02 / 0 / 301        M=20: 6.40 / 0.74 / 0.02 / 0 / 144
  Bgd-scad   M=5: 6.00 / 0.82 / 0.01 / 0 / 448     M=10: 5.95 / 0.77 / 0.01 / 0 / 308        M=20: 6.03 / 0.82 / 0.01 / 0 / 225
  Global     5.89 / 0.82 / 0 / 0 / 984

S2 (type 1), N = 160:
  Naive      M=5: 12.1 / 1.21 / 0.01 / 0.06 / 12.8   M=10: 13.5 / 1.29 / 1.00 / 0 / 15.2       M=20: 13.5 / 1.84 / 1 / 0 / 22.2
  Dtrace     M=5: 9.96 / 1.15 / 0.01 / 0.01 / 457    M=10: 10.4 / 1.08 / 0.04 / 0 / 446        M=20: 10.9 / 1.92 / 0.27 / 0 / 438
  Dglasso    M=5: 8.37 / 1.09 / 0.08 / 0 / 18.8      M=10: 8.41 / 1.06 / 0.19 / 0 / 20.1       M=20: 10.7 / 2.02 / 0.35 / 0 / 24.7
  Bgd-lasso  M=5: 8.07 / 0.96 / 0.01 / 0.01 / 73.8   M=10: 8.15 / 0.97 / 0.01 / 0.01 / 37.6    M=20: 8.20 / 0.98 / 0.01 / 0.01 / 25.5
  Bgd-scad   M=5: 7.78 / 0.97 / 0.01 / 0 / 90.3      M=10: 7.87 / 0.99 / 0.01 / 0.01 / 39.3    M=20: 8.00 / 1.02 / 0.01 / 0.01 / 62.8
  Global     7.89 / 1.01 / 0.04 / 0 / 210

S2 (type 2), N = 120:
  Naive      M=5: 11.9 / 1.12 / 0.19 / 0.04 / 9.79   M=10: 12.5 / 1.24 / 0.17 / 0.04 / 10.4    M=20: 16.2 / 4.28 / 1 / 0 / 12.6
  Dtrace     M=5: 11.2 / 1.35 / 0.02 / 0.02 / 592    M=10: 10.8 / 1.16 / 0.24 / 0 / 624        M=20: 14.8 / 3.71 / 0.99 / 0 / 657
  Dglasso    M=5: 9.02 / 1.24 / 0.10 / 0 / 20.2      M=10: 9.09 / 1.25 / 0.30 / 0 / 26.7       M=20: 14.5 / 4.32 / 0.58 / 0 / 32.8
  Bgd-lasso  M=5: 8.86 / 1.15 / 0.02 / 0.02 / 80.6   M=10: 8.95 / 1.15 / 0.03 / 0.02 / 40.9    M=20: 12.2 / 1.28 / 0.04 / 0.06 / 25.4
  Bgd-scad   M=5: 8.74 / 1.13 / 0.02 / 0.02 / 92.9   M=10: 8.89 / 1.15 / 0.03 / 0.02 / 98.5    M=20: 9.29 / 1.44 / 0.04 / 0.03 / 81.7
  Global     8.68 / 1.12 / 0.03 / 0 / 208

S3 (type 1), pr = 0.01:
  Naive      M=5: 15.2 / 2.02 / 0.40 / 0 / 18.2      M=10: 16.5 / 2.42 / 0.79 / 0.01 / 19.1    M=20: 17.0 / 2.64 / 0.92 / 0 / 25.9
  Dtrace     M=5: 9.45 / 1.36 / 0.20 / 0 / 805       M=10: 12.1 / 1.80 / 0.25 / 0.01 / 834     M=20: 16.7 / 2.49 / 0.95 / 0 / 845
  Dglasso    M=5: 10.6 / 2.35 / 0.26 / 0 / 20.8      M=10: 14.5 / 2.79 / 0.06 / 0.01 / 28.1    M=20: 15.4 / 4.16 / 0.83 / 0 / 30.4
  Bgd-lasso  M=5: 8.15 / 1.18 / 0.02 / 0.02 / 416    M=10: 10.2 / 1.59 / 0.01 / 0 / 328        M=20: 10.8 / 1.64 / 0 / 0.02 / 232
  Bgd-scad   M=5: 6.32 / 0.85 / 0.02 / 0 / 428       M=10: 6.39 / 0.86 / 0.04 / 0 / 388        M=20: 6.74 / 0.91 / 0.06 / 0 / 280
  Global     5.54 / 0.95 / 0.04 / 0 / 239

S3 (type 2), pr = 0.02:
  Naive      M=5: 24.1 / 3.32 / 0.63 / 0 / 11.3      M=10: 25.0 / 3.58 / 0.83 / 0 / 15.5       M=20: 25.1 / 3.80 / 0.96 / 0 / 22.7
  Dtrace     M=5: 16.1 / 2.15 / 0.30 / 0 / 874       M=10: 16.2 / 2.26 / 0.36 / 0 / 892        M=20: 21.9 / 2.79 / 0.80 / 0 / 924
  Dglasso    M=5: 16.4 / 2.40 / 0.01 / 0.06 / 11.7   M=10: 17.8 / 3.60 / 0.18 / 0.09 / 25.8    M=20: 22.4 / 4.01 / 0.88 / 0 / 32.1
  Bgd-lasso  M=5: 10.6 / 1.42 / 0 / 0.05 / 394       M=10: 10.6 / 1.39 / 0 / 0.05 / 313        M=20: 11.9 / 1.58 / 0.01 / 0.05 / 223
  Bgd-scad   M=5: 9.40 / 1.18 / 0 / 0.04 / 408       M=10: 9.43 / 1.19 / 0 / 0.05 / 352        M=20: 11.1 / 1.28 / 0.03 / 0.03 / 261
  Global     10.1 / 1.49 / 0.09 / 0 / 524
Table A2. Performance of the Bgd and competing methods in the real-data analysis with different partitions.
Each cell reports Prr / Sen / Spe.

γ = 0.6:
  Naive      M=5: 0.13 / 0.89 / 0.86   M=10: 0.14 / 0.89 / 0.84   M=20: 0.17 / 0.87 / 0.78
  Dtrace     M=5: 0.11 / 0.88 / 0.89   M=10: 0.12 / 0.89 / 0.87   M=20: 0.15 / 0.87 / 0.84
  Dglasso    M=5: 0.15 / 0.89 / 0.81   M=10: 0.15 / 0.89 / 0.81   M=20: 0.18 / 0.86 / 0.77
  Bgd-lasso  M=5: 0.08 / 0.91 / 0.91   M=10: 0.11 / 0.89 / 0.89   M=20: 0.13 / 0.88 / 0.86
  Bgd-scad   M=5: 0.10 / 0.89 / 0.90   M=10: 0.11 / 0.88 / 0.89   M=20: 0.14 / 0.86 / 0.86
  Global     0.06 / 0.92 / 0.95

γ = 0.8:
  Naive      M=5: 0.14 / 0.87 / 0.83   M=10: 0.15 / 0.88 / 0.84   M=20: 0.17 / 0.87 / 0.78
  Dtrace     M=5: 0.11 / 0.88 / 0.89   M=10: 0.12 / 0.89 / 0.87   M=20: 0.15 / 0.87 / 0.84
  Dglasso    M=5: 0.14 / 0.88 / 0.82   M=10: 0.15 / 0.89 / 0.80   M=20: 0.17 / 0.87 / 0.78
  Bgd-lasso  M=5: 0.07 / 0.90 / 0.94   M=10: 0.09 / 0.90 / 0.91   M=20: 0.12 / 0.88 / 0.89
  Bgd-scad   M=5: 0.07 / 0.90 / 0.94   M=10: 0.08 / 0.89 / 0.95   M=20: 0.13 / 0.87 / 0.87
  Global     0.04 / 0.96 / 0.97
Figure A1. The $L_F$ norms for type 1 of cases S1–S3 with varying numbers of machines.
Figure A2. The Prr for the real data analysis with various $\tau$.

Appendix A.2. Proof of the Main Results

Proof of Theorem 1.
For $\tilde{L}(\Omega^{(t+1)})$ and a random machine m, set the initial value to $\Omega^{(t)}$, take the Taylor expansion of $l^{(m)}(\Omega)$ at $\Omega^{(t)}$, and define $\Delta\Omega^{(t)} = \Omega^{(t+1)} - \Omega^{(t)}$ and $f(\Omega^{(t)}) = \mathrm{tr}\big[\big(\nabla_{\Omega} l^{(m)}(\Omega^{(t)}) - \nabla_{\Omega} l_N(\Omega^{(t)})\big)^{\top}\Omega^{(t)}\big]$. Then there exists an $\tilde{\Omega}$ between $\Omega^{(t)}$ and $\Omega^{(t+1)}$ such that
$$\tilde{L}(\Omega^{(t+1)}) + f(\Omega^{(t)}) - l^{(m)}(\Omega^{(t)}) = \big(\mathrm{vec}(\Delta\Omega^{(t)})\big)^{\top}\mathrm{vec}\big(\nabla_{\Omega} l_N(\Omega^{(t)})\big) + \frac{1}{2}\big(\mathrm{vec}(\Delta\Omega^{(t)})\big)^{\top}\nabla^2_{\mathrm{vec}(\Omega)} l^{(m)}(\tilde{\Omega})\,\mathrm{vec}(\Delta\Omega^{(t)}) + p\big(|\Omega^{(t+1)}|, \lambda\big),$$
where
$$\nabla^2_{\mathrm{vec}(\Omega)} l^{(m)}(\tilde{\Omega}) = \mathrm{diag}(S_m, \ldots, S_m) = I_p \otimes S_m.$$
By the facts that
$$\big(\mathrm{vec}(\Delta\Omega^{(t)})\big)^{\top}\mathrm{vec}\big(\nabla_{\Omega} l^{(m)}(\Omega^{(t)})\big) = \mathrm{tr}\big(\nabla_{\Omega} l^{(m)}(\Omega^{(t)})^{\top}\Delta\Omega^{(t)}\big)$$
and
$$\frac{\nu}{2}\big(\mathrm{vec}(\Delta\Omega^{(t)})\big)^{\top}\mathrm{vec}(\Delta\Omega^{(t)}) = \frac{\nu}{2}\|\Delta\Omega^{(t)}\|_F^2,$$
if we choose $\nu$ satisfying
$$\frac{1}{2}\big(\mathrm{vec}(\Delta\Omega^{(t)})\big)^{\top}\nabla^2_{\mathrm{vec}(\Omega)} l^{(m)}(\tilde{\Omega})\,\mathrm{vec}(\Delta\Omega^{(t)}) \leq \frac{\nu}{2}\big(\mathrm{vec}(\Delta\Omega^{(t)})\big)^{\top}\mathrm{vec}(\Delta\Omega^{(t)}),$$
then we have
$$\tilde{L}(\Omega^{(t+1)}) + f(\Omega^{(t)}) \leq g(\Omega^{(t+1)} \mid \Omega^{(t)}) + f(\Omega^{(t)}).$$
Note that $\Omega^{(t+1)} = \arg\min_{\Omega} g(\Omega \mid \Omega^{(t)})$ subject to $(j, l) \in C_t$, where $j, l \in [p]$; hence
$$g(\Omega^{(t+1)} \mid \Omega^{(t)}) \leq g(\Omega^{(t)} \mid \Omega^{(t)}) = \tilde{L}(\Omega^{(t)}),$$
which means that if $\nu \geq \Lambda_{\max}(S_m)$ and we use local client m to compute (6), one obtains
$$\tilde{L}(\Omega^{(t+1)}) \leq g(\Omega^{(t+1)} \mid \Omega^{(t)}) \leq g(\Omega^{(t)} \mid \Omega^{(t)}) = \tilde{L}(\Omega^{(t)}).$$
We have completed the proof of Theorem 1. □
Lemma A1.
Let $X_1, X_2, \ldots, X_N$ be i.i.d. sub-Gaussian random vectors with $\mathrm{Var}(X_i) = \Sigma^*$ and $\|X_i\|_{\psi_2} \leq H$. Let $S_m$ be the local covariance matrix of machine m and define $S = \frac{1}{M}\sum_{m=1}^{M} S_m$. Then there exists a constant C such that, with high probability,
$$\|S - \Sigma^*\|_{\max} \leq C H^2 \sqrt{\frac{\log p}{N}}.$$
The Lemma can be found in [26].
Proof of Theorem 2.
We prove Theorem 2 for the Lasso penalty; the proofs for the other penalties follow with simple modifications. Consider
$$\check{\Omega} = \arg\min_{\Omega}\ l_N(\Omega) + \lambda\|\Omega\|_{1,\mathrm{off}}.$$
Define $\{\tilde{\Omega}^{(t)}\}$ as the sequence generated by the following optimization problem without the block constraint $C_t$:
$$\arg\min_{\Omega}\ h(\Omega \mid \tilde{\Omega}^{(t)}) = l_N(\tilde{\Omega}^{(t)}) + \mathrm{tr}\big[\nabla_{\Omega} l_N(\tilde{\Omega}^{(t)})^{\top}(\Omega - \tilde{\Omega}^{(t)})\big] + \frac{\nu}{2}\|\Omega - \tilde{\Omega}^{(t)}\|_F^2 + p(|\Omega|, \lambda).$$
Suppose the final solution is $\tilde{\Omega}$. Then, by Theorem 3 of Beck and Teboulle [27], we have
$$l_N(\tilde{\Omega}) + p(|\tilde{\Omega}|, \lambda) \leq l_N(\Omega^*) + p(|\Omega^*|, \lambda) + \frac{1}{2t}\Lambda_{\max}(S)\|\tilde{\Omega}^{(0)} - \Omega^*\|_F^2.$$
By Taylor expansion, we have
$$l_N(\tilde{\Omega}) - l_N(\Omega^*) = \mathrm{tr}\big[(\tilde{\Omega} - \Omega^*)^{\top}\nabla_{\Omega} l_N(\Omega^*)\big] + \frac{1}{2}\mathrm{tr}\big[(\tilde{\Omega} - \Omega^*)^{\top} S (\tilde{\Omega} - \Omega^*)\big].$$
Note that, using Lemma A1,
$$\big\|\nabla_{\Omega} l_N(\Omega^*)\big\|_{\max} = \|S\Omega^* - I\|_{\max} \leq \|\Sigma^*\Omega^* - I\|_{\max} + \|(S - \Sigma^*)\Omega^*\|_{\max} \leq C Q\sqrt{\frac{\log p}{N}} \leq \lambda/2.$$
When the number of iterations satisfies $t \geq \Lambda_{\max}(S)\|\tilde{\Omega}^{(0)} - \Omega^*\|_F^2 \big/ \big(4\lambda\|\mathrm{diag}(\Omega^* - \tilde{\Omega})\|_1\big)$, we have
$$\frac{1}{t}\Lambda_{\max}(S)\|\tilde{\Omega}^{(0)} - \Omega^*\|_F^2 \leq 4\lambda\|\mathrm{diag}(\Omega^* - \tilde{\Omega})\|_1.$$
Combining the above inequalities, and setting $p(|\Omega|, \lambda) = \lambda\|\Omega\|_{1,\mathrm{off}}$, we have
$$\frac{1}{2}\mathrm{tr}\big[(\tilde{\Omega} - \Omega^*)^{\top} S (\tilde{\Omega} - \Omega^*)\big] \leq \lambda\|\tilde{\Omega} - \Omega^*\|_1 + 2\lambda\|\Omega^*\|_{1,\mathrm{off}} - 2\lambda\|\tilde{\Omega}\|_{1,\mathrm{off}} + 2\lambda\|\mathrm{diag}(\Omega^* - \tilde{\Omega})\|_1,$$
which further implies that
$$\frac{1}{2}\mathrm{tr}\big[(\tilde{\Omega} - \Omega^*)^{\top} S (\tilde{\Omega} - \Omega^*)\big] + \lambda\|\tilde{\Omega} - \Omega^*\|_1 \leq 2\lambda\|\tilde{\Omega} - \Omega^*\|_1 + 2\lambda\|\Omega^*\|_{1,\mathrm{off}} - 2\lambda\|\tilde{\Omega}\|_{1,\mathrm{off}} + 2\lambda\|\mathrm{diag}(\Omega^* - \tilde{\Omega})\|_1 \leq 4\lambda\sum_{(j_1, j_2) \in s^*}\big|(\tilde{\Omega} - \Omega^*)_{j_1 j_2}\big|.$$
By the fact that $\mathrm{tr}\big[(\tilde{\Omega} - \Omega^*)^{\top} S (\tilde{\Omega} - \Omega^*)\big] \geq 0$, we have
$$\|\tilde{\Omega} - \Omega^*\|_1 \leq 4\big\|(\tilde{\Omega} - \Omega^*)_{s^*}\big\|_1.$$
Then, condition (A4) is satisfied, and we have
$$C_3\|\tilde{\Omega} - \Omega^*\|_F^2 \leq \frac{1}{2}\mathrm{tr}\big[(\tilde{\Omega} - \Omega^*)^{\top} S (\tilde{\Omega} - \Omega^*)\big] \leq \lambda\big\|(\tilde{\Omega} - \Omega^*)_{s^*}\big\|_1 \leq \lambda\sqrt{p\,s_p}\,\|\tilde{\Omega} - \Omega^*\|_F.$$
Then, we have
$$\frac{1}{p}\|\tilde{\Omega} - \Omega^*\|_F^2 \leq C s_p \lambda^2 = O_p\Big(\frac{Q^2 s_p \log p}{N}\Big).$$
Now, we turn to the surrogate loss function $\tilde{L}(\Omega)$. Given the initial value $\Omega^{(t)}$ at the t-th iteration, $h(\Omega \mid \Omega^{(t)})$ has the same gradient and solution path as $\tilde{L}(\Omega)$ without the block constraints when using the gradient descent method. Note that for $\Omega^{(t)}$ we have
$$\Big\|\frac{1}{M}\sum_{m=1}^{M}\big(S_m\Omega^{(t)} - I\big)\Big\|_{\max} \leq \rho \leq C_2\sqrt{\frac{M\log p}{n}}.$$
Using Theorem 4 of [5] and the above statements, we obtain the same conclusion as in (A3), that is,
$$\frac{1}{p}\|\hat{\Omega} - \Omega^*\|_F^2 = O_p\Big(\frac{Q^2 s_p \log p}{N}\Big).$$
This completes the proof of Theorem 2. □

References

  1. Hao, B.; Sun, W.W.; Liu, Y.; Cheng, G. Simultaneous clustering and estimation of heterogeneous graphical models. J. Mach. Learn. Res. 2017, 18, 7981–8038.
  2. Ren, M.; Zhang, S.; Zhang, Q.; Ma, S. Gaussian graphical model based heterogeneity analysis via penalized fusion. Biometrics 2022, 78, 524–535.
  3. Jiang, B.; Wang, X.; Leng, C. Quda: A direct approach for sparse quadratic discriminant analysis. J. Mach. Learn. Res. 2018, 19, 1098–1134.
  4. Friedman, J.; Hastie, T.; Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2008, 9, 432–441.
  5. Cai, T.T.; Liu, W.D.; Luo, X. A constrained l1 minimization approach to sparse precision matrix estimation. J. Am. Stat. Assoc. 2011, 104, 594–607.
  6. Liu, W.; Luo, X. Fast and adaptive sparse precision matrix estimation in high dimensions. J. Multivar. Anal. 2015, 135, 153–162.
  7. Zhang, T.; Zou, H. Sparse precision matrix estimation via lasso penalized D-trace loss. Biometrika 2014, 101, 103–120.
  8. Cai, T.T.; Liu, W.D.; Zhou, H.H. Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Ann. Stat. 2016, 44, 455–488.
  9. Fan, J.Q.; Yuan, L.; Han, L. An overview of the estimation of large covariance and precision matrices. Econom. J. 2016, 19, C1–C32.
  10. Lee, J.D.; Liu, Q.; Sun, Y.; Taylor, J.E. Communication-efficient sparse regression. J. Mach. Learn. Res. 2017, 18, 115–144.
  11. Jordan, M.; Lee, J.; Yang, Y. Communication-efficient distributed statistical inference. J. Am. Stat. Assoc. 2018.
  12. Fan, J.Q.; Guo, Y.Y.; Wang, K.Z. Communication-efficient accurate statistical estimation. J. Am. Stat. Assoc. 2023, 118, 1000–1010.
  13. Gao, Y.; Liu, W.D.; Wang, H.S.; Wang, X.Z.; Yan, Y.B.; Zhang, R.Q. A review of distributed statistical inference. Stat. Theory Relat. Fields 2022, 6, 89–99.
  14. Li, X.X.; Xu, C. Feature screening with conditional rank utility for big-data classification. J. Am. Stat. Assoc. 2023, 1–22.
  15. Arroyo, J.; Hou, E. Efficient distributed estimation of inverse covariance matrices. In Proceedings of the 2016 IEEE Statistical Signal Processing Workshop (SSP), Mallorca, Spain, 26–29 June 2016; pp. 1–5.
  16. Wang, G.P.; Cui, H.J. Efficient distributed estimation of high-dimensional sparse precision matrix for transelliptical graphical models. Acta Math. Sin. Engl. Ser. 2021, 37, 689–706.
  17. Dong, W.; Li, X.X.; Xu, C.; Tang, N.S. Hybrid hard-soft screening for high-dimensional latent class analysis. Stat. Sin. 2023, 33, 1319–1341.
  18. Zhang, C. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
  19. Ma, S.; Huang, J. A concave pairwise fusion approach to subgroup analysis. J. Am. Stat. Assoc. 2017, 112, 410–423.
  20. Fan, J.Q.; Li, R.Z. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  21. Cai, T.T.; Li, H.Z.; Liu, W.D.; Xie, J. Joint estimation of multiple high-dimensional precision matrices. Stat. Sin. 2016, 26, 445–464.
  22. Ravikumar, P.; Wainwright, M.J.; Raskutti, G.; Yu, B. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electron. J. Stat. 2011, 5, 935–980.
  23. Wang, C.; Jiang, B. An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss. Comput. Stat. Data Anal. 2020, 142, 106812.
  24. Xie, J.H.; Lin, Y.Y.; Yan, X.D.; Tang, N.S. Category-adaptive variable screening for ultra-high dimensional heterogeneous categorical data. J. Am. Stat. Assoc. 2019, 747–760.
  25. Uçar, M.K.; Nour, M.; Sindi, H.; Polat, K. The effect of training and testing process on machine learning in biomedical datasets. Math. Probl. Eng. 2020, 2020, 2836236.
  26. Xu, P.; Tian, L.; Gu, Q.Q. Communication-efficient distributed estimation and inference for transelliptical graphical models. arXiv 2016, arXiv:1612.09297.
  27. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
