Article

An Adaptive Low Computational Cost Alternating Direction Method of Multiplier for RELM Large-Scale Distributed Optimization

1 College of Information Science and Technology, Zhejiang Shuren University, Hangzhou 310015, China
2 State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China
3 School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(1), 43; https://doi.org/10.3390/math12010043
Submission received: 23 October 2023 / Revised: 15 December 2023 / Accepted: 18 December 2023 / Published: 22 December 2023
(This article belongs to the Special Issue AI Algorithm Design and Application)

Abstract

In a class of large-scale distributed optimization problems, computing the RELM solution through the Moore–Penrose inverse is prohibitively expensive, which hinders the formulation of a computationally efficient optimization model. To improve the model's convergence performance, this paper proposes a low-computational-cost Alternating Direction Method of Multipliers (ADMM), in which the original ADMM update is solved inexactly using approximate curvature information. Based on quasi-Newton techniques, this ADMM approach solves convex optimization problems with reasonable accuracy and computational effort. By introducing the algorithm into the RELM model, the model-fitting problem can be decomposed into a set of subproblems that can be executed in parallel to achieve efficient classification performance. To avoid storing an expensive Hessian for large problems, a limited-memory BFGS scheme is adopted for computational efficiency, and the optimal step size is obtained through a Wolfe line search strategy. To demonstrate the superiority of our methods, numerical experiments are conducted on eight real-world datasets. Results on problems arising in machine learning suggest that the proposed method is competitive with similar methods in terms of both computational efficiency and accuracy.

1. Introduction

The extreme learning machine (ELM) [1] has received much attention in recent years, owing to its fast training speed and good generalization. Note, however, that traditional extreme learning machines suffer from memory limitations on large-scale datasets. Especially in the era of big data, datasets are usually extremely large and the data are often high-dimensional [2,3,4,5,6]; the increasing complexity of the datasets enlarges the dimension of the hidden-layer output matrix, which leads to a huge memory footprint and a heavy computational load in matrix-inversion-based (MI-based) solutions.
To address these limitations, some enhanced ELMs with parallel or distributed structures have been implemented to meet the challenge of large-scale datasets, as shown in Table 1. For example, ELM based on the MapReduce framework can effectively calculate the matrix multiplication in parallel [7,8], and has an efficient learning ability in massive rapidly updated datasets [9]. However, the parallel ELM based on MapReduce creates a large amount of extra overhead and degrades the learning speed. The algorithm based on the Spark parallel framework is then proposed to speed up the whole computing process of ELM for big data [10].
The methods discussed above focus on computing MI-based solutions using parallel and distributed hardware structures and programming models. The alternating direction method of multipliers (ADMM), which avoids the time-consuming MI operation, is also an effective method for distributed optimization [3,11,12]. Under the ADMM framework, the model-fitting problem can be decomposed into a set of subproblems that can be executed in parallel, achieving efficient classification performance and meeting the needs of large-scale data processing in real environments. To achieve optimal performance without user oversight, an adaptive method that automatically tunes the key algorithm parameters has been applied to improve the relaxed ADMM [13]. Appropriate selection of the penalty parameter is crucial to obtaining good performance from ADMM; since analytic results for its optimal selection are very limited, an adaptive penalty strategy based on residual balancing has been proposed [14]. Because a convex model-fitting problem can be split into a set of concurrently executable subproblems, in a big data environment the regularized least-squares problem can be split across the coefficients and combined with a relaxation technique to achieve good convergence [15]. Furthermore, elastic-net theory has been employed to simultaneously improve the sparsity and stability of the model, yielding an accelerated ADMM algorithm [16].
Table 1. Review of various approaches of enhanced ELMs in the literature.

| Framework | Utilized Techniques | Metrics | Datasets | Main Characteristics |
|---|---|---|---|---|
| Parallel or distributed learning | MapReduce [7] | Running time, Speed up | Synthetic datasets | Parallel computing ability, Efficient learning of large-scale data |
| | MapReduce [9] | Running time, Update ratio | Synthetic datasets | Efficient learning in massive rapidly updated datasets |
| | MapReduce [8] | Speed up, Scaleup, Sizeup | Real datasets | Parallelism, Low runtime memory, Good scalability |
| | Spark [10] | Running time, Speed up, Accuracy | Synthetic datasets | Fault tolerance, Persist/cache strategies |
| ADMM | Residual normalization [14] | Iteration number | Synthetic datasets | Robust in sparse coding |
| | Adaptive penalty, Relaxation technique [13] | Iteration number | Real datasets | Without user oversight or parameter tuning |
| | Maximally splitting, Relaxation technique [15] | Convergence ratio, Acceleration ratios | Real datasets | Fast convergence, Less computation, High parallelism |
| | Inertial technique, Bregman distance [17] | Training time, Constraint errors | Synthetic datasets | Global convergence, High acceleration |
For real-time data classification posed as a convex optimization problem, the primal problem can be decomposed into several subproblems by leveraging ADMM [18]. The global optimal solution to the original problem can be obtained by processing the subproblems in parallel. Fast convergence and parallelism make ADMM suitable for solving large-scale distributed optimization problems. However, a subproblem must be optimized at each iteration, which imposes a heavy computational burden [19]. Numerical experience has shown that the effective solution of the subproblems is critical to the performance of ADMM [20].
Several alternatives are available for unconstrained optimization, such as Newton-type methods [21], Chebyshev-like methods [22], the quasi-Newton method (QNM) [23], and others [24]. These methods require less computational effort to calculate the search direction and thus converge rapidly. The RLS problem in RELM mainly involves the computation of a Hessian matrix and the gradient of the cost function. The second-order partial derivatives of the Hessian can be avoided by using the displacement and first-derivative information of two adjacent iterates [25,26]; combined with line search techniques, this yields attractive global convergence properties.
However, in machine learning and image processing, computing the Hessian matrix at each iteration is not a trivial task [13], and the cost of storing and working with Hessian approximations can be excessive for large matrices. To reduce this storage, variants of the quasi-Newton approach, such as limited-memory BFGS (LBFGS) and stochastic QNM [27,28,29,30,31], have been developed to store Hessian approximations compactly. Azam et al. [32] analyzed the convergence of L-BFGS on convex optimization problems and further demonstrated its practical value for large-scale problems. Aryan et al. [28] proposed a stochastic QNM (SQN), which uses second-order information to accelerate stochastic convergence and modifies the BFGS update formula so that the eigenvalues of the Hessian approximation remain bounded and extreme values of the function can be attained. Chen et al. [30] proposed a stochastic damped L-BFGS that introduces damping parameters to preserve positive definiteness and avoid ill-conditioned updates of the Hessian. The computation of Hessian matrices often involves a step-size parameter, which can be determined by a line search method. Backtracking line search is commonly used to guarantee convergence in case the linear model assumptions break down and an unstable step size is produced; however, it is time-consuming and may lose its advantage in other types of ELMs.
In this paper, we study a low-cost computational scheme for ADMM and jointly devise an adaptive step-size selection. The stochastic damped optimal L-BFGS (R-SDL-BFGS) is therefore derived, which improves the computational efficiency of ADMM. Our contributions can be summarized as follows:
(1) Low-cost computational scheme: curvature information from recent iterations is used to reduce the computational cost;
(2) Damped BFGS correction scheme: damping technology is introduced into BFGS to compensate for the possible loss of positive definiteness of the Hessian approximation and to ensure the positive definiteness of the BFGS matrix in non-convex optimization;
(3) Step-size selection scheme: a non-monotonic Wolfe-type strategy is applied to the memory gradient method, combined with BB spectral gradient descent, to obtain the optimal step-size factor.
Finally, we compare the proposed method to other ADMM variants by experiments of real-world classification and image processing problems.

2. Preliminaries

2.1. Convex Optimization

There are great advantages to recognizing or formulating a problem as a convex optimization problem. The most basic advantage is that the problem can then be solved, very reliably and efficiently, using interior-point methods or other special methods for convex optimization. These solution methods are reliable enough to be embedded in a machine learning model, or even a real-time reactive or automatic control system. An example would be the single-node perceptron neural network, in which the parameter optimization can be formulated as a convex optimization problem.
The idea behind convex optimization can be extended to a more general form, in which the convex minimization problem is expressed as an unconstrained optimization problem. There are also theoretical and conceptual advantages to formulating a problem as an unconstrained one. For the purpose of achieving good numerical performance, it is helpful to use convex functions to find bounds on the optimal value; thus, in most cases where combinatorial or global optimization is needed, it is highly advantageous to exploit convexity.
In particular, the unconstrained optimization problem can be summarized as follows.
$$\min f(x), \qquad x \in \mathbb{R}^n$$
where $f(x)$ denotes the objective function with variable $x \in \mathbb{R}^n$.
There is in general no analytical formula for the solution of a convex optimization problem, but there are very effective methods for solving it. The unconstrained formulation is the basis for BFGS [31], a powerful method for convex optimization. For a convex optimization problem, BFGS resorts to an iterative algorithm to achieve accurate solutions. A general iterative numerical method has the form:
$$x_{k+1} = x_k + \alpha_k d_k$$
$$d_k = -B_k^{-1} g_k$$
where $d_k$ is the search direction, $g_k$ denotes the gradient of $f$ at $x_k$, and $\alpha_k$ is the step size. Here, $B_k$ denotes the approximate Hessian matrix. The BFGS algorithm approximates the inverse of the Hessian by constructing a symmetric, positive definite initial matrix $B_0$ and then iteratively updating $B_k$.
The essence of BFGS is to approximate the Hessian using finite differences computed from successive iterates and gradients, so the gradient difference and the iterate difference are defined as follows:
$$y_k = g_{k+1} - g_k$$
$$s_k = x_{k+1} - x_k$$
Because the Hessian changes from iteration to iteration, it is not trivial to compute an appropriate Hessian at each step. To address this problem, this paper approximates the Hessian so as to avoid calculating its inverse at every iteration.
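For concreteness, the following Python sketch shows how the quantities above fit together in a single quasi-Newton iteration. It is only an illustration of the update rules just given; the callable `grad_f` and the fixed step size `alpha` are placeholders, not anything specified in the paper.

```python
import numpy as np

def quasi_newton_step(x, B_inv, grad_f, alpha):
    """One quasi-Newton iteration: d_k = -B_k^{-1} g_k, x_{k+1} = x_k + alpha_k d_k."""
    g = grad_f(x)
    d = -B_inv @ g            # search direction from the inverse Hessian approximation
    return x + alpha * d      # step along the search direction

def curvature_pair(x, x_next, grad_f):
    """Curvature information used by BFGS: s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k."""
    s = x_next - x
    y = grad_f(x_next) - grad_f(x)
    return s, y
```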

2.2. Limited-Memory BFGS Algorithm

Different correction formulas for the quasi-Newton matrix $B_{k+1}$ yield different QNMs. The BFGS correction maintains the symmetric positive definiteness of the iterative matrix $B_k$ and has global convergence for convex minimization problems; it is therefore commonly used to compute $B_{k+1}$. The BFGS correction formula is as follows:
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}$$
The line search method has proved useful in practice, since the selection of the step size $\alpha_k$ determines where along the line $\{x_k + \alpha d_k \mid \alpha > 0\}$ the next iterate will lie. An exact line search is used only when the cost of the one-variable minimization is low; it requires evaluating the objective and one or more of its derivatives at multiple trial values and usually gives only a very small improvement in efficiency. For these reasons, most practical implementations use an inexact line search, which remains feasible for the unconstrained problem. Common inexact line search criteria are given by [23]:
(1) Armijo criterion:
$$f(x_k + \alpha_k d_k) \le f(x_k) + c_1 \alpha_k g_k^T d_k$$
(2) Wolfe criterion:
$$g(x_k + \alpha_k d_k)^T d_k \ge c_2\, g_k^T d_k$$
where $0 < c_1 < c_2 < 1$.
The BFGS algorithm has difficulty with large-scale unconstrained optimization problems because of the calculation and storage of the matrix in each iteration; that is, in each iteration BFGS approximates the inverse of the Hessian with gradient information, and the computational cost is expensive. In this section, the LBFGS method [33] is introduced to reduce the computational cost. It stores the matrix in vector form, which reduces the effort for transmitting or storing data (assuming the vectors are stored using an appropriate data structure). To simplify the calculation of the inverse matrix, by applying the Sherman–Morrison formula, the corresponding BFGS correction formula is transformed into:
$$B_{k+1}^{-1} = \left(I - \frac{s_k y_k^T}{y_k^T s_k}\right) B_k^{-1} \left(I - \frac{y_k s_k^T}{y_k^T s_k}\right) + \frac{s_k s_k^T}{y_k^T s_k}$$
$$y_k = \frac{1}{M_k}\sum_{j=1}^{M_k}\big(\nabla f(x_{j+1}) - \nabla f(x_j)\big)$$
where $M_k$ is the storage length.
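The way limited-memory methods apply $B_k^{-1}$ without forming it explicitly is the standard two-loop recursion over the most recently stored $(s_k, y_k)$ pairs. The sketch below is a generic illustration of that idea rather than code from the paper; `s_hist` and `y_hist` are assumed to hold the stored pairs, oldest first.

```python
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    """Two-loop recursion: apply the implicit inverse Hessian approximation to g
    using only the stored (s, y) pairs, and return the search direction -H_k g."""
    if not s_hist:
        return -g                                   # no curvature yet: plain gradient step
    q = g.astype(float)
    rhos = [1.0 / (y @ s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    # backward pass: newest pair first
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # initial scaling H_0 = (s^T y / y^T y) I, taken from the newest pair
    s_new, y_new = s_hist[-1], y_hist[-1]
    r = (s_new @ y_new) / (y_new @ y_new) * q
    # forward pass: oldest pair first
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return -r
```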

3. Adaptive Stochastic Damping Optimization for Limited Memory

The positive definite matrix plays a significant role in convex optimization, and BFGS uses the positive definite matrix $B_k$ to approximate the Hessian. Note that during the iterations $B_k$ may become singular, which significantly affects the convergence of the algorithm. Moreover, BFGS requires the optimization problem to be convex; otherwise, $B_k$ may lose positive definiteness and the computed direction may no longer be a descent direction. Therefore, we need to deal with non-convexity and ill-conditioned behavior to guarantee the positive definiteness of $B_k$.

3.1. Proposed Damped SL-BFGS Method

In the optimization process of BFGS, the Hessian approximation B k may become a non-positive definite matrix if the property s k T y k > 0 is not satisfied. In this case, it may not be possible to ensure that the algorithm is along the best search direction. Considering that the convergence of BFGS relies heavily on the positive definite matrix, damping technology [23] is used to correct the BFGS update formula. This makes up the deficiency of the solution to the positive definiteness of the Hessian approximation, which therefore maintains the positive definiteness of the B k .
The optimization of LBFGS reduces the computational cost from the perspective of transmitting or storing data. Nevertheless, such modifications do affect the accuracy of the Hessian approximation. Stochastic optimization methods, as a popular optimization tool, can effectively obtain good analytical solutions. By applying the stochastic gradient information to approximate the curvature of the objective function in the convex optimization, the optimal analytical solution can be obtained, which also speeds up the convergence.
Because the noise of the stochastic gradient may be infinitely amplified in the curvature estimation, the Hessian approximation matrix will be negatively affected, which reduces the convergence speed. We shall adjust the gradient estimation y k and descent distance estimation s k by different batch sizes, thereby decoupling the computations of stochastic gradient and curvature estimation. By extending the random damping technique to the LBFGS, y k and s k can be represented by y ^ k and s ^ k , respectively.
$$\hat{s}_k = \hat{x}_{k+1} - \hat{x}_k, \qquad \hat{x}_k = \begin{cases} \dfrac{1}{b}\displaystyle\sum_{j=(k-1)b}^{kb-1} x_j, & k \ge 1 \\[6pt] x_0, & k = 0 \end{cases}$$
$$\hat{y}_k = \omega_k y_k + (1 - \omega_k) B_k \hat{s}_k$$
$$\omega_k = \begin{cases} \dfrac{0.8\, \hat{s}_k^T B_k \hat{s}_k}{\hat{s}_k^T B_k \hat{s}_k - \hat{s}_k^T y_k}, & \hat{s}_k^T y_k < 0.2\, \hat{s}_k^T B_k \hat{s}_k \\[6pt] 1, & \text{otherwise} \end{cases}$$
where $b$ is the interval length (also called the batch size), and the scalar $\omega_k$ denotes the damping factor.
It is possible to design a more efficient model that computes the stochastic gradient and target variable with a batch update of batch size b in each iteration.
$$\hat{x}_{k+1} = \begin{cases} \hat{x}_k - \gamma_k B_k^{-1} g_k, & \operatorname{mod}(k, d) = 0 \\ \hat{x}_k - \gamma_k g_k, & \text{otherwise} \end{cases}$$
$$g_k = \nabla f(\hat{x}_k), \qquad \gamma_k = \gamma_0 \frac{\tau_0}{\tau_0 + k}$$
where $\gamma_k$ represents the batch step size, and $\gamma_0$, $\tau_0$ are constants.
Although knowledge of gradient information allows BFGS to gradually approximate the inverse of the Hessian, the search direction also plays a crucial role in the global convergence. We should ensure that the algorithm makes reasonable progress along the given search direction and focus on finding a suitable step length along this direction.
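As a minimal sketch of the Powell-style damping rule reconstructed above, the helper below replaces $y_k$ by $\hat{y}_k$ whenever the curvature condition threatens to fail; `B` is the current Hessian approximation, and the 0.2/0.8 thresholds follow the formula in the text.

```python
import numpy as np

def damped_curvature(s_hat, y, B):
    """Damped curvature pair: y_hat = omega*y + (1-omega)*B*s_hat, chosen so that
    s_hat^T y_hat stays at least 0.2 * s_hat^T B s_hat (hence strictly positive)."""
    Bs = B @ s_hat
    sBs = s_hat @ Bs
    sy = s_hat @ y
    omega = 0.8 * sBs / (sBs - sy) if sy < 0.2 * sBs else 1.0
    y_hat = omega * y + (1.0 - omega) * Bs
    return y_hat, omega
```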

3.2. Robust Optimization Approach for Limited Memory

As a common criterion for searching along the descent direction, the key to the success of the inexact line search is that each step decreases the objective monotonically. In many cases, a non-monotonic search technique [34] can be leveraged to relax the convergence conditions while overcoming the oscillation phenomenon, although this method easily becomes trapped in local extrema when the initial value is taken near a local valley of the function.
To avoid the above problems, a non-monotonic Wolfe search strategy can be devised. This method combines current and past iterate information to find global solutions. By introducing this method into the convex optimization problem, the iterate is updated according to
$$\bar{x}_k = \begin{cases} x_k, & k = 1 \\ (1 - \xi_k)\, x_k + \xi_k\, x_{k-1}, & k \ge 2 \end{cases}$$
$$\xi_k = \frac{\zeta \|x_k\|^2}{\|x_k\|^2 + x_k^T x_{k-1}}, \qquad \zeta \in (0, 1)$$

4. The Adaptive Stochastic Optimization to RELM Design

Consider a large-scale optimization problem, in which the training data has high dimension and large volume. The MI-based RELM method will require more computations. As a powerful tool for solving large-scale optimization problems, ADMM can greatly improve the speed of convergence by parallel computing. Using ADMM, a large-scale optimization problem can be split into a set of concurrently executable subproblems, each with just a subset of model coefficients. As for ADMM, efficiency and robustness of the numerical computation greatly depend on the effective solution of the subproblem. The complexity of subproblems therefore limits the convergence of the algorithm.

4.1. AADMM with Low Computing Cost

ADMM has advantages in dealing with convex optimization problems owing to its good convergence and parallel structure [35]. In practice, optimal trade-off solutions are explored for convex optimization problems based on their separable structure. Define $A = [a_1, a_2, \ldots, a_P] \in \mathbb{R}^{N \times P}$, $a_i \in \mathbb{R}^N$, as the data matrix; $x \in \mathbb{R}^P$ indicates the model coefficients, $b \in \mathbb{R}^N$ denotes the target output, and $f(\cdot)$ and $h(\cdot)$ are the convex loss function and the convex regularization function, respectively. $x$ can be partitioned as $[x_1, x_2, \ldots, x_L]^T$ with $x_l \in \mathbb{R}^{p_l}$ and $\sum_{l=1}^{L} p_l = P$. Similarly, partition $A$ as $[A_1, A_2, \ldots, A_L]$ with $A_l \in \mathbb{R}^{N \times p_l}$. According to the above definitions, $h(x) = \sum_{l=1}^{L} h_l(x_l)$. Then, a convex optimization problem can be given by
$$\min \; f\!\left(\sum_{l=1}^{L} z_l - b\right) + \sum_{l=1}^{L} h_l(x_l)$$
$$\text{s.t.} \quad A_l x_l - z_l = 0$$
where $Z = [z_1, z_2, \ldots, z_L]$, $z_l \in \mathbb{R}^N$.
Its augmented Lagrangian function is given by
$$L_\rho(x, Z, \Lambda) = f\!\left(\sum_{l=1}^{L} z_l - b\right) + \sum_{l=1}^{L} h_l(x_l) + \sum_{l=1}^{L} \lambda_l^T (A_l x_l - z_l) + \frac{\rho}{2} \sum_{l=1}^{L} \| A_l x_l - z_l \|_2^2$$
where $\rho > 0$ represents the penalty factor, and $\Lambda = [\lambda_1, \lambda_2, \ldots, \lambda_L] \in \mathbb{R}^{N \times L}$ denotes the dual variable.
The key factors affecting the performance of ADMM are the division of subproblems and the selection of hyperparameters. Built upon the maximally split technique, the problem is reduced to basic vector and matrix operations, ensuring that each partition subproblem contains only one scalar component so as to take advantage of specific computer architectures. Consider the L-partition ADMM with L = P; the optimization problem (18) is then maximally split into P subproblems. In this case, the matrix $A_l$ reduces to a vector $a_l$, and the vector $x_l$ reduces to a scalar $x_l$. To also account for past iterates when computing the next ones, a relaxation technique is used in which $z_l$ is replaced by $(1-\alpha) a_l x_l^{k+1} + \alpha z_l$. The corresponding iterative solution process then becomes
$$\begin{aligned}
x_l^{k+1} &= \arg\min_{x_l} \; h_l(x_l) + \frac{\rho}{2}\left\| a_l x_l - z_l^k + \frac{\lambda_l^k}{\alpha\rho} \right\|_2^2 \\
Z^{k+1} &= \arg\min_{Z} \; f\!\left(\sum_{l=1}^{L} z_l - b\right) + \frac{\rho\alpha^2}{2}\sum_{l=1}^{L}\left\| a_l x_l^{k+1} - z_l + \frac{\lambda_l^k}{\alpha\rho} \right\|_2^2 \\
\lambda_l^{k+1} &= \lambda_l^k + \alpha\rho\left( a_l x_l^{k+1} - z_l^{k+1} \right)
\end{aligned}$$
where $\alpha > 0$ is the relaxation parameter.
In view of the fact that the number of iterations required for ADMM convergence depends on the penalty factor, we automatically adjust this key parameter in each iteration to improve the convergence speed. This can be achieved by balancing the primal residual and the dual residual, which measure the convergence of the algorithm:
$$r^k = A x^k - z^k, \qquad d^k = \rho A^T\!\left( z^k - z^{k-1} \right)$$
where $r^k$ and $d^k$ are the primal residual and the dual residual, respectively. According to [36], the iteration is generally stopped when
$$\| r^k \|_2 \le \varepsilon_{tol} \max\!\left\{ \| A x^k \|_2,\ \| z^k \|_2 \right\}, \qquad \| d^k \|_2 \le \varepsilon_{tol}\, \| A^T \lambda^k \|_2$$
where ε t o l denotes the termination tolerance. It can be observed that the higher penalty term results in smaller primal residuals but larger dual ones, while a smaller penalty term yields larger primal and smaller dual residuals. In order to maintain small residuals at convergence, the penalty parameter ρ should be adaptively tuned to balance the residuals.
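The residuals and the relative stopping test above translate directly into code. The sketch below is an illustrative helper (the function and variable names are ours, not the paper's); it also returns the two residual norms so a caller could use them for residual balancing.

```python
import numpy as np

def admm_residuals(A, x, z, z_prev, rho):
    """Primal residual r^k = A x^k - z^k and dual residual d^k = rho A^T (z^k - z^{k-1})."""
    r = A @ x - z
    d = rho * (A.T @ (z - z_prev))
    return np.linalg.norm(r), np.linalg.norm(d)

def admm_stopped(A, x, z, z_prev, lam, rho, eps_tol):
    """Relative termination test: both residuals fall below eps_tol times their natural scales."""
    r_norm, d_norm = admm_residuals(A, x, z, z_prev, rho)
    primal_ok = r_norm <= eps_tol * max(np.linalg.norm(A @ x), np.linalg.norm(z))
    dual_ok = d_norm <= eps_tol * np.linalg.norm(A.T @ lam)
    return primal_ok and dual_ok
```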
Following the Barzilai–Borwein (BB) and Douglas–Rachford splitting (DRS) analysis [37], the optimal $\rho$ that minimizes the residual is given by
$$\rho_k = \arg\min_{\rho} \frac{1 + \theta_k \sigma_k \rho^2}{(\theta_k + \sigma_k)\,\rho} = \frac{1}{\sqrt{\theta_k \sigma_k}}$$
where $\theta_k > 0$ and $\sigma_k > 0$ are the spectral gradient descent step sizes for the dual variable of problem (18) at iteration $k$. Under this choice of $\rho_k$, the optimal relaxation parameter $\alpha_k$ is
$$\alpha_k = 1 + \frac{1 + \theta_k \sigma_k \rho_k^2}{(\theta_k + \sigma_k)\,\rho_k}$$
The model parameters $\theta_k$ and $\sigma_k$ can be estimated from the results at iteration $k$ and an older iteration $k_0 < k$. Letting SD stand for steepest descent and MG for minimum gradient,
$$\hat{\theta}_k = \begin{cases} \hat{\theta}_k^{MG}, & 2\hat{\theta}_k^{MG} > \hat{\theta}_k^{SD} \\ \hat{\theta}_k^{SD} - \hat{\theta}_k^{MG}/2, & \text{otherwise} \end{cases}$$
where $\hat{\theta} = 1/\theta$. $\sigma_k$ can be obtained in a similar way; for a detailed account of these formulas, see [13].
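Putting the spectral rules together, a hedged sketch of the adaptive parameter selection is given below. It assumes the safeguarded curvature estimates $\hat{\theta}_k$, $\hat{\sigma}_k$ (reciprocals of $\theta_k$, $\sigma_k$) have already been computed from iterations $k$ and $k_0$ as in [13]; the function names are illustrative only.

```python
import numpy as np

def safeguarded_estimate(theta_hat_sd, theta_hat_mg):
    """Hybrid SD/MG rule for the spectral curvature estimate (theta_hat = 1/theta)."""
    if 2.0 * theta_hat_mg > theta_hat_sd:
        return theta_hat_mg
    return theta_hat_sd - theta_hat_mg / 2.0

def adaptive_penalty_and_relaxation(theta_k, sigma_k):
    """Penalty rho_k = 1/sqrt(theta_k * sigma_k) and the corresponding relaxation alpha_k."""
    rho_k = 1.0 / np.sqrt(theta_k * sigma_k)
    alpha_k = 1.0 + (1.0 + theta_k * sigma_k * rho_k**2) / ((theta_k + sigma_k) * rho_k)
    return rho_k, alpha_k   # alpha_k lies in (1, 2], the usual over-relaxation range
```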
The specific steps of the proposed LCC-AADMM are shown in Algorithm 1.
Algorithm 1: LCC-AADMM
Data: optimization variable $x_k$, batch step size $\gamma_k$, interval length $b$, initial Hessian approximation matrix $B_{k+1}^{-1}$, storage length $M_k$, number of iterations $k$, constants $\gamma_0$, $\tau_0$
Result: the optimal variable $x$
1. Initialization;
2. Randomly select samples with batch size $b$ and calculate the stochastic gradient $g_k$ of the objective function;
3. Update the target variable $x_k$ by Equation (14);
4. Calculate the distance estimation $s_k$ and gradient estimation $y_k$ of the objective function by Formulas (11) and (12);
5. Calculate the dual variable $\lambda_k$ and auxiliary variable $z_k$ by Formula (21);
6. Check whether the iteration termination condition (23) is satisfied; if so, terminate; otherwise, calculate and update the BFGS approximation matrix by Formula (9), increase the iteration count by 1, and return to Step 1.
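A schematic Python view of Algorithm 1 is given below. It only mirrors the control flow; the four callables are placeholders for the paper's formulas (Equations (9), (11)–(14), (21) and condition (23)) and would have to be supplied by the user, so this is a sketch rather than the authors' implementation.

```python
import numpy as np

def lcc_aadmm(x0, oracle, admm_step, lbfgs_update, terminated,
              max_iter=100, gamma0=1.0, tau0=10.0):
    """Schematic outer loop of Algorithm 1 (LCC-AADMM).

    oracle(x, k)          -> stochastic gradient g_k and curvature pair (s_k, y_k)
    admm_step(x)          -> updated auxiliary variable z_k and dual variable lambda_k
    lbfgs_update(H, s, y) -> new (limited-memory) inverse Hessian approximation
    terminated(x, z, lam) -> True when the stopping condition holds
    """
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                         # initial inverse Hessian approximation
    for k in range(1, max_iter + 1):
        g, s, y = oracle(x, k)                 # stochastic gradient + curvature estimates
        gamma_k = gamma0 * tau0 / (tau0 + k)   # decaying batch step size
        x = x - gamma_k * (H @ g)              # quasi-Newton style target-variable update
        z, lam = admm_step(x)                  # ADMM auxiliary/dual update
        if terminated(x, z, lam):
            break
        H = lbfgs_update(H, s, y)              # curvature correction for the next iteration
    return x
```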

4.2. RELM Models with Stochastic Optimization Constraints

For an M-category classification problem, assuming that the training samples are $X = [x_1, x_2, \ldots, x_S] \in \mathbb{R}^{L \times S}$ and the number of hidden-layer nodes is $I$, the model output $f_m(x_s)$ is given by
$$f_m(x_s) = \sum_{i=1}^{I} F(w_i^T x_s + \tau_i)\, \beta_{im}$$
where $w_i \in \mathbb{R}^L$ is the weight vector and $\tau_i$ is the bias of the $i$-th node. The final output is obtained by combining the activation function $F(\cdot)$ with the output weight matrix $B \in \mathbb{R}^{I \times M}$.
To improve generalization performance and stability, regularization is introduced into the ELM model. If we define the target output as $T \in \mathbb{R}^{S \times M}$ and the hidden-layer output as $N \in \mathbb{R}^{S \times I}$ with $n_{si} = F(w_i^T x_s + \tau_i)$, then the least-squares problem becomes
$$\min_{B} \; \frac{1}{2}\| N B - T \|_F^2 + \frac{1}{2}\mu^2 \| B \|_F^2$$
where $\|\cdot\|_F$ indicates the Frobenius norm, and $\mu > 0$ represents the regularization factor.
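For reference, when the problem is small enough for a matrix inverse to be affordable, (28) has a closed-form MI-based solution, which is exactly the baseline the distributed scheme is designed to avoid at scale. The sketch below (illustrative names, sigmoid activation assumed) builds the hidden-layer matrix and solves the ridge problem directly.

```python
import numpy as np

def relm_fit_direct(X, T, n_hidden, mu, seed=0):
    """MI-based RELM baseline: random hidden layer, then B = (N^T N + mu^2 I)^{-1} N^T T,
    which minimizes (1/2)||N B - T||_F^2 + (1/2) mu^2 ||B||_F^2."""
    rng = np.random.default_rng(seed)
    S, L = X.shape
    W = rng.standard_normal((L, n_hidden))        # random input weights w_i
    tau = rng.standard_normal(n_hidden)           # random biases tau_i
    N = 1.0 / (1.0 + np.exp(-(X @ W + tau)))      # hidden-layer output matrix, shape (S, I)
    B = np.linalg.solve(N.T @ N + mu**2 * np.eye(n_hidden), N.T @ T)
    return W, tau, B
```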
Formula (28) is the objective function, a convex optimization model with separable structure. To simplify it, the ADMM framework is introduced to decompose the objective into a set of concurrently executable subproblems, each containing a subset of the model coefficients; that is, Formula (28) is equivalent to Formula (18). The RELM solution of (28) can then be derived from LCC-AADMM.
Denote $v_m^k = [v_{1m}^k, v_{2m}^k, \ldots, v_{(S+I)m}^k]^T$, $z_m^k = [z_{1m}^k, z_{2m}^k, \ldots, z_{(S+I)m}^k]^T$ and $\lambda_m^k = [\lambda_{1m}^k, \lambda_{2m}^k, \ldots, \lambda_{(S+I)m}^k]^T$. Then, the RELM updates are expressed as
$$\begin{aligned}
\beta_{im}^{k+1} &= \beta_{im}^{k} - \hat{\alpha}\, \hat{n}_i^{-1}\left( v_m^{k} + \lambda_m^{k} - z_m^{k} \right) \\
v_{jm}^{k+1} &= \hat{h}_j^T \beta_m^{k+1} \\
z_{jm}^{k+1} &= \frac{1}{1 + I^{-1}\alpha^2\rho}\, \hat{t}_{jm} + \frac{I^{-1}\alpha^2\rho}{1 + I^{-1}\alpha^2\rho}\left( v_{jm}^{k+1} + \frac{\lambda_{jm}^{k}}{\alpha\rho} \right) \\
\lambda_{jm}^{k+1} &= \lambda_{jm}^{k} + \alpha\rho\left( v_{jm}^{k+1} - z_{jm}^{k+1} \right)
\end{aligned}$$
where $\hat{\alpha} = I^{-1}\alpha$, $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, S+I$. Here $\hat{N} = I^{-1}[N^T, \mu I_I]^T$ with $\hat{N} = [\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_{S+I}]^T = [\hat{n}_1, \hat{n}_2, \ldots, \hat{n}_I]$, and $\hat{T} = I^{-1}[T^T, 0_{M \times I}]^T$ with $\hat{T} = [\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_M]$ and $\hat{t}_m = [\hat{t}_{1m}, \hat{t}_{2m}, \ldots, \hat{t}_{(S+I)m}]^T$.
By reconstructing the RELM method, all variable updates can be decomposed into scalar variable updates, which has a highly parallel structure. This is the desired structure for optimization applications with high computational cost.

5. Simulation Experiment and Result Analysis

The R-SDL-BFGS method discussed so far can be applied not only to unconstrained optimization problems but also to general convex optimization problems. In unconstrained optimization, the ability to find bounds on the optimal value and the stochastic robust approximation make the proposed method superior to other quasi-Newton algorithms. To verify this claim, simulations are carried out on four benchmark functions (Branin Function, Levy Function N.13, Matyas Function, and Three-Hump Camel Function) to compare the performance of the R-SDL-BFGS, SD-BFGS, LBFGS, and BFGS methods. Experiments are conducted using MATLAB 2019 (The MathWorks, Inc., Natick, MA, USA) on a desktop with an Intel Core i7-10700 8-core CPU and 16 GB of RAM. To ensure that the observed performance is not accidental, the results are averaged over multiple runs, which also verifies the robustness of each algorithm. The benchmark functions are described in Table 2.
For convex optimization problems, the R-SDL-BFGS method simplifies the solution of the subproblems with the Hessian approximation matrix, thereby reducing the computational cost of ADMM and improving the speed of convergence. To verify this claim, simulations are carried out on eight benchmark datasets to compare the performance of the proposed LCC-AADMM (built on R-SDL-BFGS) with the MS-AADMM and RB-ADMM algorithms. The benchmark datasets are Gisette, USPS, Magic, BASEHOCK, Pendigits, Optical-Digits, statlog, and PCMAC; their characteristics are shown in Table 3. The first six datasets are from the UCI machine learning repository, and the last two are from the ASU feature selection datasets.

5.1. Comparative Analysis of Convergence Performance of Quasi-Newton Algorithms

This section reports the convergence speed of the different algorithms, measured by the number of iterations required before $|f(x_{k+1}) - f(x_k)| \le Q$, where $Q$ is the stopping threshold. In general, the quasi-Newton method needs only simple line search procedures to satisfy the termination condition. This property leads to a low computational cost during the training phase, so it is convenient to use the iteration count to evaluate the effectiveness of the method.
As shown in Table 4 and Table 5, the standard BFGS algorithm avoids the problem of singular matrices by replacing inverse matrices with Hessian approximation matrices. However, this algorithm needs to calculate and store the matrix in each iteration, which leads to an expensive computational cost as well as a slow convergence.
To ensure the positive definiteness, the SD-BFGS algorithm is devised by making use of damping technology so as to maintain the positive definiteness of the BFGS matrix in non-convex optimization. Since the global convergence of the algorithm depends on the monotonicity of the function, a suitable numerical solution of non-monotonic equations is in general not feasible.
In order to attain asymptotic linear convergence, a non-monotonic Wolfe-type strategy can be applied to the memory gradient method (R-SDL-BFGS). By combining the current function iteration information and the function information of multiple points in the past, it overcomes the oscillation phenomenon and improves the global convergence of the algorithm.
From a theoretical point of view, the R-SDL-BFGS algorithm has better global convergence and a higher convergence speed than the BFGS, SD-BFGS, and LBFGS methods. Experiments on the benchmark functions are presented in Table 4, which shows that the speed of convergence is quite acceptable. As can be seen from Table 5, the convergence rate of R-SDL-BFGS is 88.7260% higher than that of BFGS and 69.20135% higher than that of SD-BFGS on the CEC benchmark functions. The numerical results show that the performance of R-SDL-BFGS is consistent with the theoretical analysis and that it achieves good performance in practice. The results in Figure 1, Figure 2, Figure 3 and Figure 4 and Table 5 demonstrate that the R-SDL-BFGS algorithm converges faster than the other QNMs.

5.2. Convergence Performance Comparative Analysis

The complexity of an iterative algorithm is determined by the per-iteration cost and the number of iterations, where the per-iteration cost refers to the number of floating-point operations required by the optimization algorithm in a single iteration, and the iteration complexity refers to the number of iterations required to reach a solution of a given precision. Since the per-iteration cost is almost the same across the compared methods, the convergence performance is evaluated by comparing the number of iterations.
In this section, the convergence performance of the different ADMM algorithms is compared under the same error condition, where the iteration termination condition (27) is the key to evaluating the convergence speed. Under the same termination condition, the LCC-AADMM, MS-AADMM, and RB-ADMM methods are evaluated on the eight benchmark datasets. To ensure that the observed performance is not accidental, the results are averaged over multiple runs, which also verifies the robustness of each algorithm. In addition, the algorithms for computing and approximating the matrix are analyzed by comparing the number of iterations.
From a theoretical point of view, for the M-category classification problem, we can rewrite (28) as
$$\min_{\beta} \; \| N\beta - t \|_F^2, \qquad \beta = \operatorname{vec}(B) \in \mathbb{R}^{IM \times 1}, \quad N = I_M \otimes \left[ N^T, \mu I_I^T \right]^T \in \mathbb{R}^{(S+I)M \times IM}, \quad t = \operatorname{vec}\!\left( \left[ T^T, 0_{I \times M}^T \right]^T \right) \in \mathbb{R}^{(S+I)M \times 1}$$
where $\otimes$ is the Kronecker product and $\operatorname{vec}(\cdot)$ denotes the concatenation of all columns of a matrix. The MS-AADMM (31) and RB-ADMM (32) updates are then as follows:
$$\begin{aligned}
\beta_{im}^{k+1} &= \arg\min_{\beta_{im}} \; g_{im}(\beta_{im}) + \frac{\rho}{2}\left\| n_{im}\beta_{im} - z_{im}^{k} + \frac{\lambda_{im}^{k}}{\rho} \right\|_2^2 \\
Z^{k+1} &= \arg\min_{Z} \; \left\| \sum_{im=1}^{IM} z_{im} - t \right\|_2^2 + \frac{\rho}{2}\sum_{im=1}^{IM}\left\| n_{im}\beta_{im}^{k} - z_{im} + \frac{\lambda_{im}^{k}}{\rho} \right\|_2^2 \\
\lambda_{im}^{k+1} &= \lambda_{im}^{k} + \rho\left( n_{im}\beta_{im}^{k+1} - z_{im}^{k+1} \right)
\end{aligned}$$
$$\begin{aligned}
\beta_{im}^{k+1} &= \arg\min_{\beta_{im}} \; \frac{\rho}{2}\left\| n_{im}(\beta_{im} - \beta_{im}^{k}) - \alpha\left( z_{im}^{k} - n_{im}\beta_{im}^{k} - \frac{\lambda_{im}^{k}}{\rho\alpha} \right) \right\|_2^2 \\
Z^{k+1} &= \arg\min_{Z} \; \left\| \sum_{im=1}^{IM} z_{im} - t \right\|_2^2 + \frac{\rho\alpha^2}{2}\sum_{im=1}^{IM}\left\| z_{im} - n_{im}\beta_{im}^{k+1} - \frac{\lambda_{im}^{k}}{\rho\alpha} \right\|_2^2 \\
\lambda_{im}^{k+1} &= \lambda_{im}^{k} + \rho\alpha\left( n_{im}\beta_{im}^{k+1} - z_{im}^{k+1} \right)
\end{aligned}$$
where $im = 1, 2, \ldots, IM$ and $z_{im} = n_{im}\beta_{im}$.
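As a quick numerical sanity check of the vectorization used above (not code from the paper), the Frobenius-norm fitting term of the matrix problem coincides with the least-squares term of the Kronecker-structured vector problem:

```python
import numpy as np

rng = np.random.default_rng(1)
S, I, M = 6, 4, 3
N = rng.standard_normal((S, I))      # hidden-layer output matrix
B = rng.standard_normal((I, M))      # output weight matrix
T = rng.standard_normal((S, M))      # target matrix

beta = B.flatten(order="F")          # vec(B): stack the columns of B
t = T.flatten(order="F")             # vec(T)
lhs = np.linalg.norm(N @ B - T, "fro") ** 2
rhs = np.linalg.norm(np.kron(np.eye(M), N) @ beta - t) ** 2   # uses I_M (x) N
assert np.isclose(lhs, rhs)          # the matrix and vectorized objectives agree
```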

5.2.1. Convergence Performance Analysis Compared with RB-ADMM

Combining (29) and (31), it can be seen that most of the iterative steps in LCC-AADMM are linear, which reduces the computational cost and improves the speed of convergence. From a theoretical point of view, the choice of the penalty factor is of practical importance in improving the overall performance of the model. Although RB-ADMM can automatically adjust the penalty factor by balancing the dual residuals against the primal residuals, its computational cost varies greatly with the size of the problem; moreover, if an improper penalty factor is chosen, the algorithm may not converge.
The parameter selection scheme can provide fast and accurate estimates of the optimal algorithm parameters. To improve convergence performance, LCC-AADMM uses step-size selection constraints to construct the adaptive parameter selection scheme, where the step size is chosen to satisfy the Wolfe conditions. In addition, instead of computing the Hessian approximation afresh at every iteration, LCC-AADMM updates it in a simple manner to account for the curvature measured during the most recent step. This makes LCC-AADMM converge more rapidly than RB-ADMM.
The simulation results for the objective function are given in Table 6, showing that R-SDL-BFGS has the desired effect of reducing the ADMM computational cost. For comparison, the improvement rates of the different algorithms are shown in Table 7. As can be seen from Table 7, the convergence rate of LCC-AADMM is 96.6109% higher than that of RB-ADMM on the two-class datasets; on the 6-class dataset, the convergence speed is improved by 96.8750%, and on the 10-class datasets it is improved by 95.9727% on average.

5.2.2. Convergence Performance Analysis Compared with MS-AADMM

One powerful approach to obtaining the optimal output weights starts from an appropriate parameter selection scheme, which allows an adjustable step size to speed up convergence. For the actual numerical performance of ADMM, the subproblem solution process is the key factor determining the performance of the algorithm; however, MS-AADMM ignores this key factor.
It can be seen from (31) and (32) that LCC-AADMM converts the exact solution into an approximate one by performing inexact optimization with the help of the Hessian approximation matrix, which greatly reduces the computational cost and thereby improves the speed of convergence. Theoretically, the convergence performance of LCC-AADMM is therefore better than that of MS-AADMM.
A comparison of the convergence performance of different methods is shown in Table 6. It can be seen that the R-SDL-BFGS algorithm obviously performs the best in terms of classification performance. And Table 7 shows that the R-SDL-BFGS method has a certain improvement over MS-AADMM in the efficiency of classification.
As can be seen from Table 7, the convergence rate of LCC-AADMM is 42.6666% higher than that of MS-AADMM on the two-class datasets; on the 6-class dataset, the convergence speed is improved by 39.1304%, and on the 10-class datasets it is improved by 63.2057% on average.

5.2.3. Overall Convergence Performance Analysis

The LCC-AADMM method divides the convex optimization problem of RELM into univariate subproblems that can be executed in parallel by using the maximum partitioning technique, which reduces the computational complexity in iterative updates. By introducing the R-SDL-BFGS algorithm, AADMM achieves inexact optimization with the help of the Hessian approximation matrix, which reduces the computational cost while maintaining a fast convergence speed.
Theoretically, LCC-AADMM usually has better convergence performance than other algorithms in solving classification problems. It can be seen from Table 6 that the LCC-AADMM algorithm has the fastest convergence speed. The difference in performance of these methods is also evident as the size of the error is varied. Under the same conditions, the LCC-AADMM algorithm always has the best convergence according to Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12.

5.3. Accuracy Analysis for LCC-AADMM

Classification accuracy is one of the important indicators for evaluating the performance and quality of a classification model. The fatal flaw of RELM is that it cannot be applied to large-scale distributed optimization problems owing to its high computational cost. In view of this shortcoming, LCC-AADMM is adopted to decompose the convex optimization problem into a set of subproblems that can be executed in parallel, thereby achieving efficient classification performance. We performed experiments on the eight benchmark classification datasets in Table 3 using the MI-based, MS-AADMM, and proposed methods. The accuracy of the test results is shown in Figure 13.
Figure 13 shows the performance of the different methods on the big-data classification tasks. It is obvious that LCC-AADMM consistently outperforms the two competing ELM algorithms on all eight datasets and provides the best overall performance. In addition, the LCC-AADMM algorithm shows significant improvement over the best results obtained by the other two competitive ADMM methods, indicating good classification accuracy and suitability for applications that require high accuracy.
It can be concluded that the proposed method performs well on a wide variety of problems and does not require excessive computer time or storage compared with MI-based and MS-AADMM methods. In practice, this technique can be expected to provide good learning ability and satisfactory generalization performance.

6. Conclusions

In this paper, we consider implementing distributed learning through the effective solution of subproblems; that is, the regularized LS problem in RELM is split into a set of optimization subproblems. To solve these subproblems with high computational efficiency, an efficient LCC-AADMM based on the R-SDL-BFGS algorithm is proposed. The novelty of this method lies in three aspects: (1) an SL-BFGS method is devised, which uses a limited amount of storage and updates the quasi-Newton matrix continuously; (2) random damping technology is proposed, which adopts a new strategy for determining the step size at each iteration and guarantees the positive definiteness of the BFGS matrix, achieving high-quality learning; (3) based on the residual balancing scheme, an adaptive penalty factor selection strategy is applied to balance the distance from convergence against the residuals and obtain good convergence.
The effectiveness of this approach is demonstrated on eight benchmark datasets. The experiments show that the proposed method achieves good performance in certain cases and converges faster than other ADMM methods. The high parallelism of LCC-AADMM is further demonstrated by comparison with an MI-based method. The LCC-AADMM method therefore offers a complementary alternative for optimization problems in large-scale applications.

Author Contributions

Conceptualization, K.W. and S.H.; methodology, K.W.; writing—original draft preparation, S.H. and K.W.; writing—review and editing, K.W., S.H. and T.R.; supervision, Z.W. and B.L.; project administration, K.W., Z.W. and T.R.; funding acquisition, Z.W. and T.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Public Welfare Technology Application and Research Projects in Zhejiang Province of China under Grants No. LQ23F030002; the “Ling Yan” Research and Development Project of Science and Technology Department under Grants No. 2022C03122; the State Key Laboratory of Industrial Control Technology, Zhejiang University, No.ICT2022B34; and Zhejiang Shuren University Basic Scientific Research Special Funds 2023XZ001.

Data Availability Statement

The data are fully open access [38,39]. The details can be found from http://archive.ics.uci.edu/ml/index.php and http://featureselection.asu.edu/datasets.php (accessed on 1 January 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shi, X.; Kang, Q.; An, J.; Zhou, M. Novel L1 Regularized Extreme Learning Machine for Soft-Sensing of an Industrial Process. IEEE Trans. Ind. Inform. 2022, 18, 1009–1017. [Google Scholar] [CrossRef]
  2. Zheng, Y.; Chen, B.; Wang, S. Mixture correntropy-based kernel extreme learning machines. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 811–825. [Google Scholar] [CrossRef] [PubMed]
  3. Luo, M.; Zhang, L.; Liu, J.; Guo, J.; Zheng, Q. Distributed extreme learning machine with alternating direction method of multiplier. Neurocomputing 2017, 261, 164–170. [Google Scholar] [CrossRef]
  4. Qing, Y.; Zeng, Y.; Li, Y. Deep and wide feature based extreme learning machine for image classification. Neurocomputing 2020, 412, 426–436. [Google Scholar] [CrossRef]
  5. Farahbakhsh, F.; Shahidinejad, A.; Ghobaei-Arani, M. Multiuser context-aware computation offloading in mobile edge computing based on Bayesian learning automata. Trans. Emerg. Telecommun. Technol. 2021, 32, e4127. [Google Scholar] [CrossRef]
  6. Masdari, M.; Gharehpasha, S.; Ghobaei-Arani, M.; Ghasemi, V. Bio-inspired virtual machine placement schemes in cloud computing environment: Taxonomy, review, and future research directions. Clust. Comput. 2020, 23, 2533–2563. [Google Scholar] [CrossRef]
  7. Xin, J.; Wang, Z.; Chen, C.; Ding, L.; Wang, G.; Zhao, Y. ELM*: Distributed extreme learning machine with MapReduce. World Wide Web 2014, 17, 1189–1204. [Google Scholar] [CrossRef]
  8. Wang, Y.; Dou, Y.; Liu, X.; Lei, Y. PR-ELM: Parallel regularized extreme learning machine based on cluster. Neurocomputing 2016, 173, 1073–1081. [Google Scholar] [CrossRef]
  9. Xin, J.; Wang, Z.; Qu, L.; Wang, G. Elastic extreme learning machine for big data classification. Neurocomputing 2015, 149, 464–471. [Google Scholar] [CrossRef]
  10. Duan, M.; Li, K.; Liao, X.; Li, K. A Parallel Multiclassification Algorithm for Big Data Using an Extreme Learning Machine. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2337–2351. [Google Scholar] [CrossRef]
  11. Wang, B.; Fang, J.; Duan, H.; Li, H. Graph Simplification-Aided ADMM for Decentralized Composite Optimization. IEEE Trans. Cybern. 2021, 51, 5170–5183. [Google Scholar] [CrossRef] [PubMed]
  12. Wang, Z.; Huo, S.; Xiong, X.; Wang, K.; Liu, B. A Maximally Split and Adaptive Relaxed Alternating Direction Method of Multipliers for Regularized Extreme Learning Machines. Mathematics 2023, 11, 3198. [Google Scholar] [CrossRef]
  13. Xu, Z.; Figueiredo, M.A.T.; Yuan, X.; Studer, C.; Goldstein, T. Adaptive Relaxed ADMM: Convergence Theory and Practical Implementation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  14. Wohlberg, B. ADMM Penalty Parameter Selection by Residual Balancing. 2017. Available online: http://xxx.lanl.gov/abs/1704.06209 (accessed on 20 April 2017).
  15. Lai, X.; Cao, J.; Huang, X.; Wang, T.; Lin, Z. A Maximally Split and Relaxed ADMM for Regularized Extreme Learning Machines. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 1899–1913. [Google Scholar] [CrossRef] [PubMed]
  16. Zhang, Y.; Dai, Y.; Wu, Q. Sparse and Outlier Robust Extreme Learning Machine Based on the Alternating Direction Method of Multipliers. Neural Process. Lett. 2023, 55, 9787–9809. [Google Scholar] [CrossRef]
  17. Xu, J.; Chao, M. An inertial Bregman generalized alternating direction method of multipliers for nonconvex optimization. J. Appl. Math. Comput. 2022, 68, 1–27. [Google Scholar] [CrossRef]
  18. Wang, X.; Yan, J.; Jin, B.; Li, W. Distributed and Parallel ADMM for Structured Nonconvex Optimization Problem. IEEE Trans. Cybern. 2021, 51, 4540–4552. [Google Scholar] [CrossRef] [PubMed]
  19. Li, Y.; Wang, R.; Fang, Y.; Sun, M.; Luo, Z. Alternating Direction Method of Multipliers for Convolutive Non-Negative Matrix Factorization. IEEE Trans. Cybern. 2023, 53, 7735–7748. [Google Scholar] [CrossRef]
  20. Wang, H.; Gao, Y.; Shi, Y.; Wang, R. Group-Based Alternating Direction Method of Multipliers for Distributed Linear Classification. IEEE Trans. Cybern. 2017, 47, 3568–3582. [Google Scholar] [CrossRef]
  21. Darvishi, M.T. A two-step high order Newton-like method for solving systems of nonlinear equations. Int. J. Pure Appl. Math. 2009, 57, 543–555. [Google Scholar]
  22. Babajee, D.K.R.; Dauhoo, M.Z.; Darvishi, M.T.; Karami, A.; Barati, A. Analysis of two Chebyshev-like third order methods free from second derivatives for solving systems of nonlinear equations. J. Comput. Appl. Math. 2010, 233, 2002–2012. [Google Scholar] [CrossRef]
  23. Coşğun, S.; Bilgin, E.; Çayören, M. Quasi-Newton-Based Inversion Method for Determining Complex Dielectric Permittivity of 3D Inhomogeneous Objects. IEEE Trans. Antennas Propag. 2020, 70, 4810–4817. [Google Scholar] [CrossRef]
  24. Al-Obaidi, R.H.; Darvishi, M.T. Constructing a Class of Frozen Jacobian Multi-Step Iterative Solvers for Systems of Nonlinear Equations. Mathematics 2022, 10, 2952. [Google Scholar] [CrossRef]
  25. Li, X.; Feng, F.; Zhang, J.; Zhang, W.; Zhang, Q. Advanced Simulation-Inserted Optimization Using Combined Quasi-Newton Method with Lagrangian Method for EM-Based Design Optimization. IEEE Trans. Microw. Theory Tech. 2022, 70, 3753–3764. [Google Scholar] [CrossRef]
  26. Wang, D.; Xu, X.; Yang, Y.; Zhang, T. A Quasi-Newton Quaternions Calibration Method for DVL Error Aided GNSS. IEEE Trans. Veh. Technol. 2021, 70, 2465–2477. [Google Scholar] [CrossRef]
  27. Byrd, R.H.; Hansen, S.L.; Nocedal, J.; Singer, Y. A Stochastic Quasi-Newton Method for Large-Scale Optimization, 2015. Available online: http://xxx.lanl.gov/abs/1401.7020 (accessed on 18 February 2015).
  28. Zhang, Q.; Huang, F.; Deng, C. Faster Stochastic Quasi-Newton Methods. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 4388–4397. [Google Scholar] [CrossRef] [PubMed]
  29. Chen, H.; Wu, H.; Chan, S.; Lam, W. A Stochastic Quasi-Newton Method for Large-Scale Nonconvex Optimization with Applications. Neurocomputing 2020, 31, 4776–4790. [Google Scholar] [CrossRef] [PubMed]
  30. Aryan, M.; Alejandro, R. Stochastic Quasi-Newton Methods. Proc. IEEE 2020, 108, 1906–1922. [Google Scholar]
  31. Zhang, X.; Liu, D.; Wang, X.; Zhang, X. Advanced Ellipse Fitting Algorithm Based on ADMM and Hybrid BFGS Method. IEEE Trans. Instrum. Meas. 2021, 70, 1–11. [Google Scholar] [CrossRef]
  32. Azam, A.; Michael, O. Analysis of limited-memory BFGS on a class of nonsmooth convex functions. IMA J. Numer. Anal. 2020, 41, 1–27. [Google Scholar]
  33. Li, L.; Hu, J. Fast-Converging and Low-Complexity Linear Massive MIMO Detection with L-BFGS Method. IEEE Trans. Veh. Technol. 2022, 71, 10656–10665. [Google Scholar] [CrossRef]
  34. Yu, T.; Liu, X.; Dai, H.; Sun, J. A Minibatch Proximal Stochastic Recursive Gradient Algorithm Using a Trust-Region-Like Scheme and Barzilai–Borwein Stepsizes. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4627–4638. [Google Scholar] [CrossRef] [PubMed]
  35. Bastianello, N.; Carli, R.; Schenato, L. Asynchronous distributed optimization over lossy networks via relaxed admm: Stability and linear convergence. IEEE Trans. Autom. Control 2021, 66, 2620–2635. [Google Scholar] [CrossRef]
  36. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
  37. Zhou, B.; Gao, L.; Dai, Y.H. Gradient methods with adaptive step-sizes. Comput. Optim. Appl. 2006, 35, 69–86. [Google Scholar] [CrossRef]
  38. Markelle, K.; Rachel, L.; Kolby, N. UCI Machine Learning Repository. 2023. Available online: https://archive.ics.uci.edu (accessed on 1 January 2023).
  39. Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature selection: A data perspective. ACM Comput. Surv. (CSUR) 2018, 50, 94. [Google Scholar] [CrossRef]
Figure 1. Performance on Branin.
Figure 2. Performance on Levy Function N.13.
Figure 3. Performance on Matyas.
Figure 4. Performance on Three-Hump Camel.
Figure 5. Convergence results on Gisette.
Figure 6. Convergence results on USPS.
Figure 7. Convergence results on BASEHOCK.
Figure 8. Convergence results on Magic.
Figure 9. Convergence results on Pendigits.
Figure 10. Convergence results on Optical-Digits.
Figure 11. Convergence results on Statlog.
Figure 12. Convergence results on PCMAC.
Figure 13. Accuracy of test results.
Table 2. Benchmark function description.

| Function | Function Optimal Value | Optimal Solution |
|---|---|---|
| Branin Function | 0.389 | [3.1416, 2.2750] |
| Levy Function N.13 | 0 | [0, 0] |
| Matyas Function | 0 | [0, 0] |
| Three-Hump Camel Function | 0 | [0, 0] |
Table 3. Datasets specifications.

| Dataset | Attributes | Training Samples | Testing Samples | Classes |
|---|---|---|---|---|
| Gisette | 5000 | 5600 | 1400 | 2 |
| USPS | 256 | 7439 | 1859 | 10 |
| Magic | 10 | 15,216 | 3804 | 2 |
| Pendigits | 16 | 8794 | 2198 | 10 |
| Optical-Digits | 64 | 4496 | 1124 | 10 |
| statlog | 36 | 5148 | 1287 | 6 |
| PCMAC | 3289 | 1555 | 388 | 2 |
| BASEHOCK | 4862 | 1595 | 438 | 2 |
Table 4. Comparison of Convergence Speeds of Different Quasi-Newton Methods.

| Function | BFGS | SD-BFGS | R-SDL-BFGS |
|---|---|---|---|
| Branin Function | 15 | 12 | 6 |
| Levy Function N.13 | 5 | 5 | 1 |
| Matyas Function | 20 | 1 | 1 |
| Three-Hump Camel Function | 8 | 5 | 3 |
Table 5. Comparison of Convergence Speeds of BFGS based Methods.

| CEC Benchmark Function | Ratio (BFGS) | Ratio (SD-BFGS) |
|---|---|---|
| Branin Function | 88.0000 | 81.2500 |
| Levy Function N.13 | 95.2380 | 88.8888 |
| Matyas Function | 96.6666 | 66.6666 |
| Three-Hump Camel Function | 75.0000 | 40.0000 |
Table 6. Comparison of Convergence Analysis Results.

| Dataset | RB-AADMM Time | RB-AADMM Iterations | MS-AADMM Time | MS-AADMM Iterations | LCC-AADMM Time | LCC-AADMM Iterations |
|---|---|---|---|---|---|---|
| Gisette | 737.5244 | 91 | 233.4513 | 39 | 5.0961 | 3 |
| USPS | 2621.8914 | 546 | 152.5741 | 31 | 39.1072 | 13 |
| BASEHOCK | 332.8139 | 167 | 49.0221 | 41 | 2.6267 | 19 |
| Magic | 1156.6178 | 674 | 46.0599 | 25 | 19.2866 | 14 |
| Pendigits | 3912.3798 | 742 | 176.3753 | 33 | 46.7936 | 14 |
| Optical-Digits | 1825.9015 | 538 | 139.2235 | 41 | 18.4858 | 9 |
| statlog | 1626.4322 | 704 | 92.0177 | 39 | 16.8493 | 13 |
| PCMAC | 353.2414 | 175 | 33.3102 | 15 | 1.4095 | 13 |
Table 7. Convergence Speed Improvement Ratio.

| Dataset | Classes | RB-ADMM | MS-AADMM |
|---|---|---|---|
| Gisette | 2 | 97.9808 | 31.5789 |
| Magic | 2 | 94.6984 | 55.1724 |
| BASEHOCK | 2 | 98.3636 | 29.6296 |
| PCMAC | 2 | 95.4008 | 54.2857 |
| statlog | 6 | 96.8750 | 39.1304 |
| USPS | 10 | 94.6494 | 68.9655 |
| Pendigits | 10 | 95.0425 | 62.5871 |
| Optical-Digits | 10 | 98.2264 | 58.0645 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
