A Flexible Framework for Decentralized Composite Optimization with Compressed Communication
Abstract
1. Introduction
- First, we propose a flexible framework, termed CE-DADMM, for communication-efficient decentralized composite optimization. The framework not only encompasses existing algorithms such as COLA [32] and CC-DQM [39], but also gives rise to several new ones. In particular, by incorporating quasi-Newton updates into CE-DADMM, we derive CE-DADMM-BFGS, the first communication-efficient decentralized second-order algorithm for composite optimization (a generic BFGS-style update is sketched after this list). Compared with CC-DQM, it avoids computing the Hessian matrix and its inverse, which significantly reduces the computational cost. Compared with DRUID [22], CE-DADMM reduces the communication cost thanks to its efficient communication scheme.
- Second, we theoretically prove that CE-DADMM achieves exact linear convergence under the strong-convexity assumption by carefully analyzing both the mixing error introduced by the efficient communication scheme and the disagreement among the agents' decision vectors. We also establish how the convergence rate depends on the parameters of the compression mechanism. In addition, extensive numerical experiments are presented to substantiate the superior communication efficiency of our algorithms.
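To make the quasi-Newton idea concrete, the snippet below shows a generic (textbook) BFGS update of a Hessian approximation. It is only a minimal sketch, not the exact CE-DADMM-BFGS rule derived in Section 3; the function name, the curvature safeguard, and the variable names are illustrative assumptions.

```python
import numpy as np

def bfgs_update(B, s, y, eps=1e-12):
    """Generic BFGS update of a Hessian approximation B (illustrative only).

    s = x_new - x_old is the step and y = grad_new - grad_old is the change
    in gradients. The update maintains a positive-definite approximation
    without ever forming or inverting the true Hessian, which is the
    computational advantage mentioned above. If the curvature condition
    y^T s > 0 fails, the update is skipped (a common safeguard, assumed
    here rather than taken from the paper).
    """
    sy = float(y @ s)
    if sy <= eps:
        return B                      # skip the update; keep the old approximation
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / sy
```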
2. Problem Setting
3. Algorithm Formulation
3.1. Background: ADMM-Based Algorithm
3.2. Communication-Efficient Decentralized ADMM
Algorithm 1 CE-DADMM
1: Initialization: , , , , , .
2: for t = 0, 1, … do
3:     for agent i do
4:         Compute using (13), (14), or (15) according to its choice;
5:         Compute using (11a);
6:         // Compressing information
7:         Broadcast to neighbors;
8:         
9:         if  then  // Dealing with the non-smooth function
10:            
11:            
12:        end if
13:    end for
14: end for
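For intuition about the compression step (line 6), the broadcast (line 7), and the treatment of the non-smooth part (lines 9-11), the sketch below pairs a top-k compressor with an EF21-style reference state, plus a soft-thresholding proximal step for an ℓ1 regularizer (as in the LASSO experiment of Section 5.3). All names (top_k, communicate, prox_l1, x_i, h_i) and the choice of compressor and regularizer are illustrative assumptions; the actual updates follow (11a) and (13)-(15) in the text.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def communicate(x_i, h_i, k):
    """Compressed broadcast for one agent (EF21-style difference compression).

    Only the compressed innovation q is transmitted. The sender and each
    neighbor add q to their stored reference copy h_i, so all copies stay
    synchronized while far fewer entries are sent per round.
    """
    q = top_k(x_i - h_i, k)   # compress the difference, not the full state
    h_i = h_i + q             # update the locally stored reference state
    return q, h_i             # q is what gets broadcast to neighbors

def prox_l1(v, tau):
    """Proximal operator of tau * ||.||_1 (soft thresholding), applicable
    when the non-smooth part of the objective is an l1 regularizer."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
```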
3.3. Discussion
4. Convergence Analysis
5. Numerical Experiments
5.1. Distributed Logistic Regression
5.2. Distributed Ridge Regression
5.3. Distributed LASSO
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Lemma 3
Appendix B. Proof of Theorem 1
References
- Olfati-Saber, R.; Fax, J.A.; Murray, R.M. Consensus and cooperation in networked multi-agent systems. Proc. IEEE 2007, 95, 215–233.
- Yoo, S.J.; Park, B.S. Dynamic event-triggered prescribed-time consensus tracking of nonlinear time-delay multiagent systems by output feedback. Fractal Fract. 2024, 8, 545.
- Liu, H.J.; Shi, W.; Zhu, H. Distributed voltage control in distribution networks: Online and robust implementations. IEEE Trans. Smart Grid 2017, 9, 6106–6117.
- Molzahn, D.K.; Dorfler, F.; Sandberg, H.; Low, S.H.; Chakrabarti, S.; Baldick, R.; Lavaei, J. A survey of distributed optimization and control algorithms for electric power systems. IEEE Trans. Smart Grid 2017, 8, 2941–2962.
- Liu, Y.F.; Chang, T.H.; Hong, M.; Wu, Z.; So, A.M.C.; Jorswieck, E.A.; Yu, W. A survey of recent advances in optimization methods for wireless communications. IEEE J. Sel. Areas Commun. 2024, 42, 2992–3031.
- Huang, J.; Zhou, S.; Tu, H.; Yao, Y.; Liu, Q. Distributed optimization algorithm for multi-robot formation with virtual reference center. IEEE/CAA J. Autom. Sin. 2022, 9, 732–734.
- Yang, X.; Zhao, W.; Yuan, J.; Chen, T.; Zhang, C.; Wang, L. Distributed optimization for fractional-order multi-agent systems based on adaptive backstepping dynamic surface control technology. Fractal Fract. 2022, 6, 642.
- Liu, J.; Zhang, C. Distributed learning systems with first-order methods. Found. Trends Databases 2020, 9, 1–100.
- Nedic, A.; Ozdaglar, A. Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 2009, 54, 48–61.
- Nedic, A.; Olshevsky, A.; Shi, W. Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 2017, 27, 2597–2633.
- Xu, J.; Zhu, S.; Soh, Y.C.; Xie, L. Convergence of asynchronous distributed gradient methods over stochastic networks. IEEE Trans. Autom. Control 2018, 63, 434–448.
- Wen, X.; Luan, L.; Qin, S. A continuous-time neurodynamic approach and its discretization for distributed convex optimization over multi-agent systems. Neural Netw. 2021, 143, 52–65.
- Feng, Z.; Xu, W.; Cao, J. Alternating inertial and overrelaxed algorithms for distributed generalized Nash equilibrium seeking in multi-player games. Fractal Fract. 2021, 5, 62.
- Che, K.; Yang, S. A snapshot gradient tracking for distributed optimization over digraphs. In Proceedings of the CAAI International Conference on Artificial Intelligence, Beijing, China, 27–28 August 2022; pp. 348–360.
- Zhou, S.; Wei, Y.; Liang, S.; Cao, J. A gradient tracking protocol for optimization over Nabla fractional multi-agent systems. IEEE Trans. Signal Inf. Process. Over Netw. 2024, 10, 500–512.
- Shi, W.; Ling, Q.; Wu, G.; Yin, W. EXTRA: An exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 2015, 25, 944–966.
- Ling, Q.; Shi, W.; Wu, G.; Ribeiro, A. DLM: Decentralized linearized alternating direction method of multipliers. IEEE Trans. Signal Process. 2015, 63, 4051–4064.
- Mokhtari, A.; Shi, W.; Ling, Q.; Ribeiro, A. DQM: Decentralized quadratically approximated alternating direction method of multipliers. IEEE Trans. Signal Process. 2016, 64, 5158–5173.
- Eisen, M.; Mokhtari, A.; Ribeiro, A. A primal-dual quasi-Newton method for exact consensus optimization. IEEE Trans. Signal Process. 2019, 67, 5983–5997.
- Mansoori, F.; Wei, E. A fast distributed asynchronous Newton-based optimization algorithm. IEEE Trans. Autom. Control 2019, 65, 2769–2784.
- Jiang, X.; Qin, S.; Xue, X.; Liu, X. A second-order accelerated neurodynamic approach for distributed convex optimization. Neural Netw. 2022, 146, 161–173.
- Li, Y.; Voulgaris, P.G.; Stipanović, D.M.; Freris, N.M. Communication efficient curvature aided primal-dual algorithms for decentralized optimization. IEEE Trans. Autom. Control 2023, 68, 6573–6588.
- Alistarh, D.; Grubic, D.; Li, J.Z.; Tomioka, R.; Vojnovic, M. QSGD: Communication-efficient SGD via gradient quantization and encoding. In Proceedings of the 30th NeurIPS, Long Beach, CA, USA, 4–9 December 2017; pp. 1710–1721.
- Wangni, J.; Wang, J.; Liu, J.; Zhang, T. Gradient sparsification for communication-efficient distributed optimization. In Proceedings of the 31st NeurIPS, Montreal, QC, Canada, 2–8 December 2018; pp. 1306–1316.
- Stich, S.U.; Cordonnier, J.B.; Jaggi, M. Sparsified SGD with memory. In Proceedings of the 31st NeurIPS, Montreal, QC, Canada, 2–8 December 2018; pp. 4447–4458.
- Doan, T.T.; Maguluri, S.T.; Romberg, J. Fast convergence rates of distributed subgradient methods with adaptive quantization. IEEE Trans. Autom. Control 2020, 66, 2191–2205.
- Taheri, H.; Mokhtari, A.; Hassani, H.; Pedarsani, R. Quantized decentralized stochastic learning over directed graphs. In Proceedings of the 37th ICML, Virtual, 13–18 July 2020; pp. 9324–9333.
- Song, Z.; Shi, L.; Pu, S.; Yan, M. Compressed gradient tracking for decentralized optimization over general directed networks. IEEE Trans. Signal Process. 2022, 70, 1775–1787.
- Xiong, Y.; Wu, L.; You, K.; Xie, L. Quantized distributed gradient tracking algorithm with linear convergence in directed networks. IEEE Trans. Autom. Control 2022, 68, 5638–5645.
- Zhu, S.; Hong, M.; Chen, B. Quantized consensus ADMM for multi-agent distributed optimization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 4134–4138.
- Elgabli, A.; Park, J.; Bedi, A.S.; Issaid, C.B.; Bennis, M.; Aggarwal, V. Q-GADMM: Quantized group ADMM for communication efficient decentralized machine learning. IEEE Trans. Commun. 2020, 69, 164–181.
- Li, W.; Liu, Y.; Tian, Z.; Ling, Q. Communication-censored linearized ADMM for decentralized consensus optimization. IEEE Trans. Signal Inf. Process. Over Netw. 2020, 6, 18–34.
- Gao, L.; Deng, S.; Li, H.; Li, C. An event-triggered approach for gradient tracking in consensus-based distributed optimization. IEEE Trans. Netw. Sci. Eng. 2021, 9, 510–523.
- Zhang, Z.; Yang, S.; Xu, W.; Di, K. Privacy-preserving distributed ADMM with event-triggered communication. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 2835–2847.
- Chen, T.; Giannakis, G.; Sun, T.; Yin, W. LAG: Lazily aggregated gradient for communication-efficient distributed learning. Adv. Neural Inf. Process. Syst. 2018, 31, 5050–5060.
- Sun, J.; Chen, T.; Giannakis, G.; Yang, Z. Communication-efficient distributed learning via lazily aggregated quantized gradients. Adv. Neural Inf. Process. Syst. 2019, 32, 3370–3380.
- Singh, N.; Data, D.; George, J.; Diggavi, S. SPARQ-SGD: Event-triggered and compressed communication in decentralized optimization. IEEE Trans. Autom. Control 2022, 68, 721–736.
- Yang, X.; Yuan, J.; Chen, T.; Yang, H. Distributed adaptive optimization algorithm for fractional high-order multiagent systems based on event-triggered strategy and input quantization. Fractal Fract. 2023, 7, 749.
- Zhang, Z.; Yang, S.; Xu, W. Decentralized ADMM with compressed and event-triggered communication. Neural Netw. 2023, 165, 472–482.
- Richtárik, P.; Sokolov, I.; Fatkhullin, I. EF21: A new, simpler, theoretically better, and practically faster error feedback. In Proceedings of the 34th NeurIPS, Virtual, 6–14 December 2021; pp. 4384–4396.
- Richtárik, P.; Sokolov, I.; Fatkhullin, I.; Gasanov, E.; Li, Z.; Gorbunov, E. 3PC: Three point compressors for communication-efficient distributed training and a better theory for lazy aggregation. In Proceedings of the 39th ICML, Baltimore, MD, USA, 17–23 July 2022; pp. 18596–18648.
- Shi, W.; Ling, Q.; Wu, G.; Yin, W. A proximal gradient algorithm for decentralized composite optimization. IEEE Trans. Signal Process. 2015, 63, 6013–6023.
- Alghunaim, S.; Yuan, K.; Sayed, A.H. A linearly convergent proximal gradient algorithm for decentralized optimization. In Proceedings of the 32nd NeurIPS, Vancouver, BC, Canada, 8–14 December 2019.
- Guo, L.; Shi, X.; Yang, S.; Cao, J. DISA: A dual inexact splitting algorithm for distributed convex composite optimization. IEEE Trans. Autom. Control 2024, 69, 2995–3010.
- Li, W.; Liu, Y.; Tian, Z.; Ling, Q. COLA: Communication-censored linearized ADMM for decentralized consensus optimization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 5237–5241.
| | Distributed Logistic Regression | | Distributed Ridge Regression | | Distributed LASSO | |
|---|---|---|---|---|---|---|
| | a9a | ijcnn1 | a9a | ijcnn1 | a9a | ijcnn1 |
| Convergence Error | | | | | | |
| Method | Distributed Logistic Regression | | Distributed Ridge Regression | | Distributed LASSO | |
|---|---|---|---|---|---|---|
| | a9a | ijcnn1 | a9a | ijcnn1 | a9a | ijcnn1 |
| P2D2 | 468 | 811 | - | - | 4230 | 409 |
| PG-EXTRA | 470 | 813 | - | - | 4231 | 410 |
| CC-DQM | - | - | 246 | 79 | - | - |
| DRUID-Gradient | 845 | 884 | 977 | 176 | 3348 | 445 |
| CE-DADMM-Gradient:EF21 | 845 | 884 | 977 | 171 | 3348 | 446 |
| CE-DADMM-Gradient:CLAG | 845 | 884 | 980 | 170 | 3349 | 446 |
| DRUID-Newton | 154 | 559 | 197 | 60 | 1910 | 318 |
| CE-DADMM-Newton:EF21 | 155 | 559 | 252 | 60 | 1912 | 318 |
| CE-DADMM-Newton:CLAG | 155 | 559 | 194 | 64 | 1912 | 318 |
| DRUID-BFGS | 684 | 566 | 330 | 77 | 2890 | 399 |
| CE-DADMM-BFGS:EF21 | 478 | 567 | 325 | 73 | 2890 | 400 |
| CE-DADMM-BFGS:CLAG | 478 | 566 | 327 | 69 | 2890 | 399 |
| Method | Distributed Logistic Regression | | Distributed Ridge Regression | | Distributed LASSO | |
|---|---|---|---|---|---|---|
| | a9a | ijcnn1 | a9a | ijcnn1 | a9a | ijcnn1 |
| P2D2 | 116,056,832 | 38,539,264 | - | - | 862,720,000 | 29,344,768 |
| PG-EXTRA | 125,088,000 | 30,080,000 | - | - | 861,520,000 | 20,101,504 |
| CC-DQM | - | - | 20,782,080 | 1,219,680 | - | - |
| DRUID-Gradient | 146,340,480 | 27,382,784 | 169,200,768 | 5,451,776 | 579,820,032 | 13,784,320 |
| CE-DADMM-Gradient:EF21 | 35,692,800 | 7,468,032 | 41,268,480 | 1,444,608 | 141,419,520 | 3,767,808 |
| CE-DADMM-Gradient:CLAG | 35,650,560 | 7,459,584 | 41,352,960 | 1,427,712 | 141,419,520 | 3,759,360 |
| DRUID-Newton | 26,670,336 | 17,315,584 | 34,117,248 | 1,858,560 | 330,781,440 | 9,850,368 |
| CE-DADMM-Newton:EF21 | 6,547,200 | 4,722,432 | 10,644,480 | 506,880 | 80,762,880 | 2,686,464 |
| CE-DADMM-Newton:CLAG | 6,504,960 | 4,713,984 | 8,152,320 | 532,224 | 80,720,640 | 2,678,016 |
| DRUID-BFGS | 118,457,856 | 17,532,416 | 57,150,720 | 2,385,152 | 500,501,760 | 12,359,424 |
| CE-DADMM-BFGS:EF21 | 20,190,720 | 4,790,016 | 13,728,000 | 616,704 | 122,073,600 | 3,379,200 |
| CE-DADMM-BFGS:CLAG | 20,148,480 | 4,773,120 | 13,770,240 | 574,464 | 122,031,360 | 3,362,304 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).