Open Access Article
A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors
by
Jilin Zhang 1,2,3,4,5,†, Hangdi Tu 1,2,†, Yongjian Ren 1,2,*, Jian Wan 1,2,4,5, Li Zhou 1,2, Mingwei Li 1,2, Jue Wang 6, Lifeng Yu 7,8, Chang Zhao 1,2 and Lei Zhang 9
1 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
2 Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, Hangzhou 310018, China
3 College of Electrical Engineering, Zhejiang University, Hangzhou 310058, China
4 School of Information and Electronic Engineering, Zhejiang University of Science & Technology, Hangzhou 310023, China
5 Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis, Hangzhou 310018, China
6 Supercomputing Center of Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
7 Hithink RoyalFlush Information Network Co., Ltd., Hangzhou 310023, China
8 Financial Information Engineering Technology Research Center of Zhejiang Province, Hangzhou 310023, China
9 Computer Science Department, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
† Jilin Zhang and Hangdi Tu contributed equally to this work and are co-first authors.
Abstract
To exploit the distributed nature of sensors, distributed machine learning has become the mainstream approach; however, the heterogeneous computing capabilities of sensors and network delays strongly affect the accuracy and convergence rate of the machine learning model. This paper proposes a parameter communication optimization strategy that balances training overhead against communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose Dynamic Finite Fault Tolerance (DFFT). Based on DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named the Dynamic Synchronous Parallel Strategy (DSP), which uses a performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, preserves the accuracy of the machine learning model, and prevents model training from being disturbed by tasks unrelated to the sensors.
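The abstract does not fix an implementation, so the following is a minimal Python sketch, under our own assumptions, of how a performance-monitoring model could dynamically adjust the synchronization (staleness) bound between worker nodes and a parameter server. The class name DynamicSyncServer, the base_staleness parameter, and the moving-average timing model are illustrative choices and are not taken from the paper.

import time
import numpy as np


class DynamicSyncServer:
    """Toy parameter server with a per-worker dynamic staleness bound."""

    def __init__(self, dim, n_workers, base_staleness=2):
        self.params = np.zeros(dim)
        self.clock = [0] * n_workers            # iterations completed per worker
        self.iter_time = [None] * n_workers     # moving average of iteration time
        self.base_staleness = base_staleness

    def allowed_staleness(self, worker):
        """Slower workers get a larger bound so fast workers do not block on them."""
        times = [t for t in self.iter_time if t is not None]
        if not times or self.iter_time[worker] is None:
            return self.base_staleness
        ratio = self.iter_time[worker] / min(times)
        return max(1, int(round(self.base_staleness * ratio)))

    def push(self, worker, grad, elapsed, lr=0.1):
        # Update the performance-monitoring model (exponential moving average).
        prev = self.iter_time[worker]
        self.iter_time[worker] = elapsed if prev is None else 0.7 * prev + 0.3 * elapsed
        # Apply the gradient and advance this worker's clock.
        self.params -= lr * grad
        self.clock[worker] += 1

    def can_proceed(self, worker):
        """A worker may start its next iteration only if it is not too far ahead."""
        return self.clock[worker] - min(self.clock) <= self.allowed_staleness(worker)


# Usage example with two simulated workers of unequal speed.
if __name__ == "__main__":
    ps = DynamicSyncServer(dim=4, n_workers=2)
    rng = np.random.default_rng(0)
    for step in range(5):
        for w, delay in enumerate((0.01, 0.03)):   # worker 1 is slower
            if ps.can_proceed(w):
                start = time.time()
                time.sleep(delay)                   # stand-in for local computation
                ps.push(w, rng.normal(size=4), time.time() - start)
    print("params:", ps.params, "clocks:", ps.clock)

The design choice illustrated here is that the allowed staleness of a worker scales with its observed per-iteration time, so faster sensors are not forced to idle while slower ones catch up; the actual DSP strategy in the paper may adjust synchronization differently.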