Effective Model Update for Adaptive Classification of Text Streams in a Distributed Learning Environment
Abstract
1. Introduction
- We design a scalable classification model in a distributed learning environment that increases the parallelism of model learning, thereby resolving the bottleneck that arises in the learning step of the overall event stream classification pipeline (Section 3.1).
- Based on the distributed learning architecture, we propose two model update strategies: (1) the entire model update and (2) the partial model update. Because they balance learning efficiency and model accuracy differently, either can be selected according to the needs of the target application (Sections 3.3 and 3.4).
- As target classification models, we consider not only fully trainable language models based on CNN, RNN, and Bi-LSTM but also a pre-trained embedding model based on BERT. In particular, we identify the trainable partial modules that are common to deep learning-based classification models (Section 3.2).
- We conduct extensive experiments on two real tweet datasets and show the effectiveness of the proposed update strategies. Specifically, compared with the pre-trained offline model, the entire model update gradually improves classification accuracy by 28.96∼58.63%, and the partial model update improves it by 12.34∼50.92% while reducing learning time by 69.35∼93.95% relative to the entire model update. We also confirm the scalability of the proposed distributed learning architecture: compared with a single worker node, using three worker nodes reduces learning time by 34.03% for the entire model update and by 45.21% for the partial model update (Section 4).
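The difference between the two update strategies can be illustrated with a deliberately small sketch. It assumes synchronous data parallelism (each worker node computes gradients on its shard of a new time window, and a driver averages them), a hypothetical two-module model in which `embed_w` stands in for the encoder and `head_w`/`head_b` for the trainable partial module, and synthetic data. All names are hypothetical and this is not the authors' implementation; it only shows the mechanism behind the cheaper partial update, namely that frozen modules are neither updated nor backpropagated through.

```python
from typing import Dict, List, Set, Tuple

Params = Dict[str, float]
Example = Tuple[float, float]  # (feature x, label y)


def forward(params: Params, x: float) -> Tuple[float, float]:
    # Hypothetical two-module model: an "encoder" (embed_w) followed by a
    # task "head" (head_w, head_b). Returns (hidden value, prediction).
    h = params["embed_w"] * x
    return h, params["head_w"] * h + params["head_b"]


def mse(params: Params, data: List[Example]) -> float:
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)


def local_gradients(params: Params, shard: List[Example], trainable: Set[str]) -> Params:
    # Runs on one worker: mean-squared-error gradients on its shard, computed
    # only for trainable parameters (a frozen encoder needs no backpropagation).
    g = {k: 0.0 for k in trainable}
    for x, y in shard:
        h, pred = forward(params, x)
        dloss = 2.0 * (pred - y)
        if "head_w" in g:
            g["head_w"] += dloss * h
        if "head_b" in g:
            g["head_b"] += dloss
        if "embed_w" in g:
            g["embed_w"] += dloss * params["head_w"] * x
    return {k: v / len(shard) for k, v in g.items()}


def model_update(params: Params, shards: List[List[Example]], trainable: Set[str],
                 lr: float = 0.01, steps: int = 200) -> Params:
    params = dict(params)  # do not mutate the deployed model
    for _ in range(steps):
        # Workers compute gradients in parallel (simulated sequentially here).
        grads = [local_gradients(params, s, trainable) for s in shards]
        # The driver averages worker gradients; frozen parameters stay unchanged.
        for k in trainable:
            params[k] -= lr * sum(g[k] for g in grads) / len(grads)
    return params


ENTIRE: Set[str] = {"embed_w", "head_w", "head_b"}
PARTIAL: Set[str] = {"head_w", "head_b"}  # hypothetical trainable partial module

initial: Params = {"embed_w": 1.0, "head_w": 0.5, "head_b": 0.0}
window = [(i / 10, 2 * i / 10 + 1) for i in range(30)]  # synthetic new time window
shards = [window[i::3] for i in range(3)]  # three worker nodes

entire = model_update(initial, shards, ENTIRE)
partial = model_update(initial, shards, PARTIAL)
print(f"before={mse(initial, window):.3f} entire={mse(entire, window):.3f} "
      f"partial={mse(partial, window):.3f}")
```

Under these assumptions, `partial` keeps `embed_w` at its initial value while both strategies reduce the error on the new window. In a real deep model, skipping backpropagation into the frozen modules is where the learning-time savings would come from.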
2. Related Work
2.1. Data Stream Classification
2.2. Distributed Learning
2.3. Continual Learning
2.4. Distributed Online Learning
2.5. Summary
3. Proposed Method
3.1. Overall Framework
3.2. Classification Model
3.3. Entire Model Update
3.4. Partial Model Update
3.5. Application to Pre-Trained Embedding Model
4. Performance Evaluation
4.1. Datasets
4.1.1. CSI Dataset
4.1.2. Disaster Dataset
4.2. Experimental Methods and Environments
4.3. Experimental Results
4.3.1. Model Accuracy
4.3.2. Model Learning Time
4.3.3. Scalability on a Cluster
4.3.4. Case Study
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
CNN | Convolutional Neural Network |
RNN | Recurrent Neural Network |
LSTM | Long Short-Term Memory |
NLP | Natural Language Processing |
BERT | Bidirectional Encoder Representations from Transformers |
CSI | Cyber-Security Intelligence |
Bi-LSTM | Bidirectional Long Short-Term Memory |
References
- Weng, J.; Lee, B.S. Event detection in twitter. Proc. Int. AAAI Conf. Web Soc. Media 2011, 5, 401–408.
- Batool, R.; Khattak, A.M.; Maqbool, J.; Lee, S. Precise tweet classification and sentiment analysis. In Proceedings of the 2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS), Niigata, Japan, 16–20 June 2013; Volume 5, pp. 461–466.
- Shin, H.S.; Kwon, H.Y.; Ryu, S.J. A new text classification model based on contrastive word embedding for detecting cybersecurity intelligence in twitter. Electronics 2020, 9, 1527.
- Kim, M.S.; Kwon, H.Y. Distributed Classification Model of Streaming Tweets based on Dynamic Model Update. In Proceedings of the 2022 IEEE International Conference on Big Data and Smart Computing (BigComp), Daegu, Republic of Korea, 17–20 January 2022; pp. 47–51.
- Nishida, K.; Hoshide, T.; Fujimura, K. Improving tweet stream classification by detecting changes in word probability. In Proceedings of the 35th international ACM SIGIR conference on Research and Development in Information Retrieval, Portland, OR, USA, 12–16 August 2012; pp. 971–980.
- Weiler, A.; Grossniklaus, M.; Scholl, M.H. Event identification and tracking in social media streaming data. In Proceedings of the EDBT/ICDT, Athens, Greece, 28 March 2014; pp. 282–287.
- Nguyen, D.T.; Jung, J.J. Real-time event detection on social data stream. Mob. Net. Appl. 2015, 20, 475–486.
- Hasan, R.A.; Alhayali, R.A.I.; Zaki, N.D.; Ali, A.H. An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark. TELKOMNIKA (Telecommun. Comput. Electron. Control.) 2019, 17, 3086–3099.
- Zyblewski, P.; Sabourin, R.; Woźniak, M. Data preprocessing and dynamic ensemble selection for imbalanced data stream classification. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2019; pp. 367–379.
- Krawczyk, B.; Cano, A. Adaptive Ensemble Active Learning for Drifting Data Stream Mining. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; pp. 2763–2771.
- Bermejo, U.; Almeida, A.; Bilbao-Jayo, A.; Azkune, G. Embedding-based real-time change point detection with application to activity segmentation in smart home time series data. Expert Syst. Appl. 2021, 185, 115641.
- Malialis, K.; Panayiotou, C.G.; Polycarpou, M.M. Nonstationary data stream classification with online active learning and siamese neural networks. Neurocomputing 2022, 512, 235–252.
- Wang, J.; Kolar, M.; Srebro, N.; Zhang, T. Efficient distributed learning with sparsity. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 3636–3645.
- Wang, S.; Tuor, T.; Salonidis, T.; Leung, K.K.; Makaya, C.; He, T.; Chan, K. Adaptive federated learning in resource constrained edge computing systems. IEEE J. Sel. Areas Commun. 2019, 37, 1205–1221.
- Chen, Y.; Ning, Y.; Slawski, M.; Rangwala, H. Asynchronous online federated learning for edge devices with non-iid data. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 15–24.
- Wang, Y.; Guo, L.; Zhao, Y.; Yang, J.; Adebisi, B.; Gacanin, H.; Gui, G. Distributed learning for automatic modulation classification in edge devices. IEEE Wirel. Commun. Lett. 2020, 9, 2177–2181.
- Hsieh, K.; Phanishayee, A.; Mutlu, O.; Gibbons, P. The non-iid data quagmire of decentralized machine learning. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 13–18 July 2020; pp. 4387–4398.
- Abad, M.S.H.; Ozfatura, E.; Gunduz, D.; Ercetin, O. Hierarchical federated learning across heterogeneous cellular networks. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 8866–8870.
- Cha, H.; Park, J.; Kim, H.; Bennis, M.; Kim, S.L. Proxy experience replay: Federated distillation for distributed reinforcement learning. IEEE Intell. Syst. 2020, 35, 94–101.
- Park, J.; Samarakoon, S.; Elgabli, A.; Kim, J.; Bennis, M.; Kim, S.L.; Debbah, M. Communication-efficient and distributed learning over wireless networks: Principles and applications. Proc. IEEE 2021, 109, 796–819.
- Jiang, Y.; Wang, S.; Valls, V.; Ko, B.J.; Lee, W.H.; Leung, K.K.; Tassiulas, L. Model pruning enables efficient federated learning on edge devices. IEEE Trans. Neural Net. Learn. Syst. 2022, 1–13.
- Tekin, C.; Van Der Schaar, M. Distributed online learning via cooperative contextual bandits. IEEE Trans. Signal Process. 2015, 63, 3700–3714.
- Zhang, W.; Zhao, P.; Zhu, W.; Hoi, S.C.; Zhang, T. Projection-free distributed online learning in networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 4054–4062.
- Li, C.; Zhou, P.; Xiong, L.; Wang, Q.; Wang, T. Differentially private distributed online learning. IEEE Trans. Knowl. Data Eng. 2018, 30, 1440–1453.
- Paternain, S.; Lee, S.; Zavlanos, M.M.; Ribeiro, A. Distributed constrained online learning. IEEE Trans. Signal Process. 2020, 68, 3486–3499.
- Wu, Y.C.; Lin, C.; Quek, T.Q. A Robust Distributed Hierarchical Online Learning Approach for Dynamic MEC Networks. IEEE J. Sel. Areas Commun. 2021, 40, 641–656.
- Mittal, V.; Kashyap, I. Empirical study of impact of various concept drifts in data stream mining methods. Int. J. Intell. Syst. Appl. 2016, 8, 65.
- Ed-daoudy, A.; Maalmi, K. Application of machine learning model on streaming health data event in real-time to predict health status using spark. In Proceedings of the 2018 International Symposium on Advanced Electrical and Communication Technologies (ISAECT), Rabat, Morocco, 21–23 November 2018; pp. 1–4.
- Gupta, O.; Raskar, R. Distributed learning of deep neural network over multiple agents. J. Netw. Comput. Appl. 2018, 116, 1–8.
- Huang, Z.; Hu, R.; Guo, Y.; Chan-Tin, E.; Gong, Y. DP-ADMM: ADMM-based distributed learning with differential privacy. IEEE Trans. Inf. Forensics Secur. 2019, 15, 1002–1012.
- Gao, Z.; Gama, F.; Ribeiro, A. Wide and deep graph neural network with distributed online learning. IEEE Trans. Signal Process. 2022, 70, 3862–3877.
- Zaharia, M.; Chowdhury, M.; Franklin, M.J.; Shenker, S.; Stoica, I. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10), Boston, MA, USA, 22–25 June 2010.
- Dünner, C.; Parnell, T.; Atasu, K.; Sifalakis, M.; Pozidis, H. Understanding and optimizing the performance of distributed machine learning applications on apache spark. In Proceedings of the 2017 IEEE International Conference on Big Data (big data), Boston, MA, USA, 11–14 December 2017; pp. 331–338.
- Zhao, S.Y.; Xiang, R.; Shi, Y.H.; Gao, P.; Li, W.J. Scope: Scalable composite optimization for learning on spark. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Alkhoury, F.; Wegener, D.; Sylla, K.H.; Mock, M. Communication efficient distributed learning of neural networks in Big Data environments using Spark. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 3871–3877.
- Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
- Zenke, F.; Poole, B.; Ganguli, S. Continual learning through synaptic intelligence. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 3987–3995.
- Mirzadeh, S.I.; Farajtabar, M.; Pascanu, R.; Ghasemzadeh, H. Understanding the role of training regimes in continual learning. Adv. Neural Inf. Process. Syst. 2020, 33, 7308–7320.
- Li, Z.; Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2935–2947.
- Rebuffi, S.A.; Kolesnikov, A.; Sperl, G.; Lampert, C.H. icarl: Incremental classifier and representation learning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2001–2010.
- Castro, F.M.; Marín-Jiménez, M.J.; Guil, N.; Schmid, C.; Alahari, K. End-to-end incremental learning. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 233–248.
- Chaudhry, A.; Ranzato, M.; Rohrbach, M.; Elhoseiny, M. Efficient lifelong learning with a-gem. arXiv 2018, arXiv:1812.00420.
- Wang, Z.; Mehta, S.V.; Póczos, B.; Carbonell, J. Efficient meta lifelong-learning with limited memory. arXiv 2020, arXiv:2010.02500.
- Shin, H.; Lee, J.K.; Kim, J.; Kim, J. Continual learning with deep generative replay. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 2990–2999.
- Wang, L.; Yang, K.; Li, C.; Hong, L.; Li, Z.; Zhu, J. Ordisco: Effective and efficient usage of incremental unlabeled data for semi-supervised continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Conference, 9–25 June 2021; pp. 5383–5392.
- Rusu, A.A.; Rabinowitz, N.C.; Desjardins, G.; Soyer, H.; Kirkpatrick, J.; Kavukcuoglu, K.; Pascanu, R.; Hadsell, R. Progressive neural networks. arXiv 2016, arXiv:1606.04671.
- Mallya, A.; Lazebnik, S. Packnet: Adding multiple tasks to a single network by iterative pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7765–7773.
- Mallya, A.; Davis, D.; Lazebnik, S. Piggyback: Adapting a single network to multiple tasks by learning to mask weights. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 67–82.
- Rebuffi, S.A.; Bilen, H.; Vedaldi, A. Efficient parametrization of multi-domain deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8119–8127.
- Ashfahani, A.; Pratama, M. Autonomous deep learning: Continual learning approach for dynamic environments. In Proceedings of the 2019 SIAM International Conference on Data Mining, Calgary, AB, Canada, 2–4 May 2019; pp. 666–674.
- Yoon, J.; Jeong, W.; Lee, G.; Yang, E.; Hwang, S.J. Federated continual learning with weighted inter-client transfer. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 13–14 August 2021; pp. 12073–12086.
- Cano, A.; Krawczyk, B. ROSE: Robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams. Mach. Learn. 2022, 111, 2561–2599.
- Ruder, S.; Plank, B. Strong baselines for neural semi-supervised learning under domain shift. arXiv 2018, arXiv:1804.09530.
- Yoo, D.; Kweon, I.S. Learning loss for active learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 93–102.
- Smith, J.; Taylor, C.; Baer, S.; Dovrolis, C. Unsupervised progressive learning and the STAM architecture. arXiv 2019, arXiv:1904.02021.
- Aghdam, H.H.; Gonzalez-Garcia, A.; Weijer, J.v.d.; López, A.M. Active learning for deep detection neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3672–3680.
- Tiwari, P.; Uprety, S.; Dehdashti, S.; Hossain, M.S. TermInformer: Unsupervised term mining and analysis in biomedical literature. Neural Comput. Appl. 2020, 1–14.
- Ashfahani, A.; Pratama, M. Unsupervised Continual Learning in Streaming Environments. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–12.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805.
- Zhu, Y.; Kiros, R.; Zemel, R.; Salakhutdinov, R.; Urtasun, R.; Torralba, A.; Fidler, S. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 19–27.
- You, Y.; Li, J.; Reddi, S.; Hseu, J.; Kumar, S.; Bhojanapalli, S.; Song, X.; Demmel, J.; Keutzer, K.; Hsieh, C.J. Large batch optimization for deep learning: Training bert in 76 minutes. arXiv 2019, arXiv:1904.00962.
- Chen, X.; Cheng, Y.; Wang, S.; Gan, Z.; Wang, Z.; Liu, J. Earlybert: Efficient bert training via early-bird lottery tickets. arXiv 2020, arXiv:2101.00063.
- Apronti, P.T.; Osamu, S.; Otsuki, K.; Kranjac-Berisavljevic, G. Education for disaster risk reduction (DRR): Linking theory with practice in Ghana’s basic schools. Sustainability 2015, 7, 9160–9186.
Number of Tweets | Ingestion Time (s) | Learning Time (s) | Inference Time (s) |
---|---|---|---|
50,000 | 266 | 1120 | 0.0671 |
100,000 | 523 | 2467 | 0.0784 |
150,000 | 751 | 4076 | 0.0854 |
200,000 | 1094 | 5604 | 0.1345 |
Papers | Streaming Classification | Distributed Learning | Dynamic Model Update | Model Learning Efficiency |
---|---|---|---|---|
[5] | O | O | ||
[7] | O | |||
[12] | O | O | ||
[14] | O | O | O | |
[20] | O | O | ||
[23] | O | O | O | |
[29] | O | |||
[30] | O | O | ||
[40] | O | O | ||
[48] | O | |||
[51] | O | O | ||
[52] | O | O | O | |
Our model | O | O | O | O |
Dataset | Total Data Size | Number of Time Windows | Tweet Posting Period (Years) |
---|---|---|---|
CSI Dataset | 1,000,000 tweets | 6 | 2007 ∼ 2015 |
Disaster Dataset | 1,400,000 tweets | 5 | 2007 ∼ 2015 |
Model | Update Strategy | 1st Time Window | 2nd Time Window | 3rd Time Window | 4th Time Window | 5th Time Window | 6th Time Window |
---|---|---|---|---|---|---|---|
CNN | Entire | 489.35 | 530.83 | 790.23 | 1040.21 | 1250.54 | 1560.38 |
CNN | Partial | 130.13 | 143.62 | 131.99 | 156.23 | 139.37 | 147.01 |
RNN | Entire | 300.1 | 440.63 | 532.44 | 640.34 | 784.78 | 838.81 |
RNN | Partial | 91.4 | 97.17 | 86.66 | 78.28 | 75.91 | 73.26 |
Bi-LSTM | Entire | 1550.45 | 1750.27 | 2250.17 | 3240.83 | 4010.45 | 4630.2 |
Bi-LSTM | Partial | 1430.59 | 1382.2 | 1357.06 | 1436.02 | 1402.12 | 1419.21 |
BERT | Partial | 60.36 | 62.27 | 70.23 | 51.29 | 54.33 | 58.15 |
Model | Update Strategy | 1st Time Window | 2nd Time Window | 3rd Time Window | 4th Time Window | 5th Time Window |
---|---|---|---|---|---|---|
CNN | Entire | 210.16 | 745.88 | 1140.34 | 1754.71 | 2165.9 |
CNN | Partial | 138.45 | 131.62 | 127.17 | 134.8 | 131.01 |
RNN | Entire | 173.3 | 517.62 | 820.29 | 1209.09 | 1876.43 |
RNN | Partial | 116.17 | 112.53 | 109.8 | 128.41 | 115.69 |
Bi-LSTM | Entire | 1840.23 | 4073.32 | 7580.61 | 8073.73 | 12587.11 |
Bi-LSTM | Partial | 1770.3 | 1520.13 | 1646.1 | 1602.36 | 1457.14 |
BERT | Partial | 159.66 | 127.52 | 133.68 | 149.84 | 130.62 |
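The per-window saving of the partial update over the entire update can be recomputed from the CNN rows of the 6-window (CSI) and 5-window (Disaster) tables above as (entire - partial) / entire. This is only arithmetic on the reported values, so the unit of the tables does not affect the percentages; the function and variable names are illustrative.

```python
from typing import Dict, List


def reduction_pct(entire: float, partial: float) -> float:
    """Percentage by which the partial update's time undercuts the entire update's."""
    return 100.0 * (entire - partial) / entire


# CNN rows copied from the 6-window (CSI) and 5-window (Disaster) tables.
cnn: Dict[str, Dict[str, List[float]]] = {
    "CSI": {
        "entire": [489.35, 530.83, 790.23, 1040.21, 1250.54, 1560.38],
        "partial": [130.13, 143.62, 131.99, 156.23, 139.37, 147.01],
    },
    "Disaster": {
        "entire": [210.16, 745.88, 1140.34, 1754.71, 2165.9],
        "partial": [138.45, 131.62, 127.17, 134.8, 131.01],
    },
}

# One rounded percentage per time window and dataset.
savings = {
    name: [round(reduction_pct(e, p), 2) for e, p in zip(rows["entire"], rows["partial"])]
    for name, rows in cnn.items()
}
for name, values in savings.items():
    print(name, values)
```

For the last Disaster window this gives 93.95%, the upper bound of the learning-time reduction quoted in the introduction.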
Update Strategy | Number of Worker Nodes | Learning Time | Inference Time |
---|---|---|---|
Entire Model Update | 1 | 1834.23 | 0.1599 |
Entire Model Update | 2 | 1587.05 | 0.0967 |
Entire Model Update | 3 | 1210.04 | 0.0681 |
Partial Model Update | 1 | 1668.87 | 0.1362 |
Partial Model Update | 2 | 1424.88 | 0.0857 |
Partial Model Update | 3 | 914.30 | 0.0676 |
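The scalability figures quoted in the introduction follow from the worker-node table above: the reduction for n workers is (T1 - Tn) / T1, where T1 is the learning time on a single worker node. A short check, with illustrative names:

```python
from typing import Dict


def reduction_vs_single_worker(times: Dict[int, float]) -> Dict[int, float]:
    """Learning-time reduction (%) of each cluster size relative to one worker."""
    base = times[1]
    return {n: round(100.0 * (base - t) / base, 2) for n, t in sorted(times.items())}


# Learning times copied from the worker-node table.
learning_time = {
    "Entire Model Update": {1: 1834.23, 2: 1587.05, 3: 1210.04},
    "Partial Model Update": {1: 1668.87, 2: 1424.88, 3: 914.30},
}

for strategy, times in learning_time.items():
    print(strategy, reduction_vs_single_worker(times))
```

With three worker nodes this yields 34.03% for the entire model update and 45.21% for the partial model update, matching the values reported in the introduction.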
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, M.-S.; Lim, B.-Y.; Lee, K.; Kwon, H.-Y. Effective Model Update for Adaptive Classification of Text Streams in a Distributed Learning Environment. Sensors 2022, 22, 9298. https://doi.org/10.3390/s22239298