Privacy-Preserving Vertical Federated KNN Feature Imputation Method
Abstract
1. Introduction
2. Problem Definition
2.1. Missing Data Formulation
2.2. Vertical Federated Imputation Settings
3. Vertical Federated k-Nearest Neighbors Feature Imputation
3.1. Fundamentals
3.2. Architecture
3.3. Algorithm
Algorithm 1. Federated process. represents the party set, represents the homomorphic encryption algorithm, Decry() represents the decryption algorithm, and Sort() represents distance sorting.
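The roles named in the Algorithm 1 caption (a set of parties, a homomorphic encryption step, Decry(), and Sort()) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: Paillier with toy primes stands in for the homomorphic scheme, a single coordinator is assumed to hold the private key, and the names `federated_knn_impute`, `partial_sq_dist`, `SCALE`, and the sample data are hypothetical.

```python
import math
import random

# Toy Paillier parameters (two Mersenne primes). Illustration only, NOT secure.
P, Q = 2**31 - 1, 2**61 - 1
N, N2, PHI = P * Q, (P * Q) ** 2, (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)   # private value: inverse of phi modulo N
SCALE = 10**6          # assumed fixed-point scale for real-valued distances


def encrypt(m):
    """Paillier encryption of a non-negative integer m < N, with g = N + 1."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + N * m) * pow(r, N, N2) % N2


def decrypt(c):
    """Paillier decryption (the Decry() role, assumed to sit with a coordinator)."""
    return (pow(c, PHI, N2) - 1) // N * MU % N


def add_encrypted(c1, c2):
    """Homomorphic addition: the product of two ciphertexts encrypts the sum."""
    return c1 * c2 % N2


def partial_sq_dist(rows, target):
    """Squared distance from row `target` to every row, on one party's columns."""
    q = rows[target]
    return [sum((a - b) ** 2 for a, b in zip(r, q)) for r in rows]


def federated_knn_impute(party_cols, feature, target, k):
    """Impute feature[target] from the k nearest rows that have an observed value."""
    # Each party encrypts its own partial distances; raw features stay local.
    enc = [[encrypt(round(d * SCALE)) for d in partial_sq_dist(rows, target)]
           for rows in party_cols]
    # Shares are combined per sample while still encrypted.
    total = enc[0]
    for share in enc[1:]:
        total = [add_encrypted(a, b) for a, b in zip(total, share)]
    # Only the summed distances are decrypted, then sorted (the Sort() role).
    dist = [decrypt(c) / SCALE for c in total]
    candidates = [i for i, v in enumerate(feature) if v is not None and i != target]
    nearest = sorted(candidates, key=lambda i: dist[i])[:k]
    return sum(feature[i] for i in nearest) / len(nearest), nearest


# Hypothetical vertically partitioned data: same five samples, different columns.
party_a = [[1.0], [1.1], [5.0], [0.9], [5.2]]
party_b = [[2.0, 0.5], [2.1, 0.4], [9.0, 3.0], [1.9, 0.6], [8.8, 3.1]]
feature = [10.0, None, 50.0, 12.0, 52.0]   # sample 1 has the missing value

value, nearest = federated_knn_impute([party_a, party_b], feature, target=1, k=2)
print(value, nearest)
```

In this sketch only the summed distances are ever decrypted, so neither party sees the other's feature values. A real deployment would need secure key sizes and a vetted cryptographic library.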
3.4. Implementation
4. Experimental Section
4.1. Environment
4.2. Data
4.3. Results
4.3.1. Federated Comparative Experiment
4.3.2. Centralized Comparative Experiment
4.3.3. Lossless Testing
4.3.4. Contribution to Regression
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Feature | Description | Feature Type |
---|---|---|
idx | 138 h to 185 h of records | int |
pm | Permanent magnet temperature (in °C) measured with thermocouples | float |
stator_yoke | Stator yoke temperature (in °C) measured with thermocouples | float |
stator_tooth | Stator tooth temperature (in °C) measured with thermocouples | float |
stator_winding | Stator winding temperature (in °C) measured with thermocouples | float |
motor_speed (label) | Motor speed (in rpm) | float |

Feature | Description | Feature Type |
---|---|---|
idx | 138 h to 185 h of records | int |
ambient | Ambient temperature (in °C) | float |
coolant | Coolant temperature (in °C) | float |
u_d | Voltage d-component measurement in dq-coordinates (in V) | float |
u_q | Voltage q-component measurement in dq-coordinates (in V) | float |
torque | Motor torque (in Nm) | float |
i_d | Current d-component measurement in dq-coordinates (in A) | float |
i_q | Current q-component measurement in dq-coordinates (in A) | float |

Missing Rate | Federated KNN | MAX | MIN | MEAN |
---|---|---|---|---|
10% | 0.3677 | 2.4136 | 2.3402 | 0.9861 |
20% | 0.3358 | 2.1008 | 2.1515 | 0.8188 |
30% | 0.3883 | 2.5465 | 2.2252 | 0.9650 |

Missing Rate | FKNN | MEAN | LR | KNN | RF |
---|---|---|---|---|---|
5% | 0.38509 | 1.06481 | 1.01250 | 0.83104 | 1.07105 |
10% | 0.33951 | 1.01277 | 0.95623 | 0.86715 | 1.02745 |
15% | 0.36231 | 1.04159 | 0.99785 | 0.92863 | 1.06034 |
20% | 0.43012 | 1.02356 | 0.97037 | 0.94683 | 1.03123 |
25% | 0.44781 | 1.05121 | 1.01357 | 1.05514 | 1.10138 |
30% | 0.48925 | 1.04509 | 1.00616 | 1.05207 | 1.08145 |
35% | 0.51976 | 1.02456 | 0.97432 | 1.08242 | 1.07756 |
40% | 0.53624 | 1.04242 | 0.99425 | 1.11375 | 1.06162 |
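
The comparison in the table above can be summarized with a short calculation. The sketch below assumes the reported values are an error metric where lower is better (the metric is not named in this excerpt) and, for each missing rate, finds the strongest baseline and the relative amount by which the FKNN value is lower. The names `results`, `BASELINES`, and `summarize` are hypothetical.

```python
# Values copied from the table above; column order: FKNN, MEAN, LR, KNN, RF.
results = {
    "5%":  (0.38509, 1.06481, 1.01250, 0.83104, 1.07105),
    "10%": (0.33951, 1.01277, 0.95623, 0.86715, 1.02745),
    "15%": (0.36231, 1.04159, 0.99785, 0.92863, 1.06034),
    "20%": (0.43012, 1.02356, 0.97037, 0.94683, 1.03123),
    "25%": (0.44781, 1.05121, 1.01357, 1.05514, 1.10138),
    "30%": (0.48925, 1.04509, 1.00616, 1.05207, 1.08145),
    "35%": (0.51976, 1.02456, 0.97432, 1.08242, 1.07756),
    "40%": (0.53624, 1.04242, 0.99425, 1.11375, 1.06162),
}
BASELINES = ("MEAN", "LR", "KNN", "RF")


def summarize(row):
    """Return (best baseline name, relative reduction of FKNN versus it)."""
    fknn, *others = row
    # Assumption: lower values are better, so the best baseline is the minimum.
    best_name, best = min(zip(BASELINES, others), key=lambda t: t[1])
    return best_name, 1 - fknn / best


summary = {rate: summarize(row) for rate, row in results.items()}
for rate, (name, reduction) in summary.items():
    print(f"{rate:>3}: best baseline {name}, FKNN lower by {reduction:.1%}")
```

Under that reading, FKNN has the lowest value at every missing rate, and the strongest baseline changes from KNN to LR once the missing rate reaches 25%.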

Missing Rate | 5% | 10% | 15% | 20% | 25% | 30% | 35% | 40% |
---|---|---|---|---|---|---|---|---|
p value | 0.6412 | 0.4416 | 0.7110 | 0.4836 | 0.1732 | 0.3608 | 0.9499 | 0.5784 |
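
One way to read the p values above is to compare each against a significance level. The sketch below assumes the conventional level of 0.05, which is not stated in this excerpt, and does not assume which statistical test produced the values. The names `p_values` and `ALPHA` are hypothetical.

```python
# p values copied from the table above, keyed by missing rate.
p_values = {
    "5%": 0.6412, "10%": 0.4416, "15%": 0.7110, "20%": 0.4836,
    "25%": 0.1732, "30%": 0.3608, "35%": 0.9499, "40%": 0.5784,
}
ALPHA = 0.05  # assumed significance level, not taken from the source

# Missing rates at which a difference would be flagged as significant.
significant = [rate for rate, p in p_values.items() if p < ALPHA]
# Missing rate with the smallest p value, i.e. the closest to the threshold.
closest = min(p_values, key=p_values.get)

print("significant at ALPHA:", significant)
print("smallest p value at:", closest, p_values[closest])
```

At this assumed level no missing rate shows a statistically significant difference, which is consistent with the section's framing as a lossless test, although failing to reject a difference is weaker evidence than demonstrating equivalence.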
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Du, W.; Wang, Y.; Meng, G.; Guo, Y. Privacy-Preserving Vertical Federated KNN Feature Imputation Method. Electronics 2024, 13, 381. https://doi.org/10.3390/electronics13020381