GenAI-Based Digital Twins Aided Data Augmentation Increases Accuracy in Real-Time Cokurtosis-Based Anomaly Detection of Wearable Data
Abstract
1. Introduction
2. Methodology
- Integrating real user wearable data with streaming data, GenAI-WGAN-generated digital twins and synthetic twins, and fourth-order moment-based anomaly detection, which offers a new perspective on healthcare data analysis while addressing privacy concerns.
- Validating digital twins and synthetic datasets by comparing their statistical signatures with those of real datasets, to ensure the reliability and accuracy of the generated data.
- Enhancing real user wearable datasets by augmenting them with generated digital twins and synthetic datasets, thereby enriching the data available for analysis.
- Implementing an anomaly detection methodology based on the fourth-order moments of physiological parameters. This approach applies to both univariate and multivariate datasets, so multiple physiological parameters can be analyzed together.
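The fourth-order moment idea in the last item can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each window of wearable data is a samples-by-features array, that a feature moment metric (FMM) is the singular-value-weighted squared loading of each feature on the left singular vectors of the unfolded cokurtosis tensor, and that two windows are compared through the Hellinger distance (HD) between their FMMs. The function names (`cokurtosis_cumulant`, `feature_moment_metrics`, `hellinger`) are hypothetical, the data are random numbers rather than real measurements, and a plain SVD of the unfolded tensor is used in place of a full HOSVD.

```python
import numpy as np


def cokurtosis_cumulant(window):
    """Fourth-order joint cumulant of a (samples x features) window,
    unfolded into a (d x d**3) matrix."""
    z = (window - window.mean(axis=0)) / window.std(axis=0)  # standardize features
    n, d = z.shape
    moment = np.einsum("ni,nj,nk,nl->ijkl", z, z, z, z) / n  # raw fourth moment
    c = z.T @ z / n                                          # second moment
    # Remove the part of the fourth moment that a Gaussian would also produce.
    gaussian = (np.einsum("ij,kl->ijkl", c, c)
                + np.einsum("ik,jl->ijkl", c, c)
                + np.einsum("il,jk->ijkl", c, c))
    return (moment - gaussian).reshape(d, d ** 3)


def feature_moment_metrics(window):
    """Share of each feature in the window's fourth-order structure.
    The returned values are non-negative and sum to 1."""
    u, s, _ = np.linalg.svd(cokurtosis_cumulant(window), full_matrices=False)
    return (u ** 2 @ s) / s.sum()


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


# Illustrative data only: three hypothetical physiological channels.
rng = np.random.default_rng(0)
baseline = rng.normal(size=(500, 3))
spiked = baseline.copy()
spiked[:10, 0] += 12.0  # a few extreme readings in the first channel

fmm_baseline = feature_moment_metrics(baseline)
fmm_spiked = feature_moment_metrics(spiked)
print("baseline FMMs:", fmm_baseline)
print("spiked FMMs:  ", fmm_spiked)
print("Hellinger distance:", hellinger(fmm_baseline, fmm_spiked))
```

In a streaming setting, one possible use is to compute the FMMs for each incoming window and flag the window when its Hellinger distance from the previous window exceeds a chosen threshold. The threshold and the window length are tuning choices that this sketch does not fix.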
2.1. Data Collection
2.2. Wasserstein Generative Adversarial Networks (GenAI)
2.3. Stream Data Processing
2.4. Anomaly Detection
Anomaly Detection in Health Data
2.5. Model Evaluation
3. Results and Discussion
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
WGANs | Wasserstein Generative Adversarial Networks |
DT | Digital twins |
HOSVD | Higher-order singular value decomposition |
FMMs | Feature moment metrics |
OHR | Overall heart rate |
AHR | Active heart rate |
RHR | Resting heart rate |
FNR | False negative rate |
CDC | Centers for Disease Control and Prevention |
bpm | Beats per minute |
HD | Hellinger distance |
LHS | Latin Hypercube Sampling |
SU | Synthetic users |
RD | Real data |
SD | Synthetic data |
ACT | Average computing time |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kamruzzaman, M.; Salinas, J.S.; Kolla, H.; Sale, K.L.; Balakrishnan, U.; Poorey, K. GenAI-Based Digital Twins Aided Data Augmentation Increases Accuracy in Real-Time Cokurtosis-Based Anomaly Detection of Wearable Data. Sensors 2025, 25, 5586. https://doi.org/10.3390/s25175586