2SpamH: A Two-Stage Pre-Processing Algorithm for Passively Sensed mHealth Data
Abstract
1. Introduction
1.1. Background
1.2. Downward Bias in Passive Sensing Data
1.3. Existing Methods for Evaluating Sensor Data Validity
1.4. Our Solution
2. Materials and Methods
2.1. Notation and Specifications
2.2. Stage 1 of 2SpamH: Feature Space Construction via Principal Component Analysis and Prototype Selection
Algorithm 1. Pseudocode of the 2SpamH algorithm, which has two stages: (1) prototype selection in the feature space of device use and sensor activity levels, which labels data points as “missing” or “non-missing” with some confidence based on a threshold; and (2) a k-nearest neighbors (KNN) approach that labels non-prototype data points in the feature space based on their proximity to the labeled prototypes. The algorithm returns “missing” labels for all data points.
2SpamH Algorithm
Input: sensor activity matrix, device usage matrix, prototype selection percentiles, number of nearest neighbors
Output: missing label matrix
Stage 1: Prototype Selection
1. Perform PCA on the sensor activity matrix and on the device usage matrix, and keep the first principal component of each, a vector of length T. If either matrix has only one column, use that column directly in place of its first principal component.
2. Construct the feature space F as the set of points, one for each t, whose coordinates are the tth entries of the two first principal components.
3. Compute the lower and upper quantiles of each coordinate at the prototype selection percentiles.
4. Identify the set of missing prototypes in the feature space F: the points that fall below the lower quantile on both coordinates.
5. Identify the set of non-missing prototypes in the feature space F: the points that fall above the upper quantile on both coordinates.
6. For each data point in F:
7. Assign the label “missing” or “non-missing” to the corresponding row if the point falls within the missing or the non-missing prototype region; otherwise leave it unlabeled.
Stage 2: Labeling Unlabeled Data Using KNN
8. For each unlabeled data point that was not assigned a label in Stage 1:
9. Apply KNN with the given number of nearest neighbors and the Euclidean distance function to label the point from the labeled prototypes.
Return: missing label matrix
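The two stages of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `first_pc` and `two_spamh_sketch`, the single `lower`/`upper` percentile pair shared by both coordinates, the sign convention for the principal component, and the tie-breaking rule in the vote are all choices made here for illustration. The sketch returns one Boolean label per row (time point) rather than a full label matrix, and it applies no rescaling to the two coordinates before computing Euclidean distances.

```python
import numpy as np


def first_pc(X):
    """Scores on the first principal component of X (T rows).

    A single-column input is returned as-is, mirroring step 1.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if X.shape[1] == 1:
        return X[:, 0]
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0]
    # Assumption: the sign of a PC is arbitrary, so orient it so that
    # larger raw values give larger scores ("low" then means low activity).
    if v.sum() < 0:
        v = -v
    return Xc @ v


def two_spamh_sketch(sensor, usage, lower=0.1, upper=0.9, k=5):
    """Return a Boolean vector of length T: True means 'missing'."""
    # Stage 1: feature space and prototype selection.
    F = np.column_stack([first_pc(usage), first_pc(sensor)])
    lo = np.quantile(F, lower, axis=0)  # lower quantile per coordinate
    hi = np.quantile(F, upper, axis=0)  # upper quantile per coordinate

    labels = np.full(F.shape[0], -1)          # -1 = unlabeled
    labels[np.all(F <= lo, axis=1)] = 1       # low/low   -> missing prototype
    labels[np.all(F >= hi, axis=1)] = 0       # high/high -> non-missing prototype

    proto_idx = np.flatnonzero(labels >= 0)
    if proto_idx.size == 0:
        raise ValueError("no prototypes selected; widen the percentiles")

    # Stage 2: KNN vote among labeled prototypes (Euclidean distance).
    for t in np.flatnonzero(labels < 0):
        d = np.linalg.norm(F[proto_idx] - F[t], axis=1)
        nearest = proto_idx[np.argsort(d)[:k]]
        # Assumption: a strict majority is required to label a point missing.
        labels[t] = int(labels[nearest].mean() > 0.5)

    return labels.astype(bool)
```

Calling `two_spamh_sketch(sensor, usage, lower=0.25, upper=0.75, k=5)` on a T-row sensor matrix and a T-row usage matrix returns the per-row labels. Because the vote only consults Stage 1 prototypes, the labels assigned in Stage 2 do not depend on the order in which unlabeled points are visited.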
2.3. Stage 2 of 2SpamH: K-Nearest Neighbors Algorithm
2.4. Remarks on Parameter Tuning
2.5. Simulation Studies
2.6. Imputation Techniques
3. Results
3.1. Simulation Results
3.2. Real-Life mHealth Dataset Results
3.3. Imputation Results
4. Discussion and Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Device Usage | Number of Uploads | Implication | Prototype Label |
---|---|---|---|
High | High | Engaging with the device | Non-Missing |
High | Low | Using the device but inactive | N/A |
Low | High | Active, carrying/wearing the device but not engaging | N/A |
Low | Low | Not engaged and not carrying/wearing the device | Missing |
Measure | MissForest (Within-User) | JomoImpute | MissForest (Across Users) |
---|---|---|---|
Step Count | 0.44 | 0.44 | 0.31 |
Time at Home | 0.39 | 0.54 | 0.32 |
Conversation Percentage | 0.14 | 0.18 | 0.19 |
Time in Conversation | 0.24 | 0.30 | 0.24 |
Sleep Duration | 0.43 | 0.57 | 0.37 |
Travel Diameter | 0.59 | 0.88 | 0.44 |
Active Time | 0.37 | 0.52 | 0.27 |
Sleep Interruptions | 0.53 | 0.68 | 0.23 |
Radius of Gyration | 0.46 | 0.83 | 0.33 |
Total Activity Duration | 0.36 | 4.46 | 0.41 |
Total Location Duration | 0.42 | 0.52 | 0.26 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, H.; Diaz, J.L.; Kim, S.; Yu, Z.; Wu, Y.; Carter, E.; Banerjee, S. 2SpamH: A Two-Stage Pre-Processing Algorithm for Passively Sensed mHealth Data. Sensors 2024, 24, 7053. https://doi.org/10.3390/s24217053