Characterisation of Temporal Patterns in Step Count Behaviour from Smartphone App Data: An Unsupervised Machine Learning Approach
Abstract
1. Introduction
2. Materials and Methods
2.1. Data
Data Preparation
2.2. Clustering Algorithm Choice
2.3. Model Specification
2.4. Model Evaluation
2.5. Cluster Characterisation
2.6. Cluster Interrelation
3. Results
3.1. K-Means Clustering Model Specification
3.2. Agglomerative Hierarchical Clustering Model Specification
3.3. Modelled Behaviours
3.3.1. Seasonal Physical Activity Behaviour
3.3.2. Weekly Physical Activity Behaviour
3.4. Model Evaluation
4. Cluster Characterisation
4.1. Cluster Demographics
4.1.1. Yearlong Physical Activity Behaviour Cluster Demographics
4.1.2. Weekly Cluster Demographics
4.2. Further Physical Activity Behaviours of the Clusters
4.2.1. Physical Activity Behaviour of Yearlong Clusters
4.2.2. Physical Activity Behaviour of Weekly Clusters
4.3. Cluster Interactions
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
WHO | World Health Organization |
MVPA | Moderate to Vigorous Physical Activity |
DBSCAN | Density-Based Spatial Clustering of Applications with Noise |
DTW | Dynamic Time Warping |
ARI | Adjusted Rand Index |
MI | Mutual Information |
AMI | Adjusted Mutual Information |
CDRC | Consumer Data Research Centre |
Appendix A
References
1. World Health Organization. Global Action Plan on Physical Activity 2018–2030: More Active People for a Healthier World; World Health Organization: Geneva, Switzerland, 2018.
2. Evenson, K.R.; Herring, A.H.; Wen, F. Accelerometry-assessed latent class patterns of physical activity and sedentary behavior with mortality. Am. J. Prev. Med. 2017, 52, 135–143.
3. Shiroma, E.J.; Lee, I.M.; Schepps, M.A.; Kamada, M.; Harris, T.B. Physical Activity Patterns and Mortality: The Weekend Warrior and Activity Bouts. Med. Sci. Sport. Exerc. 2019, 51, 35–40.
4. Servick, K. Mind the phone. Science 2015, 350, 1306–1309.
5. Bergman, P. The number of repeated observations needed to estimate the habitual physical activity of an individual to a given level of precision. PLoS ONE 2018, 13, e0192117.
6. O’Brien, W.J.; Shultz, S.P.; Firestone, R.T.; George, L.; Breier, B.H.; Kruger, R. Exploring the challenges in obtaining physical activity data from women using hip-worn accelerometers. Eur. J. Sport Sci. 2017, 17, 922–930.
7. Intille, S.S.; Lester, J.; Sallis, J.F.; Duncan, G. New horizons in sensor development. Med. Sci. Sport. Exerc. 2012, 44, S24.
8. Martinson, B.C.; Crain, A.L.; Sherwood, N.E.; Hayes, M.G.; Pronk, N.P.; O’Connor, P.J. Population reach and recruitment bias in a maintenance RCT in physically active older adults. J. Phys. Act. Health 2010, 7, 127–135.
9. McConnell, M.V.; Shcherbina, A.; Pavlovic, A.; Homburger, J.R.; Goldfeder, R.L.; Waggot, D.; Cho, M.K.; Rosenberger, M.E.; Haskell, W.L.; Myers, J.; et al. Feasibility of Obtaining Measures of Lifestyle From a Smartphone App: The MyHeart Counts Cardiovascular Health Study. J. Am. Med. Assoc. Cardiol. 2017, 2, 67–76.
10. van Wijk, D.C.; Groeniger, J.O.; van Lenthe, F.J.; Kamphuis, C.B. The role of the built environment in explaining educational inequalities in walking and cycling among adults in the Netherlands. Int. J. Health Geogr. 2017, 16, 1–12.
11. Treuth, M.S.; Catellier, D.J.; Schmitz, K.H.; Pate, R.R.; Elder, J.P.; McMurray, R.G.; Blew, R.M.; Yang, S.; Webber, L. Weekend and Weekday Patterns of Physical Activity in Overweight and Normal-weight Adolescent Girls. Obesity 2007, 15, 1782–1788.
12. Shephard, R.J.; Aoyagi, Y. Seasonal variations in physical activity and implications for human health. Eur. J. Appl. Physiol. 2009, 107, 251–271.
13. Evenson, K.R.; Wen, F.; Metzger, J.S.; Herring, A.H. Physical activity and sedentary behavior patterns using accelerometry from a national sample of United States adults. Int. J. Behav. Nutr. Phys. Act. 2015, 12, 20.
14. O’Connell, S.E.; Griffiths, P.L.; Clemes, S.A. Seasonal variation in physical activity, sedentary behaviour and sleep in a sample of UK adults. Ann. Hum. Biol. 2014, 41, 1–8.
15. O’Donovan, G.; Lee, I.M.; Hamer, M.; Stamatakis, E. Association of “Weekend Warrior” and Other Leisure Time Physical Activity Patterns With Risks for All-Cause, Cardiovascular Disease, and Cancer Mortality. J. Am. Med. Assoc. Intern. Med. 2017, 177, 335–342.
16. Shuval, K.; Li, Q.; Gabriel, K.P.; Tchernis, R. Income, physical activity, sedentary behavior, and the ’weekend warrior’ among U.S. adults. Prev. Med. 2017, 103, 91–97.
17. Wang, G.; Li, B.; Zhang, X.; Niu, C.; Li, J.; Li, L.; Speakman, J.R. No seasonal variation in physical activity of Han Chinese living in Beijing. Int. J. Behav. Nutr. Phys. Act. 2017, 14, 48.
18. Harari, G.M.; Lane, N.D.; Wang, R.; Crosier, B.S.; Campbell, A.T.; Gosling, S.D. Using Smartphones to Collect Behavioral Data in Psychological Science: Opportunities, Practical Considerations, and Challenges. Perspect. Psychol. Sci. 2016, 11, 838–854.
19. Tudor-Locke, C.; Bassett, D.R.; Swartz, A.M.; Strath, S.J.; Parr, B.B.; Reis, J.P.; DuBose, K.D.; Ainsworth, B.E. A Preliminary study of one year of pedometer self-monitoring. Ann. Behav. Med. 2004, 28, 158–162.
20. Althoff, T.; Hicks, J.L.; King, A.C.; Delp, S.L.; Leskovec, J. Large-scale physical activity data reveal worldwide activity inequality. Nature 2017, 547, 336.
21. Musakwa, W.; Selala, K.M. Mapping cycling patterns and trends using Strava Metro data in the city of Johannesburg, South Africa. Data Brief 2016, 9, 898–905.
22. Hong, J.; McArthur, D.P.; Livingston, M. The evaluation of large cycling infrastructure investments in Glasgow using crowdsourced cycle data. Transportation 2020, 47, 2859–2872.
23. Alattar, M.A.; Cottrill, C.; Beecroft, M. Modelling cyclists’ route choice using Strava and OSMnx: A case study of the City of Glasgow. Transp. Res. Interdiscip. Perspect. 2021, 9, 100301.
24. Camacho-Torregrosa, F.J.; Llopis-Castelló, D.; López-Maldonado, G.; García, A. An Examination of the Strava Usage Rate—A Parameter to Estimate Average Annual Daily Bicycle Volumes on Rural Roadways. Safety 2021, 7, 8.
25. Carroll, J.K.; Moorhead, A.; Bond, R.; LeBlanc, W.G.; Petrella, R.J.; Fiscella, K. Who Uses Mobile Phone Health Apps and Does Use Matter? A Secondary Data Analytics Approach. J. Med. Internet Res. 2017, 19, e125.
26. Hicks, J.L.; Althoff, T.; Kuhar, P.; Bostjancic, B.; King, A.C.; Leskovec, J.; Delp, S.L. Best practices for analyzing large-scale health data from wearables and smartphone apps. Nat. Partn. J. Digit. Med. 2019, 2, 1–12.
27. Tripathy, J.P. Secondary Data Analysis: Ethical Issues and Challenges. Iran J. Public Health 2013, 42, 1478–1479.
28. Pontin, F. Utilising smartphone data to explore spatial influences on physical activity. In Big Data Applications in Geography and Planning; Edward Elgar Publishing: Cheltenham, UK, 2021.
29. Pontin, F.; Lomax, N.; Clarke, G.; Morris, M.A. Socio-demographic determinants of physical activity and app usage from smartphone data. Soc. Sci. Med. 2021, 284, 114235.
30. Dobbins, C.; Rawassizadeh, R. Towards Clustering of Mobile and Smartwatch Accelerometer Data for Physical Activity Recognition. Informatics 2018, 5, 29.
31. Willetts, M.; Hollowell, S.; Aslett, L.; Holmes, C.; Doherty, A. Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. Sci. Rep. 2018, 8, 7961.
32. Jones, P.J.; James, M.K.; Davies, M.J.; Khunti, K.; Catt, M.; Yates, T.; Rowlands, A.V.; Mirkes, E.M. FilterK: A new outlier detection method for k-means clustering of physical activity. J. Biomed. Inform. 2020, 104, 103397.
33. Mooney, S.J.; Pejaver, V. Big data in public health: Terminology, machine learning, and privacy. Annu. Rev. Public Health 2018, 39, 95–112.
34. Sidey-Gibbons, J.A.M.; Sidey-Gibbons, C.J. Machine learning in medicine: A practical introduction. BioMed Cent. Med. Res. Methodol. 2019, 19, 64.
35. Rodriguez, M.Z.; Comin, C.H.; Casanova, D.; Bruno, O.M.; Amancio, D.R.; Costa, L.d.F.; Rodrigues, F.A. Clustering algorithms: A comparative approach. PLoS ONE 2019, 14, e0210236.
36. Consumer Data Research Centre. Active Inspiration Activity Data. 2016. Available online: https://data.cdrc.ac.uk/dataset/active-inspiration-activity-data (accessed on 20 September 2021).
37. Kraus, W.E.; Janz, K.F.; Powell, K.E.; Campbell, W.W.; Jakicic, J.M.; Troiano, R.P.; Sprow, K.; Torres, A.; Piercy, K.L.; Committee, P.A.G.A. Daily step counts for measuring physical activity exposure and its relation to health. Med. Sci. Sport. Exerc. 2019, 51, 1206.
38. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
39. Hauck, T. Scikit-Learn Cookbook; Packt Publishing: Birmingham, UK, 2014.
40. Romano, S.; Vinh, N.X.; Bailey, J.; Verspoor, K. Adjusting for chance clustering comparison measures. J. Mach. Learn. Res. 2016, 17, 4635–4666.
41. Guertler, D.; Vandelanotte, C.; Kirwan, M.; Duncan, M.J. Engagement and nonusage attrition with a free physical activity promotion program: The case of 10,000 steps Australia. J. Med. Internet Res. 2015, 17, e4339.
42. Levin, S.; Jacobs, D.R.J.; Ainsworth, B.E.; Richardson, M.T.; Leon, A.S. Intra-individual variation and estimates of usual physical activity. Ann. Epidemiol. 1999, 9, 481–488.
43. Seefeldt, V.; Malina, R.M.; Clark, M.A. Factors affecting levels of physical activity in adults. Sport. Med. 2002, 32, 143–168.
44. Centre for Diet and Activity Research. Evidence Brief 12—A Child for All Seasons—Addressing Seasonal Variation in Children’s Physical Activity. Report, University of Cambridge. 2016. Available online: https://www.cedar.iph.cam.ac.uk/wp-content/uploads/2018/11/Evidence-Brief-12-childrens-seasonal-variation-PA-v.1.0_15.9.16.pdf (accessed on 20 September 2021).
45. Department for Education. Guidance: Holiday Activities and Food Programme 2021. Available online: https://www.gov.uk/government/publications/holiday-activities-and-food-programme/holiday-activities-and-food-programme-2021 (accessed on 20 September 2021).
46. Hjorth, M.F.; Chaput, J.P.; Michaelsen, K.; Astrup, A.; Tetens, I.; Sjödin, A. Seasonal variation in objectively measured physical activity, sedentary time, cardio-respiratory fitness and sleep duration among 8–11 year-old Danish children: A repeated-measures study. BioMed Cent. Public Health 2013, 13, 1–10.
47. Goodman, A.; Page, A.S.; Cooper, A.R. Daylight saving time as a potential public health intervention: An observational study of evening daylight and objectively-measured physical activity among 23,000 children from 9 countries. Int. J. Behav. Nutr. Phys. Act. 2014, 11, 1–9.
48. Maddison, R.; Gemming, L.; Monedero, J.; Bolger, L.; Belton, S.; Issartel, J.; Marsh, S.; Direito, A.; Solenhill, M.; Zhao, J. Quantifying human movement using the Movn smartphone app: Validation and field study. J. Med. Internet Res. mHealth uHealth 2017, 5, e7167.
49. Kruger, J.; Ham, S.A.; Kohl, H.W. Characteristics of a “weekend warrior”: Results from two national surveys. Med. Sci. Sport. Exerc. 2007, 39, 796–800.
50. Lee, I.M.; Sesso, H.D.; Oguma, Y.; Paffenbarger, R.S.J. The “Weekend Warrior” and Risk of Mortality. Am. J. Epidemiol. 2004, 160, 636–641.
51. Hamer, M.; O’Donovan, G.; Lee, I.M.; Stamatakis, E. The ‘weekend warrior’ physical activity pattern: How little is enough? Br. J. Sport. Med. 2017, 51, 1384–1385.
52. Bond, D.S.; Raynor, H.A.; Phelan, S.; Steeves, J.; Daniello, R.; Wing, R.R. The relationship between physical activity variety and objectively measured moderate-to-vigorous physical activity levels in weight loss maintainers and normal-weight individuals. J. Obes. 2012, 2012, 812414.
53. Raynor, H.A.; Bond, D.S.; Steeves, J.; Thompson, D.L. Physical activity variety, energy expenditure, and body mass index. Am. J. Health Behav. 2014, 38, 624–630.
54. Department of Health & Social Care. UK Chief Medical Officers’ Physical Activity Guidelines. 2019. Available online: https://www.gov.uk/government/publications/physical-activity-guidelines-uk-chief-medical-officers-report (accessed on 20 September 2021).
55. Meyer, J.; Wasmann, M.; Heuten, W.; El Ali, A.; Boll, S.C. Identification and classification of usage patterns in long-term activity tracking. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, Denver, CO, USA, 6–11 May 2017; pp. 667–678.
56. Morissette, L.; Chartier, S. The k-means clustering technique: General considerations and implementation in Mathematica. Tutorials Quant. Methods Psychol. 2013, 9, 15–24.
57. Fränti, P.; Sieranoja, S. K-means properties on six clustering benchmark datasets. Appl. Intell. 2018, 48, 4743–4759.
58. Dynamic Time Warping. In Information Retrieval for Music and Motion; Springer: Berlin/Heidelberg, Germany, 2007; pp. 69–84.
59. Romeo, A.; Edney, S.; Plotnikoff, R.; Curtis, R.; Ryan, J.; Sanders, I.; Crozier, A.; Maher, C. Can smartphone apps increase physical activity? Systematic review and meta-analysis. J. Med. Internet Res. 2019, 21, e12053.
60. Sun, Y.; Du, Y.; Wang, Y.; Zhuang, L. Examining associations of environmental characteristics with recreational cycling behaviour by street-level Strava data. Int. J. Environ. Res. Public Health 2017, 14, 644.
61. Sun, Y.R.; Mobasheri, A. Utilizing Crowdsourced Data for Studies of Cycling and Air Pollution Exposure: A Case Study Using Strava Data. Int. J. Environ. Res. Public Health 2017, 14, 274.
62. Zwolinsky, S.; McKenna, J.; Pringle, A.; Widdop, P.; Griffiths, C. Physical activity assessment for public health: Efficacious use of the single-item measure. Public Health 2015, 129, 1630–1636.

Clustering Algorithm | Class of Algorithm | Example of Application to Seasonal Physical Activity Data |
---|---|---|
k-means | Partitioning | Yearly activity behaviours are partitioned into a predefined number of clusters, minimising within-cluster difference (the within-cluster sum of squares) and maximising between-cluster difference. Centroids are initially positioned at random and each year of activity behaviour is assigned to its nearest centroid; each centroid is then recalculated as the mean of the years assigned to it, and the process is repeated until the centroids no longer change [30]. The final centroids define the clusters. |
Agglomerative | Hierarchical/linkage | Initially, each unique year of activity behaviour is treated as its own cluster; the most similar clusters are then merged iteratively to form larger clusters. A dendrogram is used to choose the point at which merging stops and the remaining clusters are examined. |
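
The two procedures in the table can be sketched in Python as below. This is a minimal illustration on made-up data, not the authors' pipeline: each row of the hypothetical matrix `X` is a yearly profile of mean daily steps for each of 12 months. The k-means loop follows the assign-then-average steps described above, but swaps in k-means++ seeding (scikit-learn's default initialisation) and keeps the best of several restarts; both are assumptions made so that an unlucky random start does not dominate the toy result. The agglomerative version assumes Ward linkage and cuts the dendrogram with SciPy's `fcluster` instead of choosing the cut by eye.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding (assumed); best of n_init runs."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: first centroid drawn uniformly at random, each
        # further one drawn with probability proportional to squared distance.
        centroids = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centroids = np.array(centroids)
        for _ in range(max_iter):
            # Assign every yearly profile to its nearest centroid.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute each centroid as the mean of its assigned profiles
            # (an empty cluster keeps its previous centroid).
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            # Stop once the centroids no longer move.
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        # Within-cluster sum of squares for this run's final assignment.
        inertia = d2[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia


def agglomerative(X, n_clusters):
    """Bottom-up merging with Ward linkage (assumed), then a dendrogram cut."""
    # Each profile starts as its own cluster; linkage() records every merge.
    Z = linkage(X, method="ward")
    # Cut the tree at the height that leaves n_clusters flat clusters.
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Hypothetical data: 60 yearly profiles (mean daily steps per month) drawn
# from three made-up behaviour types, 20 profiles each.
rng = np.random.default_rng(42)
months = np.arange(12)
inactive = 2000 + rng.normal(0, 100, size=(20, 12))
yearlong = 9000 + rng.normal(0, 100, size=(20, 12))
spring_starter = 4000 + 3000 * (months >= 3) + rng.normal(0, 100, size=(20, 12))
X = np.vstack([inactive, yearlong, spring_starter])

km_labels, km_inertia = kmeans(X, k=3)
ag_labels = agglomerative(X, n_clusters=3)
```

On well-separated toy groups like these both methods recover the three made-up behaviour types; on real step-count profiles the two can disagree, which is what agreement metrics such as AMI and ARI quantify. Library versions of both algorithms are available as `sklearn.cluster.KMeans` and `sklearn.cluster.AgglomerativeClustering`.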

Validation Metric | Seasonal Physical Activity Behaviour Clusters | Weekly Physical Activity Behaviour Clusters |
---|---|---|
Mutual Information (MI) | 0.992 | 0.879 |
Adjusted Mutual Information (AMI) | 0.569 | 0.523 |
Adjusted Rand Index (ARI) | 0.577 | 0.413 |
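
To show why MI can be close to 1 while AMI and ARI are only moderate, the sketch below computes all three with scikit-learn's `sklearn.metrics` functions on toy labelings. Which pair of labelings the table compares is not stated in this section, so the inputs here are hypothetical. MI is reported in nats and is not bounded by 1 (it is capped by the entropies of the labelings), and it stays positive even for unrelated labelings; AMI and ARI subtract the agreement expected by chance, so they sit near 0 for random labelings and reach 1 only for identical partitions.

```python
import numpy as np
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    mutual_info_score,
)

# Two labelings that agree perfectly up to a renaming of cluster ids.
a = [0, 0, 1, 1, 2, 2]
b = [1, 1, 0, 0, 2, 2]
mi_same = mutual_info_score(a, b)            # ln(3) nats for three equal groups
ami_same = adjusted_mutual_info_score(a, b)  # 1.0 up to floating-point rounding
ari_same = adjusted_rand_score(a, b)         # 1.0

# Two independent random labelings of 2,000 hypothetical users into 6 clusters:
# raw MI is still positive by chance, while AMI and ARI stay close to 0.
rng = np.random.default_rng(0)
u = rng.integers(0, 6, size=2000)
v = rng.integers(0, 6, size=2000)
mi_rand = mutual_info_score(u, v)
ami_rand = adjusted_mutual_info_score(u, v)
ari_rand = adjusted_rand_score(u, v)
```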

Cluster Description | Number of Users | Proportion of Female Users (%) | Proportion of Male Users (%) | p | Mean Age (years) |
---|---|---|---|---|---|
Active spring starter | 2679 | 8.67 | 8.81 | 0.730 | 41.46 |
Active yearlong | 2562 | 8.28 | 8.44 | 0.686 | 41.48 |
Demotivated spring starter | 4671 | 15.91 | 12.57 | <0.05 | 38.49 |
Highly active yearlong | 1132 | 3.31 | 4.93 | <0.05 | 42.21 |
Inactive | 10,498 | 32.21 | 40.61 | <0.05 | 37.81 |
Moderately active yearlong | 3947 | 13.32 | 11.06 | <0.05 | 40.22 |
Motivated spring starter | 5315 | 18.31 | 13.58 | <0.05 | 39.74 |
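
The two proportion columns are column percentages: all female users (and, separately, all male users) are distributed across the seven clusters, so each column sums to 100 up to rounding. This section does not say which test produced the p column; the sketch below assumes a chi-square test of independence on a 2x2 table (in the cluster or not, by sex) for each cluster, using SciPy's `chi2_contingency`. The counts and cluster names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts of female and male users in three clusters.
clusters = ["cluster 1", "cluster 2", "cluster 3"]
female = np.array([100, 300, 600])
male = np.array([200, 500, 1300])

# Share of each sex's users that fall in each cluster (each array sums to 100).
female_pct = 100 * female / female.sum()
male_pct = 100 * male / male.sum()


def sex_difference_p(j):
    """p-value for 'membership of cluster j is independent of sex'.

    Assumed test: chi-square on a 2x2 table of (in cluster j, not in j)
    by (female, male), with SciPy's default Yates continuity correction.
    """
    table = np.array([
        [female[j], female.sum() - female[j]],
        [male[j], male.sum() - male[j]],
    ])
    chi2, p, dof, expected = chi2_contingency(table)
    return p


for j, name in enumerate(clusters):
    print(f"{name}: {female_pct[j]:.2f}% of female vs {male_pct[j]:.2f}% of male users, "
          f"p = {sex_difference_p(j):.3g}")
```

A cluster holding the same share of each sex's users (cluster 1 here, 10% of each) gives p = 1, while a gap such as 30% versus 25% gives a small p with groups of this size.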

Cluster Description | Number of User Weeks | Proportion of Weeks Recorded by Female Users (%) | Proportion of Weeks Recorded by Male Users (%) | p |
---|---|---|---|---|
Active Weekend Warriors | 140,400 | 13.28 | 13.16 | 0.160 |
Consistently highly active | 61,568 | 5.25 | 7.8 | <0.001 |
Consistently somewhat active | 269,056 | 26.83 | 20.37 | <0.001 |
Consistent Low Activity | 251,160 | 21.88 | 30.18 | <0.001 |
Weekday active | 133,167 | 12.47 | 12.93 | <0.001 |
Weekday moderately active | 203,850 | 20.29 | 15.56 | <0.001 |

Cluster Description | Average Number of Activities Undertaken per Week | Average Number of Different Activity Types Undertaken | Average Active Minutes per Week | Proportion of Cluster Weeks Meeting MVPA Guidelines * |
---|---|---|---|---|
Active Weekend Warriors | 9.46 | 1.4 | 107.7 | 19.9% |
Consistently highly active | 17.5 | 1.9 | 303.8 | 42.3% |
Consistently somewhat active | 7.3 | 1.2 | 59.0 | 11.0% |
Consistent Low Activity | 4.2 | 1.2 | 145.6 | 28.1% |
Weekday active | 11.2 | 1.5 | 137.2 | 23.5% |
Weekday moderately active | 8.4 | 1.3 | 72.6 | 13.7% |
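
The last column is a per-cluster share of user-weeks that reach a guideline threshold. The footnote defining that threshold is not preserved here, so the sketch below assumes a cut-off of 150 active minutes per week (the moderate-intensity figure in the UK Chief Medical Officers' guidelines) and treats the app's active minutes as MVPA; both are assumptions, and the cluster labels and minutes are hypothetical.

```python
import pandas as pd

# Assumed guideline threshold (minutes of MVPA per week).
MVPA_THRESHOLD_MIN = 150


def summarise_clusters(weeks: pd.DataFrame, threshold: float = MVPA_THRESHOLD_MIN) -> pd.DataFrame:
    """Per-cluster mean active minutes and % of user-weeks at or above threshold."""
    flagged = weeks.assign(meets_guideline=weeks["active_minutes"] >= threshold)
    return flagged.groupby("cluster").agg(
        mean_active_minutes=("active_minutes", "mean"),
        pct_weeks_meeting=("meets_guideline", lambda s: 100 * s.mean()),
    )


# Hypothetical user-weeks: one row per user per week.
weeks = pd.DataFrame({
    "cluster": ["A", "A", "A", "A", "B", "B", "B"],
    "active_minutes": [100, 200, 160, 50, 30, 60, 150],
})
summary = summarise_clusters(weeks)
print(summary)
```

The mean and the proportion summarise different things: a few very active weeks raise a cluster's average without changing how many of its weeks pass the cut-off.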

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pontin, F.; Lomax, N.; Clarke, G.; Morris, M.A. Characterisation of Temporal Patterns in Step Count Behaviour from Smartphone App Data: An Unsupervised Machine Learning Approach. Int. J. Environ. Res. Public Health 2021, 18, 11476. https://doi.org/10.3390/ijerph182111476