*4.2. Individual Algorithms*

We trained all algorithms on the training set of each individual and performed cross-validation to tune the hyperparameters. Table 3 lists the used machine learning algorithms, the set of tested hyperparameters, and the selected grid search values.
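The per-individual tuning step can be sketched as follows. This is a minimal illustration, not the study's code: the data are synthetic placeholders, and the parameter grid is a reduced subset of the Random Forest values reported in Tables 3–5 (criterion, max\_features, n\_estimators).

```python
# Sketch of per-individual hyperparameter tuning: for each participant, run a
# cross-validated grid search on that participant's own training data.
# Features and labels below are synthetic stand-ins for the study's data:
# [hour of day, steps so far, steps in past hour] -> goal met (yes/no).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((120, 3))
y = (X[:, 1] + 0.1 * rng.random(120) > 0.5).astype(int)

# reduced version of the reported grid, for illustration
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2", 0.5],
    "n_estimators": [10, 50, 100],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="f1")
search.fit(X, y)
print(search.best_params_)  # this participant's personalized configuration
```

Repeating this search per participant yields the individually tuned models compared below.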


**Table 3.** Algorithms, used parameters, and grid search values.

The accuracy and F1-score of the individual algorithms differ. Figure 3 visualizes the averages of the individual scores.

*Sensors* **2018**, *18*, 623 9 of 16

**Figure 3.** Average accuracy and F1-score per model.

For thirty-five subjects, the best-performing individual model was the Random Forest algorithm, in eight cases this was the Decision Tree algorithm, and for one subject the AdaBoost algorithm performed best. The average accuracy of the Random Forest algorithm is 0.93 (range 0.88–0.99). Thus, in terms of accuracy, the individual Random Forest models score better than their counterpart that was generalized over all individuals (mean personalized accuracy = 0.93 versus mean generalized accuracy = 0.82). The average accuracy of the Decision Tree model is 0.93 (range 0.91–0.97) and outperforms the generalized, group-based Decision Tree accuracy of 0.75. The accuracy of the single AdaBoost model is 0.98, which outperforms the group accuracy of 0.85.

The mean F1-score of the Random Forest model is 0.90 (range 0.87–0.94). The mean F1-score of the Decision Tree model based on the eight best-performing models is 0.90 (range 0.87–0.93). Finally, the best AdaBoost model has an F1-score of 0.92, while the group F1-score for the AdaBoost algorithm was 0.77.

The use of grid search to tune the hyperparameters of the algorithms led to several optimized models per individual. To demonstrate the difference this optimization can make, we present an example of two individual models with different hyperparameter configurations in Table 4. Table 5 gives an overview of the number of occurrences of each value of the Random Forest hyperparameters.

**Table 4.** Example of different tuned personalized Random Forest models.

| Participant | criterion | max\_features | n\_estimators |
| --- | --- | --- | --- |
| 1119 | gini | sqrt | 50 |
| 1121 | entropy | log2 | 50 |

The accuracy and F1-score of the various machine learning algorithms increase slightly during the day. The size of this increase differs slightly per machine learning algorithm. For instance, the F1-score of Random Forest increases by 10% during the day, starting at 0.89 at 7:00 AM and ending at 0.97 at 6:00 PM. Figures 4 and 5 also show the increase in accuracy and F1-score of the baseline algorithm during the day. Its accuracy starts at 0.55 and ends at 1 at the end of the workday, while its F1-score starts at 0 and ends at 1. The accuracy increases for Random Forest, Logistic Regression, and AdaBoost, whereas the accuracy of the Neural Network peaks at 11:00 AM and that of Stochastic Gradient Descent remains the same.
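The per-hour evaluation behind Figures 4 and 5 can be sketched as grouping the test predictions by the hour-of-day feature and scoring each group separately. The data and the simulated "model" below are synthetic placeholders, not the study's individual test sets.

```python
# Group predictions by hour of day (7:00-18:00) and compute accuracy and
# F1-score per hour, as visualized in Figures 4 and 5.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(1)
hours = rng.integers(7, 19, size=300)            # hour-of-day feature
y_true = rng.integers(0, 2, size=300)            # goal met (yes/no)
y_pred = np.where(rng.random(300) < 0.9, y_true, 1 - y_true)  # ~90% correct

per_hour = {}
for h in range(7, 19):
    mask = hours == h
    per_hour[h] = (accuracy_score(y_true[mask], y_pred[mask]),
                   f1_score(y_true[mask], y_pred[mask], zero_division=0))
print(per_hour[7], per_hour[18])  # scores early and late in the day
```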


**Table 5.** The number of different values per Random Forest hyperparameter.

| Hyperparameter | Value | Number of Occurrences |
| --- | --- | --- |
| criterion | gini | 37 |
| criterion | entropy | 7 |
| max\_features | 0.1 | 4 |
| max\_features | 0.25 | 5 |
| max\_features | 0.5 | 7 |
| max\_features | 0.75 | 15 |
| max\_features | log2 | 2 |
| max\_features | sqrt | 2 |
| max\_features | null | 9 |
| n\_estimators | 10 | 3 |
| n\_estimators | 50 | 16 |
| n\_estimators | 100 | 17 |
| n\_estimators | 500 | 6 |

**Figure 4.** Average F1-score per algorithm, per hour based on the individual scores.


**Figure 5.** Average accuracy per algorithm, per hour based on the individual scores.

### *4.3. The Web Application*

The Web application is a demonstration of how the aforementioned machine learning techniques could be used in practice, from the perspective of both the coach and the participant. The application allows the user to determine, during the day, whether a participant will achieve his or her goal for that day by applying the individualized algorithms. The procedure for predicting this goal is as follows. First, the user selects a participant identifier from the dropdown menu. After this selection has been made, the application selects the best personalized machine learning algorithm for this specific participant. Then the user can fill out a form, providing the necessary details to base the prediction on (hour of day, the number of steps so far, and the number of steps in the past hour). Finally, when the user submits the form, the application returns advice personalized for the individual selected from the dropdown menu. The demo application is available at http://personalized-coaching.compsy.nl/. Figure 6 provides a screenshot of both the input fields of the application and the generated prediction and advice.
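The application's prediction flow can be sketched as a lookup of the participant's best pre-trained model followed by a prediction on the three form inputs. This is a simplified illustration, not the application's actual code: the models are trained on synthetic data, the inputs are toy normalized values, and only the participant identifiers (cf. Table 4) come from the paper.

```python
# Minimal sketch of the Web application's flow: select the best personalized
# model for the chosen participant, then predict goal achievement (and its
# probability) from (hour of day, steps so far, steps in past hour).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.random((100, 3))
y = (X[:, 1] > 0.5).astype(int)

# one pre-trained "best" model per participant id (in practice loaded from disk)
best_model = {
    "1119": RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y),
    "1121": DecisionTreeClassifier(random_state=0).fit(X, y),
}

def predict_goal(participant_id, hour, steps_so_far, steps_past_hour):
    model = best_model[participant_id]
    features = [[hour / 24, steps_so_far, steps_past_hour]]  # toy scaling
    will_meet = bool(model.predict(features)[0])
    probability = float(model.predict_proba(features)[0][1])
    return will_meet, probability

print(predict_goal("1119", 11, 0.6, 0.1))
```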



**Figure 6.** Screenshot of the Personalized Coach Web Application.

#### **5. Discussion**

We investigated machine learning as a means to support personalized coaching on physical activity. We demonstrated that for our particular data sets, the tree algorithms and tree-based ensemble algorithms performed especially well. To demonstrate how the results of machine learning techniques could be used in practice, an application was developed to aid the coaching of the physical activity process. Furthermore, the analysis shows that selecting the right algorithm, using the dataset of the individual participant, and tuning its algorithm parameters can lead to significant improvements in predictive performance and is a critical step in applying machine learning. All source code, including the different notebooks and the proof-of-concept Web application, is available online as open-source software. The source code can serve as a blueprint for other researchers aiming to apply machine learning for coaching.

Although Random Forest outperformed most of the other algorithms, it is problematic to provide a generalized recommendation for specific algorithms, parameters, or parameter settings [44]. Presumably due to individually different physical activity patterns, different algorithms and parameters have to be considered. As a starting point, we selected the algorithms based on well-established sources [41,42], applied cross-validation, and grid-searched the values of the selected parameters. Nevertheless, it is important to note that these algorithms, parameters, and grid search values might not work best on all individual physical activity patterns, and they should only be used as starting points. Future work might consist of investigating the underlying mechanisms to be able to choose the best algorithm beforehand.


We based the prediction solely on the hour of the day and the number of steps. These step counts naturally increase over the day and are therefore not independent of each other. By including the cumulative number of steps for each block of data, together with the number of steps made in the past hour, we assume each block to be independent of the other blocks, and as such are still able to use the regular machine learning methods.
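The feature construction described above can be sketched as follows: each hourly block receives the cumulative step count so far plus the steps taken in the past hour, so the blocks can be treated as (approximately) independent training examples. The step counts below are illustrative, not study data.

```python
# Build one feature block per hour from an hourly step series: cumulative
# steps so far, plus steps taken in the past hour.
steps_per_hour = {7: 200, 8: 900, 9: 400, 10: 1200, 11: 300}

blocks, cumulative = [], 0
for hour in sorted(steps_per_hour):
    past_hour = steps_per_hour[hour]
    cumulative += past_hour
    blocks.append({"hour": hour,
                   "steps_so_far": cumulative,
                   "steps_past_hour": past_hour})
print(blocks[-1])  # {'hour': 11, 'steps_so_far': 3000, 'steps_past_hour': 300}
```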

A limitation of the present work is that all participants included in this study took part in an intervention. This intervention might have made the participants more aware of and engaged with the project, and as such, the individualized models might be biased towards a best-case scenario. When people are not extrinsically motivated to meet their daily physical activity goal and lower their physical activity, the predictive power of the models, and therefore the effect of automated interventions, will lessen. On the other hand, when an intervention like the health promotion program ends, the individualized models can continue to monitor the participant's performance as if the program were still supporting him or her.

As presented in the state-of-the-art literature, the total number of steps differs significantly between Sunday and the rest of the weekdays [5,6,48]. Within this health promotion program, the focus was on improving physical activity during working hours and commuting. Therefore, the machine learning models were trained on the normal workweek. One model per participant, based on the five weekdays, is adequate to predict whether or not a participant will meet his or her threshold. It may be necessary to construct different models for the weekend and for weekdays when a health promotion program is expanded to weekends. A reason to establish more than one or two models per participant is found in the variance between weekdays [5]. Examples of factors that could influence the level of physical activity are weekly sport obligations, weekly meetings, or lunch walks on certain days. Constructing a model per weekday might establish an even more personalized and precise prediction.
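The per-weekday modelling idea above could be sketched as training one personalized model per weekday and routing each prediction to the model of the current day. The weekday labels and data are illustrative placeholders, not the study's implementation.

```python
# Train one personalized model per weekday instead of a single workweek model.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
weekdays = rng.integers(0, 5, size=200)          # 0 = Monday ... 4 = Friday
X = rng.random((200, 3))
y = (X[:, 1] > 0.5).astype(int)

weekday_models = {}
for day in range(5):
    mask = weekdays == day
    weekday_models[day] = DecisionTreeClassifier(random_state=0).fit(X[mask], y[mask])

# at prediction time, route the sample to the model of the current weekday
sample = [[0.4, 0.7, 0.2]]
print(weekday_models[0].predict(sample)[0])
```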

In the present work, we trained our machine learning algorithms only on variables provided by the activity tracker; this set of variables could be extended with (exogenous) variables from other data sources. For example, the data can be extended with information on changes in the weather conditions and/or season, which are known to correlate with day-to-day activity [5,53]; with non-working time during weekdays, such as national holidays, free time, or part-time working schedules, since the activity level differs between non-occupational and occupational time; or with the influence and effectiveness of coaching and interventions. Adding the mentioned factors to the dataset might improve the predictive accuracy of the model and might increase the effectiveness of the coaching process.

To apply the personalized machine learning models effectively, they have to become part of a larger ecosystem. An ideal coaching process is fully tailored to the individual participant. One of the most important characteristics of the personalization of a coaching strategy is the timing and ease of executing triggers to change behavior [54]. To support these two aspects of coaching, timely information on the participant and on the effectiveness of the coaching strategy is needed. Coaching might not be limited to a personal real-life coach but may also include virtual coaching. An example of a possible use of the system is: the moment the participant does not score a 'yes' for two hours in a row on the prediction of meeting his threshold, a notification is sent out to both the participant and the coach. On the basis of this notification, the participant and the coach can take action; the coach can intervene in a timely manner to stimulate his client to become physically active, and the participant can instantly become more active. Blok et al. proposed a system which combines the real-time analysis of activity tracker data and other personal streaming data with the evaluation of virtual coaching strategies, which enables it to tune the coaching to the person [55]. The present work could serve as a central component of a virtual coach system like that envisioned by Blok et al. [55].

To make the information even more personal and relevant, a promising direction for future work is to include a prediction of the actual number of steps at the end of the day. Adding more (and personalized) information might strengthen the effectiveness of the system. To do so, we could apply a procedure similar to the one presented in this study, but replace our classification algorithms with regression machine learning algorithms. The predicted number of steps could be a valuable extension to the currently implemented classification of the step goal.
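The regression extension suggested above could be sketched by swapping the classifier for a regressor that predicts the end-of-day step count. The data and the linear relation generating it are synthetic assumptions; the setup only mirrors the shape of the classification pipeline, not the study's implementation.

```python
# Replace the yes/no goal classifier with a regressor predicting the actual
# end-of-day step count from the same features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.random((150, 3))                         # hour, steps so far, past hour
y = 6000 * X[:, 1] + 2000 * X[:, 2] + 500 * rng.random(150)  # end-of-day steps

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
predicted_steps = model.predict([[0.5, 0.4, 0.2]])[0]
print(round(predicted_steps))
```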

To conclude, machine learning is a viable asset for automating personalized daily physical activity prediction. Coaching can thereby rely on accurate and timely information on the participants' physical activity, even early in the day. This is the result of applying machine learning to the behavior of the individual participant, as precisely and frequently measured by wearable sensors. The prediction of whether the participant will meet his goal, in combination with the probability of that achievement, allows for early intervention and can be used to support personalized coaching. Also, the motivation for self-coaching might increase, since every model is personalized and the results are better fitted to the individual situation. Furthermore, machine learning techniques empower automated coaching and personalization.

**Acknowledgments:** We thank the Hanze University Health Program for providing the physical activity data of the Health Program and all the participants in the experiment.

**Author Contributions:** Miriam van Ittersum conceived and designed the study of the health promotion program; Talko Dijkhuis developed the database, notebooks and performed the experiments; Talko Dijkhuis analyzed the data; Frank Blaauw developed the Web application. All authors have participated in writing the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.
