*2.5. Statistical Analysis*

Based on a prior study of 11 AN patients and 10 HCs whose activity was measured with a shoe-based accelerometer at three time points ((I) while eating lunch, (II) while filling out questionnaires, and (III) while watching television for 1 h), 19 analyzed individuals provided sufficient power to demonstrate a significant difference in total PA levels (df = 1, 19; *F* = 5.68; *p* = 0.03) [11]. However, we aimed to assess activity continuously over 3 days and to divide the analyses into six PA intensity levels: (I) at rest, (II) very light, (III) light, (IV) moderate, (V) vigorous, and (VI) vigorous >9 METs. We therefore assumed that at least four times as many patients (i.e., *n* = 44) would be required for sufficient power. For organizational reasons, we capped the number of HCs at *n* = 30 (assuming less heterogeneity among HCs) and increased the sample size of AN patients to *n* = 50.
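For illustration only (these effect sizes are not reported in the prior study [11]), the *F* statistic quoted above can be converted into a standardized effect size with the usual formulas for partial eta squared and Cohen's *f*, and the sample-size heuristic is simple arithmetic on the 11 AN patients of that study:

$$
\eta_p^2 = \frac{F \cdot df_1}{F \cdot df_1 + df_2} = \frac{5.68 \times 1}{5.68 \times 1 + 19} \approx 0.23,
\qquad
f = \sqrt{\frac{\eta_p^2}{1 - \eta_p^2}} = \sqrt{\frac{F \cdot df_1}{df_2}} \approx 0.55,
\qquad
n_{\text{AN}} \geq 4 \times 11 = 44 .
$$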

A *p*-value of <0.05 was considered statistically significant, and all tests were two-sided. Data are presented as mean ± standard deviation (SD) if normally distributed, otherwise as median (25th/75th percentile); categorical data are given as absolute frequency (relative frequency, %). Quartiles were computed with the type 8 definition of R's quantile function, whose estimates are approximately median-unbiased regardless of the underlying distribution. Group differences in normally distributed variables were analyzed with *t*-tests; for quantitative variables not following a normal distribution, Wilcoxon tests were applied. Categorical variables were compared with Fisher's exact test.

To test the relationship between BMI change and various potential predictors, univariable and multivariable linear regression models were computed. In addition, a regression tree was computed, because this approach makes no assumptions about distributions or linearity. This machine learning technique recursively splits the data set at thresholds of the predictor variables, so that each resulting subgroup receives its own prediction. Given our relatively small sample, splitting the data set into training and test sets was not feasible; we therefore applied a jack-knife (leave-one-out) procedure, classifying each patient based on a tree built from the remaining patients. All statistical analyses were performed with R version 3.4.2 (R Core Team, 2017).
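The type 8 quartiles used for the median (25th/75th percentile) summaries can be sketched as follows. This is a minimal Python illustration of Hyndman and Fan's definition 8 (plotting position $(k - 1/3)/(n + 1/3)$), not the authors' R code; the function names are hypothetical. Python's `statistics.quantiles` only offers the "exclusive" (type 6) and "inclusive" (type 7) methods, which is why the interpolation is written out explicitly.

```python
import math
from typing import Iterable, Tuple


def quantile_type8(data: Iterable[float], p: float) -> float:
    """Sample quantile following Hyndman & Fan's definition 8 (R: type = 8)."""
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # 1-based position h = (n + 1/3) * p + 1/3
    h = (n + 1.0 / 3.0) * p + 1.0 / 3.0
    # Positions outside [1, n] are clamped to the sample minimum / maximum
    if h <= 1.0:
        return float(x[0])
    if h >= n:
        return float(x[-1])
    j = math.floor(h)  # lower order statistic (1-based)
    g = h - j          # linear interpolation weight
    return x[j - 1] + g * (x[j] - x[j - 1])


def median_and_quartiles(data: Iterable[float]) -> Tuple[float, ...]:
    """Return (25th percentile, median, 75th percentile) using type 8."""
    values = list(data)
    return tuple(quantile_type8(values, p) for p in (0.25, 0.5, 0.75))
```

For example, `median_and_quartiles(range(1, 11))` gives approximately (2.917, 5.5, 8.083).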
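For two independent groups such as AN patients and HCs, the Wilcoxon test takes its rank-sum (Mann-Whitney) form. The sketch below assumes that form, small samples, and no ties; it computes the exact null distribution of the statistic by recursion and doubles the one-sided tail on the side of the observed value, mirroring the exact two-sided p-value that R's wilcox.test reports in that setting. It is an illustration in Python, not the authors' code, and the function names are hypothetical.

```python
from functools import lru_cache
from math import comb
from typing import Sequence, Tuple


@lru_cache(maxsize=None)
def u_null_counts(m: int, n: int) -> Tuple[int, ...]:
    """counts[u] = number of orderings of m x- and n y-values giving U = u."""
    if m == 0 or n == 0:
        return (1,)
    out = [0] * (m * n + 1)
    # Largest pooled value is an x: it exceeds all n y-values
    for u, c in enumerate(u_null_counts(m - 1, n)):
        out[u + n] += c
    # Largest pooled value is a y: it adds nothing to U
    for u, c in enumerate(u_null_counts(m, n - 1)):
        out[u] += c
    return tuple(out)


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[int, float]:
    """Two-sided exact Wilcoxon rank-sum test; assumes no ties."""
    m, n = len(x), len(y)
    pooled = list(x) + list(y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("exact null distribution assumes no ties")
    # W = number of (x, y) pairs with x > y (rank sum of x minus m(m+1)/2)
    w = sum(1 for xi in x for yj in y if xi > yj)
    counts = u_null_counts(m, n)
    total = comb(m + n, m)
    # Double the tail probability on the side of the observed W
    if w > m * n / 2:
        tail = sum(counts[w:]) / total
    else:
        tail = sum(counts[: w + 1]) / total
    return w, min(1.0, 2 * tail)
```

With ties, as is common for rounded clinical measurements, a normal approximation or permutation test would be needed instead.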
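Fisher's exact test conditions on the table margins and sums the hypergeometric probabilities of all tables that are at most as probable as the observed one. The sketch below is restricted to 2×2 tables, which is an assumption (the section does not state the table sizes); the small relative tolerance follows the convention used by R's fisher.test. The function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    # Feasible values of the top-left cell given fixed margins
    lo, hi = max(0, col1 - row2), min(row1, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability that the top-left cell equals k
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    # Sum all tables at most as probable as the observed one
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For the table [[3, 1], [1, 3]], this gives 34/70 ≈ 0.486.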
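The univariable and multivariable linear models regress BMI change on one or several predictors by ordinary least squares. The following minimal sketch solves the normal equations with Gauss-Jordan elimination using only the Python standard library; the function name and toy data are hypothetical, and it returns point estimates only (no standard errors or *p*-values).

```python
from typing import List, Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0, *map(float, r)] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Build X'X (k x k) and X'y (length k)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan with partial pivoting
    a = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank-deficient")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]
```

A univariable model passes one predictor per row, e.g. `ols([[1], [2], [3]], [2, 4, 6])`, and a multivariable model passes several. The normal equations are chosen here for brevity; R's lm uses a QR decomposition, which is numerically more stable.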
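The regression tree and the jack-knife can be sketched together. This is a simplified CART-style tree, assumed for illustration: it greedily chooses the predictor and midpoint threshold that minimize the summed squared error of the two child nodes and stops at a maximum depth or minimum leaf size. The leave-one-out loop then grows a tree on all but one patient and predicts that patient. The names, the stopping rules, and the default values are hypothetical and not taken from the study, which does not state its tree settings here.

```python
from statistics import mean
from typing import List, Sequence


def _sse(values: Sequence[float]) -> float:
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=2, min_leaf=2):
    """Leaf = mean outcome; node = (feature, threshold, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return mean(y)
    _, f, t, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  depth + 1, max_depth, min_leaf)
    return (f, t, grow(left), grow(right))


def predict(tree, row):
    # Walk down the tree until a leaf (a number) is reached
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def jackknife_predictions(X, y, **kwargs) -> List[float]:
    """Leave-one-out: predict each subject from a tree built on the others."""
    X, y = list(X), list(y)
    return [predict(build_tree(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **kwargs), X[i])
            for i in range(len(y))]
```

Comparing the held-out predictions with the observed values gives an estimate of out-of-sample accuracy without a separate test set.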
