*2.6. Validation*

Finally, to evaluate the effectiveness of the classification process, a statistical validation is applied, based on nine metrics: true positive (TP) (conditions correctly classified), true negative (TN) (controls correctly classified), false positive (FP) (controls incorrectly classified), false negative (FN) (conditions incorrectly classified), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy.

Sensitivity can be calculated by Equation (4),

$$Sensitivity = \frac{TP}{TP + FN}, \tag{4}$$

describing the true positive rate, i.e., the probability that a depressive episode is correctly classified.

Specificity can be calculated by Equation (5),

$$Specificity = \frac{TN}{TN + FP}, \tag{5}$$

describing the true negative rate, i.e., the probability that a non-depressive episode is correctly classified.

The PPV can be defined by Equation (6), being the probability that an episode classified as depressive truly belongs to a person suffering from depression,

$$PPV = \frac{Sensitivity \ast Prevalence}{Sensitivity \ast Prevalence + (1 - Specificity) \ast (1 - Prevalence)}, \tag{6}$$

where *Prevalence* is the proportion of observations with the condition, in this case depressive episodes.

The NPV can be defined by Equation (7), being the probability that an episode classified as negative truly corresponds to the absence of depression,

$$NPV = \frac{Specificity \ast (1 - Prevalence)}{(1 - Sensitivity) \ast Prevalence + Specificity \ast (1 - Prevalence)}. \tag{7}$$

*Diagnostics* **2020**, *10*, 162

Finally, the accuracy can be calculated with Equation (8), being the total probability that one episode is classified correctly.

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}. \tag{8}$$
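As an illustration, the metrics of Equations (4) to (8) can be computed from the four confusion-matrix counts. The following is only a minimal sketch: the function name `validation_metrics` and the example counts are hypothetical, specificity is taken in its standard form TN / (TN + FP), and prevalence is assumed to be estimated from the evaluated sample itself, in which case the PPV and NPV reduce to TP / (TP + FP) and TN / (TN + FN).

```python
def validation_metrics(tp, tn, fp, fn):
    """Derive the validation metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. the sample contains
    both conditions and controls.
    """
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)       # true positive rate, Equation (4)
    specificity = tn / (tn + fp)       # true negative rate, Equation (5)
    prevalence = (tp + fn) / total     # assumption: estimated from the sample
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    )                                  # Equation (6)
    npv = (specificity * (1 - prevalence)) / (
        (1 - sensitivity) * prevalence + specificity * (1 - prevalence)
    )                                  # Equation (7)
    accuracy = (tp + tn) / total       # Equation (8)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = validation_metrics(tp=90, tn=80, fp=20, fn=10)
```

With a prevalence taken from another source (for example, a population estimate), the same Bayes expressions apply, but the PPV and NPV no longer coincide with the simple count ratios.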

## **3. Experiments and Results**

Figure 2 presents a comparison of the motor activity of a control and a condition at different hours of the day. The hourly activity of control and condition shows different patterns. Based on these data, a segmentation is applied to form data intervals containing the information of one-hour time lapses. Each observation consists of 61 columns: one column for the monitored hour and one column for each minute (60 columns) of motor activity. This segmentation allows the classification of depressive episodes per hour.
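The hourly segmentation can be sketched as follows. This is only an illustrative sketch, not the original implementation: the function name `segment_by_hour`, the input format (per-minute `(timestamp, activity)` pairs) and the choice to discard incomplete hours are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def segment_by_hour(records):
    """Group per-minute samples into rows of 61 values.

    records: iterable of (datetime, activity) pairs, one per minute.
    Each returned row holds the monitored hour followed by the
    60 per-minute activity values of that hour.
    """
    buckets = defaultdict(dict)
    for timestamp, activity in records:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start][timestamp.minute] = activity

    rows = []
    for hour_start in sorted(buckets):
        minutes = buckets[hour_start]
        if len(minutes) == 60:  # assumption: keep only complete hours
            rows.append([hour_start.hour] + [minutes[m] for m in range(60)])
    return rows


# Synthetic recording of 150 minutes starting at 5 p.m. (values are made up).
start = datetime(2020, 1, 1, 17, 0)
records = [(start + timedelta(minutes=i), i % 7) for i in range(150)]
rows = segment_by_hour(records)
```

In this synthetic example, the two complete hours (17 and 18) produce one row each, while the trailing 30 minutes are dropped.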

**Figure 2.** Comparison of motor activity at different hours of the day between a control and a condition.

Figure 2 also shows different patterns in the activity of control and condition subjects at different moments of the day. In Figure 2a, at 5 p.m., the signal collected from the control subject presents higher levels of activity than that of the depressive patient. This is the behavior most expected of a patient with depression. Nevertheless, as the day ends, the signals of both control and condition start to change, as shown in Figure 2b–d, with the depressive activity containing higher values.

Therefore, the data are treated as three different sets, each corresponding to a different moment of the day: one set for the day, one for the night and one for the full day.
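A possible way to build the three sets from the hourly observations is sketched below. The function name `split_by_time_of_day` and, in particular, the hour boundaries `day_start=8` and `day_end=20` are placeholders chosen for illustration; they are not values reported in this section.

```python
def split_by_time_of_day(rows, day_start=8, day_end=20):
    """Split hourly observations into day, night and full-day sets.

    rows: observations whose first column is the monitored hour (0-23).
    day_start, day_end: hypothetical boundaries; hours in
    [day_start, day_end) count as day, all other hours as night.
    """
    day = [row for row in rows if day_start <= row[0] < day_end]
    night = [row for row in rows if not (day_start <= row[0] < day_end)]
    return {"day": day, "night": night, "full_day": list(rows)}


# One synthetic observation per hour of the day (activity values are zeros).
example_rows = [[hour] + [0] * 60 for hour in range(24)]
datasets = split_by_time_of_day(example_rows)
```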

Then, for the next stage, a feature selection is proposed. The results for each dataset are shown in Table 4. Accuracy is the metric used to evaluate the performance of the models constructed by the FS approach with different numbers of features (two, four, five, six, seven, eight, nine and ten).

For the Night Data and Full Day Data datasets, the highest accuracy in the classification of depressive and non-depressive episodes is achieved with the nine-feature model. The best model for the Day Data comprises eight features; however, its difference from the nine-feature model is less than 0.1 percent.
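The greedy idea behind forward selection can be sketched as follows. This is a generic illustration under stated assumptions rather than the procedure used to produce Table 4: the function `forward_selection`, the feature names and the toy scoring function are hypothetical, and in practice `score` would return the accuracy of a classifier trained on the given feature subset.

```python
def forward_selection(features, score, max_features):
    """Greedy forward selection.

    At each step, add the feature that gives the highest score when
    combined with the features already selected. Returns the list of
    (selected_features, score) pairs, one per model size.
    """
    selected = []
    remaining = list(features)
    history = []
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(selected)))
    return history


# Hypothetical per-feature gains standing in for a real accuracy estimate.
toy_gain = {
    "kurtosis": 0.02,
    "mean_activity": 0.20,
    "skewness": 0.06,
    "std_activity": 0.12,
}


def toy_score(subset):
    """Toy stand-in for model accuracy on a feature subset."""
    return 0.5 + sum(toy_gain[name] for name in subset)


history = forward_selection(list(toy_gain), toy_score, max_features=3)
```

Comparing the scores stored in `history` across model sizes mirrors the comparison of models with different numbers of features described above.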


**Table 4.** Forward Selection results for the night, day and full day data.

After the feature selection with the FS approach, validation is done in two steps. First, classification is done with the best nine-feature model for each dataset, listed in Table 5. Even though the Day Data achieves higher accuracy with eight features, its best nine-feature model is selected so that performance can be compared under the same circumstances as the other two datasets. The results of this classification are shown in Table 6, described as Best 9 Features Day, Best 9 Features Full Day and Best 9 Features Night. In addition to accuracy, which was used in the FS step, sensitivity and specificity were calculated in this validation to give a wider view of the performance of the models.

Second, classification is performed using the nine-feature set of the Best 9 Features Night, applied to the Day Data and the Full Day Data datasets, because this nine-feature model outperforms all other nine-feature models in all proposed metrics. This is done in order to evaluate the performance of a general model, i.e., a single model for all times of the day. The results of this classification are described as Best Model Day and Best Model Full Day in Table 6.


**Table 5.** Best nine-feature model for every dataset: day, night and full day.

From Table 6 it can be observed that every model achieves a high performance in the classification of depressive episodes. Sensitivity values range from 98.24% to 99.37% and specificity values range from 98.08% to 99.31%, establishing an almost perfect classification of depressive and non-depressive episodes.

The lowest results, although still good, are those of the Day Data with the nine features selected from the Night Data.


**Table 6.** RF results for the datasets using the nine-feature models and the best model from the forward selection on the night dataset.
