*3.3. Test Set Result Using Deep Learning*

To further assess its generalization performance, the model is trained on the flat-course datasets of the first three subjects and tested on the two test datasets obtained from subject 4. The classification accuracies on test set-1 and test set-2 for all five sensor configurations are shown in Table 17. Again, the sports biomechanics configuration with five sensors achieves the highest accuracy on both test sets, reaffirming the earlier proposition that this configuration is the optimal set of sensors.


**Table 17.** Classification accuracies for test set-1 and test set-2 for the five different sensor configurations.

Tables 18 and 19 show the confusion matrices for test set-1 and test set-2, respectively, for the sports biomechanics configuration. In test set-1, all techniques except the classical push-off (P-Off) and double poling (DP) are classified almost perfectly. Of the 241 push-off instances, 172 are incorrectly classified as double poling, and of the 295 double-poling instances, 74 are incorrectly classified as push-off, leading to a low classification accuracy for both techniques. These are the same two techniques that the model confused in the training set, so some misclassification in the test set was expected. For test set-2, a very high overall accuracy of 95.1% is obtained for the sports biomechanics configuration of sensors. Thus, we achieve an overall mean accuracy of 91.15% on the test sets of skier 4.
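To make the effect of this confusion concrete, the following minimal sketch computes per-class accuracy (recall) for the two affected techniques from the counts quoted above. It is an illustration only: the variable names are hypothetical, and it assumes that every push-off or double-poling instance not confused with the other technique was classified correctly, so the resulting values are upper bounds rather than the entries of Table 18.

```python
import numpy as np

# Counts quoted in the text for test set-1 (sports biomechanics configuration).
pushoff_total, pushoff_as_dp = 241, 172
dp_total, dp_as_pushoff = 295, 74

# Rows are true techniques, columns are predicted techniques: [P-Off, DP].
# Assumption: instances not confused with the other technique are counted as
# correct, so the diagonal entries are upper bounds on the true counts.
confusion = np.array([
    [pushoff_total - pushoff_as_dp, pushoff_as_dp],
    [dp_as_pushoff, dp_total - dp_as_pushoff],
])

# Per-class accuracy (recall) = correct predictions / true instances of the class.
recall = np.diag(confusion) / confusion.sum(axis=1)

for name, value in zip(["P-Off", "DP"], recall):
    print(f"{name}: at most {100 * value:.1f}% correctly classified")
```

Under this assumption, at most about 28.6% of push-off instances and about 74.9% of double-poling instances are classified correctly, which is consistent with the low per-class accuracy described for these two techniques.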

**Table 18.** Confusion matrix for test set-1, in which the test subject repeatedly performs one technique of one XC-skiing style at a time on a flat course, for the sports biomechanics configuration of sensors.


**Table 19.** Confusion matrix for test set-2, in which the test subject performs all skating techniques on a natural course simultaneously, for the sports biomechanics configuration of sensors.


It should be emphasized that the overall mean accuracy is ~80% when leave-one-out testing is performed over the first three skiers, and that it increases to ~91.1% when the same deep learning model is tested on skier 4. This is because the model tested on skier 4 was trained on the data of three skiers, whereas each model tested on one of the first three skiers was trained on the data of only two skiers in a leave-one-out fashion. Thus, the generalization accuracy of the proposed model increases with the size of the training dataset. These results provide strong evidence in favor of our hypothesis that the accuracy of the deep learning model increases as the training datasets become larger.
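The two evaluation protocols compared here can be sketched as follows. This is a minimal illustration of the subject-wise splits only, not the training code used in the study; the function and variable names are hypothetical, and subjects are represented simply by their numbers.

```python
def leave_one_subject_out(subjects):
    """Yield (train_subjects, test_subject) pairs, holding out one subject at a time."""
    for held_out in subjects:
        train = [s for s in subjects if s != held_out]
        yield train, held_out


development_subjects = [1, 2, 3]  # skiers used in the leave-one-out evaluation
test_subject = 4                  # skier reserved for the final test sets

# Leave-one-out folds: each model is trained on two skiers and tested on the third.
loo_folds = list(leave_one_subject_out(development_subjects))

# Final split: a single model trained on all three skiers and tested on skier 4.
final_split = (development_subjects, test_subject)

for train, test in loo_folds + [final_split]:
    print(f"train on skiers {train} ({len(train)} subjects) -> test on skier {test}")
```

The listing makes the difference in training-set size explicit: every leave-one-out fold trains on two skiers, whereas the model evaluated on skier 4 trains on three.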
