**4. Discussion**

We developed a unified CNN-LSTM-based deep learning model that classifies both classical and skating style techniques simultaneously from gyroscope data. Although the model was trained only on outdoor flat course data, it achieved accuracies of 87.2% and 95.1% on the flat and natural course test sets, respectively, for an overall mean accuracy of 91.15% with the optimal gyroscope sensor configuration (five sensors: both hands, both feet, and the pelvis). This is strong evidence that a model trained only on flat course data can classify XC-skiing techniques on both flat and natural courses. It removes the need to collect natural course training data, which is extremely difficult to procure. To the best of our knowledge, we are the first to propose a unified deep learning model that classifies classical and skating techniques simultaneously with high accuracy. As a benchmark for evaluating our unified deep learning model, we also applied a KNN model with manually designed features to skiing technique classification on the same datasets. KNN was chosen because an earlier study [20] reported lower error rates for KNN than for a Markov model when classifying the classical and skating styles simultaneously. The accuracies of the two approaches on the validation dataset (Figure 4) and on the two test sets (Table 22) clearly show that deep learning is more effective and achieves higher classification accuracy than KNN. This result is in line with a recent study [23], which used a 3D accelerometer to classify only two free skating style techniques (gear 2 and gear 3) and reported that deep learning achieved the highest accuracy among all investigated classification models.
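The KNN benchmark compresses each window of gyroscope data into a handful of manually designed features before classification. This section does not list the exact features, window length, or value of k. The sketch below therefore assumes per-channel mean, standard deviation and range with a plain Euclidean k-nearest-neighbour vote. The function names `window_features` and `knn_predict` and the labels "DP" and "DIA" are hypothetical and used only for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def window_features(window):
    """Hand-crafted features for one window of gyroscope data (assumed feature set).

    `window` is a list of channels, each a list of angular-velocity samples
    (e.g. 15 channels for 5 sensors x 3 axes). For every channel we append
    its mean, population standard deviation and range.
    """
    feats = []
    for channel in window:
        feats.extend([mean(channel), pstdev(channel), max(channel) - min(channel)])
    return feats


def knn_predict(train_feats, train_labels, query, k=3):
    """Label `query` by majority vote of its k nearest training vectors.

    Distance is Euclidean; ties in the vote go to the label that appears
    first among the nearest neighbours.
    """
    nearest = sorted(
        (math.dist(f, query), label) for f, label in zip(train_feats, train_labels)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy usage with one channel per window; values and labels are illustrative only.
train_windows = [
    [[0.0, 0.1, 0.0, 0.1]],
    [[0.0, 0.2, 0.1, 0.0]],
    [[0.0, 2.0, -2.0, 1.0]],
    [[0.5, 2.5, -1.5, 1.0]],
]
train_labels = ["DP", "DP", "DIA", "DIA"]
train_feats = [window_features(w) for w in train_windows]
print(knn_predict(train_feats, train_labels, window_features([[0.0, 2.2, -1.8, 0.9]])))  # -> DIA
```

Compact features like these make KNN fast to fit. The cost is that someone must decide in advance which summary statistics separate the techniques. The CNN-LSTM avoids that step by learning features from the raw windows.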

**Figure 4.** Comparison between the machine learning (KNN) and deep learning methods on classification accuracies for the validation dataset for the five different sensor configurations.


**Table 22.** Comparison between the machine learning (ML-KNN) and deep learning (DL) methods on classification accuracies for test set-1 and test set-2 for the five different sensor configurations.

Although the developed deep learning model achieved high overall classification accuracy across eight skiing techniques, in-depth analysis of the confusion matrices showed that most misclassifications involved the classical push-off and double poling techniques. These two techniques have identical upper body and pelvis motions. They differ only in that a classical push-off begins with a slight jump to generate propulsive force, and its body movements are faster and more exaggerated than in double poling. Such closely similar movement characteristics confuse the model and lead to misclassifications. Some misclassifications also occurred for V2A, a technique typically used on level terrain up to moderate uphill inclines or during transitions between V2 and V1. In V2A skate, the timing of the pole push is the same as in V2 skate, but the skier double poles with every second skate rather than with every skate as in V2. V1 skate is an uphill technique that employs asymmetrical poling with every second skate [3,24]. Transitions between these similar techniques (V2A and V2, and V2A and V1) led to high classification errors for V2A. This finding is consistent with the results of a previous study [19].
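The confusion-matrix analysis above amounts to ranking the largest off-diagonal cells and checking per-class recall. A minimal sketch follows, assuming rows are true classes and columns are predicted classes. The function names are hypothetical, and the matrix values are invented for illustration, not taken from this study's tables.

```python
def most_confused_pairs(matrix, classes, top=3):
    """Return the `top` largest off-diagonal cells as (true, predicted, count).

    Rows are true classes and columns are predicted classes, so
    matrix[i][j] counts samples of classes[i] predicted as classes[j].
    """
    cells = [
        (matrix[i][j], classes[i], classes[j])
        for i in range(len(classes))
        for j in range(len(classes))
        if i != j and matrix[i][j] > 0
    ]
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return [(true, pred, count) for count, true, pred in cells[:top]]


def recall(matrix, classes, cls):
    """Fraction of true `cls` samples that were predicted as `cls`."""
    i = classes.index(cls)
    total = sum(matrix[i])
    return matrix[i][i] / total if total else 0.0


# Illustrative counts only (not taken from this study's tables).
classes = ["V1", "V2", "V2A"]
matrix = [
    [50, 2, 8],
    [1, 60, 9],
    [6, 7, 40],
]
print(most_confused_pairs(matrix, classes, top=2))  # -> [('V2', 'V2A', 9), ('V1', 'V2A', 8)]
print(round(recall(matrix, classes, "V2A"), 3))      # -> 0.755
```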

To give researchers empirical evidence for choosing a sensor configuration in future analyses of XC-skiing techniques, we compared classification accuracies on the training, validation, and test datasets for five sensor combinations: the whole body (17 sensors), the upper body (11 sensors), the lower body (7 sensors), the sports biomechanics configuration (5 sensors), and the pelvis configuration (1 sensor). The combined results (Tables 10, 20 and 22) show that the sports biomechanics configuration (both hands, both feet, and the pelvis) achieves accuracy very similar to that of the 17-sensor whole-body configuration. It is also much more accurate than the pelvis, upper body, and lower body configurations. The low accuracy of the pelvis sensor indicates that this sensor alone cannot capture the complex motions of all body segments during XC-skiing. The moderate accuracies of the upper and lower body configurations are unsurprising, because each provides training data from only one half of the body segments. The sports biomechanics configuration uses only five sensors. Four of them, on the hands and feet, sit at the extremities of the body, where segment motions are the most exaggerated and vigorous. The fifth, on the pelvis, is close to the body's centre of mass and therefore reflects the overall motion of the body segments. Because the results obtained with all 17 sensors and with the five sports biomechanics sensors are very close, we infer that the remaining 12 sensors contribute little additional information for technique classification.
Thus, the sports biomechanics configuration is the optimal sensor set among those tested. Future studies of XC-skiing technique classification can therefore be based on data from this set, with strong experimental support.
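The five configurations can be expressed as subsets of a single 17-sensor recording, so one dataset can feed every comparison. The section names only the hands, feet and pelvis, so the other segment names below are hypothetical. Placing the pelvis in both the upper-body and lower-body subsets is also an assumption, chosen so that the subset sizes match the stated 17, 11, 7, 5 and 1 sensors. `select_channels` is an illustrative helper, not the study's code.

```python
# Hypothetical segment names for a 17-sensor full-body layout.
ALL_SENSORS = [
    "pelvis", "sternum", "head",
    "r_shoulder", "l_shoulder", "r_upper_arm", "l_upper_arm",
    "r_forearm", "l_forearm", "r_hand", "l_hand",
    "r_upper_leg", "l_upper_leg", "r_lower_leg", "l_lower_leg",
    "r_foot", "l_foot",
]
LEG_SENSORS = ALL_SENSORS[11:]  # six leg segments

# Assumed composition of the five configurations compared in the study.
CONFIGS = {
    "whole_body": ALL_SENSORS,                      # 17 sensors
    "upper_body": ALL_SENSORS[:11],                 # 11 sensors, pelvis included
    "lower_body": ["pelvis"] + LEG_SENSORS,         # 7 sensors, pelvis included
    "sports_biomechanics": ["pelvis", "r_hand", "l_hand", "r_foot", "l_foot"],
    "pelvis": ["pelvis"],
}


def select_channels(frames, config):
    """Keep only the gyroscope channels of the sensors in `config`.

    `frames` is a list of dicts mapping sensor name -> (wx, wy, wz) angular
    velocity. Returns one flat vector per frame: 3 values per selected sensor,
    in the fixed order given by CONFIGS[config].
    """
    sensors = CONFIGS[config]
    return [[v for s in sensors for v in frame[s]] for frame in frames]


frame = {s: (1.0, 2.0, 3.0) for s in ALL_SENSORS}
print(len(select_channels([frame], "sports_biomechanics")[0]))  # -> 15
```

With five gyroscopes, each frame reduces to 15 angular-velocity channels instead of 51. This is one reason the smaller configuration is also cheaper to train and run.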

Several previous studies have attempted to classify XC-skiing techniques using hand-crafted rules or machine learning algorithms. Seeberg et al. [18] classified classical XC-skiing techniques with rules derived from the data of 11 skiers and achieved an overall sensitivity of 99% to 100%. However, they classified only the diagonal stride, double poling, and kick double poling techniques and left push-off out of the classification. Among the classical techniques, push-off and double poling are the only two that our algorithm substantially misclassifies, as the test set-1 confusion matrix in Table 18 shows: 172 of 241 push-off instances were incorrectly classified as double poling, and 74 of 295 double poling instances as push-off. Moreover, Seeberg et al. used six IMUs, each containing an accelerometer, a gyroscope, and a magnetometer, for a total of 18 sensors, whereas our model performs classification with only five gyroscope sensors. Our model development also requires neither expert domain knowledge nor a tedious process of deriving hard rules for classification. Rindal et al. [1] classified classical XC-skiing techniques using two sensors, one accelerometer and one gyroscope, on data from 10 participants and achieved an overall accuracy of 93.9% ± 3%. We achieved an overall mean accuracy of 91.15% using data from four subjects and five gyroscope sensors. Our results thus rival those of [1] in accuracy, albeit at the cost of three additional sensors. However, the data in [1] combine recordings from outdoor tracks with recordings on a treadmill in the controlled environment of a laboratory, whereas our data were obtained only on outdoor tracks.
Additionally, Rindal et al. [1] classified only classical XC-skiing techniques, whereas our model classifies classical and skating techniques simultaneously. Our model's classification accuracy also improves considerably as the size of the training data increases. Stöggl et al. [19] classified skating techniques using a single accelerometer on data from 11 skiers recorded on a treadmill in the controlled environment of a laboratory, and achieved an accuracy of 86% ± 9% on the test set. Our model achieves higher accuracy and also classifies classical and skating techniques simultaneously. We therefore conclude that it has greater potential for deployment as a real-time classifier of XC-skiing techniques.
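Returning to the push-off and double poling confusion, the test set-1 counts quoted from Table 18 can be expressed as per-class error rates, which makes the asymmetry explicit. Here $e_{A\to B}$ denotes the fraction of true class $A$ instances predicted as $B$, with PO for push-off and DP for double poling (notation introduced only for this illustration). Only these two cells of Table 18 are cited in the text, so the recalls are upper bounds. Any other misclassifications of the same class would lower them further.

```latex
\begin{aligned}
e_{\text{PO}\to\text{DP}} &= \tfrac{172}{241} \approx 71.4\%, &
\text{Recall}_{\text{PO}} &\le \tfrac{241-172}{241} = \tfrac{69}{241} \approx 28.6\%,\\
e_{\text{DP}\to\text{PO}} &= \tfrac{74}{295} \approx 25.1\%, &
\text{Recall}_{\text{DP}} &\le \tfrac{295-74}{295} = \tfrac{221}{295} \approx 74.9\%.
\end{aligned}
```

A push-off instance in test set-1 is therefore more likely to be labelled as double poling than as push-off. Double poling instances are much less affected.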

Despite its advantages of automatic feature selection, high accuracy, and simultaneous classification of classical and skating XC-skiing techniques, this study has certain limitations. First, practical constraints limited the experimental data to a relatively small sample of four professional skiers for training, validating, and testing our models. These constraints included the unavailability of skiing tracks and the tight training schedules of professional skiers. Further studies with a larger sample are needed to verify these results. Second, although the CNN-LSTM network delivers high accuracy, it is slower to train than a traditional machine learning model because of the large volume of training data fed to it. Traditional machine learning approaches rely on manually designed features that compress the raw data into a small number of features during pre-processing, and they are therefore much faster to train. There is thus a trade-off between the time spent on data pre-processing for traditional algorithms and the time spent training a deep learning model. However, with adequately configured computing systems, the deep learning model can be trained in a reasonable time. Because training takes place offline, the trained model remains suitable for real-time deployment. Third, we treated turning points in the flat course data, and descending and transition points in the natural course data, as noise and removed them manually by identifying the corresponding frames. These points could instead be treated as dummy techniques and passed to the model; studying and analysing this option is left for future work. Last but not least, we used only angular velocity, because it shows clear cyclic patterns and allows higher test-time efficiency. Developing classification models using linear acceleration and magnetic field data is also left for future work.
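There are two ways to handle turns, descents and transitions. This study removed them; the proposed future work would keep them as dummy technique classes. Both options can be sketched as follows. The sketch assumes the noisy frame ranges are already known; in this study they were identified manually. The function name `handle_noise_frames`, the `(start, end)` interval format and the dummy label "OTHER" are hypothetical.

```python
def handle_noise_frames(labels, noise_intervals, mode="drop", dummy="OTHER"):
    """Drop or relabel frames that fall inside the given noise intervals.

    labels: per-frame technique labels.
    noise_intervals: (start, end) frame indices with `end` exclusive, e.g.
        turns on the flat course or descents/transitions on the natural course.
    mode: "drop" removes the noisy frames; "relabel" keeps them under the
        dummy class name `dummy`.
    Returns (frame_index, label) pairs so original frame positions stay traceable.
    """
    if mode not in ("drop", "relabel"):
        raise ValueError(f"unknown mode: {mode!r}")
    noisy = set()
    for start, end in noise_intervals:
        noisy.update(range(start, end))
    result = []
    for i, label in enumerate(labels):
        if i not in noisy:
            result.append((i, label))
        elif mode == "relabel":
            result.append((i, dummy))
    return result


labels = ["DIA", "DIA", "DIA", "DP", "DP", "DP"]
print(handle_noise_frames(labels, [(2, 4)], mode="drop"))
print(handle_noise_frames(labels, [(2, 4)], mode="relabel"))
```

Relabelling keeps the temporal context around transitions, which a sequence model such as an LSTM could exploit. The cost is an extra class that must be learned from comparatively few examples.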
