1. Introduction
Alpine skiing is a competitive and complex sport that has been part of the Winter Olympics since 1936. Most studies in this area have focused on performance analysis, in which scholars analyzed the biomechanics of skiers to identify the main factors affecting alpine skiing performance, such as turning technique, aerodynamic drag, ground reaction force, and turn radius [1,2]. As skiers optimize these factors to enhance their performance, the risk of injury increases [3]. Scholars have therefore developed an interest in factors related to injury risk [4]. In [5], the authors studied the effect of ski sidecut on turning mechanics in the context of injury prevention. This motivated scholars to examine turns in detail as the fundamental element of alpine skiing [4] and to develop methods to detect turns [6,7].
The studies mentioned above used a range of sensors, from video cameras to wearable sensors. Additionally, most of the analyses were performed under controlled conditions, either in laboratories or over a limited number of gates. One potential way to extend these studies is to use inertial measurement units (IMUs) to detect all activities during a whole day of skiing, in which a skier performs several skiing techniques. Each activity could then be analyzed in detail.
In recent years, interest in human activity recognition (HAR) has grown owing to advances in wearable and visual sensors [8]. The goal of HAR is to classify incoming signals from human motion into categories of activities, such as activities of daily living. Despite the wealth of work on daily activities, the application of HAR to alpine skiing has not been researched extensively [9]. Additionally, many studies have employed sensors, ranging from cameras to commercial IMUs, that are not suitable for daily use.
In [10], the authors investigated where on the skier’s body an inertial sensor should be attached to collect the most informative signals. They concluded that the pelvis is the best location, although other parts of the body could serve as alternatives because they yielded similar results in their analysis. In [7], the authors developed an algorithm to detect the starting point of a turn that is suitable for regular use. Another study on turn detection [11] examined the use of gyroscopes. The data for both of these studies were collected under controlled conditions, and the methods still need validation in the wild. These studies show that turn behavior is well represented in IMU signals, which is key to distinguishing alpine skiing activities. In [12], the authors classified four popular turning styles (snowplow, snowplow-steering, drifting, and carving) using a global navigation satellite system (GNSS) and an IMU, analyzing a dataset of 2000 turns from 20 advanced skiers. In another study, Han et al. employed a motion sensor and a piezo transducer to collect data from subjects and analyzed the data in a supervised manner to predict the status of a skier during winter sports such as alpine skiing and snowboarding [13]. In [14], three IMU sensors were attached to the skier’s chest and skis to collect data from several skiers on one slope. This dataset was analyzed using two long short-term memory (LSTM) networks for skiing activity recognition to detect left/right turns, left/right leg lifts, ski orientation, and body position. Although the LSTM classification results show high accuracy, the model needs further validation in different locations and under varied conditions. Additionally, the authors did not report the skill level of the subjects in their study.
Our ultimate goal is to provide recreational alpine skiers with performance analysis insights about their skiing. In this study, however, the primary goal is an unobtrusive sensor setup that is feasible for daily use, together with an algorithm that can distinguish between skiing and not skiing. Today, smartphones equipped with IMUs make it possible to detect many kinds of activities on the phone [15]. In this work, we investigate the application of unsupervised machine learning to activity recognition for recreational alpine skiers using a smartphone IMU. HAR has often been formulated as a supervised task [15,16]. Here, we study the use of unsupervised learning to distinguish skiing activities from other activities. We prefer unsupervised learning over supervised learning for four reasons. First, it does not depend on large labeled samples [17], which are not easy to gather in the case of alpine skiing. Second, in this study we only need to find the beginning and the end of skiing activities, and we do not differentiate skiing techniques from each other. Third, every subject has a different skiing style depending on their level of expertise, and the varying patterns generated by each skier may negatively affect a supervised learning process. Finally, unsupervised learning increases scalability: more data, and therefore more subjects, can be added to the project without any labeling effort.
In the rest of the paper, we first describe the data collection in detail. We then go through data preprocessing, including orientation tracking, filtering, and feature engineering. The results section compares the results obtained with different settings and algorithms. Finally, we summarize the experiment and outline future work.
4. Discussion
The goal of the study was to detect alpine skiing activities with a smartphone IMU, in an unsupervised manner that is feasible for daily use. Our results show that, with a smartphone placed in the skier’s right-side pocket, it is possible to record signals that are informative enough to recognize alpine skiing activities using unsupervised learning.
Even though orientation tracking works well in isolating gravity, there is still an issue with the proper decomposition of acceleration. The problem is rotation about the gravity vector (the Y-axis of the world frame in our study), which orientation tracking does not resolve. One possible solution is yaw correction using the magnetometer. This enhancement is especially important for further analysis in which we take a closer look at each activity, classify activities into different techniques, and avoid drift in speed estimation.
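The ambiguity can be illustrated with a minimal Python sketch. It assumes, as in the text, that gravity lies along the world Y-axis; the function name `rotate_about_y` and the numeric values (a 30° heading error and a 1.5 m/s² horizontal acceleration) are hypothetical and chosen only for illustration. It is not the orientation tracker used in this study.

```python
import math


def rotate_about_y(v, yaw_rad):
    """Rotate a 3-vector (x, y, z) about the Y-axis by yaw_rad radians."""
    x, y, z = v
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (c * x + s * z, y, -s * x + c * z)


# Assumption: gravity is aligned with the world Y-axis, as stated in the text.
GRAVITY = (0.0, 9.81, 0.0)
# Hypothetical horizontal (forward) acceleration of the skier, in m/s^2.
forward_acc = (1.5, 0.0, 0.0)
# Hypothetical, unobserved heading (yaw) error.
yaw_error = math.radians(30.0)

gravity_rotated = rotate_about_y(GRAVITY, yaw_error)
acc_rotated = rotate_about_y(forward_acc, yaw_error)

print("gravity before/after:", GRAVITY, gravity_rotated)
print("forward acceleration before/after:", forward_acc, acc_rotated)
```

Gravity is unchanged by the rotation, so a gravity-based estimate gives no information about the yaw angle, whereas the horizontal acceleration is split differently between the X and Z axes. An additional heading reference such as the magnetometer is needed to fix this split.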
We applied unsupervised learning in our analysis because it allows us to start with a small dataset of alpine skiing activities and to increase the number of samples incrementally without additional training. In contrast, supervised learning needs a large dataset of labeled skiing activities, which is not easy to collect. We therefore do not use the labeled data in the learning process; the labels are, however, necessary for evaluation. This implies that the quality of the data labeling has a direct impact on the assessment of each method, as in Figure 7a, where a part of the activity is not labeled but is detected. In addition, we do not distinguish different alpine skiing techniques from each other, so we only need to find the beginning and the end of activities. Although the results show that our approach separates skiing activities from the other activities with acceptable accuracy, there is still room for improvement.
A closer look at the detected skiing activities shows that the beginning and end of each one include some semi-skiing activities (Figure 7b), which should be excluded for an accurate detection pipeline. This issue results from using fixed-size windows, where a single window can cover mostly activity together with some semi-skiing activity; see the similarity between a window of “mainly activity” and a window of “activity” in Figure 6. Another benefit of unsupervised learning is that we can automatically label recorded skiing activities from different recreational skiers as Skiing and Not-Skiing and then, as future work, classify them into different skiing styles. Additionally, any skier can easily record their data using our mobile application. Combined with the mobile application, this unsupervised approach saves considerable time and eases data collection.
Some of the sessions are very long, so accuracy remains high even when some parts of the activities are not detected. In such cases, a very high accuracy value does not show that the algorithm works perfectly, whereas clustering metrics are more descriptive and reveal these differences. For example, in a session of more than one hour that contains seven skiing activities, our chosen model finds all seven activities with an accuracy of 99.17, and GMM_PCA recognizes eight activities with an accuracy of 98.12. While there is no significant difference between these accuracies, the clustering metrics vary considerably. Figure 8 and Table 6 illustrate this effect further and show how overfitting affects the clustering metrics. Thus, as the number of samples increases, NMI and ARI are more reliable for evaluation because, as measures of clustering quality, they are negatively affected by both overfitting and underfitting.
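The gap between accuracy and clustering metrics on long, imbalanced sessions can be reproduced with a small Python sketch. The session below is hypothetical (1000 windows, of which 50 are skiing) and is not data from this study; the two predictions, `pred_a` and `pred_b`, are invented for illustration. ARI is computed from its standard pair-counting definition rather than with a library, and NMI is omitted for brevity.

```python
from collections import Counter
from math import comb


def accuracy(y_true, y_pred):
    """Fraction of windows whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_rand_index(y_true, y_pred):
    """Adjusted Rand index from the contingency table (pair counting)."""
    n = len(y_true)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate case: a single cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical long session: 950 Not-Skiing windows (0), 50 Skiing windows (1).
y_true = [0] * 950 + [1] * 50
# Hypothetical model A misses 5 skiing windows; model B misses 20.
pred_a = [0] * 950 + [0] * 5 + [1] * 45
pred_b = [0] * 950 + [0] * 20 + [1] * 30

acc_a, acc_b = accuracy(y_true, pred_a), accuracy(y_true, pred_b)
ari_a, ari_b = adjusted_rand_index(y_true, pred_a), adjusted_rand_index(y_true, pred_b)

print(f"accuracy: A={acc_a:.3f}  B={acc_b:.3f}")
print(f"ARI:      A={ari_a:.3f}  B={ari_b:.3f}")
```

In this toy session, the accuracies differ by only 1.5 percentage points (0.995 versus 0.980), while the ARI values are much further apart. The large Not-Skiing class dominates accuracy, whereas ARI is sensitive to how well the small Skiing cluster is recovered.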
In our experiment, we tried to make the data collection as varied as possible. However, female skiers are still missing from the dataset. In the future, the proposed algorithm must be evaluated and validated with female skiers of various skill levels to ensure that it can detect all alpine skiing activities independently of the users and their physical characteristics. Additionally, we have only two novice skiers. Since novice skiers generate different patterns than more advanced skiers, the approach needs to be examined with more beginners to confirm that it works similarly for skiers of any skill level. Moreover, for one of the novice skiers, our chosen model detects only three of five activities. Figure 9 compares two subjects, one expert and one novice skier. As the figure shows, the expert skier performs the skiing activity faster while generating a consistent pattern.
Three steps in the pipeline are time-consuming and affect the response time. First, orientation tracking has to be applied to the entire input signal, so its cost depends on the number of samples, that is, on the length and sampling frequency of the input. One simple solution is to sub-sample the input signal to 50 Hz, which has been reported to retain enough information for high-frequency activities [29]. Second, feature extraction depends heavily on the windowing strategy. Previous studies concluded that window sizes of 2, 3, 4, or 5 s are ideal for HAR applications [30,31], which mainly involve low-speed activities. However, our analysis shows that short windows generate more samples in the feature space than long windows, which increases processing time. One solution is to use larger windows where doing so causes no significant difference in accuracy. Finally, the cost of clustering methods depends heavily on the number of samples. As shown in [32], KMeans works better on larger datasets, so it remains an option whenever the input is a large set of features.
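The effect of sub-sampling and window size on the number of samples in the feature space can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline used in this study: the one-hour recording at 100 Hz, the helper names `downsample` and `count_windows`, and the `overlap` parameter are hypothetical, and the naive sub-sampling omits the low-pass (anti-aliasing) filter that a real pipeline would normally apply first.

```python
def downsample(signal, source_hz, target_hz=50):
    """Keep every k-th sample (no anti-aliasing filter; illustration only).

    Assumes source_hz is an integer multiple of target_hz.
    """
    if source_hz % target_hz != 0:
        raise ValueError("source_hz must be an integer multiple of target_hz")
    return signal[:: source_hz // target_hz]


def count_windows(n_samples, rate_hz, window_s, overlap=0.0):
    """Number of complete fixed-size windows that fit into the signal."""
    size = int(window_s * rate_hz)
    hop = max(1, int(size * (1.0 - overlap)))
    if n_samples < size:
        return 0
    return (n_samples - size) // hop + 1


# Hypothetical one-hour recording sampled at 100 Hz (placeholder values).
one_hour_100hz = [0.0] * (3600 * 100)
signal_50hz = downsample(one_hour_100hz, source_hz=100)

# One feature vector is produced per window, so fewer windows mean
# fewer samples to extract features from and to cluster.
window_counts = {w: count_windows(len(signal_50hz), 50, w) for w in (2, 3, 4, 5)}
for window_s, n_windows in window_counts.items():
    print(f"{window_s} s windows -> {n_windows} feature vectors")
```

With non-overlapping windows, moving from 2 s to 5 s windows reduces the number of feature vectors for this hypothetical hour from 1800 to 720, which is the trade-off between temporal resolution and processing time described above.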