1. Introduction
Injuries are commonplace in professional soccer. According to a recent study [
1], the overall incidence of injuries in elite male soccer players ranges from 2.5 to 9.4 injuries per 1000 h of exposure. These authors also showed that the risk of injury is higher during matches than during training sessions. One of the most recent epidemiological studies, which highlighted the increase in injuries over the past 16 years, emphasized that muscle injuries were the main cause [
2]. Because injuries are ubiquitous in this type of complex sport [
3], several risk factors are involved, such as the number of matches played and the accumulation of fatigue induced by the workload during and following training sessions. Within this context, non-contact injuries are often regarded as preventable and linked to internal and external risk factors related to workload [
4]. It is therefore essential to quantify properly, over time, the training and competitive match workloads for any injury prediction approach in soccer. In addition, the total training load tends to increase along with the annual performance objectives. Therefore, monitoring the internal load experienced by the player as the combination of the physiological (heart rate measurements [
5]) and psychological (perception questionnaires [
6]) stresses and the external load (i.e., the mechanical work completed by the player) during both training and competition is of fundamental importance to allow the individualization of training activities [
7] as well as the identification of potential injury risk at the individual player level.
Many studies have already examined soccer activity. Randers et al. [
8] indicated that analytical tools such as video and wearable technology, like Global Positioning System (GPS) devices and inertial sensors, can provide accurate mechanical data about players’ activities both during training and in competition. Several important performance-related features have been highlighted, such as distances travelled at different speeds, accelerations, decelerations and maximum speed [
9]. For instance, the average distance travelled in matches by elite soccer players is 9 to 12 km [
9,
10]. Sprinting in particular is often considered a major component of performance, but it ultimately represents only 10% of the total distance covered during matches [
11]. These various metrics among others (e.g., acceleration and deceleration ranges, changes of direction) are regularly used to quantify external load. Both high and low external loads lead to injury risk, with suggestions that there may be an optimum load threshold for individuals [
12]. Besides objective physical tests, it is also possible to use subjective measures to predict injury risk. The session Rating of Perceived Exertion (RPE) has been used for injury risk estimation [
13,
14,
15]. Indeed, recent research in elite soccer recording contact and non-contact injuries has identified a link between internal workload (using session RPE) and injury incidence [
16] while no relationship between internal load and non-contact injuries was observed in other studies [
14].
The large number of features recorded from the assessment of external load (GPS and inertial sensors) and internal load (associated with subjective well-being questionnaires and RPE) can be used together in order to better capture the relationship between internal and external loads [
17] and, in turn, to predict player injuries. However, the larger the volume of collected data, the more complex its management. It is now acknowledged that machine learning methods applied to sport can provide accurate diagnostic and decision tools for training management and injury risk assessment, but they are not yet widely used in the latest scientific studies (see, for a review, Claudino et al. [
18]). One of the first investigations that tried to predict non-contact injuries in team-sports using machine-learning methods was conducted by Rossi et al. [
19]. Starting from the observation that injury risk assessment based on the so-called acute:chronic workload ratio (used, e.g., in Raya-González et al. [
14]) led to poor predictive ability, Rossi et al. [
19] proposed a multi-dimensional approach to injury prediction in professional soccer based on external load data collected through GPS measurements. For that purpose, they trained decision trees that predict whether or not a player is likely to get injured in the next match or training session. Such non-linear models applied only to external training load showed better performance metrics than traditional statistical methods for predicting injury risk [
19]. However, this performance remains far from optimal for the prediction of injuries, i.e., 50% precision and 80% recall. Altogether, to our knowledge, the scientific literature remains very scarce on the problem of predicting injuries in elite soccer from internal and external loads together in a multi-dimensional, non-linear, machine learning based model.
Therefore, given the amount of data collected in modern elite soccer regarding both external and internal loads, it is now relevant to apply machine learning methods to a pre-established set of variables in order to provide useful information for professional coaching. Our main hypothesis is that such machine learning methods can predict injury risk with good performance over two horizons (one week and one month) by considering the combined evolution of both internal and external training loads. The results of the present study, the first to couple the two types of training load, could guide the programming and individualization of physical training with the aim of controlling and thus reducing the risk of injury. The proposed approach is evaluated in two steps. First, external and internal load features are combined to predict injuries with better performance than past studies using only external load. Second, the classification algorithms that perform best on these features are selected. To this end, various standard machine learning algorithms are compared using standard evaluation metrics such as accuracy, precision, recall and the area under the receiver operating characteristic curve (AUC). In particular, tree-based algorithms are favoured for their interpretability, as they can help to identify and understand how GPS and questionnaire variables affect injury risk.
2. Materials and Methods
2.1. Procedures and Data Collection
Forty players (mean ± SD; age 29.4 ± 5.8 years; height 175.3 ± 5.2 cm; body mass 76.5 ± 8.2 kg) covering all offensive and defensive position groups (9 central defenders, 8 fullbacks, 10 central midfielders, 6 wide midfielders, 7 forwards) from the same elite soccer club competing in the French Ligue 2 participated in a full-season (2017/2018) data collection. The study was conducted according to the requirements of the Declaration of Helsinki. Participants gave their written informed consent to participate in the study. Approval for the study was obtained from the Club, as players’ data were routinely collected throughout the season.
The training workload, perceived well-being questionnaires and injury data were monitored over the pre-season period and during the entire competitive period from June 2017 to May 2018, taking into account the different breaks between these periods, the international breaks and the winter break. A total of 245 training sessions, 38 Domino’s Ligue 2 matches, 2 Coupe de la Ligue matches and 3 Coupe de France matches were recorded and analyzed. Altogether, the average recording time in training and match was 68 ± 24 min and 105 ± 11 min, respectively. The average distance covered by all players per training session and per match was 4817 ± 1965 m and 7694 ± 1527 m, respectively, and the average duration was 65 ± 13 min per training session and 78 ± 16 min per match.
During those periods, 142 injuries were inferred from the training notes containing the list of injured players for each training session. The injuries concerned 33 different players.
Figure 1 shows the number of injuries per player: 12 players were injured only once, 5 players were injured 6 times and 1 player had 12 injuries. It is important to note that the actual injury times and causes were not known. It was assumed that when a player was reported as injured at a training session, the injury had in fact occurred at the previous training session. The injury labels therefore contain some uncertainty that was not taken into account in this study.
Various types of training load features (see
Table 1) were collected at a professional soccer club from 40 players during official competitive matches and pre-season preparation matches, as well as before, during and after training sessions. A first set of features concerned the player’s activity, recorded using a GPS tracking system. The GPS system allows real-time player tracking and rapid post hoc analysis by the coaching staff. This first set of features reflects the external training load, i.e., the objective physical work performed by the player. The player’s physical activity during each training session and match was measured using a portable 10 Hz GPS system (Optimeye S5, Catapult Innovations, Melbourne, Australia) integrated with a 100 Hz triaxial accelerometer and a gyroscope. The accelerometer and gyroscope components combined with 10 Hz GPS systems have shown acceptable levels of reliability and validity in team sports for distance and high-speed distance-based metrics [
20,
21]. Four main external load features were measured: maximum speed, total distance covered, and number of accelerations and decelerations. Based on the dedicated literature [
22,
23], the following external training load features were retained: the total distance travelled in each specific speed zone (0–1 km/h, 0–6 km/h, 6–15 km/h, 15–20 km/h, 20–25 km/h, >25 km/h) and the PlayerLoad™ (athlete’s mechanical fatigue index according to Barrett, Midgley, and Lovell [
24]), which is a modified vector magnitude expressed as the square root of the sum of the squared instantaneous rates of change in acceleration in each of the three planes, divided by 100.
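As an illustration of the PlayerLoad™ definition above, the following minimal Python sketch computes the quantity from a sequence of triaxial accelerometer samples. The function name `player_load`, the tuple layout of the samples and the choice to accumulate the instantaneous values over a whole session are assumptions made for this example; the manufacturer's implementation may differ (e.g., in filtering or units).

```python
import math


def player_load(samples):
    """Accumulate a PlayerLoad-style index over a session.

    `samples` is a sequence of (ax, ay, az) accelerometer readings taken at
    a fixed rate. For each pair of consecutive samples, the instantaneous
    value is the square root of the sum of the squared changes in
    acceleration along the three axes, divided by 100. Summing these values
    over the session is an assumption of this sketch.
    """
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(samples, samples[1:]):
        change = math.sqrt((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2)
        total += change / 100
    return total


# Three hypothetical readings (arbitrary units), giving two increments.
demo_samples = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
demo_load = player_load(demo_samples)
print(f"PlayerLoad-style index: {demo_load:.2f}")
```

With fewer than two samples there is no change in acceleration to measure, so the function returns 0.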
2.2. Predicting Injuries
This section presents our approach to predicting injuries based on a dataset containing both external load and internal load features. As external load features (GPS
) have already shown good results for injury prediction in soccer [
19], this study aims to unveil the predictive power of internal load features (questionnaires) relative to external load ones. Several classifiers were optimised and compared in terms of predictive performance. Two prediction horizons were considered in this study: injury at 1 week and injury at 1 month.
The models thus constructed can therefore serve as an alert for any new training session for which an injury is predicted, and can be used as an aid to training planning and adjustment. Moreover, interpreting the models can give experts a better understanding of when, how and why injuries happen.
2.3. Data Pre-Processing and Evaluation Protocol
Missing values were imputed by the mean (for numerical variables) or by the most frequent value (for categorical variables) before the model comparison was made. Categorical variables were transformed into binary dummy variables so that they could be handled by all models.
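The pre-processing described above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the helper name `preprocess`, the column names `total_distance_m` and `mood`, and the toy values are hypothetical and do not come from the study's dataset.

```python
import pandas as pd


def preprocess(df):
    """Impute missing values, then one-hot encode categorical columns."""
    out = df.copy()
    categorical = []
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numerical variable: replace missing values by the column mean.
            out[col] = out[col].fillna(out[col].mean())
        else:
            # Categorical variable: replace missing values by the most
            # frequent category (assumes at least one observed value).
            out[col] = out[col].fillna(out[col].mode().iloc[0])
            categorical.append(col)
    # Turn each categorical column into binary dummy columns.
    return pd.get_dummies(out, columns=categorical)


# Hypothetical toy sessions with one missing value per column.
sessions = pd.DataFrame(
    {
        "total_distance_m": [4800.0, None, 5200.0, 5000.0],
        "mood": ["good", "bad", None, "good"],
    }
)
clean = preprocess(sessions)
print(clean)
```

In this toy table the missing distance is replaced by the mean of the three observed values, and the missing mood by the most frequent category before dummy encoding.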
Once the dataset was built, the hyper-parameters of all models were tuned using a Bayesian optimisation procedure (Python package scikit-optimize) according to the different evaluation metrics. Because this step served to tune the models ahead of the comparisons between model behaviours and feature sets, rather than to perform strict model selection for direct use on new unlabelled data, Bayesian optimisation was performed before the main experiments and was not included in our evaluation protocol. The values of the tuned hyper-parameters are given in
Appendix A (see
Table A1 and
Table A2).
Finally, the models were evaluated by 10-fold cross-validation using 4 measures of predictive performance (see
Table 2) for the two prediction horizons previously mentioned (1 week and 1 month). This process was repeated 10 times to check the stability of the models’ performance.
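A minimal sketch of this evaluation protocol with scikit-learn is given below. It assumes a synthetic, imbalanced dataset generated with `make_classification` in place of the study's data, a shallow decision tree as the classifier, and stratified folds; stratification is an assumption of this sketch and is not stated in the protocol above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the injury dataset (about 25% positive labels).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.75, 0.25], random_state=0
)

# 10-fold cross-validation repeated 10 times, i.e., 100 train/test splits.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
metrics = ["accuracy", "precision", "recall", "roc_auc"]

scores = cross_validate(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    X,
    y,
    cv=cv,
    scoring=metrics,
)

# Mean and standard deviation across the 100 splits indicate stability.
summary = {
    m: (scores[f"test_{m}"].mean(), scores[f"test_{m}"].std()) for m in metrics
}
for name, (mean, std) in summary.items():
    print(f"{name:>9s}: {mean:.3f} ± {std:.3f}")
```

The spread of each metric over the repeated splits gives a rough indication of how stable a model's performance is.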
2.4. Predictive Models
The learning models considered in this study are the following:
K-Nearest Neighbours (KNN)
Linear Discriminant Analysis (LDA) [
25,
26].
Logistic regression (logit)
Ridge classifier (Ridge)
Gaussian Naive Bayes classifier (GNB) [
27,
28].
Classification tree (tree) [
29].
Random forest (forest) [
30].
Support Vector Machine (SVM) [
31].
Multi-Layer Perceptron (MLP) [
32,
33,
34].
eXtreme Gradient Boosting (XGB)
KNN classifiers are very simple to implement but have two main drawbacks: they involve high computation times for large datasets, and they are hard to interpret, since the distance computations between examples involve no explicit feature selection or weighting. Classification trees are basic classifiers that can be used in non-linear contexts. They are often used for their graphical outputs, which are easily interpretable and help visualise the impact of multi-dimensional features on class variables. In our context, such tools could help experts to gain knowledge about the relation between training loads and injury risk. They were compared to different generalised linear classifiers, which are usually categorized as generative or discriminative models [
35]. Naive Bayes classifier and LDA were used as standard generative models, logit, Ridge, MLP and SVM as discriminative ones. In Rossi et al. [
19], the authors found that the classification tree had higher predictive performance than other models (including random forest), but since ensemble models are usually more accurate than single trees, forest and XGB models were also included in this study. Moreover, all tree-based classifiers (tree, forest and XGB) provide feature weights, which are valuable for model interpretation.
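To illustrate how tree-based classifiers expose feature weights, the sketch below fits a scikit-learn random forest on synthetic data and ranks its `feature_importances_`. The feature names and the rule used to generate the labels are hypothetical and chosen only so that one feature clearly dominates; they are not results of this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names mixing questionnaire and GPS variables.
feature_names = ["pre_session_fatigue", "sleep_quality", "total_distance", "max_speed"]

rng = np.random.default_rng(0)
X = rng.random((300, len(feature_names)))
# Synthetic labels driven by the first feature only (illustrative rule).
y = (X[:, 0] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, weight in ranked:
    print(f"{name:>20s}  {weight:.3f}")
```

Because the synthetic labels depend only on the first feature, it receives most of the weight; on real data, such a ranking is what would point to the variables worth monitoring.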
Different sets of attributes (see
Table 3) were considered in order to highlight the potential predictability of injuries levels. First, only the number of past injuries was used as predictor of future injury, then personal features (age, height, weight and
) were added to the learning data. The GPS
and questionnaire data were first considered separately (in addition to past injuries and personal features), and finally the largest set of variables combined all the input variables (see
Table 1).
All models were compared to a baseline approach (B), which consists in systematically predicting the most frequent class (e.g., if 75% of the examples are labelled no injury and 25% injury, B will systematically predict no injury).
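The baseline B can be sketched in a few lines of plain Python. The helper names and the toy labels (1 = injury, 0 = no injury) are assumptions for illustration, as is the convention of reporting a precision of 0 when no positive prediction is made.

```python
from collections import Counter


def majority_baseline(train_labels, n_predictions):
    """Predict the most frequent training class for every new example."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_predictions


def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Convention of this sketch: 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy data matching the 75% / 25% example: 1 = injury, 0 = no injury.
train_labels = [0] * 75 + [1] * 25
test_labels = [0, 0, 0, 1]
baseline_pred = majority_baseline(train_labels, len(test_labels))
accuracy, precision, recall = evaluate(test_labels, baseline_pred)
print(accuracy, precision, recall)
```

On such data the baseline reaches 75% accuracy while detecting no injury at all (recall of 0), which is why precision and recall are reported alongside accuracy.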
All experiments were performed in Python with the following libraries: pandas, xgboost, matplotlib, IPython and pydotplus. Performance results were plotted with the ggplot2 package of R.
4. Discussion
Several notable findings emerge from the overall results of this study. First, for 1-week injury prediction, questionnaire data (internal load features) yield more accurate predictions than GPS data (external load features), which even tend to degrade injury prediction when included in the learning data. For 1-month injury prediction, the classifiers learnt from GPS or questionnaire data show roughly the same performance levels, the best performance usually being reached when combining GPS and questionnaire data. In terms of interpretation, decision tree graphs and feature importance weights highlighted a specific player profile at high injury risk and some specific features involved in precision and recall optimisation.
To the best of our knowledge, the work of Rossi et al. [
19] is the only one that used a non-linear classifier (a decision tree) in a multi-dimensional context to predict injuries in elite soccer. We therefore focus part of our discussion on this study. For comparison, the decision trees used in the study by Rossi et al. [
19] detected about 80% of the injuries in the sample analyzed with a precision of approximately 50% (with external load features). By comparison, the algorithm used in our machine learning approach would classify the so-called at-risk players more accurately given the past occurrence of injuries, so that players could continue to perform without being disturbed by “false alarms”. The accuracy of this tree, particularly at 1 week, which differs from Rossi et al. [
19], is made possible by linking GPS data and subjective questionnaires within the classifiers, which justifies the contribution of this work to the current literature linking data science and sport science [
17,
19,
36].
In the present study, we showed that subjective variables have a very high predictive/explanatory potential (compared to objective variables), but they are more costly to collect: having all players complete questionnaires before and after training can be complicated given their tight schedules and their willingness to do so. Nevertheless, professional teams that cannot equip players with GPS sensors for practical or economic reasons should consider using questionnaires in order to detect players at high injury risk [
37,
38].
Another point that validates the choice of tree-based classifiers is that those models naturally provide feature importance weights that can help coaches to monitor some specific indicators and be used as useful decision support tools for training optimization. It should be noted that in this case, subjective questionnaires are very valuable especially for short-term prediction even when they are completed by only some players at some training sessions. Except for 1-week injury risk precision, ensemble models seem preferable compared to single trees even if they do not provide single tree graphs. In addition, the interest of this study lies in the coupling of the machine learning methods and the variations of the training load (internal and external). It can be noticed that even when both types of features (GPS and questionnaires) are used as inputs, the most important and sensitive features are almost always associated with subjective variables. It can therefore be hypothesized that with these data and this sample in this particular situation, internal load would be a determining factor in the prediction of injury. In other words, it would be essential for each coach to pay particular attention to the athletes’ feelings before and after training sessions in order to prevent injuries from occurring.
To conclude, the fact that questionnaire features can replace GPS features, and even increase predictive performance by doing so, suggests that part of the information related to external load is already contained in the internal load. While individuals may perform the same external load, their ability to respond to this output (internal load) may differ [
17,
39]. Utilizing both measures provides a comprehensive view of whether an individual is in a state of “readiness” and able to tolerate high loads, or in a state of “fatigue” and potentially at risk of injury or decreased performance. Internal load, which reflects the response to external load, provides additional information about the players that external load alone cannot capture. In our study, we highlighted that several subjective questionnaires likely reflect different aspects of the training load related to the stress that the players may tolerate. For instance, monitoring pre-training perceived fatigue, mood, pain, shape and sleep for each player may offer an indication, prior to a session, of the quality of the external output that might be produced, and gives coaches the ability to make adjustments if warranted. Monitoring is not limited to either subjective or objective measures; instead, they can be used to complement each other. This is consistent with recommendations [
38]. To sum up, the potential efficacy of subjective measures for soccer player monitoring has been established; however, optimal implementation practices are yet to be determined.
Limitation and Future Directions
As in any study based on preliminary data, some limitations exist, which are also potential sources of improvement. First, a larger sample size, extending to several teams with different training strategies over multiple seasons, would allow more general conclusions to be drawn about injury prediction. In addition, the collection of GPS and questionnaire data and the imputation methods can also be improved. With regard to the questionnaires, it would be essential to observe the influence of greater diligence by players in completing them. As for the GPS data, they are available in averaged form relative to their initial acquisition frequency of 10 Hz. In the race for performance, it would be interesting to observe the consequences of using all the raw values acquired at this frequency. Also, due to the differences between players, individualization could be considered for the variables relating to external load (data extracted from GPS), by computing player-specific speed and acceleration thresholds beyond which injuries are likely to occur. By doing so, the predictive potential of GPS variables could be greatly increased, which could influence the training strategies implemented by coaches. Since non-injured players are much easier to find in datasets and injury is not a controllable factor, data augmentation could be used to simulate more injury examples from the real ones. Those artificial examples would probably improve the predictive performance of classifiers.
5. Conclusions
The objective of this study was to address the use of various machine learning methods for injury prediction from the athlete’s internal and external loads jointly. The results of this study show that, depending on the complexity of the predictive model, the values of the different predictive metrics for injury prediction are close to 100%, especially with a 1-month time horizon. In addition, the subjective variables (i.e., internal load) of the pre-session questionnaire (such as sleep quality, fatigue, shape and mood), as well as those of the post-session questionnaire (satisfaction and pleasure) and RPE, appear to be determining factors in the occurrence of injuries. Overall, our findings provide further justification for the implementation of a team-wide monitoring strategy of internal load in elite soccer players.
Finally, although the preliminary results of this paper appear encouraging and relevant, future research with a larger sample size, involving several teams from the same championship, could provide sufficient data to move from specific conclusions to general ones about machine learning methods.