1. Introduction
Increasing safety is one of the most crucial priorities in air transport, leading aeronautical authorities and industries to constantly increase aircraft reliability, among other initiatives. Aviation safety is a very complex area that can be improved not only by taking into account the reliability of an aircraft and its systems but also by considering and monitoring the performance of the crews and of everyone involved in the aviation system. Seventy-five percent of aircraft accidents are related to human errors that, in most cases, derive from the mental workload and fatigue of pilots and aviation operators in general [1]. Human error can be a consequence of design flaws, inadequate training, incorrect procedures, and outdated manuals, among other technical, environmental and organizational factors [2]. For example, the increasing technology in aircraft systems, on the one hand, supports the pilots in reading, interpreting, and controlling flight variables and helps them carry out their tasks more efficiently and accurately. On the other hand, it can greatly increase the mental effort needed to manage the huge amount of information displayed. The drawbacks of the continuing advance in avionics technology, along with job responsibilities, irregular sleep and stressful shifts, can consequently cause fatigue and work overload problems [1,3,4,5]. On this basis, in recent decades, the accurate comprehension of the psychological and physiological characteristics associated with human errors has become of significant interest. Awareness of the pilots’ psycho-physical state and real-time monitoring of the mental workload during simulator training or in real flight can improve the efficiency of a crew and the safety of aviation operations. Both high and low workload conditions can affect flight performance and predispose crews to making errors [6,7,8].
Different methods are used to evaluate the mental workload; they can be grouped into three main categories: task-related performance measurements, subjective assessments, and objective evaluation [9]. The first category is directly related to the definition of performance provided by Paas et al. [10]. Performance can be roughly defined as the effectiveness in accomplishing a particular task, and thus the workload can be evaluated by analyzing the decrease in the pilots’ performance.
The subjective assessment is based on standardized evaluations developed specifically to make comparisons feasible; questionnaires belong to this category. However, since the responses are subjective, the results may differ depending on the personal interpretation of the evaluated criteria. For example, newly trained pilots may experience different workload levels for the same task compared to more experienced pilots. On the other hand, physiological data are obtained from biometric sensors and provide reliable information for evaluating and comparing workload [11,12,13].
Despite the differences, all methods of analysis should be employed in the evaluation of workload. The subjective assessment contains information related to the pilot’s perception that can only be investigated by this method. Simplicity, low cost, and, in some cases, the lack of invasive procedures [14] are reasons that motivate the use of subjective measurements [15,16]. The National Aeronautics and Space Administration Task Load Index (NASA-TLX) is a commonly used instrument for subjective workload measurement due to its sensitivity and reliability [2,17,18,19]. There are two ways for a pilot to evaluate a stressful situation: (i) by a scale, usually numeric; (ii) by comparison between different items. The first method requires the pilot to attribute a value to the accomplished task, while in the comparison between items, the pilot chooses the type of demand that was perceived as most prominent.
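The two evaluation modes described above can be combined into a single overall workload (OW) score. The following is a minimal sketch, assuming the commonly described weighted NASA-TLX procedure (six subscales rated 0 to 100, weights taken from 15 pairwise comparisons); whether this exact variant was applied here is not stated in this section, and the ratings and choices below are invented for illustration.

```python
from itertools import combinations

# The six NASA-TLX subscales.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def tlx_weights(pair_winners):
    """Count how often each subscale was picked across the 15 pairwise comparisons."""
    pairs = list(combinations(SUBSCALES, 2))
    if len(pair_winners) != len(pairs):
        raise ValueError("expected one winner per pair (15 in total)")
    weights = dict.fromkeys(SUBSCALES, 0)
    for pair, winner in zip(pairs, pair_winners):
        if winner not in pair:
            raise ValueError(f"{winner!r} is not part of pair {pair}")
        weights[winner] += 1
    return weights

def overall_workload(ratings, weights):
    """Weighted mean of the 0-100 ratings; the weights sum to 15."""
    total = sum(weights.values())
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / total

# Hypothetical pilot: all values below are invented.
ratings = {"mental": 80, "physical": 30, "temporal": 60,
           "performance": 40, "effort": 70, "frustration": 20}
# Stand-in for the pilot's choices: in each pair, pick the higher-rated subscale.
winners = [max(pair, key=ratings.get) for pair in combinations(SUBSCALES, 2)]
weights = tlx_weights(winners)
ow = overall_workload(ratings, weights)
```

In practice the pairwise choices come from the pilot rather than from the ratings; the shortcut above only keeps the example self-contained.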
For the objective measurement, the workload can be evaluated through the analysis of physiological signals directly related to the body’s natural response to stressful situations. Heart Rate Variability (HRV), body temperature, cerebral activity and ocular movement are examples of measurements used to analyze the workload level [8,20,21]. Due to technological advances, the devices available today are smaller, portable, more comfortable and more reliable [22]. Such advantages are especially relevant in a cramped workspace such as the cockpit. A compact, wearable device with no connecting cables is mandatory for in-flight activity analysis so as not to obstruct the use of the cockpit instruments. Moreover, distributed sensors potentially allow a more detailed analysis of each action and the identification of the main factors that contribute to stress. On the other hand, these measurements are not faultless. Some measurements are sensitive to the activity being carried out, especially in the case of multiple tasks, so more than one methodology should be considered for the evaluation of the workload [23].
This work presents a study on the relationship between the collected HRV biometric data of the pilots and the subjective data collected through the NASA-TLX questionnaire. In particular, the cardiac rhythm is used to determine the body’s natural response to stressful situations, while the subjective measurements are used to analyze how the pilot perceives these workloads. Moreover, an index is introduced with the aim of quantifying the performance of pilots in executing the requested tasks. The motivation lies in the fact that a reliable correlation of objective and subjective data can aid in the development of tools for real-time measurement of fatigue and workload, enabling more effective prevention of incidents and accidents.
5. Correlation Analysis
The previous sections analyzed the different objective and subjective parameters in terms of HRV, MEI, and OW. The considerations made in those sections suggest that the association between maneuver performance, subjective perception of workload and its objective estimation during flight is not obvious. For this reason, a correlation analysis between the variables is carried out in the present section. The primary purpose is to identify the potential presence of a relationship between the NASA-TLX and the objective parameters. The results obtained for all the pairs are presented below by means of correlation matrices. To assess the presence of a correlation, two different methods of analysis were used: Spearman’s rho [41] and the Randomized Dependence Coefficient (RDC) [42].
For the Spearman’s rho method, the correlation matrix provides values between −1 and +1 [43]. If the calculated coefficient is 0, there is no monotonic relationship. A coefficient of −1 or +1 indicates a perfect monotonic relationship, decreasing or increasing respectively. Absolute values of rho equal to 0.10, 0.30, and 0.50 are representative of small, medium, and large effects, respectively [44]. For Spearman’s rho, the significance value p has also been calculated. A low p-value suggests the presence of a monotonic relationship between the two variables.
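As an illustration of the quantities just described, the sketch below computes Spearman’s rho as the Pearson correlation of the ranks and estimates a two-sided p-value by permutation. The permutation test is an assumption made here for self-containment; this section does not state how p was computed. The function names and the OW and LF/HF values are hypothetical.

```python
import random

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; both inputs must be non-constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |rho| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Invented values for eight pilots, for illustration only.
ow = [55, 62, 48, 70, 66, 58, 74, 51]
lf_hf = [2.1, 1.8, 2.6, 1.2, 1.9, 2.0, 1.1, 2.4]
rho = spearman_rho(ow, lf_hf)
p = permutation_p_value(ow, lf_hf)
```

With small samples, a permutation p-value avoids relying on a large-sample approximation, at the cost of some Monte Carlo noise.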
On the other hand, the RDC was used to search for non-linear correlations. The correlation matrix provides positive values between 0 and +1 [42]. Values close to +1 suggest the presence of an association pattern but give no information about its type or direction. In the present work, to compute the RDC, a convergence analysis is performed over 1000 runs, with a number k of non-linear projections of the copula that varies in the range 1 to 20. A stable result is obtained for all the trial correlations with
, for more details about the method the interested reader is referred to literature [
42].
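For orientation, the RDC can be summarized as the largest canonical correlation between random non-linear projections of the copula-transformed variables. The notation below is an assumption recalled from the RDC literature and is not restated in this paper: P denotes the empirical copula (rank) transform, Φ the random sine features, k the number of projections, and s the scale of the random weights.

```latex
\operatorname{rdc}(X, Y; k, s)
  = \sup_{\alpha, \beta}\;
    \rho\!\left( \alpha^{\top} \Phi\big(P(X); k, s\big),\;
                 \beta^{\top} \Phi\big(P(Y); k, s\big) \right),
\qquad
\Phi(u; k, s) = \big( \sin(w_1^{\top} u + b_1), \ldots, \sin(w_k^{\top} u + b_k) \big),
\quad w_i,\, b_i \sim \mathcal{N}(0, s).
```

Because the weights are drawn at random, the coefficient changes from run to run, which is consistent with the convergence analysis over repeated runs described above. As a maximum canonical correlation it is non-negative, so it carries no sign information.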
The standard method of normalization by z-score [45] was used for the tests.
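A minimal sketch of the z-score normalization, assuming the sample (n − 1) form of the standard deviation; the text does not say which form was used, and the function name is hypothetical.

```python
import statistics

def z_scores(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # assumption: sample (n - 1) form
    return [(v - mean) / std for v in values]

z = z_scores([2.0, 4.0, 6.0])  # → [-1.0, 0.0, 1.0]
```

Since the z-score is an increasing affine map, it leaves ranks, and therefore Spearman’s rho, unchanged; it only puts variables with different units on a common scale.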
Correlation matrices are useful for examining the presence of a relationship between two or more continuous variables. Thus, Table 3 and Table 4 list, for each pair, the value computed under each considered method for the TO and LA segments, respectively. Analysis of the OW data in Table 3 shows a low correlation between OW and the HRV indexes. A Spearman’s rho correlation coefficient equal to −0.26 is obtained between NASA-TLX and LF/HF, but it is not considered significant since p > 0.05. Similar results were obtained by Hsu et al. [37]. However, the RDC method reports a value equal to 0.66, which suggests the presence of a non-linear correlation between these variables. Moreover, RDC values greater than 0.7 are also obtained for the pairs (OW, SD1) and (OW, SDNN), underlining non-linear relationships between the HRV indexes and NASA-TLX. Analyzing the correlation between the HRV indexes, again for the TO phase, the relationship between SD1 and SDNN appears to be the most significant, with p < 0.001. The correlation among LF/HF, SD1 and SDNN can also be considered large, with p < 0.01. The presence of such correlations is confirmed by both the Spearman’s rho and RDC coefficients. Lastly, the Spearman’s rho correlation coefficients of the pairs with MEI are very low and not significant, while the RDC continues to suggest the presence of a non-linear correlation even though its value is lower than for the HRV pairs.
In Table 4 the data concerning the LA segment are listed. Generally speaking, the results show lower values than the data listed in Table 3, except for the RDC of the pair (OW, MEI), where a higher value, equal to 0.88, occurs. Also for the LA phase, a linear correlation is found for the pair (SD1, SDNN), with values close to 1 and significance
. Analogous results were found by Hoshi et al. [
46] in the pairs (LF/HF, SD1) and (SDNN, SD1), with an inverse relationship in the (LF/HF, SD1) pair. Thus, conclusions similar to those drawn from the TO data can be drawn in the present LA case: the RDC suggests the presence of non-linear relationships among OW, the HRV indexes and MEI. This consideration is also supported by the fact that a low value of Spearman’s rank correlation coefficient only indicates that there is no tendency for one variable to either increase or decrease monotonically when the other variable increases; it does not rule out other forms of dependence.
The previous considerations motivate the search for a nonlinear transformation of the raw data that, by applying mathematical functions that change the variables’ measurement scales, improves the correlation between the data [47]. After the data transformation, the correlation no longer represents a linear or monotonic relationship on the original measurement scales; nevertheless, Spearman’s rank correlation can still give insight into the existing relationship [47]. The search for the optimal transformation is outside the scope of the present work; however, as a representative case, a possible transformation function, obtained by a trial-and-error approach, is reported here for the pair (OW, LF/HF) for both the TO and LA phases. In more detail, the transformation function is applied to each point in the LF/HF data sets, i.e., each data point is replaced with the following transformed value
where the star is used to label the transformed data,
relates the variable to the pilot index while
, as an example.
Table 5 gives the obtained results, highlighting the influence of the transformation function through the values of its coefficients a and b.
As the results show, it is possible to select nonlinear transformation functions that improve the correlation between variables: a medium-to-large positive monotonic correlation between the transformed LF/HF index and the subjective NASA-TLX index is obtained, which allows an increase in the pilot’s workload to be inferred from the measurement of a cardiac signal such as the low-to-high HRV frequency ratio.
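The general idea, that a suitable change of measurement scale can turn a non-monotonic dependence into a monotonic one visible to Spearman’s rho, can be illustrated on synthetic data. The sketch below is not the transformation used in this work: the quadratic form, the reference level c, and all data values are invented for illustration.

```python
def ranks(values):
    """Ranks from 1, assuming no ties in the input."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def spearman_no_ties(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def transform(x, c=3.6):
    """Hypothetical transformation: squared distance from a reference level c."""
    return (x - c) ** 2

# Synthetic U-shaped response: y is low near x = 4 and high at both ends.
x = [1, 2, 3, 4, 5, 6, 7]
y = [7.0, 2.4, 0.5, 0.1, 2.1, 5.9, 11.2]

rho_raw = spearman_no_ties(x, y)                           # weak: no monotonic trend
rho_star = spearman_no_ties([transform(v) for v in x], y)  # strong after rescaling
```

On these values the raw coefficient is small while the transformed one reaches 1, because the transformed variable orders the points in the same way as the response.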
6. Conclusions
In this work, the pilots’ workload level was analyzed during a flight mission performed in a Full Flight Simulator. Workload indexes were studied during two flight segments with high mental demand, namely the take-off and climb phase and the approach and landing maneuver. A performance index named MEI was also proposed to quantify the pilots’ error in tracking the requested path. The results were analyzed considering both their time variation and their average values computed over the flight segments. The objective time-averaged data and the objective pilot-averaged data showed results in agreement with the reference literature in terms of workload levels. Qualitative evaluations were carried out based on comparisons among the different indexes. From the comparisons between the subjective and objective time-averaged measurements, it was possible to observe a higher overall workload for the approach and landing phase than for the take-off phase. For part of the pilots’ sample, the subjective workload assessment values indicated a similar perception of the two flight phases. Moreover, based on the overall workload values computed from the NASA-TLX questionnaires, some pilots reported a higher workload during the TO phase even though their HRV indexes exhibited the opposite trend. A quantitative analysis was also performed using statistical correlation approaches, and both monotonic and nonlinear relationships were found for some indexes based on the analyzed sample.
The main limitation of the present work is related to the sample size: a larger sample would strengthen the statistical significance of the results. Another limitation concerns the application of the RDC method. Although this method offers the advantage of identifying the presence of a non-linear relationship, it does not provide information on the order or direction of that relationship. This points to possible future work on the identification of the nonlinear transformation functions by means, for instance, of heuristic optimization approaches.
Finally, the results showed that, in the considered aviation framework, it is not possible to evaluate the pilots’ workload level by means of subjective measurements alone. In addition, the results have shown the possibility and advantages of HRV-based workload measurements during flight using biometric sensors that can be integrated into the cockpit environment. Thus, the identification of the nonlinear transformation between biometric data and the subjective workload level can prove useful for setting up an algorithm for online workload monitoring, which can lead to an overall improvement of flight safety by giving quasi-real-time information about the workload perceived by the pilot during flight.
Other future developments include the analysis of other HRV indices, in both the time and frequency domains, during different flight segments characterized by workload levels that range from very low to critical. Furthermore, the research can be extended to related work settings, such as those of air traffic controllers or unmanned aircraft pilots.