Introduction
Helicopter Landing
Helicopter landing maneuvers place complex demands on the pilot, including resolving the conflict between the safety of an aircraft that is not inherently stable and the efficient completion of a mission (e.g. search and rescue). Modern glass cockpits consist of complex display systems (Figure 1), so that information processing is characterized by high cognitive workload and increasing head-down activity by the pilot (Colvin, Dodhia, & Dismukes, 2005). Thus, there is a growing need for training in effective scanning techniques, since visual attention is the most crucial resource of pilots (European Aviation Safety Agency, EASA, 2010).
Eye Tracking in Training Simulation
Military aviation studies indicate that simulator training supported by eye tracking feedback can increase the performance of fighter pilots (Wetzel, Anderson, & Barelka, 1998) as well as helicopter pilots (Sullivan, Yang, Day, & Kennedy, 2011) because student pilot scanning techniques can be compared to those used by experts. This offers the chance to identify scanning errors and teach trainees correct scanning techniques by providing them with individual feedback. The principle has already been demonstrated for serious video gaming (Shapiro & Raymond, 1989): Here, a correlation was found between efficient scanning techniques and performance. Two groups of gamers learned to use efficient and inefficient scanning techniques, respectively. The performance of the inefficient group, with redundant eye movements and wrong scanning techniques, was identical to the performance of an untrained control group. To enable effective training of the optimal scanning techniques of military pilots, it is necessary for them to learn to reflect on and consciously control their scanning techniques. This principle is well known from biofeedback: If persons receive visual feedback on biological reactions, their awareness of these reactions increases, which enhances the motivation to improve them and supports behavioral control (Janelle & Hatfield, 2008).
Scanning Techniques of Helicopter Pilots
In order to understand the scanning techniques of pilots during helicopter landing, the basic principles of human information processing should be taken into account. In this context, the visual attention of pilots should be understood as an endogenously controlled process which, in combination with sufficient experience, enables the acquisition of relevant information, including from non-foveal vision (Williams, 1995; Kasarskis, Stehwien, Hickox, Aretz, & Wickens, 2001).
Target Fixations. This is particularly relevant for aircraft landings, during which experienced pilots use special gaze patterns for tactical information acquisition. In our practical context of the Army Aviation School, scanning techniques in the cockpit were addressed pre-test in semi-standardized interviews with flight instructors (N = 6). According to the experts, scanning techniques partly consist of gaze concentrations, so-called Target Fixations (TF), on objects or instruments, the duration of which is "longer than a regular gaze" and which are considered to be indicators of the tactical acquisition of information by experienced pilots. The flight instructors pointed out that we must differentiate between TF as used by experienced pilots and TF as used by novices. Unintended TF are more likely to occur among student pilots, who will thus overlook crucial flight situation parameters, particularly under high workload. This is explained by the student pilots’ visual field, which is not yet fully developed or still needs to be trained, so that their parafoveal and peripheral perception will only improve after intensive practice. Pilots who are combat ready, on the other hand, absorb specific information using TF and maintain the desired flight path, for instance by determining the changes in the retinal projection of outside object complexes and maintaining an ideal approach angle via control inputs. Expert TF represent a desirable strategy at the right moment, profiting from a fully developed parafoveal perception and a larger peripheral vision. Correspondingly, Colvin et al. (2005, p. 5) report empirical findings for civil pilots who also use intended concentrations of gaze:
“Scanning the outside world strongly favored looking straight ahead, with many fixations directed only a few degrees to either side. We suspect that many of these fixations represent not scanning for traffic but rather the default position for gaze, centered along the central axis of the pilot, the aircraft, and the direction of travel. Gazing mainly straight ahead, coupled with peripheral vision, allows pilots to maintain control of the aircraft.”
The operationalization problem. In order to define and operationalize TF we need to know what "extended duration" of gaze means. It is somewhat confusing that flight instructors use the term "Target Fixations" although they do not actually mean single fixations but rather the total dwell time during which their gaze is directed at certain target objects or instruments. Since TF is, however, an established term in pilot jargon, we adopted it in this study. Moreover, we should distinguish between gazes inside the cockpit and those outside the window (OTW) and keep in mind that the distribution of gaze between only the cockpit and the outside world is a rough measure. To date, little is known about TF, and the relevant literature does not provide an appropriate temporal indicator to measure this form of scanning technique. Unfortunately, the flight instructors in our expert interviews also did not specify in detail the gaze duration on which a TF measurement should be based; possibly, experienced pilots cannot be expected to be precisely aware of how they use their gazes. Hence, we should consult the advice of aviation safety authorities. Addressing the aspect of "optimal scanning", the Federal Aviation Administration (FAA, 1998) recommends constant and frequent visual scanning of the airspace for all pilots, and the European Aviation Safety Agency (EASA, 2010) states that a regular gaze inside the cockpit should take approximately three seconds. According to previous findings in civil aviation, pilots do not follow the existing recommendations given by the authorities (Colvin et al., 2005). It has been shown that OTW gazes are too rare and that systematic scan paths cannot be identified among pilots (Anders, 2001). The role of TF in helicopter landing, however, has been neglected so far. Nevertheless, the recommendations of the EASA give us a clue as to how to differentiate TF on the instruments from regular three-second gazes inside the cockpit. Unfortunately, in contrast, there is no indication of the duration of OTW gazes. Thus, we need an empirical approach to differentiate TF from regular OTW gazes. For this purpose, Inhoff and Radach (1998) proposed a procedure for gaze data: If duration values $X$ are spread around the mean $M_X$ with the standard deviation $SD$, the duration values $X$ with $(M_X - 3 \cdot SD) \le X \le (M_X + 3 \cdot SD)$ are within the normal range, while all values $X$ with $X > (M_X + 3 \cdot SD)$ significantly exceed the average, implying gazes with an extended duration.
As mentioned above, we can expect an influence of flight experience on the application of TF by helicopter pilots. Based on the fact that expert pilots benefit from an enlarged functional visual field (Williams, 1995), identify relevant objects more quickly and make adequate choices of action (Kasarskis et al., 2001), we can assume that they tend to apply TF in a different way than inexperienced pilots. The impact of flight experience alone on scanning techniques has been suggested by a wide range of empirical results, even if TF have not been considered explicitly. Some of these results are presented below in order to substantiate hypotheses about the scanning techniques of helicopter pilots.
Impact of Flight Experience on Scanning Techniques
Dixon, Rojas, Krueger, and Simcik (1990) investigated the scanning techniques and performance of military transport aircraft pilots, varying the size of the field of view (FOV) in the simulator under visual flight rules (VFR). It was demonstrated that, in contrast to trainees, experienced aviators adapt to a smaller FOV more efficiently to maintain flight parameters. The strategy of experts in the group using a smaller FOV consisted of fewer OTW gazes and more instrument scanning.
By analogy, Bellenkes, Wickens, and Kramer (1997) found different visual scanning techniques of expert and novice pilots for varying flight phases with instrument flight rules (IFR). In comparison to novices, the gaze duration of experts was shorter and fixations on instruments were more frequent. Experts were able to react more flexibly to mission demands in that way.
Similarly, the study by Kasarskis et al. (2001) showed that experienced pilots had significantly shorter gaze durations, more fixations in total and more relevant fixations on aim points or instruments than novices while performing simulated landing maneuvers (VFR). Thus, experts had more targeted scanning techniques and were able to allocate their attention more efficiently than inexperienced pilots. We may assume that experienced pilots overlearn scanning techniques relevant for flight control and thus have more spare capacity.
O’Hare (2002) explains this superiority of experts by a strategy relating to long-term memory: Experts use experience-driven techniques in attempting to identify stimuli of situational relevance which can assist in solving tasks (also: long-term working memory). They ignore irrelevant stimuli, so they have more cognitive resources at their disposal in difficult situations than novices.
The principle of tactical information acquisition by experts is also evident in space flight: Matessa and Remington (2005) modeled astronaut scanning techniques applied to error management by hierarchically breaking down behavior sequences in the Space Shuttle. The participants were to process a sudden error during a simulated Space Shuttle flight. It can be concluded from the eye tracking results that novices (i.e. regular pilots with no experience in space flight) have to fixate their gaze on sudden, unknown information repeatedly for a longer period of time in comparison with experts. In contrast, experienced astronauts conduct effective status analyses more rapidly.
Sullivan et al. (2011) investigated the scanning techniques of helicopter pilots in a fixed-base simulator during a navigation task (VFR). The results showed that performance cannot be predicted by flight experience; scanning techniques, however, correlated with expertise: The more extensive the flight experience, the shorter the gaze duration and the more frequent the saccades between the outside world and the cockpit map. In this study, OTW gazes were more frequent among trainees than among experts. Here, too, experienced pilots employed a more efficient scanning technique for relevant information channels.
Summarizing the results, it cannot be generally assumed that flight experience determines fixations on certain areas for a shorter or longer period of time or more or less frequently, but rather it is to be expected that scans by experts are more targeted than scans by novices. This obviously depends on mission demands. The singular impact of mission demands on scanning techniques has also already been investigated.
Impact of Mission Demands on Scanning Techniques
Colvin, Dodhia, Bechler, and Dismukes (2003) investigated scanning techniques of pilots for varied task demands. In addition to regular straight and level flights, phases with high traffic density were performed (VFR). An individual case analysis showed that increasing demands result in a concentration of gazes on main displays and instruments (tunnel vision) in order to maintain the performance level. In contrast, fixations on peripheral displays were less frequent.
Gaze concentrations on the instrumentation were also investigated (Thomas & Wickens, 2004) by varying control displays (VFR). The displays were equipped with and without raster graphics creating the effect of an optical tunnel. Measurements were conducted to record scanning data and to determine whether pilots are able to identify unexpected other aircraft. When unpredictable events occurred, resulting in increased demands, it became evident that in both conditions non-detectors fixated their gazes more frequently than detectors and their number of OTW gazes was smaller. In contrast, detectors distributed their scans more evenly between the displays and the outside world. As expected, the graphic tunnel of one of the displays adversely affected the detection performance with respect to other aircraft as it facilitated tunnel vision.
While the two studies mentioned above prove the occurrence of tunnel vision effects with increasing demands, the study by DiNocera, Camilli, and Terenzi (2007) implies an inverse trend. The authors measured the subjective workload, performance and scanning data of police pilots over different flight phases (VFR). Results showed that the higher the workload, the more random or untargeted were fixations in the simulated cockpit. Moreover, subjects had shorter gaze durations and saccades were longer (visual scanning randomness). The authors are of the opinion that this serves to optimize information acquisition in order not to miss anything, even in high workload phases.
As indicated, the impact of task demands also does not allow any consistent conclusions concerning pilot scanning techniques: On the one hand the tendency of visual tunneling increases with higher workload, on the other hand visual scanning randomness was found. Since the individual operationalizations of the mission demands are not directly comparable to each other, the question as to what extent the findings can be generalized remains currently unsolved. Moreover, research has not found a consensus on how scanning techniques of pilots can be defined and measured. Obviously, flight experience and mission demands must be assumed to be a combined influencing factor. This interaction effect has not been investigated particularly with respect to landing maneuvers yet and the differences regarding the scanning techniques of expert and trainee helicopter pilots remain vague.
In order to address this, we tested multivariate effects of flight experience and mission demands (independent variables) on TF, workload and performance (dependent variables) of military helicopter pilots. If we assume interactive effects, we can establish multivariate hypotheses (H) which can be tested in a multi-factorial design. We have deliberately avoided the formulation of one-sided hypotheses since we do not yet have sufficient information regarding the occurrence of TF and the way in which these are affected by flight experience or the demands of a mission. Given previous findings, the focus is on an interaction hypothesis using a multivariate approach.
H Flight Experience: The flight experience of helicopter pilots affects the use of TF, performance as well as subjective workload during a mission.
H Mission Demands: Mission demands affect the use of TF, performance and subjective workload of helicopter pilots during a mission.
H Flight Experience x Mission Demands: The factors "flight experience" and "mission demands" affect the use of TF, performance and subjective workload of helicopter pilots interactively.
From a training effectiveness point of view it should be noted that effects of single factors fade into the background if experience and demands have a significant interaction effect (Janssen & Laatz, 2007, p. 377). Additionally, we explored
whether objectively measured and subjectively assessed scanning techniques deviate from one another,
the connection between pilot performance and their scanning techniques, and
the usability of the eye tracking method from the pilots’ point of view.
Methods
Subjects
Thirty-three male helicopter pilots recruited from the German Bueckeburg Army Aviation School voluntarily took part in the study. The sample included 16 student pilots and 17 flight instructors. On average, the experience of the student pilots amounted to approximately 76 hours in the simulator as well as in real aircraft, while the flight instructors on average had 1501 hours of flight and 301 hours of simulator experience.
Equipment
All tests were conducted in the Eurocopter (EC) 135 flight simulator. The simulator with its original cockpit has a six degrees-of-freedom motion system and eight projectors with a 240° × 90° FOV and a resolution of 1600 × 1200 pixels.
The Dikablis® head mounted eye tracking system by Ergoneers Ltd was used for data recording. In addition to a head unit the eye tracking system consists of an electronic unit and a computer for storing the data. Areas of Interest (AOIs) can be mapped in the raw videos using special markers and data can be evaluated in quantitative terms. The evaluation (based on statistical inference) was carried out with SPSS 17.0 for Windows.
Landing Maneuvers
A pre-test expert interview (N = 6) was conducted based on recommendations by Denning, Bennett, and Crane (2003), according to whom the definition and operationalization of mission demands should be established with the support of experienced flight instructors. The experts responded as follows to the question of what exactly makes up high demands of a helicopter landing maneuver:
Information from the cockpit must be evaluated rapidly.
A large amount of data must be acquired from the instruments, while information concerning the environment (e.g. ground conditions) is difficult to assess.
There are hardly any fixed reference points within the peripheral FOV.
It is necessary to carefully hover to the landing point.
Subsequently, a difficulty ranking of different landing maneuvers was established and completed by a sample of 15 flight instructors. Subjects were asked to rank five maneuvers from 1 = low to 5 = high mission demands. The results showed that landing on a terrain pinnacle was ranked among the low mission demands by most subjects (53.3%). In contrast, landing on a frigate on the open sea was ranked among the high mission demands by the major part of the sample (50.0%). Accordingly, these two missions were included in the study (Figure 2).
Both missions were conducted under visual flight conditions. The pinnacle landing included numerous reference objects within the pilot’s peripheral visual field (trees, utility poles) which he could use to orient himself during approach; the landing was therefore a comparatively easy flight maneuver for an expert. After takeoff the pilot was to fly a short traffic pattern followed by landing the helicopter. In contrast, the second mission was more challenging because there were no reference objects on the open sea. Therefore the pilot had to hover to the landing point on top of the frigate based on skillful visual orientation guided by the cockpit instruments. Both missions lasted approximately 5 minutes and were subdivided into three parts: Take-off lasted about one third of the time, the traffic pattern and landing each took another third of the time. Eye tracking data was collected in all three flight phases until touchdown and compared subsequently. The landing approach was initiated at a flight altitude of approximately 300 ft.
Procedure
After each subject had been equipped with the eye tracking head unit and the system had been calibrated, both missions were performed (33*2 trials). The factors "experience" as well as "demands" were combined in randomized pairs to avoid sequence effects. Between flights subjects took a 20 minute break while the next participant performed the mission. After each mission an interview was conducted via radio. The subjects were asked for a self-assessment of their scanning techniques, performance and workload during the mission. After each subject had performed two missions, a final questionnaire was handed out and completed for a usability evaluation of the eye tracking method.
For the analysis of the gaze data, AOIs were defined in the eye tracking videos; these are shown in blue in Figure 3. The AOIs "Instruments" (for head-down gazes into the cockpit) and "OTW" (for head-up gazes out of the cockpit) were selected for data analysis. Both AOIs were dimensioned using the top side of the horizontal instrument panel (with a reference marker) and the outer limits of the eye tracking videos as boundaries, as shown in Figure 3. Additionally, the AOIs were dynamic (software setting), so they moved transversally with the pilot’s head movements. The AOIs were not subdivided into further subsections for the TF analysis since the general occurrence of TF during a helicopter mission was the focus of the study.
Parameters
TF. Based on the recommendation of EASA (2010), according to which a regular gaze inside the cockpit should take approximately three seconds, gazes with $X > 3000$ ms were coded as "TF_instruments". A sample-related algorithm (Inhoff & Radach, 1998) was used to code individual OTW gazes: All gaze duration values $X$ with $X > (M_X + 3 \cdot SD)$ were coded as "TF_OTW". Based on this algorithm, all gazes whose duration exceeds the mean OTW gaze duration by more than three standard deviations are considered to be TF_OTW.
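The two coding rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure run in the eye tracking software: the function names, the `(aoi, duration_ms)` input format and the example durations are hypothetical, the sample standard deviation is assumed (the text does not say whether the sample or population SD was used), and TF (%) is assumed to mean the share of gazes coded as TF.

```python
from statistics import mean, stdev

# Fixed cut-off for instrument gazes, taken from the EASA (2010) recommendation
INSTRUMENT_CUTOFF_MS = 3000


def otw_cutoff(otw_durations_ms):
    """Sample-related cut-off for OTW gazes: mean plus three standard deviations.

    Assumption: sample SD (n - 1 in the denominator).
    """
    return mean(otw_durations_ms) + 3 * stdev(otw_durations_ms)


def code_target_fixations(gazes):
    """Return the percentage of gazes coded as TF per AOI.

    `gazes` is a list of (aoi, duration_ms) tuples, where aoi is either
    "instruments" or "OTW". Requires at least one instrument gaze and at
    least two OTW gazes.
    """
    instruments = [d for aoi, d in gazes if aoi == "instruments"]
    otw = [d for aoi, d in gazes if aoi == "OTW"]
    cutoff = otw_cutoff(otw)
    tf_instruments = [d for d in instruments if d > INSTRUMENT_CUTOFF_MS]
    tf_otw = [d for d in otw if d > cutoff]
    return {
        "TF_instruments_pct": 100 * len(tf_instruments) / len(instruments),
        "TF_OTW_pct": 100 * len(tf_otw) / len(otw),
        "OTW_cutoff_ms": cutoff,
    }


# Hypothetical gaze durations in milliseconds (not data from the study)
example_gazes = [("OTW", d) for d in [400, 500, 600] * 7 + [10000]]
example_gazes += [("instruments", d) for d in [800, 1200, 2900, 3500]]
print(code_target_fixations(example_gazes))
```

Note that the OTW rule depends on the sample: with very few gazes no value can lie more than three standard deviations above the mean, so the rule only becomes informative for larger gaze samples.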
Performance. According to the pre-test expert interview a performance assessment of helicopter pilots should consist of the following aspects: The remaining mental capability of the pilot after the mission (%), the deviation from the optimal landing point (meters) and the airmanship (safe and foresighted aircraft piloting, crew communication; rated with grades from 1-5). In accordance with our expert interview helicopter pilots define remaining mental capability as the subjective amount of spare capacity after a completed mission. It is noticeable that performance is always evaluated subjectively by the flight instructors at the Army Aviation School and that neither objective main task parameters (e.g. reaction time, error frequency or objective performance) nor the performance in a secondary task are assessed or stored. On the one hand, there is a standardization requirement since the performance rating of the student pilots can be biased by subjective influences of the flight instructors. On the other hand, studies show that the accuracy of performance ratings by experts is frequently very high with respect to the performance of student pilots.
Coladarci (1986) shows that teachers evaluate their students with an accuracy of .67 ≤ r ≤ .85 in various fields of competence, which is a relatively adequate evaluation with regard to objective performance. Jako and Murphy (1990) show that decomposition (i.e. subcategorization of the evaluation into several fields) results in a higher accuracy of subjective performance ratings. Our approach takes due account of this principle by evaluating the three categories mentioned above (remaining capability, deviation from landing point, airmanship). Moreover, interrater reliability studies show that there is a high consistency among expert evaluations (Borman: r = .97; Akinwuntan, DeWeerdt, Feys, Baten, Arno, & Kiekens, 2003: r = .80, for the evaluation of driving performance).
Other influences play a role when we take a look at the accuracy of expert self-evaluations. According to established psychological findings, self-evaluations of performance are frequently subject to self-serving bias (cf. Stroebe, Jonas, & Hewstone, 2002). For instance, performance tests show no correlations between the self-evaluations of doctors with respect to their expertise and their actually performed skills. By analogy, nurses rate their knowledge about life-saving measures higher than it actually turns out to be when they apply these measures (examples in accordance with Ehrlinger & Dunning, 2003). However, the empirical accuracy of self-evaluated performance varies strongly depending on the experimental design. Moorthy, Munz, Adams, Pandey and Darzi (2006) determine an acceptable accuracy of r = .64 regarding the self-evaluation of performance of medical experts during a simulated operation if the self-evaluation items are tailored in detail to the criteria of the task. In their meta-analysis of 55 studies, Mabe and West (1982) had already identified significantly positive correlations between self-evaluation and objective performance (mean accuracy r = .29) in many fields (school achievement, job and sport performance). In view of these empirical findings, the performance assessments in the present study were made by the flight instructors. The dependent variable "performance" was compiled as follows for data analysis: performance = (remaining mental capability + airmanship [inverted] - deviation from landing point). Thus, performance was an interval-scaled sum variable which fulfilled the MANOVA requirements. "Inverted" means that airmanship was recoded, since better (lower) school grades stand for a better performance than poorer (higher) grades. A sample calculation:
Subjective remaining mental capability = 70 %
Airmanship = grade 2 (inverted = 4)
Deviation from landing point = 1.5 meters
Performance score = 70 + 4 – 1.5 = 72.5
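The sample calculation can be reproduced with a short Python sketch. The function names are hypothetical, and the inversion rule (6 minus the grade on the 1-5 scale) is an assumption inferred from the example, in which grade 2 becomes 4.

```python
def invert_grade(grade, lowest=1, highest=5):
    """Recode a school grade so that higher values mean better performance.

    Assumption: linear inversion on the 1-5 scale (grade 2 -> 4).
    """
    if not lowest <= grade <= highest:
        raise ValueError(f"grade must lie between {lowest} and {highest}")
    return lowest + highest - grade


def performance_score(remaining_capability_pct, airmanship_grade, deviation_m):
    """Sum score: remaining capability + inverted airmanship - landing deviation."""
    return remaining_capability_pct + invert_grade(airmanship_grade) - deviation_m


# Values from the sample calculation above
print(performance_score(70, 2, 1.5))  # 72.5
```

Because the three components are on different scales (percent, grade points, meters), the remaining mental capability dominates the sum; the sketch reproduces the formula as stated and does not rescale the components.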
Workload. Due to its established diagnostic properties, the NASA-TLX (Hart & Staveland, 1988) was applied post-trial to assess subjective workload. The NASA-TLX is a questionnaire that measures the perceived workload of a task operator. The NASA-TLX total score combines six subscales (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration), each rated within a 100-percent range (5-point steps). The subjective workload assessment using the NASA-TLX was conducted via interview after each mission.
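How a total score can be formed from the six subscale ratings is sketched below. The text does not state whether the subscales were weighted, so the sketch assumes an unweighted mean (often called raw TLX); the function name and the example ratings are hypothetical.

```python
SUBSCALES = (
    "Mental Demand",
    "Physical Demand",
    "Temporal Demand",
    "Performance",
    "Effort",
    "Frustration",
)


def raw_tlx_total(ratings):
    """Unweighted mean of the six subscale ratings (assumption: raw TLX).

    `ratings` maps each subscale name to a value from 0 to 100 in steps of 5.
    """
    for name in SUBSCALES:
        if name not in ratings:
            raise ValueError(f"missing subscale: {name}")
        value = ratings[name]
        if not 0 <= value <= 100 or value % 5 != 0:
            raise ValueError(f"{name} must be 0-100 in 5-point steps, got {value}")
    return sum(ratings[name] for name in SUBSCALES) / len(SUBSCALES)


# Hypothetical ratings for one pilot after one mission
example = {
    "Mental Demand": 60,
    "Physical Demand": 20,
    "Temporal Demand": 40,
    "Performance": 30,
    "Effort": 50,
    "Frustration": 10,
}
print(raw_tlx_total(example))  # 35.0
```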
Statistical Analyses
After inspection of the raw data 21 eye tracking videos had to be excluded from the 66 data sets due to the poor quality of the marker detection. Thus, 45 eye tracking data sets were included. TF were subsequently extracted from gaze duration data.
Following the descriptive analysis (distributions of gaze duration), the data was checked for its suitability for a MANOVA. The requirements were fulfilled for all variables: The homogeneity of covariance was confirmed (Box’s M-test: F [18, 4207] = 0.72; p = .798) and all variables were normally distributed (Kolmogorov-Smirnov_TF = 0.82, p = .511; Kolmogorov-Smirnov_Performance = 1.09, p = .183; Kolmogorov-Smirnov_Workload = 0.45, p = .989). The type I error rate α was set to .05. For all follow-up tests a Bonferroni adjustment was made.
The conduct of a MANOVA was warranted for the following reasons: Without a MANOVA, several univariate ANOVAs with the same sample would be indicated. This would result in an accumulation of the α error. In addition, a MANOVA can reveal group differences which result from linear combinations. Due to this fact the MANOVA is more exhaustive in comparison with ANOVA. In the case of significant effects, however, the test results do not deliver a clear understanding of where the manifestations of group differences occur. This necessitates post-hoc analyses.
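The accumulation of the α error can be made concrete with a small Python sketch. It assumes independent tests, which is a simplification because the dependent variables here come from the same sample; the function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha level after a Bonferroni adjustment."""
    return alpha / n_tests


# Three dependent variables (TF, performance, workload), each tested at alpha = .05
print(round(familywise_error_rate(0.05, 3), 3))  # 0.143
print(round(bonferroni_alpha(0.05, 3), 4))       # 0.0167
```

Under this assumption, three separate ANOVAs at α = .05 would carry a familywise error rate of about .14, which is why a single multivariate test followed by Bonferroni-adjusted follow-up tests is preferable.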
Since a temporal cut-off value "c" for target fixations in aviation has not been empirically validated, further values of c (2000, 4000, 5000, 10000 ms) were tested in follow-up analyses with respect to their impact on explained variance. For the interpretation, the conventions by Cohen (1988) were applied, according to which η² ≥ .01 is a small, η² ≥ .06 is a medium and η² ≥ .14 is a large effect.
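The effect size conventions can be expressed as a simple lookup. The function name is hypothetical, and the label "negligible" for values below .01 is an assumption added for completeness; it is not part of the conventions cited above.

```python
def eta_squared_label(eta_squared):
    """Classify an eta-squared value using the thresholds .01, .06 and .14."""
    if not 0 <= eta_squared <= 1:
        raise ValueError("eta squared must lie between 0 and 1")
    if eta_squared >= 0.14:
        return "large"
    if eta_squared >= 0.06:
        return "medium"
    if eta_squared >= 0.01:
        return "small"
    return "negligible"  # assumption: label for values below the "small" threshold


# Hypothetical effect sizes, e.g. for different cut-off values c
for value in (0.005, 0.03, 0.10, 0.20):
    print(value, eta_squared_label(value))
```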
Interview data was compared with eye tracking data for the explorative analysis and deviation variables were calculated: deviation = subjective assessment – objective eye tracking data. Thereby, a) negative values indicated an underestimation of the pilot’s own scanning techniques (e.g. more gazes were measured by eye tracking than were subjectively estimated); and b) positive values indicated an overestimation of the pilot’s own scanning techniques (e.g. fewer gazes were measured than were subjectively estimated). Furthermore, the existence of linear correlations (Pearson) was explored.
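The deviation variable and the correlation measure can be sketched as follows. The function names, the label "match" for a deviation of zero and the example numbers are hypothetical and serve only to illustrate the sign convention described above.

```python
from math import sqrt


def deviation(subjective, objective):
    """Deviation = subjective assessment minus objective eye tracking value."""
    return subjective - objective


def classify_deviation(value):
    """Negative values mean underestimation, positive values overestimation."""
    if value < 0:
        return "underestimation"
    if value > 0:
        return "overestimation"
    return "match"  # assumption: label for a deviation of exactly zero


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(ss_x * ss_y)


# Hypothetical example: a pilot estimates 10 gazes, eye tracking measures 14
d = deviation(10, 14)
print(d, classify_deviation(d))  # -4 underestimation
```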
Results
Figure 4 and Figure 5 show the frequency distributions of gaze duration for OTW and instruments (N = 45, each distribution for both missions). Both were tested for normal distribution; however, neither was normally distributed (Kolmogorov-Smirnov_OTW = 18.51, p < .001; Kolmogorov-Smirnov_instruments = 14.33, p < .001). As is common for gaze duration, both distributions showed a left-steep (i.e. right-skewed) shape. A visual inspection of both distributions already indicated that the percentage of TF_OTW is greater than the percentage of TF_instruments for nearly identical cut-off values (c_instruments = 3000 ms vs. c_OTW = 2830 ms).
The descriptive characteristics of TF (%) are shown in Table 1 and Figure 6. Table 2 as well as Figure 7 and Figure 8 show performance and workload characteristics. As expected, the factor "experience" revealed a significant main effect on performance (p < .001); however, there were no main effects regarding the total workload or TF (see Table 3). The factor "demands" did not reveal any significant main effects on the dependent variables, either. In line with the assumption of an interactive influence of experience and demands, however, a significant interaction effect was found for TF (p = .033), but not for performance and workload.
Figure 9 was examined in more detail for a post-hoc analysis. While student pilots had a higher tendency to conduct TF during maneuvers with higher visual demands (frigate), a reverse pattern was found for flight instructors: They conducted more TF for lower visual demands (pinnacle).
With respect to flight experience, a follow-up t-test revealed a significant difference on the NASA-TLX mental scale (M_flight instructors = 45.6, SD = 23.8; M_student pilots = 60.6, SD = 24.1; t [64] = -2.55, p = .013): In both missions, the subjective mental demand was greater for student pilots than for flight instructors. This was obviously not true, however, for the total workload.
Since the cut-off value for target fixations was calculated from the data set and a specified value has not been established yet, it was also varied (see Table 4) to determine how the quality of the multivariate model changes in terms of explained variance (η²). It was found that explained variance was greater for a decreased than for an increased cut-off value. A value of 2000 ms would conform to the safety-critical maximum from the driving context, which was derived by means of the secondary task paradigm (Alliance of Automobile Manufacturers, AAM, 2002). Since gaze durations in aviation cannot easily be compared to those in the driving context and model quality was still satisfactory for 3000 ms, the application of this threshold value proved useful. Since, for instance, a maximum of 10000 ms does not yield a comparably satisfactory separation between the groups (small effect), its predictive power in the various expertise and demand groups is apparently lower than that of 3000 ms.
In the subsequent analysis of the determined interaction effect, the distribution of TF was investigated for takeoff, traffic pattern and landing. For this purpose, a further MANOVA was conducted. Significant differences were again found for the factor combination for TF_OTW in the takeoff phase (F [1, 45] = 4.33, p = .044, Power = .53), for TF_instruments during the traffic pattern (F [1, 45] = 7.41, p = .009, Power = .76), as well as for TF_OTW in the landing phase (F [1, 45] = 5.04, p = .030, Power = .59).
Figure 10 and Figure 11 were used for data interpretation. Since the distribution of TF was analyzed across flight phases, the focus was on the relative rather than the absolute quantitative interpretation, i.e. the figures had to be compared to each other.
Pinnacle. During takeoffs and landings in terrain flight operations, experts pilot the aircraft using TF_OTW to a larger extent than student pilots (black and spotted bars in Figure 10; flight instructors: OTW = 68.3 %, student pilots: OTW = 48.5 %) and also have fewer fixations on instruments during the traffic pattern than trainees (white bar in Figure 10; flight instructors: instruments = 3.1 %, student pilots: instruments = 22.3 %). The scanning technique indicates that flight instructors use TF tactically in flight phases that involve greater workload (takeoff and landing) and at the same time a large amount of environmental information; in this process, OTW gazes enable them to benefit from their greater skill in peripheral perception (more TF). Student pilots, on the other hand, apparently conduct shorter scans more frequently, particularly during landing approaches, to be able to assess their environment. The overall smaller amount of TF for trainees in the pinnacle mission (11 % vs. 8 %) illustrates this. Thus, experts use their peripheral FOV more effectively by means of TF when confronted with a large amount of reference information.
Frigate. A different type of expert scanning technique is apparently used for landing on the frigate. While student pilots tend to fix their gaze on the outside world during takeoff, flight instructors acquire a comparatively greater amount of information from the instruments (black and dark grey bars in Figure 11; flight instructors: OTW = 7.9 %, student pilots: OTW = 18.8 %). This indicates that they first stabilize the aircraft, in spite of the limited environmental information, by monitoring flight parameters on the instruments. During the traffic pattern over the sea, experts predominantly use TF_OTW to orient themselves relative to the frigate (see ratio of light grey vs. white bars in Figure 11; flight instructors: OTW = 26.7 %, student pilots: OTW = 13.4 %); however, the use of instruments by experts is also more extensive here than in terrain flight. Flight instructors pilot the aircraft over the open sea by means of increased frigate fixation (OTW) even prior to landing. In contrast, the TF_OTW of trainees do not increase until landing, while their gazes during flight tend to be shorter. Accordingly, experts use the available information channels more efficiently in this case, too: For the purpose of stabilization at the beginning of the flight, they use instruments for a longer period of time than trainees, and they orient themselves to an available landing reference point (the frigate) at an early stage.
In order to verify the difference between the subjective and objective scanning techniques, deviation variables were examined with regard to their magnitude and distribution. The analysis showed that pilots misestimated their OTW and instrument gazes. As an explorative MANOVA showed, the self-assessment of OTW gazes was on average positively biased, i.e. fewer OTW gazes were measured than were subjectively estimated (mean difference between objective and subjective data = 17.0 %, SD = 17.6 %). A corresponding inverse pattern was found for instrument gazes (mean difference = -16.87 %, SD = 17.59 %). The explorative MANOVA for the experience and demand factors revealed a significant main effect of experience on the misestimation of OTW gazes (F = 10.1, p = .003, Power = .87) and on the misestimation of instrument gazes (F = 10.6, p = .002, Power = .89).
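A deviation variable of this kind can be sketched as follows. The sign convention (subjective estimate minus measured share, so that positive values mean overestimation) is an assumption inferred from the description of the positive bias, and the function name and input layout are hypothetical. The sketch also assumes that OTW and instrument shares sum to 100 % for each pilot; under that assumption the two deviations mirror each other, which would be in line with the nearly symmetric means and standard deviations reported above.

```python
from statistics import mean, stdev


def summarize_misestimation(pilots):
    """Summarize self-assessment bias for OTW and instrument gazes.

    pilots: list of (subjective_otw_pct, objective_otw_pct) tuples,
    a hypothetical layout. Instrument shares are derived as
    100 - OTW share, which is an assumption and not taken from the study.
    Returns {"otw": (mean, sd), "instruments": (mean, sd)}.
    """
    # Positive deviation = pilot estimated more gazes than were measured.
    otw_dev = [subj - obj for subj, obj in pilots]
    instr_dev = [(100 - subj) - (100 - obj) for subj, obj in pilots]
    return {
        "otw": (mean(otw_dev), stdev(otw_dev)),
        "instruments": (mean(instr_dev), stdev(instr_dev)),
    }
```

With this convention, a pilot who estimates 80 % OTW gazes but shows 60 % in the eye tracking data has an OTW deviation of +20 and an instrument deviation of -20.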
An analysis of the mean estimation of gazes inside the cockpit showed that the mean difference for student pilots was -25.8 % (SD = 12.9 %), while the mean difference for flight instructors was -9.8 % (SD = 17.8 %). This means that the frequency of instrument gazes tended to be underestimated. In conclusion, experts also partly misestimated their scanning techniques; their mean deviation was smaller, but the variance was somewhat greater than for student pilots.
The explorative correlation analysis, however, did not reveal any significant linear correlation between individual gaze parameters and performance, although the gaze data were intercorrelated (range of significant correlations: .19 ≤ r ≤ .99, .000 ≤ p ≤ .006). Further exploration did, however, show a significant correlation between performance and the misestimation of the pilots’ own scanning technique: for the instrument check, r = .31 (p = .035). In other words, the accuracy of self-assessment was related to landing maneuver performance. In order to understand the difference between better and worse performers with respect to their subjective assessment, we used the data of pilots whose performance was one standard deviation above or below the mean value. Only flight instructors (n = 9) turned out to be better performers; eight student pilots and three flight instructors were rated worse performers (n = 11). If we compare better to worse performers in Figure 12, it can be seen that the misestimation was lower for better performers (better performers: M_OTW = 4.5, SD = 17.6; M_instruments = -3.8, SD = 17.1; worse performers: M_OTW = 26.1, SD = 9.1; M_instruments = -26.1, SD = 9.2; F_OTW = 8.3, p = .014; F_instruments = 9.3, p = .010). This observation underlines the differences between experts and trainees: The subjective assessment of their scanning techniques by experienced and better-performing pilots seems to be more realistic.
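The selection of better and worse performers described above can be sketched as a simple split at one standard deviation around the mean. The sketch assumes that higher scores indicate better performance and that the sample standard deviation is used; neither detail is stated in the text, and the function name and pilot identifiers are hypothetical.

```python
from statistics import mean, stdev


def split_performers(scores):
    """Split pilots into better and worse performers.

    scores: dict mapping a pilot id to a performance score, where a
    higher score is assumed to mean better performance.
    Returns (better, worse): sorted ids more than one sample standard
    deviation above or below the mean, respectively.
    """
    values = list(scores.values())
    m, sd = mean(values), stdev(values)
    better = sorted(p for p, s in scores.items() if s > m + sd)
    worse = sorted(p for p, s in scores.items() if s < m - sd)
    return better, worse
```

Pilots within one standard deviation of the mean fall into neither group, which is why the two groups together are smaller than the full sample.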
Another part of the analysis evaluated the usefulness of eye tracking as a feedback method in simulator training and its usability for real flights. More than half of the sample (53.3 %) could imagine using it as a feedback tool in simulator training on a regular basis (once a month), and 61.5 % could imagine applying eye tracking in real flights.