Article
Peer-Review Record

Air Force Pilot Expertise Assessment during Unusual Attitude Recovery Flight

by Gianluca Borghini 1,2,*, Pietro Aricò 1,2, Gianluca Di Flumeri 1,2, Vincenzo Ronca 2,3, Andrea Giorgi 2,3, Nicolina Sciaraffa 2, Claudio Conca 4, Simone Stefani 5, Paola Verde 6, Angelo Landolfi 7, Roberto Isabella 8 and Fabio Babiloni 1,2,9
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5:
Submission received: 22 January 2022 / Revised: 14 April 2022 / Accepted: 11 May 2022 / Published: 13 May 2022

Round 1

Reviewer 1 Report

Well-described issues, clearly stated conclusions.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

Comments to manuscript safety-1587794

The present manuscript aimed to investigate whether expert and novice Air Force pilots can be differentiated by their mental effort during critical and unexpected flight situations. Six expert pilots (1450 flight hours) and (finally included) five novice pilots (157 hours) participated. During a training flight in an MB339 simulator, five unexpected critical situations were induced by the instructor. The measures were the time the pilots needed to solve the problem, subjective ratings (Likert scale) of the perceived mental effort demands, and, as objective physiological measures, the HRV and the frontal theta power measured with the EEG. The results revealed no differences in the behavioural data (reaction time), subjective ratings, or HRV scores. However, the mental effort index estimation (MEF) assessed via the EEG showed significantly lower values in the expert group despite the small sample size, which indicates a large effect size. In particular, MEF differences were found for four of the five unexpected situations but not for standard situations like taxi, take-off, climb, and landing. Therefore, the MEF measure seems to be a promising tool to evaluate pilot training.
The paper is well written and the study appears to have been carefully conducted. The only critical issue is the small sample size, which may be the reason that some correlations did not reach the significance level. Wikipedia tells me that the Italian Air Force still has 70 MB339 jets in service, and I assume most of these (except the Frecce Tricolori jets) are based at the 61° Stormo, which requires at least an equal number of pilots to operate them. Maybe the number of expert pilots is low because the MB339 is used as a training jet. So please specify why the sample size was small.

Specific comments

The EEG was segmented into epochs of one second length. Which of these epochs were selected to analyse the different flight situations? Fixed time intervals or the epoch with the largest MEF values? If the latter is the case, could the temporal distance between situations and largest MEF be used for additional analysis?
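The two selection strategies contrasted in this question can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the sampling rate, the fixed 4-8 Hz theta limits, the plain DFT band power used as a stand-in for the MEF index, and all function names are hypothetical.

```python
import cmath
import math


def split_into_epochs(signal, fs, epoch_seconds=1.0):
    """Cut a 1-D signal into consecutive, non-overlapping epochs."""
    n = int(fs * epoch_seconds)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def band_power(epoch, fs, f_lo=4.0, f_hi=8.0):
    """Sum of squared DFT magnitudes (normalised by n^2) for bins in [f_lo, f_hi] Hz.

    The 4-8 Hz default is an assumed theta band, not taken from the manuscript.
    """
    n = len(epoch)
    mean = sum(epoch) / n
    power = 0.0
    for k in range(1, n // 2 + 1):
        if f_lo <= k * fs / n <= f_hi:
            coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(epoch))
            power += abs(coeff) ** 2 / n ** 2
    return power


def mean_in_window(values, start, stop):
    """Strategy A: average the per-epoch values over a fixed interval [start, stop)."""
    window = values[start:stop]
    return sum(window) / len(window)


def peak_epoch(values):
    """Strategy B: index of the epoch with the largest value."""
    return max(range(len(values)), key=lambda i: values[i])
```

With strategy B, the difference between the onset epoch of a flight situation and the index returned by `peak_epoch` would give the temporal distance asked about above.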

Page 5, line 231. 78 sec for experts and 76 for novices. Are these the mean values (because Fig. 2 presents the median values)?

Page 7, line 256. Typo (modulted)

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 3 Report

I enjoyed reading the manuscript - it is nicely written and clear. Every approach is explained and substantiated. I realize that in the reported settings it is difficult to collect reasonable sample sizes, but 6 and 5 participants per group is still a very low number. This raises questions about the power of the observations: what is due to chance? What is underpowered?

Please provide the exact self-report questions used - those are not mentioned anywhere. Also, on which aspects were the pilots evaluated by the external observer?

Though I find results interesting, I strongly recommend emphasizing the pilot nature of the reported findings with conclusions being presented as suggestions and not as strong statements.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 4 Report

This is an interesting study in which the Authors use neurophysiological data to evaluate pilots' expertise. It is worth emphasising that the proposed method provides an alternative to traditional methods, which are based on evaluating / measuring performance output and, as such, have certain limitations.

The paper provides good rationale for the research and the analysis seems to confirm validity of the approach. However, I can see a few areas where the paper could be improved:

  1. It would be worth providing a more palpable link between the EEG signals and the final results. All I can see in the paper is that EEG signals were used; there is also information about electrode placement. There is also a mention that some other vital signs of the pilots were measured. Yet I can't see any of these recordings in the paper. It would be good to include them, explain which EEG signal parameters are mapped onto which indicators, etc. As of now I can't really see the link.
  2. Since signal processing is a vital part of the study, I would also include more evidence that it has been done. E.g. there is a mention of spectral analysis but there is no trace of it, whilst this is one of the key elements of the analysis which, if not done correctly, may hugely impact the whole method.
  3. The correlations are not convincingly different across the groups in some cases. Yes, there seem to be differences between expert and novice pilots. But still, correlation at the level of 61 is not convincing but (just) acceptable.

Other than that, the paper is written in good English. There is a thorough discussion of the results and a good conclusion. I can see practical applicability for the proposed method, and that is probably worth noting and taking into account when making the final decision about the manuscript. In my view its quality is good enough to justify publication, and I am happy to recommend this after a few minor improvements to the paper.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 5 Report

Air Force Pilots’ expertise assessment in terms of mental effort requested during unusual attitudes recovery flight training simulation

The considered manuscript is dedicated to measuring mental effort in pilots during flight training simulation and devising an index for distinguishing between different levels of the pilots' expertise. The work is potentially interesting and has practical significance. I commend the authors for running experiments with real Air Force pilots and custom flight simulation scenarios. However, I see several major problems with the study, which prevent me from recommending acceptance.


1) The construct validity of the experiment appears very doubtful to me. Basically, the authors measure EEG signal, focus on 5 electrodes and frequencies in the theta band, and propose to calculate an index that they just plainly call the mental effort. However, there are plenty of other mental processes and factors that can potentially influence the EEG signal (attention, stress, etc.). The authors provide no discussion of the construct validity of their measure whatsoever.
In the Introduction, they mention this very briefly and make a mass-reference without describing the SotA in detail: "In fact, frontal theta activity is linked to cognitive functions like information processing, decision-making, task demand, and working memory [23]–[32]." In my opinion, this is unacceptable and must be significantly extended.
At the same time, they write "These studies have hence shown that the frontal theta is a reliable indicator of mental effort elicited by tasks of varying complexity." - if this is so well known, what is the novelty of applying this to pilots in this particular study? This needs to be explained as well.
Finally, when introducing the particular formula, they write: "The Mental effort (MEF) index was defined according to the literature and previous results as:" - what previous results? Since the authors imply that the index is the main contribution of their paper, they need to describe its novelty in much more detail.


2) The authors make several overclaims (statements not supported by their study), but do not justify the advantage of the proposed index.
For instance, they write: "mental effort can be used as an indirect measure of operator expertise and capacity" - yes, and so can the number of hours flown. What is the advantage of the proposed measure? What is the evidence that it better describes the pilot's expertise and capacity? The authors did not disclose its correlation with performance, for instance - why does Table 2 not include URT?
"The work describes how the employment of diverse and complementary measures can provide a more accurate approach to compare and assess Novice pilots’ expertise and competences." - this claim is not supported by the study. The authors did not measure the pilots' competences, and they did not describe any effects of the expertise (as represented by the hours flown).
My concerns are further supported by the fact that MEF values were NOT higher for presumably more difficult and unusual tasks ("we expected to find an increase of the pilots’ MEF index with an increase of the mental effort demand of the flight phases"). According to the authors (Fig. 7), UPSET conditions actually did not demonstrate an increased MEF compared to the TAXI condition (presumably, the baseline), particularly for expert pilots.
I believe that the authors should at least try to explain this in the Discussion.


3) Problems with the study design and the analysis/reporting.
First of all, the authors make several pairwise comparisons, but they do not employ Holm-Bonferroni correction or anything similar. Consequently, their finding (significant difference for MEF) is very weak statistically.
Second, the authors do not have a control group for EEG: they do not test differences for any other EEG-based measures. What if the effect was random and e.g. an index similar to MEF but calculated for delta band / different electrodes would show a greater difference?
This concern is further enhanced by the limited number of subjects in the study, which seemingly varies between the measures (for EEG "Due to technical issues during the experiments, two of the Novices’ datasets were discarded."). Also, the authors should report N or the degrees of freedom for the correlations.
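For reference, the Holm-Bonferroni step-down procedure mentioned in point 3 can be sketched as below. The function name and the example p-values are hypothetical and are not taken from the manuscript.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    The p-values are visited from smallest to largest; the i-th smallest
    (i = 0, 1, ...) is compared with alpha / (m - i), and the procedure
    stops at the first comparison that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values from four pairwise comparisons (illustration only).
example = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(example))
```

In this made-up example, two of the four comparisons that would pass an uncorrected 0.05 threshold no longer do so after correction, which is the kind of change that could affect the reported MEF differences.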

Overall, I believe that all the above problems potentially can be fixed without re-running the experiments. So, I recommend major revision.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Everything is fine; no more comments.

Reviewer 3 Report

Authors have addressed all my comments.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 5 Report

I have read the authors' replies to the comments of all the reviewers, as well as the revised version of the manuscript. I commend the authors for addressing most of the comments, and for doing an impressive job improving the review in the Intro and the description of the methods. However, I am not entirely satisfied with how the issues raised in comment #2 of my previous review were addressed, as I detail below. I recommend minor revisions, as the manuscript is by and large ready for publication, but I sincerely hope that the authors address and improve this part.

My comment was related to the authors' statement about their goal, which they repeat in the current version of the manuscript: "how the employment of neurophysiological measures can provide an additional and useful measure to assess the pilots’ expertise and skill". In the reply they similarly write: "we did not want to propose and validate new mental effort neurophysiological indexes, but to show whether the neurophysiological measures could provide any added values for pilots’ assessment"

So, my question still is: what exactly is the added value? Added not compared to asking the pilots to subjectively rate their mental effort, but compared to the currently used measures of the pilots' expertise and skill - foremost, the number of hours flown (NHF). In their study the authors basically demonstrate that MEF can be somewhat predictive of the group (Novice/Expert), which is itself specified based on the NHF. The authors use no third variable that assesses the pilots. So, naturally, NHF by itself has the better predictive ability, and the added bonus is that no EEG measurement is required - the NHF is always available. In what context, and for what purpose, would one want to use MEF?

Minor: "we could not include the URTs in Table 2 due to the small number of points for running correlation analysis (3 unusual attitudes recoveries thus 3 URT values for performing correlation analyses)" - I think there's some misunderstanding, sorry if my question has caused it. I was asking about URT as a measure of a pilot's performance - a possible "third variable" that I mention above. If the authors could demonstrate that MEF is better correlated to the time (in a certain stage/condition of the flight) than NHF, it could strengthen their results.
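The comparison suggested here could be run with a rank correlation, for example Spearman's coefficient, computed once for MEF against URT and once for NHF against URT. The sketch below is an assumption about how that might be done, not the authors' analysis; the helper names are hypothetical, and no data from the study are used.

```python
import math


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Calling `spearman(mef, urt)` and `spearman(nhf, urt)` on per-pilot values for one flight condition would give the two coefficients to compare; with about eleven pilots, any difference between them should be read cautiously.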

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 3

Reviewer 5 Report

> In other words, the added value of the work is to show the benefit of using the neurophysiological measures (MEF) in combination with the conventional ones for a more accurate pilot’s expertise assessment.

My question was about the added value of MEF (in "to show whether the neurophysiological measures could provide any added values for pilots’ assessment"), but never mind.

Looking at the current replies of the esteemed authors, I still believe that some of their conclusions and statements in the Introduction do not correspond to the results:

* "identify mental effort peaks corresponding to particular procedures" - the authors did not demonstrate the differences in MEF between the different (presumably, more or less demanding) stages in the flight.

* "show the benefit of using the neurophysiological measures (MEF) in combination with the conventional ones for a more accurate pilot’s expertise assessment" - the authors did not assess the pilot's expertise beyond the number of hours flown and did not demonstrate any improvement in accuracy, consequently not showing a benefit.

However, as I specified in my previous review, I consider this a relatively minor issue, not critical for the publication, and I do not mind accepting the paper.
