Article
Peer-Review Record

Towards Expert-Based Speed–Precision Control in Early Simulator Training for Novice Surgeons

Information 2018, 9(12), 316; https://doi.org/10.3390/info9120316
by Birgitta Dresp-Langley
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 14 October 2018 / Revised: 1 December 2018 / Accepted: 5 December 2018 / Published: 9 December 2018
(This article belongs to the Special Issue eHealth and Artificial Intelligence)

Round 1

Reviewer 1 Report

This paper discusses the feasibility of AI to detect minimally invasive surgical skill acquisition on a simulator. The paper is well-written and easy to read. It addresses a highly relevant topic in the domain of surgical education and presents a next step in using simulator metrics for evaluation of skill acquisition.


In the paper an AI system is described to generate feedback on surgical skill acquisition on a simulator. The main issue with the paper is that the added value of the AI system seems very limited. The generated feedback (slow down, speed up, keep up or stop training) is very general. For example, feedback could be related to how much faster the trainee should become. Do we need AI for this general feedback, or are currently available metrics from commercially available simulators sufficient? As proposed in the discussion, it might be more interesting to test the AI system in classifying skill acquisition patterns: is an AI system able to tell novices and experts apart, based on performance patterns? However, this question does not seem to be addressed in the current paper.


Also, there are some minor issues in the paper that should be clarified.


Introduction

I am confused by the statement in lines 56-58: what is meant by 'well-suited concepts for reinforcement learning procedures'? How does AI allow building prior benchmark knowledge into simulator training? It seems like a rather unsupported, bold statement.


Line 65: what is meant by 'automatic skill evolution'? Is the evolution of skill (whatever that may be) automatic? Or is detection of skill evolution automatic? This seems a relevant distinction that might be clarified.


Method

A specific description of how the data from the novices and the expert are going to be compared seems to be missing. This also includes a missing description and justification (with literature references) of how the speed-accuracy trade-off is examined and analysed for this sample. The method mentions 'speed' and 'precision' but not how they are analysed in relation to each other.


Furthermore, a description of how the AI system for detection of skill acquisition is validated also seems to be lacking.


Results

The graphs on page four are very difficult to interpret and do not help to get the message across. Furthermore, in the third graph on that same page the lines are so close together that it is hard to tell what conclusion could be drawn from that data. Also, conclusions about speed-accuracy trade-offs are drawn from the graphs, but as stated above, it is unclear from the method section how this trade-off was defined and measured.


The results section starts with a comparison of the novice's and expert's performance; however, I would expect a validation of the automatic skill acquisition detection system first. How accurate is the AI system?


Author Response

Reviewer A stated:

“In the paper an AI system is described to generate feedback on surgical skill acquisition on a simulator. The main issue with the paper is that the added value of the AI system seems very limited. The generated feedback (slow down, speed up, keep up or stop training) is very general. For example, feedback could be related to how much faster the trainee should become. Do we need AI for this general feedback, or are currently available metrics from commercially available simulators sufficient? As proposed in the discussion, it might be more interesting to test the AI system in classifying skill acquisition patterns: is an AI system able to tell novices and experts apart, based on performance patterns? However, this question does not seem to be addressed in the current paper.”

Reply: The previous version of the article seems to have induced a few misunderstandings, and the paper has been fully revised accordingly, taking the detailed comments of this reviewer constructively into account. All new text addressing the reviewer comments is marked in RED and BLUE in the different sections of the revised manuscript for easy tracking. It has been clarified that the approach presented here is conceptual. The most important challenge for research on the use of artificial intelligence in medical/surgical training is that of providing analyses, concepts, and benchmark criteria based on a deeper conceptual understanding of 1) the truly relevant criteria of individual skill evolution in (early) simulator training, where a large number of novices are trained and then selected to undergo further, more specific training, 2) what needs to be detected and controlled for in individual performance by an intelligent agent (ultimately, a specific computer algorithm adapted to a specific simulator system), and 3) the extent to which an automatic system of performance/strategy detection and control may be superior to session guidance by a human expert/tutor.

The major problem in early simulator training with complete novices is, indeed, one of defining and validating objective individual performance criteria, comparing them to those of a true expert (benchmarking) who has performed on the same simulator, and then systematically reinforcing the right task strategies through an appropriate feed-back system to enable optimal individual learning. The feed-back provided needs to be regular, ideally trial by trial as is made clear in the article, and truly useful to the individual, i.e. help them improve their individual performance. Automatically generated between-user comparisons at the end of sessions may seem useful to a selection committee, but they do not provide help or guidance to the individual trainee. This has now been made clear in the revised paper by a clearer presentation of the analyses, now performed on a larger dataset to validate the suggested approach, and by a more explicit explanation of the principles of benchmarking that are the most likely to produce useful conclusions on individual performance evolution. It is made clear that the benchmarks directly motivate the decision model for controlling individual strategies detrimental to optimal performance evolution. The principles of intelligent automatic performance supervision in surgical simulator training suggested here are generic in the sense that they could be adapted to a variety of physical task simulators (of which there are many), provided a measure for task time and task precision can be validated, as in the case of our experimental simulator. We also have to bear in mind that many surgical simulators used for early training do not give user feed-back at all.
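For illustration only, the following minimal Python sketch shows one way a trial-by-trial feedback decision of the kind discussed above (slow down, speed up, keep up, or stop training) could be driven by an expert's benchmark statistics. The class, function, tolerance k, and all numeric values are assumptions made for this sketch; they are not the decision model actually specified in the revised manuscript (Figure 3).

```python
# Illustrative sketch only: trial-by-trial feedback against an expert benchmark,
# assuming the benchmark is summarized by the expert's mean and standard deviation
# for task time and task precision on the same simulator task.
from dataclasses import dataclass

@dataclass
class Benchmark:
    mean_time: float       # expert mean task time (seconds)
    sd_time: float         # expert standard deviation of task time
    mean_precision: float  # expert mean precision score (higher = more precise)
    sd_precision: float    # expert standard deviation of precision

def trial_feedback(trial_time: float, trial_precision: float, b: Benchmark,
                   k: float = 1.0) -> str:
    """Return a coarse feedback message for one training trial.

    k is a hypothetical tolerance expressed in expert standard deviations."""
    too_slow = trial_time > b.mean_time + k * b.sd_time
    too_fast = trial_time < b.mean_time - k * b.sd_time
    imprecise = trial_precision < b.mean_precision - k * b.sd_precision

    if too_fast and imprecise:
        return "slow down"      # speed is being traded against precision
    if too_slow and not imprecise:
        return "speed up"       # precision is fine, task time can be reduced
    if not too_slow and not imprecise:
        return "keep up"        # within the expert's benchmark envelope
    return "stop training"      # imprecise even without rushing: review strategy

# Example: a trial that is faster but less precise than the expert benchmark.
expert = Benchmark(mean_time=30.0, sd_time=5.0, mean_precision=0.9, sd_precision=0.05)
print(trial_feedback(trial_time=22.0, trial_precision=0.7, b=expert))  # -> "slow down"
```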


Reviewer A raised detailed minor issues regarding the introduction, methods, and results sections.

Reply:

- the suggested “confusing” statements and “bold” overstatements have been removed from the introduction, or reformulated

- “automatic detection of skill evolution”, indeed, not “automatic skill evolution”!

- the comparison of novice performance data and expert benchmark statistics, in terms of individual means for task time and task performance and their standard deviations, for the same number of training trials across 3D and 2D, or across 2D camera view conditions, is now made clear in the text and in Tables 1a,b and 2a,b; the raw data exploited for all the analyses are now provided as Excel tables in the supplementary materials section. Tables 1a,b on page 5 show expert and novice statistics for comparison of single-session performance levels (one expert, ten novices), proving that the experimental simulator does, indeed, tell apart expert and novice performance statistics; Tables 2a,b on page 7 compare last-session performance levels (four novices, trained in 8 or 20 sessions) to single-session expert performance levels. Again, the statistics and conclusions are clear-cut.

- how the speed-accuracy trade-off functions (page 6) of the novices (now 4 instead of 2) who trained in a large number of simulator sessions (eight or 20) were generated is now made clear (pages 5, 6), and a direct link to a reference has been added to the text on page 5 (a generic sketch of how such per-session speed and precision points can be derived follows this list)

- the new tables and figures, with new text, produce clear-cut statistical conclusions, as now shown, and effects are made visible at a glance; the raw data files in the supplementary Excel tables give an idea of the amount of data the statistics were computed on (several hundred individual data points relative to time and precision for the speed-accuracy analyses, and between 80 and 120 individual data points for each parameter for the expert-novice statistics compared in Tables 1a,b and 2a,b)

- the data acquisition system of the experimental simulator is fully automatic, as explained in the materials and methods and the results sections; the suggested (and validated) benchmarking procedure and the procedure for trial-by-trial performance feed-back according to the decision model in Figure 3 of the results section of the revised manuscript are conceptual, and can be adapted to any simulator system that provides measures for performance time and precision, as explained in the article; the functional properties of an appropriate intelligent tutor are clearly described, but the feed-back approach is still conceptual at this stage.
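As a generic illustration of how per-session speed and precision points of the kind underlying the trade-off functions mentioned above could be derived, a minimal Python sketch follows; the pandas-based layout, column names, and values are assumptions made for this sketch, not the actual data format of the experimental simulator.

```python
import pandas as pd

# Hypothetical per-trial records for one novice: session index, task time, precision.
trials = pd.DataFrame({
    "session":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time_s":    [42.0, 39.5, 41.2, 35.0, 33.1, 34.4, 30.2, 29.8, 31.0],
    "precision": [0.62, 0.65, 0.60, 0.70, 0.72, 0.69, 0.78, 0.80, 0.77],
})

# One (mean time, mean precision) point per session, with standard deviations;
# plotting mean precision against mean time across sessions traces a simple
# empirical speed-accuracy trade-off function for that trainee.
per_session = trials.groupby("session").agg(
    mean_time=("time_s", "mean"),
    sd_time=("time_s", "std"),
    mean_precision=("precision", "mean"),
    sd_precision=("precision", "std"),
)
print(per_session)
```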



Reviewer 2 Report

The paper under review, on AI for novice surgeons, addresses an interesting topic. I have no suggestions for revision and accept the paper in its present form.


Author Response

Thank you!



Reviewer 3 Report

The manuscript discusses the interesting and important issue of how to use feedback to improve performance in a training simulator, and specifically to improve precision in a motor task.

I found that using only 3 participants, 2 novices and one expert, is not enough for a journal paper. I suggest submitting it to a conference.

Some more issues:

In the introduction, the author should refer also to the negative consequences of giving too detailed feedback, e.g., dependence on the feedback. There is broad literature on this issue that should be addressed.

The second paragraph of the introduction is too long and should be divided by subject. It is not clear what the author wanted to evaluate.

The method section is unclear. The paragraphs are too long. It should be divided into the standard sections: design, participants, task, and procedure.

Author Response

Reviewer B stated:

"The manuscript discusses the interesting and important issue of how to use feedback to improve performance in a training simulator, and specifically - improve precision in a motor task. I found that using only 3 participants, 2 novices and one expert, is not enough for a journal paper. "

Reply: We have now included the statistics from 14 different novices in the light of those of the expert surgeon: ten trained in sessions yielding 120 data points per novice and task parameter (generating the statistics in Table 1 of the revised manuscript), and four trained in sessions yielding 1600 and 640 data points per novice and task parameter for the speed-accuracy analysis. Statistics of the last sessions are then compared with the expert's single-session statistics, now shown in Table 2 of the revised manuscript. The raw data are included in three supplementary Excel tables in the supplementary section to give an idea of the vast amount of automatically generated data we have from the experimental simulator.

Reviewer B also stated:

"In the introduction, the author should refer also to the negative consequences of giving too detailed feedback, e.g., dependence on the feedback. There is literature on this issue that should be addressed."

Reply: This suggestion was fully taken into account: new text was added to the introduction of the revised manuscript (cf. the first paragraph on page 2), with appropriate references to prior work by others ([1] and [2] in the new references section).

Reviewer B also stated:

“The second paragraph of the introduction is too long and should be divided by subject. It is not clear what the author wanted to evaluate. The method section is unclear. The paragraphs are too long. It should be divided into the standard sections: design, participants, task, and procedure.”

Reply: These remarks were also fully taken into account, and the sections were rewritten accordingly for more clarity; the method section was divided into sub-sections as suggested.



Reviewer 4 Report

This work introduces a technique to evaluate the performance and effectiveness of surgical simulators that give feedback to students in training; by comparing their speed and accuracy performance with that of skilled people, their exercises are validated within a feed-back system.

One of the problems of this work is that it relies heavily on previous works by the author, resulting in a small applied variation that is far from complete and well formalized.

English and punctuation, especially in the initial part, must be revised, as very long sentences and repetitions hamper text comprehension.

When using the term AI, it is not clear how the agent (system) would be implemented, or whether there would be any user or task modeling: sentences like “AI enhanced training model proposed” (line 146) or “Based on the known performance score of an expert built into the system” (line 237) all remain rather vague. The model mentioned does not present methods with which to compare and classify performances, other than simple controls on the time of a task (if ... then) that are rather elementary.

Other issues:

Line 75: t_zero -> t0 (and similarly for the others)

Section 2: it would be better to include schemes and images to explain the materials and the background knowledge presented

None of the images are at high resolution.

In some Figure captions there are references to "graph/images on the right/left"; please update the positions

All the metrics lack a formal definition (formulas)


Author Response

Reviewer C stated:

“This work introduces a technique to evaluate the performance and effectiveness of surgical simulators that give feedback to students in training; by comparing their speed and accuracy performance with that of skilled people, their exercises are validated within a feed-back system. One of the problems of this work is that it relies heavily on previous works by the author, resulting in a small applied variation that is far from complete and well formalized. English and punctuation, especially in the initial part, must be revised, as very long sentences and repetitions hamper text comprehension. When using the term AI, it is not clear how the agent (system) would be implemented, or whether there would be any user or task modeling: sentences like “AI enhanced training model proposed” (line 146) or “Based on the known performance score of an expert built into the system” (line 237) all remain rather vague. The model mentioned does not present methods with which to compare and classify performances, other than simple controls on the time of a task (if ... then) that are rather elementary.”

Reply: Thank you for your comments. All text added or modified in response to the reviewer comments is marked in RED and BLUE in the different sections of the revised manuscript for easy tracking. It has now been clarified that, while the data acquisition system of the experimental simulator is fully automatic, the feed-back approach based on an intelligent tutor system proposed here is still in the conceptual stage. However, this does not mean that it is to be dismissed for lack of formalism! The most important challenge for research on the use of artificial intelligence in medical/surgical training is that of providing analyses, concepts, and benchmark criteria that are based on a deeper conceptual understanding of 1) the truly relevant criteria of individual skill evolution in (early) simulator training, where a large number of novices are trained and then selected to undergo further, more specific training, 2) what needs to be detected and controlled for in individual performance by an intelligent agent (ultimately, a specific computer algorithm adapted to a specific simulator system), and 3) the extent to which an automatic system of performance/strategy detection and control may be superior to session guidance by a human expert/tutor.

The major problem in early simulator training with complete novices is, indeed, one of defining and validating objective individual performance criteria, comparing them to those of a true expert (benchmarking) who has performed on the same simulator, and then systematically reinforcing the right task strategies through an appropriate feed-back system to enable optimal individual learning. The feed-back provided needs to be regular, ideally trial by trial as is made clear in the article, and truly useful to the individual, i.e. help them improve their individual performance. Automatically generated between-user comparisons at the end of sessions may seem useful to a selection committee, but they do not provide help or guidance to the individual trainee. This has now been made clear in the revised paper by a clearer presentation of the analyses, now performed on a larger dataset to validate the suggested approach, and by a more explicit explanation of the principles of benchmarking that are the most likely to produce useful conclusions on individual performance evolution. It is made clear that the benchmarks directly motivate the decision model for controlling individual strategies detrimental to optimal performance evolution. The principles of intelligent automatic performance supervision in surgical simulator training suggested here are generic in the sense that they could be adapted to a variety of physical task simulators (of which there are many), provided a measure for task time and task precision can be validated, as in the case of our experimental simulator. We also have to bear in mind that many surgical simulators used for early training do not give user feed-back at all.

A final reply relative to the reviewer’s comment about “lack of formalism”: the formalisms for statistical benchmarking (means, medians, standard deviations, significance thresholds and so on) are well known and need not be made explicit; the formal structure of a system that will generate the feed-back suggested here is a relatively minor issue. The most important challenge for intelligent systems design (AI) in surgical training is that of finding the right approach to human performance analysis before a useful formal system, i.e. one that is superior to a human tutor, is developed. This still unresolved problem is of an essentially conceptual nature.



Round 2

Reviewer 1 Report

I would like to compliment the author on the revised version of the manuscript. The aim of the manuscript is now much clearer and embedded in the literature on metrics-based feedback and evaluation. Furthermore, the methodological approach is clarified, as are the limitations of the approach. Presentation of the results is improved significantly and the additional raw data is a very useful addition! The strategy used by the (one) expert and four novices is now the focus of the results and it is much clearer why the speed-accuracy trade-off is examined. Metrics-based feedback and assessment of image-guided skills is the next step in simulation-based surgical training and this manuscript proposes a possibly valuable, yet under-explored, approach. Please correct minor language issues before finalizing the manuscript.

Author Response

Thank you for your time and expertise! The manuscript was checked carefully sentence by sentence and minor language issues have been sorted out. These very few final corrections are highlighted in red in the text of the last manuscript version, submitted today.



Reviewer 3 Report

I thank the author for the improvements. However, the main problem with the study is the small number of participants. Comparing one expert to 10, and then to 4, novices is not enough to draw statistically based conclusions.

Author Response

The average precision performance of a highly proficient expert surgeon (endoscopist) in a single session on the experimental simulator (and with no previous training on the simulator!) is between one and three standard deviations more precise than that of ANY of 10 novices in a single session under the same conditions. It is also shown to be still (between one and two standard deviations) more precise than that from the last training session of 3 out of 4 other novices after a training period of 8 to 10 sessions. This is a clear-cut conclusion and amply justifies the conceptual model of simulator training control on the basis of an expert's benchmark data, as is proposed here.
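For illustration, a minimal Python sketch of this kind of comparison follows: it expresses the gap between the expert's mean precision and each novice's mean precision in standard-deviation units. The choice of the novice's standard deviation as the unit, and all the values, are assumptions of this sketch, not the actual analysis reported in the manuscript.

```python
from statistics import mean, stdev

# Invented per-trial precision scores (higher = more precise); values are chosen
# only so that the gaps fall in the one-to-three standard-deviation range
# described in the reply above.
expert_precision = [0.91, 0.93, 0.90, 0.92, 0.94, 0.89]
novices = {
    "novice_1": [0.60, 0.75, 0.68, 0.82, 0.71],
    "novice_2": [0.78, 0.86, 0.74, 0.90, 0.80],
}

expert_mean = mean(expert_precision)
for name, scores in novices.items():
    # Gap between expert mean and novice mean, in novice standard deviations.
    gap_in_sd = (expert_mean - mean(scores)) / stdev(scores)
    print(f"{name}: expert mean precision is {gap_in_sd:.1f} SD above the novice mean")
```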



Reviewer 4 Report

This work introduces a technique to evaluate the performance and effectiveness of surgical simulators that give feedback to students in training; by comparing their speed and accuracy performance with that of skilled people, their exercises are validated within a feed-back system.

The improvements to the bibliography, together with the removal of the original part that vaguely described an artificial intelligence approach (turning it towards a benchmarking system), are undoubtedly beneficial to this work.

Maybe the “materials and methods” section would benefit from some figures or schemes to better describe the tasks and the proposed method without making too much reference to the author's previous works.


Author Response

Thank you for your time and expertise. An additional graphic illustration (new Fig 1) with snapshot views of the experimental simulator system is now provided. This should help make the text descriptions self-sufficient.
