Article
Peer-Review Record

Errors in Imagined and Executed Typing

by Stephan F. Dahm * and Martina Rieger
Submission received: 8 April 2019 / Revised: 30 October 2019 / Accepted: 31 October 2019 / Published: 20 November 2019
(This article belongs to the Special Issue Visual Control of Action)

Round 1

Reviewer 1 Report

This study looks at error detection in motor execution and imagery, and examines which sources of feedback are used/reported in each task and how this depends on typists’ typing style.

On the whole, this study is interesting and can help us understand the mechanisms at play during motor imagery and during typing in general. I think it could be even better if the theoretical perspective were more developed. Some important terminology also needs more explanation to fully understand the authors’ perspective. I detail major and minor comments below.

 

 

Major:

1) This would be quite extensive, but I would recommend reorganizing your discussion by question rather than by results section. One such division (in any order) could be: 1- influence of typing style, 2- influence of keyboard visibility, 3- ME vs. MI, or anything else that you find relevant. You can start from the hypotheses you have laid out at the end of the introduction. The way the discussion is organized at the moment makes it hard to know whether you got the answers to the questions you asked.

In general, start by discussing the most important results and then go into the specifics. Your train of thought is rather difficult to follow at the moment. For instance, the list of all your results in the first paragraph of the discussion is hard to parse. If your take-home message is that there is more than visual feedback from the screen for error detection, go from there.

There are definitely some interesting results in how the sources of reported errors depend on typing style. It seems that experts mostly increase touch (from ME to MI), while hunt-and-peck typists increase planning/vision and decrease touch. You should discuss this result extensively.

 

 

2) I would like to see a more detailed discussion of forward models. A full description of the components making up a forward model is needed in the introduction, in particular which component would be missing in imagery (in your opinion, or based on previous studies). I think isolating components would be very interesting and add value to your work. It could also convince people of the value of doing motor imagery. For instance, could there be no efference copy? Or only less information? Maybe a less precise efference copy? Or no error monitoring? Basically, can you say *anything* about what could be happening in MI in this framework? I would expect clear assumptions in the introduction that your data then allow you to confirm or refute. This is also relevant in the context of the ideomotor theory and action effects (see p.17, l.560-62).
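
To make the forward-model vocabulary concrete, here is a minimal comparator sketch (our illustration, not the authors’ model; all names are invented) of how an efference copy could support error detection before any overt feedback arrives:

    # Illustrative forward-model comparator (hypothetical names).
    # A forward model takes an efference copy of the motor command and
    # predicts its sensory consequences; comparing the prediction with
    # the intended consequences yields an internal error signal.

    def forward_model(efference_copy):
        # Toy prediction: the key the command is actually aimed at.
        return efference_copy["target_key"]

    def internal_error(intended_key, efference_copy):
        predicted_key = forward_model(efference_copy)
        return predicted_key != intended_key

    # The plan drifted to 'k' although 'l' was intended: the error is
    # flagged before any visual or tactile feedback arrives.
    print(internal_error("l", {"target_key": "k"}))  # True

In this framing, the reviewer’s alternatives map onto missing or degraded pieces: no efference copy at all, a noisier forward model, or no comparator.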

 

3) There seems to be some confusion between what you refer to as “internal” and “prediction”; the distinction you draw between the two is not really clear. In particular, in the introduction, you list sources of feedback: “visual feedback from the screen, visual feedback from the keyboard, visual feedback from the fingers, tactile feedback from touching the keys, and kinesthetic feedback from the movement of the fingers.” But later in the introduction you cite “comparison of internally predicted movement consequences with intended movement consequences” as a source of feedback as well. More references might help the reader navigate your arguments here. And how about internal error detection/monitoring during response selection?

Later in the method (p.6, l.240): “The category internal prediction was assigned when participants had the impression that something was going wrong but could not name a specific source for error detection.” Is that really internal prediction or internal monitoring during planning? I guess what I find confusing is the word “prediction”. I would agree that this is an “internally detected error”. Unless you have a good reason to use the word prediction, in which case you should explain it clearly. Then in Appendix B: “internal prediction” could also be in the “review of the planning process”.

A broader question is whether you think all sources of feedback can be reported equally easily. For instance, internal error detection might be hard to report, except maybe as the feeling that “something felt wrong”. This internal detection might be used especially by expert typists, but it is unclear whether they would be able to explicitly report it (see p. 18, l.586). Maybe you should mention this point in the limitations as well.

 

4) I am not sure I understand (or agree with) your distinction between planning vs. execution errors. I do not agree that insertions/substitutions and transpositions are execution errors any more than omissions are. An omission is a deletion, and there is no reason to think that it would arise at a totally different level than insertions/substitutions (at least no model of spelling makes this assumption), unless this is a force issue of the key not being pressed. Your example of a word substitution (p.6, l.229) is a typical planning lexical error. Also, if the timing of two keystrokes is poorly specified, this is likely to happen at the planning stage.

If you want to keep this distinction (I am not sure it is actually that useful for interpreting your data), you should clarify what you consider happens at each stage, based on existing theories for instance. Rumelhart and Norman’s (1986) model would probably be useful to discuss. You should also consider other conditions such as typing from dictation, composition typing, etc. Would your claims be true for typing outside of copy-typing? If not, you should mention that you refer to copy-typing only.
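
For readers unfamiliar with the error taxonomy at issue, here is a rough single-error classifier over a template/transcript pair (simplified rules of thumb for illustration only, not the authors’ coding scheme):

    # Rough copy-typing error classifier (illustrative; real coding
    # schemes, e.g. Grudin's, handle multi-error words and ambiguity).
    def classify_error(template: str, typed: str) -> str:
        if typed == template:
            return "correct"
        if len(typed) == len(template) + 1:
            return "insertion"         # 'Innsbruck' -> 'Innsbruuck'
        if len(typed) == len(template) - 1:
            return "omission"          # 'Innsbruck' -> 'Insbruck'
        if len(typed) == len(template):
            diffs = [i for i, (a, b) in enumerate(zip(template, typed)) if a != b]
            if len(diffs) == 1:
                return "substitution"  # 'Innsbruck' -> 'Innsbtuck'
            if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                    and typed[diffs[0]] == template[diffs[1]]
                    and typed[diffs[1]] == template[diffs[0]]):
                return "transposition" # 'Innsbruck' -> 'Insnbruck'
        return "other"

    print(classify_error("Innsbruck", "Insbruck"))  # omission

Note that the surface category alone says nothing about whether a slip arose during planning or during execution, which is exactly the point at issue here.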

 

5) a. What are the precise instructions that you gave for MI? In particular, did you emphasize imagining the feedback on the screen? Were people looking at their hands, the keyboard, the screen, or closing their eyes? It seems like all of this could influence what they report as a source of error detection. You should clarify this in the method section, and discuss it alongside your results in the general discussion. It could change a lot of your interpretations (see also point 1).

 

b. I am not sure I understand your argument for the importance of the ME-SC condition as a “control” for MI. From your data, it seems like people are still reporting using feedback from the screen even in MI. It could depend a lot on the instructions you gave for MI and what people were doing in this condition (see above).

 

 

6) I find the choice of giving the text to copy on a sheet of paper curious. Why didn’t you present it on the screen? It seems like it would be a more natural condition for people to type from a screen. It would also avoid a lot of eye/head movements. Even though you are not interested in timing, the more lag there is between seeing the model and starting to type, the more likely it is that errors will occur in the buffer holding the text for copy. For covering the screen, you could just remove feedback from the screen when people are typing. If you had specific reasons to choose this design, could you justify this choice in the manuscript?

 

 

7) A lot of things are wrong in Appendix A:

- Is the number for each category the total number of errors for that error type (e.g., insertion, reported 186, unreported 226)? It doesn’t add up with the numbers for each subcategory below (horizontal same finger is already 314 reported).

- You should add the total number of errors.

- You could separate your errors as lexical vs. segmental errors (see Pinet & Nozari, 2018).

- “Oktoberfest,” seems like it could be in the same category as space errors. Unless you separated it because the error is at the end of the word and not in the middle of it?

- The example for substitution doesn’t contain an error.

- “Vertical omission”: I’m not sure how you determine the direction of an omission. Did you mean “letter”?

- Could you provide examples for all error types (possibly with another word)? Some are missing, like transposition homologous.

 

8) I am not convinced that Figure 2b and the associated analysis are necessary or add much to the manuscript. It seems like a subset of the next analysis with the three levels for “action”. I see that you are using percentages vs. raw counts. But what are we learning from this analysis?

 

 

Minor:

1-Your naming of conditions is confusing. “SC” could also mean “screen covered” and I got confused a few times. I would advise renaming them. Also, naming EXE+SC/EXE-SC as “Action” in the first analysis doesn’t make a lot of sense (although I do understand from the rest of the analyses where it’s coming from).

 

2-You use ref. 28 in support of “Most insertions occur because two neighboring keys are inadvertently pressed at the same time”. I’m not sure there is support in this reference for such a strong claim. This is a report of errors from only one typist in copy-typing. How about other reports of typing errors? Do they find that insertions are mostly of adjacent keys? I’m not certain this is true for typing from dictation or composition typing for instance. If you talk about copy-typing only, you should explicitly say so.

 

3-Why did you exclude participants that did not report errors in one of the conditions? It seems like “no reports” could still be counted as “less” than in other conditions. And it would avoid wasting valuable data. Could you justify this choice, or include them?

 

4-Could you state what the typing test consisted of? Was it copy typing? How long? You should report accuracy in % in Table 1; we don’t know how long the text was, so the number of errors is not very informative.

 

5-Did people tend to stop in the middle of the word to report their error or wait until the end? Were there more errors at the end of the word in general? I am guessing that if people had a tendency to wait until the end of the word to report their errors, maybe errors at the beginning of the word would have been “forgotten” or ignored. Did you look at the position of the error reported within the word (whether executed or reported in both ME and MI)?

 

6-Instead of actual vs. nonfactual errors, you could use the terminology defined by signal detection theory: hit (error produced and detected), miss (error produced and not detected), and false alarm (error detected when the trial was correct). The false alarm rates that you report are actually really high, especially with visual feedback from the screen: they should be around 1% or less, not above 5%. Do you have an explanation for this?
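
In standard signal detection notation (textbook definitions, not taken from the manuscript), the corresponding rates are:

    \text{hit rate} = \frac{N_\text{hit}}{N_\text{hit} + N_\text{miss}} \quad \text{(over error trials)}, \qquad
    \text{false alarm rate} = \frac{N_\text{FA}}{N_\text{correct trials}} \quad \text{(over correct trials)}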

 

7-Appendix B: Could you add percentages?

 

8-For post hoc analyses, could you state in the results when you are running them? The reporting at the moment directly follows the ANOVA reports, and it is a bit hard to tell which test each reported effect comes from.

 

9-Could you report the raw numbers of errors and percentages (total over the whole task), to give an idea of accuracy on the task? Because we don’t know how many words they typed, it is hard to tell at the moment whether they made a lot of errors or not.

 

10-p.16, l.467: you don’t present the data for hunt-and-peck typists with a covered keyboard, so you cannot use them in the discussion.

 

11-p.16, l. 502: there are fewer errors *reported* in MI than ME. You don’t know what the actual error rate in MI is.

 

12-Could you emphasize what is the novelty of your study compared to Snyder et al., 2016? If it is imagination, then focus on this. If you want to replicate their results, say so both in the introduction and in the discussion.

 

13-p.17, l.523-529: What happens if you compare the number of errors instead of the percentages then? Is it the case that it is actually the same number of errors between conditions?

 

14-p.17, l.548: “One explanation may be that insertions are detected mainly by tactile feedback, whereas substitutions and transpositions are also detected by other sources for error detection (e.g. vision of the keyboard).” You didn’t cross source of feedback and error types. If you want to make this claim, you have the data to verify it.

 

15-p.19 l.623: I would have liked to see this already in the introduction! It can give a great motivation for studying motor imagery.

 


Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Summary

In this study, the authors tested 62 participants selected for typing with one of two styles (10-finger and 2-finger) on a series of typing tests, both under actual execution and imagined execution conditions. Participants were required to report errors that they made, and actual errors were recorded using key-logging software.

The research is interesting and worthy of investigation. However, I found the manuscript to be extremely long (more than 10,000 words to report a single experiment) and quite hard to follow, and I have other major concerns about the design and analysis of the results. I would suggest either that this study serve as a pilot study for a replication or other follow-up, or that the report be greatly simplified and shortened so that the reader can extract the main results quickly and easily. Please see my comments below.

I sign all my reviews: Nick Holmes

Major/general

- The introduction (and the manuscript as a whole) feels too long. For example, on line 79, the authors begin 'In the present study...', but then take 75 more lines to explain the purpose of the study. At around 2000 words, this is a very long introduction for a single-experiment report. I suggest substantial shortening, to as little as 500-750 words, to introduce the concepts of motor execution and imagery, previous research on typing, and what question the new study is answering.

- Removal of participants and conditions: by removing 12 participants in one group and 4 in the other based on a lack of error reports, the authors are biasing their sample and are thus unable to generalise their results to 'all typists'; all their conclusions must therefore be limited to 'typists who report errors during this task'.

- A similar problem arises with only testing the 'hunt-and-peck' typists on half of the conditions. Surely it's best to test everyone equally and report all the data (or at least make this data available)? Were the ('preliminary') data collected for all participants but later excluded, or not collected at all for the later participants? All data should be made available for analysis, and all stages of data collection should be described.

- The authors' approach has led to an unbalanced experimental design which requires a lot more text (i.e., reporting 4 ANOVAs across 7 tables and 3 figures, rather than one full ANOVA). Perhaps the authors did indeed test all typists on all conditions, but they are just not reporting the 'unanalysable' data? (I see no reason why numbers and percentages are not analysable when they are large.)

- 'Participants were asked to type as fast as possible' - but they also had to stop to verbally or manually report their errors. This seems contradictory, and raises the question of speed-accuracy trade-offs. Did the authors measure speed at all? If yes, how were speed-accuracy trade-offs assessed?

- 'percentage error reports': presumably, sometimes participants did *not* make an error, but still reported one. How did the authors deal with this? Signal detection theory could be used here (reports could be greater than 100%, but if recalculated as 'hits' and 'false alarms', this would at least allow the hits to be reported as a true percentage of the actual errors).
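
A minimal sketch of the recalculation proposed here, assuming per-trial logs of actual and reported errors (the field names are hypothetical, not from the authors' data):

    # Recompute error reports as hit and false-alarm rates (illustrative).
    trials = [
        {"actual_error": True,  "reported": True},   # hit
        {"actual_error": True,  "reported": False},  # miss
        {"actual_error": False, "reported": True},   # false alarm
        {"actual_error": False, "reported": False},  # correct rejection
    ]

    error_trials   = [t for t in trials if t["actual_error"]]
    correct_trials = [t for t in trials if not t["actual_error"]]

    hit_rate = sum(t["reported"] for t in error_trials) / len(error_trials)
    fa_rate  = sum(t["reported"] for t in correct_trials) / len(correct_trials)
    print(f"hit rate {hit_rate:.2f}, false-alarm rate {fa_rate:.2f}")

Expressed this way, hits can never exceed 100% of actual errors, and false alarms are counted against the correct trials they came from.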

Minor/specific

8-9 - do all readers know what 'ten-finger typists' and 'hunt-and-peck typists' are? Perhaps say 'so-called...'? Or otherwise explain? In general, the abstract is too detailed and needs more narration and explanation of the concepts and purpose of the report.

14 - 'kinesthesis/touch' - is this 'and' or 'or'?

44 - 'additional weight' - of an object? not clear

157 - '(some of them were using only one thumb)' - this is a little ambiguous; Also, 'Ten' at the start of the participants section is confusing

166 - Table legend is too long: only explain things that are not obvious and/or are abbreviated; t values need degrees of freedom

252-255: these results should be in the results section I think

288 - all means need to be accompanied by an SE or another measure of spread

291 - all statistical values need the test statistics and degrees-of-freedom, not just p

Table 2: can be made more efficient: no need to repeat the d.f., just put them in the F column: F(1,44). Better still, report the stats more concisely in the text, not a table.

Figure 3: no need for colour; use white

343: 'Non-significant effects are printed in gray.' - please give equal fonts to all results; as long as we can read the statistics, we can draw our own conclusions (again: better in text)

I have not commented further on the results or discussion. I feel that a thorough re-write is required before such minor comments can be useful

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I appreciate that you shortened the results section and made the research questions clearer. The presentation of forward models makes your hypotheses clearer too. I believe the manuscript has been greatly improved with these extensive modifications. I only have a few more comments.

 

Your experimental plan ends up being very complex. It would be helpful if you could state the main effects before going into the interactions. It is a bit odd that you discuss them in the discussion, but have not stated them anywhere in the results section.

 

Some of your classification of errors as higher order vs. motor command is arbitrary. For instance, the doubling of a letter could be debated as a planning error (see works on geminates, such as McCloskey et al., 1994; Glasspool & Houghton, 2005, etc.). I am not asking you to revise your classification, but simply to state somewhere based on what evidence/previous literature you classified errors and that some classification might be debated.

 

I believe it is important to present some additional analyses in Supplementary Material as you did. However, I think it would be more beneficial to the reader if you could add a couple of sentences to interpret the results of each analysis. I am not talking about an in-depth discussion, but some concluding sentences to know what to make of these results.

 

I am not sure you computed the false alarm rate in the right way. False alarm rates are computed out of the total number of correct trials: after all, they are correct trials and what you are trying to see is how many correct trials were falsely reported as errors. Hit and miss rates are computed out of the total number of error trials: out of all errors, you are trying to see how many were reported (hit) or not (miss). If you did compute it that way, feel free to ignore this remark.

 

In the method section (p.5, Design and procedure): it is not clear which conditions are performed in each session. Are they performing all of them in each?

 

p.6: “False alarms and incorrectly identified errors were included in the number of reported errors, because they cannot be identified in imagination.” This is a bit unclear, since false alarms are incorrectly identified errors.

Author Response

We want to thank the reviewers for the positive feedback on our modifications of the manuscript. In the following, we respond to the remaining comments in detail.

 

Reviewer 1

 

Point 1

Your experimental plan ends up being very complex. It would be helpful if you could state the main effects before going into the interactions. It is a bit odd that you discuss them in the discussion, but have not stated them anywhere in the results section.

Response 1:

We added the main effects to the results section in the text:

-the effect of action in line 238f.: “The significant main effect of action indicated that more errors were reported in EXE+S (M = 2.3±0.3) than in EXE-S (M = 1.2±0.2, p < .001) and in IMA (M = 0.6±0.2, pmax = .001).”

-the effect of error type in line 252: “The significant main effect of error type was modified by the significant interaction between error type and source.”

-the effect of source in line 261: “The significant main effect of source was modified by the significant interaction between action and source.”

Analogously, this information was added in the comparison of both typing styles.

Point 2

Some of your classification of errors as higher order vs. motor command is arbitrary. For instance, the doubling of a letter could be debated as a planning error (see works on geminates, such as McCloskey et al., 1994; Glasspool & Houghton, 2005, etc.). I am not asking you to revise your classification, but simply to state somewhere based on what evidence/previous literature you classified errors and that some classification might be debated.

Response 2:

Doubling errors are errors in which the wrong letter is doubled in words which contain a double letter (Logan, 1999, JEP HP&P), for instance, typing ‘Inssbruck’ instead of ‘Innsbruck’. Particularly in well-known and easy words, this is unlikely to be a spelling error or due to misreading of the template. The occurrence of doubling errors might be explained by the existence of a doubling schema during the creation of motor commands, which is applied to the wrong letter (Rumelhart & Norman, 1982).
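
A toy rendering of that account (our schematic reading of Rumelhart & Norman, 1982; the representation below is invented for illustration): the word is stored compactly as a letter sequence plus a doubling schema bound to one position, and the error arises when the schema binds to the wrong letter.

    # Toy doubling-schema illustration (hypothetical representation).
    def realize(letters, double_at):
        # Expand the compact form, doubling the letter at index double_at.
        return "".join(c * 2 if i == double_at else c
                       for i, c in enumerate(letters))

    compact = list("Insbruck")   # one 'n', one 's', plus a doubling schema
    print(realize(compact, 1))   # correct binding -> 'Innsbruck'
    print(realize(compact, 2))   # misbound schema -> 'Inssbruck'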

 

We added two references for more detailed descriptions of the subcategories (see line 198: Grudin, 1983; Logan, 1999).

We are aware that not all errors can be unambiguously classified. To emphasize this point, we added the following to the limitations: “Not all errors can be assigned to higher-order-planning errors and motor command errors with one hundred percent certainty. For instance, in difficult words, it might be unclear whether an omission of a letter that has to be typed twice in a row has occurred because the participant does not know the right spelling (the error would then be classified as a higher-order-planning error, because the spelling was correct on the template) or due to a failure to create a sufficiently strong motor command (the error would then be classified as a motor command error). Hence, error subcategories which could not be unequivocally assigned to our categories were not included in the analysis. Still, there might be a few errors within the subcategories included in the analysis, in which the assignment to higher-order-planning or motor command errors might be debated. On rare occasions errors classified as motor command errors were detected by reviewing the planning process, which does not seem plausible.

It also has to be noted that different authors provide different reasons for the occurrence of some errors [28,29,44]. The classification we adopted was based on the literature on copy typing [28,29]. In spoken language, similar errors may occur for different reasons [44]. For example, an interchange error may occur due to permutation of the motor commands [29] or due to phonological similarity of two phonemes [44]. We cannot rule out the possibility that covert speech contributed to some typing errors.” (line 420ff)

Point 3

I believe it is important to present some additional analyses in Supplementary Material as you did. However, I think it would be more beneficial to the reader if you could add a couple of sentences to interpret the results of each analysis. I am not talking about an in-depth discussion, but some concluding sentences to know what to make of these results.

Response 3:

We added a summary of the results after each analysis in the supplement. Further, we now point to consistencies with data or interpretations in the manuscript.

 

Point 4

I am not sure you computed the false alarm rate in the right way. False alarm rates are computed out of the total number of correct trials: after all, they are correct trials and what you are trying to see is how many correct trials were falsely reported as errors. Hit and miss rates are computed out of the total number of error trials: out of all errors, you are trying to see how many were reported (hit) or not (miss). If you did compute it that way, feel free to ignore this remark.

Response 4:

The reviewer is correct. We calculated the false alarm rate incorrectly in the previous version of the manuscript. We now included an analysis of correctly computed false alarms (out of the total number of correct trials) in the supplement (3rd analysis in the supplement).

However, we additionally report the complementary probability of the variable we previously calculated in the manuscript, that is, the percentage of actual errors of reported errors (see 4th analysis in the supplement). We had previously reported the percentage of nonfactual errors of actual errors. We used the complementary probability of this variable now to avoid confusion with the false alarm rate. This variable seems important to us, because in MI false alarms cannot be distinguished from hits. We therefore thought it might be important to know the percentage of actual errors of reported errors.
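
Written out in our notation (H = hits, FA = false alarms, N_corr = correct trials, N_err = error trials), the quantities at stake are:

    \text{false alarm rate} = \frac{FA}{N_\text{corr}}, \qquad
    \text{hit rate} = \frac{H}{N_\text{err}}, \qquad
    \text{percentage of actual errors among reported errors} = \frac{H}{H + FA}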

Point 5

In the method section (p.5, Design and procedure): it is not clear which conditions are performed in each session. Are they performing all of them in each?

Response 5:

We added the following to the methods section: “In each session, participants performed all experimental conditions. They were asked to read the template before they started each experimental condition.” (line 167f)

Point 6

p.6: “False alarms and incorrectly identified errors were included in the number of reported errors, because they cannot be identified in imagination.” This is a bit unclear, since false alarms are incorrectly identified errors.

Response 6:

The term “incorrectly identified errors” denotes something different than false alarms in the present manuscript. Participants did not only report whether an error occurred or not (1st step of reporting), but also what kind of error occurred (2nd step of reporting). The term “incorrectly identified error” refers to the second step of error reporting and is meant to indicate that participants correctly said that an error occurred, but then gave a false report about what they actually did wrong. We now give the definition in the manuscript (line 227f) and it is again explained in the supplemental material (analysis 5): “It can happen that participants correctly reported that an error occurred, but were not correct about what had actually gone wrong. Those errors are incorrectly identified errors.”

 

Reviewer 2

 

Point 1

The authors have responded well to my comments and the manuscript has improved. I still think the manuscript is too long (18 pages for one experiment), but the authors should be free to use that space if they feel it is justified. My only remaining concern is that, from line 238 onwards, p-values are given only with d-values, but without details of the statistical test that they derive from. I appreciate the need to shorten and simplify the manuscript, but a p-value on its own tells us very little. Are these post-hoc tests? The authors should tell us at least in each section what the upcoming p-values are. I appreciate that some post-hoc tests do not give much detail, but at least the type of post-hoc test should be stated.

Response 1:

All p-values in the text derive from Sidak-adjusted post hoc t-tests (see data analysis). To make it clearer that all p-values refer to post hoc testing, we added the following to the results section: “In the following, all reported p-values and d-values refer to the posthoc comparisons.” (lines 237 & 273f).
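
For reference, the standard Sidak adjustment tests each of m post hoc comparisons at

    \alpha_\text{per test} = 1 - (1 - \alpha)^{1/m},

which keeps the familywise error rate at the nominal level \alpha.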

 

The final adaptations shortened the manuscript to 17 pages. The actual text, including figures and tables, takes up only 13 pages.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors have responded well to my comments and the manuscript has improved. I still think the manuscript is too long (18 pages for one experiment), but the authors should be free to use that space if they feel it is justified. My only remaining concern is that, from line 239 onwards, p-values are given only with d-values, but without details of the statistical test that they derive from. I appreciate the need to shorten and simplify the manuscript, but a p-value on its own tells us very little. Are these post-hoc tests? The authors should tell us at least in each section what the upcoming p-values are. I appreciate that some post-hoc tests do not give much detail, but at least the type of post-hoc test should be stated.

Author Response


Author Response File: Author Response.docx
