Article
Peer-Review Record

Decoding Images in the Mind’s Eye: The Temporal Dynamics of Visual Imagery

by Sophia M. Shatek 1,*, Tijl Grootswagers 1,2, Amanda K. Robinson 1,2,3 and Thomas A. Carlson 1,2
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 14 May 2019 / Revised: 20 September 2019 / Accepted: 18 October 2019 / Published: 21 October 2019

Round 1

Reviewer 1 Report

Review of “Decoding Images in the Mind’s Eye: The Temporal Dynamics of Visual Imagery”

 

Brief Summary

This is an interesting study, exploring an important and understudied area: the temporal dynamics of visual mental imagery. As the authors point out, although many studies have examined overlapping neural substrates for visual perception and visual imagery, few have looked at temporal similarities across the two. In this study participants completed a retro-cue task, in which they were presented with 4 visual stimuli and then cued to form a visual mental image of one of the 4. After image formation, they were presented with 4 test stimuli and had to indicate which of the 4 they had just imagined. This allowed the researchers to be confident that participants were using imagery to complete the task rather than semantic labelling or other strategies. Using time-resolved multivariate pattern analysis, they compared EEG recordings from the ‘picture’/perception stage of the task, when the visual stimuli were presented, with the 2 image-forming stages. They found that whilst stimulus category and identity could be decoded from the visual presentation of the stimuli, this was not possible for the mental images of the stimuli. They also found that the neural representations of vision and imagery did not overlap at any time point, and that differences in the ability to form vivid imagery did not affect the results. In the Discussion the authors carefully consider possible explanations for the failure to decode mental imagery from the current data, focusing on stimulus- and design-related factors. As the authors conclude, the study raises many important questions and highlights methodological considerations for using EEG and time-series decoding to investigate the temporal dynamics of mental imagery.

 

Broad comments:

Whilst this is an interesting and generally well-written paper, I am concerned that the failure to find informative patterns of activity for mental imagery may reflect the study design rather than the actual underlying activity. The authors make a good attempt to discuss these issues in the Discussion, but I feel more is needed.

My concern stems from some of the changes that have been made to the retro-cue task, especially relating to the 2 stages of imagery (the ‘Cue-locked imagine’ and ‘Response-locked imagine’). Please clarify why you have added the additional requirement that participants click the mouse to indicate they have formed the image. Do you think the results might differ if this mouse-click requirement were absent? What benefit does it add?

In the Discussion the authors mention the differences between EEG and MEG as a possible reason for the current findings, in comparison to Dijkstra et al. (2018). Have other studies used EEG to explore temporal issues relating to mental imagery? Would you expect the same lack of informative patterns of activity in the imagery condition if you used MEG? What are the benefits of using EEG? With the reduced likelihood of detecting an effect due to using EEG, the working memory demands of 4 rather than 2 pictures, and the additional mouse-click requirement, the brain activity involved in the 2 imagery epochs is going to be dramatically different to that involved in passively viewing pictures, and similarities between the imagery and perception processes in the visual areas may simply be missed.

 

Throughout the paper: please be more consistent in your labelling of conditions/epochs, as the current labelling becomes confusing across and within the various sections and figures.

 

 

Specific comments:

Abstract:             

Line 22: the authors should consider rephrasing the sentence “our results indicate that the dynamics…”, with particular focus on the use of ‘compared to’. The research did not directly statistically compare the variability of the temporal dynamics of imagery and perception processes; rather, it found that whilst the temporal dynamics of perception are consistent across and within participants, the temporal dynamics of imagery are highly variable across and within participants.

Lines 26-27: the final sentence states that the implications of the results for understanding the neural processes underlying mental imagery are discussed. However, this does not seem possible here, as the null results point more towards discussing reasons for the differing results from Dijkstra et al. (2018).

 

Introduction:

Line 68: “they occur at a later time and are more diffuse” – it is not clear what this refers to. Do you mean that the imagery and vision activation patterns appear at different times, or that the imagery activation patterns appear at a later time?

Lines 87-90: the authors need to give reasons for all predictions here (currently only the expectation that category information in mental imagery would be decodable is justified).

 

Method and Material

It would help to clarify why the authors have made so many changes to the design of the study compared to other retro-cue designs exploring mental imagery. In the Discussion the reason for using 4 pictures instead of 2 is made clear. However, other differences, such as not having fixation points between the pictures and the lack of a mask between the pictures and the imagery stage, are less clear. Also, would the inclusion of item-by-item image vividness ratings be useful? As the VVIQ analysis was not significant, maybe not, but it would provide a potentially more sensitive and task-specific rating. Have the authors also considered exploring the impact of possible spillover effects, i.e. the role of the target’s position in the sequence of pictures? Could this be more relevant given that your design has 4 pictures compared to the other studies that have just 2?

Please add more details about the instructions given to participants about forming the mental image – for example, were they asked to imagine cued stimulus as vividly as possible? 

Line 161: word missing (block?) from “We also included a pattern estimator at the beginning of each to investigate…”

Figure 1

For (c) Imagery Sequence it would be helpful to have the 3 stages added to the process, to indicate the conditions “Vision”, “Cue-locked imagine” and “Response-locked imagine” so that this is clear from the Figure without needing to refer back to the text.  

Figure 3

Please check the labels A and B are correctly matched from the graphs to the legend.

 

Discussion:

Line 330: it is not clear why participants’ subjective ease in forming images of the Sydney Harbour Bridge, due to its lines/arches, is relevant to the greater decoding accuracy for these pictures in the visual conditions. Please clarify.

Line 405: not sure if this sentence is incomplete? “…”

Line 427: Borst and Kosslyn (2010) may not be the best reference to use here, as it concerns spatial imagery, which seems very different from the mental imagery used in this study; consider selecting a different reference.


Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Summary

This study targets an important, under-investigated aspect of mental imagery: its temporal dynamics. Using EEG and multivariate pattern analyses, it aims to track neural representations during imagery over time. The task and stimuli are carefully designed and well-controlled, and the analyses are well executed. The manuscript is clearly written and easy to read. The authors do not find above-chance decoding of imagery signals, making it difficult to draw clear conclusions. Despite this null finding, I believe that the results are of importance for the scientific community. Specifically, they can greatly inform the choice of stimuli and design for future research using EEG/MEG to study mental imagery. I make some suggestions that could potentially increase decoding accuracy. However, as mentioned above, even if decoding accuracy during imagery remains at chance, I still think the results are of interest. I have a few minor points about specific interpretations of previous findings and a few textual comments.

 

The reported analyses are carefully executed and well-motivated. Given the chance-level decoding within imagery, I have a few suggestions/thoughts that could potentially increase decoding accuracy. I can imagine that the authors have already considered most of them and did not report them because they did not work, but just in case I would like to mention them here and hear the authors’ thoughts on them:

o   Very little EEG pre-processing was done. In line with that, the authors do not report doing any artefact rejection. However, artefacts, especially large eye-movements and/or blinks, could create outliers in activation and have large effects on the classifiers. It might be worth doing artefact rejection and ICA to remove eye-related artefacts and performing the analyses again to check if this improves results. In line with this, previous studies have found that eye-movements can be a huge confound in temporal decoding analyses even if participants are instructed to fixate (e.g. Mostert et al. 2018; Quax et al. 2019).
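In case it is useful, a minimal sketch of what a first-pass amplitude-based rejection step could look like (pure NumPy on synthetic data; the 150 µV peak-to-peak threshold, the array shapes, and the planted blink are all illustrative, not taken from the manuscript):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic epoched EEG: (n_epochs, n_channels, n_timepoints), values in microvolts
epochs = rng.normal(0.0, 10.0, size=(200, 64, 250))
epochs[5, 3, 100:130] += 300.0  # plant a blink-like transient in epoch 5

def reject_peak_to_peak(epochs, threshold_uv=150.0):
    """Drop epochs whose peak-to-peak amplitude exceeds the threshold on any channel."""
    ptp = epochs.max(axis=2) - epochs.min(axis=2)  # (n_epochs, n_channels)
    bad = (ptp > threshold_uv).any(axis=1)         # True where any channel is over threshold
    return epochs[~bad], np.where(bad)[0]

clean, bad_idx = reject_peak_to_peak(epochs)
```

In practice one would combine a threshold like this with ICA-based removal of ocular components (e.g. as implemented in MNE-Python) rather than rely on amplitude rejection alone.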

o   It is likely that not all 64 electrodes contain relevant information about the imagined or perceived stimuli. Adding features to classifiers that only contain noise can significantly decrease decoding accuracy. One way to deal with this is to apply PCA on all concatenated data and select the components that contain the largest part of the variance (e.g. 95%) and use these as features for the classifiers instead of all electrodes. You could do this per participant to maximally increase sensitivity.
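As a sketch of this dimensionality-reduction step (plain NumPy on synthetic data; the 95% cut-off follows the suggestion above, and all shapes and the planted structure are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic concatenated data: (n_samples, n_electrodes)
X = rng.normal(size=(1000, 64)) * 0.8
X[:, :5] += rng.normal(size=(1000, 5)) * 15.0  # a handful of electrodes carry most variance

def pca_reduce(X, var_keep=0.95):
    """Project X onto the fewest principal components explaining `var_keep` of the variance."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()
    k = int(np.searchsorted(np.cumsum(explained), var_keep)) + 1
    return Xc @ Vt[:k].T, k

features, k = pca_reduce(X)  # k stays small when a few components dominate
```

Fitting this per participant, as suggested, lets the number of retained components adapt to each participant’s data.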

o   Cross-decoding between Vision trials and Imagery trials still requires LOO or n-fold cross-validation because they sometimes belong to the same ‘trial’: The Vision epochs come right before the Imagery epochs, this could result in a bias in the classifier due to auto-correlations in the signal if the epochs of the same ‘trials’ are used for training and testing the classifier (see also Dijkstra et al. 2018). It might be that the authors did already do this, but it was not entirely clear to me from the text.
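A sketch of the kind of trial-grouped cross-validation being suggested, where the Vision and Imagery epochs from the same trial are always held out together (NumPy only; the trial count and fold number are illustrative):

```python
import numpy as np

def trial_folds(trial_ids, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole trials are held out together."""
    rng = np.random.default_rng(seed)
    unique = rng.permutation(np.unique(trial_ids))
    for held_out in np.array_split(unique, n_folds):
        test = np.isin(trial_ids, held_out)
        yield np.where(~test)[0], np.where(test)[0]

# each trial contributes one Vision epoch and one Imagery epoch
trial_ids = np.repeat(np.arange(40), 2)
folds = list(trial_folds(trial_ids, n_folds=5))
```

A classifier trained on the epochs indexed by train_idx can then never see data from a trial it is tested on, which removes the autocorrelation bias described above.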

o   The authors use 12 ms sliding windows (3 time points), whereas Dijkstra et al. use 30 ms (9 time points) for imagery decoding. Have the authors tried increasing this window systematically to see if decoding accuracy increases? I understand that too large a time window results in the loss of temporal information, but it might indicate whether decodable information is present in the signal at all but merely obscured by temporal variation.
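One way to run that comparison is to average adjacent time points in sliding windows of increasing width before classification; a minimal NumPy sketch (the window sizes in the comments mirror the 3- and 9-sample windows mentioned above; all shapes are illustrative):

```python
import numpy as np

def sliding_window_features(epochs, window):
    """Average each `window`-sample sliding window along the time axis.

    epochs: (n_epochs, n_channels, n_times) -> (n_epochs, n_channels, n_times - window + 1)
    """
    cs = np.cumsum(epochs, axis=2)
    cs = np.concatenate([np.zeros(epochs.shape[:2] + (1,)), cs], axis=2)
    return (cs[:, :, window:] - cs[:, :, :-window]) / window

rng = np.random.default_rng(3)
epochs = rng.normal(size=(10, 64, 100))
narrow = sliding_window_features(epochs, window=3)  # the 3-sample window used in the manuscript
wide = sliding_window_features(epochs, window=9)    # the wider 9-sample window of Dijkstra et al.
```

Sweeping `window` upwards and re-running the decoding at each width would show whether accuracy rises as temporal jitter is averaged out.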

o   Information on the number of trials in each class and condition is missing. Low trial numbers might go some way towards explaining chance decoding (e.g. Dijkstra et al. 2018 had ~100 trials in each class and still only achieved just-above-chance decoding within imagery).

o   In line with this, the authors mention that they use all 4 stimuli during Vision to train classifiers on. However, the neural responses to stimuli 2-4 are likely contaminated by the prior stimuli, resulting in less clean representations. If trial numbers permit, it might increase decoding accuracy within perception and potentially also cross-decoding to imagery if the authors only use the first image in the imagery sequence, which doesn’t suffer from this contamination, to train Vision classifiers on (in line with Dijkstra et al. 2018).

o   Did the authors also calculate the generalization between Pattern Estimators and Vision and Pattern Estimators and Imagery? Because participants were not engaged in a task during the perception of the stimuli during this phase, their neural representations might have suffered less from potential confounds such as eye-movements (see e.g. Mostert et al. 2018).

o   Training on perception and testing on imagery generally gives lower decoding accuracy than training on imagery and testing on perception (e.g. Lee et al 2012 & Dijkstra et al 2018). This is likely due to the fact that all stimulus features that are present in the imagery representation are also present in the perception representation, but not the other way around (i.e. imagery is less ‘rich’). Given the low within-imagery decoding, I don’t expect this to work, but have the authors tried training on Imagery and testing on Vision? Again, using cross-validation and calculating temporal generalization to take into account that processes likely happen at different time points.
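For readers unfamiliar with the method, the temporal generalization analysis referred to in these points trains a classifier at each time point and tests it at every other time point, yielding a training-time × testing-time accuracy matrix. A self-contained sketch with a nearest-class-mean classifier on synthetic two-class data (the sample sizes and the planted class difference from time point 10 onwards are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n_per_class, n_ch, n_t = 30, 8, 20
X = rng.normal(size=(2 * n_per_class, n_ch, n_t))
y = np.repeat([0, 1], n_per_class)
X[y == 1, 0, 10:] += 3.0  # class difference appears from time point 10 onwards

def temporal_generalization(X_tr, y_tr, X_te, y_te):
    """Accuracy matrix: rows = training time, columns = testing time."""
    n_t = X_tr.shape[2]
    acc = np.zeros((n_t, n_t))
    for t_tr in range(n_t):
        m0 = X_tr[y_tr == 0, :, t_tr].mean(axis=0)  # class means at the training time
        m1 = X_tr[y_tr == 1, :, t_tr].mean(axis=0)
        for t_te in range(n_t):
            Z = X_te[:, :, t_te]
            pred = (np.linalg.norm(Z - m1, axis=1) < np.linalg.norm(Z - m0, axis=1)).astype(int)
            acc[t_tr, t_te] = (pred == y_te).mean()
    return acc

train, test = np.arange(0, 60, 2), np.arange(1, 60, 2)  # split into separate train/test sets
acc = temporal_generalization(X[train], y[train], X[test], y[test])
```

Above-chance cells away from the diagonal indicate that the same representation generalizes across time points, which is exactly what training on Imagery and testing on Vision (or vice versa) would probe.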

 

Page 2, line 45: clearer overlap with higher-order abstract visual processing. Albers et al. 2013 show overlap in low-level visual cortex for grating stimuli, which most people see as canonical low-level visual stimuli. This reference might therefore not be the best to use here. Furthermore, Mechelli et al. 2004 used DCM to show effective connectivity underlying representations in occipito-temporal areas during imagery and perception. The study that showed that occipito-temporal representations overlap in the first place, and which produced the data used by Mechelli et al, is Ishai, Ungerleider & Haxby (2000).

Page 2, line 80: the low exemplar decoding in the Dijkstra et al. 2018 study does not necessarily indicate a dissociation between low- and high-level processes, because there were far fewer trials for exemplar decoding than for category decoding and the within-category exemplars were highly similar. This is therefore more likely a power issue. The current study potentially overcomes this by including clearly dissociable exemplars within categories. Rephrasing this part might make the motivation of the current study clearer: the biggest difference from the Dijkstra et al. 2018 study is the use of dissimilar exemplars, which could potentially reveal a dissociation between high- and low-level visual representations.

For future reference: the VVIQ2 is an existing and validated updated version of the VVIQ with reversed scoring.

Page 5, line 161: “… a pattern estimator at the beginning of each .. to investigate …”  missing word

Could the authors show the temporal generalization results within Imagery, Vision and cross-decoding? This would help in comparing their results directly to those from Dijkstra et al. 2018.

Because this study is about the temporal dynamics of mental imagery, and not a lot of studies report on the time it takes to generate a mental image, I think that Figure S2 should go into the main manuscript, maybe as part of Figure 1? This is a clear representation of the time it takes for people to generate a mental image from items present in working memory and is potentially interesting for others.

I really appreciate the individual subject analysis, and I think it greatly strengthens the manuscript. If the authors decide to report any of the extra analyses suggested above, it would be great if they could again do individual subject statistics to check for effects.

It is not entirely clear to me why the results of the study demonstrate variability of imagery processes within subjects over time (stated as one of the key conclusions in the abstract and the discussion). It is likely that other factors have contributed to the null-finding, right? Furthermore, it might not be obvious to all readers why temporal variability could influence the results, so it might be worth explaining this point a bit more in detail.

I don’t think that the discrepancy in findings between this study and the Dijkstra et al. 2018 study could be due to non-imagery strategies in that study. The task in the Dijkstra et al. 2018 study was to focus on imagery vividness, not to select the correct category in each trial. To ensure that participants were imagining the correct stimulus, 16% of trials were catch trials during which participants were asked, after imagining the stimulus, which of 8 highly similar within-category exemplars they had just imagined. Only remembering the label ‘face’ or ‘house’ would result in very low performance on these trials.

Page 13, line 405: part of the sentence is missing.

The recent review by Keogh and Pearson (2019) might be relevant for the point about differences in strategies between participants.

 

 


Author Response

Please see the attachment.

Author Response File: Author Response.docx
