Article
Peer-Review Record

Bridging a Gap in Coherence: The Coordination of Comprehension Processes When Viewing Visual Narratives

by Maverick E. Smith 1,2, John P. Hutson 2,3, Mi’Kayla Newell 3, Dimitri Wing-Paul 3, Kathryn S. McCarthy 3, Lester C. Loschky 2 and Joseph P. Magliano 3,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 23 April 2024 / Revised: 16 August 2024 / Accepted: 20 August 2024 / Published: 30 August 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper is methodologically sound.  The analyses of the results are rigorous and reasonable.

 

Where the paper could be improved is in making the presentation of the introduction, the method, and the results more modularly-separable from the theoretical overlay.

 

I have no quarrel with the theoretical interpretation.  However, at present, the paper is largely unreadable without having to buy into the concepts and terminology of the theoretical overlay.  That is a shame, because theories come and theories go.  I suspect these data will stand.  So, it would do the reading audience a huge favor to present the approach, the experimental design, the method, and the results using only the terms that describe the measures via their operational definitions.  That means, for example, it should be possible to understand the results on their own, without ever having to know how the measures map onto the theoretical ideas of “mapping,” “shifting,” and “model integration.”

 

As I understand things, independent participants (college students) in two experiments were shown the same sequences of kid’s picture book stories and asked to do different tasks in each experiment. The main IV was the presence or absence of bridging-actions pictures.  Experiment 1 showed that people segment picture stories and look longer in places where bridging-actions are missing (Figure 3).  Experiment 2 shows that people talk more (work harder to make sense) in a talk-aloud protocol when bridging-actions are missing (Table 2).  Correlations between these two measures (using story sequence as the unit) showed that the probability of segmentation was negatively correlated with the frequency of explanations, but positively correlated with the frequency of paraphrasing when the bridging actions were present (Figure 4).  When the bridging actions were absent, the opposite occurred.  The probability of segmentation was positively correlated with the frequency of explanations, but negatively correlated with the frequency of paraphrasing.

 

I would prefer to see all those results laid out first, without reference to the author’s theoretical framework.  That framework can be retained in the intro to motivate the design of the study, but they should refrain from using the theoretical terms to describe the method and the results.  In the Discussion, the results should first be summarized using the terms of the measures.  Then they should be interpreted with respect to plausible generic (atheoretical) links between the probability of segmentation and frequency of types of verbalization. Answer the question:  why might any cognitive researcher expect these relations to hold?  This will allow the reader to form their own understanding of the data that needs explanation.  Then, and only then, should the authors’ favored theoretical take be given to the data, along with the consideration of alternative ways of understanding the same data. 

 

Related to this point (that the paper is too heavily oriented to a particular theoretical framework), I noted that there is an unusually high proportion of self-references in the paper. I think it is reasonable to retain only 4-5 of them to make the necessary points to sustain the theoretical summary.

 

I make these recommendations to help preserve these data for future use by researchers with other theoretical stances.  As it stands, the study is too easy to dismiss as either (1) trivial because the outcome seems quite intuitive, or (2) a platform for promoting a particular theoretical stance with data that could easily be accommodated by other accounts.

Author Response

  • The paper is methodologically sound. The analyses of the results are rigorous and reasonable.

Authors’ Response to Reviewer #1 Comment #1: We thank Reviewer #1 for their kind words and for their help with revisions.

 

  • Where the paper could be improved is in making the presentation of the introduction, the method, and the results more modularly-separable from the theoretical overlay.

Authors’ Response to Reviewer #1 Comment #2: We appreciate this constructive comment and we have made several revisions in response to it. Based on this and Reviewer #2’s comments, we reread the introduction and found it to be very dense, which limits its accessibility to a broad audience. We made substantial revisions to the introduction by making the constructs and the theoretical assumptions of SPECT more accessible to a broad readership.

 

With respect to the methods section, we were careful to specify the measures, rather than the constructs. We also split the “Results and Discussion” section into two sections, so that we no longer include an interpretation of the Results in the Results sections. We still reference the Computational Effort and the Coherence Gap Resolution hypotheses in the Results section of Experiment 2 when needed to remind readers that there is a theoretically motivated reason why the relationship between explanations and segmentation in the Bridging-Action absent condition could have been positive or negative.

 

  • I have no quarrel with the theoretical interpretation. However, at present, the paper is largely unreadable without having to buy into the concepts and terminology of the theoretical overlay.  That is a shame, because theories come and theories go.  I suspect these data will stand.  So, it would do the reading audience a huge favor to present the approach, the experimental design, the method, and the results using only the terms that describe the measures via their operational definitions.  That means, for example, it should be possible to understand the results on their own, without ever having to know how the measures map onto the theoretical ideas of “mapping,” “shifting,” and “model integration.”

Authors’ Response to Reviewer #1 Comment #3: We revised the manuscript by removing references to mapping and shifting when describing the results. We separated the Results and Discussion sections in both Experiments, and we also revised the General Discussion by describing what we found before using SPECT to guide the interpretation of the results. We redescribe each result, and then we interpret it.

 

In addition, we appreciate the reviewer’s skepticism of our theoretical framing, as skepticism is the cornerstone of science. However, we also want to convey that SPECT adopts and extends assumptions of the Structure Building Framework, which describes different comprehension processes.

 

  • As I understand things, independent participants (college students) in two experiments were shown the same sequences of kid’s picture book stories and asked to do different tasks in each experiment. The main IV was the presence or absence of bridging-actions pictures. Experiment 1 showed that people segment picture stories and look longer in places where bridging-actions are missing (Figure 3).  Experiment 2 shows that people talk more (work harder to make sense) in a talk-aloud protocol when bridging-actions are missing (Table 2).  Correlations between these two measures (using story sequence as the unit) showed that the probability of segmentation was negatively correlated with the frequency of explanations, but positively correlated with the frequency of paraphrasing when the bridging actions were present (Figure 4).  When the bridging actions were absent, the opposite occurred.  The probability of segmentation was positively correlated with the frequency of explanations, but negatively correlated with the frequency of paraphrasing.

I would prefer to see all those results laid out first, without reference to the author’s theoretical framework.  That framework can be retained in the intro to motivate the design of the study, but they should refrain from using the theoretical terms to describe the method and the results.  In the Discussion, the results should first be summarized using the terms of the measures.  Then they should be interpreted with respect to plausible generic (atheoretical) links between the probability of segmentation and frequency of types of verbalization. Answer the question:  why might any cognitive researcher expect these relations to hold?  This will allow the reader to form their own understanding of the data that needs explanation.  Then, and only then, should the authors’ favored theoretical take be given to the data, along with the consideration of alternative ways of understanding the same data.

 

Authors’ Response to Reviewer #1 Comment #4: The description of the results made by Reviewer #1 is correct. We edited the Discussion of Experiment 2 by first restating each result in Experiment 2. We then interpret the results using SPECT as the lens for interpreting the data.

 

We also note in the General Discussion that the results have important implications for other theories of event comprehension, such as Event Segmentation Theory. This highlights the point that these results are not trivial and other accounts cannot easily explain them. Specifically, we discuss how the finding that people made few predictions on images was surprising because Event Segmentation Theory argues that people generate predictions and segment when predictions fail.

 

The Discussion of Exp 2 reads as follows:

“This is surprising, given theoretical proposals that emphasize the role of predictions in event comprehension (Zacks et al., 2007); however, it is consistent with prior work showing that predictions happen infrequently and tend to occur only when possible narrative outcomes are highly constrained (Magliano et al., 1996). The outcomes in the End-State pictures may not have been constrained enough to provoke predictions. Alternatively, think-alouds may not be sensitive to predictive inferences on End-State pictures.”

 

We also note in the General Discussion how the results inform Event Segmentation Theory, which provides a different mechanism for accounting for event segmentation than the one proposed by SPECT.

 

The General Discussion reads as follows:

Finally, these results also have important implications for Event Segmentation Theory (Zacks et al., 2007), which describes a different set of mechanisms than those proposed by SPECT to account for event segmentation. Specifically, Event Segmentation Theory says that the event model generates predictions for the near future, and that people segment and shift to create a new event model when there are spikes in prediction error. Event Segmentation Theory does not contain an explicit mechanism to support bridging inference generation and mapping. We found that explanations and segmentation were negatively associated when Bridging-Actions were absent, which indicates that backward mapping (Papenmeier et al., 2019), in addition to prediction errors, affects shifting.

 

Finally, we also elaborate more on the unexpected finding that explanations and segmentation were positively associated in the Bridging-Action present condition. We discuss this finding because it is an avenue for future research. The answer to the question about whether any cognitive researcher would expect these results to hold is that they may not, and these effects should be replicated.

 

The general discussion reads as follows:

 

“We also found a positive relationship between the likelihood of segmenting and the frequency of explanations in the Bridging-Action present condition. This finding was unexpected, as the competing Coherence Gap Resolution and Computational Effort hypotheses focus on the direction of the relationship when participants needed to resolve coherence gaps. One possibility is that differences in the background and foreground actions may help explain the nature of the relationship between explanations and segmentation in the present condition. For example, the foreground in Figure 1 shows the saxophone player searching for the frog in the saxophone. We selected the target episodes in the stories because the foregrounded actions require a bridging inference when the Bridging-Action picture was absent. The ease of comprehending the foregrounded actions in the present condition may have led viewers to attend more to the background actions, which influenced the generation of explanations about why those actions were happening (e.g., why are the other musicians angry at the saxophone player?). The effort required to explain the background actions may have increased the likelihood of shifting to create a new event model, consistent with a modified version of the Computational Effort Hypothesis applied to the present condition. Future research should replicate this unexpected finding and examine this possibility.”

 

 

  • Related to this point (that the paper is too heavily oriented to a particular theoretical framework), I noted that there is an unusually high proportion of self-references in the paper. I think it is reasonable to retain only 4-5 of them to make the necessary points to sustain the theoretical summary.

Authors’ Response to Reviewer #1 Comment #5: We removed many of the references to our own work from the introduction. We retained six self-citations in the introduction for the following reasons:

 

Loschky et al. (2020) describe the theoretical framework and the assumptions from the framework that we are testing. Results reported by Magliano et al. (1999) provided evidence that motivated the Computational Effort Hypothesis.

 

We used the results from Magliano et al. (2016) and Hutson et al. (2018) to motivate the exploration of viewing time and explanations on End-State pictures. We kept the reference to Trabasso & Magliano (1996) because we used a modified version of their coding scheme to code participants’ think-alouds in Experiment 2. We kept the reference to Magliano & Graesser (1991) because we used the three-pronged method, which coordinates think-alouds, theories of discourse processing, and behavioral measures to study inference generation processes.

 

We also want to note that this study has a very strong theoretical motivation and that there are currently no competing frameworks in the study of visual narratives that can account for the relationship between explanations and the likelihood of perceiving a new event. With that being said, there were distinct hypotheses, motivated by SPECT, that we tested. We did not make any revisions to deemphasize the current theoretical framing in the introduction.

 

  • I make these recommendations to help preserve these data for future use by researchers with other theoretical stances. As it stands, the study is too easy to dismiss as either (1) trivial because the outcome seems quite intuitive, or (2) a platform for promoting a particular theoretical stance with data that could easily be accommodated by other accounts.

 

Authors’ Response to Reviewer #1 Comment #6: We appreciate the constructive comments and the intention. To emphasize that the results are not intuitive, we edited the abstract to showcase the two alternative competing hypotheses pertaining to the relationship between explanations and event segmentation when Bridging-Action pictures within the episodes were absent. Namely, one hypothesis predicted that the relationship between explanations and segmentation would be positive in the Bridging-Action absent condition. The alternative competing hypothesis predicted that the relationship would be negative. Clearly, if one hypothesis seems intuitive to a given reader, the alternative hypothesis cannot seem intuitive.  Both hypotheses were included in the previous submission, but perhaps we did not emphasize them enough, and so they were easy to overlook.  Thus, we have endeavored to highlight them more in the current revision, by including them in the abstract.

To our knowledge, there are no theoretical frameworks that make counter assumptions to SPECT regarding the relationship between explanations and segmentation. We emphasize this issue in the General Discussion section when discussing how Event Segmentation Theory could be updated to include a mechanism for backward mapping.

 

 

Reviewer 2 Report

Comments and Suggestions for Authors

Please, see attachment

Comments for author File: Comments.pdf

Author Response

  • General: The manuscript is very wordy with a very long Introduction. While the topic is certainly of general interest (the “cartoon caption contest” of “The New Yorker” comes to mind), it seems that the theme of the paper is not entirely appropriate for the scope of “Vision” which publishes more “hard-data” driven studies. The authors may want to consider a more psychology-focused journal.

 

Authors’ Response to Reviewer #2 Comment #1: We interpreted your comment about the wordiness of the introduction as reflecting that it was extremely informationally dense. We revised the introduction to make it more accessible, and we were able to reduce the length of the introduction by 3 pages.

We know that this manuscript is not typical of Vision. However, we were invited to contribute a manuscript that reflected our research on the comprehension of visual narratives. We hope that the readership finds the general topic of visual narratives interesting and warrants investigation. Furthermore, vision is a topic within psychology.  One of our authors (Loschky) teaches "Sensation & Perception," most of which is about vision, as a course in a Psychology Department. So, there is no separation between the study of vision and the field of Psychology (unless you are concerned only with, say, the biology of the retina). The work in our manuscript falls within the topic of "visual cognition," which is essentially "high-level vision." We assume we were invited to submit a paper to Vision for this reason.

  • Abstract: There should be no references in the abstract. Instead, short explanations would be more appropriate. Also explain the “hypotheses” that the study was testing (l. 40).

 

Authors’ Response to Reviewer #2 Comment #2: We removed the references in the abstract. We also added descriptions of the two competing hypotheses to the abstract. We thank Reviewer #2 for the suggestion. 

 

  • Introduction: is very long, but nevertheless lacks certain elements: e.g., why is there a figure for “Back-end” processing, but not for “Front-end” processing.

 

Authors’ Response to Reviewer #2 Comment #3: The focus of the paper is on the coordination of processes in the back-end. As such, we removed the detailed discussion of front-end processes from the introduction. We also decided to remove the entire figure illustrating SPECT from the paper, to reduce the degree to which the paper is perceived as being theoretically dense.

 

  • Furthermore, I am not sure about the “narrative” shown in Fig. 1. There seems to be a distractor scene apart from the main story. This aspect has not been treated at all.

 

Authors’ Response to Reviewer #2 Comment #4: Figure 1 contains panels from a published picture story in a series that we used in the present study. Figures 1A and 1B show the manipulation of the presence of the Bridging-Action, which we explain throughout the paper. We edited the figure to show the Bridging-Action present condition before the absent condition, and we edited the caption. We also changed the example in Figure 1 to a different example that we used in the experiments.

 

We also added the following to the General Discussion:

 

“We also found a positive relationship between the likelihood of segmenting and the frequency of explanations in the Bridging-Action present condition. This finding was unexpected, as the competing Coherence Gap Resolution and Computational Effort hypotheses focus on the direction of the relationship when participants needed to resolve coherence gaps. One possibility is that differences in the background and foreground actions may help explain the nature of the relationship between explanations and segmentation in the present condition. For example, the foreground in Figure 1 shows the saxophone player searching for the frog in the saxophone. We selected the target episodes in the stories because the foregrounded actions require a bridging inference when the Bridging-Action picture was absent. The ease of comprehending the foregrounded actions in the present condition may have led viewers to attend more to the background actions, which influenced the generation of explanations about why those actions were happening (e.g., why are the other musicians angry at the saxophone player?). The effort required to explain the background actions may have increased the likelihood of shifting to create a new event model, consistent with a modified version of the Computational Effort Hypothesis applied to the present condition. Future research should replicate this unexpected finding and examine this possibility.”

 

  • Also, introductory remarks and methodological explanations are intermingled (starting at l. 230).

 

Authors’ Response to Reviewer #2 Comment #5: We edited the “Current Experiments and Hypotheses” section by limiting the use of theoretical terms used in that section. Note that we needed to describe some aspects of the methods to sufficiently discuss the hypotheses and the predicted relationship between segmentation and explanations when the Bridging-Action was absent in that section.

 

  • Methods/Results: these appear thorough and well done.

 

Authors’ Response to Reviewer #2 Comment #6: We thank the reviewer for this assessment.

 

  • Discussion: There needs to be a firm statement about the message/conclusion of the study

 

Authors’ Response to Reviewer #2 Comment #7: Please note the following in the Conclusion section of the paper:

“These results suggest a complex relationship between paraphrases, explanations, and event segmentation. Specifically, explanations may promote mapping and reduce the likelihood that viewers perceive a new event. In contrast, paraphrases may reflect laying a foundation after segmenting (i.e., shifting to build a new event model). Thus, the present pair of studies are the first empirical confirmation of SPECT’s critical assumption that the processes of event segmentation and inference generation inform shifting and mapping, and the assumption that viewers segment when they fail to map incoming information onto the event model.”

 

 

  • Figures and Tables: The figure captions are inadequate, i.e., non-existent. The readers should not be left to their own devices to divine what is illustrated. I am at a loss how to interpret Fig. 1! What is shown in Fig. 2?

 

Authors’ Response to Reviewer #2 Comment #8: We added missing details to Figure 1 about how it illustrates the experimental manipulation. We also removed Figure 2 from the paper. After discussing this with the authors, we decided that many of the details in the figure were not central to the current research question.

 

  • The same is true for the Table captions, especially since the lay-out of the tables is odd. What message are the authors trying to convey?

Authors’ Response to Reviewer #2 Comment #9: We edited Table 2 so that it follows APA formatting recommendations, and we edited the title of the Table. We also added references to the tables when we discussed the results shown in them. 

Reviewer 3 Report

Comments and Suggestions for Authors

 

This manuscript offers a valuable exploration of how individuals comprehend visual narratives, with a particular emphasis on the coordination between the processes of mapping (integrating new information) and shifting (segmenting experiences into distinct event models). The authors conducted two experiments to investigate the effects of coherence gaps, created by manipulating the presence or absence of Bridging-Action pictures in wordless picture stories, on these comprehension processes.

 

The study makes a significant contribution to the understanding of visual narrative comprehension by providing empirical evidence for the coordination of mapping and shifting processes. It also offers insights into how coherence gaps are managed during narrative understanding. The findings have implications for theories of narrative comprehension and could inform educational and instructional design in visual media.

 

However, several issues need to be addressed before the manuscript can be considered for publication:

 

1. Experiment Setup Details:

 

(1) The manuscript lacks details regarding the viewing distance between participants and the pictures in the experiments. This is a crucial aspect, as prior research (Li et al., 2013) has demonstrated that viewing distance and prior knowledge can significantly influence image perception. How did the authors control for or account for this factor in their experimental design?

 

Li L, Asano A, Muraki Asano C, Okajima K. Statistical quantification of the effects of viewing distance on texture perception. Journal of the Optical Society of America. A, Optics, Image Science, and Vision. 2013 Jul;30(7):1394-1403. DOI: 10.1364/josaa.30.001394. PMID: 24323155.

Explanation of Psychological Principles:

 

(2) The manuscript discusses a positive correlation between picture interpretation and event segmentation, as depicted in Figure 3B. However, a more detailed explanation of the psychological principles underlying this correlation would enhance the clarity and impact of the findings.

 

2. The complexity of the experimental images is an important factor that can influence event understanding. Did the authors consider and control for the complexity of the images used in their experiments? If so, how was this done, and what measures were taken to avoid its potential confounding effects on event understanding?

 

3. Prior knowledge plays an important role in image perception. The paper does not mention how to control or eliminate the effect of subjects' prior knowledge on the results of the experiment. This shortcoming may affect the internal validity of the experiment because it is not possible to determine whether subjects' understanding is due to the experimental material itself or to their prior knowledge.

 

4. The manuscript explores the role of causality in visual narrative comprehension. It would be beneficial to include a discussion on how the manipulation of causal cues was ensured to have a significant impact on participants' understanding of the narrative. Can the authors provide specific metrics or indicators used to quantify the strength or clarity of causality?

 

5. Visual narrative comprehension may vary across different cultures or age groups. Did the authors consider these factors in their study design? If so, how might these differences affect the generalizability of the experimental results? 

Author Response

 

 Experiment Setup Details:

(1) The manuscript lacks details regarding the viewing distance between participants and the pictures in the experiments. This is a crucial aspect, as prior research (Li et al., 2013) has demonstrated that viewing distance and prior knowledge can significantly influence image perception. How did the authors control for or account for this factor in their experimental design?

Li L, Asano A, Muraki Asano C, Okajima K. Statistical quantification of the effects of viewing distance on texture perception. Journal of the Optical Society of America. A, Optics, Image Science, and Vision. 2013 Jul;30(7):1394-1403. DOI: 10.1364/josaa.30.001394. PMID: 24323155.

Authors’ Response to Reviewer #3 Comment #1: We added details about how we controlled participants’ viewing distance from computer monitors in Experiment 1. Given practical limitations for data collection, we did not control viewing distance from the monitor in Experiment 2, which was conducted at a different university.  We have added this as a Limitation in the General Discussion, which says:

"One limitation of this study is that the think-aloud and segmentation data came from different participants at different institutions. Some of the procedures, such as viewing distance, differed between institutions. We do not think that such minor variations between labs affected our key results, though it is important to note that differences in viewing distance can be important for lower-level perception of sensory information such as visual texture (Li et al., 2013)."

Explanation of Psychological Principles:

(2) The manuscript discusses a positive correlation between picture interpretation and event segmentation, as depicted in Figure 3B. However, a more detailed explanation of the psychological principles underlying this correlation would enhance the clarity and impact of the findings.

 Authors’ Response to Reviewer #3 Comment #2: We thank Reviewer 3 for this excellent suggestion. We added a brief discussion of how this result is consistent with at least one prior study in the General Discussion. The General Discussion now reads as follows:

“Furthermore, the inability to generate explanations to bridge the gap could have increased the need to describe the actions conveyed in the pictures as participants laid the foundation for the subsequent event model, hence the positive trend between the likelihood of segmentation and picture paraphrasing shown in Figure 3B. Loschky et al. (2015) lend some credibility to this possibility. They manipulated the amount of a film clip participants saw before viewing a critical shot. To understand the critical shot in the clip, viewers had to generate an inference to connect it with prior information they watched in the movie. Those who saw more of the clip, and thus, those who had a richer event model of the narrative, were more likely to generate an inference that causally connected the two shots in their think-alouds. Those participants were also less likely to perceive an event boundary on the critical shot, and they were less likely to describe the contents of the shot. Conversely, those who watched less of the film clip prior to the critical shot, and thus, those who had a poorer event model of the narrative, were less likely to generate the inference, they were more likely to perceive the critical shot as the start of a new event, and they were more likely to describe the contents of the shot. Taken together with our results, explaining actions goes together with inference generation, as part of mapping information onto the viewer's current event model, whereas shifting (i.e., segmenting) goes together with describing the contents of the scene, as part of laying the foundation for a new event model. Future research could norm the degree of causal relatedness between pictures when the Bridging-Action was absent and evaluate these possibilities.”

  2. The complexity of the experimental images is an important factor that can influence event understanding. Did the authors consider and control for the complexity of the images used in their experiments? If so, how was this done, and what measures were taken to avoid its potential confounding effects on event understanding?

Authors’ Response to Reviewer #3 Comment #3: The materials section in Experiment 1 now reads as follows. We bolded the relevant statements in the response for emphasis.

“Participants viewed six picture stories (ranging from 24–26 images each) selected from picture story books written by Mercer Mayer (Mayer, 1967, 1973, 1975, 1980; Mayer & Mayer, 1971). We used the same pictures as Magliano et al. (2016) (see also Brich et al., 2024; Huff et al., 2020; Hutson et al., 2018). Magliano and colleagues edited the original pictures to reduce their complexity, because the original pictures contained a considerable amount of background details that were inconsistent across pictures and stories.”

We also discuss how the complexity of the background and foreground actions in the picture stories may help explain the positive relationship between explanations and segmentation in the Bridging-Action present condition. The General Discussion now reads as follows:

“We found a positive relationship between the likelihood of segmenting and the frequency of explanations in the Bridging-Action present condition (Figure 3A). This finding was unexpected, because the competing Coherence Gap Resolution and Computational Effort hypotheses focus on the direction of the relationship when Bridging-Action pictures were absent. One possibility is that differences in the complexity of the background and foreground actions may help explain the nature of the relationship between explanations and segmentation in the present condition. For example, the foreground in Figure 1 shows the musician searching for the frog in the saxophone. We selected the target episodes in the stories because the foregrounded actions require a bridging inference when the Bridging-Action picture was absent. The ease of comprehending the foregrounded actions in the present condition may have led viewers to attend more to the background actions, which influenced the generation of explanations about why those actions were happening (e.g., why are the other musicians angry at the saxophone player?). The effort required to explain the background actions may have increased the likelihood of shifting to create a new event model, consistent with a modified version of the Computational Effort Hypothesis applied to the present condition. Future research should norm the complexity of the foreground and background actions, replicate this unexpected effect, and examine how the foreground and background actions influence explanations and event segmentation.”

  3. Prior knowledge plays an important role in image perception. The paper does not mention how to control or eliminate the effect of subjects' prior knowledge on the results of the experiment. This shortcoming may affect the internal validity of the experiment because it is not possible to determine whether subjects' understanding is due to the experimental material itself or to their prior knowledge.

Authors’ Response to Reviewer #3 Comment #3: We added to the General Discussion a discussion of the role that experience with picture stories may play.

“Further refinement of SPECT could also come from exploring individual and group-level differences in inference generation and event segmentation. Differences such as age, working memory capacity, general knowledge, domain specific knowledge, reading skill, and experience reading picture stories contribute to how viewers engage inference generation and segmentation processes (Calvo, 2005; Gernsbacher, Varner, & Faust, 1990; Hutson et al., 2021; McCarthy & Goldman, 2019; Pitts et al., 2021; Singer et al., 1992; Whitney et al., 1991). For instance, older adults, poor comprehenders, and individuals with lower working memory capacity may be less likely to explain actions and more likely to perceive event boundaries at a coherence gap (Gernsbacher, Varner, & Faust, 1990; Whitney et al., 1991). One important individual difference to consider with respect to visual narratives is exposure to the medium (Cohn, 2020). Visual narratives follow conventions, and it is well documented that individual differences in experience with visual media (both consumption and production) can affect participants’ comprehension of picture stories (Cohn & Kutas, 2015; Cohn & Maher, 2015; Cohn et al., 2012). There is a rich history of research exploring individual and group-level differences in the context of reading, and we hope to see more research on this issue in the future when people comprehend picture stories. The results of the present study suggest that this research should focus on the relationship between mapping and shifting.”

 

  4. The manuscript explores the role of causality in visual narrative comprehension. It would be beneficial to include a discussion on how the manipulation of causal cues was ensured to have a significant impact on participants' understanding of the narrative. Can the authors provide specific metrics or indicators used to quantify the strength or clarity of causality?

 Authors’ Response to Reviewer #3 Comment #4: We agree that future work should norm the degree to which pictures in picture stories are causally related when Bridging-Action pictures are absent. We revised the General Discussion by acknowledging that “the materials used in this study were naturalistic picture stories; therefore, we did not control the level of causal relatedness between the Beginning- and End-State pictures.” and that “episode-specific variability in the degree of causal relatedness may have affected the likelihood of explaining, paraphrasing, segmenting, and the nature of their relationship.”

 We also encourage “future research [to] norm the degree of causal relatedness between pictures when the Bridging-Action was absent and evaluate” the possibility that causal relatedness after removing pictures affects the relationship between segmentation and inference generation processes measured in think alouds.

 

  5. Visual narrative comprehension may vary across different cultures or age groups. Did the authors consider these factors in their study design? If so, how might these differences affect the generalizability of the experimental results?

 Authors’ Response to Reviewer #3 Comment #5: We agree that individual and group-level differences are important factors that can influence the engagement of mapping and shifting processes. We added the following to our General Discussion.

“Further refinement of SPECT could also come from exploring individual and group-level differences in inference generation and event segmentation. Differences such as age, working memory capacity, general knowledge, domain specific knowledge, reading skill, and experience reading picture stories contribute to how viewers engage inference generation and segmentation processes (Calvo, 2005; Gernsbacher, Varner, & Faust, 1990; Hutson et al., 2021; McCarthy & Goldman, 2019; Singer et al., 1992; Whitney et al., 1991). For instance, older adults, poor comprehenders, and individuals with lower working memory capacity may be less likely to explain actions and more likely to perceive event boundaries at a coherence gap (Whitney et al., 1991). An important individual difference to consider with respect to visual narratives is exposure to the medium (Cohn, 2020). Visual narratives follow conventions, and it is well documented that individual differences in experience with visual media (both consumption and production) can affect participants’ comprehension of picture stories (Cohn & Kutas, 2015; Cohn & Maher, 2015; Cohn et al., 2012). There is a rich history of research exploring individual and group-level differences in the context of reading, and we hope to see more research on this issue in the future when people comprehend visual narratives. The results of the present study suggest that this research should focus on the relationship between mapping and shifting.

It is also important to consider the extent to which culture affects the comprehension of picture stories (Cohn, 2024). Cultural differences influence how viewers engage front-end processes of attentional selection and back-end processes involved in comprehension (Nisbett & Masuda, 2003; Nisbett, Peng, Choi, & Norenzayan, 2001). Moreover, the conventions of visual narratives vary across North America, Europe, and Asia (Cohn, 2024), which could have important implications for comprehension. As such, replicating this study with samples from different cultures is warranted.”

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I am satisfied with the authors' responses to my concerns on the original draft.

Author Response

We are pleased that we have addressed the reviewers' concerns.

 

 
