Article
Peer-Review Record

Can People Infer Distance in a 2D Scene Using the Visual Size and Position of an Object?

by John Jong-Jin Kim 1,* and Laurence R. Harris 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 29 March 2022 / Revised: 26 April 2022 / Accepted: 29 April 2022 / Published: 4 May 2022
(This article belongs to the Special Issue Size Constancy for Perception and Action)

Round 1

Reviewer 1 Report

The topic of the paper is very interesting and the methodology is very well applied. The discussion and conclusion are also well written, and the overall impression of the paper is very good.

Author Response

We would like to thank the referee for taking the time to review our manuscript.

Author Response File: Author Response.docx

Reviewer 2 Report

Review: "Can people infer distance in a 2D scene using the visual size and position of an object?" by John J.-J. Kim and Laurence R. Harris

The authors investigate participants' ability to judge the size and position of familiar objects in 3-dimensional space. They perform a series of online experiments in which 2-dimensional screens are first calibrated to the participants' screen sizes and to the size of one familiar object, the participant's own smartphone. Participants then perform various tasks, placing a smartphone of fixed on-screen size at the correct depth (elevation) in the scene or adjusting a smartphone at a fixed position to have the right pictorial size. Results unequivocally show that participants have great difficulty doing this, especially in the second task, and that they tend to place the object too far and make it too large.

I find this data quite convincing. The paper is lucidly written, each experiment is well conducted, and all experiments give converging evidence about the underlying distortions in spatial representations. I still have a few annotations, though.

MAJOR

- The theoretical motivation for the study is very light. There has been a lot of work on the perception of three-dimensional space from pictorial cues. For example, what are the relevant psychophysical findings, e.g., Stevens' law for distance and object size? If the exponents are less than 1, the observed biases in adjustment (too far, too large) would be predicted. In light of all the work in distance estimation, can the authors make more specific hypotheses? Are there different theories with different predictions? In pictorial scenes with multiple depth cues, how are the different cues weighted and integrated? Just as one possible example: If participants use the perceived area of the phone instead of its perceived height or width, their Stevens' exponent would change from 1.0 to 0.7, and the variance in their estimates would increase as well. The discussion section presents such possibilities, but a discussion in hindsight is no replacement for a solid set of ante hoc hypotheses.

- The pictorial 2-D environment is a bit underconstrained, making it more difficult for the observer to judge distances (except perhaps in the experiment that adds familiar objects). In particular, it is difficult for the observer to estimate the distance of the walls to the viewpoint position. Without that information, it is difficult to assess the grain size of the walls' texture. Moreover, the walls come into view only at a certain distance in front of the viewpoint (where they enter the monitor frame), and the angle at which the nearest visible wall texture elements are observed must be estimated as well. The ground texture is quite useless to begin with, but even the wall's texture quickly becomes unreliable, already at medium distances. Never forget that actual 2D games remove a lot of the spatial uncertainty in visually impoverished environments by allowing the player to move!

- Participants' drop-out / exclusion rate is alarmingly high. At the same time, quite a few participants that remained in the analysis give quite implausible responses, deviating from the veridical values by more than an order of magnitude, even in the more reliable size-to-position task. This suggests that maybe not every person on board followed the instructions.

MINOR

- p. 1: size constancy is in all likelihood not computed in V1 but in higher-order areas feeding back to V1.

- Fig. 2 misses the important variable h_Object. The picture may benefit from graphically emphasizing the similar triangles.

- p. 4: the upper and lower boundaries on the participants' ages cannot be known.

- p. 5: if the participants already used a ruler to measure the black rectangle and their own distance from the screen, why couldn't they measure their phones and screens directly? Instead, the authors rely on an estimate of relative phone size to estimate the screen size. My current smartphone has 25% longer edges than my previous one, giving it more than 50% larger screen area. So the error in using standard sizes for actual sizes might be substantial, especially when it is not clear which source of information (length or area) is eventually used.

- Are the results in line with any psychophysical law?

Author Response

Thank you for taking the time to review our manuscript. Please find our responses below.

Reviewer’s Comment:

Review: "Can people infer distance in a 2D scene using the visual size and position of an object?" by John J.-J. Kim and Laurence R. Harris

The authors investigate participants' ability to judge the size and position of familiar objects in 3-dimensional space. They perform a series of online experiments in which 2-dimensional screens are first calibrated to the participants' screen sizes and to the size of one familiar object, the participant's own smartphone. Participants then perform various tasks, placing a smartphone of fixed on-screen size at the correct depth (elevation) in the scene or adjusting a smartphone at a fixed position to have the right pictorial size. Results unequivocally show that participants have great difficulty doing this, especially in the second task, and that they tend to place the object too far and make it too large.

I find this data quite convincing. The paper is lucidly written, each experiment is well conducted, and all experiments give converging evidence about the underlying distortions in spatial representations. I still have a few annotations, though.

MAJOR

Comment 1:

- The theoretical motivation for the study is very light. There has been a lot of work on the perception of three-dimensional space from pictorial cues. For example, what are the relevant psychophysical findings, e.g., Stevens' law for distance and object size? If the exponents are less than 1, the observed biases in adjustment (too far, too large) would be predicted. In light of all the work in distance estimation, can the authors make more specific hypotheses? Are there different theories with different predictions? In pictorial scenes with multiple depth cues, how are the different cues weighted and integrated? Just as one possible example: If participants use the perceived area of the phone instead of its perceived height or width, their Stevens' exponent would change from 1.0 to 0.7, and the variance in their estimates would increase as well. The discussion section presents such possibilities, but a discussion in hindsight is no replacement for a solid set of ante hoc hypotheses.

Response:

We tried to fit our results to Stevens' power law as well as to Gilinsky's formula for perceived distance, and they did not fit either model well. This may be because size and distance are perceived independently, as we suggest in the manuscript. It is also possible that perceiving distance in a 2D scene on a screen limited our participants' perception of space and produced different results from those previously observed in the real 3D world. We will investigate this further in future studies with a larger display in a more controlled environment.

When using the power law to estimate perceived size, the exponent was approximately 0.46, which is neither the 1.0 expected for length nor the 0.7 expected for area. We did not, however, directly ask participants to report their perceived size; instead, they had to position the target in the 2D scene. It therefore does not seem feasible to determine from our results which cue they used.

We revised our introduction (p. 4) and discussion (pp. 15-16) to note that the results do not align with these models.
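For reference, the exponent discussed above can be estimated by fitting Stevens' power law in log-log coordinates, where the law becomes a straight line with slope equal to the exponent. The sketch below is purely illustrative (the data and function names are ours, not from the study):

```python
import math

def fit_stevens_exponent(stimulus, response):
    """Fit Stevens' power law R = k * S**n by least squares in
    log-log space; returns (k, n). The slope of the log-log line
    is the exponent n, the intercept is log(k)."""
    xs = [math.log(s) for s in stimulus]
    ys = [math.log(r) for r in response]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    n = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    log_k = my - n * mx
    return math.exp(log_k), n

# Noise-free check: data generated with k = 2, n = 0.46 are recovered
sizes = [5, 10, 20, 40]
judged = [2.0 * s ** 0.46 for s in sizes]
k, n = fit_stevens_exponent(sizes, judged)
print(round(k, 2), round(n, 2))  # 2.0 0.46
```

With real adjustment data the recovered exponent would carry noise, so a confidence interval on the slope would normally accompany the point estimate.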

 

Comment 2:

- The pictorial 2-D environment is a bit underconstrained, making it more difficult for the observer to judge distances (except perhaps in the experiment that adds familiar objects). In particular, it is difficult for the observer to estimate the distance of the walls to the viewpoint position. Without that information, it is difficult to assess the grain size of the walls' texture. Moreover, the walls come into view only at a certain distance in front of the viewpoint (where they enter the monitor frame), and the angle at which the nearest visible wall texture elements are observed must be estimated as well. The ground texture is quite useless to begin with, but even the wall's texture quickly becomes unreliable, already at medium distances. Never forget that actual 2D games remove a lot of the spatial uncertainty in visually impoverished environments by allowing the player to move!

Response:

The reviewer raises an important point: the 2-D environment we provided makes it difficult for the observer to judge distance. Observers may be able to use the angle of the lines converging on the vanishing point (i.e., the walls and the ground) to estimate the distance of the walls or their heights. However, the scene was intentionally designed to give ambiguous distance information (e.g., the stones on the wall have unknown sizes), so that observers would be forced to use the target's visual size and its position in the picture. We explain this on page 6, Section 2.1.4.

We revised the discussion (p. 15) to address the distance information that can still be extracted from our pictorial 2-D scene.

Comment 3:

- Participants' drop-out / exclusion rate is alarmingly high. At the same time, quite a few participants that remained in the analysis give quite implausible responses, deviating from the veridical values by more than an order of magnitude, even in the more reliable size-to-position task. This suggests that maybe not every person on board followed the instructions.

Response:

Due to the nature of an online experiment with undergraduate students participating for course credit, we conducted a thorough evaluation to make sure the data collected were legitimate responses, which resulted in a high rate of participants being dropped from the analysis. Although it is possible that some remaining participants did not follow the instructions fully, they showed no clear sign of illegitimate responding according to our criteria. The criteria and procedures for removing participants are explained in Section 2.7 (Post-test Data Cleaning) on page 11. We revised the section slightly to address this.

Past studies evaluating perceived size and distance have shown large individual differences (e.g., Gogel & Mertens, 1967; Higashiyama, 1983). The deviation of responses may reflect such individual differences rather than (although it remains possible) participants not following the instructions. If some participants simply could not do the task, that further suggests they could not infer distance from the size/position of the target. We revised the discussion (pp. 17-18) to add this possible explanation of the results.

 

MINOR

Comment 4:

- p. 1: size constancy is in all likelihood not computed in V1 but in higher-order areas feeding back to V1.

Response:

We revised the paragraph to address your concern.

Comment 5:

- Fig. 2 misses the important variable h_Object. The picture may benefit from graphically emphasizing the similar triangles.

Response:

The figure was revised to graphically show h_Object.
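For readers without the figure, the similar-triangles relation that h_Object enters can be written as follows (our notation, under the assumption that the on-screen image and the object subtend the same visual angle from the viewpoint):

```latex
% Similar triangles: image on the screen vs. object in the scene
\frac{h_{\mathrm{image}}}{d_{\mathrm{screen}}}
  = \frac{h_{\mathrm{Object}}}{d_{\mathrm{Object}}}
\qquad\Longrightarrow\qquad
d_{\mathrm{Object}}
  = d_{\mathrm{screen}}\,\frac{h_{\mathrm{Object}}}{h_{\mathrm{image}}}
```

That is, with a known object height and viewing distance to the screen, the simulated distance follows from the rendered image height.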

Comment 6:

- p. 4: the upper and lower boundaries on the participants' ages cannot be known.

Response:

Although many participants declined to provide their age or kept it hidden when participating in the study, they had to be between 18 and 45 years of age to sign up for the study via York University's URPP Sona System. We rephrased the paragraph to make this clearer.

Comment 7:

- p. 5: if the participants already used a ruler to measure the black rectangle and their own distance from the screen, why couldn't they measure their phones and screens directly? Instead, the authors rely on an estimate of relative phone size to estimate the screen size. My current smartphone has 25% longer edges than my previous one, giving it more than 50% larger screen area. So the error in using standard sizes for actual sizes might be substantial, especially when it is not clear which source of information (length or area) is eventually used.

Response:

Originally, in Experiment 1, we did not ask participants to measure the reference length (the black rectangle) because: (a) we expected that using their smartphone to set the target size (i.e., placing it on the screen and adjusting the target size to match the phone) would give them a better sense of the target being the same size as their smartphone; and (b) we were not sure whether participants would have rulers at home to measure their smartphone. We found it difficult to verify whether the phone sizes entered were realistic, so we later added the measurement of the black rectangle for Experiments 2 and 3. At the beginning of the experiment, we ask participants who do not have a physical ruler at home to use a ruler app on their smartphone during the 'Enter Reference Length' stage.

We kept the process of entering the smartphone size, and of using standard sizes, for consistency across experiments. We understand that there are errors associated with using standard sizes rather than asking participants to measure their phones directly. By using a standard average size, we expected these errors to average out.
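As an illustration of the calibration logic described above, a minimal sketch of how a measured reference rectangle yields a pixels-per-centimetre scale that can then render a target at a phone's physical size (function names, values, and the standard phone height are our assumptions, not the study's actual code):

```python
def pixels_per_cm(reference_px, measured_cm):
    """Screen calibration: the experiment draws a reference rectangle
    of known pixel length; the participant measures its physical
    length with a ruler, giving the display's pixel density."""
    return reference_px / measured_cm

def target_size_px(phone_cm, reference_px, measured_cm):
    """On-screen size (px) needed to draw the target at the phone's
    physical size. phone_cm would be a standard average phone height."""
    return phone_cm * pixels_per_cm(reference_px, measured_cm)

# e.g. a 300 px reference bar measured as 8 cm, 15 cm standard phone
print(target_size_px(15.0, 300, 8.0))  # 562.5
```

Any error in the standard phone size propagates linearly into the rendered target size, which is the trade-off the response above acknowledges.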

Comment 8:

- Are the results in line with any psychophysical law?

Response:

We revised our discussion (p. 15) to point out that our results did not fit well with several psychophysical laws and to provide possible explanations.

Author Response File: Author Response.docx

Reviewer 3 Report

In the Reviewer's opinion, the research paper entitled "Can people infer distance in a 2D scene using the visual size and position of an object?" is good.

This study tested how people use an object's size and its position in a 2D image to determine its distance. In a series of online experiments, participants viewed a target representing their smartphone rendered within a 2D scene. They either positioned it in the scene at the distance they thought was correct based on its size, or adjusted the target to the correct size based on its position in the scene. In all experiments, the adjusted target sizes and positions were not consistent with the initially presented positions and sizes; on average, targets were made larger and placed farther.

Some comments that may enhance the understanding of the paper and its value are presented below. Specific issues that require further consideration are:

  1. The title of the manuscript matches its content. In my opinion, the question in the title is very interesting.
  2. The Introduction generally covers the relevant cases.
  3. The methodology was clearly presented.
  4. In the Reviewer's opinion, the current state of knowledge relating to the manuscript topic has been presented, but the authors' contribution and novelty are not sufficiently emphasized.
  5. The experimental program and results look interesting and were clearly presented.
  6. In the Reviewer's opinion, the bibliography, comprising 30 references, is more or less representative.
  7. An analysis of the manuscript content and the References shows that the manuscript under review constitutes a summary of the Authors' achievements in the field.
  8. In the Reviewer's opinion, the manuscript is well written, and it should be published in the journal after minor revision.

Author Response

Thank you for the comments. Please find our responses below.

Reviewer’s Comment:

In the Reviewer's opinion, the research paper entitled "Can people infer distance in a 2D scene using the visual size and position of an object?" is good.

This study tested how people use an object's size and its position in a 2D image to determine its distance. In a series of online experiments, participants viewed a target representing their smartphone rendered within a 2D scene. They either positioned it in the scene at the distance they thought was correct based on its size, or adjusted the target to the correct size based on its position in the scene. In all experiments, the adjusted target sizes and positions were not consistent with the initially presented positions and sizes; on average, targets were made larger and placed farther.

Some comments that may enhance the understanding of the paper and its value are presented below. Specific issues that require further consideration are:

  1. The title of the manuscript matches its content. In my opinion, the question in the title is very interesting.
  2. The Introduction generally covers the relevant cases.
  3. The methodology was clearly presented.
  4. In the Reviewer's opinion, the current state of knowledge relating to the manuscript topic has been presented, but the authors' contribution and novelty are not sufficiently emphasized.

Response for 4:

We revised the discussion to further emphasize our contribution to the literature (pp. 16 and 19).

  5. The experimental program and results look interesting and were clearly presented.
  6. In the Reviewer's opinion, the bibliography, comprising 30 references, is more or less representative.

Response for 6:

We revised the discussion and added a few more references in the process.

  7. An analysis of the manuscript content and the References shows that the manuscript under review constitutes a summary of the Authors' achievements in the field.
  8. In the Reviewer's opinion, the manuscript is well written, and it should be published in the journal after minor revision.

Response for 1, 2, 3, 5, 7, 8:

Thanks for taking time to review our manuscript.

Author Response File: Author Response.docx

Reviewer 4 Report

In my opinion, the explanation in the text of the paper requires:

- what is the Bonferroni correction;

- what is repeated measures ANOVA;

- Why were the results of some participants not analyzed in individual experiments (e.g. experiment 2 - 48 participants), what were the rejection criteria?

Author Response

Thank you for taking the time to review our manuscript. Please find our responses to your comments below.

Reviewer comment 1:

In my opinion, the explanation in the text of the paper requires:

- what is the Bonferroni correction;

- what is repeated measures ANOVA;

Response:

We added an explanation for the Bonferroni correction and the repeated measures ANOVA when they are first mentioned in the manuscript on page 11.
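For readers unfamiliar with the Bonferroni correction mentioned above, a minimal illustration (ours, not the authors' analysis code): with m comparisons, each test is evaluated at alpha/m, or equivalently each p-value is multiplied by m and capped at 1.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for multiple comparisons: returns the
    adjusted p-values (p * m, capped at 1.0) and the per-test alpha
    (alpha / m) that keeps the familywise error rate at alpha."""
    m = len(p_values)
    adjusted = [min(p * m, 1.0) for p in p_values]
    return adjusted, alpha / m

# Three post-hoc comparisons after an ANOVA
p_adj, alpha_per_test = bonferroni([0.01, 0.04, 0.20])
print([round(p, 2) for p in p_adj])  # [0.03, 0.12, 0.6]
```

The correction is conservative: it controls the chance of any false positive across the family of tests, at the cost of statistical power per test.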

 

Reviewer comment 2:

- Why were the results of some participants not analyzed in individual experiments (e.g. experiment 2 - 48 participants), what were the rejection criteria?

Response:

Due to the nature of an online experiment with undergraduate students participating for course credit, we conducted a thorough evaluation to make sure the data collected were legitimate responses. The participants who were not analyzed were suspected of not doing the tasks properly. The criteria and procedures for removing participants are explained in Section 2.7 (Post-test Data Cleaning) on page 11.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors have convincingly addressed my concerns and improved the paper accordingly. I recommend publication.
