Data Descriptor
Peer-Review Record

SocNav1: A Dataset to Benchmark and Learn Social Navigation Conventions

by Luis J. Manso 1,*, Pedro Nuñez 2, Luis V. Calderita 2, Diego R. Faria 1 and Pilar Bachiller 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 11 December 2019 / Revised: 28 December 2019 / Accepted: 9 January 2020 / Published: 14 January 2020
(This article belongs to the Special Issue Data from Smartphones and Wearables)

Round 1

Reviewer 1 Report

The paper concerns a dataset for the contextual analysis of the subjective disturbance caused by a robot. The dataset addresses current research gaps and seems to be well prepared. The paper is generally well written.

However, I recommend three major and several minor changes.

The major issues:

(1) the paper provides no validation of the dataset. If it is intended as a benchmark, as indicated in the introduction, some preliminary learning results should be provided. Training results would constitute a feasibility study showing that the dataset is correctly prepared for its purpose.

(2) the dataset is by definition divided into training, test and validation sets, which is, in my opinion, rather a disadvantage than an advantage. The common practice in machine learning is to use cross-validation protocols, such as leave-one-out, rather than to fix the training, testing and validation sets. If you consider this split necessary, I would recommend providing a clear justification for the decision and describing a procedure for users of cross-validation protocols.

(3) the paper provides only general information about labeling discrepancies among the annotators. Please consider providing more detailed measures of inconsistency. Inter-rater consistency is usually measured by the kappa coefficient. Moreover, apart from inter-rater consistency, an intra-rater consistency check would also be interesting.

Minor issues:

(1) In the introduction, line 11, you mention non-Euclidean machine learning algorithms. Why those? They are never mentioned again in the paper.

(2) Is the location of the human a range or the center of the human? (line 89)

(3) Line 100: were the scenarios classified, or rather labeled?

(4) Figure 1 - some people might print the paper in grey scale. Please provide visual icon-label descriptions for non-color users.

(5) Lines 174-176, measure of dispersion: this variable is not defined and no scale is provided.

 

Author Response

(1) the paper provides no validation of the dataset. If it is intended as a benchmark, as indicated in the introduction, some preliminary learning results should be provided. Training results would constitute a feasibility study showing that the dataset is correctly prepared for its purpose.

Thank you very much for your recommendation.

We have replaced the title of the section “Basic analysis” with “Analysis and validation of the dataset”. In this section, we have included new measures providing information about the discrepancy among subjects, as well as results obtained by applying the dataset to Graph Neural Networks (GNNs) for social navigation. A new figure (Figure 3) has been added showing the results of a GNN trained on the proposed dataset.

(2) the dataset is by definition divided into training, test and validation sets, which is, in my opinion, rather a disadvantage than an advantage. The common practice in machine learning is to use cross-validation protocols, such as leave-one-out, rather than to fix the training, testing and validation sets. If you consider this split necessary, I would recommend providing a clear justification for the decision and describing a procedure for users of cross-validation protocols.

We agree that cross-validation is the best approach when the size of the dataset is relatively small. However, we consider hold-out more appropriate in this context: a) cross-validation is prohibitively time-consuming for large datasets, especially when applying deep learning and hyperparameters also have to be tuned; b) given the complexity and variability of the data, the 556 scenarios reserved for development and final testing can be considered a large number, despite being only 3.4% of the size of the training dataset; and c) providing a fixed split encourages proper experimental design for comparison purposes by discouraging the use of test samples during training, which unfortunately is more common than one would initially expect. We have included information about this in the introduction.

Having said that, if the reviewer insists, we would be willing to merge the three splits and offer the dataset in a single file too, keeping the split data as an optional format.
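For users who nevertheless prefer cross-validation, a minimal sketch of merging the splits and running k-fold evaluation is shown below. The file names and JSON structure are illustrative assumptions, not the dataset's documented layout.

```python
# Minimal sketch: merging the provided splits and running k-fold
# cross-validation instead of the fixed hold-out split.
# File names and the JSON structure are illustrative assumptions.
import json
from sklearn.model_selection import KFold

scenarios = []
for split_file in ("train.json", "dev.json", "test.json"):  # hypothetical names
    with open(split_file) as f:
        scenarios.extend(json.load(f))

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(scenarios)):
    train = [scenarios[i] for i in train_idx]
    test = [scenarios[i] for i in test_idx]
    # ...train and evaluate a model on this fold...
    print(f"Fold {fold}: {len(train)} training / {len(test)} test scenarios")
```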

 

(3) the paper provides only general information about labeling discrepancies among the annotators. Please consider providing more detailed measures of inconsistency. Inter-rater consistency is usually measured by the kappa coefficient. Moreover, apart from inter-rater consistency, an intra-rater consistency check would also be interesting.

Thank you very much for your suggestion. We have extended Section 4 and included a further analysis of the data considering inter-rater and intra-rater consistency. Results for three subjects are shown in Table 1. Additional explanations have also been included.
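For readers who wish to reproduce this kind of agreement analysis, a minimal sketch using Cohen's kappa is shown below; the label values are invented placeholders, not figures from the dataset.

```python
# Minimal sketch: inter-rater agreement between two annotators
# using Cohen's kappa. The label lists are hypothetical placeholders.
from sklearn.metrics import cohen_kappa_score

rater_a = [0, 1, 2, 2, 1, 0, 3]  # annotator A's labels for the same scenarios
rater_b = [0, 1, 2, 1, 1, 0, 3]  # annotator B's labels for the same scenarios

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.3f}")

# Intra-rater consistency can be estimated the same way by comparing
# an annotator's labels for scenarios they rated twice.
```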

Minor issues:

(1) In the introduction, line 11, you mention non-Euclidean machine learning algorithms. Why those? They are never mentioned again in the paper.

Dear reviewer,

The reason lies in the nature of the data, since the underlying structure is not a Euclidean space. In lines 38-39, the reader is referred to reference [9] for further details.

Line 38-39: Because of the structured nature of the data, SocNav1 is particularly well-suited to be used to benchmark non-Euclidean machine learning algorithms such as Graph Neural Networks (see [9]).

Consequently, we believe that it is not necessary to add more information to the paper.
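As a brief illustration of what "non-Euclidean" means here, the sketch below encodes one scenario as a small graph; the node names, attributes, and connectivity are purely illustrative assumptions, not the dataset's actual schema.

```python
# Sketch of why the data is non-Euclidean: a scenario can be encoded as a
# graph whose nodes (room, robot, humans) and edges (containment,
# interactions) vary in number and ordering from scenario to scenario,
# unlike a fixed-length Euclidean feature vector.
import networkx as nx

scenario = nx.Graph()
scenario.add_node("room", kind="room")
scenario.add_node("robot", kind="robot", x=0.0, y=0.0)
scenario.add_node("human_1", kind="human", x=120.0, y=80.0)
scenario.add_node("human_2", kind="human", x=-60.0, y=200.0)

scenario.add_edge("room", "robot")
scenario.add_edge("room", "human_1")
scenario.add_edge("room", "human_2")
scenario.add_edge("human_1", "human_2", relation="interaction")

# A GNN operates directly on this variable-sized graph structure.
print(scenario.number_of_nodes(), scenario.number_of_edges())
```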

(2) Is the location of the human a range or the center of the human? (line 89)

It is the center of the human. Specifically, it is the center of a bounding box 40 cm wide and 20 cm deep. We have rewritten the sentence:

Line 90: xPos, yPos (it is the center of the human and represents its location) expressed in centimetres...

(3) Line 100: were the scenarios classified, or rather labeled?

Thank you. We have changed it to 'labelled'.

(4) Figure 1 - some people might print the paper in grey scale. Please provide visual icon-label descriptions for non-color users.

Thank you for this recommendation. The suggested change has been made.

(5) Lines 174-176, measure of dispersion: this variable is not defined and no scale is provided.

Thank you for your comment. We have included the definition of the pooled standard deviation. In addition, we have clarified the interpretation of the obtained values according to the maximum difference between two labels (lines 201-209). 
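For the reader's reference, the standard definition of the pooled standard deviation over $k$ groups (here, presumably the sets of labels each scenario received), with group sizes $n_i$ and per-group standard deviations $s_i$, is

$$ s_{\text{pooled}} = \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1)\, s_i^2}{\sum_{i=1}^{k} (n_i - 1)}} $$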

 

Reviewer 2 Report

The paper presents a synthetic dataset for social navigation: scenarios are constructed in which humans and a robot are placed in a room, and a human subject must then annotate how 'safe' each scenario is. The paper is well written. I think the dataset would be useful for applications such as the ones described in the paper.

0) 'Safe for whom'? It is not clearly defined whether subjects have a specific human in the scene for whom to assess safety.

1) Please define the term 'scenario' in Section 2. A reader understands what it means later in the manuscript - but a definition early on, with an example would be useful.

2) Humans are assumed to be 20 cm 'deep'. Can 'deep' be changed to another word?

3) Line 98: What are the 'two sets of possible scenarios'? What do they indicate in the context of the dataset? How do the two sets differ? It would be useful to add a description to understand the range of scenarios covered in the dataset. This question is related to (1) above.

4) Please define the term 'interaction area' in Section 3.

5) It would be useful to add a note on how the diversity of scenarios was ensured.

6) The parallel black lines indicate interactions. They seem to be arbitrarily defined. Please add a note on how they were sampled/generated.

 

7) The analysis of the dataset, however, suggests that the notion of safety/safe space is subjective for human subjects.

8) Additional details of human subjects, their background and relevant demographics could be added. This is related to (7) above.

Author Response

Dear reviewer, thank you very much for your review and suggestions. Your comments have been very helpful in improving our paper.

We have studied all the comments and made the necessary modifications to the paper to address your suggestions. We also include a response to each of your comments below.

 

0) 'Safe for whom'? It is not clearly defined whether subjects have a specific human in the scene for whom to assess safety.

Thank you for your comment. Subjects do not have a specific human assigned in the scene. They have to label each situation taking into account the disturbance caused by the robot to any human. We have clarified this point in the section “Methods” (lines 148-150).

1) Please define the term 'scenario' in Section 2. A reader understands what it means later in the manuscript - but a definition early on, with an example would be useful.

Thank you for your suggestion. A definition was added in lines 83-84. 

2) Humans are assumed to be 20 cm 'deep'. Can 'deep' be changed to another word?

We have replaced the word “deep” with “from chest to back”.

3) Line 98: What are the 'two sets of possible scenarios'? What do they indicate in the context of the dataset? How do the two sets differ? It would be useful to add a description to understand the range of scenarios covered in the dataset. This question is related to (1) above.

The second set was created later to increase the number of scenarios. The two sets differ only in the specific scenarios they include; there is no other difference. Both were randomly generated using the same generative process to cover a wide variety of situations. This point has been clarified in the text (lines 99-101).

4) Please define the term 'interaction area' in Section 3.

Thank you for your comment. The term has been defined in the text (lines 156-158). 

5) It would be useful to add a note on how the diversity of scenarios was ensured.

Thank you for your comment. An explanation has been added in lines 132-135.

6) The parallel black lines indicate interactions. They seem to be arbitrarily defined. Please add a note on how they were sampled/generated.

Thank you for your comment. An explanation has been added in lines 142-146.

7) The analysis of the dataset, however, suggests that the notion of safety/safe space is subjective for human subjects.

Yes. In fact, we ask the person doing the labelling: what would be the degree of social acceptance of the robot in that scenario?

8) Additional details of human subjects, their background and relevant demographics could be added. This is related to (7) above.

Thank you for your comment. A brief description of the subjects has been added (lines 106-109).

Round 2

Reviewer 1 Report

The authors provided validation with inter-rater and intra-rater measures.

The paper is much more comprehensive.

Good job.
