Article
Peer-Review Record

Rhythmic-Synchronization-Based Interaction: Effect of Interfering Auditory Stimuli, Age and Gender on Users’ Performances

Appl. Sci. 2022, 12(6), 3053; https://doi.org/10.3390/app12063053
by Alessio Bellino
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 21 January 2022 / Revised: 8 February 2022 / Accepted: 11 February 2022 / Published: 17 March 2022
(This article belongs to the Collection Human Factors in the Digital Society)

Round 1

Reviewer 1 Report

1. Comments to Authors

1.1. Overview and general recommendation

1.1.1.

The authors investigate rhythmic-synchronization-based interaction in order to evaluate the impact of interfering auditory stimuli while considering relevant variables (e.g., age). The paper reports a user study with 103 participants (3 of whom were, however, not considered for the study) and shows that interfering stimuli do not play a significant role in synchronization interactions, while age and gender directly impact performance.

 

The study is well written and interesting, though I have a few points to be addressed before publication. Hence, I suggest a minor revision of the paper.

1.2. General comments

1.2.1.

In Section 2, the authors mention “However, no study has studied the effect of age, gender and interfering auditory stimuli on the ability to match a visually represented rhythm. Only by studying these aspects we can understand the extent to which similar techniques are usable (1) in contexts where interfering auditory stimuli may be present (e.g., in a living room with the radio on) and (2) by a heterogeneous population.”. I believe the last sentence could be expanded and better justified, to clearly illustrate the potential applications of this research work.

1.2.2.

In Subsection 4.1, the authors report “(M: 34, SD:14)”. However, I suggest reporting the mean as “ME”, otherwise there may be confusion with “(51M / 52F)” that has been previously reported.

1.2.3.

In Subsection 4.1., the authors report “Therefore, participants were required to have a personal computer able to emit sounds and/or wear headphones to conduct the experiment remotely.”: is there any specific environmental condition in which the participants performed the experiments? These conditions may play a role in the results of the analysis.

1.2.4.

In Subsection 4.6., the authors claim to have used non-parametric tests to detect significant differences: to which test are they referring? Figure 5 or others? I was looking for analysis concerning, for instance, the median age of the stratified subgroups.

Author Response

In Section 2, the authors mention “However, no study has studied the effect of age, gender and interfering auditory stimuli on the ability to match a visually represented rhythm. Only by studying these aspects we can understand the extent to which similar techniques are usable (1) in contexts where interfering auditory stimuli may be present (e.g., in a living room with the radio on) and (2) by a heterogeneous population.”. I believe the last sentence could be expanded and better justified, to clearly illustrate the potential applications of this research work.

Response: Thanks for the suggestion.

Changes: We reworked Section 2 (see lines 56-52, 63-76, and 88-91).

 

 

In Subsection 4.1, the authors report “(M: 34, SD:14)”. However, I suggest reporting the mean as “ME”, otherwise there may be confusion with “(51M / 52F)” that has been previously reported.

Response: Since the APA manual suggests using M for the mean, we instead changed the abbreviations used for males and females to avoid confusion.

Changes: Please see the text highlighted in yellow in Section 4.1 (line 172).

 

In Subsection 4.1., the authors report “Therefore, participants were required to have a personal computer able to emit sounds and/or wear headphones to conduct the experiment remotely.”: is there any specific environmental condition in which the participants performed the experiments? These conditions may play a role in the results of the analysis.

Response: The experiments were generally done at home (we were in strict lockdown when the experiment was conducted), and there were only a few specific environmental conditions to be met. In the introductory video (https://www.youtube.com/watch?v=RLozq5ROPUw) it was explicitly asked to use headphones (if possible) to reduce the effect of external noise, to make sure there were no distractions during the experiment, and to put the phone in airplane mode. The same recommendations were shown later in text form before beginning the experiment.

Changes: Please see the text highlighted in yellow in Section 4.1 (lines 175-177). We also highlighted in yellow some text in Section 4.4 (lines 239-245), where more details about the recommendations are given. Moreover, we added a limitation (suggested by another reviewer) stating that we did not have full control over the experimental conditions because the experiment was conducted online (lines 448-449 and 463-479).

 

 

In Subsection 4.6., the authors claim to have used non-parametric tests to detect significant differences: to which test are they referring? Figure 5 or others? I was looking for analysis concerning, for instance, the median age of the stratified subgroups.

Response: We used non-parametric tests for the between-subjects and within-subjects comparisons reported in the results (Section 5). In particular, the Wilcoxon signed-rank test was used to compare the mute and sound conditions (Figures 5 and 6), the Wilcoxon rank-sum test was used to compare males and females in the different age groups (Figures 8, 10, 12 and 14), and Spearman's rank-order correlation was used to check the correlation between (1) age and time of activation and (2) age and error rate (Figures 7, 9, 11 and 13). All three are non-parametric.
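To make the three procedures concrete, the following is a minimal sketch of the rank-based statistics behind them. It is an illustration only, not the analysis code used for the article: the function names and the sample numbers are hypothetical, and only the test statistics are computed (a statistics package would normally be used to obtain the p-values).

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def wilcoxon_signed_rank_w(x, y):
    """Paired samples (e.g. mute vs. sound): smaller signed-rank sum."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)

def rank_sum_u(a, b):
    """Independent samples (e.g. males vs. females): U statistic of sample a."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2

def spearman_rho(x, y):
    """Rank-order correlation (e.g. age vs. activation time)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    var_x = sum((p - mx) ** 2 for p in rx)
    var_y = sum((q - my) ** 2 for q in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical activation times in msec for five participants.
mute_ms = [1900, 2100, 2400, 2000, 2600]
sound_ms = [2000, 2300, 2300, 2400, 2900]
print("signed-rank W:", wilcoxon_signed_rank_w(mute_ms, sound_ms))
print("rank-sum U:", rank_sum_u([1, 3, 5], [2, 4, 6, 8]))
print("Spearman rho:", spearman_rho([20, 30, 40, 50], [1, 2, 3, 4]))
```

All three statistics depend only on the ordering of the data, which is why they are suitable when medians rather than means are reported.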

Regarding the age of the stratified subgroups (displayed in Table 1), we used the mean instead of the median to represent central tendency because ages are commonly presented using means and standard deviations. Also, in Section 4.1 we used the mean (and standard deviation) to describe participants' ages, as you also noted when pointing out the possible confusion between M(ales) and M(ean). Moreover, when we mentioned that we would use non-parametric tests and medians, we were referring to the results. As a matter of fact, the age groups are not themselves results, but only a stratification on which the analysis was conducted. What matters for the purposes of the analysis is that the different age groups are similar, to avoid a confounding effect. When we talk about medians in presenting the results, we are referring to the number of errors and the activation times, which are presented using medians throughout the article.

Changes: We reworked Subsection 4.6 (see the text highlighted in lines 283-299) to better clarify the use of non-parametric tests. We added the standard deviation (SD) and the absolute difference between means to Table 1 for the sake of completeness.

Reviewer 2 Report

The manuscript's topic related to rhythmic-synchronization-based interaction is fascinating and may be very useful in many fields of life and science. I have several comments:

  • Please include more information about the studied group - strict inclusion/exclusion criteria for the study.
  • You have made your division into age groups. Please check whether age differences between males and females in a given group are statistically significant because the incorrectly performed division will affect the results obtained. Please include this information in the text.
  • I understand that the study was done at the time of the pandemic, but please, in the future, think about recording the course of the study (for example, by registering the image from the computer's camera). It will allow a full view of the situation because it is impossible to perform research in a fully controlled environment in such conditions. This is one of the limitations of the presented research protocol. It is suggested that this be added as a point to the limitations.
  • It was a good idea to add YouTube links to the article. They allow the reader to gain a more thorough understanding of the research protocol.
  • There is a difference in sound perception between headphones and loudspeakers. It is suggested to check the data for this. Please divide the subjects into those who listened directly to headphones and those who received acoustic stimuli from loudspeakers. If the size of the group allows, conduct a similar analysis to the one presented above, distinguishing by gender and age group. If, on the other hand, there is a considerable disproportion in the number of headphones/speakers groups, please try to exclude the smaller group from the analysis, as it may influence the results obtained.

Formal:

  • Please mark statistically significant features on all figures (some are and some are not)
  • In line 401, there is no citation. Please add the appropriate items.

Author Response

Please include more information about the studied group - strict inclusion/exclusion criteria for the study.

Response: We added this information.

Changes: See highlighted text in subsection 4.1 (lines 169-171).

 

 

You have made your division into age groups. Please check whether age differences between males and females in a given group are statistically significant because the incorrectly performed division will affect the results obtained. Please include this information in the text.

Response: We argue that it is not appropriate to test whether the difference in age within each group is statistically significant. In fact, we should assume that these differences are statistically significant and consider only their practical importance (as referred to in [5]) or substantial significance (as referred to in [1]).

Let us elaborate: even when a difference is significant, this only means that there is a significant difference between the means, and this difference (when it is the difference between two intervention groups) is usually called the absolute effect size, which is what actually matters in a study (see [1] for a discussion of significance and effect size). On the other hand, when a difference is not significant, this could be due to a low number of participants. In fact, it is easy to obtain a significant difference with many participants, and the fact that a difference is not significant with few participants does not mean that it will remain non-significant as the number of participants increases (see [2], which is discussed in [1]). For these reasons, we believe that it is not appropriate to test for significant differences; we only need to evaluate the difference between the mean ages (absolute effect size), because this is independent of the sample size (as mentioned in [1], “Unlike significance tests, effect size is independent of sample size”). Also, checking only for significant differences could create paradoxical situations (see the discussion in [1] on aspirin to prevent myocardial infarction).

Returning to our study, there are no significant differences in groups 26-40 and 41+, but there is a significant difference in group 18-25 (p=0.047), where males are significantly older than females by 1.3 years (see Table 1). What is that supposed to mean? Does it mean that, since there is a significant difference, the groups cannot be compared? Clearly this cannot be the conclusion, since the difference, although significant, is only 1.3 years. That is, the absolute effect size makes the difference irrelevant in our context. It would be unrealistic to think that such a limited difference could alter the results, as a difference should always be related to its context to determine how relevant it really is. In fact, we should ask ourselves: could a mean difference of 1.3 years between males and females in the 18–25-year-old group alter the results (confounding effect) with regard to activation times and errors? In this case, the answer is clearly no. Previous studies, for example, have shown that, “beginning at about age 20, reaction times increased at a rate of approximately 0.5msec/yr for simple reaction time and 1.6msec/yr for disjunctive reaction time” (see [3]). For these reasons, a difference of 1.3 years has no practical importance in our context (despite being statistically significant). It is therefore important to distinguish between statistical significance and practical importance (see [4]). Consider also that females, despite being 1.3 years younger on average and therefore, if anything, advantaged by age, make more errors than males and have longer activation times. This further confirms that this age difference is not able to alter the results (there is no confounding effect).
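As a rough illustration of the order of magnitude involved, the rates quoted from [3] can be applied to the 1.3-year gap. This is a back-of-the-envelope sketch only: the function name is hypothetical, and extrapolating the yearly rates linearly over the gap is an assumption made here for illustration.

```python
# Rates quoted in the response from reference [3] (msec of slowing per year of age).
SIMPLE_RT_RATE_MS_PER_YEAR = 0.5
DISJUNCTIVE_RT_RATE_MS_PER_YEAR = 1.6

def expected_slowing_ms(age_gap_years, rate_ms_per_year):
    """Linear extrapolation (an assumption): age gap times yearly slowing rate."""
    return age_gap_years * rate_ms_per_year

# Mean age difference between males and females in group 18-25 (Table 1).
age_gap_years = 1.3

simple_ms = expected_slowing_ms(age_gap_years, SIMPLE_RT_RATE_MS_PER_YEAR)
disjunctive_ms = expected_slowing_ms(age_gap_years, DISJUNCTIVE_RT_RATE_MS_PER_YEAR)

print(f"simple reaction time: about {simple_ms:.2f} msec")
print(f"disjunctive reaction time: about {disjunctive_ms:.2f} msec")
```

Under these assumptions the age gap corresponds to roughly 0.65 msec of simple reaction time and 2.08 msec of disjunctive reaction time.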

Let us elaborate further with a paradoxical example. Imagine that we are comparing reaction times between two groups: the first is composed of 50 males (38 users aged 22 and 12 users aged 23), while the second is composed of 50 males (12 users aged 22 and 38 users aged 23). The difference between the ages is extremely statistically significant (p<0.0001), but the difference between the means (absolute effect size) is only 0.52 years (22.76-22.24). Does this mean that the groups cannot be compared? One should not reach that conclusion, and intuitively it is quite easy to see why. Although it is inappropriate, and objectionable, to construct the groups in this way (but that is another matter), around 22 years of age psychomotor development is already quite advanced, so there is no reason to think that there is a significant decay in psychomotor performance from age 22.24 to 22.76. That is, the age difference is extremely statistically significant, but the psychomotor decay over 0.52 years (22.76-22.24) has no practical importance. In fact, it is less than 1 msec if we consider the average reaction time decay over time [3], or totally irrelevant if we consider [4], where it is stated that psychomotor decay begins at 24 years old (not earlier).

A very small difference, however, may be important in other contexts. For example, suppose (absurdly; we do not know whether such a study has ever been done, or whether it could be done) that we compare the reaction times of 50 2-year-old males with those of 50 3-year-old males. The results could clearly be altered by the difference in age (confounding effect). In fact, from age 2 to age 3, it is reasonable to expect children's psychomotor skills to grow substantially.
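The paradoxical example can be checked numerically. The sketch below is an illustration under stated assumptions: the response does not say which test produced p<0.0001, so a Welch-type statistic with a normal approximation to the p-value is assumed here, and the variable names are hypothetical. The 1.6 msec/yr figure is the larger of the two rates quoted from [3].

```python
from statistics import NormalDist, mean, variance

# The two groups of 50 males described in the example.
group_a = [22] * 38 + [23] * 12
group_b = [22] * 12 + [23] * 38

# Absolute difference between the mean ages (in years).
mean_gap_years = mean(group_b) - mean(group_a)

# Welch-type statistic: mean gap divided by its standard error.
standard_error = (variance(group_a) / len(group_a)
                  + variance(group_b) / len(group_b)) ** 0.5
test_statistic = mean_gap_years / standard_error

# Two-sided p-value using a normal approximation (an assumption made here).
p_approx = 2 * NormalDist().cdf(-abs(test_statistic))

# Upper bound on the implied slowing, using the larger rate quoted from [3].
max_slowing_ms = mean_gap_years * 1.6

print(f"mean gap: {mean_gap_years:.2f} years")
print(f"test statistic: {test_statistic:.2f}")
print(f"p < 0.0001: {p_approx < 0.0001}")
print(f"implied slowing below 1 msec: {max_slowing_ms < 1.0}")
```

The tiny p-value and the sub-millisecond implied slowing come from the same data, which is the point of the example: statistical significance and practical importance are separate questions.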

For these reasons, we do not believe that it is appropriate to test whether the difference in ages in each group is statistically significant (we should assume that they are!), but it is only necessary to consider the difference between the means (absolute effect size, as called in [1]) and discuss whether this difference is acceptable given the context. Note that in [1], the difference between the means is called the “absolute effect size” since it is assumed that it is the difference between two different intervention groups.

In our case, since the difference in age is not the consequence of two different interventions, we simply call this difference the “absolute difference between means” (see the description of Table 1). Note, moreover, that throughout the article, in addition to presenting the p-value, we are consistent in also presenting the effect size. In fact, in all cases in which we present a p-value, we also show the comparison between the medians (whose difference represents, precisely, the effect size); see, e.g., lines 332, 334, etc.

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/

[2] https://www.polyu.edu.hk/mm/effectsizefaqs/thresholds_for_interpreting_effect_sizes2.html

[3] https://pubmed.ncbi.nlm.nih.gov/8014399/

[4] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0094215

[5] http://www.stat.columbia.edu/~gelman/research/published/signif4.pdf

 

Changes: We added the standard deviations and the difference between means to Table 1 for the sake of completeness. In addition, we discussed how the mean difference between ages is irrelevant in our context (lines 279-282). Finally, we revised the related work to address the observations made here (lines 154-160).

 

 

I understand that the study was done at the time of the pandemic, but please, in the future, think about recording the course of the study (for example, by registering the image from the computer's camera). It will allow a full view of the situation because it is impossible to perform research in a fully controlled environment in such conditions. This is one of the limitations of the presented research protocol. It is suggested that this be added as a point to the limitations.

Response: We will certainly consider this limitation and really thank the reviewer for expressing the concern since it allowed us to think more deeply about the lack of full control of environmental conditions. However, we believe that this point is related to the next one, so we address it later.

Changes: See below.

 

 

There is a difference in sound perception between headphones and loudspeakers. It is suggested to check the data for this. Please divide the subjects into those who listened directly to headphones and those who received acoustic stimuli from loudspeakers. If the size of the group allows, conduct a similar analysis to the one presented above, distinguishing by gender and age group. If, on the other hand, there is a considerable disproportion in the number of headphones/speakers groups, please try to exclude the smaller group from the analysis, as it may influence the results obtained.

Response: In the introductory video (https://www.youtube.com/watch?v=RLozq5ROPUw), participants were advised to use headphones (if possible) to reduce the effect of external noise, so we assume that most participants used them. Nevertheless, we did not ask participants to state whether they actually used headphones or loudspeakers, so we have no data on how many used each. At any rate, we argue that this is not a determining variable because perception, from a rhythmic point of view (which is precisely what we were interested in), remains virtually identical for any type of headphones or loudspeakers (from the worst-quality piezoelectric speaker to the best-quality closed-back headphones). In fact, rhythm is an exclusively temporal factor and is therefore independent of the type of headphones or loudspeakers used: the rhythm of a piece of music will always be the same regardless of their quality and type. Since our experiment dealt with the rhythmic interference between the auditory stimulus and the visual stimulus, there is no reason to think that the quality of headphones or speakers can somehow alter the rhythmic qualities of the interfering drum loop. Clearly, perception generally differs between loudspeakers and headphones (e.g., in terms of immersiveness and the feeling of closeness of the sound), but we cannot speak of an exact dichotomy either. In fact, the headphones and loudspeakers on the market themselves differ widely in quality and fidelity, which are parameters that greatly influence the immersiveness and closeness of the sound. In addition, variability is high because there are open headphones (which let external noise through) and closed-back headphones (which limit external noise), and the frequency response (which determines the fidelity) can change a lot between one pair of headphones and another, or between one loudspeaker and another.

For these reasons, we believe that it is not appropriate to distinguish between loudspeakers and headphones, as they are not comparable given the high variability within each category (there is continuity between headphones and loudspeakers rather than an exact dichotomy), and the differences should also not be theoretically relevant since, we reiterate, rhythmic perception is virtually identical between loudspeakers and headphones. However, this may be a problem related to the lack of control over the experimental setting (due to the online experiment). In fact, since the experiment was done online, it is likely that participants had headphones (or loudspeakers) of different quality, so we cannot state that participants did the experiment under exactly the same conditions (as they would have in an in-person experiment). However, we believe that this variability in conditions (differences in the quality and fidelity of sound output) is random and thus manifested uniformly across age groups and sexes. In other words, there is no reason to think that, for example, all males had very good quality headphones (or loudspeakers) and all females had very bad quality headphones (or loudspeakers). In fact, considering it as a random variable, it is safe to assume that the quality of the headphones (or loudspeakers) is uniformly distributed across all groups analyzed in this study (males, females, young, and old). Thus, while we think it is important to mention this limitation, we do not believe that it significantly affected the results. Such differences in conditions could certainly create some noise in the data, but since the noise occurs uniformly across all groups, it will ultimately not affect the results. In addition, having done the experiment online may also have its upsides (but we do not mention this in the paper and leave it to the reader's interpretation). For example, the fact that participants did the experiment where they live makes the experiment somewhat more realistic (because it is done under a variability of conditions that is closer to real usage) than an experiment done in a laboratory (which, by its nature, is much more artificial).
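The claim that noise spread uniformly across groups does not shift the comparison can be illustrated with a small simulation. This is a sketch with entirely hypothetical numbers, not data from the study: equipment variability is modeled, as an assumption, by zero-mean Gaussian noise that is independent of group membership.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 20000                # hypothetical number of simulated trials per group
true_gap_ms = 100.0      # hypothetical true difference between the groups

# Hypothetical activation times (msec) for two groups.
group_a = [rng.gauss(1500.0, 200.0) for _ in range(n)]
group_b = [rng.gauss(1500.0 + true_gap_ms, 200.0) for _ in range(n)]

def add_equipment_noise(times_ms, noise_sd_ms):
    """Add zero-mean noise that does not depend on the group (assumption)."""
    return [t + rng.gauss(0.0, noise_sd_ms) for t in times_ms]

noisy_a = add_equipment_noise(group_a, 150.0)
noisy_b = add_equipment_noise(group_b, 150.0)

clean_gap = mean(group_b) - mean(group_a)
noisy_gap = mean(noisy_b) - mean(noisy_a)

print(f"gap without equipment noise: {clean_gap:.1f} msec")
print(f"gap with equipment noise:    {noisy_gap:.1f} msec")
print(f"spread without / with noise: {stdev(group_a):.0f} / {stdev(noisy_a):.0f} msec")
```

In this sketch the estimated gap between the groups stays close to the true gap, while the spread within each group grows. Noise of this kind makes the data less precise without favoring one group, which is the sense in which it is expected not to alter the comparison.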

Changes: We added a limitation in Section 7 (lines 463-479) where we discuss the impossibility of having total control over the experimental conditions (due to the online experiment), mentioning some possible critical issues (including the difference between headphones and speakers).

 

 

Please mark statistically significant features on all figures (some are and some are not)

Changes: We marked figures appropriately. Thanks.

 

In line 401, there is no citation. Please add the appropriate items.

Changes: Thanks. We added citations (they are the same discussed in the previous works), see line 423.

Round 2

Reviewer 2 Report

Dear authors,

Thank you very much for your comprehensive answers. I fully agree with you, and any changes made are accepted. Good luck!
