Peer-Review Record

Ear Centering for Accurate Synthesis of Near-Field Head-Related Transfer Functions†

Appl. Sci. 2022, 12(16), 8290; https://doi.org/10.3390/app12168290
by Ayrton Urviola 1,*,‡, Shuichi Sakamoto 2,‡ and César D. Salvador 1,‡
Submission received: 25 June 2022 / Revised: 13 August 2022 / Accepted: 16 August 2022 / Published: 19 August 2022
(This article belongs to the Special Issue Immersive 3D Audio: From Architecture to Automotive)

Round 1

Reviewer 1 Report

This paper presents a method for synthesising near-field HRTFs that uses spherical-wave (SW) rather than plane-wave (PW) assumptions. This is important because most far-field HRTFs assume plane-wave propagation, which is inappropriate at distances below 1 m. The authors compare HRTFs synthesised with the SW and PW approaches and show clear improvements with SW. This article appears to be an extension of the I3DA conference paper of a similar name. The topic is very relevant and of great interest to the wider research community, and the proposed method is logical and seems largely well executed. Some issues with the evaluation remain, and some relevant literature is missing. Therefore, this reviewer's recommendation is to revise the article.

Specific comments:
  • One highly relevant piece of literature that is missing is Armstrong's 2019 thesis https://etheses.whiterose.ac.uk/27166/. It explores near-field HRTF measurement with physical ear-centering and far-field synthesis (the synthesis direction is the opposite of your work here, but the principles are highly relevant), and it also compares physical ear-centering with time alignment (as in Zaunschirm 2018) of standard head-centered HRTFs.
  • Fig. 4: the line colours are not explained in the caption, and the in-text description does not state which colours correspond to which vertices of the icosahedron. Please use a legend.
  • Figs. 5 and 6 are hard to read because they contain so many plots. An average error value would be beneficial here, or a rethink of how the data are displayed.
  • Placing the subcaptions only in the main figure captions, rather than at each subfigure, is detrimental to readability. It would be much easier to read if each subfigure label ((a), (b), etc.) carried its own subcaption, in addition to the main figure caption.
  • Using HRTF simulations that do not include a torso is surprising, since the torso has a significant scattering effect at lower elevations. Please give a reason why no torso was used, and also address this in the discussion: what effects could its omission have had on the results? I assume there would have been more complex reflection patterns at frequencies up to 1 kHz, which would certainly be relevant, as they increase the overall complexity of the HRTFs.
  • Why was a subdivided icosahedron used as the quadrature? It does not give a very even distribution. T-designs provide equal-weight quadratures that preserve the orthonormality of spherical harmonics up to a given order and would be a more appropriate choice. This choice should be justified.
  • If you keep the subdivided icosahedron, is any solid-angle weighting implemented? Please comment.
  • Perhaps N = 2, 5, and 14 alone would be sufficient. N = 11 seems unnecessary, as 11th order is rarely used in practical applications. Dropping it may help improve the readability of the figures.
  • N = 14 is not very high; much of the literature suggests that N > 30 is necessary for transparent SH translation up to ~20 kHz. Can you comment on why 14 was used? This might explain why the error is still rather high in the N = 14 plots.
  • Line 195: was r_head actually 16 cm, or are you referring to the diameter?
  • The left and right columns of Figure 7 would be easier to compare if they shared the same colour axis (e.g., -12 to 0 for both).
  • Why was -15 dB chosen as the threshold line, other than its use in other research? The literature generally states that a difference of ~3 dB is perceivable.
  • No statistical analysis has been presented, and no perceptual evaluation. How much better would the SW method be, perceptually, than the PW?
  • MagLS (Schörkhuber 2019) is a method similar to standard PW time alignment, but it performs better than the Zaunschirm 2018 method. How does it compare with the presented SW method?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper presents a method to improve the derivation of near-field HRTF from sparse far-field HRTF measurements, using ear centering by spherical-wave translation.

The paper is interesting and the topic is relevant to the field of spatial audio rendering.

However, I have some concerns about the novelty and the added value of this paper compared to the previously published paper by the authors (ref [23]).
Although the theory is more general and the authors explain in the Introduction the additions in the paper compared to [23], in my opinion, the papers still look very similar to each other, especially in the evaluation section. Specific points:

- Section 4 (Evaluation) is almost a complete copy-paste from [23], with exactly the same errors and figures (the only difference is the colormap). I think that having two full pages of figures identical to previously published ones is a waste of journal space. I would suggest revising this section completely. This could include a short summary of the results presented in [23] (perhaps a plot of the average errors across distances for a single grid, referring to [23] for more details), followed by more relevant evaluations, such as other error metrics (e.g., ITD/ILD errors or errors from other perceptual models) and even a listening test.

- Section 3 is a nice addition to [23], but it currently reads as if it had been added as an afterthought: the evaluation design is not presented at all. There is no description of the error presented in Fig. 5, the HRTF used, the reference, or f_max.
All of this information is only revealed in Sec. 4, so it might make more sense to place Section 3 after Sec. 4 or as a subsection of it.

- The Conclusion section is also almost a complete copy-paste from [23]. In my opinion, it does not make sense for the same suggested extensions to appear in two consecutive papers without any of them being addressed in the new paper.

 

Minor comments:

- Introduction: the additions to the introduction compared to [23] are great, and the review of previous literature is really good.
That said, I think the last paragraph of the section should be revised. Listing the gaps in [23] and then the additions in the current paper is redundant. I would suggest keeping only the second list (with the relevant added context); simply stating the novelty of the current paper positively would be clear enough.

- line 64 - "HRTFS" -> HRTF or HRTFs

- Figure 1 is not helpful. It is a standard spherical coordinate system and I think it can be removed.

- Line 161: the Fig. 5 caption says that the center panel shows SW operators, while the text says PW.

- Line 167 - double "the"

- Do you have any explanation of how the relatively small differences in Fig. 4 (±0.5 dB is not much, right?) translate into such large errors in Fig. 5?

- Line 177: the sentence "... in the framework of a spherical sampling of the theory presented in..." is not clear to me. Could you please rephrase it?

- Line 191: c has already been defined.

- Lines 248-251: what do you mean by "overall improvement of 6dB" and "overall enhancement of 3dB"? How were these numbers calculated?

 

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The manuscript is much improved now; readability is far better, and the figures are good too.

The concluding remarks on future work should state which sorts of auditory models could be used in the future. Could the performance of the method presented in this paper be improved in any way? If so, please add suggestions for this to the future work as well.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
