Article
Peer-Review Record

An Algorithm for Generating Virtual Sources in Dynamic Virtual Auditory Display Based on Tensor Decomposition of Head-Related Impulse Responses

Appl. Sci. 2022, 12(15), 7715; https://doi.org/10.3390/app12157715
by Tong Zhao, Bosun Xie * and Jun Zhu
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 16 June 2022 / Revised: 24 July 2022 / Accepted: 26 July 2022 / Published: 31 July 2022
(This article belongs to the Special Issue Techniques and Applications of Augmented Reality Audio)

Round 1

Reviewer 1 Report

This paper presents a new algorithm for generating virtual sources based on tensor decomposition. The method is a useful contribution to creating immersive sound environments in virtual spaces. I recommend the following revisions.

 

1. Please provide more concrete data on the computed HRTFs, HRIRs, and other quantities. Without such data, it is quite difficult to see what was actually done from the rather demanding theoretical part of the paper alone. In particular, in the latter part of the paper, please add figures with the detailed numerical results so that their validity can be assessed. These concrete results would also help the reader follow the theoretical part.

 

2. It is difficult to understand what advantage the proposed method offers over the conventional one. In particular, Figures 3 and 4 show that the proposed method achieves almost the same accuracy as the conventional method. With only these results, I think readers would have little reason to adopt the new method.

 

3. Figure 2 appears to show the estimated numerical error, but the error values are difficult to read from this figure. Please add slice data at some representative azimuths.

 

 

4. The localization performance is quite difficult to grasp from Figure 3. If you keep this presentation, please add a supplementary figure and explanation to aid its interpretation.

 

5. In Figure 4, you state that there is no significant difference between the conventional and present results; however, the slopes of the lines for target distances beyond 0.6 m appear slightly different between the two.

For example, the conventional and present results in the 180-degree condition seem to differ significantly. It would be worthwhile to provide more results and discussion for this figure.

 

6. You state that the percentages of front–back and up–down confusions in directional localization were less than 5% and 1%, respectively, and attribute this to "the incorporation of dynamic localization cues in dynamic VAD, which alleviate the dependence of individualized spectral cues in front–back and vertical localization". I find this explanation insufficient, and the reader may not be able to understand the reason. Please provide a more detailed and concrete explanation for this improvement.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

This paper presents an efficient way of creating virtual auditory scenes where sound sources can change dynamically. The main idea is to use a tensor decomposition of a set of HRTFs. 

As a result, a new model (Figure 1) is proposed that is computationally more efficient and can likely avoid the artifacts caused by updating the filter coefficients.

Overall, the proposed algorithm seems reasonable, and the test results show that it can nicely approximate the direct filtering using HRIR coefficients.
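For context, a minimal sketch of this general class of algorithm is given below. It is not the authors' exact method: a truncated SVD over a matrix of HRIRs stands in for the paper's tensor decomposition, the data are random placeholders, and the names (hrir_bank, basis, weights, render_block) are hypothetical. The structural point it illustrates is that the filter bank stays fixed and only the direction-dependent weights are updated for a moving source.

```python
import numpy as np

# Placeholder HRIR set for one ear: (directions, taps). A truncated SVD is
# used here as a simple stand-in for the paper's tensor decomposition; the
# structural point is the same either way.
rng = np.random.default_rng(0)
n_dirs, taps, n_modes = 72, 128, 8
hrir_bank = rng.standard_normal((n_dirs, taps))

U, s, Vt = np.linalg.svd(hrir_bank, full_matrices=False)
basis = Vt[:n_modes]                       # direction-independent basis filters (fixed)
weights = U[:, :n_modes] * s[:n_modes]     # direction-dependent weights, one row per direction

def render_block(block, direction_index):
    """Filter a signal block through the fixed basis bank, then combine the
    outputs with the weights of the current source direction."""
    outputs = [np.convolve(block, b) for b in basis]   # the bank is never swapped at run time
    return sum(w * y for w, y in zip(weights[direction_index], outputs))

# For a moving source, only the scalar weights change from block to block,
# which avoids abruptly replacing a full set of HRIR filter coefficients.
x = rng.standard_normal(256)
y0 = render_block(x, direction_index=0)
y1 = render_block(x, direction_index=1)
```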

But adding more test results would help confirm the benefits of the proposed algorithm:

(1) It is claimed that the proposed algorithm can control the distance and direction of 'moving' virtual sources separately and avoid the audible artifacts of the conventional (HRTF-based direct filtering) method.

Although the statement seems correct, it should be confirmed by showing how the values of the coefficients (c_n' in Figure 1) change in the case of moving targets.

(2) In Section 5.2, it is stated that M = 13 and N = 8 account for 99.0% and 99.1% of the energy, respectively. It would be better to provide plots showing the energy variance as a function of the number of eigenmodes (a sketch of such a computation is given after this list).

(3) Figure 2 shows the reconstruction errors averaged over the distances. However, since the variation of the errors is not typical, a more detailed quantitative summary of the results in tables would be necessary.

(4) The subjective test results in Figures 3 and 4 show that the proposed dynamic VAD has similar subjective performance, i.e., no performance loss compared to the conventional method (direct convolution-based rendering). However, these results seem more suited to proving that the HRTF prediction method of reference 24 is correct than to demonstrating the merit of the proposed method. More discussion of this point is needed.
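Regarding point (2), a minimal sketch of how such an energy curve could be computed is shown below. It uses the singular values of a flattened HRIR matrix as a simple proxy for the paper's eigenmode analysis; the data are random placeholders, and the mode counts (4, 8, 13) are chosen only to echo the values quoted in the paper, not to reproduce them.

```python
import numpy as np

# Placeholder computation of the requested curve: cumulative energy captured
# by the leading eigenmodes of an HRIR data matrix (random stand-in data).
rng = np.random.default_rng(1)
hrir_matrix = rng.standard_normal((72, 128))   # (directions, taps)

s = np.linalg.svd(hrir_matrix, compute_uv=False)
cumulative_energy = np.cumsum(s**2) / np.sum(s**2)

for m in (4, 8, 13):
    print(f"{m:2d} modes capture {cumulative_energy[m - 1]:.1%} of the total energy")
```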

 

Further minor comments:

- page 4, Eqs. 2 and 3: tau needs to be defined

- page 4, Eq. 3 and page 6, Eq. 12: the symbols in these equations need to be explained (probably convolution), as does the meaning of the subscript n in Eq. 12.

- page 10, Eqs. (16) and (17): r_I(i) appears to have different meanings in the two equations. If so, use different notations.

- Section 6.2: Better to change the wording: 'present work' -> 'proposed method'. Also, what is the 'conventional method'? Direct convolution using HRIRs?

- page 13, line 399: A 33-order binaural ... -> A 33rd-order binaural...

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The paper has been sufficiently revised following my suggestions. It can be accepted.
