AMBIQUAL: Towards a Quality Metric for Headphone Rendered Compressed Ambisonic Spatial Audio
Abstract
Featured Application
1. Introduction
2. Background
2.1. Ambisonics
2.2. Opus 1.2 Codec with Channel Mapping
2.3. Subjective and Objective Methods of Assessing Audio Quality
3. Subjective Listening Tests Using the MUSHRA Methodology
3.1. Method
3.2. Testing Platform and Testing Environment
3.3. Testing Procedure
- distortion of the audio signal as compared to the reference
- undesired sounds that add artefacts to the audio clips under test
- how accurately test audio sources are positioned as compared to the reference
- how well test audio sources track movements of the reference
3.4. Experiment 1: Single Point Audio Sources
3.4.1. Content
3.4.2. Localization
3.4.3. Conditions
3.4.4. Results
3.5. Experiment 2: Multiple Point Audio Sources
3.5.1. Content
3.5.2. Localization
3.5.3. Conditions
3.5.4. Results
4. An Objective Model for Coded Spatial Audio QoE Prediction
4.1. ViSQOLAudio
4.2. AMBIQUAL Design Considerations
4.3. Deriving Listening Quality from B-Format Ambisonic Audio
4.4. Deriving Localization Accuracy from B-Format Ambisonic Audio
4.5. Compensating for Empty or Non-Existing Channels
4.6. AMBIQUAL Results
4.6.1. Listening Quality
4.6.2. Localization Accuracy
5. Validation Experiments
5.1. Full Sphere Localization Accuracy Prediction
5.2. Multi-Point Sound Sources
6. Discussion and Ongoing Work
Author Contributions
Acknowledgments
Conflicts of Interest
Label | Music Type | Source |
---|---|---|
vega | Vocals (Suzanne Vega) | CD |
castanets | Castanets | EBU |
glock | Glockenspiel | EBU |
vegaRev | Vocals (Suzanne Vega) w. Reverb Effect | processed CD |
castanetsRev | Castanets w. Reverb Effect | processed EBU |
pinkRev | Bursty Pink Noise w. Reverb Effect | synthetic |
Elevation (°) | ±35 | ±35 | ±45 | ±45 | ±90 | 0 | 0 | 0 | 0 | 0 |
Azimuth (°) | ±135 | ±45 | 0 | 180 | 0 | 0 | ±135 | 180 | ±45 | ±90 |
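The localization table above is compact because of its ± notation. The following is a minimal sketch of how it could be expanded into explicit directions. It assumes that a ± cell stands for both the positive and the negative angle, that ± cells in the same column combine independently, and that all values are in degrees. The names `TABLE`, `expand`, and `directions` are hypothetical helpers, not part of any tool used in the study.

```python
from itertools import product

# Columns of the localization table as (elevation, azimuth) cells, in degrees.
# Assumption: a leading "±" means both +value and -value are tested.
TABLE = [
    ("±35", "±135"), ("±35", "±45"), ("±45", "0"), ("±45", "180"),
    ("±90", "0"), ("0", "0"), ("0", "±135"), ("0", "180"),
    ("0", "±45"), ("0", "±90"),
]


def expand(cell):
    """Return the signed integer angles encoded by one table cell."""
    if cell.startswith("±"):
        value = int(cell[1:])
        return [value, -value]
    return [int(cell)]


def directions(table):
    """Expand every column into explicit (elevation, azimuth) pairs."""
    pairs = []
    for elevation, azimuth in table:
        # Assumption: signs of elevation and azimuth vary independently.
        pairs.extend(product(expand(elevation), expand(azimuth)))
    return pairs


if __name__ == "__main__":
    for el, az in directions(TABLE):
        print(f"elevation {el:+4d}, azimuth {az:+4d}")
```

If the ± signs were instead meant to be paired (for example, only front-left-up with back-right-down), `directions` would need a different combination rule; the table alone does not settle this.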
Type | Ambisonics Order | Bit Rate (kbps) | Bit Rate Per Channel (kbps) |
---|---|---|---|
Reference | 3 | 12,288 | 768 |
3OA 512 | 3 | 512 | 32 |
3OA 256 | 3 | 256 | 16 |
FOA 128 | 1 | 128 | 32 |
FOA 32 (anchor) | 1 | 32 | 8 |
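The per-channel column in the conditions table follows from the ambisonic order: a full-sphere ambisonic signal of order N carries (N + 1)² channels, so first order has 4 channels and third order has 16. The sketch below reproduces the column under the assumption that the total bit rate is split evenly across channels. The function names are hypothetical and chosen only for illustration.

```python
def ambisonic_channels(order: int) -> int:
    """Channel count of a full-sphere ambisonic signal of the given order."""
    return (order + 1) ** 2


def per_channel_kbps(total_kbps: float, order: int) -> float:
    """Total bit rate divided evenly over all ambisonic channels (assumed)."""
    return total_kbps / ambisonic_channels(order)


# (label, ambisonic order, total bit rate in kbps), copied from the table.
CONDITIONS = [
    ("Reference", 3, 12288),
    ("3OA 512", 3, 512),
    ("3OA 256", 3, 256),
    ("FOA 128", 1, 128),
    ("FOA 32 (anchor)", 1, 32),
]

if __name__ == "__main__":
    for label, order, total in CONDITIONS:
        channels = ambisonic_channels(order)
        rate = per_channel_kbps(total, order)
        print(f"{label:16s} order {order}: {channels:2d} channels, "
              f"{rate:g} kbps per channel")
```

This makes one property of the design visible: 3OA 512 and FOA 128 spend the same 32 kbps on each channel, so those two conditions differ in spatial resolution rather than in per-channel coding budget.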
Label | Music Type | Source |
---|---|---|
castanetsRev | Castanets w. Reverb Effect | processed EBU |
pinkRev | Bursty Pink Noise w. Reverb Effect | synthetic |
tub | Tubular bells | EBU |
xyl | Xylophone | EBU |
fem | Female voice | EBU |
babble | Babble noise | TCD-VoIP |
tr | Triangle | EBU |
piano | Piano | EBU |
Type | Ambisonics Order | Bit Rate (kbps) | Bit Rate Per Channel (kbps) |
---|---|---|---|
Reference | 3 | 12,288 | 768 |
3OA 512 | 3 | 512 | 32 |
3OA 384 | 3 | 384 | 24 |
3OA 256 | 3 | 256 | 16 |
FOA 128 | 1 | 128 | 32 |
FOA 96 | 1 | 96 | 24 |
FOA 64 | 1 | 64 | 16 |
FOA 32 (anchor) | 1 | 32 | 8 |
Experiment | Listening Quality: Pearson | Listening Quality: Spearman | Listening Quality: RMSE | Localization Accuracy: Pearson | Localization Accuracy: Spearman | Localization Accuracy: RMSE |
---|---|---|---|---|---|---|
1 | 0.899 | 0.816 | 55.14 | 0.922 | 0.919 | 59.27 |
2 | - | - | - | 0.864 | 0.883 | 63.13 |
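The three statistics in the results table are standard agreement measures between subjective scores and metric predictions. The sketch below shows one plain-Python way to compute them. It is not the authors' evaluation script, and the score lists are made-up placeholder values, not data from the experiments. Spearman correlation is computed as the Pearson correlation of average ranks, and RMSE is taken on the raw values with no mapping between scales, which is an assumption.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rmse(x, y):
    """Root-mean-square error between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


if __name__ == "__main__":
    # Placeholder values for illustration only (not from the paper).
    hypothetical_subjective = [92.0, 78.0, 61.0, 45.0, 20.0]
    hypothetical_predicted = [0.95, 0.80, 0.70, 0.40, 0.15]
    print("Pearson :", pearson(hypothetical_subjective, hypothetical_predicted))
    print("Spearman:", spearman(hypothetical_subjective, hypothetical_predicted))
    print("RMSE    :", rmse(hypothetical_subjective, hypothetical_predicted))
```

Because RMSE is sensitive to the ranges of both inputs, its magnitude is only interpretable once the scales of the subjective scores and the metric output are known; the two correlation coefficients do not share that dependence.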
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Narbutt, M.; Skoglund, J.; Allen, A.; Chinen, M.; Barry, D.; Hines, A. AMBIQUAL: Towards a Quality Metric for Headphone Rendered Compressed Ambisonic Spatial Audio. Appl. Sci. 2020, 10, 3188. https://doi.org/10.3390/app10093188