Figure 1.
Ambisonics spherical harmonics for orders up to three. FOA includes the top two lines of basis functions (4 channels) and 3OA includes all four lines (16 channels). The ACN channels 2, 6 and 12 contain only vertical components.
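The channel counts and the vertical-only channels named in this caption follow from the standard ACN indexing rule, ACN = n(n + 1) + m, for a spherical harmonic of order n and degree m. The sketch below is illustrative only; the function names are hypothetical and not taken from the paper. The zonal harmonics (m = 0) do not vary with azimuth, which is why channels 2, 6 and 12 carry only vertical information. Channel 0 is omnidirectional and is excluded.

```python
def acn_index(order: int, degree: int) -> int:
    """Return the ACN channel number for order n and degree m (|m| <= n)."""
    if abs(degree) > order:
        raise ValueError("degree must satisfy -order <= degree <= order")
    return order * (order + 1) + degree


def channel_count(max_order: int) -> int:
    """Number of Ambisonics channels for a given maximum order."""
    return (max_order + 1) ** 2


# Zonal harmonics (m = 0) for orders 1 to 3 depend on elevation only.
vertical_only = [acn_index(n, 0) for n in range(1, 4)]

print(channel_count(1), channel_count(3))  # FOA and 3OA channel counts
print(vertical_only)
```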
Figure 2.
WebMUSHRA Graphical User Interface.
Figure 3.
Single point audio source localization: (a) fixed position (azimuth 60°, elevation 60°), (b) audio source moving horizontally above the listener’s head, (c) audio source moving up in elevation on the left-hand side, then down on the right-hand side. Reproduced from [3].
Figure 4.
Localizations of multiple point audio sources: (a,b,d) one source with fixed localization and one with dynamic azimuth localization moving horizontally (i.e., rotating above or below the listener’s head), (c,e,f) one source with fixed localization and one with the audio source moving up in elevation on the left hand side, then down on the right hand side.
Figure 5.
NSIM scores derived from the Similarity Map between Reference and Test Gammatonegrams. Further examples and descriptions can be found in [33].
Figure 6.
Interaural Time Difference (ITD) and Interaural Level Difference (ILD) between left and right ears.
Figure 7.
Phaseograms, similarity map, and similarity scores for 100 Hz pure sine wave signals. The Reference signal is localized at azimuth = 60°, elevation = 60°, and the Test signal at azimuth = 50°, elevation = 60°; the resulting localization accuracy score is 0.944.
Figure 8.
Phaseograms, similarity map, and similarity scores for 100 Hz pure sine wave signals. The Reference signal is localized at azimuth = 60°, elevation = 60°, and the Test signal at azimuth = 10°, elevation = 60°; the resulting localization accuracy score is 0.918.
Figure 9.
Channel occupancy of B-format Ambisonics audio as a function of azimuth (AZ) and elevation (EL) of sound sources. Empty channels are represented here with the ‘x’ sign.
Figure 10.
Values of weighting factors that maximize correlation between AMBIQUAL results and subjective MUSHRA scores.
Figure 11.
Listening quality, subjective vs. AMBIQUAL results for single point audio sources: (a) aggregated subjective results and (b) AMBIQUAL quality predictions by encoding scheme (mean values with 95% confidence intervals shown); scatter of subjective scores vs. AMBIQUAL results (c) by encoding scheme and (d) by sample type.
Figure 12.
Localization accuracy, subjective vs. AMBIQUAL results: (a,b) aggregated subjective localization accuracy scores by encoding scheme (single point and multiple point audio sources, respectively), showing mean values with 95% confidence intervals; (c,d) aggregated AMBIQUAL results by encoding scheme (single point and multiple point sources, respectively); (e,f) scatter of subjective vs. AMBIQUAL results by encoding scheme (single point and multiple point sources, respectively); (g,h) scatter of subjective vs. AMBIQUAL results by sample type (single point and multiple point sources, respectively).
Figure 13.
Localization accuracy scores distributed on a sphere. The reference audio source was localized at azimuth = 60°, elevation = 60° (the red circle represents the reference audio source). Reproduced from [4].
Figure 14.
Localization accuracy as a function of azimuth and elevation with a fixed reference audio source, localized at an offset point of azimuth = 60°, elevation = 60°. The asymmetry in the results is caused by the source being closer to one ear than the other. Reproduced from [4].
Figure 15.
Localization accuracy scores distributed on a sphere. In this case, the reference audio source was localized at azimuth = 30°, elevation = 30° (the red circle represents the reference audio source).
Figure 16.
Localization accuracy as a function of azimuth and elevation. A fixed reference audio source is located at an offset point of azimuth = 30°, elevation = 30°. The asymmetry in the results is caused by the source being closer to one ear than the other.
Table 1.
Samples used during single point audio listening tests (reproduced from [3]).
Label | Music Type | Source |
---|---|---|
vega | Vocals (Suzanne Vega) | CD |
castanets | Castanets | EBU |
glock | Glockenspiel | EBU |
vegaRev | Vocals (Suzanne Vega) w. Reverb effect | processed CD |
castanetsRev | Castanets w. Reverb Effect | processed EBU |
pinkRev | Bursty Pink Noise w. Reverb Effect | synthetic |
Table 2.
Azimuth and elevation angles in degrees used for 26-point Lebedev Quadrature layout.
Elevation | ±35 | ±35 | ±45 | ±45 | ±90 | 0 | 0 | 0 | 0 | 0 |
Azimuth | ±135 | ±45 | 0 | 180 | 0 | 0 | ±135 | 180 | ±45 | ±90 |
Table 3.
Encoding/compression schemes used with single point audio sources.
Type | Ambisonics Order | Bit Rate (kbps) | Bit Rate Per Channel (kbps) |
---|---|---|---|
Reference | 3 | 12,288 | 768 |
3OA 512 | 3 | 512 | 32 |
3OA 256 | 3 | 256 | 16 |
FOA 128 | 1 | 128 | 32 |
FOA 32 (anchor) | 1 | 32 | 8 |
Table 4.
Multiple point audio samples used during listening tests.
Label | Music Type | Source |
---|---|---|
castanetsRev | Castanets w. Reverb Effect | processed EBU |
pinkRev | Bursty Pink Noise w. Reverb Effect | synthetic |
tub | Tubular bells | EBU |
xyl | Xylophone | EBU |
fem | Female voice | EBU |
babble | Babble noise | TCDVOIP |
tr | Triangle | EBU |
piano | Piano | EBU |
Table 5.
Encoding/compression schemes used with multiple point audio sources.
Type | Ambisonics Order | Bit Rate (kbps) | Bit Rate Per Channel (kbps) |
---|---|---|---|
Reference | 3 | 12,288 | 768 |
3OA 512 | 3 | 512 | 32 |
3OA 384 | 3 | 384 | 24 |
3OA 256 | 3 | 256 | 16 |
FOA 128 | 1 | 128 | 32 |
FOA 96 | 1 | 96 | 24 |
FOA 64 | 1 | 64 | 16 |
FOA 32 (anchor) | 1 | 32 | 8 |
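The per-channel column above can be reproduced by dividing the total bit rate by the channel count, (N + 1)² for Ambisonics order N. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative assumptions, and only the numeric values come from the table.

```python
# (Ambisonics order, total bit rate in kbps), copied from the table above.
SCHEMES = {
    "Reference": (3, 12288),
    "3OA 512": (3, 512),
    "3OA 384": (3, 384),
    "3OA 256": (3, 256),
    "FOA 128": (1, 128),
    "FOA 96": (1, 96),
    "FOA 64": (1, 64),
    "FOA 32 (anchor)": (1, 32),
}


def per_channel_kbps(order: int, total_kbps: float) -> float:
    """Total bit rate divided by the (order + 1)^2 Ambisonics channels."""
    return total_kbps / (order + 1) ** 2


for name, (order, total) in SCHEMES.items():
    print(f"{name}: {per_channel_kbps(order, total):g} kbps per channel")
```

Note that schemes of different order can share a per-channel rate (for example, 3OA 512 and FOA 128 both use 32 kbps per channel), which makes it possible to compare orders at equal per-channel compression.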
Table 6.
Correlation between subjective MUSHRA and objective AMBIQUAL results from experiments 1 and 2.
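Experiment | Listening Quality Pearson | Listening Quality Spearman | Listening Quality RMSE | Localization Accuracy Pearson | Localization Accuracy Spearman | Localization Accuracy RMSE |
---|---|---|---|---|---|---|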
1 | 0.899 | 0.816 | 55.14 | 0.922 | 0.919 | 59.27 |
2 | - | - | - | 0.864 | 0.883 | 63.13 |
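For readers unfamiliar with the three statistics reported above, the sketch below shows one common way to compute them for paired subjective and objective scores. It is a generic illustration, not the authors' evaluation code: the function names are hypothetical, and no scaling or mapping between the two score ranges is applied, since this section does not state whether one was used.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(x, y):
    """Root-mean-square error between paired scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

Pearson measures linear agreement, Spearman measures agreement in rank order only, and RMSE is sensitive to the absolute scale of both score sets.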