Article

Room Impulse Response Dataset of a Recording Studio with Variable Wall Paneling Measured Using a 32-Channel Spherical Microphone Array and a B-Format Microphone Array

Audio & Acoustic Signal Processing Group, The Australian National University, Canberra 2601, Australia
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(5), 2095; https://doi.org/10.3390/app14052095
Submission received: 5 February 2024 / Revised: 28 February 2024 / Accepted: 29 February 2024 / Published: 2 March 2024
(This article belongs to the Section Acoustics and Vibrations)

Abstract

This paper introduces RSoANU, a dataset of real multichannel room impulse responses (RIRs) obtained in a recording studio. Compared to the current publicly available datasets, RSoANU distinguishes itself by featuring RIRs captured using both a 32-channel spherical microphone array (mh acoustics em32 Eigenmike) and a B-format soundfield microphone array (Rode NT-SF1). The studio incorporates variable wall panels in felt and wood options, with measurements conducted for two configurations: all panels set to wood or felt. Three source positions that emulate typical performance locations were considered. RIRs were collected over a planar receiver grid spanning the room, with the microphone array centered at a height of 1.7 m. The paper includes an analysis of acoustic parameters derived from the dataset, revealing notable distinctions between felt and wood panel environments. Felt panels exhibit faster decay, higher clarity, and superior definition in mid-to-high frequencies. The analysis across the receiver grid emphasizes the impact of room geometry and source–receiver positions on reverberation time and clarity. The study also notes spatial variations in parameters obtained from the two microphone arrays, suggesting potential for future research into their specific capabilities for room acoustic characterization.

1. Introduction

Sound waves emitted from a source within a room undergo intricate interactions with the surrounding environment, influenced by factors such as the room’s geometry, surface materials, and furnishings. These elements collectively determine the acoustic response of the room and contribute to the auditory experience. Considering the room as a linear time-invariant system, the sound transformations between a source and a receiver are quantitatively encapsulated as a room impulse response (RIR), uniquely describing the room’s acoustic fingerprint [1].
The RIRs can facilitate a comprehensive understanding of the room acoustics, which is pivotal in the design of performance venues [2,3], correction of acoustic anomalies [4], and selection of acoustic treatments [5]. It is also integral in the development of speech processing algorithms [6], the design of immersive audio in extended reality (XR) environments [7], and various soundfield analysis and synthesis methods. The spatial variations captured by RIRs across the room enable accurate listener translations, providing an enriched immersive experience. Therefore, the availability of diverse and authentic RIRs can greatly benefit the research and development of numerous audio applications.
RIRs can be obtained through two primary methods: simulation-based generation using wave-based and ray-based methods [8], or real-world measurements. While simulated RIRs are quicker to generate, they lack the authenticity of real acoustic environments due to challenges in modeling complex scattering effects and material compositions [9,10]. Hence, algorithms validated using simulated data may malfunction when applied to real conditions. Even though resource-intensive, measured RIRs provide a realistic representation of real-world scenarios and preserve important perceptual features. Typically, a singular microphone or an array of microphones is employed to measure RIRs at spatially distributed receiver positions in a given room. Recently, higher-order ambisonic microphone arrays have been widely used to acquire directional characteristics of the acoustic field [11,12,13].
Several measured RIR datasets have been collected in the past, and they are framed to meet the demands of certain applications, like acoustic analysis, room geometry estimation, speech enhancement, automatic speech recognition, and VR environment design. These datasets differ in key attributes, like the type, size, and configurations of the room measured, the number and density of source and receiver locations, and the type and configuration of loudspeakers and microphones. Some of the publicly available datasets [12,13,14,15,16,17,18,19,20] are compared in Table 1. The OpenAIR database [11] contains impulse response data from more than 50 environments, mostly collected in the Ambisonic B-format.

Contribution

This paper introduces RSoANU, the RIR dataset of a recording studio [21] at the School of Music, Australian National University (ANU). This studio features variable wall paneling with felt and wood options. The RIRs are recorded using both a 32-channel spherical microphone array and a B-format soundfield microphone array for two wall configurations, three source positions, and a 1 m × 1 m receiver grid spanning a measurement region of dimensions 7 m × 7 m. Additional receiver points with a finer grid spacing of 50 cm around each active source are also included. This dataset comprises 396 4-channel and 396 32-channel RIRs with metadata of Cartesian coordinates of source and receiver positions, wall panel setting, microphone device, temperature, and other signal parameters.
To the best of our knowledge, no existing datasets offer RIRs of a room recorded with both a higher-order spherical microphone array and a B-format array, especially across an extensive receiver grid. The RSoANU dataset is particularly intended to facilitate the analysis of spatial soundfields and room acoustics with both higher-order and first-order ambisonics, taking into account the differences in spatial resolution, frequency response, directivity, and economic considerations of the two types of microphone arrays. Additionally, the receiver grid is designed to support the validation of RIR interpolation/extrapolation and soundfield translation methods for VR applications. Since the room environment and source–receiver locations are closely controlled during the measurement, direct comparisons between different acoustic settings can also be performed.
This paper also presents an analysis of room acoustic parameters using the RSoANU dataset. The obtained results enable the identification of potential acoustic issues in the ANU Recording Studio, providing insights for necessary treatments and aiding in the selection of optimal recording locations. Given the scarcity of recording studio datasets [11,13], especially those with variable environments, the findings from this dataset offer valuable guidance for the design of similar studio settings.

2. Measurement Setup and Technique

2.1. Room Setup

The RSoANU dataset consists of RIRs measured in the ANU Recording Studio [21] with a floor dimension of 8.5 m × 9 m. The studio features a double-height space with a floor-to-ceiling height of 5.6 m. Two adjacent walls are fitted with variable wall paneling, offering options in both felt and wood materials, whilst the other two walls have fixed paneling. The room also includes a window and a sliding door made of glass material providing access to the control room.
The variable wall panels set to felt and wooden configurations are shown in Figure 1a and Figure 1b, respectively. The structural details and dimensions of the felt and wooden panels can be seen in Figure 1c and Figure 1d, respectively. In the wooden configuration, angled plywood frames with a width of 0.603 m and a depth of 0.398 m alternate with slat resonators measuring 0.617 m in width and 0.137 m in depth. Note that these depths are measured from the middle of the angled surface to the wall. Behind the angled plywood surface, three layers of 50 mm insulation separated by air cavities are affixed, while one insulation layer is attached behind the slat surface. In the felt configuration, each panel has a depth of 0.268 m with two layers of 50 mm insulation behind the surface. More details on similar variable wall panel structures can be found in [22].

2.2. Equipment

During the data collection process, a Tannoy System 600 loudspeaker (Tannoy Ltd.: Coatbridge, Scotland) served as the sound source. It has a frequency response with ±3 dB tolerance bandlimited to 52 Hz–20 kHz, with a crossover frequency of 1.8 kHz [23].
The RIRs were measured using two microphone arrays: the Rode NT-SF1 microphone (Røde: Sydney, Australia) and the mh acoustics em32 Eigenmike (mh acoustics: Summit, NJ, USA). The NT-SF1 comprises four matched condenser cardioid microphone capsules arranged in a tetrahedral array. It is capable of capturing spatial sound in first-order ambisonics with a frequency response of ±3 dB tolerance in the range of 30 Hz–20 kHz [24]. The em32 consists of thirty-two omnidirectional electret microphones mounted on a rigid sphere of 8.4 cm diameter. It can capture spatial sound up to fourth-order ambisonics with a frequency response bandlimited to 30 Hz–20 kHz, offering a nominally flat response up to 8 kHz [25,26].
The source and the receivers are shown in Figure 2. The Eigenmike and NT-SF1 were connected to a computer using the Eigenmike Interface Box (EMIB) and the Zoom F6 [27], respectively. Note that the Zoom F6 interface was configured to record in Ambisonic B-format with a Furse–Malham (FuMa) normalization scheme.

2.3. Recording Setup

The source–receiver arrangement for the RIR measurement is depicted in Figure 3, with the room’s bottom-left corner considered as the coordinate origin.
The RIRs were measured for three distinct source locations, with only one source being active during each measurement. The source locations are marked as S1, S2, and S3 in Figure 3, and their Cartesian coordinates are mentioned in Table A1. The height and direction of the sound source were set to reflect the placement of typical performers or ensembles in the recording studio. Source S1 was positioned closest to the front door, at a height of 1.2 m, reflecting a typical drum kit height. Source S2 in the middle of the room was positioned at a height of 1.4 m to reflect a singer’s projection height [28]. Source S3 was positioned at a height of 0.9 m, corresponding to the height of the strings on a grand piano. In terms of source orientation, both S1 and S2 faced the control room, while S3 was pointed to the left wall. Note that these source heights are measured from the studio floor to the base of the loudspeaker. Users can refer to the loudspeaker dimensions [23] to adjust the source height with reference to the loudspeaker cone.
Receiver locations were set up in an 8 × 8 planar grid pattern with 1 m × 1 m spacing. These positions are marked as circles in Figure 3. A few points in the square grid were excluded since they were inaccessible due to the placement of furnishings and the grand piano (blank space to the right side of S3). In addition to these receiver points, a denser grid with 50 cm spacing was also included in front of each active source. These are indicated by colored triangles in Figure 3 corresponding to their respective source. During the measurement, the microphone array was centered at these positions at a consistent height of 1.7 m, reflecting the average height of an Australian [28]. This height was measured from the floor to the center of the array.
Each receiver position is assigned a code name based on its coordinates along the x and y axes. Code names of some of the receiver positions are indicated in Figure 3. For instance, 01x00y is the code name of the receiver location in the second column from the left and the first row from the bottom. The Cartesian coordinates and code names of the receiver positions on the regular grid are listed in Table A2. The additional receiver points surrounding the sources are labeled differently; for example, e30x45y corresponds to the receiver associated with S1 in the fourth column, positioned between the fifth and sixth rows from the bottom. The Cartesian coordinates and code names of these source-dependent receiver positions are listed in Table A3.
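The naming convention can be turned into coordinates programmatically. The following is a minimal sketch, assuming the mapping implied by Table A2 and Table A3: regular-grid codes count 1 m steps from the first grid point at (0.75, 1.25) m, and codes prefixed with `e` express the same offsets in tenths of a meter. The function name `receiver_coordinates` and the constants are illustrative and are not part of the dataset.

```python
import re

# Assumed from Tables A2 and A3: first grid point and common array height (m).
GRID_ORIGIN_X = 0.75
GRID_ORIGIN_Y = 1.25
ARRAY_HEIGHT = 1.7

_CODE_PATTERN = re.compile(r"^(e?)(\d{2})x(\d{2})y$")


def receiver_coordinates(code):
    """Map a receiver code name such as '01x00y' or 'e30x45y' to (x, y, z) in meters."""
    match = _CODE_PATTERN.match(code.replace(" ", ""))
    if match is None:
        raise ValueError(f"Unrecognized receiver code name: {code!r}")
    is_extra = match.group(1) == "e"
    # Regular codes are whole-meter grid indices; 'e' codes are tenths of a meter.
    scale = 10.0 if is_extra else 1.0
    x = GRID_ORIGIN_X + int(match.group(2)) / scale
    y = GRID_ORIGIN_Y + int(match.group(3)) / scale
    return (x, y, ARRAY_HEIGHT)


print(receiver_coordinates("01x00y"))   # (1.75, 1.25, 1.7)
print(receiver_coordinates("e30x45y"))  # (3.75, 5.75, 1.7)
```

Both printed tuples agree with the corresponding rows of Table A2 and Table A3.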

2.4. Recording Method

An exponential sine sweep (ESS) signal of 2 s duration and 20 Hz to 25 kHz frequency range generated at a sampling rate of 48 kHz was used as the excitation signal. We chose an ESS as it is widely used for RIR measurements due to its large dynamic range, excellent signal-to-noise ratio (SNR), and ability to reject harmonic distortions and non-linear effects [13,17,29]. The sweep signal was preceded and followed by 1 s silence periods to avoid time-aliasing problems.
In the initial acoustic setting for the RIR measurement, all variable wall panels were set to felt, and the loudspeaker was positioned at S1. The loudspeaker volume was set to produce a level of around 65 dB at a distance of 1 m. The ESS signal was played through the loudspeaker, and recordings were made at each receiver position using the em32 Eigenmike. Next, all the variable wall panels were switched to wood, and the measurements were repeated using the em32 Eigenmike. The process was repeated for source locations S2 and S3. The entire measurement procedure was then repeated using the Rode NT-SF1 microphone. Following each recording, the RIRs were estimated through deconvolution of the recorded sweeps with the inverse sweep signal. MATLAB R2021a was employed for controlling the source and receiver signals, as well as for performing the deconvolution process.
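The sweep-and-deconvolve step can be sketched as follows. This is a minimal illustration of the commonly used exponential sine sweep technique, in which the inverse signal is the time-reversed sweep with an envelope that falls by 6 dB per octave; it is not the MATLAB code used for the dataset. All function names are hypothetical, and the direct convolution is only practical for short signals (an FFT-based convolution would normally be used at 48 kHz).

```python
import math


def exponential_sine_sweep(f_start, f_stop, duration, fs):
    """Exponential sine sweep from f_start to f_stop (Hz) lasting `duration` seconds."""
    n_samples = int(round(duration * fs))
    rate = math.log(f_stop / f_start)
    scale = 2.0 * math.pi * f_start * duration / rate
    return [math.sin(scale * (math.exp(rate * n / (fs * duration)) - 1.0))
            for n in range(n_samples)]


def inverse_sweep(sweep, f_start, f_stop, duration, fs):
    """Time-reversed sweep with an envelope falling 6 dB per octave of the sweep."""
    rate = math.log(f_stop / f_start)
    reversed_sweep = sweep[::-1]
    return [s * math.exp(-rate * n / (fs * duration))
            for n, s in enumerate(reversed_sweep)]


def convolve(a, b):
    """Direct linear convolution; adequate only for short illustrative signals."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


# Small illustrative values, not the measurement settings.
fs = 2000
sweep = exponential_sine_sweep(20.0, 800.0, 0.5, fs)
inverse = inverse_sweep(sweep, 20.0, 800.0, 0.5, fs)

# Convolving a recording with the inverse sweep yields the impulse response;
# here the "recording" is the sweep itself, so the result is a single pulse.
response = convolve(sweep, inverse)
peak_index = max(range(len(response)), key=lambda i: abs(response[i]))
print(peak_index, len(sweep) - 1)
```

With a real recording in place of `sweep`, the impulse response begins near index `len(sweep) - 1` of the convolution output, and harmonic distortion products appear before that point, which is why they can be windowed out.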
Before each measurement for every room configuration and source position, microphones underwent calibration, and latency adjustments were performed. The room temperature was also recorded before each run and was very stable, with variations of less than 0.5 °C across all measurements.

3. Dataset Description

The RSoANU dataset comprises RIRs captured in the ANU Recording studio, considering two room configurations and three source positions (S1, S2, and S3). RIRs were recorded for 65, 67, and 66 receiver positions for S1, S2, and S3, respectively. Recordings conducted using both the em32 Eigenmike and Rode NT-SF1 microphone generated a total of 396 32-channel and 396 4-channel RIRs, respectively. Note that the RIRs recorded by Rode NT-SF1 microphones are in Ambisonic B-format with FuMa normalization. All RIRs are saved with a sampling frequency of 48 kHz and a duration of 1.5 s.
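The RIR totals follow directly from the receiver counts and the two panel configurations. A short check, using only the numbers stated above (variable names are illustrative):

```python
# Receiver positions measured per source, as stated in the text.
receivers_per_source = {"S1": 65, "S2": 67, "S3": 66}
panel_configurations = ("felt", "wood")

# Each microphone array was used at every receiver position in both configurations.
rirs_per_array = sum(receivers_per_source.values()) * len(panel_configurations)
print(rirs_per_array)  # 396
```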
The RIRs in the RSoANU dataset are organized in folders with the structure shown in Figure 4. Each multichannel RIR is saved in two file formats: MATLAB struct (*.mat) and 16-bit audio (*.wav). The filenames follow a convention based on the source code names and receiver code names outlined in Table A1, Table A2 and Table A3. For the em32 Eigenmike, the 32-channel RIR is saved under the IR_Data field name in the (*.mat) struct file. For the Rode NT-SF1 microphone, the RIR is saved under the Bformat field name with separate child fields for the W, X, Y, and Z channels. The room temperature, sampling rate, microphone array type, source and receiver coordinates, excitation signal, and raw recording are recorded as metadata. The (*.mat) struct files for NT-SF1 also include the normalization standard under the Bformat_standard metadata field.
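Users whose tools expect the AmbiX convention (ACN channel order with SN3D normalization) rather than FuMa may need to convert the NT-SF1 RIRs. For first-order signals the conversion only rescales W by √2 and reorders the channels. The sketch below assumes the four channels have already been loaded as equal-length sequences; the function name is hypothetical and no dataset file is read.

```python
import math


def fuma_to_ambix_first_order(w, x, y, z):
    """Convert first-order FuMa B-format (W, X, Y, Z) to AmbiX (ACN order, SN3D).

    FuMa stores W attenuated by 1/sqrt(2); X, Y, Z are unchanged at first order.
    ACN channel order for first order is W, Y, Z, X.
    """
    gain = math.sqrt(2.0)
    return [[sample * gain for sample in w], list(y), list(z), list(x)]


# Toy single-sample channels, for illustration only.
ambix = fuma_to_ambix_first_order([1.0], [2.0], [3.0], [4.0])
print(ambix[1], ambix[2], ambix[3])  # [3.0] [4.0] [2.0]
```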
The RSoANU dataset is publicly accessible online at https://doi.org/10.5281/zenodo.10720345 [30] under the Creative Commons Attribution 4.0 International license.

4. Room Acoustic Parameter Analysis

In this section, we utilize the RSoANU dataset to analyze the acoustics of the ANU Recording Studio by examining the parameters: reverberation time ($T_{20}$), clarity ($C_{80}$), definition ($D_{50}$), and center time ($T_s$) [31]. The parameters are derived from the W channel for NT-SF1 microphone recordings and from the average of 32 channels for the em32 Eigenmike recordings.
Table 2 presents the mean and standard deviation of all parameters for each source position in both felt and wood panel environments, calculated from the RIRs collected at all receiver positions. The difference in acoustics offered by the felt and wooden panel environments is evident from the parameter values in Table 2. Additionally, we can observe high standard deviations in $C_{80}$ due to its strong reliance on the receiver positioning relative to the sound source.
In the following subsections, we delve into the variations in each parameter across frequencies in octave bands ranging from 125 Hz to 8 kHz. For $T_{20}$ and $C_{80}$, we further investigate variations across different receiver locations using parameter values averaged across the octave bands.

4.1. Reverberation Time

Reverberation time is the duration it takes for the sound pressure level to decay by 60 dB after the source ceases. From the Schroeder curve generated through backward integration of the squared RIR, we calculate the reverberation time ($T_{20}$) by fitting a least-squares line to the decay between −5 dB and −25 dB and extrapolating it to a 60 dB decay, following the ISO 3382 recommendation [31,32].
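The calculation can be sketched as below. This is a minimal broadband version, assuming the RIR is a plain sequence of samples that starts at the direct sound and has negligible noise floor; octave-band filtering and noise compensation are omitted, and the function names are illustrative rather than taken from the analysis code behind the figures.

```python
import math


def schroeder_decay_db(rir):
    """Backward-integrated energy decay curve in dB, 0 dB at the first sample."""
    edc = [0.0] * len(rir)
    energy = 0.0
    for i in range(len(rir) - 1, -1, -1):
        energy += rir[i] * rir[i]
        edc[i] = energy
    if not edc or edc[0] == 0.0:
        raise ValueError("RIR has no energy")
    return [10.0 * math.log10(e / edc[0]) if e > 0.0 else float("-inf")
            for e in edc]


def reverberation_time_t20(rir, fs):
    """Fit a line to the decay between -5 dB and -25 dB and extrapolate to 60 dB."""
    decay = schroeder_decay_db(rir)
    points = [(i / fs, level) for i, level in enumerate(decay)
              if -25.0 <= level <= -5.0]
    if len(points) < 2:
        raise ValueError("Not enough decay range for a T20 fit")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_l = sum(level for _, level in points) / len(points)
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    s_tl = sum((t - mean_t) * (level - mean_l) for t, level in points)
    slope = s_tl / s_tt  # dB per second, negative for a decaying response
    return -60.0 / slope


# Synthetic check: an ideal exponential decay with a 0.8 s reverberation time.
fs = 8000
true_t60 = 0.8
synthetic = [10.0 ** (-3.0 * n / (fs * true_t60)) for n in range(int(1.5 * fs))]
print(round(reverberation_time_t20(synthetic, fs), 3))
```

For the synthetic decay the estimate matches the chosen 0.8 s closely, since the Schroeder curve of a pure exponential is a straight line in dB.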
Figure 5a,b present the $T_{20}$ values across frequency in octave bands, derived from RIRs measured by the Rode NT-SF1 microphone and em32 Eigenmike, respectively. The two environments with felt and wood panels show similar $T_{20}$ values from 125 to 250 Hz. However, from 500 Hz to 8 kHz, the wood panel environment consistently demonstrates higher $T_{20}$ compared to the felt environment. The average $T_{20}$ for the felt and wood panel environments is approximately 0.74 s and 0.84 s, respectively. The faster sound decay in the felt panels aligns with their higher sound absorption capabilities compared to the wood panels. The overall trend remains consistent between the NT-SF1 and Eigenmike measurements, but the latter shows slight variations in $T_{20}$ values among different source positions beyond 500 Hz.
Figure 6a,b illustrate 2D heatmaps of $T_{20}$ values across receiver locations for both paneling environments and all three source locations, calculated from the RIRs measured by the NT-SF1 microphone and em32 Eigenmike, respectively. In these visualizations, higher $T_{20}$ values are noticeably closer to the front and right walls, particularly in the front-right corner of the room (refer to Figure 1 and Figure 3 for orientation). This observation suggests the influence of the overhang in this corner, a phenomenon consistent with similar effects observed in other rooms with such structures [33,34]. Additional acoustic treatment around this corner can help to achieve a more uniform distribution of reverb effect across the room.
In Figure 6a, we can see lower $T_{20}$ values for a few receiver points directly in front of the source. However, this observation is moderated in Figure 6b for the Eigenmike measurements. Moreover, the local variations in $T_{20}$ are distinct between NT-SF1 (Figure 6a) and Eigenmike (Figure 6b). This discrepancy warrants deeper analysis, considering that the Eigenmike captures finer spatial details of the reflected sound compared to NT-SF1.

4.2. Clarity

Clarity ($C_{80}$) characterizes the transparency of the sound and is computed from the RIR $h(t)$ as follows [1,31]:
$$C_{80} = 10 \log_{10} \left( \frac{\int_{0}^{80\,\mathrm{ms}} h^{2}(t)\, dt}{\int_{80\,\mathrm{ms}}^{\infty} h^{2}(t)\, dt} \right) \ \mathrm{dB}$$
Figure 7a,b present the $C_{80}$ values across frequency in octave bands, derived from RIRs measured by the Rode NT-SF1 microphone and em32 Eigenmike, respectively. $C_{80}$ has an inverse relation to $T_{20}$ [35] and we can observe this between Figure 5 and Figure 7. Similar to the $T_{20}$ plots in Figure 5, the distinction between the felt and wood panels becomes apparent after 500 Hz, where the felt panel environment exhibits higher clarity. This aligns with expectations, as felt panels absorb more late reflections, resulting in enhanced clarity.
Figure 8a,b illustrate 2D heatmaps of $C_{80}$ values across receiver locations for both room paneling environments and all three source locations, calculated from the RIRs measured by the NT-SF1 microphone and em32 Eigenmike, respectively. Higher $C_{80}$ values are concentrated directly in front of the source location, with lower values for receivers situated further away. This underscores the influence of loudspeaker directivity and interference of reflections from the walls. Source S3 has comparatively the lowest levels of clarity. This may be due to its proximity to glass doors and windows along the control panel wall, leading to increased interference from reflections. In both felt and wood environments, the highest $C_{80}$ values were observed for source S1. These observations are crucial for recording singers and speech sources, guiding the selection of optimal source–receiver placements.
The spatial spread of high clarity around the sources is more restricted in Figure 8b compared to Figure 8a. This again corroborates the differences in spatial capturing capabilities of the two microphones, necessitating further analysis as mentioned in Section 4.1.
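In discrete form, the clarity integral reduces to a ratio of two energy sums split at 80 ms. A minimal sketch, assuming the RIR sequence starts at the arrival of the direct sound and extends beyond 80 ms (the function name is illustrative):

```python
import math


def clarity_c80(rir, fs):
    """Early-to-late energy ratio in dB with the split at 80 ms."""
    split = int(round(0.080 * fs))
    early = sum(x * x for x in rir[:split])
    late = sum(x * x for x in rir[split:])
    if early == 0.0 or late == 0.0:
        raise ValueError("RIR must contain energy before and after 80 ms")
    return 10.0 * math.log10(early / late)


# Equal energy before and after 80 ms gives 0 dB.
print(clarity_c80([1.0] * 160, 1000))  # 0.0
```

For an ideal exponential decay with reverberation time T, the same ratio evaluates to 10 log10(10^(0.48/T) − 1) dB, which decreases as T grows and illustrates the inverse relation between clarity and reverberation time noted above.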

4.3. Definition

The acoustic definition ($D_{50}$) serves as a valuable metric for speech intelligibility, quantifying the distinctness of sound [1]. It is calculated from the RIR $h(t)$ as follows [1,31]:
$$D_{50} = \frac{\int_{0}^{50\,\mathrm{ms}} h^{2}(t)\, dt}{\int_{0}^{\infty} h^{2}(t)\, dt} \times 100\,\%$$
Figure 9a,b present the $D_{50}$ values across frequency in octave bands, derived from RIRs measured by the Rode NT-SF1 microphone and em32 Eigenmike, respectively. The overall trend aligns with $C_{80}$ as they are closely correlated parameters [1]. Felt panels yield a higher level of definition, with an average of 71%, compared to wooden panels, with an average of 67%. The average $D_{50}$ suggests that this recording studio can ensure a minimum of 90% syllable intelligibility in both environments [1].
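The definition can be computed from a sampled RIR in the same way as clarity, with the early window ending at 50 ms and the total energy in the denominator. A minimal sketch under the same assumption that the sequence starts at the direct sound (the function name is illustrative):

```python
def definition_d50(rir, fs):
    """Percentage of the total RIR energy arriving within the first 50 ms."""
    split = int(round(0.050 * fs))
    early = sum(x * x for x in rir[:split])
    total = sum(x * x for x in rir)
    if total == 0.0:
        raise ValueError("RIR has no energy")
    return 100.0 * early / total


# Half of the energy lies in the first 50 ms of this toy response.
print(definition_d50([1.0] * 100, 1000))  # 50.0
```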

4.4. Center Time

Center time ($T_s$) signifies the center of gravity or first moment of the squared RIR $h^{2}(t)$ and is computed as follows [1,31]:
$$T_{s} = \frac{\int_{0}^{\infty} t\, h^{2}(t)\, dt}{\int_{0}^{\infty} h^{2}(t)\, dt}$$
Figure 10a,b present the $T_s$ values across frequency in octave bands, derived from RIRs measured by the Rode NT-SF1 microphone and em32 Eigenmike, respectively. The overall trend aligns with $T_{20}$ shown in Figure 5. Similar to other parameters, the distinction between the felt and wood panels becomes evident after 500 Hz. The felt and wood panels yield an average $T_s$ of 0.04 s and 0.05 s, respectively. These values indicate that the felt environment can offer slightly higher music transparency and better speech intelligibility than the wood environment [1].
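For a sampled RIR, the center time is the energy-weighted mean of the sample times. A minimal sketch, again assuming time zero coincides with the first sample of the sequence (the function name is illustrative):

```python
def center_time(rir, fs):
    """First moment of the squared RIR, in seconds."""
    energy = [x * x for x in rir]
    total = sum(energy)
    if total == 0.0:
        raise ValueError("RIR has no energy")
    first_moment = sum((n / fs) * e for n, e in enumerate(energy))
    return first_moment / total


# A single pulse at sample 40 with fs = 1000 Hz has its center at 40 ms.
print(center_time([0.0] * 40 + [1.0] + [0.0] * 10, 1000))  # 0.04
```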

5. Noise Interference and Outlier Measurements

During the organization of the dataset, an outlier measurement by the NT-SF1 microphone was identified at the receiver coordinate (x = 0.75, y = 5.25, z = 1.7) m, labeled as 00x04y. The irregularity in the W-channel RIR at 00x04y, compared to the neighboring receiver point at 00x03y, is shown in Figure 11.
Upon closer investigation, it became apparent that the anomaly was likely caused by air conditioner noise, given the proximity of a duct directly above this receiver point. Measurements repeated at and around this receiver point confirmed that the issue persisted at different heights at (x = 0.75, y = 5.25) m. However, the distortion diminished when the receiver was shifted by approximately 50 cm in the xy plane.
In the recording studio, there are four air conditioner ducts, but only the duct above 00x04y appeared to influence the measurements. The impact of the other ducts was either obscured by furnishings or mitigated by felt surfaces. It is noteworthy that the measurements taken by the em32 Eigenmike at this position did not exhibit any noticeable distortions.
Increasing the SNR with respect to the receiver at 00x04y can mitigate the effects of noise interference. However, for this dataset, the loudspeaker volume was set to emulate typical studio performance levels and was kept unchanged while measuring at all receiver positions, reflecting real performance scenarios. Moreover, the large receiver region considered in our dataset makes SNR optimization impractical.
Despite the acknowledged impact of air conditioner noise, it is essential to highlight that the RIR measurements were intentionally conducted without switching off the air conditioners to capture real recording scenarios. Therefore, the RIRs corresponding to the receiver code 00x04y measured by the NT-SF1 microphone are still included in the dataset but separated into an ‘Outlier’ folder, warranting consideration for their specific nature in any application.

6. Conclusions

This paper introduced the RSoANU dataset, a collection of RIRs recorded in the Australian National University (ANU) Recording Studio. The dataset encompasses diverse acoustic scenarios featuring two wall panel environments, three source positions, and a planar grid of receiver locations. Recorded with both a 32-channel spherical microphone array and a B-format soundfield microphone array, the dataset includes 396 32-channel and 396 4-channel RIRs.
We utilized the RSoANU dataset to analyze the acoustics within the ANU Recording Studio by examining key parameters such as reverberation time ($T_{20}$), clarity ($C_{80}$), definition ($D_{50}$), and center time ($T_s$). The results highlighted distinct differences between felt and wood panel environments, emphasizing the impact of panel material on acoustic properties. Felt panels consistently exhibited faster sound decay, higher clarity, and better definition compared to wood panels, particularly in the mid-to-high frequency range. The variations in the parameter values across the receiver grid for different source positions revealed the influence of room environment and source–receiver positioning on acoustic features. Notably, $C_{80}$ exhibited higher values directly in front of the source location, while $T_{20}$ showed elevated values near the front and right walls.
The RSoANU dataset holds the potential to contribute to the research and development of various audio applications, facilitating their validation and enhancement in authentic acoustic environments. It can aid in the designing of recording studios and guide optimal source and receiver placements across different recording scenarios. Future work will involve a more in-depth acoustic analysis, focusing on examining the trade-offs between information captured by a higher-order spherical microphone array and a commercially viable B-format soundfield microphone array.

Author Contributions

Conceptualization, G.C., A.B. and T.A.; data curation, G.C. and A.B.; formal analysis, G.C. and A.B.; investigation, G.C.; methodology, G.C. and A.B.; project administration, G.C., A.B. and T.A.; resources, A.B. and T.A.; supervision, A.B. and T.A.; validation, G.C., A.B. and T.A.; visualization, G.C. and A.B.; writing—original draft, G.C. and A.B.; writing—review and editing, G.C., A.B. and T.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The RSoANU dataset is openly available in Zenodo at https://doi.org/10.5281/zenodo.10720345 under the Creative Commons Attribution 4.0 International license.

Acknowledgments

The authors would like to thank Matthew Barnes and Craig Greening of the ANU School of Music for providing access to the recording studio and granting permission to conduct the measurements.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ANU: Australian National University
RIR: Room Impulse Response
VR: Virtual Reality
XR: eXtended Reality
ESS: Exponential Swept Sine
SNR: Signal-to-Noise Ratio

Appendix A

Table A1, Table A2 and Table A3 display the Cartesian coordinates and code names corresponding to the source, receiver, and source-dependent receiver locations, respectively.
Table A1. Source coordinates (in meters) and code names.
| Code Name | X | Y | Z |
|---|---|---|---|
| S1 | 4.75 | 6.75 | 1.2 |
| S2 | 4.25 | 4.75 | 1.384 |
| S3 | 6 | 2.25 | 0.93 |
Table A2. Receiver coordinates (in meters) and code names.
| Code Name | X | Y | Z | Code Name | X | Y | Z |
|---|---|---|---|---|---|---|---|
| 00x00y | 0.75 | 1.25 | 1.7 | 04x00y | 4.75 | 1.25 | 1.7 |
| 00x01y | 0.75 | 2.25 | 1.7 | 04x01y | 4.75 | 2.25 | 1.7 |
| 00x02y | 0.75 | 3.25 | 1.7 | 04x02y | 4.75 | 3.25 | 1.7 |
| 00x03y | 0.75 | 4.25 | 1.7 | 04x03y | 4.75 | 4.25 | 1.7 |
| 00x04y | 0.75 | 5.25 | 1.7 | 04x04y | 4.75 | 5.25 | 1.7 |
| 00x05y | 0.75 | 6.25 | 1.7 | 04x05y | 4.75 | 6.25 | 1.7 |
| 00x06y | 0.75 | 7.25 | 1.7 | 04x06y | 4.75 | 7.25 | 1.7 |
| 00x07y | 0.75 | 8.25 | 1.7 | 04x07y | 4.75 | 8.25 | 1.7 |
| 01x00y | 1.75 | 1.25 | 1.7 | 05x00y | 5.75 | 1.25 | 1.7 |
| 01x01y | 1.75 | 2.25 | 1.7 | 05x01y | 5.75 | 2.25 | 1.7 |
| 01x02y | 1.75 | 3.25 | 1.7 | 05x02y | 5.75 | 3.25 | 1.7 |
| 01x03y | 1.75 | 4.25 | 1.7 | 05x03y | 5.75 | 4.25 | 1.7 |
| 01x04y | 1.75 | 5.25 | 1.7 | 05x04y | 5.75 | 5.25 | 1.7 |
| 01x05y | 1.75 | 6.25 | 1.7 | 05x05y | 5.75 | 6.25 | 1.7 |
| 01x06y | 1.75 | 7.25 | 1.7 | 05x06y | 5.75 | 7.25 | 1.7 |
| 01x07y | 1.75 | 8.25 | 1.7 | 05x07y | 5.75 | 8.25 | 1.7 |
| 02x00y | 2.75 | 1.25 | 1.7 | 06x00y | 6.75 | 1.25 | 1.7 |
| 02x01y | 2.75 | 2.25 | 1.7 | 06x01y | 6.75 | 2.25 | 1.7 |
| 02x02y | 2.75 | 3.25 | 1.7 | 06x02y | 6.75 | 3.25 | 1.7 |
| 02x03y | 2.75 | 4.25 | 1.7 | 06x03y | 6.75 | 4.25 | 1.7 |
| 02x04y | 2.75 | 5.25 | 1.7 | 06x04y | 6.75 | 5.25 | 1.7 |
| 02x05y | 2.75 | 6.25 | 1.7 | 06x05y | 6.75 | 6.25 | 1.7 |
| 02x06y | 2.75 | 7.25 | 1.7 | 06x06y | 6.75 | 7.25 | 1.7 |
| 02x07y | 2.75 | 8.25 | 1.7 | 06x07y | 6.75 | 8.25 | 1.7 |
| 03x00y | 3.75 | 1.25 | 1.7 | 07x00y | 7.75 | 1.25 | 1.7 |
| 03x01y | 3.75 | 2.25 | 1.7 | 07x01y | 7.75 | 2.25 | 1.7 |
| 03x02y | 3.75 | 3.25 | 1.7 | 07x02y | 7.75 | 3.25 | 1.7 |
| 03x03y | 3.75 | 4.25 | 1.7 | 07x03y | 7.75 | 4.25 | 1.7 |
| 03x04y | 3.75 | 5.25 | 1.7 | 07x04y | 7.75 | 5.25 | 1.7 |
| 03x05y | 3.75 | 6.25 | 1.7 | 07x05y | 7.75 | 6.25 | 1.7 |
| 03x06y | 3.75 | 7.25 | 1.7 | 07x06y | 7.75 | 7.25 | 1.7 |
| 03x07y | 3.75 | 8.25 | 1.7 | 07x07y | 7.75 | 8.25 | 1.7 |
Table A3. Coordinates (in meters) and code names of additional receiver locations corresponding to each source.
| Source Code Name | Receiver Code Name | X | Y | Z |
|---|---|---|---|---|
| S1 | e30x45y | 3.75 | 5.75 | 1.7 |
| S1 | e35x45y | 4.25 | 5.75 | 1.7 |
| S1 | e35x50y | 4.25 | 6.25 | 1.7 |
| S1 | e40x45y | 4.75 | 5.75 | 1.7 |
| S1 | e45x45y | 5.25 | 5.75 | 1.7 |
| S1 | e45x50y | 5.25 | 6.25 | 1.7 |
| S2 | e25x25y | 3.25 | 3.75 | 1.7 |
| S2 | e25x30y | 3.25 | 4.25 | 1.7 |
| S2 | e30x25y | 3.75 | 3.75 | 1.7 |
| S2 | e35x25y | 4.25 | 3.75 | 1.7 |
| S2 | e35x30y | 4.25 | 4.25 | 1.7 |
| S2 | e40x25y | 4.75 | 3.75 | 1.7 |
| S2 | e45x25y | 5.25 | 3.75 | 1.7 |
| S2 | e45x30y | 5.25 | 4.25 | 1.7 |
| S3 | e45x00y | 5.25 | 1.25 | 1.7 |
| S3 | e45x05y | 5.25 | 1.75 | 1.7 |
| S3 | e45x10y | 5.25 | 2.25 | 1.7 |
| S3 | e45x15y | 5.25 | 2.75 | 1.7 |
| S3 | e45x20y | 5.25 | 3.25 | 1.7 |
| S3 | e50x05y | 5.75 | 1.75 | 1.7 |
| S3 | e50x15y | 5.75 | 2.75 | 1.7 |

References

  1. Kuttruff, H. Room Acoustics, 6th ed.; CRC Press: Boca Raton, FL, USA, 2016.
  2. Ando, Y. Design Study. In Concert Hall Acoustics; Springer: Berlin/Heidelberg, Germany, 1985; Chapter 6; pp. 89–101.
  3. Farina, A.; Amendola, A.; Capra, A.; Varani, C. Spatial Analysis of Room Impulse Responses Captured with a 32-Capsule Microphone Array. In Proceedings of the 130th Audio Engineering Society Convention, London, UK, 13–16 May 2011.
  4. Cox, T.J.; D’Antonio, P.; Avis, M.R. Room Sizing and Optimization at Low Frequencies. J. Audio Eng. Soc. 2004, 52, 640–651.
  5. Long, M. Architectural Acoustics, 2nd ed.; Academic Press: Cambridge, MA, USA, 2014.
  6. Dong, H.Y.; Lee, C.M. Speech intelligibility improvement in noisy reverberant environments based on speech enhancement and inverse filtering. EURASIP J. Audio Speech Music Process. 2018, 2018, 1–13.
  7. Kim, H.; Remaggi, L.; Jackson, P.J.; Hilton, A. Immersive Spatial Audio Reproduction for VR/AR Using Room Acoustic Modelling from 360° Images. In Proceedings of the 2019 IEEE Conference on Virtual Reality and 3D User Interfaces, Osaka, Japan, 23–27 March 2019; pp. 120–126.
  8. Vorländer, M. Computer simulations in room acoustics: Concepts and uncertainties. J. Acoust. Soc. Am. 2013, 133, 1203–1213.
  9. Schröder, D. Physically Based Real-Time Auralization of Interactive Virtual Environments. Ph.D. Thesis, RWTH Aachen University, Aachen, Germany, 2011.
  10. Brinkmann, F.; Aspöck, L.; Ackermann, D.; Lepa, S.; Vorländer, M.; Weinzierl, S. A round robin on room acoustical simulation and auralization. J. Acoust. Soc. Am. 2019, 145, 2746–2760.
  11. Murphy, D.T.; Shelley, S. OpenAIR: An Interactive Auralization Web Resource and Database. In Proceedings of the 129th Audio Engineering Society Convention, San Francisco, CA, USA, 4–7 November 2010.
  12. Götz, G.; Schlecht, S.J.; Pulkki, V. A Dataset of Higher-Order Ambisonic Room Impulse Responses and 3D Models Measured in a Room with Varying Furniture. In Proceedings of the 2021 Immersive and 3D Audio: From Architecture to Automotive (I3DA), Bologna, Italy, 8–10 September 2021; pp. 1–8.
  13. Kearney, G.; Daffern, H.; Cairns, P.; Hunt, A.; Lee, B.; Cooper, J.; Tsagkarakis, P.; Rudzki, T.; Johnston, D. Measuring the Acoustical Properties of the BBC Maida Vale Recording Studios for Virtual Reality. Acoustics 2022, 4, 783–799.
  14. Stewart, R.; Sandler, M. Database of omnidirectional and B-format room impulse responses. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14–19 March 2010; pp. 165–168.
  15. Hadad, E.; Heese, F.; Vary, P.; Gannot, S. Multichannel audio database in various acoustic environments. In Proceedings of the 2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC), Juan-les-Pins, France, 8–11 September 2014; pp. 313–317.
  16. Eaton, J.; Gaubitch, N.D.; Moore, A.H.; Naylor, P.A. Estimation of Room Acoustic Parameters: The ACE Challenge. IEEE/ACM Trans. Audio Speech Lang. Process. 2016, 24, 1681–1693.
  17. Szöke, I.; Skácel, M.; Mošner, L.; Paliesek, J.; Černocký, J. Building and evaluation of a real room impulse response dataset. IEEE J. Sel. Top. Signal Process. 2019, 13, 863–876.
  18. Carlo, D.D.; Tandeitnik, P.; Foy, C.; Bertin, N.; Deleforge, A.; Gannot, S. dEchorate: A Calibrated Room Impulse Response Dataset For Echo-Aware Signal Processing. EURASIP J. Audio Speech Music Process. 2021, 2021, 1–15.
  19. Koyama, S.; Nishida, T.; Kimura, K.; Abe, T.; Ueno, N.; Brunnström, J. MESHRIR: A Dataset of Room Impulse Responses on Meshed Grid Points for Evaluating Sound Field Analysis and Synthesis Methods. In Proceedings of the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 17–20 October 2021; pp. 1–5.
  20. Prawda, K.; Schlecht, S.J.; Välimäki, V. Calibrating the Sabine and Eyring formulas. J. Acoust. Soc. Am. 2022, 152, 1158–1169.
  21. Recording Studio—ANU School of Music Sound Recording & Music Technology Studios. Available online: https://music.cass.anu.edu.au/services/studio-services/recording-studio (accessed on 4 February 2024).
  22. John Sayers Productions. The Recording Manual Part II—Studio Design. 2000. Available online: https://www.tendolla.com/design/studio%20design.pdf (accessed on 28 February 2024).
  23. TANNOY. TANNOY SYSTEM 600 NEARFIELD MONITOR. Available online: https://www.fullcompass.com/common/files/2709-System600SpecSheet.pdf (accessed on 4 February 2024).
  24. RODE. NT-SF1. Available online: https://edge.rode.com//pdf/products/94/NT-SF1_V02.pdf (accessed on 28 February 2024).
  25. mh acoustics. Em32 Eigenmike® Microphone Array Release Notes (v18.0). 2014. Available online: https://mhacoustics.com/sites/default/files/EigenmikeReleaseNotesV18.pdf (accessed on 28 February 2024).
  26. mh acoustics. Eigenbeam Data Specification for Eigenbeams. 2016. Available online: https://www.lesonbinaural.fr/EDIT/DOCS/mh_acoustics_eigenmike.PDF (accessed on 28 February 2024).
  27. Zoom. The Zoom F6. Available online: https://zoomcorp.com/en/us/field-recorders/field-recorders/f6/ (accessed on 1 March 2024).
  28. 4338.0—Profiles of Health, Australia, 2011-13. Available online: www.abs.gov.au/ausstats/abs@.nsf/lookup/4338.0main+features212011-13 (accessed on 28 February 2024).
  29. Guidorzi, P.; Barbaresi, L.; D’Orazio, D.; Garai, M. Impulse Responses Measured with MLS or Swept-Sine Signals Applied to Architectural Acoustics: An In-depth Analysis of the Two Methods and Some Case Studies of Measurements Inside Theaters. Energy Procedia 2015, 78, 1611–1616.
  30. Chesworth, G.; Bastine, A.; Abhayapala, T.D. RSoANU: RIR Dataset. Zenodo, Dataset. 2024. Available online: https://zenodo.org/records/10720345 (accessed on 28 February 2024).
  31. ISO 3382-1; Acoustics—Measurement of Room Acoustic Parameters—Part 1: Performance Spaces. International Organization for Standardization: Geneva, Switzerland, 2009.
  32. Karjalainen, M.; Ansalo, P.; Mäkivirta, A.; Peltonen, T.; Välimäki, V. Estimation of Modal Decay Parameters from Noisy Response Measurements. J. Audio Eng. Soc. 2002, 50, 867–878.
  33. Gulsrud, T.; Exton, P.; van der Harten, A.; Kirkegaard, L. Room Acoustics Investigations in Hamer Hall at the Arts Centre, Melbourne. In Proceedings of the International Symposium on Room Acoustics (ISRA), Melbourne, Australia, 29–31 August 2010.
  34. Kimmich, J.M.; Schlesinger, A.; Ochmann, M.; Frank, S. Optimisation of the Orchestra Pit Acoustics in Opera Houses by Acoustical Simulations using the Finite Element Method. MATEC Web Conf. 2018, 251, 1–8.
  35. Adelman-Larsen, N.W.; Thompson, E.R.; Gade, A.C. Suitable reverberation times for halls for rock and pop music. J. Acoust. Soc. Am. 2010, 127, 247–255.
Figure 1. ANU Recording Studio: (a,b) panorama view; (c,d) zoomed view of panels.
Figure 2. Receivers and source used in data collection.
Figure 3. Diagram of the source and receiver positions. Light blue rectangular shapes indicate locations of furnishings.
Figure 4. Organization of the RSoANU dataset.
Figure 5. Reverberation time (T20) across frequency in octave bands for the felt and wood environments, calculated from the RIRs measured by (a) the Rode NT-SF1 microphone and (b) the em32 Eigenmike.
Figure 6. Reverberation time (T20) in seconds at different receiver locations for each of the three source locations and the two wall paneling types, calculated from the RIRs measured by (a) the Rode NT-SF1 microphone and (b) the em32 Eigenmike. The black arrowhead indicates the source location and orientation. The white areas represent spaces where no receivers were positioned. Note that the colormap limits differ between the felt and wood panels.
Figure 7. Clarity (C80) across frequency in octave bands for the felt and wood environments, calculated from the RIRs measured by (a) the Rode NT-SF1 microphone and (b) the em32 Eigenmike.
Figure 8. Clarity (C80) in dB at different receiver locations for each of the three source locations and the two wall paneling types, calculated from the RIRs measured by (a) the Rode NT-SF1 microphone and (b) the em32 Eigenmike. The black arrowhead indicates the source location and orientation. The white areas represent spaces where no receivers were positioned.
Figure 9. Definition (D50) across frequency in octave bands for the felt and wood environments, calculated from the RIRs measured by (a) the Rode NT-SF1 microphone and (b) the em32 Eigenmike.
Figure 10. Center time (Ts) across frequency in octave bands for the felt and wood environments, calculated from the RIRs measured by (a) the Rode NT-SF1 microphone and (b) the em32 Eigenmike.
Figure 11. RIRs measured by the NT-SF1 microphone in the wood panel environment with the source at S2 and the receiver at the coordinates (a) 00 x 04 y and (b) 00 x 03 y.
Table 1. Comparison of publicly available RIR datasets. #RIRs indicates the number of RIRs times the number of channels. #Rooms indicates the number of rooms times the number of configurations. The last row corresponds to the introduced dataset.

| Name | #RIRs | #Rooms | #Source Positions | #Receiver Positions | T60 (s) | Microphone |
|---|---|---|---|---|---|---|
| C4DM [14] | 468 × 1, 468 × 4 | 3 × 1 | 1 | 130, 169 | 1.7–2.7 | Omnidirectional, B-format array |
| Hadad [15] | 234 × 8 | 1 × 3 | 26 | 3 | 0.16–0.61 | 8-element linear array |
| ACE [16] | 14 × {2,3,5,8,32} | 7 × 1 | 1 | 2 | 0.34–1.25 | {2,3,5,8,32}-element arrays of different configurations |
| BUT ReverbDB [17] | 1426 × 1 | 9 × 1 | 3–10 | 31 | 0.59–1.85 | 2–10 omnidirectional microphones of several mountings |
| dEchorate [18] | 60 × 30 | 1 × 10 | 6 | 1 | 0.18–0.75 | 6 5-element linear array |
| MeshRIR [19] | 3969 × 1, 441 × 32 | 1 × 1 | 1, 32 | 3969, 441 | 0.19–0.38 | Omnidirectional |
| Motus [12] | 3320 × 32 | 1 × 830 | 4 | 1 | 0.5–1.5 | em32 Eigenmike |
| Arni [20] | 26560 × 1 | 1 × 5312 | 1 | 5 | 0.25–1.5 | Omnidirectional |
| BBC Maida Vale [13] | 1586 × 32 | 2 × 1 | 3–7 | 4–45 | 0.7–0.9 | em32 Eigenmike |
| RSoANU | 396 × 4, 396 × 32 | 1 × 2 | 3 | 65–67 | 0.6–0.9 | B-format array, em32 Eigenmike |
Table 2. Mean and standard deviation of reverberation time (T20), clarity (C80), definition (D50), and center time (Ts) calculated using RIRs across the receiver region.

| Parameter | Felt Panels, S1 | Felt Panels, S2 | Felt Panels, S3 | Wood Panels, S1 | Wood Panels, S2 | Wood Panels, S3 |
|---|---|---|---|---|---|---|
| T20 (s) | 0.74 ± 0.03 | 0.74 ± 0.04 | 0.73 ± 0.03 | 0.84 ± 0.03 | 0.84 ± 0.04 | 0.83 ± 0.04 |
| C80 (dB) | 8.20 ± 8.64 | 7.91 ± 6.6 | 7.94 ± 5.65 | 6.83 ± 5.62 | 6.88 ± 5.43 | 6.84 ± 4.27 |
| D50 (%) | 70.40 ± 8.45 | 69.98 ± 8.98 | 71.27 ± 8.04 | 66.39 ± 8.48 | 66.17 ± 9.39 | 67.39 ± 8.24 |
| Ts (ms) | 40.47 ± 10.73 | 41.58 ± 11.91 | 40.56 ± 11.23 | 46.44 ± 11.08 | 47.31 ± 12.72 | 46.13 ± 11.66 |
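For readers who wish to recompute quantities like those in Table 2 from a single RIR, the following is a minimal broadband sketch of the standard ISO 3382-1 definitions [31], with T20 taken from a line fit to the Schroeder backward-integrated decay curve between −5 dB and −25 dB. It is not the authors' processing chain. It omits octave-band filtering, assumes the RIR has already been trimmed to start at the direct sound, and applies no noise-floor compensation. The function names `energy_parameters` and `t20` are illustrative.

```python
import numpy as np


def energy_parameters(rir, fs):
    """Return (C80 in dB, D50 in %, Ts in ms) for a single-channel RIR.

    Assumes rir[0] is the arrival of the direct sound.
    """
    h2 = np.asarray(rir, dtype=float) ** 2
    t = np.arange(h2.size) / fs
    total = h2.sum()
    n50 = int(round(0.050 * fs))  # samples in the first 50 ms
    n80 = int(round(0.080 * fs))  # samples in the first 80 ms
    early80 = h2[:n80].sum()
    c80 = 10.0 * np.log10(early80 / (total - early80))  # early-to-late ratio
    d50 = 100.0 * h2[:n50].sum() / total                # early-to-total ratio
    ts = 1000.0 * (t * h2).sum() / total                # energy center of gravity
    return c80, d50, ts


def t20(rir, fs):
    """Estimate T20 in seconds from the Schroeder energy decay curve."""
    h2 = np.asarray(rir, dtype=float) ** 2
    edc = np.cumsum(h2[::-1])[::-1]            # backward integration
    edc = np.maximum(edc, np.finfo(float).tiny)  # guard against log10(0)
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(h2.size) / fs
    fit = (edc_db <= -5.0) & (edc_db >= -25.0)  # evaluation range for T20
    slope, _ = np.polyfit(t[fit], edc_db[fit], 1)  # decay rate in dB/s
    return -60.0 / slope                        # extrapolate to a 60 dB decay


if __name__ == "__main__":
    # Synthetic check: an ideal exponential decay with a 0.8 s reverberation time.
    fs = 8000
    time = np.arange(2 * fs) / fs
    rir = np.exp(-3.0 * np.log(10.0) * time / 0.8)
    print(t20(rir, fs), energy_parameters(rir, fs))
```

To obtain per-band values such as those plotted against frequency, each RIR would first be passed through an octave-band filter bank, and the two functions would then be applied to every band separately.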

Share and Cite

MDPI and ACS Style

Chesworth, G.; Bastine, A.; Abhayapala, T. Room Impulse Response Dataset of a Recording Studio with Variable Wall Paneling Measured Using a 32-Channel Spherical Microphone Array and a B-Format Microphone Array. Appl. Sci. 2024, 14, 2095. https://doi.org/10.3390/app14052095

AMA Style

Chesworth G, Bastine A, Abhayapala T. Room Impulse Response Dataset of a Recording Studio with Variable Wall Paneling Measured Using a 32-Channel Spherical Microphone Array and a B-Format Microphone Array. Applied Sciences. 2024; 14(5):2095. https://doi.org/10.3390/app14052095

Chicago/Turabian Style

Chesworth, Grace, Amy Bastine, and Thushara Abhayapala. 2024. "Room Impulse Response Dataset of a Recording Studio with Variable Wall Paneling Measured Using a 32-Channel Spherical Microphone Array and a B-Format Microphone Array" Applied Sciences 14, no. 5: 2095. https://doi.org/10.3390/app14052095
