Article

Power Response and Modal Decay Estimation of Room Reflections from Spherical Microphone Array Measurements Using Eigenbeam Spatial Correlation Model

by Amy Bastine *,†, Thushara D. Abhayapala and Jihui (Aimee) Zhang
Audio & Acoustic Signal Processing Group, The Australian National University, Canberra 2601, Australia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2021, 11(16), 7688; https://doi.org/10.3390/app11167688
Submission received: 30 June 2021 / Revised: 9 August 2021 / Accepted: 19 August 2021 / Published: 21 August 2021
(This article belongs to the Special Issue Advances in Architectural Acoustics)

Featured Application

Room Mode Analysis.

Abstract

Modal decays and modal power distribution in acoustic environments are key factors in deciding the perceptual quality and performance accuracy of audio applications. This paper presents the application of the eigenbeam spatial correlation method in estimating the time-frequency-dependent directional reflection powers and modal decay times. The experimental results evaluate the application of the proposed technique for two rooms with distinct environments using their room impulse response (RIR) measurements recorded by a spherical microphone array. The paper discusses the classical concepts behind room mode distribution and the reasons behind their complex behavior in real environments. The time-frequency spectrum of room reflections, the dominant reflection locations, and the directional decay rates emulate a realistic response with respect to the theoretical expectations. The experimental observations prove that our model is a promising tool in characterizing early and late reflections, which will be beneficial in controlling the perceptual factors of room acoustics.

1. Introduction

In any enclosed acoustic space, the sound received by a listener is the superposition of the direct sound from the source and the reflected sounds from the surrounding surfaces. The numerous reflections termed reverberation cause persistence of sound even after the source ceases, until these reflected waves decay due to absorption by the surrounding surfaces. The intricate sound field generated by these reflected waves provides the sense of acoustic space to the perceived sound. However, severe reverberation can cause spectral distortions and reduce speech intelligibility. The study of reverberation is complicated since it is a product of many factors like sound frequency, room shape, room size, room geometry, source and receiver locations, source and receiver directivity, etc. A comprehensive understanding of the reflection sound field distribution, resonant frequencies, and modal decay rates is necessary to control audible artifacts and achieve desired sound perception quality in room acoustic applications.
Initially, objective parameters such as reverberation time, percentage articulation (PA) [1], decay rates [2], and statistical measures of room impulse responses (RIR) [3] were the only measures of reverberation. However, later studies [4,5] found that these measures vary with the sound frequency and wall surface properties. This necessitated the frequency-dependent spatio-temporal analysis of sound fields for accurate characterization of room acoustics. The existing 3D room acoustic parameter estimation methods either depend on predictions based on computational acoustics or derive the parameters directly from real sound field measurements. Room acoustic analysis using prominent computational models such as ray/geometrical [6,7], wave/element [8], statistical energy [9], or synthetic RIR [10,11] methods is computationally complex and applicable only to limited frequency ranges. The lack of proper consideration of the source and environment factors, frequency-dependent wave behavior, and precise reflection methods reduces the estimation accuracy of these computational approaches, especially in highly reverberant environments [12]. Furthermore, the analysis of intermediate frequencies using these computational models is complicated because of the dominant diffraction effects and the influence of both wave and ray acoustic behaviors.
The characterization of real acoustic environments requires 3D acoustic scene analysis using spatial sound field measurements. This led to the development of several microphone array designs [13,14,15] and processing methods like sound intensity mapping [16], plane-wave decomposition (PWD) and steered beamforming [17,18,19], sound intensity vector analysis [20], and multi-channel correlation model [21]. Gover et al. used PWD beamforming in [18] to estimate the angular distribution and anisotropy index of the spatial sound field from the RIRs recorded by a spherical microphone array. The recent works in [22,23,24] allow similar analysis in terms of isotropy measures and directional energy decays using Schroeder integration [25] and PWD of directional RIRs. However, these methods require a large number of RIR measurements for an accurate analysis of the room acoustic field. This problem was overcome with the introduction of higher-order spherical harmonic (eigenbeam)-based processing of spherical microphone array measurements [12,26,27,28], which provided higher spatial resolution for analysis compared to the previous methods. Subsequently, more robust techniques [29,30,31] were developed to achieve efficient parameterization of the spatial sound field using modal decomposition. In [32], the eigenbeam rotational invariance technique (EB-ESPRIT) was used to identify room modes and damping parameters from RIRs. In [33,34], Samarasinghe et al. used the spatial correlation of higher-order eigenbeams to estimate the directional characteristics of the reverberant field, and this approach was able to achieve an accurate estimation of direct-to-reverberant energy ratio and dominant reflection directions.
The majority of the existing methods of directional characterization of room reflections derive the parameters from the aggregate sound field formed by the direct and reflected waves. Even though the direct path can be removed from the RIRs, the spatial resolution for directional analysis will be limited by the number of microphones. Moreover, a fine-scale separation of the spatial components of the direct path and reflected path is difficult without the knowledge of the source directivity. Additionally, the lack of incorporation of frequency-dependent surface reflectivities with distinct decay times can cause severe errors in the reflected sound field power distribution estimated by the existing methods [18,24,32]. Hence, a competent room characterization tool should integrate the frequency, time, and spatial dependencies in the formulation of the reflected sound field.
In this paper, we utilize the spatial correlation of higher-order eigenbeams to estimate the directional power response of room reflections by processing the RIR measurements. The proposed technique further facilitates room mode analysis and directional decay rate estimation. In comparison to the previous version of this method in [33,34], we model the reflection power as a function of time, frequency, and direction for comprehending the influence of frequency-dependent wall absorption properties of the room surfaces. This method allows the estimation of the directional features of reflections with higher spatial resolution independent of the direct sound component. The room mode features, directional decay rates and dominant reflection locations generated from the proposed tool can serve many applications like room response equalization, acoustic treatment design, architectural design simulations, room geometry inference, auralization of historic buildings, archaeoacoustics, and other machine hearing technologies.
The remainder of this paper is organized as follows: Section 2 discusses the formulation and implementation procedure of the eigenbeam spatial correlation model for estimating the reflection power response. Section 3 presents the experimental results including the time-frequency spectrum of reflection power, directional decay rates, and dominant reflection directions. Section 4 concludes the paper with a summary of the key findings and mentions the future research plans.

2. Reflection Power Estimation Using Eigenbeam Spatial Correlation Model

In this section, we present the formulation and synthesis of reflection power as a function of time, frequency, and space in the spherical harmonics domain.

2.1. Problem Formulation

Consider a convex room with a single sound source and a spherical microphone array of radius $R$ with $Q$ omnidirectional microphones centered at a location $O$, as shown in Figure 1. Let the spherical coordinate $\mathbf{y}_o = (r_o, \theta_o, \phi_o)$ denote the sound source location with respect to $O$. Similarly, the $q$th microphone element is located at $\mathbf{x}_q = (R, \theta_q, \phi_q)$ for $q \in \{1, 2, \ldots, Q\}$. In this paper, all the elevation angles lie in $[0, \pi]$, measured downwards from the Z-axis, and the azimuth angles lie in $[0, 2\pi)$, measured counterclockwise from the X-axis.
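The angle convention above can be made concrete with a small conversion routine. The following is a minimal sketch, not part of the original method; the function name `spherical_to_cartesian` and the choice of degrees as input units are assumptions made for illustration.

```python
import math

def spherical_to_cartesian(r, theta_deg, phi_deg):
    """Convert (radius, elevation, azimuth) to Cartesian (x, y, z).

    Elevation is measured downwards from the +Z axis (0 to 180 degrees) and
    azimuth counterclockwise from the +X axis (0 to 360 degrees).
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z

# A source 1 m from the array center, in the XY plane at 40 degrees azimuth.
x, y, z = spherical_to_cartesian(1.0, 90.0, 40.0)
print(f"x={x:.3f} m, y={y:.3f} m, z={z:.3f} m")
```

With this convention, an elevation of 90 degrees always places a point in the XY plane, and an elevation of 0 degrees places it on the positive Z-axis regardless of azimuth.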
We treat the room as a linear time-invariant (LTI) acoustic transmission system whose dynamic behavior is represented by the RIRs derived from the spherical microphone array measurements. Let $H(\mathbf{x}_q, \mathbf{y}_o, t, k)$ be the room transfer function (RTF), between the source at $\mathbf{y}_o$ and the microphone element at $\mathbf{x}_q$, obtained from the short-time Fourier transform (STFT) of the RIR. Here, $t$ is the STFT temporal frame index and $k = 2\pi f / c$ is the wavenumber with $f$ and $c$ representing the frequency and speed of sound, respectively. Since the incident sound field at the receiver contains the direct sound and the reflections, we can decompose the RTF $H(\mathbf{x}_q, \mathbf{y}_o, t, k)$ as
$$H(\mathbf{x}_q, \mathbf{y}_o, t, k) = H_d(\mathbf{x}_q, \mathbf{y}_o, t, k) + H_r(\mathbf{x}_q, \mathbf{y}_o, t, k) \tag{1}$$
where $H_d(\mathbf{x}_q, \mathbf{y}_o, t, k)$ and $H_r(\mathbf{x}_q, \mathbf{y}_o, t, k)$ are the direct path and reflected path components, respectively.
Assuming that the distance between $\mathbf{y}_o$ and $\mathbf{x}_q$ is significantly larger than the aperture size of the microphone array, we can represent $H_d(\mathbf{x}_q, \mathbf{y}_o, t, k)$ and $H_r(\mathbf{x}_q, \mathbf{y}_o, t, k)$ as a composition of plane waves in the spatial domain as
$$H_d(\mathbf{x}_q, \mathbf{y}_o, t, k) = G_D(t, k \mid \mathbf{y}_o)\, e^{i k \hat{\mathbf{y}}_o \cdot \mathbf{x}_q} \tag{2}$$
$$H_r(\mathbf{x}_q, \mathbf{y}_o, t, k) = \int_{\hat{\mathbf{y}}} G_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)\, e^{i k \hat{\mathbf{y}} \cdot \mathbf{x}_q}\, d\hat{\mathbf{y}} \tag{3}$$
where $G_D(t, k \mid \mathbf{y}_o)$ is the direct path gain with respect to $O$, $\hat{\mathbf{y}}_o$ is the unit vector along the source direction, $i = \sqrt{-1}$, $G_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)$ is the gain of the reflected plane wave arriving from the direction $\hat{\mathbf{y}} = (1, \theta, \phi)$, and $\int_{\hat{\mathbf{y}}} d\hat{\mathbf{y}} = \int_0^{2\pi} \int_0^{\pi} \sin\theta \, d\theta \, d\phi$. Here, we have modeled the reflection gain $G_R$ as a non-isotropic directional distribution function that varies with frequency and time to represent a real room with inhomogeneous surfaces that have frequency-dependent wall impedance and damping coefficients.
By examining $E\{H_d H_d^*\}$ based on (2), where $E\{\cdot\}$ represents the statistical expectation operator, we can express the direct path power as
$$P_D(t, k \mid \mathbf{y}_o) = E\{ |G_D(t, k \mid \mathbf{y}_o)|^2 \} \tag{4}$$
where $|\cdot|$ denotes the absolute value. Similarly, by examining $E\{H_r H_r^*\}$ based on (3), we can write the power of the reflected sound field component incoming from the direction $\hat{\mathbf{y}}$ as
$$P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o) = E\{ |G_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)|^2 \}. \tag{5}$$
We aim to estimate the reflection power $P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)$ from the RTFs $H(\mathbf{x}_q, \mathbf{y}_o, t, k)\ \forall q$ obtained using a spherical microphone array. Since $P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)$ is a spherical function, we can simplify its estimation using the spherical harmonic decomposition [35] given by
$$P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o) = \sum_{v=0}^{\infty} \sum_{u=-v}^{v} \gamma_{vu}(t, k \mid \mathbf{y}_o)\, Y_{vu}(\hat{\mathbf{y}}) \tag{6}$$
where $\gamma_{vu}(t, k \mid \mathbf{y}_o)$ are the reflection power coefficients and $Y_{vu}(\cdot)$ is the spherical harmonic function of $v$th order and $u$th mode. Thus, we can calculate the reflection power for any incoming direction and time-frequency bin once we estimate the $\gamma_{vu}(t, k \mid \mathbf{y}_o)$ coefficients.

2.2. Methodology

For determining the $\gamma_{vu}(t, k \mid \mathbf{y}_o)$ coefficients, we utilize the spatial correlation of higher-order spherical harmonic (eigenbeam) coefficients of the incident sound field. The estimation of the reflection power response involves two main steps:
  • Step 1: Estimating spherical harmonic coefficients of the incident sound field
In this work, since we are interested in characterizing the room response independent of the source power spectrum, we assume a sound source emitting an impulse signal and treat $H(\mathbf{x}_q, \mathbf{y}_o, t, k)$ as the incident sound field on the spherical microphone array. For deducing the higher-order spherical harmonic coefficients of the incident sound field, we represent $H(\mathbf{x}_q, \mathbf{y}_o, t, k)$ as the spherical harmonic decomposition of the Helmholtz wave equation solution to the interior sound field problem [12] as
$$H(\mathbf{x}_q, \mathbf{y}_o, t, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \alpha_{nm}(t, k \mid \mathbf{y}_o)\, b_n(kR)\, Y_{nm}(\hat{\mathbf{x}}_q) \tag{7}$$
where $\alpha_{nm}(t, k \mid \mathbf{y}_o)$ are the modal coefficients of the spatial sound field, $\hat{\mathbf{x}}_q$ is the unit vector in the direction of the $q$th microphone, and
$$b_n(kR) = \begin{cases} j_n(kR) & \text{for an open array} \\ j_n(kR) - \dfrac{j_n'(kR)}{h_n'(kR)}\, h_n(kR) & \text{for a rigid array} \end{cases} \tag{8}$$
with $j_n(\cdot)$ and $h_n(\cdot)$ denoting the spherical Bessel and Hankel functions of order $n$, respectively, and $(\cdot)'$ representing the first derivative operation. From (7), we can estimate the $\alpha_{nm}(t, k \mid \mathbf{y}_o)$ coefficients using the orthogonal property of spherical harmonics [36] as
$$\alpha_{nm}(t, k \mid \mathbf{y}_o) = \frac{\sum_{q=1}^{Q} H(\mathbf{x}_q, \mathbf{y}_o, t, k)\, Y_{nm}^{*}(\hat{\mathbf{x}}_q)}{b_n(kR)} \tag{9}$$
where $(\cdot)^*$ denotes the complex conjugation operation. Practically, we truncate $\alpha_{nm}(t, k \mid \mathbf{y}_o)$ to an order $N$, such that $N = \lceil kR \rceil$ and $Q \geq (N+1)^2$, where $\lceil \cdot \rceil$ denotes the ceiling operation, to avoid errors due to spatial aliasing and the high-pass nature of higher-order Bessel functions [36].
  • Step 2: Estimating reflection gains using the spatial correlation model
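We can now estimate $\gamma_{vu}(t, k \mid \mathbf{y}_o)$ from the $\alpha_{nm}(t, k \mid \mathbf{y}_o)$ coefficients using the spatial correlation matrix expression [33] given by
$$
\underbrace{\begin{bmatrix} \Lambda_{00\,00} \\ \Lambda_{00\,1(-1)} \\ \vdots \\ \Lambda_{00\,NN} \\ \Lambda_{1(-1)\,00} \\ \vdots \\ \Lambda_{NN\,NN} \end{bmatrix}}_{\Lambda(t, k \mid \mathbf{y}_o)}
=
\underbrace{\begin{bmatrix}
\delta_{00\,00} & d_{00\,00}^{\,00} & \cdots & d_{00\,00}^{\,VV} \\
\delta_{00\,1(-1)} & d_{00\,1(-1)}^{\,00} & \cdots & d_{00\,1(-1)}^{\,VV} \\
\vdots & \vdots & \ddots & \vdots \\
\delta_{00\,NN} & d_{00\,NN}^{\,00} & \cdots & d_{00\,NN}^{\,VV} \\
\delta_{1(-1)\,00} & d_{1(-1)\,00}^{\,00} & \cdots & d_{1(-1)\,00}^{\,VV} \\
\vdots & \vdots & \ddots & \vdots \\
\delta_{NN\,NN} & d_{NN\,NN}^{\,00} & \cdots & d_{NN\,NN}^{\,VV}
\end{bmatrix}}_{B(k, \mathbf{y}_o)}
\times
\underbrace{\begin{bmatrix} P_D \\ \gamma_{00} \\ \gamma_{1(-1)} \\ \vdots \\ \gamma_{V(-V)} \\ \vdots \\ \gamma_{VV} \end{bmatrix}}_{\Omega(t, k \mid \mathbf{y}_o)}
\tag{10}
$$
where
$$\Lambda_{nm\,n'm'} = E\{ \alpha_{nm}(t, k \mid \mathbf{y}_o)\, \alpha_{n'm'}^{*}(t, k \mid \mathbf{y}_o) \} \tag{11}$$
$$\delta_{nm\,n'm'} = 16\pi^2\, i^{(n-n')}\, Y_{nm}^{*}(\hat{\mathbf{y}}_o)\, Y_{n'm'}(\hat{\mathbf{y}}_o) \tag{12}$$
$$d_{nm\,n'm'}^{\,vu} = 16\pi^2\, i^{(n-n')}\, (-1)^m \sqrt{\frac{(2v+1)(2n+1)(2n'+1)}{4\pi}}\; W_1 W_2 \tag{13}$$
with $W_1 = \begin{pmatrix} v & n & n' \\ 0 & 0 & 0 \end{pmatrix}$ and $W_2 = \begin{pmatrix} v & n & n' \\ u & -m & m' \end{pmatrix}$ representing the Wigner 3j symbols [37].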
The elements in $\Lambda(t, k \mid \mathbf{y}_o)$ and $B(k, \mathbf{y}_o)$ can be generated from the $\alpha_{nm}(t, k \mid \mathbf{y}_o)$ coefficients and source direction information, respectively. Now, we can solve (10) to estimate $\Omega(t, k \mid \mathbf{y}_o)$ by
$$\hat{\Omega}(t, k \mid \mathbf{y}_o) = B^{\dagger}(k, \mathbf{y}_o)\, \Lambda(t, k \mid \mathbf{y}_o) \tag{14}$$
where $\hat{[\cdot]}$ and $[\cdot]^{\dagger}$ indicate estimated values and the pseudo-inversion operator, respectively. While solving (14), the order of $\gamma_{vu}(t, k \mid \mathbf{y}_o)$ in $\hat{\Omega}(t, k \mid \mathbf{y}_o)$ is truncated to $V \leq \lfloor \sqrt{(N+1)^4 - 1} \rfloor$, where $\lfloor \cdot \rfloor$ indicates the flooring operation, to avoid an underdetermined system [34]. Once the $\gamma_{vu}(t, k \mid \mathbf{y}_o)$ coefficients are extracted from $\hat{\Omega}(t, k \mid \mathbf{y}_o)$, we can generate the reflection power using Equation (6) for different incoming directions $\hat{\mathbf{y}}$ and time-frequency bins. From $P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)$, we can estimate the total reflected power in any time-frequency bin as
$$P_T(t, k \mid \mathbf{y}_o) = \int_{\hat{\mathbf{y}}} P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)\, d\hat{\mathbf{y}}. \tag{15}$$
Substituting (6) in (15) and using the symmetrical property of spherical harmonics [35], we obtain $P_T(t, k \mid \mathbf{y}_o) = \gamma_{00}(t, k \mid \mathbf{y}_o)$. We can now use $P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)$ and $P_T(t, k \mid \mathbf{y}_o)$ to analyze the reflection power variations with time, frequency, and direction.
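The order selection in Step 1 and the size of the linear system in Step 2 can be sketched numerically. This is an illustrative sketch only: the helper names `truncation_order` and `correlation_system_size`, the speed of sound of 343 m/s, and the choice of taking the smaller of the two order limits are assumptions that the text does not specify.

```python
import math

def truncation_order(freq_hz, radius_m, num_mics, speed_of_sound=343.0):
    """Return the eigenbeam order N for one frequency bin.

    Combines N = ceil(k R) with the microphone-count limit Q >= (N + 1)^2.
    Taking the minimum of the two limits is an assumption of this sketch.
    """
    k = 2.0 * math.pi * freq_hz / speed_of_sound   # wavenumber in rad/m
    order_from_kr = math.ceil(k * radius_m)
    order_from_mics = math.isqrt(num_mics) - 1     # largest N with (N + 1)^2 <= Q
    return min(order_from_kr, order_from_mics)

def correlation_system_size(order_n, order_v):
    """Return (equations, unknowns) of the spatial correlation system."""
    equations = (order_n + 1) ** 4       # one row per (n, m, n', m') combination
    unknowns = 1 + (order_v + 1) ** 2    # P_D plus every gamma_vu up to order V
    return equations, unknowns

# Example with a 32-microphone array of radius 0.042 m.
for freq in (20.0, 500.0, 1500.0):
    n = truncation_order(freq, radius_m=0.042, num_mics=32)
    print(f"{freq:7.1f} Hz -> N = {n}, rows in Lambda = {(n + 1) ** 4}")
```

Comparing the two numbers returned by `correlation_system_size` shows directly whether a chosen pair of orders leaves the system with at least as many equations as unknowns.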

3. Experimental Analysis

In this section, we present the analysis of the reflection power response of two rooms from their RIR datasets recorded using an em32 Eigenmike [38], which is a $Q = 32$ element rigid spherical microphone array of radius $R = 0.042$ m. Both RIR datasets were measured using a source signal generated from a directional loudspeaker. The first RIR dataset, available from the work in [39], is for a small audio laboratory room of size 3.54 × 4.06 × 2.70 m, hereafter referred to as Room-1. The second RIR dataset, from [40], pertains to a larger classroom of size 6.5 × 8.3 × 2.9 m, hereafter referred to as Room-2. According to these datasets, the reverberation times ($T_{60}$) of Room-1 and Room-2 are 0.329 s and 1.12 s, respectively. From the datasets, we have selected the RIRs for different source positions in the XY plane, i.e., $\theta_o = 90^\circ$ at different $\phi_o$ angles, and at 1 m distance from the microphone array center. The direct path component from the source arrives at the receiver around 0.0026 s and 0.0028 s for Room-1 and Room-2, respectively.
From the selected 32-channel RIRs, we obtain $H(\mathbf{x}_q, \mathbf{y}_o, t, k)$ using the STFT operation with a 1024-sample Hanning window with 50% overlap, a 2048-point fast Fourier transform (FFT), and a 48 kHz sampling frequency. We then follow the process described in Section 2.2 to generate $P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)$ for 500 uniformly distributed $\hat{\mathbf{y}}$ directions derived from spiral-based sampling [41], for all $(t, k)$ bins in the frequency band of 20 to 1500 Hz. These 500 spiral-sampled directions provide sufficient spatial resolution to capture the sound reflectivity variations across the room surfaces at a reasonable computation cost. Finally, we estimate $P_T(t, k \mid \mathbf{y}_o)$ for analyzing the time-frequency spectrum of the reflection power of the two rooms. In the temporal responses of the following sections, 0 s on the time axis indicates the moment of sound event occurrence. However, the reflection power response is calculated only from 0.01 s, which is the center of the first STFT frame. This frame size was selected as a reasonable compromise between time and frequency resolution for the spectral and temporal analysis of reflections in both rooms.
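The time and frequency grid implied by these STFT settings can be checked with a few lines of arithmetic. This is a minimal sketch using only the parameters stated above; the variable names are illustrative, and the first frame is assumed to start at 0 s.

```python
SAMPLE_RATE_HZ = 48_000
WINDOW_SAMPLES = 1024
FFT_POINTS = 2048          # the window is zero-padded to twice its length
OVERLAP = 0.5
BAND_HZ = (20.0, 1500.0)

frame_duration_s = WINDOW_SAMPLES / SAMPLE_RATE_HZ
hop_s = frame_duration_s * (1.0 - OVERLAP)
first_frame_center_s = frame_duration_s / 2.0   # assumes the first frame starts at 0 s
bin_spacing_hz = SAMPLE_RATE_HZ / FFT_POINTS

# FFT bins whose center frequency falls inside the analysis band.
band_bins = [b for b in range(FFT_POINTS // 2 + 1)
             if BAND_HZ[0] <= b * bin_spacing_hz <= BAND_HZ[1]]
band_freqs_hz = [b * bin_spacing_hz for b in band_bins]

print(f"frame length  : {frame_duration_s * 1e3:.2f} ms")
print(f"hop size      : {hop_s * 1e3:.2f} ms")
print(f"first center  : {first_frame_center_s * 1e3:.2f} ms")
print(f"bin spacing   : {bin_spacing_hz} Hz")
print(f"bins in band  : {len(band_bins)}")
```

The bin spacing of 23.4375 Hz explains why the frequencies quoted in the following sections (for example 70 Hz, 141 Hz, 164 Hz, 211 Hz, and 281 Hz) fall close to integer multiples of this value.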

3.1. Theoretical Background

Here we discuss important theoretical concepts of room acoustics and room response characteristics according to prevalent literature [5,42,43,44] to validate the experimental analysis.

3.1.1. Modal Decay

The reverberation field inside a room leads to the persistence of sound even after the source ceases. The duration of this sound persistence, called the reverberation time $RT$ [5], is the most commonly used measure of room acoustic quality. In practical applications, acousticians calculate $RT$ as the time taken for the sound to decay by 60 dB after source cessation, which is referred to as $T_{60}$ [43]. Typically, such estimations assume diffuse sound field conditions and average wall absorption and calculate $RT$ as a single value to characterize the room acoustics. However, in reality, the wall absorption factors change with frequency [5,44], and hence accurate $RT$ estimates should be frequency-dependent. Furthermore, the room architecture, variations in surface materials, and source-receiver properties affect the reflection path length [44] and magnitude, which, in turn, influence the decay of different frequency components. Therefore, decay times should be a function of frequency and direction. Since an analytical solution for the decay rates is complex, we can derive them numerically through reflection sound field analysis.

3.1.2. Room Modes

The sound propagation in any acoustic enclosure follows different wave phenomena like reflection, scattering, diffraction, and interference. Such a complex interaction of innumerable waves is characterized through the acoustic wave equation [5]. The frequencies corresponding to the eigenvalues of the acoustic wave equation can form standing waves inside the room to create a resonant behavior leading to non-uniform distribution of reflection power and extended reverberation [5,43,44]. These frequencies are often referred to as room modes or eigenfrequencies.
According to [5,43], at low frequency ranges, the number of resonant frequencies will be small, and they can be excited individually. Hence, the room response will be quite irregular and anisotropic for these frequencies. When we move towards the higher frequencies, the eigenvalues are densely spaced, so they cannot be independently excited. Even though the higher frequencies contribute to the reflected sound pressure, the lack of independent resonance combined with increased scattering makes them relatively uniform and less prominent compared to the lower frequencies. Hence, in a typical room response, we expect high reflection powers with some resonant peaks for low (<300 Hz) to mid (300 to 600 Hz) audible frequencies and decaying magnitude towards the high (>600 Hz) frequencies. The cross-over frequency [5,43] that separates the resonant low-frequency response and the high-frequency diffused reflections is termed the Schroeder frequency ($\nu_S$). It can be calculated using the empirical formula
$$\nu_S \approx 2000 \sqrt{\frac{T_{60}}{\Delta}} \tag{16}$$
where $\Delta$ is the room volume. From the dimensions and $T_{60}$ of the test rooms, (16) gives $\nu_S \approx 184$ Hz and 169 Hz for Room-1 and Room-2, respectively.
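The two Schroeder frequency values can be reproduced from the room dimensions and reverberation times reported in Section 3. The sketch below is a direct evaluation of the empirical formula; the function name `schroeder_frequency` and the dictionary layout are illustrative choices.

```python
import math

def schroeder_frequency(t60_s, volume_m3):
    """Empirical Schroeder frequency in Hz for T60 in seconds and volume in m^3."""
    return 2000.0 * math.sqrt(t60_s / volume_m3)

rooms = {
    "Room-1": {"dims_m": (3.54, 4.06, 2.70), "t60_s": 0.329},
    "Room-2": {"dims_m": (6.5, 8.3, 2.9), "t60_s": 1.12},
}

for name, room in rooms.items():
    lx, ly, lz = room["dims_m"]
    volume = lx * ly * lz
    nu_s = schroeder_frequency(room["t60_s"], volume)
    print(f"{name}: volume = {volume:.1f} m^3, Schroeder frequency = {nu_s:.0f} Hz")
```

Because the formula depends only on the ratio of $T_{60}$ to volume, the larger but more reverberant Room-2 ends up with a cross-over frequency close to that of the smaller Room-1.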
For a rectangular enclosure, we can calculate the eigenvalues of the wave equation [5,42,43,44] as
$$\nu_{n_x n_y n_z} = \frac{c}{2} \sqrt{\left(\frac{n_x}{l_x}\right)^2 + \left(\frac{n_y}{l_y}\right)^2 + \left(\frac{n_z}{l_z}\right)^2} \tag{17}$$
where $\{n_x, n_y, n_z\}$ are non-negative integers and $l_x \times l_y \times l_z$ are the room dimensions. When two of $\{n_x, n_y, n_z\}$ equal zero, the solution of (17) gives the axial modes, which are considered to be stronger, with lower decay rates, than the other modes [42]. We can calculate the tangential modes with two non-zero integers in $\{n_x, n_y, n_z\}$ and the oblique modes by substituting all non-zero integers in $\{n_x, n_y, n_z\}$.
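The mode enumeration for a rectangular room can be sketched as follows. This is an illustrative implementation, not the authors' code: the function names and the speed of sound of 343 m/s are assumptions, and the nominal weighting of axial modes mentioned for Figure 2 is not reproduced because its values are not given.

```python
import math
from itertools import product

def mode_frequency(indices, dims_m, speed_of_sound=343.0):
    """Eigenfrequency in Hz of mode (n_x, n_y, n_z) in a rectangular room."""
    return 0.5 * speed_of_sound * math.sqrt(
        sum((n / l) ** 2 for n, l in zip(indices, dims_m)))

def mode_type(indices):
    """Classify a mode by its number of non-zero indices."""
    nonzero = sum(1 for n in indices if n != 0)
    return {1: "axial", 2: "tangential", 3: "oblique"}[nonzero]

def room_modes(dims_m, max_freq_hz, speed_of_sound=343.0):
    """List (frequency, indices, type) of all modes up to max_freq_hz, sorted."""
    # Along each axis, n / l cannot exceed 2 f_max / c, which bounds the index.
    limits = [int(2.0 * max_freq_hz * l / speed_of_sound) for l in dims_m]
    modes = []
    for indices in product(*(range(limit + 1) for limit in limits)):
        if indices == (0, 0, 0):
            continue  # the all-zero index set is not a mode
        freq = mode_frequency(indices, dims_m, speed_of_sound)
        if freq <= max_freq_hz:
            modes.append((freq, indices, mode_type(indices)))
    return sorted(modes)

# Modes of a 3.54 x 4.06 x 2.70 m room below 100 Hz.
for freq, indices, kind in room_modes((3.54, 4.06, 2.70), 100.0):
    print(f"{freq:6.1f} Hz  {indices}  {kind}")
```

Different index combinations can produce the same or nearly the same frequency, which is why a histogram of the returned frequencies shows stacked resonances at some values.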
Figure 2 shows the room mode distribution in Room-1 and Room-2. The axial and tangential modes are calculated from (17), and the line heights in Figure 2 represent the number of resonances occurring at a frequency, since different $\{n_x, n_y, n_z\}$ combinations can result in the same $\nu_{n_x n_y n_z}$ frequency. The axial modes were given a higher nominal weight [44] while calculating this distribution due to their inherent prominence. Theoretically, an empty rectangular room of the same dimensions should replicate this trend in its frequency response. However, in a real room environment, the interference of normal modes of different decay rates [44] and the influence of inhomogeneous surfaces and source directivity violate the assumptions behind (17). Therefore, the real room response may vary from the predicted distribution.
For practical validation of the real acoustic phenomenon, we will use the power response generated using the proposed technique to identify the variations in the room mode distribution and modal decays compared to the above theoretical expectations.

3.2. Reflection Power Spectrum

Figure 3 and Figure 4 show the spectrogram of $P_T(t, k \mid \mathbf{y}_o)$ for different source positions in Room-1 and Room-2, respectively. For both rooms, the lower frequencies show some irregular peaks, and the reflection power of late reverberation clearly decays towards the higher frequencies as we predicted in Section 3.1.2. Additionally, the reflection power is maximum in the initial time instants, and then the power decays with time for all frequencies due to surface absorption. It should be noted that the power decay trend varies with frequency due to the frequency-dependent wall impedance property [5]. Apart from some magnitude variations, the time-frequency spectrum trend is maintained for all source positions in both rooms. In the following sections, we will analyze the reflection power variations with frequency and time in more detail.

3.2.1. Frequency Response of Reflection Power

Figure 5 and Figure 6 show the frequency response of time-averaged $P_T(t, k \mid \mathbf{y}_o)$ for different source positions in Room-1 and Room-2, respectively. These figures provide a clear view of the low-frequency peaks and the decay of power towards the higher frequencies. In Room-1, we can observe high powers around 164 Hz, 211 Hz, and 281 Hz before the onset of the power decay. Compared to Figure 2a, 164 Hz and 211 Hz are closer to the theoretical room modes, whereas many other predicted modes do not appear in the observed response in Figure 5. Similarly, some of the observed peaks in Room-2 around 164 Hz, 304 Hz, 328 Hz, and 492 Hz vary from the theoretical room mode estimates shown in Figure 2b. Additionally, the identification of $\nu_S$ is difficult from these responses, but it is clearly greater than the predicted $\nu_S$ values mentioned in Section 3.1.2. This discrepancy arises from the approximation in (16), which uses a frequency-averaged $T_{60}$, and from the influence of source directivity.
It should also be noted that there are no substantial variations in the frequency response of Room-2 for different source positions. Additionally, in Room-1, the differences are not as drastic as would be expected in a smaller room with significant reverberation. This is the result of the formulation of reflection gains with respect to a common listening position ($O$) and the separation of the direct path component from the reflections. A direct analysis of the frequency response of the RIR will show significant differences with the change in source positions. Therefore, the proposed technique can be used to predict the room response behavior independent of the source positions.

3.2.2. Temporal Response of Reflection Power

Figure 7 and Figure 8 show the temporal response of $P_T(t, k \mid \mathbf{y}_o)$ at different frequencies for different source positions in Room-1 and Room-2, respectively. As evident from these figures, the reflection power decays due to surface absorption, and the decay trend is similar for all source positions. Since the damping constants of room surfaces are frequency-dependent, each frequency in Figure 7 and Figure 8 decays at a different rate. The lower frequencies like 70 Hz, 141 Hz, and 211 Hz have slower decay rates compared to the other frequencies. As we move from 281 Hz to 633 Hz in Figure 7 and Figure 8, the decay rate stabilizes towards the higher frequencies. Furthermore, the decay of higher frequencies is nearly linear, whereas the lower frequencies (70 Hz to 211 Hz) exhibit a non-linear decay, especially in Room-2. This can be attributed to the highly non-uniform power distribution of the lower frequency resonant modes, which leads to the concentration of sound absorption on certain surfaces [42,43]. In comparison, the high frequencies have a more diffuse distribution of reflection power, and hence the decay behavior is averaged over broader surface areas.

3.3. Decay Time

From the time-frequency spectrum of reflection power, we can estimate the decay time of each frequency to predict the strong room modes in a real room environment. Figure 9 and Figure 10 show the 60 dB decay time of each frequency estimated from the $P_T(t, k \mid \mathbf{y}_o)$ values for different source positions in Room-1 and Room-2, respectively. Even though the temporal response at each frequency in Figure 7 and Figure 8 seems relatively independent of the source positions, the decay times of the frequencies are slightly different for each source position according to Figure 9 and Figure 10. The average decay time, maximum decay time, and the corresponding frequency for each source position in both rooms are summarized in Table 1. We can say that the strongest modes in Room-1 are ≈140 Hz, ≈164 Hz, and ≈258 Hz, which are close to the peak power frequencies observed in Figure 5. However, in Room-2, the frequencies with maximum decay time are different from the frequencies with maximum reflection power. Hence, we need a deeper insight into the directional variations of power and decay time, which we will analyze in the next section.
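One simple way to obtain a 60 dB decay time from the power values of a single frequency bin is to fit a straight line to the level in decibels and extrapolate it to a 60 dB drop. The text does not state which estimator was used, so the least-squares fit below, the function name `decay_time_60db`, and the synthetic test signal are all assumptions made for illustration.

```python
import math

def decay_time_60db(times_s, power):
    """Estimate the 60 dB decay time from power samples of one frequency bin.

    Fits a least-squares line to 10*log10(power) versus time and returns the
    time that line needs to fall by 60 dB.
    """
    levels_db = [10.0 * math.log10(p) for p in power]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_l = sum(levels_db) / n
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, levels_db))
             / sum((t - mean_t) ** 2 for t in times_s))   # dB per second
    if slope >= 0.0:
        raise ValueError("power does not decay over the given frames")
    return -60.0 / slope

# Synthetic check: frames spaced by a 512-sample hop at 48 kHz, with power
# that falls by exactly 60 dB every 0.329 s.
hop_s = 512 / 48_000
times = [0.01 + i * hop_s for i in range(20)]
power = [10.0 ** (-6.0 * t / 0.329) for t in times]
print(f"estimated decay time: {decay_time_60db(times, power):.3f} s")
```

A single-line fit is only a reasonable summary when the decay is close to linear in decibels; for the non-linear low-frequency decays described in Section 3.2.2, the result depends on which frames are included in the fit.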

3.4. Directional Decays and Dominant Reflection Directions

As we discussed in Section 3.1.1, decay times are a function of frequency and direction. Additionally, from Section 3.3, we found that the modes with higher decay times can be different from the modes with high reflection powers. Therefore, a more comprehensive analysis of the spatial spectrum of these reflections is necessary to identify room surfaces causing the observed behaviors for the frequencies of interest. Figure 11a,b shows the directional decay times of Room-1 for $\mathbf{y}_o = (1, 90^\circ, 40^\circ)$ and $\mathbf{y}_o = (1, 90^\circ, 120^\circ)$, respectively, obtained from the 60 dB decay time of $P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)$ in each $\hat{\mathbf{y}}$ direction. Figure 12a,b shows the directions with high reflection powers in Room-1 for $\mathbf{y}_o = (1, 90^\circ, 40^\circ)$ and $\mathbf{y}_o = (1, 90^\circ, 120^\circ)$, respectively. The letters indicated near the locations of highest reflection powers in Figure 12 are coarsely mapped onto the real Room-1 environment in Figure 13. As evident from this figure, the locations around ‘A’, ‘C’, ‘D’, and ‘E’ have glass surfaces with high reflectivity, and hence the observed dominant power directions are valid. Furthermore, there is no evident pattern between the distributions in Figure 11 and Figure 12 for the given modal frequencies, and hence the feature predictions based on computational room acoustic models can be imprecise. In such cases, we can employ the proposed technique to reproduce authentic spatio-temporal room responses.
According to Figure 11 and Figure 12, the directions of high decay times and dominant reflections are different from each other for every frequency and source position. Even though the dominant reflection locations and directional decay distribution have many common factors of influence, the reflection power in a direction strongly depends on the source directivity and source-to-wall distance, whereas the directional decay is mainly a function of the wall impedance coefficients and reflection paths. Hence, as seen in Figure 11, the directional decay will be different between the frequencies due to wall impedance variations, as well as for different source positions due to the change in reflection path. In contrast, if we observe Figure 12 and Figure 13, the dominant reflection locations ‘A’, ‘B’, and ‘C’ have similar azimuth values and source-to-wall distance when the source is at $\mathbf{y}_o = (1, 90^\circ, 40^\circ)$. Likewise, the elevation values of the dominant reflection locations ‘D’ and ‘E’ in Figure 12b are nearly the same when the source is at $\mathbf{y}_o = (1, 90^\circ, 120^\circ)$. Additionally, the location of ‘F’ is in the close vicinity of the source position. Thus, the dominant reflection locations are principally determined by the source position and source directivity. For locations with the same source-to-wall distance, the dominant reflections will depend on the reflectivity of the surface materials.
Based on the above observations, the analysis of both directional decay and directional power is essential in characterizing the room reflections. This is particularly important while managing the features of early reflections and late reverberations to achieve desired perception quality. Since the early reflections undergo very few boundary reflections [45], they are mainly defined by the source directivity and source-to-wall distance. Hence, we can use the dominant reflection directions to characterize the behavior of early reflections. The late reverberation undergoes multiple boundary reflections, and they are integrated both spatially and temporally before reaching the receiver [45]. Since the late reverberation characteristics are primarily characterized by the surface absorption and room shape [45,46], we can analyze the directional decay rates to study their behavior. We can further visualize the power spectrum of $P_R(t, k, \hat{\mathbf{y}} \mid \mathbf{y}_o)$ across time for an extensive analysis of the variations in the anisotropic spatial properties between the early reflections and late reverberations.
The precise knowledge of frequencies and surfaces contributing to the salient features of these reflections will be useful for defining the perceptual targets for modal control methods [47], optimizing room mode redistribution to improve acoustic quality [48], and devising active [49] and passive [50,51] room acoustic treatment methods.

4. Conclusions

In this paper, we presented a reflection power response estimation technique that uses the spatial correlation of higher-order eigenbeams derived from spherical microphone array measurements. Formulating the reflection gain as a function of time, frequency, and direction yields a faithful description of the room response in a realistic non-diffuse sound field. The experimental results validate the frequency response and temporal response of the reflection power against the theoretical expectations.
The proposed technique can estimate the resonant frequencies and modal decays caused by directional speakers and complex room environments. Furthermore, the directional decay times and dominant reflection directions help to distinguish the features of early and late reflections. The insights from this room acoustic evaluation technique will be beneficial in controlling the acoustic quality when designing performance spaces. In particular, the findings from this method will be more reliable than computational room models when deciding on acoustic treatment schemes compatible with the source directivity. Additionally, the room mode features identified by this method can be incorporated into spectral equalization algorithms to improve speech intelligibility and remove audible artifacts. The dominant reflection locations and the directional decay spectrum can aid in inferring the room geometry and in calibrating the room acoustics in virtual reality-based rendering of heritage sites.
The method can also be adapted for blind estimation of the discussed characteristics by directly processing microphone recordings of an arbitrary source signal, since the reflected power can be separated from the direct-path power. Moreover, apart from spherical microphone arrays, any array design that can generate accurate spatial sound field coefficients can be integrated with the proposed algorithm. Future work will extend the method to multiple sources in noisy environments to address more real-world applications.

Author Contributions

Conceptualization, A.B., T.D.A. and J.Z.; Methodology, A.B., T.D.A. and J.Z.; Software, A.B.; Formal analysis, A.B.; Investigation, A.B.; Validation, A.B., T.D.A. and J.Z.; Resources, A.B., T.D.A. and J.Z.; Writing—original draft preparation, A.B.; Writing—review and editing, T.D.A. and J.Z.; Visualization, A.B.; Supervision, T.D.A. and J.Z.; Project administration, T.D.A.; Funding acquisition, T.D.A.; All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Australian Research Council (ARC) Discovery Project Grant No. DP180102375.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RIR        Room impulse response
PWD        Plane-wave decomposition
EB-SPRIT   Eigenbeam rotational invariance technique
LTI        Linear time invariant
RTF        Room transfer function
STFT       Short time Fourier transform
FFT        Fast Fourier transform

References

  1. Morse, P.M.; Bolt, R.H. Sound waves in rooms. Rev. Mod. Phys. 1944, 16, 69.
  2. Karjalainen, M.; Antsalo, P.; Makivirta, A.; Valimaki, V.; Peltonen, T. Estimation of Modal Decay Parameters from Noisy Response Measurements. J. Audio Eng. Soc. 2002, 50, 5290.
  3. Stewart, R.; Sandler, M. Statistical measures of early reflections of room impulse responses. In Proceedings of the Conference Digital Audio Effects (DAFx-07), Bordeaux, France, 10–15 September 2007; pp. 59–62.
  4. Long, M. Architectural Acoustics; Elsevier: Amsterdam, The Netherlands, 2005.
  5. Kuttruff, H. Room Acoustics, 6th ed.; CRC Press: Boca Raton, FL, USA, 2017.
  6. Allen, J.B.; Berkley, D.A. Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 1979, 65, 943–950.
  7. Lehmann, E.A.; Johansson, A.M. Prediction of energy decay in room impulse responses simulated with an image-source model. J. Acoust. Soc. Am. 2008, 124, 269–277.
  8. Hamilton, B. Finite Difference and Finite Volume Methods for Wave-Based Modelling of Room Acoustics. 2016. Available online: https://www.researchgate.net/profile/Brian-Hamilton-5/publication/310902744_Finite_Difference_and_Finite_Volume_Methods_for_Wave-based_Modelling_of_Room_Acoustics/links/583acf2a08ae3a74b4a01683/Finite-Difference-and-Finite-Volume-Methods-for-Wave-based-Modelling-of-Room-Acoustics.pdf (accessed on 27 May 2021).
  9. Lyon, R.H. Theory and Application of Statistical Energy Analysis; Elsevier: Amsterdam, The Netherlands, 2014.
  10. Kim, H.; Hernaggi, L.; Jackson, P.J.; Hilton, A. Immersive Spatial Audio Reproduction for VR/AR Using Room Acoustic Modelling from 360° Images. In Proceedings of the Virtual Reality 3D User Interfaces, Osaka, Japan, 23–27 March 2019; pp. 120–126.
  11. Remaggi, L.; Neidhardt, A.; Hilton, A.; Philip, J.B.J. Perceived quality and spatial impression of room reverberation in VR reproduction from measured images and acoustics. In Proceedings of the 23rd International Congress Acoustics, Aachen, Germany, 9–13 September 2019.
  12. Samarasinghe, P. Modal based Solutions for the Acquisition and Rendering of Large Spatial Soundfields. Ph.D. Thesis, College of Engineering and Computer Science, Australian National University, Canberra, Australia, 2014.
  13. Schroeder, M.R. Measurement of sound diffusion in reverberation chambers. J. Acoust. Soc. Am. 1959, 31, 1407–1414.
  14. Broadhurst, A. An acoustic telescope for architectural acoustic measurements. Acta Acust. United Acust. 1980, 46, 299–310.
  15. Yamasaki, Y.; Itow, T. Measurement of spatial information in sound fields by closely located four point microphone method. J. Acoust. Soc. Jpn. (E) 1989, 10, 101–110.
  16. Merimaa, J.; Lokki, T.; Peltonen, T.; Karjalainen, M. Measurement, Analysis, and Visualization of Directional Room Responses. In Proceedings of the Audio Engineering Society Convention, New York, NY, USA, 21–24 September 2001.
  17. Ward, D.B.; Abhayapala, T.D. Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Trans. Speech Audio Process. 2001, 9, 697–707.
  18. Gover, B.N.; Ryan, J.G.; Stinson, M.R. Measurements of directional properties of reverberant sound fields in rooms using a spherical microphone array. J. Acoust. Soc. Am. 2004, 116, 2138–2148.
  19. Park, M.; Rafaely, B. Sound-field analysis by plane-wave decomposition using spherical microphone array. J. Acoust. Soc. Am. 2005, 118, 3094–3103.
  20. Tervo, S.; Korhonen, T.; Lokki, T. Estimation of reflections from impulse responses. Build. Acoust. 2011, 18, 159–173.
  21. Hioka, Y.; Niwa, K.; Sakauchi, S.; Furuya, K.; Haneda, Y. Estimating Direct-to-Reverberant Energy Ratio Using D/R Spatial Correlation Matrix Model. IEEE Trans. Audio Speech Lang. Process. 2011, 19, 2374–2384.
  22. Alary, B.; Massé, P.; Välimäki, V.; Noisternig, M. Assessing the anisotropic features of spatial impulse responses. In Proceedings of the EAA Spatial Audio Signal Processing Symposium, Paris, France, 6–7 September 2019; pp. 43–48.
  23. Nolan, M.; Berzborn, M.; Fernandez-Grande, E. Isotropy in decaying reverberant sound fields. J. Acoust. Soc. Am. 2020, 148, 1077–1088.
  24. Berzborn, M.; Nolan, M.; Fernandez-Grande, E.; Vorländer, M. On the directional properties of energy decay curves. In Proceedings of the 23rd International Congress Acoustics, Aachen, Germany, 9–13 September 2019.
  25. Schroeder, M.R. New method of measuring reverberation time. J. Acoust. Soc. Am. 1965, 37, 1187–1188.
  26. Abhayapala, T.D.; Ward, D.B. Theory and design of high order sound field microphones using spherical microphone array. In Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, 13–17 May 2002; Volume 2, pp. 1949–1952.
  27. Poletti, M.A. Three-dimensional surround sound systems based on spherical harmonics. J. Audio Eng. Soc. 2005, 53, 1004–1025.
  28. Lovedee-Turner, M.; Murphy, D. Three-dimensional reflector localisation and room geometry estimation using a spherical microphone array. J. Acoust. Soc. Am. 2019, 146, 3339–3352.
  29. Rafaely, B.; Balmages, I.; Eger, L. High-resolution plane-wave decomposition in an auditorium using a dual-radius scanning spherical microphone array. J. Acoust. Soc. Am. 2007, 122, 2661–2668.
  30. Rafaely, B.; Peled, Y.; Agmon, M.; Khaykin, D.; Fisher, E. Spherical Microphone Array Beamforming. In Speech Processing in Modern Communication: Challenges and Perspectives; Cohen, I., Benesty, J., Gannot, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 281–305.
  31. Sun, H.; Mabande, E.; Kowalczyk, K.; Kellermann, W. Joint DOA and TDOA estimation for 3D localization of reflective surfaces using eigenbeam MVDR and spherical microphone arrays. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 113–116.
  32. Kereliuk, C.; Herman, W.; Wedelich, R.; Gillespie, D.J. Modal analysis of room impulse responses using subband ESPRIT. In Proceedings of the International Conference Digital Audio Effects (DAFx-18), Aveiro, Portugal, 4–8 September 2018.
  33. Samarasinghe, P.N.; Abhayapala, T.D.; Chen, H. Estimating the Direct-to-Reverberant Energy Ratio Using a Spherical Harmonics-Based Spatial Correlation Model. IEEE/ACM Trans. Audio Speech Lang. Process. 2016, 25, 310–319.
  34. Samarasinghe, P.N.; Abhayapala, T.D. Blind estimation of directional properties of room reverberation using a spherical microphone array. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 351–355.
  35. Williams, E.G. Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography; Academic Press: Cambridge, MA, USA, 1999.
  36. Rafaely, B. Fundamentals of Spherical Array Processing; Springer: Berlin/Heidelberg, Germany, 2015; Volume 8.
  37. Olver, F.W.; Lozier, D.W.; Boisvert, R.F.; Clark, C.W. NIST Handbook of Mathematical Functions; Cambridge University Press: New York, NY, USA, 2010.
  38. em32 Eigenmike® Microphone Array Release Notes (v17.0). Available online: https://mhacoustics.com/sites/default/files/ReleaseNotes.pdf (accessed on 6 April 2021).
  39. Birnie, L.I.; Abhayapala, T.D.; Samarasinghe, P.N. Reflection Assisted Sound Source Localization through a Harmonic Domain MUSIC Framework. IEEE/ACM Trans. Audio Speech Lang. Process. 2019, 28, 279–293.
  40. Olgun, O.; Hacihabiboglu, H. METU SPARG Eigenmike em32 Acoustic Impulse Response Dataset v0.1.0. Available online: http://doi.org/10.5281/zenodo.2635758 (accessed on 28 September 2020).
  41. Semechko, A. Suite of Functions to Perform Uniform Sampling of a Sphere. 2020. Available online: https://www.mathworks.com/matlabcentral/fileexchange/37004-suite-of-functions-to-perform-uniform-sampling-of-a-sphere (accessed on 20 July 2020).
  42. Cox, T.J.; D’Antonio, P.; Avis, M.R. Room sizing and optimization at low frequencies. J. Audio Eng. Soc. 2004, 52, 640–651.
  43. Crocker, M.J. Handbook of Noise and Vibration Control; John Wiley & Sons: Hoboken, NJ, USA, 2007.
  44. Everest, F.A. Master Handbook of Acoustics. J. Acoust. Soc. Am. 2001, 110, 1714–1715.
  45. Schimmel, S.M.; Muller, M.F.; Dillier, N. A fast and accurate “shoebox” room acoustics simulator. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 241–244.
  46. Izumi, Y.; Otani, M. Relation between Direction-of-Arrival distribution of reflected sounds in late reverberation and room characteristics: Geometrical acoustics investigation. Appl. Acoust. 2021, 176, 107805.
  47. Fazenda, B.M.; Stephenson, M.; Goldberg, A. Perceptual thresholds for the effects of room modes as a function of modal decay. J. Acoust. Soc. Am. 2015, 137, 1088–1098.
  48. Papadopoulos, C.I. Redistribution of the low frequency acoustic modes of a room: A finite element-based optimisation method. Appl. Acoust. 2001, 62, 1267–1285.
  49. Fazenda, B.; Wankling, M.; Hargreaves, J.; Elmer, L.; Hirst, J. Subjective preference of modal control methods in listening rooms. J. Audio Eng. Soc. 2012, 60, 338–349.
  50. Fuchs, H.; Lamprecht, J. Covered broadband absorbers improving functional acoustics in communication rooms. Appl. Acoust. 2013, 74, 18–27.
  51. Cox, T.; d’Antonio, P. Acoustic Absorbers and Diffusers: Theory, Design and Application; CRC Press: Boca Raton, FL, USA, 2016.
Figure 1. Geometric illustration of the spherical microphone array centered at the coordinate origin and the single sound source located at $y_o = (r_o, \theta_o, \phi_o)$.
Figure 2. Room mode distribution in (a) Room-1 (b) Room-2.
Figure 3. Reflection power response of Room-1 for different source positions.
Figure 4. Reflection power response of Room-2 for different source positions.
Figure 5. Reflection power with frequency for different source positions in Room-1.
Figure 6. Reflection power with frequency for different source positions in Room-2.
Figure 7. Reflection power with time for different frequencies and source positions in Room-1.
Figure 8. Reflection power with time for different frequencies and source positions in Room-2.
Figure 9. Decay time with frequency for different source positions in Room-1.
Figure 10. Decay time with frequency for different source positions in Room-2.
Figure 11. Directional decay times inside Room-1 for the peak frequencies when source is located at (a) $y_o = (1, 90, 40)$ (b) $y_o = (1, 90, 120)$.
Figure 12. Dominant reflection directions inside Room-1 for the peak frequencies when source is located at (a) $y_o = (1, 90, 40)$ (b) $y_o = (1, 90, 120)$.
Figure 13. Mapping of dominant reflection directions in Room-1. The letters A to C and D to F represent the directions of highest reflection powers with respect to Figure 12a,b, respectively.
Table 1. Maximum and average decay times in Room-1 and Room-2.

| Room | Source Position | Maximum Decay Time (s) | Frequency (Hz) with Maximum Decay Time | Average Decay Time (s) |
|---|---|---|---|---|
| Room-1 | $y_o = (1, 90, 40)$ | 0.4114 | 140 | 0.2822 |
| Room-1 | $y_o = (1, 90, 120)$ | 0.4540 | 164 | 0.2899 |
| Room-1 | $y_o = (1, 90, 200)$ | 0.4823 | 140 | 0.2995 |
| Room-1 | $y_o = (1, 90, 280)$ | 0.4398 | 258 | 0.2936 |
| Room-2 | $y_o = (1, 90, 0)$ | 1.1349 | 492 | 0.8133 |
| Room-2 | $y_o = (1, 90, 90)$ | 1.1066 | 328 | 0.8288 |
| Room-2 | $y_o = (1, 90, 180)$ | 1.0498 | 328 | 0.8341 |
| Room-2 | $y_o = (1, 90, 270)$ | 1.0640 | 586 | 0.8182 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Bastine, A.; Abhayapala, T.D.; Zhang, J. Power Response and Modal Decay Estimation of Room Reflections from Spherical Microphone Array Measurements Using Eigenbeam Spatial Correlation Model. Appl. Sci. 2021, 11, 7688. https://doi.org/10.3390/app11167688
