Stereophonic Microphone Array for the Recording of the Direct Sound Field in a Reverberant Environment

Gößwein, Jonathan Albert; Grosse, Julian; Van de Par, Steven

doi:10.3390/app7060541

by Jonathan Albert Gößwein *, Julian Grosse and Steven Van de Par

Acoustics Group, Cluster of Excellence “Hearing4All”, Carl von Ossietzky University, 26111 Oldenburg, Germany

* Author to whom correspondence should be addressed.

Appl. Sci. 2017, 7(6), 541; https://doi.org/10.3390/app7060541

Submission received: 15 March 2017 / Revised: 15 May 2017 / Accepted: 17 May 2017 / Published: 24 May 2017

(This article belongs to the Special Issue Spatial Audio)

Abstract

State-of-the-art stereo recording techniques using two microphones have two main disadvantages: first, a limited reduction of the reverberation in the direct sound component, and second, compression or expansion of the angular position of sound sources. To address these disadvantages, this study develops a true stereo recording microphone array that records the direct and reverberant sound fields separately. This array can be used within the recording and playback configuration developed in Grosse and van de Par, 2015. Instead of using only two microphones, the proposed method combines two logarithmically-spaced microphone arrays, whose directivity patterns are optimized with a superdirective beamforming algorithm. The optimization allows better control of the overall beam pattern and of the interchannel level differences. A comparison between the newly-proposed system and existing microphone techniques shows a lower percentage of recorded reverberance within the sound field.

Keywords: intensity-based stereo-recording; convex optimization; superdirective beamformer; white noise gain; logarithmic array design; spatial audio

1. Introduction

Sound reproduction systems play an important role in our everyday life. They allow us to listen to recordings from a different place and a past time. Many different methods for the recording and playback of sound exist, utilizing different combinations of microphone and loudspeaker setups. The most common one is a simple stereo reproduction, but there are more complex reproduction techniques, such as wave field synthesis [1] or ambisonics [2]. Even though the state-of-the-art methods achieve a very good accuracy in reproducing sound fields, they do not consider the interaction between the acoustics of the recording and playback environment. In particular, extra reverberation is created by the playback environment, and in addition, there is no control over the spatial distribution of the reverberant sound field, which may influence the apparent source width and perceived listener envelopment. For this reason, ongoing investigations aim to improve the performance of these methods.

In particular, Grosse and van de Par proposed a new way of recording and playing back sound fields [3]. The main idea behind their research was to record the direct and reverberant sound fields separately in order to be able to render them in a playback room while optimizing certain perceptually-motivated criteria for authentic audio reproduction. These criteria aim to recreate the reverberant sound field in the playback environment as faithfully as possible by optimizing the amount and spectral shape of the reverberation, as well as the interaural cross-correlation of the reproduced reverberant sound field as it is created in the reproduction room, including the reverberation added by that room. In their paper, Grosse and van de Par assumed that optimizing these perceptual criteria is sufficient for an authentic reproduction of the sound field present in the recording room, which is created by a single source. This claim was supported by subjective evaluations. The playback and recording configuration can be seen in Figure 1. In addition to the two basic stereo loudspeakers, the proposed approach used two dipole loudspeakers to excite and equalize the reverberant sound field. For the optimized rendering, the system relies on the presence of a relatively dry direct signal to be rendered on the frontal loudspeakers and a reverberant signal to be optimized and rendered on the dipole loudspeakers. To record the direct sound, a microphone (C) was positioned close to the sound source. This also avoided early reflections, which could cause a change in coloration [4,5]. For recording the reverberant sound field, two microphones (B^{l}, B^{r}) were placed at two distant positions in the diffuse field.

Since the method of Grosse and van de Par [3] has so far been limited to a single source and records the direct sound field with only one microphone, an extension is needed to also represent the spatial distribution of sources within the direct sound field signals as perceived at the listener position. Although this could in principle be achieved by using multiple close microphones and an appropriate mixing scheme, in this contribution, we want to provide a method with only a single ‘true-stereo’ microphone setup that is placed at the intended listener position within the recording room. Particular attention has to be paid to reducing the reverberant sound field in the direct sound field signals so that the rendering of the direct and reverberant sound fields can be optimized separately according to perceptual criteria within the playback room [3].

Although the proposed microphone array is designed for use in the audio reproduction system of Grosse and van de Par [3], it could also be used to record a relatively dry spatial image of the sound sources on stage, to be combined with a reverberant track that is mixed at a level that the recording engineer deems suitable. In this case, however, the result will not necessarily fulfill the optimization criteria formulated in Grosse and van de Par [3] that create a faithful audio reproduction.

State-of-the-art true stereo systems combine two microphones with a characteristic directivity pattern, placed at different distances and under different angles relative to one another. Depending on these parameters, a deviating spatial rendering of the distributed sources can be observed [6]. In addition, these systems record a high percentage of reverberant sound, which should be avoided in the system of [3] and makes them unsuitable for this specific sound reproduction system.

We overcome these disadvantages by developing a new true stereo microphone array, using a superdirective beamforming algorithm that is applied to two logarithmically-spaced microphone arrays. Correct, frequency-dependent interchannel level differences are captured by optimizing the shape of the two main lobes of the arrays. Together, they create the proper interchannel level difference required for an accurate spatial reproduction of the sound field while ensuring that no interchannel phase differences occur that can result in unintended changes in the perceived location of sound sources. Additionally, an optimal side lobe suppression is applied to reduce the influence of the reverberant sound field on the recording of the direct sound. The proposed stereo microphone array is compared to the state-of-the-art stereo microphone configurations mentioned earlier and shows a clearly reduced level of the reverberant sound field.

2. Methods

The following section is divided into five parts. Section 2.1 gives a brief introduction to the most relevant theory on beamforming needed for our proposed method. Section 2.2 focuses on the issue of the robustness of beamforming algorithms. The desired directivity pattern is specified in Section 2.3, which is based on a stereo intensity-panning rule related to the auditory processing of the interaural level differences. Section 2.4 introduces an optimal array design to suppress side lobes and, in this way, reduce the influence of the reverberant sound field on the recording of the direct sound. Further, a specific filter design is proposed in Section 2.5, which will be used and evaluated throughout this study. The design is based on a superdirective beamforming algorithm and describes how the directivity pattern that is specified in Section 2.3 can be used for the optimization.

2.1. Beamforming

Beamforming describes the process of shaping the directivity pattern of several microphones, which are arranged into an array, with signal processing techniques to obtain a specific, frequency-dependent directivity pattern. The directivity pattern b (f, ϕ) of a linear discrete microphone array, consisting of N microphones, is calculated as follows [7]:

b (f, ϕ) = \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} w_{n} (f) G_{n} (f, ϕ)

(1)

where ϕ denotes the angle ranging from - π to π, f the frequency, w_{n} (f) the frequency-dependent complex weighting filter applied to microphone n and G_{n} (f, ϕ) the steering vector denoting the direction- and frequency-dependent transfer function from the sound source to microphone n. Such a microphone array is illustrated in Figure 2.

Assuming a far field condition and microphones with an omnidirectional directivity pattern, the transfer function is:

G_{n} (f, ϕ) = e^{- i \frac{2 π f}{c} x_{n} cos (ϕ)}

(2)

where c is the speed of sound and x_{n} represents the distance of the n-th microphone to the center of the array [7].

The influence of the directivity patterns of the individual microphones in the array can be taken into account by changing the transfer function G_{n}. The filter optimization used to match the directivity pattern of the array with a desired one is called beamforming. The look direction of the microphone array is defined as the angle of the main lobe of the desired directivity pattern, which is also called the steering angle.

Some beamforming algorithms are based on an analytic solution for the optimal filter w_{n} (f), and others on a numerical approximation. Analytic solutions allow us to set N constraints on the directivity pattern for a finite number of frequencies, as for example described in [8]. Since our problem has a higher number of constraints, we use numerical methods, which can accommodate more constraints to control the directivity pattern.

Equation (1) will be solved numerically, and for this purpose, the frequency range is discretized into P frequencies f_{p}, p = 0, \dots, P - 1 and the angular range into M angles ϕ_{m}, m = 0, \dots, M - 1:

b (f_{p}, ϕ_{m}) = \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} w_{n} (f_{p}) G_{n} (f_{p}, ϕ_{m})

(3)

Equation (3) is reformulated in matrix notation as:

b_{m} (f_{p}) = G_{m n} (f_{p}) w_{n} (f_{p})

(4)

where the directivity pattern is an M \times 1 vector b_{m}^{T} (f_{p}) = [b (f_{p}, ϕ_{0}), b (f_{p}, ϕ_{1}), \dots, b (f_{p}, ϕ_{M - 1})], the transfer function an M \times N matrix {[G (f_{p})]}_{m n} = e^{- i \frac{2 π f_{p}}{c} x_{n} cos (ϕ_{m})} and the filter an N \times 1 vector w_{n} (f_{p}) = {[w_{- \frac{N - 1}{2}} (f_{p}), w_{- \frac{N - 3}{2}} (f_{p}), \dots, w_{\frac{N - 1}{2}} (f_{p})]}^{T} [7]. All bold variables are either vectors or matrices in the remainder of this manuscript.
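The matrix form of Equation (4) maps directly onto a matrix-vector product. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the speed of sound c = 343 m/s and the uniform 5 cm example grid are assumptions made only for illustration.

```python
import numpy as np

def steering_matrix(freq_hz, mic_positions_m, angles_rad, c=343.0):
    """M x N far-field transfer matrix of Equation (2), omnidirectional microphones."""
    x = np.asarray(mic_positions_m, dtype=float)   # shape (N,)
    phi = np.asarray(angles_rad, dtype=float)      # shape (M,)
    k = 2.0 * np.pi * freq_hz / c                  # wavenumber 2*pi*f/c (c is assumed)
    return np.exp(-1j * k * np.outer(np.cos(phi), x))  # entry [m, n]

def directivity(freq_hz, weights, mic_positions_m, angles_rad, c=343.0):
    """Directivity pattern b_m = G_mn w_n of Equation (4) at a single frequency."""
    G = steering_matrix(freq_hz, mic_positions_m, angles_rad, c)
    return G @ np.asarray(weights, dtype=complex)

# Illustration only: nine uniformly weighted microphones on a hypothetical 5 cm grid.
positions = np.linspace(-0.2, 0.2, 9)
angles = np.deg2rad(np.arange(360))
pattern = directivity(1000.0, np.full(9, 1.0 / 9.0), positions, angles)
```

With uniform real weights, all microphone signals add in phase at broadside (ϕ = 90°), so the magnitude of the pattern reaches 1 there and stays at or below 1 elsewhere.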

2.2. Robustness and White Noise Gain

One of the problems that beamforming algorithms often have is their lack of robustness. This property is related to a resistance to the presence of spatially white noise and can be impaired by deviations from the specified microphone characteristics and microphone position errors. These imperfections affect the beamformer in a manner similar to a recorded spatially white noise that is amplified. Hence, the White Noise Gain (WNG) is a measure commonly used for quantifying the robustness of a beamformer design. The WNG shows the ability of a beamformer to suppress spatial white noise, because it expresses the gain of the beamformer in the desired look direction relative to the amplification of spatially white noise.

The WNG A (f_{p}) is defined as follows:

A (f_{p}) = \frac{{|b_{s t e e r} (f_{p})|}^{2}}{w_{n}^{H} (f_{p}) w_{n} (f_{p})}

(5)

where b_{s t e e r} (f_{p}) denotes the value of the directivity pattern in the steering direction [7]. A high value of the WNG, A (f_{p}) > 1, corresponds to a robust beamforming design, whereas a small value, A (f_{p}) < 1, effectively corresponds to an amplification of spatially white noise [7]. The maximum possible value of the WNG is equal to the number of microphones used:

max (A (f_{p})) = N

(6)

which corresponds to a uniform filter [7]:

| w_{n} (f_{p}) | = \frac{1}{N}

(7)
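Equations (5) to (7) can be checked numerically. The sketch below is illustrative only: the function name and the two-microphone weight vector with large opposing gains are hypothetical and do not come from the paper.

```python
import numpy as np

def white_noise_gain(weights, b_steer):
    """White noise gain of Equation (5): |b_steer|^2 / (w^H w)."""
    w = np.asarray(weights, dtype=complex)
    return float(np.abs(b_steer) ** 2 / np.real(np.vdot(w, w)))  # vdot conjugates w

# Uniform filter of Equation (7): with in-phase summation, b_steer = N * (1/N) = 1.
N = 9
uniform = np.full(N, 1.0 / N)
wng_uniform = white_noise_gain(uniform, uniform.sum())

# Hypothetical weights with large opposing gains that still sum to 1 in the
# steering direction; the large filter norm drives the WNG below 1.
wng_superdirective = white_noise_gain([5.0, -4.0], 1.0)
```

The uniform filter reaches the maximum A = N of Equation (6), whereas the second weight vector amplifies spatially white noise relative to the look direction.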

2.3. Desired Directivity Pattern

The playback of the recorded signals should be in a stereophonic configuration, as mentioned in Section 1 and illustrated in Figure 3a.

The playback approach proposed by Grosse and van de Par [3] uses two loudspeakers for the direct sound reproduction with a typical base angle of ϕ_{b a s e} = 60^{\circ} relative to the listener’s position [9]. There are several approaches to shift a phantom source from one loudspeaker to the other, utilizing phase differences Δ p h a s e = p h a s e_{1} - p h a s e_{2} and/or level differences (amplitude panning) Δ L e v e l = L e v e l_{1} - L e v e l_{2} applied to the two loudspeaker signals.

Based on this playback configuration, the recording configuration presented in this paper consists of two crossed end-fire microphone arrays with a 60^{\circ} opening angle, sharing one center microphone and using omnidirectional microphones, as illustrated in Figure 3b. The microphone positions in this figure are only a sketch; the absolute positions can be found in Section 3. The phantom-source shifting approaches of the playback configuration can be used to formulate the correct phase and/or level differences between the two arrays. In this way, the perceived location of the sound source in the playback situation is identical to the one of the recording, provided that the distribution of recorded sound sources does not span more than 60^{\circ}. Although not evaluated here, in principle, a different opening angle could be used for the microphone arrays, thus effectively compressing or expanding the reproduced sound stage. We restrict our proposed method to level differences only, and for this reason, the desired directivity pattern \hat{b} is purely real valued. With this desired directivity pattern, the phase of the directivity pattern is mainly controlled by the array design, which will be explained in Section 2.4.

In this paper, the phantom source shifting approach of amplitude panning is used for formulating the desired directivity pattern of Array 1, {\hat{b}}_{a r r a y 1}, and Array 2, {\hat{b}}_{a r r a y 2} [9]:

\begin{matrix} {\hat{b}}_{a r r a y 1} (ϕ_{δ}) = \sqrt{{(1 + {(\frac{tan (ϕ_{δ}) - tan (ϕ_{b} / 2)}{tan (ϕ_{δ}) + tan (ϕ_{b} / 2)})}^{2})}^{- 1}} \\ {\hat{b}}_{a r r a y 2} (ϕ_{δ}) = \sqrt{{(1 + {(\frac{tan (ϕ_{δ}) + tan (ϕ_{b} / 2)}{tan (ϕ_{δ}) - tan (ϕ_{b} / 2)})}^{2})}^{- 1}} \end{matrix}

(8)

The angular range ϕ_{δ} between both arrays is defined by:

ϕ_{δ} = \{ϕ_{m} | - ϕ_{b} / 2 \leq ϕ_{m} \leq ϕ_{b} / 2\}

(9)

with the constant ϕ_{b} = ϕ_{b a s e} = 60^{\circ}. The derivation of the desired directivity patterns according to [9] gives two possible recording room assumptions: an anechoic chamber or a real room. The latter is chosen for Equation (8) since the microphone array configuration will be used in real rooms, such as concert halls.

The desired directivity pattern of one array is the mirror-flipped version of that of the other array. This symmetry of the recording configuration makes it possible to formulate one desired directivity pattern, which is the same for both arrays. The following parts of the desired directivity pattern, the first, {\hat{b}}_{b e a m}, valid for the beam area and the second, {\hat{b}}_{s t e e r}, valid for the steering angle, consider a microphone array aligned on the 0^{\circ} axis, corresponding to the steering angle ϕ_{s t e e r} = 0^{\circ}:

{\hat{b}}_{b e a m} = \{\begin{matrix} \sqrt{{(1 + {(\frac{tan (ϕ + ϕ_{b} / 2) - tan (ϕ_{b} / 2)}{tan (ϕ + ϕ_{b} / 2) + tan (ϕ_{b} / 2)})}^{2})}^{- 1}} & for - ϕ_{b} \leq ϕ < 0^{\circ} \\ \sqrt{{(1 + {(\frac{tan (ϕ - ϕ_{b} / 2) + tan (ϕ_{b} / 2)}{tan (ϕ - ϕ_{b} / 2) - tan (ϕ_{b} / 2)})}^{2})}^{- 1}} & for 0^{\circ} < ϕ \leq ϕ_{b} \end{matrix}

(10)

{\hat{b}}_{s t e e r} (ϕ_{s t e e r} = 0^{\circ}) = 1

(11)
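Equations (8) and (10) can be evaluated numerically as sketched below. This is an illustration under assumptions: the function names are hypothetical, and the square-root expressions are rewritten in an algebraically equivalent form, |tan ϕ_δ ± tan(ϕ_b/2)| divided by the root of the sum of both squared terms, to avoid a division by zero at the edges of the angular range.

```python
import numpy as np

PHI_B = np.deg2rad(60.0)  # base angle from the text

def panning_gains(phi_delta, phi_b=PHI_B):
    """Desired gains of Array 1 and Array 2, Equation (8), in a division-free form."""
    t = np.tan(phi_b / 2.0)
    minus = np.tan(phi_delta) - t
    plus = np.tan(phi_delta) + t
    norm = np.hypot(minus, plus)
    # sqrt(1 / (1 + (minus/plus)^2)) == |plus| / norm, and likewise for Array 2.
    return np.abs(plus) / norm, np.abs(minus) / norm

def desired_beam(phi, phi_b=PHI_B):
    """Desired pattern of Equation (10) for an array steered to 0 degrees."""
    phi = np.asarray(phi, dtype=float)
    left, _ = panning_gains(phi + phi_b / 2.0, phi_b)   # branch for -phi_b <= phi < 0
    _, right = panning_gains(phi - phi_b / 2.0, phi_b)  # branch for 0 < phi <= phi_b
    return np.where(phi < 0.0, left, right)

grid = np.deg2rad(np.linspace(-30.0, 30.0, 61))
g1, g2 = panning_gains(grid)
```

In this form, the squared gains of the two arrays sum to 1 for every angle between the arrays, and the beam pattern falls from 1 at the steering angle to 0 at ±ϕ_b. The function is only meaningful for angles inside the beam area.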

In the following subsections, an optimal array design in terms of optimal microphone positions and an optimal filter design is proposed to achieve the desired directivity pattern.

2.4. Array Design

The positions of the microphones have an influence both on the filter w_{n} (f_{p}) and the transfer function G_{m n} (f_{p}), and thus, on the directivity pattern itself. The optimal microphone positions selected for this paper maximize the spatial aliasing frequency and, at the same time, minimize the frequency from which beamforming is effectively possible. The spatial aliasing frequency f_{a l} is the lowest frequency at which aliasing effects occur. Aliasing is caused by spatial undersampling of sound waves at high frequencies and leads to side lobes with the same amplitude as the main lobe. The spatial aliasing frequency of an array with linear microphone spacing is usually given in the literature as:

f_{a l} = \frac{c}{2 Δ x}

(12)

with Δ x as the spacing between the microphones [10].

A small microphone spacing sets an upper limit to the spatial aliasing frequency. In contrast, a large microphone spacing sets a lower limit to the frequency from which beamforming is effectively possible. In order to have good directional properties of the microphone array across a wide frequency range, an irregularly-spaced microphone array is used in which both kinds of spacing can occur. A linear array with logarithmic spacing that is symmetrical about the reference microphone (n = 0) is used in this paper. Consequently, the number of microphones N has to be odd (N \in N_{U}). The symmetry around one central microphone ensures a purely real directivity. The microphone positions are calculated as follows [11]:

\begin{matrix} (x_{n + 1} - x_{n}) = (x_{n} - x_{n - 1}) ξ if n > 0 \\ (x_{n - 1} - x_{n}) = (x_{n} - x_{n + 1}) ξ if n < 0 \end{matrix}

(13)

with:

\begin{matrix} x_{0} & = 0 \\ ξ & = {(l_{s p r e a d})}^{\frac{2}{N - 3}} \\ (x_{1} - x_{0}) & = (x_{0} - x_{- 1}) = \frac{L e n g t h}{2 \sum_{n = 1}^{\frac{N - 1}{2}} ξ^{n - 1}} \end{matrix}

where L e n g t h is the total length of the array. The array parameter l_{s p r e a d} \in R^{> 0} is a free variable describing the ratio between the spacing of the microphones at the extremities of the array and the spacing of the microphones at the center of the array. Linear microphone spacing is achieved with l_{s p r e a d} = 1. If l_{s p r e a d} < 1, the spacing of the microphones at the extremities of the array is smaller than the one at the center of the array. In the case of l_{s p r e a d} > 1, it is the opposite.
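Equations (12) and (13) can be turned into a short position calculation. The sketch below is an illustration with hypothetical function names; the speed of sound c = 343 m/s is an assumption, and the guard for N ≥ 5 exists only because the exponent 2/(N - 3) is undefined for N = 3.

```python
import numpy as np

def log_spaced_positions(n_mics, length_m, l_spread):
    """Symmetric, logarithmically spaced positions following Equation (13)."""
    if n_mics % 2 == 0 or n_mics < 5:
        raise ValueError("this sketch needs an odd number of microphones, N >= 5")
    half = (n_mics - 1) // 2
    xi = l_spread ** (2.0 / (n_mics - 3))        # ratio between neighboring spacings
    spacings = xi ** np.arange(half)             # relative spacings 1, xi, xi^2, ...
    first = length_m / (2.0 * spacings.sum())    # innermost spacing x_1 - x_0
    positive = np.cumsum(first * spacings)       # x_1, x_2, ..., x_(N-1)/2
    return np.concatenate((-positive[::-1], [0.0], positive))

def aliasing_frequency(spacing_m, c=343.0):
    """Spatial aliasing frequency of Equation (12); c = 343 m/s is an assumption."""
    return c / (2.0 * spacing_m)

# Parameter values quoted in Section 3: N = 9, Length = 1 m, l_spread of about 35.
positions = log_spaced_positions(9, 1.0, 35.0)
f_al = aliasing_frequency(np.diff(positions).min())
```

With these parameters, the positions round to the millimeter values listed in Section 3, and the smallest spacing of about 0.01 m gives an aliasing frequency close to the quoted 17,000 Hz.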

2.5. Filter Design

In this section, an optimal filter design is proposed to fit the directivity pattern of the array, whose design was specified in Section 2.4, to the desired directivity pattern specified in Section 2.3. The following filter design is based on numerical convex optimization and has the advantage that only one global minimum exists. In general, this end-fire design can also be used with different desired directivity patterns and array designs. In Section 3, we indicate the ideal values of the constants for the desired directivity pattern and array design proposed in this study.

The aim of this algorithm is to minimize the quadratic error {error}_{m} between the directivity pattern obtained by a microphone array, b_{m} (f_{p}), and a desired frequency-independent directivity pattern {\hat{b}}_{m} [7]:

\begin{matrix} {error}_{m} = G_{m n} (f_{p}) w_{n} (f_{p}) - {\hat{b}}_{m} = b_{m} (f_{p}) - {\hat{b}}_{m} \\ min_{w_{n} (f_{p})} {∥ {error}_{m} ∥}_{2}^{2} \end{matrix}

(14)

This minimization task will be subjected to additional constraints, and therefore, the beamformer will be termed the Constrained Least-Squares Beamformer (CLSB).

In the following subsections, the main minimization task and the constraints used will be explained, paying particular attention to the WNG and the different spatial areas. These areas are shown in Figure 4.

Additionally, this optimization process is placed within an optimization loop in order to optimize several important constants. This optimization procedure will be explained in the last subsection of this section.

2.5.1. White Noise Gain

Such a convex optimization procedure allows including a frequency-dependent lower bound γ (f_{p}) for the WNG when optimizing the filters w_{n} (f_{p}) [7]:

\begin{matrix} A (f_{p}) & = \frac{| b_{s t e e r} (f_{p}) |^{2}}{w_{n}^{H} (f_{p}) w_{n} (f_{p})} \geq γ (f_{p}) \\ with γ (f_{p}) \in R^{\geq 0} \end{matrix}

(15)

This constraint has a direct influence on the robustness and on how well the desired directivity pattern can be achieved. A high value for the lower bound reduces the accuracy of forming the directivity pattern because the filter is too restricted by this constraint, whereas a low value leads to a filter that is not robust. In Section 3, an optimal value for this lower bound will be discussed.
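One way to see why Equation (15) fits into a convex program is sketched below. This reasoning step is not spelled out in the text and assumes γ (f_{p}) > 0: once the steering constraint of Section 2.5.2 fixes b_{s t e e r} (f_{p}) to the constant {\hat{b}}_{s t e e r}, the ratio becomes a bound on the squared filter norm, which is a convex constraint.

```latex
\frac{|\hat{b}_{steer}|^{2}}{w_{n}^{H}(f_{p})\, w_{n}(f_{p})} \geq \gamma(f_{p})
\quad\Longleftrightarrow\quad
\| w_{n}(f_{p}) \|_{2}^{2} \leq \frac{|\hat{b}_{steer}|^{2}}{\gamma(f_{p})}
```

Read this way, a larger lower bound γ shrinks the set of admissible filters, which matches the loss of accuracy described above.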

2.5.2. Steering Angle

In the direction of the steering angle ϕ_{s t e e r}, representing the direction of the main lobe of the microphone array, the directivity pattern obtained by the array is constrained to the value of the desired directivity pattern [7]:

G_{s t e e r, n} (f_{p}) w_{n} (f_{p}) = b_{s t e e r} (f_{p}) \overset{!}{=} {\hat{b}}_{s t e e r}

(16)

In this way, the directivity pattern is normalized to {\hat{b}}_{s t e e r}. The steering angle is limited to the array axis, since the goal is an end-fire array.

2.5.3. Beam Area

The area around the steering angle is the beam area, which defines the main lobe of the directivity pattern:

\begin{matrix} ϕ_{b e a m} & = \{ϕ_{m} | ϕ_{s t e e r} - ϕ_{b} \leq ϕ_{m} \leq ϕ_{s t e e r - 1} \lor ϕ_{s t e e r + 1} \leq ϕ_{m} \leq ϕ_{s t e e r} + ϕ_{b}\} \\ with ϕ_{b} \in R^{\geq 0} \end{matrix}

(17)

ϕ_{s t e e r - 1} and ϕ_{s t e e r + 1} indicate one discrete angle before and after the steering angle, respectively. The constant ϕ_{b} can be chosen freely and defines the width of the beam area. To fit the directivity pattern to the desired one, an angle-dependent upper bound ϵ_{b e a m} is set on the error (cf. Equation (14)) in this area:

\begin{matrix} abs ({error}_{b e a m}) \leq ϵ_{b e a m} \\ with ϵ_{b e a m} \in R^{\geq 0} \end{matrix}

(18)

where abs () denotes the absolute value of every entry of the vector argument. In this case, ϵ_{b e a m} is a column vector with as many entries as the directivity pattern in the beam area.

2.5.4. Unconstrained Area

An angular area without any constraints is defined to avoid an effective discontinuity in the intermediate zone between the beam and the stop area, which would have a negative impact on the optimized solution:

\begin{matrix} ϕ_{u n c o n s t r a i n e d} & = \{ϕ_{m} | ϕ_{s t e e r} - ϕ_{b} - ϕ_{u} \leq ϕ_{m} < ϕ_{s t e e r} - ϕ_{b} \lor ϕ_{s t e e r} + ϕ_{b} < ϕ_{m} \leq ϕ_{s t e e r} + ϕ_{b} + ϕ_{u}\} \\ with ϕ_{u} \in R^{\geq 0} \end{matrix}

(19)

The constant ϕ_{u} can be chosen freely and defines the width of the unconstrained area.

2.5.5. Stop Area

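The remaining area is called the stop area, where the angular limits are understood modulo 360^{\circ}, so that the area runs from the upper edge of the unconstrained area around the rear of the array to its lower edge:

ϕ_{s t o p} = \{ϕ_{m} | ϕ_{s t e e r} + ϕ_{b} + ϕ_{u} < ϕ_{m} < ϕ_{s t e e r} - ϕ_{b} - ϕ_{u}\}

(20)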
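The four angular regions of Equations (17), (19) and (20) can be expressed as boolean masks over the discrete angle grid. The following sketch is an illustration with a hypothetical function name; it measures the wrapped angular distance to the steering angle and assumes that the steering angle lies exactly on the grid.

```python
import numpy as np

def angular_areas(angles_deg, steer_deg=0.0, phi_b_deg=60.0, phi_u_deg=10.0):
    """Boolean masks for the steering angle, beam, unconstrained and stop areas."""
    angles = np.asarray(angles_deg, dtype=float)
    # Signed offset from the steering angle, wrapped into [-180, 180).
    offset = (angles - steer_deg + 180.0) % 360.0 - 180.0
    dist = np.abs(offset)
    steer = dist == 0.0                                  # steering angle itself
    beam = (dist > 0.0) & (dist <= phi_b_deg)            # Equation (17)
    unconstrained = (dist > phi_b_deg) & (dist <= phi_b_deg + phi_u_deg)  # Eq. (19)
    stop = dist > phi_b_deg + phi_u_deg                  # Equation (20)
    return steer, beam, unconstrained, stop

# Grid and widths as quoted in Section 3: 1 degree steps, phi_b = 60, phi_u = 10.
steer, beam, unconstrained, stop = angular_areas(np.arange(360))
```

Every grid angle falls into exactly one of the four masks, which makes it straightforward to select the matching rows of the transfer matrix for each constraint.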

The main optimization task is applied to this area. In the context of this work, the sound from this direction can be assumed to be mainly reverberant sound that does not belong to the direct sound and is therefore undesired. For this reason, the desired directivity pattern in this area is set to zero to suppress sound coming from this area as much as possible [7]:

\begin{matrix} min_{w_{n} (f_{p})} {∥ {error}_{s t o p} ∥}_{2}^{2} \\ with {\hat{b}}_{s t o p} = 0 \end{matrix}

(21)

In addition to this optimization, an upper bound ϵ_{s t o p} is set on the uniform norm of the directivity pattern:

\begin{matrix} ∥ {error}_{s t o p} ∥_{\infty} \leq ϵ_{s t o p} \\ with ϵ_{s t o p} \in R^{\geq 0} \end{matrix}

(22)

This upper bound is not angle-dependent, but restricted to the stop area because of the uniform norm, and will play an important role in the following loop design.
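To give a feel for the stop-area minimization of Equation (21) combined with the steering constraint of Equation (16), the sketch below solves a deliberately simplified problem in closed form. It is not the CLSB: the inequality constraints of Equations (15), (18) and (22) are left out, a diagonal loading term (an assumption) stands in for the WNG bound, and the function name, c = 343 m/s and the 1 kHz test frequency are chosen only for illustration. The microphone positions are the ones listed in Section 3.

```python
import numpy as np

def stop_area_filter(G_stop, g_steer, b_steer=1.0, loading=1e-2):
    """Minimize the mean stop-area energy plus loading * ||w||^2 with g_steer @ w = b_steer."""
    G_stop = np.asarray(G_stop, dtype=complex)
    g = np.asarray(g_steer, dtype=complex)
    R = G_stop.conj().T @ G_stop / G_stop.shape[0]   # mean stop-area energy matrix
    R = R + loading * np.eye(g.size)                 # diagonal loading (assumption)
    r_inv_g = np.linalg.solve(R, g.conj())
    # Closed-form solution of the equality-constrained quadratic problem.
    return b_steer * r_inv_g / (g @ r_inv_g)

c = 343.0  # assumed speed of sound in m/s
positions = np.array([-0.5, -0.15, -0.043, -0.01, 0.0, 0.01, 0.043, 0.15, 0.5])
angles_deg = np.arange(360)
G = np.exp(-1j * 2.0 * np.pi * 1000.0 / c
           * np.outer(np.cos(np.deg2rad(angles_deg)), positions))
dist = np.abs((angles_deg + 180) % 360 - 180)        # wrapped distance to 0 degrees
w = stop_area_filter(G[dist > 70], G[0])             # stop area: beyond 70 degrees
```

The resulting filter satisfies the steering constraint exactly and never has a higher loaded stop-area cost than a delay-and-sum filter that meets the same constraint. The full design additionally needs a convex solver for the inequality constraints.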

2.5.6. Loop Design

Choosing the correct upper bound for the beam area is difficult: on the one hand, a low upper bound for the beam area leads to a good fit in this area (low {error}_{b e a m} values), but to undesired side lobes in the stop area (high {error}_{s t o p} values). Consequently, the direct sound will be recorded correctly, but is mixed with the undesired reverberant sound field, which should ideally be suppressed. On the other hand, a high upper bound for the beam area leads to the opposite: a bad fit in the beam area (high {error}_{b e a m} values), but low undesired side lobes (low {error}_{s t o p} values). The following loop design finds a frequency-dependent optimal upper bound for the beam area, which is a compromise between a good fit in the beam area and only small side lobes in the stop area.

As a first step in the loop design, the upper bound of the beam area is initialized in matrix notation:

(23)

The rows cover the beam area, whereas the columns cover the different iterations of the following loops with k as the counter, where k = K indicates the last iteration. The upper bound starts in the first iteration with ϵ_{b e a m}^{k = 1} = 0 and continues linearly spaced with step size α. The step size is designed in such a way that the maximum value of the upper bound of the beam area, {\hat{b}}_{s t e e r} - \hat{b} (ϕ_{s t e e r} \pm ϕ_{b}), is reached in K steps overall. Either \hat{b} (ϕ_{s t e e r} - ϕ_{b}) or \hat{b} (ϕ_{s t e e r} + ϕ_{b}) can be chosen to calculate α, since they are equal according to the symmetry of the desired directivity pattern. The upper bound then ends with the difference between {\hat{b}}_{s t e e r} and {\hat{b}}_{b e a m} at the row-specific angle. If this difference is reached before the last iteration (k < K), the value is held until the last iteration is reached. This will be the case for every row, except the first and the last one. This procedure ensures that {\hat{b}}_{s t e e r} stays the maximum value of the directivity pattern.

In contrast to the upper bound of the beam area, the bound of the stop area is initialized as a vector, since there is no angle dependency:

\begin{matrix} ϵ_{s t o p}^{l} & = \begin{matrix} l = 1 & \dots & l = L \\ ({\hat{b}}_{s t e e r} \cdot b_{s t o p}^{f i r s t} & \dots & {\hat{b}}_{s t e e r}) \end{matrix} \\ with L \in N^{> 1}, l \in N^{\leq L} and b_{s t o p}^{f i r s t} \in R^{\geq 0, \leq 1} \end{matrix}

(24)

The entries with the counter l, where l = L indicates the last iteration, correspond to the iterations of the following loops and are linearly spaced. The constant b_{s t o p}^{f i r s t} controls the maximum allowed value of the directivity pattern in the stop area for the first iteration.
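The two bound schedules described above can be sketched as follows. This is an illustration, not the authors' code: Equation (23) is not available in this text, so the exact indexing is an assumption. Here the ramp runs linearly from 0 in the first column to the largest cap in the last column, and each row is held at its own cap {\hat{b}}_{s t e e r} - {\hat{b}}_{b e a m} once the cap is reached. The function names and the example pattern values are hypothetical.

```python
import numpy as np

def beam_bounds(b_hat_beam, b_hat_steer=1.0, n_iter=100):
    """Upper bounds for the beam area: rows are beam angles, columns are iterations k."""
    caps = b_hat_steer - np.asarray(b_hat_beam, dtype=float)  # row-specific end values
    ramp = np.linspace(0.0, caps.max(), n_iter)               # shared linear ramp (assumed)
    return np.minimum(ramp[np.newaxis, :], caps[:, np.newaxis])

def stop_bounds(b_hat_steer=1.0, b_stop_first=0.2, n_iter=9):
    """Linearly spaced stop-area bounds of Equation (24)."""
    return np.linspace(b_hat_steer * b_stop_first, b_hat_steer, n_iter)

# Hypothetical desired beam values at five beam angles, edges first and last.
eps_beam = beam_bounds([0.0, 0.5, 0.9, 0.5, 0.0], n_iter=100)
eps_stop = stop_bounds()
```

With the constants quoted in Section 3 (L = 9 and b_{s t o p}^{f i r s t} = 0.2), the stop-area bounds run from 0.2 to 1.0 in steps of 0.1.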

The loop design itself can be seen in Figure 5 and is repeated for every frequency f_{p}, where the constants K_{t e m p} and K_{s t e p} can be chosen freely so that K / K_{t e m p} \in N and K_{t e m p} / K_{s t e p} \in N, respectively. These two constants regulate the part of the upper bound of the beam area that is used in the looped optimization process.

The first loop repeats the optimization with the first part of the upper bound of the beam area (from ϵ_{b e a m}^{k = 1} to ϵ_{b e a m}^{k = K_{t e m p} \leq K}) until Equation (22) with ϵ_{s t o p}^{1} is fulfilled. A result of the optimization that fulfills Equation (22) is denoted as valid. If no valid result is found, Loop 2 repeats Loop 1 with different upper bounds of the stop area (from ϵ_{s t o p}^{2} to ϵ_{s t o p}^{L}). If still no valid result is found, Loop 3 increases K_{t e m p} with a step width of K_{s t e p}. The upper bounds for which the loop design finds a valid solution are denoted as optimal, ϵ_{b e a m}^{o p t} and ϵ_{s t o p}^{o p t}. The filter w that corresponds to these upper bounds is also denoted as optimal, w^{o p t}. In the case that K_{s t e p} would increase K_{t e m p} beyond K (K_{t e m p} + K_{s t e p} > K), the last calculated result of the optimization (k = K) is taken as a valid solution.
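The three nested loops can be sketched in code as follows. This is one possible reading of the description, since Figure 5 is not reproduced here. The callback `solve`, its return values and the function name are hypothetical; the sketch assumes that the solver always returns a filter together with the peak level it reached in the stop area, and that after Loop 3 widens the range only the newly added columns need to be scanned.

```python
import numpy as np

def search_bounds(solve, eps_beam, eps_stop, k_temp, k_step):
    """Return (weights, k, l, valid) for the first result that fulfills Equation (22)."""
    n_k = eps_beam.shape[1]
    k_start, k_end = 0, min(k_temp, n_k)
    last = None
    while True:                                   # Loop 3: widen the scanned part
        for l, bound in enumerate(eps_stop):      # Loop 2: relax the stop-area bound
            for k in range(k_start, k_end):       # Loop 1: scan the beam-area bounds
                weights, stop_peak = solve(eps_beam[:, k], bound)
                last = (weights, k, l)
                if stop_peak <= bound:            # Equation (22) holds: valid result
                    return weights, k, l, True
        if k_end >= n_k:                          # nothing valid up to k = K
            return last[0], last[1], last[2], False
        k_start, k_end = k_end, min(k_end + k_step, n_k)

def toy_solve(eps_beam_column, eps_stop_value):
    # Stand-in for the convex solver: looser beam bounds give lower side lobes.
    return None, 0.635 - float(np.max(eps_beam_column))

eps_beam = (0.01 * np.arange(100))[np.newaxis, :]
eps_stop = np.linspace(0.2, 1.0, 9)
weights, k_opt, l_opt, valid = search_bounds(toy_solve, eps_beam, eps_stop,
                                             k_temp=10, k_step=10)
```

The indices k and l are zero-based here, unlike the one-based counters in the text. The final flag distinguishes a result that fulfills Equation (22) from the fallback that is accepted at k = K.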

3. Setup

The following setup is used for the numerical simulations, whose results are described in Section 4 and Section 5. The angular range is discretized into M = 360 linearly-spaced angles \{ϕ_{0} = 0^{\circ}, ϕ_{1} = 1^{\circ}, \dots, ϕ_{359} = 359^{\circ}\}. The frequency range runs from f_{p = 0} = 0 Hz to f_{p = 256} = 24 kHz, generated at a sampling rate of f_{s} = 48 kHz using a filter length of 512 samples. This results in P = 257 linearly-spaced frequency bins. This frequency range covers the spectral content of music [12] that is to be recorded by these microphone arrays. To obtain impulse responses of the filters, the complex spectrum was mirrored, conjugated and transformed to the time domain via an inverse FFT (ifft).
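The frequency grid and the step from a one-sided spectrum to an impulse response can be reproduced with NumPy's real-valued FFT helpers, as sketched below. This is an illustration: `np.fft.irfft` performs the mirroring and conjugation implicitly, which is assumed here to be equivalent to the procedure described, and the function name is hypothetical.

```python
import numpy as np

FS = 48_000            # sampling rate in Hz, from the text
FILTER_LENGTH = 512    # filter length in samples, from the text

# One-sided frequency grid: 257 bins from 0 Hz to 24 kHz, spaced 93.75 Hz apart.
freqs = np.fft.rfftfreq(FILTER_LENGTH, d=1.0 / FS)

def impulse_response(half_spectrum):
    """Real impulse response from the 257 one-sided filter coefficients of one microphone."""
    spectrum = np.asarray(half_spectrum, dtype=complex)
    return np.fft.irfft(spectrum, n=FILTER_LENGTH)

# A flat spectrum corresponds to a unit impulse at the first sample.
h = impulse_response(np.ones(freqs.size))
```

On this grid, 187.5 Hz is the third frequency bin (p = 2).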

The microphone array consists of N = 9 omnidirectional microphones and has a total length of L e n g t h = 1 m. The array design is done with l_{s p r e a d} \approx 35, so that the smallest microphone spacing (s) in the center of the array is s = 0.01 m. Following that, the spatial aliasing frequency can be maximized to a frequency of f_{a l} \approx 17,000 Hz. For practical reasons, the limitation is set to s = 0.01 m to ensure enough space for the microphones. The absolute microphone positions are set as follows (displayed in millimeter precision): x_{n = - 4} = - 0.500 m, x_{n = - 3} = - 0.150 m, x_{n = - 2} = - 0.043 m, x_{n = - 1} = - 0.010 m, x_{n = 0} = 0 m, x_{n = 1} = 0.010 m, x_{n = 2} = 0.043 m, x_{n = 3} = 0.150 m, x_{n = 4} = 0.500 m.

After having specified the microphone positions, the convex functions of the CLSB, shown in Section 2.5, are solved utilizing CVX, a package for specifying and solving convex programs [13,14]. Parts of these convex functions are the WNG constraint and the loop design.

For the WNG constraint, the lower bound γ for the WNG A (f_{p}) is set up as follows:

γ (f_{p}) = \{\begin{matrix} 5 & for f_{p} = 0 Hz \\ CSI & for 0 Hz < f_{p} < 187.5 Hz \\ 1 & for 187.5 Hz \leq f_{p} \leq f_{s} / 2 \end{matrix}

(25)

The lower bound starts with γ (f_{p} = 0 Hz) = 5 and ends with γ (f_{p} \geq 187.5 Hz) = 1. In the intermediate zone, a Cubic Spline Interpolation (CSI) connects both points. The CSI avoids rapid changes of the directivity pattern across frequency in the low frequency range (f_{p} < 187.5 Hz). In the high frequency range (f_{p} \geq 187.5 Hz), a lower bound of γ = 1 ensures a robust beamforming design.

For the loop design, the constants are set up as follows:

$$\begin{aligned} & K = 100, \quad K_{temp} = K_{step} = 10, \quad α = 0.01 \quad \text{(cf. Equation (23))} \\ & L = 9, \quad b_{stop}^{first} = 0.2 \\ & ϕ_u = 10^{\circ} \end{aligned} \tag{26}$$

The constants $ϕ_b$ and $ϕ_{steer}$, as well as the parts of the desired directivity pattern $\hat{b}_{beam}$ and $\hat{b}_{steer}$, are set up according to Section 2.3.

The values of the constants K, $K_{temp}$ and $K_{step}$ are chosen in such a way that Loop 1 scans the beam area from $ϵ_{beam}^{k=1} = 0$ in steps of $α = 0.01$ until $ϵ_{beam}^{k=K_{temp}=10} = K_{temp} \cdot α = 0.1$. If necessary, Loop 3 increases the value of the upper bound of the beam area according to the value of the constant $K_{step}$ (cf. Section 2.5).

An increase of the value of the constant K leads to an improvement in the beam area (lower $error_{beam}$ values), because the step size $α$ is smaller. The loop design then checks the validity (cf. Section 2.5) of more candidate directivity patterns with small $error_{beam}$ values. In turn, to find a valid solution, Loop 2 has to increase $ϵ_{stop}$ further than before, which leads to a worsening in the stop area (higher $error_{stop}$ values). Consequently, a decrease of the value of the constant K has the opposite effect.

An increase of the values of the constants $K_{temp}$ and $K_{step}$ leads to a worsening in the beam area (higher $error_{beam}$ values), because the first end point of Loop 1, $ϵ_{beam}^{k=K_{temp}}$, as well as all subsequent ones, $ϵ_{beam}^{k=K_{temp}+K_{step}+K_{step}+\dots}$, are now higher. More candidate directivity patterns with high $error_{beam}$ values are checked by the loop design: Loop 2 does not have to increase $ϵ_{stop}$ as much as before, because these directivity patterns are in general more likely to be valid. This in turn leads to an improvement in the stop area (lower $error_{stop}$ values). Consequently, a decrease of the values of the constants $K_{temp}$ and $K_{step}$ has the opposite effect.

The values of the constants L and $b_{stop}^{first}$ are chosen in such a way that Loop 2 scans the stop area from $ϵ_{stop}^{l=1} = 0.2$ in steps of $(\hat{b}_{steer} - b_{stop}^{first} \cdot \hat{b}_{steer}) / (L - 1) = 0.1$ until $ϵ_{stop}^{l=L} = \hat{b}_{steer} = 1$.
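The arithmetic behind the Loop 2 schedule can be reproduced with a few lines. This is only a sketch of the sequence of upper bounds with the values from Equation (26), taking $\hat{b}_{steer} = 1$ as stated above; the variable names are illustrative, and the optimization that is solved at each step is not shown.

```python
import numpy as np

L = 9                # number of Loop 2 steps, Equation (26)
b_stop_first = 0.2   # first upper bound of the stop area, Equation (26)
b_steer = 1.0        # desired value at the steering angle

# Step width: (b_steer - b_stop_first * b_steer) / (L - 1)
step = (b_steer - b_stop_first * b_steer) / (L - 1)

# Upper bounds visited by Loop 2 for l = 1 ... L
eps_stop = b_stop_first * b_steer + step * np.arange(L)
print(step, eps_stop)
```

Raising `b_stop_first` while lowering `L` so that `step` stays at 0.1 simply drops the smallest bounds from this sequence, which corresponds to the trade-off discussed in the next paragraph.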

An increase of the value of the constant $b_{stop}^{first}$ combined with a decrease of the value of the constant L, preserving the step width of $0.1$ as mentioned earlier, leads to a worsening in the stop area. The start point of Loop 2 is now higher, allowing higher $error_{stop}$ values from the beginning. It is now easier for Loop 1 to find a valid solution, which leads to an improvement in the beam area. A decrease of the value of the constant $b_{stop}^{first}$ and a corresponding increase of the value of the constant L lead to the opposite effect.

Overall, a variation of the values of the constants K, $K_{temp}$, $K_{step}$, L and $b_{stop}^{first}$ shifts the balance between fulfilling the constraints in the beam area and in the stop area. Optimal values have to be found separately for every desired directivity pattern and intended purpose of the microphone array.

A variation of the value of the constant $ϕ_u$ does not significantly change the results in terms of the error in the beam and the stop area. Nevertheless, the value should not be chosen too large, in order to avoid undesired results (very large differences between the obtained and the desired directivity pattern), since there is no control over the directivity pattern in the unconstrained area. The maximum value of $ϕ_u$ for which no undesired results occur depends in a complex manner on the number of microphones used and the desired directivity pattern.

With the setup shown in Equation (26), we achieved the best results in fitting the directivity pattern to the desired one. Different initializations of the constants are also possible, as mentioned before (a detailed analysis of how varying the constants' values given in Equation (26) affects the results is beyond the scope of this article). Our results are discussed in the following Sections 4 and 5.

4. Objective Evaluation

The following section is divided into four parts. In Section 4.1, two array designs are compared to show the improvement in spatial aliasing of a logarithmically-spaced array over a linearly-spaced one. In Section 4.2, the new stereo system proposed in this study is compared to state-of-the-art systems that use two microphones. In Section 4.3, the WNG constraint and the frequency response are analyzed. Finally, in Section 4.4, the angular constraints, as well as the phase of the directivity pattern, are investigated.

4.1. Directivity Index Comparison

The directivity pattern of the logarithmically-spaced array ($l_{spread} \approx 35$, $s = 0.01$ m) is more directive for high frequencies than that of a linearly-spaced array ($l_{spread} = 1$, $s = 0.125$ m) having the same total length of $Length = 1$ m. Less reverberant sound is therefore recorded by the former type of array than by the latter. As a measure, we choose the directivity index $DI$, which is the logarithm of the directivity D [15]:

$$\begin{aligned} D(f_p) &= \frac{\sum_{m=0}^{M-1} \max_{ϕ_m} \left( |b(f_p, ϕ_m)|^2 \right)}{\sum_{m=0}^{M-1} |b(f_p, ϕ_m)|^2} \\ DI(f_p) &= 10 \log_{10} \left( D(f_p) \right) \end{aligned} \tag{27}$$
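Equation (27) translates almost literally into code. The sketch below evaluates it for a single frequency on a uniform grid of $M$ angles; the function name `directivity_index` and the two test patterns (omnidirectional and cardioid) are illustrative choices and do not correspond to the arrays evaluated in Figure 6.

```python
import numpy as np

def directivity_index(b):
    """Directivity index in dB following Equation (27).

    b: complex or real directivity pattern sampled at M angles for one frequency.
    The numerator sums the maximum of |b|^2 over all M angles, i.e. M * max(|b|^2).
    """
    power = np.abs(np.asarray(b)) ** 2
    D = power.size * power.max() / power.sum()
    return 10 * np.log10(D)

phi = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)  # uniform angular grid
di_omni = directivity_index(np.ones_like(phi))          # no directivity
di_cardioid = directivity_index(0.5 * (1 + np.cos(phi)))
print(di_omni, di_cardioid)
```

Because the sums in Equation (27) run over angles in a plane, the values differ from the three-dimensional directivity index found in textbooks; an omnidirectional pattern gives 0 dB in both cases.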

In fact, Figure 6 shows that the linearly-spaced array has lower $DI$ values for high frequencies ($f_p > 1200$ Hz) than the logarithmically-spaced one. This is caused by aliasing effects, as the aliasing frequency for the linearly-spaced array is $f_{al} \approx 1460$ Hz. There is a large drop of the $DI$ values ($DI < 7$ dB) for the logarithmically-spaced array at very high frequencies ($f_p > 10{,}500$ Hz), which is also caused by aliasing effects. The lowest values of the $DI$ for the logarithmically-spaced array are located around the aliasing frequency $f_{al}(Δx = s) \approx 17{,}000$ Hz.

4.2. Comparison Stereo Systems

The necessary phase and/or level differences for a stereophonic recording as mentioned in Section 2.3 can also be obtained by only two microphones. Different angles and distances between these two microphones, as well as different microphone directivity patterns are possible, as described, for example, by the A-B or the X-Y technique [12]. A unified theory of these two-microphone systems for stereophonic sound recording can be found in [6].

Assuming no phase differences, this theory states that a level difference of $ΔLevel = \pm 15$ dB corresponds to a full lateral shift of a phantom sound source towards the left or right loudspeaker in the playback situation. This level difference is achieved in the recording situation with different angles between two microphones with specific directivity patterns. The angle covering this level difference is called the recording angle $ϕ_{rec}$. If $ϕ_{rec} > ϕ_{base}$, the recorded sound scene is compressed in the playback configuration, whereas if $ϕ_{rec} < ϕ_{base}$, the recorded sound scene is expanded [6]. Therefore, we can assume that if $ϕ_{rec} = ϕ_{base}$, the recorded spatial properties are preserved after playback. Table 1 shows the possible microphone directivities and base angles between the microphone pairs.

The microphone array stereo system described in this study records less reverberant sound than these state-of-the-art two-microphone stereo systems. As a measure, we choose a modified definition of the directivity index, $DI_{mod}$, which is the logarithm of a modified version $D_{mod}$ of the directivity introduced in Section 4.1:

$$\begin{aligned} D_{mod} &= \frac{\sum_{m=0}^{M-1} 2 \max_{ϕ_m} \left( b_{mic1}(ϕ_m)^2 \right)}{\sum_{m=0}^{M-1} \left( b_{mic1}(ϕ_m)^2 + b_{mic2}(ϕ_m)^2 \right)} \\ DI_{mod} &= 10 \log_{10} \left( D_{mod} \right) \end{aligned} \tag{28}$$

where $b_{mic1}(ϕ_m)$ and $b_{mic2}(ϕ_m)$ are the directivity patterns of the first and the second microphone, respectively. The modified directivity index includes the sum of the directivity patterns of the two microphones. It thereby considers the angle between these two directivity patterns, which, in addition to the directivity pattern itself, determines the percentage of recorded reverberant sound. As shown in Table 1, the proposed microphone array stereo system is, in fact, more directive than the two-microphone stereo systems, also taking into account the angle between the two microphone arrays.

4.3. WNG and Frequency Response

The algorithm successfully fits the WNG $A(f_p)$ to the lower bound $γ(f_p)$ specified in Section 3, as shown in Figure 7a.

This ensures a robust beamforming design. For high frequencies $f_p \geq 7031$ Hz, the algorithm finds even higher WNG values than the lower bound.
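For orientation, the white noise gain of a given filter vector can be evaluated as sketched below. The expression used here, $|w^H d|^2 / (w^H w)$, is a commonly used definition and is an assumption in this context, since the normalization adopted in Section 2.2 is not repeated in this section; the function name `white_noise_gain` and the random steering phases are purely illustrative.

```python
import numpy as np

def white_noise_gain(w, d):
    """White noise gain |w^H d|^2 / (w^H w) for weights w and steering vector d.

    Assumption: this common textbook form is used; the paper's Section 2.2
    may normalize differently.
    """
    w = np.asarray(w, dtype=complex)
    d = np.asarray(d, dtype=complex)
    return np.abs(np.vdot(w, d)) ** 2 / np.vdot(w, w).real

N = 9
rng = np.random.default_rng(0)
d = np.exp(-1j * rng.uniform(0, 2 * np.pi, N))  # unit-magnitude steering vector
w_das = d / N                                   # delay-and-sum weights
wng_das = white_noise_gain(w_das, d)
print(wng_das)
```

Under this definition a delay-and-sum beamformer reaches the maximum value $N$, so a lower bound well below $N$ leaves room for superdirective solutions while still limiting the amplification of uncorrelated noise.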

Figure 7b shows the frequency response of both arrays according to the configuration that is shown in Figure 3. The responses for both arrays were calculated for a sound source emanating from $ϕ = 30^{\circ}$ (resulting in a sound source perceived at the location of the left loudspeaker, solid and dashed line) according to Figure 3 and $ϕ = 0^{\circ}$ (resulting in a phantom source between both speakers, dotted and dash-dotted line). It can be seen that for $ϕ = 0^{\circ}$, the responses of both arrays show a high similarity in terms of level differences and have only minor fluctuations of approximately $\pm 2$ dB above $1000$ Hz. Below $1000$ Hz, there is a boost of approximately $3$ dB, which might be attributed to a violation of a constraint at low frequencies. When the sound source is emanating from $ϕ = 30^{\circ}$, a flat frequency response can be observed for Array 1 (on axis) with minor fluctuations of approximately $1$ dB across frequency. Array 2 shows a considerably lower level, but larger fluctuations. It can be assumed that these fluctuations will not be perceivable because the location of the sound source will be determined by Array 1.

4.4. Beam and Stop Area Constraints

The results of the loop design mentioned in Section 2.5 are shown in Figure 8. This loop design finds a compromise between a good fit in the beam area and low directivity pattern values in the stop area.

For low frequencies $f_p < 187.5$ Hz, the directivity pattern is quite omnidirectional ($\| error_{stop} \|_{\infty} > 0.2$ and $\| error_{beam} \|_{\infty} > 0.1$), so that Loop 3 has to increase $ϵ_{beam}$ to $\| ϵ_{beam}^{opt} \|_{\infty} > 0.1$. For higher frequencies $f_p \geq 187.5$ Hz, there is a good fit in the beam area, $\| error_{beam} \|_{\infty} \leq 0.1$, so that Loop 1 and Loop 2 find the ideal upper bound for the beam and the stop area. Overall, the best result is found in the frequency range of $281.3\ \text{Hz} \leq f_p \leq 1969\ \text{Hz}$: a good fit in the beam area combined with low directivity pattern values in the stop area, $\| error_{stop} \|_{\infty} \leq 0.2$. At high frequencies ($f_p \geq 16{,}690$ Hz), Figure 8b shows aliasing effects ($\| error_{stop} \|_{\infty} = 1$), which are expected, since the aliasing frequency of the logarithmically-spaced array is $f_{al}(Δx = s) \approx 17{,}000$ Hz.

Figure 9 shows the polar plot of the desired directivity pattern in addition to the absolute value of the directivity patterns at the frequencies $f_p = 250$ Hz, $f_p = 1000$ Hz, $f_p = 4000$ Hz and $f_p = 8000$ Hz. For all frequencies, there is a good fit (a small difference between desired and obtained directivity pattern) in the beam area, as already quantified by Figure 8a. Comparing the side-lobe levels of the different frequencies, the following can be stated: the side-lobe level decreases from $f_p = 250$ Hz to $1000$ Hz; there is no large difference in side-lobe level between $f_p = 1000$ Hz and $f_p = 4000$ Hz; the side-lobe level increases from $f_p = 4000$ Hz to $f_p = 8000$ Hz. This analysis is presented in a quantified manner in Figure 8b.

Figure 10a allows for a more detailed analysis, as it shows the absolute value of the difference between the directivity pattern and the desired one over the whole angular range, $| error(ϕ_m, f_p) |$. The omnidirectional behavior of the directivity pattern up to $f_p = 187.5$ Hz can also be seen there. For higher frequencies, side lobes appear at $ϕ_m = \pm 180^{\circ}$ and move with increasing frequency towards the beam, $-60^{\circ} \leq ϕ_m \leq 60^{\circ}$. Aliasing effects can be seen in Figure 10a, as in Figure 8b.

In addition to the absolute value of the directivity pattern, the phase $\arg(b(f_p, ϕ_m))$ is represented in Figure 10b.

The directivity pattern is purely real: the phase shows only three possible values, $\arg(b) \in \{-π, 0, π\}$, as mentioned in Section 2.3. In the beam area, the phase has, in fact, only values $\arg(b) = 0$, which leads to no phase differences between the two arrays in the recording configuration mentioned in Section 2.3.

5. Subjective Evaluation

In this section, the proposed microphone array is subjectively evaluated. For this purpose, a listening experiment was performed, whose results are shown.

5.1. Subjective Evaluation: Localization Accuracy

In order to evaluate the proposed stereophonic microphone array in terms of localization accuracy when simulating spatially-distributed sound sources, subjective data were obtained from listeners in a localization experiment within a real room. The loudspeaker signals were generated using a single sound source and by simulating the delays between the microphones and the sound source. The optimized filters $w^{opt}$ were applied to each microphone signal to obtain the output signal for the left and right array, which was then played back via the two loudspeakers during the listening experiment. The loudspeaker and array configurations are shown in Figure 3.
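The simulation step described above (propagation delays to each microphone, followed by per-microphone filtering and summation) can be sketched in the frequency domain for a single frequency. The sketch assumes a far-field plane wave, a speed of sound of 340 m/s, angles measured from the array axis, and simple delay-and-sum weights in place of the optimized filters $w^{opt}$, which are not reproduced here; all function names are hypothetical.

```python
import numpy as np

C = 340.0  # assumed speed of sound in m/s
X = np.array([-0.500, -0.150, -0.043, -0.010, 0.0,
              0.010, 0.043, 0.150, 0.500])  # microphone positions in metres

def steering_vector(f, phi_deg, x=X, c=C):
    """Phase factors of a plane wave from angle phi_deg (relative to the array axis)."""
    tau = x * np.cos(np.deg2rad(phi_deg)) / c   # relative delay at each microphone
    return np.exp(-2j * np.pi * f * tau)

def directivity_pattern(w, f, phis_deg):
    """Filter each microphone signal with w_n and sum, for every angle in phis_deg."""
    return np.array([np.sum(w * steering_vector(f, p)) for p in phis_deg])

f = 1000.0
# Stand-in weights: delay-and-sum steered to 0 degrees (NOT the optimized filters)
w = np.conj(steering_vector(f, 0.0)) / X.size
phis = np.arange(-180, 181, 5)
b = directivity_pattern(w, f, phis)
print(np.abs(b[phis == 0][0]))
```

Replacing `w` with the optimized filter coefficients for each frequency bin and for each of the two arrays would yield the two loudspeaker spectra for a given source angle, under the stated assumptions.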

The sound sources were placed at virtual locations between $-30^{\circ}$ and $+30^{\circ}$ with a resolution of five degrees, resulting in a phantom source stereo image based on intensity panning between the left and the right loudspeakers. The evaluation took place in a reverberant room with the dimensions $(7.5, 7.1, 2.97)$ m and a reverberation time of $T_{60} = 0.45$ s. The distance between the loudspeakers was $3$ m, and the listeners were seated at the position that created a $60^{\circ}$ stereo triangle with the loudspeakers (cf. Figure 3). As a source signal, three short pink noise bursts with a total length of $1.1$ s were presented to the listeners. The noise covered a frequency range from $100$ Hz to $f_s / 2$, covering the spectral content of musical signals. Data were obtained from seven listeners, and the 13 source position angles were presented in random order. For each subject, the experiment comprised one training session and three measurement sessions. The task of the participants was to indicate the perceived direction between the loudspeakers using indicators placed between the loudspeakers in five-degree steps.
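The numbers in this setup are mutually consistent, as the short check below shows. It assumes that the $60^{\circ}$ stereo triangle places the loudspeakers at $\pm 30^{\circ}$ as seen from the listener; the variable names are illustrative.

```python
import numpy as np

# Source angles from -30 to +30 degrees in five-degree steps
source_angles = np.arange(-30, 31, 5)

# Listener geometry for a 60-degree stereo triangle with 3 m loudspeaker distance
speaker_distance = 3.0                                      # metres between loudspeakers
half_angle = np.deg2rad(30.0)                               # assumed +/- 30 degrees
listener_to_speaker = (speaker_distance / 2) / np.sin(half_angle)
listener_to_baseline = (speaker_distance / 2) / np.tan(half_angle)

print(source_angles.size, listener_to_speaker, listener_to_baseline)
```

The grid contains 13 angles, matching the number of source positions stated above, and the listener sits 3 m from each loudspeaker, about 2.6 m in front of the loudspeaker baseline.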

5.2. Subjective Evaluation: Results

Figure 11 shows the perceived directions of the subjective evaluation. The dotted line indicates perfect correspondence between the true source location and the perceived location. Circles show the average perceived location as a function of the simulated source location. As can be seen, the localization behaves rather linearly, indicating a mostly precise representation of the presented directions. Exceptions can be observed around $\pm 20$ degrees, at which the presented source is perceived more laterally than the simulated source location. The maximum observed localization error of ≈6 degrees can probably be attributed to the target functions that were used to optimize the directivity pattern, which may cause level differences that are too high when both arrays are used in combination.

6. Discussion and Conclusions

In this study, a new approach for intensity stereophonic recording has been investigated. Guided by the playback situation and its auditory requirements, we decided to postulate a setup consisting of two crossed end-fire microphone arrays and a fitting desired directivity pattern. The difference between the directivity pattern obtained and the one desired was minimized by a superdirective beamforming algorithm. It was based on convex numeric optimization and also contains a frequency-dependent WNG constraint to ensure a robust beamforming design.

In addition to designing the filters of the microphones via beamforming algorithms, we found an ideal array design. This design maximizes the spatial aliasing frequency and also takes into account practical issues that will arise in a physical realization of the arrays. The physical size of the microphones demands a particular minimum spacing, which also avoids interference between them.

A comparison between the new stereo system and the state-of-the-art ones, which use two microphones, has shown that the former has the advantage of less recorded reverberant sound, as it is more directive in the look direction than the latter are. This matches the requirements posed by the recording method proposed in Grosse and van de Par [3], which requires separate dry and reverberated representations of the audio signal. The reverberated sound field can be taken from single microphone signals.

Future research could develop a method to optimize the directivity pattern of both arrays as one system rather than handling them separately. Furthermore, two additional beams pointing into the diffuse field could be introduced for optimization to replace the two microphones placed in that field and to use only the array system.

A final assessment of the proposed recording and playback system requires listening tests that investigate the perception of the recording room and the playback room.

Acknowledgments

We would like to thank the Deutsche Forschungsgemeinschaft for supporting this work as part of the Forschergruppe Individualisierte Hoerakustik (FOR-1732). We also would like to thank the reviewers for their helpful and insightful comments.

Author Contributions

Steven van de Par and Julian Grosse formulated the constraints for the true stereo microphone array. Jonathan Albert Gößwein developed and evaluated the methods for optimizing the true stereo microphone array. Julian Grosse planned and performed the localization experiment.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

WNG	White Noise Gain
CLSB	Constrained Least-Squares Beamformer
CSI	Cubic Spline Interpolation

References

1. Berkhout, A.J. A holographic approach to acoustic control. J. Audio Eng. Soc. 1988, 36, 977–995.
2. Gerzon, M.A. Periphony: With-Height sound reproduction. J. Audio Eng. Soc. 1973, 21, 2–10.
3. Grosse, J.; van de Par, S. Perceptually accurate reproduction of recorded sound fields in a reverberant room using spatially distributed loudspeakers. IEEE J. Sel. Top. Signal Process. 2015, 9, 867–880.
4. Schroeder, M.R. Statistical parameters of the frequency response curves of large rooms. J. Audio Eng. Soc. 1987, 35, 299–306.
5. Haeussler, A.; van de Par, S. Theoretischer und subjektiver Einfluss des Aufnahmeraumes auf den Wiedergaberaum. In Proceedings of the 40th DAGA’14 Jahrestagung fuer Akustik, Oldenburg, Germany, 10–13 March 2014.
6. Williams, M. Unified theory of microphone systems for stereophonic sound recording. In Proceedings of the 82nd Audio Engineering Society Convention, London, UK, 10–13 March 1987.
7. Mabande, E.; Schad, A.; Kellermann, W. Design of robust superdirective beamformers as a convex optimization problem. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 77–80.
8. Frost, O.L. An algorithm for linearly constrained adaptive array processing. Proc. IEEE 1972, 60, 926–935.
9. Pulkki, V. Compensating displacement of amplitude-panned virtual sources. In Proceedings of the Audio Engineering Society Conference: 22nd International Conference: Virtual, Synthetic, and Entertainment Audio, Espoo, Finland, 15–17 June 2002.
10. McCowan, I.A. Robust Speech Recognition using Microphone Arrays. Ph.D. Thesis, Queensland University of Technology, Brisbane City, QLD, Australia, 2001.
11. Corteel, E. On the use of irregularly spaced loudspeaker arrays for wave field synthesis, potential impact in spatial aliasing frequency. In Proceedings of the 9th International Conference on Digital Audio Effects (DAFx’06), Montreal, QC, Canada, 18–20 September 2006; pp. 209–214.
12. Dickreiter, M.; Dittel, V.; Hoeg, W.; Woehr, M. Handbuch der Tonstudiotechnik, 7th ed.; K. G. Sauer Verlag: München, Germany, 2008; Volume 1.
13. Grant, M.; Boyd, S. CVX: Matlab Software for Disciplined Convex Programming, version 2.1. 2014. Available online: http://cvxr.com/cvx (accessed on 18 May 2017).
14. Grant, M.; Boyd, S. Graph implementations for nonsmooth convex programs. In Recent Advances in Learning and Control: Lecture Notes in Control and Information Sciences; Blondel, V., Boyd, S., Kimura, H., Eds.; Springer: New York, NY, USA, 2008; pp. 95–110. Available online: http://stanford.edu/~boyd/graphdcp.html (accessed on 18 May 2017).
15. Kinsler, L.; Frey, A.; Coppens, A.; Sanders, J. Fundamentals of Acoustics; John Wiley and Sons, Inc.: New York, NY, USA, 2000.

Figure 1. Recording and playback configuration with a processing stage in between to maintain the acoustical perception of a recording room. The microphone $(C)$ records the direct sound, which is played later by two conventional loudspeakers, whereas the two microphones $(B^{l})$ and $(B^{r})$ record the reverberant sound field, which is played later by two dipole loudspeakers. Figure reproduced with permission from [3], Copyright IEEE, 2015.

Figure 2. Microphone array receiving a signal with frequency f and angle of incidence $ϕ$. The incoming wavefront is captured with a microphone n, modified with the respective filter $w_{n}$ and, at the end, summed up to form the directivity pattern $b(f, ϕ)$.

Figure 3. The stereophonic recording configuration is based on the playback one. Recorded level and phase differences with the two end-fire microphone arrays generate a phantom source between the two loudspeakers in the playback configuration. The signal emitted from Loudspeaker 1 has the level $Level_{1}$ and the phase $phase_{1}$. The signal emitted from Loudspeaker 2 has the level $Level_{2}$ and the phase $phase_{2}$. (a) Typical stereophonic playback configuration [9]; (b) proposed stereophonic recording configuration with sketched microphone positions. The absolute microphone positions are shown in Section 3.

Figure 4. Different spatial areas in the directivity pattern optimization problem. The steering angle $ϕ_{steer}$, the beam area $ϕ_{beam}$ (indicated by horizontal hash lines), an area without any constraints $ϕ_{unconstrained}$ (indicated by crossed hash lines) and the stop area $ϕ_{stop}$ (indicated by vertical hash lines).

Figure 5. Loop design to determine the optimal filter, as well as the optimal upper bound for the beam and the stop area.

Figure 6. Directivity index $DI(f_{p})$ of a linearly-spaced array ($l_{spread} = 1$, $s = 0.125$ m) (dashed line) and the logarithmically-spaced one ($l_{spread} \approx 35$, $s = 0.01$ m) (solid line) with the same total length of $Length = 1$ m.

Figure 7. (a) White Noise Gain (WNG) $A(f_{p})$, as well as the lower bound for the WNG $γ(f_{p})$ across frequency; (b) frequency responses of both arrays for two sound sources emanating from $ϕ = 30^{\circ}$ and $ϕ = 0^{\circ}$ according to the configuration illustrated in Figure 3.

Figure 8. The difference between the simulated directivity pattern and the desired one ($error$) in the beam (a) and the stop (b) area, as well as the corresponding upper bounds of both areas as a function of frequency.

Figure 9. Polar plot of the desired directivity pattern (grey markers) and the absolute value of the obtained directivity patterns at the frequencies $f_{p} = 250$ Hz (solid line), $f_{p} = 1000$ Hz (dashed line), $f_{p} = 4000$ Hz (dashed-dotted line) and $f_{p} = 8000$ Hz (dotted line).

Figure 10. The difference between the directivity pattern and the desired one, $| error(f_{p}, ϕ_{m}) |$ (a), as well as the phase of the directivity pattern, $\arg(b(f_{p}, ϕ_{m}))$ (b).

Figure 11. Mean values of the perceived angle of incidence with the standard deviation across seven participants’ means. The x-axis represents the simulated angle of incidence $ϕ$ of the presented noise sources. The dotted line indicates a perfect match between simulated and perceived localization.

Table 1. The modified directivity index $DI_{mod}$ of the state-of-the-art two-microphone stereo systems and the microphone array stereo system described in this study. For the latter, the desired directivity patterns are used. Only stereo systems with $ϕ_{rec} = ϕ_{base}$ are displayed. This angle constraint avoids angular compression or angular expansion in the playback situation.

Two-Microphone Stereo Systems
Microphone Directivity	Angle between the Microphones ($^{\circ}$)	$DI_{mod}$ (dB)
Figure of Eight	101	$5.95$
Hypercardioid (back attenuation $= -6$ dB)	136	$8.29$
Hypercardioid (back attenuation $= -10$ dB)	156	$8.7$
Microphone Array Stereo System
$DI_{mod} = 11.29$ with $b_{mic1}(ϕ_{m}) = \hat{b}_{array1}(ϕ_{m})$ and $b_{mic2}(ϕ_{m}) = \hat{b}_{array2}(ϕ_{m})$

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
