Nested Sampling for Detection and Localization of Sound Sources Using a Spherical Microphone Array

Xiang, Ning; Jasa, Tomislav

doi:10.3390/psf2023009026

Ning Xiang ¹,* and Tomislav Jasa ²

¹ Graduate Program in Architectural Acoustics, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
² Thalgorithm Research, San Jose, CA 95134, USA
* Author to whom correspondence should be addressed.

Presented at the 42nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Garching, Germany, 3–7 July 2023.

Phys. Sci. Forum 2023, 9(1), 26; https://doi.org/10.3390/psf2023009026

Published: 20 May 2024

(This article belongs to the Proceedings of The 42nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering)


Abstract

Since its inception in 2004, nested sampling has been used in acoustics applications. This work applies nested sampling within a Bayesian framework to the detection and localization of sound sources using a spherical microphone array. Going beyond existing work, this source localization task relies on spherical harmonics to establish parametric models that distinguish the background sound environment from the presence of sound sources. Upon a positive detection, the parametric models are also used to estimate an unknown number of potentially multiple sound sources. For the purpose of source detection, a no-source scenario needs to be considered in addition to the presence of at least one sound source. Specifically, the spherical microphone array senses the sound environment. For source detection, the acoustic data are analyzed via spherical Fourier transforms using a Bayesian comparison of two models, one accounting for the absence and one for the presence of sound sources. Upon a positive detection, potentially multiple source models are used to analyze directions of arrival (DoAs), with Bayesian model selection for sound source enumeration and Bayesian parameter estimation for localization. These two levels of inference (enumeration and localization) are necessary to correctly localize potentially multiple sound sources. This paper discusses an efficient implementation of the nested sampling algorithm applied to sound source detection and localization within the Bayesian framework.

Keywords:

nested sampling; Bayesian model comparison; Bayesian model selection; parameter estimation; sound source detection; sound source localization

1. Introduction

Nested sampling (NS) was introduced by Skilling [1] as a numerical method for efficient Bayesian calculations. Soon afterward, this method was applied to acoustics problems [2], where Jasa and Xiang explored the Lebesgue integral as the mathematical foundation of the NS algorithm. Since then, that effort has resulted in a series of publications on acoustic applications [3,4,5]. This paper showcases a recent application of the NS algorithm to sound source detection and localization within a Bayesian framework. To detect and localize sound sources, the sound environment is sensed by a spherical microphone array whose signals are processed using a spherical Fourier transform. Spherical harmonics are exploited to process the acoustic data and to formulate the signal models. This paper emphasizes that source detection represents a model comparison problem, source enumeration represents a model selection problem, and source localization represents a parameter estimation problem, all of which can be efficiently accomplished within the Bayesian framework using the NS algorithm.

This paper presents a further development of the previous work [6] in that a background model for a no-source scenario needs to be established. The source detection problem is critically based on the model comparison between the no-source and one-source models. The background model requires special attention in its handling of the spherical harmonics, which is dealt with separately in Section 2.2. In addition to these model improvements, spherical harmonic components up to the fourth order have been achieved through the further development of a 32-channel microphone array, as illustrated in Figure 1.

2. Spherical Microphone Data and Models

This section briefly introduces the data processing and the prediction models used for the sound source detection, enumeration, and direction of arrival estimations.

2.1. Microphone Array Data

When $Q$ microphones are flush-mounted nearly equidistantly on a rigid sphere of radius $a$, the sound pressure signals $\underline{P}_{mic}$ are processed by

$$D(\Theta) = \frac{4\pi}{Q} \sum_{n=0}^{N} \sum_{m=-n}^{n} \frac{\underline{Y}_{n}^{m}(\Theta)}{B(ka)} \sum_{q=1}^{Q} \underline{P}_{mic}(\Theta_{q})\, \underline{Y}_{n}^{m}(\Theta_{q})^{*},$$ (1)

with the third sum over $q = 1, \dots, Q$ being a spherical Fourier transform of the $Q$ microphone signals $\underline{P}_{mic}(\Theta_{q})$ at angular positions $\Theta_{q}$, and the symbol $*$ standing for the complex conjugate. The function $B(ka)$ is the modal strength of the rigid sphere of radius $a$,

$$B(ka) = j k \left[ j_{n}(ka) - \frac{j_{n}'(ka)}{h_{n}'(ka)}\, h_{n}(ka) \right],$$ (2)

where $j = \sqrt{-1}$ and $k = \omega / c$ is the propagation coefficient of sound waves. The functions $j_{n}(\cdot)$ and $h_{n}(\cdot)$ are the spherical Bessel and Hankel functions, and $j_{n}'(\cdot)$ and $h_{n}'(\cdot)$ are their respective derivatives. $\Theta = \{\theta, \phi\}$ collectively represents the elevation and azimuth angles, while $\Theta_{q}$ specifies the $Q$ microphone locations flush-mounted on the surface of the rigid sphere of radius $a$. Figure 1 shows the spherical microphone array of $Q = 32$ channels developed for this research. The spherical array is built upon a rigid sphere of radius $a = 3.5$ cm. In the following, we denote $D = \{D(\Theta)\}$ as a two-dimensional matrix (vectors) representing the experimental data in the context of Bayesian inferential inversion.

2.2. Prediction Models

In Equation (1), $\underline{Y}_{n}^{m}(\Theta)$ denotes the spherical harmonics of order $n$ and degree $m$. They are orthonormal and complete in the sense that

$$g(\Theta_{s}, \Theta) = 2\sqrt{\pi} \sum_{n=0}^{N} \sum_{m=-n}^{n} \underline{Y}_{n}^{m}(\Theta_{s})^{*}\, \underline{Y}_{n}^{m}(\Theta) \to \delta(\cos\theta - \cos\theta_{s})\, \delta(\phi - \phi_{s}),$$ (3)

when $N \to \infty$. $\Theta_{s}$ is the source angle in the form of the direction of arrival (DoA), and $\delta(\cdot)$ represents the Dirac delta function. Using Equation (3), we establish a predictive model of spherical beamforming as

$$M_{S}(\Theta_{S}, \Theta) = \sum_{s=1}^{S} \frac{A_{s}\, g^{2}(\Theta_{s}, \Theta)}{\max[g^{2}(\Theta_{s}, \Theta)]},$$ (4)

where $A_{s}$ accounts for the different energy strengths of the individual sound sources. Note that $\Theta_{S} = \{A_{1}, A_{2}, \dots, A_{S}, \Theta_{1}, \Theta_{2}, \dots, \Theta_{S}\}$ collectively denotes both the strength vector $A_{S}$ and the angular directions (vectors) of the $S$ sound sources, and each angular vector contains one pair of elevation and azimuth angles, $\Theta_{s} = \{\theta_{s}, \phi_{s}\}$. The variable $\Theta$ represents the angular range over which possible sound sources are to be localized. For $S \geq 1$, the kernel function $g(\Theta_{s}, \Theta)$ in Equation (3) is evaluated up to an order $N$ that satisfies $(N + 1)^{2} \leq Q$. This means that the integer-valued order $N$ of the spherical harmonics is finite, limited by the number of microphone channels. Figure 2 illustrates a superposition of two simultaneous sources of equal amplitude predicted by the model kernel in Equation (3) for $N = 4$, before the squaring operation that yields the source energy. The finite upper order $N$ is responsible for the finite width of the lobes, which would otherwise be needle-shaped. Figure 3a illustrates the spherical microphone data for the presence of two simultaneous sound sources, processed using Equations (1) and (2), while Figure 3b illustrates the predicted map of the two simultaneous sound sources using the model in Equation (4). The angular range is evaluated over $\Theta = \{0 \leq \theta \leq 180^{\circ};\ 0 \leq \phi \leq 360^{\circ}\}$.
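As an illustration of Equations (3) and (4), the following minimal sketch evaluates the truncated kernel and the normalized source model in plain Python. It is not the authors' implementation. It assumes that the elevation $\theta$ is measured from the pole (consistent with the range $0$ to $180^{\circ}$) and that the spherical harmonics follow the standard orthonormal convention, so that the inner sum over $m$ can be collapsed with the addition theorem, $\sum_{m} \underline{Y}_{n}^{m}(\Theta_{s})^{*} \underline{Y}_{n}^{m}(\Theta) = \frac{2n+1}{4\pi} P_{n}(\cos\gamma)$, where $\gamma$ is the angle between the two directions. All function names are hypothetical.

```python
import math


def max_order(num_mics):
    """Largest integer N with (N + 1)**2 <= num_mics (order limit below Eq. (4))."""
    return math.isqrt(num_mics) - 1


def legendre(n_max, x):
    """Legendre polynomials P_0(x) .. P_n_max(x) via the three-term recurrence."""
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]


def cos_angle(theta_s, phi_s, theta, phi):
    """Cosine of the angle between two directions (radians).

    Assumption: theta is the angle measured from the pole, phi the azimuth.
    """
    return (math.cos(theta_s) * math.cos(theta)
            + math.sin(theta_s) * math.sin(theta) * math.cos(phi - phi_s))


def kernel(n_max, theta_s, phi_s, theta, phi):
    """Truncated kernel g of Eq. (3); the sum over m uses the addition theorem."""
    p = legendre(n_max, cos_angle(theta_s, phi_s, theta, phi))
    total = sum((2 * n + 1) / (4.0 * math.pi) * p[n] for n in range(n_max + 1))
    return 2.0 * math.sqrt(math.pi) * total


def model(sources, n_max, theta, phi):
    """Eq. (4): sources is a list of (A_s, theta_s, phi_s) tuples."""
    # |P_n| <= 1 on [-1, 1], so g is largest where both directions coincide.
    g_max = kernel(n_max, 0.0, 0.0, 0.0, 0.0)
    return sum(a * (kernel(n_max, ts, ps, theta, phi) / g_max) ** 2
               for a, ts, ps in sources)


if __name__ == "__main__":
    n = max_order(32)  # 32-channel array
    # Two hypothetical sources of equal strength, evaluated at the first source direction.
    two_sources = [(1.0, math.radians(60), math.radians(90)),
                   (1.0, math.radians(120), math.radians(250))]
    print(n, model(two_sources, n, math.radians(60), math.radians(90)))
```

Under these assumptions each normalized lobe peaks at exactly $A_{s}$ in its own source direction, and the lobe width is governed solely by the truncation order $N$.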

When processing the microphone array data, there is no prior knowledge as to whether sound sources are present in or absent from the incoming sound field. It does not make sense to pursue direction of arrival analysis if no sound sources are present in the incoming microphone signals. For the model-based Bayesian detection, we need to establish a background model. Special attention has to be given to the spherical harmonics processing in this case. Specifically, $M_{0}$ represents the no-source model for $S = 0$. In this case, the kernel function $g(\Theta_{0}, \Theta)$ in Equation (3) is only calculated for $N = 0$, namely the zero order of the spherical harmonics:

$$M_{0}(\Theta) = \frac{A_{0}\, g^{2}(\Theta)}{\max[g^{2}(\Theta)]},$$ (5)

where the direction of 'no-source' $\Theta_{0}$ is irrelevant over the angular range $\Theta$. For notational purposes, we collectively denote $M_{S} = \{M_{0}, M_{1}, \dots, M_{S}\}$ as the prediction models for the direction of arrival analysis, while we denote $M_{D} = \{M_{0}, M_{1}\}$, a small subset of $M_{S}$, for the sound source detection.
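To see why the direction $\Theta_{0}$ drops out of Equation (5), it may help to write out the zero-order kernel explicitly. The short derivation below is a sketch that assumes the standard orthonormal convention, for which $\underline{Y}_{0}^{0} = 1/(2\sqrt{\pi})$ is a constant:

```latex
g(\Theta_{0}, \Theta)\Big|_{N=0}
  = 2\sqrt{\pi}\; \underline{Y}_{0}^{0}(\Theta_{0})^{*}\, \underline{Y}_{0}^{0}(\Theta)
  = 2\sqrt{\pi} \cdot \frac{1}{4\pi}
  = \frac{1}{2\sqrt{\pi}},
\qquad
M_{0}(\Theta) = \frac{A_{0}\, g^{2}}{\max[g^{2}]} = A_{0} .
```

Under this assumption the background model is a direction-independent map of constant level $A_{0}$, which leaves $A_{0}$ as its only free parameter.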

3. Bayesian Calculations

Given the data $D$ as formulated in Section 2.1 and the prediction models $M_{S}(\Theta_{S})$ in Section 2.2, this work relies on Bayes' theorem:

$$\begin{matrix} p(\Theta_{S} | D, M_{S}) & \times & Z & = & L(\Theta_{S}) & \times & \Pi(\Theta_{S}), \\ \text{posterior} & \times & \text{evidence} & = & \text{likelihood} & \times & \text{prior}, \end{matrix}$$ (6)

with $\Pi(\Theta_{S}) = p(\Theta_{S} | M_{S})$ being the prior probability and $L(\Theta_{S}) = p(D | Θ_{S}, M_{S})$ being the likelihood function. Both the prior and the likelihood are probabilities that need to be assigned a priori. This work applies the principle of maximum entropy (MaxEnt), which leads to a uniform prior and a Student's t-distribution for the likelihood (see Ref. [6] for details). The evidence $Z$ in Equation (6) plays a central role in the source detection and source enumeration problems and is determined by

$$Z = \int_{\Theta_{S}} L(\Theta_{S})\, \Pi(\Theta_{S})\, d\Theta_{S} = \int_{0}^{1} L(\mu)\, d\mu,$$ (7)

where

$$\mu(L_{\epsilon}) = \int_{L(\Theta_{S}) > L_{\epsilon}} \Pi(\Theta_{S})\, d\Theta_{S}$$ (8)

is the prior mass with $L(\mu(L_{\epsilon})) = L_{\epsilon}$ and $0 \leq L_{\epsilon} \leq L_{\max}$, as derived by Skilling [7]. The NS algorithm generates a monotonically increasing partition of the likelihood range $[0, L_{\max}]$,

$$0 < L_{0} < L_{1} < \dots < L_{T-1} < L_{T} < L_{\max},$$ (9)

via constrained sampling such that $L_{t}$ with $t \in \{0, 1, \dots, T\}$ is sampled from the domain $\{\Theta_{S} : L(\Theta_{S}) > L_{t-1}\}$. Observe that as $L_{\epsilon}$ increases from 0 to $L_{\max}$, $\mu_{\epsilon}$ decreases from 1 to 0, where $\mu_{\epsilon} = \mu(L_{\epsilon})$, and the partition of Equation (9) generates the monotonically decreasing sequence

$$1 > \mu_{0} > \mu_{1} > \dots > \mu_{T-1} > \mu_{T} > 0.$$ (10)

Using the sequences in Equations (9) and (10), the one-dimensional integration on the far-right-hand side of Equation (7) is well approximated by

$$Z \leftarrow \sum_{t=0}^{T} L_{t}\, \Delta\mu_{t},$$ (11)

with

$$\Delta\mu_{t} = \mu_{t} - \mu_{t+1}, \quad \text{or} \quad \Delta\mu_{t} = \mu_{t-1} - \mu_{t}.$$ (12)

Skilling [7] pointed out that the constrained prior mass is a statistical quantity and follows a shrinkage of

$$\mu_{t} \approx e^{-t/P},$$ (13)

after $t$ iterations, with the integer $P$ being the number of initial random samples (the population size). A detailed proof of this result was given using order statistics in Appendix B of Jasa and Xiang [3].

NS was shown by Jasa and Xiang [3] to be a numerical implementation of Lebesgue integration, where Equation (11) represents the sum of weighted integrands of simple functions that are generated by partitioning the range rather than the domain of the function. An early account of this connection of the NS algorithm to Lebesgue integration can also be found in Jasa and Xiang [2], published in the Proceedings of the 25th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2005).
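The procedure in Equations (9) to (13) can be sketched on a toy problem as follows. This is a minimal illustration rather than the implementation used in this work: it assumes a one-dimensional uniform prior on [0, 1] and a Gaussian-shaped likelihood (not the Student's t likelihood of this paper), and it performs the constrained sampling by simple rejection from the prior, which is only practical for such small examples. All names and the chosen values of the population size and iteration count are hypothetical.

```python
import math
import random

SIGMA = 0.05  # width of the toy likelihood (assumption for this example)


def gaussian_likelihood(x):
    """Toy likelihood peaked at x = 0.5 with L_max = 1."""
    return math.exp(-0.5 * ((x - 0.5) / SIGMA) ** 2)


def uniform_prior(rng):
    """Draw one sample from the uniform prior on [0, 1]."""
    return rng.random()


def exact_evidence():
    """Closed-form Z for the toy problem, used only as a reference value."""
    return SIGMA * math.sqrt(2.0 * math.pi) * math.erf(0.5 / (SIGMA * math.sqrt(2.0)))


def nested_sampling_evidence(likelihood, sample_prior, n_live=100, n_iter=600, seed=0):
    """Estimate Z = sum_t L_t * (mu_{t-1} - mu_t) with mu_t = exp(-t / P)."""
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]  # initial population of size P
    live_l = [likelihood(x) for x in live]
    z, mu_prev = 0.0, 1.0
    for t in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_l.__getitem__)
        l_t = live_l[worst]               # next level L_t of the partition, Eq. (9)
        mu_t = math.exp(-t / n_live)      # expected prior-mass shrinkage, Eq. (13)
        z += l_t * (mu_prev - mu_t)       # one term of the sum in Eq. (11)
        mu_prev = mu_t
        # Constrained sampling: redraw from the prior until L exceeds L_t.
        while True:
            x = sample_prior(rng)
            l_new = likelihood(x)
            if l_new > l_t:
                break
        live[worst], live_l[worst] = x, l_new
    # Remaining prior mass, weighted by the mean likelihood of the live points.
    return z + mu_prev * sum(live_l) / n_live


if __name__ == "__main__":
    z_est = nested_sampling_evidence(gaussian_likelihood, uniform_prior)
    print("NS estimate:", z_est, "exact:", exact_evidence())
```

The estimate carries statistical scatter because Equation (13) is used in place of the unknown true prior masses; a larger population $P$ reduces that scatter at the cost of more likelihood evaluations.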

4. Sound Source Detection, Enumeration, and Localization

The calculation described above is implemented with an initial population of $P = 500$ [in Equation (13)], drawn from uniform priors over the ranges of all pending parameters, including the sound source strength $A_{s}$, the elevation angle $\theta_{s}$, and the azimuth angle $\phi_{s}$. For up to four potentially simultaneous sound sources ($S \leq 4$), the parameter space has up to $3 \times S$ dimensions. NS is applied to estimate Bayes factors for model orders from 1 to 4 via the evidence. The Bayes factors are used to rank the candidate models accounting for an unknown number of simultaneous sound sources, in a so-called sound source enumeration process. This process has been described previously by Xiang and Landschoot [6]; it is followed by a DoA analysis based on the selected model $M_{S}$, carried out by Bayesian parameter estimation. On critical examination, the authors recognize that the source enumeration, even though it represents a higher level of inference via Bayesian model selection, would still be incomplete if the machine sensory modality, in this application a spherical microphone array, did not take into account that the absence of sound sources often represents the predominant portion of the sound environment in practical scenarios. It only makes sense to pursue sound source enumeration and DoA analysis once a sound source has been detected.

In the current work, the sound source detection is carried out using Bayesian model comparison. The prediction models $M_{D} = \{M_{0}, M_{1}\}$ involve only two models: $M_{0}$ in Equation (5) and $M_{1}$ in Equation (4) for $S = 1$. Note that Equation (5) is described separately because $g^{2}(\Theta_{0}, \Theta)$ needs special attention: its spherical order is set to $N = 0$, whereas for $M_{S}$ with $S \geq 1$ the spherical order is $N = 4$, as permitted by the 32-channel spherical microphone array used for this work.

For the source detection, the Bayesian evidence is estimated using NS for the 'no-source' model $M_{0}$ and compared against that for the 'one-source' model $M_{1}$. Specifically, for $M_{0}$-based sampling, there is still one pending parameter, $A_{0}$, to sample. Figure 4 (left) shows an experimental investigation in which the microphone array data contain no sources but only noisy background signals. The evidence estimated using NS for $M_{0}$ differs insignificantly from that for $M_{1}$, indicating that the source detection is negative. Figure 4 (right) shows that if the microphone array data contain an as yet unknown number of sound sources, the evidence estimates differ significantly from those of the 'no-source' case. The source detection is then positive.

Upon a positive detection of sound sources, a further process involves Bayesian model selection. A set of sound source models from $M_{0}$ to $M_{4}$ is used to estimate the Bayes factors

$$B_{i, i-1} = 10 \lg\left(\frac{Z_{i}}{Z_{i-1}}\right) \quad \text{[decibans]}$$ (14)

for $i = 1, 2, \dots, 4$. Figure 5 illustrates one set of Bayes factor estimations. In this work, the evidence and Bayes factors are calculated in units of [decibans], denoting 10 times the base-10 logarithm [$10 \lg$], in honor of Thomas Bayes [8]. In this case, the source enumeration using Bayesian model selection suggests that two sources are contained in the data. At this stage, the interest in specific DoA parameters is pushed into the background. Upon the selection of a two-source model using NS, the exploration samples generated during the iterative NS process also provide posterior samples for the model $M_{2}$ as a byproduct; they are readily available once the evidence $Z_{2}$ for two sound sources is sufficiently explored. The posterior samples provide parameter estimates of the two sound sources in terms of the source strengths $A_{2}$ and the angular parameters $\Theta_{2}$. The data processed using Equation (1) and the model prediction according to the posterior estimates are compared in Figure 3.
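The conversion in Equation (14) can be sketched in a few lines of Python. The sketch assumes that the evidences are stored as natural logarithms, which is common practice because raw evidence values can underflow; the function name and the example values are hypothetical and are not results from this work.

```python
import math


def bayes_factors_decibans(log_evidences):
    """Eq. (14) from natural-log evidences ln Z_0, ln Z_1, ...

    Returns [B_{1,0}, B_{2,1}, ...] with B_{i,i-1} = 10 * lg(Z_i / Z_{i-1}).
    """
    scale = 10.0 / math.log(10.0)  # converts a natural-log difference to decibans
    return [scale * (hi - lo)
            for lo, hi in zip(log_evidences, log_evidences[1:])]


if __name__ == "__main__":
    # Hypothetical ln Z values for models M_0 .. M_4 (illustration only).
    ln_z = [-120.0, -95.0, -60.0, -59.5, -59.8]
    for i, b in enumerate(bayes_factors_decibans(ln_z), start=1):
        print(f"B_{i},{i - 1} = {b:.1f} decibans")
```

Working with differences of log-evidences avoids forming the ratio $Z_{i}/Z_{i-1}$ directly.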

5. Concluding Remarks

From its introduction into Bayesian calculations, Skilling's nested sampling [1] had an immediate impact on room-acoustic research [2], where an early account of the Lebesgue integral view of nested sampling was first presented to the MaxEnt community in 2005. A thorough treatment of its mathematical foundation in the Lebesgue integral was given at a later point [3]: 'Interpreting nested sampling as a statistical approximation of a Lebesgue integral opens the possibility of a large body of existing research to be applied in the analysis and possible extension of the algorithm'. Over the past 15 years, a stream of applications using NS in acoustics science and engineering has emerged. Among others, this paper reports on an acoustic application of nested sampling using a spherical microphone array within the Bayesian framework.

Author Contributions

Conceptualization, N.X.; methodology, N.X., T.J.; software, N.X.; validation, N.X.; formal analysis, N.X., T.J.; investigation, N.X.; resources, N.X.; data curation, N.X.; writing—original draft preparation, N.X.; writing—review and editing, T.J.; visualization, N.X.; supervision, N.X.; project administration, N.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors are grateful to John Skilling, Paul Goggans, and Kevin Knuth for their stimulating discussions. Stephen Weikel, Christopher Landschoot, and Thomas Metzger have contributed to parts of this work within the scope of their MS thesis projects.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DoA	Direction of arrival
NS	Nested sampling
MaxEnt	Principle of maximum entropy

References

1. Skilling, J. Nested sampling. In Proceedings of the Bayesian Inference and Maximum Entropy Methods in Science and Engineering, AIP Conference Proceedings, Garching, Germany, 25–30 July 2004; Volume 735, pp. 395–405.
2. Jasa, T.; Xiang, N. Using nested sampling in the analysis of multi-rate sound energy decay in acoustically coupled rooms. AIP Conf. Proc. 2005, 803, 189–196.
3. Jasa, T.; Xiang, N. Nested sampling applied in Bayesian room-acoustics decay analysis. J. Acoust. Soc. Am. 2012, 132, 3251–3262.
4. Botts, J.M.; Escolano, J.; Xiang, N. Design of IIR Filters With Bayesian Model Selection and Parameter Estimation. IEEE Trans. ASLP 2013, 21, 669–674.
5. Fackler, C.J.; Xiang, N.; Horoshenkov, K.V. Bayesian acoustic analysis of multilayer porous media. J. Acoust. Soc. Am. 2018, 144, 3582–3592.
6. Xiang, N.; Landschoot, C. Bayesian Inference for Acoustic Direction of Arrival Analysis Using Spherical Harmonics. Entropy 2019, 21, 579.
7. Skilling, J. Nested sampling for general Bayesian computation. Bayesian Anal. 2006, 1, 833–859.
8. Jeffreys, H. Theory of Probability; Oxford University Press: Oxford, NY, USA, 1961; Reprinted by Clarendon Press, Oxford, NY, USA, 2003.

Figure 1. Spherical microphone array of radius $a = 3.5$ cm. Altogether, 32 microphones are nearly uniformly flush-mounted over the rigid spherical surface.

Figure 2. Beamforming superposition of two sound sources using a spherical order $N = 4$.

Figure 3. Comparison between the experimental data (a) processed according to Equation (1) with the prediction model (b) in Equation (4) of two sound sources using a spherical order $N = 4$.

Figure 4. Sound source detection based on Bayesian model comparison. Bayesian evidence is estimated using both the 'no-source' model $M_{0}$ and the one-source model $M_{1}$. The evidence is expressed in unit [decibans] in honor of Thomas Bayes [8].

Figure 5. The sound source enumeration based on Bayes factor estimation. The Bayes factors are expressed in unit [decibans] in honor of Thomas Bayes [8]. A two-source model is preferred by the Bayesian model selection. The evidence estimated using nested sampling also provides the posterior as a byproduct.


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
