1. Introduction
Microphone arrays are increasingly employed in consumer electronics (e.g., for dereverberation, speech enhancement, and speech recognition), in industry (e.g., machine condition monitoring), and even in civil engineering (e.g., structural health monitoring). Among these fields, the automotive industry has particularly benefited from this technology, which is now applied to a variety of problems. In [
1], a uniform circular microphone array featuring seven capsules (of which one in the center) was employed to localize the speaker in autonomous shared vehicles. In [
2,
3], a spherical array with 32 capsules was used to auralize the car sound system of a large sedan. The same array was also employed in [
4] to spatially evaluate the sound quality inside a car cabin, while in [
5] the authors carried out similar work with the Brüel & Kjær WA-1565, a rigid spherical microphone array of 195 mm diameter equipped with 36 capsules and 12 cameras. In [6], a uniform planar micro-electro-mechanical system (MEMS) array featuring 150 capsules was used to detect pedestrians in an autonomous emergency braking system. Finally, K. Vasudev et al. attempted to enhance hands-free communication by integrating a microphone array into the seat belt [7]. The use of microphone arrays for hands-free communication inside cars has been widely studied and has produced a large body of literature, of which three relevant references are [8,9,10].
Active Noise Control (ANC) is another field where microphone arrays find relevant practical applications in the automotive industry. In [
11], linear and circular arrays with 4, 8, 16, or 32 capsules were employed to enhance noise cancelling inside a car cabin by means of the virtual sensing technique. A comprehensive review of virtual sensing algorithms for ANC can be found in [
12]. In [
13], a head-shaped array with 32 capsules and 8 cameras was used to spatially assess the effectiveness of an ANC system installed in a segment-D car. In [
14], two linear arrays of 6 microphones each were used to enhance the performance of an ANC system by means of the noise source separation technique. These examples show that geometry plays a crucial role in the design of a microphone array, since it determines the portion of space where the system exhibits optimal accuracy, such as a line, a plane, a half-space, or the full sphere. As a result, microphone arrays with a variety of shapes have been proposed. In [
15], S. J. Patel et al. discussed the optimal design for linear arrays, while in [
16] X. Wang et al. did the same for planar arrays and in [
17] J. Trevino et al. for cylindrical ones. Nevertheless, most microphone arrays are spherical. A spherical microphone array (SMA) is usually built around a rigid shell, although open-frame SMAs are also employed in some applications. An in-depth analysis of the design of open-frame SMAs is presented in [
18,
19], while four of the most relevant papers discussing design, analysis, and processing of rigid SMA are [
20,
21,
22,
23].
The size of the array is another key design factor, since the larger the array, the lower the minimum frequency at which beamforming is effective [24]. Therefore, large arrays, either real or virtual [25], are required to localize noise sources at low frequency. This is particularly important when dealing with ANC systems for cars, which are mostly effective at very low frequencies, e.g., 50 Hz–400 Hz for road noise, as reported in [26], and up to 1 kHz for tonal applications such as engine noise order cancelling, as reported in [27].
Another fundamental feature for microphone array design is the number of capsules: for a given array size, more capsules mean closer microphones, and the closer the microphones, the higher the maximum beamforming frequency, as demonstrated by B. Rafaely in [28]. Consequently, systems with an ever-increasing number of capsules have continued to appear on the market. The first microphone array was launched in 1978 by K. Farrar with the name “Soundfield” [29,30]. It featured four capsules positioned at the vertices of a tetrahedron; several other microphone arrays featuring four capsules have come to the market in the following decades, the latest being the Saramonic Sr-VR in 2024. In 2009, the Eigenmike32 (also employed in this work), an SMA featuring 32 capsules, was launched by G. Elko [31]; in 2017, the Octomic by Core Sound, featuring 8 capsules, and the Zylia, another SMA featuring 19 MEMS capsules, were launched; and finally, in 2023, the new Eigenmike64, featuring 64 capsules, was launched. In 2015, a prototype featuring 252 capsules was presented by Sakamoto et al. [32], while in 2021, Carsten et al. showed a portable array with 512 MEMS microphones [33].
The layout of the capsules over the surface of the array is at least as important. A heuristic approach was used for planar arrays in [
34], while several mathematical approaches were developed for SMA, with the aim of optimizing the spatial sampling over the sphere, as thoroughly treated in [35]. In the case of equiangular sampling, the number N of required microphones is given by (1), while in the case of nearly uniform sampling, it is given by (2), where O is the Ambisonics order. The Ambisonics theory, which consists of a Spherical Harmonics (SH) expansion of the sound field at the recording point, was first presented by M. Gerzon in 1975 [
36,
37] and nowadays is widely employed for delivering spatial audio. More on the SH can be found in [
38], while an exhaustive discussion of SH decomposition with SMA can be found in [
39]. In the case of nearly uniform sampling, a mathematical approach called Spherical Design is commonly employed to optimize the distribution of the capsules over the surface of a rigid SMA. A spherical design is a finite set of N points on the d-dimensional unit sphere S^d such that the average value over the N points of any polynomial f of degree t or less equals the average value of f on the whole sphere [40,41,42]. Such a set is often also called a spherical t-design. In [43,44], many spherical designs and their properties are presented. Other interesting distributions of points are based on regular polyhedra, such as the tetrahedron, dodecahedron, and icosahedron, with N = 4, 12, and 20, respectively. The Eigenmike32, widely recognized as the reference system of the last decade and employed as a comparative target in the present work, features 32 capsules arranged as a truncated icosahedron over a rigid sphere of 84 mm diameter.
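The definition of a spherical design given above can be checked numerically. The following is a minimal sketch, not the procedure used by the authors: it compares the average of every monomial of degree t or less over a candidate point set with the exact average over the unit sphere in three dimensions. The function names and the choice of test set (points placed at the 12 vertices of a regular icosahedron, which is known to be a spherical 5-design) are assumptions made for illustration only.

```python
import itertools
import math

import numpy as np


def double_factorial(n: int) -> int:
    """Return n!! with the convention (-1)!! = 0!! = 1."""
    return 1 if n <= 0 else n * double_factorial(n - 2)


def sphere_average(a: int, b: int, c: int) -> float:
    """Exact average of x^a * y^b * z^c over the unit sphere in 3D."""
    if a % 2 or b % 2 or c % 2:
        return 0.0  # any odd exponent cancels by symmetry
    num = double_factorial(a - 1) * double_factorial(b - 1) * double_factorial(c - 1)
    return num / double_factorial(a + b + c + 1)


def is_spherical_design(points: np.ndarray, t: int, tol: float = 1e-9) -> bool:
    """Check the t-design property on all monomials of degree <= t."""
    pts = points / np.linalg.norm(points, axis=1, keepdims=True)
    x, y, z = pts.T
    for a, b, c in itertools.product(range(t + 1), repeat=3):
        if a + b + c > t:
            continue
        discrete = np.mean(x**a * y**b * z**c)
        if abs(discrete - sphere_average(a, b, c)) > tol:
            return False
    return True


def icosahedron_vertices() -> np.ndarray:
    """12 vertices of a regular icosahedron (illustrative point set)."""
    phi = (1 + math.sqrt(5)) / 2
    verts = []
    for s1, s2 in itertools.product((-1, 1), repeat=2):
        verts += [(0, s1, s2 * phi), (s1, s2 * phi, 0), (s2 * phi, 0, s1)]
    return np.array(verts, dtype=float)


if __name__ == "__main__":
    pts = icosahedron_vertices()
    print(is_spherical_design(pts, 5))  # degree 5 is satisfied
    print(is_spherical_design(pts, 6))  # degree 6 is not
```

The same check could be applied to any tabulated design, such as one with N = 32 points, once its coordinates are available.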
When selecting the type of capsule, the choice is between analog and digital transducers. Analog transducers can provide high-quality audio signals, but they come with some drawbacks, such as cumbersome wiring and susceptibility to noise interference, especially when long cables are used to connect the capsules to the Analog-to-Digital (A/D) converters. Affordable digital MEMS microphones are less affected by electrical interference, though this often comes at the expense of lower acoustic performance, such as reduced dynamic range and signal-to-noise ratio. A review of MEMS microphones can be found in [
45].
In this work, a wearable microphone array is presented for the development and performance assessment of ANC systems installed on vehicles. The array solves a common limitation encountered in measuring automotive ANC systems with microphone arrays: the presence of the driver in the front left seat during road tests. The array was built by using a normal rigid helmet as a frame. The dimension of the helmet, which is larger than the microphone arrays currently available on the market, allows for shifting the frequency range of beamforming toward low frequencies, where ANC systems are mostly effective. With the shape of the helmet being almost spherical, the nearly uniform sampling theory was applied to optimize the distribution of the capsules over the surface of the array. A Spherical Design of degree
t = 7 with
N = 32 points was employed. The proposed solution features electret capsules connected to a miniaturized A/D converter incorporated in the helmet. The A/D converter also integrates an A2B transceiver (see Section 2.3), allowing it to deliver digital signals over an Ethernet cable, thus ensuring immunity to the electromagnetic disturbances that are always present in a car cabin.
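The link between the degree of the spherical design and the achievable Ambisonics order can be sketched as follows. This assumes the standard result that a spherical t-design allows an exact discrete SH decomposition up to order O when t ≥ 2O, and that an order-O representation has (O + 1)² channels. The function names are hypothetical.

```python
def max_ambisonics_order(t: int) -> int:
    """Highest order O with 2 * O <= t (assumed t-design condition)."""
    return t // 2


def num_sh_channels(order: int) -> int:
    """Number of spherical harmonics up to and including `order`."""
    return (order + 1) ** 2


if __name__ == "__main__":
    t, n_capsules = 7, 32  # values stated in the text
    order = max_ambisonics_order(t)
    channels = num_sh_channels(order)
    print(order, channels, channels <= n_capsules)
```

Under these assumptions, a design of degree t = 7 supports third-order Ambisonics, whose 16 channels are fewer than the 32 available capsules.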
The proposed solution is flexible and can be used in many scenarios, such as vehicles with internal combustion engines or electric vehicles, different road conditions (smooth, rough, and very rough asphalt, with or without bumps, are common tests performed by car makers), and different speeds. The only limitation is encountered with cabriolet and spider cars, due to the aerodynamic noise and the unavailability of a windshield for the array helmet. If equipped with internal loudspeakers, the array helmet would find another interesting application in motor sport, for the communication between the pilot and the co-pilot in rally competitions. Thanks to its beamforming capability, the array helmet can focus on the speaker's voice, rejecting the noise and enhancing the intelligibility of the instructions provided by the co-pilot. This application is known as Audio Augmented Reality (AAR) or augmented listening. Although no previous work was found on motor sport applications, in [
46], several wearable microphone arrays, both rigid and flexible, were built and tested for AAR.
The paper is arranged as follows: Section 2 describes the development of the helmet (Section 2.1), the acoustic characterization (Section 2.2), and the architecture of the electronics (Section 2.3); Section 3 illustrates the main findings obtained within this work; Section 4 discusses the results; and finally, Section 5 summarizes the conclusions, the limitations, and possible future developments.
3. Results
First, the beamforming capability of the proposed array helmet was compared to that of the Eigenmike32. The comparison was made by using a well-established metric for microphone array assessment [60], which consists of evaluating the deviation of the actual directivity A′ from the ideal directivity A. This is done by relying on two parameters: the Spatial Correlation (SC), which sets the limit of the beamforming at high frequency and is defined in (8), and the Level Difference (LD), which sets the limit of the beamforming at low frequency and is defined in (9).
The parameters are calculated for each frequency, each direction, and each virtual microphone; then, the D directions are summed, while the virtual microphones are averaged among those belonging to the same Ambisonics order, that is, virtual microphones 1 to 4 for the first order, 5 to 9 for the second order, and 10 to 16 for the third order. In the ideal case of perfect reconstruction of the SH, SC = 1 and LD = 0 dB. However, a certain amount of deviation is always present in the real case; hence, the upper and lower frequency limits of the beamforming are usually defined by considering two thresholds: SC > 0.9 and LD > −1 dB. The results are shown in Figure 7 for the Eigenmike32 and in Figure 8 for the array helmet.
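The two metrics can be illustrated with a short sketch for one virtual microphone at one frequency. The formulas used here are assumptions, since (8) and (9) are not reproduced in this text: SC is taken as the normalized correlation between the ideal and the actual directivity summed over the D directions, and LD as the ratio of their energies in dB. The function names and the synthetic figure-of-eight pattern are hypothetical.

```python
import numpy as np


def spatial_correlation(a_ideal, a_actual) -> float:
    """Normalized correlation over D directions (assumed form of SC)."""
    a_ideal = np.asarray(a_ideal, dtype=complex)
    a_actual = np.asarray(a_actual, dtype=complex)
    num = np.real(np.sum(np.conj(a_ideal) * a_actual))
    den = np.sqrt(np.sum(np.abs(a_ideal) ** 2) * np.sum(np.abs(a_actual) ** 2))
    return float(num / den)


def level_difference_db(a_ideal, a_actual) -> float:
    """Energy ratio actual/ideal over D directions, in dB (assumed form of LD)."""
    e_ideal = np.sum(np.abs(np.asarray(a_ideal)) ** 2)
    e_actual = np.sum(np.abs(np.asarray(a_actual)) ** 2)
    return float(10.0 * np.log10(e_actual / e_ideal))


def within_thresholds(sc: float, ld_db: float) -> bool:
    """Thresholds stated in the text: SC > 0.9 and LD > -1 dB."""
    return sc > 0.9 and ld_db > -1.0


if __name__ == "__main__":
    theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
    ideal = np.cos(theta)   # synthetic first-order pattern
    actual = 0.5 * ideal    # correct shape, level too low
    sc = spatial_correlation(ideal, actual)
    ld = level_difference_db(ideal, actual)
    print(round(sc, 3), round(ld, 2), within_thresholds(sc, ld))
```

In this synthetic case the shape is preserved (SC = 1) but the level is about 6 dB too low, so only the LD threshold is violated, which mirrors the behavior described for the low-frequency limit.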
Then, a localization test was performed in the acoustic laboratory of the University of Parma (Parma, Italy), using the sound color mapping technique [61,62]. The array helmet was mounted on a microphone stand, in front of a Genelec 8351A studio monitor playing pink noise. A 30 s signal was recorded and converted into third-order Ambisonics. Then, the array helmet was replaced with a dual-lens camera to take a 360° picture of the environment, which is used as the background of the sound color maps. The analysis was performed in the octave bands centered at 31.5 Hz, 63 Hz, 125 Hz, 250 Hz, 500 Hz, and 1 kHz. The quantity being mapped is the Sound Pressure Level (SPL), calculated with a resolution of 1 degree. The pseudo-color maps are generated through graphical interpolation, with a color scale that goes from blue (lowest SPL value) to red (highest SPL value). The results are shown in Figure 9.
4. Discussion
The SC and LD charts of
Figure 7 and
Figure 8 can be used to precisely define the frequency limits of beamforming at each Ambisonics order. This is obtained by applying the previously stated thresholds, SC > 0.9 and LD > −1 dB. Each of them intersects the chart for each Ambisonics order in two points, thus identifying a low and a high acceptable frequency. These results are shown in
Table 3, where it can be seen that the LD is always more restrictive at low frequency, while the SC is at high frequency. Therefore, the frequency limits provided by the two metrics must be combined, taking the most restrictive condition at low and at high frequency for each Ambisonics order (
Table 4).
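The combination rule described above, taking the most restrictive limit at each end of the band, can be written compactly. This is a minimal sketch with hypothetical function names; the numbers in the usage example are purely illustrative and are not taken from Table 3 or Table 4.

```python
from typing import Optional, Tuple

Band = Tuple[float, float]  # (low limit in Hz, high limit in Hz)


def combine_limits(sc_band: Band, ld_band: Band) -> Optional[Band]:
    """Intersect the bands allowed by SC and LD for one Ambisonics order."""
    low = max(sc_band[0], ld_band[0])    # most restrictive low limit
    high = min(sc_band[1], ld_band[1])   # most restrictive high limit
    return (low, high) if low < high else None


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    print(combine_limits((30.0, 2000.0), (50.0, 5000.0)))
```

Returning `None` when the two bands do not overlap makes explicit the case in which an Ambisonics order has no valid beamforming range.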
The maximum beamforming frequency for the array helmet is 1 kHz at all orders. Although lower than that of the Eigenmike32 (10 kHz at first order, 9 kHz at second order, and 8 kHz at third order), it is still acceptable for ANC applications. As reported by B. Rafaely in [28], the maximum theoretical beamforming frequency, f_max, can be calculated with (10), where c = 343 m/s is the speed of sound and d_min is the minimum distance between the capsules of the array. The results shown in Table 5 are obtained by substituting the value of d_min in (10) for the Eigenmike32 and for the array helmet.
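As an illustration of how the capsule spacing bounds the maximum frequency, the sketch below uses the common half-wavelength spatial aliasing criterion, f = c / (2 d). This is an assumption: (10) is not reproduced in this text and may take a different form. The function name and the spacing used in the example are hypothetical and do not correspond to either array.

```python
SPEED_OF_SOUND = 343.0  # m/s, value stated in the text


def max_beamforming_frequency(d_min_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Half-wavelength criterion (assumed): capsule spacing = wavelength / 2."""
    if d_min_m <= 0.0:
        raise ValueError("the capsule spacing must be positive")
    return c / (2.0 * d_min_m)


if __name__ == "__main__":
    # Hypothetical spacing of 0.1715 m, chosen only to give a round result.
    print(max_beamforming_frequency(0.1715))
```

Under this criterion, halving the minimum spacing doubles the theoretical maximum frequency, which is consistent with the qualitative statement that closer microphones raise the upper beamforming limit.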
The main advantage offered by the proposed solution consists of a significant shift in the beamforming capability toward low frequencies, as follows:
First order Ambisonics at 20 Hz (array helmet) instead of 45 Hz (Eigenmike32);
Second order Ambisonics at 40 Hz (array helmet) instead of 170 Hz (Eigenmike32);
Third order Ambisonics at 220 Hz (array helmet) instead of 700 Hz (Eigenmike32).
Usually, the comparison between microphone arrays is also performed in terms of octave bands. As can be seen in Table 6, the array helmet gains one Ambisonics order in each of the first five octave bands. The two solutions have the same performance at the sixth octave band (centered at 1 kHz), while only the Eigenmike32 operates at the octave bands centered at 2, 4, and 8 kHz. A particularly useful improvement for ANC applications is obtained at the first octave band (centered at 31.5 Hz), at which the Eigenmike32 does not provide any beamforming, while the array helmet offers first-order localization.
The second laboratory experiment, concerning the localization of a noise source through sound color mapping, is now analyzed. The acoustic center of the noise source was positioned to coincide with the optical center of the panoramic camera, which fixes its reference position in polar coordinates (azimuth a and elevation e). Table 7 shows the estimated direction of arrival (DoA) of the direct sound, in polar coordinates. The DoA of the direct sound was estimated at the maximum SPL, which is the center of the red spots in the color maps of Figure 9.
As can be seen, an average absolute error of 1.2° is made along the azimuth and 1.8° along the elevation. The array helmet correctly localized the sound source (Genelec studio monitor) even at the first octave band, where only the first order Ambisonics is available. At 125 Hz, when the second order Ambisonics becomes available, some reflections can also be seen, e.g., the door (135°; 0°) and the desk (−100°; −30°). Above 250 Hz, the third order Ambisonics provides its own contribution too, and the spot on the noise source becomes narrower as the frequency increases.
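The DoA estimation and the error computation described above can be sketched as follows: the DoA is taken at the maximum of an SPL map sampled with 1° resolution, and the error is the absolute angular difference from a reference direction. The map in this example is synthetic, and the function names, grid conventions, and reference direction are assumptions made for illustration.

```python
import numpy as np


def doa_from_map(spl_db, azimuths_deg, elevations_deg):
    """Return (azimuth, elevation) of the maximum of an SPL map.

    spl_db has shape (n_elevations, n_azimuths).
    """
    i_el, i_az = np.unravel_index(np.argmax(spl_db), spl_db.shape)
    return float(azimuths_deg[i_az]), float(elevations_deg[i_el])


def angular_error_deg(estimate: float, reference: float) -> float:
    """Absolute angular difference, wrapped to the range [0, 180] degrees."""
    return abs((estimate - reference + 180.0) % 360.0 - 180.0)


if __name__ == "__main__":
    az = np.arange(-180, 180)  # 1 degree resolution
    el = np.arange(-90, 91)
    az_grid, el_grid = np.meshgrid(az, el)
    # Synthetic map with a single peak at (2, -3) degrees.
    spl = 80.0 - ((az_grid - 2) ** 2 + (el_grid + 3) ** 2) / 50.0
    est_az, est_el = doa_from_map(spl, az, el)
    ref_az, ref_el = 0.0, 0.0  # hypothetical reference direction
    print(angular_error_deg(est_az, ref_az), angular_error_deg(est_el, ref_el))
```

Averaging these absolute errors over the analyzed octave bands would give figures of the same kind as the average errors quoted above.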
5. Conclusions
A wearable helmet microphone array featuring 32 electret capsules has been developed, built, and characterized. It integrates a miniaturized 32-channel A/D converter, which avoids the bulky wiring that would occur with an analog solution and ensures a high S/N ratio thanks to the usage of the A2B digital bus. In addition, this solution allows only one Ethernet cable to come out of the helmet, maintaining good comfort and ease of use. The A2B signal is received by an external D/A box, which can deliver digital data to the PC via USB or convert the data back to the analog domain by means of two D/A converters, allowing us to record them with an external acquisition unit.
The array helmet was compared with the Eigenmike32, a spherical microphone array that has been widely considered the reference equipment for spatial audio recording during the last decade. By analyzing two metrics for Ambisonics performance evaluation, namely Spatial Correlation and Level Difference, it was possible to assess that the proposed system shifts the Ambisonics orders toward low frequencies by one or even two octaves, making it accurate even at the lowest octave band, centered at 31.5 Hz, where the Eigenmike32 is not effective at all. These results were confirmed by a laboratory test consisting of a noise source localization problem, making use of the sound color mapping technique. The array helmet was demonstrated to correctly localize the noise source at all the analyzed frequencies, with a trend of increasing accuracy with frequency, as expected. The highest valid frequency for beamforming is reduced to 1 kHz, due to the large size of the helmet compared to the number of capsules. However, such a frequency is still above the maximum frequency at which Active Noise Control systems for cars are effective. In conclusion, the proposed solution can be employed for the assessment and the development of road noise cancelling or engine order cancelling systems at the driver's seat in driving conditions.
The presented array has two limitations. First, most of the rolling noise from the wheels comes from below; therefore, a higher microphone density would be desirable in the lower part of the helmet to increase the spatial resolution in the portion of space of interest. However, this is not possible due to the presence of the neck. The second limitation is the movement of the driver’s head, which is naturally unstable when driving a car, in particular while cornering. Therefore, when the recording is played back, a mismatch between the orientation of the listener’s head and the orientation of the sound field may occur. However, this problem will be overcome in a future update of the helmet by installing a head-tracking system to record the quaternion of spatial rotation synchronously with the pressure signals. Such a solution will allow compensating for the head’s movements in post-processing, by counter-rotating the sound field after Ambisonics conversion.
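The counter-rotation mentioned above can be sketched for the first-order components only. This is not the planned implementation: it assumes a unit quaternion (w, x, y, z) describing the head orientation as a rotation from the helmet frame to the world frame, a channel layout W, X, Y, Z, and a single quaternion for the whole block of samples. Higher orders would require full SH rotation matrices, which are not covered here. All names are hypothetical.

```python
import numpy as np


def quaternion_to_matrix(q) -> np.ndarray:
    """Rotation matrix of a quaternion (w, x, y, z), normalized first."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def compensate_first_order(wxyz_signals, head_quaternion) -> np.ndarray:
    """Rotate the X, Y, Z channels back to the world frame; W is unchanged.

    wxyz_signals has shape (4, n_samples) with rows W, X, Y, Z (assumed).
    """
    sig = np.asarray(wxyz_signals, dtype=float)
    rot = quaternion_to_matrix(head_quaternion)
    out = sig.copy()
    out[1:4] = rot @ sig[1:4]
    return out


if __name__ == "__main__":
    s = np.array([1.0, 0.5])
    # Head yawed by +90 degrees: a source in front (world +x) is recorded
    # on the right side of the helmet (helmet -y).
    recorded = np.vstack([s, 0 * s, -s, 0 * s])
    yaw_90 = (np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4))
    print(np.round(compensate_first_order(recorded, yaw_90), 6))
```

With a time-varying head orientation, the same operation would be applied per sample or per short block, using the quaternion recorded synchronously with the pressure signals.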