1. Introduction
Virtual sound synthesis has already been used in various applications [1] (Bianchini and Cipriani 1998). Besides virtual reality simulations [2], 3D playback systems and computer game development, the simulation of virtual acoustic scenarios has become a common tool for research in musical and room acoustics [3,4,5]. Recently, auralization [6] has also found its place in hearing research [7,8,9,10] and universal design [11]. In view of applications of virtual acoustics for soundscape assessment in the framework of urban planning [12,13,14,15], one question is how realistically virtual acoustics can mimic the soundscape in an urban public place, and how adequately the sonic environment can be appraised on the basis of laboratory listening tests, in comparison with surveys.
Most soundscape studies deal with both objective and subjective aspects. Objective assessments are typically performed by means of noise level measurements, or by further quantitative parameters [16,17]. Subjective aspects have been investigated through surveys in situ or by listening tests based on monaural or binaural recordings [18,19,20].
During the last decade, different assessment methods have been proposed for soundscape research, such as description by semantic differential [21,22], numerical and multidimensional descriptors [17,23], and automatic classification methods [22,24,25]. Recently, Thorne and Shepherd [26] have prepared a proposal for legislation based on the concept of “quietness” as an “environmental value” in terms of amenity and wellbeing.
Adams et al., Pheasant et al. and Yu and Kang [27,28,29] stated that the subjective interpretation of a soundscape can depend on the given location, its visual appearance, the type of activities going on, and the observer’s personal preferences and expectations. Emotional dimensions of soundscapes have been investigated by Cain et al. [30], cultural aspects by Farina [31], and the effects of social, demographical and behavioral factors on the sound level evaluation in urban open spaces by Yu and Kang [29].
In subjective assessment, a person acts as a measuring apparatus whose judgment cannot be calibrated as easily as a microphone or an artificial head. Miller [32] has proposed to use human perception to link soundscape improvement with traditional noise control methods, in an approach that gives priority to sounds judged as desirable or undesirable. In this way subjects identify different sources, which can then be assessed by classical noise control methods.
Dökmeci and Kang [16] concluded that loudness is a more adequate parameter for indoor soundscape assessment than sound pressure levels and A-weighting.
An interesting human-mimicking computational model was proposed by Oldoni et al. [33]. Their method combines a self-organized map of acoustical features with a functional model of auditory attention. It gives the soundscape designer a quick overview of the typical sounds at an investigated location and allows the perceptual effects of introducing additional sounds to be assessed.
Although soundscape assessment is based on an association process that originates in a genetic code or individual cultural background [31], urban planners and decision makers often prefer to work with international descriptors and single number ratings.
Prior to an adequate soundscape design in the context of urban planning, it is important to assess the reliability of predicted acoustic parameters.
In contrast with room acoustics, where the main aim is to understand the impact of the volume, shape and acoustic properties of surrounding structures (such as their absorption and scattering) on the room acoustic parameters (reverberation time, clarity, sound pressure level distribution, speech intelligibility, etc.), in an acoustic assessment of an urban public space the focus lies on determining the impact of the distribution of sound sources, and on their spectral and temporal characteristics, duration and temporal or permanent presence. Influences of surrounding buildings are also relevant, but rather in relation to their function and to the activities that might occur in the buildings or in their proximity.
The main goal of this article is to find out how precisely it is possible to predict statistical noise levels such as L5 and L95, which depend not only on the overall sound power level of the sound sources and their distance from the receiver, but even more on the temporal features of the stimuli. In the present study statistical values of noise levels are extracted from auralized acoustic scenarios on an urban public square, and people’s perception of these levels is analyzed in different settings.
The feasibility of synthesizing a virtual urban soundscape, based on information about its functionality and the activity occurring in it, is assessed.
In the second part of this paper, laboratory listening tests are performed that investigate the disturbance caused by two types of traffic noise (stationary traffic noise and traffic noise containing a clear “sound event”), and the influence of the listener’s activity and of the sound pressure level of the noise on the overall disturbance/pleasantness. The results are analyzed by repeated-measures analysis of variance (ANOVA).
2. Methods
2.1. Measurement and Recording Methods
Two kinds of in situ measurements were performed: (1) measurements based on binaural recordings and (2) standard noise level measurements with a certified class “A” sound pressure level meter (Bruel and Kjaer 2236). The recordings were acquired at five randomly chosen positions in the square. Since the recordings were very similar, it was sufficient to choose one (Figure 3b, position 1) for comparison with the simulations. More details about the measurement length and the time of recording are given later in this article.
The binaural recordings were performed with in-ear microphones and an M-Audio® solid state recorder (Cetacean Research Technology, Washington, DC, USA) with a sampling frequency of 44,100 Hz and a resolution of 24 bits. The recording system was calibrated in an acoustic laboratory so that correct absolute sound pressure levels could be calculated from all recorded samples. The calibration was performed in a diffuse field (reverberant room) at 10 different microphone positions and one position of an omnidirectional point sound source (BK 4295) at three different sound pressure levels: 60, 70 and 80 dB. The calibration file used for the analysis of sound pressure levels in the 01dB software was a pink noise recording at 80 dB. The calibration was later double-checked in free field (anechoic room) conditions by placing the in-ear microphones on a stand at known distances of 4 and 8 m from the same sound source (BK 4295 OmniSource, Bruel & Kjaer, Nærum, Denmark).
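The role of the 80 dB calibration recording can be illustrated with a minimal sketch. It assumes, for illustration only, that the recorder delivers dimensionless sample values and that one broadband scale factor is sufficient; the function names are hypothetical and this is not the procedure implemented in the 01dB software.

```python
import math

P0 = 20e-6  # reference sound pressure in Pa


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def calibration_factor(reference_recording, reference_level_db=80.0):
    """Factor (Pa per digital unit) that maps the reference recording
    onto its known sound pressure level."""
    reference_pressure = P0 * 10.0 ** (reference_level_db / 20.0)
    return reference_pressure / rms(reference_recording)


def level_db(recording, factor):
    """Absolute sound pressure level of a recording after calibration."""
    return 20.0 * math.log10(factor * rms(recording) / P0)


# Hypothetical digital samples standing in for the calibration recording.
reference = [0.1, -0.1] * 100
factor = calibration_factor(reference)

# A recording with half the amplitude lies about 6 dB below the reference.
quieter = [0.05, -0.05] * 100
print(round(level_db(reference, factor), 2), round(level_db(quieter, factor), 2))
```

Checking the factor at the other calibration levels (60 and 70 dB) would show whether the chain behaves linearly over the range of interest.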
The sound analysis of the recordings was performed a posteriori, partly in 01dB®Sonic software and partly with in-house Matlab® routines.
The standard noise level measurements were made simultaneously with the binaural recording using a Bruel and Kjaer 2236 Sound Level Meter with “fast” integration time. These measurements served for double checking the absolute sound levels measured by binaural microphones.
2.2. Simulation Method
The acoustical simulations in this study were performed with the ODEON® prediction software, which uses a hybrid calculation algorithm in which the Impulse Response (IR) of a given environment is simulated in two steps. The early part of the IR is based on early reflections, which are calculated by combining an Image Source Method (ISM) and Early Scattered Rays (ESR). The late part of the IR, i.e., the part containing late reflections, is calculated with a Ray Tracing Method (RTM) that includes an advanced scattering algorithm. The length of the first part of the IR can be chosen by the software user via the so-called Transition Order (TO). This is the maximum number of image sources taken into account per initial ray. For TO = 0, the simulation is performed with ray tracing only, which is a very robust calculation method for predicting acoustic parameters, but which is typically not optimal for binaural auralization of sound. The disadvantage of a large TO, corresponding to a dominant use of the image source method, is that the simulations become very slow, because the calculation time increases exponentially with TO.
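The growth of the calculation effort with the transition order can be illustrated with the classical image source method, in which every image source is mirrored again in all surfaces except the one that generated it. The sketch below counts candidate image sources under that textbook assumption; the surface count is hypothetical, and ODEON's internal bookkeeping (visibility checks, early scattered rays) may differ.

```python
def image_source_count(n_surfaces, max_order):
    """Upper bound on the number of candidate image sources up to max_order.

    Order k contributes n_surfaces * (n_surfaces - 1) ** (k - 1) candidates,
    because an image is not mirrored again in the surface that created it.
    """
    return sum(
        n_surfaces * (n_surfaces - 1) ** (order - 1)
        for order in range(1, max_order + 1)
    )


# Hypothetical model with 200 surfaces; orders 0, 1 and 2 as in TO0, TO1, TO2.
for transition_order in (0, 1, 2):
    print(transition_order, image_source_count(200, transition_order))
```

Most candidates are discarded by visibility tests in practice, but the number that has to be examined still grows geometrically with the order.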
In order to obtain a spatial impression of the simulated space, and to allow listeners to localize virtual sounds, a Binaural Room Impulse Response (BRIR) is needed. In the Odeon software, the BRIR at the receiver point is obtained by filtering the calculated room impulse response with the Head-Related Transfer Function (HRTF) for the left and right ear, respectively. For this type of approach, the image source method, which is based on calculations at a point, is more convenient than the ray tracing method, which is based on statistics of rays passing through a certain region. In view of this, a moderate TO is optimal, with binaural ISM for the early part of the impulse response and RTM for calculating the less critical later part in a reasonable time.
In the two experiments performed in this work we were interested in the accurate prediction of the sound pressure level and statistical noise levels (Experiment 1), and in a high quality auralization of the given soundscape (Experiment 2) respectively. In order to assess the influence of the simulation algorithm on the calculated values and on the auralization quality, simulations with different transition orders, TO = 0, 1 and 2, were performed (further in the text referred to as TO0, TO1 and TO2).
In order to obtain auralized sound for the site of interest, the generated BRIRs were convolved with recordings made in an anechoic room.
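The convolution step can be sketched as follows. This is a minimal pure-Python illustration with hypothetical function names and toy signals; a practical implementation would use FFT-based convolution on the full-length recordings, and nothing here reproduces ODEON's internal processing.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def auralize(anechoic, brir_left, brir_right):
    """Left and right ear signals for one source at the receiver."""
    return convolve(anechoic, brir_left), convolve(anechoic, brir_right)


def mix(signals):
    """Sample-by-sample sum of signals that may differ in length."""
    length = max(len(s) for s in signals)
    return [sum(s[n] for s in signals if n < len(s)) for n in range(length)]


# Toy example: two sources, each with its own (very short) BRIR pair.
left_1, right_1 = auralize([1.0, 2.0, 3.0], [1.0, 0.5], [0.5, 0.25])
left_2, right_2 = auralize([1.0, 0.0], [0.0, 1.0], [1.0])
left_mix = mix([left_1, left_2])
right_mix = mix([right_1, right_2])
print(left_mix, right_mix)
```

Each source keeps its own BRIR pair, so the spatial cues of every source are preserved in the final two-channel mix.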
2.3. Methods for Noise Analysis
In the analysis, statistical noise level values were used for the objective assessment of noise. In order to describe the most important features of the sound level fluctuations, the statistical parameters L5 and L95 were calculated and analyzed. Lx is the sound pressure level that is exceeded during x% of the measuring time.
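Since Lx is the level exceeded during x% of the time, it corresponds to the (100 − x)th percentile of the level distribution. A minimal sketch, assuming a series of short-time levels sampled at equal intervals and linear interpolation between ranks (both are illustrative choices, not necessarily those of the analysis software used here):

```python
def exceedance_level(levels_db, x):
    """Level exceeded during x % of the time, i.e. the (100 - x)th percentile."""
    if not levels_db:
        raise ValueError("at least one level is required")
    ordered = sorted(levels_db)
    rank = (100.0 - x) * (len(ordered) - 1) / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical short-time levels in dB, one value per time step.
levels = list(range(50, 71))
l5 = exceedance_level(levels, 5)    # dominated by the loudest events
l95 = exceedance_level(levels, 95)  # close to the background level
print(l5, l95)
```

The difference L5 − L95 is a simple indicator of how strongly the level fluctuates over the measuring time.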
Another parameter used in this study is the equivalent noise level LA,eq, which is one of the most frequently used descriptors of environmental noise. LA,eq,T expresses the level of a continuous steady sound within a time interval T that has the same effective (rms) sound pressure as the measured sound, defined as:

$$L_{A,\mathrm{eq},T} = 10 \log_{10}\left[\frac{1}{t_2 - t_1}\int_{t_1}^{t_2}\frac{p_A^2(t)}{p_0^2}\,\mathrm{d}t\right]$$

where pA is the instantaneous A-weighted sound pressure at time t, p0 = 20 μPa is the reference sound pressure, and T = t2 − t1 is the measuring period.
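For sampled data, the definition of LA,eq,T reduces to the mean square of the pressure samples relative to p0. The sketch below assumes that the samples are already A-weighted and calibrated in pascal; the function names are hypothetical.

```python
import math

P0 = 20e-6  # reference sound pressure in Pa


def equivalent_level(pressures_pa):
    """Equivalent continuous level of equally spaced pressure samples in Pa."""
    mean_square = sum(p * p for p in pressures_pa) / len(pressures_pa)
    return 10.0 * math.log10(mean_square / P0 ** 2)


def equivalent_level_from_levels(levels_db):
    """Equivalent level from short-time levels of equal duration (energy mean)."""
    mean_energy = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)


# A steady pressure of 1 Pa corresponds to about 94 dB.
print(round(equivalent_level([1.0, -1.0, 1.0, -1.0]), 2))
# Energy averaging is dominated by the louder interval.
print(round(equivalent_level_from_levels([50.0, 70.0]), 2))
```

Because the average is taken over energy rather than over decibel values, short loud events raise the equivalent level much more than they raise the arithmetic mean of the levels.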
3. Experiments
3.1. Description of the Case Study
The main square of the city of Leuven (“Grote Markt”) was chosen as the site of evaluation (Figure 1). The square is surrounded by buildings such as the town hall, St. Pieter’s church, several restaurants and apartment buildings, and has a rich history. Because many different kinds of sound sources and diverse social activities are present in this square on different days and in different seasons, the soundscapes occurring on this site are quite interesting. The most typical sounds on the site are human voices, footsteps, bicycles, church bells and buses, which pass by 10 times per hour on working days. In past years several changes were made to this square, mainly reducing its accessibility by car for reasons of functionality, noise and safety. Nowadays, the square is a pedestrian zone that only city buses are allowed to enter.
Figure 1.
Grote Markt in Leuven, Belgium: view of the part of the square with restaurants, with the position of the recording microphone indicated.
3.2. Binaural Recordings and Measurements in Situ
Two kinds of recordings were acquired.
(1) A first set of calibrated binaural recordings was acquired in situ using in-ear microphones (MS-TFB-2 Sound Professionals In-Ear Binaural microphones) and a solid state recorder, on a warm summer evening in the middle of the square, surrounded by restaurants full of people. Measurements were made for 15 min at 5 different positions (randomly chosen between two restaurants, about 3–5 m from each other). Since no large difference was found between the positions, only one position was used for comparison with the simulations (Figure 1 and Figure 3, position 1). The recordings were made in a period of the day when no buses were passing through the square, and were analyzed in the laboratory in terms of their statistical noise levels and Leq values.
(2) The second set of in situ recordings was not meant for the estimation of statistical noise levels, but for collecting sounds present in the square that would be very difficult to simulate (due to the Doppler effect on sounds from moving vehicles, etc.) but are necessary for the later sound synthesis used in the listening test experiments (Experiment 2 of this article). These sounds, e.g., a passing bus and distant stationary traffic noise, were recorded individually as far as possible, during wintertime, when no vocal sounds or birds were present on the square. The frequency spectra of the two mentioned sounds are shown in Figure 2.
Figure 2.
One-third octave spectra of distant traffic noise and of a passing bus, each as measured for the left and right ear channels.
3.3. Recordings in Situ and in the Laboratory
A third set of recordings, containing sounds such as different human voices, footsteps and various restaurant sounds (e.g., cutlery, glasses, chair movements), which were necessary for the final convolution with the simulated BRIRs, was acquired in an anechoic room.
3.4. Acoustic Simulations
A 3D computer model of Grote Markt, Leuven was developed, based on dimensions of the square that were measured in situ with a laser distance meter and verified against a detailed city plan of the center of Leuven. A simplified spatial model of the square was constructed for simulation in the Odeon® 9.2 software (Figure 3). Grote Markt has an irregular shape, but its dimensions are roughly 120 m × 32 m. To make the acoustical simulations realistic, parts of the streets that end on this square were included in the model, resulting in a total calculation domain of about 240 m × 140 m (Figure 3). The sound absorption and scattering coefficients of the surrounding buildings and ground surfaces were estimated on the basis of a visual inspection in situ.
Figure 3.
Geometric 3D model of Grote Markt including surrounding buildings (a) and the ground plan of Grote Markt with an indication of the outdoor restaurant zones A and B, and of the two listening positions: 1. in the middle of the square; 2. at the table in the restaurant. The dashed line indicates the trajectory of the buses (b).
The acoustic model of the square was enclosed in a box whose boundaries were defined as surfaces with a sound absorption coefficient α = 100%, representing an open-air situation. The BRIRs of the 3D model were obtained from a simulation of a multisource environment with 102 sound sources. Each of the 102 BRIRs was convolved with an appropriate anechoic sample, such as a speaking person, walking people, or various restaurant sounds (e.g., the sounds of cutlery or glass).
These sound sources were regularly distributed over two virtual outdoor restaurant areas, Zone A and Zone B (Figure 3b). Fifty-eight speaking people were simulated in zone A and 44 in zone B. The auralized samples were mixed into final audio samples (wave files) representing a summer evening soundscape typical for Grote Markt. The final simulated sound samples were 5 min long and were analyzed in the same way as the recorded ones, i.e., by statistical noise analysis.
For the listening tests in the second experiment, shorter sound samples of about 15 s duration, containing the typical features of the simulated soundscape, were prepared.
3.5. Description of the Two Experiments Performed in This Study
In the first experiment, a comparison was made between the measured and predicted values of the statistical noise levels L5 and L95 and of LAeq, which were determined for sound samples containing a typical soundscape on the square during evening hours in the summer holiday. Since more than 100 BRIRs needed to be calculated and convolved with anechoic sounds in every considered scenario (TO0, TO1, TO2 and the free field situation), the length of the analyzed samples was reduced from 15 min to a characteristic 5 min fragment, cut out of the in situ recording. The statistical noise levels, the histogram and the spectrum of the selected 5 min fragment were almost identical to those of the full 15 min recording.
The simulations for TO0, TO1 and TO2 and for a free field situation were compared with each other and with the measurements (Table 1).
Table 1.
Values of LA,eq for different numbers of speaking people: comparison between simulations and measurements in situ.
Quantity | Simulation with TO0 | Simulation with TO0 | Simulation with TO1 | Simulation with TO1 | Simulation with TO2 | Simulation with TO2 | Simulation of free field situation | Simulation of free field situation | Measurement |
---|---|---|---|---|---|---|---|---|---|
Number of talking people | 102 | 51 | 102 | 51 | 102 | 51 | 102 | 51 | >100 |
LA,eq [dB] | 56.8 | 53.8 | 56.4 | 53.6 | 56.3 | 51.7 | 53 | 51.4 | 60.3 |
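As a plausibility check on Table 1, halving the number of equal, incoherent sources lowers the level by 10 log10(2) ≈ 3 dB. The sketch below compares this idealized value with the differences between the 102-talker and 51-talker columns. The assumption of equal source powers and distances is an idealization that does not hold exactly for the simulated restaurant zones, so deviations are to be expected.

```python
import math


def level_change_for_source_count(n_before, n_after):
    """Level change in dB when the number of equal incoherent sources changes."""
    return 10.0 * math.log10(n_after / n_before)


# LA,eq values from Table 1 as (102 talkers, 51 talkers), in dB.
TABLE_1 = {
    "TO0": (56.8, 53.8),
    "TO1": (56.4, 53.6),
    "TO2": (56.3, 51.7),
    "free field": (53.0, 51.4),
}

expected = level_change_for_source_count(51, 102)
differences = {
    name: round(full - half, 1) for name, (full, half) in TABLE_1.items()
}

print(f"idealized difference: {expected:.1f} dB")
for name, difference in differences.items():
    print(f"{name}: {difference:.1f} dB")
```

The TO0 and TO1 differences (3.0 and 2.8 dB) are close to the idealized value, whereas the TO2 (4.6 dB) and free field (1.6 dB) differences deviate from it.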
Although predicting the soundscape in an urban public place is rather difficult, questions from urban planners and decision makers are often related to the prediction of the acoustical situation outdoors and to proposals for noise reduction or for the creation of a pleasant soundscape.
The second experiment, based on subjective testing, was complementary to the objective tests in the first experiment. It was meant (i) to verify whether listening tests based on simulated and synthesized sound in the square can adequately be used to assess people’s qualification of elements of a soundscape (such as the sound level and the type of sound); (ii) to assess to what extent the activity of a listener influences his or her perception; and (iii) to investigate whether synthesized soundscapes could help developers of urban public places to estimate the pleasantness of a soundscape.
The experiment was based on listening tests that use virtual sound, and investigated the subjective perception of the traffic noise level for two listening scenarios that differ in the activity of the person. In the first scenario, a virtual listener was located in the middle of the square, walking between two virtual outdoor restaurants. His or her activity was defined as waiting for friends (Figure 3b, position 1). In the second scenario, the listener was supposed to sit on the terrace of one of the restaurants, close to the talking people (Figure 3b, position 2). In both scenarios, the sound level of the restaurant sound, i.e., talking people, was constant (54 dB(A)). The traffic noise, on the other hand, was mixed in at different sound levels in order to investigate its disturbing character.
The stimuli played to the subjects via headphones were created by mixing auralized restaurant sound from the Odeon® simulation with 22 noise recordings at different levels. Half of them were based on a stationary traffic noise recording. The other half also contained the sound of a bus passing by. Two different traffic noise stimuli were chosen in order to investigate the influence of the character of the noise on the perception of an urban soundscape. The first stimulus, stationary traffic noise, is often perceived subconsciously in the background. The second stimulus, a passing bus, was recorded with binaural microphones, preserving information about its location, and was chosen as a consciously perceived sound.
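Presenting the same traffic recording at different levels amounts to applying a gain before mixing. A minimal sketch of that scaling step, with hypothetical function names, toy sample values and an illustrative 6 dB step (the individual levels of the 22 stimuli are not listed here):

```python
import math


def rms(signal):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def gain_for_level_change(current_db, target_db):
    """Linear amplitude gain that moves a signal from current_db to target_db."""
    return 10.0 ** ((target_db - current_db) / 20.0)


def scale_to_level(signal, current_db, target_db):
    """Return a copy of the signal scaled to the target level."""
    gain = gain_for_level_change(current_db, target_db)
    return [gain * v for v in signal]


# Toy traffic signal, assumed to be calibrated at 54 dB, raised to 60 dB.
traffic = [0.1, -0.1, 0.1, -0.1]
louder = scale_to_level(traffic, 54.0, 60.0)
print(round(20.0 * math.log10(rms(louder) / rms(traffic)), 2))
```

The restaurant sound is left untouched, so only the traffic component changes from one stimulus to the next.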
The stimuli were played to the subjects in random order, each twice. The task of the subjects was to imagine themselves in the sketched situation, and to indicate whether the traffic noise in the given acoustic scenario was (1) too silent; (2) pleasant; (3) acceptable; (4) noisy or (5) disturbing.
Listening tests were performed in a silent anechoic room using a listening unit of Head Acoustics® with open headphones. The headphones were calibrated by means of an artificial ear device. The system was calibrated before each listening session.
Twelve normal-hearing subjects aged between 20 and 34 participated in the experiments. The responses of each subject were analyzed by means of repeated-measures ANOVA. The number of subjects was large enough for a within-subject analysis, which was the main scope of this experiment. Conclusions about the impact of absolute sound levels on the pleasantness/annoyance of the soundscape, which would require a much larger sample of test persons and a comparison with in situ surveys, were not attempted.
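The principle of a within-subject analysis can be sketched with a one-way repeated-measures ANOVA, in which the variability between subjects is removed from the error term. The ratings below are hypothetical and the function name is illustrative; the actual analysis involved several factors (noise type, level, listener position) and was presumably run in a statistics package.

```python
def rm_anova_f(scores):
    """One-way repeated-measures ANOVA.

    scores[s][c] is the rating of subject s under condition c.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)     # number of subjects
    k = len(scores[0])  # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    condition_means = [sum(row[c] for row in scores) / n for c in range(k)]
    subject_means = [sum(row) / k for row in scores]

    ss_conditions = n * sum((m - grand) ** 2 for m in condition_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f_value, df_conditions, df_error


# Hypothetical ratings (1 = too silent ... 5 = disturbing) of three subjects
# for two conditions, e.g. stationary noise and a passing bus.
ratings = [[1, 3], [2, 3], [3, 6]]
f_value, df_conditions, df_error = rm_anova_f(ratings)
print(f_value, df_conditions, df_error)
```

With only two conditions, the F value equals the square of the paired t statistic, which gives a quick way to check the computation by hand.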
5. Conclusions
In this study it was verified to what extent statistical values of noise can be used in urban soundscape prediction, for a particular city square scenario. While the predicted values of the equivalent noise levels were adequate, the prediction of the statistical quantifiers L5 and L95 was less accurate. Apparently, due to effects of collective behavior, the slow fluctuations of the speech noise level produced by a real crowd are larger than those resulting from the rapid random variations of simulated individual talkers in the crowd.
A challenge when simulating a crowd of people speaking is to estimate the absolute sound levels of the voices. There are multiple factors that might influence the vocal output. Besides the distance between talker and listener, which is typically determined by the size of the tables in a restaurant, the Lombard effect can also play a role, even in the relatively anechoic conditions of an urban public space.
With respect to the perception of traffic noise by a person who is surrounded by people in the square, we found no difference in disturbance by noise depending on the position in the square and the listener’s virtual activity, when the perception test is conducted in an acoustical laboratory. This is in contrast with a number of soundscape studies, which claim that the listener’s activity and visual setting are key factors in soundscape perception (Viollon et al., 1998) [35]. Apparently, in laboratory tests it is difficult for a listener to “live” the virtual activity, so these kinds of tests may deliver a stronger result when performed “in situ”.
A statistically significant difference in soundscape qualification was found between the two different noise stimuli. People perceive stationary noise as “less disturbing” than stimuli containing a “noise event” that can also be localized. In this experiment a 6 dB difference in the noise signal was necessary to obtain statistical significance in the assessment of subjective annoyance.
Finally, situations in which the level of the traffic noise was not higher than that of the human voices, i.e., 50–55 dB, were considered pleasant by most of the subjects, while noise levels up to 66–68 dB were found acceptable. However, these values are only indicative, as more test persons would be necessary for enhanced statistical significance.