**3. Design of an Environmental Database**

In this section, the real-operation environmental audio database recorded and built for the suburban area of the DYNAMAP project is detailed. First, the recording campaign is described. Second, the subsequent generation of the audio database is detailed, which includes the labeling and the computation of the acoustic salience of the anomalous noise events (ANEs), in terms of signal-to-noise ratio (SNR), and their impact on the *LAeq*. These measures are computed in this study and used jointly with duration, number of occurrences, and other database descriptors to analyze in detail the nature of the ANEs of the suburban area.

### *3.1. Description of the WASN-based Suburban Recording Campaign*

The main goal of this recording campaign is to collect representative samples of the acoustic environment under study through the WASN in real operation. Taking advantage of the experience gained from the preliminary recording and analysis of that acoustic environment [23], a second recording campaign was designed. The reasons for it were three-fold: *(i)* all the nodes of the WASN had already been deployed in their definitive operative locations, which increased the number of recording points and slightly changed the sensor position in the portals; *(ii)* sampling completeness, because the previous recording campaign did not include nighttime data, weekend data, or different meteorological conditions; and *(iii)* the total amount of time recorded was quite short (4 h and 44 min) and included only 12.2% of ANEs, which led us to conclude that ANEs were poorly represented in the dataset, after discarding the augmentation of the dataset by means of synthetic samples according to [23].

The WASN deployed in the suburban area of the DYNAMAP project is located on the A90 highway surrounding Rome and comprises 24 acoustic nodes, 5 of which are low-capacity sensors without enough computational resources to run the ANED algorithm. The locations of the 19 high-capacity sensors of the WASN in Rome's suburban pilot area are shown in Figure 1. The basic specifications [48] defined to satisfy the DYNAMAP requirements for each monitoring station are the following: *(i)* 40–100 dB(A) broadband linearity range, *(ii)* 35–115 dB working range with acceptable Total Harmonic Distortion (THD), and *(iii)* narrowband floor noise level. The project also requires the possibility of audio recording, as well as a Virtual Private Network (VPN) connection and a GPRS/3G/WiFi connection. The precision of the sensors is a key issue for the system reliability [49]. During the development stage, all the elements that could increase the uncertainty of the measurement were taken into account following the requirements of IEC 61672 [50]. Several tests were conducted on both the hardware and the software, using a climate chamber at different operating temperatures. Electromagnetic Compatibility tests were also conducted on the designed equipment, as well as atmospheric agent simulations [51].

**Figure 1.** Map of the sensor locations within the WASN of the DYNAMAP project in the suburban area of Rome. The locations that were also sensed during the initial recording campaign described in [23] are shown in red.

Some of the recording locations are the same as those used in the previous recording campaign [23] (see Figure 2). However, although the recordings were conducted in the same portals, both the sensor and the exact measurement location differ: the microphone position is slightly different with respect to the overall structure of the portal (see Figure 2a,b).

**Figure 2.** Location of the sensor used in the recording campaign in the portals. (**a**) Recording campaign deployment in [23]; (**b**) WASN deployment in this work (picture property of ANAS S.p.A.).

Achieving a complete and exhaustive dataset is a challenging task since the available resources are limited, e.g., processing and storage capabilities, data collection using 3G modems, and the availability of all nodes in fully operative conditions. In Figure 3, it can be observed that the *LAeq* presents a diurnal variation, which suggests sampling the recordings separately for day and night in order to obtain data from several patterns of traffic noise and, consequently, of ANEs.

**Figure 3.** Daily curve of *LAeq* for sensor hb147 for the 2nd and the 5th of November 2017, and the recordings conducted. Diagram of the recording days and duration for each sensor, and scheme of the labeled files to build the dataset.

To this aim, one-day real-operation recordings were planned through all nodes of the suburban WASN on days with different traffic conditions, assuming a trade-off between completeness and the computation and data communication resources available in each of the nodes of the WASN (storage capacity, data throughput, etc.). The following data sampling approach was proposed: a Thursday and a Sunday were selected as the representative weekday and weekend day, respectively, specifically the 2nd and the 5th of November 2017. From each sensor, 20 min were recorded per hour, a duration limited by the storage capacity and communication resources of each node. Figure 3 shows the recordings over the values of *LAeq*, together with a schematic diagram of the recording process during the selected weekday and weekend day. As a result, 16 h of acoustic data (20 min per hour over two 24-h days) were collected from each sensor to cover the diversity of the acoustic environment on a workday and a weekend day in this suburban environment.
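The per-sensor recording budget described above can be checked with a short sketch. The figures (20 min per hour, the 2nd and the 5th of November 2017) come from the text; the function names and the `offset_min` parameter are hypothetical, and placing each 20-min block at the start of the hour is only an assumption, since the exact position of the block within each hour is not stated here.

```python
from datetime import date, datetime, timedelta

# Values stated in the text.
MINUTES_RECORDED_PER_HOUR = 20
RECORDING_DAYS = (date(2017, 11, 2), date(2017, 11, 5))  # Thursday, Sunday

HOURS_PER_DAY = 24


def hours_recorded_per_sensor(minutes_per_hour=MINUTES_RECORDED_PER_HOUR,
                              days=RECORDING_DAYS):
    """Total recorded time per sensor, in hours, over all recording days."""
    total_minutes = minutes_per_hour * HOURS_PER_DAY * len(days)
    return total_minutes / 60


def recording_windows(day, minutes_per_hour=MINUTES_RECORDED_PER_HOUR,
                      offset_min=0):
    """Return one (start, end) pair per hour of the given day.

    ASSUMPTION: the block starts `offset_min` minutes after each full hour;
    the real schedule inside each hour is not given in the text.
    """
    windows = []
    for hour in range(HOURS_PER_DAY):
        start = datetime(day.year, day.month, day.day, hour)
        start += timedelta(minutes=offset_min)
        windows.append((start, start + timedelta(minutes=minutes_per_hour)))
    return windows


if __name__ == "__main__":
    print(hours_recorded_per_sensor())                 # 16.0
    print(len(recording_windows(RECORDING_DAYS[0])))   # 24
```

Under these assumptions, each sensor contributes 24 blocks of 20 min per day, which over the two selected days matches the 16 h per sensor reported in the text.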

It is worth mentioning that the high-capacity sensors used to conduct the recording campaign using the WASN in real operation were low-cost acoustic sensors designed ad hoc for the DYNAMAP project [51].
