### *4.3. ANEs' Impact on the LAeq*

The contribution of ANEs to the noise map can be analyzed in several ways. In this study, each ANE has been characterized by the two variables analyzed in the previous section: duration and SNR. Nevertheless, the impact of each ANE on the noise map should also be quantified, and preliminary studies with a smaller dataset show that this impact depends strongly on the SNR and duration values [52]. The impact is defined as the *LA*<sup>300s</sup> measurement of the raw audio minus the same measurement after removing the noise event, which reveals the net contribution of the event to the noise map (more details are given in Section 3.3.2).
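A minimal sketch of the impact computation follows. It assumes that the audio has already been reduced to 300 one-second A-weighted equivalent levels and that "removing the event" means excluding its frames from the energetic average; the actual removal procedure is the one described in Section 3.3.2 and may differ. The function names `laeq` and `ane_impact` are hypothetical.

```python
import math
from typing import Sequence


def laeq(levels_db: Sequence[float]) -> float:
    """Energetic (equivalent) average of short-term A-weighted levels, in dB."""
    if not levels_db:
        raise ValueError("cannot average an empty set of levels")
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def ane_impact(levels_db: Sequence[float], is_event: Sequence[bool]) -> float:
    """Impact (dB) of one event on the 300 s equivalent level.

    Assumption (hypothetical): `levels_db` holds one-second levels covering the
    300 s window, and the event is removed by dropping the frames flagged in
    `is_event` before recomputing the energetic average.
    """
    if len(levels_db) != len(is_event):
        raise ValueError("levels and event mask must have the same length")
    remaining = [lvl for lvl, ev in zip(levels_db, is_event) if not ev]
    if not remaining:
        # The event covers the whole window, so the impact is undefined.
        raise ValueError("event spans the whole integration window")
    return laeq(levels_db) - laeq(remaining)


# Usage: 290 s of background at 50 dB plus a 10 s event at 60 dB.
levels = [50.0] * 290 + [60.0] * 10
is_event = [False] * 290 + [True] * 10
print(round(ane_impact(levels, is_event), 2))  # 1.14
```

Excluding frames rather than substituting an estimated background keeps the sketch free of any extra modeling choice, at the cost of leaving the impact undefined when an event fills the whole window.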

As a first approach to the analysis of the ANE subcategories and their impact, Figure 8 depicts all the recorded ANEs according to their characteristics. The SNR is plotted on the vertical axis and the duration on the horizontal axis in a logarithmic scale, for illustration purposes. The size of each marker represents the ANE impact, on the scale indicated in the legend, and its color indicates the event subcategory, using a color scale (also detailed in the legend) designed to make events with similar parameters easy to distinguish (the reader is referred to Section 3.2 for a list of all ANE subcategories).

Figure 8 shows that the class distribution of the events is not uniform, as also seen in Table 1. Events shorter than 1 s do not appear to have a significant individual impact on the *LA*<sup>300s</sup>, since their impact is close to 0 dB. However, events that last more than 4 s and have a positive SNR may raise the *LA*<sup>300s</sup> level by more than 3 dB. Among these significant ANEs, train pass-bys and sirens present the highest overall impact and presence, as also depicted in Figure 7 (where bike noise also appears to have a high median impact, but only because it has just three occurrences). Of these, trains are the only ANEs whose impact surpasses the 3 dB level.
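Why duration weighs so heavily can be illustrated with an idealized calculation. This is a sketch under simplifying assumptions that are not taken from the measurement methodology: a constant background level $L_b$ over a window of $T = 300$ s, an event of duration $d$ whose level exceeds the background by $\Delta$ dB (treated here as its SNR), and removal of the event modeled as excluding its interval from the energetic average.

$$
\begin{aligned}
L_{\text{raw}} &= 10\log_{10}\!\left[\frac{(T-d)\,10^{L_b/10} + d\,10^{(L_b+\Delta)/10}}{T}\right] = L_b + 10\log_{10}\!\left[1 + \frac{d}{T}\left(10^{\Delta/10}-1\right)\right],\\
L_{\text{removed}} &= L_b,\\
I &= L_{\text{raw}} - L_{\text{removed}} = 10\log_{10}\!\left[1 + \frac{d}{T}\left(10^{\Delta/10}-1\right)\right].
\end{aligned}
$$

Under these assumptions, a 1 s event with $\Delta = 10$ dB gives $I = 10\log_{10}(1 + 9/300) \approx 0.13$ dB, whereas a 4 s event with $\Delta = 20$ dB gives $I = 10\log_{10}(1 + 4 \cdot 99/300) = 10\log_{10}(2.32) \approx 3.65$ dB. The factor $d/T$ dilutes the event energy over the whole window, which is consistent with short events having a negligible individual impact; since real backgrounds are not constant, these figures are only indicative.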

**Figure 8.** Scatterplot of all ANE parameters separated by subcategory (recorded and labeled on the 2nd and 5th of November 2017).

It is worth mentioning that, for some events, the SNR or the impact on the *LA*<sup>300s</sup> cannot be computed. Hence, Figure 8 depicts only ANEs with both a quantifiable SNR and a quantifiable impact. Of the 2014 events, three last more than 300 s, which makes their impact on the *LA*<sup>300s</sup> impossible to calculate, since they span the whole integration window. In addition, the SNR could not be computed for 61 events, mainly because they occur in segments with a high ANE density, where interpolation between RTN labels is not possible. All of these events have been excluded from these representations.
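The exclusion criteria above can be expressed as a simple filter. This is a sketch assuming a hypothetical `AneEvent` record in which an SNR that could not be computed is stored as `None` or NaN; the field names are illustrative and do not reflect the dataset's actual schema.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

WINDOW_S = 300.0  # integration window of the impact metric, in seconds


@dataclass
class AneEvent:
    # Hypothetical record layout, for illustration only.
    subcategory: str
    duration_s: float
    snr_db: Optional[float]  # None (or NaN) when the RTN interpolation failed


def is_plottable(ev: AneEvent, window_s: float = WINDOW_S) -> bool:
    """True if both the impact and the SNR can be quantified for the event."""
    # Events longer than the window leave nothing to compare against.
    impact_defined = ev.duration_s <= window_s
    snr_defined = ev.snr_db is not None and not math.isnan(ev.snr_db)
    return impact_defined and snr_defined


def plottable_events(events: List[AneEvent]) -> List[AneEvent]:
    """Keep only the events that can be placed on the scatterplot."""
    return [ev for ev in events if is_plottable(ev)]
```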

### *4.4. Node-Based Analysis*

Apart from the temporal distribution of the recordings and the total labeled time, the main improvement of the dataset described in this work is that the data were gathered at 19 different locations, each corresponding to a node of a WASN. This enables a spatial study of the collected data, assuming that not all nodes will observe the same ANE subcategories nor, of course, the same number of occurrences. This study is key for the final purpose of the dataset, namely training the ANED to detect ANEs at all the sensors of the WASN. A first approach to the homogeneity of the network is given by the results of the cross analysis between ANE subcategory and sensor Id.

Figure 9a shows, as an image, the distribution of ANE occurrences segregated by sensor Id and ANE subcategory. Birds at sensor hb143 are the most frequently observed events, as birds produce short noise bursts that can be very repetitive at certain locations and hours. The rain episode during the weekend day of the recording campaign is the second most frequently observed event. The same figure shows that the noise events with a fairly uniform distribution across the whole sensor network are mostly related to traffic (horns, brakes, and truck sounds) or to the weather during the weekend (rain and thunder). The remaining ANEs show a more irregular distribution across the sensor network during the recording campaign.
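The cross analysis behind Figure 9a amounts to a contingency table of labels. A minimal sketch, assuming the labels are available as hypothetical `(sensor_id, subcategory)` pairs, one per ANE; empty cells are filled with zero so that every sensor row has the same columns.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def occurrence_matrix(
    labels: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, int]]:
    """Count ANE occurrences per (sensor Id, subcategory) cell."""
    counts = Counter(labels)  # one count per distinct (sensor, subcategory) pair
    sensors = sorted({sensor for sensor, _ in counts})
    subcats = sorted({subcat for _, subcat in counts})
    # Missing pairs become explicit zeros, as in an image/heatmap matrix.
    return {s: {c: counts.get((s, c), 0) for c in subcats} for s in sensors}
```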

(**a**) ANE occurrences distribution for each ANE subcategory and sensor Id.

(**b**) ANE SNR distribution. The median SNR value is depicted for each ANE subcategory and sensor Id.

(**c**) ANE total duration distribution. The total duration value is depicted for each ANE subcategory and sensor Id.

(**d**) ANE duration distribution. The median duration value is depicted for each ANE subcategory and sensor Id.

**Figure 9.** ANE parameters distribution per subcategory for each WASN node.

Figure 9b shows the median SNR values segregated by sensor and ANE subcategory. Several nodes and ANE subcategories exhibit positive SNR values; the highest values are attained for sirens, horns, trucks, and *buses*, while intermediate SNRs are observed for brakes, birds, thunder, doors, and airplanes. The negative median SNR values of *rain* can be explained mainly by the fact that the SNR computation methodology used (see [23] for further details) is imprecise when the ANE duration is very long, because the method assumes that the underlying RTN is stationary.
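Each cell of Figure 9b can be obtained by grouping events by sensor and subcategory and taking the median SNR. The sketch below assumes hypothetical `(sensor_id, subcategory, snr_db)` tuples and skips events whose SNR could not be computed; the median is less sensitive than the mean to the few events with extreme SNR values.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple

Cell = Tuple[str, str]  # (sensor_id, subcategory)


def median_snr_matrix(
    events: Iterable[Tuple[str, str, Optional[float]]],
) -> Dict[Cell, float]:
    """Median SNR (dB) per (sensor_id, subcategory) cell."""
    groups: Dict[Cell, List[float]] = defaultdict(list)
    for sensor_id, subcat, snr_db in events:
        # Events without a computable SNR do not contribute to any cell.
        if snr_db is None or math.isnan(snr_db):
            continue
        groups[(sensor_id, subcat)].append(snr_db)
    return {cell: median(values) for cell, values in groups.items()}
```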

Figure 9c shows the total duration of the ANEs recorded for each ANE type and sensor Id. The colormap scale leaves the maximum value as an outlier, shown as the numeric value 1689 for rain at sensor hb143. Regarding the other values, *brak*, *sire*, *horn*, *trck*, *rain*, and *thun* are the ANEs with the most regular presence across the entire sensor network, while other ANEs such as *stru*, *tran*, *musi*, *inte*, *bike*, and *alrm* are observed more irregularly. Figure 9d shows the median ANE duration per subcategory for each of the WASN sensors, i.e., the median ANE duration has been computed for each cell of the depicted matrix. The color bar also presents an outlier, reached by the rain category at sensor hb143. The next highest median duration corresponds to *musi* at sensor hb104.
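Figures 9c and 9d differ only in the statistic computed per cell (sum versus median of the event durations). A sketch, assuming hypothetical `(sensor_id, subcategory, duration_s)` tuples; the `color_limit` helper is an illustrative guess at how a color scale can leave the single largest cell out as an outlier, not the procedure used to draw the figures.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[str, str]  # (sensor_id, subcategory)


def duration_stats(
    events: Iterable[Tuple[str, str, float]],
) -> Tuple[Dict[Cell, float], Dict[Cell, float]]:
    """Return (total duration, median duration) per cell, in seconds."""
    durations: Dict[Cell, List[float]] = defaultdict(list)
    for sensor_id, subcat, duration_s in events:
        durations[(sensor_id, subcat)].append(duration_s)
    total = {cell: sum(d) for cell, d in durations.items()}
    med = {cell: median(d) for cell, d in durations.items()}
    return total, med


def color_limit(values: Dict[Cell, float]) -> float:
    """Hypothetical upper limit of the color scale that excludes the single
    largest cell, which would then be annotated with its numeric value."""
    ordered = sorted(values.values())
    return ordered[-2] if len(ordered) > 1 else ordered[0]
```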

As a conclusion of this WASN-based analysis, *siren* is the ANE subcategory with the longest duration and with presence in most of the sensors; *horns* are also present in most of the sensors, although with shorter durations, and both subcategories present high SNR values. Two other subcategories, *rain* and *thunder*, are present in most of the sensors in this WASN-based recording campaign, because heavy rain affected nearly the entire WASN on the afternoon of the 5th of November. The main difference between them is that *rain* presents mainly low SNR values, while *thunder* presents mid SNR values. *Birds* show high occurrence and duration values at several sensors, but the SNR of birdsong is moderate or low. Nevertheless, the main outcome of this analysis is that few events appear uniformly across all the sensors, and that most of the subcategories labeled in this work were observed only in smaller groups of sensors. This leads us to conclude that a WASN-based recording campaign covering all the sensors in the network was required to observe the full variety of events occurring across the entire network.
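The degree of uniformity discussed above can be quantified as the fraction of sensors in which each subcategory was observed at least once. A minimal sketch, assuming hypothetical `(sensor_id, subcategory)` label pairs and the 19 WASN nodes as the default denominator.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def sensor_coverage(
    labels: Iterable[Tuple[str, str]], n_sensors: int = 19
) -> Dict[str, float]:
    """Fraction of WASN sensors in which each subcategory appears at least once."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for sensor_id, subcat in labels:
        seen[subcat].add(sensor_id)  # repeated occurrences at a sensor count once
    return {subcat: len(ids) / n_sensors for subcat, ids in seen.items()}
```

A coverage close to 1.0 indicates a subcategory present across the whole network, while low values flag subcategories observed only in small groups of sensors.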
