*5.2. WASN-Based Dataset and Node Homogeneity*

The presented dataset is ultimately intended to train the ANED algorithm to detect ANEs accurately at all nodes of the WASN already deployed in the Rome pilot area. Although the developed dataset contains a significantly larger sample of both RTN and ANEs (around 19 times the data gathered in the preliminary expert-based dataset [23]), the analysis of the acoustic data confirms the heterogeneous distribution of ANE subcategories in real-life environments already observed in [23]. This heterogeneity has also been observed across the nodes, as detailed in Section 4.4. Although some subcategories occurred at most of the nodes (e.g., *brak*, *rain*, *truck* and *horn*), others were found only at particular sensors of the network (e.g., *airp*, *bike* and *train*).

The design of a WASN-based dataset raises the hypothesis that the raw data captured at each node of the network are homogeneous. This hypothesis was examined in the previous recording campaign [56] by analyzing the distribution of the ANEs across the five recording locations. To ensure that the ANED operates properly at all nodes, their acoustic environments should present a certain homogeneity in terms of frequency distribution. To collect the data under similar conditions, all nodes have been installed at the same distance from, and with the same orientation to, the road. Nevertheless, a study evaluating this homogeneity should still be carried out, taking into account both the locations of the nodes and the occurrences of the ANEs, with the final goal of a general training of the ANED that is valid for the whole network.
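One possible way to quantify such homogeneity, given here only as a minimal sketch and not as the procedure used in this work, is a chi-square test of homogeneity on a node-by-subcategory table of ANE occurrence counts, complemented by Cramér's V as an effect size. The function name `chi_square_homogeneity`, the node identifiers and all counts below are hypothetical placeholders for illustration; they are not values from the dataset.

```python
from math import sqrt
from typing import Dict, Tuple


def chi_square_homogeneity(
    counts: Dict[str, Dict[str, int]]
) -> Tuple[float, int, float]:
    """Return (chi2 statistic, degrees of freedom, Cramér's V).

    `counts` maps a node id to {ANE subcategory: number of occurrences}.
    Subcategories missing for a node are treated as zero occurrences.
    """
    labels = sorted({lab for row in counts.values() for lab in row})
    table = [[counts[n].get(lab, 0) for lab in labels] for n in sorted(counts)]

    # Drop empty rows and columns so every expected count is positive.
    table = [row for row in table if sum(row) > 0]
    keep = [j for j in range(len(labels)) if sum(r[j] for r in table) > 0]
    table = [[row[j] for j in keep] for row in table]

    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    k = min(len(row_tot) - 1, len(col_tot) - 1)
    cramers_v = sqrt(chi2 / (total * k)) if k > 0 else 0.0
    return chi2, dof, cramers_v


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not measured data).
    toy_counts = {
        "node_A": {"brak": 40, "horn": 25, "truck": 30},
        "node_B": {"brak": 35, "horn": 20, "truck": 28, "train": 15},
        "node_C": {"brak": 42, "horn": 30, "truck": 25, "airp": 10},
    }
    chi2, dof, v = chi_square_homogeneity(toy_counts)
    print(f"chi2={chi2:.2f}, dof={dof}, Cramér's V={v:.3f}")
```

A Cramér's V close to 0 would indicate that the subcategory proportions are similar across nodes, whereas values close to 1 would indicate strongly node-specific events. Raw occurrence counts ignore event duration and recording time per node, so a real study would probably need to normalize for both; a p-value for the statistic could be obtained with, for example, `scipy.stats.chi2_contingency`.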
