*3.2. Data Labeling*

After recording representative acoustic data for the suburban environmental database, a labeling process was conducted. Manual annotation of ANEs is particularly complex when dealing with real-operation data from raw recordings. This process must therefore be carried out by experts, since it is crucial to precisely determine the occurrence and boundaries of each event, e.g., indicating its start and end points in the mixed audio [23].

The labeling process was not exhaustive, given the excessive burden that annotating all recorded data would have represented; only 50% of the gathered audio signals were finally labeled, totaling 156 h and 20 min of labeled audio data. Of this, 94.8% was labeled as RTN, 1.8% as ANE, and the remaining 3.4% was categorized as *others* when the audio passage was difficult to assign to either class due to the complexity of the audio scene. These last passages were not included in the subsequent analyses presented throughout this paper, but they have been kept for future work that will focus, e.g., on their impact on ANED assessment.
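The per-class durations implied by these percentages can be derived directly from the reported total. The short sketch below is only a restatement of the figures given above; no new data is introduced.

```python
# Per-class durations implied by the reported totals (156 h 20 min labeled).
# All figures come from the text; only the breakdown is computed here.
total_min = 156 * 60 + 20  # total labeled audio, in minutes

shares = {"RTN": 0.948, "ANE": 0.018, "others": 0.034}
durations_h = {label: total_min * share / 60 for label, share in shares.items()}

for label, hours in durations_h.items():
    print(f"{label}: {hours:.1f} h")
```

This yields roughly 148.2 h of RTN, 2.8 h of ANEs, and 5.3 h of *others*, which illustrates how strongly the RTN class dominates the labeled corpus.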

In Figure 4, an example of the labeling process using the Audacity software is shown for illustrative purposes. The ANE in the example is a siren, a long event recorded by sensor hb149 on 2 November 2017.

**Figure 4.** Example of the labeling of a siren using Audacity, with the label on top, the signal in the middle, and the spectrum at the bottom.

From the labeling of all the data collected through the 19 nodes of the Rome WASN, the following list of ANEs was observed:


An additional label, *cmplx*, was used during the annotation process to indicate that a raw audio passage resulted from more than one ANE subcategory, or that the sound could not be identified by the labeler. It is not considered an ANE subcategory because it cannot be assigned to a single noise source.
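As an illustration of how such annotations might be post-processed, the sketch below parses a label track in Audacity's tab-separated export format (start, end, label per line) and discards *cmplx* passages before analysis. The example content and the exact label strings are hypothetical, assumed only for illustration.

```python
import io

# Audacity exports label tracks as tab-separated lines: start<TAB>end<TAB>label.
# Hypothetical example content; the cmplx passage is discarded, as in the text.
label_track = io.StringIO(
    "0.000000\t12.500000\tRTN\n"
    "12.500000\t14.200000\tsiren\n"
    "14.200000\t15.000000\tcmplx\n"
)

events = []
for line in label_track:
    start, end, label = line.strip().split("\t")
    if label == "cmplx":  # not a single identifiable noise source
        continue
    events.append((float(start), float(end), label))

print(events)  # the cmplx passage is excluded
```

Filtering at parse time keeps the downstream analysis restricted to passages attributable to a single noise source, mirroring the exclusion described above.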
