**Application of the Intermittency Ratio Metric for the Classification of Urban Sites Based on Road Traffic Noise Events**

#### **Giovanni Brambilla <sup>1</sup>, Chiara Confalonieri <sup>2</sup> and Roberto Benocci <sup>2,\*</sup>**


Received: 21 October 2019; Accepted: 21 November 2019; Published: 23 November 2019

**Abstract:** Human hearing adapts to steady signals but remains very sensitive to fluctuations, as well as to prominent, salient noise events. The larger these fluctuations, the more annoying a sound is likely to be perceived. Several descriptors have been proposed in the literature to quantify such fluctuations; among them, the intermittency ratio (*IR*) was formulated to quantify the eventfulness of exposure to transportation noise. This paper deals with the application of *IR* to urban road traffic noise data, collected as 1 s A-weighted sound pressure levels (SPL) monitored continuously and unattended for 24 h at 90 sites in the city of Milan. *IR* was computed for each hour of the 251 available time series (each lasting 24 h), covering different types of roads, from motorways to local roads with low traffic flow. The obtained hourly *IR* values were processed by clustering methods to extract the most significant features of the *IR* temporal patterns and to derive a criterion for classifying urban sites according to road traffic noise events, which potentially increase annoyance. Two clusters were obtained, and a "non-acoustic" parameter *x*, determined by combining the traffic flow rates in three hourly intervals, made it possible to associate each site with its cluster membership. The described methodology could be fruitfully applied to road traffic noise data in other cities. Moreover, for a more detailed characterization of noise exposure, *IR*, which describes short-term temporal variations of SPL, proved to be a useful supplementary metric accompanying *LAeq*, which only measures the energy content of the noise exposure.

**Keywords:** road traffic noise; noise events; intermittency ratio; urban sites classification

#### **1. Introduction**

Noise pollution has been estimated to be the second largest environmental health risk in Europe after air pollution [1]. The health effects of noise may emerge directly, via autonomous stress reactions to the physical exposure, or indirectly, via negative affective states such as the evoked annoyance. Noise annoyance may interfere with daily activities, rest or sleep, and can be accompanied by negative emotional and behavioral responses, such as anger, displeasure and exhaustion, and by stress-related symptoms [2–4].

There is clear evidence in the literature that annoyance and sleep effects depend not only on sound energy, described by metrics such as *LAeq*, but also on the characteristics of noise events, which can be quantified by the various metrics proposed in the literature, such as those reviewed in [5]. It is well known that human hearing adapts to steady noise more easily than to sound pressure level (SPL) fluctuations and to prominent, salient noise events [6,7]. The larger these fluctuations, the more annoying a sound is likely to be perceived. Road traffic noise is typically characterized by noise events due to single vehicle pass-bys, and the temporal structure of the SPL varies from highly intermittent noise on local one-lane city roads to nearly continuous noise with very limited SPL fluctuations on wide multi-lane motorways. To quantify these SPL fluctuations, common approaches either detect the events exceeding a threshold and count their number and duration, or use SPL statistics such as the percentile levels *LA1*, *LA5* and *LA10*, namely the A-weighted SPL exceeded for 1%, 5% and 10% of the measurement time, respectively.
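The percentile approach can be illustrated with a short sketch that computes *LA1*, *LA5* and *LA10* from a series of 1 s A-weighted SPL values. It assumes a simple nearest-rank convention (the level reached or exceeded by N% of the samples); sound level meters and standards may interpolate differently, and the function name `exceedance_level` is hypothetical.

```python
import math


def exceedance_level(spl_db, n_percent):
    """Return the level (dB) reached or exceeded for n_percent % of the time.

    Uses a nearest-rank convention on equally spaced samples (e.g. 1 s SPL values).
    """
    if not spl_db:
        raise ValueError("empty SPL series")
    if not 0 < n_percent < 100:
        raise ValueError("n_percent must lie between 0 and 100")
    ordered = sorted(spl_db, reverse=True)  # loudest samples first
    # Number of samples corresponding to n_percent % of the measurement time (at least one)
    rank = max(1, math.ceil(n_percent * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical 100 s record: steady 55 dB(A) with short, louder pass-bys
spl = [55.0] * 90 + [70.0] * 5 + [80.0] * 5
for n in (1, 5, 10):
    print(f"LA{n} = {exceedance_level(spl, n):.1f} dB(A)")
```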

Recently, a new descriptor has been proposed [8] to describe the eventfulness (or intermittency) of transportation noise exposure, taking into account both the number and the magnitude of noise events during a given time period. The metric, named intermittency ratio (*IR*) and introduced within the framework of the SiRENE project, can be derived either directly from acoustic measurements or calculated from traffic and geometric data, for any transportation noise source and any time period. A recent survey, performed on a stratified random sample of 5592 residents exposed to transportation noise all over Switzerland, showed that for road traffic noise *IR* has an additional effect on the percentage of highly annoyed people and can explain shifts of the exposure–response curve of up to about 6 dB between low-*IR* and high-*IR* exposure situations, possibly due to the different durations of noise-free intervals between events [9]. Moreover, a parametric study based on calculations has shown the dependency of *IR* on source–receiver distance, traffic volume, percentage of heavy vehicles and travelling speed [10].

In this study, *IR* has been determined from the 1 s A-weighted SPL from road traffic, monitored continuously and unattended for 24 h at 90 sites in the city of Milan. It was computed for each hour of the 251 available time series (each lasting 24 h), covering different types of roads, from motorways to local roads with low traffic flow. The obtained hourly *IR* values were processed by clustering methods to extract the most significant features of the *IR* temporal patterns, in order to derive a criterion for classifying urban sites according to road traffic noise events, which potentially increase annoyance. Two clusters were determined, and a "non-acoustic" parameter *x*, calculated by combining the traffic flow rates in three hourly intervals, allowed us to associate each site with its cluster membership. Furthermore, binomial logistic regression was applied to develop a model to predict the cluster membership on the basis of the *IR* time patterns. The performance of the model, determined by comparing the predicted classification of the test data subset with that obtained by the cluster analysis, was satisfactory.

#### **2. Materials and Methods**

#### *2.1. Acoustic Data Set*

In the framework of the LIFE DYNAMAP project, a large road traffic noise monitoring survey was carried out over the entire area of Milan to build a database of noise data related to the city road network [11]. From this database, a set of 90 sites was selected to represent the different types of roads according to the Italian functional road classification, that is, motorways (class "A"), thoroughfare roads (class "D"), urban district roads (class "E") and urban local roads (class "F"). The distribution of the sites among these classes is given in Table 1, together with the number of 24 h time series of 1 s A-weighted SPL from road traffic, monitored continuously by a class 1 sound level meter with the microphone placed 4 m above the road. At some sites, unattended monitoring was performed over several consecutive days. Monitoring was carried out on weekdays only (Monday to Friday), without rain and with wind speed below 5 m/s. Noise events not associated with road traffic were detected by visual inspection and manually masked before further data processing. The microphone was placed close to the road to reduce the influence of noise from other sources. Thus, the noise data are not representative of the actual exposure of residents, who live at greater distances from the road, especially where the road has a multi-lane configuration, and even more so where the lanes are separated by a median strip. In particular, residents' exposure is most likely overestimated by these data, because the *IR* and *LAeq* values close to the road are greater than those at the façades of the buildings facing it (for the dependency of *IR* on source–receiver distance, see Figure 4 in [10]). Details on the number of lanes in each direction and the average distance of the microphone from the roadside are given in Table 1.


**Table 1.** Distribution of the 90 sites included in the road traffic noise monitoring.

#### *2.2. Intermittency Ratio IR Formulation*

A noise event can be characterized by its maximum level, its sound exposure level (SEL), its "emergence" from background noise, its duration, or the slope of its level rise. To characterize the "eventfulness" of a noise exposure, the *IR* formulation introduces the event equivalent continuous level *Leq,T,Events*, which accounts for all sound energy contributions exceeding a given threshold, that is, those clearly standing out from background noise. This level is related to the overall equivalent continuous level *Leq,T,tot* over the measurement time *T*, giving the following formulation of *IR* [8]:

$$IR = \frac{10^{0.1\,L_{eq,T,\text{Events}}}}{10^{0.1\,L_{eq,T,\text{tot}}}} \cdot 100 \; [\%] \tag{1}$$
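Since numerator and denominator of Equation (1) refer to the same measurement time *T*, the ratio reduces to a function of the level difference alone. As a worked check (a rearrangement added here for illustration, not a formula quoted from [8]), the values reported in the caption of Figure 1 give:

$$IR = 100 \cdot 10^{0.1\,(L_{eq,T,\text{Events}} - L_{eq,T,\text{tot}})} = 100 \cdot 10^{0.1\,(59.5 - 59.8)} = 100 \cdot 10^{-0.03} \approx 93.3\,\%$$

Because the event energy is a subset of the total energy, $L_{eq,T,\text{Events}} \le L_{eq,T,\text{tot}}$, which is why *IR* cannot exceed 100%.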

A single pass-by contributes to *Leq,T,Events* only if its SPL exceeds a threshold *K*, defined as:

$$K = L_{eq,T,\text{tot}} + C \; [\text{dB}], \tag{2}$$

where *C* may range between 0 and 10 dB. For low values of *C*, almost any situation produces a high *IR*, whereas high values of *C* almost always produce a low *IR*. A balance between these extreme cases was sought through numerical simulations of various traffic situations, resulting in *C* = 3 dB [8]. This value was not set on the basis of any verified psychoacoustic principle but was derived empirically. As pointed out in [8], "The question of how much an event really has to stand out from background noise in order to be termed "event" by normal listeners depends on various other parameters", such as the attentional, cognitive and emotional situation of the listener [6]. By definition, *IR* can only take values between 0% and 100%. An *IR* > 50% means that more than half of the sound exposure is caused by "distinct" pass-by events. In situations with only events that clearly emerge from background noise (e.g., a receiver very close to a road), *IR* yields values close to 100%. For example, Figure 1 shows the A-weighted SPL time history (measurement time *T* = 1 h) for an urban local road, together with the corresponding *LAeq,T,tot* and the threshold *K* used to detect the events (all the SPLs above *K*), which determine *LAeq,T,Events*.

**Figure 1.** A-weighted sound pressure level versus time *t* (*T* = 1 h) for an urban local road. The sound pressure levels (SPLs) above the threshold *K* are events contributing to determine *Leq,T,Events*. For the plotted hourly SPL time history intermittency ratio (*IR*) = 93.3%, *LAeq,T,tot* = 59.8 dB(A), number of events above *K* threshold = 45 and *LAeq,T,Events* = 59.5 dB(A).
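The computation behind Equations (1) and (2) can be sketched for one hour of 1 s SPL samples as below. This is a minimal illustration, not the authors' R script: every sample above *K* contributes its energy to *Leq,T,Events* (averaged over the whole period *T*), and, as an assumption made only for counting, an "event" is taken to be a run of consecutive samples above *K*. The names `leq` and `intermittency_ratio` are hypothetical.

```python
import math


def leq(levels_db):
    """Energy-averaged (equivalent continuous) level of equally spaced samples."""
    return 10 * math.log10(sum(10 ** (0.1 * lev) for lev in levels_db) / len(levels_db))


def intermittency_ratio(spl_db, c_db=3.0):
    """Return (IR in %, Leq_tot, Leq_events, number of events) for 1 s SPL samples.

    Threshold K = Leq_tot + C (Equation (2)), with C = 3 dB as in [8] by default.
    """
    n = len(spl_db)
    leq_tot = leq(spl_db)
    k = leq_tot + c_db
    total_energy = sum(10 ** (0.1 * lev) for lev in spl_db)
    # Energy of all samples above K, later averaged over the whole period T
    event_energy = sum(10 ** (0.1 * lev) for lev in spl_db if lev > k)
    leq_events = 10 * math.log10(event_energy / n) if event_energy > 0 else float("-inf")
    ir = 100 * event_energy / total_energy  # Equation (1)
    # Assumption: one event = one run of consecutive samples above K
    n_events = sum(1 for i, lev in enumerate(spl_db)
                   if lev > k and (i == 0 or spl_db[i - 1] <= k))
    return ir, leq_tot, leq_events, n_events


# Hypothetical hour: 50 dB(A) background with ten 5 s pass-bys at 75 dB(A)
hour = [50.0] * 3600
for j in range(10):
    for s in range(j * 360, j * 360 + 5):
        hour[s] = 75.0
ir, leq_tot, leq_ev, n_ev = intermittency_ratio(hour)
print(f"IR = {ir:.1f} %, LAeq,tot = {leq_tot:.1f} dB(A), LAeq,Events = {leq_ev:.1f} dB(A), events = {n_ev}")
```

Raising `c_db` makes *K* stricter and lowers *IR*, which mirrors the trade-off on *C* discussed above.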

#### *2.3. Data Processing and Analysis*

A script running in the "R" environment, version 3.5.1 [12], was written to import each 24 h time series as a text file (four columns: date, time, SPL in dB(A) at 1 s intervals, and a code indicating the corresponding source, that is, road traffic noise or other). The reference measurement time *T* was set to 1 h, as this time frame is established by Italian legislation for road traffic noise measurement. Besides this requirement, a measurement time *T* of 1 h was considered a reasonable compromise between longer times (e.g., 24 h, day and night periods, etc.) and shorter ones (e.g., 30 min or even shorter). For each *T* of 1 h, the output data were exported to an Excel file, including:


In addition, for each site the hourly traffic flow was provided by the Municipal Agency of Mobility, Environment and Land of Milan (AMAT). These data were calculated by a traffic model applied to the city road network.
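The import step described at the beginning of this subsection (text files with date, time, 1 s SPL and source code) can be sketched as follows, grouping the samples coded as road traffic by hour and computing the hourly *LAeq*. The whitespace-separated layout, the `HH:MM:SS` time format and the source code `"R"` are assumptions made for illustration, and masked (non-traffic) seconds are simply dropped from the hour, which may differ from how the original R script handled them.

```python
import io
import math
from collections import defaultdict

ROAD_TRAFFIC_CODE = "R"  # hypothetical code marking road traffic seconds


def hourly_traffic_samples(lines):
    """Group 1 s SPL values by (date, hour), keeping only rows coded as road traffic.

    Each line is assumed to read: <date> <HH:MM:SS> <SPL dB(A)> <source code>.
    """
    by_hour = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # blank or malformed line
        date, clock, spl, code = fields
        if code != ROAD_TRAFFIC_CODE:
            continue  # header line or masked (non-traffic) second
        hour = int(clock.split(":")[0])
        by_hour[(date, hour)].append(float(spl))
    return dict(by_hour)


def hourly_leq(by_hour):
    """Energy-averaged level of the retained samples in each hour."""
    return {key: 10 * math.log10(sum(10 ** (0.1 * lev) for lev in levels) / len(levels))
            for key, levels in by_hour.items()}


sample = io.StringIO(
    "date time spl code\n"
    "2019-03-04 08:00:00 60.0 R\n"
    "2019-03-04 08:00:01 60.0 R\n"
    "2019-03-04 08:00:02 85.0 X\n"  # masked: not road traffic
    "2019-03-04 09:00:00 55.0 R\n"
)
hours = hourly_traffic_samples(sample)
levels = hourly_leq(hours)
print(levels)
```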

The statistical analysis of the collected data was carried out with the software "R" [12]. For the sites where noise monitoring lasted several days, the median value of *IR* for each hour was determined, as this statistic is less influenced by outliers. Thus, a matrix of 90 (sites) × 24 (hours) = 2160 hourly *IR* values was used as input to the subsequent cluster analysis, performed to identify similarities in the *IR* time patterns.
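Collapsing the multi-day records into the 90 × 24 matrix can be sketched as below: for each site and hour, the median across the monitored days is taken. The nested-dictionary layout and the function name are hypothetical.

```python
from statistics import median


def site_hour_matrix(hourly_ir):
    """Median hourly IR per site.

    hourly_ir: {site_id: {day_label: [IR for hours 0..23]}} (hypothetical layout).
    Returns {site_id: [median IR for hour 0, ..., median IR for hour 23]}.
    """
    matrix = {}
    for site, days in hourly_ir.items():
        day_series = list(days.values())
        # Median across days, hour by hour: robust to a single atypical day
        matrix[site] = [median(day[h] for day in day_series) for h in range(24)]
    return matrix


example = {
    "S1": {"mon": [40] * 24, "tue": [45] * 24, "wed": [90] * 24},  # 'wed' is an outlier day
    "S2": {"mon": [60] * 24},
}
m = site_hour_matrix(example)
print(m["S1"][0], m["S2"][0])
```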

To this end, hierarchical clustering, an unsupervised machine learning method for data classification, was applied. This method does not require the number of clusters to be specified in advance, and its output is a tree-based representation of the observations (dendrogram) showing the sequence of cluster formation and the distance at which each fusion takes place. Beforehand, the *IR* values of each hour were scaled (mean = 0 and standard deviation = 1). The Euclidean distance was used to represent the similarity between pairs of observations. Complete-linkage clustering was considered: at the beginning of the process, each element is in a cluster of its own and, afterwards, the clusters are sequentially combined into larger clusters until all elements end up in the same cluster. Different clustering methods available in the "clValid" R package, version 0.6-6 [13], were applied. In particular, six methods were considered, namely hierarchical clustering, partitioning around medoids (PAM), k-means, divisive analysis clustering (DIANA), model-based clustering and the self-organizing tree algorithm (SOTA). For the sake of simplicity, minimal discrimination was considered, that is, two clusters for both the sites and the hourly time intervals. The clustering performance of the methods was ranked according to seven measures, namely connectivity, silhouette width and Dunn index (combining measures of compactness and separation of the clusters), the average proportion of non-overlap (APN), the average distance (AD), the average distance between means (ADM) and the figure of merit (FOM). The method selected as "optimal" on the basis of these measures was applied to obtain two clusters of *IR* patterns for both the sites and the hourly time intervals.
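Two of the steps mentioned above, per-hour standardization and the silhouette width, can be sketched in plain Python. This illustrates the standard definitions, not the clValid implementation; as in R's `scale()`, the sample standard deviation is used, and the helper names are hypothetical.

```python
import math
from statistics import mean, stdev


def scale_columns(rows):
    """Standardize each column (hour) to mean 0 and sample SD 1.

    A column with zero variance raises ZeroDivisionError and must be handled upstream.
    """
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def mean_silhouette(rows, labels):
    """Average silhouette width using Euclidean distances between rows."""
    widths = []
    for i, row in enumerate(rows):
        same = [math.dist(row, r) for j, r in enumerate(rows)
                if labels[j] == labels[i] and j != i]
        if not same:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean(same)  # mean distance to the own cluster
        # Mean distance to the nearest other cluster
        b = min(mean([math.dist(row, r) for j, r in enumerate(rows) if labels[j] == c])
                for c in set(labels) if c != labels[i])
        widths.append((b - a) / max(a, b))
    return mean(widths)


# Hypothetical 4 sites x 2 hours of IR values (%), with an assumed 2-cluster labelling
ir = [[10, 20], [11, 21], [30, 5], [31, 6]]
scaled = scale_columns(ir)
print(round(mean_silhouette(scaled, [1, 1, 2, 2]), 3))
```

Values close to 1 indicate compact, well-separated clusters; negative values indicate that many rows sit closer to another cluster than to their own.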

Afterwards, a model was developed to predict the cluster membership on the basis of the *IR* time patterns. For this purpose, the "caret" R package (an acronym for "Classification And REgression Training") [14] was used. The dataset was randomly divided into two subsets, one for training the model and the other for testing it and evaluating its classification performance. Binomial logistic regression was applied to develop the model because the dependent variable (cluster membership) was categorical with two categories. The classification performance of the model was determined by comparing the predicted classification of the test data subset with that obtained by the cluster analysis.
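The training and testing workflow described above can be sketched without caret: a random 70/30 split followed by a maximum-likelihood fit of a one-predictor logistic model using Newton–Raphson iterations. This swaps in a hand-written solver for R's model fitting, uses a plain (non-stratified) shuffle, and the function names are hypothetical; with perfectly separable data the coefficients would diverge, so the number of iterations is capped.

```python
import math
import random


def split_train_test(items, train_ratio=0.7, seed=1):
    """Shuffle a copy of items and split it into training and test subsets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(train_ratio * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def fit_logistic_1d(xs, ys, iterations=50, tol=1e-10):
    """Fit P(Y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson (maximum likelihood)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # information matrix X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0:
            break
        # Newton step: beta <- beta + (X^T W X)^-1 * gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


train, test = split_train_test(range(90))
print(len(train), len(test))

# Hypothetical, non-separable data in which Y = 1 tends to occur at low x
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 0, 1, 0, 0, 0]
b0, b1 = fit_logistic_1d(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, P = 0.5 at x = {-b0 / b1:.2f}")
```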

#### **3. Results**

Figure 2 shows an example of the obtained 24 h pattern of hourly *LAeq* values and the corresponding *IR* for two different types of roads, namely a motorway (class "A") and a local street (class "F"). The plot reports the median of the hourly values ± the median absolute deviation (MAD), because monitoring covered more than one day, namely 12 days for road "A" and 9 days for road "F". Road "A" was always much noisier than road "F" (average difference in hourly *LAeq* across the hours of about 6 dB) and always showed lower *IR* values than road "F" (average difference in hourly *IR* across the hours of about −50%, less pronounced (−30%) during the night). The lower *IR* values observed for road "A" were due to the high traffic flow rate and speed on the motorway, resulting in a high background SPL above which the noise events did not stand out much. This feature was clearly present during the day, from 6 to 18 h, whereas the highest *IR* values were observed from 2 to 4 h, when the reduced traffic flow allowed higher speeds and more prominent noise events occurred above the lower background level. Road "F" shows the same behavior at night, whereas its lowest *IR* values occurred at the traffic peak hours (8 and 18 h), when the traffic flow was highest and the increased background SPL reduced the prominence of noise events. Thus, given the very different temporal patterns of urban road traffic noise, ranging from relative continuity to high intermittency, it would be worth considering the *IR* metric as a supplementary quantity to *LAeq*.
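The median ± MAD summary used for the multi-day records can be computed as sketched below. Whether the scaled MAD (the default of R's `mad()`, constant 1.4826) or the raw MAD was plotted is not stated, so the constant is left as a parameter; the function name is hypothetical.

```python
from statistics import median


def median_mad(values, constant=1.4826):
    """Return (median, MAD). constant=1.4826 matches R's mad() default; use 1.0 for the raw MAD."""
    m = median(values)
    mad = constant * median([abs(v - m) for v in values])
    return m, mad


# Hypothetical hourly IR values (%) for the same hour over five days, one of them atypical
day_values = [60.1, 61.0, 59.5, 75.0, 60.4]
m, raw_mad = median_mad(day_values, constant=1.0)
print(f"median = {m:.1f} %, raw MAD = {raw_mad:.1f} %")
```

Unlike mean ± standard deviation, neither statistic is pulled far by the single atypical day.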

Regarding clustering, the DIANA method was selected as the "optimal" algorithm to divide the data set into two clusters of *IR* patterns for both the sites and the hourly time intervals. The dendrogram of the scaled hourly *IR* values obtained for the sites (matrix with rows = sites and columns = hours) is given in Figure 3. Table 2 reports the distribution of the sites across road types and clusters. Cluster 2, on the right-hand side of Figure 3, includes most of the roads across the road types, whereas Cluster 1, on the left-hand side of Figure 3, includes the remaining roads and all those in class "A".
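Because only two clusters were retained, the DIANA partition corresponds to the first division of its tree. That first split can be sketched as follows, based on the usual description of the algorithm: the object with the largest average dissimilarity starts a "splinter group", which then attracts the objects that are on average closer to it than to the remaining ones. This is an illustrative reimplementation with Euclidean distances, not the R code used through clValid, and the function name is hypothetical.

```python
import math
from statistics import mean


def diana_first_split(points):
    """First division step of DIANA on Euclidean distances; returns two sets of row indices."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    def avg_dist(i, group):
        """Mean distance from object i to the members of group other than i itself."""
        others = [j for j in group if j != i]
        return mean(d[i][j] for j in others) if others else 0.0

    rest = set(range(n))
    # The object with the largest average dissimilarity starts the splinter group
    first = max(rest, key=lambda i: avg_dist(i, rest))
    splinter = {first}
    rest.remove(first)
    while len(rest) > 1:
        # Positive gain: i is on average closer to the splinter group than to the rest
        gains = {i: avg_dist(i, rest) - avg_dist(i, splinter) for i in rest}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.add(best)
        rest.remove(best)
    return splinter, rest


# Hypothetical scaled IR profiles (2 hours shown): two well-separated groups
profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
group_a, group_b = diana_first_split(profiles)
print(sorted(group_a), sorted(group_b))
```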

**Figure 2.** Example of the obtained 24-h pattern of hourly values of *LAeq* and corresponding *IR* for two different types of roads, namely a motorway (class "A") and a local street (class "F").

**Figure 3.** Dendrogram of the scaled *IR* hourly values obtained for the 24-h road traffic noise data monitored in the 90 sites in Milan.


**Table 2.** Distribution of the 90 sites across the two clusters and type of road for the classification based on *IR* hourly time patterns.

Multidimensional scaling (MDS) applied to the data provided the two-dimensional plot given in Figure 4, where the two clusters appear satisfactorily separated; together, the two dimensions explain 88.3% of the variance.

**Figure 4.** Two-dimensional plot of the two clusters obtained by multidimensional scaling (MDS). Dimensions 1 and 2 explain 68.4% and 19.9% of the variance, respectively.

The dendrogram in Figure 5 shows the clustering in terms of hourly intervals, obtained after transposing the matrix of the 2160 hourly *IR* values (rows = hours and columns = sites). The night period (from 22 to 7 h) was clearly separated from the daytime. Regarding the *IR* time pattern of each cluster, Figure 6 reports the hourly median *IR* values ± the median absolute deviation (MAD) and the three hourly intervals showing the largest differences between the two clusters (green rectangles). At night, the *IR* values were the highest for both clusters because of noise events clearly emerging above the background noise. In this period, the *IR* values of the two clusters overlapped. Similar median *IR* time patterns were also observed from 7 to 24 h, with Cluster 1 showing lower *IR* values. As expected, the night period was the most critical because of prominent noise events, which could increase annoyance, also considering the activities affected (mainly sleep).

The above clustering results were also plotted as a heatmap, reported in Figure 7: a rectangular tiling of the data matrix with cluster trees appended to its margins, in which the rows and columns of the matrix are ordered to highlight patterns [15]. The color key legend at the top left of the figure also shows the distribution of the 2160 hourly *IR* values.

**Figure 5.** Dendrogram of the scaled *IR* hourly values as a function of the hours.

**Figure 6.** *IR* time pattern for each cluster. Green rectangles correspond to the hourly intervals showing the biggest differences between the two *IR* time patterns.

**Figure 7.** Cluster heatmap of the scaled hourly values of *IR*. On the *y* axis the sites divided into Cluster 1 (blue rectangle) and 2 (red rectangle). On the *x* axis the clustering across the hourly intervals with the night period, from 22 to 7 h, in the blue rectangle.

The *IR* time pattern obtained for each cluster cannot be applied straightforwardly without linking it to a specific feature of either the road or the corresponding traffic flow. As shown in Table 2, the road type was of no use for this purpose, because each cluster included different road types. Thus, to find a "non-acoustic" parameter suitable to predict the cluster membership, the Mann–Whitney U test was performed on the hourly *IR* values to detect the hourly intervals in which the differences between the two clusters were largest. Ranked in descending order, the largest differences corresponded to the hourly intervals 15–16 h, 13–14 h and 11–12 h (see Figure 6). Thus, the traffic flows *F* in these three hours were combined according to the following relationship, similar to that previously proposed in [16]:

$$x = \sqrt{[\lg(F_{15-16})]^2 + [\lg(F_{13-14})]^2 + [\lg(F_{11-12})]^2} \tag{3}$$
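Equation (3), together with the threshold rule discussed later in this section, can be sketched as below, with lg taken as the base-10 logarithm. The flow units (vehicles per hour) and the example flows are assumptions. The direction of the rule (sites with *x* above the threshold belong to Cluster 1) is inferred from the positive coefficient of *x* in the exponent of Equation (4), which makes *P*(Y = 1) decrease as *x* grows; the default of 5.24 is the empirical threshold reported in this section, and the function names are hypothetical.

```python
import math


def nonacoustic_x(f_15_16, f_13_14, f_11_12):
    """Equation (3): Euclidean norm of the base-10 logs of three hourly traffic flows."""
    flows = (f_15_16, f_13_14, f_11_12)
    if any(f <= 0 for f in flows):
        raise ValueError("traffic flows must be positive")
    return math.sqrt(sum(math.log10(f) ** 2 for f in flows))


def cluster_from_x(x, threshold=5.24):
    """Cluster 1 above the threshold, Cluster 2 otherwise (direction inferred from Equation (4))."""
    return 1 if x > threshold else 2


x = nonacoustic_x(1000, 1000, 1000)  # hypothetical flows of 1000 vehicles/h
print(round(x, 3), cluster_from_x(x))
```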

Having separated the sites into two clusters, binomial logistic regression was applied to develop a model to predict this classification. In its basic form, this statistical model uses a logistic function (an "S"-shaped or sigmoid curve) to model a binary dependent variable, having only two possible values. In this model, the cluster membership was the dependent variable (Cluster 1 was labeled "0" and Cluster 2 was labeled "1"), and the "non-acoustic" parameter *x* was the independent variable (predictor). A split ratio of 0.7 was used to randomly divide the data set into a subset for training the classification model (63 sites) and a subset for testing it afterwards (27 sites). At the end of the training process, the following model equation was obtained for the probability *P* that an observation belongs to Cluster 2 (Y = 1):

$$P(Y=1) = \frac{1}{1 + e^{(-6.84 + 1.26x)}} \tag{4}$$
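The switching point of the model can be read directly from Equation (4): *P*(Y = 1) = 0.5 exactly when the exponent vanishes, which, as a worked check, gives

$$-6.84 + 1.26\,x = 0 \quad\Longrightarrow\quad x = \frac{6.84}{1.26} \approx 5.43$$

in agreement with the value *x* = 5.428 reported for Figure 9. Because the coefficient of *x* in the exponent is positive, *P*(Y = 1) decreases as *x* increases, so larger values of *x* point to Cluster 1.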

The classification model was applied to the test dataset to evaluate its classification performance; the resulting confusion matrix, a table counting how often each known category (cluster) occurred in combination with each predicted category, is reported in Figure 8. The results were satisfactory, with a model accuracy (fraction of correct predictions) of 0.83, precision (ratio of true positives to predicted positives) and recall (ratio of true positives to all actual positives) both equal to 0.88, and Cohen's kappa κ = 0.60 (moderate agreement). Table 3 reports additional performance parameters. Figure 9 compares the cluster membership obtained by the DIANA clustering (blue dots = Cluster 1 and red dots = Cluster 2) with the probabilities predicted by the logistic regression (blue curve obtained from Equation (4)). The proportion of observations correctly classified by the model was 0.74.
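For reference, the indicators quoted above follow from the four cells of a 2 × 2 confusion matrix, as sketched below. The counts in the example are hypothetical (they are not the values of Figure 8), and the choice of the positive class changes precision and recall but not accuracy or Cohen's kappa.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)  # true positives over predicted positives
    recall = tp / (tp + fn)     # true positives over actual positives
    # Agreement expected by chance, from predicted and reference marginal totals
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return accuracy, precision, recall, kappa


acc, prec, rec, kappa = classification_metrics(tp=8, fp=2, fn=2, tn=8)  # hypothetical counts
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} kappa={kappa:.2f}")
```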

**Figure 8.** Confusion matrix of the classification model applied to the test dataset.

**Table 3.** Classification performance of the logistic model.

**Figure 9.** Cluster membership (blue dots = Cluster 1 and red dots = Cluster 2) obtained by the divisive analysis (DIANA) clustering compared with the probabilities predicted by the logistic regression (blue curve obtained from Equation (4)). Probabilities *P* ≤ 0.5 and *P* > 0.5 correspond to Clusters 1 and 2, respectively. The threshold of the "non-acoustic" parameter *x* discriminating between the cluster memberships is shown in green.

For the practical application of the two clusters, it is essential to determine a threshold of the "non-acoustic" parameter *x* able to discriminate between the cluster memberships. Such a threshold (*x* = 5.24) was determined empirically, as shown in the box plot of the *x* values grouped by the cluster membership of the sites (Figure 10). This value is comparable with that obtained from the intersection of the logistic model curve with the cluster membership probability of 0.5, shown in Figure 9 (*x* = 5.428).

**Figure 10.** Empirical threshold value of the "non-acoustic" parameter *x* obtained for the discrimination between the two clusters (*x* = 5.24).

#### **4. Discussion**

It has to be pointed out that the *IR* values calculated from the noise data provided by the noise monitoring network in Milan have some drawbacks, due to factors such as the varying distance between the microphone and the longitudinal axis of the road, and the microphone being close to the road rather than where residents live. In addition, the results of the clustering and of the classification model were strongly dependent on the local situation and cannot be generalized to other contexts. Despite these limitations, the methodology could be fruitfully applied in other cities, and some general considerations can be drawn. For instance, the hourly *IR* and *LAeq* time patterns, shown by the example in Figure 2, highlight the complementarity of these two metrics, the former describing short-term temporal variations of SPL and the latter measuring the energy content of the noise exposure. In particular, for the available experimental dataset, Figure 11 reports the logistic fitting of the hourly values of these two descriptors for the centroids of Clusters 1 and 2.

**Figure 11.** Logistic fitting of the hourly values of *IR* and *LAeq* for the centroids of cluster 1 (**a**) and 2 (**b**). The area around the regression line represents the confidence bands at a 95% confidence level. The symbol labels represent the hourly intervals.

Due to its definition, the *IR* value ranges between the following two opposite sonic environments:


Sonic environment (1) usually occurs at roads with high traffic flow rates, such as motorways and thoroughfare roads (road classes "A" and "D"), especially during the daytime, whereas sonic environment (2) is usually observed either at roads with low traffic flow rates, such as local roads (road class "F"), during the daytime, or at night at all roads except motorways.

However, there may be particular cases, indeed very frequent in the urban context, where a local road is very close to a busy street whose noise clearly influences the sonic environment of the local road itself. In these circumstances, the low-energy noise events produced by a small number of vehicle pass-bys at low speed do not emerge much above the high background SPL produced by the nearby busy road. The data set considered here included a few sites with this feature, such as the two shown in Figure 12. The *IR* time pattern at these sites is similar to those observed for thoroughfare roads. This is most likely the reason why a marginal percentage (23.1%) of local roads (class "F") have not been grouped in the cluster containing busy roads. Thus, when selecting sites to be monitored, it is important to avoid this situation as much as possible, although it is often present in urban road networks.

**Figure 12.** Examples of two local roads (**a** and **b**) monitored nearby a busy street (adapted from Google Earth images).

These remarks should not be considered a weakness of the *IR* metric, but rather a reliable representation of the time pattern of the sonic environment and of the potential annoyance it might evoke. In addition, the classification based on *IR* hourly time patterns was compared with that based on hourly *LAeq* time patterns, the latter obtained according to the procedure detailed in [17,18]. As shown in Figure 13, the two classifications differ somewhat, overlapping by only 64%.
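One way to obtain an agreement percentage such as the 64% above is sketched below. Because cluster numbers are arbitrary labels, both possible label matchings are tried and the better one is kept; whether the comparison in Figure 13 was computed this way is an assumption, and the function name is hypothetical.

```python
def agreement_percent(labels_a, labels_b):
    """Percentage of sites on which two 2-cluster classifications (labels 1 and 2) agree.

    Cluster numbers are arbitrary, so the better of the two possible label matchings is used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both classifications must cover the same, non-empty set of sites")
    if not (set(labels_a) | set(labels_b)) <= {1, 2}:
        raise ValueError("expected cluster labels 1 and 2")
    direct = sum(a == b for a, b in zip(labels_a, labels_b))
    # Swapping labels 1 <-> 2 in one classification turns disagreements into agreements
    best = max(direct, len(labels_a) - direct)
    return 100 * best / len(labels_a)


print(agreement_percent([1, 1, 2, 2, 1], [1, 2, 2, 1, 1]))
```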

**Figure 13.** Comparison between the classifications based on *IR* and *LAeq* hourly time patterns for the type of roads.

Despite the observed mismatch between the two classifications, the difference between the hourly *LAeq* patterns of the clusters obtained by the two classifications was not statistically significant at the 95% confidence level for any hourly interval, even at night, as shown in Figure 14, where the hourly *LAeq* median values ± the median absolute deviation (MAD) are reported. However, it has to be pointed out that the two classifications have different aims: the one based on the *LAeq* pattern is mainly focused on noise mapping, according to the requirements of the European Directive 2002/49/EC [19], whereas the one based on the *IR* pattern could be aimed at discriminating sites according to the potential annoyance their sonic environment might evoke. Thus, these two approaches are not alternatives to one another but should be considered complementary. Furthermore, both classifications are rather different from the categorization based on road type established by Italian legislation, which defines noise limits as a function of the road category. Thus, the road-type approach does not seem appropriate for effective protection against road traffic noise pollution.

**Figure 14.** Comparison of the hourly *LAeq* patterns corresponding to the two clusters (**a**, cluster 1 and **b**, cluster 2) obtained by the two classifications based on *IR* and *LAeq*, respectively.

#### **5. Conclusions**

The intermittency ratio (*IR*) metric was applied to a database of road traffic noise monitored unattended for 24 h at 90 sites in the city of Milan. The reference measurement time *T* was set to 1 h, and the obtained *IR* values were processed by clustering methods. Two clusters were determined, providing hourly *IR* temporal patterns that enabled us to classify the urban sites on the basis of the observed noise events, which potentially increase annoyance. A "non-acoustic" parameter *x*, determined by combining the traffic flow rates in three hourly intervals, made it possible to associate each site with its cluster membership. Furthermore, binomial logistic regression was applied to develop a model to predict the cluster membership on the basis of the *IR* time patterns. The performance of the model, determined by comparing the predicted classification of the test data subset with that obtained by the cluster analysis, was satisfactory.

However, the *IR* values calculated from the noise data provided by the road traffic noise monitoring network in Milan, mainly used for updating noise maps, had some drawbacks due to factors such as the varying distance between the microphone and the longitudinal axis of the road, and the microphone being positioned close to the road rather than where residents live. The chosen reference measurement time *T* of 1 h also affected the *IR* values. In addition, the results of the clustering and of the classification model were strongly dependent on the local situation and cannot be generalized to other contexts. Nevertheless, the study showed that data collected for noise monitoring and mapping purposes can be processed to evaluate the occurrence of noise events produced by vehicle pass-bys. Despite the above limitations, the described methodology could be fruitfully applied to road traffic noise data in other cities, and some general considerations can be drawn. In particular, *IR* could be a supplementary metric accompanying *LAeq*, as the former describes short-term temporal variations of SPL and the latter measures the energy content of the noise exposure. Indeed, *IR* could explain deviations of the percentage of highly annoyed people from that estimated by classical exposure–response curves relying only on *LAeq* [4], such as those in [20].

Furthermore, the two classifications based on *IR* and *LAeq* hourly time patterns are rather different from that based on road type, as established by Italian legislation, which defines noise limits as a function of the road category. Thus, the road-type approach does not seem appropriate for effective protection against road traffic noise pollution.

Further steps of this research are already planned; they include the statistics of the errors in *IR* values estimated by applying the above time patterns, and the potential of *IR* to correctly detect the noise events produced by road traffic, as identified by an automatic recognition algorithm already developed within the DYNAMAP project [21,22].

**Author Contributions:** Conceptualization, G.B.; methodology, G.B. and R.B.; software, G.B.; validation, C.C.; investigation, G.B. and R.B.; data curation, C.C.; writing—original draft preparation, G.B.; writing—review and editing, G.B., R.B. and C.C.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors thank Giovanni Zambon for granting the use of the noise monitoring data.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **EAgLE: Equivalent Acoustic Level Estimator Proposal**

#### **Claudio Guarnaccia**

Department of Civil Engineering, University of Salerno, I-84084 Fisciano, Italy; cguarnaccia@unisa.it

Received: 9 December 2019; Accepted: 23 January 2020; Published: 27 January 2020

**Abstract:** Road infrastructures represent a key point in the development of smart cities. However, the environmental impact of road traffic should be carefully assessed. Acoustic noise is one of the most important issues to be monitored by means of sound level measurements. When a large measurement campaign is not possible, road traffic noise predictive models (RTNMs) can be used. Standard RTNMs in the literature usually require several inputs about the traffic, such as vehicle flows, percentage of heavy vehicles, average speed, etc. The lack of information about this large set of inputs often limits the application of predictive models on a large scale. In this paper, a new methodology that is easy to implement in a sensor concept, based on video processing and object detection tools, is proposed: the Equivalent Acoustic Level Estimator (EAgLE). The input parameters of EAgLE are detected by analyzing video images of the area under study. Once the number of vehicles, their type (light or heavy vehicle), and their speeds are recorded, the sound power level of each vehicle is computed according to the EU recommended standard model (CNOSSOS-EU), and the Sound Exposure Level (SEL) of each transit is estimated at the receiver. Finally, by summing the contributions of all the vehicles, the continuous equivalent level, *Leq*, over a given time range can be assessed. A preliminary test of the EAgLE technique is proposed in this paper on two sample measurements performed near an Italian highway. The results show excellent agreement with the measured *Leq* and compare well with other RTNMs. These satisfying results, once confirmed by a larger validation test, will open the way to the development of a dedicated sensor embedding the EAgLE model, with possible interesting applications in smart cities and road infrastructure monitoring. These sites, in fact, are often equipped (or can be equipped) with a network of monitoring video cameras for safety purposes or for fining/tolling that, once the model is properly calibrated and validated, can be turned into a large-scale network of noise estimators.

**Keywords:** noise control; sensor concept; road traffic noise model; dynamic model

#### **1. Introduction**

The problem of road traffic noise in urban and non-urban areas is becoming increasingly important. The effect of noise on human health is well established [1]. The recent European Environment Agency (EEA) publication "The European environment—state and outlook 2020. Knowledge for transition to a sustainable Europe" [2] lists environmental noise among the most dangerous phenomena, dedicating a full chapter to this issue. The document points out the delay in implementing the actions required by the Environmental Noise Directive (END) [3], underlining that at least 20% of the EU's population is still exposed to noise levels harmful to health. Owing to societal and human habits, as well as to existing infrastructures, road traffic is the most important source of noise in the EU, with more than 100 million people affected by long-term daily average noise levels greater than 55 dBA and about 80 million people exposed to night-time levels above 50 dBA [3].

To cope with this issue, many municipalities have introduced fixed or temporary monitoring stations and implemented mitigation actions based on the measurement results. Acoustic barriers, although expensive and not always accepted, are the most widespread solution to mitigate the noise produced by the main sources [4]. Pavement plays a key role in road noise emission, as recently studied by many authors aiming to integrate noise reduction with the green economy by recycling rubber from old tires into asphalt (rubber asphalt) [5–7]. Since prevention is also a form of mitigation, innovative solutions such as real-time monitoring with wireless sensor networks are currently being studied [8,9].

On the other hand, road infrastructure companies are required to perform environmental impact analyses, including noise monitoring and estimation. In addition, when critical situations are highlighted, action plans must be implemented according to the END [2] and to the national regulations of each country.

However, measurements are expensive and cannot be performed over large areas; thus, road traffic noise predictive models (RTNMs) can be adopted to assess the noise produced by vehicles. Extensive reviews of the standard statistical RTNMs can be found, for instance, in [10,11], also in comparison with field measurements [12]. In [13], a brief review of advanced techniques for road traffic noise assessment is reported, including cellular automata [14], Time Series Analysis [15], Poisson models [16], etc. Can et al. [17] reviewed the models used to estimate the source power level of a single vehicle.

The use of advanced computing techniques is growing in the literature, even though it has yet to be demonstrated that computationally demanding procedures bring a widespread benefit to the predictions. In "non-standard" conditions (such as traffic jams, congestion, or intersections), the common RTNMs usually fail. Therefore, in case studies covering large areas, such as big municipalities and major road infrastructures, and for long-term averages (such as the *Lden* evaluation), a fast and effective model is more important than extreme precision (for instance, errors below 1 dBA).

Setting aside predictive models based on data analysis, such as Time Series Analysis and Poisson models, implementing an RTNM requires at least the number of vehicles that pass in a certain time range (flow) and the classification of each vehicle (at least light and heavy categories), together with the geometry of the source and receiver positions. Many other parameters can be included as second-order corrections, such as road pavement type, road gradient, temperature, humidity, etc. These data, both required and additional, are not always available, so the possibility of detecting the inputs of any RTNM automatically is an important challenge. In this paper, the author presents the design of a new methodology, the Equivalent Acoustic Level Estimator (EAgLE), which combines vehicle detection, counting, and tracking by means of a video processing tool with a single-vehicle noise emission and propagation model.

There are several studies in the literature on image processing, video analysis, object detection, and tracking. In [18], a detailed study on vehicle recognition based on deep neural networks is presented. Huang [19] presented traffic speed estimation from surveillance video recordings, highlighting the difficulties related to crowded lanes and perspective correction. Similar research has been presented by Hua et al. [20], focusing on tracking and speed estimation from traffic videos. Biswas et al. [21] presented speed estimation performed on video recordings taken by unmanned aerial vehicles. Several other studies in the literature focus mainly on counting, detection and category assignment, tracking, and speed estimation of vehicles from video recordings. Real-time traffic monitoring systems have been profoundly innovated in recent years, leading to the development, and sometimes the installation, of very intelligent sensors on road infrastructures and in urban areas. However, these sensors are somewhat limited, since they usually just detect, count, and track vehicles to help traffic management, for instance at signalized intersections. Video cameras are sometimes used to control restricted areas and, in some cases, also for environmental purposes (see, for instance, the Ultra Low Emission Zone (ULEZ) in London), but they usually do not produce an assessment of any environmental parameter, such as air and/or noise pollution.

The research presented in this paper aims to partially fill this gap by proposing a methodology to equip these sensor networks with a noise level estimator. The proposed approach starts from the recognition of a moving object on the road. Once the object is tracked, it can be counted and categorized according to its size. Its speed can be estimated as well, allowing its sound power level to be assessed. This assessment can be performed in many ways, according to several noise emission models (as presented by Can et al. [17]). In this paper, the CNOSSOS-EU emission model [22] is adopted. Once the noise emission, i.e., the source sound power level, of each vehicle is estimated, the overall continuous equivalent level over a given time range can be calculated by summing the contributions of all the vehicles flowing in that time range. This technique, named "EAgLE: Equivalent Acoustic Level Estimator" in honor of one of the animals with the best eyesight, can be implemented in a new sensor to be developed for road traffic noise assessment.

The EAgLE methodology briefly described above is detailed in Section 2, while Section 3 presents a preliminary application, showing the results of a preliminary comparison with sound levels recorded on an Italian highway. It will be highlighted that this methodology can be implemented in existing sensor networks or in networks under development. In fact, many road and railway infrastructures, as well as many municipalities, have already implemented video recording networks, mainly for safety reasons, that can easily be integrated to become environmental monitoring networks. The integration between existing video recordings and the proposed methodology is the starting point for transforming standard video cameras into smart sensors. The proposed sensor, based on video recording, can give a quantitative estimate of the noise levels at many points without sound level meters and extensive (and expensive) measurement campaigns. Of course, since this methodology is still a concept validated on a small dataset, it currently has several shortcomings, which are reported in the Discussion section. The efficacy of EAgLE must be tested on a large dataset; for this reason, a long-term validation should be run, for instance, by using it in parallel with existing monitoring stations. Nevertheless, EAgLE seems very promising, offering the chance to produce large noise maps with only the aid of existing, or to-be-installed, video cameras.

#### **2. Materials and Methods**

The EAgLE technique adopted in this paper is based on recognizing any moving object by means of background subtraction, which defines a moving "blob" (Figure 1). The blob is enclosed in a bounding box (yellow box), and a centroid is placed at the center of the box (red dot). This centroid is tracked, and when it crosses a given line (green horizontal line in Figure 1), the vehicle is counted and assigned to the light or heavy vehicle category according to the length of the box diagonal (see the counter at the top right of Figure 1). The time each centroid takes to travel from the green line to the white line (or vice versa) is used to estimate the vehicle speed, after converting the number of elapsed frames into seconds and the distance between the lines into meters.

**Figure 1.** Image analysis on the video frame. (**a**) Moving objects are enclosed in a yellow box and a red dot (centroid) is applied. Green and white lines are used for counting and speed estimation; (**b**) blob detection after background subtraction.
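To make the detection step concrete, the sketch below differences two frames, thresholds the result, groups the changed pixels into connected blobs, and returns each blob's bounding box and box center. It is written in plain Python on small grayscale frames (lists of lists) so that it runs without OpenCV; the function name, the threshold value, and the 4-connected flood fill are assumptions for illustration, not the EAgLE code. In an OpenCV pipeline the same steps would typically rely on `cv2.absdiff`, `cv2.threshold`, `cv2.findContours`, and `cv2.boundingRect`.

```python
from collections import deque


def detect_blobs(prev_frame, curr_frame, threshold=25):
    """Return bounding boxes (x, y, w, h) and box centers of regions that changed.

    Frames are equal-sized 2D lists of grayscale intensities (0-255).
    The threshold is a hypothetical value chosen for illustration.
    """
    rows, cols = len(curr_frame), len(curr_frame[0])
    # Binary foreground mask: pixels whose intensity changed by more than the threshold
    mask = [[abs(curr_frame[r][c] - prev_frame[r][c]) > threshold for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood fill (4-connectivity) to collect one connected blob
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r = max_r = r
            min_c = max_c = c
            while queue:
                y, x = queue.popleft()
                min_r, max_r = min(min_r, y), max(max_r, y)
                min_c, max_c = min(min_c, x), max(max_c, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Bounding box (the yellow box) and its center (the red dot in Figure 1)
            box = (min_c, min_r, max_c - min_c + 1, max_r - min_r + 1)
            centroid = ((min_c + max_c) / 2, (min_r + max_r) / 2)
            blobs.append({"box": box, "centroid": centroid})
    return blobs
```

In practice a minimum-area filter would also be needed to discard isolated noisy pixels, which is one way the fake blobs caused by small camera movements could be reduced.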

The algorithm was developed in the "Microsoft Visual Studio 2015" framework with the aid of the "Open Source Computer Vision Library" (OpenCV). The code is written in Python. The source code of the recognition part was developed starting from code shared on the GitHub platform [23].

The principal functions of the code are


The input file is an MPEG-4 file. At this stage, the algorithm works only in offline mode, analyzing single frames after the input video has been processed. Detection is performed by background subtraction between two consecutive frames. The blob centroids are tracked with an improved algorithm proposed in [23]. The classic approach minimizes the distance between all the centroid positions and the reference one in two consecutive frames. In this algorithm, instead, the position of each centroid in the next frame is predicted on the basis of the trajectory it followed in the previous few frames: a weighted mean of the previous positions, with weights varying according to their distance in time, is computed and proposed as the position for the following frame. This calculation uses the 4 previous positions, as a compromise between tracking efficiency and computing time. Then, the distance between the predicted and the actual positions is minimized, assigning a position to each blob in all the frames of the video. This helps avoid multiple recognitions when several vehicles move close to each other.
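The prediction-then-matching idea can be sketched as follows. The source states only that a time-weighted mean over the 4 previous positions is used and that the distance between predicted and observed positions is minimized; the specific weighting (a weighted mean of recent frame-to-frame displacements, with linearly increasing weights toward the most recent frame), the greedy nearest-neighbor assignment, and the function names are assumptions made for illustration.

```python
import math


def predict_next(history, n_prev=4):
    """Predict the next centroid from up to n_prev previous positions.

    Assumption: the predicted displacement is a weighted mean of the recent
    frame-to-frame displacements, with weights growing toward the latest frame.
    """
    pts = history[-n_prev:]
    if len(pts) < 2:
        return pts[-1]
    steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:])]
    weights = range(1, len(steps) + 1)  # older displacements weigh less
    total = sum(weights)
    dx = sum(w * s[0] for w, s in zip(weights, steps)) / total
    dy = sum(w * s[1] for w, s in zip(weights, steps)) / total
    last = pts[-1]
    return (last[0] + dx, last[1] + dy)


def assign(tracks, detections):
    """Greedily match each track to the nearest unused detection.

    tracks: dict track_id -> list of past centroids (x, y).
    detections: list of centroids found in the current frame.
    Returns dict track_id -> index into detections.
    """
    predictions = {tid: predict_next(h) for tid, h in tracks.items()}
    # All (distance, track, detection) pairs, closest first
    pairs = sorted(
        (math.dist(p, d), tid, j)
        for tid, p in predictions.items()
        for j, d in enumerate(detections)
    )
    matched, used = {}, set()
    for _, tid, j in pairs:
        if tid not in matched and j not in used:
            matched[tid] = j
            used.add(j)
    return matched
```

For two vehicles passing each other, matching against the last known positions is ambiguous, whereas matching against the predicted positions assigns each detection to the correct track.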

When a centroid crosses a chosen line (in this case, the green line in Figure 1), which can be horizontal or vertical, the count is increased by one. The category is assigned according to the length of the box diagonal. A short video sample of this procedure (Video S1) is provided in the Supplementary Material. A time stamp (frame number) of the crossing is recorded and combined with the time of crossing the white line to evaluate the speed.
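The counting and classification logic can be sketched as below for a horizontal counting line. The diagonal threshold separating light from heavy vehicles, the observation tuple layout, and the function names are hypothetical; the source states only that a crossing of the green line increments the count, that the category depends on the box diagonal, and that the crossing frame is stored for the speed estimate.

```python
import math

# Hypothetical threshold (pixels) separating light from heavy vehicles;
# it depends on the camera geometry and would have to be calibrated on site.
HEAVY_DIAGONAL_PX = 120.0


def classify(box_w, box_h, threshold=HEAVY_DIAGONAL_PX):
    """Assign 'light' or 'heavy' from the bounding-box diagonal length."""
    return "heavy" if math.hypot(box_w, box_h) > threshold else "light"


def crossed(prev_y, curr_y, line_y):
    """True if a centroid moved onto or across a horizontal line (either direction)."""
    return (prev_y - line_y) * (curr_y - line_y) < 0 or (curr_y == line_y != prev_y)


def count_crossings(tracks, line_y):
    """tracks: dict id -> list of (frame_index, cx, cy, box_w, box_h), in frame order.

    Returns the per-category counts and the frame index at which each vehicle
    crossed the line (the time stamp later paired with the second line).
    """
    counts = {"light": 0, "heavy": 0}
    stamps = {}
    for tid, obs in tracks.items():
        for (_, _, y0, _, _), (f1, _, y1, w, h) in zip(obs, obs[1:]):
            if crossed(y0, y1, line_y):
                counts[classify(w, h)] += 1
                stamps[tid] = f1
                break  # count each vehicle only once
    return counts, stamps
```

Checking the sign change of the distance to the line between consecutive frames, rather than the exact position, avoids missing fast vehicles whose centroid jumps over the line between two frames.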

Once the vehicle has been detected and classified, and its speed has been stored in the speed vector, the sound power level can be estimated. At this preliminary stage, the following procedure has been implemented in Matlab©, but it could be implemented in the same framework as the video processing.

As mentioned in the Introduction, among the several emission models presented in the literature (see Can et al. [17]), EAgLE implements the CNOSSOS-EU emission model, in which the sound power level is calculated as follows:

$$L\_{W,i,m}(v\_m) = 10 \log \left( 10^{\frac{L\_{WR,i,m}(v\_m)}{10}} + 10^{\frac{L\_{WP,i,m}(v\_m)}{10}} \right) \tag{1}$$

where *i* is the octave-band index, *m* is the vehicle category index, *v*<sub>m</sub> is the average speed of the flow of the *m*-th vehicle category, *L*<sub>WR,i,m</sub> is the rolling noise sound power level, and *L*<sub>WP,i,m</sub> is the propulsion noise sound power level, given by:

$$L\_{WR,i,m}(v\_m) = A\_{R,i,m} + B\_{R,i,m} \log \left(\frac{v\_m}{v\_{ref}}\right) + \Delta L\_{WR,i,m}(v\_m) \tag{2}$$

$$L\_{WP,i,m}(v\_m) = A\_{P,i,m} + B\_{P,i,m} \left(\frac{v\_m - v\_{ref}}{v\_{ref}}\right) + \Delta L\_{WP,i,m}(v\_m) \tag{3}$$

with *v*<sub>ref</sub> the reference speed (70 km/h), *A* and *B* the tabulated coefficients, and Δ*L*<sub>W</sub> the correction terms. Of course, other emission models can easily be implemented, according to the needs and the country of application of the EAgLE system.
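Equations (1)–(3) can be evaluated for one octave band and one vehicle category as in the sketch below. The logarithm is base 10 and speeds are in km/h. The coefficient values used in the checks are placeholders, not the CNOSSOS-EU tabulated *A* and *B* values, and the correction terms Δ*L* default to zero; the function names are illustrative.

```python
import math

V_REF = 70.0  # reference speed (km/h) used in Equations (2) and (3)


def rolling_lw(a_r, b_r, v, delta=0.0):
    """Rolling noise sound power level, Equation (2): logarithmic in speed."""
    return a_r + b_r * math.log10(v / V_REF) + delta


def propulsion_lw(a_p, b_p, v, delta=0.0):
    """Propulsion noise sound power level, Equation (3): linear in relative speed."""
    return a_p + b_p * (v - V_REF) / V_REF + delta


def vehicle_lw(a_r, b_r, a_p, b_p, v):
    """Energetic (logarithmic) sum of rolling and propulsion noise, Equation (1), in dB."""
    lr = rolling_lw(a_r, b_r, v)
    lp = propulsion_lw(a_p, b_p, v)
    return 10.0 * math.log10(10 ** (lr / 10.0) + 10 ** (lp / 10.0))
```

At *v* = *v*<sub>ref</sub> both speed terms vanish, so two equal contributions of 100 dB combine to about 103 dB; the logarithmic speed dependence of rolling noise is why it dominates at highway speeds.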

Once *L*<sub>W</sub> is obtained for each vehicle, the instantaneous sound pressure level at the receiver, *L*<sub>p</sub>(*t*), can be estimated with the point-source propagation formula, and the single-event Sound Exposure Level (SEL) of each pass-by at the fixed receiver, i.e., the acoustic energy of each transit "compressed" into 1 s, is calculated:

$$SEL = 10\log\left(\frac{1}{t\_0}\int\_{t\_1}^{t\_2} 10^{\frac{L\_p(t)}{10}}\,dt\right) \tag{4}$$

where *t*<sub>0</sub> = 1 s, and *t*<sub>1</sub> and *t*<sub>2</sub> are, respectively, the beginning and the end of the transit. This step is fundamental to make all the transits comparable, since their durations differ strongly depending on vehicle speed [24]. This procedure is repeated for each vehicle and for each category, in particular light and heavy-duty vehicles. Then, the overall SEL is obtained as a logarithmic sum over light and heavy vehicles. The continuous equivalent level *Leq* over the time range Δ*t* is finally obtained with the following formula:

$$L\_{eq}^{(\Delta t)} = 10\log\frac{1}{\Delta t} + 10\log\left(\sum\_{i=1}^{N\_L} 10^{0.1SEL\_i^{light}} + \sum\_{i=1}^{N\_H} 10^{0.1SEL\_i^{heavy}}\right) \tag{5}$$
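The chain from sound power level to *Leq* (Equations (4) and (5)) is sketched below for a vehicle moving along a straight line at constant speed. The propagation model is an assumption: free-field spherical spreading from a point source, *L*<sub>p</sub> = *L*<sub>W</sub> − 20 log<sub>10</sub>(*r*) − 11 dB, without ground, air absorption, or directivity corrections. The integral is approximated with the trapezoidal rule, and the function names and integration window are illustrative.

```python
import math


def lp_at_receiver(lw, r):
    """Free-field point-source level at distance r (m); an assumed propagation model."""
    return lw - 20.0 * math.log10(r) - 11.0


def sel_pass_by(lw, v_kmh, d, half_window=60.0, dt=0.005, t0=1.0):
    """SEL of one pass-by, Equation (4), with t1 = -half_window and t2 = +half_window.

    The vehicle moves at constant speed along a line whose closest point lies
    at distance d (m) from the receiver; t = 0 is the instant of closest approach.
    """
    v = v_kmh / 3.6  # km/h -> m/s
    n = int(round(2 * half_window / dt))
    energy = 0.0
    for k in range(n + 1):
        t = -half_window + k * dt
        r = math.hypot(d, v * t)
        weight = 0.5 if k in (0, n) else 1.0  # trapezoidal rule end points
        energy += weight * 10 ** (lp_at_receiver(lw, r) / 10.0) * dt
    return 10.0 * math.log10(energy / t0)


def leq_from_sels(sels_light, sels_heavy, delta_t):
    """Continuous equivalent level over delta_t seconds, Equation (5)."""
    total = (sum(10 ** (0.1 * s) for s in sels_light)
             + sum(10 ** (0.1 * s) for s in sels_heavy))
    return 10.0 * math.log10(1.0 / delta_t) + 10.0 * math.log10(total)
```

A useful check under these assumptions: for a long integration window the numerical SEL approaches the closed form *L*<sub>W</sub> − 11 + 10 log<sub>10</sub>(π/(*v d*)), with *v* in m/s, because the integral of 1/(*d*² + *v*²*t*²) over the whole line equals π/(*v d*). The dependence on 1/*v* reflects the shorter exposure time of faster vehicles, which is why the SEL normalization in Equation (4) is needed.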

The main steps of the EAgLE methodology are summarized as follows:


Once the EAgLE methodology is embedded in existing video recording sensors and validated with on-site measurements and calibration, the time basis and the time range used to calculate the *Leq* can be tuned according to the needs of the case study. For instance, for urban planning purposes, in urban areas with specific limits, the *Lden* (i.e., the equivalent level evaluated over the day, evening, and night periods, with penalties for evening and night) can be calculated by running the algorithm on one year of video recordings. Several other applications are possible by changing and tuning the parameters of the EAgLE methodology, depending on the aim of the investigation and on the case study.
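For the *Lden* mentioned above, the END definition [3] combines the day, evening, and night levels with +5 dB and +10 dB penalties, weighted by the duration of each period. The sketch below assumes the default 12 h day, 4 h evening, and 8 h night periods; the per-period input levels would be EAgLE *Leq* values averaged over each period, and the function name is illustrative.

```python
import math


def lden(l_day, l_evening, l_night, hours=(12, 4, 8)):
    """Day-evening-night level in dB, with +5 dB evening and +10 dB night penalties.

    hours: durations of the day, evening and night periods (END defaults assumed).
    """
    h_d, h_e, h_n = hours
    energy = (h_d * 10 ** (l_day / 10)
              + h_e * 10 ** ((l_evening + 5) / 10)
              + h_n * 10 ** ((l_night + 10) / 10))
    return 10 * math.log10(energy / (h_d + h_e + h_n))
```

With equal levels in all three periods, *Lden* exceeds them by about 6.4 dB, which shows how strongly the night penalty weighs in the indicator.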

#### *Preliminary Application on a Case Study on an Italian Highway: Case Study Description*

A preliminary application of the EAgLE methodology has been performed at a site along the Italian highway A2 "Autostrada del Mediterraneo". This highway is managed by ANAS S.p.A. and runs from the junction between the A30 and the RA2, in Fisciano, to Reggio Calabria. The video recording and the measurements were performed in the city of Baronissi (Figure 2a), in the segment between Fisciano and Salerno, from the sidewalk of a bridge (Figure 2b,c), in safe conditions (Figure 2d). In this segment, the highway has two lanes per direction, with an entry lane coming from a gas station in the south–north direction. However, the entering flow recorded during the measurements was negligible, as was the traffic on the bridge. No unusual events, such as noisy motorcycles, passing airplanes, or honking, were recorded, meaning that the test conditions were quite ideal for the application of the methodology.


**Figure 2.** Measurement location: (**a**) Position of Baronissi (red mark), in the Campania region (courtesy of Google Earth©); (**b**) 3D aerial view of the bridge from Google Earth©; (**c**) lateral view of the bridge from Google Street View©; (**d**) picture of the instruments during the measurement collection.


The instruments used for the measurements were a class 1 sound level meter (Fusion, by 01dB) and a video camera embedded in a mobile phone. Two 15-minute measurements were collected around lunchtime on Friday, 17 November 2017. All the acoustic parameters, in particular LpA,F, Leq,A, percentile levels, third-octave spectrum, etc., and the video of the passing vehicles were recorded in parallel. The temperature was approximately in the range 11–14 °C and the average wind speed was below 5 m/s. Furthermore, a windscreen was used to protect the sound level meter from sudden wind gusts (see Figure 2d). The traffic flowed almost freely, with small speed variations. The average number of vehicles in 15 minutes was 1091, with a percentage of heavy vehicles of about 15% in both measurements. Details of the manual counts performed on the videos are reported in Table 1.



The detection algorithm is strongly influenced by image stability, which is affected by vibrations of the bridge and by wind. Since a simple camera on a tripod was used in this sample application, the overall recognition efficiency is affected by image vibration. Without any post-processing and offline analysis, the detection error is greater than 200%. For this reason, in order to check the complete EAgLE technique, the two videos were sampled by choosing the time ranges in which the camera was most stable, so as to find subsections less affected by image movements. Two video subsections, one per measurement, were extracted, each made of 5 cuts taken at the beginning, in the middle, and at the end of the video. The overall duration of each subsection is around 300 seconds. Moreover, an offline analysis was run to remove the counts due to frame movement. The periods chosen for the video cuts are summarized in Table 2.



#### **3. Preliminary Results**

The results of vehicle counting and detection are reported in Table 3 for the two video cuts, each approximately five minutes long, after post-processing of the videos and removal of the counts due to moving frames.


**Table 3.** Results of the manual and Equivalent Acoustic Level Estimator (EAgLE) counting and recognition, after the post processing of the video, in the two video cuts.

After removing the counts due to moving frames, the counting efficiency is good, and the recognition also gives satisfying results. Category errors are usually overestimations, because some slightly moving frames could not be removed. This led to fake moving blobs, caused by the difference in the background between two consecutive frames. When these fake blobs appeared close to the counting line, they were counted (usually as light vehicles, because of the small variation between the two consecutive frames). In addition, in some cases two light vehicles moving very close to each other were recognized as a single heavy vehicle, leading to a small overestimation in this category. The author believes that these problems can be solved with a more stable video camera, an optimized angle of view, and a more advanced recognition tool.

The distributions of the speeds estimated with the EAgLE algorithm are reported in Figure 3. The distribution of light vehicle speeds is very close to a normal distribution, as reported in the literature for free-flowing traffic. For heavy vehicles, the different shape of the distribution is probably due to the mixing of medium and heavy vehicles, which in principle have different average speeds. In fact, the EAgLE algorithm used in this preliminary application did not distinguish between vans (medium vehicles) and buses or trucks (heavy vehicles). The mean values of the two distributions differ, as expected, because of the different speed limits and driving conditions.

**Figure 3.** Speed distributions for light (**a**) and heavy vehicles (**b**), combining the speeds estimated in both video cuts.

The missing bins in the light vehicle speed distribution are due to the discretization of the speed detection. The frame rate of the camera (30 fps) limits the speed estimation, which is obtained by converting the number of frames needed to go from the trigger line to the "arrival" line into a travel time; the discretization in frames therefore also discretizes the estimated speed. The resulting "delta" is a function of the speed itself (it increases with speed), of the frame rate, and of the position of the lines. This position is a compromise between a distance large enough to estimate the speed over a sufficiently large range and the best location for detecting vehicle pass-bys. The delta ranges from about 4 km/h at the low-speed end of the distribution to about 14 km/h at the high-speed end. A more advanced camera with a higher frame rate is expected to yield a more precise speed estimate and, consequently, a better distribution plot. Additional error sources can be the uncertainty in the centroid position, due, for instance, to shadow effects and to the image resolution, since these influence the bounding box shape.
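Why the delta grows with speed can be seen from *v* = *D f*/*n*, where *D* is the real-world distance between the two lines, *f* the frame rate, and *n* the number of frames between the two crossings. Two adjacent frame counts differ in speed by *D f*/(*n*(*n* − 1)), which is approximately *v*²/(*D f*), so the step grows roughly with the square of the speed. The sketch uses *f* = 30 fps as in the measurements and a hypothetical line spacing *D* = 15 m, which is not reported in the text; the function names are illustrative.

```python
def speed_kmh(n_frames, line_distance_m, fps=30.0):
    """Speed (km/h) from the number of frames taken to travel between the two lines."""
    return line_distance_m * fps / n_frames * 3.6


def speed_step_kmh(n_frames, line_distance_m, fps=30.0):
    """Gap to the next attainable (faster) speed value, obtained with n_frames - 1."""
    return (speed_kmh(n_frames - 1, line_distance_m, fps)
            - speed_kmh(n_frames, line_distance_m, fps))


D = 15.0  # hypothetical line spacing in meters; only the 30 fps rate comes from the text
for n in (20, 16, 12):
    print(f"n={n:2d}: v={speed_kmh(n, D):6.1f} km/h, step={speed_step_kmh(n, D):5.1f} km/h")
```

Under these assumptions the attainable speeds are about 4.3 km/h apart near 81 km/h but about 12.3 km/h apart near 135 km/h, so only a higher frame rate or a wider line spacing narrows the gaps between bins.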

Once the vehicles had been identified and the speed vectors for light and heavy vehicles had been obtained by the detection algorithm, the noise levels were estimated in Matlab. As described in Section 2, the sound power level of the sources was estimated with the CNOSSOS-EU approach, and the propagation to the receiver was computed with the standard point-source propagation formula. Table 4 summarizes the measured continuous equivalent level *Leq* over the 15-minute time range, the levels predicted with some statistical predictive models, the levels predicted with the CNOSSOS-EU model, and the *Leq* simulated with the EAgLE technique. The predictive models selected for the comparison are a simple, fully statistical model, i.e., the Burgess model [25], which includes only the traffic flow, the percentage of heavy vehicles, and the source–receiver distance, and a "semi-dynamic" model, i.e., CNOSSOS-EU, which, in addition to the previous inputs, includes the mean speed of the flow and some correction factors, such as road gradient, temperature, etc.
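For reference, RTNM reviews usually quote the Burgess model as *Leq* = 55.5 + 10.2 log<sub>10</sub>(*Q*) + 0.3 *p* − 19.3 log<sub>10</sub>(*d*), with *Q* the hourly traffic flow, *p* the percentage of heavy vehicles, and *d* the distance (m) from the center of the traffic lane. The sketch below implements that commonly cited form; the coefficients should be checked against [25] before use, and the inputs in the check are illustrative, not the values of this case study.

```python
import math


def burgess_leq(q_per_hour, heavy_percent, distance_m):
    """Burgess statistical RTNM, commonly cited form (dBA).

    q_per_hour: total traffic flow (vehicles/h); heavy_percent: share of heavy
    vehicles (%); distance_m: distance from the center of the traffic lane (m).
    """
    return (55.5 + 10.2 * math.log10(q_per_hour)
            + 0.3 * heavy_percent - 19.3 * math.log10(distance_m))
```

Since speed is not among its inputs, such a model cannot respond to the actual driving conditions, which is the feature that distinguishes it from CNOSSOS-EU and EAgLE in the comparison.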

**Table 4.** Summary of measured *Leq* over 15 minutes compared with predictive model results and with the *Leq* simulated with the EAgLE methodology.


It can be immediately seen that the statistical models overestimate the measured *Leq*, while the models that consider the speed of the flow (as a mean value, as in CNOSSOS, or for each single vehicle, as in EAgLE) give a much better estimate of the noise levels.

#### **4. Discussion**

The preliminary results reported in Section 3 are very encouraging, and the comparison performed on the two test videos shows very good agreement between the EAgLE simulated levels and the measured *Leq*. However, it should be underlined that the methodology currently has some limitations and shortcomings.

First of all, the EAgLE technique is strongly affected by the video recording. In particular, the critical points seem to be the recording angle, which affects parallax and the conversion between frames and real-world distances; the camera resolution, which influences the speed estimation; the lighting conditions; and shadow effects. The first two points can be solved quite easily by calibrating the system and adopting high-resolution cameras. Regarding the last two points, EAgLE cannot currently process dark images. Problems can also occur during the first and last hours of the day, when the sun is low and almost perpendicular to the road and shadows can modify the size of the bounding boxes, leading to misclassification of the vehicle. Consequently, the proposed methodology can be used continuously, day and night, only in places with artificial lighting, while shadow effects could be accounted for by calibrating the angle of view and the sensitivity of the bounding box. It should also be underlined that existing video recording sensors are usually placed at illuminated sites, since it makes no sense to place a video camera at a dark site. For this reason, the EAgLE methodology is still worth embedding in existing sensors, and new installations should be designed in proper locations to avoid night (or low-light) issues and shadow effects. Moreover, tests with vehicle headlights should be performed to check whether the recognition efficiency can be maintained using the moving lights. Furthermore, tests at different hours of the day have to be performed to assess the effect of the sun's inclination on recognition performance.

Another important issue concerning the video recording is detection on crowded and congested roads. While other "non-noisy" moving objects (pedestrians, animals, etc.) can be excluded by properly placing the video camera, the possibility of poor results in congested situations is a critical point, especially in urban areas. On highways, in fact, congestion is quite rare, especially outside rush hours. In the author's opinion, this problem can be solved by improving the detection and classification code of EAgLE. As mentioned in the Introduction, several techniques much more advanced than the one implemented in this preliminary application have been developed for this purpose, based on machine learning, deep learning, neural networks, etc. (see, for instance, [19]). For this reason, the author is confident that great improvements can be made on this issue by tuning the detection algorithm to the case study under investigation.

Another limitation of the EAgLE methodology lies in the estimation of non-standard events, such as honking, sirens, extremely noisy vehicles, external sources, etc. It must be underlined that none of the predictive models currently in the literature can predict such events; from this point of view, EAgLE is, at the moment, on a par with the other models. In any case, triggers could be implemented in the recognition code, for instance, based on the flashing lights of ambulances or police cars, to tag these events as non-standard and treat them appropriately.

The preliminary application presented in the previous sections is limited by the small number of measurements and by the free-flow condition. Further work is needed in future research to validate the technique on a larger sample of measurements, with different traffic conditions and site geometries.

Looking at the comparison in Table 4, it could be argued that such a heavy computational effort is unnecessary, since the CNOSSOS model gives very similar results. The key point is that the CNOSSOS model needs several inputs to run, while the EAgLE methodology produced excellent results using only the video recordings. Moreover, EAgLE is a fully dynamic model, since it considers the speed and kinematics of each single vehicle.

Even with the above-mentioned limitations, the EAgLE methodology is very promising, because it is easy to apply in any place monitored by video cameras. The current algorithm is quite simple and can be implemented in real-time monitoring to produce raw estimates of noise levels. Of course, for a more reliable estimate, an offline analysis is mandatory to clean the raw data of errors in vehicle counting or classification, as well as in speed estimation. The integration of this system into a complex sensor, including video recording, online analysis, data transmission, and offline processing, is encouraged by the preliminary results obtained in the case study. The author believes that, with a more powerful video camera network and an improved data processing system, this methodology can be extremely useful in qualitative noise monitoring systems, especially in urban areas and large infrastructures, where video recording is usually already present for safety reasons or for tolling/fining systems.

Future studies should include the production of a test sensor that embeds the EAgLE methodology, with a video camera, a sound level meter, and a processor able to run the algorithm on site. In this way, the sensor can undergo a large-scale validation, with continuous recording of sound pressure levels and video images, in order to test its online performance and identify critical issues. A sensitivity analysis of the sensor can be performed, testing the variations with respect to the critical elements of detection and propagation (such as angle of view, distance, site geometry, etc.) and to the source parameters (such as flow volume, vehicle typology and dynamics, pulsing conditions, and/or congestion). Moreover, non-standard events, such as honking, ambulances, police sirens, etc., should be investigated, since the noise they produce comes both from the vehicle and from external loudspeakers.

Once the EAgLE methodology has been validated, it can be tested on a large spatial scale, with the aim of producing a noise map of a city or of a transportation infrastructure, taking advantage of existing video camera networks. When long-term recordings are available, for instance covering more than one year, the *Lden* estimation can be performed using real traffic data, instead of simulating ideal conditions in noise prediction software. This could help local policy makers and infrastructure managers find the critical points of their networks and, if needed, commit to further investigations based on standard noise level measurements or other tools.
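For reference, the day-evening-night level defined in Directive 2002/49/EC combines the day, evening, and night levels with 5 dB and 10 dB penalties: $L_{den} = 10\log_{10}\frac{1}{24}\left(12\cdot 10^{L_{day}/10} + 4\cdot 10^{(L_{evening}+5)/10} + 8\cdot 10^{(L_{night}+10)/10}\right)$. The Python sketch below applies this formula to 24 hourly equivalent levels, such as those a validated video-based estimator might provide. It assumes the Directive's default periods (07:00 to 19:00, 19:00 to 23:00, 23:00 to 07:00); since Member States may shift these periods, the start hours are parameters and the weights follow the resulting period lengths. The function names are hypothetical, and because *Lden* is formally a long-term (typically annual) indicator, applying it to a single day is a simplification.

```python
import math
from typing import Sequence


def energy_average(levels_db: Sequence[float]) -> float:
    """Energy (logarithmic) average of equal-duration levels, in dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db) / len(levels_db))


def lden_from_hourly(hourly_leq: Sequence[float],
                     day_start: int = 7,
                     evening_start: int = 19,
                     night_start: int = 23) -> float:
    """Lden (dB) from 24 hourly Leq values; index h covers h:00 to (h+1):00.

    Defaults follow the Directive 2002/49/EC periods. Requires
    0 < day_start < evening_start < night_start <= 24.
    """
    if len(hourly_leq) != 24:
        raise ValueError("exactly 24 hourly levels are required")
    if not 0 < day_start < evening_start < night_start <= 24:
        raise ValueError("periods must satisfy 0 < day < evening < night <= 24")
    day = [hourly_leq[h] for h in range(day_start, evening_start)]
    evening = [hourly_leq[h] for h in range(evening_start, night_start)]
    # The night period wraps around midnight.
    night = [hourly_leq[h % 24] for h in range(night_start, day_start + 24)]
    weighted_energy = (
        len(day) * 10 ** (energy_average(day) / 10)
        + len(evening) * 10 ** ((energy_average(evening) + 5) / 10)  # +5 dB evening penalty
        + len(night) * 10 ** ((energy_average(night) + 10) / 10)     # +10 dB night penalty
    )
    return 10 * math.log10(weighted_energy / 24)


# Example: a flat profile of 60 dB in every hour.
print(f"Lden = {lden_from_hourly([60.0] * 24):.1f} dB")
```

For a flat hourly profile the penalties alone lift *Lden* about 6.4 dB above the hourly level, which illustrates why real day-night traffic patterns, rather than idealized flows, matter for this indicator.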

#### **5. Conclusions**

In this paper, the EAgLE (Equivalent Acoustic Level Estimator) technique has been presented. This technique, based on image analysis, vehicle tracking, and dynamic noise modeling, aims at producing a robust estimation of the continuous equivalent noise level on given time ranges, by using just a video camera recording.

A preliminary application of the technique over a short time range (630 s), in a case study along a highway in Southern Italy, has been presented. It showed that, with good recognition efficiency, the noise levels estimated with EAgLE are extremely close to the measured levels in this small sample of measurements performed under free-flow and standard conditions.

More tests are needed to validate the EAgLE procedure. Nevertheless, besides the shortcomings discussed in the previous sections, several strengths emerge from the first tests. In particular, the possibility of providing reliable qualitative estimates of the noise level in any place equipped with a video camera, in cities or along transportation infrastructures, is definitely the key point of the proposed sensor. These estimates can be used, on the one hand, to meet the need for large-scale spatial monitoring and, on the other hand, to provide first-level alarms when limit thresholds are exceeded, to be checked with follow-up interventions at specific sites.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1424-8220/20/3/701/s1, Video S1: 1-minute video of the EAgLE counting algorithm running.

**Funding:** This research received no external funding.

**Acknowledgments:** The author is grateful to Joseph Quartieri for providing support for this research. This research would not have been possible without the efforts of Antonio Marino, who developed the code during his undergraduate thesis period and helped with the analyses. The author thanks Valentina Salzano and Angela Raimondo for participating in the field measurement campaign in the framework of their undergraduate theses. The author expresses gratitude to the editors and reviewers for their valuable suggestions and comments.

**Conflicts of Interest:** The author declares no conflict of interest.



© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
