#### 2.2.1. Acoustic Vector Sensor

The intensity-based AVS (Acoustic Vector Sensor) described in this paper consists of three pairs of pressure sensors (microphones) positioned on three orthogonal axes, at equal distances from the center point. Each pair of microphones forms a *p-p* sensor, which is used for measuring the particle velocity in a given direction. The averaged pressure at the center point is computed as the mean of all six pressure signals. The intensity on each axis is proportional to the product of the particle velocity and the averaged pressure. Figure 8 shows the AVS used during the experiments. The intensity and the angles are computed by the algorithm outlined in Figure 9. The developed algorithm for source localization and tracking has three main sections. The first section performs the correction of the pressure signals obtained from the microphones, which is required for the proper determination of the sound intensity. This correction is realized in two steps. First, the frequency responses of all microphones are equalized, and the phase responses within each microphone pair are also equalized (block denoted as *A&P.Corr* in Figure 9). Next, the particle velocity on each axis (*ux*, *uy*, *uz*) and the average acoustic pressure *p* are calculated. Then the second step of the correction is performed, in which phase differences between the particle velocity and the average acoustic pressure signals are equalized (*P.Corr*). The calibration and correction process is described in detail in the authors' previous publication [21].

**Figure 8.** The acoustical vector sensor (AVS) designed by the authors (**a**), the AVS inside the windscreen (**b**), and the AVS during the measurements (**c**).

**Figure 9.** Block diagram of the developed algorithm for detection, localization, and tracking of moving sound sources.

After these steps, the sound intensity components (*ix*, *iy*, *iz*) can be determined by computing the product of each particle velocity component and the averaged pressure signal and by integrating the result [22, 23]. The second part of the algorithm contains smoothing blocks, labeled as *S*. A noise suppression procedure has to be applied to each intensity signal in order to make vehicle detection possible. In the experiments described in this paper, a Savitzky–Golay filter was used to suppress noise and to obtain smoothed intensity functions [24]. The values of the filter length (51) and the polynomial order (3) were found experimentally, as a good balance between the details and the noise present in the processed signals. At the end of the second part, the three smoothed components *IX*, *IY*, and *IZ* are known.
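The smoothing step can be illustrated with a minimal pure-Python sketch. It relies on the closed-form center weights of a Savitzky–Golay filter for a quadratic or cubic fit, which covers the polynomial order of 3 quoted above. The function names and the mirror-padding of the signal edges are assumptions made for this illustration; the paper does not state which implementation or edge handling was used, and a library routine such as `scipy.signal.savgol_filter(x, 51, 3)` would normally be preferred.

```python
def savgol_cubic_coeffs(window_length):
    """Center-point Savitzky-Golay weights for a quadratic/cubic fit."""
    if window_length % 2 == 0 or window_length < 5:
        raise ValueError("window_length must be odd and >= 5")
    m = window_length // 2
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3 * (3 * m * m + 3 * m - 1 - 5 * k * k) / denom
            for k in range(-m, m + 1)]


def smooth_intensity(signal, window_length=51):
    """Smooth one intensity signal; edges are mirror-padded (an assumption)."""
    signal = list(signal)
    m = window_length // 2
    n = len(signal)
    if n <= m:
        raise ValueError("signal is shorter than half the window")
    coeffs = savgol_cubic_coeffs(window_length)
    # Mirror the first and last m samples so every output has a full window.
    padded = signal[m:0:-1] + signal + signal[-2:-m - 2:-1]
    return [sum(c * padded[i + j] for j, c in enumerate(coeffs))
            for i in range(n)]
```

Away from the edges, the filter leaves any polynomial of degree three or lower unchanged, which is why it preserves the shape of slowly varying intensity peaks while averaging out sample-to-sample noise.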

The last section, labeled as *V.D.* (vehicle detection block), uses the intensity components and the values of the azimuth and elevation angles to decide whether a sound source (a vehicle in the considered scenario) is present.

#### 2.2.2. Intensity Computation

A three-dimensional (3D) AVS applied during our research is able to measure the acoustic particle velocity (referred to as "velocity" further in the text) in three orthogonal directions, as well as pressure in the central point of the sensor. Sound intensity vectors in three orthogonal directions may be obtained based on the velocity and pressure measurement results.

The acoustic pressure has to be measured at six points located on three orthogonal axes, at identical distances *d* from the origin. These points are denoted as *x*1, *x*2, *y*1, *y*2, *z*1, and *z*2, describing their location in the coordinate system, e.g., point *y*2 is located at (0, *d*, 0) and *y*1 at (0, −*d*, 0). Omnidirectional microphones of the same type are used to measure the pressure *pl*(*t*) at the six locations *l*. According to Euler's equation [22], the velocity vectors **u***i*(*t*) along the X, Y, and Z axes may be computed as:

$$
\begin{bmatrix} \mathbf{u}\_x(t) \\ \mathbf{u}\_y(t) \\ \mathbf{u}\_z(t) \end{bmatrix} = \begin{bmatrix} a\_x & 0 & 0 \\ 0 & a\_y & 0 \\ 0 & 0 & a\_z \end{bmatrix} \cdot \begin{bmatrix} p\_{x2}(t) - p\_{x1}(t) \\ p\_{y2}(t) - p\_{y1}(t) \\ p\_{z2}(t) - p\_{z1}(t) \end{bmatrix} \tag{8}
$$

where *ai* are the scaling factors (determined during the calibration procedure). The magnitude of the **u***i*(*t*) vector will be denoted as *ui*(*t*). The pressure *p*(*t*) at the origin can be estimated as the average of the pressures at the two points on a given axis, and this estimate should be equal for all three axes. In practice, the pressure is averaged from all six microphones (as in Equation (9)):

$$p(t) = \frac{p\_{x1}(t) + p\_{x2}(t) + p\_{y1}(t) + p\_{y2}(t) + p\_{z1}(t) + p\_{z2}(t)}{6} \tag{9}$$

The sound intensity *I* along a given axis can then be computed as [22]:

$$I = \frac{1}{T} \int\limits\_{T} p(t)u(t) \,\mathrm{d}t \tag{10}$$

where *T* is the integration period.
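Equations (8)–(10) can be written in discrete form as the following minimal sketch. The function names are hypothetical, the scaling factor `a` stands in for a calibration result that is not given here, and the integral in Equation (10) is approximated by a plain average over the samples of one integration period.

```python
def particle_velocity(p1, p2, a=1.0):
    """Equation (8) for one axis: u(t) = a * (p2(t) - p1(t))."""
    return [a * (s2 - s1) for s1, s2 in zip(p1, p2)]


def mean_pressure(channels):
    """Equation (9): sample-wise mean over the microphone signals."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]


def intensity(p, u):
    """Discrete form of Equation (10): time average of p(t) * u(t)."""
    return sum(ps * us for ps, us in zip(p, u)) / len(p)


# Toy example with two samples per channel (placeholder values, not measurements).
px1, px2 = [1.0, 2.0], [2.0, 4.0]
silent = [0.0, 0.0]
p = mean_pressure([px1, px2, silent, silent, silent, silent])
u_x = particle_velocity(px1, px2)
i_x = intensity(p, u_x)
```

The same three calls are repeated for the Y and Z axes, reusing the single averaged pressure `p` for all three intensity components.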

If a single, omnidirectional sound source is put into the system at polar coordinates (*r*, φ, θ), the angles of the sound received by the AVS may then be computed as:

$$\phi = \arctan\left(\frac{I\_y}{I\_x}\right) \tag{11}$$

$$\theta = \arctan\left(\frac{I\_z}{\sqrt{I\_x^2 + I\_y^2}}\right) \tag{12}$$

where *Ix*, *Iy*, *Iz* are the intensity signals measured along the axes of the coordinate system, being oriented as shown in Figure 10.
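Equations (11) and (12) translate directly into code. The sketch below is illustrative only: the function name is hypothetical, and it uses `math.atan2` instead of a plain arctangent, which is a deliberate substitution. The two agree when *Ix* > 0, but `atan2` also resolves the quadrant when *Ix* is negative and avoids a division by zero when *Ix* = 0.

```python
import math


def direction_of_arrival(i_x, i_y, i_z):
    """Return (azimuth, elevation) in degrees from three intensity components."""
    azimuth = math.atan2(i_y, i_x)                      # Equation (11)
    elevation = math.atan2(i_z, math.hypot(i_x, i_y))   # Equation (12)
    return math.degrees(azimuth), math.degrees(elevation)
```

For example, equal intensities on the X and Y axes with no vertical component give an azimuth of 45° and an elevation of 0°.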

In general, the sound intensity is determined by the algorithm described above in the time domain, using broadband signals of acoustic pressure and particle velocity. For the purposes of this article, it is important to calculate the sound intensity in the frequency range related to the acoustic events produced by vehicles moving near the sensor. A frequency analysis of the background noise and of pass-by vehicle sounds was performed in order to exclude unwanted sounds emitted by other sources during vehicle movement. Figure 11 shows the results of this analysis. The dotted line indicates the background noise, and the solid line depicts the spectrum of a pass-by vehicle event. It can be noticed that for frequencies greater than 6 kHz, the background noise is close to the noise emitted by the vehicle. This high-frequency background noise was generated by grasshoppers. Low-frequency noise can be produced by the wind and by other sound sources located far away from the measurement point. For this reason, the sound intensity analysis was limited to the frequency range of 400 Hz–4 kHz, which is marked in Figure 11 with two vertical dotted lines. In this way, the essential part of the acoustic energy emitted by the moving vehicle was taken into consideration during the calculation of sound intensity and direction of arrival.

**Figure 11.** Acoustic energy distribution in the frequency domain for background noise and for the pass-by vehicle.

Figure 12 shows an example of the output of the algorithm described above for an acoustic event evoked by a single pass-by vehicle. The left chart depicts the sound intensity components. The highest values were obtained for the direction perpendicular to the road (*Iy*); the other components have relatively lower values. This observation is essential for the development of the vehicle detection module. The right chart shows the direction of arrival, expressed by the azimuth and elevation angles. Figure 13 shows an example of 120 s of continuous sound intensity analysis. The acoustic events evoked by the vehicles are clearly visible. An event typical for a group of vehicles occurred around 80 s; rapid changes of the azimuth angle can be noticed for this event. It is important to emphasize that no acoustic events other than passing vehicles can be observed, which confirms that the frequency range of the sound intensity calculation was correctly selected.

**Figure 12.** The results obtained from the AVS signal recorded for a single vehicle: (**a**) intensity components, (**b**) azimuth and elevation.

**Figure 13.** The results obtained from the AVS signal recorded for several vehicles: (**a**) intensity components, (**b**) azimuth and elevation.

#### 2.2.3. Vehicle Detection

The algorithm works in two stages. The first stage is based on the analysis of sound intensity signals and detects acoustic events. The second stage analyzes a detection function based on the normalized source position; its task is to determine whether the acoustic event represents a vehicle passing the sensor and to detect its direction of movement. The detected and verified acoustic events may be analyzed further, e.g., for velocity estimation.

In the first stage, the intensity signal is analyzed with a sliding window **w** of length *N*, moving with a step equal to one sample. In the experiments, the window span was about 640 ms. It was found that using the intensity signal that was measured on the axis perpendicular to the road (*IY*) provides better results than the total intensity. An acoustic event is detected if:

$$\mathbf{w}\_{N/2} = \max\_{i} \mathbf{w}\_{i} \text{ and } \frac{1}{N} \sum\_{i=1}^{N} I\_{Y(i)} \ge T\_{\text{int}} \tag{13}$$

where *Tint* is the minimum average intensity within the window and *i* is the sample index within the window. The value of *Tint* must be set according to the amplitude of the analyzed signal, so that both low-energy and short-term (impulse) acoustic events are discarded.
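The two conditions of Equation (13) can be sketched as a straightforward sliding-window scan. The function name is hypothetical, and the window length in samples is left as a parameter because it depends on the sampling rate of the intensity signal. This direct version recomputes the maximum and the mean for every window position, so it illustrates the logic rather than an efficient implementation.

```python
def detect_events(i_y, window_len, t_int):
    """Return the sample indices at which Equation (13) is satisfied."""
    half = window_len // 2
    events = []
    for start in range(len(i_y) - window_len + 1):
        window = i_y[start:start + window_len]
        is_peak = window[half] == max(window)          # center sample is the maximum
        is_loud = sum(window) / window_len >= t_int    # mean intensity above threshold
        if is_peak and is_loud:
            events.append(start + half)
    return events


# Toy signal: one strong peak at index 4 and one weak blip at index 10.
signal = [0, 0, 1, 3, 6, 3, 1, 0, 0, 0, 0.5, 0, 0]
strong_only = detect_events(signal, window_len=5, t_int=1.0)
```

With the threshold set to 1.0, only the strong peak is reported; lowering the threshold to 0.05 also admits the weak blip, which shows how *Tint* controls the rejection of low-energy events. A flat maximum (several equal samples) would produce one detection per sample, so a practical version would need to merge neighboring detections.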

The second stage of the algorithm analyzes the changes in the normalized position of the sound source. Position *x* of the source moving along the trajectory parallel to the X-axis of the system, at a normalized distance of *y* = 1 m, is equal to:

$$\tan(\phi) = \frac{I\_Y}{I\_X} \tag{14}$$

where φ is the azimuth of the source.

If a vehicle moves along this trajectory with an approximately constant velocity, then smooth changes in *x* will be observed, and the direction of these changes will indicate the direction of the object's movement. For acoustic events that are not related to moving vehicles, much larger changes in the source position can be expected. Therefore, the detection metric *d*, as computed within the same window as in the first stage, is:

$$d = \frac{2}{N} \sum\_{i=1}^{N/2} \left( \frac{I\_{Y(i+1)}}{I\_{X(i+1)}} - \frac{I\_{Y(i)}}{I\_{X(i)}} \right) \tag{15}$$

Only the first half of the window is used. In the case of isolated vehicles, the second half is redundant; if several vehicles move close to each other, their measured intensities overlap, which usually causes more distortion in the second part of the window. The sign of *d* indicates the direction of movement: vehicles moving towards positive *x* values have *d* < 0, and vehicles moving in the opposite direction have *d* > 0. Additionally, the standard deviation within the window might be computed, similarly to *d*. It is expected that the standard deviation will be small for vehicles moving past the sensor and high for unrelated acoustic events, which might be discarded with a maximum standard deviation threshold.
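A minimal sketch of the metric in Equation (15) is given below. The function name is hypothetical. The text does not specify which quantity the standard deviation is taken over, so the sketch assumes the sample-to-sample differences of the ratio *IY*/*IX* in the first half of the window. Samples with *IX* = 0 are not handled and would raise a division error.

```python
import statistics


def detection_metric(i_x, i_y):
    """Return (d, spread) for one window of smoothed intensity samples."""
    n = len(i_x)
    half = n // 2
    ratio = [y / x for x, y in zip(i_x, i_y)]                 # I_Y / I_X per sample
    diffs = [ratio[i + 1] - ratio[i] for i in range(half)]    # first half only
    d = 2.0 / n * sum(diffs)                                  # Equation (15)
    spread = statistics.pstdev(diffs)                         # assumed definition
    return d, spread


# Toy window: the ratio falls steadily by 1 per sample.
d, spread = detection_metric([1.0] * 8, [4.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0])
```

Because the sum of consecutive differences telescopes, *d* reduces to 2/*N* times the difference between the ratio at sample *N*/2 + 1 and at the first sample. The metric therefore captures only the net change over the half-window, which is one reason a separate spread measure is useful for rejecting erratic events.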

Figure 14 presents an example of detection. The maxima of the intensity function are detected as acoustic events (Figure 14a). The detection function (normalized *x* position) changes smoothly within the acoustic events and oscillates randomly when no events are present (Figure 14b). The value of *d* computed for each event indicates the direction of a vehicle's movement; this is marked with bars pointing upwards or downwards for *d* < 0 and *d* > 0, respectively.

Detections from the reference data are marked with dots, with the direction being indicated in the same way. It can be observed that most of the vehicles were correctly detected and identified. For isolated vehicles (frame 28429), the detection function changes smoothly within the whole detection window, and the analysis is straightforward. When multiple vehicles are moving close to each other, their detection functions overlap. In some events (e.g., frames 31804 & 31898), the results are correct. In the case of a heavy occlusion (multiple vehicles on both lanes), the probability of errors increases. In the presented example, the vehicle in frame 29835 is detected, but its direction is incorrect, due to the overlap of detection functions from many vehicles, and one vehicle (frame 29742) was missed, as two intensity maxima from two vehicles merged into a single one.

**Figure 14.** Example of the detection results: (**a**) sound intensity (line) and detected acoustic events (+); (**b**) detection function (line), detected vehicles (vertical bars, the direction of bars indicates the direction of movement), reference data (dots, vertical position indicates the direction of movement).

#### **3. Results**

#### *3.1. Test Setup*

A test setup was constructed and experiments were performed in a real-world scenario in order to verify the proposed algorithms. A low-cost RSM2650 Doppler sensor by B+B Sensors [25] was used. The sensor was connected through a custom-built amplifier and an analog-to-digital converter (48 kHz sampling rate) to a Raspberry Pi 3B microcomputer. All of the elements, together with an LTE router and a power supply, were placed in an enclosure (Figure 15). In the first stage of the research, signals from the sensor were recorded on the microcomputer and then downloaded for offline analysis. In the experiments described here, the analysis of the Doppler sensor signal was performed online on the microcomputer, and the results (timestamp, velocity, and standard deviation) were available via the MQTT network protocol. The signal was analyzed in windows of 2048 samples (42.67 ms) with 75% overlap, and a Blackman window was applied before the FFT was computed. The processing algorithm was implemented in the Python programming language.
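The framing parameters quoted above can be checked with a short sketch. The helper names are hypothetical, and the window uses the classic symmetric Blackman coefficients (0.42, 0.5, 0.08); the text does not say which variant of the window was used, so that choice is an assumption.

```python
import math

SAMPLE_RATE = 48_000   # Hz, from the text
FRAME_LEN = 2048       # samples, from the text
OVERLAP = 0.75         # fraction, from the text


def blackman(n_points):
    """Symmetric Blackman window with the classic coefficients."""
    return [0.42
            - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            + 0.08 * math.cos(4 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def frame_starts(num_samples, frame_len=FRAME_LEN, overlap=OVERLAP):
    """Start indices of all complete frames in a signal of num_samples."""
    hop = int(frame_len * (1 - overlap))
    return list(range(0, num_samples - frame_len + 1, hop))


frame_ms = 1000 * FRAME_LEN / SAMPLE_RATE   # duration of one analysis window
bin_hz = SAMPLE_RATE / FRAME_LEN            # spacing of the FFT bins
starts = frame_starts(SAMPLE_RATE)          # frames in one second of signal
```

With 75% overlap, consecutive frames start 512 samples apart, so a new spectrum is available roughly every 10.7 ms, while each FFT bin spans about 23.4 Hz of Doppler shift.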

The AVS was constructed from six omnidirectional digital MEMS microphones, InvenSense INMP441 [26], operating at a 48 kHz sampling rate with 24-bit resolution. Each microphone was mounted on a board of ca. 10 × 10 mm size that was connected through an I2S USB digital interface to a USB port on a computer (Figure 8). The sensor was mounted in a windshield at the bottom side of the enclosure. The six-channel signal was recorded on the microcomputer and then stored on a hard drive. The recordings were analyzed offline. Additionally, environmental sensors (temperature, pressure, precipitation, air quality), as well as a LiDAR sensor and a video camera, were mounted on the enclosure; these were not used in the experiments described here.

The test system was mounted on the outskirts of Gdańsk, Poland (near Leźno village), geographic coordinates: 54.344555, 18.443811. The monitored road section had one lane in each direction and the speed limit was 90 km/h. The measurements were performed on a straight and flat section of the road (Figure 16), where the typical speed of vehicles is 60 to 80 km/h. The enclosure was mounted 4 m away from the road edge and the bottom side of the box was 2.9 m above the ground. The Doppler sensor was positioned at 3.2 m above the ground, oriented at 45° azimuth and −18° elevation relative to the road axis. The algorithm only analyzed the closer lane (eastbound traffic).

**Figure 15.** The test system mounted on the site. The Doppler sensor is located inside the box marked with the rectangle, the AVS in a windshield is visible at the bottom right, below the enclosure.

**Figure 16.** The test road section, a view from the camera mounted in the test system. The first pair of measurement tubes is mounted near the trees visible in the back; the second pair is positioned near the bottom right corner of the photo.

A system based on pneumatic tubes (Metrocount MC5600 Vehicle Counter System) was mounted on the road in order to obtain reference data for the experiments. Two pairs of tubes were used. One pair was positioned near the test system, for comparison with the AVS results. The other pair of tubes was mounted ca. 100 m away, within the detection zone of the Doppler sensor. Recordings that spanned a continuous period of 24 h (from 14:00 on 1 July to 2 July 2019) were obtained, and the timestamps and velocities measured for each vehicle (from both lanes) were used for comparison with the results from the Doppler sensor and the AVS. The temperature during the recording was 15 °C to 26 °C, the average pressure was 997 hPa, the wind was up to 7 m/s from the west, and there were occasional periods of rainfall (about 15% of the total time).

#### *3.2. Analysis of Vehicle Counting*

The detection results were aggregated in 30-min. periods in order to analyze the results of vehicle counting on the lane closer to the test setup. Data from the tubes were used as the reference, and it is assumed that there are no detection errors in the recorded data (this was partially confirmed by reviewing selected sections of the recorded video). Table 1 presents the results that were obtained for both sensors within the measurement period. For the AVS, a total of 30 min. was missing from the recorded material due to technical difficulties. Figure 17 shows the vehicle count aggregated in 30-min. intervals for the data from the Doppler sensor, analyzed by the proposed algorithm, and for the data from the reference device. Figure 18 presents a similar plot calculated for the AVS. The Pearson correlation coefficient between the calculated and the reference vehicle counts is 0.994 for the Doppler sensor and 0.995 for the AVS.
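The aggregation and the correlation measure can be sketched as follows. The function names are hypothetical, and the timestamps are assumed to be expressed in seconds from the start of the recording; the values in the example are placeholders, not measured data.

```python
import math
from collections import Counter


def counts_per_interval(timestamps_s, num_intervals, interval_s=1800):
    """Count detections in consecutive intervals (1800 s = 30 min)."""
    bins = Counter(int(t // interval_s) for t in timestamps_s)
    return [bins.get(k, 0) for k in range(num_intervals)]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Placeholder detection times (seconds) binned into four 30-min intervals.
counts = counts_per_interval([0, 10, 1799, 1800, 5400], num_intervals=4)
```

A correlation close to 1 shows that the detected counts follow the same trend as the reference counts over the day; it does not by itself show that the absolute counts agree, which is why the detection statistics are reported separately.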

**Table 1.** Summary of the vehicle detection results.


**Figure 17.** Vehicle count in 30-min. intervals—measured with the Doppler sensor and the proposed algorithm (solid line), and the reference data from the tube detector (dashed line).

**Figure 18.** Vehicle count in 30-min. intervals—measured with the AVS and the proposed algorithm (solid line), and the reference data from the tube detector (dashed line).

The observed false-positive results were mainly caused by 'ghost tracks', which occurred when the algorithm followed noisy components of the signal and produced duplicated detections. To conclude, improving the accuracy requires modifying the algorithm to make it more robust to the observed errors, tuning its parameters and, if possible, repositioning the sensor.

The results of vehicle counting that were obtained with the AVS and the proposed algorithm (Figure 18) were consistent with the results from the Doppler sensor. The number of false-positive and false-negative results is slightly larger for the AVS, and the overall accuracy is lower (82% vs. 90%). The main source of incorrect detection results is the difficulty of determining the direction of movement in the case of occlusion (vehicles on both lanes, moving in opposite directions) and when several vehicles move close to each other. In such cases, the detection function does not allow for accurate direction analysis, because the signals from multiple vehicles overlap. As a result, some vehicles were detected, but their direction was incorrect, as shown in Table 2. However, this problem occurs on both lanes. In the presented experiment, the number of vehicles on each lane was similar, so the false-positive and false-negative results on each lane were mostly balanced. Some vehicles could not be detected at all, which was mostly due to high occlusion. As shown in Table 2, the number of vehicles in the closer lane that were not detected is almost equal to the number of vehicles detected with an incorrect direction. The number of vehicles that were not detected is similar on both lanes, while the incorrect detection of the direction happens more often on the closer lane. The number of false detections is higher in the further lane, which results from the occlusion. In total, the statistics that were obtained in 30-min. intervals were very similar to those from the Doppler sensor; they also slightly underestimate the real vehicle count in the case of high traffic. The trend is also highly correlated with the reference data.



#### *3.3. Analysis of Velocity Measurement Using Doppler Sensor*

The velocity measured by the proposed algorithm from the Doppler sensor signals cannot be compared accurately with the velocity measured by the reference device, because the tube-based system measured the velocity at one point, about 100 m from the sensor, while the Doppler sensor measured the velocity at different points within a zone approximately 50 to 100 m from the sensor. Therefore, any observed differences may be caused both by measurement errors and by vehicles that accelerate or brake within the detection zone. The mean squared difference (MSD) between the measurements of individual vehicles by both sensors is 19.45, with a standard deviation equal to 176.47, and the root of the MSD (RMSD) is 4.41 ± 13.28 km/h. Taking the condition mentioned above into account, the accuracy of measuring the velocity of individual vehicles with the proposed algorithm is satisfactory.
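The two error measures can be sketched as below. The function name is hypothetical, the velocities in the example are placeholders rather than measured data, and the pairing of each Doppler measurement with the matching reference measurement is assumed to have been done beforehand.

```python
import math


def msd_rmsd(measured, reference):
    """Mean squared difference and its square root for paired velocities."""
    squared = [(m - r) ** 2 for m, r in zip(measured, reference)]
    msd = sum(squared) / len(squared)
    return msd, math.sqrt(msd)


# Placeholder velocities in km/h for two vehicles.
msd, rmsd = msd_rmsd([60.0, 70.0], [63.0, 66.0])

# Consistency check on the figures quoted in the text.
reported_rmsd = math.sqrt(19.45)      # approximately 4.41
reported_spread = math.sqrt(176.47)   # approximately 13.28
```

The reported RMSD and its ± value equal the square roots of the reported MSD and of its standard deviation, respectively, which indicates how the quoted figures relate to each other.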

Figure 19 shows the results of averaging the velocity in 30-min. intervals for both data sources. It can be observed that the results that were obtained from the evaluated algorithm are slightly lower than for the reference device (MSD 1.93 ± 1.68, RMSD 1.39 km/h). Both datasets follow a similar trend and the Pearson correlation coefficient is 0.92. This confirms that the evaluated algorithm provides velocity measurements with accuracy that is sufficient for collecting traffic statistics.

**Figure 19.** Vehicle velocity in 30-min. intervals—measured with the Doppler sensor and the proposed algorithm (solid line) and reference data from the tube detector (dashed line, shifted on the time axis for improved readability). Points represent mean velocity, error bars—standard deviation.
