**1. Introduction**

As the future of road transportation is being shaped around the idea of autonomous mobility, new methods of data acquisition and processing are being developed. Especially in the field of driving assistance systems, much current research addresses detecting and localizing vehicles with many different sensors such as radar, LiDAR, computer vision, and acoustics. On the other hand, infrastructure-based Intelligent Transportation Systems (ITS) still need further development to exploit the full potential of the already existing sensors. This especially means increasing the level of detail reached with current ITS techniques, which often only deliver aggregated traffic data consisting of datasets with a time resolution of minutes, covering traffic parameters such as flow (vehicles/hour), density (vehicles/km), and speed (km/hour). Such aggregated data are a limiting factor for any comprehensive analysis of the interactions between individual vehicles. For such detailed analysis, traffic data are required on a microscopic scale, which includes the quasi-continuous trajectory of every vehicle on the road.
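To make the coarseness of such aggregated data concrete: the three macroscopic quantities are coupled by the fundamental relation of traffic flow, flow = density × mean speed, so a whole minute of traffic collapses into a single triple of numbers. A minimal sketch (the function name and units are ours, purely illustrative):

```python
def macroscopic_speed(flow_veh_per_h: float, density_veh_per_km: float) -> float:
    """Mean speed (km/h) from the fundamental relation q = k * v."""
    if density_veh_per_km <= 0:
        raise ValueError("density must be positive")
    return flow_veh_per_h / density_veh_per_km

# 1800 vehicles/hour at a density of 20 vehicles/km -> mean speed of 90 km/h
print(macroscopic_speed(1800, 20))
```

Note that any two vehicles in that interval may differ strongly in speed and spacing; the aggregate hides exactly the interactions that microscopic trajectories reveal.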

The analysis of vehicle-to-vehicle interactions is indispensable for detailed traffic safety analysis that also covers risky situations between vehicles, going beyond merely counting traffic accidents [1]. Many of the surrogate safety indicators rely on such a level of detail [2]. In [3], it was shown that more safety-critical interactions happen when traffic is fluent and vehicle speeds are still at least moderate. In this case, high speed differences lead to more critical time-to-collision values and higher deceleration rates required to avoid a crash. While analyzing non-fluent traffic can be interesting for calibrating microsimulation models, the lower speeds are less critical from the safety perspective. Beyond safety considerations, efficiency analysis and the calibration of traffic flow models can greatly benefit from microscopic traffic data [4]. While these applications rely on the offline processing of the raw data, the work in [5] showed how microscopic traffic models can be integrated into implementing methods for collaborative self-driving vehicles. Similarly, the work in [6] showed how such data can be used to model unmanaged intersections analytically. All these model-based applications could lead to real-time solutions for Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) collaboration to support self-driving vehicles in the future.
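As an illustration of such a surrogate safety indicator, the time-to-collision (TTC) of a car-following pair follows from the gap and the speed difference under a constant-speed assumption: TTC = gap / (v_follower − v_leader), defined only when the follower is faster. A minimal sketch (names and units are our assumptions, not taken from the cited works):

```python
def time_to_collision(gap_m: float, v_follower_ms: float, v_leader_ms: float) -> float:
    """TTC in seconds, assuming both vehicles keep their current speeds."""
    dv = v_follower_ms - v_leader_ms
    if dv <= 0:
        return float("inf")  # no collision course under constant speeds
    return gap_m / dv

# Follower at 30 m/s closing on a leader at 20 m/s with a 50 m gap -> TTC of 5 s
print(time_to_collision(50.0, 30.0, 20.0))
```

Computing such indicators for every vehicle pair requires exactly the per-vehicle trajectories that aggregated cross-section data cannot provide.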

#### *1.1. State-of-the-Art Data Acquisition*

Several alternative techniques exist today that are capable of acquiring real microscopic traffic data. One of the most straightforward methods is equipping a large number of vehicles with high-precision Global Navigation Satellite System (GNSS) sensors and recording data from each vehicle [7]. Such methods can also be enhanced by inertial systems [8] to increase positioning accuracy. Other in-car methods can also be used, such as computer vision systems [9] and the fusion of different sensor technologies [10]. While delivering high accuracy and large road coverage, all these techniques share the disadvantage of low traffic coverage. This means that specifically looking at a critical road section of interest and analyzing the dynamics of most vehicles passing through the section is practically impossible. On the other hand, analyzing the dynamics of vehicles specifically equipped with sophisticated sensor systems could yield biased results, as the behavior of the drivers could be far from natural due to the equipment installed in the vehicles.

A better approach to recording traffic data with a large coverage is the use of infrastructure-based sensors. A very common technique is based on inductive loops [11,12]. They have great reliability and real-time capabilities and are thus widely used for both highway and urban traffic management. Similar to loop detectors, radar sensors work well in various weather and illumination conditions, enabling the detection of individual vehicles and the measurement of their speeds [13,14]. Other types of static sensors like laser [15], LiDAR [16], and acoustic sensors [17,18] are also capable of detecting vehicles and their respective speeds. Many of the existing commercial products are very easy to use, as they do not need to be installed at great heights. They can be deployed on the ground at the road side, without the necessity of lane closures. This also enables them to be supplied by a battery without the necessity of complicated cabling. The key common drawback of these sensors, though, is that they gather traffic data at cross-sections, by counting vehicles and measuring their speeds [19]. Thus, the standard use case of these sensors does not enable recording microscopic traffic data over an entire area.

While there are a few studies showing how radar- [20] and laser-based [21] sensors can deliver microscopic traffic data, the most common technique by far is computer vision [22–24]. With the appropriate technique of back-projection from pixel coordinates to real-world coordinates, computer vision-based techniques are a cost-effective, non-intrusive, and flexible way of recording microscopic traffic data. Even so, in many cases, the covered area is limited by occlusion, while a dense deployment of cameras is also limited by the difficulty of road-side installation. More specifically, the deployment of surveillance cameras for automatic data acquisition needs thorough planning because they typically need to be mounted at a height where side poles or bridges are required. Installing the cameras at the required height often necessitates trailer-mounted work platforms and lane closures. Furthermore, the consultation of road administrations, operators of the poles, and sometimes even public transport organizations is necessary. Many cameras additionally require power cabling, with the respective effort and cost, or the installation of batteries on the poles, which can be safety critical. Thus, reducing the number of required cameras by filling gaps in interim sections is of great interest.
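The back-projection mentioned above is commonly realized with a planar homography that maps image pixels to road-plane coordinates. A minimal sketch with NumPy, using a hand-made homography for illustration (in practice, H would be estimated from at least four pixel-to-ground correspondences, e.g. surveyed lane markings):

```python
import numpy as np

def back_project(H: np.ndarray, u: float, v: float) -> tuple:
    """Map a pixel (u, v) to planar world coordinates via the homography H."""
    p = H @ np.array([u, v, 1.0])  # homogeneous coordinates
    return (p[0] / p[2], p[1] / p[2])

# Illustrative homography: uniform scale of 0.05 m/pixel plus a shift.
# A real camera view would also involve perspective terms in the last row.
H = np.array([[0.05, 0.00, -10.0],
              [0.00, 0.05,  -5.0],
              [0.00, 0.00,   1.0]])
print(back_project(H, 400, 300))  # -> (10.0, 10.0) in road-plane meters
```

The accuracy of this mapping degrades with distance from the camera, which is one reason why occlusion and viewing geometry limit the usable area per camera.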

An additional difficulty in using already deployed CCTV cameras for computer vision methods is measuring the exact distance between sensors or accurately synchronizing the video streams. In our research project "Highly automated tunnel surveillance for catastrophe management and regular operation" (AUTUKAR), concerning automatic video analysis in tunnel surveillance systems, we can only rely on the existing planning documents of the tunnel to determine distances between cameras, which often do not have the desired accuracy. On the other hand, when using real-time video analysis based on video streams, network bandwidth limitations mean that very accurate time synchronization between the streams cannot be guaranteed.

Finally, the use of cameras is also often restricted by data privacy regulations, a particularly delicate subject in European countries, which can make the application of these sensors impossible. Sensors that are fully compliant with privacy laws therefore offer a decisive advantage in these markets.
