5.1. Headway Threshold Criterion for the Detection of Free-Moving Vehicles
In Section 3, we observed that, under the exponential model, the arrivals of unconditioned vehicles can be described as a Poisson process, so that their inter-arrival times follow a negative exponential distribution. When analyzing the experimental data described in Section 5.1, we can use the threshold headway criterion to identify free-flowing vehicles. This involves examining the inter-arrival times of the sampled vehicles in each monitoring section and identifying the lowest value above which the sample distribution can be approximated by an exponential distribution. This value is taken as the threshold for the absence of conditioning: the sample distribution of the inter-arrival times higher than the threshold is thus considered consistent with the distribution of random arrivals.
From a statistical standpoint, the threshold time value represents the minimum headway for which the hypothesis of absence of conditioning with respect to the preceding vehicle, or equivalently of randomness of arrival, cannot be rejected. In general, if f(τ) is an exponential probability density function and F(τ) is the related cumulative distribution function, the logarithmic transformation L(τ) = ln[1 − F(τ)] of the complement to one of F(τ) has a linear trend. Thus, to identify the critical headway, we can analyze the logarithmic transformation of the complement to one of the experimental headway distribution function $\hat{F}(\tau)$, i.e., $\hat{L}(\tau) = \ln[1 - \hat{F}(\tau)]$, and we can look for the minimum time interval value τ* above which $\hat{L}(\tau)$ can be approximated by a straight line.
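The linearity property can be checked directly. The short derivation below is only a sketch: the rate parameter λ and the survival notation S(τ) = 1 − F(τ) are symbols introduced here for illustration, and the natural logarithm is assumed.

```latex
\begin{aligned}
F(\tau) &= 1 - e^{-\lambda \tau}, \qquad \tau \ge 0,\ \lambda > 0 \\
L(\tau) &= \ln\left[1 - F(\tau)\right] = \ln e^{-\lambda \tau} = -\lambda \tau \\
\text{if only headways } \tau \ge \tau^{*} \text{ are exponential:}\quad
S(\tau) &= S(\tau^{*})\, e^{-\lambda (\tau - \tau^{*})}
\;\Rightarrow\;
L(\tau) = \ln S(\tau^{*}) - \lambda\,(\tau - \tau^{*})
\end{aligned}
```

In both cases L(τ) is a straight line with slope −λ, which is why departures from linearity at short headways can be read as a sign of conditioning.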
An explanatory example of how this can be carried out is provided by the traffic data collected according to the methods discussed in Section 5.1, which relate to the two monitoring sections presented in Section 5.2. Only values less than 300 s were considered for the two monitoring sections, under the assumption that arrivals with headways greater than this value were entirely random and not part of the same traffic stream. By grouping headway values less than 5 min into bins of 1 s (so that the 0 s bin includes headways ≤ 0.5 s, the 1 s bin includes headways between 0.5 and 1.5 s, etc.), we can evaluate the experimental probability density function $\hat{f}(\tau)$, the experimental probability distribution function $\hat{F}(\tau)$, and the logarithmic transformation of its complement to one, $\hat{L}(\tau) = \ln[1 - \hat{F}(\tau)]$.
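The binning step described above can be sketched in a few lines. This is a minimal illustration rather than the procedure actually used on the data: the function name `empirical_log_survival`, the input format (a plain list of headways in seconds), and the use of the natural logarithm are assumptions introduced here.

```python
import math
from collections import Counter


def empirical_log_survival(headways, max_headway=300.0):
    """Bin headways (s) into 1 s classes and return, per bin, the tuple
    (relative frequency, cumulative frequency, ln(1 - cumulative frequency)).

    Bin k collects headways in (k - 0.5, k + 0.5]; bin 0 collects headways <= 0.5 s.
    """
    kept = [h for h in headways if h < max_headway]
    n = len(kept)
    counts = Counter(max(0, math.ceil(h - 0.5)) for h in kept)

    result = {}
    cumulative = 0
    for k in sorted(counts):
        cumulative += counts[k]
        f_hat = counts[k] / n
        F_hat = cumulative / n
        # The logarithm is undefined in the last occupied bin, where 1 - F_hat = 0.
        L_hat = math.log(1.0 - F_hat) if cumulative < n else None
        result[k] = (f_hat, F_hat, L_hat)
    return result
```

The returned dictionary maps each occupied bin to the three experimental quantities, so the log-transformed values can be plotted or regressed against the bin index.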
For each of the two considered monitoring sections, i.e., 218 and 1333, and for each direction of travel, after grouping the headways less than 300 s into bins of 1 s, we progressively tested every possible value of the critical headway $\tau_c$ between 0 and 9 s. For every $\tau_c$, $\hat{L}(\tau)$ was computed for $\tau_c$ ≤ τ ≤ 300 s. We then analyzed the trend of $\hat{L}(\tau)$ and estimated the regression line of the points {X = τ; Y = $\hat{L}(\tau)$}. Figure 5 shows that, as $\tau_c$ increases, the possibility of approximating $\hat{L}(\tau)$ with a straight line significantly improves, with a Goodness of Fit (GoF) that can initially be assessed only graphically.
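The regression step for a single candidate critical headway can be illustrated as follows. This is a sketch under stated assumptions: ordinary least squares is assumed (the fitting method is not specified in the text), and the function name `fit_tail_line` and its arguments are hypothetical.

```python
def fit_tail_line(taus, log_survival, tau_c):
    """Least-squares line through the points (tau, L_hat(tau)) with tau >= tau_c.

    Returns (slope, intercept, SSE, R2). Points whose log-survival value is
    None (undefined logarithm) are skipped.
    """
    points = [(t, y) for t, y in zip(taus, log_survival)
              if t >= tau_c and y is not None]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in points)

    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in points)
    sst = sum((y - mean_y) ** 2 for _, y in points)
    r2 = 1.0 - sse / sst
    return slope, intercept, sse, r2
```

Calling the function for each candidate between 0 and 9 s gives one (SSE, R²) pair per candidate, which is the kind of trend summarized graphically in Figure 5.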
However, a graphical assessment of linearity as in Figure 5 does not allow us to identify the threshold value τ* as we intended. Measures of GoF, such as SSE (Sum of Squared Residuals) and R² (Coefficient of Determination), were therefore evaluated by comparing the experimental trend of $\hat{L}(\tau)$, for headways τ not lower than the candidate critical headway $\tau_c$, with that of the regression line. Additionally, the regression line obtained for each $\tau_c$ was employed to estimate the theoretical frequencies of each headway class for 1 s bins between 0 and 30 s. By considering the deviations between the sample frequencies and the theoretical frequencies obtained with the regression, MAPE (Mean Absolute Percentage Error) and MXAPE (Maximum Absolute Percentage Error) were also evaluated. The values for both sections, as the candidate threshold varies, are reported in Table 4.
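The two percentage-error measures can be sketched as below. Two points are assumptions made here and not stated in the text: the class frequency implied by the regression line is taken as the difference of the fitted survival function at the class boundaries, and the percentage errors are computed relative to the observed frequencies. Function names are hypothetical.

```python
import math


def class_frequency_from_line(intercept, slope, k):
    """Relative frequency of the 1 s class centred on k implied by the fitted
    line L(tau) = intercept + slope * tau, i.e. S(k - 0.5) - S(k + 0.5)
    with S(tau) = exp(L(tau))."""
    def survival(tau):
        return math.exp(intercept + slope * tau)
    return survival(k - 0.5) - survival(k + 0.5)


def mape_and_mxape(observed, theoretical):
    """Return (MAPE, MXAPE) in percent, skipping classes with zero observed
    frequency (the percentage error is undefined there)."""
    apes = [abs(o - t) / o * 100.0
            for o, t in zip(observed, theoretical) if o > 0]
    return sum(apes) / len(apes), max(apes)
```

MAPE summarizes the average deviation across classes, while MXAPE flags the single worst-fitting class, which is useful because poor fit tends to concentrate at the shortest headways.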
These trends alone, however, do not yet allow us to identify which candidate critical headway $\tau_c$ can be assumed as the threshold value. It is necessary to develop a specific criterion that can be used to select the minimum value of $\tau_c$ that makes the linear approximation of $\hat{L}(\tau)$ acceptable, and thus to estimate the $\tau_c$ that suitably represents the threshold value between free-flow and congested traffic. A criterion based on a statistical GoF test between probability distributions was introduced to address this question. The Kolmogorov–Smirnov (KS) test, commonly found in the literature [65], was used to compare a sample distribution with a reference distribution (in this case, exponential). The KS test statistic $D_n$, in Equation (5):
$$D_n = \sup_{\tau} \left| \hat{F}(\tau) - F(\tau) \right| \qquad (5)$$
is obtained by comparing the experimental probability distribution function, i.e., each sample distribution $\hat{F}(\tau)$ with $\tau \ge \tau_c$, and the theoretical probability distribution function with which the comparison is made, F(τ), at each point τ of the related sample. The null hypothesis $H_0$ is that $\hat{F}(\tau) = F(\tau)$. The null hypothesis is rejected when the maximum difference between the theoretical and observed distributions, $D_n$, is large, and it is not rejected when $D_n$ is small. The critical value of $D_n$ (which we can denote as KSC) is defined for a certain sample size N and level of significance α, i.e., $KSC = D_{N,\alpha}$.
If $D_n < KSC$, the null hypothesis $H_0$, which states that the data come from the hypothesized distribution model with probability distribution function F(τ), is not rejected. Usually, a significance level of 5% (α = 0.05) is adopted as the reference value. In general, GoF tests such as the KS test provide information about the fit between the sample and the theoretical model based on p-values and critical values. In the case of the KS test, lower values of the statistic, particularly below the critical value, indicate agreement between the sample and the theoretical model. While the meaning of the $D_n$ statistic is intuitively evident, calculating its probability distribution (under the null hypothesis $H_0$) is more complicated. It can be shown that, under $H_0$, the probability distribution of the test statistic $D_n$ does not depend on the functional form of F(τ); the critical values can therefore be obtained from specific statistical tables for small samples or calculated with simple formulas for large samples, as a function of the significance level α. In general, the large-sample formula for KSC is:
$$KSC = \frac{1}{\sqrt{N}} \sqrt{-\frac{1}{2} \ln\left(\frac{\alpha}{2}\right)}$$
and if we consider the typical value of α = 0.05, then $KSC \approx 1.36/\sqrt{N}$. However, this type of test is extremely sensitive to the sample size. As the sample size increases, the p-values decrease dramatically, even in cases where the good agreement of the data with the candidate model is evident [66]. As for KSC, it decreases rapidly as N increases. Considering the available sample sizes (in the order of several hundred thousand for each section), the KS test would lead to rejecting the exponential distribution for all critical headway values, even sufficiently large ones, because of this intrinsic limitation. Despite the clear linear trend of $\hat{L}(\tau)$ for $\tau \ge \tau_c$ as the candidate critical headway $\tau_c$ increases, due to the sample size N and the reduction in the critical values KSC, the KS test (as well as other similar GoF tests) provides extremely low p-values (and $D_n$ values much larger than the excessively low critical values). This would always make the null hypothesis $H_0$ of agreement with any theoretical model unacceptable [67,68,69,70].
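The dependence of the critical value on the sample size can be made concrete with a short calculation. The sketch below uses the standard large-sample approximation of the KS critical value; the function name `ks_critical_value` and the illustrative sample sizes are assumptions, chosen only to span the range from a small sub-sample to a sample of several hundred thousand headways.

```python
import math


def ks_critical_value(sample_size, alpha=0.05):
    """Large-sample KS critical value: sqrt(-ln(alpha / 2) / 2) / sqrt(sample_size)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(sample_size)


# Illustrative sample sizes (not taken from the monitored sections).
for size in (300, 3_000, 300_000):
    print(size, round(ks_critical_value(size), 4))
```

At α = 0.05 the critical value is roughly 0.078 for a sample of 300 and roughly 0.0025 for a sample of 300,000, so on the full sample even a maximum deviation of a fraction of a percentage point between the two distribution functions would lead to rejection.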
In formulating the criterion to find the appropriate value of the candidate critical headway $\tau_c$ that can be assumed as the threshold value τ*, the problem is to make correct inferences even in the case of a large N, while still applying the KS test to the null hypothesis of an exponential distribution of vehicle inter-arrival times greater than $\tau_c$. In the literature, KS tests have been implemented on large samples, including headways, by randomly extracting a large number of small sub-samples from the original sample [67,71]. Thus, in defining the criterion for choosing the threshold for free-moving vehicles, we adopted this approach, based on random resampling without replacement from the initial sample of size N, extracting a large number m of small sub-samples of size n ≪ N.
As previously mentioned, working with a large number (m = 1000) of small random sub-samples (n = 300), compared to the hundreds of thousands of headways in the original samples, addresses the oversensitivity of GoF tests on very large samples, which leads to the rejection of models that describe the data well. Therefore, for each of the 1000 sub-samples, the $D_n$ statistic was calculated, and to evaluate the acceptability of $H_0$ (exponential distribution) for each tested threshold value $\tau_c$ based on $D_n$, a significance level of α = 0.05 was chosen, resulting in a critical value of $KSC = 1.36/\sqrt{300} \approx 0.0785$. The proposed criterion for assessing the overall GoF involves calculating the $D_n$ statistic for each sub-sample and obtaining the average $\bar{D}_n$ of these values. This criterion serves as a heuristic guideline to address the oversensitivity of traditional KS tests when dealing with large datasets. By comparing the mean value $\bar{D}_n$ with the critical value, it provides a basis for rejecting or not rejecting the null hypothesis for the entire dataset. Regarding the aforementioned KS tests, Table 5 presents the mean value $\bar{D}_n$ of the $D_n$ statistic over 1000 random sub-samples of size 300 for two illustrative cases. It also indicates (**) the cases in which the null hypothesis is not rejected based on the heuristic criterion $\bar{D}_n < KSC$. The critical headway value τ* is identified as the smallest value of $\tau_c$ between 0 and 9 s for which $\bar{D}_n < KSC$. Thus, using the exponential model in the two test road sections, a threshold headway τ* of 4 s is identified in 218 and of 8 s in 1333. Therefore, we can say that vehicles with a headway greater than or equal to these values appear in the sample with characteristics that allow us to attribute their arrivals to a degree of randomness compatible with a Poisson process. As a result, we can consider these as free-moving vehicles. Conversely, vehicles with headways less than these threshold values are not characterized by randomness of arrivals, and thus may express a certain conditioning.
5.2. Speeds and Manoeuvring Freedom Criterion for the Detection of Actual/Apparent Conditioning
As previously stated, the conditioning of vehicles with headways below the threshold needs to be further characterized by introducing criteria that allow us to distinguish between actual and apparent conditioning. We have highlighted that this status can be classified as actual conditioning if vehicles show a reduction in maneuvering freedom, with a tendency to conform to the kinematic conditions of the preceding vehicle, and as apparent conditioning if they still retain a certain degree of maneuvering freedom with respect to the preceding vehicle. To make this distinction, a criterion that considers only the degree of maneuvering freedom of each vehicle must be established. Based on the information available for each passage reported in Section 5.2, this criterion can be derived from the analysis of the speed of each vehicle and its relative speed with respect to the preceding vehicle.
An initial analysis that helps to differentiate between free-moving vehicles and conditioned vehicles concerns the experimental bivariate distribution of headway and speed. This is shown in
Figure 6 as a heat map for headways less than or equal to 60 s and speeds less than or equal to 130 km/h. From the experimental bivariate distributions of speed and headway, we can observe a separation between two distinct zones in the heat maps, which can be identified by the threshold between non-free (sub-sample 2) and free-moving vehicles (sub-sample 3). Below this threshold (i.e., 4 s for the CU 218 and 8 s for the CU 1333) the speed distribution covers a wider range that also includes values of 30–40 km/h, which are infrequent for headways above the same threshold.
The threshold headway criterion, by itself, is insufficient to provide a comprehensive characterization of conditioning situations, since it fails to distinguish between actual and apparent conditioning. Nevertheless, it facilitates the identification of unconditioned vehicles, establishing a value that is sufficiently cautious to retain only fully free-moving vehicles. Free-moving vehicles in sub-sample 3 in Figure 6 can therefore be filtered from the overall data using the headway threshold and further analyzed to determine speed percentiles in free-flow traffic conditions. For example, as mentioned in the Introduction, this can be useful in the analysis of the operating speed (i.e., the 85th percentile of the operating speed distribution, or V85), which is a critical factor for road safety, since it significantly influences the frequency and severity of accidents [72,73], and is widely acknowledged as a benchmark value for evaluating consistency in homogeneous road sections [74,75,76]. Figure 7 illustrates the experimental and best-fitting distributions (Gumbel Max for CU 218 and Chi-Squared for CU 1333) and the corresponding V85 values in the two monitored sections from the speed data in sub-sample 3.
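The filtering step that precedes the V85 estimate can be sketched as below. Note the difference from Figure 7: this example takes the empirical 85th percentile directly from the filtered speeds, whereas the figure also reports fitted distributions. The record format (pairs of headway in seconds and speed in km/h) and the function name are assumptions made for illustration.

```python
import math


def v85_free_moving(records, tau_star, percentile=0.85):
    """Empirical percentile speed of vehicles with headway >= tau_star.

    `records` is an iterable of (headway_s, speed_kmh) pairs; the percentile is
    obtained by linear interpolation between order statistics.
    """
    speeds = sorted(v for h, v in records if h >= tau_star)
    if not speeds:
        raise ValueError("no free-moving vehicles above the threshold")
    position = percentile * (len(speeds) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(speeds) - 1)
    return speeds[lower] + (position - lower) * (speeds[upper] - speeds[lower])
```

For a given section, the function would be called with that section's threshold, for example `tau_star=4.0` for CU 218 or `tau_star=8.0` for CU 1333.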
After distinguishing the two subgroups of free-moving and conditioned vehicles based on the threshold headway and identifying their differences in terms of speed distributions, further analyses can be conducted in relation to the relative speeds. The speed difference between a vehicle and the preceding one can be used as a parameter to represent the freedom of maneuver, where a lower absolute value implies a higher likelihood of the vehicle being influenced by the preceding one and vice versa.
Figure 8 presents representative histograms of the experimental distributions of speed differences for the overall sample, as well as the two partitions of conditioned vehicles (sub-sample 2) and free-moving vehicles (sub-sample 3).
The figure also includes the distribution that best fits the experimental data based on the KS test. As evidenced by the images and numerical results in
Table 6, the distributions appear substantially different in terms of both descriptive statistics and theoretical distributions that accurately describe their trends.
The descriptive statistics indicate that the mean values are close to zero in all cases, but the standard deviation values differ among the samples, with sub-sample 2 having values closer to the mean than the other two samples. The skewness values are close to zero, indicating that the distributions are approximately symmetrical. However, all kurtosis values are greater than 3, suggesting that all the distributions are more peaked than a normal distribution, with sub-sample 2 having a more peaked distribution than the other samples. These differences in kurtosis values suggest structural differences in the data distribution, primarily in comparison to sub-sample 3. These observations apply to both the CU 218 and CU 1333 test sections illustrated in
Figure 8, which displays the histograms and the best-fit function.
The selection of the Logistic distribution for sub-sample 3 and the Laplace distribution for both sub-sample 2 and the overall data further indicates differences in the speed difference distribution between sub-samples 2 and 3, in terms of the presence of extreme values and the probability of producing values close to the mean. Therefore, speed differences in sub-sample 3, with headway values equal to or greater than the threshold value, may be more prone to extreme variations than those in sub-sample 2.
Following Miller’s (1961) proposal [17], later adopted by Boora et al. (2018) [43], we can use these differences in distributions to identify the range of speed differences in which the frequencies of the non-free-moving vehicle sub-sample exceed those of the free-moving one. If the speed difference distribution of sub-sample 2 exceeds that of sub-sample 3 within a certain interval, then speed differences falling within that interval are relatively more frequent among the vehicles in sub-sample 2 than among the free-moving vehicles in sub-sample 3. This suggests that vehicles in sub-sample 2 with speed differences in that interval may behave in a probabilistically different way from free-moving vehicles. We can therefore consider the vehicles in sub-sample 2 with speed differences within this interval as actually conditioned, and those with speed differences outside it as apparently conditioned.
Using this approach, as shown in
Figure 9, we can identify the range of speed differences in which the frequencies of the distribution of conditioned situations (actual + apparent) surpass those of free-moving ones, representing a range of conditioning prevalence that can indicate actual conditioning. Based on this range, we can define a new criterion to represent the freedom of maneuver and to distinguish between these two situations:
Vehicles with speed differences outside this range of conditioning prevalence are apparently conditioned;
Vehicles with speed differences within this range of conditioning prevalence are actually conditioned.
Figure 9.
Experimental distributions of speed differences for the overall sample, sub-sample 2, sub-sample 3, and conditioning prevalence interval: (a) road section 218; (b) road section 1333.
In the case under consideration, the ranges identified from the experimental distributions, as shown in Figure 9, span speed differences between −5 km/h and 5 km/h for road section 218 and between −6 km/h and 6 km/h for road section 1333.
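The identification of the conditioning prevalence range and the resulting classification can be sketched as follows. The sketch makes assumptions that the text does not state: speed differences are grouped into classes of 1 km/h, the range is taken as the contiguous set of classes around zero in which the relative frequency of sub-sample 2 exceeds that of sub-sample 3, and all function names are hypothetical.

```python
import math
from collections import Counter


def relative_frequencies(values, width=1.0):
    """Relative frequency per class; class k covers [k - 0.5, k + 0.5) * width."""
    counts = Counter(math.floor(v / width + 0.5) for v in values)
    return {k: c / len(values) for k, c in counts.items()}


def prevalence_interval(dv_conditioned, dv_free, width=1.0):
    """Contiguous range around zero where the conditioned sub-sample is
    relatively more frequent than the free-moving one, or None if absent."""
    f_cond = relative_frequencies(dv_conditioned, width)
    f_free = relative_frequencies(dv_free, width)

    def exceeds(k):
        return f_cond.get(k, 0.0) > f_free.get(k, 0.0)

    if not exceeds(0):
        return None
    low = high = 0
    while exceeds(low - 1):
        low -= 1
    while exceeds(high + 1):
        high += 1
    return (low - 0.5) * width, (high + 0.5) * width


def classify(speed_difference, interval):
    """Label a vehicle of sub-sample 2 from its speed difference (km/h)."""
    low, high = interval
    return "actual" if low <= speed_difference <= high else "apparent"
```

Relative rather than absolute frequencies are compared because the two sub-samples generally differ in size.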
5.4. Further Insights on the Distribution of Inter-Vehicle Distance in Vehicle Conditioning
An interesting aspect to investigate is the spacing between vehicles, namely, the spatial distance between two consecutive vehicles. Table 7 shows the trends of the average spacing (in meters) between consecutive vehicles for headway bins between 0 s (τ ≤ 0.5 s) and 10 s (9.5 s < τ ≤ 10.5 s). As we can see, the average spacing for the threshold headway is 87 m for CU 218 (τ* = 4 s) and 157 m for CU 1333 (τ* = 8 s). Since we consider vehicles with headways greater than τ* as free moving, they correspond to average spacings greater than 87 m for CU 218 and 157 m for CU 1333.
If we turn to the probability distribution of the spacing for vehicles below the threshold, which we defined as conditioned simply because they do not meet the criteria of the exponential model for free-moving vehicles (and which thus comprise both apparently and actually conditioned vehicles), we observe that the average spacing is 34.2 m, with a 90th percentile of 59 m, for CU 218 (sample distribution and lognormal fitting in Figure 12a), and 47 m, with a 90th percentile of 99 m, for CU 1333 (sample distribution and lognormal fitting in Figure 12b).
If we consider the two groups into which we can divide the conditioned vehicles, i.e., actually and apparently conditioned, based on the prevalence interval of the conditioning concerning speed differences, the analysis of distributions clearly highlights differences between the two clusters. In CU 218, the average spacing is 45.4 m for apparently conditioned vehicles, and it is reduced to 31.8 m for actually conditioned vehicles, with a standard deviation decreasing from 19.7 m to 15.4 m. In CU 1333, the average spacing is 64.4 m for apparently conditioned vehicles, and it is reduced to 41.2 m for actually conditioned vehicles, with a standard deviation decreasing from 40.2 m to 29.0 m. Consequently, we can confirm that the two groups exhibit statistically different behaviors, even in terms of spatial distancing.
Figure 13 presents the histograms for the two groups, i.e., apparently and actually conditioned, of spatial distances and the resulting distributions from the best approximation according to the usual KS test (lognormal for actually conditioned and beta for apparently conditioned in CU 218; 3-parameter lognormal for actually conditioned and gamma for apparently conditioned in CU 1333).