2.2.1. Formulation of Driver Preference Model
Utility theory is widely used to model discrete choice problems. In this theory, a decision maker selects the alternative with the highest utility among those available [36]. The utility of an alternative is typically modeled as a function of its relevant attributes, often a linear function. To account for the uncertainty of the decision maker, a random utility is added to the utility function, which makes the discrete choice problem probabilistic. In this study, the driver’s preferred trajectory is modeled as the one with the highest expected utility among all alternatives, while the preferred trajectory of the pairwise comparison group is modeled as the one with a higher expected utility.
The relevant attributes of a vehicle trajectory typically include safety, comfort, efficiency, and energy saving. However, for simplicity, energy saving is not considered in this study. Let $U_s$, $U_c$, and $U_e$ represent the safety, comfort, and efficiency utilities, respectively. Let $w_s$, $w_c$, and $w_e$ represent the linear weight parameters of the safety, comfort, and efficiency utilities, with $\varepsilon$ representing the random utility. The utility ($U$) of a trajectory can then be represented as follows:
$$U = w_s U_s + w_c U_c + w_e U_e + \varepsilon = \beta\,(\omega_s U_s + \omega_c U_c + \omega_e U_e) + \varepsilon \tag{1}$$
where $\omega_i = w_i / \beta$ for $i \in \{s, c, e\}$. $\omega_s$, $\omega_c$, and $\omega_e$ represent the normalized linear weight parameters of the safety, comfort, and efficiency utilities, respectively, all within [0, 1]. $\beta$ represents the normalization coefficient.
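As a minimal sketch of the linear utility in Equation (1), the snippet below splits raw weights into normalized weights and a scale factor and checks that both forms give the same deterministic utility. The function names, the example numbers, and the choice of sum normalization are assumptions made for illustration; the text does not state which normalization is used.

```python
def normalize_weights(raw_weights):
    """Split raw weights into normalized weights and a scale factor.

    Assumption: the normalization coefficient is the sum of the raw weights,
    which requires non-negative weights with a positive sum.
    """
    beta = sum(raw_weights)
    if beta <= 0:
        raise ValueError("raw weights must have a positive sum")
    omega = tuple(w / beta for w in raw_weights)
    return omega, beta


def trajectory_utility(sub_utilities, omega, beta, noise=0.0):
    """Normalized form of the linear utility: beta * sum(omega_i * U_i) + noise."""
    weighted = sum(w * u for w, u in zip(omega, sub_utilities))
    return beta * weighted + noise


# Hypothetical raw weights and sub-utilities (safety, comfort, efficiency).
raw = (2.0, 1.0, 1.0)
utilities = (0.8, 0.4, 0.6)

omega, beta = normalize_weights(raw)
u_raw = sum(w * u for w, u in zip(raw, utilities))   # raw-weight form
u_norm = trajectory_utility(utilities, omega, beta)  # normalized form
print(omega, beta, u_raw, u_norm)
```

Under this assumed normalization, the normalized weights lie within [0, 1] and the scale factor carries the overall magnitude of the weights.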
The safety, comfort, and efficiency utilities ($U_s$, $U_c$, and $U_e$) of a trajectory are unknown. However, they can be calculated using assumed utility functions and the corresponding trajectory attributes or characteristic indicators. Let $x_{s,1}, \dots, x_{s,n}$ represent the trajectory indicators corresponding to safety, and let $\omega_{s,1}, \dots, \omega_{s,n}$ represent the linear weight parameters of these indicators. Similarly to Equation (1), the safety utility ($U_s$) can be calculated using the following equation:
$$U_s = \sum_{i=1}^{n} \omega_{s,i}\, x_{s,i} \tag{2}$$
The comfort and efficiency utility functions are similar but use different trajectory characteristic indicators.
The safety utility is calculated from trajectory indicators, which means that it reflects the driver’s direct perception of the trajectory. It is therefore called the safety perception model (SPM) in this research. Similarly, the comfort perception model (CPM) and the efficiency perception model (EPM) are used for calculating the comfort and efficiency utilities, respectively. The utility of a trajectory, as shown in Equation (1), is evaluated indirectly from the safety, comfort, and efficiency utilities, and this function is referred to as the utility evaluation model (UEM).
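For a pairwise trajectory comparison group ($A$, $B$), the probability that a driver with utility function parameter $\boldsymbol{\omega}$ prefers trajectory $A$ to $B$, represented by $P(A \succ B \mid \boldsymbol{\omega})$, can be modeled as the probability that the utility of trajectory $A$ ($U_A$) is larger than that of trajectory $B$ ($U_B$), represented by $P(U_A > U_B)$, which is formulated as:
$$P(A \succ B \mid \boldsymbol{\omega}) = P(U_A > U_B) = P(\varepsilon_B - \varepsilon_A < \beta\,\Delta U), \qquad \Delta U = \sum_{i \in \{s, c, e\}} \omega_i\,(U_{i,A} - U_{i,B}) \tag{3}$$
where $\varepsilon_A$ and $\varepsilon_B$ can be assumed to be independent and identically distributed, although the specific distribution is unknown. A reasonable assumption for the distribution of $\varepsilon$ is a Gaussian distribution, given the central limit theorem. However, this assumption does not lead to a closed-form solution for the probability. A more tractable assumption is the standard Gumbel (or type I extreme value) distribution, which does lead to a closed-form solution [36] as follows:
$$P(A \succ B \mid \boldsymbol{\omega}) = \frac{1}{1 + \exp(-\beta\,\Delta U)} \tag{4}$$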
Equation (4) models the likelihood that the driver prefers trajectory $A$ to $B$ for the pairwise comparison group ($A$, $B$). The probability that the driver prefers trajectory $B$ to trajectory $A$ can be modeled as follows:
$$P(B \succ A \mid \boldsymbol{\omega}) = 1 - P(A \succ B \mid \boldsymbol{\omega}) \tag{5}$$
Based on the above equations, it is easy to predict the driver’s answer to a query. If $P(A \succ B \mid \boldsymbol{\omega})$ is larger than 0.5, then the driver is more likely to prefer $A$ to $B$, and vice versa.
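The closed-form result under the Gumbel assumption can be checked numerically. The sketch below, which is an illustration rather than the authors' implementation, compares the logistic expression with a Monte Carlo estimate obtained by sampling two independent standard Gumbel noise terms. The function names, the sample size, and the example values of the utility difference and the scale parameter are assumptions.

```python
import math
import random


def preference_probability(delta_u, beta):
    """Closed-form probability that A is preferred: 1 / (1 + exp(-beta * delta_u))."""
    z = beta * delta_u
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def sample_standard_gumbel(rng):
    """Draw from the standard Gumbel distribution by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def monte_carlo_preference(delta_u, beta, n=50000, seed=0):
    """Estimate P(U_A > U_B) with i.i.d. standard Gumbel noise on both utilities."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        eps_a = sample_standard_gumbel(rng)
        eps_b = sample_standard_gumbel(rng)
        if beta * delta_u + eps_a > eps_b:
            wins += 1
    return wins / n


closed_form = preference_probability(0.3, 2.0)
simulated = monte_carlo_preference(0.3, 2.0)
print(closed_form, simulated)
```

The two values should agree up to sampling error, because the difference of two independent standard Gumbel variables follows a logistic distribution.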
Equations (3)–(5) do not consider the uncertainty of a driver’s answer when distinguishing between two trajectories. Sometimes, drivers may find it difficult to discern the difference between two trajectories, and forcing them to make a deterministic choice may be inappropriate. In such cases, it is more appropriate to allow for uncertain answers. It is assumed that when the absolute difference in utility is closer to zero, the driver is more likely to give an uncertain answer. Thus, the different answers to a query can be modeled as follows:
$$\text{answer} = \begin{cases} A \succ B, & P(A \succ B \mid \boldsymbol{\omega}) > UB \\ \text{uncertain}, & LB \le P(A \succ B \mid \boldsymbol{\omega}) \le UB \\ B \succ A, & P(A \succ B \mid \boldsymbol{\omega}) < LB \end{cases} \tag{6}$$
where $UB$ (upper bound) and $LB$ (lower bound) represent the probability thresholds between the uncertain result and the two deterministic results.
The likelihood of a deterministic answer can be calculated using Equations (4) and (5), respectively. The likelihood of an uncertain answer, however, is viewed as a joint result of the two opposite answers and is given by Equation (7), where $P(A \sim B \mid \boldsymbol{\omega})$ represents the probability that the driver’s answer is uncertain.
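The threshold-based prediction and the answer likelihoods can be sketched as follows. This is only an illustration: the function names and the threshold values 0.6 and 0.4 are hypothetical, and the uncertain-answer likelihood is assumed here to be the product of the two opposite preference probabilities, which is one possible reading of "joint result" and is not confirmed by the text.

```python
import math


def preference_probability(delta_u, beta):
    """Logistic probability that A is preferred to B."""
    z = beta * delta_u
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_answer(p_a, ub, lb):
    """Map the preference probability to one of three answers using UB and LB."""
    if p_a > ub:
        return "A"
    if p_a < lb:
        return "B"
    return "uncertain"


def answer_likelihood(answer, delta_u, beta):
    """Likelihood of an observed answer given the utility difference and beta."""
    p_a = preference_probability(delta_u, beta)
    if answer == "A":
        return p_a
    if answer == "B":
        return 1.0 - p_a
    # Assumption: "joint result of two opposite answers" read as a product.
    return p_a * (1.0 - p_a)


# Hypothetical thresholds and utility differences.
for delta_u in (-1.0, 0.0, 0.1, 1.0):
    p_a = preference_probability(delta_u, beta=2.0)
    print(delta_u, round(p_a, 3), predict_answer(p_a, ub=0.6, lb=0.4))
```

With this assumed product form, the uncertain-answer likelihood peaks when the utility difference is zero and, for a fixed nonzero difference, decreases as the scale parameter grows, which matches the qualitative behavior described for Figure 2.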
The correlation between the likelihood of different answers to a query and the utility difference $\Delta U$ for drivers with different parameters $\beta$ is shown in Figure 2.
Figure 2 illustrates that, on the one hand, the likelihood that the driver prefers $A$ to $B$ increases as the utility difference $\Delta U$ becomes larger. However, as $\Delta U$ approaches 0, the likelihood of an uncertain answer increases. On the other hand, for the same utility difference $\Delta U$, the likelihood of uncertainty decreases as the parameter $\beta$ increases, which means that drivers are more likely to give a deterministic answer to a query. The parameter $\beta$ measures the driver’s ability to distinguish between trajectories and is therefore referred to as the perception coefficient in this study. It is worth noting that, for a specific critical utility difference, e.g., the minimum difference required for a driver to give a deterministic answer, the perception coefficient $\beta$ and the corresponding probability thresholds $UB$ and $LB$ are interrelated. As the perception coefficient $\beta$ increases, the values of $UB$ and $LB$ approach 0.5. This coupling indicates that the parameters $UB$, $LB$, and $\beta$ are interdependent.
Overall, the personalized preference learning system aims to estimate the linear weight parameters and the perception coefficient of the driver preference model (UEM, SPM, CPM, EPM) for each individual.
2.2.2. Estimation Method
The driver preference model parameters are estimated using a Bayesian approach and a limited-greedy estimation method. First, a prior probability distribution is assumed for the parameters to be estimated. Then, following the Bayesian approach, the estimate is updated at every step based on the driver’s answer to a preference query on a pairwise trajectory comparison group ($A$, $B$). The flow chart of the parameter update method for the query result at each step is shown in Figure 3.
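The first step of the estimation process involves updating the parameter for a given prior parameter set $\boldsymbol{\omega}$ and perception coefficient $\beta$. For the pairwise comparison group ($A$, $B$) and its corresponding driver query answer, the parameter $\boldsymbol{\omega}$ can be updated using the following equation:
$$P(\boldsymbol{\omega} \mid \text{answer}) \propto P(\text{answer} \mid \boldsymbol{\omega})\, P(\boldsymbol{\omega}) \tag{8}$$
where $P(\boldsymbol{\omega})$ represents the prior probability distribution of the estimation parameters $\boldsymbol{\omega}$, and $P(\text{answer} \mid \boldsymbol{\omega})$ represents the likelihood of the query answer as calculated by Equations (4)–(7). $P(\boldsymbol{\omega} \mid \text{answer})$ represents the posterior probability distribution of the updated $\boldsymbol{\omega}$.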
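A single Bayesian update step can be illustrated with a discrete set of candidate weight vectors. The sketch below is not the authors' implementation: the representation of the prior as probabilities over a few candidate vectors, the function names, the example utilities, and the product form used for the uncertain-answer likelihood are all assumptions.

```python
import math


def preference_probability(delta_u, beta):
    """Logistic probability that A is preferred to B."""
    z = beta * delta_u
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def utility_difference(omega, u_a, u_b):
    """Weighted difference of sub-utilities between trajectories A and B."""
    return sum(w * (a - b) for w, a, b in zip(omega, u_a, u_b))


def answer_likelihood(answer, delta_u, beta):
    """Likelihood of the answer; the uncertain case uses an assumed product form."""
    p_a = preference_probability(delta_u, beta)
    if answer == "A":
        return p_a
    if answer == "B":
        return 1.0 - p_a
    return p_a * (1.0 - p_a)


def bayes_update(samples, prior, answer, u_a, u_b, beta):
    """Posterior over candidate weight vectors: prior times likelihood, renormalized."""
    unnormalized = [
        p * answer_likelihood(answer, utility_difference(omega, u_a, u_b), beta)
        for omega, p in zip(samples, prior)
    ]
    total = sum(unnormalized)
    return [v / total for v in unnormalized]


# Hypothetical candidates: pure safety, pure comfort, pure efficiency drivers.
samples = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
prior = [1.0 / 3.0] * 3
u_a = (0.9, 0.2, 0.5)  # hypothetical (safety, comfort, efficiency) utilities of A
u_b = (0.3, 0.8, 0.5)  # hypothetical utilities of B

posterior = bayes_update(samples, prior, "A", u_a, u_b, beta=2.0)
print(posterior)
```

In this example trajectory A is safer and B is more comfortable, so an answer in favor of A shifts probability mass toward the safety-oriented candidate and away from the comfort-oriented one.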
The second step of the estimation process involves determining whether the updated driver preference model, with its newly estimated parameters, can correctly predict the latest query result. The prediction of the latest query result can be calculated using Equation (6). If the prediction matches the real driver answer, then the parameter estimation update ends, and the perception coefficient $\beta$ remains unchanged. However, if the prediction does not match the real driver answer, the third step involves incrementally increasing the perception coefficient $\beta$ and repeating the first and second steps until the prediction is consistent with the driver answer. This process is referred to as “greedy” because the likelihood of the answer increases with $\beta$, as shown in Figure 2, leading to a more accurate prediction by the driver preference model with updated parameters. To ensure parameter estimation stability and prevent noisy answers from significantly decreasing estimation accuracy, the increment of $\beta$ is limited to a value no larger than a specified number Incre_max for each query answer. This is why the estimation method is referred to as “limited-greedy”. Finally, the parameters $\boldsymbol{\omega}$ are normalized, and the perception coefficient $\beta$ is modified accordingly.
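The three steps above can be sketched for one query as follows. This is a rough illustration under several assumptions that the text does not confirm: the prior is a discrete set of candidate weight vectors, the point estimate is the posterior mean, the uncertain-answer likelihood is a product of the two opposite probabilities, and the thresholds, step size, and `incre_max` value are hypothetical. The final normalization step is omitted because the candidate vectors used here already sum to one.

```python
import math


def preference_probability(delta_u, beta):
    z = beta * delta_u
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def utility_difference(omega, u_a, u_b):
    return sum(w * (a - b) for w, a, b in zip(omega, u_a, u_b))


def answer_likelihood(answer, delta_u, beta):
    p_a = preference_probability(delta_u, beta)
    if answer == "A":
        return p_a
    if answer == "B":
        return 1.0 - p_a
    return p_a * (1.0 - p_a)  # assumed form for the uncertain answer


def predict_answer(p_a, ub, lb):
    if p_a > ub:
        return "A"
    if p_a < lb:
        return "B"
    return "uncertain"


def bayes_update(samples, prior, answer, u_a, u_b, beta):
    unnormalized = [
        p * answer_likelihood(answer, utility_difference(omega, u_a, u_b), beta)
        for omega, p in zip(samples, prior)
    ]
    total = sum(unnormalized)
    return [v / total for v in unnormalized]


def posterior_mean(samples, weights):
    dims = len(samples[0])
    return tuple(sum(p * s[i] for s, p in zip(samples, weights)) for i in range(dims))


def limited_greedy_update(samples, prior, answer, u_a, u_b, beta,
                          ub=0.6, lb=0.4, step=0.5, incre_max=2.0):
    """Update once per query, raising beta by at most incre_max in total."""
    increment = 0.0
    while True:
        trial_beta = beta + increment
        # Step 1: Bayesian update of the weights for the current beta.
        posterior = bayes_update(samples, prior, answer, u_a, u_b, trial_beta)
        estimate = posterior_mean(samples, posterior)
        # Step 2: check whether the updated model reproduces the answer.
        p_a = preference_probability(utility_difference(estimate, u_a, u_b), trial_beta)
        matches = predict_answer(p_a, ub, lb) == answer
        # Step 3: otherwise increase beta, unless the cap would be exceeded.
        if matches or increment + step > incre_max:
            return posterior, estimate, trial_beta
        increment += step


samples = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
prior = [1.0 / 3.0] * 3
posterior, estimate, new_beta = limited_greedy_update(
    samples, prior, "A", (0.9, 0.2, 0.5), (0.3, 0.8, 0.5), beta=1.0)
print(posterior, estimate, new_beta)
```

The cap on the increment keeps a single surprising or noisy answer from inflating the perception coefficient without bound, at the cost of sometimes leaving the latest answer unexplained.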