The BPR95 model is more complex in structure than the original BPR model and cannot be linearized through variable transformations, which makes calibration more difficult. We propose three algorithms to solve this optimization problem.
3.2. Bayesian Optimization
The calibration problem is a complex nonlinear optimization problem, but Bayesian optimization (BO) is well suited to handle it. BO is a probabilistic model-based optimization algorithm that uses a surrogate model to approximate the objective function and an acquisition function to balance exploration and exploitation.
Studies have found that BO is widely applied to traffic optimization problems [44,45,46,47]. The BO framework consists of two main parts: a probabilistic surrogate model, such as a Gaussian process, that substitutes for the evaluation of the costly and intricate objective function, and an active selection strategy, the acquisition function, built from the posterior information of the surrogate model.
The Gaussian process is an infinite-dimensional generalization of the multivariate normal distribution. It is a convenient and robust prior distribution. Let $(\mathbf{x}_i, y_i)$ represent a sample in the calibration problem, where $\mathbf{x}_i$ denotes the parameters in this problem and $y_i = f(\mathbf{x}_i)$ is the corresponding objective function value. Let $\mathbf{y} = (y_1, \ldots, y_n)^{\top}$ denote the vector of function observations and let $\boldsymbol{\mu} = (\mu(\mathbf{x}_1), \ldots, \mu(\mathbf{x}_n))^{\top}$ represent the vector of the expected values of the function observations. The Gaussian process assumes that the vector of $n$ samples follows a multivariate Gaussian distribution:

$$\mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K})$$

Here, $\mathbf{K}$ is the covariance matrix. The form of the distribution is decided by a mean function and a covariance function. In general, the expected value does not affect the optimization process, so a common practice is to set the mean function to a constant, i.e., $\mu(\mathbf{x}) = \mu_0$. In the Gaussian process, the covariance between $y_i$ and $y_j$ is measured by the distance between $\mathbf{x}_i$ and $\mathbf{x}_j$, $\lVert \mathbf{x}_i - \mathbf{x}_j \rVert$. It is assumed that a smaller distance between $\mathbf{x}_i$ and $\mathbf{x}_j$ leads to a higher correlation between $y_i$ and $y_j$. The covariances are generally calculated using kernel functions.
In this paper, the ARD Matérn 5/2 kernel is used:

$$k(\mathbf{x}_i, \mathbf{x}_j) = \theta_0 \left(1 + \sqrt{5}\, r_{ij} + \tfrac{5}{3} r_{ij}^2 \right) \exp\!\left(-\sqrt{5}\, r_{ij}\right), \qquad r_{ij}^2 = \sum_{d=1}^{D} \frac{(x_{i,d} - x_{j,d})^2}{\theta_d^2}$$

Here, $\theta_0$ and $\theta_1, \ldots, \theta_D$ are the parameters of this kernel. This covariance function is twice differentiable. These parameters can be estimated using the maximum likelihood estimation approach.
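As an illustration, the kernel above can be evaluated directly. This is a minimal sketch; the names `theta0` and `lengthscales` are illustrative, not taken from the paper:

```python
import numpy as np

def matern52_ard(x1, x2, theta0=1.0, lengthscales=None):
    """ARD Matérn 5/2 kernel between two parameter vectors.

    theta0 is the signal variance; lengthscales holds one
    length-scale per dimension (the 'ARD' part).
    """
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    if lengthscales is None:
        lengthscales = np.ones_like(x1)
    # Length-scale-weighted Euclidean distance r between the inputs.
    r = np.sqrt(np.sum(((x1 - x2) / lengthscales) ** 2))
    s = np.sqrt(5.0) * r
    # 1 + sqrt(5) r + (5/3) r^2, times the exponential decay term.
    return theta0 * (1.0 + s + s**2 / 3.0) * np.exp(-s)
```

At zero distance the kernel returns the signal variance, and the value decays smoothly as the inputs move apart.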
For any new sample $\mathbf{x}_{n+1}$, the prior distribution can be rewritten as:

$$\begin{pmatrix}\mathbf{y} \\ y_{n+1}\end{pmatrix} \sim \mathcal{N}\!\left(\boldsymbol{\mu}, \begin{pmatrix}\mathbf{K} & \mathbf{k} \\ \mathbf{k}^{\top} & k(\mathbf{x}_{n+1}, \mathbf{x}_{n+1})\end{pmatrix}\right), \qquad \mathbf{k} = \left(k(\mathbf{x}_1, \mathbf{x}_{n+1}), \ldots, k(\mathbf{x}_n, \mathbf{x}_{n+1})\right)^{\top}$$

The predicted distribution, or posterior probability distribution, is given by:

$$\hat{y}_{n+1} = \mu_0 + \mathbf{k}^{\top}\left(\mathbf{K} + \sigma_{\varepsilon}^2 \mathbf{I}\right)^{-1}(\mathbf{y} - \boldsymbol{\mu}) \tag{12}$$

$$\sigma^2(\mathbf{x}_{n+1}) = k(\mathbf{x}_{n+1}, \mathbf{x}_{n+1}) - \mathbf{k}^{\top}\left(\mathbf{K} + \sigma_{\varepsilon}^2 \mathbf{I}\right)^{-1}\mathbf{k} \tag{13}$$

Here, $\hat{y}_{n+1}$ denotes the estimated value of $y_{n+1}$ given $\mathbf{x}_{n+1}$ and the observed samples, and $\sigma_{\varepsilon}^2$ represents the variance of the estimation error introduced into the function observations.
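Under these definitions, the posterior mean and variance of Equations (12) and (13) can be sketched as follows. The names are illustrative, and a small jitter term stands in for the observation-noise variance:

```python
import numpy as np

def gp_posterior(X, y, x_new, kernel, mu0=0.0, noise=1e-8):
    """Posterior mean and variance at x_new under a GP prior
    with constant mean mu0 (an illustrative sketch)."""
    n = len(X)
    # Covariance matrix K over the observed samples, plus noise/jitter.
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    K += noise * np.eye(n)
    # Cross-covariance vector k between observations and the new point.
    k = np.array([kernel(X[i], x_new) for i in range(n)])
    K_inv = np.linalg.inv(K)
    mean = mu0 + k @ K_inv @ (np.asarray(y) - mu0)   # Equation (12)
    var = kernel(x_new, x_new) - k @ K_inv @ k       # Equation (13)
    return mean, max(var, 0.0)
```

At an already-observed point the posterior mean reproduces the observation and the variance collapses toward zero, which is what drives the exploration/exploitation trade-off below.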
Based on the posterior information, the acquisition function can determine the evaluation direction. The acquisition function is formulated considering both the estimated objective function value (i.e., the expectation in Equation (12)) and the uncertainty information (i.e., the variance in Equation (13)) over the entire feasible region. Let $\alpha(\mathbf{x})$ represent the acquisition function; then, the determination problem can be expressed as:

$$\mathbf{x}_{n+1} = \arg\max_{\mathbf{x}} \alpha(\mathbf{x})$$
The design of the acquisition function is a significant part of BO. In this paper, the expected improvement (EI) function is applied.
Let $y^* = \min_i y_i$ denote the current best value of the objective function. EI maximizes the expected improvement over $y^*$. It has a closed-form expression:

$$\mathrm{EI}(\mathbf{x}) = \left(y^* - \hat{y}(\mathbf{x})\right)\Phi(Z) + \sigma(\mathbf{x})\,\phi(Z), \qquad Z = \frac{y^* - \hat{y}(\mathbf{x})}{\sigma(\mathbf{x})}$$

Here, $\Phi(\cdot)$ represents the cumulative distribution function of the standard normal distribution, and $\phi(\cdot)$ denotes the probability density function of the standard normal distribution. $\hat{y}(\mathbf{x})$ and $\sigma(\mathbf{x})$ can be computed with Equations (12) and (13), respectively.
Compared with the original objective function, the acquisition function has a relatively simple form and is easier to optimize.
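The closed-form EI is straightforward to evaluate; a minimal sketch for a minimization problem, using only the standard library:

```python
import math

def expected_improvement(mean, std, y_best):
    """Closed-form EI given the GP posterior mean and standard
    deviation at a candidate point (minimization convention)."""
    if std <= 0.0:
        # Degenerate posterior: improvement is deterministic.
        return max(y_best - mean, 0.0)
    z = (y_best - mean) / std
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (y_best - mean) * Phi + std * phi
```

Note how a low predicted mean (exploitation) and a high predictive standard deviation (exploration) both increase EI.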
The algorithm of BO for solving the calibration of the link performance function is summarized as follows (Algorithm 2).
Algorithm 2 Bayesian optimization for the calibration of the link performance function
Step 0: Set the sample counter $n = n_0$ and the evaluation budget $N$. Randomly sample $n_0$ parameter sets $\mathbf{x}_1, \ldots, \mathbf{x}_{n_0}$. For each parameter set $\mathbf{x}_i$, calculate the objective function value $y_i = f(\mathbf{x}_i)$.
Step 1: Build the Gaussian process regression model based on $\{\mathbf{x}_i\}$ and $\{y_i\}$. Obtain the posterior distribution of $f$ based on the sampled points.
Step 2: Using the posterior information obtained in the previous step, maximize the acquisition function to determine a new sample point $\mathbf{x}_{n+1}$.
Step 3: With $\mathbf{x}_{n+1}$, calculate the corresponding objective function value $y_{n+1} = f(\mathbf{x}_{n+1})$. Set $n = n + 1$ and update $\{\mathbf{x}_i\}$ and $\{y_i\}$.
Step 4: Check the termination criterion. If $n \ge N$, return the best solution found; otherwise, return to Step 1.
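The loop in Algorithm 2 can be sketched compactly. This is an illustrative implementation only: it assumes a fixed RBF kernel with zero prior mean, and maximizes EI over random candidates rather than with a dedicated optimizer:

```python
import numpy as np
from math import erf, pi, sqrt

def bo_minimize(f, bounds, n_init=5, n_iter=20, n_cand=200, seed=0):
    """Minimal BO loop in the spirit of Algorithm 2 (a sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))          # Step 0
    y = np.array([f(x) for x in X])

    def kern(a, b):                                          # fixed RBF kernel
        return np.exp(-0.5 * np.sum((a - b) ** 2, axis=-1))

    for _ in range(n_iter):                                  # Steps 1-4
        K = kern(X[:, None, :], X[None, :, :]) + 1e-8 * np.eye(len(X))
        K_inv = np.linalg.inv(K)
        cand = rng.uniform(lo, hi, size=(n_cand, len(lo)))
        kc = kern(cand[:, None, :], X[None, :, :])           # (n_cand, n)
        mu = kc @ K_inv @ y                                  # posterior means
        var = np.clip(1.0 - np.sum((kc @ K_inv) * kc, axis=1), 1e-12, None)
        sd = np.sqrt(var)
        z = (y.min() - mu) / sd
        Phi = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
        phi = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
        ei = (y.min() - mu) * Phi + sd * phi                 # Step 2: EI
        x_new = cand[np.argmax(ei)]
        X = np.vstack([X, x_new])                            # Step 3
        y = np.append(y, f(x_new))
    return X[np.argmin(y)], y.min()
```

Even this crude sketch concentrates evaluations near the minimizer after a handful of iterations, which is the behavior the calibration procedure relies on.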
3.3. Differential Evolution Algorithm
Heuristic algorithms are effective methods for solving parameter calibration problems. Heuristic search algorithms do not require the objective function to be continuous or differentiable and have good global optimization capabilities, making them a hot topic in the field of optimization [48,49,50].
In this study, the differential evolution (DE) algorithm [51] was used for parameter calibration. DE is a heuristic stochastic search algorithm based on population differences. The problem-solving process is represented as the survival of the fittest among "chromosomes". Through generations of evolution, including replication, crossover, and mutation of the "chromosome" population, it ultimately converges to the individual that is most adaptable to the environment, thereby obtaining the optimal or a satisfactory solution to the problem. The algorithm was originally proposed for the Chebyshev polynomial fitting problem and is a metaheuristic used for optimization problems. Essentially, it is a greedy, real-coded genetic algorithm that focuses on preserving the best solution.
DE is similar to the genetic algorithm in that it involves mutation, crossover operations, and elimination mechanisms. However, the main difference between DE and genetic algorithms is that the mutation part involves selecting the difference between two solution members, which is then scaled and added to the variables of the current solution member. Therefore, DE does not need to use probability distributions to generate the next generation of solution members.
Consider the following unconstrained optimization problem:

$$\min_{\mathbf{x}} f(\mathbf{x}), \qquad \mathbf{x} = (x_1, \ldots, x_D) \in \mathbb{R}^D$$
The DE algorithm consists of the following steps.
- (1) Initialization of the population
The classical differential evolution algorithm uses real-valued encoding, which makes it well suited to real-valued optimization problems. Let $\mathbf{x}_i = (x_{i,1}, \ldots, x_{i,D})$, $i = 1, \ldots, NP$, denote the $i$-th individual, where $D$ denotes the dimensionality of the problem and $NP$ denotes the population size. Solution vectors are randomly generated within the upper and lower bounds:

$$x_{i,j} = x_j^{\min} + \mathrm{rand}(0,1) \cdot \left(x_j^{\max} - x_j^{\min}\right), \qquad j = 1, \ldots, D$$
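A minimal sketch of this initialization (the function and argument names are illustrative):

```python
import numpy as np

def init_population(np_size, bounds, rng=None):
    """Random real-valued initial population within the bounds:
    x_{i,j} = x_j_min + rand(0,1) * (x_j_max - x_j_min)."""
    if rng is None:
        rng = np.random.default_rng(0)
    lo, hi = np.array(bounds, float).T
    # One uniform draw per individual per dimension, rescaled to the bounds.
    return lo + rng.random((np_size, len(lo))) * (hi - lo)
```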
- (2) Mutation
Individual mutation is achieved through a differential strategy. The common differential strategy is to randomly select two different individuals in the population, scale their vector difference, and add it to a third randomly selected individual:

$$\mathbf{v}_i^{g+1} = \mathbf{x}_{r_1}^{g} + F \cdot \left(\mathbf{x}_{r_2}^{g} - \mathbf{x}_{r_3}^{g}\right)$$

Here, $r_1$, $r_2$, and $r_3$ are three distinct random indices in the range $[1, NP]$, all different from $i$. $F$ is the scaling factor, also known as the mutation rate, and $g$ represents the $g$-th generation.
Generally, the mutation operator $F$ takes a constant value, which is difficult to determine accurately. If the mutation rate is too large, the probability of finding the global optimal solution is low; if it is too small, the diversity of the population decreases, so "premature convergence" is likely to occur. Therefore, we can design an adaptive mutation operator:

$$F = F_0 \cdot 2^{\lambda}, \qquad \lambda = e^{\,1 - \frac{G_m}{G_m + 1 - g}}$$

Here, $G_m$ is the maximum number of generations and $g$ is the current generation. In this way, the mutation operator starts at $2F_0$, maintaining diversity in the early stages and preventing premature convergence. As the evolution progresses, the mutation operator decreases toward $F_0$, avoiding the destruction of near-optimal solutions.
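The mutation step and the adaptive scaling factor described above can be sketched as follows; the names (`adaptive_f`, `mutate`) are illustrative:

```python
import numpy as np

def adaptive_f(f0, g, g_max):
    """Adaptive scaling factor: starts at 2*F0 at g = 1 and
    decays toward F0 as g approaches g_max."""
    lam = np.exp(1.0 - g_max / (g_max + 1.0 - g))
    return f0 * 2.0 ** lam

def mutate(pop, i, f, rng):
    """DE/rand/1 mutation: v_i = x_r1 + F * (x_r2 - x_r3),
    with r1, r2, r3 distinct random indices different from i."""
    idx = [j for j in range(len(pop)) if j != i]
    r1, r2, r3 = rng.choice(idx, size=3, replace=False)
    return pop[r1] + f * (pop[r2] - pop[r3])
```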
The differential evolution algorithm is named after the differential mutation strategy, which is the most important part of the algorithm.
- (3) Crossover
The purpose of the crossover operation is to recombine individuals randomly, as differential evolution is also a stochastic algorithm. The differential evolution algorithm uses discrete crossover operators, including binomial crossover and exponential crossover. The crossover operator discretely crosses the mutant vector $\mathbf{v}_i^{g+1}$ generated by the mutation operator with the parent individual vector $\mathbf{x}_i^{g}$ to obtain the trial vector $\mathbf{u}_i^{g+1}$. The (binomial) crossover operation is as follows:

$$u_{i,j}^{g+1} = \begin{cases} v_{i,j}^{g+1}, & \text{if } \mathrm{rand}(0,1) \le CR \text{ or } j = j_{\mathrm{rand}} \\ x_{i,j}^{g}, & \text{otherwise} \end{cases}$$

Here, $CR$ is called the crossover probability, and it has a range of $[0, 1]$. It is used to generate new individuals with a certain probability; the randomly chosen index $j_{\mathrm{rand}}$ guarantees that the trial vector differs from its parent in at least one component.
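A sketch of binomial crossover as defined above (illustrative names; the forced index plays the role of $j_{\mathrm{rand}}$):

```python
import numpy as np

def binomial_crossover(x, v, cr, rng=None):
    """Binomial crossover: each component of the trial vector u
    comes from the mutant v with probability CR, otherwise from
    the parent x; one random index is forced from v so that u != x."""
    if rng is None:
        rng = np.random.default_rng(0)
    x, v = np.asarray(x, float), np.asarray(v, float)
    mask = rng.random(len(x)) <= cr
    mask[rng.integers(len(x))] = True   # guarantee at least one mutant gene
    return np.where(mask, v, x)
```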
- (4) Selection
The selection step in the differential evolution algorithm uses a greedy strategy, where the fitter individuals are chosen as new individuals after the offspring population is generated through the mutation and crossover operators. This is done using a binary tournament selection operator, where each offspring competes with its corresponding parent, and the fitter of the two is included in the next generation's population. For minimization problems, the selection operator can be described as follows:

$$\mathbf{x}_i^{g+1} = \begin{cases} \mathbf{u}_i^{g+1}, & \text{if } f\!\left(\mathbf{u}_i^{g+1}\right) \le f\!\left(\mathbf{x}_i^{g}\right) \\ \mathbf{x}_i^{g}, & \text{otherwise} \end{cases}$$
Because the differential evolution algorithm uses one-to-one tournament selection, it is a steady-state evolutionary algorithm that preserves elite individuals. Once a new population is formed, the differential evolution algorithm continues to evolve the population through the mutation, crossover, and selection operators until the termination condition is met, at which point the algorithm terminates.
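The one-to-one greedy selection can be sketched as:

```python
import numpy as np

def select(x, u, f):
    """One-to-one greedy (tournament) selection for minimization:
    the trial vector u survives only if it is no worse than its parent x."""
    return u if f(u) <= f(x) else x
```

Because each trial vector is compared only against its own parent, the best individual found so far can never be lost.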
The algorithm of DE for solving the calibration of the link performance function is summarized as follows (Algorithm 3).
Algorithm 3 DE algorithm for the calibration of the link performance function
Step 0: Set the generation counter $g = 0$ and the maximum number of generations $G_m$. Randomly initialize the population within the parameter bounds and calculate the objective function value for each individual.
Step 1: For each individual, generate a mutant vector using the (adaptive) differential mutation operator.
Step 2: Apply the crossover operator to each mutant vector and its parent to obtain the trial vectors.
Step 3: Evaluate the trial vectors and apply the one-to-one greedy selection operator to form the next generation. Set $g = g + 1$.
Step 4: Check the termination criterion. If $g \ge G_m$, return the best individual found; otherwise, return to Step 1.
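Putting the four steps together, a compact DE loop might look like the following. This is a sketch under the assumptions discussed above (DE/rand/1 mutation, binomial crossover, adaptive scaling factor), not the paper's exact implementation:

```python
import numpy as np

def de_minimize(f, bounds, np_size=20, g_max=100, f0=0.5, cr=0.9, seed=0):
    """Compact DE loop for calibration-style minimization (a sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    pop = lo + rng.random((np_size, len(lo))) * (hi - lo)       # initialization
    fit = np.array([f(x) for x in pop])
    for g in range(1, g_max + 1):
        # Adaptive scaling factor: 2*F0 early, decaying toward F0.
        F = f0 * 2.0 ** np.exp(1.0 - g_max / (g_max + 1.0 - g))
        for i in range(np_size):
            idx = [j for j in range(np_size) if j != i]
            r1, r2, r3 = rng.choice(idx, size=3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)  # mutation
            mask = rng.random(len(lo)) <= cr                        # crossover
            mask[rng.integers(len(lo))] = True
            u = np.where(mask, v, pop[i])
            fu = f(u)
            if fu <= fit[i]:                                        # selection
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])
```

For example, on a two-dimensional sphere function the loop converges to the origin well within the default budget; mutants are clipped back into the bounds, one simple way to keep trial vectors feasible.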