1. Introduction
Dynamic optimization is widely used in process systems engineering, with high demands for real-time solving in specific applications such as nonlinear model predictive control (NMPC) and online parameter estimation [1,2,3]. When using a direct approach to solve dynamic optimization problems, the original problem must first be discretized. Currently, the most popular discretization methods include the finite element method [4,5,6], the multiple shooting method [7,8,9,10], and control vector parameterization [11,12]. The control vector parameterization method discretizes only the control variables, which results in smaller problems and significant speed advantages in solving. However, this approach may struggle to meet precision requirements and cannot effectively handle problems with state constraints. While the multiple shooting method improves solution accuracy, the additional computation of sensitivity matrices during the solve increases its complexity. The simultaneous solution method based on finite element collocation discretizes both state and control variables, allowing the direct treatment of path constraints and achieving higher precision. Nevertheless, the size of the resulting nonlinear programming (NLP) problem after discretization depends closely on the number and distribution of the selected finite elements. Generally, the more segments the time domain is divided into, the smaller the error of the numerical solution; however, more finite elements imply a larger NLP, leading to longer solution times. Conversely, reducing the number of finite elements speeds up the solve but may compromise precision.
In order to simultaneously improve the numerical accuracy and efficiency of discretely solving ordinary differential equations (ODEs) and differential algebraic equations (DAEs), Wright et al. [8] devised a strategy for adaptive mesh refinement using finite elements by employing subdivision criteria functions. These criteria functions are designed to account for the impact of the size of individual finite elements on the errors in other finite elements, and they are applied to solve two-point boundary value problems. Auzinger et al. [10] proposed a new estimation method for the global error resulting from discretization with the finite element collocation method and provided a comprehensive analysis of its correctness when applied to general nonlinear regularization problems. Furthermore, Auzinger et al. [13] later introduced a method for globally equalizing errors. Starting from a coarse, equal distribution of finite elements, this method refines the elements so that the errors generated in each segment are uniformly distributed, effectively meeting user-specified accuracy requirements. This refinement strategy has shown good results in the numerical solution of singular optimal control problems. Wright [14] employed an adaptive piecewise polynomial interpolation method to solve two-point boundary value problems, categorizing finite element refinement strategies into two types: for elements with small and evenly distributed errors, a subdivision strategy based on the previous finite element distribution is used; for elements with large and unevenly distributed errors, the endpoints of the original finite elements are removed and the finite element allocation is redone.
Vasantharajan and Biegler [15] introduced error constraints into the solution of dynamic optimization problems to ensure more accurate and reliable results. To control approximation errors, the authors developed two different strategies: the error uniform distribution method and the error enforcement constraint method. These strategies are embedded in the solution of the nonlinear programming problems obtained after discretization, and they adaptively adjust the lengths of the finite elements during the optimization process. In the work of Seferlis and Hrymak [16], new grid points were inserted adaptively into the finite elements to alter the distribution of the grid, thereby enhancing the accuracy of the numerical solution. The new grid distribution evenly distributes the errors in the solution across the finite elements. Additionally, Tanartkit and Biegler [17,18] proposed a dual-layer strategy in which the outer problem determines the number of finite elements required to meet a given tolerance, while the inner problem solves the dynamic optimization problem and passes the solution to the outer problem. Subsequently, Biegler et al. [19] further improved this strategy by introducing a simultaneous method based on error estimation. By adjusting the sizes of the finite elements and the distribution of points in the mesh, this method captures the switching points of the control variables in optimal control problems and redistributes the grid around these switching points to ensure precise control and state profiles. Huntington and Rao [19] improved solution accuracy by varying the number of finite elements. They compared global and local implementations of the orthogonal collocation method for solving optimal control problems to determine the accuracy and computational efficiency of each. In the local method, the number of collocation points within each finite element remains constant while the number of elements changes; the global form uses a single finite element over the entire interval with a varying number of collocation points. Zhao and Tsiotras introduced a method of grid point allocation using density functions [20], the key aspect being the choice of the density function. By constructing a density function from prior knowledge gained in previous solutions, the grid point positions of the discretized mesh are recalculated from the density function, leading to more accurate results when the problem is solved on the new mesh. Additionally, Darby et al. [21,22] proposed an hp-adaptive pseudospectral method. It divides the entire time domain into multiple intervals based on the curvature of the control trajectory. For intervals where the control trajectory is sufficiently smooth, increasing the interpolation order (p-method) achieves faster accuracy improvement within a subinterval; for less smooth intervals, subdividing the subintervals (h-method) works better. This method enables pseudospectral methods to be applied to complex optimal control problems for which global pseudospectral methods are not feasible. However, it requires multiple iterations of solving nonlinear programming problems, increasing the computational burden. Consequently, using this method for online applications, especially for large-scale dynamic optimization problems, can be challenging. Paiva and Fontes [23] proposed an adaptive time mesh algorithm with different refinement levels. The algorithm uses information from the adjoint multipliers for grid refinement. However, the computational burden increases because too many new collocation points are added during grid refinement. In subsequent research, Paiva [24] introduced a time mesh refinement strategy based on local errors. After computing the solution on a coarse mesh, the local errors are evaluated, indicating which time subintervals require refinement. This process is repeated until the local errors fall below a user-specified threshold.
In recent years, Tamimi et al. [25] proposed a method for selecting the number of finite element segments tailored to model predictive control. This method constructs a new optimization problem based on prior information obtained from the previous solution, including the solution time, the objective function values, and the number of finite elements, and it aids in selecting a “compromise” number of time intervals that achieves the desired accuracy. Subsequently, Haman et al. [26] enhanced the accuracy of the method by increasing the polynomial approximation order and the number of finite elements in each segment. They also merged adjacent grids that already met the accuracy requirements to reduce the grid size and improve solution speed. Miller [27] utilized step functions to detect discontinuities in control trajectories and applied targeted refinement methods to adjust the discretized grid in different regions, thereby enhancing solution accuracy and efficiency. This approach is particularly effective for optimal control problems with discontinuous control solutions, providing accurate numerical solutions within fewer iterations and at a lower computational cost. Furthermore, Huang et al. [28] introduced an adaptive grid allocation method based on pseudospectral techniques. They first estimate the polynomial order from the error between the dynamics and the differential approximation of the state variables in the spectral matrix, then decide whether to subdivide intervals based on the maximum polynomial order within each interval, and finally obtain the positions of the grid points as decision variables while solving the optimal control problem.
Li et al. [29] proposed a novel bilevel method in which an inner layer first estimates the discretization error a priori: by constructing and solving a maximization problem, an upper bound on the approximation error is obtained. In the outer layer, the number of finite elements is treated as a decision variable to determine the minimum number of finite elements that satisfies a user-specified tolerance. The computation is intricate because, in each outer iteration, the number of finite elements is uniformly increased by one segment, after which the maximum discretization error is solved for in the inner layer and passed back to the outer layer for the next grid update. Although a grid satisfying the specified tolerance can be obtained after multiple iterations, each grid update uses only the maximized error from the previous solution, leading to a relatively slow update of the finite element distribution. Zhao et al. [20] utilized the local curvature of the control variable u as a density function for iterating over the finite elements. They redistribute the finite elements according to the magnitude of the curvature of the control curve, densifying the finite element distribution where the local curvature function is large and making it sparser where it is small.
In this study, inspired by the work of Li et al. [29], a bilevel strategy is employed to assess whether the discretized grid meets the user-specified tolerance. However, a different approach is taken for updating the discretized grid, based on the concept of a grid density function, and both the selection of the density function and the calculation of the grid point positions differ from the methodology proposed by Zhao [20]. Using the control trajectory curvature as the grid density function has several distinct drawbacks. Firstly, the grid density function determined in this way tends to carry relatively large calculation errors: in the work by Zhao, the calculation of the local curvature is relatively simplistic, so crucial information is lost and the density function does not accurately reflect the true curvature of the control trajectory. Secondly, this choice of grid density function lacks universality; for certain highly sensitive problems, using the curvature as the grid density function may not yield good results. This study instead constructs the grid density function from the errors at the non-collocation points within each finite element. The calculation of the grid points has also been modified, enhancing the accuracy and convenience of refining the discretized grid with the grid density function. Moreover, combined with the improved bilevel approach, the minimum number of finite elements in the discretized grid that satisfies the user-specified accuracy can be determined more precisely.
2. Description of Problem
This paper studies the following form of nonlinear dynamic optimization problem:
In this problem, x(t) ∈ R^{n_x} and y(t) ∈ R^{n_y} represent the vectors of differential and algebraic state profiles, respectively, the control profiles u(t) are bounded by given lower and upper bounds, and the model functions are three times differentiable [30]. Moreover, both the control variable u(t) and the state variable x(t) are subject to constraints. To solve this problem using a simultaneous method based on finite element orthogonal collocation, a discretization scheme must be selected and a number of time intervals determined, transforming the problem into a nonlinear programming problem that can be solved numerically. The specific discretization process is as follows:
The optimal control problem is transformed into an NLP using the finite element method. First, the continuous time domain is discretized into a finite number of finite elements [t_{i−1}, t_i], i = 1, …, N, where t_0 and t_N are fixed and the interior grid points t_1, …, t_{N−1} may become variables. Lagrange interpolation polynomials using either Gauss or Radau points on each finite element are adopted as follows [31]:
where K is the number of collocation points. Lagrange polynomials have the additional benefit of satisfying the following conditions:
with the collocation equations written as [31]
The continuity of the differential state profiles is enforced by the following equation [31]:
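To make the collocation construction above concrete, the following sketch builds the Lagrange basis on a single finite element and the differentiation matrix that appears in the collocation equations. The 3-point Radau set on (0, 1] and the use of the element's left endpoint as an additional interpolation node are illustrative assumptions rather than values prescribed in this paper.

```python
import numpy as np

# Illustrative 3-point Radau collocation points on (0, 1] (an assumed choice of K = 3).
tau = np.array([0.155051, 0.644949, 1.0])
nodes = np.concatenate(([0.0], tau))   # element's left endpoint plus the collocation points

def lagrange(j, t, pts):
    """Value of the j-th Lagrange basis polynomial built on `pts`, evaluated at t."""
    val = 1.0
    for k, tk in enumerate(pts):
        if k != j:
            val *= (t - tk) / (pts[j] - tk)
    return val

def lagrange_deriv(j, t, pts):
    """Derivative of the j-th Lagrange basis polynomial at t (product-rule expansion)."""
    total = 0.0
    for m, tm in enumerate(pts):
        if m == j:
            continue
        term = 1.0 / (pts[j] - tm)
        for k, tk in enumerate(pts):
            if k not in (j, m):
                term *= (t - tk) / (pts[j] - tk)
        total += term
    return total

# Collocation differentiation matrix: D[r, j] = dl_j/dtau evaluated at tau_r. In a
# collocation scheme, sum_j x_ij * D[r, j] is matched to the scaled dynamics at tau_r.
D = np.array([[lagrange_deriv(j, tr, nodes) for j in range(len(nodes))] for tr in tau])

# Interpolation weights at a non-collocation point (here the element midpoint),
# the kind of point examined later in the error analysis.
w = np.array([lagrange(j, 0.5, nodes) for j in range(len(nodes))])
print(D)
print(w, w.sum())   # the weights sum to 1 because the basis reproduces constants
```

The matrix D plays the role of the derivative terms in the collocation equations, while the weights w show how the interpolated state is evaluated between collocation points, which is the quantity considered in the error analysis below.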
After discretizing the problem and transforming it into a nonlinear programming problem, it is necessary to assess the errors generated during the discretization. Since at the collocation points the numerical and analytical values of the state are nearly identical within a tolerance range [31,32], there is no need to discuss the approximation errors at these points when analyzing the discretization errors. Instead, the focus should be on the errors at non-collocation points, which are used as a metric to gauge the accuracy of the solution. Theoretically, the number of non-collocation points is infinite, which complicates error estimation. To address this, a distinct non-collocation point is selected within each time interval for error analysis. We adopt the same approach as Li et al. [29] and use non-collocation points for error estimation. Consequently, the error at the non-collocation point of each finite element is expressed in terms of the position of the non-collocation point on the i-th finite element and the value of the state variable obtained from the discretized grid after adding a grid point at that non-collocation point.
3. Description and Improvement of the Bilevel Method
In the work by Li et al. [29], a bilevel problem was proposed to determine the number of finite elements, in which the inner layer formulates a dynamic optimization problem that maximizes the error. The error maximization problem can be described as Equation (8), where s represents the maximized error and q denotes the number of components of the state variable x. The upper and lower bounds of the constructed error maximization problem are the same as those of the original problem, with the bounds on the maximized error s being s_max = 1 and s_min = 0, respectively. When solving this inner-layer problem with a nonlinear solver, the presence of non-smooth absolute value and maximum operators in Equation (8) poses a challenge that derivative-based optimization methods cannot handle directly. Therefore, it is necessary to smooth out these non-smooth terms.
In the method proposed by Li et al. [29], a softmax function is utilized to smooth out these two non-smooth constraints:
However, during the calculations, it was observed that a significant error is introduced when smoothing is performed using Equation (9). For instance, when Equation (9) is applied for smoothing, the resulting function graph can be compared with that of the original function, as shown in Figure 1.
In Figure 1, the absolute value function |t| is smoothed using Equation (9). It can be observed that the smoothing effect is not ideal, especially near zero, where the smoothed function values deviate significantly from the original values. Moreover, the errors computed by the inner method are often very small, so when Equation (9) is applied for smoothing, these small errors are substantially overestimated and fail to reflect the actual error values. Consequently, using Equation (9) to smooth functions containing absolute value operators is not advisable. Additionally, the removal of the maximum operator in Equation (9) also introduces errors that need to be addressed. Hence, this paper proposes a new smoothing strategy that better approximates the actual results.
In Equation (10), two non-negative functions are used to replace the absolute value function [33]. Furthermore, the approach of eliminating the max function using Equation (11) is shown to be entirely equivalent and does not introduce errors. In practical applications, Equation (12) is adopted, in which the parameter is a very small positive number. Compared to Equation (9), the smoothing errors generated by using Equations (10) and (12) are smaller. This enables the inner method to better reflect the actual error values during the error maximization process.
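To illustrate the difference between the two smoothing routes, the sketch below compares a log-sum-exp (softmax-type) approximation of the absolute value with the split into two non-negative parts whose product is relaxed by a small constant. The exact expressions of Equations (9), (10), and (12) are not reproduced here; the two functions are written under these assumed forms.

```python
import numpy as np

def smooth_abs_softmax(t, rho=100.0):
    # Softmax-type smoothing of |t| (illustrative form): (1/rho) * ln(exp(rho*t) + exp(-rho*t)).
    # At t = 0 this returns ln(2)/rho instead of 0, so tiny errors are overestimated.
    return np.logaddexp(rho * t, -rho * t) / rho

def split_abs(t, eps=1e-9):
    # Split |t| into two non-negative parts p, n with t = p - n and p*n = eps
    # (a small relaxation of the complementarity condition); then |t| is approximated by p + n.
    p = 0.5 * (np.sqrt(t**2 + 4.0 * eps) + t)
    n = 0.5 * (np.sqrt(t**2 + 4.0 * eps) - t)
    return p + n

for t in [0.0, 1e-4, 1e-2, 1.0]:
    print(f"t = {t:8.1e}  |t| = {abs(t):8.1e}  "
          f"softmax = {smooth_abs_softmax(t):8.2e}  split = {split_abs(t):8.2e}")
```

For inputs on the order of the smoothing parameter, the softmax-type approximation returns values near ln(2)/rho regardless of the true magnitude, whereas the split form stays close to |t|, which is why it better preserves the small error values handled by the inner method.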
In contrast to the inner method, the design of the outer method is relatively concise. It takes the maximum error computed by the inner method and compares it with the given tolerance; based on this comparison, the positions of the finite element nodes are updated. This can be expressed as follows:
where l denotes the iteration number and N represents the size of the discretized grid. N is updated after each iteration as N = N + 1. Then, based on the updated value of N, the time domain is uniformly partitioned to create a new distribution of finite elements. If the maximum non-collocation point error is less than the user-defined tolerance, then N is taken as the minimum number of finite elements.
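A minimal sketch of this outer iteration is given below. The function max_noncollocation_error is a placeholder for the inner error-maximization problem (its name and signature are assumptions made for illustration); the loop simply increases N and re-partitions the time domain uniformly until the returned maximum error meets the tolerance.

```python
import numpy as np

def outer_loop(max_noncollocation_error, tol, n_start=2, n_max=200):
    """Increase the number of finite elements until the inner maximum error meets `tol`.

    `max_noncollocation_error(grid)` stands in for the inner layer: it should solve the
    error-maximization problem on the given grid and return the largest
    non-collocation-point error.
    """
    n = n_start
    while n <= n_max:
        grid = np.linspace(0.0, 1.0, n + 1)   # uniform partition of the normalized time domain
        err = max_noncollocation_error(grid)
        if err < tol:
            return n, grid                    # minimum number of finite elements found
        n += 1                                # N = N + 1 and repeat
    raise RuntimeError("tolerance not reached within n_max elements")

# Usage with a dummy inner layer whose error decays like (element length)^3:
n_min, grid = outer_loop(lambda g: np.max(np.diff(g)) ** 3, tol=1e-4)
print(n_min)
```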
4. Finite Element Allocation Strategy Based on Density Function
The grid density function is a non-negative function f(t) defined on the interval [a, b] whose integral over [a, b] equals 1. According to this definition, any non-negative function with only a finite number of zeros within its domain can serve as a standard grid density function:
Assuming that any grid density function used for finite element refinement has been normalized, i.e., the domain of its cumulative distribution function F has been transformed from [a, b] to [0, 1], the calculation method is as follows:
Clearly, F(0) = 0 and F(1) = 1. The time domain [a, b] can be normalized to [0, 1] and discretized into a grid of size N + 1 with grid points t_0, t_1, …, t_N, where t_0 = 0 and t_N = 1. Given a grid density function, the position of the i-th point in the grid can be represented as follows:
Therefore, a new finite element distribution can be generated from the grid density function, with the local density of the finite element endpoints determined entirely by the magnitude of the density function.
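The sketch below carries out this construction numerically under simple assumptions: the density is sampled on a fine auxiliary grid over [0, 1], the cumulative distribution F is obtained by trapezoidal integration, and each endpoint t_i is found by inverting F at the level i/N. The sampling resolution and the example density are illustrative choices, not part of the method itself.

```python
import numpy as np

def endpoints_from_density(density, n_elements, n_samples=2001):
    """Place n_elements + 1 endpoints on [0, 1] so that F(t_i) = i / n_elements.

    `density` is any non-negative function on [0, 1]; it is normalized internally.
    """
    s = np.linspace(0.0, 1.0, n_samples)
    f = np.maximum(density(s), 0.0)
    # Cumulative distribution by trapezoidal integration, normalized so that F(1) = 1.
    F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(s))))
    F /= F[-1]
    levels = np.linspace(0.0, 1.0, n_elements + 1)   # equal increments of 1/N
    return np.interp(levels, F, s)                    # invert F by interpolation

# Example: a density peaked near t = 0.3 concentrates endpoints around that region.
grid = endpoints_from_density(lambda t: 1.0 + 20.0 * np.exp(-200.0 * (t - 0.3) ** 2), 10)
print(np.round(grid, 3))
```

Because F is inverted at equal increments, more endpoints accumulate where the density, and hence the slope of F, is large.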
By definition, the grid density function only needs to be non-negative (and integrable); a well-chosen density function for computing the grid distribution can improve both the accuracy and the efficiency of the solution. In this paper, the calculation and selection of the density function differ from the approach of Zhao et al. [20]. After solving with the inner method, the errors at the non-collocation points of each finite element are known. Comparing these errors reveals how the non-collocation-point errors are distributed over the elements and identifies the segments that do not meet the user-defined tolerance. Segments with large non-collocation-point errors can then be refined further so that the discretization errors at the non-collocation points meet the user-defined tolerance in subsequent solves. However, refining only the regions with large errors does not guarantee that the non-collocation-point errors over the entire time domain will meet the specified tolerance. Previous experimental results suggest that refining only the segments that violate the error criterion can cause new non-collocation points to violate the tolerance within previously compliant segments. Therefore, the grid update cannot simply focus on the segments whose non-collocation points violate the specified tolerance after a single solve.
The errors at the non-collocation points on each finite element, obtained by solving the inner problem, can be processed as follows. The error ratio of each finite element is calculated by dividing the error at the non-collocation point on that element by the sum of the absolute values of the errors at the non-collocation points of all elements. These ratios, one per element, are used as indicators to construct the error function e(t); e(t) is viewed as piecewise constant over the entire time domain, its value on each finite element being the corresponding error ratio. The grid density function f(t) can then be derived from the error function e(t).
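A minimal sketch of this step, assuming the inner layer has already produced one non-collocation-point error per element: the errors are normalized into ratios, which become the piecewise-constant values of e(t), and the result is returned as a callable density.

```python
import numpy as np

def piecewise_density_from_errors(grid, errors):
    """Build a piecewise-constant density on [grid[0], grid[-1]] from per-element errors.

    `errors[i]` is the non-collocation-point error on element [grid[i], grid[i+1]].
    Returns a callable f(t) equal to the error ratio of the element containing t.
    """
    ratios = np.abs(errors) / np.sum(np.abs(errors))      # error ratio per element
    def f(t):
        idx = np.clip(np.searchsorted(grid, t, side="right") - 1, 0, len(ratios) - 1)
        return ratios[idx]
    return f

# Example: five uniform elements with one large error; the density is largest there.
grid = np.linspace(0.0, 1.0, 6)
errors = np.array([1e-6, 2e-6, 5e-4, 3e-6, 1e-6])
f = piecewise_density_from_errors(grid, errors)
print(f(np.array([0.1, 0.5, 0.9])))
```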
Similarly, the grid density function f(t) is also a piecewise function. Although the piecewise form of f(t) may lose some accuracy during grid point allocation compared with the density function constructed from cubic splines by Zhao et al. [20], the computational cost is significantly reduced, which enhances the overall efficiency of the algorithm. Furthermore, as demonstrated in the subsequent cases, it still achieves the goal of finite element refinement.
When the finite element endpoints are computed from the grid density function, the differences between the cumulative distribution function values at adjacent finite element endpoints are all equal to 1/N. Thus, during each grid update, the updated positions of the finite element endpoints can be calculated from the density function.
This study utilizes the grid density function to compute the finite element endpoints t_0, t_1, …, t_N, where t_0 = 0 and t_N = 1:
According to Formula (18), the updated finite element endpoints can be calculated using the grid density function. However, the resulting distribution of finite elements cannot be used directly in the subsequent iterations. In practice, the computed distribution often fails to meet the requirements and violates the principle of finite element allocation using density functions, namely that more finite elements should be allocated in regions with higher density function values. During the endpoint calculation, the distribution of finite elements from the previous iteration affects the allocation result of the next iteration. Because the grid points are determined by the cumulative distribution function, and the element lengths obtained in successive iterations are not uniform, the element lengths and the cumulative distribution function become coupled. As a result, finite elements that should have been refined further may not be refined adequately: if the corresponding segment was already small in the previous distribution, its contribution to the cumulative distribution function decreases, and fewer endpoints are placed there. This prevents further refinement of finite elements with higher density values. Conversely, for finite elements with initially lower density values, if the element lengths were larger in the previous allocation, their cumulative contribution increases, leading to the allocation of more endpoints. Based on these observations, this study improves the generation of finite element endpoints. The algorithm for updating the finite element endpoints is as follows (a sketch of the complete procedure is given after the steps):
- (1) First, record the finite element distribution obtained from the previous solution as the original distribution, as shown in Figure 2, and derive the error density function from the errors at the non-collocation points on each segment of the original distribution.
- (2) Uniformly redistribute the entire time domain without changing the number of finite elements, so that every finite element has equal length, as illustrated in Figure 3.
- (3) Under this uniform distribution of finite elements, construct the grid density function based on Formula (17):
- (4) Based on the values of the grid density function, calculate new finite element endpoints within the uniformly distributed elements using Formula (19), and record the relative position of each new endpoint within its segment.
- (5) Map the finite element endpoints from the uniform distribution back to the original finite element distribution according to their relative positions within each segment, which yields the updated finite element distribution. For example, if a grid point in the i-th segment of the uniform grid is located at 2/5 of that segment’s length, it is mapped back to the 2/5 position of the i-th segment of the non-uniform grid, giving the location of the newly generated grid point, as shown in Figure 4.
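The following sketch combines steps (2) through (5) under the same assumptions as the earlier sketches: the per-element error ratios define a piecewise-constant density on a uniform auxiliary grid, new endpoints are placed by inverting its cumulative distribution at equal increments, and each new endpoint is mapped back to the original, non-uniform grid by its relative position inside the segment. The time domain is assumed to be normalized to [0, 1], and the density floor anticipates the modification discussed next.

```python
import numpy as np

def update_grid(original_grid, errors, density_floor=1e-3, n_samples=2001):
    """One grid-update step on a time domain normalized to [0, 1]:
    per-element errors -> piecewise-constant density on a uniform auxiliary grid ->
    new endpoints by inverting the cumulative distribution -> map back to the original grid."""
    n = len(original_grid) - 1
    uniform = np.linspace(0.0, 1.0, n + 1)                      # step (2): equal-length elements

    # Step (3): density values are the error ratios, floored by a small constant so that
    # previously compliant regions still receive endpoints (see Formula (20) below).
    ratios = np.abs(errors) / np.sum(np.abs(errors))
    ratios = np.maximum(ratios, density_floor)

    # Step (4): cumulative distribution of the piecewise-constant density, then place
    # endpoints at equal increments 1/n of the cumulative distribution.
    s = np.linspace(0.0, 1.0, n_samples)
    elem = np.clip(np.searchsorted(uniform, s, side="right") - 1, 0, n - 1)
    f = ratios[elem]
    F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(s))))
    F /= F[-1]
    new_uniform_pts = np.interp(np.linspace(0.0, 1.0, n + 1), F, s)

    # Step (5): map each new endpoint back to the original (non-uniform) grid by the
    # relative position it occupies inside its uniform-grid segment.
    idx = np.clip(np.searchsorted(uniform, new_uniform_pts, side="right") - 1, 0, n - 1)
    rel = (new_uniform_pts - uniform[idx]) / (uniform[idx + 1] - uniform[idx])
    return original_grid[idx] + rel * (original_grid[idx + 1] - original_grid[idx])

# Example: refine around the third of five elements, where the error is largest.
old_grid = np.array([0.0, 0.15, 0.35, 0.6, 0.8, 1.0])
errors = np.array([1e-6, 2e-6, 5e-4, 3e-6, 1e-6])
print(np.round(update_grid(old_grid, errors), 3))
```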
In generating the grid density function from the error function, extreme cases may arise. For instance, the errors at the non-collocation points of certain finite elements may be extremely small, leading to very small density function values on these elements, so that no new finite element endpoints are allocated in these regions during the subsequent refinement. While this aligns with the original intent of the density function design, namely allocating fewer finite elements in regions with small errors, difficulties arise when too many neighboring finite elements have very small density values. Some endpoints should still be allocated to the finite element regions that previously met the tolerance requirements; otherwise, the tolerance may be violated again in those regions after the grid update. This is an undesired outcome, which motivates the modification in Formula (20): if the density function is less than a specified threshold, it is replaced by a relatively small constant, preventing finite elements that originally met the error requirements from becoming non-compliant again after the grid update. In conclusion, this section has presented a new approach to constructing the grid density function and has improved the calculation of the finite element endpoints from the grid density function.
7. Conclusions
In solving dynamic optimization problems with a direct approach, the efficiency and accuracy of the solution can be enhanced by optimizing the number and distribution of the finite elements used in the discretization. In this study, an improved bilevel method was used to select the number of finite elements, and the error information generated by the inner method was used to construct a grid density function for refining the finite elements. The experimental cases demonstrate that, compared with traditional uniform distributions of finite elements, the proposed approach can significantly reduce the size of the solved problem while meeting user-specified accuracy requirements. Allocating the finite elements with a grid density function required significantly fewer finite elements than allocation without one. For example, in Case 1, at the first user-defined tolerance considered, the method of Li et al. requires 10 finite elements to meet the tolerance, whereas only seven are needed when the density function is used for allocation. When the tolerance requirement is tightened, Li et al.'s method requires 37 finite elements, while the density-function allocation requires only 21.
Moreover, by improving the smoothing strategy of the inner method within the bilevel framework, the errors near zero caused by the soft maximum function were eliminated, making the subsequent construction of the grid density function from the non-collocation-point errors more precise. This improvement also increased the efficiency of solving the inner problem. There is still room for improvement in the proposed method: in future work, more attention can be paid to the errors of the other state variables at the non-collocation points in order to construct a joint error function, leading to a more accurate grid density function.