Article

An Analysis of the Directional-Modifier Adaptation Algorithm Based on Optimal Experimental Design

Department of Signals and Systems, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
Processes 2017, 5(1), 1; https://doi.org/10.3390/pr5010001
Submission received: 1 November 2016 / Revised: 28 November 2016 / Accepted: 15 December 2016 / Published: 22 December 2016
(This article belongs to the Special Issue Real-Time Optimization)

Abstract

The modifier approach has been extensively explored and offers a theoretically sound and practically useful way to deploy real-time optimization. The recent directional-modifier adaptation algorithm offers a practical heuristic to tackle the modifier approach. Supported by strong theoretical properties and easy to deploy in practice, the directional-modifier adaptation algorithm strikes a meaningful compromise between process optimality and quickly improving the quality of the estimation of the gradient of the process cost function. This paper proposes a novel view of the directional-modifier adaptation algorithm, as an approximation of the optimal trade-off between the underlying experimental design problem and the process optimization problem. It moreover suggests a minor modification in the tuning of the algorithm, so as to make it a more genuine approximation.

1. Introduction

Real-Time Optimization (RTO) aims at improving the performance and safety of industrial processes by continually adjusting their inputs, i.e., the degrees of freedom defining their operating conditions, in response to disturbances and process variations. RTO makes use of both model-based and model-free approaches. Model-free approaches have the clear advantage of being less labor intensive, as a model of the process is not needed, but the growing number of inputs that can be adjusted when running a process has made them increasingly less attractive.
Model-based techniques have received increasing interest as the capability of running large amounts of computation online has become standard. Arguably, the most natural approach to model-based RTO is the two-step approach, where model parameter estimation and model-based optimization are alternated so as to refine the process model and adapt the operational parameters accordingly [1,2]. Unfortunately, the two-step approach requires the process model to satisfy very strict criteria in order for the scheme to reach optimality [3,4]. This issue is especially striking in the case of structural mismatch between the model and the process, and it can make the two-step scheme ineffective or even counterproductive [5,6,7].
The idea of adapting not only the model parameters, but also the gradient of the cost function, can be traced back to [8] and allows for guaranteeing that the resulting scheme reaches optimality upon convergence [7,9,10]. Unlike the two-step approach, adapting the gradient of the cost function allows one to tackle structural model-plant mismatches, which cannot be addressed efficiently via the adaptation of the model parameters alone. The original idea has been further improved; see, e.g., [7,9,10,11,12,13]. These contributions have converged to the modern Modifier Adaptation (MA) approach, which has been successfully deployed on several industrial processes; see [14,15,16,17]. The MA approach has recently been further developed along a number of interesting directions; see [16,18,19,20,21].
In a run-to-run scenario where estimations of the uncertain parameters are carried out after every run, the input applied at any given run does not only affect the process performance for that run, but also influences the performance of the subsequent runs through the estimation of the process parameters. This observation is valid whenever parameter estimation is performed between runs, and it pertains to the MA approach. Taking this influence into account leads one to possibly depart from applying to the process an input that is optimal according to the best available estimation of the parameters at the time, and to adopt instead an input that strikes a compromise between process optimality and gathering relevant information for the next parameter estimation. In that sense, the MA approach can be construed as a mix of an optimization problem and an experimental design problem. The problem of tailoring experimental design specifically for optimization in a computationally tractable way has recently been studied in [22], where the problem of designing inputs for a process so as to gather relevant information for achieving process optimality is tackled via an approximate optimality loss function.
The recently proposed Directional-Modifier Adaptation (DMA) algorithm [23,24] and its earlier variant, the dual-modifier adaptation approach [25], offer a practical way for the MA approach to deal with the compromise between process optimality and gaining information. Indeed, at each process run, the DMA algorithm delivers an input that seeks a compromise between maximizing the process performance and promoting the quality of the estimation of the process gradients. The DMA approach handles this compromise by adopting inputs that depart from the nominal ones in the directions corresponding to the largest covariance in the estimated gradient of the process Lagrange function. The DMA algorithm is easy to deploy and has strong theoretical properties, e.g., it converges rapidly and with guarantees to the true process optimum. The DMA algorithm additionally makes use of iterative schemes to update the modifiers used in the cost model, so as to reduce the computational burden of performing classical gradient estimations.
In this paper, we propose to construct the DMA algorithm from a different angle, based on a modification of the optimality loss function [22]. This construction delivers new theoretical insights into the DMA algorithm and suggests minor modifications that make the DMA algorithm a more genuine approximation of the optimal trade-off between process optimality and excitation. For the sake of simplicity, we focus on the unconstrained case, though the developments can arguably be naturally extended to constrained problems.
The paper is structured as follows. Section 2 provides some preliminaries on the selection of an optimality loss function for the considered experimental design problem and proposes a computationally tractable approximation, following similar lines as [22]. Section 3 investigates the MA approach as a special case of the previous developments, tackles it within the proposed theoretical framework and shows that the resulting algorithm has the same structure as the DMA algorithm, but with some notable differences. Simple examples are presented throughout the text to illustrate and support the concepts presented.

2. Optimal Experimental Design

In this paper, we consider the problem of optimizing a process in a run-to-run fashion. The process is described via the cost function $\phi(u, p)$, where $u$ gathers the set of inputs, or degrees of freedom, available to steer the process, and $p$ gathers the parameters available to adjust the cost function using the measurements gathered on the plant. The function $\phi$ is assumed to be everywhere defined and smooth. This assumption is arguably not required, but it will make the subsequent analysis less involved. The N-run optimization problem can then be formulated as:
$$\min_{u_{0,\dots,N-1}}\;\; \frac{1}{N}\sum_{k=0}^{N-1} \phi(u_k, p), \tag{1}$$
where $u_k$ is the vector of decision variables applied at run $k$. Here, we seek the minimization of the average process performance over the N runs. The cost function $\phi(u, p)$ associated with the process is not available in practice, such that at any run $k$, the input $u_k$ is typically chosen according to the best parameter estimation $\hat p_k$ available at that time. It is important to observe here that the parametric cost function (1) encompasses not only parametric mismatch between the plant and the model, but also any structure adjusting the cost function according to the data, such as the MA approach; see Section 3.1. Ideally, one ought to seek to solve the problem:
$$\min_{u_{0,\dots,N-1}}\;\; \frac{1}{N}\sum_{k=0}^{N-1} \mathbb{E}_{\hat p_k}\!\left[ \phi(u_k, \hat p_k) \right], \tag{2}$$
where $\mathbb{E}_{\hat p_k}$ stands for the expected value over $\hat p_k$. For the sake of simplicity, we will focus in this paper on the two-run problem, i.e., using $N = 2$ in Problem (2). In the following, we will assume that there exists a vector of parameters $p_{\text{real}}$ for which $\phi(u_k, p_{\text{real}})$ captures effectively the cost function of the real process. This assumption is locally fulfilled, up to a constant term, by the MA approach.
When estimations of the parameters $\hat p_k$ are conducted between the runs using the latest measurements gathered on the process, a difficulty in using (2) stems from the fact that it can yield an inadequate sequence of decisions $u_{0,\dots,N-1}$. We motivate this statement next, via a simple example.

2.1. Failure of Problem (2): An Example

Consider the optimization model $\phi(u, p) = u^2 + p^2$, yielding the two-run problem:
$$\min_{u_{0,1}}\;\; \frac{1}{2}\sum_{k=0}^{1} \mathbb{E}_{\hat p_k}\!\left[ u_k^2 + \hat p_k^2 \right] = \min_{u_{0,1}}\;\; \frac{1}{2}\sum_{k=0}^{1}\left( u_k^2 + \Sigma_k + \mu_k^2 \right), \tag{3}$$
where $\Sigma_k$ is the covariance of the estimation of parameter $\hat p_k$ and $\mu_k$ its expected value. If the distribution of the estimated parameter $\hat p_1$ is independent of the input $u_0$, then Problem (3) takes the trivial solution $u_{0,1} = 0$, which yields the best performance on the real cost function $\phi(u, p_{\text{real}})$, regardless of the actual parameter value $p_{\text{real}}$ or of its estimated value $\hat p_0$ available for deciding the input $u_0$. However, since the estimated parameter $\hat p_1$ is obtained from the run based on $u_0$, it is in fact not independent of the decision variables. Indeed, let us assume that the estimation of $\hat p_1$ is provided between the two runs via the least-squares fitting problem:
$$\hat p_1 = \arg\min_p\;\; \frac{1}{2}\left\| p - \hat p_0 \right\|_{\Sigma_0^{-1}}^2 + \frac{1}{2}\left\| y(u_0, p) - y_0^{\text{meas}} \right\|_{\Sigma_{\text{meas}}^{-1}}^2, \tag{4}$$
where $y_0^{\text{meas}} \in \mathbb{R}^m$ gathers the measurements taken on the process during or after the run based on $u_0$, $y(u, p)$ is the corresponding measurement model, $\Sigma_{\text{meas}}$ is the covariance of the measurement noise and $\Sigma_0$ the covariance associated with the parameter estimation $\hat p_0$. Consider then the measurement model:
$$y(u, p) = p\, u. \tag{5}$$
The solution to (4) is then explicitly given by:
$$\hat p_1 = \left( \Sigma_0^{-1} + \Sigma_{\text{meas}}^{-1}\, u_0^2 \right)^{-1}\left( \Sigma_0^{-1}\, \hat p_0 + \Sigma_{\text{meas}}^{-1}\, u_0\, y_0^{\text{meas}} \right). \tag{6}$$
Assuming that $\mu_{\hat p_0} = p_{\text{real}} = 0$ and $\mathbb{E}\!\left[ y_0^{\text{meas}} \right] = 0$, we then observe that, if the measurement noise is independent between the various runs, we have:
$$\Sigma_1 = \left( \Sigma_0^{-1} + \Sigma_{\text{meas}}^{-1}\, u_0^2 \right)^{-1}. \tag{7}$$
After removing the constant terms, Problem (3) becomes:
$$\min_{u_0, u_1}\;\; \frac{1}{2}\left( u_0^2 + u_1^2 \right) + \frac{1}{2}\left( \Sigma_0^{-1} + \Sigma_{\text{meas}}^{-1}\, u_0^2 \right)^{-1}. \tag{8}$$
An interesting situation occurs for $\Sigma_{\text{meas}} \le \Sigma_0^2$, i.e., when the covariance of the measurements is sufficiently low; see Figure 1. The solution to (8) then reads as:
$$u_0 = \pm\left( \Sigma_{\text{meas}}^{1/2} - \Sigma_{\text{meas}}\, \Sigma_0^{-1} \right)^{1/2}, \qquad u_1 = 0, \tag{9}$$
while the sequence $u_0 = u_1 = 0$ should clearly be used in order to minimize the cost of the real two-run process, even in the sense of the expected value. This trivial example illustrates a fundamental limitation of Problem (2) in successfully achieving the goal of minimizing the cost over a two- or N-run process.
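As a quick numerical illustration of this effect, the short Python sketch below (not from the paper; NumPy is assumed, and the values $\Sigma_0 = 1$, $\Sigma_{\text{meas}} = 0.25$ are arbitrary choices satisfying $\Sigma_{\text{meas}} < \Sigma_0^2$) evaluates the $u_0$-dependent part of the cost of Problem (8) on a grid and compares the grid minimizer with the closed-form solution (9).

```python
import numpy as np

def cost_eq8(u0, sigma0=1.0, sigma_meas=0.25):
    # u0-dependent part of the cost of Problem (8); the u1-part is minimized by u1 = 0
    return 0.5 * u0**2 + 0.5 / (1.0 / sigma0 + u0**2 / sigma_meas)

sigma0, sigma_meas = 1.0, 0.25          # sigma_meas < sigma0**2 (assumed values)
u_grid = np.linspace(-2.0, 2.0, 4001)
u_num = u_grid[np.argmin(cost_eq8(u_grid, sigma0, sigma_meas))]

# Closed-form non-zero solution (9)
u_ana = np.sqrt(np.sqrt(sigma_meas) - sigma_meas / sigma0)

print(f"grid minimizer  : +/-{abs(u_num):.4f}")
print(f"closed form (9) : +/-{u_ana:.4f}")
```

For these values, both deliver $u_0 \approx \pm 0.5$, i.e., the two-run formulation (2) indeed prefers a non-zero excitation even though $u_0 = 0$ is optimal for the real process.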

2.2. Modified Optimality Loss Function

A sensible approach, inspired by the work presented in [22], consists of selecting the input $u_0$ according to:
$$u_0 = \arg\min_u\;\; \mathbb{E}_e\!\left[ \phi(u, p_{\text{real}}) + \phi\!\left( u^*(\hat p_1), p_{\text{real}} \right) \right] = \arg\min_u\;\; \mathbb{E}_e\Big[ \phi(u, p_{\text{real}}) + \underbrace{\phi\!\left( u^*(\hat p_1), p_{\text{real}} \right) - \phi\!\left( u^*(p_{\text{real}}), p_{\text{real}} \right)}_{\Delta_0} \Big], \tag{10}$$
where $\Delta_0$ is labeled the optimality loss function and $e$ gathers the noise on the estimation of the process parameters and the measurement noise, i.e.:
$$\hat p_0 = p_{\text{real}} + e_0, \qquad y^{\text{meas}} = y(u_0, p_{\text{real}}) + e_1. \tag{11}$$
Problem (10) seeks a compromise between the expected process performance at the coming run, via the first term in (10), and the expected process performance at the subsequent run, via the second term. The performance of the second term depends on the input selected in the first run via the parameter estimation performed between the two runs. We assume hereafter that $e$ follows a normal, centered distribution, and we use for the estimated parameter $\hat p_1$ the least-squares fitting problem:
$$\hat p_1 = \arg\min_p\;\; \frac{1}{2}\left\| p - \left( p_{\text{real}} + e_0 \right) \right\|_{\Sigma_0^{-1}}^2 + \frac{1}{2}\left\| y(u_0, p_{\text{real}}) + e_1 - y(u_0, p) \right\|_{\Sigma_1^{-1}}^2, \tag{12}$$
where $\hat p_1$ is the parameter estimation following the first run. The optimality loss function $\Delta_0$ proposed in [22] was designed for the specific purpose of performing experimental design dedicated to capturing the process parameters most relevant for process optimization. However, it was not designed to be used within the two-run problem considered here. In this paper, we propose to use a slightly modified version of (10), so as to avoid a potential difficulty it poses. For the sake of brevity, and in order to skip elaborate technical details, let us illustrate this difficulty via the following simple example. Consider the cost function and measurement model:
$$\phi(u, p) = \frac{1}{2}\left( u - p \right)^2, \qquad y(u, p) = u\, p. \tag{13}$$
The least-squares problem (12) reads as:
$$\hat p_1 = \arg\min_p\;\; \frac{1}{2}\left\| p - \left( p_{\text{real}} + e_0 \right) \right\|_{\Sigma_0^{-1}}^2 + \frac{1}{2}\left\| u_0\, p_{\text{real}} + e_1 - u_0\, p \right\|_{\Sigma_1^{-1}}^2, \tag{14}$$
which takes the explicit form:
$$\hat p_1 = p_{\text{real}} + \frac{e_0\,\Sigma_1 + e_1\,\Sigma_0\, u_0}{\Sigma_0\, u_0^2 + \Sigma_1}. \tag{15}$$
The optimality loss function $\Delta_0$ then reads as:
$$\Delta_0(u_0) = \phi\!\left( u^*(\hat p_1), p_{\text{real}} \right) - \phi\!\left( u^*(p_{\text{real}}), p_{\text{real}} \right) = \frac{1}{2}\left( \frac{e_0\,\Sigma_1 + e_1\,\Sigma_0\, u_0}{\Sigma_0\, u_0^2 + \Sigma_1} \right)^2, \tag{16}$$
and has the expected value:
$$\mathbb{E}_e\!\left[ \Delta_0 \right] = \frac{1}{2}\,\frac{\Sigma_0\,\Sigma_1}{\Sigma_0\, u_0^2 + \Sigma_1}. \tag{17}$$
It is worth observing that a similar optimality loss function has also been used in [25] in order to quantify the loss of optimality resulting from uncertain parameters. Problem (10) can then be equivalently written as:
$$u_0 = \arg\min_u\;\; \phi(u, p_{\text{real}}) + \mathbb{E}_e\!\left[ \Delta_0 \right]. \tag{18}$$
However, since in practice $p_{\text{real}}$ is not available to solve Problem (18), a surrogate problem must be solved, using $p_{\text{real}} \approx \hat p_0$. It reads as:
$$u_0 = \arg\min_u\;\; \phi(u, \hat p_0) + \mathbb{E}_e\!\left[ \Delta_0 \right] = \arg\min_u\;\; \phi(u, p_{\text{real}} + e_0) + \mathbb{E}_e\!\left[ \Delta_0 \right]. \tag{19}$$
An issue occurs here, which is illustrated in Figure 2. Because the expected value of the optimality loss function computed in a stand-alone fashion in (17) misses the correlation between the control input $u_0$ and the initial estimation error $e_0$ that arises via the optimization problem (19), using (19) as a surrogate for (18) can be counterproductive, in the sense that the resulting performance, evaluated with the metric of (18), is worse than that of the nominal problem.
In this paper, we address this issue by taking an approach to the optimality loss function that departs slightly from (16).
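The detrimental effect described above can be checked numerically. The sketch below (an illustrative Python/SciPy script, not taken from the paper; $p_{\text{real}} = 0$, $\Sigma_0 = 1$ and $\Sigma_1 = 0.1$ are arbitrary assumptions) draws samples of $e_0$ and $e_1$ for the example (13), solves the surrogate problem (19) for each realization of $\hat p_0 = p_{\text{real}} + e_0$, and compares the resulting expected two-run cost (10) with the one obtained from the nominal input $u_0 = \hat p_0$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
p_real, Sig0, Sig1 = 0.0, 1.0, 0.1      # illustrative values (assumed)
N = 5000

def input_from_19(p_hat0):
    # Surrogate problem (19): phi(u, p_hat0) + E_e[Delta_0], with E_e[Delta_0] from (17)
    obj = lambda u: 0.5 * (u - p_hat0)**2 + 0.5 * Sig0 * Sig1 / (Sig0 * u**2 + Sig1)
    return minimize_scalar(obj, bounds=(-10, 10), method="bounded").x

def two_run_cost(u0, e0, e1):
    # Cost (10) for the example (13): u1 = u*(p_hat1) = p_hat1, with p_hat1 from (15)
    p_hat1 = p_real + (e0 * Sig1 + e1 * Sig0 * u0) / (Sig0 * u0**2 + Sig1)
    return 0.5 * (u0 - p_real)**2 + 0.5 * (p_hat1 - p_real)**2

e0 = rng.normal(0.0, np.sqrt(Sig0), N)
e1 = rng.normal(0.0, np.sqrt(Sig1), N)
cost_nom = np.mean([two_run_cost(p_real + a, a, b) for a, b in zip(e0, e1)])
cost_19  = np.mean([two_run_cost(input_from_19(p_real + a), a, b) for a, b in zip(e0, e1)])
print(f"expected cost (10), nominal u0 = p_hat0 : {cost_nom:.4f}")
print(f"expected cost (10), u0 from (19)        : {cost_19:.4f}")
```

Depending on the noise levels, the input delivered by (19) can indeed perform worse than the nominal one, which is the effect shown in the right graph of Figure 2.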

2.3. Problem Formulation

For a given initial estimation $\hat p_0$, initial estimation error $e_0$ and measurement error $e_1$, and using $p_{\text{real}} = \hat p_0 - e_0$, the estimation problem solved after the first run can be formulated as:
$$\hat p_1(u_0, \Sigma, \hat p_0, e) = \arg\min_p\;\; \frac{1}{2}\left\| p - \hat p_0 \right\|_{\Sigma_0^{-1}}^2 + \frac{1}{2}\left\| y(u_0, p) - y\!\left( u_0, \hat p_0 - e_0 \right) - e_1 \right\|_{\Sigma_1^{-1}}^2, \tag{20}$$
where we use the notation:
$$e = \begin{bmatrix} e_0 \\ e_1 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_0 & 0 \\ 0 & \Sigma_1 \end{bmatrix}, \tag{21}$$
and consider $e_0$, $e_1$ to be uncorrelated. Defining:
$$\hat u_1^*(u_0, \Sigma, \hat p_0, e) = u^*\!\left( \hat p_1(u_0, \Sigma, \hat p_0, e) \right), \tag{22}$$
the modified optimality loss function can be formulated as:
$$\Delta(u_0, \Sigma, \hat p_0, e) = \phi\!\left( \hat u_1^*(u_0, \Sigma, \hat p_0, e),\; \hat p_0 - e_0 \right) - \phi\!\left( u^*\!\left( \hat p_0 - e_0 \right),\; \hat p_0 - e_0 \right). \tag{23}$$
This reformulation allows for construing the optimality loss function from the point of view of the experimenter, by considering $\hat p_0$ as a fixed variable arising as a realization of the estimation of the unknown parameter $p_{\text{real}}$, rather than as a stochastic one. In (20) and (23), the actual parameter $p_{\text{real}}$ is then, from the experimenter's point of view, a stochastic variable, reflecting the uncertainty of the experimenter concerning the real parameter. The resulting two-run problem reads as:
$$u_0 = \arg\min_u\;\; \mathbb{E}_e\!\left[ \phi(u, \hat p_0 - e_0) + \Delta(u, \Sigma, \hat p_0, e) \right]. \tag{24}$$
We observe here that the cost function proposed in (24) is different from the original one in (19). From the optimality principle, Problem (24) delivers an expected performance that is no worse than the expected performance yielded by applying the nominal input $u_0 = u^*(\hat p_0)$. A simple example of the proposed optimality-loss approach is provided in Section 2.5. Unfortunately, solving Problem (24) is in general difficult. In the next section, we consider a second-order approximation instead, following a line also adopted in [22].

2.4. Second-Order Approximation of the Modified Optimality Loss Function

The optimality loss function (23) is difficult to use in practice. A second-order approximation of (23) can be deployed as a tractable surrogate in Problem (24). We develop this second-order approximation next. We observe that the following equality trivially holds:
$$\hat p_1(u_0, \Sigma, \hat p_0, 0) = \hat p_0. \tag{25}$$
The sensitivity of the parameter estimation $\hat p_1$ to the errors $e$ can be obtained via the implicit function theorem applied to the fitting problem (20); it reads as:
$$\left. \frac{\partial \hat p_1(u_0, \Sigma, \hat p_0, e)}{\partial e} \right|_{e=0} = F(u_0, \Sigma, \hat p_0)^{-1}\, M(u_0, \hat p_0), \tag{26}$$
where:
$$F(u_0, \Sigma, \hat p_0) = \Sigma_0^{-1} + \frac{\partial y}{\partial p}(u_0, \hat p_0)^{\!\top} \Sigma_1^{-1}\, \frac{\partial y}{\partial p}(u_0, \hat p_0) \tag{27}$$
is the Fisher information matrix of (20), and:
$$M(u_0, \hat p_0) = \begin{bmatrix} -\,\dfrac{\partial y}{\partial p}(u_0, \hat p_0)^{\!\top} \Sigma_1^{-1}\, \dfrac{\partial y}{\partial p}(u_0, \hat p_0) & \;\dfrac{\partial y}{\partial p}(u_0, \hat p_0)^{\!\top} \Sigma_1^{-1} \end{bmatrix} = \begin{bmatrix} \Sigma_0^{-1} - F(u_0, \Sigma, \hat p_0) & \;\dfrac{\partial y}{\partial p}(u_0, \hat p_0)^{\!\top} \Sigma_1^{-1} \end{bmatrix}. \tag{28}$$
We note from optimality that $\Delta \ge 0$ always holds, and:
$$\Delta(u_0, \Sigma, \hat p_0, 0) = 0, \qquad \left. \frac{\partial \Delta(u_0, \Sigma, \hat p_0, e)}{\partial e} \right|_{e=0} = 0, \tag{29}$$
which motivates a second-order approximation of $\Delta$ at $e = 0$. The Taylor expansion of $\Delta$ in $e$ reads as:
$$\Delta(u_0, \Sigma, \hat p_0, e) = \frac{1}{2}\, e^{\top}\, \frac{\partial^2 \Delta(u_0, \Sigma, \hat p_0, 0)}{\partial e^2}\, e + r_3(u_0, \Sigma, \hat p_0, e). \tag{30}$$
We can then form the second-order approximation of the modified optimality loss function Δ.
Lemma 1.
The following equality holds:
$$\frac{\partial^2 \Delta(u_0, \Sigma, \hat p_0, 0)}{\partial e^2} = \left( \frac{\partial \hat p_1}{\partial e} + \frac{\partial e_0}{\partial e} \right)^{\!\top} \phi_{pu}^*\, {\phi_{uu}^*}^{-1}\, \phi_{up}^* \left( \frac{\partial \hat p_1}{\partial e} + \frac{\partial e_0}{\partial e} \right), \tag{31}$$
where we note $\phi_{xx}^* = \phi_{xx}\!\left( u^*(\hat p_0), \hat p_0 \right)$, and all partial derivatives are evaluated at $e = 0$.
Proof. 
We observe that:
$$\frac{\partial^2 \Delta(u_0, \Sigma, \hat p_0, e)}{\partial e^2} = \frac{\partial}{\partial e}\!\left[ \phi_u\!\left( \hat u_1^*, \hat p_0 - e_0 \right) u_p^*(\hat p_1)\, \frac{\partial \hat p_1}{\partial e} - \phi_p\!\left( \hat u_1^*, \hat p_0 - e_0 \right) \frac{\partial e_0}{\partial e} + \phi_u\!\left( u^*(\hat p_0 - e_0), \hat p_0 - e_0 \right) u_p^*(\hat p_0 - e_0)\, \frac{\partial e_0}{\partial e} + \phi_p\!\left( u^*(\hat p_0 - e_0), \hat p_0 - e_0 \right) \frac{\partial e_0}{\partial e} \right],$$
where, for the sake of clarity, the arguments are omitted when unambiguous. Using the fact that $\phi_u\!\left( u^*(\hat p_0), \hat p_0 \right) = 0$, it follows that:
$$\frac{\partial^2 \Delta(u_0, \Sigma, \hat p_0, 0)}{\partial e^2} = \frac{\partial \hat p_1}{\partial e}^{\!\top}\! u_p^{*\top} \phi_{uu}^*\, u_p^*\, \frac{\partial \hat p_1}{\partial e} - \frac{\partial e_0}{\partial e}^{\!\top}\! \phi_{pu}^*\, u_p^*\, \frac{\partial \hat p_1}{\partial e} - \frac{\partial \hat p_1}{\partial e}^{\!\top}\! u_p^{*\top} \phi_{up}^*\, \frac{\partial e_0}{\partial e} + \frac{\partial e_0}{\partial e}^{\!\top}\! \phi_{pp}^*\, \frac{\partial e_0}{\partial e} - \frac{\partial e_0}{\partial e}^{\!\top}\! u_p^{*\top} \phi_{uu}^*\, u_p^*\, \frac{\partial e_0}{\partial e} - \frac{\partial e_0}{\partial e}^{\!\top}\! \phi_{pu}^*\, u_p^*\, \frac{\partial e_0}{\partial e} - \frac{\partial e_0}{\partial e}^{\!\top}\! u_p^{*\top} \phi_{up}^*\, \frac{\partial e_0}{\partial e} - \frac{\partial e_0}{\partial e}^{\!\top}\! \phi_{pp}^*\, \frac{\partial e_0}{\partial e},$$
where all functions are evaluated at $e = 0$. We then use the equality $u_p^* = -{\phi_{uu}^*}^{-1}\phi_{up}^*$ to obtain (31). ☐
In the following, it will be useful to write $\Delta(u_0, \Sigma, \hat p_0, e)$ as:
$$\Delta(u_0, \Sigma, \hat p_0, e) = \frac{1}{2}\,\mathrm{Tr}\!\left( \phi_{pu}^*\, {\phi_{uu}^*}^{-1}\, \phi_{up}^*\; V \right) + r_3(u_0, \Sigma, \hat p_0, e),$$
where we note:
$$V(u_0, \Sigma, \hat p_0, e) = \left( \left. \frac{\partial \hat p_1(u_0, \Sigma, \hat p_0, e)}{\partial e} \right|_{e=0} + \frac{\partial e_0}{\partial e} \right) e\, e^{\top} \left( \left. \frac{\partial \hat p_1(u_0, \Sigma, \hat p_0, e)}{\partial e} \right|_{e=0} + \frac{\partial e_0}{\partial e} \right)^{\!\top}.$$
Using (28), we observe that:
$$\left. \frac{\partial \hat p_1(u_0, \Sigma, \hat p_0, e)}{\partial e} \right|_{e=0} + \frac{\partial e_0}{\partial e} = F(u_0, \Sigma, \hat p_0)^{-1} \begin{bmatrix} \Sigma_0^{-1} & \;\dfrac{\partial y}{\partial p}(u_0, \hat p_0)^{\!\top}\Sigma_1^{-1} \end{bmatrix},$$
such that:
$$\mathbb{E}_e\!\left[ V(u_0, \Sigma, \hat p_0, e) \right] = F^{-1} \begin{bmatrix} \Sigma_0^{-1} & \;\dfrac{\partial y}{\partial p}^{\!\top}\Sigma_1^{-1} \end{bmatrix} \Sigma \begin{bmatrix} \Sigma_0^{-1} & \;\dfrac{\partial y}{\partial p}^{\!\top}\Sigma_1^{-1} \end{bmatrix}^{\!\top} F^{-1} = F(u_0, \Sigma, \hat p_0)^{-1}.$$
It follows that the expected value of the optimality loss function reads as:
$$\mathbb{E}_e\!\left[ \Delta(u_0, \Sigma, \hat p_0, e) \right] = \frac{1}{2}\,\mathrm{Tr}\!\left( \phi_{pu}^*\, {\phi_{uu}^*}^{-1}\, \phi_{up}^*\; F(u_0, \Sigma, \hat p_0)^{-1} \right) + \mathbb{E}_e\!\left[ r_3(u_0, \Sigma, \hat p_0, e) \right]. \tag{37}$$
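As a quick sanity check of the approximation above, the following sketch (illustrative Python, not from the paper) evaluates the trace term of (37) for the scalar example (13), where $\phi_{pu}^*\,{\phi_{uu}^*}^{-1}\,\phi_{up}^* = 1$ and $F(u_0, \Sigma) = \Sigma_0^{-1} + u_0^2\,\Sigma_1^{-1}$, and compares it with the exact expected loss (17); since $\Delta$ is exactly quadratic in $e$ for this example, the two coincide.

```python
import numpy as np

def expected_loss_exact(u0, Sig0, Sig1):
    # Closed-form expected optimality loss (17) for the example (13)
    return 0.5 * Sig0 * Sig1 / (Sig0 * u0**2 + Sig1)

def expected_loss_approx(u0, Sig0, Sig1):
    # Trace term of (37): 0.5 * Tr(phi_pu phi_uu^-1 phi_up F^-1)
    # For phi(u, p) = 0.5*(u - p)^2 and y(u, p) = u*p, the trace factor equals 1
    F = 1.0 / Sig0 + u0**2 / Sig1
    return 0.5 / F

for u0 in [0.0, 0.5, 1.0, 2.0]:
    print(u0, expected_loss_exact(u0, 1.0, 0.1), expected_loss_approx(u0, 1.0, 0.1))
```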
It is useful to observe that even though a modified optimality loss function has been selected here, its approximation (37) is nonetheless very similar to the one proposed in [22]. Hence, the real difference lies in its interpretation as an approximation of the modified function (23) rather than (16). Here, it is useful to introduce the following lemma:
Lemma 2.
If the following conditions hold:
1. the noise $e$ has a multivariate normal, centered distribution;
2. for all $p \in P$, $u^*(p)$ exists, is smooth and unique, and satisfies the Second-Order Sufficient Condition (SOSC) for optimality;
3. the parameter estimation problem (20) has a unique solution $\hat p_1(u_0, \Sigma, \hat p_0, e)$ satisfying SOSC for any $e$, and is smooth and polynomially bounded in $e$;
4. the functions $u^*(p)$, $\hat p_1(u_0, \Sigma, \hat p_0, e)$ and $\phi_{up}$, $\phi_{uu}$ are all bounded by polynomials on their respective domains.
Then, the inequality:
$$\left| \mathbb{E}_e\!\left[ r_3(u_0, \Sigma, \hat p_0, e) \right] \right| \le c\, \left\| \Sigma \right\|^2 \tag{38}$$
holds locally for some constant $c > 0$, where $\|\cdot\|$ is the matrix two-norm.
Proof. 
Because all functions are smooth and bounded by polynomials, the function $\Delta$ is also smooth and bounded by polynomials. It follows that:
$$r_3(u_0, \Sigma, \hat p_0, e) = \Delta(u_0, \Sigma, \hat p_0, e) - \frac{1}{2}\,\mathrm{Tr}\!\left( \phi_{pu}^*\, {\phi_{uu}^*}^{-1}\, \phi_{up}^*\; V \right) \tag{39}$$
is also smooth and polynomially bounded. Additionally, the bound:
$$\left| r_3(u_0, \Sigma, \hat p_0, e) \right| \le c\, \left\| e \right\|^3 \tag{40}$$
holds locally for some $c > 0$, as a result of Taylor's theorem. Inequality (38) then follows directly from Lemma 3. ☐
Lemma 2 appears to be a special case of the delta method [26]. We can now approximate (24) as:
$$\min_{u_0}\;\; \mathbb{E}_e\!\left[ \phi(u_0, \hat p_0 - e_0) \right] + \frac{1}{2}\,\mathrm{Tr}\!\left( \phi_{pu}^*\, {\phi_{uu}^*}^{-1}\, \phi_{up}^*\; F(u_0, \Sigma, \hat p_0)^{-1} \right), \tag{41}$$
using $\phi_{uu}^* = \phi_{uu}\!\left( u^*(\hat p_0), \hat p_0 \right)$ and $\phi_{up}^* = \phi_{up}\!\left( u^*(\hat p_0), \hat p_0 \right)$.
For the sake of clarity, the deployment of Problem (41) in a run-to-run algorithm is detailed in Algorithm 1.

Algorithm 1: Two-run nominal optimal experimental design.
Input: current parameter estimation $\hat p_0$, covariance $\Sigma$.
1. Compute $u^*(\hat p_0)$.
2. Evaluate $\phi_{pu}^*\, {\phi_{uu}^*}^{-1}\, \phi_{up}^*$ and $F$ at $\left( u^*(\hat p_0), \hat p_0 \right)$.
3. Solve:
$$\min_{u_0}\;\; \mathbb{E}_e\!\left[ \phi(u_0, \hat p_0 - e_0) \right] + \frac{1}{2}\,\mathrm{Tr}\!\left( \phi_{pu}^*\, {\phi_{uu}^*}^{-1}\, \phi_{up}^*\; F(u_0, \Sigma, \hat p_0)^{-1} \right).$$
4. Apply $u_0$ to the process, gather the measurements, and perform the parameter estimation update; return the updated $\hat p_0$ and $\Sigma$, and repeat.
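A minimal run-to-run sketch of Algorithm 1, applied to the scalar example of Section 2.5, is given below (illustrative Python/SciPy; the plant parameter $p_{\text{real}}$, the noise level $\Sigma_1$, the initial estimate and the number of runs are assumptions made for the simulation, not values from the paper).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
p_real = 1.0           # unknown "plant" parameter (assumed for the simulation)
Sig1   = 0.1           # measurement-noise variance (assumed)
p_hat, Sig = 0.0, 1.0  # initial estimate p_hat_0 and covariance Sigma_0 (assumed)

for run in range(5):
    # Step 1: nominal optimum u*(p_hat); for phi(u, p) = 0.5*(u - p)^2 it is p_hat
    u_star = p_hat
    # Steps 2-3: solve (41); here phi_pu phi_uu^-1 phi_up = 1 and
    # F(u, Sigma) = 1/Sig + u^2/Sig1, and the constant 0.5*Sig coming from the
    # expectation of phi(u, p_hat - e0) is dropped.
    obj = lambda u: 0.5 * (u - p_hat)**2 + 0.5 / (1.0 / Sig + u**2 / Sig1)
    u0 = minimize_scalar(obj, bounds=(u_star - 5.0, u_star + 5.0), method="bounded").x
    # Step 4: apply u0, measure y = u0 * p_real + noise, update estimate and covariance
    y_meas = u0 * p_real + rng.normal(0.0, np.sqrt(Sig1))
    F = 1.0 / Sig + u0**2 / Sig1
    p_hat = (p_hat / Sig + u0 * y_meas / Sig1) / F
    Sig = 1.0 / F
    print(f"run {run}: u0 = {u0:+.3f}, p_hat = {p_hat:+.3f}, Sigma = {Sig:.4f}")
```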

2.5. Illustrative Example: Observability Problem

We consider again the example (13), i.e.:
$$\phi(u, p) = \frac{1}{2}\left( u - p \right)^2, \qquad y(u, p) = u\, p, \tag{42}$$
where we consider $\hat p_0 = p_{\text{real}} + e_0$ as known a priori, with $\mathbb{E}[e_0] = 0$, and $\hat p_1$ is provided by the estimation problem:
$$\hat p_1 = \arg\min_p\;\; \frac{1}{2}\left\| p - \hat p_0 \right\|_{\Sigma_0^{-1}}^2 + \frac{1}{2}\Big\| \underbrace{u_0\left( \hat p_0 - e_0 \right) + e_1}_{y^{\text{meas}}} - u_0\, p \Big\|_{\Sigma_1^{-1}}^2, \tag{43}$$
and takes the explicit solution:
$$\hat p_1 = \hat p_0 + \frac{e_1\,\Sigma_0\, u_0 - e_0\,\Sigma_0\, u_0^2}{\Sigma_0\, u_0^2 + \Sigma_1}. \tag{44}$$
The optimality loss for the second run then reads as:
$$\Delta = \frac{1}{2}\left( u_1^*(\hat p_1) - \left( \hat p_0 - e_0 \right) \right)^2 - \underbrace{\phi^*}_{=0} = \frac{1}{2}\,\frac{\left( e_0\,\Sigma_1 + e_1\,\Sigma_0\, u_0 \right)^2}{\left( \Sigma_0\, u_0^2 + \Sigma_1 \right)^2}, \tag{45}$$
and its expected value takes the form:
$$\mathbb{E}_e\!\left[ \Delta \right] = \frac{1}{2}\,\frac{\Sigma_0\,\Sigma_1}{\Sigma_0\, u_0^2 + \Sigma_1} = \frac{1}{2}\, F(u_0, \Sigma)^{-1}. \tag{46}$$
Ignoring the constant terms and since $\mathbb{E}[e_0] = 0$, the two-stage optimal experimental design then picks the input $u_0$ according to:
$$u_0 = \arg\min_u\;\; \frac{1}{2}\left( u - \hat p_0 \right)^2 + \frac{1}{2}\, F(u, \Sigma)^{-1}. \tag{47}$$
We observe that in this simple case, the proposed approximation (41) is identical to the original problem (24) and to Problem (19). This equivalence does not hold in general. The behavior of Problem (41) in this simple case is reported in Figure 3 and Figure 4. In particular, we observe that the expected performance of Problem (41) on this example is consistently better than that of the nominal approach. It is important to understand here that, in this specific example, the difference between Figure 2 and Figure 4 lies in the cost function used to evaluate the performance of the nominal and proposed approaches. Indeed, because of the approximation $p_{\text{real}} \approx \hat p_0$, the original approach (19) appears potentially counterproductive under its targeted performance metric (10). Instead, the proposed performance metric (24) is the one that can be minimized via exploiting measurements for subsequent optimizations. In general, however, the inputs selected by (10) and (24) are different.
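The comparison of Figure 3 and Figure 4 can be reproduced with a short Monte-Carlo experiment. The sketch below (illustrative Python/SciPy; $\hat p_0 = 0$, $\Sigma_0 = 1$ and $\Sigma_1 = 0.1$ are assumed values) estimates the metric (24) for the nominal input $u_0 = u^*(\hat p_0)$ and for the input delivered by (47).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
Sig0, Sig1, p_hat0 = 1.0, 0.1, 0.0      # illustrative values (assumed)
e0 = rng.normal(0.0, np.sqrt(Sig0), 20000)
e1 = rng.normal(0.0, np.sqrt(Sig1), 20000)

def metric_24(u0):
    # Expected two-run cost (24) for this example: p_real = p_hat0 - e0,
    # p_hat1 given by (44), and u1 = u*(p_hat1) = p_hat1
    p_real = p_hat0 - e0
    p_hat1 = p_hat0 + (e1 * Sig0 * u0 - e0 * Sig0 * u0**2) / (Sig0 * u0**2 + Sig1)
    return np.mean(0.5 * (u0 - p_real)**2 + 0.5 * (p_hat1 - p_real)**2)

# Nominal input u0 = u*(p_hat0) versus the experimental-design input (47)
u_nom = p_hat0
obj47 = lambda u: 0.5 * (u - p_hat0)**2 + 0.5 / (1.0 / Sig0 + u**2 / Sig1)
u_oed = minimize_scalar(obj47, bounds=(-10, 10), method="bounded").x

print(f"metric (24) with nominal u0 = {u_nom:+.3f}: {metric_24(u_nom):.4f}")
print(f"metric (24) with OED u0     = {u_oed:+.3f}: {metric_24(u_oed):.4f}")
```

Since (47) minimizes exactly the expected cost (24) in this example, the second value is, up to Monte-Carlo noise, not larger than the first, in line with the observation above.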

3. Link to the Modifier Approach and the DMA Approximation

In this section, we draw a connection between the proposed developments and the well-proven modifier approach tackled via the recent Directional-Modifier Adaptation (DMA) algorithm [23,24]. In particular, we show that the DMA approach can be construed as an approximation of Problem (41).

3.1. The Modifier Approach

In the context of RTO, instead of considering uncertain model parameters, the Modifier Approach (MA) tackles the difficulty of working with uncertain process models by introducing a modification of the gradient of the cost function in the optimization problem. The MA then considers a model of the cost function in the form:
$$\phi(u, p) = \phi_0(u) + p^{\top} u, \tag{48}$$
where $p$ is a set of parameters that modifies the gradient of the process model. Hence, instead of refining the process model, the MA focuses on adjusting the cost gradient at the solution in order to reach optimality for the real process. At each run, measurements of the cost function can be used to improve the estimation of the process gradient via numerical differences. The measurement obtained at each run can be written as:
$$y_{\text{real}} = \frac{\phi(u_0, p_{\text{real}}) - \phi(u_{-1}, p_{\text{real}})}{\left\| u_0 - u_{-1} \right\|}, \tag{49}$$
while the measurement model reads as:
$$y(u_0, p) = \frac{\phi(u_0, p) - \phi(u_{-1}, p)}{\left\| u_0 - u_{-1} \right\|} = \frac{p^{\top}\left( u_0 - u_{-1} \right)}{\left\| u_0 - u_{-1} \right\|} + \frac{\phi_0(u_0) - \phi_0(u_{-1})}{\left\| u_0 - u_{-1} \right\|}. \tag{50}$$
Here, we consider the inputs prior to $u_0$ (in particular the previous input $u_{-1}$) as fixed, since they are already realized, and we consider that a parameter estimation $\hat p_0$ is available from these previous measurements, with associated covariance $\Sigma_{\hat p_0}$. It can be verified that:
$$\phi_{pu}^*\, {\phi_{uu}^*}^{-1}\, \phi_{up}^* = \nabla^2\phi_0^{-1}, \qquad F(u_0, \Sigma, \hat p_0) = \Sigma_{\hat p_0}^{-1} + \Sigma_{\text{meas}}^{-1}\, S(u_0), \tag{51}$$
where $\Sigma_{\text{meas}} \in \mathbb{R}$ is the covariance of the measurements of the numerical gradient of the process cost function, and where we have defined:
$$S(u_0) = \frac{\left( u_0 - u_{-1} \right)\left( u_0 - u_{-1} \right)^{\top}}{\left\| u_0 - u_{-1} \right\|^2}. \tag{52}$$
Hence, Problem (41) deployed on the MA approach solves the problem:
$$\min_{u_0}\;\; \phi(u_0, \hat p_0) + \frac{1}{2}\,\mathrm{Tr}\!\left( \Sigma_{\hat p_0}\, \nabla^2\phi_0^{-1} \left( I + \Sigma_{\text{meas}}^{-1}\, \Sigma_{\hat p_0}\, S(u_0) \right)^{-1} \right). \tag{53}$$
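To make the gradient-measurement and modifier-update mechanics concrete, the sketch below (illustrative Python; the quadratic plant, its gradient offset, the noise level and the prior are assumptions, not data from the paper) forms the directional-derivative measurement (49), the corresponding sensitivity from (50), and the resulting least-squares update of the modifier estimate together with the Fisher matrix (51).

```python
import numpy as np

rng = np.random.default_rng(3)

# Model cost phi_0 and a hypothetical plant whose gradient offset is what the
# modifier p is meant to capture (assumed setup for illustration)
R = np.diag([1.0, 0.8])
phi0  = lambda u: 0.5 * u @ R @ u
plant = lambda u: 0.5 * u @ R @ u + np.array([0.2, -0.1]) @ u
Sig_phi = 1e-4                      # variance of a single cost measurement (assumed)

u_prev = np.array([0.00, 0.05])     # previous input u_{-1}
u0     = np.array([0.04, 0.02])     # current input
d      = u0 - u_prev
Sig_meas = 2.0 * Sig_phi / (d @ d)  # variance of the directional-derivative measurement

# Measurement (49) from two noisy cost evaluations, and the model quantities in (50)
y_meas = (plant(u0) + rng.normal(0.0, np.sqrt(Sig_phi))
          - plant(u_prev) - rng.normal(0.0, np.sqrt(Sig_phi))) / np.linalg.norm(d)
y_p = d / np.linalg.norm(d)                              # dy/dp
y_0 = (phi0(u0) - phi0(u_prev)) / np.linalg.norm(d)      # phi_0 part of (50)

# Least-squares update of the modifier estimate (prior p_hat0, Sigma_p0 assumed)
p_hat0, Sig_p0 = np.zeros(2), np.eye(2)
F = np.linalg.inv(Sig_p0) + np.outer(y_p, y_p) / Sig_meas    # Fisher matrix (51)
Sig_p1 = np.linalg.inv(F)
p_hat1 = Sig_p1 @ (np.linalg.inv(Sig_p0) @ p_hat0 + y_p * (y_meas - y_0) / Sig_meas)

print("updated modifier estimate:", p_hat1)
print("updated covariance:\n", Sig_p1)
```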

3.2. DMA as an Approximation of (41)

We will next consider a first-order Neumann expansion to approximate Problem (53) for $u_0$ close to $u_{-1}$. We observe that:
$$\left( I + \Sigma_{\text{meas}}^{-1}\, \Sigma_{\hat p_0}\, S(u_0) \right)^{-1} = I - \Sigma_{\text{meas}}^{-1}\, \Sigma_{\hat p_0}\, S(u_0) + R, \tag{54}$$
where:
$$R = \left( I + \Sigma_{\text{meas}}^{-1}\, \Sigma_{\hat p_0}\, S(u_0) \right)^{-1} \left( \Sigma_{\text{meas}}^{-1}\, \Sigma_{\hat p_0}\, S(u_0) \right)^2. \tag{55}$$
If the covariance $\Sigma_{\phi}$ associated with the measurements of the cost function is fixed, then $\Sigma_{\text{meas}}^{-1} = \frac{1}{2}\,\Sigma_{\phi}^{-1}\left\| u_0 - u_{-1} \right\|^2$. It follows that for $\left\| u_0 - u_{-1} \right\|$ small, the following approximation is asymptotically exact:
$$\mathrm{Tr}\!\left( \Sigma_{\hat p_0}\, \nabla^2\phi_0^{-1} \left( I + \Sigma_{\text{meas}}^{-1}\, \Sigma_{\hat p_0}\, S(u_0) \right)^{-1} \right) \approx \mathrm{Tr}\!\left( \Sigma_{\hat p_0}\, \nabla^2\phi_0^{-1} \right) - \mathrm{Tr}\!\left( \Sigma_{\text{meas}}^{-1}\, \Sigma_{\hat p_0}\, \nabla^2\phi_0^{-1}\, \Sigma_{\hat p_0}\, S(u_0) \right) \tag{56}$$
$$= \mathrm{Tr}\!\left( \nabla^2\phi_0^{-1}\, \Sigma_{\hat p_0} \right) - \Sigma_{\text{meas}}^{-1}\left( u_0 - u_{-1} \right)^{\top} \Sigma_{\hat p_0}\, \nabla^2\phi_0^{-1}\, \Sigma_{\hat p_0}\left( u_0 - u_{-1} \right). \tag{57}$$
One can then consider the following approximation of Problem (53):
$$u_0 = \arg\min_{u_0}\;\; \phi_0(u_0) + \hat p_0^{\top} u_0 - \frac{1}{2}\,\Sigma_{\text{meas}}^{-1}\left( u_0 - u_{-1} \right)^{\top} \Sigma_{\hat p_0}\, \nabla^2\phi_0^{-1}\, \Sigma_{\hat p_0}\left( u_0 - u_{-1} \right), \tag{58}$$
which is valid for $\left\| u_0 - u_{-1} \right\|$ small. The DMA approach computes a direction $\delta u$ in the input space according to:
$$\max_{\delta u}\;\; \delta u^{\top}\, \Sigma_{\nabla\phi}\, \delta u \qquad \text{s.t.} \quad \left\| \delta u \right\| = 1, \quad \delta u \in \mathcal{C}\!\left( U_r \right), \tag{59}$$
where $U_r = I$ trivially holds in the unconstrained case, and then solves the problem:
$$u_0 = \arg\min_{u_0}\;\; \phi_0(u_0) + \hat p_0^{\top}\left( u_0 - u_{-1} \right) - \frac{c}{2}\left( \delta u^{\top}\left( u_0 - u_{-1} \right) \right)^2, \tag{60}$$
which is equivalent to:
$$u_0 = \arg\min_{u_0}\;\; \phi_0(u_0) + \hat p_0^{\top} u_0 - \frac{c}{2}\left( u_0 - u_{-1} \right)^{\top} Q \left( u_0 - u_{-1} \right) \tag{61}$$
for the positive semi-definite, rank-one weighting matrix $Q = \delta u\, \delta u^{\top}$. The close resemblance of the DMA problem (61) to Problem (58) offers a deeper understanding of the procedure at play in the DMA algorithm. More specifically, Problem (58) is identical to the DMA problem (61) if:
$$\Sigma_{\text{meas}}^{-1}\, \Sigma_{\hat p_0}\, \nabla^2\phi_0^{-1}\, \Sigma_{\hat p_0} = c\, Q. \tag{62}$$
We observe here that $\nabla\phi = \nabla\phi_0 + \hat p_0$, such that $\Sigma_{\nabla\phi} = \Sigma_{\hat p_0}$ mathematically holds. Since $\delta u$ is the dominant unitary eigenvector of $\Sigma_{\nabla\phi}$, and is therefore also the dominant unitary eigenvector of $\Sigma_{\nabla\phi}^2$, it follows that the matrix $Q$ is given by:
$$\max_{Q}\;\; \left\langle \Sigma_{\hat p_0}^2,\; Q \right\rangle \qquad \text{s.t.} \quad \left\| Q \right\| = 1, \quad \mathrm{rank}(Q) = 1. \tag{63}$$
Observing (62) and (63), it follows that the classical DMA method picks an input using:
  • the approximation $\nabla^2\phi_0 \approx \gamma I$ for some $\gamma > 0$;
  • a rank-one approximation of $\Sigma_{\hat p_0}^2$.
According to these observations, a reasonable choice for the scaling constant $c$ can be:
$$c = \left\| \Sigma_{\phi}^{-1}\, \Sigma_{\hat p_0}\, \nabla^2\phi_0^{-1}\, \Sigma_{\hat p_0} \right\|. \tag{64}$$
It is useful to remark here that dismissing the information provided by $\nabla^2\phi_0$ may be advantageous when $\phi_0$ does not adequately reflect the curvature of the cost function of the real process. In such a case, the weighting provided by $\nabla^2\phi_0$ in (58) can arguably be misleading. Including estimations of the second-order sensitivities in the MA approach has been investigated in [19].
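The quantities entering the DMA weighting can be computed in a few lines. The sketch below (illustrative Python; the covariance, Hessian and $\Sigma_{\phi}$ values are assumptions) extracts the excitation direction $\delta u$ as the dominant eigenvector of $\Sigma_{\hat p_0}$ (equal to $\Sigma_{\nabla\phi}$ here), forms the rank-one weight $Q = \delta u\, \delta u^{\top}$, and evaluates the scaling $c$ suggested in (64).

```python
import numpy as np

Sig_p0    = np.diag([1.0, 0.2])     # covariance of the modifier estimate (assumed)
hess_phi0 = np.diag([1.0, 0.8])     # model Hessian nabla^2 phi_0 (assumed)
Sig_phi   = 1.5                     # cost-measurement covariance (assumed)

# Dominant unitary eigenvector of Sigma_p0 (equivalently of Sigma_p0^2)
eigval, eigvec = np.linalg.eigh(Sig_p0)
du = eigvec[:, np.argmax(eigval)]
Q  = np.outer(du, du)               # rank-one weighting matrix, ||Q|| = 1

# Scaling constant (64): spectral norm of Sig_phi^-1 Sig_p0 (nabla^2 phi_0)^-1 Sig_p0
c = np.linalg.norm(Sig_p0 @ np.linalg.inv(hess_phi0) @ Sig_p0 / Sig_phi, 2)

print("delta_u =", du)
print("Q =\n", Q)
print("c =", c)
```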

3.3. Illustrative Example

We illustrate here the developments proposed above via a simple quadratic example, which nonetheless captures a number of observations that ought to be made. Consider the cost model:
$$\phi_0(u_0) = \frac{1}{2}\, u_0^{\top} R\, u_0 + f^{\top} u_0, \tag{65}$$
such that the nominal optimal input is trivially given by:
$$u_0 = -R^{-1} f. \tag{66}$$
Problem (53) then reads as:
$$\min_{u_0}\;\; \frac{1}{2}\, u_0^{\top} R\, u_0 + \left( f + \hat p_0 \right)^{\top} u_0 + \frac{1}{2}\,\mathrm{Tr}\!\left( \Sigma_{\hat p_0}\, R^{-1}\left( I + \Sigma_{\phi}^{-1}\, \Sigma_{\hat p_0}\, S(u_0) \right)^{-1} \right), \tag{67}$$
while the approximate problem (58) reads as:
$$u_0 = \arg\min_{u_0}\;\; \frac{1}{2}\, u_0^{\top}\left( R - \Sigma_{\phi}^{-1}\, \Sigma_{\hat p_0}\, R^{-1}\, \Sigma_{\hat p_0} \right) u_0 + \left( \hat p_0 + f + \Sigma_{\phi}^{-1}\, \Sigma_{\hat p_0}\, R^{-1}\, \Sigma_{\hat p_0}\, u_{-1} \right)^{\top} u_0. \tag{68}$$
Note that Problem (68) is unbounded for $\Sigma_{\phi} I \prec R^{-1}\Sigma_{\hat p_0}^2$, while (67) can have a well-defined solution; see Figure 5 and Figure 6 for an illustration. This situation occurs here when the measurement noise is small while the current parameter estimation is highly uncertain, and it disappears when the parameter estimation becomes reliable, such that $\Sigma_{\hat p_0}$ becomes small. Note that this can be addressed in practice via an ad hoc regularization or by, e.g., bounding the input correction $\left\| u_0 - u^*(\hat p_0) \right\|$ in Problem (58).
The DMA-based problem (61) reads as:
$$u_0 = \arg\min_{u_0}\;\; \frac{1}{2}\, u_0^{\top}\left( R - c\, Q \right) u_0 + \left( \hat p_0 + f + c\, Q\, u_{-1} \right)^{\top} u_0. \tag{69}$$
The behaviors of the DMA problem (69) and of its proposed counterpart (68) are reported in Figure 5, Figure 6, Figure 7 and Figure 8. In Figure 5 and Figure 6, the two problems are compared for the setup:
$$R = \begin{bmatrix} 0.5060 & 0 \\ 0 & 1.2358 \end{bmatrix}, \quad f = 0, \quad u_{-1} = 0, \quad \hat p_0 = \begin{bmatrix} 5\cdot 10^{-3} \\ 0 \end{bmatrix}, \quad \Sigma_{\hat p_0} = \begin{bmatrix} 0.0990 & 0 \\ 0 & 0.1638 \end{bmatrix}, \quad \Sigma_{\phi} = 0.0202, \tag{70}$$
resulting in an unbounded problem for both (68) and (69). In this case, the DMA approach (69) with a reduced choice of $c$ would ensure a bounded problem, while a regularization or trust-region technique for Problem (68) would deliver a solution. We observe in Figure 5 and Figure 6 that ignoring the term $\nabla^2\phi_0$ in the DMA problem can lead the algorithm to favor a solution that departs significantly from the ones proposed by (53).
In Figure 7, the two problems are compared for the setup:
$$R = I, \quad f = 0, \quad u_{-1} = \begin{bmatrix} 0 \\ 0.0444 \end{bmatrix}, \quad \hat p_0 = \begin{bmatrix} 0.04 \\ 0 \end{bmatrix}, \quad \Sigma_{\hat p_0} = \begin{bmatrix} 1 & 0 \\ 0 & 0.95 \end{bmatrix}, \quad \Sigma_{\phi} = 1.5. \tag{71}$$
In this case, ignoring the term $\nabla^2\phi_0 = R = I$ does not yield any difficulty. However, because all parameters $\hat p_0$ have a very similar covariance, the rank-one approximation of $\Sigma_{\hat p_0}$ misleads the DMA algorithm into selecting a solution that departs significantly from the one of (53). Finally, in Figure 8, the two problems are compared for the setup:
$$R = \begin{bmatrix} 1 & 0 \\ 0 & 0.8 \end{bmatrix}, \quad f = 0, \quad u_{-1} = \begin{bmatrix} 0 \\ 0.05 \end{bmatrix}, \quad \hat p_0 = \begin{bmatrix} 0.04 \\ 0 \end{bmatrix}, \quad \Sigma_{\hat p_0} = \begin{bmatrix} 1 & 0 \\ 0 & 0.2 \end{bmatrix}, \quad \Sigma_{\phi} = 1.5. \tag{72}$$
In this last case, both the DMA problem (69) and Problem (68) deliver solutions that are very close to the one of Problem (53); i.e., in this scenario, ignoring the term $\nabla^2\phi_0$ and forming a rank-one approximation do not affect the solution significantly.
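A small numerical companion to the comparison of Section 3.3 is sketched below (illustrative Python/SciPy; it follows the formulas as reconstructed above, so both the constants and the outcome should be treated as indicative only). It solves Problem (67) with a general-purpose optimizer and Problems (68) and (69) in closed form, for the setup (72).

```python
import numpy as np
from scipy.optimize import minimize

# Setup (72)
R       = np.diag([1.0, 0.8])
f       = np.zeros(2)
u_prev  = np.array([0.0, 0.05])
p_hat0  = np.array([0.04, 0.0])
Sig_p0  = np.diag([1.0, 0.2])
Sig_phi = 1.5

Rinv = np.linalg.inv(R)

def cost_67(u):
    d = u - u_prev
    S = np.outer(d, d) / (d @ d + 1e-12)       # S(u0) from (52), regularized at d = 0
    M = np.linalg.inv(np.eye(2) + Sig_p0 @ S / Sig_phi)
    return 0.5 * u @ R @ u + (f + p_hat0) @ u + 0.5 * np.trace(Sig_p0 @ Rinv @ M)

u67 = minimize(cost_67, x0=u_prev + np.array([0.01, 0.01]), method="Nelder-Mead").x

# Problem (68): quadratic, solved by a linear system
W   = Sig_p0 @ Rinv @ Sig_p0 / Sig_phi
u68 = np.linalg.solve(R - W, -(p_hat0 + f + W @ u_prev))

# Problem (69): DMA weighting Q = du du^T, du the dominant eigenvector of Sig_p0
eigval, eigvec = np.linalg.eigh(Sig_p0)
du = eigvec[:, np.argmax(eigval)]
Q  = np.outer(du, du)
c  = np.linalg.norm(W, 2)
u69 = np.linalg.solve(R - c * Q, -(p_hat0 + f + c * Q @ u_prev))

print("u from (67):", u67)
print("u from (68):", u68)
print("u from (69):", u69)
```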

4. Conclusions

In this paper, we have proposed a novel view of real-time optimization and of the modifier approach from an experimental design perspective. While some methods are available to handle the trade-off between process optimality and the gathering of information for the performance of future runs, this paper proposes a formal framework to construe this trade-off as an optimization problem and develops a tractable approximation of this problem. The paper then shows that the recent directional-modifier adaptation algorithm is a special formulation of this approximation. This observation allows one to further justify the directional-modifier adaptation algorithm from a theoretical standpoint and to consider a refined tuning of the algorithm. The theory presented in the paper is illustrated via simple examples.

Acknowledgments

This research was partially supported by Freiburg Institute for Advanced Studies (FRIAS), University of Freiburg, Germany. Helpful remarks have been provided by B. Houska.

Conflicts of Interest

The author declares no conflict of interest.

Appendix

We provide here the lemma used in the proof of Lemma 2.
Lemma 3.
If $x \in \mathbb{R}^m$ is a normally distributed, centered variable of covariance $\Sigma \in \mathbb{R}^{m\times m}$, and $f(x): \mathbb{R}^m \rightarrow \mathbb{R}$ is a smooth function polynomially bounded as:
$$\left| f(x) \right| \le P_{m,n}\!\left( \left\| x \right\| \right)$$
for some n-th-order polynomial of the form $P_{m,n}(x) = \sum_{k=m}^{n} \alpha_k\, x^k$, then the following inequality holds locally:
$$\left| \mathbb{E}_x\!\left[ f(x) \right] \right| \le c\, \left\| \Sigma \right\|^{\lceil \frac{m}{2} \rceil}.$$
Proof. 
We first observe that:
$$\left| \mathbb{E}_x\!\left[ P_{m,n}\!\left( \left\| x \right\| \right) \right] \right| \le \sum_{k=m}^{n} \beta_k \left\| \Sigma \right\|^{\frac{k}{2}}$$
holds for some sequence $\beta_k \ge 0$, with $\beta_k = 0$ for $k$ odd. This is a direct consequence of the generalized Isserlis theorem [27,28], which states that the expected value of any even-order moment of a multivariate normal, centered distribution is a sum of products of $k/2$ entries of the covariance matrix $\Sigma$, while the odd-order moments are null. It then also holds that:
$$\left| \mathbb{E}_x\!\left[ f(x) \right] \right| \le \sum_{k=m}^{n} \beta_k \left\| \Sigma \right\|^{\frac{k}{2}},$$
and the inequality:
$$\left| \mathbb{E}_x\!\left[ f(x) \right] \right| \le c\, \left\| \Sigma \right\|^{\lceil \frac{m}{2} \rceil}$$
holds locally. ☐
We observe that this Lemma appears to be a simple special case of the Theorem proposed by [26] on the delta method, restricted to the normal distribution.

References

  1. Chen, C.; Joseph, B. On-line optimization using a two-phase approach: An application study. Ind. Eng. Chem. Res. 1987, 26, 1924–1930.
  2. Jang, S.; Joseph, B. On-line optimization of constrained multivariable chemical processes. Ind. Eng. Chem. Res. 1987, 33, 26–35.
  3. Forbes, J.; Marlin, T.; MacGregor, J. Model adequacy requirements for optimizing plant operations. Comput. Chem. Eng. 1994, 18, 497–510.
  4. Forbes, J.; Marlin, T. Design cost: A systematic approach to technology selection for model-based real-time optimization systems. Comput. Chem. Eng. 1996, 20, 717–734.
  5. Agarwal, M. Feasibility of on-line reoptimization in batch processes. Chem. Eng. Commun. 1997, 158, 19–29.
  6. Agarwal, M. Iterative set-point optimization of batch chromatography. Comput. Chem. Eng. 2005, 29, 1401–1409.
  7. Marchetti, A. Modifier-Adaptation Methodology for Real-Time Optimization. Ph.D. Thesis, EPFL, Lausanne, Switzerland, 2009.
  8. Roberts, P. An algorithm for steady-state system optimization and parameter estimation. Int. J. Syst. Sci. 1979, 10, 719–734.
  9. Gao, W.; Engell, S. Comparison of iterative set-point optimization strategies under structural plant-model mismatch. IFAC Proc. Vol. 2005, 16, 401.
  10. Marchetti, A.; Chachuat, B.; Bonvin, D. Modifier-adaptation methodology for real-time optimization. Ind. Eng. Chem. Res. 2009, 48, 6022–6033.
  11. Roberts, P. Coping with model-reality differences in industrial process optimisation—A review of integrated system optimization and parameter estimation (ISOPE). Comput. Ind. 1995, 26, 281–290.
  12. Tatjewski, P. Iterative optimizing set-point control-the basic principle redesigned. IFAC Proc. Vol. 2002, 35, 49–54.
  13. François, G.; Bonvin, D. Use of convex model approximations for real-time optimization via modifier adaptation. Ind. Eng. Chem. Res. 2014, 52, 11614–11625.
  14. Bunin, G.; Wuillemin, Z.; François, G.; Nakajo, A.; Tsikonis, L.; Bonvin, D. Experimental real-time optimization of a solid oxide fuel cell stack via constraint adaptation. Energy 2012, 39, 54–62.
  15. Serralunga, F.; Mussati, M.; Aguirre, P. Model adaptation for real-time optimization in energy systems. Ind. Eng. Chem. Res. 2013, 52, 16795–16810.
  16. Navia, D.; Marti, R.; Sarabia, R.; Gutiérrez, G.; Prada, C. Handling infeasibilities in dual modifier-adaptation methodology for real-time optimization. IFAC Proc. Vol. 2012, 45, 537–542.
  17. Darby, M.; Nikolaou, M.; Jones, J.; Nicholson, D. RTO: An overview and assessment of current practice. J. Process Control 2011, 21, 874–884.
  18. Costello, S.; François, G.; Bonvin, D.; Marchetti, A. Modifier adaptation for constrained closed-loop systems. IFAC Proc. Vol. 2014, 47, 11080–11086.
  19. Faulwasser, T.; Bonvin, D. On the Use of Second-Order Modifiers for Real-Time Optimization. In Proceedings of the 19th IFAC World Congress, Cape Town, South Africa, 24–29 August 2014.
  20. Bunin, G.; François, G.; Bonvin, D. From discrete measurements to bounded gradient estimates: A look at some regularizing structures. Ind. Eng. Chem. Res. 2013, 52, 12500–12513.
  21. Serralunga, F.; Aguirre, P.; Mussati, M. Including disjunctions in real-time optimization. Ind. Eng. Chem. Res. 2014, 53, 17200–17213.
  22. Houska, B.; Telen, D.; Logist, F.; Diehl, M.; Van Impe, J.F.M. An economic objective for the optimal experiment design of nonlinear dynamic processes. Automatica 2015, 51, 98–103.
  23. Costello, S.; François, G.; Bonvin, D. Directional Real-Time Optimization Applied to a Kite-Control Simulation Benchmark. In Proceedings of the European Control Conference 2015, Linz, Austria, 15–17 July 2015.
  24. Costello, S.; François, G.; Bonvin, D. A directional modifier-adaptation algorithm for real-time optimization. J. Process Control 2016, 39, 64–76.
  25. Marchetti, A.; Chachuat, B.; Bonvin, D. A dual modifier-adaptation approach for real-time optimization. J. Process Control 2010, 20, 1027–1037.
  26. Oehlert, G. A note on the delta method. Am. Stat. 1992, 46, 27–29.
  27. Withers, C. The moments of the multivariate normal. Bull. Aust. Math. Soc. 1985, 32.
  28. Vignat, C. A generalized Isserlis theorem for location mixtures of Gaussian random vectors. Stat. Probab. Lett. 2012, 82, 67–71.
Figure 1. Illustration for Problem (8). The level curves report the cost of (8) as a function of $u_0$ and $\Sigma_{\text{meas}}$, with $\Sigma_0 = 1$. The dashed lines report the optimal input $u_0$ for various values of $\Sigma_{\text{meas}}$. For $\Sigma_{\text{meas}}$ low enough, the problem has two non-zero solutions.
Figure 2. Comparison of the performance resulting from using the nominal input $u_0 = u^*(\hat p_0)$ and the one resulting from (18) or (19) on the proposed example. The displayed cost is calculated according to (10) and reads as $\mathbb{E}_e\!\left[ \frac{1}{2}u_0^2 + \frac{1}{2}\, u_1^*(\hat p_1)^2 \right]$. The left graph displays the cost resulting from using (18), which delivers a better expected performance than using the nominal input. The right graph displays the cost resulting from using (19), where $p_{\text{real}} \approx \hat p_0$ is used. In this example, this approximation is detrimental to the performance of Problem (10), resulting in a worse performance than the nominal one.
Figure 3. Comparison of the nominal and optimal experimental design on the proposed example for $\hat p_0 = 0$. The displayed cost is calculated according to the cost proposed in (24), which reads as $\mathbb{E}_e\!\left[ \frac{1}{2}\left( u_0 - (\hat p_0 - e_0) \right)^2 + \frac{1}{2}\left( u_1^*(\hat p_1) - (\hat p_0 - e_0) \right)^2 \right]$ in this example. It can be observed that the optimal experimental design approach has two solutions, due to the non-convexity of the problem.
Figure 4. The left graph illustrates the nominal and optimal experimental design performance on the proposed example. The displayed cost is calculated according to the cost proposed in (24), which reads as $\mathbb{E}_e\!\left[ \frac{1}{2}\left( u_0 - (\hat p_0 - e_0) \right)^2 + \frac{1}{2}\left( u_1^*(\hat p_1) - (\hat p_0 - e_0) \right)^2 \right]$ in this example. The right graph displays the corresponding inputs. Observe that the right-hand graph ought to be compared to the right-hand graph of Figure 2.
Figure 5. Example of a problem where the quadratic approximation (58) is unbounded, while (53) has a solution. The black lines report the level curves of the cost of Problem (67); the grey lines report the level curves of the cost of Problem (68); and the light grey lines report the level curves of the cost of Problem (69) with Q given by (63). In this example, ignoring the contribution of $\nabla^2\phi_0$ in the Directional-Modifier Adaptation (DMA) algorithm leads it to privilege directions (light grey dashed line) that are significantly different from the ones privileged by (67) (grey dashed line). The latter point to the solution of the original Problem (53).
Figure 6. Example of a problem where the quadratic approximation (58) is unbounded, while (53) has a solution. The black lines report the level curves of the cost of Problem (67); the grey lines report the level curves of the cost of Problem (68); and the light grey lines report the level curves of the cost of Problem (69), where $Q = \Sigma_{\hat p_0}^2 / \| \Sigma_{\hat p_0}^2 \|$ and $c = \| \Sigma_{\phi}^{-1}\, \Sigma_{\hat p_0}\, \nabla^2\phi_0^{-1}\, \Sigma_{\hat p_0} \|$. Adopting a matrix Q delivering a full-rank approximation of $\Sigma_{\hat p_0}^2$ does not help the DMA algorithm adopt directions (see the light grey dashed line) that point in the direction of the solution to (67); hence, ignoring $\nabla^2\phi_0$ is problematic here.
Figure 7. Illustration for Section 3.3, setup (71). The black lines report the level curves of the cost of Problem (67); the grey lines report the level curves of the cost of Problem (68); and the light grey lines report the level curves of the cost of Problem (69) with Q given by (63). In this example, the rank-one approximation for Q leads the DMA algorithm to propose a solution that is far from the one of (67).
Figure 8. Illustration for Section 3.3, setup (72). The black lines report the level curves of the cost of Problem (67); the grey lines report the level curves of the cost of Problem (68); and the light grey lines report the level curves of the cost of Problem (69) with Q given by (63). In this example, all problems deliver very similar solutions.
