**4. Optimization of Optical Parametric Amplification Gain Factor in a Micro-Resonator**

Assume that we are using N ultrashort high-power pulses to amplify the stimulus wave. These ultrashort pulses have similar pulse energies, so each of them affects the amplification performance to the same degree. The ultrashort high-power pulses are summed to form the pump wave. At a given spatial input point, the pump wave and the stimulus wave are given as:

$$E\_p(\mathbf{x} = \mathbf{x}\_{input}, t) = \sum\_{i=1}^{N} A\_i \cos(2\pi\nu\_i t + \psi\_i)\left(u(t) - u(t - \Delta T\_i)\right) \; \mathrm{V/m} \quad (u(t): \text{unit step function})$$

$$E\_{st}(\mathbf{x} = \mathbf{x}\_{input}, t) = A\_{st} \cos(2\pi\nu\_{st} t + \psi\_{st})\left(u(t) - u(t - \Delta T\_{st})\right) \; \mathrm{V/m}$$

$$\text{where } A\_i \gg A\_{st}, \quad \Delta T\_i \ll \Delta T\_{st}, \quad i = 1, 2, \dots, N$$
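To make the construction of the input fields concrete, the following is a minimal numerical sketch of the pump wave (a sum of N windowed cosines) and the stimulus wave at the input point. All amplitudes, frequencies, phases, and durations are illustrative assumptions chosen only so that $A\_i \gg A\_{st}$ and $\Delta T\_i \ll \Delta T\_{st}$; they are not values from this work.

```python
# Sketch of the pump and stimulus fields at the input point (illustrative parameters).
import numpy as np

def unit_step(t):
    """u(t): unit step function."""
    return (t >= 0).astype(float)

# Hypothetical pulse parameters: N = 3 ultrashort pump pulses with equal energy.
A = np.array([1e8, 1e8, 1e8])            # pump amplitudes A_i (V/m), assumed
nu = np.array([3.0e14, 3.2e14, 3.4e14])  # pump frequencies nu_i (Hz), to be optimized
psi = np.zeros(3)                        # pump phases psi_i (rad), assumed
dT = np.array([50e-15, 50e-15, 50e-15])  # pump durations Delta T_i (s), assumed

# Stimulus wave parameters: A_st << A_i and dT_st >> dT_i, assumed.
A_st, nu_st, psi_st, dT_st = 1e3, 2.0e14, 0.0, 5e-12

t = np.linspace(0.0, 6e-12, 200_000)     # time grid (s)

# Pump wave: sum of the N windowed cosines (expression for E_p above).
E_p = sum(A[i] * np.cos(2 * np.pi * nu[i] * t + psi[i]) *
          (unit_step(t) - unit_step(t - dT[i])) for i in range(len(A)))

# Stimulus wave: single long, low-amplitude windowed cosine (expression for E_st above).
E_st = A_st * np.cos(2 * np.pi * nu_st * t + psi_st) * (unit_step(t) - unit_step(t - dT_st))
```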

We want to tune the frequencies $(\nu\_1, \nu\_2, \dots, \nu\_N)$ of these ultrashort pulses so that the gain factor is maximized. To do so, we use the BFGS algorithm, in which the inverse Hessian approximation is updated recursively at each iteration. The BFGS algorithm is one of the quasi-Newton methods, which approximate the Hessian matrix rather than computing it exactly. This recursive update reduces the computational cost of the optimization by eliminating the need to evaluate second derivatives at each iteration. We use the BFGS algorithm because of its accuracy and simplicity. The most general form of the quasi-Newton method is given as [11]:

$$\mathbf{x}^{(k)} = \mathbf{x}^{(k-1)} - \alpha\_k \left(\mathbf{H}^{(k-1)}\right)^{-1} \nabla f(\mathbf{x}^{(k-1)}), \quad k = 1, 2, 3, \dots \tag{10}$$

$f(\mathbf{x}^{(k-1)})$: *Cost function*, $\mathbf{x}^{(k)}$: *Optimization parameters*, $\mathbf{H}^{(k-1)}$: *Hessian matrix*, $\alpha\_k$: *Step size*

Quasi-Newton methods, like steepest descent, require only the gradient of the objective function to be supplied at each iteration. The Hessian approximation is updated by analyzing successive gradient vectors. The full BFGS algorithm is described below [11].

Given a starting point $\mathbf{x}\_0$, convergence tolerance $\epsilon > 0$, and an initial inverse Hessian approximation $H\_0$;

$k \leftarrow 0$;

*while* $\|\nabla f\_k\| > \epsilon$

- *Compute the search direction* $p\_k = -H\_k \nabla f\_k$;
- *Set* $\mathbf{x}\_{k+1} = \mathbf{x}\_k + \alpha\_k p\_k$, *where* $\alpha\_k$ *is computed from a line search procedure to satisfy the Wolfe conditions*;
- *Define* $s\_k = \mathbf{x}\_{k+1} - \mathbf{x}\_k$ *and* $y\_k = \nabla f\_{k+1} - \nabla f\_k$;
- *Compute* $H\_{k+1} = \left(I - \rho\_k s\_k y\_k^T\right) H\_k \left(I - \rho\_k y\_k s\_k^T\right) + \rho\_k s\_k s\_k^T$ *(BFGS), where* $\rho\_k = \frac{1}{y\_k^T s\_k}$;
- $k \leftarrow k + 1$;

*end (while)*
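The loop above can be summarized in a short numerical sketch. The cost function `f` and its gradient `grad` are assumed to be generic callables, and SciPy's `scipy.optimize.line_search` (a strong-Wolfe line search) stands in for the line search step; this is an illustrative implementation of the update formulas, not the exact code used in this work.

```python
# Minimal BFGS sketch following the pseudocode above (illustrative, not the paper's code).
import numpy as np
from scipy.optimize import line_search

def bfgs_minimize(f, grad, x0, eps=1e-6, max_iter=200):
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                     # initial inverse-Hessian approximation H_0
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:       # convergence test ||grad f_k|| <= eps
            break
        p = -H @ g                         # search direction p_k = -H_k grad f_k
        alpha = line_search(f, grad, x, p, gfk=g)[0]   # Wolfe-condition step size
        if alpha is None:                  # line search failed; fall back to a small step
            alpha = 1e-3
        x_new = x + alpha * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g        # s_k and y_k
        if y @ s > 1e-12:                  # curvature condition keeps H positive definite
            rho = 1.0 / (y @ s)
            I = np.eye(x.size)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)     # BFGS inverse-Hessian update
        x, g = x_new, g_new
    return x
```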

The step size $\alpha\_k$ can be computed from a line search procedure to satisfy the Wolfe conditions:

$$f(\mathbf{x}\_k + \alpha\_k \, p\_k) \le f(\mathbf{x}\_k) + c\_1 \, \alpha\_k \nabla f\_k^T p\_k \tag{11}$$

$$\left| \nabla f(\mathbf{x}\_k + \alpha\_k \, p\_k)^T p\_k \right| \le c\_2 \left| \nabla f\_k^T p\_k \right|, \quad 0 < c\_1 < c\_2 < 1 \tag{12}$$
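As a concrete illustration, a candidate step size can be tested against conditions (11) and (12) with a small helper such as the hypothetical `satisfies_wolfe` below, where `f` and `grad` are assumed callables for the cost function and its gradient.

```python
# Check of the Wolfe conditions (11) and (12) for a candidate step size alpha.
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    g = grad(x)
    sufficient_decrease = f(x + alpha * p) <= f(x) + c1 * alpha * (g @ p)   # Eq. (11)
    curvature = abs(grad(x + alpha * p) @ p) <= c2 * abs(g @ p)             # Eq. (12)
    return sufficient_decrease and curvature
```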

Alternatively, the step size $\alpha\_k$ can be computed from the so-called backtracking approach [11]:

Choose $\alpha > 0$, $\rho \in (0, 1)$, $c \in (0, 1)$;

*repeat until* $f(\mathbf{x}\_k + \alpha \, p\_k) \le f(\mathbf{x}\_k) + c\,\alpha \nabla f\_k^T p\_k$

- $\alpha \leftarrow \rho\alpha$;

*end*
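A direct transcription of this backtracking procedure might look as follows, assuming `p` is a descent direction and `f`, `grad` are callables for the cost function and its gradient; the parameter defaults are typical textbook choices, not values prescribed here.

```python
# Backtracking line search: shrink alpha until the sufficient-decrease condition holds.
def backtracking_step(f, grad, x, p, alpha=1.0, rho=0.5, c=1e-4):
    g_dot_p = grad(x) @ p          # directional derivative grad f_k^T p_k
    while f(x + alpha * p) > f(x) + c * alpha * g_dot_p:
        alpha *= rho               # alpha <- rho * alpha
    return alpha
```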

Since our problem is the amplification of a stimulus (input) wave via non-linear wave mixing with a high-power pump wave, the optimization parameters are the frequencies $\nu\_1, \nu\_2, \dots, \nu\_N$ of the N ultrashort pulses that constitute the total pump wave. Assume that $E\_1$ is the low-power stimulus wave to be amplified and $E\_2$ is the high-power pump wave, which is the combination of the N ultrashort pulses. The general formulation for the maximization of the stimulus wave magnitude (gain factor) is summarized as follows:

**Optimization parameters**: $\boldsymbol{\nu} = [\nu\_1, \nu\_2, \dots, \nu\_N]$

**Cost function to be maximized**: $f = |E\_1(\boldsymbol{\nu})|$

**Constraints**: $g\_1(\boldsymbol{\nu}) \le c\_1, \; g\_2(\boldsymbol{\nu}) \le c\_2, \; \dots, \; g\_N(\boldsymbol{\nu}) \le c\_N$

**Equations:**

$$\nabla^2 E\_2(\boldsymbol{\nu}) - \mu\_0 \varepsilon\_\infty \frac{\partial^2 E\_2(\boldsymbol{\nu})}{\partial t^2} = \mu\_0 \sigma \frac{\partial E\_2(\boldsymbol{\nu})}{\partial t} + \mu\_0 \frac{\partial^2 P\_2}{\partial t^2} \tag{13}$$

$$\frac{d^2 P\_2}{dt^2} + \gamma \frac{dP\_2}{dt} + \omega\_0^2 P\_2 - \frac{\omega\_0^2 P\_2^2}{Ned} - \frac{\omega\_0^2 P\_2^3}{N^2 e^2 d^2} = \frac{Ne^2 E\_2(\boldsymbol{\nu})}{m} \tag{14}$$

$$\nabla^2 E\_1(\boldsymbol{\nu}) - \mu\_0 \varepsilon\_\infty \frac{\partial^2 E\_1(\boldsymbol{\nu})}{\partial t^2} = \mu\_0 \sigma \frac{\partial E\_1(\boldsymbol{\nu})}{\partial t} + \mu\_0 \frac{\partial^2 P\_1}{\partial t^2} \tag{15}$$

$$\frac{d^2 P\_1}{dt^2} + \gamma \frac{dP\_1}{dt} + \omega\_0^2 P\_1 - \frac{\omega\_0^2 \left(P\_1^2 + 2P\_1 P\_2\right)}{Ned} - \frac{\omega\_0^2 \left(P\_1^3 + 3P\_1^2 P\_2 + 3P\_1 P\_2^2\right)}{N^2 e^2 d^2} = \frac{Ne^2 E\_1(\boldsymbol{\nu})}{m} \tag{16}$$
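For illustration, Eq. (14) alone can be integrated in time at a single spatial point for a prescribed pump field, which shows how the nonlinear polarization $P\_2$ responds to a given frequency choice. The sketch below uses `scipy.integrate.solve_ivp`; all material parameters ($N$, $d$, $\gamma$, $\omega\_0$) and the pump pulse parameters are assumed placeholder values, and the full problem would instead couple Eqs. (13)-(16) over the resonator volume.

```python
# Sketch: integrate the nonlinear polarization equation (14) at one spatial point.
import numpy as np
from scipy.integrate import solve_ivp

e_charge = 1.602e-19          # electron charge (C)
m_e = 9.109e-31               # electron mass (kg)
N_density = 1e28              # electron density (1/m^3), assumed
d = 3e-10                     # atomic displacement scale (m), assumed
gamma = 1e13                  # polarization damping rate (1/s), assumed
omega_0 = 2 * np.pi * 3e14    # resonance angular frequency (rad/s), assumed

def E2(t):
    """Pump field at the observation point: one windowed cosine, for illustration."""
    nu_p, A_p, dT = 3.2e14, 1e8, 50e-15
    return A_p * np.cos(2 * np.pi * nu_p * t) * float(0.0 <= t <= dT)

def rhs(t, state):
    P, dP = state
    # Solve Eq. (14) for d^2P/dt^2 and return [dP/dt, d^2P/dt^2].
    d2P = (N_density * e_charge**2 / m_e) * E2(t) \
          - gamma * dP - omega_0**2 * P \
          + omega_0**2 * P**2 / (N_density * e_charge * d) \
          + omega_0**2 * P**3 / (N_density**2 * e_charge**2 * d**2)
    return [dP, d2P]

sol = solve_ivp(rhs, (0.0, 100e-15), [0.0, 0.0], max_step=1e-16, rtol=1e-8)
P2 = sol.y[0]                 # polarization P_2(t) driven by the pump
```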

This is a constrained optimization problem; we can convert it into an unconstrained optimization problem by modifying the cost function through the addition of penalty functions. In the case of a maximization problem, these penalty functions yield a decrease in the cost function when the constraints are violated. In our case, the penalty for violating the constraints is adjusted to yield a quadratic decrease:

**Augmented cost function**: (penalty function method)

$$f = \left| E\_1(\nu) \right| - L \left\{ \sum\_{i=1}^{N} \delta\_i (g\_i(\nu) - c\_i) \right\}^q , \delta\_i = \left\{ \begin{array}{ll} 0 & \text{if } g\_i(\nu) \le c\_i \\ > 0 & \text{if } g\_i(\nu) > c\_i \end{array} \right\} \tag{17}$$

$q$: positive-valued penalty exponent, $L$: positive-valued penalty constant, $\delta\_i$: penalty coefficients
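A minimal sketch of evaluating the augmented cost (17) is given below; the gain evaluation $|E\_1(\boldsymbol{\nu})|$, the constraint functions $g\_i$, and the bounds $c\_i$ are assumed to be supplied by the user (in practice the gain comes from solving Eqs. (13)-(16) for the chosen frequencies).

```python
# Augmented cost of Eq. (17): gain minus a penalty for violated constraints.
def augmented_cost(nu, gain, constraints, bounds, L=1e3, q=2, delta=1.0):
    """gain(nu) -> |E_1(nu)|; constraints: list of g_i(nu); bounds: list of c_i."""
    penalty = 0.0
    for g, c in zip(constraints, bounds):
        violation = g(nu) - c
        if violation > 0:                   # delta_i > 0 only when g_i(nu) > c_i
            penalty += delta * violation
    return abs(gain(nu)) - L * penalty**q   # quantity to be maximized
```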

**Optimization process**:

$$\begin{aligned}
\boldsymbol{\nu}\_{k+1} &= \boldsymbol{\nu}\_k + \alpha\_k p\_k, \qquad p\_k = -H\_k \nabla f\_k \\
s\_k &= \boldsymbol{\nu}\_{k+1} - \boldsymbol{\nu}\_k, \qquad y\_k = \nabla f\_{k+1} - \nabla f\_k \\
H\_{k+1} &= \left(I - \rho\_k s\_k y\_k^T\right) H\_k \left(I - \rho\_k y\_k s\_k^T\right) + \rho\_k s\_k s\_k^T \quad \text{(BFGS)} \\
\nabla f &= \begin{bmatrix}
\dfrac{f(\nu\_1 + \epsilon, \nu\_2, \dots, \nu\_N) - f(\nu\_1, \nu\_2, \dots, \nu\_N)}{\epsilon} \\
\dfrac{f(\nu\_1, \nu\_2 + \epsilon, \dots, \nu\_N) - f(\nu\_1, \nu\_2, \dots, \nu\_N)}{\epsilon} \\
\vdots \\
\dfrac{f(\nu\_1, \nu\_2, \dots, \nu\_N + \epsilon) - f(\nu\_1, \nu\_2, \dots, \nu\_N)}{\epsilon}
\end{bmatrix}, \qquad \rho\_k = \frac{1}{y\_k^T s\_k}
\end{aligned}$$
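Putting the pieces together, the sketch below drives the frequency optimization with a forward-difference gradient, as in the expression for $\nabla f$ above, and SciPy's BFGS implementation applied to the negated augmented cost (maximization by minimization). The cost evaluation is a smooth placeholder standing in for the solution of Eqs. (13)-(16), and frequencies are expressed in THz purely to keep the optimization variables well scaled; none of the numbers are results from this work.

```python
# Sketch of the overall frequency optimization (placeholder cost, illustrative only).
import numpy as np
from scipy.optimize import minimize

def forward_difference_gradient(f, nu, eps=1e-3):
    """Forward-difference gradient of f, with perturbation eps (THz) per component."""
    g = np.zeros_like(nu)
    f0 = f(nu)
    for i in range(nu.size):
        nu_pert = nu.copy()
        nu_pert[i] += eps
        g[i] = (f(nu_pert) - f0) / eps
    return g

def augmented_cost_of(nu):
    # Placeholder for the real evaluation: solve Eqs. (13)-(16) with pump frequencies nu
    # and return the augmented cost (17). A smooth dummy surface stands in here.
    return -np.sum((nu - 310.0) ** 2)

nu0 = np.array([300.0, 320.0, 340.0])      # initial pulse frequencies (THz), assumed
result = minimize(lambda nu: -augmented_cost_of(nu), nu0, method="BFGS",
                  jac=lambda nu: -forward_difference_gradient(augmented_cost_of, nu))
nu_opt = result.x                          # optimized pump pulse frequencies (THz)
```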

The convergence rate of the BFGS algorithm is super-linear, but our formulated problem is a non-linear optimization problem (non-linear programming), so convergence is not reached immediately. Furthermore, because the Hessian matrix is only approximated recursively rather than computed exactly, the convergence rate is slower than that of the exact Newton method. For these reasons, the computation of the optimum frequency values takes a considerable amount of time.
