*Article* **Comparing Corrective and Preventive Security-Constrained DCOPF Problems Using Linear Shift-Factors**

#### **Victor H. Hinojosa**

Department of Electrical Engineering, Universidad Técnica Federico Santa María, Valparaíso 2390123, Chile; victor.hinojosa@usm.cl; Tel.: +56-32-265-4354

Received: 13 December 2019; Accepted: 17 January 2020; Published: 21 January 2020

**Abstract:** This study compares two efficient formulations for solving corrective as well as preventive security-constrained (SC) DC-based optimal power flow (OPF) problems using linear sensitivity factors, without sacrificing optimality. Both SCOPF problems are modelled using two frameworks based on these distribution factors. The main advantage of the proposed formulation is the significant reduction of decision variables and of equality and inequality constraints in comparison with the traditional DC-based SCOPF formulation. Several test power systems and extensive computational experiments are conducted using a commercial solver to clearly demonstrate the feasibility of solving the corrective and the preventive SCOPF problems with a reduced solution space. Another point worth noting is the lower simulation time achieved by the introduced methodology. Additionally, this study presents the advantages and disadvantages of the proposed shift-factor formulation in solving both corrective and preventive formulations.

**Keywords:** linear OPF problem; shift-factors; line outage distribution factors; security-constrained; corrective formulation; preventive formulation

#### **1. Introduction**

Carpentier introduced the optimal power flow (OPF) problem in 1960 [1]. The optimal solution yields the economic dispatch and the transmission power flows, and it must satisfy the technical limits of the power generation units and of the transmission network. Several optimization algorithms have been presented in the technical literature to solve operation and planning problems based on this conventional mathematical formulation [2–4].

In real-time operation, power system operators (Independent System Operators (ISOs) and Regional Transmission Organizations (RTOs)) must execute many N−1 power flows very quickly, taking into account generation and transmission (line or transformer) failures, to reach a safe condition after a contingency event [5]. Security studies should guarantee not only that the power flow result is maintained below the thermal limit but also that overloading conditions in the transmission elements are mitigated to avoid undesirable operational effects. This study focuses only on transmission contingencies.

#### *1.1. Technical Literature Analysis*

Power system planners and operators have typically used the direct-current OPF (DCOPF) problem. A DCOPF problem is a simplified linear approach modelling nonlinear transmission constraints based on approximations regarding voltage (magnitudes and angles), admittances and reactive power [6]. Nowadays, the most common transmission network formulation is the so-called DC model [7].

Another point worth noting is the transmission network modelling. In the technical literature, there are two methodologies using linear factors: (1) the classical DC formulation [8] and (2) the linear shift-factor (SF) formulation [6]. In the first case, power unit generation and voltage phase angles are the decision variables used to model the transmission network. According to References [6,7,9], an alternative formulation has been introduced using only the power unit generation as decision variable. Although its numbers of decision variables and of equality and inequality constraints are lower in comparison with the classical DC formulation, the SF-based formulation does not affect OPF optimality.

In electrical power systems, security studies must guarantee that transmission modifications and failures do not impact the transmission network with overloading conditions. Furthermore, ISOs must reduce the amount of power flow simulations that should be executed to verify the power system security. For very large-scale power systems, linear DC models are typically employed to solve contingency events [10].

Researchers have found that not all transmission failures originate an alert condition. Consequently, electrical power engineers must obtain a safe and economical post-contingency steady state with respect to the worst contingency, or to a ranked list of contingencies, to adequately solve the problem without affecting power system security. This problem is known as the security-constrained OPF (SCOPF) problem [8,11]. For more information about the SCOPF, see References [6,12,13].

In the state-of-the-art, two mathematical frameworks have been developed to solve the SCOPF problem—(i) the corrective formulation and (ii) the preventive formulation.


#### *1.2. Contributions*

References [6,7] introduce the N−1 preventive security analysis using shift-factors and line outage distribution factors. While the security problem only includes the worst contingency, a ranking of contingencies should be modelled in the security-constrained analysis to avoid risky operational conditions and technical transmission problems in adequately supplying the customer load for an unlikely line or transformer failure. To the best of our knowledge, the corrective formulation has not been applied in the technical literature using shift-factors. Hence, it is very attractive to accomplish several analyses and comparisons with small- and large-scale power systems to validate scalability and simulation performance using both the CL- and the SF-based formulations and a commercial solver (Gurobi). Therefore, the following issues are addressed in this study: (1) the corrective SCOPF problem is solved using shift-factors, and a comparative analysis of both corrective and preventive formulations is carried out using different-scale test power systems; and (2) a realistic case is successfully solved (National Electric Power System of Chile). Simulation results demonstrate superior performance when the SF-based formulation is applied to the SCOPF problem in comparison with the CL-based formulation. Notice that the introduced formulation could bring better performance and practical advantages in solving large-scale problems as well as complicated problems such as the stochastic OPF, stochastic unit commitment and generation planning methodologies.

This study is organized as follows. In Section 2, a detailed study shows different OPF problems applying the classical formulation and the introduced formulation. Section 3 presents the corrective and preventive SCOPF simulations using shift-factors. Besides, results and comparisons are achieved using several test power systems. Section 4 concludes the study.

#### **2. Security-Constrained Optimal Power Flow Problem**

In this section, the corrective and preventive SCOPF problems are presented mathematically in detail. The optimization problem does not include the DC power losses.

#### *2.1. Corrective SCOPF Problem Using the Classical DC-Based Formulation*

This optimization problem is modelled using the following objective function:

$$\min(C_{total}) = \min(C_{pre} + C_{post}) \tag{1a}$$

$$C_{pre} = \sum_{g} \left( A_g + B_g \, p_g^{pre} \, S_{base} + C_g \, (p_g^{pre})^2 \, S_{base}^2 \right) + VoLL \sum_{v} v_v^{pre} \, S_{base} \qquad \forall \, g \in G, \ \forall \, v \in V \tag{1b}$$

$$C_{post} = \sum_{g} \left( A_g + B_g \, p_g^{post} \, S_{base} + C_g \, (p_g^{post})^2 \, S_{base}^2 \right) + VoLL \sum_{v} v_v^{post} \, S_{base} \qquad \forall \, g \in G, \ \forall \, v \in V \tag{1c}$$

Subject to:

$$(p_b^{pre} + v_b^{pre}) - D_b - \sum_{b-l} f_{b-l}^{pre} = 0 \qquad \forall \, b \in B, \ \forall \, b-l \in L \tag{2}$$

$$(p_b^{post} + v_b^{post}) - D_b - \sum_{b-l} f_{b-l}^{post} = 0 \qquad \forall \, b \in B, \ \forall \, b-l \in L-1 \tag{3}$$

$$f_{b-l}^{pre} = B_{b-l} \times (\delta_b^{pre} - \delta_l^{pre}) \qquad \forall \, b-l \in L, \ \forall \, b,l \in B \tag{4}$$

$$f_{b-l}^{post} = B_{b-l} \times (\delta_b^{post} - \delta_l^{post}) \qquad \forall \, b-l \in L-1, \ \forall \, b,l \in B \tag{5}$$

$$|f_{b-l}^{pre}| \le F_{b-l}^{max} \qquad \forall \, b-l \in L \tag{6}$$

$$|f_{b-l}^{post}| \le F_{b-l}^{max} \qquad \forall \, b-l \in L-1 \tag{7}$$

$$-R_g^{down} \le p_g^{post} - p_g^{pre} \le R_g^{up} \qquad \forall \, g \in G \tag{8}$$

$$P_g^{min} \le p_g^{pre} \le P_g^{max} \qquad \forall \, g \in G \tag{9}$$

$$P_g^{min} \le p_g^{post} \le P_g^{max} \qquad \forall \, g \in G \tag{10}$$

$$|\delta_b^{pre}| \le \pi/2 \qquad \forall \, b \in B \tag{11}$$

$$|\delta_b^{post}| \le \pi/2 \qquad \forall \, b \in B \tag{12}$$

$$\delta_b^{pre} = 0 \qquad b = SL \tag{13}$$

$$\delta_b^{post} = 0 \qquad b = SL. \tag{14}$$

Equations (2) and (3) represent the nodal power balance constraints for the pre- and the post-contingency conditions, respectively. Equations (4) and (5) define the power flows in the transmission elements, and (6) and (7) limit these power flows for both conditions. Equation (8) models the ramp-up and ramp-down power unit generation limits, and (9) and (10) bound the minimum and the maximum power unit generation. Constraints (11) and (12) limit the voltage bus angles for the pre- and post-contingency conditions, respectively. Lastly, the slack reference is defined for both conditions using (13) and (14).

In this mathematical formulation, the decision variables (*n*) are active power generation, voltage angles and transmission power flows for pre- and post-contingency analyses.

$$n = 2n_B + 2n_G + n_L + n_{L-1}$$

The equality constraints (*ne*) are the following:

$$n_e = 2n_B + n_L + n_{L-1} + 2$$

The inequality constraints (*ni*) are the following:

$$n_i = 2(2n_B + 3n_G + n_L + n_{L-1})$$

#### *2.2. Preventive SCOPF Problem Using the Classical DC-Based Formulation*

In this mathematical formulation, the post-contingency condition is the same as the pre-contingency condition. With this assumption, the ramp-up and ramp-down constraints (8) need not be added to the optimization problem. Nevertheless, the transmission power flows are different because these values represent the pre-contingency and the post-contingency power system states. As a result, there is only one set of decision variables, as follows: *ppre* = *ppost* = *p*, *vpre* = *vpost* = *v* and δ*pre* = δ*post* = δ. Hence, the optimization problem considers a different objective function and is subject to the following constraints:

$$\min(C_{total}) = \sum_{g} \left( A_g + B_g \, p_g \, S_{base} + C_g \, p_g^2 \, S_{base}^2 \right) + VoLL \sum_{v} v_v \, S_{base} \qquad \forall \, g \in G, \ \forall \, v \in V \tag{15}$$

$$(p_b + v_b) - D_b - \sum_{b-l} f_{b-l}^{pre} = 0 \qquad \forall \, b \in B, \ \forall \, b-l \in L \tag{16}$$

$$(p_b + v_b) - D_b - \sum_{b-l} f_{b-l}^{post} = 0 \qquad \forall \, b \in B, \ \forall \, b-l \in L-1 \tag{17}$$

$$f_{b-l}^{pre} = B_{b-l} \times (\delta_b - \delta_l) \qquad \forall \, b-l \in L, \ \forall \, b,l \in B \tag{18}$$

$$f_{b-l}^{post} = B_{b-l} \times (\delta_b - \delta_l) \qquad \forall \, b-l \in L-1, \ \forall \, b,l \in B \tag{19}$$

$$|f_{b-l}^{pre}| \le F_{b-l}^{max} \qquad \forall \, b-l \in L \tag{20}$$

$$|f_{b-l}^{post}| \le F_{b-l}^{max} \qquad \forall \, b-l \in L-1 \tag{21}$$

$$P_g^{min} \le p_g \le P_g^{max} \qquad \forall \, g \in G \tag{22}$$

$$|\delta_b| \le \pi/2 \qquad \forall \, b \in B \tag{23}$$

$$\delta_b = 0 \qquad b = SL. \tag{24}$$

The number of decision variables $n$ and the numbers of equality $n_e$ and inequality $n_i$ constraints are calculated using $n = n_B + n_G + n_L + n_{L-1}$, $n_e = 2n_B + n_L + n_{L-1} + 1$ and $n_i = 2(n_B + n_G + n_L + n_{L-1})$.
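These counting rules are easy to script for sanity checks. The following minimal Python sketch (function names are ours, not from the study) evaluates the formulas of Sections 2.1 and 2.2 for a single modelled contingency:

```python
# Problem-size formulas from Sections 2.1 and 2.2 (single modelled contingency).
# Function and argument names are illustrative, not from the original study.

def cl_corrective_sizes(n_b, n_g, n_l, n_l1):
    """Variables and constraints of the classical corrective SCOPF."""
    n = 2 * n_b + 2 * n_g + n_l + n_l1          # angles, generation, flows (pre and post)
    ne = 2 * n_b + n_l + n_l1 + 2               # balances, flow definitions, slack references
    ni = 2 * (2 * n_b + 3 * n_g + n_l + n_l1)   # angle, ramp, generation and flow limits
    return n, ne, ni

def cl_preventive_sizes(n_b, n_g, n_l, n_l1):
    """Variables and constraints of the classical preventive SCOPF."""
    n = n_b + n_g + n_l + n_l1
    ne = 2 * n_b + n_l + n_l1 + 1
    ni = 2 * (n_b + n_g + n_l + n_l1)
    return n, ne, ni

# Example: 6 buses, 6 generators, 11 lines, one outage leaving 10 lines in service.
print(cl_corrective_sizes(6, 6, 11, 10))
print(cl_preventive_sizes(6, 6, 11, 10))
```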

#### *2.3. Corrective SCOPF Problem Using the SF-Based Formulation*

In this section, the corrective SCOPF problem is introduced using shift-factors. For this optimization problem, the classical DC-based set of transmission network constraints is reformulated using the inverse of the admittance matrix, avoiding the computation of voltage bus angles and nodal transmission constraints. Therefore, the nodal balance constraints are turned into only one equality constraint for the pre-contingency state (25) and one for the post-contingency state (26). Additionally, the transmission power flows for both conditions, (27) and (28), are obtained using shift-factors and the net injected power.


The objective function is presented in (1a) and the optimization problem is subject to the following constraints:

$$\sum_{g} (p_g^{pre} + v_g^{pre}) - D^{total} = 0 \qquad \forall \, g \in G \tag{25}$$

$$\sum_{g} (p_g^{post} + v_g^{post}) - D^{total} = 0 \qquad \forall \, g \in G \tag{26}$$

$$\left| \sum_{k} SF_{b-l,k}^{pre} \times (p_k^{pre} - D_k) \right| \le F_{b-l}^{max} \qquad \forall \, k \in B, \ k \neq SL, \ \forall \, b-l \in L \tag{27}$$

$$\left| \sum_{k} SF_{b-l,k}^{post} \times (p_k^{post} - D_k) \right| \le F_{b-l}^{max} \qquad \forall \, k \in B, \ k \neq SL, \ \forall \, b-l \in L-1 \tag{28}$$

$$-R_g^{down} \le p_g^{post} - p_g^{pre} \le R_g^{up} \qquad \forall \, g \in G \tag{29}$$

$$P_g^{min} \le p_g^{pre} \le P_g^{max} \qquad \forall \, g \in G \tag{30}$$

$$P_g^{min} \le p_g^{post} \le P_g^{max} \qquad \forall \, g \in G, \tag{31}$$

where $SF^{pre} = Bf^{pre} \times Xbus_r^{pre}$ and $SF^{post} = Bf^{post} \times Xbus_r^{post}$ are the shift-factor matrices for the pre- and post-contingency conditions, and $Xbus_r^{pre} = (A_r^T \times Bf_r^{pre})^{-1}$ and $Xbus_r^{post} = (A_r^T \times Bf_r^{post})^{-1}$ are the reduced reactance bus matrices for both conditions.
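As an illustration, the following NumPy sketch assembles a pre-contingency shift-factor (PTDF) matrix from the line reactances and the line-bus incidence matrix; all names are ours, and the slack handling follows the reduction described above:

```python
import numpy as np

def shift_factors(from_bus, to_bus, x, n_bus, slack):
    """Pre-contingency shift-factor (PTDF) matrix, a sketch of SF = Bf x Xbus_r.

    from_bus, to_bus : arrays of 0-based terminal buses for each line
    x                : array of line reactances (p.u.)
    slack            : index of the slack bus (its row/column is removed)
    """
    n_line = len(x)
    A = np.zeros((n_line, n_bus))                # line-bus incidence matrix
    A[np.arange(n_line), from_bus] = 1.0
    A[np.arange(n_line), to_bus] = -1.0
    Bf = np.diag(1.0 / np.asarray(x)) @ A        # branch susceptance matrix
    Bbus = A.T @ Bf                              # nodal susceptance matrix, A^T x Bf
    keep = [b for b in range(n_bus) if b != slack]
    Xbus_r = np.linalg.inv(Bbus[np.ix_(keep, keep)])  # reduced reactance bus matrix
    SF = np.zeros((n_line, n_bus))
    SF[:, keep] = Bf[:, keep] @ Xbus_r           # slack column stays zero
    return SF
```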

For this mathematical formulation, the decision variables are exclusively the power unit generation. As a result, $n_e = 2$ and $n_i = 2(3n_G + n_L + n_{L-1})$. Owing to this smaller number of variables and constraints, the formulation yields a very compact SCOPF optimization problem.

#### *2.4. Preventive SCOPF Problem Using the SF-Based Formulation*

With previous assumptions given in Section 2.2, the OPF problem is subject to the following constraints:

$$\sum_{g} (p_g + v_g) - D^{total} = 0 \qquad \forall \, g \in G \tag{32}$$

$$\left| \sum_{k} SF_{b-l,k}^{pre} \times (p_k - D_k) \right| \le F_{b-l}^{max} \qquad \forall \, k \in B, \ k \neq SL, \ \forall \, b-l \in L \tag{33}$$

$$\left| \sum_{k} SF_{b-l,k}^{post} \times (p_k - D_k) \right| \le F_{b-l}^{max} \qquad \forall \, k \in B, \ k \neq SL, \ \forall \, b-l \in L-1 \tag{34}$$

$$P_g^{min} \le p_g \le P_g^{max} \qquad \forall \, g \in G. \tag{35}$$

For this formulation, $n_e = 1$ and $n_i = 2(n_G + n_L + n_{L-1})$.
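To illustrate how compact this formulation is in practice, the following gurobipy sketch implements constraints (32)–(35) with linear generation costs; load shedding and the quadratic cost terms of (15) are omitted for brevity, and all function and variable names are ours:

```python
import gurobipy as gp
from gurobipy import GRB

def preventive_sf_scopf(B_cost, p_min, p_max, d, gen_bus, sf_sets, f_max_sets):
    """Preventive SF-based SCOPF sketch: constraints (32)-(35) with linear
    generation costs and no load shedding.  sf_sets holds the pre-contingency
    SF matrix followed by one post-contingency SF matrix per monitored outage;
    f_max_sets holds the matching flow-limit vector for each matrix."""
    n_g, n_bus = len(B_cost), len(d)
    m = gp.Model("preventive-sf-scopf")
    p = m.addVars(n_g, lb=p_min, ub=p_max, name="p")            # limits (35)
    m.addConstr(p.sum() == sum(d), name="balance")              # balance (32)
    # Net nodal injections from the generator-to-bus mapping.
    inj = [gp.quicksum(p[g] for g in range(n_g) if gen_bus[g] == b) - d[b]
           for b in range(n_bus)]
    for s, (SF, f_max) in enumerate(zip(sf_sets, f_max_sets)):  # (33) and (34)
        for l in range(SF.shape[0]):
            flow = gp.quicksum(SF[l, b] * inj[b] for b in range(n_bus))
            m.addConstr(flow <= f_max[l], name=f"fp_{s}_{l}")
            m.addConstr(flow >= -f_max[l], name=f"fn_{s}_{l}")
    m.setObjective(gp.quicksum(B_cost[g] * p[g] for g in range(n_g)),
                   GRB.MINIMIZE)
    m.optimize()
    return [p[g].X for g in range(n_g)]
```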

#### *2.5. Computing the Post-Contingency Shift-Factors*

To compute the *SFpost* matrix more efficiently, this study applies the definition given in Reference [6] to determine the post-contingency transmission constraints using Equation (36).

$$\left| \left( SF_b^{pre} + LODF_{b,k} \times SF_k^{pre} \right) \times (P - P^d) \right| \le F_b^{max} \qquad \forall \, b \in L-1. \tag{36}$$

Lastly, LODF factors are computed as follows:

$$LODF_{b,k} = \frac{SF_b^{pre} \times (A_k^{pre})^T}{1 - SF_k^{pre} \times (A_k^{pre})^T}. \tag{37}$$

Line outage distribution factors are computed only with pre-contingency shift-factors, that is, with network data before the transmission outage. Islanding conditions can nevertheless be detected by finding the states in which the denominator of (37) is zero.
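A minimal NumPy sketch of Equation (37), including the islanding check on the denominator, could read as follows (names are ours):

```python
import numpy as np

def lodf_matrix(SF, A, tol=1e-8):
    """Line outage distribution factors from pre-contingency shift-factors,
    Equation (37).  SF is the n_line x n_bus PTDF matrix and A the line-bus
    incidence matrix; column k of (SF @ A.T) is the flow induced on every
    line by a unit power transfer across line k."""
    H = SF @ A.T                       # n_line x n_line transfer matrix
    denom = 1.0 - np.diag(H)
    islanded = np.abs(denom) < tol     # denominator ~ 0 flags islanding
    denom_safe = np.where(islanded, np.nan, denom)
    LODF = H / denom_safe              # LODF[b, k] = H[b, k] / (1 - H[k, k])
    np.fill_diagonal(LODF, -1.0)       # the outaged line carries no post flow
    return LODF, islanded
```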

#### **3. Simulation Results**

Several simulations are conducted using small- and large-scale electrical power systems to assess the performance of both the corrective and the preventive SCOPF formulations. The optimization problem is formulated in Python [20], and Gurobi [21] is used as the commercial solver on a computer with the following characteristics: six-core Intel Core i7 3930 (3.20 GHz) with 32 GB of RAM.

#### *3.1. SCOPF Formulation Applied to an Example Power System (6-Bus)*

The first security-constrained simulation takes into account the 6-bus power system presented by Wood and Wollenberg [8]. Transmission network data can be seen in [8] or in MatPower [22].

In comparison with the original generation system, three new power units (*G*4, *G*<sup>5</sup> and *G*6) are added, one at each load bus. Figure 1 shows the electrical power system.

**Figure 1.** Test system I: A new 6-bus electrical power system is proposed in this study.

Technical data for the power generation system are given in Table 1. Besides, ramp-up and ramp-down characteristics are incorporated in the last two columns for each power unit.


**Table 1.** Power generation technical parameters.

Solving the classical OPF problem, the optimal cost is *Ctotal* = 3003.17 \$/h. The power generation solution is *P*<sup>1</sup> = 50.0 MW, *P*<sup>2</sup> = 37.5 MW, *P*<sup>3</sup> = 45.0 MW, *P*<sup>4</sup> = 5.0 MW, *P*<sup>5</sup> = 67.5 MW and *P*<sup>6</sup> = 5.0 MW. Figure 2 shows the power flow solution. According to the power flow solution, there is no congestion in transmission elements. PyPower [23] is used to validate the solution and both solutions are the same.

**Figure 2.** Classical optimal power flow (OPF) solution.

To expose possible transmission congestion effects, the maximum transmission limits of lines 1–4, 2–4 and 3–6 are reduced to 40 MW. With this new capacity, the OPF problem still obtains the same solution (operational cost, power unit generation and transmission power flows). This new transmission capacity is used in the N−1 power system security analysis.

#### DC-Based and SF-Based SCOPF Formulations

In the first analysis, both the classical (CL) and the introduced (SF) formulations are implemented to model the SCOPF. These formulations are subject to one set of pre-contingency and eleven sets of post-contingency constraints.

• Corrective SCOPF problem—for this analysis, power generation ramp data are incorporated in the formulation assuming that the ISO has 10 min to eliminate overloading transmission effects and to recover steady-state power system security. The main advantage of the corrective formulation is that the operator knows the post-contingency economic dispatch in advance. This solution considers not only the technical generation constraints but also the variable fuel cost of each power unit. Indeed, the new power generation setting achieves a safe and robust N−1 security-constrained solution no matter which transmission element fails.

Two cases are conducted to determine effects when ramps constraints are added in the SCOPF problem. These optimization problems have been solved using the SF-based formulation. Results are the following:

Case 1 Without ramp constraints—the operational cost is 3003.17 \$/h for the pre-contingency condition and 3487.87 \$/h for the post-contingency condition, so the optimal total cost is *Ctotal* = 6491.04 \$/h. The pre-contingency cost is the same as in the traditional OPF problem (3003.17 \$/h).

Case 2 Including ramp-up and ramp-down constraints—the cost is 3146.35 \$/h for the pre-contingency condition and 3487.87 \$/h for the post-contingency condition, and the total cost is *Ctotal* = 6634.22 \$/h. For this case, the number of decision variables is 12 and the number of constraints is 280.

Table 2 shows the results, operational cost and power dispatch for each SCOPF solution. Positive values imply that the power unit generation must decrease its dispatch. With this information, the ISO rapidly decides (10 or 20 min) which generator would economically change the power unit setting to optimally eliminate overloading conditions.


Because PyPower does not implement the security-constrained analysis, it is not possible to contrast this power generation solution.

Because of the ramp constraints, the ISO will not be able to reach a safe post-contingency steady state with the solution obtained in Case 1. Therefore, the system operator would need to shed customer load and/or turn on fast-reserve power units.

Notice that the ramp-down constraint of power unit 5 (44.14 − 24.14 = 20 MW) is active, with a shadow price of 6.36 \$/MWh. Therefore, the pre-contingency cost is 143.18 \$/h higher than in the traditional OPF problem. Additionally, the power generation solution for the post-contingency condition is the same in both cases.

In comparison with the SF-based formulation, simulation results obtained by the CL-based formulation are the same. Therefore, both formulations achieve the same optimal solution; that is, these optimization problems are equivalent.

Incorporating the ramp constraints in the CL SCOPF problem, the number of decision variables is 205 and the number of constraints is 483. Compared with these figures, the SF formulation achieves a very significant reduction of 94.1% in decision variables and of 42.0% in constraints.

Figure 3 displays the pre- and the post-contingency solutions when line 1–2 is out. Both solutions can be compared to observe the redispatch effects in the transmission network for generators two, four, five and six (grey color).

**Figure 3.** Corrective OPF solution considering the outage of line 1–2. (**a**) Pre-contingency power flow solution. (**b**) Post-contingency power flow solution.

Using the post-contingency power generation solution (Table 2), eleven N−1 power flows are computed. For detailed information, Table 3 shows the power flow solution for each outage.


**Table 3.** Power flow solution for each N−1 transmission contingency.

Results show that there are two post-contingency congestions in the following lines: (a) transmission line 2–4 when line 1–4 is out; and (b) transmission line 3–6 when line 2–6 is out. These values are highlighted in bold. Based on the OPF solution, the shadow prices (Lagrangian multipliers) for each congestion are the following: (a) 28.60 \$/MWh for line 2–4; and (b) 41.98 \$/MWh for line 3–6. The OPF overcost and the larger Lagrangian multipliers are produced by the transmission congestion and the ramp-down constraint of unit 5.

• Preventive SCOPF problem—the main advantage of this formulation is that the operator does not need to modify the post-contingency power dispatch.

Using the SF-based formulation, the number of decision variables is 6 and the number of constraints is 280. The optimal cost obtained by Gurobi is *Ctotal* = 3487.87 \$/h, which matches the post-contingency cost obtained previously. In this solution, there is no congestion in the pre-contingency condition. However, there is congestion in transmission line 2–4 (when line 1–4 is out) and in line 3–6 (when line 2–6 is out).

Gurobi achieves the same solution solving the CL-based SCOPF formulation. Nevertheless, the number of decision variables is 199 and the number of constraints is 459.

To compare the corrective and preventive SCOPF solutions, the power dispatch obtained in the pre-contingency condition must be compared for both solutions. The preventive overcost is 342.52 \$/h (9.82%). Therefore, there is an important operational cost saving obtained by the corrective formulation.

#### *3.2. Corrective SCOPF Using a Ranking of Contingencies*

In real power systems, not all contingencies result in a post-contingency alert condition [8]. To simplify the SCOPF problem, several authors have used the most severe contingency as the security criterion [6].

Regarding the N−1 power flow solution obtained previously from the initial OPF (3003.17 \$/h), we solve eleven power flows, one for each N−1 post-contingency event. Simulations show five overloading states: (a) when line 1–4 is out, the power flow in line 2–4 is 131.32%; (b) when line 2–4 is out, the power flow in line 1–4 is 117.11%; (c) when line 2–6 is out, the power flow in line 3–6 is 124.41%; (d) when line 3–5 is out, the power flow in line 3–6 is 103.70%; and (e) when line 5–6 is out, the power flow in line 3–6 is 109.42%. Indeed, the ranking of overloading contingencies is the following: (1) line 1–4, (2) line 2–6, (3) line 2–4, (4) line 5–6 and (5) line 3–5. A minimal sketch of this screening procedure is given below.
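Assuming the pre-contingency flows and the LODF matrix of Section 2.5 are available, a sketch of this screening could read (names are ours):

```python
import numpy as np

def overload_ranking(f_pre, LODF, f_max):
    """Rank single-line outages by their worst post-contingency loading (%)."""
    ranking = []
    for k in range(len(f_pre)):                   # outage of line k
        f_post = f_pre + LODF[:, k] * f_pre[k]    # redistributed flows
        f_post[k] = 0.0                           # the outaged line carries nothing
        loading = 100.0 * np.abs(f_post) / f_max
        if loading.max() > 100.0:                 # keep only overloading outages
            ranking.append((k, loading.max()))
    return sorted(ranking, key=lambda t: -t[1])
```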

The outage of any of these transmission elements produces a risky operational condition and possibly technical problems in adequately supplying the customer load. In the next analysis, each line of this ranking is added to the SCOPF problem. Table 4 displays the number of decision variables and constraints and the objective function when solving the SCOPF problem.


**Table 4.** Security-constrained operational cost considering the ranking of contingencies.

• For the first case, the pre-contingency and the worst contingency (line 1–4) constraints are simultaneously incorporated in the SCOPF problem.

Solving the SCOPF problem, the post-contingency power unit solution is used to validate the electrical system security using eleven N−1 power flow problems. Simulation results show three overloading conditions: (a) when transmission line 2–6 is out, the power flow in line 3–6 is 51.18 MW; (b) when transmission line 3–5 is out, the power flow in line 3–6 is 42.74 MW; and (c) when transmission line 5–6 is out, the power flow in line 3–6 is 42.11 MW. Notice that the maximum power flow in line 3–6 is 40 MW. Therefore, an SCOPF problem based only on the worst contingency does not guarantee a safe post-contingency operational point.


Finally, the post-contingency analysis including two N−1 candidate lines is developed. The SCOPF problem incorporates the two major contingencies: line 1–4 and line 2–6. This framework simultaneously adds one set of pre-contingency constraints and two sets of post-contingency constraints to the optimization problem. For this optimization problem, the achieved solution is optimal (the same solution as when considering the outage of all lines).

Table 4 also shows a reduction of 35.7% in constraints with respect to the original problem. In addition, both problems have the same number of decision variables.

As a result, the SCOPF problem should be modelled not only with the worst contingency but also with two or more contingencies to avoid risky operational conditions and technical problems in supplying the customer load.

#### *3.3. SCOPF Methodology Applied to Different-Scale Power Systems*

We employ different electrical power test systems to determine the performance of both the corrective and the preventive SCOPF formulations using the SF-based as well as the CL-based frameworks: the 14-bus system (five generators and twenty transmission lines); the 57-bus system (seven generators and eighty transmission lines); the 118-bus system (fifty-four generators and one hundred eighty-six transmission lines); and the 300-bus system (sixty-nine generators and four hundred eleven transmission lines). Technical data are available in PyPower.

#### 3.3.1. Corrective and Preventive Formulations

Because PyPower does not include ramp generation constraints, we select these limits using random numbers drawn from a normal distribution with a mean of 7 MW/min and a variance of 1; a minimal sketch of this sampling is given below.
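The sampling itself is a one-liner in NumPy (the seed and the generator count are ours, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=1234)   # seed is ours, for reproducibility only
n_generators = 54                        # e.g., the 118-bus system
# Ramp limits in MW/min drawn from N(mean = 7, variance = 1).
ramp_limits = rng.normal(loc=7.0, scale=1.0, size=n_generators)
```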

Table 5 shows simulation results for both corrective and preventive analyses applying the CL and the SF formulations. For each SC problem, we have included the objective function (*OF*), the number of decision variables (*n*), constraints (*ni* + *ne*) and the average simulation time considering 200 trials.

For both 14-bus and 57-bus power systems, the security-constrained problem models the outage of all transmission lines.

Considering the higher simulation time required by a complete N−1 analysis, a reduced number of contingencies should be used to solve the security problem for large-scale power systems. In the 118-bus and the 300-bus systems, we model the N−1 analysis using 176 and 100 transmission candidate lines, respectively. This choice is motivated not only by the larger number of variables and constraints but also by the higher simulation time.


**Table 5.** Simulation results using different power systems.

For all power systems, there is an important improvement in the simulation time. Comparing the classical and the SF-based formulations applied to the 300-bus system, the SF-based formulation achieves the best performance, with a speed-up of 6.97 times for the corrective formulation and 7.02 times for the preventive formulation.

Simulation results show that the corrective formulation obtains a lower *OF* in comparison with the preventive formulation.

Notice that obtaining the reactance bus matrix requires a matrix inversion. Therefore, the main disadvantage of the proposed methodology is the computation time needed to obtain the SF matrix. However, the security-constrained analysis can be carried out using an off-line SF matrix; that is, the SF matrix does not need to be computed for each N−1 analysis. Therefore, an external file can be used to save these matrices, improving the formulation time, as sketched below.
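A sketch of this off-line caching with NumPy (the file layout and all names are ours):

```python
import numpy as np

def save_sf_matrices(path, SF_pre, SF_post_list):
    """Off-line stage: store the SF matrices computed once (e.g., with the
    shift_factors sketch of Section 2.3)."""
    np.savez_compressed(path, SF_pre=SF_pre,
                        **{f"SF_post_{k}": M for k, M in enumerate(SF_post_list)})

def load_sf_matrices(path):
    """On-line stage: reload the matrices instead of re-inverting Bbus."""
    data = np.load(path)
    post_keys = sorted((k for k in data.files if k.startswith("SF_post_")),
                       key=lambda k: int(k.rsplit("_", 1)[1]))
    return data["SF_pre"], [data[k] for k in post_keys]
```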

The LODF matrix can be computed using basic mathematical operations (Equation (37)). Therefore, its computation time is not significant.

3.3.2. Another Classical DC-Based Preventive Formulation Applied to the 300-Bus Power System

In the following analysis, we use a different preventive SCOPF formulation. In this approach, the power flow variables are eliminated from the optimization problem using two admittance bus matrices (*Bbuspre* and *Bbuspost*). Consequently, there are only two sets of nodal balance constraints, for the pre- and post-contingency analyses. Besides, the primitive-admittance incidence matrix for the pre-contingency (*B f pre*) and the post-contingency (*B f post*) conditions is substituted to determine the transmission power flows.

Simulating 200 trials, the average simulation time is 0.7648 s for the preventive SCOPF problem, which is 19.22% lower than with the original CL-based formulation. For this mathematical problem, the number of decision variables is 30,369 and the number of constraints is 113,361. Therefore, the new classical formulation yields a reduction of 57.69% in the number of decision variables and of 26.76% in the number of equality and inequality constraints. Although there is an improvement in the simulation performance, this time is still 5.67 times higher than that of the SF-based formulation.

Based on the simulations, the introduced SF-based formulation achieves the best simulation time for both the corrective and the preventive SCOPF formulations. This improvement stems from the lower number of decision variables as well as of equality and inequality constraints. Therefore, this framework can be recommended for solving very large-scale power systems.

#### *3.4. Corrective and Preventive SCOPF Methodology Applied to the Chilean Electrical Power System*

The Chilean National Electric Power System (in Spanish, SEN: Sistema Eléctrico Nacional) is used to validate both SCOPF formulations. Given the lower simulation times achieved previously, only the SF-based methodology is applied to the SCOPF problem.

In January 2019, the SEN generation capacity was 21,189.95 MW, and the 2019 peak load was 9382.7 MW. See the regulator's webpage (CNE: www.cne.cl) or the ISO webpage (CEN: www.coordinador.cl) for more technical information.

The electrical power system contains one hundred fifty-nine buses, three hundred twenty-one transmission lines from 110 to 500 kV and two hundred sixty-seven generation power units (thermal, hydro and renewable energy). The webpage link https://infotecnica.coordinador.cl displays detailed technical and economic information about the Chilean study case.

For the power generation system, there are one hundred three renewable power units. For hydro, wind and solar units, capacity factors of 20%, 20% and 30% are selected, respectively. For the thermal units, random numbers model the ramp generation constraints of each power unit. Additionally, one hundred transmission lines are randomly selected to formulate the N−1 security-constrained analysis.


Even though the security cost is similar, the OPF formulation only considers one load time-period. Moreover, the unit commitment solution obtained by the ISO should be incorporated in the OPF problem to assess realistic cost savings.

#### **4. Conclusions**

This study analyzed two very effective methodologies to solve both corrective and preventive security-constrained DC optimal power flow problems using shift-factors. We used these factors to formulate the pre- and post-contingency steady-state conditions. The optimal solution was found with a lower number of decision variables and constraints in comparison with the traditional (CL) SCOPF formulation. To validate the simulation time, extensive analyses were conducted to determine which methodology obtained the best performance in solving the SCOPF problem with different-scale test power systems.

The research group believes that the introduced security-constrained methodology could be applied to formulate deterministic and stochastic power system problems where the transmission topology does not change, for instance, unit commitment and generation planning, among other power system problems.

**Funding:** This research was funded by Universidad Técnica Federico Santa María, Chile, grant number USM PI\_M\_18\_14, and the APC was funded by the same research project.

**Acknowledgments:** The author would like to thank the associate editor and the anonymous reviewers for their valuable comments.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Nomenclature**

#### Parameters


#### SETS


#### VARIABLES


#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **New High-Efficiency Resonant O-Type Devices as the Promising Sources of Microwave Power**

**Andrei Baikov 1,\* and Olga Baikova 2,3**


Received: 9 April 2020; Accepted: 11 May 2020; Published: 15 May 2020

**Abstract:** New O-type high-power vacuum resonant microwave devices are considered in this study: COM klystrons, CSM klystrons and resotrodes. All these devices can output a large amount of power (up to several megawatts and higher) with an efficiency of up to 90%. Such devices are promising microwave sources for industrial microwave technologies as well as for microwave energy. The principle of GSP-equivalence for klystrons is described herein, allowing a complete physical analog of this device with other parameters to be created. The existing mathematical and computer models of klystrons are analyzed. The processes of stage-by-stage optimization and the embedding procedure, which leads to COM and to CSM klystrons, are considered. Resotrodes, IOT-type devices with energy regeneration in the input circuit, are also considered. It is shown that these devices can combine high power with an efficiency of up to 90% and a gain of more than 30 dB. Resotrodes with 0-regeneration can be effective sources of radio frequency (RF) power in the range of 20 to 200 MHz. Resotrodes with 2π-regeneration are an effective source of RF/microwave energy in the range of 200 MHz to 1000 MHz.

**Keywords:** klystron; IOT; resotrode; efficiency; microwave; bunching; computer simulation; global optimization

#### **1. Introduction**

Powerful vacuum microwave devices have existed and been developed since the 1930s. From their introduction until recently, such devices have mainly been used for radar and communication. Other applications of one of these devices (the magnetron) are defrosting food and cooking in household microwave ovens.

Some applications of vacuum microwave devices have been discovered relatively recently. In the 1960s, charged particle accelerators appeared that used microwave fields for acceleration. In the 1970s, medical microwave devices began to appear, including medical accelerators, microwave scalpels, among others.

Recently, industrial technologies using microwave energy have begun to be developed. These include, for example, technologies for defrosting large volumes of products in defrosters [1], for the radiation sterilization of products and materials, for sintering ceramics, for drying wood [2], and so on.

The development potential of industrial microwave technologies is very large. For example, several years ago, successful experiments were conducted to produce artificial sandstone from crushed sand. This is a building material that can compete with concrete in mass production. Foam glass is another building material that can be efficiently produced using microwave technology. There is also the possibility of deep oil refining using microwave technologies.

Another area with great potential for development is microwave energetics. This is a set of technologies utilizing microwave radiation for wireless energy transmission. Energy can be transferred to hard-to-reach places, to mountainous areas, to remote islands, to polar stations, and so on. In addition, energy can be transferred from the earth to aircraft and spacecraft. In the long term, the reverse process is possible via the transfer of energy to Earth from solar panels located in space.

Classification of existing high-power vacuum electronic devices (Figure 1), amplifiers and generators of microwave power, can be carried out by taking two basic characteristics into account: the type of interaction of the microwave field with the electronic flow (O-type, M-type, gyrotron type) and the nature of the microwave vibrations (traveling wave or standing wave). As examples, Figure 1 shows the most typical representatives of each class.

**Figure 1.** Classification of high-power vacuum electronic devices.

In contrast to devices with a traveling wave, the interactions of electron beams with microwave fields in O-type resonant devices are local and realized in the gaps of resonators, whose dimensions are significantly smaller than the wavelength.

O-type resonant devices can be divided into two classes: devices with velocity modulation (klystron type) and devices with emission modulation (IOT type). The first class includes various modifications of klystrons, while devices of the second class have several names: klystrodes, inductive output tubes (IOTs), tristrodes, among others.

Figure 2 represents the simplified schemes of standard klystrons and IOTs.

Klystrons and IOTs are powerful microwave amplifiers (i.e., they transform a small input microwave signal into a large output signal). The output signal receives energy from a DC source or a pulsed electrical power source (modulator). The electron beam is accelerated by the voltage of the electrical source, then bunched and decelerated by the microwave field. While slowing down, the electron beam gives up its kinetic energy to the microwave field. This process can be considered the transformation of DC or pulsed power into microwave power, and therefore klystrons and IOTs can be considered microwave power sources. Klystrons and IOTs are currently widely used for powering particle accelerators and in medical equipment, but their promotion in the field of mass industrial technologies and energy is hindered by their lack of efficiency.

**Figure 2.** (**a**) Simplified schemes of klystrons and (**b**) IOTs.

Recently, several publications have proposed and investigated new variants of O-type resonant devices based on computer models: COM klystrons [3–9], CSM klystrons [9–12] and resotrodes [13–17]. It has been shown that the efficiency of such devices can reach 90%.

Tables 1 and 2 compare the main characteristics of the standard and new devices.

We use the term "standard length" instead of a specific numeric value, since the length depends on the operating frequency, current and voltage. If the values of these parameters are set, the "standard length" becomes a specific number.

A large length is a problem for frequencies under 1000 MHz (the standard length is about 2 m or more in these cases). Thus, devices can be very large and expensive. COM klystrons are 60% longer than the standard length (see Table 1), meaning that they are even longer and even more expensive. As the operating frequency increases, the standard length decreases proportionally, reaching values of 15–20 cm at 10 GHz. For such high frequencies, the increased length of COM klystrons is an advantage, since it allows the problem of heat removal to be solved.

The main disadvantage of CSM klystrons is their more complex design, which includes cavities of the second and third harmonics. This disadvantage becomes more significant with increases in work frequency.

Thus, in the lower part of the microwave range (up to about 5 GHz), it is advisable that CSM klystrons be used, while COM klystrons are optimal for higher frequencies.

In general, the main disadvantages of COM and CSM klystrons are their larger dimensions (for COM), complicated designs, and increased costs. The main advantage of these klystrons is their heightened efficiency.

Unlike IOTs, resotrodes use the regeneration of microwave power in the input circuit. This inevitably narrows the gain band. Therefore, the main drawback of resotrodes is the inability to use them as broadband amplifiers (Table 2). When a resotrode is used as a source of microwave power, this drawback does not manifest itself. The main advantage of resotrodes in comparison with IOTs is the ability to provide high gain at high output power with high efficiency.

The purpose of this paper is to review the latest achievements in the field of modeling and to research new O-type resonant microwave devices that could be used for the development of industrial microwave technologies and microwave energetics.


**Table 1.** Comparison of standard klystrons with COM and CSM klystrons.

**Table 2.** Comparison of standard IOTs with resotrodes.

| | IOT | Resotrode with 0-regeneration | Resotrode with 2π-regeneration |
|---|---|---|---|
| Harmonic cavities | No | No | 2nd or 3rd harmonic cavity |
| Broadband amplifier | Yes | No | No |
| Work frequency | 400–1000 MHz | 40–200 MHz | 200–1000 MHz |
| Gain | <20 dB | 35 dB and more | 35 dB and more |
| Efficiency | <70% | up to 90% | up to 90% |

#### **2. Efficiency of Resonant Microwave Devices of O-Type**

Both in industrial microwave technologies and in microwave power engineering, it is essential to achieve high efficiency values.

For example, in power systems, no more than 10–12% of the energy is lost during transportation and transmission (i.e., the total efficiency of power systems is about 90%). When implementing microwave power systems, the efficiency must likewise be at least 90%, which means that the efficiency of the devices and subsystems included in such systems must be at least 90%.

In industrial microwave technologies, an increase in efficiency from 60% to 90% (1.5 times) will reduce the cost of the electricity consumed by the same factor of 1.5. This can turn some promising, but currently unprofitable, technologies into cost-effective ones. At the same time, the power of the additional cooling equipment will be reduced by four times, which will result in additional savings.

For all these applications, high efficiency must be combined with high power output.

Efficiency is one of the main output parameters of the device and is defined as the ratio of useful to consumed power:

$$\eta = \frac{P\_{\text{useful}}}{P\_{\text{supply}}} \tag{1}$$

Depending on the interpretation of the concepts of "useful power" and "consumed power", different definitions of efficiency are possible. In Equation (1), *Puseful* refers to the microwave power transmitted to the load, and *Psupply* refers to the power of the supply spent on beam acceleration. This definition corresponds to the term "load efficiency".

In this study we also use the term "electronic efficiency", in which *Psupply* is defined as above, while *Puseful* refers to the microwave energy transmitted by the electron beam to the microwave field in the output cavity. Electronic efficiency is always higher than load efficiency. In resonant microwave devices of the O-type, as a rule, these two values differ by less than 1%, but sometimes the difference can be several percent.

For a long time during the development of O-type resonant microwave devices, efficiency was given less priority compared with other output parameters such as output power and bandwidth. The 2005 SLAC report [18] showed a graph of the increase in the pulsed power of klystrons, revealing that the power grew exponentially over the period from 1940 to 2000, reaching a level of 1 GW by 2000. This process is still ongoing.

As for efficiency, there have been no similar dynamics in improving this parameter. A multi-cavity klystron efficiency of 50% was achieved at the turn of the 1950s–60s, and since then no significant increase in the average efficiency of manufactured klystrons has been achieved.

During the second half of the 20th century, several single klystrons with a fairly high efficiency were created. Among them, we should note the 50 kW CW klystron developed by E. L. Lien in 1970 [19,20], which has an efficiency of 75%, and a klystron with an efficiency of more than 80% (electronic efficiency of about 90%) that was developed by S. V. Lebedinsky in 1979 [21].

Devices with emission modulation have had an efficiency of 50–60% since the beginning of their development in the 1980s. Among these devices, we should note the tristrode [22,23] (an IOT with an additional bunching cavity), created by V. A. Tsarev (RF, NGO "Contact") in 1998. This device had a measured efficiency of about 90%.

These devices were unique, and these results have not been repeated by anyone until now.

Most of the currently manufactured klystrons do not reach an efficiency level of 65%, and only three of the manufactured devices (Figure 3) have an efficiency of about 70%. These are the TH-1801 from Thales, E-3736 from Toshiba, and VKL-8301 from CPI.

**Figure 3.** Modern industrial klystrons with maximum efficiency: TH-1801 (Thales), E-3736 (Toshiba) and VKL-8301 (CPI).

#### **3. Klystron Parameters and the GSP Equivalence Principle**

Before we talk about the possibility of increasing the efficiency of klystrons, we will analyze the main parameters that define the device design and power mode.

The set of these parameters can be divided into two groups, which we will call "group A" and "group B".

Group A includes parameters that are selected a priori based on the technical requirements for the device and general physical considerations. These are the main frequency *f*0, the accelerating voltage *U*0, the total current *I*0, the number of electron beams *Nb*, the number of cavities *n*, the working harmonics, the gap values *lg*, the R/Q factors of the cavities ρ, the proper *Q*-factors of the cavities *Q*0*i*, the beam radii *rb*, the channel radii *rc*, among others. The index *i* denotes the number of the cavity or the stage. If all the parameters of group A are set, we will say that the basic design of the device is set.

Group B consists of all other parameters of the device, which include the natural frequencies *fi* and the loaded *Q*-factors of the cavities *Qi*, the length of the drifts *lci* and the input power *Pin.* Group B can be narrowed if the values of some of the listed parameters must be fixed in accordance with technical requirements. For example, in a narrow-band klystron, the *Q*-factors of the intermediate cavities can be set equal to their own *Q*-factors, and, accordingly, they will not be included in group B.

If complex electrodynamic systems and multi-band cavities are used in klystron stages, the number of parameters will increase.

A complete set of input parameters of groups A and B that are enough for the development of the device will be called the complete set of parameters.

We will now determine the number of optimization parameters for an *n*-stage klystron consisting only of standard stages (each with a single single-gap cavity that is excited at only one frequency). The parameters of group B are *n* detunings, *n* *Q*-factors (including the loaded *Q*-factors of the input and output cavities), *n*−1 drifts and the input power. As such, there are 3*n* parameters. Some of these parameters may not be included in the optimization or, accordingly, in group B. For example, if all the intermediate cavities are detuned far beyond the band, then their loaded *Q*-factors can be set equal to their own *Q*-factors and fixed. The number of parameters excluded from optimization will not exceed *n*−2, since the loaded *Q*-factors of the input and output cavities should remain in the optimization. Thus, the number of parameters in group B for the *n*-cavity klystron is in the range from 2*n*+2 to 3*n*. That is, from 12 to 15 parameters are obtained for the five-cavity klystron, and from 16 to 21 parameters for the seven-cavity klystron.
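This counting rule is easy to verify in code; a short sketch (the function name is ours):

```python
def group_b_parameter_range(n):
    """Number of group-B optimization parameters for an n-cavity klystron
    built from standard stages: n detunings + n Q-factors + (n - 1) drifts
    + input power = 3n in total; at most n - 2 intermediate Q-factors can
    be fixed, leaving at least 2n + 2 free parameters."""
    return 2 * n + 2, 3 * n

print(group_b_parameter_range(5))   # (12, 15) for a five-cavity klystron
print(group_b_parameter_range(7))   # (16, 21) for a seven-cavity klystron
```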

The number of parameters is reduced after reducing them to a dimensionless form.

When writing formulas, we will use two types of normalization, the first of which is the fundamental normalization [24], which is obtained by dividing physical quantities by the quantities of the same dimension made up of fundamental physical constants: the charge of the electron, the mass of the electron, the speed of light and the dielectric constant. Values for this normalization are shown in Table 3.


**Table 3.** Values for fundamental normalization.

We will denote fundamental-normalized (f. n.) values by underlining them. For example, $\underline{U} = U/U_e$ (f. n. voltage), $\underline{I} = I/I_e$ (f. n. current), $\underline{v} = v/c$ (f. n. speed), and so on.

The second, "functional" normalization is obtained by dividing physical quantities by quantities of the same dimension, which are made up of the main functional parameters of the device: output power, accelerating voltage and main frequency (Table 4).

**Table 4.** Values for functional normalization.


Functionally normalized values will be denoted by two asterisks at the bottom, for example, $f_{**} = f/f_0$, $z_{**} = z/l_{norm}$, $Z_{**} = Z/R_{norm}$, $t_{**} = \omega_0 \cdot t$, etc.

In the process of functional normalization, the parameters of group A turn into dimensionless complexes [25] (similarity criteria), the number of which is three less than the number of initial parameters.

The first of these dimensionless complexes is the relativistic perveance assigned to a single beam of a multi-beam (MB) klystron:

$$\underline{K}_{rel} = \frac{\underline{P}_0}{\underline{U}_0^{5/2} \left( 1 + \frac{\underline{U}_0}{2} \right)^{3/2} N_b} \tag{2}$$


In addition, dimensionless complexes are the normalized length of the gap (the angle of passage through the gap),

$$\Theta = \frac{\omega_0 \, l_g}{c \sqrt{1 - \frac{1}{\left( 1 + \underline{U}_0 \right)^2}}} \tag{3}$$

the gap form factor,

$$
\mu = \frac{l\_{\text{g}}}{2r\_{\text{c}}} \tag{4}
$$

the coefficient of filling the channel with a beam,

$$\alpha = \frac{r\_b}{r\_c} \tag{5}$$

and the excitation parameter [24], which has the meaning of the "natural" bandwidth of the device,

$$\nu = \frac{\rho \cdot P_0}{U_0^2} \tag{6}$$

The parameters of group "B" are converted to dimensionless lengths of drifts,

$$l_{ci\,**} = \frac{l_{ci}}{l_{norm}} \tag{7}$$

to the relative detuning of cavities,

$$
\delta_{fi} = \frac{f_i - f_0}{f_0} \tag{8}
$$

and to the normalized input power,

$$P_{in\,**} = \frac{P_{in}}{P_0} \tag{9}$$

The number of parameters in group B does not change during the normalization process.

The general scaling principle (GSP; the reading "global scaling principle" has also been used in publications [25]) is the statement that two klystrons with the same values of dimensionless Parameters (2)–(9) are physically equivalent. All processes of the grouping and selection of energy occur in these two devices in the same way. These two devices will have the same dimensionless output characteristics, including the same efficiency. At the same time, such klystrons can differ greatly in the level of output power, the number of beams, the accelerating voltage, the operating frequency and other basic dimensional characteristics.

The GSP principle splits the set of all existing klystrons and all klystrons that may be developed in the future into equivalence classes. Each equivalence class is defined by five parameters (Parameters (2)–(6)).

Note that of Parameters (2)–(6), only Parameters (2) and (6) are free. Although optimization over Parameters (3)–(5) is not usually performed, their values are chosen close to the known conditionally optimal values μ ≈ 1, α ≈ 0.5 and Θ ≈ 1.2–1.4, and do not deviate significantly from these values, since such a deviation leads to a deliberately suboptimal construction. Thus, Parameters (3)–(5) can be considered fixed.
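The similarity criteria (2)–(6) are straightforward to evaluate numerically. The sketch below assumes the fundamental normalization constants $U_e = m_e c^2/e \approx 511$ kV and $I_e = 4\pi\varepsilon_0 m_e c^3/e \approx 17.05$ kA (Table 3 is not reproduced in this excerpt, so these constants, like all names here, are our assumption):

```python
import math

U_E = 0.511e6   # assumed f.n. voltage m_e*c^2/e, volts
I_E = 17.05e3   # assumed f.n. current 4*pi*eps0*m_e*c^3/e, amperes
C = 2.998e8     # speed of light, m/s

def similarity_criteria(P0, U0, f0, lg, rb, rc, rho, Nb=1):
    """Dimensionless complexes (2)-(6) for a klystron design.
    P0 [W], U0 [V], f0 [Hz], lg/rb/rc [m], rho = R/Q [ohm], Nb beams."""
    U = U0 / U_E                                    # f.n. accelerating voltage
    P = P0 / (U_E * I_E)                            # f.n. beam power (assumed scale)
    K_rel = P / (U**2.5 * (1 + U / 2)**1.5 * Nb)    # (2) relativistic perveance
    beta = math.sqrt(1 - 1 / (1 + U)**2)            # beam velocity over c
    theta = 2 * math.pi * f0 * lg / (C * beta)      # (3) gap transit angle
    mu = lg / (2 * rc)                              # (4) gap form factor
    alpha = rb / rc                                 # (5) channel filling coefficient
    nu = rho * P0 / U0**2                           # (6) excitation parameter
    return K_rel, theta, mu, alpha, nu

# Two designs are GSP-equivalent if these five values (and the normalized
# group-B parameters) coincide.
```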

Consider an equivalence class given by two parameters *Krel* and ν, assuming that each of the parameters μ, α and Θ is in some reasonable range near the optimal value. This class of klystron equivalence will be called the *K* − ν class.

Each *K* − ν class is represented by a dot on the *K* − ν diagram.

In Figure 4, klystrons that are currently produced by various manufacturers are divided into three groups according to level of efficiency, and are marked on two *K* − ν diagrams. In the right-hand diagram, the abscissa axis is replaced by the normal perveance $K = \frac{P_0}{U_0^{5/2} \cdot N_b}$ instead of the relativistic perveance (Parameter (2)).

**Figure 4.** Produced klystrons: (**a**) in the coordinates (*Krel*, ν); (**b**) in the coordinates (*K*, ν), divided into three groups by efficiency level: ♦ η < 40%, η = 40–60%, • η > 60%.

The sample includes devices of different frequency ranges (from 400 MHz to 30 GHz), different levels of output power (from 3 kW to 80 MW) and different numbers of beams (from 1 to 42).

All klystrons are divided into three groups: high efficiency (60–70%), medium efficiency (40–60%) and low efficiency (less than 40%).

Klystrons with low and medium efficiency are distributed evenly across the diagram, while klystrons with high efficiency are concentrated in the region of relativistic perveance 0.2–0.3. This range is optimal in terms of efficiency.

The GSP principle makes it easier to develop new high-performance klystrons if their analogues already belong to the same class. To do this, it is enough to recalculate all the parameters, leaving the dimensionless values (Parameters (2)–(9)) constant. In this case, GSP modeling avoids a long, complex and expensive process of computer optimization of the device. The method of such recalculation is described in [24]. The limitations of using GSP modeling are also discussed there.

#### **4. Methods of Klystron Modeling and Optimization**

If there are no suitable GSP analogues (and for efficiency above 70% there are no analogues at all), it is necessary to optimize the parameters of group B using a mathematical model implemented in the form of a computer code.

Currently, developers use several computer codes that simulate the operation of klystrons. The most appropriate codes are based on particle-in-cell (PIC) models [26], which were initially developed for modeling processes in plasmas and later adapted for electrovacuum devices.

Klystron developers actively use two universal PIC packages: the MAGIC code [27], developed by the American firm Orbital ATK, and the CST Studio Suite code [28], produced by the German company CST.

However, it should be noted that universal PIC codes do not take the specifics of microwave devices into account, so the calculation process with them is extremely expensive. On an office PC, the calculation of one specific device variant with fixed frequency and input power values takes a day or more, so in practice such calculations can only be performed on high-performance supercomputers. At the same time, the computing process itself not only requires a lot of resources, but also a lot of data preparation, and any change to the data is associated with significant difficulties. For example, in order to change a cavity's detuning, it is necessary to repeatedly change its size, select a new frequency value, rebuild the computational grid, and only then perform the calculation process with the new frequency. This complexity of changing data makes it impossible to optimize the parameters of devices using universal PIC codes, even on supercomputers.

Note that, despite the huge computational complexity of universal PIC codes, their use does not guarantee a reliable result. Although the methodological error in such codes can be reduced practically to zero by refining the computational grid and increasing the number of particles, the same cannot be said about the computational error, which, on the contrary, often begins to increase under these conditions. The computational error manifests itself as "numerical noise", the presence of which is usually estimated visually. At the same time, there are neither sufficiently reliable mechanisms for recognizing computational noise nor universal ways to suppress it. Usually such noise is suppressed by heuristic manipulations such as changing boundary conditions (deteriorating wall conductivity), adding virtual absorbers, and so on. Obviously, such actions change the model of the system and, consequently, degrade the adequacy of modeling the original system.

However, universal non-stationary 2D/3D PIC codes are currently the main recognized criterion for the accuracy of results obtained by other computational methods.

In addition to universal PIC codes, quite a lot of special computing codes have been developed that are adapted to microwave devices of one type or another.

Let us look at the features of such codes using the example of the TESLA code, created by a group of specialists from several scientific organizations in the United States. As follows from the description in [29], the TESLA code separates the microwave fields of the cavities from the fields (Coulomb and microwave) in the region of beam motion. The stitching of these fields in the gap area occurs on the continuation of the inner surface of the drift tube. This separation corresponds to the specifics of the klystron and makes it possible to separate the problems of electrodynamics (modeling of the microwave fields in the cavities) and electronics (beam bunching, interaction of the beam with the microwave fields). This, in turn, allows different scales of computational grids to be used in the cavity region and in the interaction region, leading to a significant reduction in the number of particles and a consequent reduction in calculation time. The calculation time for one variant with the TESLA code is 10–20 min on a personal computer. This is too long for optimization on a PC, but on multiprocessor supercomputers optimization can be performed with this code. It should be noted, however, that due to the simplifying assumptions used in constructing the model, the question of the methodological error of the results remains open. For example, in the results given in [29], the discrepancy between TESLA and MAGIC appears significant.

One of the latest adapted codes is the KlyC/1.5D code (CERN, Geneva, Switzerland) [30], developed by I. Syratchev and J. Cai. This code models the operation of a klystron in the 1D and 1.5D approximations (one-dimensional motion with beam layering). According to the authors, the difference between the results of KlyC-1.5D and CST is no more than 1%. The calculation time for a single variant using the KlyC-1.5D code with seven layers depends strongly on the device being modeled, but is usually at least 30–40 min on a PC. This is too long for full multiparametric optimization, which requires tens or even hundreds of thousands of calculations [31].

In order to combine high performance with high adequacy, a new class of klystron models was developed called "discrete-analytical models" [32].

Discrete-analytical models appeared as a reasonable compromise between analytical and numerical approaches in the construction of a mathematical model of the device. This compromise is achieved by splitting the structure of the device or processes occurring in it into conditionally autonomous parts, building models for these parts in the form of analytical dependencies and coupling these dependencies.

Thus, the core of the mathematical model is a set of analytical dependencies, while the rest of the model structure, including the method of coupling these dependencies, is formed on the basis of numerical algorithms.

The numerical algorithms can also be used in the process of forming the core, to determine the numerical parameters that are included in the core of the model. In this case, the time saving is achieved because the process of numerically determining the parameters (the "learning" process) is separated from the model operation process.

In addition to the speed and adequacy of the model, the optimization method itself is an important factor for successful optimization of klystron parameters.

The main feature of any klystron goal function constructed on the basis of efficiency is the existence of a multidimensional "quasi-optimal" manifold of a priori unknown structure, characterized by a weak and non-monotonic change of the objective function and a very large (10<sup>12</sup> or more) number of local extrema [31].

As one of the simplest examples of movement along such a manifold, we can cite the process of tuning the klystron input cavity while increasing the input power so that the amplitude of the microwave voltage in the input gap does not change. In this case, the efficiency of the klystron will change due to changes in the phase of the input voltage, but very slightly and non-monotonically. The trajectory of changes in the detuning and the input power in the parameter space forms a curve, a one-dimensional quasi-optimal manifold. If you add the loaded *Q*-factor to these two parameters and change all three parameters at once so that the microwave amplitude in the input gap does not change, you get a two-dimensional quasi-optimal manifold. As the number of parameters increases, there are more and more opportunities to change them without significantly changing the efficiency, but it is very difficult to find all such variations a priori (i.e., the structure and dimension of a quasi-optimal manifold are unknown in advance).

This structure of the target function makes it useless to apply any optimization method aimed at finding a single local extremum, which explains the fact that most researchers using optimization have achieved only local results, without obtaining a procedure that would allow them to approach the global extremum with a guarantee (even over a very long time).

To overcome these difficulties, a method of macro-steps was developed [31] that includes probing a relatively large working area in the parameter space, finding the local extrema closest to the best points obtained as a result of probing, selecting one of these local extrema and moving to a new working area. The macro-steps method allows for multi-parameter optimization and provides hope for reaching the global maximum of the goal function. To increase the guarantee of reaching the global extremum, the macro-steps method must be supplemented with an appropriate method of structural optimization, as will be discussed in more detail later. A minimal sketch of one macro-step is shown below.
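The following Python sketch shows the skeleton of one macro-step under stated assumptions: the objective function, the probing density and the Nelder–Mead local search are illustrative stand-ins, not the actual internals of the method in [31]:

```python
import numpy as np
from scipy.optimize import minimize

def macro_step(objective, center, radius, n_probe=200, n_local=5, rng=None):
    """One macro-step: probe a working area, refine the best probes locally,
    and move the working area to the best local optimum found."""
    rng = rng or np.random.default_rng()
    # 1. Probe the current working area with random points.
    probes = center + radius * rng.uniform(-1.0, 1.0, (n_probe, center.size))
    values = np.array([objective(p) for p in probes])
    # 2. Find the local extrema closest to the best probe points.
    starts = probes[np.argsort(values)[:n_local]]
    optima = [minimize(objective, s, method="Nelder-Mead").x for s in starts]
    # 3. Select the best local extremum as the center of the new working area.
    return min(optima, key=objective)

# Usage: iterate macro_step (minimizing, e.g., the negative efficiency)
# until the best value found stops improving.
```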

The KlypWin computer code was developed based on the discrete-analytical klystron model and the macro-steps method [33]. It takes less than 1 s to calculate one version of the device using KlypWin.

A comparison of the calculation results of the KlypWin and MAGIC codes, conducted by David Constable in 2013–2016 for 10 klystrons synthesized as part of the work of the international High Efficiency Klystron Activity (HEIKA) group [7], is shown in Figures 5–7.

Note that each point computed by the MAGIC code in Figures 6 and 7 requires more than a day of computer time, while a similar point computed by KlypWin requires only 0.3 s. As can be seen from the figures, the difference in results for all 10 devices does not exceed 5%. The results show that KlypWin can be considered an adequate code for modeling klystrons. At the same time, its speed makes it possible to perform global multi-parameter optimization.

**Figure 5.** Results of COM-klystron modeling by codes KlypWin and MAGIC.

**Figure 6.** Comparison of the transfer curves of klystron No. 3 from Figure 5.

**Figure 7.** Comparison of the transfer curves of klystron No. 1 from Figure 5.

#### **5. COM and CSM Klystrons as Promising Power Sources in the Centimeter Range**

Based on the KlypWin code, a large set of studies has been conducted over the past decade on the possibility of achieving the maximum efficiency values in klystrons [4–12,24]. In the course of these studies, it was found that maximum efficiency values were achieved with two bunching modes: the core oscillation method (COM) [4–9] and the core stabilization method (CSM) [9–12].

As was found in the course of research, in order to reliably achieve the global extremum of the target function, the macro-steps method must be supplemented with a suitable structural optimization method. Structural optimization refers to the process of gradually complicating the structure of a device, starting with the simplest structure. After each stage of structural complexity, global optimization is performed using the macro-steps method. The process that includes structural optimization of this type and global optimization by the method of macro-steps is called full optimization [34].

Klystrons with extreme efficiency values are obtained as a result of two possible processes of full optimization.

The first full optimization process is called stage-by-stage optimization [24] and was first described in [4]. The process begins with the formation of a two-cavity klystron based on a given prototype (a set of parameters of group A). This two-cavity klystron is optimized, after which another stage is added to the optimal two-cavity klystron, and it turns into a three-cavity klystron. For the resulting three-cavity klystron, the optimal input power is found, then global optimization is performed for all parameters in a wide range of their changes. The resulting optimal three-cavity klystron is converted into a four-cavity klystron by adding another stage, the global optimization process is started again, and so on. Starting with a three-cavity klystron, adding a new stage is performed by duplicating the second stage. The process stops when, after adding a new stage and performing optimization, the efficiency stops increasing (it saturates).

The stage-by-stage optimization process for a C-band klystron with an output power of 8.5 MW is shown in Figure 8.

**Figure 8.** Stage-by-stage optimization of the klystron.

In addition to efficiency, the figure shows two other key parameters. The first is the saturation of the bunch [5], which reflects the proportion of electrons caught in the bunch at the entrance to the output gap. The second is the lumped bunching length (LBL) [4], which is described in Table 5.

**Table 5.** Bunching dimensionless values.


The left part of Figure 8 shows the change in the bunching process (phase trajectories) as the number of stages increases. The upper-right corner shows the change in efficiency (1), bunch saturation (2) and LBL (3) depending on the number of stages.

In the stage-by-stage optimization process, only standard stages are added, which include a simple single-gap cavity and a single drift. Accordingly, all klystrons obtained during this process have a standard configuration (i.e., they do not contain multi-gap cavities and cavities of higher harmonics).

All klystrons resulting from stage-by-stage optimization are characterized by the same non-monotonic character of bunching.

For the core particles of a bunch, this mode is oscillatory: these particles approach the center of the bunch, then move away from it. For the anti-bunch particles [5], the motion changes from oscillatory to monotonic. It is in this bunching mode that a klystron of the standard configuration reaches the maximum saturation of the bunch at a given frequency.

In turn, the saturation of the bunch is one of the main and most difficult factors to achieve in order to obtain maximum efficiency.

The key parameter that ensures the maximum saturation of the bunch is the LBL, which must be increased 1.6–1.8 times compared to known prototypes to increase the bunch saturation. The length of the device increases accordingly.

The fact that the stage-by-stage optimization of various prototypes that differ significantly from each other leads to the same nature of bunching and to similar values of the LBL and efficiency allows us to speak about the fundamental nature of both the bunching mode itself and the LBL as the main parameter characterizing this bunching mode. This bunching mode was named the core oscillation method (COM) [5].

All COM klystrons synthesized during full stage-by-stage optimization are characterized by efficiency from 85% to 90%, and the result practically does not depend on the prototype used, in particular, on its power. Thus, powerful COM klystrons with high efficiency can be developed for use in industrial microwave technologies and in microwave power engineering.

The second method of full optimization is called the embedding procedure. In this case, the optimal three-cavity COM klystron is selected as the initial variant. Then, the total length is fixed, and new cavities are inserted between the existing ones. As the research showed, inserting only first-harmonic cavities does not increase the efficiency in this case, and the efficiency remains at the level of 70%. To increase the efficiency, it is necessary to alternate first-harmonic cavities with second-harmonic cavities, and to achieve an efficiency of about 90% it is necessary to include at least one third-harmonic cavity. An explanation of this fact is given in [9,24].

Note that second harmonic cavities are used in the development of many klystrons. As a rule, developers have been limited to one such cavity, but in rare cases their number has increased to two or three. In this regard, we can mention the works of I. A. Guzilov, which were devoted to the development of BAK-bunching [35].

We could not find information about the use of third-harmonic cavities in commercially produced klystrons. High-efficiency klystron models based on third-harmonic cavities were constructed in [36,37] by Chiara Marelli (ESS) and Igor Syratchev (CERN).

The results of the embedding procedure differ significantly from COM bunching. There are no oscillations in the core of the bunch; instead, the core stabilizes and becomes monospeed. Peripheral electrons acquire a high relative velocity due to the voltage harmonics, which is quickly extinguished as these electrons approach the core of the bunch. After their velocity is extinguished, these electrons replenish the stabilized core of the bunch. This process is called the core stabilization method (CSM). It provides a high saturation of the bunch, but over a much shorter length than COM. At the same time, the total number of cavities increases in comparison with COM klystrons: the optimal COM klystron has seven to eight cavities, while in the optimal CSM klystron the number of cavities increases to 10 or 11. The problem of arranging the cavities is nevertheless solvable, because the second- and third-harmonic cavities are relatively small.

The results of the synthesis of a CSM klystron at a frequency of 800 MHz and a pulse power of 20 MW are shown in Figure 9. The numbers of the working harmonics of the cavities are indicated above the diagram of phase trajectories.

**Figure 9.** Result of the synthesis of a CSM klystron by the embedding procedure.

Another high-efficiency bunching mode is possible, which combines the characteristics of the COM and CSM modes and makes it possible to realize an efficiency of about 90%. This mode is called COM2. It is implemented only when an additional second-harmonic signal is applied to the input. The klystron buncher in this case contains either two-frequency cavities operating on both the first and second harmonics, or alternating cavities of the first and second harmonics. The LBL is shorter than that of COM klystrons, but longer than that of CSM klystrons. COM2 klystrons are described in more detail in [24].

Some synthesized COM and CSM klystrons are shown in Figure 10, where the numbering of the prototypes from [24] is used. As can be seen from the *K*–ν diagram, high efficiency values are obtained for klystrons with a beam perveance from 0.2 to 0.4. The synthesized klystrons do not follow the empirical dependence [38] of the maximum efficiency on the perveance, shown by the inclined straight line in the right part of Figure 10.

**Figure 10.** COM and CSM klystrons in *K-*ν (right) and *K-*η diagrams.

COM and CSM klystrons with an efficiency of about 90% can be considered as promising sources of microwave power for industrial applications and for microwave energetics.

#### **6. Resotrode with 0-Regeneration**

In devices with emission modulation (IOTs), it is possible to realize sufficiently high efficiency values by reducing the length of the bunch (the cutoff angle). This reduces both the output power and the gain.

A new class of devices called resotrodes can achieve high efficiency with high output power and a high gain value [13]. These devices are sources of RF/microwave power for the upper RF range (20–200 MHz), for the lower microwave range (300–1000 MHz) and for the boundaries of RF/microwave bands (200–300 MHz).

In devices with emission modulation, the electronic gun and input cavity are combined within the same design.

There are three possible variants of this combination, as shown in Figure 11.

**Figure 11.** Types of near-cathode parts of O-type resonant microwave devices with emission modulation: (**a**) single-beam grid design, (**b**) multi-beam (MB) grid design, (**c**) multi-beam grid-free design.

Classical IOTs use variant (a), multi-beam IOTs use variant (b).

One of the disadvantages of IOTs is the low gain, which does not usually exceed 20 dB. Another disadvantage is the need to place the control grid very close to the cathode to obtain even this gain, which significantly complicates the technology and reduces the reliability of the device. This problem is particularly significant in multi-beam devices (MB IOTs), since the same cathode–grid distance of 0.2 mm or less must be achieved for all cathodes. Moreover, this same small distance must be maintained in the "hot mode", taking thermal expansion into account.

Most commercially produced IOT devices have a single-beam grid design (Figure 11a). The first multi-beam grid structures appeared in the late 1990s (Figure 11b).

The constructions in Figure 11c, as far as is known, have not been used for the development of industrial IOT designs. This is because, to obtain a sufficiently high efficiency, the ratio of the absolute value of the grid voltage to the anode voltage in this design cannot be less than about 1/4. Thus, the ratio of the amplitude of the input signal to the amplitude of the output signal must also be about 1/4, which leads to a very low gain (5–6 dB).

For relatively low frequencies (20–200 MHz), a new device with emission modulation has been proposed, corresponding to the scheme in Figure 11c and using energy regeneration in the input circuit. This device is called a resotrode (the names "resotrod" and "rezotrod" [13] have also been used in publications).

An essential feature of the resotrode that allows the disadvantages of IOTs to be overcome is the ability to implement a high gain at a relatively high value of the voltage amplitude in the input gap (in the cathode-grid gap).

This possibility arises because the control electrode–anode gap is the second gap of a two-gap input cavity operating on an antiphase type of oscillation. However, the degree of compensation (regeneration) in such a design depends significantly on the cathode–anode angle of passage: the larger the angle of passage, the smaller the regeneration. At frequencies above 200 MHz, the compensation becomes incomplete and drops quickly as the frequency increases.

The general scheme of a resotrode with 0-regeneration (assuming that the angle of passage through a pair of input gaps, including the regenerating gap, is close to zero) is shown in Figure 12.

**Figure 12.** Scheme of a resotrode with 0-regeneration.

The anode–collector gap is the gap of the output cavity. The control electrode is currentless, because the instantaneous dynamic value of its potential is always negative (class-C or class-D tube operation mode).

Thus, the current flows only in the cathode–collector circuit, and, therefore, only this circuit needs to be designed for high power. It should be noted that, when working in amplification mode, at time intervals when the signal is absent the cathode is locked and no current flows to the collector, which ensures the efficiency of the device and reduces the thermal load on the collector.

In a general sense, we will use "resotrode" to refer to any device with emission modulation that has full or excessive compensation of the electronic load in the input circuit.

Thus, the name "resotrode" can apply not only to the construction in Figure 11c, but also to the type in Figure 11b with the appropriate compensation.

#### **7. Evaluation of the Efficiency and Gain of a Resotrode**

Unlike those of the klystron, the output parameters of a resotrode can be estimated with relatively simple analytical relations. Further refinement of these estimates is possible only within the framework of 2D/3D computer modeling.

We assume that the angles of passage of the electrons through all the gaps are small (Θ<sub>*d*</sub> ≪ 1), and that the electronic load operates at full compensation, in which the amplitudes of the microwave voltage in the first and second gaps of the input cavity are the same. In this case, the electron velocities after entering the anode hole will correspond to the static potential of the anode (i.e., the bunch will be strictly monospeed).

As klystron studies have shown, a narrow monospeed bunch can be slowed down to zero speed; if the gap is narrow, full braking is provided by zero detuning and a microwave voltage amplitude equal to the accelerating voltage.

The instantaneous voltage at the control electrode must be negative, which implies the condition *U*<sub>1</sub> ≤ *U<sub>g</sub>*, where *U*<sub>1</sub> is the amplitude of the microwave voltage in the input gap, and *U<sub>g</sub>* is the absolute value of the constant voltage at the control electrode (offset voltage) relative to the cathode. On the other hand, a decrease in the microwave amplitude at a given offset leads to a sharp decrease in the current and, consequently, the power of the device. It follows that the following equality is optimal:

$$
U_1 = U_g \tag{10}
$$

In the output gap at small angles of flight, the amplitude of the microwave voltage *U*<sub>2</sub> at zero detuning cannot be made greater than the accelerating voltage *U*<sub>0</sub>, because this would lead to electron reflections. On the other hand, if *U*<sub>2</sub> < *U*<sub>0</sub>, then the bunch will not be completely stopped (i.e., under this condition the efficiency will not be maximum). This leads to the conclusion that, for zero detuning, the equality *U*<sub>2</sub> = *U*<sub>0</sub> is optimal. Next, we assume that the output cavity detuning is zero.

The input cell of the resotrode (Figure 12) is a three-electrode electron gun, which is characterized by the cutoff potential *U<sub>sh</sub>* (i.e., the value of the negative control electrode potential −*U<sub>sh</sub>* at which the anode current ceases to flow). The relative value of the cutoff potential is determined by the shape of the electrodes and the relative distances between them. In grid-free constructions (Figure 11c), the value *U<sub>sh</sub>*<sup>∗∗</sup> = *U<sub>sh</sub>*/*U*<sub>0</sub> is bounded from below by values of 0.12–0.15. Next, we assume that the value *U<sub>sh</sub>*<sup>∗∗</sup> is given.

Under Equation (10), the instantaneous voltage at the gap between the cathode and the control electrode has the form *u<sub>in</sub>*(*t*<sup>∗∗</sup>) = *U<sub>g</sub>*·(1 − cos *t*<sup>∗∗</sup>), and the width of the bunch is determined by the ratio

$$k_{sh} = \frac{U_g}{U_{sh}} \tag{11}$$

which will then be called the cutoff coefficient.

From the bottom of Figure 13, it follows that the normalized width τ<sub>*b*</sub><sup>∗∗</sup> of the bunch is related to the cutoff coefficient *k<sub>sh</sub>* by the relation

$$\tau_b^{**} = 2\arccos\left(1 - \frac{1}{k_{sh}}\right) \tag{12}$$

**Figure 13.** Schemes of the formation of the bunch in the input gap and its deceleration in the output gap of the resotrode.

Let us consider the deceleration of a monospeed bunch of width given by Equation (12) by the field of an infinitely thin gap of the output cavity with zero detuning. When such a bunch passes through the gap, the instantaneous value of the microwave voltage changes from the amplitude value *U*<sub>0</sub>, corresponding to the center of the bunch, to some value *U<sub>b</sub>*, corresponding to the edges of the bunch (Figure 13, upper).

The electronic efficiency can be calculated as the ratio of the interaction power to the beam power,

$$\eta_e = \frac{1}{U_0 I_0} \cdot \frac{1}{\tau_b^{**}} \cdot \int_{-\tau_b^{**}/2}^{\tau_b^{**}/2} U_0 \cdot \cos(s) \cdot I(s)\, ds \tag{13}$$

If we take the current outside the integral using the averaging theorem, and assume approximately that this average current is equal to *I*<sub>0</sub> (calculations for various forms of bunches show that the error of this approximation does not exceed 2%), then from Equation (13) we obtain an expression for the electronic efficiency of the resotrode: η<sub>*e*</sub> = sin(τ<sub>*b*</sub><sup>∗∗</sup>/2)/(τ<sub>*b*</sub><sup>∗∗</sup>/2).
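The omitted averaging step, written out here for clarity, is the elementary average of the cosine over the bunch width:

$$\eta_e \approx \frac{1}{\tau_b^{**}} \int_{-\tau_b^{**}/2}^{\tau_b^{**}/2} \cos(s)\, ds = \frac{\sin\left(\tau_b^{**}/2\right)}{\tau_b^{**}/2}$$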

Given Equation (12), we finally get

$$\eta_e = \frac{\sqrt{2k_{sh} - 1}}{k_{sh} \cdot \arccos\left(1 - \frac{1}{k_{sh}}\right)} \tag{14}$$

The dependencies defined by Equations (12) and (14) are shown in Figure 14. From these dependencies, it follows that to obtain the maximum efficiency values (90% and higher), cutoff coefficient values *k<sub>sh</sub>* > 4 are required, which correspond to a normalized bunch width τ<sub>*b*</sub><sup>∗∗</sup> < π/2.

**Figure 14.** Dependences of the normalized bunch width and resotrode efficiency on the cutoff coefficient.
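As a numerical cross-check of Equations (12) and (14) (an illustrative sketch, not part of the original study), the following Python fragment evaluates the bunch width and the electronic efficiency for several cutoff coefficients:

```python
import numpy as np

def bunch_width(k_sh):
    """Normalized bunch width, Equation (12)."""
    return 2.0 * np.arccos(1.0 - 1.0 / k_sh)

def electronic_efficiency(k_sh):
    """Resotrode electronic efficiency, Equation (14)."""
    return np.sqrt(2.0 * k_sh - 1.0) / (k_sh * np.arccos(1.0 - 1.0 / k_sh))

for k_sh in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"k_sh = {k_sh:.0f}: tau_b** = {bunch_width(k_sh):.3f} rad, "
          f"eta_e = {electronic_efficiency(k_sh):.3f}")
```

For *k<sub>sh</sub>* = 4 this gives τ<sub>*b*</sub><sup>∗∗</sup> ≈ 1.45 rad < π/2 and η<sub>*e*</sub> ≈ 0.92, in line with the 90% threshold quoted above.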

Now let us estimate the gain of the resotrode.

In the mode of full electronic load compensation, the input power is equal to the input cavity's own losses, i.e., *P<sub>in</sub>* = *U*<sub>1</sub><sup>2</sup>/(2·ρ<sub>1</sub>·*Q*<sub>1</sub>), where ρ<sub>1</sub> is the R/Q factor and *Q*<sub>1</sub> is the input cavity's own *Q*-factor.

Representing the output power as *P<sub>out</sub>* = η<sub>*e*</sub>·*U*<sub>0</sub>·*I*<sub>0</sub>, we find that

$$\frac{P_{out}}{P_{in}} = 2\eta_e \cdot \nu \cdot Q_1 \cdot \left(\frac{U_0}{U_g}\right)^2 \tag{15}$$

where ν is the output gap excitation parameter defined by Equation (6). Substituting Equations (11) and (14) into Equation (15) (note that, by Equation (11), *U*<sub>0</sub>/*U<sub>g</sub>* = 1/(*k<sub>sh</sub>*·*U<sub>sh</sub>*<sup>∗∗</sup>)) and passing to the logarithmic form of the gain, we finally get

$$Ku = 10 \cdot \log\left(\frac{2 \cdot \nu \cdot Q_1}{\left(U_{sh}^{**}\right)^2} \cdot \frac{\sqrt{2k_{sh} - 1}}{k_{sh}^{3} \cdot \arccos\left(1 - \frac{1}{k_{sh}}\right)}\right) \tag{16}$$

Since the resotrode is essentially a high-perveance device, characteristic values of the excitation parameter ν are 0.1 and higher. An own *Q*-factor of 2000 to 5000 can be obtained.

Let us compare the resulting formula with the gain formula for the classical IOT design, which can be written as

$$Ku = 10 \cdot \log\left(\frac{1}{U_{sh}^{**} \cdot k_{sh}} \cdot \frac{1}{1 + \frac{k_{sh} \cdot U_{sh}^{**}}{2 \cdot \nu \cdot Q_1}}\right) \tag{17}$$

or, for large values of its own *Q*-factor, in the form

$$Ku = 10 \cdot \log\left(\frac{U_0}{U_g}\right) \tag{18}$$

Graphs of the gain dependence for a resotrode with parameters *U<sub>sh</sub>*<sup>∗∗</sup> = 0.15, *Q*<sub>1</sub> = 2000, ν = 0.1 and for IOTs with the same values of *Q*<sub>1</sub> and ν are shown in Figure 15. As can be seen from the graphs, the resotrode has a gain of about 40 dB at *k<sub>sh</sub>* = 1 (a bunch of duration π); the gain decreases as *k<sub>sh</sub>* increases, but even at *k<sub>sh</sub>* = 5 (a bunch of duration 0.4π) the gain remains above 25 dB. In a grid-less IOT (lower graph), the gain is 9 dB at *k<sub>sh</sub>* = 1 but falls to almost zero at *k<sub>sh</sub>* = 5. Only when using a very fine-grained grid (*U<sub>sh</sub>*<sup>∗∗</sup> = 0.001) can the gain exceed 20 dB.

**Figure 15.** Dependences of the gain on the cutoff factor for resotrodes and IOTs.
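For illustration (a sketch using the parameter values quoted above), the resotrode gain of Equation (16) and the large-*Q* IOT estimate of Equation (18) can be evaluated directly; the numbers reproduce the curves described for Figure 15:

```python
import numpy as np

# Parameters quoted in the text: U_sh** = 0.15, Q1 = 2000, nu = 0.1.
U_sh, Q1, nu = 0.15, 2000.0, 0.1

def gain_resotrode_db(k_sh):
    """Resotrode gain, Equation (16), in dB."""
    eta_term = np.sqrt(2 * k_sh - 1) / (k_sh**3 * np.arccos(1 - 1 / k_sh))
    return 10 * np.log10(2 * nu * Q1 / U_sh**2 * eta_term)

def gain_iot_db(k_sh):
    """Large-Q IOT gain estimate, Equation (18): U0/Ug = 1/(k_sh * U_sh**)."""
    return 10 * np.log10(1.0 / (k_sh * U_sh))

for k in (1.0, 3.0, 5.0):
    print(f"k_sh = {k:.0f}: resotrode {gain_resotrode_db(k):.1f} dB, "
          f"IOT {gain_iot_db(k):.1f} dB")
```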

The dependence of electronic efficiency of a resotrode on gain is shown in Figure 16, from which it follows that at a gain of 30 dB in a resotrode you can achieve an electronic efficiency of about 90%.

It should be noted that when the cutoff coefficient increases, the output power of the device also decreases due to the decrease in the pulse duration. In Figure 16, the output power is normalized to the output power at *k<sub>sh</sub>* = 1 (i.e., when the offset voltage is equal to the cutoff voltage).

From the graphs, we can conclude that the optimal choice is a gain of 25–30 dB, which corresponds to a cutoff coefficient from 3 to 5.

In general, the obtained analytical estimates suggest that the combination of high efficiency with high gain is possible in resonant microwave devices with emission modulation.

**Figure 16.** The dependence of the maximum electronic efficiency of resotrodes and of the normalized output power *P<sub>out</sub>*<sup>∗</sup> = *P<sub>out</sub>*(*k<sub>sh</sub>*)/*P<sub>out</sub>*(1) on the gain.

#### **8. Possible Designs and Applications of Resotrodes with 0-Regeneration**

Resotrodes are modular devices that can be represented as a set of functional cells.

One of the first studied designs of a resotrode for a range of 27–40 MHz included flat cells with tape cathodes. These cells can form various devices that differ in the number of cells and their layouts within a single design.

Research has been conducted to determine the parameters of the optimal functional electron-optical cell of a device, including the sizes and shapes of the electrodes, and their potentials.

Cell synthesis was performed using an electron-optical code according to the "gun–anti-gun" scheme (Figure 17). First, a gun was synthesized that provides a laminar uniform beam in the center of the anode hole (Figure 17a). Next, a "mirror" system was synthesized containing a virtual "emission" of the beam from the collector surface; this beam was gradually transformed into a uniform laminar beam as it moved to the center of the anode hole (Figure 17b), where the two beams were stitched together. The functional cell synthesized as a result of the calculations (Figure 17c) consisted of a linear cathode 0.8 mm thick and 50 mm long with a 1.3-mm radius of curvature of the emitting surface, a control electrode protruding 0.3 mm above the cathode surface, an anode with a narrow (0.7 mm) input groove and an external part expanding (up to 3 mm) towards the collector, and a collector with an anti-dynatron grid.

**Figure 17.** Synthesis of a functional resotrode cell according to the "gun-anti-gun" scheme: (**a**) cathode-anode part (gun), (**b**) collector-output part (anti-gun), (**c**) full synthesized cell.

The synthesized cell was quite versatile and, depending on the basic requirements for the device, allowed various electrostatic potentials on the electrodes and various values of RF voltage amplitudes to be implemented. For example, by setting the potential of the control electrode at −300 V, the anode potential at +1.2 kV and the collector potential at +300 V, an HF power source of 1–2 kW can be created at a frequency of 27 or 40 MHz, powered directly from the supply mains (without a power transformer), based on the 12 cells considered. Such a source would be very useful for creating high-volume household RF stoves of 60–80 L or more with uniform volumetric heating. By realizing potentials of −1 kV on the control electrode and +4 kV on the anode and the collector, it is possible to create a power source of 100 kW for industrial heating installations on the basis of the same 12 cells, as well as a continuous power source of 1 MW on the basis of 120 cells (as the voltage increases, the required number of cells decreases according to the 5/2-power law).

Based on the synthesized optimal cell, the design of the vacuum part of the resotrode was developed, containing 12 azimuth-oriented functional cells. The control electrode was a molybdenum cylinder with slotted longitudinal grooves, the anode was made of a copper cylinder and the collector had a special prefabricated anti-dynatron structure that absorbed almost all secondary electrons [24]. As a result of the research, the following patterns were established:


In addition to the flat design, a resotrode design with cylindrical beams has also been studied. This design uses magnetic focusing based on permanent magnets.

The design of such a device with the possibility of wide-range frequency tuning from 20 to 150 MHz has been considered, for which the general design is described in [24].

As a result of numerical simulation, V. N. Kozlov [24] showed that a functional cell with a convex cathode is optimal for such a device, since it reduces the locking potential (Figure 18).

**Figure 18.** Trajectories of particles in the input part of the resotrode (**a**) with a concave cathode, (**b**) with a convex cathode.

Currently, there are no powerful and efficient microwave power sources in the 50–150 MHz range. Traditional microwave devices—magnetrons, klystrons, and TWTs—would be too large in this range, and standard IOTs would have too small a gain. However, this range is the most attractive for solving problems of uniform volumetric heating in industrial processes (e.g., oil distillation, production of building materials, water desalination, etc.).

For effective uniform volumetric heating, it is necessary that the size of the object being heated be somewhat (but not much) less than half the wavelength. Shorter waves inevitably lead to the appearance of zeros in the spatial distribution of the microwave field and, consequently, to uneven heating. Longer waves lead to the need to increase the amplitude of the RF (microwave) field strength and, consequently, to the occurrence of breakdowns. The characteristic size of objects that need to be heated during technological processes is about a meter, so for effective heating the wavelength of the microwave field should be 2–3 m, which corresponds to the frequency range mentioned above. This uniform volumetric heating can be used, for example, for the manufacture of new environmentally friendly materials (foam glass, artificial sandstone, etc.), for deep oil refining, and for other applications. A design with a direct connection of the resotrode collector to the capacitive electrode of the heating chamber is possible. In this case, heating will be produced not only by the microwave field, but also by the heat of the heated collector (i.e., double microwave and convection heating will be provided).

#### **9. Resotrode with 2π-Regeneration**

Based on the given design, a low-frequency resotrode can be moved to a higher-frequency region using scaling, but this procedure leads to a significant decrease in output power.

Indeed, if all dimensions are reduced in proportion to the wavelength, the emitting area will decrease in proportion to the square of the wavelength, and the total current and power of the device will decrease accordingly. In addition, correct scaling requires preserving the value of the beam perveance, which leads to an additional decrease in voltage and, consequently, in power.

Attempts to compensate for the decrease in power by increasing the voltage while maintaining the distances will lead to an increase in the current density from the cathode (and therefore to a decrease in the durability of the device) and to a risk of electrical breakdowns in the area of the input cavity.

An increase in the emission area without changing the voltage can lead to a decrease in the R/Q factor of the cavity and runs into restrictions on the transverse dimensions of the capacitive part of the cavity as a permissible fraction of the wavelength.

If you increase the voltage while increasing the gaps, then the 0-regeneration condition is violated, which requires that the angle of passage from the cathode to the anode be close to zero.

The relationship between the frequency and the maximum output power of a resotrode with 0-regeneration is determined by a scaling formula, which implies that using such a device at frequencies above 200 MHz is impractical.

Thus, to create a powerful microwave source at a frequency greater than 200 MHz, the ideas underlying the resotrode must be corrected.

For operation in the area of hundreds of MHz, a different device scheme was proposed [15], as shown in Figure 19.

This device is called a resotrode with 2π-regeneration, because the optimal angle of passage between the main (accelerating) and regenerating (braking) gaps of the input cavity becomes 2π instead of 0. The input cavity turns out to be a three-gap cavity, in which the first gap is the main one and the third one is the regenerating one. The microwave voltage in the second gap is in phase with the voltage in the first gap, but its amplitude must be small (the gap is "switched off"). This shutdown is provided by an additional large capacitance that provides the microwave closure of this gap to the central bushing.

Between the second and third gaps there is a drift length sufficient for the location of at least one additional bunching cavity. When using a two-gap bunching cavity or higher-harmonic cavities (i.e., of the second and third harmonics), two or even three bunching gaps can be placed here. This makes it possible to significantly improve the quality of the bunch, reduce its length and ensure a velocity distribution corresponding to a converging bunch [24].

The idea of additional bunching in devices with emission modulation was previously proposed and successfully implemented by V. A. Tsarev [22,23], A. D. Sushkov and V. K. Fedyaev [39] in devices called a tristrod and a tryston, respectively.

Note the main features of the resotrode with 2π-regeneration.

The distance between the first (forming) and third (regenerating) gaps of the input cavity does not depend on the "cathode–anode" distance, so the beam perveance is not related to the pass angle (i.e., the electron-optical parameters (current and voltage) and the pass angle that provide regeneration do not depend on each other). This is the main feature of the design, which makes it possible to work in the high-frequency area.

**Figure 19.** Scheme of a multi-beam resotrode with 2π-regeneration. The blue arrows indicate the electric microwave field, and their thickness illustrates the magnitude of the field.

On the other hand, in contrast to a resotrode with 0-regeneration, the bunch at the entrance to the anode hole is not monospeed, since its speed is modulated by the input signal. This circumstance does not allow such a device to be implemented in a completely grid-free version (Figure 11c), because in this case, even with a minimum offset voltage equal to the cutoff voltage, the speed modulation would be too large and would lead to a scattering of the bunch when it moves to the regenerating gap.

Thus, for a resotrode with 2π-regeneration, it is necessary to use the scheme of Figure 11b. However, in contrast to the classic MB IOTs, the grid can be much coarser, and, for flat cathodes, a set of monocrystalline graphite "whiskers" laid in parallel in a row on the control electrode can be used as such a grid. This design of the grid (common to all beams) will significantly simplify the technology and avoid misalignment deformations due to thermal drift.

The size of the grid cells is determined by the allowable value of the input signal amplitude, which in resotrodes, unlike IOTs, is not the main factor determining the gain.

The estimates of the efficiency and gain of a resotrode with 0-regeneration made in Section 7 require adjustments to apply them to a resotrode with 2π-regeneration.

In a resotrode with 2π-regeneration, the width of the bunch during its formation and at the entrance to the regenerating gap will differ due to the additional bunching. The result of this bunching can be estimated based on the results obtained for klystrons. The width of the bunch, in accordance with Equation (12) and Figure 14, becomes smaller (i.e., the bunch, in the terminology used for klystrons, becomes fully saturated (FS)). Further bunching of the FS-bunch is primarily related to the transformation of its velocity function: it is necessary to obtain a converging bunch that ensures the most effective braking in the output gap. In this case, the output cavity itself must be detuned to the left. In the klystron, the necessary transformation of the velocity function is performed in the gap of the last cavity of the buncher, which is located close enough to the output gap; however, the bunch enters this gap almost monospeed. In a resotrode with 2π-regeneration, the original velocity function has a different form: the center of the bunch has a maximum velocity that falls off towards the edges (Figure 20). Further transformation of such a bunch in the drift without additional impact would lead to additional bunching of the front part of the bunch and to the unbunching and lagging of its rear part (i.e., to the formation of a kind of "comet").

**Figure 20.** Scheme for correcting the shape of the velocity function by a third-harmonic cavity with zero detuning: (1) original velocity function, (2) microwave voltage in the third-harmonic cavity, (3) corrected velocity function.

To prevent this, it is necessary to correct the shape of the velocity function (e.g., by using a cavity of the second or third harmonic tuned to the proper resonance (Figure 20)). Thus, the buncher must contain at least one higher harmonic cavity and one main harmonic cavity. If the bunch width is small, the main harmonic cavity can be replaced with a second harmonic cavity that is detuned to the right (Figure 21).

**Figure 21.** Scheme for bunching of an FS-bunch with a second harmonic cavity detuned to the right: (1) monospeed bunch, (2) microwave voltage in the cavity of the second harmonic.

Thus, in a resotrode with 2π-regeneration, a converging FS-bunch can be formed, which allows an efficiency of 90% or more to be obtained.

As for the gain, the estimate of Equation (15) applies in the mode of full electronic load compensation. Since the ratio *U*<sub>0</sub>/*U<sub>g</sub>* is at least 10 in the gridded design, the gain should be at least 30 dB.

For the currently operating Large Hadron Collider (LHC), 400 MHz is the main frequency, and it is assumed that this frequency will also be used for the FCC (Future Circular Collider) and CLIC (Compact Linear Collider). Currently, klystrons are used as microwave sources for the LHC. Resotrodes with 2π-regeneration have significant advantages over such klystrons: lower supply voltage, higher efficiency and significantly smaller dimensions. The voltage of a klystron can be lowered by switching to a multi-beam design, and, as shown in Section 5, its efficiency can be significantly increased, as with the COM and CSM klystrons. But the size (and, consequently, the weight and cost) of the klystron cannot be significantly reduced.

A resotrode with 2π–regeneration at the frequency of 400 MHz seems to be a much more promising microwave source than the klystron.

Currently, the first preliminary stage of designing a resotrode with 2π-regeneration at a frequency of 400 MHz and an output power of 300 kW in continuous mode has been carried out.

Dr. Igor Syratchev [16] at CERN conducted a complete electrodynamic simulation of such a device using the HFSS code and built a 3D model of it. The results of this work are shown in Figure 22.

**Figure 22.** Electrodynamic model of a resotrode (using HFSS code).

The general electrodynamic design of the device (Figure 22) includes a three-gap input cavity that provides bunch formation and energy regeneration, a buncher, an output cavity with a coaxial energy output through a capacitive connection, as well as blocking filters in the cathode circuit and in the control electrode circuit.

The input cavity has four inductive rods that provide the working antiphase type of oscillation (the third gap relative to the first two).

The buncher is modeled in two possible variants, both of which have a two-gap cavity of the main frequency and harmonic cavities. In the first case, an antiphase type of oscillation is excited in the two-gap cavity, and the gaps are located at a distance of π from each other, which makes it possible to realize bunching phases of the microwave voltage in both gaps. In the second case, the third-harmonic cavity is tuned to a resonance that corrects the velocity function in accordance with Figure 20, while the second-harmonic cavity forms a converging bunch in accordance with Figure 21.

Important electrodynamic elements of the device are the resonant choke filters developed by I. Syratchev [16], which ensure the absence of microwave radiation through the power supply chain at the working type of oscillation while simultaneously suppressing parasitic types of oscillation. Note that the filter in the control electrode circuit also serves as the coupling element for the input signal; the suppression of parasitic oscillations is ensured by their low loaded *Q*-factor. Both grid-less and gridded designs of the resotrode were considered.

A grid-less design, as already noted, requires a microwave amplitude that is too large for the input gap.

As shown by preliminary calculations, an amplitude of at least 30% of the accelerating voltage is required to ensure a bunch duration of π/2, which is an unacceptably high value. However, even with this amplitude, a gain of at least 25 dB is obtained, although a preferable option is a sparse grid that provides a gain of more than 35 dB in full-compensation mode. Note that if you switch from the full-compensation mode to the excess-compensation mode, making the amplitude of the microwave voltage in the regenerating gap greater than the total amplitude in the first gap, you can increase the gain indefinitely, up to the transition to the generation mode.

The question of the possibility of using a resotrode in the generation mode remains open until studies of the stability of such a mode are conducted.

A general 3D model of a resotrode with 2π-regeneration at 400 MHz and 300 kW in continuous mode was made by I. Syratchev [16], as shown in Figure 23.

**Figure 23.** A 3D model of a resotrode with 2π-regeneration.

This device is described in more detail in [16,24]. Although this device was modeled as a microwave source for the LHC, it can also be used for other purposes: for defrosters, for wood drying devices and for other industrial technologies, as well as for advanced microwave energetic devices.

#### **10. Conclusions**

In this study, new types of resonant O-type microwave devices are considered, including COM klystrons, CSM klystrons and resotrodes, which can combine high output power with an efficiency of up to 90%.

The modeling and research of COM and CSM klystrons has been made possible by the development of mathematical modeling and optimization methods; in particular, by the development of a discrete-analytical klystron model, the macro-steps method for global multiparametric optimization, the stage-by-stage method and the embedding procedure for full structural optimization.

It was shown that, thanks to the GSP parameter-conversion procedure, the synthesized optimal variants of high-efficiency multi-beam klystrons can be transformed into equivalent devices corresponding to other values of output power, voltage and number of beams without performing a new optimization.

The results of the synthesis of COM and CSM klystrons are presented. It is shown that, for various prototypes, klystrons were created with an efficiency of about 90%. COM and CSM klystrons can be considered promising sources of microwave power for industrial microwave technologies and microwave energetics.

COM klystrons are characterized by an increased ratio of the length of the device to the wavelength, so they are optimal microwave sources for the relatively short-wave part of the microwave range from 3 to 10 GHz.

In CSM klystrons, the relative length is fixed and equal to the length of the optimal three-cavity COM klystron. Therefore, CSM klystrons are optimal sources of microwave power for a range from hundreds of MHz to 3 GHz.

For devices with emission modulation, it is shown that it is possible to achieve a combination of high efficiency with high gain within the design of a new microwave device called a resotrode.

Two types of resotrodes are considered: one with 0-regeneration for frequencies of 20–200 MHz and one with 2π-regeneration for frequencies of 200–1000 MHz.

Possible designs of resotrodes with 0-regeneration based on a universal functional cell are shown. Systems of uniform volumetric RF/microwave heating of large-sized objects based on resotrodes with 0-regeneration are proposed.

The design of a resotrode with 2π-regeneration at 400 MHz with an output power of 300 kW in continuous mode is proposed.

All these devices can be used to power modern colliders, for industrial microwave heating installations for various purposes and for advanced wireless power transmission devices.

Currently, leading klystron manufacturers are trying to implement the ideas of COM and CSM bunching by modifying their existing klystron designs. This leads to an increase in efficiency of several percent. The maximum efficiency values of about 90% can only be realized with a new design initially calculated for the appropriate bunching mode. The development of such a new design is very expensive, so COM and CSM klystrons that fully correspond to the theory have not yet been created. The same problems apply to resotrodes. Their development requires not only the creation of a new design, but also a thorough study of the regenerative amplification mode with a possible transition to the generator mode. This is a fundamental experimental investigation that is still waiting in the wings.

**Author Contributions:** Conceptualization and methodology, A.B.; software, validation, formal analysis, investigation, computer modelling and simulation, review and editing, A.B. and O.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Hybrid Machine Learning Models for Classifying Power Quality Disturbances: A Comparative Study**

#### **Juan Carlos Bravo-Rodríguez \*, Francisco J. Torres and María D. Borrás**

Escuela Politécnica Superior, Universidad de Sevilla, c/ Virgen de África 9, 41011 Sevilla, Spain; fjaviertorres92@gmail.com (F.J.T.); borras@us.es (M.D.B.)

**\*** Correspondence: carlos\_bravo@us.es; Tel.: +34-954-552-847

Received: 10 March 2020; Accepted: 25 May 2020; Published: 1 June 2020

**Abstract:** The economic impact associated with power quality (PQ) problems in electrical systems is increasing, so PQ improvement research becomes a key task. In this paper, a Stockwell transform (ST)-based hybrid machine learning approach was used for the recognition and classification of power quality disturbances (PQDs). The ST of the PQDs was used to extract significant waveform features which constitute the input vectors for different machine learning approaches, including the K-nearest neighbors' algorithm (K-NN), decision tree (DT), and support vector machine (SVM) used for classifying the PQDs. The procedure was optimized by using the genetic algorithm (GA) and the competitive swarm optimization algorithm (CSO). To test the proposed methodology, synthetic PQD waveforms were generated. Typical single disturbances for the voltage signal, as well as complex disturbances resulting from possible combinations of them, were considered. Furthermore, different levels of white Gaussian noise were added to the PQD waveforms while maintaining the desired accuracy level of the proposed classification methods. Finally, all the hybrid classification proposals were evaluated and the best one was compared with some others present in the literature. The proposed ST-based CSO-SVM method provides good results in terms of classification accuracy and noise immunity.

**Keywords:** power quality disturbances; classification; feature selection; swarm optimization; support vector machine; genetic algorithm; K-NN algorithm; decision tree; S-transform

#### **1. Introduction**

Power quality (PQ) is essential for electrical systems to operate properly with the minimum possible deterioration of performance. Emerging PQ challenges, such as the growing integration of large power plants based on renewable sources, improvements in nonlinear loads, and the recent requirements of smart grids, must be considered to obtain an optimal operation of the existing power grid. These factors increasingly require constant revisions of the common power quality problems, enhanced standards, further optimization of control systems, and more powerful capabilities for measuring instruments.

The purpose of this research was to contribute to this task, meeting the particular PQ requirements about detection and classification of power quality disturbances (PQDs) through optimal hybrid machine learning approaches.

Usually, the PQDs' identification procedure is carried out in three steps, i.e., signal analysis, feature selection, and classification. In the PQD analysis stage, advanced mathematical techniques are used to extract the feature eigenvectors that enable disturbance identification. The time-frequency analysis methods include short-time Fourier transform (STFT), Stockwell transform (ST) [1], wavelet transform (WT) [2–4], Hilbert–Huang transform [5,6], Kalman filter [7,8], strong trace filter (STF) [9], sparse signal decomposition (SSD) [10], Gabor–Wigner transform [11], and empirical mode decomposition (EMD) [12,13]. In this study, ST was selected due mainly to its noise immunity, simplicity of implementation, and flexible and controllable—to some extent—time-frequency resolution. These advantages far outweigh the computational cost that may be required.

The selection of suitable features remains a key challenge that requires developing tools in areas such as statistical analysis, machine learning, or data mining [14]. Valuable efforts have been made in this sense, and several techniques are used for a precise selection of features, including principal component analysis [15], the K-means-based apriori algorithm [16], the classification and regression tree algorithm [17], the multi-label extreme learning machine [18], the random forest model [19], sequential forward selection [20], and bionic algorithms. This latter group has also been successfully used in classification rule discovery. Particularly significant among bionic algorithms are genetic algorithms (GA) [20–22] and swarm-based approaches like ant colonies [23,24] and, above all, particle swarm optimizers (PSO) [25–28]. For example, recently in [25], a combination of PSO and support vector machine (PSO-SVM) was used to optimize the error of the classifier by selecting the best feature combination. Similarly, in [26], PSO optimizes the noise cut-off threshold of PQD signals, working together with a modified ST in the feature extraction stage. However, canonical PSO has some limitations for feature selection. Improved and implemented PSO variants include the competitive swarm optimizer (CSO) [29–31], the discrete particle swarm optimizer [32], and the exponential inertia weight particle swarm optimizer [33]. It should be noted that, as PSO was originally designed for continuous optimization problems, it may not always be the most appropriate method to solve a combinatorial optimization problem such as feature selection. The CSO algorithm, however, is specifically adapted to this type of problem, with each particle learning from a pair of randomly selected competitors to elevate both global and local search abilities. In this paper, GA and CSO were selected and compared to minimize the number of selected features.
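As a hedged illustration of the pairwise-competition update that distinguishes CSO from canonical PSO (a minimal Python sketch; the fitness function, the 0.5 binarization threshold and the 0.1 social factor are assumptions of this example, not details of the cited works):

```python
import numpy as np

def cso_feature_selection(fitness, n_features, swarm_size=40, iterations=100, rng=None):
    """Competitive swarm optimizer sketch: particles compete in random pairs,
    and only the loser of each pair is updated (learning from the winner)."""
    rng = rng or np.random.default_rng()
    X = rng.uniform(0, 1, (swarm_size, n_features))    # continuous positions
    V = np.zeros_like(X)
    for _ in range(iterations):
        order = rng.permutation(swarm_size)
        mean = X.mean(axis=0)                          # swarm mean position
        for i, j in zip(order[0::2], order[1::2]):     # random pairwise competitions
            fi, fj = fitness(X[i] > 0.5), fitness(X[j] > 0.5)
            win, lose = (i, j) if fi <= fj else (j, i) # minimization
            r1, r2, r3 = rng.uniform(0, 1, (3, n_features))
            V[lose] = r1 * V[lose] + r2 * (X[win] - X[lose]) + 0.1 * r3 * (mean - X[lose])
            X[lose] = np.clip(X[lose] + V[lose], 0, 1)
    masks = X > 0.5                                    # threshold to binary feature masks
    return masks[np.argmin([fitness(m) for m in masks])]
```

In a feature-selection setting, the fitness function would typically combine a cross-validated classification error with a penalty on the number of selected features.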

Regarding the disturbance pattern recognition capability and according to the optimal selection of features provided by the above-mentioned algorithms, numerous machine learning approaches have been widely utilized for classifying power quality disturbances. Common classification techniques include artificial neural network (ANN), K-nearest neighbor (K-NN) algorithm, support vector machine (SVM), and decision tree (DT) methods.

Support vector machine (SVM) is a good option for classification purposes, especially when dealing with small samples, nonlinearity, or high dimension in pattern recognition [2,16,22,34,35]. Among the advantages of SVM are the lack of local extrema, feature mapping of nonlinearly separable data, low space complexity, and the capability to adjust only a reduced number of features as compared to, for example, ANNs [36]. By contrast, its disadvantages include limitations of speed and size, in both training and testing, as well as those resulting from an improper choice of the kernel. In practical terms, these handicaps involve high algorithmic complexity and extensive memory requirements. Improved SVM variants include multiclass SVM (M-SVM) [37], directed acyclic graph SVMs (DAG-SVMs) [38], and radial basis function kernel SVM (RBF-SVM) [39].

For its part, the rule-based DT classifier is a good choice when the features are clearly distinguishable from each other [40,41]. PQDs' classifiers based on DT include the fuzzy decision tree [42–44] and the aforementioned classification and regression tree algorithm (CART) [17,21]. On the one hand, DT advantages include removing unnecessary computations, a singular set of parameters that allows differentiating between classes, and a smaller number of features at each nonterminal node while maintaining performance at an acceptable level. On the other hand, its principal disadvantages include a strong dependence on the selected features, the accumulation of errors from level to level in a large tree, and overlap, which increases the search time and memory space requirements when the number of classes is large. Compared to other approaches, DT flowchart symbols configure a simple and straightforward model in which the control parameters are easy to understand and apply. Thus, DT is easier to set up and interpret and, despite the mentioned dependence of the classification process on the selected features, it handles data more efficiently than other methods. For example, in [22], a comparative study using DT/SVM, wavelet transform (WT), and ST is shown.

Most of the previous works in the literature are mainly focused on pattern recognition issues, so PQDs are generally treated as single-event signals. However, in electrical systems, it is common to find several disturbances consecutively in the same observation window. These combined disturbances are much more difficult to identify and treat than single ones. In this work, complex PQDs were designed through a consecutive or simultaneous combination of two simple ones in the same interval. Time-frequency features were extracted from ST, while feature selection was optimized by using K-NN, GA, and CSO. In the classification stage, K-NN (again) and distinct types of SVM and DT were considered. Different classifiers were proposed depending on the optimization-classification sequence chosen. All these proposals operated over the same dataset obtained after optimization. A comparison of the proposed models in terms of classification accuracy and noise immunity was planned. In Figure 1, a general block scheme of the proposed classification plan is presented. The main steps included PQD signal processing via ST, feature extraction, optimal feature selection, and classification. A detailed overview of the proposed comparative study, including the different hybrid methods, can be found in Section 5. The MATLAB (Classification Learner Toolbox) software was used to implement all the machine learning methods required at both the optimization and classification stages.

**Figure 1.** General block scheme of the proposed classification plan.

The rest of this paper is organized as follows: In Section 2, a simplified outline of the extraction of the initial feature set is presented. Section 3 is devoted to the optimal selection of features, describing the optimizers used for this task. Section 4 briefly describes the machine learning methods used to classify. In Section 5, a detailed overview of the proposed classification plan is shown. In Section 6, PQ disturbances' synthesis and the resulting training datasets are detailed. In Section 7, results are discussed. The last section draws conclusions from the results.

#### **2. Initial Feature Set Extraction Based on S-Transform and Statistical Parameters**

Each proposed disturbance signal was generated in a discrete form to compute its S-transform, a detailed description of which is given in Appendix A. ST was chosen for its inherent noise immunity and acceptable time-frequency resolution.

The resulting complex S-matrix provided valuable time-frequency data on which PQD features were extracted by computing several statistics and figures of merit. In this two-dimensional S-matrix, the signal was split into different frequencies (M = 1280 rows) and distinct samples (N = 2560 columns). This extraction of features had a relevant effect on the accuracy of classification because of its great influence on the overall performance of machine learning approaches.

In a first approximation, the chosen initial feature set should have been enough to guarantee a correct identification of every one of the considered disturbed signals. In this work, the extracted set was formed by nine features (k1–k9) and included the introduced disturbance energy ratio (DER) index as well as some of the well-known statistical parameters, such as maximum, minimum, root mean square and mean values, standard deviation, variance, skewness, and kurtosis. These features were calculated following the equations shown in Table 1.


**Table 1.** Initial feature set (k1–k9) and defining equations.

<sup>1</sup> A<sub>jn</sub> and <sup>2</sup> φ<sub>jn</sub> are the absolute value and the phase of the jn-th element of the S-matrix.

All statistical parameters were calculated from both time samples (N = 2560) and frequency (M = 1280) intervals. The skewness parameter was computed based on the phase values of the complex elements in the S-matrix. For the rest of the parameters, calculations were done from the absolute values of such elements.
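
As an illustration of this computation (an assumption of this presentation, since the original work used MATLAB and the exact assignment of k1–k9 to the statistics is not reproduced here), the magnitude- and phase-based statistics could be extracted as follows:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def extract_features(S):
    """Statistical features from a complex S-matrix (M frequencies x N samples)."""
    A = np.abs(S).ravel()        # magnitudes of all time-frequency points
    phi = np.angle(S).ravel()    # phase values, used only for the skewness
    return np.array([
        A.max(),                 # maximum
        A.min(),                 # minimum
        np.sqrt(np.mean(A**2)),  # root mean square
        A.mean(),                # mean
        A.std(),                 # standard deviation
        A.var(),                 # variance
        skew(phi),               # skewness, computed from the phase values
        kurtosis(A),             # kurtosis
    ])                           # the ninth feature (DER) is defined below
```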

#### *Disturbance Energy Ratio (DER) Index*

The introduced DER index represents the ratio between the energy of the signal with frequency components greater than 50 Hz and that one whose components are equal to or less than 50 Hz. Thus, the definition of DER parameter includes the terms

$$\text{RMS}\_{>50} = \sum\_{f=50.1\text{ Hz}}^{f=6400\text{ Hz}} \text{RMS}\_{f} \tag{1}$$

and

$$\text{RMS}\_{50\text{Hz}} = \sum\_{f=0\text{ Hz}}^{f=50\text{ Hz}} \text{RMS}\_{f} \tag{2}$$

where RMS<sub>f</sub> denotes the RMS value of the S-matrix row at frequency *f*, so that DER = RMS<sub>>50</sub>/RMS<sub>50Hz</sub>.

This index is very useful for the characterization of PQ disturbances with high-frequency content as, for example, oscillatory transients.
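
As an illustrative sketch (the helper name and the per-row RMS computation are assumptions of this presentation), the DER index could be computed from the S-matrix as follows:

```python
import numpy as np

def der_index(S, freqs):
    """DER: energy above 50 Hz over energy at or below 50 Hz, per Eqs. (1)-(2).
    S: complex S-matrix (rows = frequencies); freqs: row frequencies in Hz."""
    rms_rows = np.sqrt(np.mean(np.abs(S) ** 2, axis=1))  # RMS of each frequency row
    rms_hi = rms_rows[freqs > 50.0].sum()                # Eq. (1)
    rms_lo = rms_rows[freqs <= 50.0].sum()               # Eq. (2)
    return rms_hi / rms_lo
```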

Sample datasets for training/testing consist of single observations, each of which is described by the features computed as shown in Table 1.

#### **3. Optimal Feature Selection: GA and CSO**

The main purpose of using an optimizer is to reduce as much as possible the dimension of the input feature dataset for the prediction models. Once data have been obtained by the S-transform, further analysis is necessary to achieve the optimal feature vector. As seen above, a vector with nine different features was proposed. However, the given feature vector contained attributes whose information was redundant for distinguishing the most discriminating features of PQDs. The intraclass compaction could be minimized and the interclass division could be maximized by reducing the number of features. For this purpose, after obtaining the dataset features, it was necessary to select the best optimizer. Wrapper-based techniques are a significant group within feature selection methods; they are accurate and popular, and eliminate redundant features by using a learning algorithm with classifier performance feedback. The two main optimization methods used in this work, namely GA and CSO, belong to this group.

#### *3.1. Genetic Algorithm*

Darwin's theory of evolution, "Survival of the Fittest", inspired the design of genetic algorithms in the 1960s [45]. GA is an adaptive heuristic search algorithm [45] that uses optimization methods based on genetics and the rules of natural selection. The flowchart that describes the operation of GA is shown in Figure 2.

**Figure 2.** Preprocessing stage using K-nearest neighbor (K-NN) algorithm for the evaluation of genetic algorithms (GA) members.

In GA [46], an optimal feature vector can be represented by a chromosome, which includes the most discriminative features. In turn, chromosomes comprise multiple genes, each one corresponding to a feature. The population is a finite set of chromosomes manipulated by the algorithm in a similar way to the process of natural evolution. In this process, chromosomes are enabled to cross over and to mutate. The crossing of two chromosomes creates two offspring, and these two each produce two more, and so on. A genetic mutation in the offspring generates an almost identical copy of the combination of their parents but with some part of the chromosome altered. Generations are the cycles in which the optimization process is carried out. Crossover, mutation, and evaluation make it possible to create a set of new chromosomes during each generation. A predefined number of the (best) chromosomes survives to the next cycle due to the finite size of the population. The population can achieve a fast adaptation despite its limited size, which results in quick optimization of the criterion function (score). The most important step of GA is the crossover, in which exchanges of information among chromosomes are implemented. Once the best individuals are selected, it is necessary to cross these solutions between themselves. The main purpose of this step is to obtain a greater differentiation between populations based on new solutions that could be better than the previous ones. A second important step is mutation, which increases the variability of the population.

Another key piece is the fitness function, which is necessary to obtain an effective, application-oriented version of GA. The fitness function is the procedure that is responsible for assessing the quality of each chromosome, specifying which one is the best in the population. Once the fitness function is calculated for each individual of the initial population, the next stage is the so-called selection, in which the chromosomes with the best qualities are selected to generate the new evolution of the population using discrimination criteria.

Different GA implementations use specific important parameters to determine the execution and performance of the genetic search. However, some parameters, including crossover rate, population size, and mutation rate, are common to all implementations. The probability of taking an eligible pair of chromosomes for crossover is called the crossover rate. In turn, the probability of changing a bit of randomly selected chromosomes is called the mutation rate. The crossover rate usually presents high values, close to or equal to 1, while the mutation rate is usually small (1% to 15%).

In the present work, the chromosome consisted of nine genes, each of which represented a feature. As shown in Figure 2, the chromosome is represented as a vector of bits, since every gene can be assigned either 0 or 1 (0 when the corresponding feature was not selected and 1 when it was). A population of 560 individuals (chromosomes) and 100 iterations (generations) was selected for this problem. The search began by initializing the parameters as follows:


The performance of the classifier must be kept above a certain specified level. For this, the least expensive subset of features must be found. For this purpose, the performance is measured using the error of a classifier. The viability of a subset is ensured when the error rate of the classifier is lower than the so-called feasibility threshold. The goal is to find the smallest subset of features among all feasible ones.

In this case, the identification accuracy of the K-NN algorithm was set as the fitness value of the chromosome. In order to assess the quality of the chromosome through the fitness function (accuracy), the k parameter of the K-NN method was adjusted to 10 and the value of cross-validation was set to 8 folds. The K-NN input dataset was exclusively designed for this validation procedure (see Section 6 for details).
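
For illustration, a minimal Python sketch of this GA wrapper is given below; it is not the original MATLAB implementation, and the helper names and the small subset-size penalty are assumptions of this presentation:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    # Fitness = 8-fold CV accuracy of a k=10 K-NN on the selected feature columns,
    # minus a small penalty (an assumption here) that favours fewer features.
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=10),
                          X[:, mask], y, cv=8).mean()
    return acc - 0.01 * mask.sum() / mask.size

def ga_select(X, y, pop_size=560, generations=100, p_mut=0.05):
    n = X.shape[1]                                        # 9 candidate features
    pop = rng.random((pop_size, n)) < 0.5                 # random initial chromosomes
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2:]]  # selection of the fittest
        cuts = rng.integers(1, n, len(elite))             # one-point crossover
        children = np.array([np.concatenate((a[:c], b[c:]))
                             for a, b, c in zip(elite, elite[::-1], cuts)])
        children ^= rng.random(children.shape) < p_mut    # bit-flip mutation
        pop = np.vstack((elite, children))
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()]                           # best feature mask
```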

#### *3.2. Competitive Swarm Optimization*

The competitive swarm optimizer [29] is a particular case of the particle swarm optimizer (PSO) and thus belongs to the evolutionary algorithms inspired by flocking and swarming behavior. Swarm methods try to emulate an adaptive strategy that considers collective intelligence as behavior without any structure of centralized control over individuals. Usually, the overall structure of swarm optimizers includes different algorithms, each handling a specific task. The critical one is the classification rule discovery algorithm, which is, in essence, a standard GA. Thus, a group of individuals (particles) acts and evolves following the principles of natural selection—survival of the fittest. In PSO, the optimal solution for a problem is obtained from the global interactions among particles. In contrast, the CSO method introduces pairwise interactions randomly selected from the swarm (population). Generations succeed one another after each pairwise competition, in which the fitness value of the loser is updated by learning from the winner, which goes directly to the swarm of the next generation. CSO has proven to be better than GA in optimization tasks related to feature selection due to its easy-to-use structure, fewer parameters, and simple concept, even though its computational cost is slightly higher. However, as will be shown below in the conclusions, the superiority of CSO over GA is clear in terms of solution quality, but not in terms of success rate.

In this work, particles were defined by the feature set (K1...K9) in the same way as chromosomes (individuals) in GA. They also derived from the same dataset (560 individuals), from which particles were randomly selected. The swarm size was set to 100, and the maximal number of generations (iterations) was set to 200.

Following a parallel process to that carried out in the GA optimizer, a K-NN simple identification model was used to check the efficiency of the CSO-based feature selection, in this case with k = 5. Once again, the accuracy of the K-NN identifier was established as the fitness function of the CSO optimizer.
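
A rough sketch of the CSO update described above is shown below (an assumption of this presentation, not the original implementation); positions in [0, 1] are thresholded at 0.5 to obtain binary feature masks, and `fitness` would be the K-NN (k = 5) accuracy used here:

```python
import numpy as np

rng = np.random.default_rng(1)

def cso_select(fitness, n_feats=9, swarm_size=100, iterations=200, phi=0.1):
    """Minimal CSO sketch: in each generation the swarm is split into random
    pairs; the loser of each pair learns from the winner and the swarm mean."""
    X = rng.random((swarm_size, n_feats))       # particle positions
    V = np.zeros_like(X)                        # particle velocities
    for _ in range(iterations):
        for a, b in rng.permutation(swarm_size).reshape(-1, 2):
            fa, fb = fitness(X[a] > 0.5), fitness(X[b] > 0.5)
            w, l = (a, b) if fa >= fb else (b, a)          # winner passes unchanged
            r1, r2, r3 = rng.random((3, n_feats))
            mean = X.mean(axis=0)                          # swarm mean position
            V[l] = r1 * V[l] + r2 * (X[w] - X[l]) + phi * r3 * (mean - X[l])
            X[l] = np.clip(X[l] + V[l], 0.0, 1.0)          # update only the loser
    masks = X > 0.5
    return masks[int(np.argmax([fitness(m) for m in masks]))]
```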

Both optimization methods, GA and CSO, reduced the number of features from nine to five, but the selected subsets were not the same.

As was mentioned above, K-NN was chosen to act as a fast validation tool in the feature optimal selection stage. At this stage, the aim was to reduce features and high accuracy was not as necessary as simplicity, speed, and efficiency. In these aspects, the K-NN was highly competitive. As shown below, this method is going to be used again in the next stage to compare its classification performance with that of other approaches. In the next section, unlike this one, the aim is to achieve the highest possible accuracy in the classification.

#### **4. Classifiers: K-NN, SVMs, and DTs**

Once the optimized set of features was determined, the next process was the classification of data with these features. In this work, various classification methods were used to find the best efficiency and the best behavior with noisy signals. These methods included the K-nearest neighbors' algorithm, the support vector machine, and decision trees.

#### *4.1. K-Nearest Neighbors' Algorithm*

One of the proposed classification approaches used the K-NN classifier to identify both single and complex disturbances. K-NN [47], as a supervised learning algorithm, determines the distance to the nearest neighboring training samples in the feature space in order to classify a new object. This Euclidean distance is stated as follows:

$$D\_{j}(x\_{i}, y\_{j}) = \sqrt{\sum\_{k=1}^{p} \left(x\_{i,k} - y\_{j,k}\right)^{2}} \tag{3}$$

where *Dj*(*xi*, *yj*) is the Euclidean distance between the ith p-dimensional input feature vector *xi* and the jth p-dimensional feature vector *yj* in the training set. A new input vector *xi* is classified by K-NN into the class to which the majority of its *k* nearest neighbors belong. The parameter *k* of the K-NN method is user-specified. Often, *k* is set to a natural number close to √*Ntrsamples* [47], in which *Ntrsamples* is the number of samples in the training dataset. In this work, different K-NN classifiers were fit on the training dataset for values of *k* between 5 and 12. The lowest classification error rate on the validation set permitted selecting the sought-after value of *k*.

The traditional K-NN approach based on Euclidean distance becomes less discriminating as the number of attributes increases. To improve the accuracy of the K-NN method for PQDs' classification, a weighted K-NN classification method can be used [48]. The weight factor is often taken to be the reciprocal of the squared distance, ω<sub>j</sub> = 1/*D*<sub>j</sub><sup>2</sup>(*xi*, *yj*). Several schemes can be developed to calculate the weights of each attribute based on some discriminability criteria in the training set.
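
A compact sketch of this weighted voting (illustrative names, not the original implementation; scikit-learn's `KNeighborsClassifier(weights="distance")` offers a similar, 1/D-weighted variant):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=10, eps=1e-12):
    """Neighbours vote with weight 1/D^2, per the scheme described above."""
    d2 = ((X_train - x) ** 2).sum(axis=1)   # squared Euclidean distances, Eq. (3)
    votes = {}
    for i in np.argsort(d2)[:k]:            # the k nearest training samples
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (d2[i] + eps)
    return max(votes, key=votes.get)        # class with the largest weighted vote
```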

#### *4.2. Support Vector Machine*

SVM is a statistical method of machine learning that uses supervised learning [49]. Although this method was originally intended to solve binary problems, its use was easily extended to multiclass classification problems. The major objective of SVM is the minimization of the so-called structural risk by proposing hypotheses to minimize the risk of making mistakes in future classifications. This method finds optimal hyperplanes separating the distinct classes of training dataset in a high-dimensional feature space and, based on this, test data can be classified. The hyperplane is equidistant from the closest samples of each class to achieve a maximum margin on each side of it. Only the training samples of each class that fall right at the border of these margins are considered to define the hyperplane. These samples are called support vectors [50,51].

Next, a rough sketch of SVM is outlined. Consider a dataset containing data pairs defined as (*xi*, *yi*) (*i* = 1, ..., *M*), where *M* is the number of samples and *yi* ∈ {−1, 1}. Based on an n-dimensional vector *w* normal to the hyperplane and a scalar *b*, the issue is to find the minimum value of ‖*w*‖ for the decision function *f*(*x*) = *w<sup>T</sup>*·*x* + *b*. The position of the separating hyperplane can be determined from the values of *w* and *b* that fulfil the constraint *yi*·(*w<sup>T</sup>*·*xi* + *b*) ≥ 1. The quotient *b*/‖*w*‖ gives the distance from the origin to the hyperplane along *w*. Furthermore, to deal with the linearly inseparable case, where the empirical risk is not zero, a penalty factor *C* and slack variables ξ*i* are introduced. The optimal separating hyperplane can be determined by solving the following constrained optimization problem [24,52]:

Minimize

$$\frac{1}{2} \cdot \left\| w \right\|^2 + C \cdot \sum\_{i=1}^{M} \xi\_i \tag{4}$$

subject to

$$\begin{aligned} y\_i \cdot \left( w^T \cdot x\_i + b \right) &\geq 1 - \xi\_i \quad \text{for } i = 1, 2, \dots, M\\ \xi\_i &\geq 0 \quad \text{for all } i \end{aligned} \tag{5}$$

where ξ*<sup>i</sup>* is the distance between the margin and wrongly located samples *xi*.

Despite SVM being a linear function set, it is possible to solve nonlinear classification problems by using a kernel function. As shown in Figure 3, the mapping translates the classified features onto a high-dimensional space where the linear classification is feasible.

**Figure 3.** Mapping kernel functions: Effect on the separation hyperplane of two-class datasets.

In the SVM method, there are different types of kernel functions to improve the classifier, including the linear kernel (the easiest to interpret), the Gaussian or radial basis function (RBF) kernel, the quadratic kernel, the cubic kernel, etc. These kernels differ in the complexity of their definition and in the precision of the classification of different classes. In this work, both quadratic and cubic kernel functions were used.

Two approaches that combine multiple binary SVMs were used to address multiclass classification problems: One versus one (OVO) and one versus all (OVA). The OVO approach needs *m*·(*m* − 1)/2 SVM classifiers to distinguish between *m* classes [2]. The classifiers are trained to differentiate the samples of one class from those of another class. An unknown pattern is classified based upon the vote of each SVM. Thus, the strategy to accomplish a single class decision follows a majority voting scheme based on sign(*w<sup>T</sup>*·*x* + *b*) [52]. The class that wins the most votes is the one predicted for *x*, and this winning class is directly assigned to the test pattern.
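
As an illustrative sketch of such a configuration (assumed, not the exact setup used in this work), a quadratic-kernel SVM with OVO voting can be built, for example, with scikit-learn, where `C` is the penalty factor of Equation (4):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Quadratic-kernel SVM; SVC trains the m(m - 1)/2 pairwise classifiers internally.
qsvm = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=2, C=1.0, decision_function_shape="ovo"),
)
# qsvm.fit(X_train, y_train); y_pred = qsvm.predict(X_test)
```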

#### *4.3. Decision Tree*

The decision tree is a classification tool, based on decision rules, which uses a binary tree graph to find an unknown relationship between input and output parameters. A typical tree structure is characterized by internal nodes representing tests on attributes, branches symbolizing the outcomes of the tests, and leaf nodes (or terminal nodes) defining class labels. Decisions at a node are taken with the help of rules obtained from data [43,53].

The DT should have as many levels as necessary to classify the input feature data. Depending on the number of levels of the DT, the classification can be more or less accurate, and more or less computationally complex. A key point, in this sense, is a suitable choice of the maximum number of splits. It is well known that high classification accuracy on the training dataset can be achieved through a fine tree with many leaves. However, such a leafy tree usually overfits the model and often reduces its validation accuracy with respect to the training accuracy. On the contrary, coarse trees do not reach such a high training accuracy, but they are easier to interpret and can also be more robust, in the sense that their accuracies on the training dataset and on a representative test dataset are closer.

Based upon the foregoing and in order to achieve the required degree of accuracy, in this work, the maximum number of splits was set to 91 and the so-called Gini's diversity index was chosen as the split criterion. At a node, this index was defined as follows

$$GINI\text{ index} = 1 - \sum\_{j} p\_j^2 \tag{6}$$

where *pj* is the probability of class *j* complying with the criteria of the selected node. Gini's diversity index gives an estimation of node impurity, since the optimization procedure in tree classifiers favors nodes with just one class (pure nodes). Thus, a Gini index of 0 corresponds to nodes that contain only one class; otherwise, the Gini index is positive. Therefore, the optimal situation for a given dataset is to achieve a Gini index with a value as small as possible.
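
For illustration, an equivalent fine-tree configuration could be sketched as follows (an assumption of this presentation: MATLAB's fitctree caps splits directly, whereas scikit-learn caps leaves, and a binary tree with s splits has s + 1 leaves):

```python
from sklearn.tree import DecisionTreeClassifier

fine_tree = DecisionTreeClassifier(
    criterion="gini",      # Gini's diversity index, Eq. (6)
    max_leaf_nodes=92,     # equivalent to at most 91 splits in a binary tree
    random_state=0,
)
# fine_tree.fit(X_train, y_train)
```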

#### *4.4. Bagged Decision Tree Ensemble*

An ensemble classifier compiles the results of many weak learners and combines them into a single high-quality ensemble model. The quality of this approach depends on the type of algorithm chosen. In this study, the selected bagged tree classifiers were based on Breiman's random forest algorithm [54]. In the bagged method, the original group of data is divided into different datasets by random selection with replacement, and then each of them is classified by a decision tree. The result of each learner is submitted to a voting process, and the winner finally sets the best classification model of the bagged DT ensemble method.

This method permits obtaining a lower data variance than a single DT and also reduces overfitting. The model can be improved by properly selecting the number of learners. It should be noted that a large number of them can produce high accuracy but also slow down the classification process. In this work, a compromise solution was found by setting the number of learners to 30.
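
Since the bagged trees follow Breiman's random forest algorithm [54], a rough equivalent (illustrative, not the original MATLAB setup) is:

```python
from sklearn.ensemble import RandomForestClassifier

# 30 bootstrap-trained trees (sampling with replacement) whose votes are aggregated.
etree = RandomForestClassifier(n_estimators=30, criterion="gini", random_state=0)
```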

#### **5. Full Comparative Classification of PQDs: Detailed Overview**

A detailed overview of the proposed hybrid classification plan is shown in Figure 4, where the main steps described in the previous sections have been highlighted. The first one is the analysis stage, in which signal processing of the PQDs was carried out via the S-transform. Then, an initial feature set, defined by statistical parameters, was extracted. Next, feature vectors were optimized employing both the GA and CSO algorithms, including an extra validation provided by the K-NN algorithm. The last stage consisted of classification, involving the determination of PQ multi-events by using DT (fine tree), the bagged decision tree ensemble, weighted K-NN, and both quadratic and cubic SVMs.

**Figure 4.** Comparative flowchart of the proposed disturbance classification.

#### **6. Field Data Synthesis of PQDs**

In this section, PQDs' synthesis and the resulting training datasets are detailed.

#### *6.1. Generation of PQ Disturbances*

A database of 13 PQDs that include both single and multiple events was generated in accordance with both the IEEE-1459 and EN-50160 standards [55,56]. The MATLAB R2017a software was used to program a virtual signal generator, called SIGEN, allowing a customizable setup of the simulated signals required for testing. Thus, SIGEN generated a pattern in each category of events by varying the parameters controllable by the user. As a result, both steady-state and transient-state disturbances could be modelled. The graphical user interface of the single-phase SIGEN is shown in Figure 5.

SIGEN performed the following processes to generate the electric signals:


SIGEN was designed following the guidelines described in the IEEE-1159 standard for monitoring electric power quality [57].

**Figure 5.** Graphical interface of signal generator SIGEN.

All the signals had the following fundamental specifications in common: Voltage = 230 V RMS (root mean square), frequency = 50 Hz, duration = 0.2 s, sampling frequency = 12.8 kHz, total cycles = 10, total samples = 2560. Stationary and/or transient disturbances, as well as white Gaussian noise, were added to this fundamental signal. The levels of added noise were characterized by signal-to-noise ratios (SNR) of 20 dB, 30 dB, 40 dB, and 50 dB. For each noise level and each PQD category, 100 signals were generated through SIGEN by varying all the parameters. This set comprised a total of 5600 simulated signals, 1400 for each noise level involving all categories (13 types of PQDs plus one pure sinusoidal signal). These signal sets were transformed into the datasets used to verify the classification performance, as will be seen next.

Another set of signals was needed for the feature selection stage. It included 560 simulated signals (40 signals × 14 PQD categories) with random SNR between 20 dB and 50 dB.
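
As an illustration of the kind of signal SIGEN produces, the following sketch (an assumption of this presentation, with illustrative parameter names) synthesizes a sag-plus-harmonic event on the fundamental specified above and adds noise at a given SNR:

```python
import numpy as np

rng = np.random.default_rng(2)

def synth_sag_harmonic(snr_db=30.0, depth=0.4, h3=0.15):
    """230 V RMS, 50 Hz fundamental over 10 cycles at 12.8 kHz (2560 samples),
    with a sag of the given depth, a 3rd harmonic, and additive Gaussian noise."""
    fs, f0, n = 12_800, 50.0, 2560
    t = np.arange(n) / fs                                # 0.2 s observation window
    v = np.sqrt(2) * 230.0 * np.sin(2 * np.pi * f0 * t)
    v += h3 * np.sqrt(2) * 230.0 * np.sin(2 * np.pi * 3 * f0 * t)  # harmonic
    sag = np.ones(n)
    sag[(t >= 0.06) & (t < 0.14)] = 1.0 - depth          # sag over ~4 cycles
    v *= sag
    p_sig = np.mean(v ** 2)                              # noise power for the SNR
    noise = rng.normal(0.0, np.sqrt(p_sig / 10 ** (snr_db / 10)), n)
    return t, v + noise
```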

#### *6.2. Training*/*Testing and Validation Datasets*

Once statistical features were computed based on the ST matrix, datasets were created. A first dataset contained 560 × 9 data (samples × features) and it was used as input of the K-NN-based fast validation tool for optimal feature selection (see Sections 3.1 and 3.2).

As concluded in Section 3, the choice of either of the selected optimization algorithms (GA and CSO) allowed obtaining a feature set comprising only five features. Thus, for each of the four specific noise levels considered, the second type of dataset, containing 1400 × 5 data (samples × features), was used to fit the different proposed classification models. This training dataset was partitioned into train/test datasets in an 80/20% ratio, respectively. Then, while the test dataset was kept aside, the train set was split again into the actual train dataset (80% again) and the validation set (the remaining 20%). Data for each subset were randomly selected. This cross-validation procedure iteratively trained and validated the models on these sets, avoiding overfitting. In this study, a 10-fold cross-validation method was used to evaluate the performance of the proposed classifiers.

It must be noted that the mentioned training dataset was different from the one used in the previous feature selection stage. This ensured that no observations that were part of the feature selection task were also part of the classification task.
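
The partitioning and validation scheme can be sketched as follows (a minimal illustration in which placeholder data stand in for the real 1400 × 5 feature matrices):

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X = np.random.rand(1400, 5)                 # placeholder feature matrix
y = np.random.randint(0, 14, 1400)          # placeholder labels (14 classes)

# 80/20 train/test split; the test portion is kept aside until the end.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = SVC(kernel="poly", degree=2)
cv_acc = cross_val_score(model, X_tr, y_tr, cv=10).mean()   # 10-fold validation
test_acc = model.fit(X_tr, y_tr).score(X_te, y_te)          # held-out test accuracy
```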

#### *6.3. PQDs' Classes*

The following 14 classes were tested and labelled as: Harmonic (C1), Flicker (C2), Sags (C3), Sags and harmonic (C4), Interruption (C5), Interruption and harmonic (C6), Notch (C7), Swells (C8), Swells and harmonic (C9), Oscillatory transient (C10), Oscillatory transient and harmonic (C11), Oscillatory transient and sag (C12), Impulsive transient (C13), and pure sinusoidal signal (C14). In Table 2 the simulated PQDs waveforms are depicted.

**Table 2.** Synthetized PQDs' waveforms.

#### **7. Results and Performance Comparison**

Depending on the optimization and pattern recognition approach selected, five hybrid classification methods were considered: CSO&QSVM (quadratic SVM), CSO&WK-NN (weighted K-NN), GA&FTree (fine tree), GA&ETree (ensemble tree), and GA&CSVM (cubic SVM). In Table 3, the classification accuracies of the proposed models on the training dataset under different noise conditions are listed.


**Table 3.** Recognition accuracy comparison between proposed methods under noisy conditions.

The best classification accuracy was obtained using CSO&QSVM under all noise conditions. The results for 30 dB SNR and above showed high classification accuracy for both CSO&QSVM and CSO&WK-NN, and the rest of the models had acceptable values. Under highly noisy conditions (20 dB), the accuracy was satisfactory only for CSO&QSVM. The values obtained with the rest of the methods were lower and required a further analysis with separate PQDs for a better interpretation of the results.

In order to evaluate noise immunity under different accuracy requirements, single-class test datasets (each with 25 × 5 data of a unique class) were subjected to the trained classifiers separately. The noise threshold (minimum SNR value) for each class was determined through a trial-and-error method, applying the principle of keeping the SNR value as low as possible while maintaining the targeted classification accuracy. Thus, if the classification accuracy on an initial single-class dataset was, for example, lower than 80%, a new dataset with a higher SNR value was generated and tested. The iterative procedure was stopped when the targeted accuracy was achieved. In Table 4, a comparison of the minimum SNR values of these methods for different accuracy rates and distinct PQDs classes is presented. In each specific case, the maximum level of noise allowing the targeted accuracy rate (80%, 90%, and 100%) is shown. In the last row, the overall average SNR threshold of each column is given. This value is an estimation of the noise immunity for a targeted accuracy rate in the separate classification process.



**Table 4.** Minimum SNR values of the proposed methods for different accuracy rates and separate PQD classes.

From Table 4, it can be seen that the GA-based models offered good SNR values for most PQDs, often even better than those resulting from the CSO algorithm, especially when dealing with single events. However, none of the suggested GA-based models was capable of properly classifying the multiple interruption-plus-harmonic disturbances. Likewise working with five optimal features, both GA&FTree and GA&ETree also misclassified swell-plus-harmonic events, and GA&CSVM could not resolve notch and harmonic disturbances. This failure to identify separate PQDs could explain the modest results obtained by the GA-based models in Table 3. On the contrary, the proposed CSO-based models achieved good individual noise rates for a separate classification of classes according to the different accuracy targets. As a result, the CSO&QSVM and CSO&WK-NN methods presented overall SNR average thresholds under 20 dB, specifically 18.57 dB and 19.54 dB, respectively. When both the QSVM and WK-NN classifiers acted on single-class datasets, the slightly lower SNR value of the CSO&QSVM method suggested a better performance with regard to noise immunity. This behavior under highly noisy conditions was valid not only for single-class datasets but also for the complete dataset: as shown above in Table 3, CSO&QSVM also obtained the best results with the complete training dataset.

On the other hand, in Table 5, the performance of the best-proposed CSO-QSVM method is compared to that of other classification methods already reported in the literature. It can be seen that some methods reported accuracy only for noise levels (SNR) down to 50 dB [37,58], 40 dB [27], and 30 dB [42]. Others [17,59,60], although reaching 20 dB, had low accuracy. The high impact of noise on the accuracy of the wavelet-based approaches [37,60] is remarkable. The remaining works presented acceptable accuracy [15,38,39,41,43], but none was capable of dealing with 13 PQDs as the present proposal did. Only [15] achieved performances similar to those of the proposed CSO-QSVM approach. That work studied 12 kinds of single and multiple PQDs through the use of improved principal component analysis (IPCA) and a 1-dimensional convolutional neural network (1-D-CNN) and achieved overall accuracies of 99.76% and 99.85% for 20 dB and 50 dB, respectively. That classifier needed six features, distinct from the optimal feature set proposed in this work, which is composed of only five.


**Table 5.** Performance comparison of different methods.

Table 5 also displays a comparison in terms of the feature dimension of each reported approach. The ST-based probabilistic neural network (PNN) approach [59] used a set with four features but only dealt with nine PQDs, while the wavelet-based PNN [60] dealt with 14 PQDs but with poor noise immunity, as indicated by its accuracy rates. This comparative study shows that the proposed CSO-QSVM model equaled the best classification accuracy results reported in the literature, while using only five features per sample and dealing with 13 PQDs classes. These results, together with the comparison between alternative proposals (Table 3) and the detailed analysis of noise immunity (Table 4), constitute the main contributions of this paper.

Although the present work dealt with simulated signals, the results were good enough to suggest that they could be extrapolated to experimental data. In such a case, a comparison with studies based on real signals could be properly carried out.

As a future extension, an experimental setup will be used to test the effectiveness of the proposed hybrid methods under common real-time working conditions. The emulated PQ incidence on distribution networks could be modelled by low-cost hardware prototyping and software components.

#### **8. Conclusions**

The motivation for this work stemmed from the challenges facing electrical systems and equipment in determining optimal, cost-effective, and efficient power quality management. To this end, this paper addressed optimal hybrid classification methods based on machine learning approaches for the detection, identification, and classification of simulated PQDs. Specifically, ST was selected for the detection and feature extraction of PQDs and, following the current trend to further optimize the recognition approach, several optimization algorithms were tested for optimal feature selection. At this step, this work underlined the GA and CSO algorithms, since they achieved the best results. The resulting optimal feature sets were fed to several classifiers, highlighting among them the QSVM, CSVM, FTree, ETree, and WK-NN approaches for showing improved performance.

The GA optimization algorithm associated with the FTree, ETree, and CSVM approaches could not classify properly all PQDs under the conditions established in this analysis. However, the results obtained through these approaches were very promising and showed the great potential of these kinds of models when dealing with a certain group of PQDs.

Alternatively, CSO-based methods including CSO-QSVM and CSO-WK-NN achieved high classification accuracy under noisy conditions. A thorough comparative assessment in terms of noise immunity and classification accuracy led us to conclude that the proficiency of CSO-QSVM method is slightly better than CSO-WK-NN method.

It can also be noted that the results found seem to confirm the current trend by which, although GA-based optimization is noted for its efficiency, GA-based methodologies are progressively being replaced by swarm optimization algorithms.

Finally, performances of CSO-QSVM method were compared to those of other classification methods already reported in the literature, concluding that the proposed method achieved a higher degree of efficiency than most of them, and, based on the results, it may work well under high noise background in practical applications.

**Author Contributions:** J.C.B.-R. conceived and developed the idea of this research, designed the whole structure of the comparative study, and wrote the paper; F.J.T. generated the data, performed the simulations, and contributed to the methodology; M.D.B. provided the theoretical background of the proposed methodology and contributed to it. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Universidad de Sevilla (VI Plan Propio de Investigación y Transferencia) under grant 2020/00000596.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

The continuous Stockwell transform (ST) [1] of a signal *x*(*t*) was originally described as

$$S(\tau, f) = e^{i2\pi f\tau} \mathcal{W}(\tau, d) \tag{A1}$$

i.e., a phase factor acting on a continuous wavelet transform (CWT)

$$\mathcal{W}(\tau,d) = \int\_{-\infty}^{+\infty} x(t)\mu(\tau-t,d)dt\tag{A2}$$

where τ is a time displacement factor, *d* is a frequency-scale dilation factor, and μ(*t*, *d*) is a specific mother wavelet that includes the frequency-modulated Gaussian window

$$
\mu(t,d) = \frac{\left|f\right|}{\sqrt{2\pi}} e^{-\frac{t^2 f^2}{2}} e^{-i2\pi f t}.\tag{A3}
$$

Instead, in this paper, the *STFT* pathway to address the ST definition was preferred,

$$STFT(\tau, f) = \int\_{-\infty}^{+\infty} x(t)w(\tau - t)e^{-i2\pi ft}dt\tag{A4}$$

From this approach, ST can be written as

$$S(\tau, f) = \int\_{-\infty}^{+\infty} x(t)w(\tau - t, f)e^{-i2\pi ft}dt\tag{A5}$$

where *w*(*t*, *f*) is the Gaussian window function similar to that proposed by Gabor (1946), but now also introducing the aforementioned added frequency dependence

$$w(t,f) = \frac{\left|f\right|}{\sqrt{2\pi}}e^{-\frac{t^2f^2}{2}}\tag{A6}$$

where the inverse of the frequency, 1/*f*, represents the window width.

From either of the two viewpoints, the complete ST definition can be written as

$$S(\tau, f) = \int\_{-\infty}^{+\infty} x(t) \frac{\left| f \right|}{\sqrt{2\pi}} e^{-\frac{(\tau - t)^2 f^2}{2}} e^{-i2\pi ft} dt \tag{A7}$$

and also can be represented in terms of its relationship with FT and related spectrum *X*(*f*) of *x*(*t*)

$$S(\tau, f) = \int\_{-\infty}^{+\infty} X(a+f)e^{-\frac{2\pi^2 a^2}{f^2}}e^{i2\pi a\tau}da \quad f \neq 0. \tag{A8}$$

As is well known, recovering the original signal from the continuous ST is expensive in terms of data storage due to oversampling, so a sampled version is used in practice. This sampled version of the ST permits calculating the widely used ST complex matrix, which is obtained as (τ → *jT*, *f* → *n*/*NT*)

$$\begin{cases} S\left[jT, \frac{n}{NT}\right] = \sum\_{m=0}^{N-1} X\left[\frac{m+n}{NT}\right] G(m,n) e^{\frac{i2\pi mj}{N}} & , n \neq 0\\ S\left[jT, 0\right] = \frac{1}{N} \sum\_{m=0}^{N-1} X\left[\frac{m}{NT}\right] & , n = 0 \end{cases} \tag{A9}$$

where *T* denotes the sampling interval, *N* is the total number of sample points, and both *X*[(*m*+*n*)/*NT*] and *G*(*m*, *n*) result from the discrete fast Fourier transform (FFT) of the PQ disturbance signal *x*(*t*) and of the Gaussian window function *w*(*t*, *f*), respectively:

$$X\left[\frac{n}{NT}\right] = \frac{1}{N} \sum\_{k=0}^{N-1} x[kT]e^{-\frac{i2\pi nk}{N}}\tag{A10}$$

$$G(m,n) = e^{\frac{-2\pi^2 m^2}{n^2}}\tag{A11}$$

where *j*, *k*, *m*, and *n* are integers in the range of 0 to *N* − 1.
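
A direct transcription of Equations (A9)–(A11) into code (a minimal sketch, without the refinements of optimized ST implementations) could look as follows:

```python
import numpy as np

def stockwell_transform(x):
    """Discrete Stockwell transform per Eqs. (A9)-(A11).
    Returns an (N//2 + 1) x N complex S-matrix for a signal x of length N."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.fft.fft(x) / N                         # Eq. (A10), spectrum of x
    m = np.arange(N)
    S = np.zeros((N // 2 + 1, N), dtype=complex)
    S[0, :] = X.mean()                            # Eq. (A9), n = 0 row
    for n in range(1, N // 2 + 1):
        G = np.exp(-2 * np.pi**2 * m**2 / n**2)   # Eq. (A11), Gaussian window
        Xs = np.roll(X, -n)                       # X[(m + n)/NT], circular shift
        S[n, :] = N * np.fft.ifft(Xs * G)         # sum over m done via inverse FFT
    return S
```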

The result of discrete ST is a 2D time-frequency matrix that is represented as

$$S(\tau, f) = A(\tau, f)e^{-i\phi(\tau, f)}\tag{A12}$$

where *A*(τ, *f*) is the amplitude and φ(τ, *f*) is the phase. Each column contains the frequency components present in the signal at a particular time. Each row displays the magnitude of a particular frequency with time varying from 0 to *N* − 1 samples.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

