*Article* **Operational Risk Assessment of Electric-Gas Integrated Energy Systems Considering N-1 Accidents**

**Hua Liu 1, Yong Li 1,\*, Yijia Cao 1, Zilong Zeng 1,\* and Denis Sidorov <sup>2</sup>**


Received: 31 January 2020; Accepted: 28 February 2020; Published: 5 March 2020

**Abstract:** The reliability analysis method and risk assessment model for the traditional single network no longer meet the requirements of the risk analysis of coupled systems. This paper establishes a risk assessment system of electric-gas integrated energy system (EGIES) considering the risk security of components. According to the mathematical model of each component, the EGIES steady state analysis model considering the operation constraints is established to analyze the operation status of each component. Then the EGIES component accident set is established to simulate the accident consequences caused by the failure of each component to EGIES. Furthermore, EGIES risk assessment system is constructed to identify the vulnerability of EGIES components. Finally, the risk assessment of IEEE14-NG15 system is carried out. The simulation results verify the effectiveness of the proposed method.

**Keywords:** integrated energy system; risk assessment; component accident set; vulnerability

#### **1. Introduction**

Power system and natural gas system are strongly coupled systems. In recent years, the application scenarios of energy field research have gradually changed from single energy systems to multi-energy systems [1,2]. The energy sources are mutually coupled, which can allow the stepwise utilization and collaborative optimization of energy sources. And the mutual support between different energy systems also improves the security and stability of each system. However, it makes the operation and control of multi-energy systems more complicated [3,4]. In terms of system failure, it may be caused by the system's own factors, or it may be caused by other subsystems propagating the failure through coupling elements, which makes the security of multi-energy systems also more complicated [5,6]. For example, the output power fluctuation of renewable energy may cause the output fluctuation of gas turbine, which leads to the fluctuation of pipeline flow and node pressure of natural gas system. Interruption of gas source or sudden drop of air pressure in natural gas system may cause shutdown of gas turbine in power system, which will force other generators to increase output and cause transmission plugs, further affecting the safe and stable operation of power system. In order to ensure secure and stable operation of EGIES, it is very important for operators to quickly and accurately evaluate the operational risk of the system.

At present, research on integrated energy systems has focused on energy flow analysis [7,8], optimized operation [9–11], and collaborative planning [12]. Most of the researches on system risk assessment have stayed in a single energy system, and there are few studies on risk assessment of multi-energy systems. The research on integrated energy system risk assessment is in its infancy, and the research results of this research direction are currently mainly focused on the reliability assessment of integrated energy systems. In [13], the influence of electricity-gas coupling on the operation status of the integrated energy system was studied, but the impact of gas supply risks on the security of the

entire system has not been fully considered. In [14], the impact of the shortage for natural gas supply on the operation of the integrated energy system was analyzed, and the impact of intermittent new energy power injection on the feasible region of the natural gas system was also evaluated. Reference [15] proposed universal indicators from the energy link, device link, distribution network link and user link. The security assessment of the regional integrated energy system was performed. However, the risk of accidents caused by component failures to the system was not considered in [14,15]. In [16], the sufficiency and safety of the integrated energy system were analyzed, and the key fault scenarios and extreme operation scenarios were identified using the natural gas transient power flow model and the power system interlocking fault model. In [17], reliability indexes such as expected electric/gas/heat demand not supplied, expected wind power abandoned and power-to-gas device capacity utilization were proposed. [16,17] were not refined to evaluate the operating status of the integrated system. The above research results evaluate the reliability level of integrated energy system from the perspective of long-term planning, which is of great significance for system optimization planning and operation control. However, the quantitative calculation results based on the short-term scale of system operation risk are more helpful for the operating personnel to make online decisions. Operation regulators need to conduct risk assessment based on the real-time operation state of the system, so as to find potential safety hazards, give timely warnings, and assist in making decisions to adjust the current operation mode to ensure the safety of the system.

In this article, an EGIES steady-state analysis model considering operating constraints is established. Establishing natural gas system evaluation indicators including node low pressure severity, pipeline overload severity, pipeline tidal distribution severity, and gas load loss, we combine the power system risk assessment indicators to establish the EGIES assessment system. We also identify vulnerable components in EGIES by considering the possibility/severity of component failure. Finally, the risk assessment of IEEE14-NG15 EGIES was conducted to verify the effectiveness of the proposed model and method.

#### **2. The Steady-State Modeling and Power Flow Calculation of EGIES**

Due to the differences in physical characteristics of different electric-gas integrated energy system energy systems, our modeling needs to be coordinated uniformly. For the coupling of EGIES, it is actually a key element in the transformation of energy forms. In the modeling process, the energy transformation characteristics should be considered, similar to the energy hub.

#### *2.1. Gas Turbine Condition Analysis Model*

A gas turbine is an energy converter between the natural gas pipeline network and the power grid. For a natural gas pipeline network, a gas turbine can be equivalent to a natural gas load; for a power system, it can be equivalent to an adjustable output power source. The relationship between the gas consumption of a gas turbine and its active output can be expressed as follows [18]:

$$P\_{G,i} = a\_{\mathbb{g},i} f\_{\mathbb{g},i}{}^3 + \beta\_{\mathbb{g},i} f\_{\mathbb{g},i}{}^2 + \gamma\_{\mathbb{g},i} f\_{\mathbb{g},i} \tag{1}$$

where *fg*,*<sup>i</sup>* is the amount of gas consumed by the *i-th* gas turbine; α*g*,*i*, β*g*,*i*, γ*g*,*<sup>i</sup>* are the electrical energy conversion coefficients of the gas turbine; *PG*,*<sup>i</sup>* is the active output of the *i-th* gas turbine.

#### *2.2. Energy Flow Model of Gas Pressure Regulator*

The compressor is an important non-pipeline component in the natural gas pipeline network, and its parameters mainly include flow rate and inlet and outlet pressure. The relationship between the power required by the compressor and its air pressure ratio can be calculated by the following formula:

$$f\_{\rm clt} = \alpha\_{\rm c} + \beta\_{\rm c} \cdot HP + \gamma\_{\rm c} \cdot HP^2 \tag{2}$$

$$HP = f\_c^{in} \cdot \frac{\alpha}{\eta(\alpha - 1)} \cdot [(\frac{p^{out}}{p^{in}})^{\alpha(\alpha - 1)} - 1] \tag{3}$$

where *HP(*105*W*) is power; *fin <sup>c</sup>* (*m*3/*h*) is the equivalent flow through the compressor under standard conditions; α is the variable index (here α is 1.27); η is the compressor efficiency, generally maintained at 0.75–0.85; α*c*, β*c*, γ*<sup>c</sup>* are fuel ratio coefficients; *fch* is the amount of gas consumed by the gas-consuming compressor; *pout* is the output air pressure; *pin* is the input air pressure.

#### *2.3. Natural Gas Pipeline Model*

In the case of fixed external conditions, the flow of the pipeline is mainly related to the pressure at the head and end of the pipeline. Given the two variables of the pipeline flow, the pressure at the beginning of the pipeline, and the pressure at the end, the unknown variable can be solved. According to the conservation law of natural gas hydrodynamic mass and Bernoulli's equation, the natural gas flow equations of different pressure levels based on certain assumptions are as follows [19,20]:

$$f\_{ij} = \begin{cases} 5.72 \cdot 10^{-4} \sqrt{\left(p\_i - p\_j\right) \cdot \frac{D\_{ij}^5}{\text{FGL}\_{ij}}} \\ 7.57 \cdot 10^{-4} \cdot \frac{T\_u}{p\_n} \sqrt{\left(p\_i^2 - p\_j^2\right) \cdot \frac{D\_{ij}^5}{\text{FGL}\_{ij} T\_u}} \\ 7.57 \cdot 10^{-4} \cdot \frac{T\_u}{p\_n} \sqrt{\left(p\_i^2 - p\_j^2\right) \cdot \frac{D\_{ij}^5}{\text{FGL}\_{ij} T\_u Z\_u}} \end{cases} \tag{4}$$

These three types are applicable to natural gas pipelines with pipeline pressures below 0–0.75 bar, 0.75–7.0 bar and greater than 7.0 bar, respectively. *i* and *j* represent the beginning and end of the natural gas pipeline respectively, and *fij m*3/*h* represents the flow from node *i* to node *j* through the pipeline; *pi*(*bar*) and *pj*(*bar*) are the pressure at the beginning and end of the pipe; *Dij*(*mm*) is the diameter of the pipe; *F* is the non-directional friction coefficient; *Ta*(*K*) is the average temperature of natural gas, *Tn*(*K*) is the temperature under standard conditions; *G* is the specific gravity of natural gas; *Za* is the average compressibility coefficient.

#### *2.4. The Steady-State Power Flow Model of EGIES*

For the EGIES system, each subsystem has its own physical characteristics, so the original physical characteristics are still maintained during the modeling process. The coupling link mainly plays the interaction between the systems, so the transformation characteristics and physical characteristics of the coupling link are mainly considered. The expression of the EGIES steady-state model cited in the article [3] is:

$$\begin{cases} \begin{array}{l} f\_{\rm E} \big( \mathbf{x}\_{\rm t}, \mathbf{x}\_{\rm \mathcal{S}}, \mathbf{x}\_{\rm cl} \big) &= 0 \\ f\_{\rm N \mathcal{G}} \big( \mathbf{x}\_{\rm \mathcal{E}}, \mathbf{x}\_{\rm \mathcal{S}}, \mathbf{x}\_{\rm cl} \big) &= 0 \\ f\_{\rm E \mathcal{H}} \big( \mathbf{x}\_{\rm \mathcal{E}}, \mathbf{x}\_{\rm \mathcal{S}}, \mathbf{x}\_{\rm cl} \big) &= 0 \end{array} \tag{5}$$

These three formulas respectively represent the equations of the coupling of the power grid, natural gas network and energy; *xe* represent power system variables including power, phase angle, and voltage amplitude; *xg* represent natural gas system variables including pressure and flow; *xeh* represent the energy coupling variable including the power conversion factor.

#### *2.5. Flow Calculation for EGIES*

The similarity of power system and natural gas system in the solution of power flow is mainly reflected in two aspects:

(1) According to the law of conservation of mass, Kirchhoff's first law and Kirchhoff's second law are also applicable in natural gas systems. Correspondingly, natural gas flow solutions can be formed focusing on nodes and closed loops [20].

(2) The key to solving power flow of power system and natural gas system is to use high-efficiency iterative algorithm to solve high-dimensional nonlinear equations. Therefore, the solution method represented by Newton's method can be extended to natural gas systems [21].

In this paper, the nodal method is used to solve the power flow equation of the natural gas system. For the n-node natural gas system, according to Equation (5), when the *k-th* iteration is solved using Newton's method, the correction equation is as follows:

$$\begin{cases} \quad \boldsymbol{F} \Big( \mathbf{x}\_{\mathcal{S}}^{(k)} \Big) = \boldsymbol{J}^{(k)} \Delta \mathbf{x}\_{\mathcal{S}}^{(k)} \\ \quad \mathbf{x}\_{\mathcal{S}}^{(k+1)} = \mathbf{x}\_{\mathcal{S}}^{(k)} - \Delta \mathbf{x}\_{\mathcal{S}}^{(k)} \end{cases} \tag{6}$$

Among them:

$$F\mathbf{x}\_{\mathcal{S}}^{(k)} = \begin{bmatrix} f\_{\text{NGI}} \left[ \mathbf{x}\_{\mathcal{S}1}^{(k)}, \mathbf{x}\_{\mathcal{S}2}^{(k)}, \dots, \mathbf{x}\_{\mathcal{S}n}^{(k)} \right] \\ f\_{\text{NGI}2} \left[ \mathbf{x}\_{\mathcal{S}1}^{(k)}, \mathbf{x}\_{\mathcal{S}2}^{(k)}, \dots, \mathbf{x}\_{\mathcal{S}n}^{(k)} \right] \\ \vdots \\ f\_{\text{NGI}n} \left[ \mathbf{x}\_{\mathcal{S}1}^{(k)}, \mathbf{x}\_{\mathcal{S}2}^{(k)}, \dots, \mathbf{x}\_{\mathcal{S}n}^{(k)} \right] \end{bmatrix} \tag{7}$$

where *F x* (*k*) *g* is the error vector of the function sought; *J*(*k*) is the current Jacobian matrix; *J*(*k*) is the current correction vector. By iterating the above formula repeatedly until the convergence condition is satisfied, the result can be finally obtained.

The main power link of the comprehensive energy system is distribution network, and its main features include radial operation, large branch *R*/*X* ratio, multi-phase unbalance, multiple branches, and the existence of renewable energy access. In addition, with the gradual close coupling of multiple energy sources in the integrated energy system, the power system is not only the output object of other energy links, but also the energy supplier of the coupling links in other energy systems. These characteristics put forward new requirements for the steady-state analysis of power links in the integrated energy system, and the influence of other energy links coupled with it should be considered in the solution process. The flow calculation process of EGIES is shown in Figure 1.

In power flow calculation of power system, its basis is node voltage current equation *I* = *YU*, which is expressed by power variable and becomes:

$$\dot{I}\_m = \sum\_{n=1}^k Y\_{mn} \dot{U}\_n = \frac{P\_m - jQ\_m}{\Omega\_m}, m = 1, 2, \dots, N \tag{8}$$

where . *Im* and . *Un* are respectively the injection current of node *m* and the voltage of node *n*. *Ymn* is an element in the admittance matrix. *Pm* and *Qm* are respectively the injected active power and reactive power of node *m*. *U*ˆ *<sup>m</sup>* is the conjugate of the voltage vector; *N* is the number of system nodes. Because distribution network power flow solving technology is very mature, this article will not repeat them.

**Figure 1.** Flow chart of EGIES flow calculation.

#### **3. Accident Severity Assessment Indexes of EGIES**

The accident severity assessment of EGIES include power system and natural gas system accident severity assessment. The power system evaluation indicators have been described in [22], so this article will not repeat them. Natural gas system evaluation indicators include node low pressure severity, pipeline overload severity, pipeline flow distribution severity, and gas network load loss severity. The above indicators can reflect the operating characteristics of the natural gas system from some aspects.

#### *3.1. Low-Pressure Severity Index for Natural Gas System Nodes*

Node pressure reflects the gas supply capacity of the natural gas system. Considering the existence of factors such as non-directional friction coefficient, there is a transmission security zone due to the maximum transmission distance of natural gas during the transmission process. In the area, the pressure at the end of the pipe can be controlled within safe limits. The severity of low gas pressure at the nodes of the natural gas network indicates the gas supply capacity of the nodes. The low-pressure severity function of node *i* of the natural gas pipeline network is defined as:

$$a\_{\mathcal{S}}(p\_i) = \begin{cases} 0 & p\_i \ge p\_s \\ \frac{p\_s - p\_i}{p\_s - p\_{\rm inv}} & p\_i < p\_s \end{cases} \tag{9}$$

where *pi* is the pressure of the natural gas network node *i*; *ps* is the rated gas pressure of the natural gas network node *i*; *plim* is the maximum low-pressure risk threshold.

The severity of low-pressure in a natural gas system can be expressed as:

$$\mathcal{S}\_{\mathcal{S}}(p) = \sum\_{i=1}^{N} a\_{\mathcal{S}}(p\_i) / N \tag{10}$$

where *N* is the total number of nodes in the natural gas pipeline network; *ag*(*pi*) is a function of the low-pressure severity of natural gas network node *i*.

#### *3.2. Gas Pipeline Overload Severity Index*

When the transmission capacity of the power line exceeds its limit value, the thermal effect phenomenon will accelerate the aging of the line and even cause the line to fail. Analogous to the overload severity of power lines, the overload severity of natural gas pipelines is proposed to measure the pipeline operating status. The pipeline overload severity function between node *i* and node *j* can be expressed as:

$$a\_{\mathcal{S}}(f\_{ij}) = \begin{cases} 0 & f\_{ij} < f\_d \\ \frac{f\_{ij} - f\_d}{f\_{\text{lim}} - f\_d} & f\_{ij} \ge f\_d \end{cases} \tag{11}$$

where *fij* is the pipeline flow between nodes *i* and *j*; *fllim* is the maximum transmission flow, which represents the threshold of the overload risk of the branch; *fd* is the set pipeline overload risk threshold, which is generally 90% of *flim*.

Therefore, the pipeline overload severity of the natural gas network can be expressed as:

$$S\_{\mathcal{S}}(f) = \sum\_{i=1}^{M} a\_{\mathcal{S}}(f\_{i\bar{i}}) / M \tag{12}$$

where *M* is the total number of natural gas pipelines; *ag fij* is the pipeline overload severity function of natural gas pipeline network node *i* and node *j*.

#### *3.3. Gas Flow Distribution Severity Index of Natural Gas System*

This article uses tidal current entropy [23] to quantitatively describe the equilibrium of the pipeline flow distribution. The entropy theory was first applied to the laws of thermodynamics, and then gradually applied to systems such as information science and statistical physics. The entropy is a measure, which reflects the chaotic and disordered state of the system. If the order degree of the system is lower, the entropy is higher, Conversely, the higher the order degree of the system, the smaller its entropy. Although the average load rate of the electric power system and the natural gas system can reflect the load level of the system as a whole, the description of the load rate distribution of the line is insufficient and imperfect. When the system is at a certain load level, the following can happen: it may be that the load rate of all the lines is near the average load rate, or it may be that the load rate of some lines is much higher than the average load rate while the load rate of some lines is much lower than the average load rate. This information cannot be represented by the average load rate, and it is not possible to use the average load rate as an indicator to study how the unbalanced distribution of power flows will affect system security. Therefore, entropy theory is introduced in this paper to reflect the distribution of power flow in the system. The amount of gas transmitted by the natural gas system is closely related to the capacity of the natural gas pipeline. Gas lines with large natural gas pipelines carry large volumes of gas, and conversely, small volumes of natural gas pipelines carry small volumes of gas. In this way, the flow distribution of the natural gas system is balanced. The flow entropy of the natural gas pipeline is defined here as:

$$H\_{\mathcal{S}} = -\mathbb{C} \sum\_{k=1}^{n-1} [p(k) \ln p(k)] \tag{13}$$

where *C* is taken as ln10; the interval is equally divided into 20 parts, and *p*(*k*) is the ratio of the lines with the load ratio belonging to the same interval to the total number of lines.

When the load rates of all the pipelines are not in the same load rate interval, the power flow entropy reaches the maximum:

$$H\_{\text{max}} = -\text{Cln}\frac{1}{M} \tag{14}$$

At this time, the distribution of the pipeline flow is extremely uneven. Once the load or other factors cause fluctuations in the operating state, the line with a high load rate is likely to fluctuate and exceed the safe range, which will cause a failure. The larger the value of the flow entropy, the more uneven the flow distribution, the lower the security and the lower the line utilization.

The flow distribution severity function of a natural gas system can be expressed as:

$$S\_{\mathcal{S}}(H) = \begin{cases} 0 & H\_{\mathcal{S}} < H\_{o} \\ \frac{H\_{\mathcal{S}} - H\_{o}}{H\_{\max} - H\_{0}} & H\_{\mathcal{S}} \ge H\_{o} \end{cases} \tag{15}$$

where *Hg* is the pipe flow entropy of the system after component failure; *Ho* is the steady-state entropy of the pipeline before the component fails; *Hmax* is the maximum flow value of the natural gas system.

#### *3.4. Gas Load Loss Severity Index*

Regardless of whether it is a power network or a natural gas pipeline network, it is important to transfer energy from the supply side to the user side. Therefore, the gas load loss severity is an important indicator for evaluating the system. n this paper, the electric-gas load reduction optimization model [17] considering component faults is adopted to achieve as much reserved load as possible in case of system failure. The gas load loss ratio of natural gas pipeline network is defined as:

$$\eta = \frac{\sum\_{i=1}^{n} F\_i - F\_i'}{\sum\_{i=1}^{n} F\_i} \times 100\% \tag{16}$$

where η is the proportion of natural gas pipeline load loss after the accident; *Fi* is the gas load of natural gas node *i* before the fault; *F <sup>i</sup>* is the gas load of node *i* before and after the fault.

The gas load loss severity function is defined as:

$$S\_{\text{global}} = \begin{cases} \frac{\eta}{\eta\_{\text{lim}}} & \eta < \eta\_{\text{lim}} \\ 1 & \eta \ge \eta\_{\text{lim}} \end{cases} \tag{17}$$

where η*lim* is the threshold for the loss of the natural gas system, and 20% of the total natural gas load is taken in this paper.

#### **4. EGIES Risk Assessment Considering N-1 Failure**

In this chapter, combined with the severity assessment index of the EGIES established in the second part, the risk assessment model of EGIES established based on the risk assessment theory [24,25] and the failure probability of EGIES is considered. Finally, risk values of components in the electrical integrated energy system are calculated to identify vulnerable links.

#### *4.1. EGIES Failure Probability Considering N-1 Failure*

It can be seen from the statistical data that the occurrence probability of power system accidents basically conforms to the characteristics of the Poisson distribution [26]. The probability of power system accidents can be expressed as:

$$p(E\_i) = \left(1 - e^{-\lambda\_i}\right) e^{-\sum\_{j \neq i} \lambda\_j} \tag{18}$$

where *Ei* is the *i*-th accident in the power system; *p*(*Ei* is the probability of accident *Ei*; λ*<sup>i</sup>* is the failure rate of component *i*.

In a natural gas system, the probability of component accidents also meets the Poisson distribution law [27], and can be similarly expressed as:

$$p(G\_i) = (1 - e^{-\xi\_i})e^{-\sum\_{j \neq i} \xi\_j} \tag{19}$$

where *Gi* is the i-th accident in the natural gas pipeline network; *p*(*Gi* ) is the probability of the accident *Gi*; *gi* is the failure rate of the natural gas pipeline network component *i*.

The failure rates of the power system and the natural gas system are independent of each other, when the failures of the components in the two systems are considered separately. The following two formulas show the component failure rates of the natural gas system and the power system in the EGIES considering N-1 failure:

$$p(E\_{\mathcal{S}^i}) = p(E\_i) \prod\_{j=1}^{N^c} [1 - p(G\_j)] \tag{20}$$

$$p(G\_{ci}) = p(G\_i) \prod\_{j=1}^{N\_{\mathcal{S}}} \left[ 1 - p(E\_j) \right] \tag{21}$$

where *p*(*Ei* ) and *p*(*Gi* ) are respectively the initial failure rates of the electrical and natural gas network components; *p*(*E gi*) and *p*(*Gei*) are respectively the failure probability of the electrical and gas network components; *Ne* and *Ne* are respectively the total components of the electrical and gas network.

#### *4.2. Risk Assessment Model of EGIES*

In this paper, the coupling effect between EGIES multi-energy systems is considered, and the comprehensive risk assessment indicators as shown below are established based on the risk assessment theory:

$$R\_k = \begin{cases} p(E\_{\mathbb{S},k}) \left( \chi\_{\varepsilon,k} + \chi\_{\mathbb{S},k} \right) & k \in \mathcal{N}\_{\varepsilon} \\ p(G\_{\varepsilon,k}) \left( \chi\_{\varepsilon,k} + \chi\_{\mathbb{S},k} \right) & k \in \mathcal{N}\_{\mathbb{S}} \end{cases} \tag{22}$$

among them:

$$\mathcal{Y}\_{\mathfrak{c}} = \mathcal{S}\_{\mathfrak{c}}(\mathcal{U}) + \mathcal{S}\_{\mathfrak{c}^\mathfrak{c}}(\mathcal{P}) + \mathcal{S}\_{\mathfrak{c}}(\mathcal{H}) + \mathcal{S}\_{\text{eland}} \tag{23}$$

$$\mathcal{Y}\_{\mathcal{S}} = \mathcal{S}\_{\mathcal{S}}(p) + \mathcal{S}\_{\mathcal{S}}(f) + \mathcal{S}\_{\mathcal{S}}(H) + \mathcal{S}\_{\mathcal{gl}\text{odd}} \tag{24}$$

where *Se*(*U*), *Se*(*P*), *Se*(*H*) and *Seload* are respectively low voltage severity, line overload severity, power flow distribution severity and power loss load severity; where *Sg*(*p*), *Sg*(*f*), *Sg*(*H*) and *Sgload* are respectively low-pressure severity, gas pipeline overload severity, Gas flow distribution severity and Gas load loss severity.

#### *4.3. EGIES Security Assessment Process Based on Risk Theory*

The main steps of EGIES security assessment method based on risk theory [25] are as follows, and the process is shown in Figure 2:


meet the constraint conditions. If not, the cycle iteration is conducted by adjusting the energy coupling variable or optimizing load reduction to finally meet the operating conditions.

(3) Calculate the EGIES risk. Through the established EGIES evaluation system, the vulnerability and importance of each component of the system were identified horizontally, and the main impact of the component on the system operation was evaluated vertically.

**Figure 2.** Flow chart of EGIES security assessment.

#### **5. Case Study**

The EGIES system shown in Figure 3 is simulated and analyzed based on GAMS and MATLAB software. The system includes IEEE-14 nodes of power system and 15 nodes of natural gas network. The coupling elements include three gas turbines and three compressors. The operation constraints in the EGIES include are shown as follows:


**Figure 3.** IEEE14-NG15 system structure diagram.

It was assumed that the reliability of the natural gas pipeline, the compressor and the power line is 0.90, 0.90 and 0.92 respectively. The Pipeline parameters of 15-node natural gas system are shown in Table A1 of Appendix A, and the gas loads of natural gas system are shown in Table A2 of Appendix A. In the initial state of EGIES, the risk indicators of severity of natural gas system accident and severity of power system accident are shown in Figures 4 and 5 respectively.

Taking the outlet pipeline fault of natural gas source N1 (Element label 1 represents the risk value of the system under normal operation, component Q1 is labeled 2, component Q2 is labeled 3, and so on in Figures 4–7) as an example, it is illustrated that the above evaluation indicators can correctly reflect the changes in operating state of other parts of EGIES caused by gas network fault. As can be seen from Figure 4, a large number of air sources are missing, resulting in a large resection of the gas load. At the same time, the amount of gas transmitted in the pipeline also decreases correspondingly, resulting in low node pressure, low pipeline overload and abnormal power flow distribution. The 90 MW gas turbine at node E3 supports the operation of the whole large power grid. Due to the severe reduction of gas supply, the gas turbine output is seriously reduced, resulting in a substantial reduction of power load.

**Figure 4.** Natural gas system accident severity.

**Figure 5.** Power system accident severity.

**Figure 6.** Compressor operating ratio.

**Figure 7.** EGIES components comprehensive risk index value.

Although the E3 gas turbine is inadequately powered, other generators can still maintain the power supply of part of the electrical load. Therefore, there is a large loss of electrical load and an uneven distribution of power flow, but there is no serious line overload. As a result of the above process, the gas flow through the compressor Q5 and the compressor Q12 is zero, and the energy supply of the compressor Q16 is zero, which can also be reflected in Figure. 6 of the operating state of the three compressors.

Taking the fault of 60 MW gas turbine outlet line connected with node E8 (component Q32 is labeled 33 in in Figures 4–7) as an example, it is illustrated that the above evaluation indicators can correctly reflect the changes in operating state of other parts of EGIES caused by power grid failure. It can be seen from Figure 5 that the state change process of the EGIES after a gas turbine failure is similar to the N1 outlet pipeline failure scenario described above. The power generated by the gas turbine cannot be supplied to the system due to failure of the power line at the outlet of the gas turbine. Therefore, the natural gas supply to the gas turbine by the node N12 must be interrupted, and the operating status of the gas network also changes accordingly. Unlike the first case, the power line is severely overloaded in the second case. The main reason is that different energy systems have different definitions of load overload. In power systems, a load loss of more than 20% is defined as a severe load loss. In fact, in the first case, only 45.54% of the electric load was retained, while in the second case, 74.35% was retained. The greater the electric load retention, the higher the possibility of overload on power lines.

According to Equation (22), the comprehensive risk index value of EGIES can be calculated. The top ten comprehensive risk index of components is shown in Table 1, which include the outlet line of generation. the compressor and the outlet pipeline of the gas source and so on. The distribution of comprehensive risk index of components is shown in Figure 7. According to the integrated risk index curve of EGIES components, the vulnerable links of the EGIES can be identified.


**Table 1.** EGIES components comprehensive risk index value of the top ten.

As can be seen from Table 1 and Figure 7, the comprehensive risk value of component Q32 is the highest. The reason is that Q32 is the outlet line of the generator. If a fault occurs, Q32 cannot supply power to the system, and the system will suffer serious power load loss. Considering that the generator connected to Q32 is a gas turbine, it is also sensitive to the fault disturbance of the natural gas network. Q1, Q17, Q32 and Q5 rank higher in the table. Q1 and Q17 are both gas source outlet pipelines, which undertake the task of supplying gas to the natural gas system. If Q1 and Q17 fail, the gas source will not be able to supply gas normally, and the system will suffer serious loss of gas load. In addition, the terminal nodes of Q1 and Q17 are connected to the gas turbine, which are sensitive to the fault disturbance of the power grid. Q5 is the line where the air pressure regulating device is located, which is responsible for the air pressure regulation of the natural gas network. If Q5 fails, the pressure adjustment of the natural gas system will be abnormal, and natural gas cannot be normally transmitted. Moreover, Q5 is the first end of multiple gas supply lines, which is also the reason for the higher risk value. The reasons for the higher risk values of element Q33 are discussed below.

According to the method discussed in this article, combined with each severity index after the failure of comprehensive energy system, the comprehensive risk value is used to evaluate the security risk of *N-1* failure. Through comparative analysis with the methods proposed in reference [28], the ranking results of the top 10 comprehensive values of risk obtained by the two methods are shown in Table 1. It can be seen from Table 2 that the ranking of the first 10 high-risk faults obtained in this paper is very similar to the results of the method in literature [28], which proves the correctness and availability of the method in this paper.


**Table 2.** Comparison of the top 10 risk values of N-1 accidents.

The difference between the risk ranking obtained in this paper and the method proposed in [28] mainly lies in Q17, Q33 and Q29. The reasons for the higher risk of Q17 have been mentioned above and will not be repeated. Although the component Q33 is traversed by the shortest path between fewer power-load pairs, the reason for the greater risk of Q33 is that its end is connected to the air pressure regulator, which is greatly affected by the disturbance of the natural gas network. Q29 is not directly connected to the gas unit and is less affected by the fault of the gas network. However, Q29 is passed by more power-load node pairs, and it is in the key position of network energy transmission, which plays an important role in shortening the electrical distance between power generation node and load node. Once Q29 fails, other lines will be overloaded, which will easily cause a grid cascading fault.

#### **6. Summary**

In this paper, a comprehensive energy risk assessment index and a risk assessment strategy for the EGIES considering component n-1 accident are proposed. Firstly, the steady state power flow model of the EGIES is established to analyze the safe operation of each subsystem. Secondly, the vulnerability of components is analyzed according to the severity function of IENGS, and the critical and non-critical components in the system are identified. Thirdly, IENGS risk assessment model is established to analyze the security of EGIES running state. Furthermore, an IENGS risk assessment

method considering n-1 accidents is proposed to analyze the vulnerable links in the electric-gas integrated energy system. The research shows that the EGIES risk assessment method proposed in this paper can assess the coupling and interaction effects between subsystems, reflect the security of system operation to a certain extent, and provide scientific decision basis for relevant personnel.

**Author Contributions:** Conceptualization, Y.L. and Y.C.; methodology, D.S. and Y.C.; writing—original draft preparation, H.L. and Z.Z.; writing—review and editing, H.L., Z.Z. and Y.L.; investigation, Y.L. and Y.C.; supervision, Y.C. and D.S.; funding acquisition, Y.L., Y.C. and D.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the International Science and Technology Cooperation Program of China under Grant 2018YFE0125300, in prat by the Innovative Construction Program of Hunan Province of China under Grant 2019RS1016, in part by the 111 Project of China under Grant B17016, and in part by the Excellent Innovation Youth Program of Changsha of China under Grant KQ1802029. This work is fulfilled as part of the program of fundamental research of SB of Russian Academy of Sciences, reg. no. AAAA-A17-117030310442-8, research project III.17.3.1.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Appendix A**


**Table A1.** Pipeline parameters of 15-node natural gas system.

**Table A2.** Natural gas loads of 15-node natural gas system.


#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Cluster-Based Prediction for Batteries in Data Centers**

#### **Syed Naeem Haider and Qianchuan Zhao \* and Xueliang Li**

Center for Intelligent and Networked Systems (CFINS), Department of Automation and BNRist, Tsinghua University, Beijing 100084, China; haider.gillani12@gmail.com (S.N.H.); lixuelia16@mails.tsinghua.edu.cn (X.L.)

Received: 31 January 2020; Accepted: 24 February 2020; Published: 1 March 2020

**Abstract:** Prediction of a battery's health in data centers plays a significant role in Battery Management Systems (BMS). Data centers use thousands of batteries, and their lifespan ultimately decreases over time. Predicting battery's degradation status is very critical, even before the first failure is encountered during its discharge cycle, which also turns out to be a very difficult task in real life. Therefore, a framework to improve Auto-Regressive Integrated Moving Average (ARIMA) accuracy for forecasting battery's health with clustered predictors is proposed. Clustering approaches, such as Dynamic Time Warping (DTW) or k-shape-based, are beneficial to find patterns in data sets with multiple time series. The aspect of large number of batteries in a data center is used to cluster the voltage patterns, which are further utilized to improve the accuracy of the ARIMA model. Our proposed work shows that the forecasting accuracy of the ARIMA model is significantly improved by applying the results of the clustered predictor for batteries in a real data center. This paper presents the actual historical data of 40 batteries of the large-scale data center for one whole year to validate the effectiveness of the proposed methodology.

**Keywords:** forecasting; clustering; energy systems; classification

#### **1. Introduction**

Uninterrupted power source (UPS) batteries are an integral part of any data center, which ensure the stable performance of the data center during transitional fail-over mechanisms between power grids and diesel generators [1]. Data centers require steady power for smooth performance, which is thus managed by the UPS batteries. UPS is installed between the main power grid and the servers [2]. Since the electricity bill of a data center constitutes a significant portion of its overall operational costs, data centers are now major consumers of electrical energy [3]. In 2013, data centers in U.S.A. consumed 91 billion kilowatt-hours of electricity, and this was expected to continue to rise over the years [4]. In 2017, nearly 8 million data centers required an astronomical 416.2 terawatt-hours of electricity [5,6]. Even a single faulty battery in a pack could cause millions of dollars of damage to the equipment used in the data centers during transition. The layout of the data center's design is illustrated in Figure 1.

Despite the increasing improvements in battery manufacturing and storage technology [7], health estimation of batteries in data centers is still a challenge. Not surprisingly, many studies have been conducted to develop battery life prediction of the battery packs, such as voltage fault diagnosis, charge regimes, and state of health (SOH) estimation. Severson et al. [8] demonstrated a data-driven model to predict the battery life cycle with voltage curves of 124 batteries before degradation. Tang et al. [9] predicted the battery voltage with the model-based extreme learning machine for electric vehicles. L. Jiang et al. [10] employed the Taguchi method to search an optimal charging pattern for 5-stage constant-current charging strategy and improved the lithium-ion battery charging efficiency by 0.6–0.9%. D. Sidorov et al. [11] presented a review of battery energy storage and an example of battery modeling for renewable energy applications and demonstrated an adaptive approach to solve the load leveling problem with storage. Hu et al. [12] employed advanced sparse

**<sup>\*</sup>** Correspondence: zhaoqc@tsinghua.edu.cn

Bayesian predictive modeling (SBPM) methodology to capture the underlying correspondence between capacity loss and sample entropy. Sample entropy of short voltages displayed an effective variable of capacity loss. You et al. [13] proposed a data-driven approach to trace battery SOH by using data, such as current, voltage, and temperature, as well as historical distributions. Song et al. [14] proposed a data-driven hybrid remaining useful life estimation approach by fussing the IND-AR (Iterative nonlinear degradation autoregressive model) and empirical model via the state-space model in RPF (Regularized particle filter) for spacecraft lithium-ion batteries. Zhou et al. [15] combined Empirical Mode Decomposition (EMD) and Auto-Regressive Integrated Moving Average (ARIMA) models for the prediction of lithium-ion batteries' Remaining Useful Life (RUL) in the Battery Management System (BMS), which is used in electric vehicles. Chen et al. [16] proposed a hybrid approach by combining Variational Mode Decomposition (VMD) de-noising technique, ARIMA, and GM (Gray Model) (1,1) models for battery RUL prediction.

**Figure 1.** Data center layout. PDUS = Power Distribution Units.

The ARIMA model has been one of the most widely used models in time-series forecasting [17–19]. Kavasseri et al. [20] examines the use of fractional-ARIMA or f-ARIMA models to forecast wind speeds on the day-ahead (24 h) and two-day-ahead (48 h) horizons. A hybridization of Artificial Neural Network (ANN) and the ARIMA model is proposed by Khashei et al. [21] to overcome the mentioned limitation of ANNs and yield a more accurate forecasting model than traditional hybrid ARIMA-ANNs models. The annual energy consumption in Iran is forecasted by using three patterns of ARIMA–ANFIS model by Barak et al [22].

ARIMA is used in forecasting social, economic, engineering, foreign exchange, and stock problems. It predicts future values of a time series using a linear combination of its past values and a series of errors [23–27]. Since batteries in the data center are always on charging mode, the deep discharge is a rare occurrence for batteries and their distinctive internal chemistry causes different behaviors like stationary or stochastic for each battery. In addition, failure data is not available in real life which makes it a challenge to accurately predict the battery status before its first failure. For this paper, we developed a cluster-assisted ARIMA model framework to improve the accurate prediction of battery voltage. Clustered patterns are utilized as external regressors to improve the accuracy of the ARIMA model and provide a more accurate indication of battery status in the future. Clustering in machine learning is the grouping of a similar set of data points. This aspect is used to group the patterns of batteries within the data center and improve the forecasting model instead of predicting thousands of batteries individually. Clustering algorithms, like Dynamic Time Warping (DTW), hierarchical, fuzzy, k-shape, and TADPole all have unique functionality for grouping similar data points, and the features selected by clustering improve the model forecasting accuracy [28–30]. The proposed cluster-assisted forecasting results are compared with actual battery data and without clustered ARIMA forecasting.

The rest of the paper is organized as follows: Section 2 describes the features of the data center and data set used for the study. Section 3 describes data preprocessing and explain the methodology by introducing the algorithms for cluster consistency and clustered ARIMA forecasting. Section 4 shows the steps to implement the proposed clustered forecasting method. Section 5 demonstrates the battery cluster consistency detection results and cluster-assisted ARIMA forecasting, as well as discusses the effectiveness of the method by comparing the results with actual data and without the cluster-assisted forecasting ARIMA model. Section 6 concludes this work.

#### **2. Overview of the Data Set**

In this paper, data is collected from a large-scale social media company located in China. One year of data is used for research with 470,226 data points and a sampling interval time of 1 min. This data set includes the variables of data center's main power, transmission units, battery units, cooling systems, and DC (Direct Current) load values. Data set variables are shown in Table 1.


**Table 1.** Data center's data set with all 470,226 feature instances.

Our objective is to develop a scalable clustering framework to improve the forecasting accuracy of the ARIMA model for battery voltages in data centers. Voltage measurement of individual batteries is a common practice in data centers whereas other parameters like current and charging regimes are also collectively measured from a group of batteries. Voltage is utilized in the simplest of BMS of small vehicles to large scale data centers. Our data has voltage from 40 batteries; and battery aging features are selected from domain knowledge of batteries [8].

#### **3. Methodology**

Figure 2 shows the flowchart of the proposed method and the steps of the proposed method are given as follows:

• Data Preprocessing

**Step 1:** First, separate the battery voltage data from the data set. Extract the historic values of first-month battery voltages and keep updating the real-time voltage values.

• Cluster Consistency

**Step 2:** Carry out clustering analysis on first month data and real time updated data set and proceed to the step 3.

**Step 3:** Match the clustering results of first month and updated month data for cluster consistency. If cluster members are different in first and updated month clusters, then go to the next step.

• Clustered ARIMA Forecasting

**Step 4:** Fit an ARIMA model using the cluster members as external predictors to forecast the battery's voltage status, and if a cluster has only 1 member, then fit an ARIMA model without the external predictor. If the forecasted voltage has a declining trend, then the battery health is dropping comparative to its first-month's cluster members.

**Figure 2.** Proposed method flowchart.

#### *3.1. Data Preprocessing*

Data cleaning is the first step in the data preprocessing step by identifying the missing values and correcting the raw data for analysis. See Section 2 for multiple features of the data set. Battery voltage data is utilized to forecast battery health with the assumption that all the batteries are new and equally healthy. Data centers keep batteries in a safe and controlled environment, and all the batteries would show identical behavior and over fitted prediction models if short intervals are selected considering batteries do not fail in their early months. Our analysis suggests that discharge events occur sometimes once in a few months and sometimes twice a month. In order to analyze the effect of these events in a consistent manner, we used one year of data and divided it by 12 to update the data on each iteration on monthly basis. The first month's data was extracted from

the data set and used as a standard for comparing clustering and voltage status with real-time updated data. See Section 4.1.

#### *3.2. Cluster Consistency*

We now present our proposed cluster based predictor configuration Algorithm 1 for batteries in a data center. The approach to update the clustered predictor for forecasting on monthly basis is presented in this algorithm. For a detailed description of the k-shape-based and DTW clustering algorithm, see Appendixes A.1 and A.2.


Clustering algorithms accept the battery voltage data set, *Vij* , as the first-month historic voltage data set and *LVij* as the latest and updated voltage data set, where (*i*) is the time, and (*j*) is the total number of batteries. *FB* is the set of batteries when clustering is applied in the first month. *LB* is the set of batteries when clustering is applied in the latest month. *DA* is the set of inconsistent batteries' cluster which is a result of a comparison between clustering sets of latest month (*LB*) and first month (*FB*). If *DA* is not equal to ∅, it is an inconsistent or outlined battery cluster. *MB* and *MC* are the first and latest month clusters' mean voltage sets, respectively. These sets also represent cluster voltage status comparative to other clusters. The difference between *MB* and *MC* gives us *DM*. If *DM* is not equal to ∅, cluster voltage status changes.

#### *3.3. Clustered ARIMA Forecasting*

Algorithm 2 is proposed to improve the ARIMA accuracy by utilizing clustering results as external regressors to forecast battery health. ARIMA models are the integration of Auto-regressive (AR) models and Moving Average models. ARIMA models are good for forecasting stationary time-series data [31]. Input sets are either *DA* or *DM*. Extracting a battery element from the set, *vj*, makes a new set *DC*. Extracting another element from *DA* from the remaining elements results in *R*, where *R* is the set of predictors used to forecast the battery element in *DC*. Then, fit an ARIMA model with *R* predictors to forecast *DC*. *AF* is the battery's forecasted voltage values.


#### **4. Software Implementation**

#### *4.1. Cluster Consistency Detection*

Import the time-series data transformed into CSV format in the data preprocessing step for R programming. Dtwclust package is used for time series clustering in R. For clustering batteries, data frame should be converted into a matrix by (as.matrix) function. Visualize the clustering results using Plot function. Repeat this process every month until an inconsistent cluster is detected and then perform clustered ARIMA forecasting (see Section 4.2). An overview of the clustering inconsistency detection procedure is shown in Figure 3.

**Figure 3.** Battery cluster inconsistency and battery degradation forecast method.

#### *4.2. Implementing Clustered ARIMA Forecasting*

The objective of this procedure is to improve the forecasting accuracy of ARIMA model by utilizing cluster members as an external regressor. An overview of the method is shown in Figure 3. Import "Forecast" package in R. Select a battery from the inconsistent cluster to forecast. Perform ACF (Auto Correlation Function), PACF (Partial Auto Correlation Function), and Dickey-Fuller test to check the data stationarity. Use auto.ARIMA function to build the fitting model for the selected battery. Select cluster predictors for "Xreg" function in the fitting model; if the cluster contains only one battery, then "Xreg" function is not required. Use the "forecast" function to forecast the battery voltage. If the declining trend is shown, the cluster is degrading, and if the trend is stable, then the battery will be stable in the future, as well.

#### **5. Result and Discussion**

#### *5.1. Data Center Battery Setup*

Forty VRLA batteries were installed in a room, with 20 batteries in each rack with an average voltage level between 13 and 14 V. Voltage data was collected in the BMS of the data center. There were four discharge cycles and three power surges during one year of battery life in the data center, as shown in Figure 4.

**Figure 4.** One year battery voltages in data center.

#### *5.2. Battery Voltage Time Series Clustering*

Table 2 shows the Silhouette index test values, which were used to select number of clusters when clustering is applied on the batteries (see Figure 5). Figure 6 shows consistent cluster members from the first eight months. Inconsistent cluster is shown in Figure 7 after nine months. Battery 6 is now separated by battery 36 and 39, which was originally in the same cluster from the first month. Implementing DTW clustering and k-shape-based clustering on similar data resulted in different cluster members, which can be seen in Figures 8 and 9.

This change in cluster consistency is an indication of a change in battery voltage behavior. Utilizing this new information as a starting point to predict the battery health from each cluster, an improved accuracy forecasting model is discussed in Section 5.3.


**Table 2.** Silhouette index test for cluster number selection.

**Figure 5.** K-shape-based 1st month clusters.

**Figure 6.** Consistent clusters after eight months.

**Figure 7.** Cluster inconsistency encounter after nine months.

**Figure 8.** Dynamic Time Warping (DTW) clustering 1st month clusters.

**Figure 9.** DTW clustering after nine months.

#### *5.3. ARIMA Forecasting*

The proposed clustered ARIMA approach was evaluated by comparing actual voltage with *CK* predictors (k-shape-based clustered predictors), Single predictors (without clustering),Total predictors (complete data), and *CDTW* predictors (DTW clustered predictors). The metrics used are Root Mean Square Error (RMSE), Mean Average Error (MAE), and Mean Average Percentage Error (MAPE). One battery from each cluster, such as Battery 6, Battery 15, Battery 19, and Battery 36, was selected for demonstration. The cluster inconsistency was detected in the 9th month, thus transforming the data of 9th month for the forecasting model. ACF and PACF for the transformed data are shown in Figure 10. Table 3 shows the augmented Dickey-Fuller test of the selected batteries. Batteries were selected from different clusters, and each battery showed different voltage behavior, which would require a different fitting model for each battery. The forecast package used the (auto.ARIMA) function to automatically select the best-fitted model by comparing with the other models. AIC (Akaike information criterion) and BIC (Bayesian information criterion) are both

penalized-likelihood criteria that were used for fit criteria [32]. Tables 4 and 5 show the AIC and BIC values of the best-fitted model on the batteries for the Total, Single, *CK*, and *CDTW* predictors scenario.

**Table 3.** The augmented Dickey-Fuller.


**Figure 10.** Auto-correlation and partial correlation of the selected battery data.

**Table 4.** Fitted models AIC and BICvalues.


**Table 5.** Fitted models AIC and BIC values with Dynamic Time Warping (DTW) Clustering.


Battery 6 (cluster 2) is a single member in cluster 2, and it has zero external predictor in the cluster at the point of cluster inconsistency detection by k-shape clustering. This makes battery 6 (cluster 2) a special case because *CK* predictor and Single predictor case is equal for battery 6. Prediction results of battery 6 with Single/*CK* predictor have better accuracy than Total predictor. This argument is further verified for Battery 15 (cluster 1) and Battery 36 (cluster 3) with the metrics comparison of the *CK* predictor, Single predictor, and Total predictor in Table 6. Battery 15 (cluster 1), Battery 36 (cluster 2), and Battery 19 (cluster 3) are the chosen batteries from *CDTW* clustering. Table 7 shows the metrics comparison of the *CDTW* predictor, Single predictor, and Total predictor. ARIMA accuracy is improved when implemented with DTW and k-shape-based clustering. Results show that k-shape-based clustered ARIMA model has better accuracy than DTW.


**Table 6.** Auto-Regressive Integrated Moving Average (ARIMA) performance comparison of k-shape-based Clustered predictor (*CK*), Single predictor (S), and Total predictor (T). RMSE = Root Mean Square Error; MAE = Mean Average Error; MAPE = Mean Average Percentage Error.

**Table 7.** ARIMA performance comparison of Dynamic Time Warping (DTW) Clustered predictor (*CDTW*), Single predictor (S), and Total predictor (T) .


Comparison of voltage forecast of Battery 6, Battery 15, Battery 19, and Battery 36 with actual voltage, *CK* predictor, Single predictor, *CDTW* predictor, and Total predictor is shown in Figures 11–14, respectively. Battery 6 is a single member of k-shape-based cluster 2, so it is compared with *CK* predictor, Total predictor, and actual voltage in Figure 11. Battery 19 is the only member of Dynamic Time Warping (DWT) cluster 3, so it is compared with *CDTW* predictor, Total predictor, and actual voltage values in Figure 13. It is evident from Figures 6 and 7 and these figures that the *CK* predictor model is a better fit for the battery voltage data.

**Figure 11.** Comparison of measured and ARIMA forecasted voltage with Clustered (Single) predictor of Battery 6.

**Figure 12.** Comparison of measured and ARIMA forecasted voltage with *CK*, Single, Total, and *CDTW* predictor of Battery 15 from cluster 1.

**Figure 13.** Comparison of measured and ARIMA forecasted voltage with Total predictor of Battery 19.

**Figure 14.** Comparison of measured and ARIMA forecasted voltage with *CK*, Single, Total, and *CDTW* predictor of Battery 36.

#### *5.4. Effectiveness of Clustered ARIMA Approach*

Identifying a battery with a declining voltage is difficult in the data center, as can be seen in Figure 4. Voltage equalization depends on the voltage threshold levels, which is not a better solution for batteries in the data center because it causes false alarms during charge and discharge cycles, and, since the batteries are always on a charging mode, any flaw cannot be observed until it is too late, whereas weak batteries fail when there is a discharge cycle due to power supply failure. As battery 6 failed only in the battery discharging event caused by the power failure, Figure 15 shows that it resumes its voltage status from where it left off when charging recommences. Our proposed clustered ARIMA framework predicts the battery voltage and provides an estimate of battery status in the future with improved accuracy. Similarly, one-year actual resistance values of Battery 6, 15, 19, and 36 verify the predicted results in Figure 16.

**Figure 15.** One-year actual voltage value, voltage drop in Battery 6, as well as stable voltages for Battery 15, 19, and 36, validate the proposed method.

**Figure 16.** One-year actual resistance value, resistance rise in Battery 6, as well as Stable Resistance for Battery 15, 19, and 36, validate the proposed method.

#### **6. Conclusions**

Considering that the prediction model has a significant impact on a forecasting battery's degradation status, in order to improve the ARIMA model forecasting accuracy, a clustered ARIMA forecasting framework was proposed, with the 40 batteries in the data center. Cluster-assisted results can significantly improve the ARIMA forecasting accuracy compared with the Single predictor and Total data predictors. It was observed that the k-shape-based clustering assisted results are more accurate compared to Dynamic Time Warping (DTW) clustering. A few challenges with our data-driven technique implications are the cleaning and preparation of data set, loss of data, and missing values that have to be addressed to apply the proposed method.

**Author Contributions:** S.N.H. designed the algorithm and wrote the manuscript. X.L. helped to correct the paper. Q.Z. supervised and revised the findings of this work. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work received supports from National Key Research and Development Project of China under Grant 2017YFC0704100 and Grant 2016YFB0901900, in part by the National Natural Science Foundation of China under Grant 61425027, Tencent Inc., and in part by the 111 International Collaboration Program of China under Grant BP2018006, and BNRist Program (BNR2019TD01009).

**Conflicts of Interest:** Declare conflicts of interest or state "The authors declare no conflict of interest." Authors must identify and declare any personal circumstances or interest that may be perceived as inappropriately influencing the representation or interpretation of reported research results. Any role of the funders in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript, or in the decision to publish the results must be declared in this section. If there is no role, please state "The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results".

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A**

#### *Appendix A.1*

k-shape clustering is an iterative refinement algorithm to isolate each cluster with keeping the shapes of time-series data. In k-shape, cross-correlation measures are implemented to calculate the centroid of all clusters, and then update the members of each cluster [30], where *CCw*(*x*,*y*) is the cross-correlation sequence between *x* and *y*, and *Ro* is the Rayleigh quotient see Equation (A1).

$$SBD(\vec{x}, \vec{y}) = 1 - \max\_{w} \left( \frac{\mathbb{C}C\_w(\vec{x}, \vec{y})}{\sqrt{R\_o(\vec{x}, \vec{x}) \cdot R\_o(\vec{y}, \vec{y})}} \right) \tag{A1}$$

#### *Appendix A.2*

Several methods have been proposed to cluster time series. All approaches generally modify existing algorithms, either by replacing the default distance measures with a version that is more suitable for comparing time series as shown in Equation (A2). Dynamic Time Warping (DTW) is general and, hence, suitable for almost every domain. A warping path *W* = {*w*1, *w*2, ..., *wk*}, with *k* ≥ *m*, is a contiguous set of matrix elements that defines a mapping between *x* and *y* under several constraints [30].

$$DT\mathcal{W}(\vec{x},\vec{y}) = \min \sqrt{\Sigma\_{i=1}^{k} w\_{i}}.\tag{A2}$$

#### **References**


c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
