*Article* **Transmission Line Fault-Cause Identification Based on Hierarchical Multiview Feature Selection**

**Shengchao Jian <sup>1</sup>, Xiangang Peng <sup>1,\*</sup>, Haoliang Yuan <sup>1,\*</sup>, Chun Sing Lai <sup>1,2,\*</sup> and Loi Lei Lai <sup>1,\*</sup>**


**Abstract:** Fault-cause identification plays a significant role in transmission line maintenance and fault disposal. With the increasing variety of monitoring data, e.g., micrometeorology and geographic information, multiview learning can be used to fuse this information for better fault-cause identification. To reduce the redundant information of different types of monitoring data, this paper proposes a hierarchical multiview feature selection (HMVFS) method to address the challenge of combining waveform and contextual fault features. To enhance the discriminant ability of the model, an ε-dragging technique is introduced to enlarge the boundary between different classes. To effectively select the useful feature subset, two regularization terms, namely the *l*2,1-norm and Frobenius norm penalty, are adopted to conduct hierarchical feature selection for multiview data. Subsequently, an iterative optimization algorithm is developed to solve the proposed method, and its convergence is theoretically proven. Waveform and contextual features are extracted from field data and used to evaluate the proposed HMVFS. The experimental results demonstrate the effectiveness of the combined use of fault features and reveal the superior performance and application potential of HMVFS.

**Keywords:** fault-cause identification; transmission line; sparse learning; multiview learning; feature selection

## **1. Introduction**

Transmission lines cover a wide area and work in diverse outdoor environments to achieve long-distance, high-capacity power transmission. In order to maintain stable power supply, high-speed fault diagnosis is indispensable for line maintenance and fault disposal.

Traditional fault diagnosis technologies concerning fault detecting, fault locating, and phase selection are well developed [1,2], while diagnosis on external causes is still underdeveloped. Operation crews attach great importance to fault location for line patrol and manual inspection. However, on-site inspection is labor-intensive and depends on subjective judgment. Moreover, cause identification after inspection is too late for dispatchers to give better instructions according to the external cause, such as forced energization. Fault-cause identification is expected to help dispatch and maintenance personnel make a proper and speedy fault response.

Transmission line faults are more often triggered by external factors arising from environmental change or surrounding activities. Though the cause categories differ slightly between regions and institutions, the common causes can be listed as lightning, trees, animal contact, fire, icing, pollution and external damage [3]. Considering the complexity and variability of open-air operation, it is hard to model fault scenarios for the diverse root causes [4,5]. Thus, existing studies on line fault-cause identification have been developed based on data-driven methods rather than physical modeling.

**Citation:** Jian, S.; Peng, X.; Yuan, H.; Lai, C.S.; Lai, L.L. Transmission Line Fault-Cause Identification Based on Hierarchical Multiview Feature Selection. *Appl. Sci.* **2021**, *11*, 7804. https://doi.org/10.3390/app11177804

Academic Editor: Jordi Cusido

Received: 1 August 2021 Accepted: 21 August 2021 Published: 25 August 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The early identification methods were rule-based, such as statistical analysis, CN2 rule induction [6] and fuzzy inference systems (FIS) [7–9]. Their identification frameworks are ultimately presented in the form of logic flows, which demands a high degree of robustness and generality from their rules and thresholds. In recent years, various machine learning (ML) techniques that rely on hand-crafted features have been applied to diagnose external causes [10–14], such as logistic regression (LR), artificial neural networks (ANN), k-nearest neighbors (KNN) and support vector machines (SVM). Deep learning (DL) provides a more efficient way in the field of fault identification. In [15], a deep belief network (DBN) is used as the classification algorithm after extracting time–frequency characteristics from traveling wave data. Even with DL methods, feature engineering remains indispensable for achieving high accuracy.

Feature signature study provides knowledge about fault information and plays a critical role in fault-cause identification. On the one hand, when fault events happen, power quality monitors (PQMs) give easy access to electrical signals and time stamps [16]. Time-domain features extracted from the fault waveform and time stamp were used to construct a logic flow to classify lightning-, animal- and tree-induced faults [6]. To exploit transient characteristics in the frequency domain, signal processing techniques such as the wavelet transform (WT) and empirical mode decomposition (EMD) are used for further waveform characteristic analysis [17–20]. In [21], a fault waveform was characterized in the time and frequency domains to develop an identification logic. However, a fault waveform is easily affected by the system operation state, and there is no direct connection between these characteristics and external causes. On the other hand, weather conditions are directly relevant to many fault-cause categories such as lightning, icing and wind. With the development of monitoring equipment and communication technology, dispatchers can now make judgments with more and more outdoor information [22]. Nonwaveform characteristics such as time stamps, environment attributes and other textual data are called contextual characteristics in this paper. Table 1 lists and compares the characterization and classification methods in existing works.


Articles marked with \* concern faults on distribution networks, but their work is still instructive for transmission networks.

Studies have shown that waveform and contextual features can each achieve high accuracy on their own, but only under demanding data requirements. For economic and operational reasons, data conditions will not change significantly in the short term, so it is necessary to improve fault-cause identification performance under current data conditions. One of the challenges is determining how to combine waveform features and multisource contextual features. This is an information fusion problem, and the simplest approach is feature concatenation. The authors of [23] tried to combine contextual features and waveform features as a mixed vector, but the concatenated features reduced performance. Moreover, in contrast to the many works focusing on either side, only a few studies use both waveform and contextual characteristics to pursue higher classification performance.

To tackle the fusion challenge, multiview learning (MVL) is introduced in this paper because waveform and contextual features describe the same fault event from different views. MVL aims to integrate multiple-view data properly and overcome the biases between views to obtain satisfactory performance. One typical MVL method is canonical correlation analysis (CCA), which maps multiview features into a common feature space [24]. Instead of mapping features, multiview feature selection, which selects features from each view, is preferred in fault-cause identification. Unlike traditional feature selection, multiview feature selection treats multiview data as inherently related and ensures that complementary information from other views is exploited [25,26]. In [27], a review on real-time power system data analytics with the wavelet transform is given. The discrete wavelet transform was used to identify high impedance faults and heavy load conditions [28]. The authors of [29] propose a fault diagnosis approach for the main drive chain in a wind turbine based on data fusion. To deal with this kind of multivariable fault diagnosis problem, for which input variables need to be adjusted for different typical faults, a deep autoencoder model is adopted to train the fault diagnosis model for different typical fault types.

In this paper, we propose a hierarchical multiview feature selection (HMVFS) method for transmission line fault-cause identification. Two view datasets are composed of the waveform features and the contextual features. Our proposed HMVFS is applied to conduct feature selection for the optimal feature combination. In our model, to enhance the discriminant ability of the regression, an ε-dragging technique is used to enlarge the margin between classes. Next, two regularization terms, namely the *l*2,1-norm and Frobenius norm (*F*-norm) penalty, are adopted to perform hierarchical feature selection. Here, the *l*2,1-norm realizes row sparsity to reduce the unimportant features of each view, and the *F*-norm realizes view-level sparsity to reduce the diversity between the two-view data. Hence, these two penalties can be viewed as low-level and high-level feature selection, respectively. Finally, fault-cause identification is carried out using ML classifiers and the integrated features. The contributions of this paper are highlighted as follows:


The rest of this paper is organized as follows: Section 2 presents the proposed HMVFS algorithm and its convergence analysis. Section 3 outlines the real-life line fault dataset and extracts features in terms of waveform and nonwaveform. The empirical study is provided and discussed in Section 4. Section 5 presents concluding remarks.

#### **2. Hierarchical Multiview Feature Selection (HMVFS)**

#### *2.1. Notation*

Sparsity-based multiview feature selection can be formulated as an optimization problem denoted by loss functions and regularization terms. Before introducing our formulation, the notation is stated.

Matrices are denoted by boldface uppercase letters, and vectors by boldface lowercase letters. Given the original feature matrix **X** = [**x**<sub>1</sub>, **x**<sub>2</sub>, . . . , **x**<sub>*n*</sub>]<sup>T</sup> ∈ R<sup>*n*×*d*</sup>, each row of which corresponds to a fault instance, *n* is the total number of instances and *d* is the number of features. **X**<sup>(*v*)</sup> ∈ R<sup>*n*×*d*<sup>(*v*)</sup></sup> and **x**<sub>*i*</sub><sup>(*v*)</sup> ∈ R<sup>*d*<sup>(*v*)</sup></sup> denote the feature matrix and a feature vector in the *v*th view. There are two views in this paper; thus, **X** = [**X**<sup>(1)</sup>, **X**<sup>(2)</sup>]. Supposing there are *c* categories, the label matrix is represented as **Y** = [**y**<sub>1</sub>, **y**<sub>2</sub>, . . . , **y**<sub>*n*</sub>]<sup>T</sup> ∈ {0, 1}<sup>*n*×*c*</sup>. The weight matrix **W** can be written as **W** = [**W**<sup>(1)</sup>; **W**<sup>(2)</sup>] = [**w**<sub>1</sub>, **w**<sub>2</sub>, . . . , **w**<sub>*d*</sub>]<sup>T</sup> ∈ R<sup>*d*×*c*</sup>.

#### *2.2. The Objective Function*

Given the notation defined and a fault dataset (**X**, **Y**), the problem of HMVFS is transformed into determining weight matrix **W** and then ranking features for selection. We formulate the optimization problem as

$$\min\_{\mathbf{W},\mathbf{M}} \Psi(\mathbf{W},\mathbf{M}) + \alpha\Phi(\mathbf{W}) + \beta\Omega(\mathbf{W}) = \min\_{\mathbf{W},\mathbf{M}} \|\mathbf{X}\mathbf{W} - \mathbf{Y} - \mathbf{B} \otimes \mathbf{M}\|\_F^2 + \alpha\|\mathbf{W}\|\_{2,1} + \beta\sum\_{v=1}^{m}\|\mathbf{W}^{(v)}\|\_F \tag{1}$$

where *m* is the view number; *m* = 2 in this paper.

In this formulation, **Ψ**(**W**, **M**) is the loss function that measures the regression error, derived from the least square loss. Furthermore, ε-dragging is introduced to drag the binary outputs in **Y** away along two opposite directions: the outputs for positive entries become 1 + ε<sub>*i*</sub> and those for negative entries become −ε<sub>*i*</sub>, where all εs are nonnegative. This treatment enlarges the distance between data points from different classes and helps to develop a compact optimization model for classification [30]. **B** ∈ {−1, 1}<sup>*n*×*c*</sup> in the formulation is a constant matrix whose element *B<sub>ij</sub>* is defined as

$$B\_{ij} = \begin{cases} +1, & \text{Y}\_{ij} = 1 \\ -1, & \text{Y}\_{ij} = 0. \end{cases} \tag{2}$$

*B<sub>ij</sub>* denotes the dragging direction for each element of the label matrix **Y**. **M** ∈ R<sup>*n*×*c*</sup> is a nonnegative matrix that records all εs. The operator ⊗ is the Hadamard (element-wise) product of matrices. Thus, **B** ⊗ **M** represents the dragging distance, and the new label matrix after ε-dragging is

$$\tilde{\mathbf{Y}} = \mathbf{Y} + \mathbf{B} \otimes \mathbf{M}.\tag{3}$$

With the least square loss function defined as

$$\Psi(\mathbf{W}) = \|\mathbf{X}\mathbf{W} - \tilde{\mathbf{Y}}\|\_F^2, \tag{4}$$

substituting (3) into (4) yields our loss function **Ψ**(**W**, **B**, **M**):

$$\Psi(\mathbf{W}, \mathbf{B}, \mathbf{M}) = \|\mathbf{XW} - \mathbf{Y} - \mathbf{B} \otimes \mathbf{M}\|\_F^2. \tag{5}$$

Next, the regularization terms used in the formulation are the *l*<sub>2,1</sub>-norm and *F*-norm, which take row-wise and view-wise feature selection into account, respectively.

$$\Phi(\mathbf{W}) = \|\mathbf{W}\|\_{2,1} = \sum\_{i=1}^{d} \sqrt{\sum\_{j=1}^{c} w\_{ij}^{2}}.\tag{6}$$

$$\Omega(\mathbf{W}) = \sum\_{v=1}^{m} \|\mathbf{W}^{(v)}\|\_F = \sum\_{v=1}^{m} \sqrt{\sum\_{i=1}^{d^{(v)}} \sum\_{j=1}^{c} \left(w\_{ij}^{(v)}\right)^2}. \tag{7}$$

*l*2,1-norm measures the distance of features as a whole and forces the weights of unimportant features to be assigned small values so that it can perform feature selection among all features. Similarly, *F*-norm measuring the distance between views forces the weights of unimportant views to be assigned small values [31]. The weight matrix **W** is regulated by these penalty terms, and hierarchical feature selection is completed with row-wise and view-wise selection. *l*2,1-norm penalty corresponds to the low-level feature selection, and *F*-norm penalty corresponds to the high-level feature selection.

Therefore, the objective function of the HMVFS model is obtained and represented as (1). *α* and *β* are nonnegative constants that tune the hierarchical feature selection. The model also applies to more than two views.
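As an illustration, the two hierarchical penalty terms (6) and (7) can be computed with a few lines of numpy; this is a minimal sketch, with helper names of our own choosing:

```python
import numpy as np

def l21_norm(W):
    # Row-wise l2,1-norm: sum of the l2-norms of the rows of W (Eq. (6)).
    return np.sum(np.linalg.norm(W, axis=1))

def view_f_norm(W, view_sizes):
    # View-wise penalty: sum of the Frobenius norms of per-view blocks (Eq. (7)).
    total, start = 0.0, 0
    for d_v in view_sizes:
        total += np.linalg.norm(W[start:start + d_v], 'fro')
        start += d_v
    return total

W = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
print(l21_norm(W))             # 1 + 2 + 5 = 8.0
print(view_f_norm(W, [2, 1]))  # sqrt(5) + 5
```

The row-wise penalty drives individual rows of **W** toward zero, while the view-wise penalty can shrink a whole view block at once.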

#### *2.3. Optimization*

In order to solve the *l*<sub>2,1</sub>-norm and *F*-norm minimization problems, the regularization terms ‖**W**‖<sub>2,1</sub> and ∑<sup>*m*</sup><sub>*v*=1</sub>‖**W**<sup>(*v*)</sup>‖<sub>*F*</sub> are relaxed by *Tr*(**W**<sup>T</sup>**CW**) and *Tr*(**W**<sup>T</sup>**DW**), respectively [32]. The objective function is rewritten as

$$\begin{aligned} &\min\_{\mathbf{W},\mathbf{M},\mathbf{C},\mathbf{D}} \|\mathbf{X}\mathbf{W} - \mathbf{Y} - \mathbf{B} \otimes \mathbf{M}\|\_F^2 + \alpha\, Tr(\mathbf{W}^{\mathbf{T}}\mathbf{C}\mathbf{W}) + \beta\, Tr(\mathbf{W}^{\mathbf{T}}\mathbf{D}\mathbf{W}), \\ &\text{s.t.}\quad C\_{ii} = \frac{1}{2\|\mathbf{w}\_i\|\_2}, \quad D\_{ii} = \frac{1}{2\|\mathbf{W}^{(v)}\|\_F}, \end{aligned} \tag{8}$$

where **C** ∈ R<sup>*d*×*d*</sup> and **D** ∈ R<sup>*d*×*d*</sup> are diagonal matrices derived from **W**. For *D<sub>ii</sub>*, **W**<sup>(*v*)</sup> is the view block that contains the *i*th row of **W**.

Though two more variables are introduced, we obtain a convex function, and we can solve the optimization problem iteratively. In each iteration, we update one variable while others are fixed, and all variables can be optimized in order. In view of **C** and **D** derived from **W**, we fix **M** and update **W** at first. The derivative of (8) w.r.t. **W** is calculated as

$$2\mathbf{X}^{\mathsf{T}}(\mathbf{XW} - \mathbf{Y} - \mathbf{B} \otimes \mathbf{M}) + 2\alpha \mathbf{C}\mathbf{W} + 2\beta \mathbf{D}\mathbf{W}.\tag{9}$$

Setting (9) to zero, the updated **W** can be obtained by solving the resulting equation. For large or high-dimensional data, the gradient descent method is recommended instead. Following that, **C** and **D** can be updated.

Turning to **M**, the optimization problem can be transformed from (8) to (10).

$$\begin{array}{ll}\min\_{\mathbf{M}} \|\mathbf{Z} - \mathbf{B} \otimes \mathbf{M}\|\_F^2,\\ \text{s.t.}\; \mathbf{Z} = \mathbf{X}\mathbf{W} - \mathbf{Y}.\end{array} \tag{10}$$

According to the definition of *F*-norm, this problem can be decoupled into *n* × *c* subproblems [30] and represented as

$$\min\_{M\_{ij}} \left(Z\_{ij} - B\_{ij} M\_{ij}\right)^2. \tag{11}$$

With *B<sub>ij</sub>*<sup>2</sup> = 1, (11) is equivalent to (12).

$$\min\_{M\_{ij}} \left(Z\_{ij} B\_{ij} - M\_{ij}\right)^2. \tag{12}$$

With the nonnegative constraint, *Mij* is calculated as

$$M\_{ij} = \max(Z\_{ij} B\_{ij}, 0). \tag{13}$$

Accordingly, **M** can be updated as

$$\mathbf{M} = \max(\mathbf{Z} \otimes \mathbf{B}, 0). \tag{14}$$

Up to now, all variables are updated in the iteration and we present the optimization process in Algorithm 1.

After optimization, we obtain the weight matrix **W** learned across all views and then sort all features according to their importance. The importance is measured by the *l*<sub>2</sub>-norm of each row vector of **W**, ‖**w**<sub>*i*</sub>‖<sub>2</sub> (*i* = 1, 2, . . . , *d*). Feature selection can be completed with the features ranked in descending order.

#### *2.4. Convergence*

In this subsection, we analyze the convergence of Algorithm 1. We need to guarantee that the objective function decreases in each iteration of the optimization algorithm. The following lemma is used to verify its convergence.

**Lemma 1.** *For any positive values a*, *b* ∈ R, *the following inequality holds:*

$$2ab \le a^2 + b^2 \;\Rightarrow\; a - \frac{a^2}{2b} \le b - \frac{b^2}{2b}.\tag{15}$$
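Lemma 1 can be verified in one line. Since the right-hand side reduces to *b*/2, multiplying both sides by 2*b* > 0 gives

```latex
a - \frac{a^2}{2b} \le \frac{b}{2}
\;\Longleftrightarrow\; 2ab - a^2 \le b^2
\;\Longleftrightarrow\; 0 \le (a - b)^2,
```

which always holds, with equality if and only if *a* = *b*.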

**Theorem 1.** *The objective Function (1) monotonically decreases in the iteration of Algorithm 1.*

**Proof.** According to Step 6 and Step 7 in Algorithm 1, we obtain **W**<sub>*t*+1</sub> and **M**<sub>*t*+1</sub> as follows:

$$\mathbf{W}\_{t+1} \leftarrow \arg\min\_{\mathbf{W}} \|\mathbf{X}\mathbf{W} - \mathbf{Y} - \mathbf{B} \otimes \mathbf{M}\_t\|\_F^2 + \alpha\, Tr(\mathbf{W}^{\mathbf{T}}\mathbf{C}\_t\mathbf{W}) + \beta\, Tr(\mathbf{W}^{\mathbf{T}}\mathbf{D}\_t\mathbf{W}),\tag{16}$$

$$\mathbf{M}\_{t+1} \leftarrow \arg\min\_{\mathbf{M}} \|\mathbf{X}\mathbf{W}\_{t+1} - \mathbf{Y} - \mathbf{B} \otimes \mathbf{M}\|\_F^2. \tag{17}$$

First, according to (16) and (17), we have

$$\begin{aligned} &\|\mathbf{X}\mathbf{W}\_{t+1} - \mathbf{Y} - \mathbf{B} \otimes \mathbf{M}\_{t+1}\|\_F^2 + \alpha\, Tr(\mathbf{W}\_{t+1}^{\mathbf{T}}\mathbf{C}\_t\mathbf{W}\_{t+1}) + \beta\, Tr(\mathbf{W}\_{t+1}^{\mathbf{T}}\mathbf{D}\_t\mathbf{W}\_{t+1}) \\ &\le \|\mathbf{X}\mathbf{W}\_{t+1} - \mathbf{Y} - \mathbf{B} \otimes \mathbf{M}\_t\|\_F^2 + \alpha\, Tr(\mathbf{W}\_{t+1}^{\mathbf{T}}\mathbf{C}\_t\mathbf{W}\_{t+1}) + \beta\, Tr(\mathbf{W}\_{t+1}^{\mathbf{T}}\mathbf{D}\_t\mathbf{W}\_{t+1}) \\ &\le \|\mathbf{X}\mathbf{W}\_t - \mathbf{Y} - \mathbf{B} \otimes \mathbf{M}\_t\|\_F^2 + \alpha\, Tr(\mathbf{W}\_t^{\mathbf{T}}\mathbf{C}\_t\mathbf{W}\_t) + \beta\, Tr(\mathbf{W}\_t^{\mathbf{T}}\mathbf{D}\_t\mathbf{W}\_t). \end{aligned} \tag{18}$$

Thus, according to the definition of **C**, we have

$$\begin{aligned} \alpha\, Tr(\mathbf{W}\_{t+1}^{\mathbf{T}}\mathbf{C}\_t\mathbf{W}\_{t+1}) &= \alpha\sum\_{i=1}^{d} \frac{\|(\mathbf{w}\_i)\_{t+1}\|\_2^2}{2\|(\mathbf{w}\_i)\_t\|\_2} \\ &= \alpha\sum\_{i=1}^{d} \|(\mathbf{w}\_i)\_{t+1}\|\_2 - \alpha\left(\sum\_{i=1}^{d} \|(\mathbf{w}\_i)\_{t+1}\|\_2 - \sum\_{i=1}^{d} \frac{\|(\mathbf{w}\_i)\_{t+1}\|\_2^2}{2\|(\mathbf{w}\_i)\_t\|\_2}\right). \end{aligned} \tag{19}$$

We also perform the same transformation on *Tr*(**W**<sup>T</sup><sub>*t*+1</sub>**D**<sub>*t*</sub>**W**<sub>*t*+1</sub>), *Tr*(**W**<sup>T</sup><sub>*t*</sub>**C**<sub>*t*</sub>**W**<sub>*t*</sub>) and *Tr*(**W**<sup>T</sup><sub>*t*</sub>**D**<sub>*t*</sub>**W**<sub>*t*</sub>). We can then rewrite (18) as

$$\begin{aligned} &\Psi(\mathbf{W}\_{t+1},\mathbf{M}\_{t+1}) + \alpha\Phi(\mathbf{W}\_{t+1}) + \beta\Omega(\mathbf{W}\_{t+1}) - \alpha\left(\sum\_{i=1}^{d}\|(\mathbf{w}\_i)\_{t+1}\|\_2 - \sum\_{i=1}^{d}\frac{\|(\mathbf{w}\_i)\_{t+1}\|\_2^2}{2\|(\mathbf{w}\_i)\_t\|\_2}\right) - \beta\left(\sum\_{v=1}^{m}\|(\mathbf{W}^{(v)})\_{t+1}\|\_F - \sum\_{v=1}^{m}\frac{\|(\mathbf{W}^{(v)})\_{t+1}\|\_F^2}{2\|(\mathbf{W}^{(v)})\_t\|\_F}\right) \\ &\le \Psi(\mathbf{W}\_t,\mathbf{M}\_t) + \alpha\Phi(\mathbf{W}\_t) + \beta\Omega(\mathbf{W}\_t) - \alpha\left(\sum\_{i=1}^{d}\|(\mathbf{w}\_i)\_t\|\_2 - \sum\_{i=1}^{d}\frac{\|(\mathbf{w}\_i)\_t\|\_2^2}{2\|(\mathbf{w}\_i)\_t\|\_2}\right) - \beta\left(\sum\_{v=1}^{m}\|(\mathbf{W}^{(v)})\_t\|\_F - \sum\_{v=1}^{m}\frac{\|(\mathbf{W}^{(v)})\_t\|\_F^2}{2\|(\mathbf{W}^{(v)})\_t\|\_F}\right). \end{aligned} \tag{20}$$

According to Lemma 1, we arrive at

$$
\Psi(\mathbf{W}\_{t+1}, \mathbf{M}\_{t+1}) + \alpha\Phi(\mathbf{W}\_{t+1}) + \beta\Omega(\mathbf{W}\_{t+1}) \le \Psi(\mathbf{W}\_t, \mathbf{M}\_t) + \alpha\Phi(\mathbf{W}\_t) + \beta\Omega(\mathbf{W}\_t).\tag{21}
$$

Thus, Algorithm 1 decreases the objective value of (1) in each iteration, so (1) converges to its global optimum according to its convexity.

```
Algorithm 1 The optimization algorithm for (8)
Input:  the feature matrix across all views, X ∈ R^(n×d); the label matrix,
        Y ∈ {0, 1}^(n×c); the parameters α and β
Output: the weight matrix across all views, W ∈ R^(d×c)
 1: Calculate B from Y via (2)
 2: Initialize W_0 and M_0
 3: Initialize t = 0
 4: Repeat
 5:   Calculate C_t and D_t from W_t
 6:   W_(t+1) = (X^T X + α C_t + β D_t)^(−1) (X^T Y + X^T (B ⊗ M_t))
 7:   M_(t+1) = max((X W_(t+1) − Y) ⊗ B, 0)
 8:   t = t + 1
 9:   Calculate the residue via (1)
10: Until convergence or the maximum iteration number is reached
```
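The iteration can be sketched in numpy as follows. This is a minimal, illustrative implementation: the function names, default parameters, and the small `eps` guarding the divisions in **C** and **D** are our assumptions, not part of the original algorithm.

```python
import numpy as np

def sum_view_fro(W, view_sizes):
    # Sum of per-view Frobenius norms (the F-norm penalty in Eq. (7)).
    total, start = 0.0, 0
    for d_v in view_sizes:
        total += np.linalg.norm(W[start:start + d_v], 'fro')
        start += d_v
    return total

def hmvfs(X, Y, view_sizes, alpha=1.0, beta=1.0, max_iter=50, tol=1e-6):
    """Sketch of Algorithm 1: alternating closed-form updates of W and M."""
    n, d = X.shape
    c = Y.shape[1]
    B = np.where(Y == 1, 1.0, -1.0)        # dragging directions, Eq. (2)
    W, M = np.zeros((d, c)), np.zeros((n, c))
    eps, prev = 1e-8, np.inf               # eps avoids division by zero
    for _ in range(max_iter):
        # Step 5: diagonal C (row-wise) and D (view-wise) from the current W
        C = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
        D_diag, start = np.empty(d), 0
        for d_v in view_sizes:
            f = np.linalg.norm(W[start:start + d_v], 'fro')
            D_diag[start:start + d_v] = 1.0 / (2.0 * f + eps)
            start += d_v
        D = np.diag(D_diag)
        # Step 6: closed-form W update from setting (9) to zero
        W = np.linalg.solve(X.T @ X + alpha * C + beta * D,
                            X.T @ (Y + B * M))
        # Step 7: closed-form M update, Eq. (14)
        M = np.maximum((X @ W - Y) * B, 0.0)
        # Step 9: residue of the objective (1)
        obj = (np.linalg.norm(X @ W - Y - B * M, 'fro') ** 2
               + alpha * np.sum(np.linalg.norm(W, axis=1))
               + beta * sum_view_fro(W, view_sizes))
        if abs(prev - obj) < tol:
            break
        prev = obj
    return W

# Features are then ranked by the row norms of W, in descending order:
# order = np.argsort(-np.linalg.norm(W, axis=1))
```

The matrix inverse in Step 6 is replaced by `np.linalg.solve`, which is the standard numerically safer choice for solving the linear system.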

#### **3. Material and Characterization**

#### *3.1. Data Collection and Cleaning*

In this study, the fault data were collected from an AC transmission network located in a populous coastal city in Guangdong Province, China. These faults occurred between 2016 and 2019, and the voltage levels varied from 110 to 500 kV. Fault signals were recorded by digital fault recorders (DFRs) installed at substations. The DFR equipment involves PMUs and computer systems to synchronize, store and display analog data for voltage and current signals. These signals can be remotely accessed through a communication network and provide offline data stored in the common format for transient data exchange (COMTRADE). The sampling rate of the dataset is 5 kHz. Environmental information and other associated monitoring data were obtained through the inner maintenance system. A patrol report of manual inspection was attached to each fault, describing the inspection result and labeling its cause. The original dataset comprised 551 samples, of which 288 remained after cleansing. The distribution of fault-cause categories is shown in Figure 1. Lightning, external force and object contact are the three dominant causes. External force refers to collision or damage due to human activity. Object contact is usually caused by floating objects in the air. These are typical causes in a densely populated city, accounting for more than 90% of known faults.

**Figure 1.** Distribution of transmission line fault cause after cleansing.

#### *3.2. Waveform Characteristics*

It is believed that the disturbance of electrical quantities after fault occurrence contains important transient information for fault diagnosis [33]. The original waveform data are recorded in COMTRADE files at a sampling frequency of 5 kHz. The first step is to acquire fault segments and extract valid waveform segments free from the disturbance caused by tripping. In this paper, the beginning of a valid segment is determined by inspection thresholds based on the root mean squared (rms) current magnitude, where *dI* is the difference between consecutive rms values:

$$dI \ge 0.15 \text{ pu} \text{ or } I \ge 1.2 \text{ pu}.\tag{22}$$
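A minimal sketch of this start-detection rule applied to a per-unit rms current trace (the function name and the synthetic trace are ours; the thresholds are those of (22)):

```python
import numpy as np

def fault_start_index(i_rms_pu, dI_thr=0.15, I_thr=1.2):
    """Return the first sample index where Eq. (22) fires on the rms
    current (in p.u.), or None if no fault segment is detected."""
    dI = np.abs(np.diff(i_rms_pu))
    for k in range(1, len(i_rms_pu)):
        if dI[k - 1] >= dI_thr or i_rms_pu[k] >= I_thr:
            return k
    return None

rms = np.array([1.0, 1.0, 1.01, 1.5, 2.4])  # synthetic rms trace, p.u.
print(fault_start_index(rms))                # 3 (jump of 0.49 p.u.)
```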

The start thresholds are determined by inspection to make sure that the fault measurements in this study are correctly captured. Since COMTRADE stores not only electrical signals in analog channels but also tripping information in digital channels, one and a half cycles after the tripping enabling signal is regarded as the end of the segment. In characterization, we extend previous research work on waveform characterization. The following waveform features are considered and extracted.

1. Maximum Change of Sequence Components: Instantaneous magnitude is calculated relative to the prefault amplitude in order to be compatible with measurements from different voltage levels and operation conditions. The Karrenbauer transformation is used to obtain the zero, positive and negative components of the three-phase signals, denoted by *s*, *s* = 0, 1, 2.

$$V\_{s(\max)} = \frac{\max\left(V\_{s(fault)}\right)}{V\_{s(pre-fault)}}, s = 0, 2.\tag{23}$$

$$I\_{\rm s(max)} = \frac{\max\left(I\_{\rm s(fault)}\right)}{I\_{\rm s(pre-fault)}}, s = 0, 1, 2. \tag{24}$$
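The modal decomposition and the max-change features (23) and (24) can be sketched as follows. Note that several variants of the Karrenbauer matrix appear in the literature, so the particular matrix below is an assumption rather than necessarily the one used by the authors:

```python
import numpy as np

# One common form of the Karrenbauer modal matrix (an assumption here):
# ground mode (a+b+c)/3 and two aerial modes (a-b)/3, (a-c)/3.
K = np.array([[1.0,  1.0,  1.0],
              [1.0, -1.0,  0.0],
              [1.0,  0.0, -1.0]]) / 3.0

def modal_components(x_abc):
    """Map three-phase samples of shape (3, N) to modal components (3, N)."""
    return K @ x_abc

def max_change(mode_fault, mode_prefault_amp):
    # Eq. (23)/(24): peak fault-segment magnitude over prefault amplitude.
    return np.max(np.abs(mode_fault)) / mode_prefault_amp
```

A balanced signal (equal samples on all three phases) maps entirely into the ground mode, which is the sanity check usually applied to such a transform.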

2. Maximum Rate of Change of Sequence Components:

$$
\Delta V\_{\rm s(max)} = \frac{\max(|\Delta V\_{\rm s}|)}{V\_{\rm s(pre-fault)}}, s = 0, 1, 2. \tag{25}
$$

$$
\Delta \mathcal{I}\_{\rm s(max)} = \frac{\max(|\Delta I\_{\rm s}|)}{I\_{\rm s(pre-fault)}}, s = 0, 1, 2. \tag{26}
$$

3. Sequence Component Values at *t*-cycle: *t* is set to be 0, 0.5, 1 and 1.5. For instance, *t* = 0.5 means the measuring point is 1/2 cycle from the start.

$$V\_{s(t)} = \frac{V\_{s(t)}}{V\_{s(pre-fault)}}, s = 0, 1, 2. \tag{27}$$

$$I\_{s(t)} = \frac{I\_{s(t)}}{I\_{s(pre-fault)}}, s = 0, 1, 2. \tag{28}$$


$$p\_j = \frac{E\_j}{\sum\_j E\_j} = \frac{\sum \left|C\_j\right|^2}{\sum\_j \sum \left|C\_j\right|^2}, \quad S\_j = -p\_j \log\_2(p\_j),\tag{29}$$

where *C<sub>j</sub>*, *E<sub>j</sub>* and *p<sub>j</sub>* denote the wavelet coefficients, wavelet energy and relative energy at scale *j*, *j* = 1, 2, 3.
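Given per-scale wavelet coefficients (how they are obtained, e.g., by a 3-level decomposition, is outside this sketch), the relative energies and entropies in (29) reduce to a few numpy lines:

```python
import numpy as np

def wavelet_entropy(coeffs):
    """Relative wavelet energies p_j and entropies S_j (Eq. (29)),
    given a list of per-scale coefficient arrays."""
    E = np.array([np.sum(np.abs(c) ** 2) for c in coeffs])  # E_j per scale
    p = E / E.sum()                                         # relative energy
    S = -p * np.log2(p)                                     # per-scale entropy
    return p, S
```

For two scales with equal energy, each p_j is 0.5 and each S_j is 0.5, i.e., the entropy terms are maximal when the energy is spread evenly.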

7. Maximum DC Current: Equation (30) is used to calculate the maximum DC current on three-phase signals. *N<sup>s</sup>* is the number of data points in one cycle, and *n* = 0 means the triggering point.

$$I\_{dc(\max)} = \max(I\_{dc,a}, I\_{dc,b}, I\_{dc,c}), \quad I\_{dc} = \frac{\sum\_{n=1}^{N\_s} i\_n - \sum\_{n=-N\_s}^{-1} i\_n}{\max(I\_{pre-fault})}.\tag{30}$$

8. Time Domain Factors: Form factor, crest factor, skewness and kurtosis, denoted as *t*1–*t*4, respectively, are introduced to reflect characteristics of waveform shape and the shock for fault-phase current signals. *SD* denotes their standard deviation.

$$t\_1 = \frac{\sqrt{\frac{1}{N\_s}\sum\_{n=1}^{N\_s} i\_n^2}}{\frac{1}{N\_s}\sum\_{n=1}^{N\_s} |i\_n|}, \quad t\_2 = \frac{\max\left(i\_n\right)}{\frac{1}{N\_s}\sum\_{n=1}^{N\_s} |i\_n|}, \quad t\_3 = \frac{\sum\_{n=1}^{N\_s}\left(i\_n - \bar{i}\right)^3}{SD^3 N\_s}, \quad t\_4 = \frac{\sum\_{n=1}^{N\_s}\left(i\_n - \bar{i}\right)^4}{SD^4 N\_s} \tag{31}$$
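The four factors in (31) can be computed directly; a sketch for one cycle of fault-phase current samples (the function name is ours):

```python
import numpy as np

def time_domain_factors(i):
    """Form factor, crest factor, skewness and kurtosis (Eq. (31))
    for one cycle of current samples i (a 1-D numpy array)."""
    n = len(i)
    mean_abs = np.mean(np.abs(i))
    rms = np.sqrt(np.mean(i ** 2))
    sd = np.std(i)
    t1 = rms / mean_abs                                   # form factor
    t2 = np.max(i) / mean_abs                             # crest factor
    t3 = np.sum((i - i.mean()) ** 3) / (sd ** 3 * n)      # skewness
    t4 = np.sum((i - i.mean()) ** 4) / (sd ** 4 * n)      # kurtosis
    return t1, t2, t3, t4
```

For a pure sine cycle these return the textbook values: form factor π/(2√2) ≈ 1.11, crest factor (as defined in (31), peak over mean absolute value) π/2, zero skewness and kurtosis 1.5, which is a convenient sanity check.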

9. Approximation Constant *δ* for the Neutral Waveform: In order to learn more from the wave front, the waveform of the rms neutral voltage/current is approximated by (32), as introduced in [33].

$$f(t, \delta) = 1 - e^{-\delta t},\tag{32}$$

where *t* is the time step and *δ* is the approximation constant; *δ* is chosen so that (32) best fits the actual waveform in per-unit values.
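One simple way to estimate δ is a grid search over candidate values that minimizes the squared error to the per-unit rms waveform; the fitting procedure and search range below are our assumptions, since the paper does not specify how δ is estimated:

```python
import numpy as np

def fit_delta(t, y, deltas=np.linspace(0.01, 50.0, 5000)):
    """Pick the approximation constant delta in Eq. (32), f(t) = 1 - exp(-delta*t),
    that minimises the squared error to the per-unit rms waveform y."""
    errs = [np.sum((1.0 - np.exp(-d * t) - y) ** 2) for d in deltas]
    return deltas[int(np.argmin(errs))]

# Recovering a known constant from a synthetic per-unit waveform:
t = np.linspace(0.0, 1.0, 50)
y = 1.0 - np.exp(-5.0 * t)
print(fit_delta(t, y))  # close to 5.0 (within the grid resolution)
```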

10. Fault Inception Phase Angle (FIPA): FIPA is calculated from the trigger time relative to the last zero crossing point prior to the fault.

All waveform features are listed in Table 2. Faulted phase features are included in the next subsection.


**Table 2.** Feature pools.

#### *3.3. Contextual Characteristics*

Most monitoring technologies are developed for specific causes and work independently, without interconnected data. In this study, due to data restrictions, the available nonwaveform data include time stamps, meteorological data, geographical data, protection data and query information. These informative values are preprocessed and integrated into the pool of candidate contextual features, as shown in Table 2. Considering that there is no accurate discretization standard, we only discretize text data roughly when necessary. The time stamp information is discretized twice, based on season and day/night, as a contrast to months and daytime. As for dynamic records such as meteorological values, the records closest to the fault time are retained. Protection data are the feedback information of protection devices after a fault, usually obtained from the production management system. Although these collected data are all related to fault events, not all of them are suitable for fault-cause identification. Such irrelevant features pose a great challenge for feature selection.

#### **4. Experiments and Discussion**

#### *4.1. Experiment Setup*

To validate the effectiveness and efficiency of HMVFS, we conducted comparison experiments using the field data described previously. Three strategies for utilizing multiview data with feature selection were considered, namely single-view learning, feature concatenation after selection and feature selection after concatenation. The last two are the simplest early fusion methods. Single-view learning is represented by the best single view (BSV) method, in which the most informative view achieves the best performance among views. For the dataset in this paper, contextual features are more representative than hand-crafted waveform features. Feature concatenation after selection (FSFC) employs a feature selection technique separately on each view and concatenates the features selected from different views. Feature selection after concatenation (FCFS) concatenates the original feature sets of the two views and then performs feature selection. The adaptable feature selection methods listed in the next subsection are applied to select discriminative features.

The fault dataset was split into training data and testing data in a stratified fashion at a ratio of 3:1. All samples were normalized by standard deviation after zero-mean standardization. Then, feature selection methods were used to seek the optimal feature combination using the training sets and to transform all samples for fault-cause classification. ML classifiers were utilized to perform the classification. In the presence of imbalanced data, criteria such as G-mean and accuracy were used to quantitatively assess classification performance. Since G-mean is defined for binary classification, its microaverage was computed and adopted. The final results of each metric were calculated as the average of 5 trials.
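The paper states only that a microaverage of G-mean is used. One common pooling rule (an assumption on our part) aggregates one-vs-rest counts over all classes before taking the geometric mean of sensitivity and specificity:

```python
import numpy as np

def gmean_micro(y_true, y_pred, classes):
    """Micro-averaged G-mean: pool one-vs-rest TP/FN/TN/FP counts over all
    classes, then take sqrt(sensitivity * specificity). The pooling rule is
    an assumption; other micro-averaging conventions exist."""
    tp = fn = fp = tn = 0
    for c in classes:
        pos_true, pos_pred = (y_true == c), (y_pred == c)
        tp += np.sum(pos_true & pos_pred)
        fn += np.sum(pos_true & ~pos_pred)
        fp += np.sum(~pos_true & pos_pred)
        tn += np.sum(~pos_true & ~pos_pred)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return np.sqrt(sens * spec)
```

A perfect prediction yields a G-mean of 1.0, and any class that is never recovered drives the pooled sensitivity, and hence the score, down.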

#### *4.2. Comparison Feature Algorithms*

As reviewed in [34], there are many feature selection methods. We conducted comparison experiments between our HMVFS and several typical feature selection algorithms, namely Fisher score (F-Score), mutual information (MI), joint mutual information (JMI), joint mutual information maximization (JMIM), ReliefF, Hilbert–Schmidt independence criterion lasso (HSIC Lasso) [35] and recursive feature elimination (RFE). F-Score ranks features by a variance-based similarity calculation, and the same ranking can be obtained by analysis of variance (ANOVA). MI ranks features according to their mutual information with the class labels; JMI and JMIM are developed from MI [36]. RFE trains a certain kind of classifier and then ranks and discards features; starting from the full feature set, the elimination proceeds until the desired feature number is reached or the output error is minimized.
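The ANOVA-style ranking behind F-Score can be sketched for a single feature as the ratio of between-class scatter of the class means to the within-class variance. This is an illustrative toy version, not the exact implementation used in the experiments:

```python
def fisher_score(values, labels, eps=1e-12):
    """Fisher score of one feature: between-class scatter of the class
    means over the pooled within-class variance. Higher scores indicate
    more discriminative features; eps avoids division by zero."""
    classes = set(labels)
    overall = sum(values) / len(values)
    between, within = 0.0, 0.0
    for c in classes:
        vals_c = [v for v, l in zip(values, labels) if l == c]
        n_c = len(vals_c)
        mean_c = sum(vals_c) / n_c
        var_c = sum((v - mean_c) ** 2 for v in vals_c) / n_c
        between += n_c * (mean_c - overall) ** 2
        within += n_c * var_c
    return between / (within + eps)
```

A feature whose class means separate cleanly scores high, while a feature with identical class means scores zero, which is exactly the ordering the ranking exploits.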

The above algorithms are designed for single-view learning and can be used directly in BSV, FCFS and FSFC. Except for RFE, all of them are filter feature selection approaches, as is HMVFS. In addition, two comparison algorithms designed for multiview learning are included: kernel canonical correlation analysis (KCCA) [24] and discriminant correlation analysis (DCA) [37]. These feature extraction approaches map multiview data into a common feature space, so their results are attached to the comparison in FCFS. The proposed HMVFS has two hyperparameters, α and β, which were tuned over {10⁻², 10⁻¹, 1, 10, 10², 10³} by grid search on the training sets. Moreover, experiments without any feature selection algorithm were conducted using BSV features and all features, labeled RAW\_BSV and RAW, respectively.
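The grid search over the two hyperparameters can be sketched as an exhaustive scan of all (α, β) pairs. The scoring function here is a hypothetical placeholder standing in for the training-set performance of HMVFS under a given parameter pair:

```python
from itertools import product

# Candidate grid used for both hyperparameters.
GRID = [10.0 ** k for k in range(-2, 4)]   # 1e-2, 1e-1, 1, 10, 1e2, 1e3

def grid_search(score_fn, grid=GRID):
    """Exhaustively score every (alpha, beta) pair on the grid and
    return the best-scoring pair. `score_fn(alpha, beta)` is assumed
    to return a validation score such as G-mean (placeholder here)."""
    return max(product(grid, grid), key=lambda ab: score_fn(*ab))
```

With 6 candidates per parameter this evaluates 36 pairs, which is why a coarse logarithmic grid is the usual choice.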

#### *4.3. Overall Classification Performance*

In this subsection, we compare the aforementioned dimension reduction approaches on the basis of SVM to verify the effectiveness of multiview learning and HMVFS. Two concatenating rules were applied to FSFC. The first rule keeps a 1:1 proportion of waveform and contextual features, with one more contextual feature when the total number is odd. The second rule keeps the same proportion of waveform and contextual features as that selected by HMVFS.

The results in terms of G-mean with different numbers of selected features are shown in Figure 2. Comparing the single-view feature selection methods across strategies, we notice that most of them perform best in BSV rather than in FSFC and FCFS. Adding fault features from the other view even degrades their classification, which indicates that simple concatenation cannot help conventional feature selection methods adapt to multiview classification; a similar conclusion is drawn in [23]. The introduction of MVL therefore appears particularly vital. HMVFS holds a comprehensive advantage in the comparison with FSFC and FCFS and achieves the best performance compared with the methods in BSV. HMVFS outperforms the others over the middle range of feature numbers, and its result with 14 selected features is the global or near-global optimum. When features from the other view increase, the performance degrades to a certain extent and then rises to another peak. Most methods in BSV produce a zigzag rising curve and reach their best when almost all view features are selected; they are also inferior to HMVFS in FSFC and FCFS. ReliefF is the best competitor, achieving acceptable performance across strategies, whereas the performance of KCCA and DCA is low. Figure 2 illustrates that HMVFS is more capable of obtaining the best performance when combining waveform and contextual features.

**Figure 2.** Classification comparison between HMVFS and other feature algorithms in strategies: (**a**) BSV; (**b**) FSFC\_rule1; (**c**) FSFC\_rule2; (**d**) FCFS.

Owing to the limitations of the field data and of fault signature studies, irrelevant and redundant features are introduced as the number of features grows; this problem is more prominent in the waveform view in both theoretical and experimental studies. The advantage of HMVFS is that it selects features carrying independent and complementary information from all views, whereas the single-view methods are easily affected by irrelevant features when facing the concatenated assembly or are limited to single-view features. As seen in Figure 2, concatenating and mapping fail to select or transform discriminative features from the combined waveform and contextual features. HMVFS has two local optima, both better than the performance of the competitors, which demonstrates that HMVFS overcomes the negative effect of redundant features in multiview data.

#### *4.4. Parameter Sensitivity*

The determination of hyperparameters is an open problem for many algorithms. We conducted a parameter sensitivity study by testing different settings of the parameters α and β. Since these parameters control the hierarchical feature selection in HMVFS, the model is expected to be sensitive to parameter changes, and this study may reveal a hierarchical feature relationship. The candidate set was {10⁻², 10⁻¹, 1, 10, 10², 10³} for each parameter. Classification performance and average running time were recorded and are illustrated in Figure 3.

**Figure 3.** Performance variation of HMVFS with different values for the parameters α and β in terms of (**a**) Gmean; (**b**) ACC; (**c**) time.

It is observed that α = 10 is beneficial to the final selection and maintains relatively high classification performance, among which lower β has a slight advantage. View importance differs in multiview learning. From this perspective, when only two views exist and one of them is generally better, acceptable performance can be achieved by that view alone, and additional features are expected to bring improvement. High-level (view-level) selection is weak here because the other view has relatively more redundant features and would be ignored under higher β; meanwhile, an appropriately higher α strengthens low-level selection to exploit the most representative features of the less important view. Moreover, acceptable performance is achieved with α = 10⁻², β = 10² and with α = 10⁻¹, β = 10²: high-level selection is enhanced and low-level selection is restrained, which results in limited performance approximating single-view learning and a short convergence time.

#### *4.5. Comparison between ML Classifiers*

In order to investigate the effect of classifiers and to explore better identification accuracy, we employed different ML learners to complete fault-cause classification with HMVFS. Owing to space limitations and performance stability, F\_Score and ReliefF were used for comparison. The typical individual classifiers CN2, LR, KNN, SVM and ANN, which have been proven effective in fault-cause identification studies, were tested, and the results are presented in this subsection. Ensemble models promote fault-cause identification by combining individual learners [22], so we also explored the performance of various ensemble models, including random forest (RF), AdaBoost, stacking ensemble and dynamic ensembles. META-DES, DES-Clustering and KNORA-U are dynamic ensemble techniques based on metalearning, clustering and k-nearest neighbors, respectively. Classification models were developed using the Python machine learning libraries scikit-learn and DESlib. Table 3 presents the best performance for each combination of feature selection method and classifier. Considering that some results may be close, AUC is introduced as a supplementary criterion; it is derived from receiver operating characteristic (ROC) analysis and calculated as the area under the ROC curve.


**Table 3.** Best performance comparison with different ML classifiers.

As seen from the table, HMVFS outperforms F\_Score and ReliefF except with LR and ANN, and it always requires fewer features to achieve the best performance in the remaining comparisons. In the RF group, the best scores of F\_Score, ReliefF and HMVFS are very close to each other because RF has an inherent ability of variable selection; the features that actually function in the final classification are thus similar as long as the selected feature subsets are large enough to contain the valuable features. Apart from the learners mentioned above, HMVFS has advantages in both score and feature number.

From the perspective of the learners, classification performance improves as model complexity increases. CN2, as a rule-based learner, cannot cope with multiview features well enough to achieve acceptable performance. The individual learners cannot achieve accuracies greater than 0.8 and are clearly inferior to most ensemble models. Among the ensemble models, the stacking ensemble realizes the best fault-cause identification in this study. The experimental results across ML classifiers indicate that HMVFS is more suitable for classifiers with high generalization ability and that ensemble models can bring significant improvement to fault-cause identification.
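As a toy illustration of why combining individual learners helps, a plurality-vote ensemble can be sketched as follows. This is a deliberately minimal stand-in, not the stacking or dynamic-ensemble implementations compared in Table 3:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier predictions (one list of labels per
    learner) by plurality vote over each sample position."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the most votes
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

With three learners, any sample misclassified by only one of them is still labeled correctly by the ensemble, which is the basic mechanism behind the improvement observed for the ensemble models.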

#### **5. Conclusions**

Associated multisource data for transmission line fault-cause diagnosis are divided and extracted as waveform and contextual features in this paper. MVL is introduced to appropriately combine these features for performance improvement. A novel hierarchical multiview feature selection method based on an ε-dragging technique and sparsity regularization is proposed to perform hierarchical feature selection on multiview data. The ε-dragging technique is applied in the loss function to enlarge the distance between samples of different classes. The *l*2,1-norm and the *F*-norm conduct row-wise and view-level selection, respectively, which can be viewed as low-level and high-level feature selection. We also develop the optimization algorithm and prove its convergence theoretically. The proposed HMVFS is evaluated by comparisons on field data. The results reveal that HMVFS outperforms conventional feature selection methods in single-view and early-fusion strategies. Further experiments with ML classifiers also demonstrate the superiority and effectiveness of the proposed method with high-generalization learners. This study has shown that the combined use of waveform and contextual features with HMVFS brings significant improvement to fault-cause identification. In future work, more multiview data and further fault signature studies are needed to refine the feature pools, and the performance of HMVFS is expected to improve further.

**Author Contributions:** Conceptualization, S.J. and X.P.; methodology, H.Y. and C.S.L.; software, S.J.; experiment, validation and analysis, S.J. and H.Y.; investigation, S.J. and X.P.; resources, X.P.; data curation, S.J.; writing—original draft preparation, S.J. and X.P.; writing—review and editing, S.J., H.Y., C.S.L. and L.L.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation of China (61903091) and the Science and Technology Project of China Southern Power Grid Company Limited (031800KK52180074).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Hirotaka Takano 1,\*, Naohiro Yoshida <sup>1</sup> , Hiroshi Asano 1,2, Aya Hagishima <sup>3</sup> and Nguyen Duc Tuyen <sup>4</sup>**


**Abstract:** Demand response programs (DRs) can be implemented with lower investment costs than those of power plants or facilities and enable us to control power demand. They are therefore highly promising as an efficient option for power supply–demand-balancing operations. On the other hand, DRs raise new difficulties in how to evaluate the cooperation of consumers and how to decide electricity prices or rebate levels reflecting that evaluation. This paper presents a theoretical approach that calculates electricity prices and rebate levels in DRs based on the framework of social welfare maximization. In the authors' proposal, the DR-originated changes in the utility functions of power suppliers and consumers are used to set a guide for DR requests. Moreover, optimal electricity prices and rebate levels are defined from the standpoint of minimal burden in DRs. Through numerical simulations and discussion of their results, the validity of the authors' proposal is verified.

**Keywords:** demand response programs; social welfare maximization; utility function; power supply– demand balance; electricity price; rebate level

#### **1. Introduction**

Demand response programs (DRs) are defined as changes in electricity-consuming patterns in response to changes in electricity prices or to incentive payments [1]. There are two major categories of DRs: price-based and incentive-based. Time of use (TOU), real-time pricing (RTP), and critical-peak pricing (CPP) are well-known examples of the former; unit prices of electric power in these DRs become expensive during periods of high electricity costs or critical power grid conditions (peak periods) in comparison with those in off-peak periods. On the other hand, peak time rebate (PTR) and critical peak rebate (CPR) fall into the latter category, in which power suppliers (or power producers, retailers, etc.) reward consumers who respond to DR requests with monetary rebates. Since DRs bring controllability to the power demand without huge investment costs in power plants or facilities, they have been attracting attention as one of the most economical and sustainable alternatives to traditional power supply–demand-balancing operations. Therefore, many DR-related studies have been carried out [2–4], as well as demonstrative field tests.

There are various studies contributing to the design of DRs. In [5], the state of DR-related activities is analyzed with emphasis on deregulation in electricity markets; its authors defined evaluation indices and discussed the effects of DRs on electricity prices. Reference [6] reviews the means by which power suppliers induce their preferred electricity consumption in DRs; several mathematical problem frameworks and models are summarized, and their solution techniques are introduced. In [7], DRs are classified from

**Citation:** Takano, H.; Yoshida, N.; Asano, H.; Hagishima, A.; Tuyen, N.D. Calculation Method for Electricity Price and Rebate Level in Demand Response Programs. *Appl. Sci.* **2021**, *11*, 6871. https://doi.org/ 10.3390/app11156871

Academic Editor: Chun Sing Lai

Received: 21 June 2021 Accepted: 23 July 2021 Published: 26 July 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the viewpoints of their control mechanisms, the motivations offered for changes in electricity-consuming patterns, and the decision variables. In addition, several models for optimizing DR control strategies are categorized in association with their application targets. The authors in [8] focused on price-based DRs and evaluated their advantages and disadvantages with experimental results; this reference also reviews case study results of such DRs in several countries. Reference [9] similarly summarizes the results of case studies, and its authors analyzed them to discuss preferable price settings in price-based DRs.

Although these references provide a great deal of useful information for the design of DRs, there are still difficulties in setting electricity prices or rebate levels while securing DR resources. In fact, electricity prices and rebate levels in field tests have been decided by relying on knowledge, experience, and experimental results [10–16], and thus it is difficult to discuss the appropriateness of their settings. For these reasons, the design of efficient DRs becomes a crucial component for advancing power grid management technologies.

This paper presents a theoretical approach that calculates electricity prices and rebate levels in DRs. To set the basis of discussion, the framework of social welfare maximization (SWM), which has often been used to represent models of electricity markets [6,7,17–21], is applied. First, the authors set the utility functions of the power suppliers and the consumers and represent the power supply–demand-balancing operation under the SWM framework. In the process, the contributions of the DRs become measurable as increments or decrements in the utility functions, so that the DR-originated changes in power demand can be treated as an influential factor in power supply–demand management. Next, acceptable conditions on electricity prices and rebate levels are derived as a guide for the DR request, and their optimal values are then defined in consideration of the burden on both power suppliers and consumers. Finally, the validity of the authors' proposal is verified through numerical simulations with a model constructed using an actual record of electricity consumption.

#### **2. Formulation of Power Supply–Demand-Balancing Operation under SWM**

SWM is formulated as a problem of maximizing the weighted sum of the utility functions in a society, without regard to how the profit is distributed among the members of the society [22–24]. Since the members of the society are classified into power suppliers (or power producers, etc.) and consumers, the social welfare function is written as:

$$\text{SW} = \sum\_{t=1}^{T} \text{SW}\_{t} = \sum\_{t=1}^{T} \left( \sum\_{i=1}^{N\_S} \mathcal{U}\_{1,t}(s\_{i,t}) + \alpha \sum\_{j=1}^{N\_C} \mathcal{U}\_{2,t}(d\_{j,t}) \right), \tag{1}$$

where $t$ is the time slot ($t = 1, \dots, T$); $i$ is the number assigned to the power suppliers ($i = 1, \dots, N\_S$); $s\_{i,t}$ is the electric power fed from power supplier $i$ and an element of vector $s\_t$; $\mathcal{U}\_{1,t}(\cdot)$ is the utility function of the power suppliers; $j$ is the number assigned to the consumers ($j = 1, \dots, N\_C$); $d\_{j,t}$ is the power consumption of consumer $j$ and an element of vector $d\_t$; $\mathcal{U}\_{2,t}(\cdot)$ is the utility function of the consumers; $\alpha$ is the weighting coefficient.

In the design of DRs, we can regard the power suppliers and the consumers as aggregated entities. Besides, the coefficient $\alpha$ in Equation (1) equals 1 if we align the units of the utility functions, e.g., expressing both in terms of price.

The suppliers' utility is expressed as the difference between the income from selling electricity and the operational cost of the power supply, while the consumers' utility is the difference between the satisfaction obtained in exchange for consuming electricity and the electricity cost [6,25]. Their utility functions are represented as:

$$\mathcal{U}\_{1,t}(s\_t) = p\_t s\_t - E\_t(s\_t), \text{ for } \forall t, \tag{2}$$

$$\mathcal{U}\_{2,t}(d\_t) = F\_t(d\_t) - p\_t d\_t, \text{ for } \forall t, \tag{3}$$

where $p\_t$ is the standard price of electric power; $E\_t(\cdot)$ is the operational cost of the power suppliers; $F\_t(\cdot)$ is the satisfaction of the power consumers.

Since $p\_t s\_t$, $p\_t d\_t$, and $E\_t(s\_t)$ are expressed in terms of price, all units in Equation (1) are unified once $F\_t(d\_t)$ in Equation (3) is also evaluated in terms of price. In this case, the SWM problem can be formulated as:

$$\max\_{s,d} \text{SW}, \tag{4}$$

$$SW\_t = F\_t(d\_t) - E\_t(s\_t) + p\_t(s\_t - d\_t), \text{ for } \forall t,\tag{5}$$

$$\text{s.t. } s\_t = d\_t, \text{ for } \forall t, \tag{6}$$

$$G^{\min} \le s\_t \le G^{\max}, \text{ for } \forall t,\tag{7}$$

where $G^{\max}$ and $G^{\min}$ are the maximum and minimum values of the power supply, respectively.

Equation (6) shows the balance of the power supply and demand, and the constraint Equation (7) restricts the controllability of the power supply, depending on specifications of the target power grid, e.g., the maximum and the minimum outputs of power generation units. Hence, the optimal solution of the formulated SWM problem represents the power supply–demand operation that maximizes the social welfare.

If Equation (5) is a convex function, we can apply Lagrange relaxation [6,26,27], and the Lagrange multipliers correspond to shadow prices. The Lagrangian function and the Karush–Kuhn–Tucker conditions are represented as:

$$\mathcal{L}\_t(s\_t, d\_t, \lambda\_t, \mu\_{1,t}, \mu\_{2,t}) = \text{SW}\_t + \lambda\_t(s\_t - d\_t) + \mu\_{1,t}\left(s\_t - G^{\min}\right) + \mu\_{2,t}\left(G^{\max} - s\_t\right), \tag{8}$$

$$\begin{cases} \frac{\partial \mathcal{U}\_{1,t}(s\_t)}{\partial s\_t} + \lambda\_t + \mu\_{1,t} - \mu\_{2,t} = 0\\ \frac{\partial \mathcal{U}\_{2,t}(d\_t)}{\partial d\_t} - \lambda\_t = 0 \end{cases} \tag{9}$$

where $\lambda\_t$, $\mu\_{1,t}$, and $\mu\_{2,t}$ are the Lagrange multipliers.

The authors assumed simple utility functions in the numerical simulations of this paper, and thus the target problem can be solved by the Newton–Raphson method. Otherwise, other techniques, e.g., intelligent optimization algorithms, are useful for solving SWM problems. For detailed definitions of the assumed functions, refer to Section 4.
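With the concrete utility functions introduced later (a quadratic operational cost and a logarithmic satisfaction, see Section 4), the Newton–Raphson step can be sketched as follows. The coefficient values in the usage note are illustrative, not taken from the paper's tables:

```python
def solve_swm(X, Z, A, B, d0=50.0, tol=1e-10, max_iter=100):
    """Newton-Raphson on the first-order condition of
    SW(d) = X*ln(Y*(d + Z)) - (A*d**2 + B*d), obtained after
    substituting the balance constraint s_t = d_t.  The scale Y
    cancels in the derivative, leaving
        g(d) = X/(d + Z) - 2*A*d - B = 0."""
    d = d0
    for _ in range(max_iter):
        g = X / (d + Z) - 2.0 * A * d - B
        dg = -X / (d + Z) ** 2 - 2.0 * A   # g'(d), always negative
        step = g / dg
        d -= step
        if abs(step) < tol:
            break
    return d
```

Because g(d) is strictly decreasing, the stationary point is unique and the iteration converges quickly from any reasonable starting point, e.g., `solve_swm(X=100.0, Z=5.0, A=0.05, B=1.0)` drives the residual g(d) to essentially zero.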

#### **3. Calculation Methodology of Electricity Prices and Rebate Levels**


The power suppliers bring the electricity consumption closer to a target value that is preferable for power supply–demand management by changing the electricity price or by rewarding monetary rebates to the consumers. The target electricity consumption in DRs is defined as:

$$d\_t' = d\_t^\* + \Delta d\_t, \tag{10}$$

where $d\_t^\*$ is the standard electricity consumption, i.e., the actual consumption without DRs; $\Delta d\_t$ is the change in electricity consumption caused by the DR request.

In actual power grid operations, $d\_t^\*$ is replaced with an estimated value, because the actual value cannot be known. This replacement introduces uncertainty into Equation (10) and can therefore lead to new challenges in the design of DRs. A detailed discussion of these issues remains future work of this study.

Under the following assumption, the authors set the acceptable conditions of electricity prices and rebate levels and define their optimal values.

#### **Assumption 1.** *Consumers buy electricity to maximize their utility.*

This assumption yields Equation (11), which is derived from Equations (8) and (9):

$$p\_t^\* = \left. \frac{\partial F\_t}{\partial d\_t} \right|\_{d\_t = d\_t^\*} \, \tag{11}$$

where $p\_t^\*$ is the standard electricity price, i.e., the actual price without DRs.

#### *3.1. Definition of the Optimal Electricity Price and Its Calculation*

The power suppliers decide the electricity price in the price-based DR to control the power demand. The optimal electricity price is defined as:

$$p\_t' = p\_t^\* + \Delta p\_t, \tag{12}$$

where $\Delta p\_t$ is the change in electricity price in the DR.

According to Equations (2) and (3), the values of the utility of the power suppliers and the consumers in the DR are calculated as:

$$\mathcal{U}\_{1,t}(d\_t') = p\_t' d\_t' - E\_t(d\_t'), \tag{13}$$

$$\mathcal{U}\_{2,t}\left(d\_t'\right) = F\_t\left(d\_t'\right) - p\_t'd\_t'.\tag{14}$$

In addition, the DR-originated changes in them are calculated as:

$$\Delta \mathcal{U}\_{1,t} = \mathcal{U}\_{1,t}\left(d\_t'\right) - \mathcal{U}\_{1,t}\left(d\_t^\*\right) = \left(p\_t' d\_t' - p\_t^\* d\_t^\*\right) - \Delta E\_t, \tag{15}$$

$$
\Delta \mathcal{U}\_{2,t} = \mathcal{U}\_{2,t} \left( d\_t' \right) - \mathcal{U}\_{2,t} \left( d\_t^\* \right) = \Delta F\_t - \left( p\_t' d\_t' - p\_t^\* d\_t^\* \right), \tag{16}
$$

where $\Delta E\_t = E\_t(d\_t') - E\_t(d\_t^\*)$ is the DR-originated change in the suppliers' operational cost, and $\Delta F\_t = F\_t(d\_t') - F\_t(d\_t^\*)$ is the DR-originated change in the consumers' satisfaction.

In the DR, the power suppliers' utility and the consumers' utility (or the sum of each over the target period) must each be greater than or equal to zero ($\mathcal{U}\_{1,t}(d\_t') \ge 0$; $\mathcal{U}\_{2,t}(d\_t') \ge 0$). Since the power suppliers request DR cooperation considering their economic efficiency, the change in their utility is also greater than or equal to zero ($\Delta \mathcal{U}\_{1,t} \ge 0$). By contrast, without any incentive, the change in the consumers' utility is negative ($\Delta \mathcal{U}\_{2,t} < 0$), because their utility function is maximized (approximately maximized in actual situations) at the standard electricity consumption. By Assumption 1, we can represent the electricity price that maximizes the consumers' utility at $d\_t'$ as:

$$p\_t = \left. \frac{\partial F\_t}{\partial d\_t} \right|\_{d\_t = d\_t'}. \tag{17}$$

If the power suppliers set an electricity price $p\_t^+$ higher than the standard, the consumers' utility decreases according to Equation (16), and thus the electricity consumption is reduced ($\Delta d\_t < 0$). Conversely, electricity consumption is encouraged by setting a lower electricity price $p\_t^-$ ($\Delta d\_t > 0$). The acceptable conditions on $p\_t^+$ and $p\_t^-$ to achieve the target of the price-based DR are derived separately as:

$$\left. \frac{\partial F\_t}{\partial d\_t} \right|\_{d\_t = d\_t'} \le p\_t^+ \le \frac{F\_t(d\_t')}{d\_t'},\tag{18}$$

$$\frac{p\_t^\* d\_t^\* + \Delta E\_t}{d\_t^\prime} \le p\_t^- \le \left. \frac{\partial F\_t}{\partial d\_t} \right|\_{d\_t = d\_t^\prime}. \tag{19}$$

When the electricity price does not satisfy Equation (18) or (19), the DR request brings negative economic impacts on the utility of the power suppliers, or the consumers' cooperation cannot reach the target of the DR request. With a view to minimizing the burden of the price-based DR, the authors defined the optimal electricity price as:

$$p'\_t = \left. \frac{\partial F\_t}{\partial d\_t} \right|\_{d\_t = d'\_t}. \tag{20}$$
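For the logarithmic satisfaction function assumed in Section 4, Equations (18) and (20) reduce to closed forms, which can be sketched as below. Function names and the coefficient values in the test are hypothetical illustrations:

```python
import math

def optimal_price(X, Z, d_target):
    """Equation (20): marginal satisfaction F'(d) at the target
    consumption d', for F(d) = X*ln(Y*(d + Z)).  The scale Y
    cancels when differentiating."""
    return X / (d_target + Z)

def upper_price_bound(X, Y, Z, d_target):
    """Upper bound of Equation (18): average satisfaction F(d')/d'."""
    return X * math.log(Y * (d_target + Z)) / d_target
```

For a demand-reduction request (d' below d*), the optimal price exceeds the standard price, and it stays below the Equation (18) ceiling, consistent with the acceptable band derived above.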

#### *3.2. Definition of the Optimal Rebate Level and Its Calculation*

If there is no incentive payment, the utility of the consumers decreases depending on their contribution to the DR, because the consumers must accept less satisfaction than at the standard electricity consumption. In the incentive-based DR, the utilities of the power suppliers and the consumers and their changes are calculated as:

$$\mathcal{U}\_{1,t}\left(d\_t'\right) = p\_t^\* d\_t' - E\_t\left(d\_t'\right), \tag{21}$$

$$\mathcal{U}\_{2,t}\left(d\_t'\right) = F\_t\left(d\_t'\right) - p\_t^\* d\_t', \tag{22}$$

$$\Delta \mathcal{U}\_{1,t} = p\_t^\* \Delta d\_t - \Delta E\_t, \tag{23}$$

$$\Delta \mathcal{U}\_{2,t} = \Delta F\_t - p\_t^\* \Delta d\_t. \tag{24}$$

These equations are similar to Equations (13)–(16); however, the electricity price is fixed at $p\_t^\*$ in the incentive-based DR. Although the consumers accept less satisfaction, their utility recovers to its original level by compensating the decrement in Equation (24). Therefore, we can set the acceptable condition on the unit price of the monetary rebate $r\_t$ as:

$$\frac{p\_t^\* \Delta d\_t - \Delta F\_t}{|\Delta d\_t|} \le r\_t \le \frac{p\_t^\* \Delta d\_t - \Delta E\_t}{|\Delta d\_t|}. \tag{25}$$

Under this condition, the DR brings no negative impact on either the power suppliers or the consumers. The authors defined the optimal rebate level to induce the active cooperation of the consumers as:

$$r\_t' = \frac{p\_t^\* \Delta d\_t - \Delta E\_t}{|\Delta d\_t|}. \tag{26}$$
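The rebate band of Equation (25) and the optimal level of Equation (26) can be sketched generically, taking the cost and satisfaction functions as arbitrary callables. The function name and the toy cost/satisfaction functions in the test are illustrative assumptions:

```python
def rebate_range(p_star, d_star, d_target, E, F):
    """Acceptable rebate band of Equation (25); its upper end is the
    optimal level r' of Equation (26).  E and F are the suppliers'
    cost and consumers' satisfaction functions (any Python callables).
    delta_d < 0 corresponds to a demand-reduction request."""
    delta_d = d_target - d_star
    delta_E = E(d_target) - E(d_star)
    delta_F = F(d_target) - F(d_star)
    lower = (p_star * delta_d - delta_F) / abs(delta_d)
    upper = (p_star * delta_d - delta_E) / abs(delta_d)   # optimal r'
    return lower, upper
```

When the suppliers' cost saving outweighs the revenue loss, the band is non-empty and strictly positive, i.e., a rebate exists that leaves both sides no worse off.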

#### **4. Numerical Simulation Model**

To apply the authors' proposal, actual utility functions are needed. Since there are no established utility functions, in this paper these functions were constructed using a widely used fuel cost function of power generation units together with a record from smart power meters. A discussion of their appropriateness remains future work of this study.

In this paper, the standard electricity prices in the SWM framework were replaced with the annual average price as:

$$p\_t^\* = p^\* \ (= 23.90), \text{ for } \forall t, \tag{27}$$

In the numerical simulations, $p^\*$ was set to 23.90 JPY/kWh, the Japanese annual average in 2015. The utility functions are shown below.

#### *4.1. Utility Function for Suppliers*

Thermal power generation accounts for a large portion of the power supply, e.g., approximately 85% in Japan [28], and its fuel cost, as is well known, strongly influences the operational costs of the power suppliers. The fuel costs of thermal power units are traditionally approximated as quadratic functions of the generated power [29–34]. For these reasons, the authors added the following assumptions to construct the operational cost function.

**Assumption 2.** *Total operational cost in the power supply is approximated as a quadratic function relying on the fuel costs of thermal power units.*

**Assumption 3.** *The power suppliers decide the electricity price to maximize their utility function at the annual average of electricity consumption (103.94 kW in this paper).*

The normalized utility function of the power suppliers was defined as:

$$\mathcal{U}\_1^{\dagger}(s\_t) = -A^{\dagger} s\_t^2 + \left(p\_1^{\dagger} - B^{\dagger}\right)s\_t - C^{\dagger}, \text{ for } \forall t, \tag{28}$$

where $p\_1^{\dagger}$ is the electricity price in the normalized function; $A^{\dagger}$, $B^{\dagger}$, and $C^{\dagger}$ are the coefficients of the normalized function.

With reference to [35,36], the coefficients of the fuel cost function were set to 2.73 × 10⁻⁷ JPY/kWh², 2.27 JPY/kWh, and 1.50 × 10⁵ JPY. In addition, the maximum and minimum outputs of power generation ($G^{\max}$ and $G^{\min}$) were set to 175 kW and zero, respectively, based on the recorded electricity consumption of 500 households. With these values, the coefficients in Equation (28) were set to 4.63 × 10⁻⁵ /kWh², 1.20 × 10⁻⁷ /kWh, and 0. Owing to Assumption 3, we can calculate $p\_1^{\dagger}$ as:

$$p\_1^\dagger = 2A^\dagger \overline{\mathbf{s}} + B^\dagger \,\tag{29}$$

where $\overline{s}$ is the annual average of electricity consumption.

By multiplying both sides of Equation (29) by $p^\*/p\_1^{\dagger}$, we can rescale the coefficients into $A$ and $B$; as a result, the power suppliers' utility function and the operational cost function are written as:

$$\mathcal{U}\_1(s\_t) = -A s\_t^2 + (p^\* - B)s\_t, \text{ for } \forall t, \tag{30}$$

$$E\_t(s\_t) = A s\_t^2 + B s\_t \ (= E(s\_t)), \text{ for } \forall t, \tag{31}$$

where $A = (p^\*/p\_1^{\dagger})A^{\dagger}$; $B = (p^\*/p\_1^{\dagger})B^{\dagger}$.

Figure 1 displays the resulting operational cost function of the suppliers, and its coefficients are summarized in Table 1.
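The rescaling of the normalized supplier coefficients can be sketched as follows. The scaling factor p*/p1 is inferred from the derivation above (the source rendering is partly garbled there), and the function names are hypothetical; the check at the end confirms that Assumption 3 survives the rescaling, i.e., the scaled utility is still maximized at the annual average consumption:

```python
def supplier_coefficients(a_n, b_n, s_bar, p_star):
    """Rescale the normalized supplier coefficients (A-dagger,
    B-dagger) to price units: p1 = 2*a_n*s_bar + b_n is Equation (29),
    and the factor p_star/p1 yields the A and B of Equations (30)-(31)."""
    p1 = 2.0 * a_n * s_bar + b_n
    return (p_star / p1) * a_n, (p_star / p1) * b_n

def marginal_supplier_utility(A, B, p_star, s):
    """d/ds of U1(s) = -A*s**2 + (p_star - B)*s."""
    return p_star - B - 2.0 * A * s
```

With the paper's values (s̄ = 103.94 kW, p* = 23.90 JPY/kWh), the marginal utility at s̄ comes out to zero by construction.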

**Figure 1.** Operational cost function.

**Table 1.** Coefficients in Figure 1.


#### *4.2. Utility Function for Consumers*

The consumers' satisfaction is commonly approximated by functions such as logarithmic or sigmoidal functions [37–42]. In this paper, the hourly satisfaction functions were constructed from the smart power meter records of 500 households. Figure 2 shows profiles of the hourly total electricity consumption of the 500 households, sampled on the days with the highest and the lowest electricity consumption of the year. The hourly cumulative frequency distributions of the electricity consumption of each household are displayed in Figures 3 and 4.

**Figure 2.** Example of profiles of hourly electricity consumption.

**Figure 3.** Hourly cumulative frequency distributions in the day of the highest electricity consumption.

**Figure 4.** Hourly cumulative frequency distributions in the day of the lowest electricity consumption.

Based on these data, the normalized utility function of the consumers was approximated as:

$$U\_{2,t}^{\dagger}(d\_t) = X\_t^{\dagger}\ln\left(Y\_t^{\dagger}\left(d\_t + Z\_t^{\dagger}\right)\right) - p\_{2,t}^{\dagger}d\_t, \text{ for } \forall t,\tag{32}$$

where $p_{2,t}^\dagger$ is the electricity price in the normalized function; $X_t^\dagger$, $Y_t^\dagger$, and $Z_t^\dagger$ are the coefficients of the hourly normalized function of the consumers' satisfaction.

Equation (32) expresses how satisfied the consumers are with the electricity consumption $d_t$. With Assumption 1, $p_{2,t}^\dagger$ can be calculated as:

$$p\_{2,t}^{\dagger} = \frac{X\_t^{\dagger}}{d\_t^\* + Z\_t^{\dagger}}, \text{ for } \forall t. \tag{33}$$

As with Equations (30) and (31), we can derive the functions of the consumers' hourly utility and satisfaction as:

$$U\_{2,t}(d\_t) = X\_t \ln(Y\_t(d\_t + Z\_t)) - p^\* d\_t, \text{ for } \forall t,\tag{34}$$

$$F\_t(d\_t) = X\_t \ln(Y\_t(d\_t + Z\_t)), \text{ for } \forall t,\tag{35}$$

where $X_t = \frac{p^*}{p_{2,t}^\dagger}X_t^\dagger$; $Y_t = Y_t^\dagger$; $Z_t = Z_t^\dagger$.
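The consumer-side rescaling in Equations (32)–(35) can be sketched as follows. The normalized coefficients and the consumption value below are illustrative placeholders, not the actual hourly values in Tables 2 and 3:

```python
import math

# Sketch of the consumer-side rescaling in Equations (32)-(35).
X_dag, Y_dag, Z_dag = 1.0, 0.02, 5.0   # assumed normalized coefficients
d_star = 100.0                         # kW, assumed actual consumption
p_star = 23.90                         # JPY/kWh, standard electricity price

p2_dag = X_dag / (d_star + Z_dag)      # Equation (33)
X_t = (p_star / p2_dag) * X_dag        # rescaled coefficient in Eqs. (34)-(35)
Y_t, Z_t = Y_dag, Z_dag                # unchanged by the rescaling

F = X_t * math.log(Y_t * (d_star + Z_t))   # satisfaction, Equation (35)
U = F - p_star * d_star                    # utility, Equation (34)
marginal = X_t / (d_star + Z_t) - p_star   # ~0: Assumption 1 holds at d_star
print(F, U, marginal)
```

The vanishing marginal utility at $d_t^*$ confirms that the rescaled function is consistent with Assumption 1.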

Figures 5 and 6 show the hourly satisfaction functions, and their coefficients are summarized in Tables 2 and 3.

**Figure 5.** Hourly satisfaction functions in the day of the highest electricity consumption.

**Figure 6.** Hourly satisfaction functions in the day of the lowest electricity consumption.


**Table 2.** Coefficients in Figure 5 (the day of the highest electricity consumption).

**Table 3.** Coefficients in Figure 6 (the day of the lowest electricity consumption).


#### **5. Numerical Simulation Results**

Numerical simulations were carried out with the model constructed in Section 4. The following scenarios were assumed:

**Scenario 1.** *The power suppliers request to reduce the electricity consumption at any time slot in the day of the highest electricity consumption.*

**Scenario 2.** *The power suppliers request to encourage the electricity consumption at any time slot in the day of the lowest electricity consumption.*

Under these scenarios, the authors calculated the optimal electricity price or rebate level and the acceptable condition for each time slot.

#### *5.1. Numerical Simulation Results for Price-Based DRs*

By using Equations (18)–(20), the optimal electricity prices and the acceptable conditions in the price-based DR were calculated as:

$$p\_t' = \frac{X\_t}{d\_t' + Z\_t},\tag{36}$$

$$p\_t' \le p\_t^+ \le \overline{p\_t^+},\tag{37}$$

$$\underline{p\_t^-} \le p\_t^- \le p\_t'.\tag{38}$$

$$\text{where } \overline{p\_t^+} = \frac{X\_t}{d\_t'} \ln\left(Y\_t\left(d\_t' + Z\_t\right)\right); \ \underline{p\_t^-} = \frac{A\left(d\_t'^2 - d\_t^{\*2}\right) + B\left(d\_t' - d\_t^\*\right) + p\_t^\* d\_t^\*}{d\_t'}.$$
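Equations (36)–(38) can be checked numerically for a single time slot. The coefficient values below are illustrative placeholders (the actual values are in Tables 1–3); $X_t$ is chosen so that Assumption 1 holds at the assumed standard consumption:

```python
import math

# Numerical sketch of Equations (36)-(38), scenario 1 (1% demand reduction).
A, B = 0.115, 3.0e-4               # assumed operational-cost coefficients
p_star = 23.90                     # JPY/kWh, standard electricity price
Y_t, Z_t = 0.1, 5.0                # assumed satisfaction-function coefficients
d_star = 100.0                     # kW, assumed standard consumption
d_prime = 0.99 * d_star            # 1% DR reduction target
X_t = p_star * (d_star + Z_t)      # Assumption 1: X_t/(d_star + Z_t) = p_star

p_opt = X_t / (d_prime + Z_t)                                 # Eq. (36)
p_upper = X_t / d_prime * math.log(Y_t * (d_prime + Z_t))     # bound in Eq. (37)
p_lower = (A * (d_prime**2 - d_star**2) + B * (d_prime - d_star)
           + p_star * d_star) / d_prime                       # bound in Eq. (38)
print(p_lower, p_opt, p_upper)
# A price increase is acceptable in [p_opt, p_upper] (Eq. (37));
# a decrease is acceptable in [p_lower, p_opt] (Eq. (38)).
```

With these placeholder values the optimal price sits slightly above the standard price, mirroring the behavior reported for scenario 1 in Table 4.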

Tables 4 and 5 summarize the calculation results in the case that the power suppliers set the DR target as a 1% decrease (Scenario 1) or increase (Scenario 2) of the standard electricity consumption. Figures 7 and 8 illustrate changes in the utility functions of the power suppliers and the consumers and those in the social welfare functions.

**Table 4.** Numerical simulation results of price-based demand response programs (DRs) under scenario 1.


**Table 5.** Numerical simulation results of price-based DRs under scenario 2.


**Figure 7.** Changes in each function by the 1% DR request in the day of the highest electricity consumption.

**Figure 8.** Changes in each function by the 1% DR request in the day of the lowest electricity consumption. In this scenario, the power suppliers could not request the DR cooperation in the target period.

In Table 4, the calculation results in scenario 1 satisfied their acceptable conditions, and as a result, the optimal electricity prices were slightly increased (by less than 1%) compared to the standard electricity price (23.90 JPY/kWh). By contrast, as shown in Table 5, the power suppliers could not request a 1% increase in the electricity consumption in scenario 2, because the calculation results did not satisfy their acceptable conditions. In particular, the electricity consumption became higher than its annual average (103.94 kW) from 18:00 to 22:00, and therefore, $\underline{p_t^-}$ exceeded the standard price. In the other periods, the power suppliers can request the DR cooperation until the condition shown in Equation (38) is violated. With reference to Figure 2, we can see that the calculated electricity prices changed inversely to the profile of electricity consumption in Table 4, while they changed synchronously with it in Table 5. These results indicate that the consumers can reduce the power demand during the peak periods, whereas doing so becomes difficult in the off-peak periods. That is, the calculated prices reflected the controllability of the electricity consumption in each time slot.

In Figures 7 and 8, there were no significant differences in the social welfare caused by the DR; however, we can confirm that the price-based DRs placed a burden on the consumers in Figure 7 and on the suppliers in Figure 8. As displayed in Figure 9, the optimal electricity price became 25.01 JPY/kWh, and the increment of the price exceeded 1 JPY/kWh, when the DR target was set to 7% of the actual electricity consumption. For reference, the optimal electricity price in scenario 2 is shown in Figure 10.

**Figure 9.** Results on the typical time slots in scenario 1 (at 22:00).

**Figure 10.** Results on the typical time slots in scenario 2 (at 4:00). In this scenario, the power suppliers could not request the DR cooperation in the target period.

In the numerical simulation model, the authors constructed the operational cost function of the power suppliers from the fuel costs of thermal power units. However, in actual operations, other factors, such as the surplus power of renewable energy sources, also influence the operational cost. If these factors are appropriately reflected in the model, the DR requests in scenario 2 could become feasible.

From these results, the authors concluded that their proposal functioned properly in the price-based DR.

#### *5.2. Numerical Simulation Results for Incentive-Based DRs*

By using Equations (25) and (26), the optimal rebate levels and the acceptable conditions in the incentive-based DR were calculated as:

$$r\_t' = \frac{-A\left(d\_t'^2 - d\_t^{\*2}\right) + (p^\* - B)\left(d\_t' - d\_t^\*\right)}{|d\_t' - d\_t^\*|}\,,\tag{39}$$

$$\underline{r\_t} \le r\_t \le r\_t'.\tag{40}$$

$$\text{where } \underline{r\_t} = \frac{p\_t^\*\left(d\_t' - d\_t^\*\right) - X\_t \ln\frac{d\_t' + Z\_t}{d\_t^\* + Z\_t}}{\left|d\_t' - d\_t^\*\right|}.$$

Tables 6 and 7 summarize the calculation results of the incentive-based DRs, and Figures 11 and 12 illustrate the changes in each function.

**Table 6.** Numerical simulation results of incentive-based DRs under scenario 1.

**Table 7.** Numerical simulation results of incentive-based DRs under scenario 2.

**Figure 11.** Changes in each function by the 1% DR request in the day of the highest electricity consumption.

**Figure 12.** Changes in each function by the 1% DR request in the day of the lowest electricity consumption. In this scenario, the power suppliers could not request the DR cooperation from 18:00 to 22:00.

In Table 6, the calculated rebate levels were in the range of 0.95 JPY/kWh to 16.07 JPY/kWh, whereas the rebate levels in Table 7 were in the range of −2.61 JPY/kWh to 7.70 JPY/kWh. As opposed to the results of the price-based DRs, the time variation in Table 6 and Figure 11 had a trend similar to that of the electricity consumption. As shown in Table 7 and Figure 12, the optimal rebate levels were lower than $\underline{r_t}$ from 18:00 to 22:00, and thus, the consumers' utility became negative during these periods. This indicates that the power suppliers could not secure the resources for the DR, and the DR request was impossible during these periods. In Figures 11 and 12, there was no burden on the power suppliers and no decrement in the social welfare, except for the period from 18:00 to 22:00 in scenario 2.
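The optimal rebate level and its lower limit in Equations (39) and (40) can be sketched numerically for one peak-hour time slot. The coefficients and consumption below are illustrative placeholders, not the values in Tables 1–3; $X_t$ is chosen so that Assumption 1 holds at the assumed consumption:

```python
import math

# Numerical sketch of Equations (39)-(40), scenario 1 (1% demand reduction).
A, B = 0.115, 3.0e-4               # assumed operational-cost coefficients
p_star = 23.90                     # JPY/kWh, standard electricity price
Z_t = 5.0                          # assumed satisfaction-function offset
d_star = 150.0                     # kW, assumed peak-hour consumption
d_prime = 0.99 * d_star            # 1% DR reduction target
X_t = p_star * (d_star + Z_t)      # Assumption 1: X_t/(d_star + Z_t) = p_star

delta = d_prime - d_star
r_opt = (-A * (d_prime**2 - d_star**2)
         + (p_star - B) * delta) / abs(delta)                  # Eq. (39)
r_lower = (p_star * delta
           - X_t * math.log((d_prime + Z_t) / (d_star + Z_t))) / abs(delta)
print(r_lower, r_opt)
# The DR request is feasible only if r_lower <= r_t <= r_opt (Eq. (40)).
```

With these placeholder values the feasible rebate window is nonempty, as in the scenario-1 results of Table 6; for off-peak consumption the window closes, which corresponds to the infeasible periods in Table 7.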

As shown in Figures 13 and 14, the optimal rebate levels decreased as the DR target for the decrease (Figure 13) or increase (Figure 14) of the electricity consumption was enlarged. In contrast, the lower limits of the rebate levels gradually rose. This is because the resources for the DR were limited in association with the utility function of the power suppliers.

**Figure 13.** Results on the typical time slots in scenario 1 (at 22:00).

**Figure 14.** Results on the typical time slots in scenario 2 (at 4:00). In this scenario, the power suppliers could not request the DR cooperation from 18:00 to 22:00.

These results showed that the authors' proposal functioned properly in the incentive-based DR as well.

#### **6. Conclusions**

The authors proposed a theoretical approach that calculates electricity prices and rebate levels in DRs, based on the framework of SWM. In the authors' proposal, first, the utility functions of the power suppliers and the consumers were set, and then, the power supply–demand-balancing operation was represented under the SWM framework. Next, the authors derived the acceptable conditions for price-based DRs as Equations (18) and (19) and the acceptable conditions for incentive-based DRs as Equation (25). In addition, the optimal values of electricity prices and rebate levels were defined as Equations (20) and (26), respectively, in consideration of the burden on society. The distinctive feature of the proposed approach was to make the influences of the DRs measurable as increments or decrements in the utility functions. Finally, to verify the validity of the authors' proposal, the numerical simulations were carried out with the model, which was constructed using the approximated fuel cost function of power generation units and the records of smart power meters.

As shown in Section 5, the calculated electricity prices and rebate levels became smaller than the values applied in the demonstrative field tests. This is because the calculation results, as defined in Equations (18)–(20) as well as Equations (25) and (26), strongly depended on the utility functions of the power suppliers and the consumers. In other words, the assumed utility functions have room for discussion on their appropriateness. However, the results of the numerical simulations reflected the controllability in electricity consumption, and we can conclude that the authors' proposal functioned appropriately.

In future works, the appropriateness of the assumed utility functions will be discussed in more detail. Furthermore, the authors will analyze the influence of replacing the actual electricity consumption, $d_t^*$, with an estimated value, and the proposed framework will be expanded.

**Author Contributions:** Conceptualization, H.T. and N.Y.; methodology, H.T. and N.Y.; software, N.Y.; validation, H.T., N.Y., H.A., A.H. and N.D.T.; writing—original draft preparation, H.T., H.A., A.H. and N.D.T.; writing—review and editing, H.T.; supervision, H.T. and H.A.; project administration, H.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partly funded by the Japan Society for the Promotion of Science (JSPS; grant number: 19K04325).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

