This section contains the following topics: (a) HyperSim simulator and COMTRADE files; (b) nominal operation, SNR, short-circuit faults, and main symptoms; (c) the proposed theoretical fault detection and diagnosis architectures and approaches.
3.2. Nominal Operation, SNR, Short-Circuit Faults, and Main Symptoms
On overhead power lines, in the nominal operating region (without short-circuit faults), the neutral current should obey the condition expressed by Equation (3). When the current exceeds the rating of the differential circuit breaker, approximately 1 A or a few amperes, the breaker opens the power contacts; in this work, the fault start instant is estimated as the instant at which the magnitude of the neutral current exceeds 1 A. The grounded (earthed) neutral wire can also serve as a parallel path to earth for short-circuit fault currents, acting as a neutral fault protection.
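The 1 A rule on the neutral current can be illustrated with a minimal sketch. The function name `estimate_fault_start`, the use of the first threshold crossing, and the synthetic signal are assumptions made here for illustration; they are not taken from the original implementation.

```python
import numpy as np

def estimate_fault_start(i_n, dt, threshold_a=1.0):
    """Return (sample index, time in s) of the first sample where
    |i_n| > threshold_a, or None if the threshold is never exceeded.
    Taking the FIRST crossing is an assumption of this sketch."""
    above = np.flatnonzero(np.abs(i_n) > threshold_a)
    if above.size == 0:
        return None
    k = int(above[0])
    return k, k * dt

# Synthetic neutral current (hypothetical values): a small residual
# current, then a large fault current starting at sample 1000.
dt = 1e-4                       # hypothetical sampling interval, in s
t = np.arange(2000) * dt
i_n = 0.05 * np.sin(2 * np.pi * 50 * t)
k_fault = 1000
i_n[k_fault:] += 20.0 * np.sin(2 * np.pi * 50 * (t[k_fault:] - t[k_fault]))

result = estimate_fault_start(i_n, dt)
```

Because the fault current in this toy signal starts at a zero crossing, the estimated instant lags the true fault start by a couple of samples; the lag depends on the fault inception angle.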
Harmonics in power lines are mainly due to the use of nonlinear power electronics equipment, causing voltage distortions and affecting the quality of energy supplied to consumers. According to EN-50160/IEC-61000-3-2 standards, the total harmonic distortion (THD) limit value for voltages above 1000 V is 5%; this value is valid for nominal (normal) operating conditions with a probability of 95%.
In this work, to emulate situations such as load imbalances, load variations, and voltage distortions, Gaussian noise was added to the voltage and current signals at different signal-to-noise ratios (SNRs), as detailed in Table 3, since the signals generated by the HyperSim simulator, made available by the R&D Nester laboratory, do not contain noise. The lowest SNR values (25 dB and 20 dB) were established taking into account typical allowable tolerances in voltage and current signals in nominal operation; these tolerances were 5% and [10%; 20%].
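Adding Gaussian noise at a prescribed SNR can be sketched as follows. The sketch assumes the usual power-ratio definition, SNR (dB) = 10 log10(P_signal / P_noise), with the signal power taken as the mean square of the clean signal; the function names and the test signal are hypothetical, and the original work may define the SNR differently.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng=None):
    """Add zero-mean white Gaussian noise so that the result has
    approximately the requested SNR (in dB) relative to `signal`."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)              # mean-square signal power
    p_noise = p_signal / 10 ** (snr_db / 10)     # required noise power
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) of `noisy` with respect to `clean`."""
    noise = np.asarray(noisy) - np.asarray(clean)
    return 10 * np.log10(np.mean(np.asarray(clean) ** 2) / np.mean(noise ** 2))

# Hypothetical noise-free 50 Hz signal, then a 20 dB SNR version of it.
rng = np.random.default_rng(0)
t = np.arange(20000) * 1e-4
clean = 100.0 * np.sin(2 * np.pi * 50 * t)
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=rng)
snr_check = measured_snr_db(clean, noisy)
```

The empirical SNR only matches the target approximately, because the noise power is estimated from a finite number of samples.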
In a power line, when a short-circuit fault occurs, two typical relevant symptoms are the rapid increase in current and the rapid decrease in voltage [9,27]. Based on this premise, for the faults under study, Table 4 was constructed, which defines the rules used in this work for fault detection and diagnosis; a rapid increase in the current is denoted using the label “+1”, and a rapid decrease in the voltage is denoted using the label “−1”; the label “0” denotes a symptom not relevant to fault diagnosis.
In the event of a short-circuit fault, in addition to the symptoms of rapid changes in the amplitudes of currents and voltages, another relevant symptom is the rapid change in the frequency of these signals.
3.4. Fault Detection Approach
Figure 3 depicts the detailed architecture of the proposed fault detection approach. The approach is based on applying linear principal component analysis (PCA) to the voltage signals, given that voltage signals are highly correlated under nominal (fault-free) operating conditions. When a fault occurs, in our case a short-circuit fault, the correlation level drops sharply and quickly.
In dynamic processes where correlation or redundancy between variables exists, it is advantageous to reduce the number of variables while retaining most of the relevant original information. Dimensionality reduction techniques, such as principal component analysis (PCA), can greatly simplify and improve process monitoring tasks, since they project the data into a lower-dimensional space that accurately characterizes the state of the process under study [34,51,52,53,54]. Linear PCA is one of the most popular dimensionality reduction techniques. It is a multivariate statistical technique in which a number of related variables are transformed into a smaller set of uncorrelated variables, preserving the correlation structure between the process variables and capturing the variability in the data. PCA can also be used to design linear controllers [55,56] or nonlinear controllers [57].
Next, the proposed methodology for applying linear PCA to the voltage signals, implemented using singular value decomposition (SVD), is described [10]. Given a training set of $n$ observations and $m$ process variables stacked into a data matrix $X \in \mathbb{R}^{n \times m}$, the loading vectors are computed by solving the stationary points of the optimization problem formulated in Equation (4), where $v \in \mathbb{R}^{m}$ [34].
The stationary points of Equation (4) can be computed via singular value decomposition (SVD) as described in Equation (5), where $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{m \times m}$ are unitary matrices, and the matrix $\Sigma \in \mathbb{R}^{n \times m}$ contains the non-negative real singular values of decreasing magnitude along its main diagonal, $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_{\min(m,n)} \geq 0$, and zero off-diagonal elements. The loading vectors are the orthonormal column vectors in the matrix $V$, and the variance in the training set projected along the $i$-th column of $V$ is equal to $\sigma_i^2$.
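Solving Equation (5) is equivalent to solving an eigenvalue decomposition of the sample covariance matrix $S$, as described in Equation (6), where the diagonal matrix $\Lambda = \Sigma^{T}\Sigma$, with $\Lambda \in \mathbb{R}^{m \times m}$, contains the non-negative real eigenvalues of decreasing magnitude, $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_m \geq 0$, and the $i$-th eigenvalue equals the square of the $i$-th singular value, $\lambda_i = \sigma_i^2$ [34].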
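The equivalence between the SVD route and the eigenvalue decomposition of the sample covariance matrix can be checked numerically. The sketch below is illustrative only: the data are synthetic, the columns are mean-centered (an assumption, implied by the use of a sample covariance matrix), the SVD is applied to the data matrix scaled by 1/sqrt(n − 1) as in the standard formulation of [34], and all variable names are hypothetical.

```python
import numpy as np

# Synthetic correlated data: n observations of m variables driven by
# two latent signals plus a little noise (hypothetical example).
rng = np.random.default_rng(0)
n, m = 500, 4
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m))
X += 0.05 * rng.normal(size=(n, m))
X -= X.mean(axis=0)                      # mean-center each variable

# Route 1: SVD of the scaled data matrix. The rows of Vt are the
# loading vectors; s holds the singular values in decreasing order.
U, s, Vt = np.linalg.svd(X / np.sqrt(n - 1), full_matrices=False)
V = Vt.T

# Route 2: eigenvalue decomposition of the sample covariance matrix.
S = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending order
eigvals = eigvals[::-1]                  # reorder to decreasing magnitude

# Variance of the data projected along each loading vector.
projected_var = np.var(X @ V, axis=0, ddof=1)
```

With these definitions, `eigvals` agrees with `s**2`, and `projected_var` agrees with both, up to floating-point error. The eigenvectors agree with the columns of `V` only up to sign.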
In many practical applications, and also in this work, one goal is to minimize the effect of random noise, or high-frequency disturbance signals, that can corrupt the PCA representation, and to optimally capture the relevant variations in the data. To achieve this, only the loading vectors associated with the $a$ largest singular values are retained in the PCA model. PCA projects the observation space into two subspaces: the scores subspace and the residual subspace. Selecting the columns of the loading matrix $P \in \mathbb{R}^{m \times a}$ to correspond to the loading vectors associated with the $a$ largest singular values, the projections of the observation data $X$ into the lower-dimensional space are contained in the scores matrix $T$, as described in Equation (7), and the projection of $T$ back into the $m$-dimensional observation space, $\hat{X}$, is given by Equation (8).
The residual matrix $E$ is computed according to Equation (9) and captures the variations in the observation space spanned by the loading vectors associated with the $m - a$ smallest singular values. The two subspaces spanned by $\hat{X}$ and $E$ are typically called the scores space and the residual space, respectively. The scores space gives a more accurate representation of the process, since the residual space, which has a small signal-to-noise ratio (SNR), is removed.
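The split of the observation space into a scores part and a residual part can be sketched as follows, using the standard relations T = XP, X̂ = TPᵀ and E = X − X̂ from [34]. The function name `pca_decompose` and the synthetic data are hypothetical, and the data are assumed to be mean-centered.

```python
import numpy as np

def pca_decompose(X, a):
    """Split mean-centered data X (n x m) into scores and residual parts
    using the a loading vectors with the largest singular values."""
    n = X.shape[0]
    _, _, Vt = np.linalg.svd(X / np.sqrt(n - 1), full_matrices=False)
    P = Vt[:a].T            # loading matrix, m x a
    T = X @ P               # scores matrix, n x a
    X_hat = T @ P.T         # projection back into the observation space
    E = X - X_hat           # residual matrix
    return P, T, X_hat, E

# Hypothetical data: two dominant directions plus weak noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 4))
X += 0.01 * rng.normal(size=(200, 4))
X -= X.mean(axis=0)

P, T, X_hat, E = pca_decompose(X, a=2)
```

In this example the residual `E` carries almost only the added noise, which is the motivation for monitoring the scores space.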
For a linear PCA model, the amount of variance explained by $a$ principal components is given by Equation (10), which depends on the eigenvalues $\lambda_i$ of the matrix $\Lambda$ obtained in Equation (6) by SVD, where $m$ is the number of process variables [34,52].
In the PCA approach, the dimension $a$ of the reduced space defines the number of dimensions of the scores space. One way to define $a$ is to choose the number of dimensions that explains a high percentage of the total variance in the feature data, for example,
or more. For many applications, only two or three principal components are retained in the PCA model [10,34].
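A common way to express the explained variance is the cumulative ratio of the retained eigenvalues to the sum of all eigenvalues; the sketch below assumes this form. The eigenvalues, the 90% target, and the function names are hypothetical placeholders and are not values from this work.

```python
import numpy as np

def explained_variance_percent(eigvals, a):
    """Percentage of total variance explained by the first a components,
    assuming eigvals is sorted in decreasing order."""
    eigvals = np.asarray(eigvals, dtype=float)
    return 100.0 * eigvals[:a].sum() / eigvals.sum()

def choose_num_components(eigvals, target_percent):
    """Smallest a whose cumulative explained variance reaches the target."""
    eigvals = np.asarray(eigvals, dtype=float)
    cumulative = 100.0 * np.cumsum(eigvals) / eigvals.sum()
    a = int(np.searchsorted(cumulative, target_percent)) + 1
    return min(a, eigvals.size)          # guard against rounding at 100%

# Hypothetical eigenvalues for m = 4 variables.
eigvals = np.array([2.5, 1.2, 0.2, 0.1])
a = choose_num_components(eigvals, target_percent=90.0)
explained = explained_variance_percent(eigvals, a)
```

For these placeholder eigenvalues the cumulative percentages are 62.5, 92.5, 97.5 and 100, so two components are the smallest number that reaches the 90% target.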
In this work, only the scores space was selected for implementing the short-circuit fault detection approach, considering only two principal components, $a = 2$, for $m = 4$ process variables (the nominal voltage signals of the three phases and of the neutral wire); this decision was based on the high variance explained by the first two principal components, as described in Table 5, for high-SNR voltage signals.
A large dataset with relevant short-circuit faults was generated with the HyperSim simulation software, providing the three-phase voltage and current signals of the power lines, as described in Section 3.1. Each of the 619 files generated by HyperSim recorded a simulation lasting
s, including the occurrence of one of the seven short-circuit faults described in
Table 4; given that the sampling interval was
µs, the number of samples in each signal was
n = 30,001.
Based on the SVD and PCA concepts described in this section, the proposed short-circuit fault detection approach can now be presented in detail. The data matrix $X \in \mathbb{R}^{n \times m}$, with $n = 30{,}001$ and $m = 4$, used to build the nominal PCA model is given by Equation (11), containing the nominal voltages (without short-circuit faults) in each phase and in the neutral wire. The covariance matrix $S$ is described in Equation (12). The loading matrix $P \in \mathbb{R}^{m \times a}$ corresponds to the loading vectors associated with the $a$ largest singular values, expressed by Equation (13). The projections of the observation data $X$ into the lower-dimensional space are contained in the scores matrix $T \in \mathbb{R}^{n \times a}$, with $n = 30{,}001$ and $a = 2$, as described in Equation (14).
Human operators can monitor signals well only in one or two dimensions [10]. In this work, PCA reduced the dimensionality of the problem from $m = 4$ to $a = 2$. Given that $a = 2$ was chosen (two principal components), the scores space is two-dimensional (2D), so we have a 2D nominal PCA model. In this 2D scores space, the scores matrix for a window of length $n$ can be represented by two column vectors, as in Equation (15). Each row of the scores matrix is a score, and each score is a projection of the original data into the 2D reduced scores space; each score, given by its two coordinates, is represented by a point in the 2D scores space.
PCA models have the advantage that the scores variables, which are linear combinations of the original variables, are more normally distributed than the original variables themselves; this is a consequence of the central limit theorem. For problems where the data obey a normal distribution, the threshold in the two-dimensional scores space is an ellipse, according to the $T^2$ statistic, given by Equation (16), where the threshold value depends on Fisher’s F-distribution with $m$ and $n - m$ degrees of freedom [10,34].
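To make the elliptical threshold concrete, the sketch below computes the T² statistic of each score in its usual form, the sum of squared scores divided by the corresponding eigenvalues, so that a constant T² level traces an ellipse in the 2D scores space. The data are synthetic, the function name is hypothetical, and `t2_limit` is a placeholder value that is not computed from the F-distribution.

```python
import numpy as np

def t2_statistic(T, eigvals):
    """T^2 value of each score (row of T), given the eigenvalues
    associated with the retained principal components."""
    T = np.asarray(T, dtype=float)
    return np.sum(T ** 2 / np.asarray(eigvals, dtype=float), axis=1)

# Hypothetical mean-centered training data with two dominant directions.
rng = np.random.default_rng(2)
n, m, a = 1000, 4, 2
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m))
X += 0.05 * rng.normal(size=(n, m))
X -= X.mean(axis=0)

_, s, Vt = np.linalg.svd(X / np.sqrt(n - 1), full_matrices=False)
P = Vt[:a].T                   # loading matrix
T = X @ P                      # 2D scores
lam = s[:a] ** 2               # eigenvalues of the retained components

t2 = t2_statistic(T, lam)
t2_limit = 9.0                 # placeholder threshold (assumption)
inside_ellipse = t2 <= t2_limit
```

Scores with `inside_ellipse` equal to `True` lie within the ellipse defined by the placeholder level; in practice the level would come from the chosen significance level and the F-distribution.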
For the problem under study, the 2D plots that relate the sinusoidal nominal voltage signals, for data in the nominal operating region (without short-circuit faults), are inclined ellipses, as depicted in Figure 4; the same figure shows that the nominal PCA model in the 2D scores space, for data in the nominal operating region, is also an inclined ellipse. This nominal PCA model, an inclined ellipse in the 2D scores space, is proposed in this work as the reference model for fault detection, as detailed next.
For each time sample $k$, a short-circuit fault is detected if the signal given by Equation (17) exceeds the threshold, i.e., if the condition expressed in Equation (18) holds. The threshold is adaptive: it depends on the SNR computed in the nominal operating region (without faults), as expressed by Equation (21), under an assumed parameter value that is detailed later. Equation (17) expresses the distance between two points (two scores) in the scores space, the current score and the nominal score, as described in Equation (19) and in Equation (20), respectively, taking into account Equation (14).
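The detection rule can be illustrated with a toy sketch. Several choices here are assumptions and not the authors' implementation: the distance is taken as Euclidean, the nominal score is the time-aligned score of a fault-free reference record, the threshold is a fixed hypothetical value instead of the SNR-dependent adaptive one, and the four signals (three balanced phase voltages plus a crude neutral stand-in) are synthetic.

```python
import numpy as np

def build_signals(n, dt, k_fault=None, sag=0.3, f=50.0):
    """Toy 4-column voltage matrix: three balanced phases plus a crude
    neutral stand-in (a fraction of the phase sum). From sample k_fault
    on, the phase-a amplitude is scaled by `sag` to mimic a voltage drop."""
    t = np.arange(n) * dt
    angles = np.array([0.0, -2 * np.pi / 3, 2 * np.pi / 3])
    v = np.sin(2 * np.pi * f * t[:, None] + angles)      # n x 3
    if k_fault is not None:
        v[k_fault:, 0] *= sag
    v_n = 0.1 * v.sum(axis=1, keepdims=True)             # not a line model
    return np.hstack([v, v_n])

n, dt, k_fault = 2000, 1e-4, 1050        # hypothetical values
X_nom = build_signals(n, dt)                     # fault-free reference
X_test = build_signals(n, dt, k_fault=k_fault)   # record with a fault

# Nominal PCA model: two loading vectors from the fault-free data.
mean_nom = X_nom.mean(axis=0)
_, _, Vt = np.linalg.svd((X_nom - mean_nom) / np.sqrt(n - 1),
                         full_matrices=False)
P = Vt[:2].T

# Current and nominal scores, both projected with the nominal model.
T_nom = (X_nom - mean_nom) @ P
T_test = (X_test - mean_nom) @ P

# Distance between current and nominal score at each time sample k.
d = np.linalg.norm(T_test - T_nom, axis=1)
threshold = 0.1                          # hypothetical fixed threshold
fault_flags = d > threshold
first_detection = int(np.argmax(fault_flags)) if fault_flags.any() else None
```

In this noise-free toy case the distance is zero before the fault and jumps at the fault sample; with noisy signals the threshold has to sit above the nominal scatter of the scores, which is the motivation for tying it to the SNR.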