1. Introduction
The urgent need to transition to more sustainable energy sources directly responds to the challenges imposed by climate change and the growing global energy demand. In this scenario, wind power stands out as a promising alternative, showing remarkable technological development and capacity expansion in recent decades. Thus, it has established itself as one of the primary renewable energy sources, with increasing acceptance by society and strong support from international public policies.
Wind farms (WFs), made up of sets of wind turbines (WTs), play an essential role in transforming wind into electrical energy. The efficient management of these farms is critical not only to ensure optimal energy production but also to extend the useful life of the equipment and minimize downtime. An effective preventive maintenance system, which proactively identifies potential failures before they become major problems, is key to achieving these goals [1,2,3].
The present research proposes dynamically applying hierarchical clustering (HC) to Supervisory Control and Data Acquisition (SCADA) data to better understand and manage turbines’ operational behavior. The approach is novel in that it clusters turbines according to the shape of their signals over consecutive time windows.
Grouping the WTs of a WF by similar characteristics will allow WF operators to understand their operation and improve their management and maintenance. Specifically, this work explores the ability to group WTs according to the similarity of specific signals, since this allows the subsystems that generate those signals to be compared across machines and deviating behaviors to be detected early. This can be particularly relevant when monitoring, for example, temperatures at critical points of the WT. The task presents specific difficulties because the behavior of WTs is highly nonlinear and varies enormously over time: WTs undergo important behavior changes due to wind fluctuations and other meteorological changes. Consequently, the shapes of the collected signals, or equivalently the features that represent them, also vary enormously over time.
As observed and proposed in [4,5], a detailed analysis of the variability and continuity of the collected signals, together with an appropriate clustering of WTs, will facilitate the early detection of anomalies and the planning of maintenance interventions [6].
In the literature, clustering techniques have been used in different ways in the context of WTs. Liu et al. (2014) [7] explore the application of WT clustering methods to improve short-term wind-power forecasts. These authors develop and validate different clustering techniques to categorize turbines based on similar performance characteristics, enabling more accurate prediction models and, in turn, more efficient energy management strategies. The article [8] addresses the development of a dynamic cluster equivalent model for WTs based on spanning trees. Spanning tree techniques are used to identify and represent the most critical connections between turbines, so that WTs are clustered dynamically and simplified yet effective equivalent models can be created. This improves the efficiency and representativeness of WF simulation models and contributes to WF planning and operation. The article [4] presents a method for diagnosing and warning of faults in WTs using cluster analysis and a modified version of the Adaptive Neuro-Fuzzy Inference System (ANFIS). The article [9] presents a methodology for early fault detection in WTs that combines operational condition clustering with optimized deep belief network modeling. This approach segments WT operation into different sub-conditions, which facilitates the detection of possible anomalies and improves fault detection accuracy by handling the WTs’ nonlinear and heterogeneous operational data. The article [5] explores a strategy for fault detection in WTs through SCADA data clustering, grouping WTs based on similarities in operational data to enhance fault detection and diagnosis through comparative analysis. Finally, advanced data visualization techniques are implemented to represent the information and facilitate its graphic interpretation. This study seeks to provide a clear and detailed understanding of the dynamics within the WF using scatter plots, time plots, and other graphical tools. The HC process is then applied to identify groupings of data that share similar properties, revealing meaningful patterns that can inform maintenance and operation decisions [7,8]. This approach improves WFs’ operational performance and provides a replicable and scalable methodology for data analysis in other industrial applications, expanding companies’ ability to adapt to technological and market changes.
In the context of the clustering of WTs for condition monitoring, the present work focuses on the problem of dynamic clustering based on the shape of specific signals, which is a new approach to the problem. Focusing on signals, clustering is intended to locate the potential degradation of a particular subsystem based on one signal or group of signals showing anomalies. Such capability is desirable for condition monitoring of WTs and their preventive maintenance.
As the operating conditions of WTs are very variable and variations occur on different time scales, another very critical point is that the number of clusters is variable in time and unknown a priori. For example, all the WTs in the WF often stop working due to the lack of wind, and suddenly, the recorded SCADA temperatures of all the WTs tend to converge to the ambient temperature, falling in a single cluster. Therefore, one of the requirements for the algorithm is that it does not depend on the number of clusters, K.
The popular K-means [10] and K-medoids [11] clustering techniques organize data into K mutually exclusive clusters and require the number K to be known a priori. Gaussian mixture models (GMMs) [12] represent normally distributed subpopulations within an overall population and are appropriate when clusters have different sizes and different internal correlation structures. The parameters of a GMM are the mixture component weights and the means and variances/covariances of the component distributions. Moreover, GMM clustering can be soft, assigning an observation to multiple clusters based on its scores or posterior probabilities for each cluster. However, because K must be known to fit the parameters, GMMs do not suit the needs of the present application.
Therefore, a given signal from all the WTs in a WF is analyzed fragment by fragment in time windows called frames. In every frame, the signals of the WTs are clustered according to their shapes within that frame, and the clustering is updated with every new frame. The detailed methodology first involves a comprehensive review and selection of the most impactful variables, such as air temperature, relative humidity, wind speed, and generated power. The next step is a data cleaning process, where the inputs are selected and filtered to remove outliers or erroneous values. This task is crucial to maintaining the integrity of the analytical model [9,13], but because it is standard practice, it is not discussed in detail here. The database employed for this work contains SCADA signals sampled every 5 min, thus collecting 288 points daily. More relevant to our approach is how the information is compacted: the Discrete Cosine Transform (DCT) is applied, and only the first coefficients are kept and organized into feature vectors that represent the signals frame by frame. The experiments reveal that a few coefficients (3 to 5) already synthesize the information very effectively. The representative vectors of each signal are used to calculate the Euclidean distance between each pair of signals. Then, a binary, agglomerative, hierarchical clustering tree is built from those distances [14]. In agglomerative clustering, each element starts as an independent cluster and progressively merges into larger clusters based on similarity: the objects (signals) are linked in pairs according to proximity, building a series of nested clusters in which the most similar elements are grouped first and increasingly different elements are incorporated as one moves up the hierarchy. This hierarchical tree (HT) provides an intuitive view of the data structure and allows different levels of detail to be explored, which is helpful in classification and pattern exploration in complex datasets because the HT can be cut to form the clusters at any particular point, independently of a pre-established number of clusters [15]. Finally, a way to name the clusters is developed so that the signals with the most similarity appear in the first cluster, and the more they differ, the higher the cluster in which they appear. This nomenclature facilitates, at least in a small WF, the monitoring of the temporal evolution of the clusters.
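The frame-by-frame analysis described above can be sketched in a few lines. This is only an illustration under stated assumptions: the frame length of 128 samples, the use of non-overlapping frames, the helper name `split_into_frames`, and the synthetic data are all choices made here, not details confirmed by the text.

```python
import numpy as np

SAMPLE_PERIOD_MIN = 5     # SCADA sampling period stated in the text
FRAME_LEN = 128           # assumed frame length in samples (hypothetical choice)


def split_into_frames(signals: np.ndarray, frame_len: int = FRAME_LEN) -> np.ndarray:
    """Split an array of shape (n_samples, n_turbines) into non-overlapping frames.

    Returns an array of shape (n_frames, frame_len, n_turbines). Trailing
    samples that do not fill a whole frame are dropped.
    """
    n_frames = signals.shape[0] // frame_len
    trimmed = signals[: n_frames * frame_len]
    return trimmed.reshape(n_frames, frame_len, signals.shape[1])


samples_per_day = 24 * 60 // SAMPLE_PERIOD_MIN       # 288 points per day
frame_hours = FRAME_LEN * SAMPLE_PERIOD_MIN / 60     # duration of one frame in hours

# Synthetic stand-in for one week of one SCADA signal from five turbines.
rng = np.random.default_rng(0)
week = rng.normal(size=(7 * samples_per_day, 5))
frames = split_into_frames(week)
print(samples_per_day, round(frame_hours, 2), frames.shape)
```

Each slice `frames[i]` then holds one frame of the chosen signal for all turbines and can be passed to the feature extraction and clustering steps.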
Notice that the proposed method works only with the SCADA averages. When working with real-world data continuously over time, specifically with SCADA data, errors are inevitable due to failures in the sensors that collect them or errors in communications. In addition, there is always noise. Statistical measurements taken by the SCADA system are typically provided every 10 min. The biggest errors in these measurements are particularly collected in the max and min statistical operations, which capture the extreme values. For this reason, max and min are less reliable SCADA measurements. Averages work as a low-pass filter and smooth out noise and errors, making them more trustworthy.
Averages also eliminate fast variations and high-frequency components that could be present in these signals while preserving the major characteristics of the signal shape. Standard deviations provide valuable information that could also be used, although the best way to take advantage of them in this context remains to be investigated. The low sampling rate of SCADA data acquisition is a significant limitation that can hinder diagnostic capabilities, particularly in detecting short-duration events. Directly identifying the abnormal vibration of damaged mechanical components or the anomalous electrical behavior of faulty electrical elements is challenging with data averaged every 10 min, as discussed in studies such as [16,17]. It should be noted that condition monitoring (CM) methods based on SCADA data typically focus on detecting secondary effects of faults [18]. SCADA-based CM methods often identify incipient faults through abnormal conditions, such as the heating or underperformance of WTs.
The SCADA system’s averaging of slow-varying signals every 10 min still preserves the main information, and it does not have the devastating effect it has on fast-varying signals. For instance, practically half the SCADA system’s magnitudes are temperatures, which fall in this slow-varying signal category and are taken at many points in WT’s subsystems.
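The smoothing effect of averaging on slow-varying signals can be illustrated with a minimal sketch. It assumes a hypothetical raw sampling rate of 1 Hz (the raw rate is not stated in the text), a synthetic slow daily oscillation standing in for a temperature, and additive white noise. The helper `block_average` is illustrative and not part of any SCADA system.

```python
import numpy as np


def block_average(signal: np.ndarray, block_len: int) -> np.ndarray:
    """Average consecutive, non-overlapping blocks of `block_len` samples."""
    n_blocks = signal.size // block_len
    return signal[: n_blocks * block_len].reshape(n_blocks, block_len).mean(axis=1)


rng = np.random.default_rng(0)
fs_hz = 1                                   # assumed raw sampling rate (hypothetical)
block_len = 10 * 60 * fs_hz                 # samples in one 10-min averaging window
t = np.arange(24 * 3600 * fs_hz)            # one day of raw samples

slow = 60 + 10 * np.sin(2 * np.pi * t / t.size)    # slow, temperature-like trend
raw = slow + rng.normal(scale=1.0, size=t.size)    # trend plus fast noise

avg = block_average(raw, block_len)                # 10-min averages of the noisy signal
residual = avg - block_average(slow, block_len)    # noise left after averaging
print(avg.size, float(residual.std()))
```

With independent noise, averaging 600 samples reduces the noise standard deviation by roughly a factor of the square root of 600, while the slow trend passes almost unchanged, which is the low-pass behavior described above.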
This work is organized as follows. Section 2, Materials and Methods, describes the data used, the shape parameterization based on the DCT coefficients, the hierarchical clustering employed, and the protocol for naming the clusters, which makes it possible to follow their temporal evolution. Section 3, Results, explains how to interpret the dynamic clustering graphs and contains two experiments. The first is a study of the wind speeds recorded in the WTs’ nacelles through these clustering techniques. The second is a comparative study of applying this technique to two control variables commonly used for WT prognosis: the rotation speed of the generator shaft and the temperature of the oil in the gearbox. Section 4, Discussion, deals with the main parameters of the algorithm, the limitations of the method and future research to improve it, and comparisons with other clustering techniques that require the number of clusters to be known in advance. Finally, the main conclusions are summarized in Section 5.
2. Materials and Methods
2.1. Data Used
The present study analyzes SCADA records of five 2.5 MW Fuhrländer FL2500 wind turbines covering three years, with a sampling period of 5 min. The system uses IEC 61400-25 as its standard communication protocol for transmitting data from the wind turbines, which are stored in a MySQL database. This database includes 312 analog variables from 78 different sensors, so the status of various essential components, such as the transmission, generator, and converter, among many others, can be known. The data are extracted from the open-access database available at https://github.com/alecuba16/fuhrlander (accessed on 21 May 2024), which is described in [19].
2.2. Shape Parameterization Through the DCT
The DCT will be exploited as a tool for parameterizing relatively long signals into a few parameters to compress their information. Unlike other transforms, such as the Fourier Transform, which yields complex-valued coefficients, DCT produces real-valued ones, simplifying signal processing.
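To present it, let us consider the set of $N$ points $x[n]$, $n = 0, \dots, N-1$, and their $N$ DCT (of type-II) transformed coefficients $X[k]$, $k = 0, \dots, N-1$. The forward and backward expressions take the form:

$$X[k] = \alpha[k] \sum_{n=0}^{N-1} x[n] \cos\left(\frac{\pi (2n+1) k}{2N}\right), \quad k = 0, \dots, N-1 \qquad (1)$$

and,

$$x[n] = \sum_{k=0}^{N-1} \alpha[k]\, X[k] \cos\left(\frac{\pi (2n+1) k}{2N}\right), \quad n = 0, \dots, N-1 \qquad (2)$$

where $\alpha[k] = \sqrt{1/N}$ for $k = 0$ and $\alpha[k] = \sqrt{2/N}$ for $k = 1, \dots, N-1$.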
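The forward and backward type-II DCT can be written directly from their definitions. The sketch below assumes the usual orthonormal scaling, with a factor of the square root of 1/N for the first coefficient and the square root of 2/N for the others. The function names `dct2` and `idct2` are illustrative, and the direct matrix evaluation is chosen for readability rather than speed.

```python
import numpy as np


def _alpha(N: int) -> np.ndarray:
    """Orthonormal scaling factors: sqrt(1/N) for k = 0, sqrt(2/N) otherwise."""
    alpha = np.full(N, np.sqrt(2.0 / N))
    alpha[0] = np.sqrt(1.0 / N)
    return alpha


def dct2(x) -> np.ndarray:
    """Forward orthonormal type-II DCT of a 1-D sequence."""
    x = np.asarray(x, dtype=float)
    N = x.size
    n = np.arange(N)
    k = n.reshape(-1, 1)
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * N))   # rows indexed by k
    return _alpha(N) * (basis @ x)


def idct2(X) -> np.ndarray:
    """Backward transform: rebuilds the samples from all N coefficients."""
    X = np.asarray(X, dtype=float)
    N = X.size
    k = np.arange(N)
    n = k.reshape(-1, 1)
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * N))   # rows indexed by n
    return basis @ (_alpha(N) * X)


rng = np.random.default_rng(0)
x = rng.normal(size=128)
X = dct2(x)
x_back = idct2(X)
print(np.allclose(x_back, x), np.isclose(X[0], np.sqrt(x.size) * x.mean()))
```

For production use, a fast implementation such as `scipy.fft.dct` with `type=2` and `norm="ortho"` computes the same orthonormal transform far more efficiently.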
As is well known, one of the DCT’s key features is its ability to concentrate most of the signal energy into a few coefficients. Thus, a relatively small number of coefficients can capture much of the signal’s information, making it an efficient representation for compression purposes. In many applications, such as image and video compression, the DCT is applied to small blocks of the signal rather than the entire signal. This block-based processing allows for parallelization and efficient implementation. The DCT, like other discrete transforms, also has fast algorithms that compute it very efficiently. Additionally, the DCT has an inverse transform that reconstructs the original signal from its DCT coefficients. This property is essential for compression applications, as it allows decompression to retrieve the original signal.
The energy compaction of the DCT in its first coefficients concentrates the shape characteristics of a time series of length $N$ in a few parameters. Therefore, in the transformed domain, the $N$ points of each signal are characterized by the first $L$ coefficients of its DCT, where $L$ is much smaller than $N$. It is interesting to check how well only $L$ = 2, 4, or 6 DCT coefficients reconstruct a sequence of 128 points using the following reconstruction formula:

$$\hat{x}_L[n] = \sum_{k=0}^{L-1} \alpha[k]\, X[k] \cos\left(\frac{\pi (2n+1) k}{2N}\right), \quad n = 0, \dots, N-1 \qquad (3)$$

where $\hat{x}_L[n]$ are the reconstructed samples from the first $L$ DCT coefficients, and $\alpha[k]$ are the scaling factors of Equations (1) and (2).
Figure 1 shows some reconstructions of an original block signal of 128 points using $L$ = 2, 4, and 6 DCT coefficients.
Notice in Figure 1 that the DCT concentrates energy in the first coefficients, meaning the initial coefficients capture significantly more of the original information than the later ones. For instance, reconstructing the original signal using only the first coefficient results in a horizontal line at the mean signal value, which can be observed, first, from Equation (1) taking $k = 0$ and then from Equation (3) taking $L = 1$. Each additional coefficient incorporated into the calculation adds detail to the reconstruction.
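The single-coefficient case can be checked with a short worked derivation. It assumes the orthonormal scaling of the type-II DCT, $\alpha[0] = \sqrt{1/N}$, and uses $\hat{x}_1[n]$ to denote the reconstruction from the first coefficient only (notation assumed here):

$$X[0] = \sqrt{\frac{1}{N}} \sum_{m=0}^{N-1} x[m], \qquad \hat{x}_1[n] = \alpha[0]\, X[0] \cos(0) = \frac{1}{N} \sum_{m=0}^{N-1} x[m]$$

Every reconstructed sample therefore equals the mean of the frame, independently of $n$, which is the horizontal line mentioned above.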
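Notice that by organizing the elements $x[n]$ and $X[k]$ of (1) in the vectors $\mathbf{x}$ and $\mathbf{X}$, the DCT can be written in matrix form as $\mathbf{X} = \mathbf{C}\,\mathbf{x}$, with the elements $c_{k,n}$ of $\mathbf{C}$ taking the form:

$$c_{k,n} = \alpha[k] \cos\left(\frac{\pi (2n+1) k}{2N}\right), \quad k, n = 0, \dots, N-1$$

The $N \times N$ matrix $\mathbf{C}$ is unitary (orthogonal, since it is real). Because its column vectors are orthonormal, it is fulfilled that $\mathbf{C}^{-1} = \mathbf{C}^{T}$. That is why the backward expression in (2) can be expressed as $\mathbf{x} = \mathbf{C}^{T}\mathbf{X}$, and the reconstructions as $\hat{\mathbf{x}}_L = \mathbf{C}^{T}\mathbf{X}_L$, being $\mathbf{X}_L = [X[0], \dots, X[L-1], 0, \dots, 0]^{T}$.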
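The matrix form lends itself to a quick numerical check. The sketch below builds the DCT matrix under the assumed orthonormal scaling, verifies that its transpose is its inverse, and reconstructs a synthetic 128-point signal from its first few coefficients by zeroing the remaining ones. The names `dct_matrix` and `reconstruct` and the test signal are illustrative.

```python
import numpy as np


def dct_matrix(N: int) -> np.ndarray:
    """N x N orthonormal type-II DCT matrix, row k and column n."""
    n = np.arange(N)
    k = n.reshape(-1, 1)
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] *= np.sqrt(1.0 / N)
    C[1:, :] *= np.sqrt(2.0 / N)
    return C


def reconstruct(x: np.ndarray, L: int) -> np.ndarray:
    """Rebuild x from its first L DCT coefficients (the others are set to zero)."""
    C = dct_matrix(x.size)
    X_L = C @ x
    X_L[L:] = 0.0
    return C.T @ X_L


N = 128
t = np.arange(N)
x = 40 + 5 * np.sin(2 * np.pi * t / N) + 0.01 * t    # synthetic smooth signal

C = dct_matrix(N)
errors = {L: float(np.linalg.norm(x - reconstruct(x, L))) for L in (1, 2, 4, 6)}
print(np.allclose(C @ C.T, np.eye(N)), errors)
```

Because the transform is orthonormal, the reconstruction error equals the norm of the discarded coefficients, so it cannot increase as more coefficients are kept.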
2.3. Hierarchical Clustering
HC is a technique used in data mining and statistics to group similar data points into clusters based on their characteristics. It creates a hierarchical structure of clusters, where clusters at higher levels of the hierarchy contain more data points and represent broader similarities, whereas clusters at lower levels are more specific and may consist of individual data points.
There are two main types of HC: agglomerative and divisive. In agglomerative HC, each data point starts as its own cluster. At each step, the two most similar clusters are merged until only one cluster remains, forming a tree-like structure, the hierarchical tree (HT). Divisive HC, on the other hand, starts with all data points in a single cluster and recursively splits them into smaller clusters until each data point is in its own cluster.
This work uses agglomerative HC analysis, which follows three main steps on a data set. The first requires computing the similarity or dissimilarity between every pair of objects in the data set by calculating the distance between objects. Distance can be computed in many different ways. Standard distance metrics include Euclidean distance, Manhattan distance, and correlation-based distances, among many others. Once a distance metric is selected, the first task is to compute all the distances between all pairs of objects. Then, the distances between objects allow them to be grouped into a binary HC tree. Therefore, the second step consists of linking pairs of objects nearby using the distance information according to their proximity. As objects are paired into binary clusters, the newly formed clusters are grouped into larger clusters until an HT is formed. The third step is determining where to cut the HT to form the final clusters. This involves pruning branches off the bottom and assigning all the objects below each cut to a single cluster.
Once the HC is complete, dendrograms are often used to visualize the hierarchical structure of clusters. A dendrogram is a tree-like diagram that illustrates the order in which clusters are merged or split and can help identify the optimal number of clusters based on this structure.
HC is extremely useful in our application because it does not require specifying the number of clusters beforehand. However, it can be computationally intensive for large datasets, as it requires storing the entire dataset and computing pairwise distances between data points.
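The three steps of agglomerative HC map onto three SciPy calls, as the following minimal sketch shows. The feature values, the linkage criterion (`"average"`), and the threshold of 3.0 are assumptions chosen for illustration. The text specifies the Euclidean distance but not the linkage criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Five objects described by three features each (illustrative values).
features = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [10.0, 10.0, 0.0],
])

# Step 1: pairwise distances between all objects (condensed vector form).
distances = pdist(features, metric="euclidean")

# Step 2: link objects and clusters into a binary hierarchical tree.
# The linkage criterion is an assumption of this sketch.
tree = linkage(distances, method="average")

# Step 3: cut the tree at a distance threshold. The number of clusters
# follows from the threshold instead of being fixed in advance.
labels = fcluster(tree, t=3.0, criterion="distance")
n_clusters = len(set(labels))
print(labels, n_clusters)
```

Here the first four objects lie close together and the fifth is far away, so the cut yields two clusters without the number of clusters ever being specified.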
Figure 2 shows the most essential parts of this process. The upper graphic represents the five WT signals to be classified according to their shape; in this particular case, the first three coefficients of the DCT are used to parametrize them. Each WT’s signal is represented in a particular color, which is maintained in all the representations. The graph below displays the reconstructions of the original signals based solely on these three coefficients. In this case, the original signals consist of 128 points, corresponding to almost 11 h (128 samples at 5 min each). Based on vectors of only three components, distances between signals (objects) are calculated, and the HC dendrogram is constructed (shown in the lower left of the figure). The dendrogram also depicts the threshold used to form clusters, a distance of 60, to show how the two clusters form. The lower right of the figure presents the result, indicating that within the analyzed time interval and according to the threshold used, four signals are classified together due to their similarity, while the remaining signal falls into a second cluster. Notably, in Figure 2, the 128 points are reconstructed using only three DCT coefficients.
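A frame of this kind can be mimicked with synthetic data to see the whole chain at work. The sketch below is not a reproduction of Figure 2: the five signals are invented (four similar ones and one with a higher mean level), single linkage is an assumption, and only the frame length of 128, the three DCT coefficients, and the threshold of 60 are taken from the text.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.fft import dct
from scipy.spatial.distance import pdist

N, L, THRESHOLD = 128, 3, 60.0    # frame length, kept coefficients, pruning distance

rng = np.random.default_rng(1)
t = np.arange(N)
base = 50 + 10 * np.sin(2 * np.pi * t / N)            # synthetic common shape
offsets = np.array([0.0, 0.2, -0.3, 0.1, 8.0])        # the fifth signal sits higher
signals = base + offsets[:, None] + rng.normal(scale=0.1, size=(5, N))

# Shape features: first L coefficients of the orthonormal DCT-II of each signal.
features = dct(signals, type=2, norm="ortho", axis=1)[:, :L]

# Euclidean distances, binary tree (single linkage assumed), cut at the threshold.
tree = linkage(pdist(features, metric="euclidean"), method="single")
labels = fcluster(tree, t=THRESHOLD, criterion="distance")
print(labels)
```

Each 128-point signal is reduced to a three-component vector, and the offset of the fifth signal shows up mainly in its first coefficient, which is enough to place it in a cluster of its own.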
2.4. Dynamic Evolution of Clusters and Cluster Nomenclature Protocol
Achieving a good visualization of the temporal evolution of the clusters in this type of problem is challenging. Because the signals, and therefore the feature vectors representing them, can vary significantly over time, the hierarchical trees (HTs) built before cluster formation also undergo considerable variation. Although the signals from the same WTs tend to fall into the same clusters, the cluster labels can change from frame to frame. This means that, for instance, in frames $k-1$, $k$, and $k+1$, the cluster containing signals A, B, and C may be labeled as Cluster 1, 2, and 3, respectively, complicating the dynamic monitoring of the clusters, even in a small park like the one under consideration. Therefore, even if the clusters are well formed, the fact that Cluster 1 with signals 1 and 2 changes its name in the next frame to become Cluster 3 (also with signals 1 and 2) can make dynamic system monitoring difficult. A cluster nomenclature protocol has been developed based on the distance of the cluster in the HT, as represented in the dendrogram of Figure 3, so that Cluster 1 will be the one with the lowest distance in its highest node, Cluster 2 the next one, and so on, according to such distance. In Figure 3, note that the clusters are formed based on the distance used to prune the tree, represented by a vertical red line. Once the clusters are formed, they are named, starting with the one with the lowest distance to its junction point in the dendrogram (represented by the horizontal double-headed arrows) and continuing as those distances increase. This naming protocol stabilizes cluster names frame by frame, making them much easier to track.
Figure 4 illustrates the disordered numbering of the clusters caused by the significant variation that the HTs (represented in the dendrograms) can present frame by frame, together with the ordering proposed to facilitate tracking. The index $k$ represents time, and each WT is identified with a particular color. The left part of the graphic, part (a), exemplifies the default arrangement, while the right part, part (b), shows the proposed arrangement.
Sorting clusters is not just useful for tracking over time. Once a pruning threshold has been set for the hierarchical algorithm, the WTs assigned to higher-numbered clusters are those whose signals differ more from the rest (the distance separating them from the other signals is greater). In contrast, those that remain in low-numbered clusters are much more similar. This outcome is due to the cluster nomenclature protocol illustrated in Figure 4b. According to this nomenclature, the number 1 is assigned to the cluster that presents the smallest distances between its WTs, the number 2 to the cluster that requires the smallest increase in the pruning distance to merge with Cluster 1, and so on. For this reason, the most different signals of the set fall in the high-numbered clusters. The most distinct signal does not necessarily correspond to the most damaged component. However, such clustering indicates to supervisors which components to observe more closely, which is valuable in predictive maintenance.
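One possible reading of this naming protocol can be sketched as follows. It is an interpretation made for illustration, not the authors' confirmed procedure: each cluster with several members is ranked by the height of its top node in the tree, and each single-member cluster by the height at which it joins the rest of the tree. Single linkage, the function name `ordered_labels`, and the example values are also assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def ordered_labels(features, threshold, method="single"):
    """Cut the tree at `threshold` and rename clusters by junction distance.

    Assumed ranking key (one interpretation of the protocol):
    - cluster with several members: height of its top node in the tree;
    - single-member cluster: height at which it first joins another object.
    Cluster 1 is the one with the smallest key.
    """
    tree = linkage(pdist(features, metric="euclidean"), method=method)
    raw = fcluster(tree, t=threshold, criterion="distance")
    coph = squareform(cophenet(tree))     # height at which each pair is joined

    keys = {}
    for c in np.unique(raw):
        members = np.flatnonzero(raw == c)
        if members.size > 1:
            keys[c] = coph[np.ix_(members, members)].max()
        else:
            others = np.flatnonzero(raw != c)
            keys[c] = coph[members[0], others].min()

    rename = {c: rank + 1 for rank, c in enumerate(sorted(keys, key=keys.get))}
    return np.array([rename[c] for c in raw])


# Illustrative feature vectors: two tight pairs and one distant object.
features = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [5.2, 0.0, 0.0],
    [20.0, 0.0, 0.0],
])
labels = ordered_labels(features, threshold=2.0)
labels_reversed = ordered_labels(features[::-1], threshold=2.0)
print(labels, labels_reversed)
```

Because the names depend only on distances in the tree and not on the order in which the objects are supplied, the same grouping receives the same names when the rows are permuted, which is the kind of stability the protocol aims for.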
Figure 5 shows the process’ main steps in a flow diagram.
5. Conclusions
Specifically, this work explores the ability to group WTs according to the similarity of specific signals, since this allows the subsystems that generate those signals to be compared across machines and deviating behaviors to be detected early.
In this work, a new methodology is proposed to group the WTs of a WF into clusters. It is crucial to understand that the operation of wind turbines is highly nonlinear and time-varying, so the conditions at different instants can be vastly different and supervision requires dynamic monitoring over time. The work is at an early stage, but the results obtained show great potential. The main contributions are highlighted below:
Clustering is based on a specific SCADA signal observed during a time interval called a frame. It is carried out frame by frame and works for any averaged signal.
Compressing the information of the signals is critical, so the first coefficients of the DCT are used. With the help of the DCT, each WT’s signals are represented in low-dimensional vectors.
Widely known agglomerative hierarchical clustering techniques are used with the Euclidean distance applied to the vectors of DCT coefficients. At a more advanced stage of the research, other distances can be explored. The advantage of these techniques is that they do not impose a fixed number of clusters. However, setting a distance (a threshold) is necessary to prune the hierarchical trees. Dendrogram-type representations can be used to explore the appropriate distance. Once such a distance is decided, it is maintained and used to process all frames.
To maintain an interpretable temporal track of the clusters frame by frame, it is crucial to define a stable cluster nomenclature. The naming uses the information generated when the hierarchical tree is built from the distances between vectors, following the criterion explained in Section 2.4. According to this nomenclature, the most similar signals are organized in low clusters and the most different in high clusters.
Because the investigation is at an initial stage, several topics must be explored in more detail. For example, depending on the signal analyzed and how rapid its transitions are, this approach may not fully capture the complexity of the waveforms, which calls for more research on how best to select the number of DCT coefficients according to the signal type, the frame length, and the operational conditions. Another point that may require a significant amount of research is how to represent the dynamic diagrams of WFs with a larger number of WTs in a way that is understandable and helpful for decision making, which goes hand in hand with developing a better way to name the clusters.
In this work, however, one can already see the potential of the method, first by analyzing an individual signal, such as the wind speed measured in the nacelles of the WTs, and then by comparing two critical variables, the rotation speed of the generator shaft and the gearbox oil temperature. Being able to know objectively, for all critical variables, which WTs deviate the most from normality, i.e., from Clusters 1 and 2, can be an invaluable help to improve the preventive maintenance of WFs.
Applying this clustering model in WF operational management could support the planning of preventive maintenance through early detection of abnormal conditions. Case studies have validated its practical applicability by reducing downtime and associated costs, demonstrating that this approach works.