The dataset includes two kinds of data: audio and tabular. The audio files were originally saved as .wav but were converted to .flac for lossless compression. All tabular time series recordings were saved as .parquet, because it takes up less disk space than .csv and can be opened with a single command in common data analysis languages such as Python, R, and Matlab. The time series data are split thematically into three groups: acoustical data, such as (relative) sound pressure levels and one-third-octave bands; meteorological data; and SCADA data of the turbine in focus as well as neighboring turbines. All time series files have the column “Time”, which holds the time stamp of the recording converted to UTC. Using this column, the different files can be matched and combined.
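As an illustration of matching files on the “Time” column, the following minimal sketch joins three tables with pandas. The column names other than “Time” and the sample values are hypothetical stand-ins; with the real dataset, each frame would instead be loaded with `pd.read_parquet` from the corresponding file.

```python
from functools import reduce

import pandas as pd


def combine_on_time(frames):
    """Inner-join several time series tables on their shared UTC "Time" column."""
    return reduce(
        lambda left, right: pd.merge(left, right, on="Time", how="inner"),
        frames,
    )


# Small in-memory stand-ins for the parquet files (hypothetical columns and values).
# With real data: acoustic = pd.read_parquet("<microphone file>.parquet"), etc.
time_index = pd.date_range("2021-01-01 00:00", periods=3, freq="10min", tz="UTC")
acoustic = pd.DataFrame({"Time": time_index, "LAeq": [41.0, 42.5, 40.2]})
meteo = pd.DataFrame({"Time": time_index[1:], "wind_speed": [6.1, 6.4]})
scada = pd.DataFrame({"Time": time_index, "power": [800.0, 950.0, 990.0]})

# Only time stamps present in all three tables survive the inner join.
combined = combine_on_time([acoustic, meteo, scada])
```

An inner join is used here so that only intervals covered by every source remain; an outer join would keep all time stamps and fill the gaps with NaN instead.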
2.2. Acoustical Data
The Institute of Structural Analysis of the Leibniz University Hannover set up three microphones at different distances from the turbine in focus, but at a similar angle. The one nearest to the turbine was placed at a distance of the hub height plus half the rotor diameter, as prescribed by IEC 61400-11 [18]. All three microphones recorded a mono audio signal at 51.2 kHz, as well as the A- and Z-weighted sound pressure levels (SPL) and Z-weighted one-third-octave bands from 6.3 Hz to 20 kHz, each as averages over 1 s intervals. In the present dataset, the SPL and one-third-octave band values were regrouped into 10 min intervals for better compatibility with the other data, which were only provided in 10 min time steps. For a valid value, at least 50% of the interval had to hold valid data; otherwise, it was set to NaN. Furthermore, percentile levels were added for the A- and Z-weighted SPL. This results in three microphone files with 53 columns each. The SPL values had to be anonymized as described in Section 3.4. Descriptions of how they can nevertheless be used are given in Section 4. Except for the “Time” column, all columns are thus given in dB$_{\mathrm{REF}}$. The audio files were down-sampled to 32 kHz to save storage space and are provided as one zipped file per day containing the recordings split into 30 min files grouped by microphone.
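The regrouping of 1 s levels into 10 min values with the 50% validity rule could look roughly like the sketch below. The function name is hypothetical, and the energetic (power) averaging of decibel values is an assumption; the text does not state which averaging method was used.

```python
import math


def aggregate_levels(levels_db, min_valid_fraction=0.5):
    """Collapse one interval of 1 s levels (in dB) into a single value.

    Returns NaN when fewer than `min_valid_fraction` of the samples are valid.
    Assumption: levels are averaged energetically, not arithmetically.
    """
    valid = [v for v in levels_db if not math.isnan(v)]
    if not levels_db or len(valid) / len(levels_db) < min_valid_fraction:
        return math.nan
    mean_energy = sum(10 ** (v / 10) for v in valid) / len(valid)
    return 10 * math.log10(mean_energy)


# A 10 min interval holds 600 one-second samples; here exactly half are valid.
half_valid = [50.0] * 300 + [math.nan] * 300
mostly_missing = [50.0] * 299 + [math.nan] * 301

value_kept = aggregate_levels(half_valid)         # enough valid data
value_dropped = aggregate_levels(mostly_missing)  # below the 50% rule
```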
2.3. Meteorological Data
During measurement campaigns two to five, we were provided with data from a preinstalled 100 m mast. It was equipped with eight wind speed sensors, three wind direction sensors, two temperature sensors, two humidity sensors, and one pressure sensor at different heights, according to Table 2.
Horizontal wind speed is measured in two different directions at each height, with a north–west and a south–east facing cup, while all other sensors are only placed once per height. For all sensors, mean and standard deviation are given in 10 min intervals. For wind speed measurements, the minimum and maximum are given additionally. The column “Rain flag” is a Boolean indication of rainfall. According to the daily data from a weather station about 2 km east of the mast, the rainfall ranged from 0.2 mm to 8.2 mm on the rainy days. Furthermore, a set of (horizontal and vertical) wind speeds and directions from an ultrasonic anemometer is given. To enhance the data of the measurement mast, the atmospheric stability was classified according to Table 3 by calculating the wind shear exponent $\alpha$ using the mean horizontal wind speeds at 29 m and 100 m height with the equation
$$\alpha = \frac{\ln\left(v_2 / v_1\right)}{\ln\left(z_2 / z_1\right)}$$
with $v_i$ being the mean horizontal wind speed in m/s of the north–west-facing cup and $z_i$ being the height z of the sensor in meters ($i = 1$ for 29 m, $i = 2$ for 100 m). Both $\alpha$ and the stability class were added to the data as their own columns.
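As a worked illustration, the wind shear exponent between the 29 m and 100 m sensors can be computed as sketched below. The sketch assumes the usual power-law wind profile, in which the ratio of wind speeds equals the ratio of heights raised to the exponent; the function name and the sample wind speeds are hypothetical, and the stability class limits from Table 3 are not reproduced here.

```python
import math


def wind_shear_exponent(v_low, v_high, z_low=29.0, z_high=100.0):
    """Power-law shear exponent from two wind speeds (m/s) at two heights (m).

    Returns NaN when either wind speed is not positive, since the logarithm
    of the speed ratio is then undefined.
    """
    if v_low <= 0 or v_high <= 0:
        return math.nan
    return math.log(v_high / v_low) / math.log(z_high / z_low)


# Hypothetical 10 min mean wind speeds of the north-west-facing cups.
alpha = wind_shear_exponent(v_low=6.0, v_high=7.5)
```

Equal wind speeds at both heights yield an exponent of zero, and a lower wind speed at 100 m than at 29 m yields a negative exponent.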
Other additions are the “sound propagation direction” and “relative wind direction”. Both compare the wind direction at 96 m at the meteorological mast against the averaged angle of the three microphones relative to the wind turbine in focus and categorize the resulting direction offset into five classes as described in Table 4. The “sound propagation direction” and the “relative wind direction” point in exactly opposite directions.
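The direction offset underlying both columns can be sketched as follows. The microphone bearings and wind direction are hypothetical values, and the sketch assumes the meteorological convention in which the wind direction names where the wind comes from, so that sound propagates downwind at 180° to it. The class limits from Table 4 are not reproduced here.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean of compass angles in degrees, robust to the 0/360 wrap-around."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


def signed_offset_deg(direction_deg, reference_deg):
    """Smallest signed angle from reference to direction, in [-180, 180)."""
    return (direction_deg - reference_deg + 180.0) % 360.0 - 180.0


mic_bearing = circular_mean_deg([80.0, 90.0, 100.0])  # hypothetical microphone bearings
wind_from = 270.0                                     # hypothetical wind direction at 96 m
propagation = (wind_from + 180.0) % 360.0             # assumed downwind direction

relative_wind_offset = signed_offset_deg(wind_from, mic_bearing)
propagation_offset = signed_offset_deg(propagation, mic_bearing)
```

In this example the microphones lie downwind of the turbine, so the propagation offset is close to zero while the wind direction offset is close to 180° in magnitude.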
2.4. SCADA Data
SCADA data were available for the turbine in focus as well as for turbines in close vicinity to the microphone positions with a resolution of 10 min. The contents of the SCADA files are described in Table 5.
Operation and maintenance (O&M) reports were not provided, except for two turbines during measurement campaign five, neither of which was the turbine in focus. Thus, the operating states of the turbines were derived from the available SCADA data and saved in the column “operating state”. The state was deduced from the raw, non-normalized data, and additional information about the turbine models, such as rated power, wind speed, and rotor speed, was used.
Based on the classification of Do and Huang [20], the following states were considered and applied in the given order: STOP, PARTIAL STOP, CURTAILMENT, PARTIAL CURTAILMENT, OUTLIER, and NORMAL. No value-related definitions were given for the states or for the transitions between them, so they were newly determined by analyzing the power curves of our dataset. The relationship between these states is illustrated in Figure 2 and described in more detail in the following. An exemplary power curve plot of normalized “power output” values over “wind speed”, classified by “operating state”, is shown in Figure 3.
STOP
Initially, all measurements with “power output” ≤ 0 are classified as STOP. Using the “rotor speed” as an additional criterion was considered, but it was rarely exactly 0. It can be said, however, that during periods of no “power output” the “rotor speed” stays below the minimal rotor speed “rs,min” specified by the manufacturer for the wind turbine model.
PARTIAL STOP
PARTIAL STOP denotes the state between STOP and NORMAL in which the turbine starts running or, in reverse, in which the turbine stops running but is still kept moving by its momentum. The transition from (PARTIAL) STOP to NORMAL is reached when the “rotor speed” exceeds the minimal rotor speed “rs,min” of that wind turbine model. It is assumed that the “power output” increases during the transition. Equivalently, the transition from NORMAL to PARTIAL STOP is reached as soon as the “rotor speed” drops below the minimal rotor speed. The “power output” then decreases, and as soon as it drops to or below zero, the state changes to STOP. Since the data are averages over 10 min intervals, a state labelled as PARTIAL STOP can also include a short stop of the wind turbine.
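The first two rules can be sketched as a small function that is applied before any of the later states. This is only an illustration of the order described above, not the original implementation; the function and argument names are hypothetical.

```python
def classify_stop_states(power_output, rotor_speed, rs_min):
    """Apply the STOP and PARTIAL STOP rules in order.

    Returns the state name, or None when the data point is left for the
    later rules (CURTAILMENT, PARTIAL CURTAILMENT, OUTLIER, NORMAL).
    """
    if power_output <= 0:
        return "STOP"          # no power is produced
    if rotor_speed < rs_min:
        return "PARTIAL STOP"  # turbine is starting up or running down
    return None


# Hypothetical 10 min averages: (power output, rotor speed), with rs_min = 6.0.
states = [
    classify_stop_states(power, speed, rs_min=6.0)
    for power, speed in [(0.0, 1.2), (120.0, 4.0), (900.0, 12.0)]
]
```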
CURTAILMENT
CURTAILMENT is a state in which the turbine runs for some time at a fixed “power output”, even though a higher output could be achieved under the given wind conditions. It is characterized by (almost) constant “power output” values (>0) at successive points in time that also lie distinctly (more than 5% of the rated power) below the manufacturer’s power curve. During CURTAILMENT, the “blade pitch” is observed to be higher than in NORMAL operation.
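A minimal sketch of the two CURTAILMENT criteria, a flat plateau of positive power that lies more than 5% of the rated power below the expected value, is given below. The function name is hypothetical, and the plateau tolerance of 1% of the rated power is an assumption, since the text only speaks of “(almost) constant” values. The expected power is taken as already interpolated from the manufacturer’s power curve.

```python
def flag_curtailment(power, expected_power, rated_power,
                     flat_tolerance=0.01, min_gap=0.05):
    """Flag successive points on a flat power plateau well below the power curve.

    flat_tolerance: assumed maximum step between neighbors, as a fraction of rated power.
    min_gap: minimum shortfall to the expected power, as a fraction of rated power.
    """
    flags = [False] * len(power)
    for i in range(1, len(power)):
        pair = (i - 1, i)
        flat = abs(power[i] - power[i - 1]) <= flat_tolerance * rated_power
        positive = all(power[j] > 0 for j in pair)
        below = all(expected_power[j] - power[j] > min_gap * rated_power for j in pair)
        if flat and positive and below:
            flags[i - 1] = flags[i] = True
    return flags


# Hypothetical values in kW for a turbine with 2000 kW rated power.
power = [1000.0, 1000.0, 1005.0, 2000.0]
expected = [1500.0, 1600.0, 1700.0, 2000.0]
curtailed = flag_curtailment(power, expected, rated_power=2000.0)
```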
PARTIAL CURTAILMENT
Similar to PARTIAL STOP, this state describes the change between CURTAILMENT and NORMAL operation, during which the “blade pitch” is regulated and the “power output” values change to the values expected for the given “wind speed”.
OUTLIER
OUTLIER comprises data points that were not classified with the preceding categories and have a “power output” value that is much smaller than the value of the manufacturer’s power curve for the current “wind speed”. OUTLIER points are unrelated to the “power output” values of the previous and next time stamps and also tend to come with high “blade pitch” values. It is possible that the machine was shut down for a rather short time during the period of these data points. In the end, a trade-off was made: points whose Euclidean distance to the manufacturer’s power curve is greater than the mean Euclidean distance of all potentially NORMAL states plus three times the standard deviation of that distance are classified as OUTLIER. These data points are also characterized by higher “blade pitch” values than truly NORMAL states in their respective bin of “rated wind speed” ± 0.25 m/s. However, this can be prone to misclassification, as the distribution of the measurements is not always consistent with the available manufacturer’s power curve for that turbine, e.g., when the turbine is old.
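The distance threshold can be illustrated with the short sketch below. It starts from distances to the power curve that are assumed to be computed already, and it uses the population standard deviation; whether the population or the sample standard deviation was used is not stated in the text, so that choice is an assumption, as are the function name and the sample values.

```python
import statistics


def outlier_flags(distances):
    """Flag points farther from the power curve than mean + 3 standard deviations.

    `distances` holds the Euclidean distance to the manufacturer's power curve
    for every potentially NORMAL data point.
    """
    threshold = statistics.mean(distances) + 3 * statistics.pstdev(distances)
    return [d > threshold for d in distances]


# Hypothetical distances: twenty points near the curve and one far away.
distances = [1.0] * 20 + [50.0]
flags = outlier_flags(distances)
```

Because the mean and standard deviation are computed over all candidate points, including the outliers themselves, a single extreme point raises the threshold, which is one reason why the rule is a trade-off.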
NORMAL
Any data points not labeled so far are close to the manufacturer’s power curve and thus considered as NORMAL, which means that the turbine produces as much power as expected for the current wind speed.
To verify the algorithm described, the classification results were compared with the O&M logs. The logs, however, only distinguish three states: “running”, “stopped”, and “error”. In the last state, at least in the data available, the turbine was not running, so it was treated as “stopped” as well. Both NORMAL and STOP were classified correctly over 90% of the time. As a state such as PARTIAL STOP does not exist in the original logs, the two sets of labels cannot match entirely. NORMAL was assigned instead of STOP in approximately 5.35% of the cases, and STOP instead of NORMAL in 8.3%. The logs are not always continuous, resulting in 3.8% and 8.1% of the data points not being classified as either of the two.
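Such a comparison can be sketched as a table of shares per log state, with “error” merged into “stopped” as described above. The function name and the sample labels are hypothetical, and the sketch does not reproduce the percentages reported here.

```python
from collections import Counter


def agreement_rates(log_states, predicted_states):
    """Share of each predicted state within each O&M log state.

    The log state "error" is treated as "stopped" before counting.
    """
    merged = ["stopped" if state == "error" else state for state in log_states]
    pair_counts = Counter(zip(merged, predicted_states))
    log_totals = Counter(merged)
    return {pair: count / log_totals[pair[0]] for pair, count in pair_counts.items()}


# Hypothetical labels for four 10 min intervals.
logs = ["running", "running", "error", "stopped"]
predicted = ["NORMAL", "STOP", "STOP", "PARTIAL STOP"]
rates = agreement_rates(logs, predicted)
```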