1. Introduction
In this first section, we present an overall introduction to Smart Grid technologies from the point of view of their current challenges, which stem from the lack of high-fidelity data.
While high-quality data (i.e., data sets that ensure the collected PMU data are correct and usable for decision-making and analysis) would be desirable, we consider that Smart Grid research requires high-fidelity data, since such data will often be used in simulations and/or technical models; i.e., data sets must ensure that the PMU data are a precise, detailed, and realistic representation of reality.
The evolution of traditional power grids into Smart Grids marks a transformative leap toward more efficient, reliable, and sustainable energy systems. Leveraging advanced digital technologies, Smart Grids can enable real-time monitoring, prediction, and control of electricity flow from generation to consumption, addressing the increasing complexity and variability of modern energy demands, especially with the integration of renewable sources [
1]. These grids are designed to enhance energy distribution efficiency, reduce operational costs, and improve resilience against disruptions [
Through two-way communication systems between utilities and consumers, Smart Grids also enable dynamic and responsive grid management, allowing real-time adjustments based on consumption patterns and supporting better demand response and load balancing. This positions Smart Grid research as the foundation of a more intelligent and adaptive energy system on a worldwide scale.
Phasor Measurement Units (PMUs) are integral to advanced grid infrastructure, delivering high-resolution, time-synchronized measurements of electrical waveforms, referred to as synchrophasors. These measurements provide a precise, real-time depiction of grid status, which is critical for immediate disturbance detection and response [
3].
PMUs are indispensable for applications such as monitoring grid stability, optimizing power flow, and identifying faults or anomalies [
4]. Their capability for high-speed, accurate data acquisition is particularly crucial in Smart Grids research, where complex and interconnected systems demand timely and precise information for reliable operation.
Despite their significance, the full utilization of PMUs is constrained by challenges in data quality and data acquisition. The availability of real-world PMU data is limited due to issues related to privacy, security, and the inherent variability of grid conditions. These constraints impede the ability of researchers, developers, and related industries to access the high-fidelity data necessary for testing and validating new technologies aimed at enhancing grid performance.
High-fidelity data from PMUs are crucial for the optimal functioning of Smart Grids, enabling precise modeling of grid behavior as well as the development of advanced control algorithms and predictive maintenance strategies [
5]. The scarcity of real-world PMU data necessitates the use of synthetic data, which can effectively simulate grid operations and test new approaches in a controlled setting. For synthetic data to be useful, it must be accurate, scalable, and adaptable to various grid configurations. As the demand for advanced and resilient energy systems increases, the importance of frameworks that generate high-fidelity synthetic datasets will grow accordingly [
6].
This article presents a new application for the generation of synthetic datasets of PMU measurements. Unlike the recent literature on synchrophasor estimation techniques, this article introduces an open-access tool intended to establish a common basis for comparing real-world datasets in terms of phasor measurements. The objective is not a further improvement of estimation accuracy or reporting latency. Conversely, the focus is on the definition of a white-box model that can be easily customized in order to reproduce realistic streams of PMU measurements. The foreseen application is the generation of synthetic datasets for the characterization and validation of machine learning applications, e.g., state estimators, fault locators, virtualized protections, etc.
The subsequent sections of this article explore the development of a digital metrology framework designed to address the needs mentioned above, based on the creation of high-fidelity and metrologically relevant synthetic PMU data, to drive the next generation of Smart Grid research and innovation. We consider that, otherwise, the scarcity of real-world PMU data will continue to restrict significant advances in the field.
Section 2 details the challenges of real-world PMU data for Smart Grids and the need for synthetic data to foster innovation in the field.
Section 3 presents the needs and benefits of synthetic data for research and development in Smart Grid.
Section 4 introduces the concept of sensor network metrology as a significant path for research and the future of metrology, mainly focusing on the potentiality of synthetic data for the development of digital representations.
Section 5 presents the overall conceptual development of the framework.
Section 6 details the presented case study and describes the synthetic PMU data generator, as the base for a digital metrology framework. Finally,
Section 7 concludes the potential of the proposed framework while
Section 8 mentions further potential developments, relevant to the presented Open Science aspects, to take into consideration on the path towards open research on Smart Grids.
2. Challenges in Obtaining Real-World PMU Data
The development and implementation of Smart Grid technologies heavily depend on access to accurate and comprehensive data. Phasor Measurement Units (PMUs) are crucial in this context, providing real-time, high-resolution data indispensable for monitoring grid stability, detecting faults, and optimizing power flow.
However, despite the critical importance of PMU data, there is a significant scarcity of real-world datasets available for research and development.
2.1. Data Scarcity and Privacy Concerns
One of the most significant reasons for the scarcity of PMU data is data privacy. PMU data reveal sensitive information about power grids, which poses security risks if exposed. Thus, utilities are reluctant to share such data in the face of sophisticated threats to cybersecurity [
7]. The potential for data breaches and unauthorized access complicates not only the sharing of PMU data but also Smart Grid research in general [
8]. Additionally, stringent data privacy regulations and compliance requirements hinder data sharing, particularly across borders.
These legal constraints further complicate the acquisition of real-world PMU data for Smart Grid research. Adding to this issue, only a few utilities have deployed PMUs at a scale that provides comprehensive insights, and even when deployed, the data are often restricted to internal use, limiting external research opportunities [
7].
2.2. Variability and Inconsistencies
A major challenge in obtaining real-world PMU data is the inherent variability in grid operations. Factors such as weather, demand fluctuations, and renewable energy integration cause significant differences in data over time and locations [
9].
This variability poses a challenge for researchers who need consistent, high-fidelity data to develop and test new algorithms, models, and technologies. In many cases, the data available from PMUs are incomplete or lack the resolution necessary for detailed analysis. For instance, certain grid events or disturbances may be only partially captured or missed entirely, depending on the placement and sensitivity of the PMUs. This can lead to gaps in the data that hinder the ability to accurately model grid behavior and predict potential issues.
Inconsistencies in data collection methods and standards across different utilities further complicate the situation. PMU devices may have varying levels of accuracy, sampling rates, and synchronization capabilities, leading to discrepancies in the data they produce. These inconsistencies make it difficult to aggregate data from multiple sources or compare results across different studies, limiting the ability to generalize findings or apply them to broader contexts.
2.3. Impact on Research and Development
The combination of data scarcity, privacy concerns, and variability presents a significant barrier to advancing Smart Grid technologies. This lack of properly curated PMU data also hinders the ability to conduct large-scale simulations and stress tests, which are critical for understanding how new technologies will perform under different conditions.
As a result, the pace of innovation in Smart Grid technology is slowed [
10], with potential breakthroughs being delayed or missed altogether, as evidenced by the absence of the essential characteristics of big data [
11,
12], i.e., the 5 Vs of Big Data (volume, variability, velocity, variety, and value), which are described in detail in [
13]. Given these challenges, developing alternative approaches to real-world PMU data acquisition becomes essential.
By creating high-fidelity synthetic PMU data that accurately mimic real-world conditions, researchers can overcome the limitations of data scarcity and variability, paving the way for more rapid and robust advancements in Smart Grid technologies. This is why synthetic data generation, as discussed in the subsequent sections, plays a crucial role in Smart Grid research [
14].
3. The Need for Synthetic Data in Smart Grid Development
As the complexity and scale of power grids continue to grow, the need for accurate, comprehensive data becomes ever more critical in the development of Smart Grid technologies. However, as discussed in the previous sections, the challenges associated with obtaining real-world Phasor Measurement Unit (PMU) data—such as scarcity, privacy concerns, and variability—create significant barriers to innovation.
To address these challenges, synthetic data has emerged as a powerful tool—currently considered crucial—to bridge the gap between the need for high-fidelity, metrologically relevant data and the limitations of real-world data availability.
3.1. Benefits of Synthetic Data
Synthetic data refers to artificially generated data that accurately mimic the characteristics and patterns of real-world data. In the context of Smart Grids, synthetic PMU data can be created to simulate a wide range of grid conditions and scenarios [
15].
This capability is invaluable for researchers and developers who need to test new algorithms, models, and technologies in a controlled and repeatable environment. Hence, by using synthetic data, it becomes possible to explore the behavior of the grid under extreme or rare conditions that might not be easily captured in real-world datasets, if at all.
Another significant benefit of synthetic data is their ability to provide a safe and secure environment for experimentation [
15]. Since synthetic data are generated artificially, they do not contain any sensitive or proprietary information that could pose security risks if shared or published. This allows collaboration across institutions, since sharing data openly and adhering to data privacy regulations becomes simpler, without compromising the integrity of critical infrastructure.
Furthermore, synthetic data can be generated in large volumes and tailored to specific research needs. This scalability is particularly important for conducting stress tests, simulations, and other forms of analysis that require vast amounts of data to produce statistically significant results [
15].
3.2. Metrological Perspective and Impact
From the metrology perspective, synthetic data allows for the controlled generation of traceable datasets, ensuring consistent reference points for the validation of algorithms and models in Smart Grid applications.
By simulating grid conditions with known uncertainties, synthetic data enables a more precise assessment of measurement uncertainty, a critical factor for improving the reliability of power system monitoring and control. Additionally, synthetic datasets can be tailored to evaluate performance under diverse operational scenarios, facilitating robust uncertainty analysis and ensuring that developed solutions align with traceable, standardized measurement frameworks.
Researchers can therefore generate and use metrologically relevant datasets covering a wide variety of scenarios, since the proposed synthetic PMU data generator can be fed with point-on-wave (PoW) inputs ranging from normal operating conditions to extreme events such as cascading blackouts or cyber-physical attacks, providing a comprehensive testing ground for new Smart Grid technologies.
Point-on-wave refers to the precise measurement and monitoring of electrical parameters at specific points within a power wave. Essentially, it involves capturing data on the exact phase angle of the voltage and current waveforms in real time [
16].
3.3. Overcoming Limitations of Real-World Data
While real-world PMU data are invaluable, they are often limited in scope and quality due to factors such as inconsistent data collection, incomplete datasets, and the natural variability of grid operations [
17].
These limitations can hinder the ability to fully understand and predict grid behavior, especially in the face of emerging challenges like the integration of renewable energy sources, increased demand, and the threat of cyber-physical attacks [
7].
Synthetic data addresses these issues by offering consistent, tailored datasets that fill gaps in real-world data, improving model accuracy and supporting the development of resilient Smart Grid technologies. It can also simulate future grid scenarios, enabling proactive planning and testing of algorithms in varied conditions, ensuring robustness, and reducing risks before real-world deployment [
15].
3.4. Evolving Smart Grid Scenarios
Another critical advantage of synthetic data is their ability to simulate future grid conditions and scenarios that have not yet been observed in the real world. As Smart Grids evolve and new technologies are integrated, the grid’s behavior may change in ways that are difficult to predict based solely on historical data. Synthetic data allows researchers to explore these potential future states, enabling proactive planning and development that can anticipate and address future challenges before they arise [
17].
This advantage of synthetic data over real-world data is particularly important in the context of Smart Grids, where the stakes are high and the margin for error is small. By thoroughly testing new technologies with synthetic data, researchers can reduce the risk of unexpected failures and ensure that innovations are ready for deployment.
3.5. Enabling Innovation in Smart Grid Technologies
The ability to generate high-fidelity synthetic PMU data is not just a convenience; it is a necessity for the continued advancement of Smart Grid technologies.
As the energy landscape becomes more complex and the demand for reliable and sustainable power grows, the need for innovative solutions becomes increasingly urgent. Synthetic data provides the foundation upon which these innovations can be built, offering researchers the tools they need to explore new ideas, test new approaches, and push the boundaries of what is possible in the world of Smart Grids [
18]. By overcoming the limitations of real-world data and providing a flexible, scalable, and secure environment for research, synthetic data generation is poised to play a central role in the next wave of Smart Grid development [
19].
The creation and use of synthetic PMU data will be crucial in ensuring that Smart Grids are not only more efficient and resilient, but also capable of meeting the challenges of the future [
20]. In this sense,
Table 1 summarizes the discussed benefits of synthetic PMU data as a central part of Smart Grid development.
4. Synthetic PMU Data Generator for Digital Metrology
As we move into the digital age, the field of metrology is undergoing a profound transformation [
21]. At the heart of this evolution lies sensor network metrology, a promising new area that has the potential to redefine how we measure and interpret the world around us.
This emerging domain not only exemplifies the convergence of traditional metrological principles with cutting-edge digital technologies [
22], in particular for the design and development of Smart Cities, but it also aligns with the growing demand for real-time, high-resolution data across various sectors, also known as systems of systems (SoS) [
23].
4.1. Sensor Network Metrology: A New Paradigm
Sensor networks mark a significant evolution from isolated, standalone measurement instruments and related metrological services to highly integrated, interconnected systems (and corresponding metrological services) capable of real-time, continuous monitoring and analysis across vast spatial and temporal scales.
This shift is largely due to innovations in wireless communication, advanced data processing techniques, and the miniaturization of sensor technology. Unlike conventional measurement tools that typically function in static environments with limited flexibility, sensor networks are designed to adapt and operate in dynamic, ever-changing conditions.
This adaptability enhances the scope and accuracy of measurements, allowing for more comprehensive monitoring solutions. According to [
24], these interconnected systems play a crucial role in system life cycle management by enabling greater interoperability and system resilience across various stages of development. Typical characteristics of Systems of Systems include [
25]:
Operational and managerial independence;
Geographical distribution;
Emergent behavior;
Evolutionary development;
Heterogeneity of constituent systems.
This SoS transformation is driven by advancements in wireless technology, data processing, and the miniaturization of sensors. Unlike conventional measurement systems, sensor networks can therefore operate in dynamic environments, providing a more comprehensive and adaptive approach to metrology.
For instance, in environmental monitoring, sensor networks can be deployed across vast areas to collect data on temperature, humidity, and pollutants in real time. These data are then processed using sophisticated algorithms, allowing for predictive analytics and more accurate modeling of environmental changes [
26].
The potential applications of sensor network metrology are vast and extend beyond Smart Cities [
27], ranging from industrial automation to healthcare and agriculture.
4.2. Advantages and Challenges of Sensor Network Metrology
The benefits of sensor network metrology seem, at first sight, evident.
Firstly, it allows for the continuous collection of data and bidirectional communication, offering a real-time snapshot of conditions that can be critical in fields like manufacturing, where even slight deviations can have significant consequences; this enables industrial big-data-driven decision-making in the field of intelligent manufacturing [
28].
Secondly, the distributed nature of sensor networks enhances the robustness and reliability of measurements, as data sets from multiple sensors can be cross-referenced and validated. This is one of the most significant advantages of sensor networks: the system of systems (SoS) can be regarded as one overall measuring system, bringing to light complex solutions for distributed estimation [
29].
However, the implementation of sensor network metrology is not without challenges. One major hurdle is the need for standardized protocols and calibration methods to ensure the accuracy and interoperability of sensors within the network. Additionally, managing the vast amounts of data generated by these networks requires advanced data storage and processing solutions, as well as robust cybersecurity measures to protect sensitive information, given the multifaceted challenges of big data security and privacy [
30].
The role of smart sensors within systems of systems (SoS) in enhancing the interoperability of Smart Grids through standardization has been discussed in depth [
31]. Integrating sensor networks by means of smart sensors, such as PMUs, focuses on ensuring bidirectional communication and efficient data exchange within complex systems. This aids current research and the further development and deployment of solutions that adhere to standardized frameworks for monitoring and controlling grid operations [
32].
4.3. The Role of Digital Representations
The future of sensor network metrology relies mostly on digital representations, differentiated in
Figure 1. Digital representations, with grid-related examples, can be classified as follows [
33]:
Digital Model: A digital representation of a physical system or object (e.g., a network infrastructure map that utilizes data from a fixed point in time);
Digital Shadow: A digital model that integrates automated one-way data flow from the physical system or object (e.g., a network infrastructure map that pulls data from the system to dynamically update inventory, asset state, and constraints); and
Digital Twin: A digital model that integrates two-way data flow between the model and the physical object or system, where making a change to one can change the other; for example, a control center network map that displays real-time system status and enables engineers to control assets to mitigate issues (e.g., a network infrastructure map that utilizes real-world data to adapt its model and the corresponding predictions in order to control the monitored system).
Focusing on digital shadows and twins (also known as virtual replicas) of physical systems, researchers can simulate, predict, and optimize performance [
34]. By incorporating data from sensor networks into digital shadows/twins, it becomes possible to create highly accurate models that can anticipate system behavior under various conditions [
35], which makes them perfectly suited for synthetic PMU data-driven Smart Grid research.
4.4. The Road Ahead for Digital Metrology
The future of metrology, as shaped by sensor network technology, is likely to be characterized by increased collaboration between national metrology institutes (NMIs), industry stakeholders and research organizations [
21]. To fully realize the potential of sensor network metrology, concerted efforts will need to be made to develop new standards, protocols, and calibration techniques that can accommodate the complexities of these systems [
35].
Sensor network metrology represents a significant step forward in the digital transformation of metrology [
21]. As we expand into Smart Grid development and deployment, the continued integration of sensor networks into the metrological frameworks will be essential in meeting the demands of an increasingly data-driven world.
4.5. Frameworks Developed for Research on Digital Metrology
The development of digital metrology frameworks, mainly grounded in metrologically relevant synthetic data generators, represents a critical advance in sensor network metrology, particularly for applications such as Smart Grids and Smart Cities [
36].
These frameworks can establish a foundational knowledge base by addressing key metrological aspects, including the generation, validation, and application of synthetic data under traceable methods, like the one presented here for PMU data sets. By providing high-fidelity data, the suggested Smart Grid framework will allow researchers to simulate various scenarios and conditions, thereby improving the understanding of sensor network behavior and performance in different operational contexts [
37].
The basis for the Smart Grid research framework proposed here is the culmination of a series of PMU-based research and development efforts for Smart Grids, which include [
38,
39,
40,
41,
42].
We believe that this synthetic PMU data generator is a crucial milestone for developing reliable models and algorithms that can effectively manage and interpret dynamic environments. It also sets a precedent for future studies in sensor network metrology and paves the way for innovations in technologies such as Smart Grids and Smart Cities, where precise, accurate, and adaptable metrologically relevant data and metadata will be essential for optimal system performance, monitoring, and resilience.
5. Synthetic PMU Data Generator for the Proposed Frameworks
The proposed framework for digital metrology will be able to generate and utilize synthetic Phasor Measurement Unit (PMU) data from point-on-wave (PoW) inputs, and it is designed to address the limitations of real-world data by providing robust, adaptable, and accessible solutions for Smart Grid research.
The proposed minimum architecture of this type of framework consists of three key components, each serving a specific role in ensuring the generation, validation, and application of high-fidelity synthetic data. These components are (1) the Synthetic Data Generation Module (in the present article, synthetic PMU data, to be precise); (2) validation and verification modules ensuring metrological relevance; and (3) integration modules for Smart Grid simulators, based on reusable data structures and syncing procedures.
Although the overall design of these modules is beyond the scope of the current article, we consider it necessary to briefly describe them in order to properly define the context for the presented synthetic PMU data generator.
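As an illustration only, the following Python sketch outlines how the three modules could be exposed programmatically. The class names, method signatures, and the PmuStream container are hypothetical and do not reflect the actual interface of the released tool; they merely make the division of responsibilities concrete.

```python
from dataclasses import dataclass
from typing import Protocol
import numpy as np


@dataclass
class PmuStream:
    """Hypothetical container for a stream of synthetic PMU measurements."""
    timestamps: np.ndarray   # reporting instants, s
    phasors: np.ndarray      # complex synchrophasors, pu
    frequency: np.ndarray    # Hz
    rocof: np.ndarray        # Hz/s
    flags: np.ndarray        # quality flags (e.g., WDF/NDF)


class DataGenerationModule(Protocol):
    """(1) Synthetic Data Generation Module: PoW samples in, PMU stream out."""
    def generate(self, pow_samples: np.ndarray, fs: float) -> PmuStream: ...


class ValidationModule(Protocol):
    """(2) Validation and verification: compare a stream against a reference."""
    def validate(self, stream: PmuStream, reference: PmuStream) -> dict: ...


class SimulatorIntegrationModule(Protocol):
    """(3) Integration with Smart Grid simulators via a reusable data structure."""
    def export(self, stream: PmuStream, path: str) -> None: ...
```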
5.1. Data Generation Module
At the core of the proposed framework is the Data Generation Module, which, in the present case study, is responsible for creating synthetic PMU data for Smart Grid analytics. This module employs advanced algorithms and models that are metrologically relevant to simulate the electrical characteristics of power grids under various conditions, as presented in detail in the following section.
Synthetic data generators allow different types of simulation environment modules to test stochastic models, machine learning techniques, and system dynamics models [
43], under metrologically relevant considerations [
44].
5.2. Validation and Verification of Synthetic Data Generator
Ensuring the accuracy and reliability of synthetic data will be crucial for its effective application in Smart Grid research.
To achieve this, all digital metrology frameworks must also incorporate rigorous validation and verification processes. Validation will involve comparing synthetic data against known benchmarks and real-world datasets to assess their reliability and accuracy, as simplified in [
45].
This validation process may include statistical analysis, consistency checks, and scenario-based testing. Verification, on the other hand, will ensure that the data generation algorithms and models function correctly and produce reliable results. Techniques such as cross-validation, sensitivity analysis, and error analysis are typically used to confirm the integrity of the data and the correctness of the models [
46].
Altogether, these processes, implemented by means of in-environment modules (i.e., additional hardware components and/or software modules), will ensure that the synthetically created PMU datasets meet the specifications required by digital shadows/twins for research, development, and application in the field of Smart Grids.
5.3. Integration with Smart Grid Simulators
The Synthetic PMU Data Generator presented in the following section has been designed to integrate seamlessly with existing Smart Grid simulators and/or tools, i.e., our openly accessible asset is intended to be a simulator-environment-independent tool.
Hence, the integration must be achieved through standardized interfaces and known structured data exchange protocols, which facilitate interoperability between the proposed digital metrology framework and any other existing Smart Grid simulation platform [
47]. By incorporating the synthetic PMU data generated by the proposed digital metrology framework into any other Smart Grid simulator, researchers and developers can test and validate their newly developed technologies and algorithms in their own controlled environment, without having to deal with unstructured data or develop their own parsers and/or importing modules, thus reducing the investment required for further development [
48].
6. Synthetic PMU Data Generator
In this section, we introduce the PMU Data Generator and we validate its performance in standard and realistic test conditions. Firstly, we briefly describe the main algorithmic steps and the input and output variables of the generator. The algorithm performance is evaluated by means of full compliance testing against the P-class requirements of the IEC 60255-118-1 [
49]. Secondly, we provide a simple user guide for customizing the Synthetic PMU Data Generator according to the specific requirements of the particular dataset under analysis. Finally, we present the results obtained on two well-known datasets taken from the recent literature on PMU-based measurement applications.
6.1. Modeling Assumptions
In modern power systems’ theory, voltage and current waveforms are typically represented as a linear combination of different components [50]:

$$x(t) = A\,\big[1 + \varepsilon_A(t)\big]\cos\!\big(2\pi f t + \varphi + \varepsilon_\varphi(t)\big) + \eta_{\mathrm{nb}}(t) + \eta_{\mathrm{wb}}(t), \qquad (1)$$

where t is the independent time variable and the parameters A, f, and φ denote the amplitude, frequency, and initial phase of the fundamental component. The functions ε_A(t) and ε_φ(t) account for any variation of amplitude and phase over time. The additional terms η_nb(t) and η_wb(t) model narrow- and wide-band spurious components. The former includes harmonic and out-of-band components that can be well approximated by sinusoidal tones. The latter is typically related to measurement noise or any continuous spectrum disturbance that cannot be represented as a sum of sinusoidal tones.
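To make the signal model concrete, the following Python/numpy sketch builds a point-on-wave test waveform with the structure of (1). The modulation depths, the harmonic and inter-harmonic content, and the 80 dB SNR are illustrative choices, not prescribed values of the generator.

```python
import numpy as np

fs = 10_000            # sampling rate, Hz (one of the supported values for 50 Hz systems)
f0 = 50.0              # nominal system frequency f, Hz
A, phi = 1.0, 0.0      # fundamental amplitude (pu) and initial phase (rad)
t = np.arange(0, 10, 1 / fs)    # 10 s of point-on-wave samples

# slow amplitude and phase modulations, playing the role of eps_A(t) and eps_phi(t) in (1)
eps_A = 0.02 * np.cos(2 * np.pi * 1.0 * t)       # 2% amplitude modulation at 1 Hz
eps_phi = 0.05 * np.cos(2 * np.pi * 1.0 * t)     # 0.05 rad phase modulation at 1 Hz

# narrow-band disturbances eta_nb(t): a 1% third harmonic and a 1% inter-harmonic at 75 Hz
eta_nb = 0.01 * A * np.cos(2 * np.pi * 3 * f0 * t) + 0.01 * A * np.cos(2 * np.pi * 75.0 * t)

# wide-band disturbance eta_wb(t): white Gaussian noise at 80 dB SNR w.r.t. the fundamental
snr_db = 80.0
noise_std = (A / np.sqrt(2)) * 10 ** (-snr_db / 20)
eta_wb = np.random.default_rng(0).normal(0.0, noise_std, t.size)

# full waveform following the structure of (1)
x = A * (1 + eps_A) * np.cos(2 * np.pi * f0 * t + phi + eps_phi) + eta_nb + eta_wb
```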
In this scenario, the PMU is a measurement device capable of extracting the main parameters associated with the fundamental component at a given reporting time instant. There are three main measurements: First, the synchrophasor, which consists of the vector representation of the fundamental amplitude and phase in a polar plane rotating at the nominal system frequency. Second, the fundamental frequency, which may not coincide with the nominal system one. Third, the Rate Of Change Of Frequency (ROCOF), which is the first-order time derivative of the frequency.
In other words, the PMU maps the signal in (1) into a simplified and compressed model:

$$x(t) \;\mapsto\; \left\{\, \bar{X}_i = \frac{A_i}{\sqrt{2}}\, e^{j\varphi_i},\; f_i,\; R_i \,\right\}, \qquad (2)$$

where, at the i-th reporting time, $\bar{X}_i$ is the synchrophasor, $f_i$ the frequency, and $R_i$ the ROCOF. The transition from (1) to (2) can be seen as a lossy compression [51]. Nevertheless, it is important to underline two considerations: First, the PMU limits its analysis to the fundamental component, as this is the one considered in most monitoring and control applications. Second, the PMU considers relatively short observation intervals (a few cycles of the nominal system frequency) and adopts a reasonably high reporting rate (typically, once per cycle). It is thus reasonable to expect that the simplification introduced in (2) is still capable of capturing the main dynamics of the power signal under analysis.
6.2. Algorithmic Details
The synthetic PMU Data Generator emulates the operating functions of a real-world PMU but provides some extra insights into the quality of the measurement results. The main steps of the data processing routine are summarized in Algorithm 1.
Algorithm 1 PMU Data Generator
1: Input: signal samples, nominal system frequency, sampling rate, reporting rate, SNR and OoB thresholds ▹ input variables and parameters
2: Require: corresponding TFM basis matrix
3: for each reporting time i do
4:   extract the i-th observation interval, pu
5:   project it over the TFM basis
6:   compute the synchrophasor at the i-th reporting time, pu
7:   compute the frequency at the i-th reporting time, Hz
8:   compute the ROCOF at the i-th reporting time, Hz/s
9:   recover the fundamental component, pu
10:  estimate the SNR at the i-th reporting time, dB
11:  if the estimated SNR violates the SNR threshold then WDF ← activated ▹ wide-band distortion flag
12:  else WDF ← de-activated
13:  end if
14:  compute the frequency oscillation depth, pu
15:  if the oscillation depth exceeds the OoB threshold then NDF ← activated ▹ narrow-band distortion flag
16:  else NDF ← de-activated
17:  end if
18:  function OoB Sniffer (snob)
19:    Input: 1-s frequency measurement FIFO, Hz
20:    select the frequency range of interest for OoB, Hz
21:    compute the spectrum of the frequency oscillation, pu
22:    detect the spectrum peak, pu/Hz
23:  end function
24: end for
The input variables are as follows: The signal is a column vector of uniformly sampled values of a voltage or current waveform. Three further parameters denote the nominal system frequency, the sampling rate, and the reporting rate. The nominal system frequency can be set to 50 or 60 Hz, depending on the grid configuration under analysis. The sampling rate depends on the nominal system frequency: for 50 Hz systems it can be set to 10, 18, or 50 kHz, while for 60 Hz systems it can be set to 12, 18, or 60 kHz. In a similar way, the reporting rate is a function of the nominal system frequency. The synthetic PMU Data Generator implements the reporting rates required by the IEC standard but is also capable of operating in a sample-by-sample mode (i.e., with the reporting rate equal to the sampling rate). Such a high reporting rate may be impractical in real-world monitoring and control infrastructures, but it represents a useful tool for in-depth analysis of unexpected behavior of the network as well as for the tracking of sudden variations of the signal main parameters.
The two thresholds set the corresponding criteria for the detection of low-quality PMU measurements. The former indicates the Signal-to-Noise Ratio (SNR) we expect for the signal under test (expressed in dB). The latter defines a lower limit for the detection of out-of-band components, also known as sub- and inter-harmonics. When one or both of these criteria are violated, the PMU Data Generator is aware of operating in non-ideal conditions. In other words, the consistency between the model of the measurement and the signal under test is questionable and the definitional uncertainty may become predominant when compared to the other intrinsic and algorithmic contributions. In this case, the PMU measurements are flagged as potentially unreliable and the quantities involved in the thresholding process are also reported.
The algorithm considers observation intervals of 80 ms, partially overlapped in order to reproduce the desired reporting rate (line 4). Such an interval corresponds to 4 and 5 nominal cycles at 50 Hz and 60 Hz system rates, respectively. The interval under analysis is projected over a Taylor-Fourier Multifrequency (TFM) basis, specifically designed to minimize the spurious interference from low-order harmonic components (line 5). The projection produces the Taylor expansion coefficients of the fundamental component, referred to the interval mid-point. The 0th-order term is the synchrophasor (line 6). The 1st- and 2nd-order terms account for frequency and its rate of change (ROCOF) variations from the nominal values (lines 7 and 8).
Based on the estimates of synchrophasor, frequency, and ROCOF, the fundamental component is recovered (line 9). By comparing the recovered component with the original observation interval, the residual energy provides an estimate of the current SNR (line 10). If the estimated SNR violates the corresponding threshold, the first flag is raised (lines 11 to 13). In parallel, a function called OoB Sniffer analyzes a 1 s buffer of the frequency measurements and allows for detecting possible out-of-band components in terms of component frequency and magnitude (lines 18 to 23). If the out-of-band component magnitude exceeds the corresponding threshold, the second flag is raised (lines 14 to 17).
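For illustration, the following Python sketch reproduces the per-interval processing in a simplified form: a 2nd-order Taylor-Fourier fit of the fundamental component only (the released generator relies on an optimized multifrequency basis instead), followed by the SNR estimate and the wide-band distortion flag. The function and variable names are ours, chosen for readability.

```python
import numpy as np
from math import factorial

def tf_estimate(x_win, fs, f0, snr_thr=60.0):
    """One observation interval: synchrophasor, frequency, ROCOF, SNR, and WDF flag.

    Simplified 2nd-order Taylor-Fourier fit at the nominal frequency only; not the
    optimized multifrequency basis used by the actual PMU Data Generator.
    """
    n = x_win.size
    t = (np.arange(n) - (n - 1) / 2) / fs        # time axis referred to the interval midpoint
    w = 2 * np.pi * f0

    # design matrix: real/imaginary parts of the 0th, 1st, and 2nd Taylor terms
    cols = []
    for k in range(3):
        tk = t ** k / factorial(k)
        cols += [tk * np.cos(w * t), -tk * np.sin(w * t)]
    H = np.column_stack(cols)

    coeff, *_ = np.linalg.lstsq(H, x_win, rcond=None)
    p = coeff[0::2] + 1j * coeff[1::2]           # complex Taylor coefficients p0, p1, p2

    phasor = p[0] / np.sqrt(2)                                        # synchrophasor (RMS), pu
    freq = f0 + np.imag(p[1] / p[0]) / (2 * np.pi)                    # Hz
    rocof = np.imag(p[2] / p[0] - (p[1] / p[0]) ** 2) / (2 * np.pi)   # Hz/s

    x_hat = H @ coeff                            # recovered fundamental component
    snr_est = 10 * np.log10(np.sum(x_hat ** 2) / np.sum((x_win - x_hat) ** 2))
    wdf = bool(snr_est < snr_thr)                # wide-band distortion flag
    return phasor, freq, rocof, snr_est, wdf

# quick self-check on a clean 50.2 Hz tone observed for 80 ms at 10 kHz
fs, f0 = 10_000.0, 50.0
t = np.arange(int(0.080 * fs)) / fs
phasor, freq, rocof, snr_est, wdf = tf_estimate(np.cos(2 * np.pi * 50.2 * t + 0.3), fs, f0)
print(f"f = {freq:.3f} Hz, ROCOF = {rocof:.2f} Hz/s, WDF = {wdf}")   # f close to 50.2, ROCOF near 0
```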
The recent literature has proposed different applications of the TFM basis to phasor estimation. In this case, the estimation approach is inspired by the so-called Enhanced Taylor–Fourier Multifrequency Model [
52]. In this paper, though, the frequency response of the TFM basis has been optimized in order to guarantee maximum flexibility in balancing compliance under dynamic test conditions with a prompt response. Indeed, it is worth mentioning that the frequency response of any TFM basis depends on the included frequencies, on the derivative orders associated with each component, and on the windowing functions applied to the input signal.
6.3. Metrological Characterization
Before applying the presented PMU Data Generator on real or synthetic datasets, it is important to characterize its performance with respect to the reference standard for PMU applications, namely the IEC 60255-118-1 [
49]. For this analysis, we consider the P-class requirements, as P-class PMUs are the most widely employed in control applications like protections, load-shedding relays, or fast response mechanisms. Nevertheless, an equivalent PMU Data Generator for the M-class requirements is ready to be shared in the same repository (i.e., the Zenodo Community: Research in Sensor Network Metrology) in the upcoming months. More information is provided in the Further Work section.
In the following, the performance of the PMU Data Generator is characterized in terms of the metrics indicated by the IEC 60255-118-1. To guarantee a statistically relevant sample, if not otherwise specified, each test has been carried out for a total duration of 10 s and with the highest possible reporting rate. In this regard,
Table 2 reports the worst-case Total Vector Error (TVE), Frequency Error (FE), and ROCOF Error (RFE) for each test. These metrics are compared against the P-class requirements to facilitate compliance verification.
In order to reproduce a plausible operating condition, the test waveform has been corrupted with uncorrelated white Gaussian noise with an SNR equal to 80 dB. The sampling rate has been fixed to 10 kHz and 12 kHz for 50 Hz and 60 Hz nominal system frequencies, respectively.
The choice of the lowest sampling rate is conservative, as this implies a larger impact of measurement noise on the results. The sensitivity to different noise levels is further discussed in Section 6.5.
As reported in
Table 2, the measurements provided by the synthetic PMU Data Generator comply with the P-class performance requirements in all tests. It is worth noticing that the TVE never exceeds 0.05% even in the presence of harmonic components or dynamic test conditions. The most challenging condition proves to be the harmonic distortion test. In this regard, it is worth observing that the test requires injecting a 1% distortion on each single harmonic component up to the 50th order. The worst case here reported refers to the 6th order harmonic. In reality, though, it is unlikely that the analog front end of the measurement infrastructure would present such a high distortion at an even harmonic. Nevertheless, the synthetic PMU Data Generator complies with the limit despite the combined effect of wide- and narrow-band distortions.
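The compliance metrics in Table 2 follow the standard definitions: the TVE is the magnitude of the phasor error vector normalized by the reference magnitude, while FE and RFE are the absolute frequency and ROCOF errors. A minimal sketch of their computation from estimated and reference quantities (the array names are illustrative) is given below.

```python
import numpy as np

def worst_case_errors(x_est, x_ref, f_est, f_ref, r_est, r_ref):
    """Worst-case TVE, FE, and RFE over a test run (IEC 60255-118-1 error definitions).

    x_est/x_ref: complex synchrophasor arrays; f_*: frequency, Hz; r_*: ROCOF, Hz/s.
    """
    tve = 100 * np.abs(x_est - x_ref) / np.abs(x_ref)   # Total Vector Error, %
    fe = np.abs(f_est - f_ref)                          # Frequency Error, Hz
    rfe = np.abs(r_est - r_ref)                         # ROCOF Error, Hz/s
    return tve.max(), fe.max(), rfe.max()
```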
In
Table 3, we report the performance of the PMU Data Generator in the presence of a step-change variation in the signal magnitude or phase. For this analysis, we set the step occurrence at 547.27 ms after the beginning of the test waveform. In this way, the step occurrence does not correspond to an exact reporting instant for any of the considered reporting rates.
According to the IEC 60255-118-1 requirements, the measurements have been evaluated in terms of response time, delay and overshoot [
49]. All the metrics comply with the standard requirements. In this context, the most challenging condition is represented by the phase step changes, where larger overshoots are noticed. Nevertheless, given the relatively small response times, it is reasonable to say that this would not largely affect the overall performance of the PMU Data Generator.
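As a rough illustration of how the step-test metrics of Table 3 can be extracted from a measured profile, the sketch below computes the interval during which the error exceeds the accuracy limit (response time), the 50% crossing delay, and the percentage overshoot. It is a simplified interpretation, not a verbatim implementation of the standard's procedures.

```python
import numpy as np

def step_metrics(t, y, t_step, y0, y1, tol):
    """Simplified step-test metrics for a measured profile y(t).

    t_step: instant of the applied step; y0/y1: steady-state values before/after
    the step; tol: accuracy limit used to delimit the response-time window.
    """
    err = np.abs(y - np.where(t < t_step, y0, y1))        # error w.r.t. the ideal step
    out = np.where(err > tol)[0]                          # samples violating the limit
    resp_time = (t[out[-1]] - t[out[0]]) if out.size else 0.0

    half = y0 + 0.5 * (y1 - y0)                           # 50% crossing level
    crossed = np.where(np.sign(y - half) != np.sign(y0 - half))[0]
    delay = (t[crossed[0]] - t_step) if crossed.size else np.nan

    post = y[t >= t_step]                                 # overshoot beyond the final value, %
    overshoot = 100 * max(0.0, np.max((post - y1) * np.sign(y1 - y0))) / abs(y1 - y0)
    return resp_time, delay, overshoot
```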
6.4. Missing and Invalid Data
When dealing with real-world datasets, it is likely to come across one or more data points that are either missing, corrupted, or invalid. In order to simplify the treatment of similar scenarios, the PMU Data Generator has been equipped with a simple yet effective interpolation functionality.
The missing or invalid data should be denoted as Not-A-Number, while the corresponding entries in the time axis should be kept unaltered. The missing portion will be recovered by means of shape-preserving piecewise cubic Hermite polynomials. In this sense,
Figure 2 shows an example of a waveform with a missing portion of data and the equivalent interpolated reconstruction. On the other hand, it is worth underscoring that this operation extrapolates the waveform features from the preceding samples and represents only a plausible guess of the possible waveform evolution during the considered time interval. There is no guarantee about the trustworthiness of the obtained waveform.
For instance, the test signal considered in
Figure 2 refers to a power outage that hit the Pacific Southwest power system on 8 September 2011. The signal is not stationary and presents an underlying amplitude and phase modulation. The recovered signal in red is consistent with the rest of the samples but may introduce slight discontinuities at the end of the recovered portion.
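A minimal sketch of the interpolation functionality, using the shape-preserving PCHIP interpolator available in SciPy, is shown below. The hole width and the test tone are arbitrary, and the reconstruction remains a plausible guess rather than a trustworthy recovery of the missing samples.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def fill_missing(t, x):
    """Replace NaN-marked samples with shape-preserving piecewise cubic Hermite values.

    The time axis t is kept unaltered; the result is only a plausible guess of the
    waveform evolution inside the gap, with no guarantee of trustworthiness.
    """
    valid = ~np.isnan(x)
    x_filled = x.copy()
    x_filled[~valid] = PchipInterpolator(t[valid], x[valid])(t[~valid])
    return x_filled

# example: a 50 Hz tone sampled at 10 kHz with a 2 ms hole marked as Not-A-Number
fs = 10_000
t = np.arange(0, 0.2, 1 / fs)
x = np.cos(2 * np.pi * 50 * t)
x[1000:1020] = np.nan
x_rec = fill_missing(t, x)
```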
6.5. Parameter Setting
As discussed in Section 6.2, the PMU Data Generator presents two thresholds that allow for monitoring the levels of wide- and narrow-band distortion, namely the SNR threshold and the out-of-band threshold. In order to suitably set these parameters, it is useful to perform a preliminary sensitivity analysis of the PMU Data Generator behavior in the presence of uncorrelated white Gaussian noise or out-of-band distortion components. If not otherwise specified, in the following the nominal system frequency and the sampling rate are set to 50 Hz and 10 kHz, respectively, but similar results could be obtained with any other combination of parameter values.
The left plot in
Figure 3 presents the TVE distributions as a function of the SNR. The boxplot representation shows how the errors are not symmetrically distributed around their mean values. Nevertheless, a clear descending trend is (expectedly) noticed when the SNR increases. The right plot, instead, shows the error of the SNR estimated by the PMU Data Generator in different test conditions. In this case, along with the P-class tests already reported in the previous section, we included three tests with an out-of-band distortion component. For this analysis, we consider an inter-harmonic component whose frequency is set to 75 Hz and whose magnitude is varied among 0.1, 1, and 10% of the fundamental one. The bar plot clearly shows that the estimated SNR is capable of detecting the presence of an inter-harmonic component.
In this context, the left plot in
Figure 4 shows the worst-case TVE and FE as a function of the out-of-band component magnitude. In terms of TVE, the performance degradation is limited: the 1% target performance is guaranteed up to a magnitude of 5% of the fundamental one. Conversely, the frequency estimates prove to be severely affected by out-of-band distortion: a 0.5% component magnitude is sufficient to exceed the performance target of 10 mHz. In the right plot of Figure 4, we show the magnitude of the out-of-band component as estimated by the PMU Data Generator. For this analysis, we vary the component frequency according to the M-class compliance test of the IEC 60255-118-1, and we consider two component magnitudes, namely 1 and 10%. It is worth noticing that the estimated magnitude is significantly larger than the noise floor (corresponding to an 80 dB SNR) for both considered configurations. By properly setting the corresponding threshold, it is then possible to detect the presence of out-of-band distortions and activate the NDF flag.
A similar example is represented by the measurement behavior in the presence of a transient or a parameter step change. In the left plot of
Figure 5, we compare the synchrophasor magnitude estimates vs. the true profile of the signal magnitude.
This plot shows the advantage of setting the reporting rate equal to the sampling rate: the time resolution is so fine that it is possible to evaluate the actual step response of the algorithmic chain. The right plot of
Figure 5, instead, shows the corresponding time evolution of the SNR estimate. As soon as the step change is within the considered observation interval, the estimation accuracy degrades and the estimated SNR collapses from its expected value of 80 dB. By properly setting the threshold, it is possible to detect these events and suitably mark the measurements with the WDF flag to indicate a potential case of signal model inconsistency or high distortion.
As mentioned in
Section 6.2, the observation interval length is set by default to 80 ms. This parameter choice is based on the fact that 80 ms is the maximum interval duration to guarantee a reporting latency compliant with P-class requirements, and—at the same time—the minimum interval duration to allow for sufficient resolution in the spectral domain to distinguish out-of-band disturbances. In case shorter observation intervals are needed, it is possible to exploit zero-padding techniques. In this regard, it is important to underline that zero-padding should be applied in a symmetric way: the real signal should be placed in the middle of the interval, as the final estimates will be referred to the 80 ms interval midpoint. Another aspect to be stressed is that zero-padding allows for increasing the granularity of the spectral domain representation, but the actual resolution depends on the number of cycles of the real signal.
Figure 6 provides a possible application of this technique. On the left, the original signal window of 80 ms consists of a fundamental tone at 50.05 Hz and a 1% out-of-band disturbance at 75 Hz. The zero-padded version contains only 40 ms of real signal. On the right, the corresponding spectral representation is given by the DFT. As expected, the resolution in the spectral domain for the zero-padded version is much coarser and risks affecting the capability of properly detecting the out-of-band component.
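The following numpy sketch reproduces the idea behind Figure 6 under the assumptions stated above (80 ms window, 50.05 Hz fundamental, 1% inter-harmonic at 75 Hz, 40 ms of real signal placed symmetrically): zero-padding restores the 12.5 Hz bin granularity of the full window, but the actual spectral resolution is still set by the 40 ms of real signal.

```python
import numpy as np

fs = 10_000
n_full = int(0.080 * fs)                          # 80 ms observation interval at 10 kHz
t = np.arange(n_full) / fs
x_full = np.cos(2 * np.pi * 50.05 * t) + 0.01 * np.cos(2 * np.pi * 75.0 * t)

# only 40 ms of real signal available: place it symmetrically inside the 80 ms interval
n_half = n_full // 2
x_short = x_full[:n_half]
pad = (n_full - n_half) // 2
x_padded = np.concatenate([np.zeros(pad), x_short, np.zeros(n_full - n_half - pad)])

# both DFTs share the same 12.5 Hz bin granularity (1 / 80 ms), but the actual resolution
# of the padded version is set by the 40 ms of real signal, i.e., 25 Hz
freqs = np.fft.rfftfreq(n_full, d=1 / fs)
spec_full = np.abs(np.fft.rfft(x_full)) / n_full
spec_padded = np.abs(np.fft.rfft(x_padded)) / n_half
# inspecting both spectra around 75 Hz shows that the out-of-band component is much harder
# to separate from the fundamental's leakage in the zero-padded case
```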
6.6. Validation on Test Cases
After having introduced the algorithmic and setting details, it is necessary to demonstrate the feasibility and practicality of the synthetic data generator with real datasets. To this end, in the following, we consider two test cases taken from the recent literature on PMU-based monitoring and control applications. If not otherwise specified, the two thresholds are set equal to 60 dB and 0.05% of the nominal fundamental magnitude, respectively. The first one is intended to spot any dataset where the measurement noise exceeds the level corresponding to a 60 dB SNR (feasible in real-world scenarios, yet inappropriate for high-accuracy synchrophasor estimation). The second one allows for detecting possible small disturbances in the proximity of the fundamental component.
The first data set makes reference to the well-known IEEE 5-bus benchmark model. In this case, we consider a varied configuration, specifically designed to evaluate an Under Frequency Load Shedding (UFLS) scheme, as described in [
53]. This model has been programmed in the MATLAB Simulink environment and is publicly available at [
54]. For this analysis, we consider the voltage signal at bus 1 for an overall test duration of 30 s.
At t = 5 s, the generator output power at bus 2 is suddenly reduced by 300 MW. The consequent UFLS action is successful in preventing a system blackout, but it produces a progressive shedding of power at buses 4 and 5. In the following phase, a progressive load restoration brings the system back to quasi-normal operating conditions.
As shown in the left plot of
Figure 7, the reference values of frequency and magnitude exhibit the effects of the power outage and of the UFLS procedures. Each intervention corresponds to a sudden oscillation, whereas the progressive return to normal operating conditions is characterized by a frequency ramp and a slowly damped amplitude modulation. The right plot, instead, focuses on the frequency measurements during the first phase of the event. In the upper graph, it is worth noticing how the synthetic PMU Data Generator not only correctly tracks the descending ramp but is also capable of capturing the fast transient variations at 5, 6.5, 6.75, and 7.25 s. The lower graph shows the frequency error with respect to the reference values and compares it with the performance targets for the closest test conditions, namely a phase modulation (60 mHz) and a frequency ramp (10 mHz). Most of the time, the frequency errors are largely within the performance targets. The only exceptions are represented by the four discontinuities. Only the first one (i.e., the largest one) is characterized by an error of several tens of mHz. Nevertheless, the divergence between measured and reference values lasts only 70 ms (i.e., within the response time limit for a phase step change).
The second data set is taken from the well-known Great Britain 34-bus benchmark model [
55]. The model has been programmed in DIgSILENT PowerFactory environment, publicly available at [
56].
In this case, we consider a varied configuration where the IFA2 interconnection with France is tripped at t = 0.05 s, resulting in the loss of 1 GW. For the sake of simplicity, this analysis refers to the voltage signal at bus 1 for an overall duration of 1 s.
In this regard, the left plot of
Figure 8 shows the frequency and magnitude evolution in the phases before and after the contingency. In a similar way to the preceding test case, the right plot shows two graphs. The upper one presents the frequency errors: they exceed the performance targets only for a few ms in correspondence with the tripping event, while they remain around 0.01 mHz for the rest of the acquisition (despite the presence of a frequency ramp and minor oscillations). In the lower graph, we present the estimated SNR.
It is interesting to observe how this value takes nearly half a second to converge to the optimal value of 80 dB. This proves that the PMU estimates are capturing only a part of the energy in the signal during such an event. Therefore, it is necessary to take into account that the PMU measurements may provide a limited or incomplete representation of the current situation.
6.7. Application Example
Having proven the reliability of the PMU Data Generator in several test cases, we now showcase a possible application. In this sense,
Figure 9 shows the distribution of ROCOF measurements in the aforementioned IEEE 5-bus dataset. On the left, the histogram of the synthetic measurements is fitted against a Rayleigh distribution. On the right, a new set of synthetic measurements is obtained by drawing random numbers according to the same Rayleigh distribution (parameter
B equal to 0.053). The similarity between the two distributions is noticeable in the histogram plot and can be quantified with a Kolmogorov–Smirnov distance of 0.98.
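A minimal sketch of this fit-and-resample procedure with SciPy is given below. The measured ROCOF values are replaced here by placeholder random samples so that the snippet is runnable, and the Rayleigh parameter B = 0.053 from the text is used only to seed the illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# placeholder for the ROCOF magnitudes produced by the PMU Data Generator
# (drawn at random here only to keep the sketch self-contained and runnable)
rocof_meas = stats.rayleigh.rvs(scale=0.053, size=3000, random_state=rng)

# fit a Rayleigh distribution to the measured values (location fixed at zero)
_, b_hat = stats.rayleigh.fit(rocof_meas, floc=0)

# generate a new synthetic measurement set from the fitted distribution
rocof_syn = stats.rayleigh.rvs(scale=b_hat, size=rocof_meas.size, random_state=rng)

# two-sample Kolmogorov-Smirnov comparison between the two sets
ks_stat, p_value = stats.ks_2samp(rocof_meas, rocof_syn)
print(f"fitted B = {b_hat:.3f}, KS statistic = {ks_stat:.3f}")
```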
7. Conclusions
In this article, we have explored the transformative potential of synthetic PMU data and introduced a comprehensive digital metrology framework, designed to address the challenges of real-world data scarcity and variability.
Synthetic PMU data will play a crucial role in advancing Smart Grid technologies by providing a controlled, scalable, and secure environment for testing and validation. Hence, our proposed digital metrology framework leverages state-of-the-art algorithms and models to generate high-fidelity synthetic data, ensuring that researchers and developers have access to the reliable and adaptable datasets needed to drive innovation.
Additionally, the proposed framework will be refined to better support the principles of Open Science and FAIR data management, ensuring that synthetic data remain accessible, interoperable, and reusable [
57]. These advancements will not only further the field of digital metrology but also strengthen the role of sensor network metrology in research and innovation.