1. Introduction
In modern power systems, the ability to assess and monitor power quality in real time has become a vital requirement as networks evolve into complex, digital smart grids. The IEC 61850 standard [
1] plays a key role in enabling high-speed, standardized communication between intelligent electronic devices (IEDs) in substations. In particular, the IEC 61850-9-2 Sampled Measured Values (SMV) protocol allows fast and accurate transmission of analogue measurements in digital form, providing the foundation for real-time monitoring of voltage and current waveforms. Because the standard offers concrete solutions for data transmission, encoding, and decoding, smart grids are becoming increasingly efficient at monitoring and controlling energy quality in line with consumer-driven requirements. Within these networks, intelligent electronic devices (IEDs) located in electrical substations are the first to receive digitalized measurement data, encapsulated in SMV packets sent by merging units or sensors. These IEDs act as SMV subscribers and serve as the entry point for quality monitoring in the broader smart grid hierarchy. To enable real-time control and protection of electrical equipment, critical delay thresholds must be respected throughout the measured data processing and organization process, making low-latency solutions essential for reliable operation.
Most existing SMV subscriber solutions focus primarily on the timely decoding of the incoming sampled data, but are realized in software running on CPU platforms. This software-centric approach offers advantages such as ease of development, flexibility for algorithm updates, and simplified debugging. However, there are significant drawbacks in time-critical and high-throughput scenarios: software processing introduces additional latency, incurs high memory and bus bandwidth consumption, and often results in reduced timing determinism. These issues are exacerbated as the number of subscribed data streams grows, since a general-purpose processor may struggle to sustain hundreds of Ethernet frames per cycle with consistent low latency. The lack of hardware acceleration for SMV handling and PQ computation in many IEDs thus represents a notable research gap. Ensuring real-time performance with increasing data rates and stream counts remains challenging, prompting an investigation into dedicated hardware-based approaches that can offload processing from CPUs.
This paper builds on the foundation established in prior work, where an FPGA-based SMV subscriber subsystem was developed to achieve low-latency, deterministic decoding of sampled value streams. In that prototype (termed HS3), the entire SMV parsing and distribution pipeline was implemented in programmable logic on a Zynq-7020 system-on-chip, yielding decoding latencies under 3 μs for configurations up to 512 subscribed SV streams while using under 8% of the available FPGA resources [
2]. This demonstrated that parallel hardware processing can deliver orders-of-magnitude faster and more predictable performance than traditional software parsing [
3]. Despite such progress, the pool of fully integrated IEDs leveraging FPGA-based acceleration remains very limited. For example, National Instruments’ CompactRIO platform with the NI-Industrial Communications for IEC 61850 Toolkit [
4] supports SMV, GOOSE, and MMS message handling within a LabVIEW environment. Still, its power quality calculations are executed on the CPU, outside the FPGA subscriber pipeline. This separation adds processing delay and reduces determinism for real-time PQ monitoring. One notable exception is the SoC-e RELY SV-PCIe card [
5], an FPGA-based network interface that, when paired with SoC-e’s dedicated SMV IP core, functions as a high-performance SMV subscriber. This platform supports up to 256 concurrent SV streams and includes built-in fundamental frequency and RMS measurements for each stream, integrating specific power quality analytics directly in hardware. Its reported sub-7 μs SMV decoding latency (including the basic PQ computations) represents one of the most competitive benchmarks currently available and serves as a helpful reference point for our work. However, direct comparison between such commercial solutions and research prototypes is difficult, as we are comparing production-hardened devices with a purpose-built experimental subsystem.
Table 1 summarizes the main features of these existing solutions versus the proposed system, highlighting the key performance metrics and architectural differences.
The limited adoption of in-hardware SMV processing and analysis in deployed IEDs underscores several essential gaps. Firstly, there is a lack of open, flexible architectures that integrate power quality metric computations into the SMV subscriber at the FPGA level. As noted, most available toolkits either leave PQ computations to software [
6] or provide only fixed-function firmware solutions [
7]. This leaves system designers with suboptimal choices between flexibility and real-time performance. Secondly, supporting a high number of simultaneous SV streams with low latency remains challenging; few works have demonstrated scaling beyond a handful of streams while maintaining microsecond-level processing times. The need to handle many streams in parallel (for multi-bay or substation-wide monitoring) drives up resource usage and demands careful hardware design to avoid bottlenecks. Thirdly, ensuring the accuracy of the computed PQ parameters under varying signal conditions is a non-trivial challenge when implementing algorithms in fixed-point FPGA logic. Power system signals can be distorted (harmonics, noise, etc.), so the hardware algorithms must be robust to non-sinusoidal conditions and adhere to accepted measurement standards. Prior research has shown that it is feasible to meet strict accuracy requirements in FPGA implementations—for instance, ref. [
6] achieved full compliance with IEC 61000-4-30 Class A using parallel custom processors in an FPGA-based PQ analyzer—but this often requires significant design effort and optimization. Lastly, there is a need for real-time integration of these metrics into the control loop. Even if raw measurements are delivered quickly, any delay in producing actionable metrics (frequency, RMS, etc.) could hinder fast control responses. The challenge is to compute these metrics continuously in streaming fashion without adding more than a few microseconds to the overall data pipeline latency.
Several recent research efforts have begun to address aspects of these challenges by developing FPGA-based systems for SMV subscribing and power quality analysis. In [
5], an analyzer for power quality applications that directly uses IEC 61850-9-2 SV frames as input is depicted. Their system can characterize instrument transformer behavior and PQ parameters from the received SV data, demonstrating the viability of on-site PQ assessment using digital substation measurements. Similarly, ref. [
6] investigates the feasibility of power quality meters based on digital inputs, i.e., using only sampled value streams from non-conventional instrument transformers. They present a prototype SV-based PQ meter focused on voltage dip detection, with a combined hardware/software architecture that was validated against a commercial PQ analyzer. The results in [
7] highlight the potential of SV-driven PQ monitoring and also underscore the importance of SV stream integrity (sampling rates, packet losses, etc.) on measurement accuracy. Beyond the substation context, FPGA technology has been applied to general power quality monitoring with an emphasis on real-time performance. In [
8], an FPGA-based online PQ monitoring system for distribution networks that computes standard PQ indices (e.g., THD, voltage magnitude, etc.) on-chip and transmits the results over the network in real time is introduced. Their design embeds signal processing algorithms as hardware functions in the FPGA, achieving continuous sample-by-sample evaluation of PQ parameters in compliance with IEC standards [
3]. The measured data are sent via a UDP/IP stack implemented in the same FPGA, enabling wide-area monitoring with minimal latency added by external processing. In another notable work, ref. [
5], a comprehensive PQ calculation engine using multiple soft-core processors inside an FPGA is presented. Each processor core was tailored to compute specific PQ metrics (voltage, current, frequency, harmonics, etc.) following the IEC 61000-4-30 Class A methodology, and ran in parallel to cover all required parameters. The prototype by Luiz et al. was shown to calculate all Class A PQ parameters within the FPGA and met the accuracy and response time requirements of the standard, while optimizing logic and memory utilization through customizable hardware processors [
4]. These studies [
5,
6,
7,
8] collectively indicate a clear trend towards integrating power quality analysis with digital substation data streams and exploiting FPGA acceleration for speed and determinism. However, each also has limitations: for instance, refs. [
6,
7] focus on specific metrics or use cases (like instrument diagnostics or voltage dips) rather than a broad set of PQ indices, ref. [
8] implements PQ analysis on FPGA but does not directly interface with the IEC 61850 process bus (instead using local ADC measurements), and [
8] demonstrates full PQ compliance in hardware but not in the context of subscribing to external SV streams. This leaves room for further innovation in combining these aspects into one system.
In light of the above challenges and gaps, the current research extends the earlier HS3 subscriber architecture to incorporate real-time power quality calculations as an integral part of the SMV processing pipeline. The goal is to enable an FPGA-based SMV subscriber not only to decode incoming measurement streams with low latency, but also to immediately evaluate key power quality indicators from those streams in hardware before handing off data to any higher-level applications. By integrating these functions directly into the FPGA logic, the need for separate processing units or software-based post-processing is significantly reduced, resulting in faster and more efficient system operation. Concretely, this work implements dedicated RTL modules for calculating fundamental frequency, true RMS, and active power for each subscribed channel in parallel with the decoding process. The frequency estimation leverages a single-bin DFT technique with a configurable reference window, providing a continuous measure of system frequency deviation and stability. True RMS values are computed through the accumulation of the squared samples over a moving window, which allows constant monitoring of signal magnitude while maintaining accuracy under non-sinusoidal conditions. In parallel, intermediate values such as instantaneous active power and average rectified voltage/current are calculated and time-synchronized per cycle. These intermediate hardware results are organized and made accessible to a lightweight software co-processor (running on the MPSoC’s ARM CPU), which can perform any remaining scalar operations with negligible overhead. The division of labor ensures that all heavy repetitive computations (requiring per-sample processing or data accumulation) are handled in the FPGA fabric. In contrast, the processor only needs to combine the already-aggregated results. This approach drastically lowers the CPU load compared to a traditional solution and minimizes the latency between measurement reception and availability of higher-level PQ metrics.
A modular design approach is adopted to extend the original SMV decoder with the PQ calculation blocks, carefully managing hardware resources to meet performance targets. Each analytical function (Frequency Estimator, RMS calculator, etc.) is implemented as an independent module that interfaces with the SMV stream decoder via well-defined data conduits. This modularity preserves the throughput of the decoding pipeline, as new functions can be enabled or disabled per configuration without altering the critical path. FPGA resource utilization and timing have been optimized so that the enhancements do not compromise the 5 μs end-to-end latency goal or the ability to handle a large number of streams simultaneously. The complete subsystem is implemented and tested on a Xilinx Zynq 7000 MPSoC platform, leveraging the parallelism and deterministic execution of FPGA logic alongside embedded processing. Experiments demonstrate that the hardware-accelerated subscriber can reliably decode SV packets and compute real-time frequency and RMS values for each channel within a few microseconds, even under high throughput conditions. Preliminary evaluations also indicate that the hardware results closely match offline calculations (e.g., Octave/MATLAB) for a variety of test waveforms, confirming the accuracy of the approach.
This paper is structured as follows:
Section 2 reviews related work in more detail and recaps the previous SMV decoding architecture that our design builds upon.
Section 3 describes the principles of the power quality algorithms (frequency estimation, RMS, active power) and outlines their straightforward implementation in hardware.
Section 4 presents the detailed design of the integrated subscriber, highlighting the interaction between components, data flow, and resource utilization. This section also includes a discussion of processing latency for the augmented pipeline and an analysis of how the added functions impact system performance.
Section 5 reports evaluation results, including the accuracy of the hardware computations, error analysis against reference computations, and a comparison between the proposed system’s outputs and those of a conventional approach. Finally,
Section 6 offers conclusions and discusses directions for future work, emphasizing the potential of embedded power quality monitoring in FPGA-based smart grid IEDs.
3. Basic Energy Quality Parameters
In electrical engineering, especially in the context of power systems and energy quality monitoring, basic energy quality parameters are the fundamental metrics that describe the condition and efficiency of voltage and current waveforms. These parameters are crucial for identifying issues such as inefficiencies, equipment stress, and power delivery problems. Particularly within the domain of power systems and substation automation, basic energy quality parameters refer to a set of core electrical measurements that describe the stability and efficiency of voltage and current waveforms over time. These parameters include RMS values, frequency, active and reactive power, power factor, and waveform shape descriptors such as crest and form factors. They are essential for assessing whether power delivery meets operational and regulatory standards, and are critical for real-time monitoring, protection, and control functions in smart grids. Monitoring these parameters enables early detection of abnormalities such as voltage sags, frequency deviations, harmonic distortion, and phase unbalance, all of which can lead to equipment malfunctions, reduced energy efficiency, or even system-wide disturbances. Accurate and timely calculation of these metrics is therefore vital for ensuring power system reliability and optimizing asset performance in increasingly dynamic electrical networks. According to the IEEE Standard 1159-2019 [
9], these parameters form the cornerstone of effective power quality assessment, providing both utilities and industrial users with actionable insights into the performance and integrity of their electrical infrastructure.
Table 2 presents the core energy quality metrics that have been incorporated into the system architecture and shows the distinction between the metrics handled in hardware and those that are suitable for software processing. Parameters which are not bolded are computed entirely within a dedicated FPGA module named Energy Quality Accumulator (EQA) Engine, illustrated in
Figure 1. This hardware block is tightly integrated into the SMV decoding pipeline, with direct access to the filtered and validated data stream. By performing these calculations at the hardware level, the system achieves real-time response with minimal latency and avoids the performance penalties typically associated with repeated memory access in CPU-based architectures.
The parameters marked with bold symbols represent values that are either derived or refined in software, based on intermediate results provided by the hardware. This hybrid processing approach balances the computational load between the programmable logic and software, optimizing resource utilization while maintaining flexibility for future expansion or algorithm refinement.
A common requirement shared by most energy quality parameters and intermediate values—such as RMS, average, and active power—is the need to process a defined number of consecutive voltage and current samples. These calculations often involve simple mathematical operations such as accumulation, squaring, or averaging over a fixed window. In conventional CPU-based systems, such operations typically rely on repeated memory transactions, introducing latency and consuming valuable bandwidth, especially under high-throughput conditions. The in-development SMV Subscriber prototype offers a strategic advantage by enabling the direct integration of these computations into the data processing pipeline. By calculating key energy quality metrics in hardware immediately after decoding the last sample from SMV data, the system can deliver pre-processed values without additional memory access or CPU intervention. This significantly reduces data movement overhead, minimizes processing delays, and ensures that performance remains deterministic, which is an essential requirement for time-sensitive substation automation and protection applications.
Figure 2 illustrates a simplified version of the EQA Engine block and its contents. The main component of this major block is represented by the SMV Accumulator, which has the role of organizing the intermediate data required for parameter calculation and sending it to the DMA module. Each data stream to which the prototype subscribes has a dedicated memory space in this system’s dynamic memory, used to store only the parameters and intermediate values obtained using data accumulation. We will refer to this memory space as stream-accumulated metadata (SAM). The SAM contents will be presented at the end of this section, after the data storage requirement for each algorithm is showcased.
An initial implementation of the frequency estimation algorithm based on zero-crossing detection was analyzed as a potential solution due to its conceptual simplicity and minimal hardware requirements. However, the resulting frequency precision proved to be insufficient for the accuracy standards required in substation automation and power quality monitoring applications. While various enhancement techniques—such as signal interpolation or digital filtering—could improve measurement accuracy, these approaches introduce additional computational overhead and latency. Given the time-critical nature of protection functions, the increased delay rendered this method unattractive for real-time deployment. Consequently, a more robust and low-latency alternative was sought.
Zero-crossing detection is widely utilized for frequency measurement due to its simplicity and ease of implementation. Fundamentally, frequency can be estimated by identifying points where a signal crosses the zero-voltage threshold and calculating the time interval between consecutive crossings. In hardware implementations, this is commonly achieved using analogue voltage comparators synchronized with a reference threshold. In digital systems, the same function is typically realized by examining the sign changes between consecutive samples. By counting the number of samples between zero-crossings and referencing the known sampling rate, the signal period—and thus the frequency—can be determined [
9,
10].
However, a major limitation of this approach arises from the fixed number of samples per period, typically 80 or 256, as defined in IEC 61850-9-2. This discretization imposes a quantization limit on the frequency resolution, since each measurement can only resolve frequency changes in steps determined by the sampling interval. While interpolation techniques can be applied to estimate sub-sample zero-crossing points and enhance accuracy, these methods introduce additional computational latency, which may not be acceptable in time-critical protection systems. As presented in
Table 3, the minimum frequency variation per sample (Δf) is dependent on the Least Significant Bit (LSB) of the sample counter, which in turn is directly dependent on the maximum number of samples used to quantize one period of the signal (N).
Another significant drawback is the susceptibility to false zero-crossings caused by noise, harmonics, or transient signal distortions. In digital systems, such spurious crossings may lead to incorrect frequency estimates. Although digital filtering can mitigate this issue by smoothing the input waveform, achieving sufficient attenuation of high-frequency components requires a filter with a relatively high number of coefficients. This, in turn, increases the processing delay and can easily undermine the responsiveness of the system in fast protection applications. Therefore, while zero-crossing detection is efficient, its limitations in terms of resolution and noise resilience make it less ideal for meeting low-latency measurement requirements.
The DFT is a fundamental mathematical tool used to analyze the frequency content of discrete signals. By transforming a signal from the time domain to the frequency domain, the DFT enables us to identify the individual frequency components that comprise a complex signal. The DFT converts a finite sequence of equally spaced samples into a set of complex numbers, each representing a specific frequency component [
11,
12]. Its computationally efficient implementation, known as the FFT, makes frequency analysis practical for real-time and large-scale data processing. The general formula of the DFT can be written as in the equation below:
Parameter k corresponds to the target frequency bin, and N is the number of samples in the analysis window. A conventional DFT or FFT calculation is performed for the entire frequency spectrum, yielding several frequency bins equal to the number of samples. The frequency resolution can be obtained using the following formula:
The fs parameter corresponds to the sampling frequency value, and N corresponds to the number of samples used to define the signal window.
Table 4 presents the frequency resolution calculated for the standardized sampling frequency values with respect to the number of signal samples used to define the calculation window.
As can be observed from
Table 4, the DFT algorithm needs at least 10 periods of the signal to obtain a frequency resolution of 5 Hz. For certain applications, it may be sufficient to calculate just the fundamental frequency bin, and setting the signal window to a single signal period would not impose any disadvantage. However, for calculating different harmonic content, the signal window needs to be extended to obtain a finer frequency resolution. One important aspect to be noted is that increasing the calculation window will implicitly increase the calculation effort required and the total latency of the result. It is implied that if we set the signal window to 4 signal periods, the result will be available after 4 signal periods have been fed to the algorithm.
Table 5 presents the variation between the calculation effort and the number of samples required to obtain the results for the different popular DFT methods. The number of samples is inherited from
Table 4. Each value below the different DFT columns represents the number of complex multiplications required.
The calculation effort for a conventional DFT algorithm can be easily estimated as N2, representing the number of complex multiplications required. Other options to calculate the DFT are using the Radix-2 (R2) or Radix-4 (R4) simplifications of the algorithm, which reduce the calculation effort to N * log2 (N), respectively N * log4 (N). However, the difference can be easily observed in comparison with conventional DFT; the most notable disadvantage is the limitation of the sample window. Both Fast Fourier Transform (FFT) algorithms can be calculated for all the sample numbers using mathematical tricks, such as zero-padding; however, the resulting frequency resolution will differ significantly for the values presented in
Table 4. Another disadvantage shared by the fast methods is the requirement to have the entire window of data samples available at once; thus, the algorithm cannot progress until this requirement is fulfilled. The most notable advantages of the single-bin DFT (SbDFT) algorithms are the ability to store a single intermediate value throughout the calculation steps and the capability to calculate the next intermediate value as soon as a new sample becomes available. Even if several frequency bins are calculated simultaneously, this method offers significantly better latency, as it can progress with each new available signal sample from the target window. The SbDFT algorithm used by the prototype offers significantly higher accuracy than zero-crossing detection, providing more precise data about the signal’s phase evolution at a specific frequency, which makes it less sensitive to noise and waveform distortion. Unlike zero-crossing methods, SbDFT provides sub-sample resolution without requiring heavy filtering or interpolation of the signal samples. An additional advantage of this method is its compatibility with accumulated sample data, aligning seamlessly with the design of the EQA Engine. This stands in contrast to conventional full-spectrum DFT implementations, which typically require access to the entire raw sample buffer. A simplified conceptual overview of the algorithm is presented in
Figure 3, which illustrates the main steps and the required data flow.
The data samples and sample rate inputs are transferred from the Filtered Data Bus to the Accumulated Data Bus (see
Figure 1). The Accumulated Data Bus serves as a shared interface for all modules within the EQA Engine, including the Frequency Estimator, which contains multiple single-bin Estimator instances configured for specific frequency bins. For each received sample, the Counter 2 Angle sub-block represents the translation logic between the current sample index and the input angle for the CODRIC module. This module plays a crucial role in the presented algorithm, as it can be used to generate sine and cosine values without requiring any additional data, except for the input angle. There are several variants available for implementing this module, but in our case, a pipelined approach is the best solution, as it can also be used for future configuration and arrangement of the SMV protocol.
As mentioned earlier, the algorithm requires accumulated data, so the Current Sample Index and the Old Accumulated Data will be added to the contents of the SAM elements. On each iteration, the incoming data samples will be multiplied by the corresponding complex coefficient, then added to the Old Accumulated Data received. The SAM elements have a dedicated field for the Current Sample Index and Old Accumulated Data, as well as for the final result, which will be written when the algorithm completion is flagged. Compared to a conventional full-spectrum DFT approach, the data storage requirements for this option are minimal, and the required twiddle factors for calculations are generated on the spot after the stream data are available at the Frequency Estimator’s inputs. In terms of memory space in the SAM elements, this module will require 4x 40-bit fields for the accumulated data and the final result, and an 8-bit field to keep track of the Current Sample Index. These values are set to facilitate the algorithm’s compatibility with sample rates of 80, respectively 256 samples per second, and the final result is always reported after a full signal period has been accumulated. The Multiply–Accumulate (MACC) logic is a common component for all of the algorithms encapsulated into the EQA Engine, and its main purpose is to calculate the sample summation while not having access to all the required samples at once. The intermediate result for a data stream can be stored for the current sequence and accessed when a new sample of the same stream is received again by the IED. For each algorithm, the MACC has different configuration for the input/output size and the type of the mathematical operations—e.g., the SbDFT requires a MACC that operates with complex numbers, while the other MACC components operate only with integers.
For the RMS estimation algorithm, a similar approach is presented in
Figure 4, where the incoming data samples are synchronized with the accumulated data and a sample tracker.
The RMS Estimator module hosts multiple instances of the RMS calculation algorithm, enabling simultaneous processing of data streams containing measurements for all three phases. Unlike fixed-size implementations, the maximum number of samples used in the estimation is user-configurable and must be a power of 2. This requirement simplifies and accelerates the final division step by enabling the use of a right-shift operation through a dedicated RShift component, which efficiently replaces division when the sample count is a perfect power of 2.
Although RMS estimation is computationally simpler than complex-domain algorithms—since it involves only real values—the primary source of latency in this module is the SQRT component, which consumes a significant number of clock cycles to compute the square root of the accumulated average value. The maximum number of samples allowed primarily influences the memory space requirements of the RMS Estimator. For each incoming measurement, the initial requirement for storing a single squared sample is 64 bits, in line with the IEC 61850-9-2 standard’s maximum data representation. However, this is only sufficient for storing one squared value.
To manage the accumulation over a potentially large number of samples, the Current Sample Index is defined as a 32-bit value, allowing up to 232 samples to be included in the RMS calculation. To accommodate the resulting growth in bit width during accumulation, the accumulated data memory space is expanded to 96 bits, ensuring that no overflow occurs even at the maximum configured sample count. For the final result quantization, the number of required bits is significantly less than the number of bits required by the accumulation metadata. This happens because the division by the number of samples reduces the quotient size to the same number of bits as the representation of the squared samples, which is 64, representing the input to the SQRT component. The final result of the RMS value is represented on 32 bits, as the SQRT component’s output size is half of its input size.
Active power (P) estimation is slightly simpler than the previous algorithms, primarily because it does not require a square root calculation. This also improves the overall algorithm latency, as the square root algorithm is more time consuming compared to other mathematical operations.
Figure 5 presents the concept of the algorithm.
In comparison with previous algorithms, a key difference for active power estimation is that two data streams are required to initiate the calculation. As the standard stipulates, one SMV data packet contains data streams for currents and voltages. The algorithm is not calculating a result for each data stream, as in the previous cases presented, but is calculating a result corresponding to two distinct data streams. The maximum number of samples used to compute the active power is shared with the RMS Estimator, so both modules’ results will be available almost simultaneously. As explained in the previous case, having several samples equal to a perfect power of 2 simplifies the division step. In terms of memory space requirement, the intermediary accumulated data will require a total of 96 bits for proper representation: 64 bits resulting from multiplying one voltage value by one current value, and an extra 32 bits to accommodate the summation for the maximum possible number of samples. The final result will be reduced to 64 bits, as the added 32 bits are eliminated with the division by the number of samples, which is performed by right-shifting the accumulated data. The apparent power (S) and the power factor (PF) metrics can be easily calculated on the software side after all the required parameters are available to the processor. In the case of S, even though a simple multiplication between RMS voltage and RMS current is required, the parameter is not accommodated in the EQA Engine as it requires additional memory space, which must be kept to a minimum size for each stream. As for the case of PF, calculating the division between P and S requires a minimum level of accuracy to obtain a relevant result. The division by a non-constant value can be a challenge in the FPGA, even when using dedicated digital signal processors. The easiest way to achieve a high level of accuracy for this division without overcomplicating the FPGA design is to send the required parameters to the CPU and perform the division at the software level.
The last three parameters are calculated over a fixed number of samples, more specifically, a full signal period. The RMS value over a signal period (U
RMST) could not be covered by reusing the RMS approach discussed earlier. Even though we can fix the sample counter threshold to one of the standard values, the main incompatibility problem resides in the fact that the number 80 is not a power of 2, and the division by the number of samples cannot be performed with binary shifting.
Figure 6 presents the RMS estimator concept adapted to cover this obstacle.
Apart from a fixed number of samples received from the Filtered Data Bus, the accumulated result is sent to the RShift or MultShift component, depending on the data stream sample rate. Although counterintuitive, a better alternative to performing division by 80 is to multiply the accumulated value by 1/80. From a mathematical point of view, it is a completely equivalent operation, but from a digital design perspective, it is equivalent to multiplication by a constant, rather than division. From the perspective of memory space requirements in the SAM elements, the accumulated data are represented on 72 bits to cover the maximum sample rate value. For streams with 256 samples per period, the division by the number of samples can be performed as an arithmetic shift. For streams with 80 samples per period, a 16-bit constant is defined to represent the quotient for the 1/80 result. Since this multiplication will always result in an output value that is lower than the input value, the resulting 16 LSBs can be discarded from the start. By doing this, we can guarantee that the first 8 MSBs of the result will always be zero and can be discarded, resulting in a final value of 64 bits required to quantize the input value of the SQRT component.
For the final two energy quality parameters—average rectified value (U
ARVT) and peak absolute value (U
peakT)—the conceptual implementation approach is presented in
Figure 7.
Since both algorithms require the absolute value of the signal samples, they have been incorporated into the same module. The peak detection is performed by saving the absolute value of the last received sample and comparing it with the absolute value of the current sample. The accumulation of the absolute value is performed on a full signal period, and the average value calculation requires division by the number of accumulated samples. Therefore, the same approach is adopted to handle the division, as shown in
Figure 6. For the ARV Estimator, the latency value is significantly smaller than that of any of the RMS Estimators, as it does not require a square root operation. In terms of memory space required for the SAM elements, first, the absolute value of the last identified peak value must be stored, resulting in 32 bits being required for this part. The accumulation can be easily stored in a 40-bit memory space to cover the intermediate value for the maximum sample rate, which is 256 samples per period. After the division is performed, one way or another, the final result of the ARV is quantized on 32 bits.
Table 6 provides a detailed overview of the memory requirements for each independent data stream. Each SAM element is responsible for storing intermediate values needed to derive various energy quality parameters, as well as the final results. To optimize memory transaction efficiency, it is essential to understand the memory footprint of each parameter. This table summarizes the data fields stored per parameter, the bit widths allocated to each field, and the total memory consumed per stream, providing a clear reference for evaluating and comparing the resource demands of individual application data compatible with IEC 61850-9-2-LE.
The RMS and active power parameters share a single index tracking configurable counter. The SbDFT sample counter can be set to 4096 samples, allowing the calculation of frequency bins for up to 10 signal periods at the highest standard sampling frequency presented in
Table 3. The remaining three parameters utilize a shared 8-bit counter to perform calculations on a signal period of up to 256 samples. Frequency and coarse harmonic content can be calculated for the measured data streams using the seven user-defined frequency bins. If the ASDU contains a full 3-phase measurement, there are seven frequency bins available for calculation for each phase. This feature enables the calculation of the fundamental frequency component, along with the other two harmonics [
13], allowing the user to split the target harmonic components between the measured data streams. Although not the most accurate or efficient method, the approach focuses on obtaining the lowest possible latency value for calculating certain critical parameters, which is much quicker than conventional software implementations that depend on receiving most or all of the data samples before starting the calculation steps.
As presented in
Table 6, the total number of bits required for a single data stream is the sum of the accumulated data size for each parameter, the final results size, and 52 bits for the index counters. Considering that the active power memory requirements are split between two streams, the final requirement to store a stream’s metadata and results amounts to 1732 bits, or 216.5 bytes, without accounting for the index counters. The maximum number of streams contained in an application-specific data unit (ASDU) is 8, representing four voltage measurement data streams (3 + N) and four current measurement data streams. For all eight streams, a 32-bit configuration register is stored in each SAM element to set the sample threshold for RMS and active power calculations. Each SAM element encapsulates the entire metadata required for all the data streams of a single ASDU, totaling 13,856 bits, or 1732 bytes. The absolute maximum values presented are calculated for the scenario where the ASDUs are built in the most inefficient mode, containing a single sample per stream and eight distinct streams, and the SbDFT calculation is enabled for the maximum number of bins for all of the voltage streams. Memory dimensions like the one presented are not suitable to map into the programmable logic’s block memory. For our target of 512 ASDUs actively decoded, the block memory requirement reaches ~139.5 blocks to process the data, and the target FPGA has a total of 140 blocks. Even if we reduce the number of pre-processed ASDUs, the memory resource requirement will still be incompatible with the hardware platform, resulting in very high routing congestion and affecting the overall system timing. This can make design synthesis and implementation very difficult, and a significant amount of FPGA resources will be used only for routing, which is undesirable.
All of the SAM elements are defined in the system’s dynamic memory, which is shared with the embedded ARM cores. Using the dedicated DMA module, the FPGA logic can have access to the system’s dynamic memory without any interaction from the system’s microprocessor. The SMV Scatter–Gather (SSG) module, as presented in
Figure 2, has the role of scattering the received accumulated data from the DMA module to the rest of the EQA Engine’s components. Every component used to calculate an energy quality parameter has the output ports connected to the SSG module. As the DMA has the read channel logic working independently of the write channel logic, the SSG module is built similarly, as the scatter logic is synchronized with the DMA’s read channel, and the gather logic is synchronized with the DMA’s write channel.
4. Implementation Results
This section presents the components of the EQA Engine, as introduced by
Figure 2, and highlights the data formats of the algorithms, along with their data flow and dynamic memory interaction. A high-level diagram of the implemented design is also presented to illustrate the configuration in relation to the SoC components. The latency of each presented hardware component is also discussed, from the point of feeding the raw data samples to the algorithms, to the points of obtaining the intermediate and final results. In the last subsection of this chapter, a resource report is presented to showcase the most intensive configuration of the subsystem, with the decoding part set for the maximum count of svIDs.
4.1. Data Flow Analysis of the Implemented Modules
This subsection presents each significant module encapsulated by the EQA Engine and its role in the data ordering and processing, as it can be found in
Figure 8. Some of the modules, such as the register access and DMA components, are not detailed in this section because they represent semi-generic components with no role in the actual data processing steps.
The SSG module represents the first key component of the EQA Engine. The primary function of this module is to synchronize the incoming data from the Filtered Data Bus with the data received from dynamic memory, then distribute it across the other components of the EQA Engine. Note that the diagram is not covering all the connections between the components, as many synchronization and monitoring flags are not yet in their final stage.
All of the arithmetic modules use accumulation components to calculate the target parameters. The RMS Estimator and Active Power Estimators can be configured on a large number of points (up to 2
31), and they use similar accumulation components that are inferring DSP units into the RTL design. The right-shift component covers the division of the accumulated value by the number of samples. These components also add a rounding constant before performing the arithmetic right shift and obtain a rounding to the nearest integer value [
14]. For these two modules, the right-shift parameters are chosen by using the internal SMV ID, which provides the user-configured window length. In the current implementation, these modules are only working with sample windows which are perfect powers of 2, as the integration of other window dimensions requires more development and validation work. The SQRT components represent one important architecture difference between these two modules. The best solution for this component’s functionality is represented by a non-restorative and fully pipelined algorithm [
15], which uses a fixed number of subtractions and additions to calculate the result. The sample counters for these modules are handled in the SSG module, which uses two synchronization signals to indicate the beginning and the end of a sample window.
The Frequency Estimator ramps up the development complexity as it requires several intermediate steps to prepare the data for the accumulation step. The implementation version of the SbDFT stands out as it does not use any pre-coded sine and cosine values. An Angle Translator component is used to calculate all the required angles with the steps defined by the user for the desired frequency bins, as presented in Equation (3).
F represents the target frequency bin, and N represents the DFT sample window. Parameter n from the DFT generalized formula is equal to 1 for calculating the angle step. The Step value is multiplied by the current index of each incoming measurement data sample to obtain the final angle required to process the current element. In the current state of development, the user can set up to seven distinct Step values, corresponding to seven target frequency bins. A fully pipelined CORDIC module [
16] is used to calculate the values of the trigonometric functions using a fixed number of additions, subtractions, and arithmetic right shifts. The number of iterations is determined by the desired output quantization, which in this scenario is 18 bits for sine and cosine. If the module performs more than 18 iterations, the error correction is saturated, and the final result will not get any additional accuracy. After the cosine (for real part) and sine (for imaginary part) are generated by the Angle Translator for the current measured sample, a complex multiplication is performed and the result is normalized (because the sine and cosine are sub unitary), then accumulated. The complex multiplication is also built using DSP units, using guidelines from Xilinx datasheets [
17], and the same guidelines are used to build the MACC units from the RMS, RMS
T, and Active Power Estimator modules.
Although very similar to the RMS Estimator module, the main difference of the RMST Estimator module resides in the ability to operate with sample windows which are not a perfect power of 2. This feature is used to perform a division using multiplication and a bit shift instead. Implementation for the RMS and Active Power Estimator modules has its technical challenges, as the algorithm requires considerable FPGA resources for the extended data formats of 96 bits.
The last module presented is the ARV Estimator, which is essentially a version of the RMST Estimator with disabled features. Instead of squaring the incoming measured samples, the module will just calculate the absolute value, and the final result is not passed through a SQRT block. Because the absolute value is already calculated in this module, it is also stored and compared to the next incoming sample, to keep track of a peak value over a signal period.
4.2. Hardware Implementation Overview
The HS3 prototype, originally designed for high-speed, deterministic decoding of Sampled Measured Values (SMV), has been architecturally expanded to accommodate real-time energy quality analytics via the Energy Quality Accumulator (EQA) Engine. This enhancement required substantial adjustments to the system’s hardware infrastructure, particularly in memory management, synchronization logic, and inter-module communication. At the architectural level, the addition of a second DMA interface to the dynamic memory controller ensures that the EQA Engine and the SMV decoding pipeline can operate independently, reducing memory contention and ensuring deterministic throughput. This separation is crucial when scaling to higher SMV stream densities, especially when each ASDU may include up to eight distinct data streams. The updated infrastructure, shown in
Figure 9, accommodates this by enabling dual-port access and higher bandwidth utilization without increasing AXI interconnect congestion.
As in the previous version of the project, the native data width of the Processing System AXI port is 64 bits. The AXI Interconnect 0 block has two 32-bit input ports. In earlier configurations, it was not possible to saturate the full data bandwidth due to the underutilized port width. However, with the latest enhancements to HS3, bandwidth saturation may occur if memory transactions are not properly synchronized. The newly added features of HS3 operate safely on a 200 MHz clock domain and can be adapted to higher clock speeds when deployed on more advanced FPGA architectures.
Until extensive field testing is conducted, the prototype continues to rely on a dedicated Data Generator module, which stores raw test data in local block memory. This method is particularly useful as it allows data to be delivered at high rates—often exceeding those observed in real-world scenarios—allowing future module testing under worst-case timing conditions.
Integration with the existing HS3 decoder is seamless. The energy quality results are handled along with the original decoded SMV data blocks via a dedicated DMA controller, ensuring that both decoded measurements and analytics results are correctly synchronized. The separation of the DMA ports for decoding and processing can enable features such as parallel memory access, at the cost of an additional high-performance memory port. This co-processing approach allows higher-level applications running on the embedded ARM cores to access pre-processed quality metrics alongside raw decoded data, simplifying post-processing or fault classification tasks. In terms of platform compatibility, the system is mapped to an MPSoC device, leveraging the parallelism of the FPGA fabric alongside ARM cores for supervisory and configuration functions. The prototype’s RTL architecture is portable to other FPGA or SoC platforms supporting AXI and DMA interfaces, making the architecture a strong candidate for future deployment on embedded substation devices. To achieve optimal performance and minimize LUT/FF requirements, the target architecture must also have DSP components available. Even though the multipliers, which are dependent on these embedded components, are optimized for the DSP48E2 [
18], adaptation to other DSP models can be performed directly in the multiplier modules without affecting the rest of the design.
In terms of modularity, the subscriber prototype can be deployed only as a stream decoder, and the EQA Engine can be partially deployed to support the specific features required by the application. This feature enables the conservation of FPGA resources. It reduces memory transactions, but it also requires future work to develop predefined configuration options for faster deployment and ensure that each feature can operate independently of the others. Some of the modules also require future refinement, specifically to extract the shared components into independent modules, thereby removing the interdependency of the parameter estimators.
As presented in the resource reports, it can be observed that multiple instances of the modules can be used on the same hardware, provided the physical infrastructure allows it (e.g., an additional Ethernet port for a second process bus connection). The prototype also supports native data output, enabling the integration of powerful features that the hardware platform can offer in specific scenarios, depending on the physical components accommodated within the SoC’s printed circuit board.
4.3. Latency Analysis
In contrast to the decoding section of the prototype, the main sources of latency in the EQA Engine are less dependent on the maximum number of subscribed streams, and more influenced by ASDU formatting, the number of samples per ASDU, the number of frequency bins configured for calculation, and the sample rate of the measured data.
Table 7 breaks down the latency components associated with each parameter calculation module in the EQA Engine.
Each module follows three iteration modes:
Initial iteration, where no accumulated data are expected and processing begins immediately;
Nominal iteration, which depends on accumulated data from the previous iteration;
Final iteration, which also depends on prior accumulations but includes additional computations to produce the final result.
In all modules, the first computational stage has zero latency relative to input during the initial iteration, followed by SSG read operations in both nominal and final iterations.
For the RMST and ARV Estimators, the double blue line separates the final iteration latency for standardized sample rates of 80 and 256 samples/period. As mentioned in the previous section, non-power-of-two sampling rates result in slower performance due to the higher delay required to perform the division steps.
As observed, the Frequency Estimator module is the only case where the latency of a nominal iteration is equal to the latency of the final one. This occurs because no additional mathematical operations are performed when the final iteration is reached. Another aspect to be observed is that latency is independent of the sample window, as the latency values are calculated based on the last SMV raw sample received by the EQA Engine.
Another important component in determining the maximum latency of a specific configuration is the SSG memory transactions and data alignment. The main role of this module is to deliver the required data to the modules presented in
Table 7. Data transactions performed by this module are presented as follows:
In the formulas presented above, SSG
Read represents the number of clock cycles required to perform a memory read operation in order to retrieve the complete temporary accumulated data associated with each ASDU. These values are derived from
Table 6, which details the memory space allocation for both intermediate accumulation and final result storage, collectively referred to as a single stream-accumulated metadata (SAM) structure. The added delay components—D
RD, D
WR, and D
Resp—represent the average memory transaction initialization delays and write confirmations. Depending on the AXI infrastructure utilization, their range can vary between 4 and 15 clock cycles, and the highest value will be considered for calculation. Because the SSG
Read operation starts at the same time as the data alignment, we can consider only this value for latency analysis.
Similarly, the number of cycles required to complete a memory write operation for storing the final results of each ASDU is denoted as SSGWrite2. This value constitutes the final component in the latency analysis, representing the cost of outputting the calculated energy quality parameters back into the dynamic memory.
To obtain the maximum latency of the energy quality parameters, we can consider the following configuration:
Sample Size = 32 bits
Subscribed ASDUs = 512
S = 8 streams/ASDU
F = 7 frequency bins
N = 8 samples/ASDU
The T
SMV Filter component represents the total decoding delay required by the SMV Filter module to deliver data to the dedicated DMA from the moment it receives it at the physical interface. To obtain this value, we can refer to the previous work [
4] and calculate it by subtracting the T
prep and T
write from the Total CLK Core Cycles value, resulting in a total of 2.266 μs. This value represents the delay between the data arrival at the physical interface and the data arrival at the EQA Engine’s input. The maximum Delay to Result Ready from the energy quality estimator modules is represented by the RMST Estimator with a value of 112 clock cycles for the streams sampled at 80 samples/period. For streams with a sample rate of 256 samples per period, the greatest value is represented by the Frequency Estimator—89 clock cycles. Taking into consideration that the clock period is 5 ns, we can substitute the values in (8) and calculate the worst-case latency as follows:
After performing the calculation, we obtain a total worst-case scenario latency value of 5.171 μs. Although very close to dropping below the proposed 5 μs threshold, the main latency components are represented by the SQRT components and the Frequency Estimator module. There are good chances of reaching the proposed initial goal after several optimizations are performed. Until the optimization step, extensive validation must be performed, as a considerable number of behavioral problems may appear when simulating a high volume of process bus data traffic and corner cases. The mitigation of these problems might require additional changes in the design data ordering and the insertion of additional data buffers to align the data flow properly. All of these actions will have an impact on the final system latency and resource requirement.
4.4. Resource Reports
The integration of energy quality monitoring functions within the HS3 prototype adds significant complexity to the overall hardware resource footprint, particularly due to the introduction of the Energy Quality Accumulator (EQA) Engine and its associated computation modules. Unlike the baseline SMV decoding pipeline, which primarily relies on basic sequential logic, finite state machines, and straightforward packet filtering, the EQA Engine introduces multiple mathematically intensive blocks that require careful balancing between logic, memory, and dedicated arithmetic resources such as DSP slices.
Table 8 presents the reported post-implementation resource requirements for the EQA Engine, the dedicated DMA module, and the maximum resource count of the decoding pipeline, highlighting the overall usage of the FPGA when all subscriber features are deployed.
The current development state of the SMV Decoder supports a maximum svID density of 512 IDs, which can correspond to up to 4096 distinct data streams, depending on the user configuration. Practical implementations of the SMV protocol often use fewer than 8 streams per ASDU (in contrast to the standard LE specification). In these cases, the data processing blocks are less constrained by latency, as the number of streams per ASDU is reduced for data transfer efficiency. Scalability to 1024 svIDs and beyond is considered for future development stages, but will require substantial synchronization work between the Data Decoding section and the Data Processing section, as presented in Figure 1.
Among the most resource-demanding components are the RMS Estimators. Their reliance on pipelined square root (SQRT) units, wide accumulation logic, and wide input multipliers directly impacts the FPGA's available LUTs, flip-flops, and DSP blocks.
The Active Power Estimator adds further DSP utilization, as it performs continuous point-wise multiplication of synchronized voltage and current streams. While its architecture is slightly simpler than that of the RMS Estimators (there is no SQRT stage), it still consumes an equivalent number of DSP blocks and comparable logic resources for its Multiply–Accumulate operations. Its tight synchronization with the RMS Estimator allows the design to reuse certain configuration and control resources, reducing redundant logic at the cost of tighter coupling between the EQA Engine's modules.
The Frequency Estimator stands out due to its extensive use of the pipelined CORDIC engine for generating sine and cosine functions. This approach was chosen to minimize the need for storing large lookup tables for trigonometric coefficients. The CORDIC implementation completely removes the need for BRAM usage at the cost of a considerable number of logic elements.
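To illustrate this trade-off, a rotation-mode CORDIC needs only shifts, adds, and a small arctangent constant table instead of a stored sine/cosine lookup table. The following is a generic floating-point sketch of the algorithm (the iteration count of 16 is illustrative), not the fixed-point RTL:

    import math

    def cordic_sin_cos(theta, iters=16):
        """Rotation-mode CORDIC: computes (cos, sin) with shifts, adds, and a
        small arctangent table, instead of a stored trigonometric lookup table."""
        angles = [math.atan(2.0 ** -i) for i in range(iters)]
        gain = 1.0
        for i in range(iters):
            gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))   # pre-computed in hardware
        x, y, z = gain, 0.0, theta
        for i, a in enumerate(angles):
            d = 1.0 if z >= 0 else -1.0                # rotate toward z = 0
            x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
            z -= d * a
        return x, y                                    # valid for |theta| < ~1.74 rad

    print(cordic_sin_cos(math.pi / 6))                 # ~ (0.8660, 0.5000)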
A significant factor influencing resource usage is the extensive reliance on dynamic memory for storing intermediate results and stream-accumulated metadata. For each ASDU processed, the SAM holds accumulation registers, current sample counters, final result buffers, and configuration values. Depending on the enabled algorithms, the metadata footprint can reach up to 1.732 kB per ASDU. With the maximum configuration of 512 concurrently decoded ASDUs, this quickly scales to almost 900 kB of dynamic memory when fully populated. Due to this large footprint, it is infeasible to rely solely on on-chip BRAM for metadata storage. Instead, the design utilizes the MPSoC's shared DDR memory, accessed through a dedicated DMA channel. This keeps the programmable logic fabric flexible and scalable, while the ARM cores can perform supervisory tasks without blocking the memory bus. However, this design places extra emphasis on ensuring that the AXI interconnect can handle high-bandwidth, low-latency transactions for both decoding and energy quality operations in parallel.
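A quick back-of-envelope check of this footprint against the device's on-chip capacity (the ~4.9 Mb BRAM figure for the Zynq-7020 is quoted from the device family datasheet):

    # SAM footprint: up to 1.732 kB of metadata per ASDU, 512 concurrent ASDUs.
    SAM_BYTES_PER_ASDU = 1732
    MAX_ASDUS = 512

    total_kb = SAM_BYTES_PER_ASDU * MAX_ASDUS / 1000
    print(f"total SAM footprint: {total_kb:.1f} kB")   # ~886.8 kB

    # The Zynq-7020 provides ~4.9 Mb (~612 kB) of BRAM in total, so the
    # metadata alone exceeds on-chip storage, motivating the DDR + DMA design.
    BRAM_KB = 4.9e6 / 8 / 1000
    print(f"fits in BRAM: {total_kb < BRAM_KB}")       # False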
The design was implemented with no major routing problems and no failing timing paths, using a clock period constraint of 5 ns, corresponding to a frequency of 200 MHz. Higher frequencies are not achievable on the target hardware platform due to the limitations of its FPGA fabric; however, newer technologies should be able to run the prototype at considerably higher clock rates. While the prototype's post-implementation results show manageable resource usage for a moderate-density device like the Zynq-7020, future deployments on higher-capacity MPSoCs (such as Xilinx Zynq UltraScale+ or Intel Agilex SoCs) will unlock additional performance. These modern devices offer significantly more LUTs, higher clock speeds, abundant BRAM, and more DSP slices with advanced features such as floating-point support, which could allow certain SQRT and division modules to be improved by increasing their output precision.
4.5. Performance Evaluation
To summarize the obtained results and provide a quick performance evaluation of the current state of the presented HS3 subsystem, Table 9 presents the total latency of the subsystem and the FPGA resource requirements for different configured values of the maximum number of svIDs that can be processed in hardware. More details about the data decoding latency and resource requirements can be found in our previous work [4].
The presented values are obtained when the data samples are defined on 32 bits. The latency values represent the total delay required to decode, store, and process data packets containing a single ASDU, which is equivalent to the initial latency obtained when the packets contain multiple ASDUs. Additional memory operation cycles have been included in the calculations to cover the worst-case scenarios, i.e., those in which the memory infrastructure lacks optimization. When multiple ASDUs are received in the same data packet, the initial latency is the one given in Table 9, and the latency for each additional ASDU is determined only by the data processing latency, which does not exceed 40% of the worst-case value. The resource variation is driven mainly by the data decoding section; the data processing section is accounted for with all of its features enabled.
5. Error Analysis
The accuracy of the energy quality parameters calculated within the HS3 prototype depends primarily on the design and implementation of its hardware arithmetic units, the word lengths used in fixed-point representations, and the quantization or rounding steps embedded in each calculation stage. Unlike purely software-based implementations that can rely on double-precision floating-point operations for intermediate results, FPGA-based arithmetic must balance accuracy against available resources, propagation delay, and area constraints. Therefore, understanding the potential sources of computational error is essential for validating the system’s suitability for real-world deployment in critical substation environments.
The first significant source of error arises from the fixed-point representation of signal samples and intermediate values. For example, the RMS Estimator accumulates the squares of voltage or current samples with a 96-bit accumulator. Although this wide bit-width ensures that overflow is virtually eliminated even for high sample counts, the final output must be right-shifted to normalize the average value, then quantized to 32 bits when passed to the SQRT block. The SQRT operation itself introduces a second quantization stage. Since the implemented non-restoring SQRT algorithm produces a result with finite precision, any residual error from the intermediate division or right-shift stage can be amplified or reduced by the root operation. In typical scenarios, this error remains bounded by the quantization step.
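These two quantization stages can be modeled with a small fixed-point sketch; Python's arbitrary-precision integers stand in for the 96-bit accumulator, and the sample values are illustrative:

    import math

    def rms_fixed(samples, shift):
        """Fixed-point RMS sketch: exact wide accumulation (96-bit in hardware),
        right-shift normalization, then an integer square root; the shift and
        the root are the two quantization stages described above."""
        acc = 0
        for s in samples:
            acc += s * s                 # squares accumulate without rounding
        mean_sq = acc >> shift           # stage 1: normalization quantizes
        return math.isqrt(mean_sq)       # stage 2: finite-precision root

    N = 128                              # power-of-two window: shift = log2(N) = 7
    samples = [round(30000 * math.sin(2 * math.pi * n / N)) for n in range(N)]

    hw = rms_fixed(samples, shift=7)
    sw = math.sqrt(sum(s * s for s in samples) / N)
    print(hw, f"{sw:.3f}", f"rel. error = {abs(hw - sw) / sw:.2e}")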
The RMS and Active Power Estimators share the same Multiply–Accumulate (MAC) logic for summing the squared values (for RMS) and the product of voltage and current pairs (for active power). Each multiplication uses the FPGA's embedded DSP slices, which deliver near-exact results for operand widths up to 25–35 bits. The primary numerical limitation, therefore, comes not from the multiplication itself but from the final division step. When the divisor is a perfect power of two, the hardware implements the division as a simple right shift with rounding to the nearest integer, which induces an error of at most ½ LSB of the resulting value. However, when computing parameters such as RMS_T or ARV over a fixed sample window whose length is not a power of two (e.g., 80 samples/period), the division must be implemented as a multiplication by a pre-computed constant factor (the reciprocal of the sample count). This multiplication by a fractional constant is also carried out in fixed-point arithmetic and therefore introduces additional rounding noise, typically on the order of ½ LSB.
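The reciprocal-constant division can be sketched in a few lines; the 32-bit fractional scaling below is an illustrative choice, not the exact hardware word length:

    # Division by a non-power-of-two window (N = 80) via a fixed-point
    # reciprocal constant; FRAC_BITS = 32 is an illustrative word length.
    N = 80
    FRAC_BITS = 32
    RECIP = round((1 << FRAC_BITS) / N)          # pre-computed ~ 2^32 / 80

    def divide_by_window(acc):
        """Approximate acc / N with a constant multiply, round-to-nearest
        (add half an LSB), and a right shift."""
        return (acc * RECIP + (1 << (FRAC_BITS - 1))) >> FRAC_BITS

    acc = 123_456_789
    print(divide_by_window(acc), acc / N)        # fixed-point vs exact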
Within the Frequency Estimator, each multiplication of two quantized values adds a new rounding error. As each SbDFT bin accumulation progresses, these small rounding errors accumulate linearly with the number of samples. However, since the CORDIC implementation is pipelined and the same quantization model is used consistently across iterations, the dominant source of error is typically the static quantization rather than a dynamic noise build-up. In practical configurations with typical window lengths (80 to 512 samples), the cumulative effect is expected not to exceed 0.5% relative error for the target frequency bin.
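The per-bin accumulation can be modeled as a single-bin DFT with quantized coefficients; the 16-bit coefficient quantization below is an illustrative stand-in for the CORDIC-generated sine/cosine values:

    import cmath
    import math

    def single_bin_dft(samples, k, coeff_bits=16):
        """Accumulate one DFT bin with quantized sine/cosine coefficients,
        mimicking the per-sample rounding that builds up over the window."""
        N = len(samples)
        scale = 1 << coeff_bits
        re = im = 0
        for n, x in enumerate(samples):
            c = round(math.cos(2 * math.pi * k * n / N) * scale)  # quantized
            s = round(math.sin(2 * math.pi * k * n / N) * scale)  # twiddles
            re += x * c
            im -= x * s
        return complex(re, im) / scale           # undo coefficient scaling

    N = 80
    x = [round(1000 * math.sin(2 * math.pi * n / N)) for n in range(N)]
    ref = sum(xi * cmath.exp(-2j * math.pi * n / N) for n, xi in enumerate(x))
    print(abs(single_bin_dft(x, k=1) - ref) / abs(ref))  # well below 0.5%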
To properly illustrate the calculation errors between the double-precision results of the Octave IDE and the hardware-calculated values, the prototype has been fed with real signal captures obtained from a public GitHub project (commit ID 0d6760c) [19].
Figure 10 presents a single period of the signal for voltage and current extracted from the recorded samples for phase A.
The recorded samples have a scaling factor of 10³ for both voltage and current measurements, as this represents one of the rules defined by IEC 61850. As can be observed, the recorded signal has a nominal frequency of 60 Hz and a sample rate of 80 samples per period. Because of this, the RMS Estimator is skipped in the error analysis, as it does not yet support non-power-of-two signal periods. The RMS_T Estimator, on the other hand, can process these sample rates, and we can safely presume that its results would not differ significantly, as the two estimators have very similar architectures.
To properly highlight the difference between the calculations performed by the hardware modules and Octave’s double-precision format, the same signal samples fed into the prototype have been processed in Octave’s IDE.
Figure 11 presents the RMS_T calculations for 30 signal periods (or 0.5 s), highlighting the difference between the Octave output (blue) and the hardware output (orange).
It can be observed that the absolute error is almost proportional to the computed value and is generated entirely by the division step combined with the SQRT error. For the hardware values, the scaling is performed in Octave, as the hardware calculations never remove the scaling factor. As observed from the hardware results, the maximum relative error does not exceed 1.115 × 10⁻² [%], with a variation of 1 × 10⁻⁵ [%]. For most applications this value is not a concern, but improving it would require extensive optimization and re-design of the division and square root steps of the RMS Estimators.
Figure 12 presents the active power (P) calculations and the differences between the two calculated sets (streams).
As can be observed, the relative error between the hardware values and the software values is on the order of 7 × 10⁻¹² [%], which can safely be considered negligible for substation automation applications. The plotted hardware values almost perfectly overlap the software values. This result is obtained using sample windows of 128 samples, a perfect power of 2. We can conclude that the best precision is obtained for sample counts that are a perfect power of 2, as the division step can then be performed with minimal error.
Figure 13 presents the average rectified values for 30 periods of the extracted signal, for both voltage and current.
The maximum relative error does not exceed 2.4 × 10⁻² [%], with a variation of 1 × 10⁻⁵ [%]. Both module families share the division error, but in the case of the ARV Estimators, the division of the accumulated values by the number of samples directly produces the final result. In the case of the RMS_T Estimator, the final result is the square root of the ratio of the accumulated values to the number of samples; since, for small perturbations, the relative error of a square root is half the relative error of its argument, the maximum relative error is almost half that of the ARV Estimators.
To compare the Frequency Estimator values with their software counterparts, a full DFT over 10 periods of the signal is performed, as presented in Figure 14.
Figure 14 was obtained in Octave by using the "fftshift(fft(x))" functions and normalizing the results before plotting their absolute value. The frequency bin axis has been calculated by considering a sampling frequency of 4800 samples/s (corresponding to 80 samples per signal period), resulting in a total spectral span of 4800 Hz. Because we used 10 signal periods, according to Table 4, the resulting frequency resolution is 6 Hz, i.e., 800 frequency bins. According to the function definition, the fundamental should be found at indices 391 and 411, corresponding to the frequency bins of −60 and 60 Hz. We observed a plot-rendering artifact in Octave: if we examine the closest points around the fundamental value, the magnitude values are sub-unitary. The theoretical error analysis for the hardware module has several variables generated by the CORDIC estimation and the normalization of the intermediate results. The direct error analysis results in a maximum relative difference of 0.47% between one point of the Octave-calculated spectrum (the fundamental value) and the targeted frequency bin from the hardware results.
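The index bookkeeping can be reproduced with a short script (NumPy's fftshift/fft mirror the quoted Octave calls; the synthetic 60 Hz sine is an illustrative stand-in for the recorded phase-A capture):

    import numpy as np

    FS = 4800                  # samples/s (80 samples per 60 Hz period)
    N = 800                    # 10 signal periods -> 6 Hz bin resolution
    t = np.arange(N) / FS
    x = np.sin(2 * np.pi * 60 * t)            # stand-in for the recorded capture

    X = np.fft.fftshift(np.fft.fft(x)) / N    # shifted, normalized spectrum
    freqs = np.fft.fftshift(np.fft.fftfreq(N, d=1 / FS))

    for i in sorted(np.argsort(np.abs(X))[-2:]):
        # 0-based indices 390 and 410 = Octave's 1-based indices 391 and 411
        print(i, freqs[i], abs(X[i]))         # +/-60 Hz, magnitude 0.5 each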