Article

Relative Jitter Measurement Methodology and Comparison of Clocking Resources Jitter in Artix 7 FPGA

by Andrzej A. Wojciechowski 1,2,*, Krzysztof Marcinek 3 and Witold A. Pleskacz 1
1 Institute of Microelectronics and Optoelectronics, Warsaw University of Technology, ul. Koszykowa 75, 00-662 Warsaw, Poland
2 AAWO Andrzej Wojciechowski, 05-120 Legionowo, Poland
3 ChipCraft Sp. z o.o., 20-262 Lublin, Poland
* Author to whom correspondence should be addressed.
Electronics 2023, 12(20), 4297; https://doi.org/10.3390/electronics12204297
Submission received: 15 September 2023 / Revised: 7 October 2023 / Accepted: 13 October 2023 / Published: 17 October 2023

Abstract

Phase jitter is one of the crucial factors in modern digital electronics, determining the reliability of a design. This paper presents a novel approach to designing a jitter comparison system and methodology for FPGA chips using a Tapped Delay Line (TDL), a structure commonly used to implement a Time-to-Digital Converter (TDC). The design and its revision, in which latches replace some of the flip-flops, are presented and discussed, along with potential further improvements. The influence of temperature is verified to be minimal. The methodology of automated relative jitter measurements is discussed. Multiple FPGA clock signal path configurations are measured, and the results are presented. The influence of clock routing is identified as critical when MMCM or PLL modules are omitted. It is demonstrated that, with careful resource and routing allocation, the jitter performance of the clock signal need not deteriorate in the absence of jitter-filtering blocks. The proposed technique was implemented and verified, and the relative jitter performance was measured on the AMD/Xilinx Artix 7 35T FPGA platform.

1. Introduction

Jitter performance is an important aspect of modern digital electronics—especially microprocessors, field programmable gate arrays (FPGAs), and application-specific integrated circuit (ASIC) devices. However, only a few papers address the aspect of internal FPGA jitter specifically [1,2,3,4]. The measurement methodology of these publications involved the use of external equipment. For this reason, a different methodology was developed which utilizes only FPGA resources and allows for comparison of the influence of the jitter generated by different elements. Only an external test clock signal is required. While the presented methodology might not be as precise as other approaches, it gives a rough estimate of the jitter level and allows for a relative comparison of the influence of different FPGA resources on the phase noise.
Several jitter measurement techniques based on FPGAs have already been published. The “Follow Me” method [5] utilizes a Digital Clock Manager (DCM) module [6] to track the slow clock edge movements. However, this method is mainly applicable to low-frequency jitter measurements.
The second technique utilizes a Time-to-Digital Converter (TDC) approach [7]. In contrast to [5], this method is applicable for high-frequency jitter measurements. However, in order to minimize the interpolation error, it requires a significant amount of FPGA resources for multiple phase-shifted counters.
Another technique is based on the measurement of the probability density function (PDF) of the edge distribution over one unit interval (UI) [8]. It utilizes two identical D-type flip-flops clocked by two separated clock signals, sampling the measured signal. The number of clock cycles during which the output values of the sampling flip-flops are different from each other is counted. Combined with the phase shift of both sampling clock signals, it allows for the measurement of the edge distribution probability density function.
The aforementioned approaches focus on the absolute jitter measurement. However, in certain cases where a fixed and limited set of components is available, information about the relative jitter level obtained using different components may be sufficient. This is the case of FPGA chips with sets of resources used for clock signal propagation.
The presented jitter comparison technique combines the PDF of the edge distribution concept [8] with a tapped delay line (TDL) often used in TDC applications [7,9]. Some resource placement is manually chosen (in contrast to default automatic placement) to minimize the measurement errors between compared signals. The design is used to compare the jitter level of multiple configurations of input clock signal—including standard fabric instead of dedicated clocking resources.
This paper is organized as follows. Section 2 presents the architecture of the relative jitter measurement system. Section 3 describes the methodology of data acquisition and analysis. The reference measurements and initial concept verification are described in Section 4. The results are presented in Section 5. The paper ends with conclusions.

2. Design Architecture

A commercially available Digilent Arty A7-35T [10] development board was selected as a target device for implementation, verification, and measurements. The platform uses an AMD/Xilinx Artix 7 35T FPGA—specifically the XC7A35TICSG324-1L FPGA chip. This modern FPGA family is designed in a 28 nm process [11] and is a relevant example of currently available FPGA technologies.

2.1. Initial Concept

The relative jitter measuring system is based on the dual TDL-based phase difference detector architecture [12]. The main part of the system consists of two symmetrical TDLs implemented as carry chains in the FPGAs. The output of each tap is connected to a synchronizer consisting of two consecutive flip-flops, synchronizing the delayed signal to a system clock domain. The stored values are mutually compared using an array of XOR gates connected to the synchronizers’ outputs, as shown in Figure 1. The registered TDLs’ taps effectively create a temporary snapshot of an input signal being transmitted through the delay line. As a result, the output vector of the XOR gates is equivalent to the input signal misalignment during a given period of the system clock—assuming the transition of the input signal is captured in both TDLs in a given sample. The outputs of the XOR gates create a vector of differences between both snapshots of the captured input signals in both TDLs. The differences vector is updated every system clock cycle, thus enabling a continuous comparison of the input signals.
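As a minimal sketch of the comparison described above, the two registered TDL snapshots can be modeled as integers whose bits correspond to taps. The function name `tdl_difference` and the integer encoding are illustrative assumptions and are not part of the described hardware design.

```python
def tdl_difference(snapshot_a: int, snapshot_b: int, taps: int = 256) -> tuple[int, int]:
    """Model the XOR array: return the difference vector and its number of set bits.

    Each snapshot is an integer whose bit i holds the registered value of tap i
    (an assumed encoding, chosen only for illustration).
    """
    mask = (1 << taps) - 1
    diff = (snapshot_a ^ snapshot_b) & mask  # 1 where the two delay lines disagree
    return diff, bin(diff).count("1")


# Two 8-tap snapshots whose captured edges are one tap apart.
diff, count = tdl_difference(0b11110000, 0b11100000, taps=8)
print(f"{diff:08b}", count)
```

With perfectly aligned inputs and ideal elements, the difference vector would be all zeros in every sample.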
Tapped delay lines are often implemented in FPGAs using carry chains. This approach is common in TDC designs, e.g., in [9,13,14,15]. In contrast to other FPGA resources (e.g., lookup tables or LUTs), carry logic uses dedicated carry multiplexers (that can be used as delays) and can be cascaded to form a wider chain [16] in a Configurable Logic Block (CLB) without the use of generic switching fabric. Additionally, each carry multiplexer can be connected to a storage element (configured as a D-type flip-flop or level-sensitive latch) in the same CLB slice (in 7 Series FPGAs, each CLB element contains a pair of slices [16]). In Figure 2, a single slice block diagram is depicted, with carry logic highlighted. For these reasons, CARRY4 primitives [17], which implement the carry chain, were instantiated in the design.
The placement was manually selected so that both TDL delays match (across all taps) and the system clock source to each corresponding flip-flop delay is matched. All of the aforementioned delay requirements have been precisely matched for all four process corners reported by Xilinx Vivado (with 1 ps precision)—despite the graphically displayed differences in the device view. Until the very last interconnects to the flip-flops in the slices, the system clock signal is transmitted using the same path, as depicted in Figure 3. These results suggest that selected logic cells and their interconnects are the closest implementation of two mirrored and symmetrical TDLs that it is possible to achieve in this FPGA chip. A simplified diagram of the selected Artix 7 FPGA’s clock regions, resources, clock-capable pins, and TDL placement is depicted in Figure 4.
The symmetrical delays from the system clock source to TDL flip-flops ensure that the input signals are captured at the same time, with minimal to no impact from system clock jitter, process, supply voltage, or temperature variations.
The TDLs are connected to the transition detection circuit described in [12]. Once a transition is captured in both TDLs, the XOR-ed difference of the two TDL vectors is stored in a block RAM [18]. The memory can be read and written from a PC to which an FPGA board (in this case Digilent Arty A7-35T [10]) is connected via a USB cable. For this purpose, a combination of AXI Block RAM Controller [19] and JTAG to AXI Master [20] IP blocks has been implemented. This solution enables block memory access via a Tcl console in the Xilinx Vivado development environment. To avoid additional complexity, all submodules operate in the same system clock domain. The block diagram is depicted in Figure 5.
The maximum configurable width of the AXI bus in the JTAG to AXI Master IP block is 64 bits [20]. The dual port block RAM, with one port set to the data width of 64 bits, can be configured to the maximum data width of 256 bits on the second port [18]. For this reason, and to avoid additional complexity, the length of both TDLs was set to 256 taps. This length later proved sufficient and does not limit the measurement acquisition.
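The port widths above imply that each 256-bit BRAM word is read out as four 64-bit values. A host-side reassembly could look like the sketch below; the helper name `assemble_words` and the least-significant-part-first ordering are assumptions, since the source does not state the word order.

```python
def assemble_words(reads, parts_per_word=4, part_bits=64):
    """Join consecutive 64-bit reads into 256-bit words.

    Assumption: the first read of each group holds the least significant bits.
    """
    if len(reads) % parts_per_word:
        raise ValueError("number of reads must be a multiple of parts_per_word")
    part_mask = (1 << part_bits) - 1
    words = []
    for start in range(0, len(reads), parts_per_word):
        word = 0
        for position, part in enumerate(reads[start:start + parts_per_word]):
            word |= (part & part_mask) << (position * part_bits)
        words.append(word)
    return words


# Eight 64-bit reads produce two 256-bit words.
words = assemble_words([1, 0, 0, 0, 0, 0, 0, 1])
print([hex(w) for w in words])
```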
In parallel, the XOR-ed difference of the two TDL vectors is summed and filtered. In order to mitigate the issue of improper TDL lengths compared to the input signal frequency [12], additional logic has been added (named “false minimum removal” in Figure 5). The filter input is set to a maximum value (representing phase misalignment), if:
  • No transition is detected in both TDLs for a certain amount of time;
  • Multiple transitions are detected in the XOR-ed difference of the two TDL vectors, as depicted in Figure 6.
The stream of XOR-ed differences is filtered using a moving average filter (MA filter). The moving average filter is a common smoothing method; it is a time-domain finite impulse response filter. It is quite simple in design as well as in implementation. The most significant parameter that needs to be determined when the MA filter is used is the filter length. Compared to other types of filters, it has the sharpest step response [21].
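The filtering path described above can be sketched in software as follows. This is only an illustration: the window length, the maximum value of 256, and the names `filter_input` and `MovingAverage` are assumptions, and the detection of the two false-minimum conditions is represented by plain boolean flags rather than modeled.

```python
from collections import deque


def filter_input(diff_count, timed_out, multiple_transitions, max_value=256):
    """Apply the "false minimum removal" rule before filtering.

    The flags stand for the two conditions listed above; max_value is an
    assumed representation of full phase misalignment.
    """
    if timed_out or multiple_transitions:
        return max_value
    return diff_count


class MovingAverage:
    """Plain moving average over the most recent `length` inputs."""

    def __init__(self, length):
        self.window = deque(maxlen=length)

    def update(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)


ma = MovingAverage(length=4)  # the filter length is a hypothetical choice
for count, timed_out, multiple in [(2, False, False), (3, False, False), (0, True, False)]:
    print(ma.update(filter_input(count, timed_out, multiple)))
```

Forcing the maximum value in the flagged cases keeps a sample with no usable transition from being mistaken for a well-aligned one.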
The filtered value is redirected to the Virtual Input/Output [22] IP block along with the current block memory address. This module enables access to selected signals in the FPGA design from a PC to which the FPGA board is connected, without the need for additional ports.
The Virtual Input/Output block also controls the DRP (Dynamic Reconfiguration Port) controller module. This module is based on the Xilinx Application Note [23] and a reference design. It is responsible for configuring the Mixed-Mode Clock Manager (MMCM, an advanced PLL module with programmable phase shift) that is always placed on one of the input signal paths, as depicted in Figure 5. The MMCM is always configured so that its output frequency equals the input signal frequency. The DRP controller is used only to selectively phase shift the signal prior to the TDL input, which is why the MMCM is always present on one of the input signal paths. This is further discussed in Section 3.
The design uses minimal resources. Table 1 presents the post-implementation resource utilization. The only resource used significantly is BRAM. Block memory is used only to store samples for later readout via the JTAG to AXI Master block. Therefore, the high utilization of this resource was intentional and can be reduced at the expense of a larger number of memory readout operations.

3. Measurement Methodology

The measurement acquisition is automated using Tcl scripting and utilizes the JTAG to AXI Master and Virtual Input/Output (VIO) blocks depicted in Figure 5. The acquisition procedure takes seven steps, with the first three steps skipped for the design verification measurements presented in Section 4.1 and Section 4.2:
  1. Set the MMCM phase shift value (from 0 to 255) via the VIO and restart the MMCM;
  2. Read 16 average measurements (MA filter output values) via the VIO and calculate the final average for the currently selected MMCM phase shift value;
  3. After repeating Steps 1 and 2 for all 256 phase shifts, set the MMCM phase shift to the value that gave the lowest average measurement in Step 2;
  4. After resetting the BRAM via the VIO, wait for the memory to fill up with data;
  5. Read the FPGA die temperature using the temperature sensor in the FPGA XADC (system monitor) [24];
  6. Read the BRAM data to the connected PC via the JTAG to AXI Master;
  7. Repeat Steps 4 through 6 fifty times and merge the acquired data and temperatures into two CSV files.
The BRAM block contains 2048 256-bit words. Therefore, the total number of raw samples for each measurement equals 102,400.
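The phase alignment part of the procedure (the first three steps) and the sample count can be sketched as below. The actual automation uses Tcl in Vivado; this Python version only illustrates the logic, and the callables `set_phase_shift` and `read_filter_output` are hypothetical stand-ins for the VIO accesses.

```python
def best_phase_shift(read_filter_output, set_phase_shift, steps=256, reads_per_step=16):
    """Sweep all phase shifts, average the filter output, and select the minimum."""
    averages = []
    for shift in range(steps):
        set_phase_shift(shift)  # hypothetical: also restarts the MMCM
        readings = [read_filter_output() for _ in range(reads_per_step)]
        averages.append(sum(readings) / reads_per_step)
    best = min(range(steps), key=averages.__getitem__)
    set_phase_shift(best)  # leave the MMCM at the best-aligned setting
    return best, averages


WORDS_PER_FILL = 2048  # 256-bit words held by the BRAM
FILLS = 50             # repetitions of the fill and readout steps
TOTAL_SAMPLES = WORDS_PER_FILL * FILLS

# Simulated hardware whose misalignment is smallest at phase shift 100.
state = {"shift": 0}
best, _ = best_phase_shift(lambda: abs(state["shift"] - 100),
                           lambda s: state.update(shift=s))
print(best, TOTAL_SAMPLES)
```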
Afterward, the raw measurements are processed on a PC. The probability density function (PDF) can be obtained by counting the number of ‘1s’ (the equivalents of the differences between two channels) in each 256-bit word and summing the number of occurrences of each value of the counted number of differences.
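A minimal sketch of this post-processing step is shown below, assuming the raw samples are already available as Python integers (one per 256-bit word). The function name `difference_pdf` is illustrative.

```python
from collections import Counter


def difference_pdf(words):
    """Estimate the PDF of the number of differing taps per sample.

    Entry k of the result is the fraction of words containing exactly k set bits.
    The list ends at the largest observed count.
    """
    if not words:
        raise ValueError("no samples")
    counts = Counter(bin(word).count("1") for word in words)
    total = sum(counts.values())
    return [counts.get(k, 0) / total for k in range(max(counts) + 1)]


# Four samples: two with no differences, one with 1 difference, one with 3.
print(difference_pdf([0b0, 0b0, 0b1, 0b111]))
```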
The input signals were generated using a Si5351 clock generator. Its maximum output jitter equals 100 ps peak-to-peak [25]. To reduce jitter, two outputs of the generator were programmed in the integer mode at 23.4375 MHz, both using the same internal phase-locked loop (PLL). During verification and measurements, each tested channel pin was directly connected to a separate output of the generator, with no additional intermediary such as a tee connector (which could negatively impact signal integrity and amplitude). The system clock frequency was set to 50 MHz and was generated from an on-board oscillator.
The initial verification involved the same signal sent to both TDLs’ inputs. The reported delays from each FPGA input pin to the corresponding TDL input have been matched (with 1 ps precision). Similarly to the connection of the system clock to the TDLs’ flip-flops depicted in Figure 3, the measured signal is transmitted using the same paths until the very last interconnects before the final slices. The delay from the MMCM OUTPUT0 port to each TDL input is the same with 1 ps precision. The signal was transmitted directly to the TDLs’ inputs—with no additional global buffers (BUFG) placed in the path. Such buffers are often added automatically by the tool; therefore, this action had to be manually prevented using design constraints. Theoretically, the global buffer has a rather negative influence [26] or no influence on the signal phase noise. Therefore, no additional buffer and a common clock region were selected. The described auto comparison is considered a reference result for further analysis of the results.

3.1. Calculation of Absolute Jitter

Calculating the absolute or period jitter values [27] based on the obtained differences vector for the presented design is theoretically possible, but difficult. It would require taking into account several effects:
  • The CARRY4 primitive delays reported by Xilinx Vivado are the same for all elements used for TDL implementation, but differ depending on the PVT corner—as depicted in Table 2;
  • The reported delays for each CARRY4 primitive are non-monotonic—as depicted in Table 2;
  • The interconnect delays between the consecutive CARRY4 elements reported by Xilinx Vivado are 0 (with 1 ps precision) for interconnects in a single clock region, and non-zero for interconnects across clock regions;
  • While the paths from the system clock source to the corresponding TDL flip-flops are symmetrical (as described in Section 2.1), the delays for each flip-flop pair are different and non-monotonic—as depicted in Tables S1 and S2 in the Supplementary Materials;
  • The setup and hold time violations (resulting in storage elements metastability) are inherent to the TDL architecture, which is an additional source of Gaussian jitter [28] that would need to be subtracted from the raw measurement. This effect is further discussed in Section 4.1.
The implications of these effects include counting the exact positions of the detected differences in each vector and calculating the ranges of timing differences for each PVT corner individually. As neither the input-to-output delays of the CARRY4 element, nor the delays from the system clock source to the flip-flop pairs are monotonic, the series of ‘1’ bits in the raw results can be interrupted by one or several ‘0’ bits. The temporary value of absolute jitter would need to be calculated for each of more than 100,000 raw samples individually for each measurement, resulting in a need for a significant amount of computing resources.
Taking into account the above considerations, the purpose of the described design is the measurement of relative jitter that enables multiple configurations to be compared. As a result, the most suitable configuration can be selected for a given application and system constraints.

3.2. Relative Entropy Calculation

As calculating absolute jitter values was deemed impractical given the limited data on internal Xilinx resources, the comparison of relative jitter became one of the most important objectives of the described research. The Kullback–Leibler divergence [29], also known as relative entropy, is one of the most recognized measures for comparing two probability distributions. It is a nonnegative function with the following formula:
$$D_{KL}(P \,\|\, Q) = \sum_{x \in \chi} P(x) \log\left(\frac{P(x)}{Q(x)}\right)$$
where:
DKL(P‖Q): relative entropy;
χ : sample space;
P, Q: probability distributions.
The results of the Kullback–Leibler divergence calculation for multiple clocking resource configurations are presented in Section 5. It is worth noting that for meaningful results, the compared distributions need to have the same sample space. Moreover, non-zero values of Q distribution are required for proper division. To fulfill these requirements, the shorter distribution was padded with a non-zero guard value, which was very small with respect to the main lobe. All zero positions in P and Q distributions were also replaced with the guard value.
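The padding and guard-value handling described above can be sketched as follows. The guard value of 1e-12 and the use of the natural logarithm are assumptions, as the source specifies neither. No renormalization is applied after inserting the guard values, since the text does not mention one.

```python
import math


def kl_divergence(p, q, guard=1e-12):
    """Relative entropy D_KL(P || Q) of two discrete distributions.

    The shorter distribution is padded with `guard`, and every zero entry in
    either distribution is replaced with `guard` (assumed value).
    """
    size = max(len(p), len(q))
    p = list(p) + [guard] * (size - len(p))
    q = list(q) + [guard] * (size - len(q))
    p = [value if value > 0 else guard for value in p]
    q = [value if value > 0 else guard for value in q]
    return sum(px * math.log(px / qx) for px, qx in zip(p, q))


ideal = [1.0]                 # 100% of samples with zero differences
measured = [0.6, 0.3, 0.1]    # made-up example distribution
print(kl_divergence(measured, ideal))
```

Because the ideal reference is zero everywhere except the first bin, the result for that reference depends strongly on the chosen guard value, so only values computed with the same guard are comparable.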

4. Reference Measurements and Initial Concept Revision

4.1. Design Verification

To verify the presented design, both TDLs’ inputs were connected to the same signal source, i.e., the output of the same MMCM primitive [17]. The MMCM block, the clock-capable input pin, and the first taps of TDLs were placed in the X0Y0 clock region, as depicted in Figure 4. Apart from the MMCM feedback loop, no global clock buffers (BUFG) [17] were used. Due to the placement of the TDLs, until the very last interconnects before the first CARRY4 input, the MMCM output signal is transmitted using the same path—similar to the system clock signal paths shown in Figure 3. The reported delay from the MMCM output to each carry chain input is the same for both TDLs.
In the ideal case, the measured differences between the captured signals would be equal to zero 100% of the time. However, the measurements show that fewer than 50% of the captured samples contained zero differences. The exact statistics and distribution are presented in Figure 7a and Table 3. The details of the measurement methodology are described in Section 3.
The difference between the ideal theoretical results and the data actually obtained is partially caused by the metastability of the asynchronous flip-flops in the TDLs (presented in Figure 1). Metastability is an unavoidable phenomenon that might cause state uncertainty in a bistable circuit and variations in its propagation delay. State uncertainty is nondeterministic and depends on the circuit’s sensitivity to the initial condition near the metastable point. It is known that the initial condition is disturbed by thermal noise processes; thus, the flip-flop’s operation near the metastable point ensures state randomness [30]. For this reason, the outputs of two corresponding flip-flops from different TDLs are random during input signal transitions. The minimum setup and hold times of CLB flip-flops equal 110 and 220 ps, respectively [31], and the CARRY4 element delays can be significantly lower than 100 ps (depending on ports and process corner, as depicted in Table 2). As a result, in the absolute worst case, the input signals of up to 7 flip-flops driven by CARRY4 outputs can violate the setup timing condition, and up to 18 can violate the hold timing condition, as depicted in Table 4. The timing violations can result in metastability and a greater number of differences between the stored TDL words than expected.
Another reason for the difference between the ideal theoretical results and the data actually obtained is non-ideal symmetry between the two TDLs. The FPGAs are not designed for this purpose. As a result, the complementary TDLs’ elements (both carry multiplexers and storage elements) as well as the interconnects differ in timing performance.
An examination of the CARRY4 elements’ timing performance reported by the Vivado tool showed that the delays are different for the CYINIT and CIN inputs, as shown in Table 2. However, Vivado reports the same delays for each CARRY4 element used. As a result, the design was modified so that the first CARRY4 element was used only as a pass-through to the whole carry chain, with no outputs connected to storage elements. This ensures the same delays for each CARRY4 element in the chain.
To mitigate the aforementioned issues, for each delay line tap the first flip-flop in the synchronizer chain has been replaced with a latch—using the LDCE primitive [17]. Latches are known to have better timing performance compared to flip-flops [32,33]. In contrast to flip-flops, a latch register is transparent for a portion of the clock period and stores the input on the clock edge that causes the latch to become opaque. Flip-flops are edge sensitive, and latches are level sensitive [34]. An example of sample acquisition waveform is depicted in Figure 8.
Similarly to the initial two flip-flop approach, the latch+flip-flop configuration has been verified using the auto comparison described previously. The measurements have shown a significant increase in the number of zero-difference detections, by over 10 percentage points. The exact statistics and distribution are presented in Figure 9a and Table 3.

4.2. Temperature Influence

The delay time of a carry chain is sensitive to the manufacturing process, supply voltage, and operating temperature (PVT); thus, the measurement precision and accuracy could deteriorate due to voltage and temperature variation [35]. Like time generators, absolute phase comparators are sensitive to PVT variations. In contrast, relative time generators and comparators are robust to PVT. However, the relative methods are hampered by path or element mismatches [36].
The operation of the design has also been verified at temperatures below 0 °C. The distributions of detection numbers are presented in Figure 10 and Table 5. The observed temperature influence is minimal. The remaining deviation from the ideal theoretical distribution is most likely caused mainly by metastability and element mismatch. Theoretically, using LVT transistors in the TDLs’ latches could further improve the results, as low threshold voltage transistors are known to have lower delay compared to SVT and HVT [37]. The implementation of a dedicated ASIC chip and verification might be an interesting topic for future research.
Besides the number of samples with zero detected differences (the first bar in Figure 7, Figure 9 and Figure 10), the number of bars and the overall distribution of the results are equally important. The horizontal axis of each bar chart in this paper always ends at the last non-zero result.

5. Results

All FPGA chips contain several types of I/O pins. One of their attributes is whether or not a pin is clock-capable. The clock-capable pins can access either a single clock region or multiple clock regions and the global clock tree, as well as other CMTs above and below in the same column [38], using clock resources. The non-clock-capable pins are not directly connected to clock resources; to be used as clock signal inputs, they need to be routed using general routing matrices. Both kinds of pins were compared in multiple configurations of the input clock signal. During all measurements, signals with the same frequency (generated in the same PLL in the Si5351 generator) were separately connected to the input pins.
The results are divided into five groups:
  • Involving a common input pin and a single common clock region for TDLs and both compared signal sources;
  • Involving a common input pin and a separate clock region for TDLs with a reference signal source and a second clock region for a compared signal source;
  • Involving a separate input pin and a single common clock region for TDLs and both compared signal sources;
  • Involving a separate input pin (in common I/O bank) and a separate clock region for TDLs with a reference signal source and a second clock region for a compared signal source;
  • Involving a separate input pin (in different I/O bank) and a separate clock region for TDLs with a reference signal source and a second clock region for a compared signal source.
All detailed results are presented in Tables S3–S9 in the Supplementary Materials.

5.1. Common Input Pin and Single Common Clock Region

The TDLs are placed in the X0Y0 clock region—as depicted in Figure 4. Therefore, the common clock region comparisons involve only resources located in the X0Y0 clock region (e.g., MMCM, PLL, input buffers (IBUF)) and between the contiguous clock regions (e.g., global buffers (BUFG)). The input pin used for these tests is clock-capable T14 (named CK_IO5 in Arty A7 board). Table 6 presents the schematic diagrams and the measurement results at ambient temperature of multiple connection configurations involving a common input pin and a single common clock region. Table S3 presents the detailed results—both absolute and percentage numbers.

5.2. Common Input Pin and a Separate Clock Region

In contrast to the previous tests, the placement of the MMCM or PLL blocks in a different clock region can both positively and negatively impact the jitter level. For this test, the X0Y1 clock region was selected. A smaller number of clock signals in the X0Y1 clock region can result in less noise in the measured traces. At the same time, the clock region crossing might introduce additional jitter. Due to the FPGA architecture, the global buffers are required for transmitting the signal in and out of the contiguous clock regions. Table 7 presents the schematic diagrams and the measurement results at ambient temperature of multiple connection configurations involving a common input pin and a separate clock region. Table S4 presents the detailed results as both absolute and percentage numbers.

5.3. Separate Input Pin in the Same Clock Region and a Single Common Clock Region

The TDLs are placed in the X0Y0 clock region—as depicted in Figure 4. Therefore, the common clock region comparisons involve only resources located in the X0Y0 clock region (e.g., MMCM, PLL, input buffers (IBUF)) and between the contiguous clock regions (e.g., global buffers (BUFG)). The input pins used for these tests are from the same X0Y0 clock region and the same I/O bank:
  • Clock-capable T14 pin (named CK_IO5 in Arty A7 board),
  • Clock-capable P15 pin (named CK_IO33 in Arty A7 board),
  • Non-clock-capable T16 pin (named CK_IO7 in Arty A7 board).
Table 8 presents the schematic diagrams and the measurement results at ambient temperature of multiple connection configurations involving a separate input pin and a single common clock region. Tables S5 and S6 present the detailed results as both absolute and percentage numbers.

5.4. Separate Input Pin (in Common I/O Bank) and a Separate Clock Region

In contrast to the previous tests, the placement of the MMCM or PLL blocks in a different clock region can impact the jitter level either positively or negatively. For this test, the X0Y1 clock region was selected. The tested input signals are received through clock-capable and non-clock-capable pins in the X0Y0 clock region.
A lower number of clock signals in the X0Y1 clock region can result in less noise in the measured traces. At the same time, the clock region crossing might introduce additional jitter. Due to the FPGA architecture, the global buffers are required for transmitting the signal in and out of the contiguous clock regions. Table 9 presents the schematic diagrams and the measurement results at ambient temperature of multiple connection configurations involving a separate input pin (in common I/O bank) and a separate clock region. Table S7 presents the detailed results as both absolute and percentage numbers.

5.5. Separate Input Pin (in Different I/O Bank) and a Separate Clock Region

The TDLs are placed in the X0Y0 clock region—as depicted in Figure 4. Separate clock region comparisons involve resources located in the X0Y1 clock region (e.g., MMCM, PLL, input buffers (IBUF)) and between the contiguous clock regions (e.g., global buffers (BUFG)). The input pins used for these tests are from a separate X0Y1 clock region and matching I/O bank:
  • Clock-capable E15 pin (named JB1 in Arty A7 board),
  • Non-clock-capable J17 pin (named JB7 in Arty A7 board).
Table 10 presents the schematic diagrams and the measurement results at ambient temperature of multiple connection configurations involving a separate input pin (in different I/O bank) and a separate clock region. Tables S8 and S9 present the detailed results as both absolute and percentage numbers.

5.6. Comparison of the Results

Table 11 presents the calculated values of relative entropy for each obtained measurement in ascending order. The DKL(P||Q) calculations were performed for two reference probability distributions Q:
  • Ideal theoretical PDF of 100% zero differences detected;
  • Empirical PDF of auto compared latch+flip-flop configuration, stated in Section 4.1.
Table 11. Calculated relative entropies for obtained measurements, with theoretical (ideal) and empirical reference.

| Section/Config. Number | Result Name | Relative Entropy (Ideal Ref.) | Relative Entropy (Empirical Ref.) |
|---|---|---|---|
| 4.1 | IBUF + MMCM auto compare—latch+flip-flop—ambient temp. | 5.15235 | 0 |
| 4.2 | IBUF + MMCM auto compare—latch+flip-flop—freeze -> ambient | 5.22654 | 0.000155105 |
| 4.2 | IBUF + MMCM auto compare—latch+flip-flop—freeze | 5.4275 | 0.00167469 |
| 5.5.1 | Sep. IBUF (CC X0Y1 pin) | 5.40074 | 0.602199 |
| 4.1 | IBUF + MMCM auto compare—2xflip-flop—ambient temp. | 6.62013 | 0.732354 |
| 5.5.9 | Sep. IBUF (nCC X0Y1 pin) + MMCM (X0Y1) + BUFG | 6.13525 | 0.7401 |
| 5.5.2 | Sep. IBUF (CC X0Y1 pin) + BUFG | 5.45964 | 0.818643 |
| 5.5.7 | Sep. IBUF (nCC X0Y1 pin) | 5.37273 | 0.989554 |
| 5.5.6 | Sep. IBUF (CC X0Y1 pin) + BUFG + PLL (X0Y1) + BUFG | 6.14203 | 1.07626 |
| 5.5.5 | Sep. IBUF (CC X0Y1 pin) + PLL (X0Y1) + BUFG | 6.2609 | 1.22945 |
| 5.4.2 | Sep. IBUF (CC X0Y0 pin) + BUFG + PLL (X0Y1) + BUFG | 6.28831 | 1.37145 |
| 5.3.10 | Sep. IBUF (nCC X0Y0 pin) + BUFG + PLL | 6.27178 | 1.42141 |
| 5.5.8 | Sep. IBUF (nCC X0Y1 pin) + BUFG | 5.6276 | 1.47346 |
| 5.3.4 | Sep. IBUF (CC X0Y0 pin) + BUFG + PLL | 6.27908 | 1.50762 |
| 5.3.7 | Sep. IBUF (nCC X0Y0 pin) | 5.67175 | 1.98895 |
| 5.5.11 | Sep. IBUF (nCC X0Y1 pin) + PLL (X0Y1) + BUFG | 6.41065 | 2.04863 |
| 5.1.1 | Com. IBUF | 5.66629 | 2.05207 |
| 5.5.12 | Sep. IBUF (nCC X0Y1 pin) + BUFG + PLL (X0Y1) + BUFG | 6.42639 | 2.10493 |
| 5.5.10 | Sep. IBUF (nCC X0Y1 pin) + BUFG + MMCM (X0Y1) + BUFG | 6.53075 | 2.11106 |
| 5.4.3 | Sep. IBUF (nCC X0Y0 pin) + BUFG + MMCM (X0Y1) + BUFG | 6.63287 | 2.28577 |
| 5.1.6 | Com. IBUF + PLL + BUFG | 6.56871 | 2.40495 |
| 5.3.6 | Sep. IBUF (CC X0Y0 pin) + BUFG + PLL + BUFG | 6.58761 | 2.94546 |
| 5.3.1 | Sep. IBUF (CC X0Y0 pin) | 5.56103 | 2.98708 |
| 5.3.3 | Sep. IBUF (CC X0Y0 pin) + PLL | 6.66421 | 3.00739 |
| 5.3.11 | Sep. IBUF (nCC X0Y0 pin) + PLL + BUFG | 6.8134 | 3.20675 |
| 5.3.8 | Sep. IBUF (nCC X0Y0 pin) + BUFG | 5.84805 | 3.30594 |
| 5.4.4 | Sep. IBUF (nCC X0Y0 pin) + BUFG + PLL (X0Y1) + BUFG | 6.63702 | 3.48722 |
| 5.4.1 | Sep. IBUF (CC X0Y0 pin) + BUFG + MMCM (X0Y1) + BUFG | 6.73066 | 3.52268 |
| 5.3.2 | Sep. IBUF (CC X0Y0 pin) + BUFG | 5.86949 | 3.79135 |
| 5.1.7 | Com. IBUF + BUFG + PLL | 6.7178 | 3.89694 |
| 5.5.4 | Sep. IBUF (CC X0Y1 pin) + BUFG + MMCM (X0Y1) + BUFG | 6.80247 | 4.36376 |
| 5.3.12 | Sep. IBUF (nCC X0Y0 pin) + BUFG + PLL + BUFG | 6.8181 | 4.65398 |
| 5.2.2 | Com. IBUF + BUFG + MMCM (X0Y1) + BUFG | 7.02862 | 5.37672 |
| 5.2.2 | Com. IBUF + BUFG + PLL (X0Y1) + BUFG | 6.8868 | 5.59853 |
| 5.1.2 | Com. IBUF + BUFG | 6.34482 | 5.75471 |
| 5.1.4 | IBUF + MMCM (FB BUFG)—OUTPUT1 + BUFG | 7.24451 | 5.79941 |
| 5.1.3 | IBUF + MMCM (FB BUFG)—OUTPUT1 | 7.26737 | 5.80405 |
| 5.1.8 | Com. IBUF + BUFG + PLL + BUFG | 7.12567 | 5.88812 |
| 5.3.9 | Sep. IBUF (nCC X0Y0 pin) + PLL | 7.12688 | 6.1222 |
| 5.3.5 | Sep. IBUF (CC X0Y0 pin) + PLL + BUFG | 7.20829 | 6.12283 |
| 5.5.3 | Sep. IBUF (CC X0Y1 pin) + MMCM (X0Y1) + BUFG | 7.15239 | 6.14977 |
| 5.1.5 | Com. IBUF + PLL | 7.17464 | 6.20072 |
It is worth noting that the relative entropy calculated for the reference measurements against the ideal theoretical distribution is the lowest of all obtained results. Also, the change in relative entropy across the tested temperatures is negligible. In comparison, the initial design (utilizing two consecutive flip-flops, presented in Figure 7) differs noticeably in the relative entropy comparisons: it deviates from the ideal theoretical PDF notably more than the design utilizing the latch+flip-flop approach.
From the results in Table 11, it can be deduced that almost all configurations utilizing an input clock pin located in the X0Y1 clock region have a relatively low Kullback–Leibler divergence, regardless of the pin's clock capability. These results suggest that the signal routing and the congestion of the neighboring resources have a significant influence on the signal jitter. Even if these resources are not connected to the analyzed signal, their operation can greatly impact the jitter performance.

6. Conclusions

As expected, the worst results were obtained when the jitter filtering blocks (MMCM or PLL) were omitted. In particular, configuration 5.1.2 (common IBUF + BUFG), configuration 5.3.1 (separate IBUF (clock-capable pin in the same clock region)), configuration 5.3.2 (separate IBUF (clock-capable pin in the same clock region) + BUFG), configuration 5.3.7 (separate IBUF (non-clock-capable pin in the same clock region)) and configuration 5.3.8 (separate IBUF (non-clock-capable pin in the same clock region) + BUFG) present the widest spread of the number of detected differences: in each of them, over 25 differences were detected at least once. Both clock-capable and non-clock-capable input pins result in relatively high jitter levels. However, regardless of the clock capability of the input pin, the addition of an MMCM or PLL block significantly reduces the jitter level, which confirms the theoretical assumptions. Notably, the absolute worst results were obtained for separate clock-capable pin configurations rather than for non-clock-capable pin configurations. This suggests that for configurations with no MMCM or PLL blocks, the clock signal routing is crucial for the jitter level.
In every tested configuration omitting the usage of jitter filtering modules (MMCM or PLL), the addition of a global buffer clearly worsened the obtained results. Specifically, this can be observed by comparing the results:
  • Common IBUF (configuration 5.1.1, relative entropy: 2.05207) vs. common IBUF + BUFG (configuration 5.1.2, relative entropy: 5.75471);
  • Separate clock-capable pin IBUF in X0Y0 region (configuration 5.3.1, relative entropy: 2.98708) vs. separate clock-capable pin IBUF in X0Y0 region + BUFG (configuration 5.3.2, relative entropy: 3.79135);
  • Separate non-clock-capable pin IBUF in X0Y0 region (configuration 5.3.7, relative entropy: 1.98895) vs. separate non-clock-capable pin IBUF in X0Y0 region + BUFG (configuration 5.3.8, relative entropy: 3.30594);
  • Separate clock-capable pin IBUF in X0Y1 region (configuration 5.5.1, relative entropy: 0.602199) vs. separate clock-capable pin IBUF in X0Y1 region + BUFG (configuration 5.5.2, relative entropy: 0.818643);
  • Separate non-clock-capable pin IBUF in X0Y1 region (configuration 5.5.7, relative entropy: 0.989554) vs. separate non-clock-capable pin IBUF in X0Y1 region + BUFG (configuration 5.5.8, relative entropy: 1.47346).
In these cases, the additional global buffers effectively extended the traces required to route the design. As a result, the signal path, which has no jitter filtering modules, runs closer to other clock signal paths (e.g., a system clock signal) and alongside them over longer segments, which causes interference from the aggressor signal. The results suggest that keeping the internal routing short and avoiding global buffers (BUFG) when they are not necessary is beneficial in terms of jitter level. This conclusion coincides with the general intuitive understanding of FPGA design.
Surprisingly, the best results (the most similar to the reference in Figure 9) were obtained for a separate input pin in a different I/O bank and a separate clock region, regardless of the input pin type:
  • Configuration 5.5.1: separate IBUF (clock-capable pin in X0Y1 clock region);
  • Configuration 5.5.2: separate IBUF (clock-capable pin in X0Y1 clock region) + BUFG;
  • Configuration 5.5.5: separate IBUF (clock-capable pin in X0Y1 clock region) + X0Y1 clock region PLL + BUFG;
  • Configuration 5.5.6: separate IBUF (clock-capable pin in X0Y1 clock region) + BUFG + X0Y1 clock region PLL + BUFG;
  • Configuration 5.5.7: separate IBUF (non-clock-capable pin in X0Y1 clock region);
  • Configuration 5.5.8: separate IBUF (non-clock-capable pin in X0Y1 clock region) + BUFG;
  • Configuration 5.5.9: separate IBUF (non-clock-capable pin in X0Y1 clock region) + X0Y1 clock region MMCM + BUFG;
  • Configuration 5.5.10: separate IBUF (non-clock-capable pin in X0Y1 clock region) + BUFG + X0Y1 clock region MMCM + BUFG;
  • Configuration 5.5.11: separate IBUF (non-clock-capable pin in X0Y1 clock region) + X0Y1 clock region PLL + BUFG.
This applies particularly to the configurations omitting MMCM and PLL blocks. These results show that even non-clock-capable input pins can be successfully used as a clock input with low jitter, even without jitter filtering. The most probable cause of these results is the usage of a very weakly utilized X0Y1 clock region. Hardly any resource was used in this part of the FPGA chip; therefore, the input signal was not interfered with by other signals. The observation that additional global buffers can slightly deteriorate the results reaffirms this conclusion.
The conclusion that the jitter level is related to the resource utilization of a clock region with different and asynchronous clock signals is further supported by comparing the results of similar configurations that use resources placed in different clock regions. Specifically, this can be observed by comparing the results:
  • Separate clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y1) + BUFG (configuration 5.4.2, relative entropy: 1.37145) vs. separate clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y0) + BUFG (configuration 5.3.6, relative entropy: 2.94546);
  • Separate non-clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y1) + BUFG (configuration 5.4.4, relative entropy: 3.48722) vs. separate non-clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y0) + BUFG (configuration 5.3.12, relative entropy: 4.65398).
In both cases, the only difference between the compared configurations is the PLL block placement. Regardless of the clock capability of the input pin, the PLL block placement in a less utilized clock region resulted in a lower jitter level.
Comparing the measurements of configurations with and without a global buffer between the input buffer and the jitter filtering block (MMCM or PLL) gives non-uniform results. In five out of seven cases, the additional BUFG lowered the jitter level, often significantly. In the remaining two cases, the jitter level increased in the configuration with the additional buffer; in one of these cases the difference was relatively small. Specifically, this can be observed by comparing the results:
  • Separate non-clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y0) (configuration 5.3.10, relative entropy: 1.42141) vs. separate non-clock-capable pin IBUF in X0Y0 region + PLL (X0Y0) (configuration 5.3.9, relative entropy: 6.1222);
  • Separate clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y0) (configuration 5.3.4, relative entropy: 1.50762) vs. separate clock-capable pin IBUF in X0Y0 region + PLL (X0Y0) (configuration 5.3.3, relative entropy: 3.00739);
  • Common IBUF + BUFG + PLL (X0Y0) (configuration 5.1.7, relative entropy: 3.89694) vs. common IBUF + PLL (X0Y0) (configuration 5.1.5, relative entropy: 6.20072);
  • Separate clock-capable pin IBUF in X0Y1 region + BUFG + MMCM (X0Y1) + BUFG (configuration 5.5.4, relative entropy: 4.36376) vs. separate clock-capable pin IBUF in X0Y1 region + MMCM (X0Y1) + BUFG (configuration 5.5.3, relative entropy: 6.14977);
  • Separate non-clock-capable pin IBUF in X0Y1 region + MMCM (X0Y1) + BUFG (configuration 5.5.9, relative entropy: 0.7401) vs. separate non-clock-capable pin IBUF in X0Y1 region + BUFG + MMCM (X0Y1) + BUFG (configuration 5.5.10, relative entropy: 2.11106);
  • Separate clock-capable pin IBUF in X0Y1 region + BUFG + PLL (X0Y1) + BUFG (configuration 5.5.6, relative entropy: 1.07626) vs. separate clock-capable pin IBUF in X0Y1 region + PLL (X0Y1) + BUFG (configuration 5.5.5, relative entropy: 1.22945);
  • Separate non-clock-capable pin IBUF in X0Y1 region + PLL (X0Y1) + BUFG (configuration 5.5.11, relative entropy: 2.04863) vs. separate non-clock-capable pin IBUF in X0Y1 region + BUFG + PLL (X0Y1) + BUFG (configuration 5.5.12, relative entropy: 2.10493).
Comparing the measurements of configurations with and without a global buffer placed after the jitter filtering block (MMCM or PLL) also gives non-uniform results, which are even more ambiguous than in the previous comparison. In three out of seven cases, the additional BUFG lowered the jitter level. In the remaining four cases, the jitter level increased in the configuration with the additional buffer. Specifically, this can be observed by comparing the results:
  • Separate clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y0) (configuration 5.3.4, relative entropy: 1.50762) vs. separate clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y0) + BUFG (configuration 5.3.6, relative entropy: 2.94546);
  • Separate clock-capable pin IBUF in X0Y0 region + PLL (X0Y0) (configuration 5.3.3, relative entropy: 3.00739) vs. separate clock-capable pin IBUF in X0Y0 region + PLL (X0Y0) + BUFG (configuration 5.3.5, relative entropy: 6.12283);
  • Separate non-clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y0) (configuration 5.3.10, relative entropy: 1.42141) vs. separate non-clock-capable pin IBUF in X0Y0 region + BUFG + PLL (X0Y0) + BUFG (configuration 5.3.12, relative entropy: 4.65398);
  • Separate non-clock-capable pin IBUF in X0Y0 region + PLL (X0Y0) + BUFG (configuration 5.3.11, relative entropy: 3.20675) vs. separate non-clock-capable pin IBUF in X0Y0 region + PLL (X0Y0) (configuration 5.3.9, relative entropy: 6.1222);
  • Common IBUF + BUFG + PLL (X0Y0) (configuration 5.1.7, relative entropy: 3.89694) vs. Common IBUF + BUFG + PLL (X0Y0) + BUFG (configuration 5.1.8, relative entropy: 5.88812);
  • Common IBUF + PLL (X0Y0) + BUFG (configuration 5.1.6, relative entropy: 2.40495) vs. Common IBUF + PLL (X0Y0) (configuration 5.1.5, relative entropy: 6.20072);
  • Separate MMCM output + BUFG (configuration 5.1.4, relative entropy: 5.79941) vs. Separate MMCM output (configuration 5.1.3, relative entropy: 5.80405).
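The two groups of pairwise comparisons listed above can be tallied with a short script. The sketch below is only a bookkeeping aid: the values are the empirical-reference relative entropies from Table 11, each pair is ordered as (configuration without the additional BUFG, configuration with it), and the variable and function names are illustrative rather than part of the presented design.

```python
# Empirical-reference relative entropies from Table 11, keyed by configuration.
ENTROPY = {
    "5.1.3": 5.80405, "5.1.4": 5.79941, "5.1.5": 6.20072, "5.1.6": 2.40495,
    "5.1.7": 3.89694, "5.1.8": 5.88812,
    "5.3.3": 3.00739, "5.3.4": 1.50762, "5.3.5": 6.12283, "5.3.6": 2.94546,
    "5.3.9": 6.1222, "5.3.10": 1.42141, "5.3.11": 3.20675, "5.3.12": 4.65398,
    "5.5.3": 6.14977, "5.5.4": 4.36376, "5.5.5": 1.22945, "5.5.6": 1.07626,
    "5.5.9": 0.7401, "5.5.10": 2.11106, "5.5.11": 2.04863, "5.5.12": 2.10493,
}

# Extra BUFG between the input buffer and the MMCM/PLL: (without, with).
BUFG_BEFORE_FILTER = [
    ("5.3.9", "5.3.10"), ("5.3.3", "5.3.4"), ("5.1.5", "5.1.7"),
    ("5.5.3", "5.5.4"), ("5.5.9", "5.5.10"), ("5.5.5", "5.5.6"),
    ("5.5.11", "5.5.12"),
]

# Extra BUFG after the MMCM/PLL: (without, with).
BUFG_AFTER_FILTER = [
    ("5.3.4", "5.3.6"), ("5.3.3", "5.3.5"), ("5.3.10", "5.3.12"),
    ("5.3.9", "5.3.11"), ("5.1.7", "5.1.8"), ("5.1.5", "5.1.6"),
    ("5.1.3", "5.1.4"),
]

def tally(pairs):
    """Return (lowered, raised): how often the extra BUFG lowered or raised the entropy."""
    lowered = sum(1 for without, with_bufg in pairs
                  if ENTROPY[with_bufg] < ENTROPY[without])
    return lowered, len(pairs) - lowered

print("BUFG before MMCM/PLL (lowered, raised):", tally(BUFG_BEFORE_FILTER))
print("BUFG after MMCM/PLL (lowered, raised): ", tally(BUFG_AFTER_FILTER))
```

A lower relative entropy means that the measured distribution is closer to the empirical reference, so "lowered" corresponds to a lower relative jitter level in the sense used throughout this section.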
It can be concluded that with very careful FPGA resource allocation and routing, any input pin (both clock-capable and standard) can be successfully used as a clock signal input, even without an MMCM or PLL block. However, this approach is generally not recommended, as careless resource placement can lead to a very high jitter level when jitter filtering blocks (MMCM or PLL) are omitted, and the results can vary significantly between chips. This can potentially be a topic for further research.
The presented relative jitter measurement design and methodology can be applied to different FPGA families, enabling the comparison of different clocking resources and paths.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/electronics12204297/s1, Table S1: Delays of system clock to first flip-flops/latches in TDLs (reported by Xilinx Vivado); Table S2: Delays of system clock to second flip-flops in TDLs (reported by Xilinx Vivado); Table S3: Distribution of the number of detected differences in different configurations with common input pin and a single clock region, at ambient temperature; Table S4: Distribution of the number of detected differences in different configurations with common input pin and a separate clock region, at ambient temperatures; Table S5: Distribution of the number of detected differences in different configurations with separate clock-capable input pin in the same clock region and a single common clock region, at ambient temperatures; Table S6: Distribution of the number of detected differences in different configurations with separate non clock-capable input pin in the same clock region and a single common clock region, at ambient temperatures; Table S7: Distribution of the number of detected differences in different configurations with separate input pin in the same clock region and a separate clock region, at ambient temperatures; Table S8: Distribution of the number of detected differences in different configurations with separate clock-capable input pin in separate clock region and a separate clock region, at ambient temperatures; Table S9: Distribution of the number of detected differences in different configurations with separate non clock-capable input pin in separate clock region and a separate clock region, at ambient temperatures.

Author Contributions

Conceptualization, methodology, software, validation, investigation, resources, data curation, visualization, writing—original draft preparation, A.A.W.; formal analysis, A.A.W. and K.M.; writing—review and editing, K.M. and W.A.P.; funding acquisition, W.A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article and supplementary material.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zamek, I.; OnWong, M.; Boyle, P.; Daud, N.; Soh, L.N.; Teng, H.L.; Fong, C.S. A study of jitter effects in nm-FPGA based on various physical and electrical quantities. In Proceedings of the 2007 Asia-Pacific Conference on Applied Electromagnetics, Melaka, Malaysia, 4–6 December 2007.
  2. Kho, J.; Loh, C.I.; Moo, W.H.; Fong, C.S.; Wong, M.O. Extended analysis of SSN effect on phase-locked loop (PLL) circuit. In Proceedings of the 2009 IEEE Electrical Design of Advanced Packaging & Systems Symposium (EDAPS), Hong Kong, China, 2–4 December 2009; pp. 1–4.
  3. Teng, H.L.; Sun, S.; Wong, M.O.; Boyle, P.; Fong, C.S. A study of the relationship between on-chip power distribution network voltage noise, charge per clock cycle, on-chip decoupling capacitance and clock jitter in a 40-nm field programmable gate array test chip. In Proceedings of the 2010 International Conference on Applications of Electromagnetism and Student Innovation Competition Awards (AEM2C), Taipei, Taiwan, 11–13 August 2010; pp. 75–79.
  4. Aloisio, A.; Giordano, R.; Izzo, V. Jitter issues in clock conditioning with FPGAs. In Proceedings of the 2010 17th IEEE-NPSS Real Time Conference, Lisbon, Portugal, 24–28 May 2010.
  5. Borgosz, J. “Follow Me”—Digital Jitter Measurement Method. Meas. Sci. Rev. 2006, 6, 30–33.
  6. Xilinx Inc. Digital Clock Manager (DCM) Module Data Sheet (DS485); Xilinx Inc.: San Jose, CA, USA, 2009.
  7. Marins, C.N.M.; Kaufmann, P.; Júnior, A.A.F.; Paiva, M.; Swart, J.W. New Jitter Measurement Technique Using TDC Principle in a FPGA Component. In Proceedings of the Seminário Internacional de Metrologia Elétrica—VIII SEMETRO, João Pessoa, Brazil, 17–19 June 2009.
  8. Kubíček, M. In-system jitter measurement using FPGA. In Proceedings of the 20th International Conference Radioelektronika 2010, Brno, Czech Republic, 19–21 April 2010.
  9. Szplet, R.; Czuba, A. Two-Stage Clock-Free Time-to-Digital Converter Based on Vernier and Tapped Delay Lines in FPGA Device. Electronics 2021, 10, 2190.
  10. Digilent. Arty A7. Available online: https://digilent.com/reference/programmable-logic/arty-a7/start (accessed on 5 July 2023).
  11. Xilinx Inc. Extending 28 nm Leadership with an Expanded Portfolio and Lower Power; Xilinx Inc.: San Jose, CA, USA, 2015.
  12. Wojciechowski, A.A.; Marcinek, K.; Pleskacz, W.A. Dual TDL Based Phase Difference Detector Architecture. In Proceedings of the 2023 30th International Conference on Mixed Design of Integrated Circuits and System (MIXDES), Kraków, Poland, 29–30 June 2023.
  13. Sui, T.; Zhao, Z.; Xie, S.; Xie, Y.; Zhao, Y.; Huang, Q.; Xu, J.; Peng, Q. A 2.3-ps RMS Resolution Time-to-Digital Converter Implemented in a Low-Cost Cyclone V FPGA. IEEE Trans. Instrum. Meas. 2019, 68, 3647–3660.
  14. Dikopoulos, E.; Birbas, M.; Birbas, A. An Adaptive Downsampling FPGA-Based TDC Implementation for Time Measurement Improvement. Chips 2022, 1, 175–190.
  15. Tontini, A.; Gasparini, L.; Pancheri, L.; Passerone, R. Design and characterization of a low-cost FPGA-based TDC. IEEE Trans. Nucl. Sci. 2018, 65, 680–690.
  16. Xilinx Inc. 7 Series FPGAs Configurable Logic Block User Guide (UG474); Xilinx Inc.: San Jose, CA, USA, 2016.
  17. Xilinx Inc. Vivado Design Suite 7 Series FPGA and Zynq 7000 SoC Libraries Guide (UG953); Xilinx Inc.: San Jose, CA, USA, 2023.
  18. Xilinx Inc. Block Memory Generator v8.4 Product Guide (PG058); Xilinx Inc.: San Jose, CA, USA, 2021.
  19. Xilinx Inc. AXI Block RAM (BRAM) Controller v4.1 Product Guide (PG078); Xilinx Inc.: San Jose, CA, USA, 2019.
  20. Xilinx Inc. JTAG to AXI Master v1.2 Product Guide (PG174); Xilinx Inc.: San Jose, CA, USA, 2021.
  21. Al-Mbaideen, A.A. Application of Moving Average Filter for the Quantitative Analysis of the NIR Spectra. J. Anal. Chem. 2019, 74, 686–692.
  22. Xilinx Inc. Virtual Input/Output v3.0 Product Guide (PG159); Xilinx Inc.: San Jose, CA, USA, 2018.
  23. Tatsukawa, J. MMCM and PLL Dynamic Reconfiguration Application Note v1.8 (XAPP888); Xilinx Inc.: San Jose, CA, USA, 2019.
  24. Xilinx Inc. 7 Series FPGAs and Zynq-7000 SoC XADC Dual 12-Bit 1 MSPS Analog-to-Digital Converter User Guide (UG480); Xilinx Inc.: San Jose, CA, USA, 2022.
  25. Silicon Laboratories. Si5351A/B/C; Silicon Laboratories: Austin, TX, USA, 2011.
  26. Skyworks Solutions Inc. Understanding and Optimizing Clock Buffer’s Additive Jitter Performance (AN766); Skyworks Solutions Inc.: Irvine, CA, USA, 2021.
  27. Gardner, F.M. Phaselock Techniques, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2005.
  28. Serra, P.C.; Conklin, J.W. On-Chip System for Fast, High-Range, High-Precision Measurements of Delays. IEEE Trans. Instrum. Meas. 2020, 69, 5243–5250.
  29. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
  30. Wieczorek, P.Z. Dual-metastability FPGA-based true random number generator. Electron. Lett. 2013, 49, 744–745.
  31. Xilinx Inc. Artix-7 FPGAs Data Sheet: DC and AC Switching Characteristics (DS181); Xilinx Inc.: San Jose, CA, USA, 2022.
  32. Yoshikawa, K.; Kanamaru, K.; Inui, S.; Hagihara, Y.; Nakamura, Y.; Yoshimura, Y. Timing optimization by replacing flip-flops to latches. In Proceedings of the ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753), Yokohama, Japan, 27–30 January 2004; pp. 186–191.
  33. Vasumathi, S.P.; Murlidharan, D. A survey on flip flop replacement to latch on various design. Int. J. Pure Appl. Math. 2018, 119, 13453–13467.
  34. Chinnery, D.; Keutzer, K. Reducing the Timing Overhead. In Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design; Springer: Boston, MA, USA, 2002; pp. 57–100.
  35. Mao, X.; Yang, F.; Wei, F.; Shi, J.; Cai, J.; Cai, H. A Low Temperature Coefficient Time-to-Digital Converter with 1.3 ps Resolution Implemented in a 28 nm FPGA. Sensors 2022, 22, 2306.
  36. Yan, C.; Hu, C.; Wu, J. A High Resolution Vernier Digital-to-Time Converter Implemented with 65 nm FPGA. Appl. Sci. 2019, 9, 2705.
  37. Bosco, K.J.; Pavalam, S.M.; Mpamije, L.J. Fundamental Flip-Flop Design: Comparative Analysis. J. VLSI Circuits Syst. 2023, 5, 1–7.
  38. Xilinx Inc. 7 Series FPGAs Clocking Resources User Guide (UG472); Xilinx Inc.: San Jose, CA, USA, 2018.
Figure 1. TDL-based phase difference detector architecture diagram.
Figure 2. Diagram of SLICEM [16], with carry logic path and connections to storage elements highlighted.
Figure 3. Selected section of an implemented device view illustrating the system clock to TDL flip-flops connection. The orange elements are parts of the TDLs.
Figure 4. Simplified block diagram of the selected FPGA clocking resources and TDL placement in AMD/Xilinx Artix 7 35T FPGA.
Figure 5. Simplified block diagram of data acquisition system.
Figure 6. Example waveform of the captured input signals resulting in a false minimum detection. All registers are in the same clock domain.
Figure 7. Measured number of differences using TDLs with 2x flip-flop configuration at ambient temperature: (a) measurement results; (b) simplified schematic diagram of the corresponding implemented TDL.
Figure 8. Example waveform of latch+flip-flop pair sample acquisition.
Figure 9. Measured number of differences using TDLs with latch+flip-flop configuration at ambient temperature: (a) measurement results; (b) simplified schematic diagram of the corresponding implemented TDL.
Figure 10. Measured number of differences using TDLs with latch+flip-flop configuration at temperatures ranging (a) from −4.9 °C to −1.4 °C; (b) from 21.4 °C to 30.3 °C.
Table 1. Total resource utilization of the implemented design.
Resource | Utilization | Available | Utilization %
LUT | 3116 | 20,800 | 14.98%
LUTRAM | 372 | 9600 | 3.88%
FF | 4442 | 41,600 | 10.68%
BRAM | 44 | 50 | 88%
DSP | 0 | 90 | 0%
Table 2. Delays of CARRY4 element reported by Xilinx Vivado for four PVT corners.
CARRY4 Input | CARRY4 Output | FAST_MAX Corner | FAST_MIN Corner | SLOW_MAX Corner | SLOW_MIN Corner
CYINIT | CO0 | 206 ps | 165 ps | 536 ps | 432 ps
CYINIT | CO1 | 180 ps | 144 ps | 494 ps | 398 ps
CYINIT | CO2 | 210 ps | 169 ps | 592 ps | 477 ps
CYINIT | CO3 | 215 ps | 173 ps | 580 ps | 467 ps
CIN | CO0 | 100 ps | 76 ps | 271 ps | 206 ps
CIN | CO1 | 56 ps | 45 ps | 157 ps | 127 ps
CIN | CO2 | 81 ps | 65 ps | 228 ps | 184 ps
CIN | CO3 | 49 ps | 39 ps | 114 ps | 92 ps
Table 3. Distribution of the number of detected differences at ambient temperature for two different TDLs’ configurations.
Number of ‘1s’ | 2x Flip-Flop Configuration | Latch + Flip-Flop Configuration
0 | 46.682% | 59.646%
1 | 15.561% | 20.775%
2 | 15.658% | 14.968%
3 | 10.456% | 4.194%
4 | 6.604% | 0.415%
5 | 3.192% | 0.001%
6 | 1.188% | 0.000%
7 | 0.502% | 0.000%
8 | 0.158% | 0.000%
9 | 0.000% | 0.000%
Table 4. Cumulative delays of cascaded CARRY4 elements in FAST_MIN process corner (reported interconnect delay between CARRY4 elements in the same clock region equals 0 ps).
CARRY4 Element Index | CARRY4 Element Output | Accumulated Delay | Setup Time Violation | Hold Time Violation
0 | CO0 | 76 ps | |
0 | CO1 | 45 ps | |
0 | CO2 | 65 ps | |
0 | CO3 | 39 ps | |
1 | CO0 | 115 ps | |
1 | CO1 | 84 ps | |
1 | CO2 | 104 ps | |
1 | CO3 | 78 ps | |
2 | CO0 | 154 ps | |
2 | CO1 | 123 ps | |
2 | CO2 | 143 ps | |
2 | CO3 | 117 ps | |
3 | CO0 | 193 ps | |
3 | CO1 | 162 ps | |
3 | CO2 | 182 ps | |
3 | CO3 | 156 ps | |
4 | CO0 | 232 ps | |
4 | CO1 | 201 ps | |
4 | CO2 | 221 ps | |
4 | CO3 | 195 ps | |
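The accumulated delays in Table 4 follow a simple pattern: every fully traversed CARRY4 element contributes its CIN-to-CO3 delay, and the tapped output adds its own CIN-to-COn delay. A minimal sketch that reproduces the accumulated delay column, assuming the FAST_MIN values from Table 2 and the 0 ps interconnect delay stated in the caption (the function and constant names are illustrative):

```python
# CIN -> COn delays of a single CARRY4 element in the FAST_MIN corner (Table 2), in ps.
CIN_TO_CO_PS = {"CO0": 76, "CO1": 45, "CO2": 65, "CO3": 39}

def accumulated_delay_ps(element_index, output):
    """Delay from the chain input to the given tap of the element_index-th CARRY4."""
    # Each preceding element is traversed completely, from CIN to CO3.
    carry_through = element_index * CIN_TO_CO_PS["CO3"]
    return carry_through + CIN_TO_CO_PS[output]

for index in range(5):
    row = {out: accumulated_delay_ps(index, out) for out in CIN_TO_CO_PS}
    print(index, row)
```

Within one element the tap delays are not ordered by output index (CO3 is the fastest tap and CO0 the slowest), which is visible directly in the printed rows.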
Table 5. Distribution of the number of the detected differences using TDLs latch+flip-flop configuration with 2 CARRY4 CO outputs used, at different temperatures.
Number of ‘1s’ | Ambient Temperature <42.8 °C; 43.2 °C> | Freezing Temperature <−4.9 °C; −1.4 °C> | Heating Up from Freezing Temperature <21.4 °C; 30.3 °C>
0 | 59.646% | 57.697% | 59.130%
1 | 20.775% | 21.491% | 21.155%
2 | 14.968% | 15.648% | 14.910%
3 | 4.194% | 4.640% | 4.397%
4 | 0.415% | 0.510% | 0.406%
5 | 0.001% | 0.0137% | 0.001%
6 | 0.000% | 0.000% | 0.000%
Table 6. Measurement results and diagrams for configurations with common input pin and common clock region.
Config. Number | Config. Name | Short Summary
5.1.1 | Common IBUF
  • Maximum number of the detected differences: 16.
  • The most common number of the detected differences is 0 (50.614% occurrences).
5.1.2 | Common IBUF + BUFG
  • Maximum number of the detected differences: 28.
  • The most common number of the detected differences is 0 (40.490% occurrences).
5.1.3 | Separate MMCM output
  • Maximum number of the detected differences: 12.
  • The most common number of the detected differences is 0 (38.984% occurrences).
5.1.4 | Separate MMCM output + BUFG
  • Maximum number of the detected differences: 12.
  • The most common number of the detected differences is 0 (39.118% occurrences).
5.1.5 | Common IBUF + same clock region PLL
  • Maximum number of the detected differences: 22.
  • The most common number of the detected differences is 0 (35.665% occurrences).
5.1.6 | Common IBUF + same clock region PLL + BUFG
  • Maximum number of the detected differences: 12.
  • The most common number of the detected differences is 0 (44.912% occurrences).
5.1.7 | Common IBUF + BUFG + same clock region PLL
  • Maximum number of the detected differences: 15.
  • The most common number of the detected differences is 0 (42.565% occurrences).
5.1.8 | Common IBUF + BUFG + same clock region PLL + BUFG
  • Maximum number of the detected differences: 20.
  • The most common number of the detected differences is 0 (36.882% occurrences).
Table 7. Measurement results and diagrams for configurations with common input pin and a separate clock region.
Config. Number | Config. Name | Short Summary
5.2.1 | Common IBUF + BUFG + separate clock region MMCM + BUFG
  • Maximum number of the detected differences: 19.
  • The most common number of the detected differences is 0 (38.537% occurrences).
5.2.2 | Common IBUF + BUFG + separate clock region PLL + BUFG
  • Maximum number of the detected differences: 18.
  • The most common number of the detected differences is 0 (39.225% occurrences).
Table 8. Measurements results and diagrams for configurations with a separate input pin and a single common clock region.
Table 8. Measurements results and diagrams for configurations with a separate input pin and a single common clock region.
Config. Number | Config. Name | Schematic Diagram and Short Summary | Measurement Results at Ambient Temperature
5.3.1 | Separate IBUF (clock-capable pin in the same clock region) | [schematic: Electronics 12 04297 i021] | [results: Electronics 12 04297 i022]
  • Maximum number of the detected differences: 148.
  • The most common number of the detected differences is 0 (48.333% occurrences).
5.3.2 | Separate IBUF (clock-capable pin in the same clock region) + BUFG | [schematic: Electronics 12 04297 i023] | [results: Electronics 12 04297 i024]
  • Maximum number of the detected differences: 61.
  • The most common number of the detected differences is 0 (46.363% occurrences).
5.3.3 | Separate IBUF (clock-capable pin in the same clock region) + same clock region PLL | [schematic: Electronics 12 04297 i025] | [results: Electronics 12 04297 i026]
  • Maximum number of the detected differences: 16.
  • The most common number of the detected differences is 0 (43.463% occurrences).
5.3.4 | Separate IBUF (clock-capable pin in the same clock region) + BUFG + same clock region PLL | [schematic: Electronics 12 04297 i027] | [results: Electronics 12 04297 i028]
  • Maximum number of the detected differences: 13.
  • The most common number of the detected differences is 0 (47.636% occurrences).
5.3.5 | Separate IBUF (clock-capable pin in the same clock region) + same clock region PLL + BUFG | [schematic: Electronics 12 04297 i029] | [results: Electronics 12 04297 i030]
  • Maximum number of the detected differences: 24.
  • The most common number of the detected differences is 0 (35.111% occurrences).
5.3.6 | Separate IBUF (clock-capable pin in the same clock region) + BUFG + same clock region PLL + BUFG | [schematic: Electronics 12 04297 i031] | [results: Electronics 12 04297 i032]
  • Maximum number of the detected differences: 14.
  • The most common number of the detected differences is 0 (44.056% occurrences).
5.3.7 | Separate IBUF (non-clock-capable pin in the same clock region) | [schematic: Electronics 12 04297 i033] | [results: Electronics 12 04297 i034]
  • Maximum number of the detected differences: 34.
  • The most common number of the detected differences is 0 (50.376% occurrences).
5.3.8 | Separate IBUF (non-clock-capable pin in the same clock region) + BUFG | [schematic: Electronics 12 04297 i035] | [results: Electronics 12 04297 i036]
  • Maximum number of the detected differences: 40.
  • The most common number of the detected differences is 0 (47.387% occurrences).
5.3.9 | Separate IBUF (non-clock-capable pin in the same clock region) + same clock region PLL | [schematic: Electronics 12 04297 i037] | [results: Electronics 12 04297 i038]
  • Maximum number of the detected differences: 22.
  • The most common number of the detected differences is 0 (36.021% occurrences).
5.3.10 | Separate IBUF (non-clock-capable pin in the same clock region) + BUFG + same clock region PLL | [schematic: Electronics 12 04297 i039] | [results: Electronics 12 04297 i040]
  • Maximum number of the detected differences: 13.
  • The most common number of the detected differences is 0 (47.764% occurrences).
5.3.11 | Separate IBUF (non-clock-capable pin in the same clock region) + same clock region PLL + BUFG | [schematic: Electronics 12 04297 i041] | [results: Electronics 12 04297 i042]
  • Maximum number of the detected differences: 14.
  • The most common number of the detected differences is 0 (42.387% occurrences).
5.3.12 | Separate IBUF (non-clock-capable pin in the same clock region) + BUFG + same clock region PLL + BUFG | [schematic: Electronics 12 04297 i043] | [results: Electronics 12 04297 i044]
  • Maximum number of the detected differences: 18.
  • The most common number of the detected differences is 0 (40.851% occurrences).
Table 9. Measurement results and diagrams for configurations with a separate input pin (in common I/O bank) and a separate clock region.
Config. Number | Config. Name | Schematic Diagram and Short Summary | Measurement Results at Ambient Temperature
5.4.1 | Separate IBUF (clock-capable pin in the same clock region) + BUFG + different clock region MMCM + BUFG | [schematic: Electronics 12 04297 i045] | [results: Electronics 12 04297 i046]
  • Maximum number of the detected differences: 16.
  • The most common number of the detected differences is 0 (42.397% occurrences).
5.4.2 | Separate IBUF (clock-capable pin in the same clock region) + BUFG + different clock region PLL + BUFG | [schematic: Electronics 12 04297 i047] | [results: Electronics 12 04297 i048]
  • Maximum number of the detected differences: 13.
  • The most common number of the detected differences is 0 (47.857% occurrences).
5.4.3 | Separate IBUF (non-clock-capable pin in the same clock region) + BUFG + different clock region MMCM + BUFG | [schematic: Electronics 12 04297 i049] | [results: Electronics 12 04297 i050]
  • Maximum number of the detected differences: 15.
  • The most common number of the detected differences is 0 (44.335% occurrences).
5.4.4 | Separate IBUF (non-clock-capable pin in the same clock region) + BUFG + different clock region PLL + BUFG | [schematic: Electronics 12 04297 i051] | [results: Electronics 12 04297 i052]
  • Maximum number of the detected differences: 15.
  • The most common number of the detected differences is 0 (43.118% occurrences).
Table 10. Measurement results and diagrams for configurations with a separate input pin (in a different I/O bank) and a separate clock region.
Config. Number | Config. Name | Schematic Diagram and Short Summary | Measurement Results at Ambient Temperature
5.5.1 | Separate IBUF (clock-capable pin in X0Y1 clock region) | [schematic: Electronics 12 04297 i053] | [results: Electronics 12 04297 i054]
  • Maximum number of the detected differences: 11.
  • The most common number of the detected differences is 0 (55.176% occurrences).
5.5.2 | Separate IBUF (clock-capable pin in X0Y1 clock region) + BUFG | [schematic: Electronics 12 04297 i055] | [results: Electronics 12 04297 i056]
  • Maximum number of the detected differences: 12.
  • The most common number of the detected differences is 0 (54.273% occurrences).
5.5.3 | Separate IBUF (clock-capable pin in X0Y1 clock region) + X0Y1 clock region MMCM + BUFG | [schematic: Electronics 12 04297 i057] | [results: Electronics 12 04297 i058]
  • Maximum number of the detected differences: 24.
  • The most common number of the detected differences is 0 (35.465% occurrences).
5.5.4 | Separate IBUF (clock-capable pin in X0Y1 clock region) + BUFG + X0Y1 clock region MMCM + BUFG | [schematic: Electronics 12 04297 i059] | [results: Electronics 12 04297 i060]
  • Maximum number of the detected differences: 18.
  • The most common number of the detected differences is 0 (41.317% occurrences).
5.5.5 | Separate IBUF (clock-capable pin in X0Y1 clock region) + X0Y1 clock region PLL + BUFG | [schematic: Electronics 12 04297 i061] | [results: Electronics 12 04297 i062]
  • Maximum number of the detected differences: 11.
  • The most common number of the detected differences is 0 (48.355% occurrences).
5.5.6 | Separate IBUF (clock-capable pin in X0Y1 clock region) + BUFG + X0Y1 clock region PLL + BUFG | [schematic: Electronics 12 04297 i063] | [results: Electronics 12 04297 i064]
  • Maximum number of the detected differences: 11.
  • The most common number of the detected differences is 0 (49.297% occurrences).
5.5.7 | Separate IBUF (non-clock-capable pin in X0Y1 clock region) | [schematic: Electronics 12 04297 i065] | [results: Electronics 12 04297 i066]
  • Maximum number of the detected differences: 12.
  • The most common number of the detected differences is 0 (54.666% occurrences).
5.5.8 | Separate IBUF (non-clock-capable pin in X0Y1 clock region) + BUFG | [schematic: Electronics 12 04297 i067] | [results: Electronics 12 04297 i068]
  • Maximum number of the detected differences: 11.
  • The most common number of the detected differences is 0 (52.228% occurrences).
5.5.9 | Separate IBUF (non-clock-capable pin in X0Y1 clock region) + X0Y1 clock region MMCM + BUFG | [schematic: Electronics 12 04297 i069] | [results: Electronics 12 04297 i070]
  • Maximum number of the detected differences: 10.
  • The most common number of the detected differences is 0 (49.954% occurrences).
5.5.10 | Separate IBUF (non-clock-capable pin in X0Y1 clock region) + BUFG + X0Y1 clock region MMCM + BUFG | [schematic: Electronics 12 04297 i071] | [results: Electronics 12 04297 i072]
  • Maximum number of the detected differences: 13.
  • The most common number of the detected differences is 0 (45.351% occurrences).
5.5.11 | Separate IBUF (non-clock-capable pin in X0Y1 clock region) + X0Y1 clock region PLL + BUFG | [schematic: Electronics 12 04297 i073] | [results: Electronics 12 04297 i074]
  • Maximum number of the detected differences: 12.
  • The most common number of the detected differences is 0 (46.335% occurrences).
5.5.12 | Separate IBUF (non-clock-capable pin in X0Y1 clock region) + BUFG + X0Y1 clock region PLL + BUFG | [schematic: Electronics 12 04297 i075] | [results: Electronics 12 04297 i076]
  • Maximum number of the detected differences: 15.
  • The most common number of the detected differences is 0 (46.019% occurrences).
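
The two summary figures reported for each configuration (the maximum number of detected differences and the share of the most common value) can be derived from a list of per-measurement difference counts. The following is a minimal Python sketch of that bookkeeping, not the authors' tooling: the function name `summarize_differences` and the sample data are hypothetical and serve only as illustration.

```python
from collections import Counter


def summarize_differences(counts):
    """Summarize per-measurement difference counts.

    Returns a tuple (maximum, most_common_value, share_percent), where
    share_percent is the percentage of measurements equal to the most
    common value. `counts` is a hypothetical list of non-negative integers.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    histogram = Counter(counts)
    # most_common(1) returns the value with the highest occurrence count.
    mode_value, mode_occurrences = histogram.most_common(1)[0]
    share_percent = 100.0 * mode_occurrences / len(counts)
    return max(counts), mode_value, share_percent


# Made-up illustrative data, not measurements from the paper.
samples = [0, 0, 1, 0, 2, 1, 0, 3]
maximum, mode, share = summarize_differences(samples)
print(f"Maximum number of the detected differences: {maximum}.")
print(f"The most common number of the detected differences is {mode} "
      f"({share:.3f}% occurrences).")
```

With real data, a lower maximum and a higher share of zero-difference measurements would correspond to the better-performing rows in the tables above.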

Share and Cite

MDPI and ACS Style

Wojciechowski, A.A.; Marcinek, K.; Pleskacz, W.A. Relative Jitter Measurement Methodology and Comparison of Clocking Resources Jitter in Artix 7 FPGA. Electronics 2023, 12, 4297. https://doi.org/10.3390/electronics12204297