1. Introduction
Programmable time delays with resolutions in the ps range are required in many applications, including skew compensation [1], precise timing of apertures, such as camera shutters [2], and device/detector testing and characterization [3,4,5]. These are only a few of the numerous applications in which a precise digital-to-time converter (DTC) is necessary [6].
A DTC generates an output signal whose delay is directly proportional to a digital input code. These devices must carefully trade off competing figures of merit: on the one hand, ps-level resolution (i.e., least significant bit, LSB) and low jitter, which allow very short and precise delays to be generated; on the other hand, a wide full-scale range (FSR) with low differential and integral nonlinearity errors (DNL/INL), which allows the largest possible span of delays to be produced. These crucial characteristics frequently conflict with one another [7,8].
In the scientific literature and on the market, numerous DTC architectures exist, most of which are designed as application-specific integrated circuits (ASICs) [9]. Although these offer excellent performance, they are characterized by long time-to-market and high non-recurring engineering (NRE) costs, which make them difficult to apply in fast-prototyping and research contexts where only a few units are required. To tackle this, we propose a programmable logic solution compatible with the Xilinx 28 nm 7-Series field-programmable gate arrays (FPGAs) and systems-on-chip (SoCs). The proposed solution remains flexible and occupies a small area while delivering a resolution (LSB) of 52 ps, a full-scale range (FSR) of up to 56 ms, and high voltage and temperature (VT) stability, ensuring portability across different programmable logic devices.
In the scientific literature, there are multiple FPGA-based DTC architectures.
A simple counter (also known as a timer) is the simplest and most compact DTC that can be built; it is characterized by a resolution equal to the clock period, a wide FSR determined by the number of bits, and low jitter (i.e., a few ps). The main flaw of pure synchronous logic is that its resolution is bounded by the clock period, which in turn is limited by oscillator performance; for FPGA devices, this usually tops out at about 1 GHz. Regarding the 28 nm Xilinx 7-Series devices, the maximum clock frequency is 630 MHz (i.e., a resolution just below 1.6 ns) for the low-end Artix-7 FPGAs and the Zynq-7000 SoCs up to the 7020 model, and 800 MHz (i.e., a resolution of 1.25 ns) for the Kintex-7, Virtex-7, and Zynq-7000 from the 7030 model onwards [10]. One way to exceed this limitation is to use N clocks with equally spaced phases, all operating at the same frequency. This technique, known in the scientific literature as N-clock synchronous logic, improves the system resolution by a factor of N [11]. However, clock buffers and routing resources impose restrictions on FPGA and SoC designs, limiting the maximum number of clocks. Regarding the 28 nm Xilinx 7-Series devices, a single FPGA/SoC provides at most 32 clock buffers, with a limit of up to 10 clock lines per region [10], making resolutions on the order of 100 ps possible. Clock networks also contribute substantially to the total dynamic power consumption, which reduces the power efficiency of this approach.
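As a rough illustration of this limit, the following sketch computes the resolution achievable with N-clock synchronous logic; the 800 MHz clock and the 10 phases are taken from the device limits quoted above, not from a specific design.

```python
# Illustrative resolution estimate for N-clock synchronous logic.
# Values are the 7-Series device limits quoted in the text, not a specific design.
f_clk = 800e6           # fastest usable clock on Kintex-7/Virtex-7 class devices [Hz]
n_phases = 10           # clock lines available per region on 7-Series devices
t_clk = 1.0 / f_clk     # clock period [s]
lsb = t_clk / n_phases  # effective resolution with N equally spaced phases [s]
print(f"T_clk = {t_clk*1e9:.2f} ns, LSB = {lsb*1e12:.0f} ps")  # ~1.25 ns, ~125 ps
```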
Most high-resolution DTC solutions are based on the concept of a (digital) programmable delay line (PDL). These DTCs are characterized by low jitter (i.e., a few ps), high resolution (i.e., a few ps), and good linearity, but a limited FSR. Typically, delay lines (DLs) are built as a series of buffers, sometimes referred to as “taps” or “bins”. A simple PDL can be made by connecting each buffer’s output to an input of a multiplexer (see Figure 1 for an example), which makes it possible to select the circuit’s delay at run time. Thus, the propagation delay of a single tap defines the LSB, and the total delay of the chain defines the FSR.
FPGA-based PDLs are implemented by connecting several look-up tables (LUTs) in series [12], or by using the carry propagation chains (i.e., CARRY) available within the FPGA fabric [13] or within digital signal processing (DSP) modules [7]. Regardless of the type of buffer chosen, since FPGAs are not optimized to provide logic elements with identical propagation delays, these PDLs require a calibration mechanism to estimate the delay introduced by each tap. This is crucial to achieve high resolution while keeping the DNL and INL errors low. Moreover, as FPGAs do not feature automatic stabilization mechanisms for propagation delay in response to temperature and voltage fluctuations, the jitter and resolution provided by the PDL vary significantly with the operating temperature and voltage [7]. The choice of buffer type is usually made by balancing performance, resource availability, and practicality. In fact, DSP-based PDLs are much more effective in terms of jitter and resolution than those based on carry chains and LUTs, but DSPs are far less abundant.
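To illustrate why uncalibrated taps degrade linearity, the following behavioral sketch models a multiplexer-tapped PDL with randomly mismatched buffer delays and computes its LSB, FSR, and DNL/INL; the 50 ps nominal tap and the ±30% spread are hypothetical values chosen for illustration, not measured data.

```python
# Behavioral sketch of a multiplexer-tapped PDL (as in Figure 1) with mismatched taps.
# Tap delays are hypothetical and only illustrate why uncalibrated FPGA buffers
# produce DNL/INL errors.
import random

random.seed(1)
NOMINAL_TAP = 50e-12                                   # intended LSB: 50 ps
taps = [NOMINAL_TAP * random.uniform(0.7, 1.3) for _ in range(32)]  # process spread

def pdl_delay(code):
    """Delay selected by the multiplexer: sum of the first `code` taps."""
    return sum(taps[:code])

lsb = pdl_delay(len(taps)) / len(taps)                 # average tap = effective LSB
fsr = pdl_delay(len(taps))                             # full-scale range
dnl = [(taps[i] - lsb) / lsb for i in range(len(taps))]
inl = [(pdl_delay(i) - i * lsb) / lsb for i in range(len(taps) + 1)]
print(f"LSB = {lsb*1e12:.1f} ps, FSR = {fsr*1e9:.2f} ns")
print(f"DNL = {max(map(abs, dnl)):.2f} LSB, INL = {max(map(abs, inl)):.2f} LSB")
```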
Regardless of the buffer’s nature, this architecture exhibits a direct dependency between area utilization, resolution, jitter, and FSR. To increase the dynamic range while maintaining the same resolution, it is necessary to increase the number of buffers, which results in a larger multiplexing mechanism and a more complex and prolonged calibration process [13,14]. On the other hand, if one aims to increase the FSR while keeping the area constant, the tap must be slowed down (i.e., the LSB must be increased), thereby degrading the system’s resolution. Additionally, the jitter between the input and output signals (i.e., σ_out) increases as the number of taps traversed grows, because the signal passing through the PDL accumulates the jitter introduced by each tap. In this regard, two main trends have been observed. Specifically, in a PDL with N taps, where σ_tap denotes the jitter of a single buffer, a total jitter of σ_out = √N · σ_tap is observed in [15] and σ_out = √k · σ_tap in [12], where k is the number of buffers the signal traverses, determined by the multiplexer’s selection input.
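A minimal numerical sketch of these two trends, assuming uncorrelated per-tap jitter so that the accumulated jitter grows with the square root of the number of taps; σ_tap and the tap counts below are hypothetical values, not figures from the cited works.

```python
# Jitter accumulation along a PDL, assuming uncorrelated per-tap jitter.
# sigma_tap, N and k are hypothetical values used only for illustration.
sigma_tap = 2e-12          # r.m.s. jitter of a single buffer [s]
N = 32                     # total number of taps in the line
k = 20                     # taps actually traversed for a given code

sigma_out_N = (N ** 0.5) * sigma_tap   # trend depending on the total tap count
sigma_out_k = (k ** 0.5) * sigma_tap   # trend depending on the traversed taps
print(f"sqrt(N)*sigma_tap = {sigma_out_N*1e12:.1f} ps r.m.s.")
print(f"sqrt(k)*sigma_tap = {sigma_out_k*1e12:.1f} ps r.m.s.")
```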
To mitigate area utilization while maintaining high resolution, techniques based on ring oscillators have been proposed in the literature, specifically the Vernier Delay-Locked Loop (VDLL) [16]. In this architecture, the time delay is generated as the differential delay between the edges of two ring oscillators operating at different frequencies. This provides a very compact and high-resolution solution; however, due to the nature of ring oscillators, it is more susceptible than a PDL to voltage and temperature (VT) variations.
To mitigate VT variations and increase resolution, at the expense of system simplicity (area utilization and calibration), systems based on Programmable Vernier Delay Lines (PVDLs) have been proposed [17]. In these solutions, the generated delay is obtained as the time difference between two PDLs, each characterized by a different propagation delay. The differential nature of the approach helps limit the dispersion caused by VT fluctuations, and the LSB equals the difference between the propagation delays of the two taps.
To eliminate the trade-offs related to FSR, techniques rooted in Nutt interpolation are used [18]. This approach pairs a fine DTC (e.g., PDL, VDLL, PVDL), which is highly precise but has a limited FSR, with a coarse counter. Thus, the total delay (i.e., t_delay) is the sum of a fine delay (i.e., t_fine) and a coarse delay (i.e., t_coarse); i.e., t_delay = t_coarse + t_fine [14,19].
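A minimal sketch of the Nutt decomposition is given below, using example values for the coarse clock period and the fine LSB (not the figures of the proposed design).

```python
# Sketch of the Nutt decomposition of a requested delay into a coarse count
# and a fine interpolation code. T_CLK and LSB_FINE are example values only.
T_CLK = 10e-9        # coarse counter clock period [s] (100 MHz example)
LSB_FINE = 50e-12    # fine DTC resolution [s]

def nutt_split(t_delay):
    """Return (TH_coarse, TH_fine) such that
    t_delay ~= TH_coarse * T_CLK + TH_fine * LSB_FINE."""
    th_coarse = int(t_delay // T_CLK)
    th_fine = round((t_delay - th_coarse * T_CLK) / LSB_FINE)
    return th_coarse, th_fine

th_c, th_f = nutt_split(1.2345e-6)   # request a 1.2345 us delay
generated = th_c * T_CLK + th_f * LSB_FINE
print(f"coarse = {th_c}, fine = {th_f}, generated = {generated*1e9:.3f} ns")
```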
As Figure 2 shows, the system starts an n-bit digital counter clocked at f_clk (i.e., with period T_clk) and loads a digital comparator with the “coarse” part of the desired delay (i.e., TH_coarse). Concurrently, the “fine” part (i.e., TH_fine) is used to configure the fine DTC (e.g., PDL). When the counter reaches the designated TH_coarse, a signal, characterized by a delay t_coarse = TH_coarse · T_clk referred to the start of the count (i.e., the count reset), is generated and forwarded to the fine DTC (e.g., PDL), obtaining an output characterized by a total delay of t_delay = TH_coarse · T_clk + t_fine. The output has the same FSR as the coarse counter (i.e., 2^n · T_clk) with the resolution of the fine DTC (e.g., PDL).
The goal of this paper is to present a DTC compatible with all programmable logic solutions (i.e., FPGA and SoC) at 28 nm from the Xilinx 7-Series, offering high performance. The proposed solution does not require a calibration mechanism or specific component placement, resulting in a streamlined, simple, and compact structure (i.e., 348 LUTs and 550 flip-flops). The design is characterized by good linearity, immunity to PVT variations, and the elimination of key trade-offs such as jitter vs. FSR and area vs. FSR.
The paper is organized as follows: the proposed architecture is presented in Section 2, while Section 3 focuses on the experimental validation using a low-end Artix-7 XC7A100TFTG256-2, achieving a jitter lower than 50 ps r.m.s., a DNL of 1.19 LSB, an INL of 1.56 LSB, and an average dynamic power dissipation of 285 mW. Finally, a comparison with other academic works and commercial solutions is presented in Section 4.
2. Hardware Implementation
The proposed architecture adopts Nutt interpolation [18], combining a dual-clock synchronous coarse logic with an asynchronous, PDL-based fine logic.
Every I/O block in Xilinx 28 nm 7-Series FPGAs and SoCs has an adjustable PDL primitive called IDELAYE2 [20]. This primitive can be used on signals coming from the FPGA logic as well as on combinational and registered input signals.
These primitives are implemented as 32-tap wrap-around PDLs whose tap delay is compensated, via the IDELAYCTRL primitive [21], for fluctuations in process, voltage, and temperature (PVT). IDELAYCTRL needs a reference clock as input in order to guarantee precise calibration. The primitive’s basic mechanism splits the calibration clock period (i.e., T_cal) into 64 steps (i.e., a tap delay of T_cal/64), and a signal can be delayed by up to 32 steps (i.e., taps from 0 to 31), achieving a maximum delay of about half of the period (i.e., T_cal/2). As such, the frequency of the clock applied to the primitive directly affects the delay values: the tap resolution varies with the calibration clock frequency. Table 1 lists the available calibration frequencies (i.e., f_cal) as well as the corresponding tap delays.
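As a worked example of this mechanism, the snippet below computes the tap delay and the maximum delay for a few nominal IDELAYCTRL reference frequencies; the 300 MHz case corresponds to the 52 ps LSB used in this work, while the exact frequencies supported are those listed in Table 1.

```python
# Tap delay of the IDELAYE2 primitive as a function of the IDELAYCTRL
# reference-clock frequency. 300 MHz matches the 52 ps LSB of this work;
# the other entries are shown for comparison (see Table 1 for exact values).
def idelay_tap(f_cal_hz):
    """One tap = 1/64 of the calibration clock period."""
    return 1.0 / (64.0 * f_cal_hz)

for f_cal in (200e6, 300e6, 400e6):
    tap = idelay_tap(f_cal)
    max_delay = 31 * tap            # taps 0..31 -> roughly half the period
    print(f"f_cal = {f_cal/1e6:.0f} MHz: tap = {tap*1e12:.1f} ps, "
          f"max = {max_delay*1e9:.2f} ns")
```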
The proposed architecture can thus be divided into a dual-clock synchronous (i.e., coarse) logic and an asynchronous (i.e., fine) part based on IDELAYE2 and IDELAYCTRL.
The main difficulty in dual-clock logic lies in deriving the 180°-phase-shifted clocks while ensuring low jitter between them, considering that a double-data-rate (DDR) approach is not always feasible due to the dispersion and duty-cycle fluctuations that affect most commercial clock sources. Phase-locked loops (PLLs) are the standard circuits used to create such clocks. Unfortunately, the PLLs hosted inside the Xilinx 7-Series FPGAs and SoCs introduce several hundred picoseconds of jitter, which is more than the IDELAYE2 resolution (Table 1). Our suggested fix, which avoids resorting to high-performance PLLs external to the FPGA device, is to apply the “clock gating” method using the Xilinx primitive known as BUFGCE [10].
Clock division is made possible via the BUFGCE primitive, which functions as a buffer with a clock enable (CE) input.
Figure 3 shows the proposed solution to convert an input clock signal (i.e., clk_in) at frequency f_in into two output clock signals (i.e., clk_0 and clk_180) at frequency f_in/2, shifted by 180° with respect to each other and characterized by a duty cycle of 25%. clk_in clocks a 2-bit circular buffer in which the two type-D flip-flops (DFFs) store “1” and “0” [22,23]; the two DFF outputs toggle the CE inputs of the two BUFGCE buffers (i.e., the ones generating clk_0 and clk_180), both fed by clk_in.
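The following plain-Python timing sketch (a behavioral model, not the HDL implementation) reproduces the idea of Figure 3: a 2-bit circular buffer alternately enables two gated copies of clk_in, yielding two clocks at f_in/2, 180° apart, each with a 25% duty cycle.

```python
# Behavioral model of the BUFGCE-based clock divider of Figure 3.
# Each input clock period is rendered as two characters ("10" = high, then low).
def gated_clocks(n_cycles):
    ce = [1, 0]                      # contents of the 2-bit circular buffer
    wave_in, wave_0, wave_180 = "", "", ""
    for _ in range(n_cycles):
        wave_in += "10"                        # clk_in toggles every period
        wave_0 += "10" if ce[0] else "00"      # BUFGCE passes the cycle only if CE = 1
        wave_180 += "10" if ce[1] else "00"
        ce = [ce[1], ce[0]]                    # rotate the circular buffer
    return wave_in, wave_0, wave_180

clk_in, clk_0, clk_180 = gated_clocks(6)
print("clk_in :", clk_in)    # 101010101010
print("clk_0  :", clk_0)     # 100010001000  -> f_in/2, 25% duty cycle
print("clk_180:", clk_180)   # 001000100010  -> shifted by 180 degrees
```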
Referring to the dual-clock synchronous coarse logic shown in Figure 4, we generate a high-speed n-bit counter by using clk_0 to clock the n−1 most significant bits while clk_180 clocks the least significant bit. The comparison with the threshold value happens independently in the two clock domains (i.e., the n−1 most significant bits of the threshold are compared in the clk_0 domain and its least significant bit in the clk_180 domain); so, by combining the comparison results of the two domains, a coarse DTC output signal (i.e., th_reached) with a resolution of T_in (i.e., the period of clk_in) and an FSR of 2^n · T_in is generated.
The final architecture, depicted in Figure 5, is obtained by forwarding the T_in-resolution DTC signal (i.e., th_reached), provided by the dual-clock synchronous coarse logic, to the IDELAYE2 primitive, whose IDELAYCTRL is driven by the calibration clock (i.e., f_cal). In this manner, the IDELAYE2 acts as a fine interpolator, allowing the resolution to be increased up to the tap delay T_cal/64; that is, 52 ps with f_cal = 300 MHz.
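The sketch below summarizes how a requested delay maps onto a coarse threshold and a fine tap code, under the assumption that one coarse step spans exactly the 32 IDELAYE2 taps (i.e., 32 × 52 ps with a 300 MHz calibration clock); the constants and function names are illustrative, not taken from the actual firmware.

```python
# Mapping of a requested delay onto the proposed architecture: a coarse
# threshold for the dual-clock counter plus a fine tap code for IDELAYE2.
# Assumes one coarse step equals the 32-tap span of IDELAYE2; adapt the
# constants to the actual clocking used.
TAP = 52.08e-12              # IDELAYE2 tap with f_cal = 300 MHz (T_cal / 64)
COARSE_STEP = 32 * TAP       # one step of the dual-clock coarse counter

def dtc_codes(t_delay):
    """Return (coarse threshold, IDELAYE2 CNTVALUEIN) for a requested delay."""
    th_coarse = int(t_delay // COARSE_STEP)
    fine_taps = round((t_delay - th_coarse * COARSE_STEP) / TAP)
    return th_coarse, min(fine_taps, 31)

th, taps = dtc_codes(100e-9)                     # request a 100 ns delay
print(f"coarse threshold = {th}, fine taps = {taps}, "
      f"generated = {(th * COARSE_STEP + taps * TAP) * 1e9:.3f} ns")
```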
4. State of the Art and Discussion
In the field of DTC research, both academic and commercial solutions are available today.
Regarding the academic works, a clear tendency quickly becomes apparent: solutions frequently sacrifice FSR in order to achieve the best resolution with the least amount of jitter. For example, an LSB of 20 ps was obtained in [13], which is in close agreement with the 14.2 ps presented in [7]. However, both works struggle with dynamic ranges limited to hundreds of ps or, in the best case, a few ns. On the market, families of one-shot devices, such as those from Dallas Semiconductor [26], are common. The device with the highest resolution among them creates pulses between 5 ns and 15 ns, and the one with the widest FSR produces pulses between 100 ns and 500 ns. Furthermore, each device in this series offers only five preset pulse-width options, which is a major constraint.
Table 4 displays the main academic solutions based on programmable logic devices (i.e., FPGAs and SoCs) compared to the proposed solution.
The use of IDELAYE2 as a PDL eliminates the need for external calibration systems, as required in [7,13,19], or for specific primitive placement [12,14]. This enables a simple and compact system (i.e., 550 DFFs, 348 LUTs, and 1 IDELAYE2), free from any dependency between resolution and FSR [13,14], while maintaining stability under PVT variations and ensuring good linearity (i.e., DNL/INL of 1.19/1.56 with an LSB of 52 ps). The linearity is fully comparable to that of calibrated PDL-based solutions: [7] (i.e., DNL/INL of 3.19/7.11 with an LSB of 9.1 ps), [13] (i.e., DNL/INL of 22.08/19.63 with an LSB of 14.2 ps), and [19] (i.e., DNL/INL of 3.95/6.2 with an LSB of 20 ps), although slightly inferior to the VDLL [16] (i.e., DNL/INL of 0.24/0.02 with an LSB of 38.6 ps) and PVDL [17] (i.e., DNL/INL of 0.17/0.62 with an LSB of 1.02 ps) solutions, which, being based on “Vernier” techniques, are more complex. To eliminate the dependency between FSR and jitter, characteristic of PDLs in general [12] (i.e., 7 ps r.m.s. for delays of a few ps and 165 ps r.m.s. for delays of a few ns) and of the IDELAY specifically (Section 3.3), Nutt interpolation was adopted. This approach allowed the jitter to be kept below the LSB (i.e., in the range between 25 and 50 ps r.m.s.) while ensuring an FSR of up to 58 ms. Furthermore, the use of Nutt interpolation, as in [19] (i.e., jitter of 20 ps r.m.s. over an FSR of 33 μs) and [14] (i.e., jitter of 35 ps r.m.s. over an FSR of 57.3 ns), enabled the trade-off between area occupation and FSR to be overcome, contributing to the compactness of the system.
With reference to the DSP-based PDLs presented in [7], only the implementation with an FSR of 10.9 ns provides an operating range suitable for practical applications. Both DSP-based PDLs deliver excellent performance in terms of jitter and resolution, superior to that of our solution. However, from the perspective of area usage, the use of IDELAYE2 primitives, which are far more abundant than DSP resources, as proposed in our solution, proves to be the better choice. Specifically, considering the Artix-7 XC7A100TFTG256-2 FPGA as a target (126,800 DFFs, 63,400 LUTs, 300 IDELAYs, 240 DSPs), it would only be possible to implement 15 DSP-based PDLs, utilizing 100% of the DSPs (a scarce and valuable resource). In contrast, our solution enables the implementation of 182 channels, with LUTs (a less critical resource) being the limiting factor.
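The channel count quoted above follows directly from the per-channel utilization and the device resources; a quick check, assuming one IDELAYE2 per channel:

```python
# Resource-limited channel count on the Artix-7 XC7A100T, using the per-channel
# utilization quoted above (550 DFFs, 348 LUTs, 1 IDELAYE2).
available = {"DFF": 126_800, "LUT": 63_400, "IDELAY": 300}
per_channel = {"DFF": 550, "LUT": 348, "IDELAY": 1}

channels = min(available[r] // per_channel[r] for r in available)
limiting = min(available, key=lambda r: available[r] // per_channel[r])
print(f"max channels = {channels}, limited by {limiting}s")   # 182, LUTs
```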
Regarding the CARRY-based PDL presented in [13], while it offers excellent jitter and LSB performance, it exhibits significant nonlinearity, caused by the propagation-delay inconsistencies within the CARRY blocks. The same issue is found in the PDL discussed in [14].
Moving to the “Vernier” architectures, the VDLL solution described in [16] achieves similar performance in terms of jitter and LSB but performs worse in terms of area utilization compared to the solution presented in this work. Specifically, the substantial imbalance between the LUT and DFF requirements for implementing the VDLL limits the maximum number of channels to 94 on the Artix-7 XC7A100TFTG256-2, consuming 100% of the LUTs but only 11% of the DFFs. The PVDL proposed in [17] offers an excellent balance between resolution, jitter, nonlinearity, and FSR, but it requires highly complex manual placement to meet the timing constraints necessary for its functionality. Conversely, our solution does not impose any specific place-and-route constraints, delegating these tasks to the compiler. This approach simplifies the firmware and facilitates multichannel scalability.
The architecture proposed in [19] also achieves a good balance between resolution, jitter, nonlinearity, and FSR. However, its DCM-based PDL limits the maximum operational rate to only 2 MHz, a restriction not present in our proposed structure, which allows the delays to be generated sequentially without similar constraints.
For a fairer comparison of the dynamic power consumption, it was considered appropriate to normalize the dynamic power (i.e., P_dyn) to the corresponding clock frequencies (i.e., 300 MHz for the proposed work, 25 MHz for [7], and 200 MHz for [17]), thereby estimating the average energy dissipated in each clock cycle (i.e., E = P_dyn/f_clk). Naturally, this figure depends on the number of resources used in the circuit (i.e., the area), on the parasitic capacitance C (which depends on the technology node), and on the core supply voltage V_core of the FPGA (i.e., 1 V for the proposed work, 1.2 V for [7], and 1 V for [17]). In this regard, an energy consumption of 0.95 mW/MHz (i.e., 0.95 nJ) is obtained for the 28 nm system we present, 3.4 mW/MHz (i.e., 3.4 nJ) for [7] in a 45 nm system, and 0.825 mW/MHz (i.e., 0.825 nJ) for [17] in a 40 nm system.
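For reference, the normalization reduces to E = P_dyn/f_clk; the short check below reproduces the quoted figures (the dynamic powers of [7] and [17] are back-computed here from their nJ values and clock frequencies, since they are not stated explicitly above).

```python
# Energy-per-cycle normalization used for the power comparison: E = P_dyn / f_clk.
designs = {
    "this work (28 nm)": (285e-3, 300e6),   # P_dyn [W], f_clk [Hz]
    "[7]  (45 nm)":      (85e-3,  25e6),    # P_dyn back-computed from 3.4 nJ
    "[17] (40 nm)":      (165e-3, 200e6),   # P_dyn back-computed from 0.825 nJ
}
for name, (p_dyn, f_clk) in designs.items():
    print(f"{name}: {p_dyn / f_clk * 1e9:.3f} nJ per clock cycle")
```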
The architecture proposed here is compatible with all FPGAs equipped with the IDELAYE2 primitive, specifically all 28 nm Xilinx 7-Series FPGA and SoC devices, and can easily be migrated to any device featuring an equivalent primitive: the 40/45 nm Xilinx 6-Series (i.e., IODELAYE1, which has slightly lower performance than IDELAYE2), the 20/16 nm UltraScale/UltraScale+ (i.e., IDELAYE3 [27], which performs approximately 10 times better than IDELAYE2), and the 7 nm Versal (i.e., IDELAYE5 [28], which also performs approximately 10 times better than IDELAYE2). Migration to devices from Intel/Altera [29], Lattice Semiconductor [30], and Microsemi [31] is more complex. In fact, Intel/Altera does not have a direct delay block equivalent to Xilinx’s IDELAY, but their FPGA architecture includes flexible modules for I/O line delay through PLLs and clock adjustment using integrated resources such as IOE (Input–Output Element) modules, which can be configured to support adjustable delays. Lattice offers more compact solutions with its ECP5 and iCE40 FPGA series, which support configurable delay modules; however, these are typically aimed at low-power designs and do not offer the same level of granularity as Xilinx’s IDELAY. Microsemi (now part of Microchip) provides advanced clock management modules in its PolarFire and IGLOO2 FPGAs. While there is no direct IDELAY equivalent, these FPGAs support configurable I/O delays through the use of CCG (Configurable Clock Generator) and SERDES modules, which can be programmed to achieve precise signal synchronization.