Article

A Low-Latency, Low-Jitter Retimer Circuit for PCIe 6.0

1
School of Computer, National University of Defense Technology, Changsha 410003, China
2
School of Air and Missile Defense College, Air Force Engineering University, Xi’an 710051, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2023, 12(14), 3102; https://doi.org/10.3390/electronics12143102
Submission received: 6 June 2023 / Revised: 7 July 2023 / Accepted: 12 July 2023 / Published: 17 July 2023
(This article belongs to the Section Microelectronics)

Abstract

As the PCIe 6.0 specification places higher requirements on signal integrity and transmission latency, it becomes especially important to improve signal transmission performance at the physical layer of the transceiver interface. Retimer circuits are a key component of high-speed serial interfaces, and their latency and jitter directly affect the overall performance of PCIe. To address the large latency and limited jitter performance of typical retimer circuits, this paper proposes a low-latency, low-jitter retimer circuit based on a CDR + PLL architecture for PCIe 6.0. A jitter-canceling filter circuit eliminates the frequency difference between the retiming clock and the data, reduces the retiming clock jitter, and improves the quality of the retimer output data. The data are sampled with the retiming clock and then output, which avoids the large penetration latency of typical retimer circuits. The circuit is designed in a 28 nm CMOS process. Simulation results show that when 112 Gbps PAM4 data are input to the retimer circuit, the penetration latency is 27.3 ps, which is 83.5% lower than that of the typical retimer structure, and the output data jitter is 741 fs, a 31.4% reduction compared to the typical retimer structure.

1. Introduction

Since its official release, PCIe (PCI Express) has evolved rapidly and has become an indispensable technology for high-performance computing (HPC) communications [1], Ethernet, industrial control, etc. The release of the PCIe 6.0 specification has dramatically increased computing speeds for applications such as HPC, cloud computing, and data center solid-state drives (SSDs), but with that comes the negative impact of the channel on clock signals and transmitted data. Severe signal attenuation and interference limit the overall performance of PCIe 6.0, especially at the receiving end, where signal integrity and transmission latency are greatly affected. The retimer, a technology for data synchronization and transmission, plays a key role in the physical layer of the PCIe 6.0 interface subsystem and is expected to be the primary solution in the PCIe 6.0 era because of its better performance and more economical processing.
Since PCIe 3.0, the data rate has doubled with each new generation of the standard. PCIe 6.0 has increased the data transfer rate to 64 GT/s, and the single-channel bandwidth has reached 63.02 Gbps. The retimer, as a PHY chip, needs to be compatible with various key technologies such as serialization and deserialization, clock generation and distribution, clock data recovery, and data driving and equalization. For ultra-high-speed input signals of 100 Gbps or more, the synchronized signal still has large jitter, and signal transmission still has large latency. In circuit design, when there is a latency difference between the clock and data paths, the correlation between the clock and the sampled data is weakened [2], which increases the relative jitter between the clock and data paths and reduces the jitter tolerance. Therefore, many studies have minimized the delay mismatch between clock and data through clock sampling and forwarding techniques, which in turn achieves noise filtering and jitter cancellation. In [2], the receiver-side clock path uses a high-bandwidth filtering PLL to track data-correlated jitter and to cut off high-frequency jitter, but this method does not guarantee that the PLL output clock and the data path are at the same frequency. The work in [3] uses a multiplying delay-locked loop (MDLL) to reset the oscillator jitter at the rate of the reference clock frequency, which does not require a high-bandwidth loop to suppress the oscillator jitter and thus reduces the design complexity, but this can subject the MDLL to large duty cycle distortion. An area-efficient phase filtering technique is proposed in [4] to filter the jitter between cascaded repeaters, but the noise environment is not fundamentally eliminated, and the jitter accumulated by subsequent cascaded circuits degrades the system performance and jitter tolerance.
To reduce the output jitter, [5] uses a retimer design with a symmetric layout, which reduces the differential coupling capacitance, adjusts the phases of the serial data and clock entering the retimer through a phase regulator controlled by an external signal, and provides the clock drive for the retimer. However, this design cannot eliminate the phase difference between the data and clock and also increases the circuit penetration latency. In [6], a retiming driver based on a clock and data recovery (CDR) + vertical-cavity surface-emitting laser (VCSEL) architecture was designed for 50 Gbps PAM4 (4-level pulse amplitude modulation) signals. To address the challenges posed by signal characteristics, the skin effect, dielectric losses, and inter-symbol crosstalk, a repeater-type retimer was chosen as the signal conditioning technique in [7]. The retimer consists of a receiver and a driver that uses the clock recovered from the data stream by the CDR, or a reference clock, to drive the data synchronously. The work in [8] proposes a PAM4 transceiver architecture based on an analog-to-digital converter (ADC) + digital signal processor (DSP), which achieves a channel equalization capability greater than 40 dB to compensate for the package insertion loss in long-distance transmission. In [9], an FPGA-based low-latency transfer scheme was used to achieve high-speed data transfer between the FPGA platform and the server. However, this scheme does not consider the data compression process, which makes transmission more time consuming, and the data transfer rate is still low. A low-latency forward error correction code was proposed in [10], but this technique has too much resource overhead in the coding layer.
In this paper, we propose a new retimer solution to address the problem of latency and jitter in high-speed data transmission in order to recover low-jitter data at the SerDes receiver and reduce the latency of data transmission in subsequent circuits.

2. Retimer Circuit Latency and Jitter Performance Analysis

A retimer contains a CDR circuit, which is the core component of the SerDes PHY. The received (Rx) signal is recovered into a digital signal in the retimer, then reconverted to an analog signal and sent out through its transmitter (Tx). In essence, a PLL is used to recover the data in the presence of the clock jitter introduced by connector and channel crosstalk and by cable and board impedance distortion, and the signal is then sent out again through the serial channel, which reduces signal jitter and compensates for physical loss. Because the retimer contains the CDR function, it can effectively filter the jitter of the received signal, but a complex high-speed retimer can produce a poor-quality recovered signal because of timing constraints, and it can easily time out and increase the line latency. Figure 1 shows a retimer circuit design based on a typical retimer implementation [7]. As shown in Figure 1, before the data recovered by the CDR are sent out again, they pass through parallel-in serial-out (PISO) conversion, decoding and descrambling, and first-in first-out (FIFO) timing adjustment, all of which are time consuming.
A 112 Gbps PAM4 signal with a sinusoidal jitter amplitude of 0.1 unit interval (UI) and a frequency of 1 MHz is fed to the typical retimer circuit. A schematic-level simulation of the circuit is run on the Cadence platform to verify the system performance, and the latency and jitter measurements are shown in Figure 2. Figure 2a shows that the signal penetration latency from input to output is 165.3 ps, and Figure 2b shows that the jitter of the transmitted data is 1.08 ps, a jitter attenuation of −10.38 dB relative to the input signal. Although these latency and jitter characteristics are good, they are still not sufficient for a high-speed 112 Gbps signal.
From the above analysis, it is clear that the key technical issue of retimer circuit design is to effectively reduce the physical loss of high-speed signals without increasing the line latency and to ensure balanced signal transmission.

3. The Retimer Circuit Architecture Proposed in This Paper

3.1. Low-Delay, Low-Jitter Retimer Circuit Based on CDR + PLL Architecture

To address the key technical issues of retimer design, this paper designs and implements a 112 Gbps PAM4 low-latency, low-jitter retimer circuit for PCIe 6.0, with a CDR, a PLL, and a retiming module as its main components. Its system architecture is shown in Figure 3. Because of the low-pass characteristics of the transmission channel, intersymbol interference (ISI), noise, and other non-ideal factors, the received signal usually passes through an equalizer first, but the equalized signal cannot be used directly and requires further processing. The output signal of the equalizer goes through CDR sampling, phase tracking, phase adjustment, and loop locking before the clock and data are recovered. However, for the 112 Gbps high-speed signal, the clock rate recovered by the CDR is too high, the high-frequency jitter attenuation is large, the CDR cannot effectively track the input signal jitter, and the recovered clock quality is poor, resulting in a high data bit error rate (BER). The high-BER signal needs to be retimed a second time before transmission to the next-stage circuit; otherwise, the signal noise accumulates stage by stage.
Although 100+ Gbps multiplexers and transceivers have been reported in the past decade, the flip-flops and retimers required in single-channel, full-rate 100+ Gbps serial communication systems and 100+ GSa/s ADCs have not yet been demonstrated in any technology. This paper describes a retiming flip-flop based on a CDR + PLL architecture, which is the first to use the CDR output clock as the reference clock of the PLL that generates the retiming clock, substantially reducing the transmission delay and data jitter while achieving a single-channel communication rate of over 100 Gbps.
The specific implementation process is to input the CDR recovery clock as a reference clock to the filtered PLL, filter out the high-frequency jitter in the clock signal, and then output a new clock signal. The new clock signal jitter performance is good for meeting the high-speed signal sampling accuracy requirements, and the clock frequency and CDR recovery data rate are equal. This clock is used to directly sample and combine the CDR recovery data to achieve the retiming of the data. The retiming data are close to the ideal state of the low error signal and then sent to the next level circuit through the equalization and drive module. The whole process eliminates the need for time-consuming synthesizable digital logic modules, thus reducing transmission latency. This design not only eliminates the frequency difference between the sampling clock and data to ensure the quality of the sampling clock but also reduces the signal penetration delay and achieves a low-latency and low-jitter retiming function.

3.2. CDR Design

In the process of designing the retimer circuit, considering the advantages of phase interpolator (PI)-based CDR [11] and the characteristics of the PAM4 baud-rate-phase-detector (BRPD) [12], the BRPD-based PI type CDR was chosen in this paper. In addition, considering the frequency difference that exists in the signals at the transceiver side of the SerDes, the CDR uses a second-order digital filter to achieve stronger correction capability for phase difference and frequency difference [13].

3.2.1. CDR Key Circuit Design

The CDR circuit used in this paper is a mixed digital-analog structure. To facilitate an understanding of the circuit composition and the role of each module, a detailed circuit of the following modules is provided.
  • Sampling module circuit
Figure 4 shows the general structure and main circuit of the sampling module. Corresponding to the PAM4 BRPD, the overall structure of the sampling module in Figure 4a consists of nine groups of comparators, samplers, and decoder circuits. The nine comparators have six error decision levels and three data decision levels, which determine the relationship between the input data and the decision levels at the sampling point. The input signal is then sampled and amplified by the sampler and finally passed to the decoder in the following stage to obtain the sampled data information and the error information.
As shown in Figure 4b, the comparator uses a differential structure. The differential comparator compares the magnitude of the input data level with the decision level by differentiating, resulting in different currents in the two branches and finally a differential output result.
As shown in Figure 4c, the sampler uses a CML sampler, which consists of two level-sensitive CML latches in a master-slave configuration.
  • BRPD
The BRPD is based on a digital circuit design that includes a data sampler, a waveform filter, an error sampler, and a phase detector.
The data sampler is used to decode the 8-bit data D[7:0] of the data sampling signal into the 3-bit data sampling result d[2:0], and the data sampler logic gate circuit is shown in Figure 5.
The waveform filter is used to logically combine the data decision decoding result d_n[2:0] of the current moment with the data decision decoding results d_{n−1}[2:0] and d_{n−2}[2:0] of the previous two moments to obtain the 3-bit screening-mode signal mode[2:0]. The waveform filter consists of two identical data comparators and a waveform selector. Each data comparator takes two consecutive sets of sampling results as input and outputs the comparison result S_n[1:0]; the logic gate circuit of the data comparator is shown in Figure 6.
The waveform selector is based on the preset waveform screening logic to obtain the screening pattern, and the logic gate circuit of the waveform selector is shown in Figure 7.
The detector circuit obtains two phase difference signals YE and YL based on the waveform selector output mode[2:0] and the error signal from the error sampler output based on the preset detector truth table, and the circuit is shown in Figure 8.
  • PI
The phase interpolator is the key module of the CDR and sits between the reference clock and the data sampler in the CDR loop, shifting the clock phases in the data sampling window. Figure 9 shows the circuit structure of the phase interpolator, which receives two phase quadrature clocks of the same frequency and generates a clock whose phase is the weighted sum of the two input phases. The ideal output phase interpolator must generate multiple equally spaced phase steps over the entire period from 0 to 2π.
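The weighted-sum behavior described above can be illustrated with a minimal behavioral sketch in Python. It is not the circuit of Figure 9: the 64-step resolution, the function names, and the choice of sine/cosine weights are assumptions made for illustration only. The sketch shows that sine/cosine weights on the two quadrature clocks give equally spaced phase steps over 0 to 2π, whereas simple linear weights do not.

```python
import numpy as np

STEPS = 64  # hypothetical PI resolution, chosen only for this sketch


def interpolated_phase(code: int, steps: int = STEPS) -> float:
    """Output phase (radians) of an ideal PI using sine/cosine weights."""
    theta = 2.0 * np.pi * code / steps
    w_i, w_q = np.cos(theta), np.sin(theta)  # weights on the I and Q clocks
    # w_i*cos(wt) + w_q*sin(wt) = A*cos(wt - phi), with phi = atan2(w_q, w_i)
    return float(np.arctan2(w_q, w_i) % (2.0 * np.pi))


def linear_weight_phase(alpha: float) -> float:
    """Phase in the first quadrant when the weights are (1 - alpha) and alpha."""
    return float(np.arctan2(alpha, 1.0 - alpha))


phases = np.array([interpolated_phase(c) for c in range(STEPS)])
phase_steps = np.diff(phases)                 # spacing between adjacent codes
quarter_linear = linear_weight_phase(0.25)    # ideal value would be pi/8
```

With linear weights the phase at one quarter of the way through the quadrant falls short of π/8, which is one reason practical interpolators need careful weight design to approach the ideal uniform steps.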

3.2.2. Modeling Analysis and Parameter Design of CDR

The linear model of the analog clock recovery unit is shown in Figure 10.
The loop gain of its linear system [14] is given by the following equation:
$$L(s) = I_P K_{BR}\left(\frac{K_{VCO}}{s}\right)\left(R + \frac{1}{sC}\right) \tag{1}$$
Linear equivalent modeling of the CDR is performed to verify the functional correctness of the CDR circuit. The analog components in the analog charge pump phase-locked loop (CPPLL)-based CDR are replaced with digital components and converted to a digital phase-locked loop (DPLL)-based small-signal model of the CDR. The Z-domain linear equivalent model of the CDR circuit is shown in Figure 11, and the modeling approach has been provided in the literature [14].
The open-loop and closed-loop transfer functions can be obtained from the equivalent model:
$$L_o(z^{-1}) = \frac{\phi_{samp}}{\phi_{err}} = \left(\frac{K_{BR} K_V K_{DPC}}{1 - z^{-1}}\right)\left(K_P + \frac{K_I}{1 - z^{-1}}\right) z^{-N} \tag{2}$$
$$L_{cl}(z^{-1}) = \frac{\phi_{samp}}{\phi_{in}} = \frac{L_o(z^{-1})}{1 + L_o(z^{-1})} \tag{3}$$
In order to be able to verify the performance of the system, the model analysis will use parameters that are consistent with the CDR in the test circuit, which are listed in Table 1 with remarks on the significance of the parameters.
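As a rough illustration of how the Z-domain model can be evaluated, the sketch below computes the closed-loop jitter transfer on the unit circle with NumPy. All numeric values (update rate, gains, loop delay) are hypothetical placeholders and are not the Table 1 parameters; the function and argument names are likewise assumptions introduced for this example.

```python
import numpy as np


def open_loop(f_hz, f_update, k_br, k_v, k_dpc, k_p, k_i, n_delay):
    """Open-loop gain with z^-1 = exp(-j*2*pi*f/f_update)."""
    z_inv = np.exp(-2j * np.pi * np.asarray(f_hz, dtype=float) / f_update)
    integrator = 1.0 / (1.0 - z_inv)  # digital accumulator 1/(1 - z^-1)
    return (k_br * k_v * k_dpc * integrator
            * (k_p + k_i * integrator) * z_inv ** n_delay)


def jitter_transfer_db(f_hz, **loop):
    """Closed-loop jitter transfer |L/(1 + L)| in dB."""
    l_o = open_loop(f_hz, **loop)
    return 20.0 * np.log10(np.abs(l_o / (1.0 + l_o)))


# Hypothetical loop parameters for illustration only (NOT the Table 1 values).
loop = dict(f_update=1e9, k_br=1.0, k_v=1.0, k_dpc=0.02,
            k_p=1.0, k_i=2.0 ** -8, n_delay=4)
freqs = np.array([1e3, 1e6, 1e7, 1e8])  # jitter frequencies in Hz
transfer_db = jitter_transfer_db(freqs, **loop)
```

At jitter frequencies far below the loop bandwidth the transfer is close to 0 dB (the loop tracks the input), and far above it the transfer falls off, which is the qualitative shape the model is meant to capture.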

3.2.3. Simulation Verification of CDR

A system-level simulation of the 112 Gbps PAM4 CDR circuit is performed in MATLAB R2015b to verify the jitter transfer characteristics of the CDR. Figure 12 shows the theoretical and simulated jitter transfer curves of the CDR for input jitter amplitudes of 0.1 UI and 0.2 UI. The results show that the theoretical calculation and the simulated measurement are in good agreement, and the output jitter decays continuously when the jitter frequency exceeds ω3dB.
Figure 13 shows the results of the schematic-level simulation on the Cadence platform: the eye diagrams of the input clock and the recovered clock for a jitter amplitude of 0.1 UI and jitter frequencies of 1 MHz, 10 MHz, and 100 MHz, respectively. The jitter attenuation at the three frequencies is −0.063 dB, −6.9 dB, and −28.8 dB, which is basically consistent with the theoretical calculation results in Figure 12.
Figure 12 and Figure 13 show that when the jitter frequency ωp < ω3dB, the output jitter completely tracks the input jitter, and the jitter transfer is 0 dB. When the jitter frequency ωp > ω3dB, the output jitter decays at a rate of 20 dB/dec.
The simulation and analysis show that the CDR completely tracks jitter at frequencies below the loop bandwidth and clearly filters high-frequency noise and jitter beyond the loop bandwidth. With jitter amplitudes of 0.1 UI and 0.2 UI, the CDR loop bandwidth is 4.9 MHz and 2.3 MHz, respectively, which verifies the reasonableness of the CDR parameter settings and the correctness of the model.
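The tracking and roll-off behavior described above can be approximated with a single-pole low-pass response. This is a simplification introduced here for illustration, not the authors' model; the function name is an assumption. It reproduces the trend (about 0 dB well below the bandwidth, −3 dB at the bandwidth, and roughly −20 dB per decade above it), so it should be read only as a sanity check against the reported attenuation values, which it matches in trend rather than exactly.

```python
import math


def first_order_jitter_transfer_db(f_hz: float, f_3db_hz: float) -> float:
    """Single-pole approximation of the jitter transfer magnitude in dB."""
    return -10.0 * math.log10(1.0 + (f_hz / f_3db_hz) ** 2)


F_3DB = 4.9e6  # loop bandwidth reported for a 0.1 UI jitter amplitude

# Evaluate at the three jitter frequencies used in the schematic simulation.
approx_db = {f: first_order_jitter_transfer_db(f, F_3DB)
             for f in (1e6, 10e6, 100e6)}

# Slope over one decade, far above the bandwidth.
decade_drop_db = (first_order_jitter_transfer_db(1000 * F_3DB, F_3DB)
                  - first_order_jitter_transfer_db(100 * F_3DB, F_3DB))
```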

3.3. PLL Design

By introducing a jitter-canceling filtered phase-locked loop in the retimer circuit, the jitter of the clock recovered by the CDR circuit is filtered, and a high-quality retimed clock is outputted.

3.3.1. PLL Linear Model Analysis

The CPPLL has a wide locking range and good stability. In order to enhance the stability of the phase-locked loop system and stabilize the control voltage of the voltage-controlled oscillator, a charge-pump phase-locked loop containing a second-order loop filter is used as the filtering module of the retimer circuit in this paper, whose linear model [15] is shown in Figure 14.
Its open-loop transfer function and closed-loop transfer function are
$$G(s) = \frac{K_{VCO} K_{PFD} F(s)}{N s} \tag{4}$$
$$H(s) = \frac{K_{VCO} K_{PFD} F(s)/s}{1 + K_{VCO} K_{PFD} F(s)/(N s)} \tag{5}$$
The transfer functions of the charge pump and filter are
$$K_{PFD} = \frac{I_P}{2\pi} \tag{6}$$
$$F(s) = \frac{\frac{1}{C_1 s}\left(\frac{1}{C_2 s} + R_2\right)}{\frac{1}{C_1 s} + \frac{1}{C_2 s} + R_2} = \frac{R_2 C_2 s + 1}{s\left(R_2 C_1 C_2 s + C_1 + C_2\right)} \tag{7}$$
The filter has only one zero, at ωz = 1/(R2C2). The voltage-controlled oscillator contributes one pole at the origin, ωp1, and the filter introduces a second pole at the origin, ωp2, and a non-zero pole at ωp3 = (C1 + C2)/(R2C1C2).
If the ratio of C2 to C1 is denoted by b, the relationship between the zero and the non-zero pole is ωp3 = (b + 1)ωz. The phase margin of the open-loop transfer function G(s) at any angular frequency is:
$$\theta(j\omega) = \tan^{-1}\frac{\omega}{\omega_z} - \tan^{-1}\frac{\omega}{\omega_{p3}} \tag{8}$$
The maximum of the phase margin is found by differentiating the above equation and setting the derivative equal to 0. To maximize the phase margin, the maximum must occur at the open-loop bandwidth ω3dB, that is, ω3dB = √(ωzωp3), and the maximum phase margin is then
$$\theta_m = \tan^{-1}\left[\frac{1}{2}\left(\sqrt{b+1} - \frac{1}{\sqrt{b+1}}\right)\right] \tag{9}$$
$$\omega_z = \frac{\omega_{3\mathrm{dB}}}{\sqrt{b+1}} \tag{10}$$
$$\omega_{p3} = \sqrt{b+1}\,\omega_{3\mathrm{dB}} \tag{11}$$
Refer to Appendix A for specific mathematical proofs.
The non-zero poles introduced by the filter [16] increase the filtering of high frequencies on the one hand and reduce the phase margin on the other. The selection of the location of the zero and pole points has a great influence on the system stability. The article takes the values of the zero and pole points as follows: The zero-point ωz is 1/4 of ω3dB, while the non-zero pole point ωp3 is 4 times of ω3dB, i.e., b = 15. Its phase margin can be estimated as:
$$\theta = \tan^{-1} 4 - \tan^{-1}\frac{1}{4} \approx 62^{\circ} \tag{12}$$
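The phase margin estimate can be checked numerically. The short sketch below evaluates the closed-form maximum for a given capacitor ratio b and compares it with the difference of arctangents evaluated at ω3dB, with the zero at ω3dB/4 and the non-zero pole at 4ω3dB. The function names are assumptions introduced for this example.

```python
import math


def max_phase_margin_deg(b: float) -> float:
    """Maximum phase margin in degrees for capacitor ratio b = C2/C1."""
    root = math.sqrt(b + 1.0)
    return math.degrees(math.atan(0.5 * (root - 1.0 / root)))


def phase_margin_deg(w: float, w_z: float, w_p3: float) -> float:
    """Phase margin in degrees at angular frequency w."""
    return math.degrees(math.atan(w / w_z) - math.atan(w / w_p3))


# With b = 15: zero at w_3dB/4 and non-zero pole at 4*w_3dB (w_3dB normalized to 1).
theta_closed_form = max_phase_margin_deg(15.0)
theta_direct = phase_margin_deg(1.0, 0.25, 4.0)
```

Both expressions evaluate to about 62°, in line with the estimate above.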
Near the −3 dB bandwidth, the filter impedance can be approximated as R2, because the integrating capacitor C2 can be regarded as a short circuit and the pole capacitor C1 as an open circuit. At the −3 dB bandwidth, the open-loop gain is 1; that is, the forward gain is equal to the division ratio N of the feedback divider. Thus, at the −3 dB bandwidth, we have
$$N = \frac{K_{VCO} K_{PFD} R_2}{\omega_{3\mathrm{dB}}} \tag{13}$$
The −3 dB bandwidth can then be expressed as
$$\omega_{3\mathrm{dB}} = \frac{K_{VCO} K_{PFD} R_2}{N} \tag{14}$$
This shows that the −3 dB bandwidth is independent of the value of the capacitor. At this point, the values of R2, C2, and C1 are taken as follows:
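$$R_2 = \frac{N \omega_{3\mathrm{dB}}}{K_{PFD} K_{VCO}} \cdot \frac{b+1}{b} = \frac{16 N \omega_{3\mathrm{dB}}}{15 K_{PFD} K_{VCO}} \tag{15}$$
$$C_2 = \frac{\sqrt{b+1}}{\omega_{3\mathrm{dB}} R_2} = \frac{K_{VCO} K_{PFD}}{N \omega_{3\mathrm{dB}}^2} \cdot \frac{b}{\sqrt{b+1}} = \frac{15 K_{VCO} K_{PFD}}{4 N \omega_{3\mathrm{dB}}^2} \tag{16}$$
$$C_1 = \frac{C_2}{b} = \frac{K_{VCO} K_{PFD}}{N \omega_{3\mathrm{dB}}^2} \cdot \frac{1}{\sqrt{b+1}} = \frac{K_{VCO} K_{PFD}}{4 N \omega_{3\mathrm{dB}}^2} \tag{17}$$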
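A small numerical sketch can confirm that component values chosen this way place the zero and pole as intended and give unity open-loop gain at ω3dB. The VCO gain, charge pump current, and division ratio below are hypothetical placeholders, not the Table 2 values; only the 0.1 MHz loop bandwidth and b = 15 come from the text. R2 is obtained here from the zero location ωz = 1/(R2C2) = ω3dB/√(b + 1) rather than from a separate closed form.

```python
import cmath
import math


def loop_filter_values(w_3db, k_vco, k_pfd, n_div, b=15.0):
    """Return (R2, C1, C2) for a target bandwidth and capacitor ratio b."""
    root = math.sqrt(b + 1.0)
    c2 = k_vco * k_pfd * b / (n_div * w_3db ** 2 * root)
    c1 = c2 / b
    r2 = root / (w_3db * c2)  # places the zero at w_3db / sqrt(b + 1)
    return r2, c1, c2


def open_loop_gain(w, r2, c1, c2, k_vco, k_pfd, n_div):
    """Open-loop gain G(jw) with the second-order loop filter F(s)."""
    s = 1j * w
    f_s = (r2 * c2 * s + 1.0) / (s * (r2 * c1 * c2 * s + c1 + c2))
    return k_vco * k_pfd * f_s / (n_div * s)


# Hypothetical design inputs (assumptions for illustration only).
w_3db = 2.0 * math.pi * 0.1e6      # 0.1 MHz loop bandwidth, in rad/s
k_vco = 2.0 * math.pi * 1e9        # rad/s per volt
k_pfd = 100e-6 / (2.0 * math.pi)   # I_P / (2*pi) with I_P = 100 uA
n_div = 1.0

r2, c1, c2 = loop_filter_values(w_3db, k_vco, k_pfd, n_div)
g = open_loop_gain(w_3db, r2, c1, c2, k_vco, k_pfd, n_div)
pm_deg = 180.0 + math.degrees(cmath.phase(g))  # phase margin at w_3db
```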

3.3.2. PLL Parameter Design

As a negative feedback system, the PLL has multiple additive noise sources that affect the output clock jitter throughout circuit operation. The analysis of the transfer characteristics of the PLL noise sources in [17] shows that reducing the loop bandwidth reduces the impact of the input noise, the equivalent noise of the phase-frequency detector and charge pump, the loop filter noise, and the divider noise on the output clock signal, whereas increasing the loop bandwidth reduces the impact of the voltage-controlled oscillator noise on the output clock signal. Therefore, to obtain a low-jitter output clock signal, the loop bandwidth must be chosen as a compromise. In this paper, based on the performance parameters of the CDR recovered clock signal and the output clock jitter requirements of the PLL, a loop bandwidth of 0.1 MHz is selected. Combined with the theoretical calculation of the PLL linear model, this determines the PLL parameters shown in Table 2.
The PLL gain and phase margin are obtained by MATLAB system-level simulation, as shown in Figure 15.
Figure 15 shows that the open-loop gain of the phase-locked loop at the loop bandwidth of 0.1 MHz is almost 1, where the phase margin reaches its maximum of 62°, which is consistent with the analysis in Section 3.3.1. The closed-loop gain of the phase-locked loop at 1 MHz is −16.23 dB, for which the theoretical value of the output clock jitter is 554 fs; these results are verified in Section 4.

4. Retimer Circuit Simulation

The retimer circuit designed in this paper is simulated at the schematic level on the Cadence platform (the input signal is 112 Gbps PAM4, the sinusoidal jitter frequency is 1 MHz, the amplitude is 0.1 UI, the clock frequency is 14 GHz, and there is no frequency difference) to measure its latency and jitter performance. After the retimer circuit locks, we obtain the signal penetration latency through the circuit, the eye diagram of the PLL output clock, and the eye diagram of the data retimed with this clock.
Figure 16 shows that from the time the signal is input to the retimer circuit to the time the retiming signal is sent out, the transmission latency of the signal in the retimer circuit is 27.3 ps, which is 83.5% lower than that of the typical retimer circuit.
As shown in Figure 17, the eye diagrams of the PLL output clock after the retimer locks and of the data retimed with this clock show that the PLL output clock jitter is 600.8 fs. Compared with the input clock jitter of 3.57 ps, this is a jitter attenuation of −15.5 dB, which verifies the theoretical value in Figure 15. The jitter of the retimed data is 741 fs, which is 31.4% lower than the jitter of the retimed data of a typical retimer circuit.
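The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the jitter attenuation in dB and the percentage reductions from the values quoted in Sections 2 to 4. The helper names are assumptions, and treating 3.57 ps as the input jitter for the typical retimer as well is an assumption based on the same input stimulus being used in both simulations.

```python
import math


def attenuation_db(jitter_out: float, jitter_in: float) -> float:
    """Jitter attenuation in dB (negative means the jitter was reduced)."""
    return 20.0 * math.log10(jitter_out / jitter_in)


def reduction_percent(new: float, old: float) -> float:
    """Relative reduction of `new` with respect to `old`, in percent."""
    return 100.0 * (old - new) / old


def jitter_after_gain(jitter_in: float, gain_db: float) -> float:
    """Jitter after a stage with the given closed-loop gain in dB."""
    return jitter_in * 10.0 ** (gain_db / 20.0)


typical_db = attenuation_db(1.08e-12, 3.57e-12)       # typical retimer output
pll_clock_db = attenuation_db(600.8e-15, 3.57e-12)    # PLL output clock
latency_cut = reduction_percent(27.3, 165.3)          # ps, proposed vs typical
jitter_cut = reduction_percent(741e-15, 1.08e-12)     # retimed data jitter
predicted_fs = jitter_after_gain(3.57e-12, -16.23) * 1e15
```

The recomputed values agree with the quoted −10.38 dB, −15.5 dB, 83.5%, and 31.4% to within rounding, and the jitter predicted from the −16.23 dB closed-loop gain comes out within a few femtoseconds of the quoted 554 fs.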
Table 3 provides the comparison results of the retiming scheme proposed in this paper and the existing retiming scheme.
The simulation results illustrate that the retimer circuit based on CDR + PLL architecture designed in this paper has good jitter performance and latency performance and also shows the rationality of the analysis and parameter design of the CDR and PLL system in this paper.

5. Conclusions

In this paper, a retimer circuit based on a CDR + PLL architecture is proposed for PCIe 6.0 to address the large penetration latency and limited jitter performance of high-speed retimer circuits. By connecting a PLL circuit after the CDR, the large-jitter clock signal recovered by the CDR is input to the PLL for high-frequency filtering to obtain a low-jitter retiming clock, which is then used to sample, combine, and send the CDR recovered data. This reduces the signal penetration delay while reducing the output data jitter. This study provides a new design architecture for high-speed retimers and provides technical support for enhancing the overall performance of PCIe 6.0.

Author Contributions

Conceptualization, H.W. and F.L.; methodology, Q.L., G.Z. and D.L.; software, Q.L., H.W. and G.Z.; validation, Q.L. and D.L.; resources, F.L.; formal analysis, Q.L. and D.L.; data curation, Q.L.; writing—original draft preparation, Q.L. and H.W.; writing—review and editing, H.W.; funding acquisition, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Key R&D Program (2022YFB2803101).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Mathematical Proof of Equations (9)–(11)

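The phase margin of the open-loop transfer function at any angular frequency is:
$$\theta(j\omega) = \tan^{-1}\frac{\omega}{\omega_z} - \tan^{-1}\frac{\omega}{\omega_{p3}} \tag{A1}$$
Given b = C2/C1, the relationship between the zero ωz and the non-zero pole ωp3 is:
$$\omega_{p3} = (b+1)\,\omega_z \tag{A2}$$
To maximize the phase margin θ(jω), the derivative of Equation (A1) is taken:
$$[\theta(j\omega)]' = \frac{\frac{1}{\omega_z}}{1 + \left(\frac{\omega}{\omega_z}\right)^2} - \frac{\frac{1}{\omega_{p3}}}{1 + \left(\frac{\omega}{\omega_{p3}}\right)^2} = \frac{\omega_z}{\omega_z^2 + \omega^2} - \frac{\omega_{p3}}{\omega_{p3}^2 + \omega^2} \tag{A3}$$
Setting [θ(jω)]′ = 0, we have:
$$\begin{aligned} &\frac{\omega_z}{\omega_z^2 + \omega^2} = \frac{\omega_{p3}}{\omega_{p3}^2 + \omega^2} \\ &\Rightarrow\ \omega_{p3}\omega_z^2 + \omega_{p3}\omega^2 - \omega_z\omega_{p3}^2 - \omega_z\omega^2 = 0 \\ &\Rightarrow\ \omega_z\omega_{p3}(\omega_z - \omega_{p3}) = \omega^2(\omega_z - \omega_{p3}) \\ &\Rightarrow\ \omega = \sqrt{\omega_z\omega_{p3}} \end{aligned} \tag{A4}$$
Then θ(jω) reaches its maximum θm at ω = √(ωzωp3). Usually the maximum phase margin is obtained at the open-loop bandwidth ω3dB. Then we have:
$$\omega_{3\mathrm{dB}} = \omega = \sqrt{\omega_z\omega_{p3}} \tag{A5}$$
From Equations (A1), (A2) and (A4) we can obtain:
$$\begin{aligned} \theta_m &= \tan^{-1}\sqrt{\frac{\omega_z\omega_{p3}}{\omega_z^2}} - \tan^{-1}\sqrt{\frac{\omega_z\omega_{p3}}{\omega_{p3}^2}} \\ &= \tan^{-1}\sqrt{b+1} - \tan^{-1}\frac{1}{\sqrt{b+1}} \\ &= \tan^{-1}\left[\frac{\sqrt{b+1} - \frac{1}{\sqrt{b+1}}}{1 + \sqrt{b+1}\cdot\frac{1}{\sqrt{b+1}}}\right] = \tan^{-1}\left[\frac{1}{2}\left(\sqrt{b+1} - \frac{1}{\sqrt{b+1}}\right)\right] \end{aligned} \tag{A6}$$
$$\omega_{3\mathrm{dB}} = \sqrt{\omega_z\omega_{p3}} = \sqrt{\omega_z (b+1)\,\omega_z} = \sqrt{b+1}\,\omega_z \ \Rightarrow\ \omega_z = \frac{\omega_{3\mathrm{dB}}}{\sqrt{b+1}} \tag{A7}$$
Similarly, we can obtain:
$$\omega_{p3} = \sqrt{b+1}\,\omega_{3\mathrm{dB}} \tag{A8}$$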

Appendix B

The MUX Circuit Design and Description

The parallel-serial conversion circuit in this paper uses a high-speed analog CMOS merging technique, and the circuit and structure diagram of the 4:1 MUX used in this paper are shown in Figure A1.
Figure A1. Circuit and structure diagram of the 4:1 MUX.
With this circuit, four low-speed parallel data streams can be serialized with a clock at only 1/4 of the full rate. The parallel-serial conversion therefore requires a low clock rate but high clock phase accuracy. The working process is as follows: the low-speed parallel data are sampled at X under the joint action of two quadrature clocks, and the wired-AND and amplification are realized at the last CMOS transistor M7, so that the four low-speed signals are converted into one high-speed signal through sampling, amplification, and the wired-AND. The signal then passes through the CML-to-CMOS drive module, which gives it a strong load-driving capability, and is sent to the voltage-mode driver circuit for encoding and driving.
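The serialization principle can be sketched at the behavioral level. The model below is not the transistor-level circuit of Figure A1: it simply shows how two quadrature quarter-rate clocks define four non-overlapping select windows, each formed by ANDing one phase of the I clock with one phase of the Q clock, so that four parallel lanes merge into one full-rate stream. The assignment of clock phases to lanes and all names are assumptions made for illustration.

```python
import numpy as np


def mux4to1(lanes: np.ndarray) -> np.ndarray:
    """Serialize an array of shape (4, n) into a stream of length 4*n."""
    lanes = np.asarray(lanes)
    n = lanes.shape[1]
    quadrant = np.arange(4 * n) % 4            # position within one slow-clock period
    ck_i = quadrant < 2                        # I clock high in quadrants 0 and 1
    ck_q = (quadrant == 1) | (quadrant == 2)   # Q clock lags I by 90 degrees
    # Each lane is selected when its own combination of clock phases is true.
    selects = [ck_i & ~ck_q, ck_i & ck_q, ~ck_i & ck_q, ~ck_i & ~ck_q]
    held = np.repeat(lanes, 4, axis=1)         # each lane bit held for a full period
    serial = np.zeros(4 * n, dtype=lanes.dtype)
    for lane, select in zip(held, selects):
        serial[select] = lane[select]
    return serial


parallel = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])  # four lanes, two bits each
serial = mux4to1(parallel)
```

Because each output bit is gated by two clock edges, any phase error between the quadrature clocks directly changes the width of the select windows, which is why the text stresses clock phase accuracy.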

References

  1. Loi, C.; Mellati, A.; Tan, A.; Farhoodfar, A.; Tiruvur, A.; Helal, B.; Killips, B.; Rad, F.; Riani, J.; Pernillo, J.; et al. 6.5 A 400Gb/s Transceiver for PAM-4 Optical Direct-Detect Application in 16 nm FinFET. In Proceedings of the 2019 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 17–21 February 2019; pp. 120–122.
  2. Reutemann, R.; Ruegg, M.; Keyser, F.; Bergkvist, J.; Dreps, D.; Toifl, T.; Schmatz, M. A 4.5 mW/Gb/s 6.4 Gb/s 22+1-Lane Source Synchronous Receiver Core with Optional Cleanup PLL in 65 nm CMOS. IEEE J. Solid-State Circuits 2010, 45, 2850–2860.
  3. Ng, H.T.; Farjad-Rad, R.; Lee, M.J.; Dally, W.J.; Greer, T.; Poulton, J.; Edmondson, J.H.; Rathi, R.; Senthinathan, R. A second-order semidigital clock recovery circuit based on injection locking. IEEE J. Solid-State Circuits 2003, 38, 2101–2110.
  4. Tamer, A.; Robert, D.; Ron, H.; Chih-Kong, K.Y. A 100+ Meter 12 Gb/s/Lane Copper Cable Link Based on Clock-Forwarding. IEEE J. Solid-State Circuits 2013, 48, 1085–1098.
  5. Nakasha, Y.; Suzuki, T.; Kano, H.; Tsukashima, K.; Ohya, A.; Sawada, K.; Makiyama, K.; Takahashi, T.; Nishi, M.; Hirose, T.; et al. A 43-Gb/s full-rate-clock 4:1 multiplexer in InP-based HEMT technology. IEEE J. Solid-State Circuits 2002, 37, 1703–1709.
  6. Hu, S.; Yao, T.; Yin, B.; Song, C.; Zhao, L.; Wang, J.; Wang, L.; Bai, R.; Wang, X.; Xia, T.; et al. A 50Gb/s PAM-4 Retimer-CDR + VCSEL Driver with Asymmetric Pulsed Pre-Emphasis Integrated into a Single CMOS Die. In Proceedings of the 2019 Optical Fiber Communications Conference and Exhibition (OFC), San Diego, CA, USA, 3–7 March 2019; pp. 1–3.
  7. Tang, T.; Wray, B.; Murugan, R. Die-Package-PCB Signal Integrity Performance Debug of a High-Speed (25 Gbps) Retimer: Simulation to Measurement Correlation. In Proceedings of the 2020 IEEE International Symposium on Electromagnetic Compatibility & Signal/Power Integrity (EMCSI), Reno, NV, USA, 27–31 July 2020; pp. 170–175.
  8. Mishra, P.; Tan, A.; Helal, B.; Ho, C.R.; Loi, C.; Riani, J.; Sun, J.; Mistry, K.; Raviprakash, K.; Tse, L.; et al. A 112Gb/s ADC-DSP-Based PAM-4 Transceiver for Long-Reach Applications with >40 dB Channel Loss in 7nm FinFET. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 138–140.
  9. Zhao, L.W. The Low Latency of Data Transmission Design Based on FPGA. Master’s Thesis, Zhengzhou University, Zhengzhou, China, 2017. Available online: https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD201702&filename=1017128975.nh (accessed on 1 June 2023).
  10. Wang, C. Design and Implementation of Low-Latency Error Correction Coding for Ultra-High-Speed Interconnection Transmission Links. Master’s Thesis, National University of Defense Technology, Changsha, China, 2020.
  11. Guo, K.-L.; Wang, H.-M.; Liu, T. A Non-Equivalent Tail Current Source Based New Phase Interpolator with High Linearity for High-Speed SerDes. J. Air Force Eng. Univ. (Nat. Sci. Ed.) 2020, 21, 61–67.
  12. Li, T.J.; Zhang, G.; Zhang, J.M.; Xin, K.W. A Novel High-Gain PAM4 Baud-Rate Phase Detector for ADC-Based CDR. In Proceedings of the 2022 7th International Conference on Integrated Circuits and Microsystems (ICICM), Xi’an, China, 28–31 October 2022; pp. 606–609.
  13. Luan, W.H.; Wang, D.J.; Chen, J. Modeling Analysis and Circuit Design of Second-Order Clock Data Recovery Circuit Applied to 10 Gbase-KR. Microelectron. Comput. 2020, 37, 1–4.
  14. Jeff, L.S.; John, S. A digital clock and data recovery architecture for multi-gigabit/s binary links. In Proceedings of the IEEE 2005 Custom Integrated Circuits Conference (CICC), San Jose, CA, USA, 18–21 September 2005; pp. 537–544.
  15. Xin, K.W.; Lyu, F.X.; Wang, J.Y. A Low Noise Clock Generator for High-Speed Serial Interface. Microelectronics 2019, 49, 817–823.
  16. Zhang, G. Design of COMS Integrated Phase Locked Loop Circuit, 1st ed.; Tsinghua University Press: Beijing, China, 2013; pp. 15–17.
  17. Yang, C. Research and Design of Fast-Locked, High-Speed and Low-Jitter Clock Generation. Master’s Thesis, University of Electronic Science and Technology of China, Chongqing, China, 2015.
Figure 1. The typical retimer circuit structure.
Figure 2. (a) Signal penetration latency from input to output; (b) the output signal eye diagram.
Figure 3. The retimer circuit architecture proposed in this article.
Figure 4. (a) Sampling module structure; (b) differential comparator; (c) CML sampler.
Figure 5. Data sampler logic gate circuit.
Figure 6. Data comparator logic gate circuit.
Figure 7. Waveform selector logic gate circuit.
Figure 8. Phase detector logic gate circuit.
Figure 9. Phase interpolator circuit structure.
Figure 10. The linear model of the analog clock recovery unit.
Figure 11. The Z-domain linear equivalent model of CDR circuit.
Figure 12. The jitter transmission characteristics of CDR.
Figure 13. (a) Input clock eye diagram; (b) recovered clock eye diagram (jitter amplitude 0.1 UI, jitter frequency 1 MHz); (c) recovered clock eye diagram (jitter amplitude 0.1 UI, jitter frequency 10 MHz); (d) recovered clock eye diagram (jitter amplitude 0.1 UI, jitter frequency 100 MHz).
Figure 14. The CPPLL S-domain linear model.
Figure 15. PLL gain and phase margin.
Figure 16. Retimer circuit transmission latency.
Figure 17. (a) PLL recovery clock eye diagram; (b) retiming data.
Table 1. CDR Design Parameters.
CDR Parameters | Design Value
PD gain (K_BR) | 0.56
Voting gain (K_V) | 0.54 × 64 = 34.65
Proportional path gain (K_P) | 1
Integral path gain (K_I) | 2^−14
Digital phase converter gain (K_DPC) | 2^−9
Loop latency (N) | 4
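To give a feel for what the Table 1 values imply, the following minimal sketch evaluates a jitter transfer function on the unit circle. The open-loop form used here, L(z) = K_BR · K_V · (K_P + K_I/(1 − z^−1)) · K_DPC/(1 − z^−1) · z^−N, is an assumed textbook second-order digital CDR model and may differ from the model in Figure 11. The function names are hypothetical, and frequency is normalized to the loop update rate because the update clock is not listed in the table.

```python
import cmath
import math

# Design values as listed in Table 1.
K_BR = 0.56         # phase detector gain
K_V = 34.65         # voting gain, taken as listed
K_P = 1.0           # proportional path gain
K_I = 2.0 ** -14    # integral path gain
K_DPC = 2.0 ** -9   # digital phase converter gain
N = 4               # loop latency in update cycles


def open_loop(f_norm):
    """ASSUMED open-loop gain at f_norm (cycles per loop update)."""
    z_inv = cmath.exp(-2j * math.pi * f_norm)
    accumulator = 1.0 / (1.0 - z_inv)        # ideal digital integrator
    loop_filter = K_P + K_I * accumulator    # proportional + integral paths
    return K_BR * K_V * loop_filter * K_DPC * accumulator * z_inv ** N


def jitter_transfer(f_norm):
    """Magnitude of the closed-loop jitter transfer |L / (1 + L)|."""
    gain = open_loop(f_norm)
    return abs(gain / (1.0 + gain))


def bandwidth_3db(steps=50000):
    """First normalized frequency at which the transfer drops below -3 dB."""
    threshold = 1.0 / math.sqrt(2.0)
    for i in range(1, steps + 1):
        f_norm = 0.5 * i / steps
        if jitter_transfer(f_norm) < threshold:
            return f_norm
    return None


bw = bandwidth_3db()
print(f"-3 dB jitter-transfer bandwidth: {bw:.5f} cycles per update")
print(f"|H| at Nyquist: {jitter_transfer(0.5):.4f}")
```

Under this assumed model the loop tracks low-frequency jitter (gain close to 1) and attenuates high-frequency jitter. Converting the normalized bandwidth to hertz requires the loop update rate, which Table 1 does not give.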
Table 2. PLL Design Parameters.
PLL Parameters | Design Value
Charge pump current (I_P) | 0.15 mA
Filter capacitance (C_1) | 2.2616 nF
Filter capacitance (C_2) | 34.119 nF
Filter resistance (R_2) | 187.09 Ω
VCO gain (K_VCO) | 600 MHz/V
Divider (N) | 4
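The loop filter values in Table 2 can be sanity-checked with a short calculation. The sketch below assumes the common passive second-order filter in which R_2 is in series with C_2 and C_1 shunts that branch to ground. The table lists only the component values, so this topology is an assumption, and the variable names are illustrative. It computes the filter zero and pole and the maximum phase lead the filter can contribute, using only quantities that do not depend on the loop gain.

```python
import math

# Design values as listed in Table 2.
C1 = 2.2616e-9    # farads
C2 = 34.119e-9    # farads
R2 = 187.09       # ohms

# ASSUMPTION: R2 in series with C2, with C1 in parallel with that branch.
t_zero = R2 * C2                    # time constant of the stabilizing zero
t_pole = R2 * C1 * C2 / (C1 + C2)   # time constant of the ripple-filtering pole

f_zero = 1.0 / (2.0 * math.pi * t_zero)
f_pole = 1.0 / (2.0 * math.pi * t_pole)

# The phase lead of (1 + s*t_zero) / (1 + s*t_pole) peaks at the geometric mean.
f_peak = math.sqrt(f_zero * f_pole)
w_peak = 2.0 * math.pi * f_peak
phase_lead_deg = math.degrees(math.atan(w_peak * t_zero) - math.atan(w_peak * t_pole))

print(f"zero: {f_zero / 1e3:.1f} kHz, pole: {f_pole / 1e3:.1f} kHz")
print(f"peak phase lead: {phase_lead_deg:.1f} deg at {f_peak / 1e3:.1f} kHz")
```

With these values the zero falls near 25 kHz, the pole near 400 kHz, and the phase lead peaks at about 62° close to 100 kHz. For a charge-pump PLL with two integrators in the loop, that lead would equal the phase margin if the open-loop crossover were placed at the same frequency. Where the crossover actually falls depends on I_P, K_VCO, and N and on the unit convention used for K_VCO, which this sketch deliberately leaves out.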
Table 3. Comparison of the retiming scheme proposed in this paper with the existing retiming schemes.
Parameters | Solution 1 | Solution 2 | Solution 3
Source | A 4:1 MUX scheme based on the literature [5] | Retimer scheme based on the literature [7] | The Retimer scheme proposed in this paper
Structural features | Phase Adjuster + Retimer + Clock Distributor | CDR + DFF | CDR + PLL + DFF
Modulation | NRZ | PAM4 | PAM4
Output data rate | 43 Gbps | 112 Gbps | 112 Gbps
Output data rms jitter | 0.94 ps | 1.08 ps | 0.741 ps
Jitter attenuation | −0.87 dB | −10.38 dB | −13.66 dB
Penetration latency | 5.6 ps | 165.3 ps | 27.3 ps
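The relative improvements quoted in the abstract follow directly from the Solution 2 and Solution 3 columns of Table 3. A minimal check, with a hypothetical helper name:

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Values from Table 3: Solution 2 (typical retimer) vs. Solution 3 (proposed).
latency_reduction = percent_reduction(165.3, 27.3)   # penetration latency in ps
jitter_reduction = percent_reduction(1.08, 0.741)    # output rms jitter in ps

print(f"latency reduction: {latency_reduction:.1f} %")  # prints 83.5 %
print(f"jitter reduction:  {jitter_reduction:.1f} %")   # prints 31.4 %
```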
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Q.; Wang, H.; Lyu, F.; Zhang, G.; Lyu, D. A Low-Latency, Low-Jitter Retimer Circuit for PCIe 6.0. Electronics 2023, 12, 3102. https://doi.org/10.3390/electronics12143102