
A 64 Gb/s Single-Ended Simultaneous Bi-Directional Transceiver for Die-to-Die Interfaces

Translated title of the contribution: 面向芯粒互连的单端64 Gb/s全双工收发机设计
  • Zhifei Wang
  • Zhiwen Huang
  • Tianchen Ye
  • Bingyi Ye
  • Fangzhu Li
  • Wei Wang
  • Dunshan Yu
  • Weixin Gai*
  • *Corresponding author for this work
  • Peking University
  • Beijing Advanced Innovation Center for Integrated Circuits

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: Chiplet technology, which packages multiple dies with different functions and process nodes together, offers a cost-effective way to fabricate high-performance chips. For die-to-die data transmission, the edge density, Bit Error Rate (BER), and power consumption of the interface are crucial to the chip’s key performance metrics, such as computing power and throughput. Simultaneous Bi-Directional (SBD) signaling is an effective way to double the edge density by transmitting and receiving data on the same channel. However, with higher data rates and smaller channel pitch, channel reflection and crosstalk pose severe challenges to the design of interface circuits. This paper presents a single-ended SBD transceiver with echo and crosstalk cancellation to achieve a larger edge density and a lower BER.

Methods: The transceiver improves the per-wire data rate by using SBD signaling and denser shield-less channels. However, because both ends of the channel transmit data simultaneously, bi-directional signal coupling arises. Signal coupling, echo from impedance mismatch, and crosstalk from adjacent channels degrade the Signal-to-Noise Ratio (SNR) of the received data. To decouple the bi-directional signal and cancel the echo and Near-End Crosstalk (NEXT), this paper proposes a Dynamic Voltage ThresHold generator (D-VTH), which generates the slicer’s threshold voltage according to the interfering signals that need to be subtracted. To cancel the Far-End Crosstalk (FEXT), the channel’s width and spacing are adjusted so that its capacitive and inductive coupling are equal. FEXT depends on the difference between these two kinds of coupling, so it is canceled as expected. The source-synchronous architecture enhances the clock-data tracking performance, thereby reducing the clock-to-data jitter and improving the link’s noise margin. The synchronous clock distribution circuit includes a standing-wave-based half-rate clock (CK2) distribution and a delay-controlled reset chain.
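The FEXT argument above can be illustrated with a first-order, weak-coupling view in which forward crosstalk scales with the difference between the capacitive ratio Cm/Cs and the inductive ratio Lm/Ls. The sketch below is an illustration under that assumption, not the paper's channel model: the function names are hypothetical, and the prefactor that depends on channel length, propagation delay, and edge rate is deliberately left out, so the result is a unitless imbalance rather than a crosstalk amplitude.

```python
def fext_imbalance(cm_over_cs: float, lm_over_ls: float) -> float:
    """Unitless coupling imbalance that drives FEXT in a weak-coupling model.

    Assumption of this sketch: FEXT is proportional to (Cm/Cs - Lm/Ls).
    The proportionality constant is omitted on purpose.
    """
    return cm_over_cs - lm_over_ls


def residual_fraction(cm_over_cs: float, lm_over_ls: float) -> float:
    """Imbalance relative to the capacitive coupling ratio alone."""
    return abs(fext_imbalance(cm_over_cs, lm_over_ls)) / cm_over_cs


if __name__ == "__main__":
    # Ratios reported in the abstract for the fabricated channel.
    cm_cs, lm_ls = 0.28, 0.26
    print("imbalance:", fext_imbalance(cm_cs, lm_ls))
    print("residual fraction:", residual_fraction(cm_cs, lm_ls))
```

With the reported ratios of 0.28 and 0.26, the two coupling terms nearly cancel, leaving only a small fraction of either term on its own. This is a qualitative reading only; it does not reproduce the 24 dB figure quoted in the results, which comes from the authors' field-solver data.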
The end of the CK2 Transmission Line (TL) is terminated by a dedicated inductor, which gives the reflected wave a proper amplitude and phase relative to the incident wave; thus, a standing wave is formed and CK2 synchronization is achieved. To ensure that the divided clocks (down to 1/32-rate) are synchronous, the dividers’ reset signals must be released at the same time or skewed by an integer multiple of 32 Unit Intervals (UI). A reset chain is proposed to release the reset signals with controlled delay. The delay increases by 2 UI at each lane and is compensated by different numbers of DFF stages. Once CK2 and the divided clocks are synchronized, the transmitter outputs and the NEXT cancellation are synchronized as well.

Results and Discussions: The test chip, including the proposed transceiver and the 3 mm on-chip channel, is fabricated in 28 nm CMOS. The shield-less data channels are routed in the M9 layer with a channel pitch of 6.1 μm. An electromagnetic field solver calculates the channel’s frequency response and the equivalent lumped model. The equivalent Cm/Cs is 0.28 and the Lm/Ls is 0.26, making FEXT 24 dB smaller than the Insertion Loss (IL) at the Nyquist frequency. In contrast, NEXT and Return Loss (RL) are much larger; they are just 7.3 dB and 8.3 dB smaller than the IL at the Nyquist frequency, respectively (Fig. 12). The D-VTH filter’s coefficients are obtained from the Sign-Sign Least Mean Square (SS-LMS) adaptation algorithm, and the data is received correctly using the adapted coefficients. The bi-directional decoupling coefficient is the largest because the local transmitter’s output is the strongest compared with the echo and crosstalk. The echo cancellation coefficient is the smallest because the echo undergoes additional insertion loss in the channel (Fig. 13). The simulated clock-to-data tracking performance shows the transceiver’s robustness against power supply noise (Fig. 15).
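The idea of a dynamic slicer threshold whose coefficients are found by SS-LMS can be sketched in a few lines. This is a behavioral toy model, not the authors' circuit: the gains are hypothetical values picked for illustration, the echo is modeled as one independent known symbol rather than a delayed copy of the local data, the link is noise-free, and adaptation assumes a training phase in which the far-end data is known.

```python
import random

# Hypothetical gains, chosen only for illustration (not values from the paper).
SIGNAL_GAIN = 0.30               # wanted far-end data at the slicer input
TRUE_GAINS = [0.50, 0.15, 0.08]  # local TX coupling, NEXT, echo


def sign(x: float) -> float:
    return 1.0 if x >= 0 else -1.0


def random_symbols(rng, count):
    return [rng.choice((-1.0, 1.0)) for _ in range(count)]


def received(d_far, interferers):
    """Slicer input: wanted data plus the known interfering symbols."""
    return SIGNAL_GAIN * d_far + sum(g * x for g, x in zip(TRUE_GAINS, interferers))


def threshold(weights, interferers):
    """Dynamic threshold: weighted sum of the known interfering symbols."""
    return sum(w * x for w, x in zip(weights, interferers))


def adapt(num_bits=40000, mu=0.001, seed=1):
    """SS-LMS training with known far-end data (an assumption of this sketch)."""
    rng = random.Random(seed)
    weights = [0.0] * len(TRUE_GAINS)  # threshold coefficients to be learned
    level = 0.0                        # estimate of the wanted-signal amplitude
    for _ in range(num_bits):
        d_far = rng.choice((-1.0, 1.0))
        x = random_symbols(rng, len(TRUE_GAINS))
        err = received(d_far, x) - threshold(weights, x) - level * d_far
        s = sign(err)
        # Sign-sign update: only the signs of the error and the inputs are used.
        weights = [w + mu * s * xi for w, xi in zip(weights, x)]
        level += mu * s * d_far
    return weights, level


def count_errors(weights, num_bits=5000, seed=2):
    """Bit errors when slicing the received signal against the dynamic threshold."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(num_bits):
        d_far = rng.choice((-1.0, 1.0))
        x = random_symbols(rng, len(TRUE_GAINS))
        if sign(received(d_far, x) - threshold(weights, x)) != d_far:
            errors += 1
    return errors


if __name__ == "__main__":
    w, level = adapt()
    print("adapted coefficients:", [round(v, 3) for v in w])
    print("errors, fixed threshold:", count_errors([0.0, 0.0, 0.0]))
    print("errors, adapted threshold:", count_errors(w))
```

In this toy setup the largest learned coefficient belongs to the local transmitter term and the smallest to the echo term, mirroring the ordering described in the abstract, but only because the hypothetical gains were chosen that way.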
The simulation results of the standing-wave distribution show that its amplitude is double that of a conventional traveling wave because of the superposition of the incident and reflected waves. A slight skew of 0.6 ps is observed, caused by the residual traveling wave due to the TL’s loss (Fig. 18). The measured internal eye diagrams and bathtub curves at 64 Gb/s show that the eye opening is 0.68 UI/80 mV at 10⁻⁹ BER and 0.64 UI/77 mV at 10⁻¹² BER, with both crosstalk cancellation and echo cancellation enabled (Fig. 21). In addition, the measured BER at the optimal sampling point is less than 10⁻¹⁶ with all the lanes counting bit errors. The Crosstalk-Induced Jitter (CIJ) is reduced from 0.58 UI to 0.06 UI after crosstalk cancellation is enabled, a reduction of 89.6% (Table 1). The measured power efficiency is 1.21 pJ/b, and the simulated power breakdown shows that the transmitter, receiver, D-VTH, and clock distribution account for 40%, 23%, 34%, and 3%, respectively (Fig. 22). Compared with previous works, this work achieves the highest per-wire data rate and per-layer edge density (Table 2).

Conclusions: This paper uses SBD signaling and denser shield-less channels to achieve a per-wire data rate of 64 Gb/s and a per-layer edge density of 10.5 Tb/(s·mm). The proposed echo and crosstalk cancellation circuit ensures an extremely low BER of less than 10⁻¹⁶. This work provides new insights for increasing the edge density of die-to-die interfaces.
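The headline figures can be cross-checked with simple arithmetic. The sketch below assumes that per-layer edge density is the per-wire data rate divided by the channel pitch on a single routing layer with no shield wires; that reading is consistent with the reported numbers but is an interpretation, and the function names are hypothetical.

```python
def edge_density_tbps_per_mm(rate_gbps: float, pitch_um: float) -> float:
    """Per-layer edge density in Tb/(s*mm), assuming one wire per pitch."""
    wires_per_mm = 1000.0 / pitch_um         # 1 mm = 1000 um
    return rate_gbps * wires_per_mm / 1000.0  # Gb/s -> Tb/s


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Values quoted in the abstract.
    print("edge density:", edge_density_tbps_per_mm(64.0, 6.1))
    print("CIJ reduction:", reduction_percent(0.58, 0.06))
```

With 64 Gb/s per wire and a 6.1 μm pitch, the first function gives about 10.49 Tb/(s·mm), which rounds to the reported 10.5. The CIJ reduction computed from the rounded 0.58 UI and 0.06 UI lands between 89.6% and 89.7%; the reported 89.6% presumably comes from the unrounded measurements.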

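Translated title of the contribution: 面向芯粒互连的单端64 Gb/s全双工收发机设计
Original language: English
Pages (from-to): 2979-2993
Number of pages: 15
Journal: Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology
Volume: 47
Issue number: 8
DOI
Publication status: Published - Jan 2025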
