ORNIC: A High-Performance RDMA NIC with Out-of-Order Packet Direct Write Method for Multipath Transmission
Abstract
1. Introduction
- This study presents ORNIC, a high-performance RDMA NIC with an out-of-order (OOO) packet direct write method. ORNIC supports both sequential and OOO packet reception. The payload of an OOO packet is written directly to user memory without reordering, and the write address is embedded in a packet only when necessary. ORNIC is implemented on a U200 FPGA and can achieve 95 Gbps RDMA throughput, which is nearly 2.5 times that of MP-RDMA. When handling OOO packets, ORNIC’s performance is virtually unaffected, whereas the RDMA throughput of Xilinx ERNIC or Mellanox CX-5 drops below 1 Gbps.
- To support data integrity checks when receiving OOO packets, we redesign the bitmap structure into an array of bitmap blocks that support dynamic allocation. Once a bitmap block is full, it is marked and can be freed in advance. Compared with MELO and LEFT, our bitmap has higher performance and lower bitmap block usage.
- ORNIC is a compact design that consumes less than 15% of hardware resources on a U200 FPGA. While ORNIC requires more resources than ERNIC, it offers additional support for handling out-of-order packets.
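The first two contributions above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the block width `BLOCK_BITS`, the fixed payload size `MTU`, the class `BlockBitmap`, the function `receive`, and the rule that a packet's write offset is its sequence number times `MTU` are all hypothetical choices made for illustration. The sketch writes each payload to its final location regardless of arrival order, allocates bitmap blocks on demand, and frees a block as soon as it is full.

```python
BLOCK_BITS = 64   # assumed block width, not taken from the paper
MTU = 1024        # assumed payload size per packet, in bytes


class BlockBitmap:
    """Toy receive bitmap built from dynamically allocated blocks."""

    def __init__(self, total_packets):
        self.total_packets = total_packets
        self.blocks = {}      # block index -> bitmask, allocated on demand
        self.full = set()     # blocks marked full whose storage was freed
        self.peak_blocks = 0  # highest number of blocks held at once

    def _block_size(self, index):
        # The last block may cover fewer packets than BLOCK_BITS.
        return min(BLOCK_BITS, self.total_packets - index * BLOCK_BITS)

    def mark(self, seq):
        """Record packet `seq`; return False if it is a duplicate."""
        index, bit = divmod(seq, BLOCK_BITS)
        if index in self.full:
            return False
        mask = self.blocks.get(index, 0)
        if mask & (1 << bit):
            return False
        mask |= 1 << bit
        if mask == (1 << self._block_size(index)) - 1:
            # Block is full: mark it and free its storage early.
            self.blocks.pop(index, None)
            self.full.add(index)
        else:
            self.blocks[index] = mask
            self.peak_blocks = max(self.peak_blocks, len(self.blocks))
        return True

    def complete(self):
        """True once every block of the message has been marked full."""
        n_blocks = -(-self.total_packets // BLOCK_BITS)  # ceiling division
        return len(self.full) == n_blocks


def receive(buffer, bitmap, seq, payload):
    """Write a payload straight to its final offset, in any arrival order."""
    if bitmap.mark(seq):
        offset = seq * MTU  # assumed addressing rule for this sketch
        buffer[offset:offset + len(payload)] = payload
```

In this model, delivering the packets of a message in reverse order still reconstructs the message without a reorder buffer, and a duplicate packet is dropped because its block is already marked full.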
2. Related Work
3. System Design
3.1. Architecture Overview
3.1.1. RX Plane
3.1.2. Control Plane
3.1.3. TX Plane
3.2. Out-of-Order Packet Direct Write Method
3.3. Multipath Transmission and Packet Loss Detection
3.4. Bitmap
3.5. Acknowledgment Aggregation
4. Implementation and Evaluation
4.1. RDMA Performance
4.2. Resource Utilization
4.3. Bitmap Performance
5. Conclusions
6. Future Work
6.1. Higher Throughput
6.2. Outstanding WQEs
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
IPs | LUT (1,182,240) | LUTRAM (591,840) | FF (2,364,480) | BRAM (2160) | URAM (960) |
---|---|---|---|---|---|
ORNIC | 111,086 | 25,686 | 72,150 | 279.5 | 106 |
ERNIC v4.0 2 | 87,222 | 17,060 | 58,423 | 215.5 | 93 |
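The resource figures in this table can be turned into utilization percentages. The sketch below assumes that the parenthesized numbers are the totals available on the U200; the dictionary names and the helper `utilization` are hypothetical and introduced only for illustration.

```python
# Figures copied from the resource table; parenthesized values are
# assumed to be the totals available on the U200.
AVAILABLE = {"LUT": 1_182_240, "LUTRAM": 591_840, "FF": 2_364_480,
             "BRAM": 2160, "URAM": 960}
ORNIC = {"LUT": 111_086, "LUTRAM": 25_686, "FF": 72_150,
         "BRAM": 279.5, "URAM": 106}
ERNIC = {"LUT": 87_222, "LUTRAM": 17_060, "FF": 58_423,
         "BRAM": 215.5, "URAM": 93}


def utilization(used):
    """Return the percentage of each resource type that is consumed."""
    return {name: 100 * used[name] / AVAILABLE[name] for name in AVAILABLE}


for label, used in (("ORNIC", ORNIC), ("ERNIC", ERNIC)):
    shares = ", ".join(f"{k} {v:.1f}%" for k, v in utilization(used).items())
    print(f"{label}: {shares}")
```

Under that assumption, ORNIC's largest share is BRAM at roughly 12.9%, which is consistent with the stated bound of less than 15%, and ORNIC uses more of every resource type than ERNIC.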
Solutions | Avg. Number of Clock Cycles, No Packet Loss (the Fewer, the Better) | Avg. Number of Clock Cycles, With Packet Loss (the Fewer, the Better) |
---|---|---|
MELO+ 1 | 1.75 | 7.75 |
LEFT 2 | 1.25 | 5.25 |
LEFT+ 3 | 1.25 | 1.65 |
ORNIC | 1 | 1 |
Solutions | Maximum Number of Bitmap Blocks, No Packet Loss (the Fewer, the Better) | Maximum Number of Bitmap Blocks, With Packet Loss (the Fewer, the Better) |
---|---|---|
MELO+ | 4 | 20 |
LEFT | 4 | 20 |
LEFT+ | 4 | 20 |
ORNIC | 4 | 5 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ma, J.; Guo, Z.; Pan, Y.; Zhang, M.; Zhao, Z.; Sun, Z.; Chang, Y. ORNIC: A High-Performance RDMA NIC with Out-of-Order Packet Direct Write Method for Multipath Transmission. Electronics 2025, 14, 88. https://doi.org/10.3390/electronics14010088