Low Latency TOE with Double-Queue Structure for 10Gbps Ethernet on FPGA
Abstract
1. Introduction
2. Related Works
- We analyze the TOE transmission structure for 10-Gigabit Ethernet and build an end-to-end TOE transmission delay model. The correctness of the model is confirmed through both theoretical analysis and experimental verification.
- A double-queue storage structure combining a first-in, first-out (FIFO) queue and DDR3 is proposed, which dynamically switches transmission channels and achieves a minimum end-to-end transmission delay of 600 ns for 1024 TCP sessions. A multi-mode address-length update method is also used to keep data transmission consistent.
- A non-blocking data transmission method for multi-session application-layer reception on the server is proposed. A priority-based handshake query-and-update mechanism obtains the amount of transferable data at the application layer and achieves efficient slicing and transmission of the stored data.
3. TOE Reception Transmission Delay Theoretical Analysis Model
3.1. TOE Framework Architecture
- Tx engine, which is used to generate new packets to send to the physical (PHY) layer.
- Rx engine, which is used to process incoming packets to send to the application layer.
- TCP session management pool, also called TCP PCB BLOCK (a minimal sketch of a PCB entry follows this list).
- TCP session state manager, which handles TCP state switching and transitions.
- Tx buffer and Rx buffer, which are used to store data.
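To make the role of the TCP session management pool (TCP PCB BLOCK) concrete, the following Python sketch models a minimal per-session protocol control block. The type and field names (TcpPcb, session_index, ddr3_read_ptr, and so on) are illustrative assumptions rather than the paper's actual register layout; only the pool size of 1024 sessions comes from the paper.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TcpState(Enum):
    """Subset of TCP states tracked by the TCP session state manager."""
    CLOSED = auto()
    SYN_RECEIVED = auto()
    ESTABLISHED = auto()
    FIN_WAIT = auto()


@dataclass
class TcpPcb:
    """Minimal per-session protocol control block (illustrative fields only)."""
    session_index: int        # index into the TCP session management pool
    state: TcpState           # maintained by the TCP session state manager
    remote_ip: str            # peer IP address
    remote_port: int          # peer port number
    rx_expected_seq: int = 0  # next expected receive sequence number
    tx_next_seq: int = 0      # next transmit sequence number
    ddr3_write_ptr: int = 0   # write pointer of the per-session DDR3 region
    ddr3_read_ptr: int = 0    # read pointer of the per-session DDR3 region


# A pool sized for the 1024 concurrent sessions reported in the paper.
PCB_POOL = [TcpPcb(i, TcpState.CLOSED, "0.0.0.0", 0) for i in range(1024)]
```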
3.2. TOE Reception Principle
3.3. Proposed FPGA-Based TOE Reception Transmission Model Structure
3.4. Delay Model Parameterization
- Direct storage latency. This covers the latency of the TCP session state manager switching the state of the current TCP session, updating the state FIFO, and feeding the payload data into the payload FIFO before the direct query is conducted. Since the direct query state machine opens a query only when both the state FIFO and the payload FIFO are non-empty, the direct storage latency does not vary with variables such as the payload length and is a constant, expressed by Equation (3).
- Indirect storage latency. The constant part includes the TCP session state manager processing latency, the latency of the payload data waiting to be stored, and the latency of updating the finished FIFO after storage completes. The non-constant part comes from the DDR3 write operation: controlling the MIG IP core through AXI burst transactions requires the write-address, write-data, and write-response phases, which are affected by the payload length and by DDR3's own characteristics. Ideally, when payload data of a given bit width and byte length are written continuously, the indirect storage latency is expressed by Equation (4), where ⌈·⌉ denotes rounding up to the nearest integer.
- When the direct query yields X ≤ Y, i.e., the application layer receives data without blocking, the data are read directly. The read delay then includes the direct query, the notification of the indirect query, reading the payload FIFO, and interface arbitration. Among these, the direct query uses a handshake interaction whose duration depends mainly on the application layer response time, while the other parts are pipelined with a fixed delay. Assuming the query request waits a certain number of clock cycles for the application layer to respond with the ack signal, the data read delay is expressed by Equation (5).
- When the direct query yields X > Y, i.e., the application layer is slow to take data, the data are read indirectly. The read delay then includes the indirect query, reading DDR3, and interface arbitration. During the indirect query, if the application layer reports an amount of transferable data Y = 0, the engine must wait for further rounds of queries until Y becomes non-zero before slicing and reading can start. Assuming the application layer responds with a non-zero Y after a certain number of clock cycles, and the other parts are pipelined with a fixed delay, the data read delay is expressed by Equation (6).
- X ≤ Y: end-to-end reception delay of the direct (FIFO) transmission path.
- X > Y: end-to-end reception delay of the indirect (DDR3) transmission path. (A plausible reconstruction of Equations (3)–(6) and these two case totals is sketched below.)
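The original symbols and equation bodies for Equations (3)–(6) are not available here, so the LaTeX block below is only a plausible reconstruction of the delay model from the verbal definitions above; every symbol (T_ds, T_is, T_c1 through T_c4, W, L, n_ack, n_Y, T_clk) is assumed notation, not the paper's original.

```latex
% Assumed notation (not the paper's original symbols):
%   T_ds / T_is   direct / indirect storage latency
%   T_c1..T_c4    fixed pipeline constants of the respective state machines
%   W             TOE data bit width in bits,  L  payload length in bytes
%   n_ack, n_Y    application-layer response times in clock cycles
%   T_clk         TOE clock period
\begin{align*}
  T_{ds} &= T_{c1}
    && \text{(3) direct storage is a constant}\\
  T_{is} &= T_{c2} + \left\lceil \frac{8L}{W} \right\rceil T_{clk}
    && \text{(4) constant part + DDR3 burst write}\\
  T_{rd}^{X \le Y} &= T_{c3} + n_{ack}\, T_{clk}
    && \text{(5) direct read after the handshake ack}\\
  T_{rd}^{X > Y} &= T_{c4} + n_{Y}\, T_{clk}
    && \text{(6) indirect read after a non-zero } Y\\
  T^{X \le Y} &= T_{ds} + T_{rd}^{X \le Y},
  \qquad T^{X > Y} = T_{is} + T_{rd}^{X > Y}
    && \text{(case totals)}
\end{align*}
% The DDR3 read-out of the slice in the X > Y case may add a further
% length-dependent term; it is omitted here because the original equation is unavailable.
```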
3.5. Analysis of Factors Affecting TOE Transmission Delay
3.5.1. Data Configuration Parameter Factors
3.5.2. DDR3 Read/Write Characteristic Factor
3.5.3. Application Layer Processing Data Rate Factor
3.5.4. State Machine Processing Fixed Latency Factor
4. TOE Transmission Structure Design Key Factors
4.1. Transmission Scheduling Strategy
- If X ≤ Y and X' = 0, set the direct length to X and the channel flag to 1.
- If X > Y or X' ≠ 0, set the direct length to 0 and the channel flag to 2.
- If the channel flag equals 1, the indirect read control module writes the direct length value into the direct report FIFO and returns to the idle state.
- If the channel flag equals 2, the module reads RAM2 to obtain the remaining length X', interacts with the application layer to obtain Y, and takes the smaller of X' and Y as the slice length; it then reads RAM3 to obtain the read pointer, reads the payload data stored in DDR3 in segments, and transfers them to APP data interface II (see the scheduling sketch after this list).
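The channel-selection rule above can be summarized in a few lines of Python. This is a minimal sketch under the assumption that X, Y, and X' arrive as plain integers; the function names and the returned dictionary layout are hypothetical, not the module's actual interface.

```python
def schedule_transfer(x: int, y: int, x_residual: int) -> dict:
    """Channel selection of the transmission scheduling strategy (Section 4.1).

    x          -- payload length X reported by the direct query
    y          -- transferable amount Y reported by the application layer
    x_residual -- remaining length X' still buffered in DDR3 for this session
    """
    if x <= y and x_residual == 0:
        return {"direct_length": x, "channel_flag": 1}  # direct FIFO channel
    return {"direct_length": 0, "channel_flag": 2}      # indirect DDR3 channel


def slice_length(x_residual: int, y: int) -> int:
    """Indirect channel: each slice is the smaller of X' and the non-zero Y."""
    return min(x_residual, y)
```

For example, schedule_transfer(512, 1024, 0) selects the direct channel with a direct length of 512, while schedule_transfer(512, 256, 0) falls back to the indirect DDR3 channel.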
4.2. Data Transmission Consistency
- The first mode is the chain-building mode: when the current TCP connection is established, the module obtains the session index and DDR3 write address assigned by the previous module and requests that the read pointer be initialized to that DDR3 write address.
- The second mode is the direct mode, indicating that the DDR3 read pointer must skip the data already transmitted through the direct channel. When the status information and the direct report FIFO are non-empty, the module obtains the current session index, reads the direct length, and requests that the direct length be added to the read pointer.
- The third mode is the slicing mode, indicating that DDR3 has already transferred part of this payload. The module obtains the session index and slice length of the current connection and requests that the slice length be added to the read pointer (a sketch of the three modes follows this list).
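The three update modes can be illustrated with the hypothetical TcpPcb sketch from Section 3.1. The function below sketches the bookkeeping only, under the assumption that each session's read pointer lives in a PCB-like object; the mode strings are invented labels, not the paper's hardware interface.

```python
def update_read_pointer(pcb, mode: str, value: int) -> None:
    """Apply one of the three read-pointer update modes of Section 4.2.

    mode  -- "chain"  : connection established; value = assigned DDR3 write address
             "direct" : value = direct length already sent through the FIFO channel
             "slice"  : value = slice length already read out of DDR3
    """
    if mode == "chain":
        pcb.ddr3_read_ptr = value   # initialize the read pointer to the write address
    elif mode == "direct":
        pcb.ddr3_read_ptr += value  # skip data delivered by the direct channel
    elif mode == "slice":
        pcb.ddr3_read_ptr += value  # advance past the slice already transferred
    else:
        raise ValueError(f"unknown update mode: {mode}")
```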
4.3. Multi-Session Priority Arbitration
5. Experimental Design and Analysis of Results
5.1. TCP Data Transmission Validation Experiment
5.2. TOE Transmission Latency Performance Experiments
- Double-queue storage structure TOE (the proposed design).
- Single-DDR3 storage structure TOE, with the internal logic modified so that the direct query still selects the DDR3 transfer path.
- BRAM storage structure TOE from the literature.
- Single-DDR3 storage structure TOE from the literature.
5.2.1. Application Layer Unblocked Scenario Latency Experiment
5.2.2. Application Layer Blocked Scenario Latency Experiment
5.3. Experiment on the Maximum Number of TCP Sessions
5.4. Interactive Query Mechanism Experiment
5.5. TOE Reception Performance Experiment
5.6. TOE Resource Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
System Parameter | Category | Value | Unit
---|---|---|---
Data clock frequency | MAC | 156.25 | MHz
Data clock frequency | TOE | 200 | MHz
Data clock frequency | APP | 200 | MHz
Data bit width | MAC | 64 | bits
Data bit width | TOE | 512 | bits
Data bit width | APP | 128 | bits
IP address | Host A | 192.168.116.20 | /
IP address | TOE | 192.168.116.1 | /
Port number | Host A | 30604 | /
Port number | TOE | 10000 | /
Resource | Used | Available | Utilization (%)
---|---|---|---
LUT | 51,591 | 433,200 | 11.91
FF | 69,031 | 866,400 | 7.97
BRAM | 363 | 1470 | 24.69