QUIC is a new protocol, standardized in 2021, that is designed to improve on the widely used TCP/TLS stack. Its main goal is to speed up web traffic over HTTP, but it is also used in other areas such as tunneling. Built on UDP, it offers reliable in-order delivery, flow and congestion control, stream-based multiplexing, and always-on encryption using TLS 1.3. Unlike TCP, QUIC implements all of these features in user space and requires kernel interaction only for UDP. While running in user space provides more flexibility, QUIC benefits less from the efficiency and optimizations available within the kernel. Multiple implementations exist, differing in programming language, architecture, and design choices. This paper presents an extension to the QUIC Interop Runner, a framework for testing the interoperability of QUIC implementations. Our contribution enables reproducible QUIC benchmarks on dedicated hardware. We provide baseline results on 10G links for multiple implementations, evaluate how OS features such as buffer sizes and NIC offloading affect QUIC performance, and show which data rates QUIC can achieve compared to TCP. Our results show that QUIC performance varies widely across client and server implementations, from 90 Mbit/s to 4900 Mbit/s. We show that the OS default buffer size is generally too small and, based on our findings, should be increased by at least an order of magnitude. Furthermore, QUIC benefits less from NIC offloading and AES-NI hardware acceleration, whereas both features improve the goodput of TCP to around 8000 Mbit/s. Our framework can be applied to evaluate the effects of future improvements to the protocol or the OS.
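To make the buffer-size finding concrete, the following is a minimal sketch of how a user-space application might request larger UDP socket buffers and check what the OS actually granted. It is not taken from the paper or from any QUIC implementation: the function names `request_udp_buffers` and `scaled_request` are hypothetical, and the factor of 10 is only an assumption that mirrors the "order of magnitude" wording above.

```python
import socket


def request_udp_buffers(sock, size_bytes):
    """Ask the OS for larger UDP buffers and return the sizes it reports back.

    The OS may grant less than requested (on Linux the request is capped by
    net.core.rmem_max / net.core.wmem_max), so the granted size is read back
    instead of being assumed.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return granted_rcv, granted_snd


def scaled_request(multiplier=10):
    """Request `multiplier` times the default receive buffer on a fresh UDP socket.

    The multiplier of 10 is an illustrative assumption, not a measured value.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        default_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        granted = request_udp_buffers(sock, default_rcv * multiplier)
        return default_rcv, granted
    finally:
        sock.close()


if __name__ == "__main__":
    default_rcv, (rcv, snd) = scaled_request()
    print(f"default receive buffer: {default_rcv} bytes")
    print(f"granted receive buffer: {rcv} bytes, send buffer: {snd} bytes")
```

If the granted size is well below the requested one, the system-wide limit is the bottleneck, and on Linux it would have to be raised by an administrator before a per-socket request can take effect. Linux also reports twice the value it was asked to set, so the printed numbers are not directly comparable to the request.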