Cloud-native containerized applications need container network solutions that are both high-performance and easy to operate. RDMA networking is a promising enabler, offering higher throughput and lower latency than the standard TCP/IP network stack. However, several challenges remain in equipping containerized applications with RDMA: 1) how to deliver transparent improvements without modifying application code; 2) how to integrate RDMA-based network solutions with container orchestration systems; and 3) how to use RDMA efficiently for container networks. In this paper, we present TCP Socket over RDMA (TSoR), an RDMA-based container network solution that addresses all three challenges. To transparently accelerate applications that use POSIX socket interfaces, we integrate TSoR with a container runtime that can intercept socket system calls. To remain compatible with orchestration systems such as Kubernetes, TSoR implements a container network that follows the Kubernetes network model and satisfies all of its requirements. To exploit the benefits of RDMA, TSoR includes a high-performance network stack that efficiently carries TCP traffic over an RDMA network. TSoR thus provides a turnkey solution that lets existing Kubernetes clusters adopt high-performance RDMA networking with minimal effort. Our evaluation shows that TSoR delivers up to 2.3x higher throughput and 64% lower latency for existing containerized applications, such as the Redis key-value store and the Node.js web server, with no code changes. The TSoR code will be open-sourced.