SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads

Networking is considered a datacenter tax, and hyperscalers push hard to provide high-performance networking with minimal resource expenditure. To keep up with ever-increasing network rates, many CPU cycles are spent on this networking tax. We make the key observation that network processing threads can execute simultaneously with application threads on server CPUs with minimal interference. However, using simultaneous multithreading (SMT) to scale the number of network threads with the number of application threads has two drawbacks: (1) it fails to meet the strict tail-latency requirements of latency-critical applications, and (2) it reduces the number of hardware threads available to application processes, thus contributing to a high datacenter network tax. In this work, we design, implement, and evaluate a chip multiprocessor (CMP) with specialized Simultaneous Data-delivery Threads (SDT) per physical core. The key insight is that, with judicious partitioning at the architectural level, SDT can safely co-run with application processes with guaranteed performance isolation. Our evaluation, using full-system simulation, shows that a 20-core CMP enhanced with SDT reduces the area and power consumption of a baseline 40-core CMP by 47.5% and 66%, respectively, while reducing network throughput by less than 10%.
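As a rough illustration of what the reported figures imply, the following Python sketch converts the area, power, and throughput numbers above into throughput per unit area and per watt, relative to the 40-core baseline. It is back-of-the-envelope arithmetic, not a result from the paper: it assumes the worst case allowed by the abstract (a full 10% throughput loss), and the function and constant names are hypothetical.

```python
def efficiency_gain(throughput_ratio, resource_ratio):
    """Throughput per unit of resource, relative to the baseline (1.0 = no change)."""
    return throughput_ratio / resource_ratio


# Figures reported in the abstract: 20-core SDT CMP vs. baseline 40-core CMP.
AREA_REDUCTION = 0.475   # area reduced by 47.5%
POWER_REDUCTION = 0.66   # power reduced by 66%

# Assumption: take the worst case, since the abstract only says "less than 10%".
THROUGHPUT_LOSS = 0.10

throughput_ratio = 1.0 - THROUGHPUT_LOSS
per_area = efficiency_gain(throughput_ratio, 1.0 - AREA_REDUCTION)
per_watt = efficiency_gain(throughput_ratio, 1.0 - POWER_REDUCTION)

print(f"throughput per unit area: {per_area:.2f}x baseline")
print(f"throughput per watt:      {per_watt:.2f}x baseline")
```

Under that assumption the ratios come out to roughly 1.71x per unit area and 2.65x per watt. Because the actual throughput loss is reported as less than 10%, these values are lower bounds on the implied gains.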

