Beehive: A Flexible Network Stack for Direct-Attached Accelerators

Direct-attached accelerators, where application accelerators are directly connected to the datacenter network via a hardware network stack, offer substantial benefits in terms of reduced latency, CPU overhead, and energy use. However, a key challenge is that modern datacenter network stacks are complex, with interleaved protocol layers, network management functions, and virtualization support. To operators, network feature agility, diagnostics, and manageability are often considered just as important as raw performance. By contrast, existing hardware network stacks support only basic protocols and are often difficult to extend because they use fixed processing pipelines. We propose Beehive, a new, open-source FPGA network stack for direct-attached accelerators designed to enable flexible and adaptive construction of complex network functionality in hardware. Application and network protocol elements are modularized as tiles over a network-on-chip substrate. Elements can be added or scaled up or down to match workload characteristics with minimal effort and few or no changes to other elements. Flexible diagnostics and control are integral, with tooling to ensure deadlock safety. Our implementation interoperates with standard Linux TCP and UDP clients, with a 4x improvement in end-to-end remote procedure call tail latency for Linux UDP clients versus a CPU-attached accelerator.

