ACCL+: an FPGA-Based Collective Engine for Distributed Applications

FPGAs are increasingly prevalent in cloud deployments, serving as Smart NICs or network-attached accelerators. Despite their potential, developing distributed FPGA-accelerated applications remains cumbersome due to the lack of appropriate infrastructure and communication abstractions. To facilitate the development of distributed applications with FPGAs, in this paper we propose ACCL+, an open-source versatile FPGA-based collective communication library. Portable across different platforms and supporting UDP, TCP, as well as RDMA, ACCL+ empowers FPGA applications to initiate direct FPGA-to-FPGA collective communication. Additionally, it can serve as a collective offload engine for CPU applications, freeing the CPU from networking tasks. It is user-extensible, allowing new collectives to be implemented and deployed without having to re-synthesize the FPGA circuit. We evaluated ACCL+ on an FPGA cluster with 100 Gb/s networking, comparing its performance against software MPI over RDMA. The results demonstrate ACCL+'s significant advantages for FPGA-based distributed applications and highly competitive performance for CPU applications. We showcase ACCL+'s dual role with two use cases: seamlessly integrating as a collective offload engine to distribute CPU-based vector-matrix multiplication, and serving as a crucial and efficient component in designing fully FPGA-based distributed deep-learning recommendation inference.
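The first use case in the abstract uses a collective to distribute a CPU-based vector-matrix multiplication. The minimal Python sketch below shows the general pattern: each rank computes a partial product, and a sum all-reduce combines the partials. It is purely simulated. The row-wise partitioning of the matrix and the use of an all-reduce are assumptions, and the helper names (`partial_vecmat`, `allreduce_sum`, `distributed_vecmat`) are hypothetical. They are not part of ACCL+ or any MPI API. All ranks run in one process here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def partial_vecmat(x_block: Vector, a_block: Matrix, n: int) -> Vector:
    """Local work on one (simulated) rank: x_block (length k) times a_block (k x n)."""
    out = [0.0] * n
    for xi, row in zip(x_block, a_block):
        for j in range(n):
            out[j] += xi * row[j]
    return out


def allreduce_sum(partials: List[Vector]) -> List[Vector]:
    """Simulated sum all-reduce: every rank ends up with the element-wise sum.

    In a real system this is the step a collective library would perform over
    the network; here it is a plain in-process reduction.
    """
    total = [sum(vals) for vals in zip(*partials)]
    return [list(total) for _ in partials]


def distributed_vecmat(x: Vector, a: Matrix, world_size: int) -> Vector:
    """Compute y = x * A by splitting the rows of A (and matching entries of x)
    across world_size ranks, then combining partial results with an all-reduce."""
    rows, n = len(a), len(a[0])
    # Contiguous, near-equal row blocks; a rank may receive zero rows.
    bounds = [rows * r // world_size for r in range(world_size + 1)]
    partials = [
        partial_vecmat(x[bounds[r]:bounds[r + 1]], a[bounds[r]:bounds[r + 1]], n)
        for r in range(world_size)
    ]
    # After the all-reduce every rank holds the same result; return rank 0's copy.
    return allreduce_sum(partials)[0]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    print(distributed_vecmat(x, A, world_size=2))  # [12.0, 5.0]
```

Because every rank receives the full result, an all-reduce suits the case where all ranks need `y` for a subsequent step. If only one rank needs it, a plain reduce would move less data. This sketch does not claim to reflect how ACCL+ partitions the computation in its evaluation.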