FPGAs are rarely mentioned when discussing the implementation of large machine learning applications, such as Large Language Models (LLMs), in the data center. There is considerable evidence that a single FPGA can be competitive with a GPU for some computations, especially those requiring low latency, and is often much more efficient when power is considered. This suggests that there is merit to exploring the use of multiple FPGAs for large machine learning applications. The challenge with using multiple FPGAs is that there is no commonly accepted flow for developing and deploying multi-FPGA applications, i.e., there are no tools to describe a large application, map it to multiple FPGAs, and then deploy the application on a multi-FPGA platform. In this paper, we explore the feasibility of implementing large transformers using multiple FPGAs by developing a scalable multi-FPGA platform and tools to map large applications onto the platform. We validate our approach by designing an efficient multi-FPGA version of the I-BERT transformer and implementing one encoder using six FPGAs as a working proof of concept to show that our platform and tools work. Based on our proof-of-concept prototype and performance estimates for the latest FPGAs compared to GPUs, we conclude that there can be a place for FPGAs in the world of large machine learning applications. We demonstrate a promising first step showing that, with the right infrastructure and tools, it is reasonable to continue exploring the possible benefits of using FPGAs for applications such as LLMs.