Data Processing with FPGAs on Modern Architectures

Trends in hardware, the prevalence of the cloud, and the rise of highly demanding applications have ushered in an era of specialization that is quickly changing how data is processed at scale. These changes are likely to continue and accelerate in the coming years as new technologies are adopted and deployed: smart NICs, smart storage, smart memory, disaggregated storage, disaggregated memory, specialized accelerators (GPUs, TPUs, FPGAs), and a wealth of ASICs specifically created to deal with computationally expensive tasks (e.g., cryptography or compression). In this tutorial, we focus on data processing on FPGAs, a technology that has received less attention than, e.g., TPUs or GPUs but that is nevertheless increasingly being deployed in the cloud for data processing tasks. This is due to the architectural flexibility of FPGAs and their ability to process data at line rate, something not possible with other types of processors or accelerators. In the tutorial, we will cover what FPGAs are, their characteristics, and their advantages and disadvantages, as well as examples of industry deployments and how FPGAs are used in various data processing tasks. We will introduce FPGA programming with high-level languages and describe the hardware and software resources available to researchers. The tutorial includes case studies, drawn from research done in collaboration with companies, that illustrate the potential of FPGAs in data processing and how software and hardware are evolving to take advantage of the possibilities FPGAs offer. The use cases include: (1) approximate nearest neighbor search, which is relevant to databases and machine learning; (2) remote disaggregated memory, showing how the cloud architecture is evolving and demonstrating the potential for operator offloading and line-rate data processing; and (3) recommendation systems, as an application with tight latency constraints.
