Multi-Tenant SmartNICs for In-Network Preprocessing of Recommender Systems

Keeping ML-based recommender models up to date as data drifts and evolves is essential to maintaining accuracy. As a result, online data preprocessing plays an increasingly important role in serving recommender systems. Existing solutions employ multiple CPU workers to saturate the input bandwidth of a single training node, an approach that incurs high deployment costs and energy consumption. For instance, a recent report from industrial deployments shows that data storage and ingestion pipelines can account for over 60% of the power consumption in a recommender system. In this paper, we tackle the issue from a hardware perspective by introducing Piper, a flexible, network-attached accelerator that executes data loading and preprocessing pipelines in a streaming fashion. As part of the design, we define MiniPipe, the smallest pipeline unit. Multiple MiniPipes execute different data preprocessing tasks on a single board, which enables multi-pipeline deployment and allows Piper to be reconfigured at runtime. Our results, using publicly released commercial pipelines, show that Piper, prototyped on a power-efficient FPGA, achieves a 39 to 105× speedup over a server-grade, 128-core CPU and a 3 to 17× speedup over GPUs such as the RTX 3090 and A100 across multiple pipelines. The experimental analysis demonstrates that Piper offers advantages in both latency and energy efficiency for preprocessing tasks in recommender systems, providing an alternative design point for systems that are in very high demand today.
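To make the idea of streaming, composable pipeline units more concrete, here is a minimal software sketch. It is not Piper's FPGA design. It only models the dataflow: records flow one at a time through a chain of small stages, and one "board" hosts several named pipelines whose stage lists can be swapped at runtime. The stage operations (clamping negative dense values to zero, applying log(x + 1), and parsing hex categorical IDs folded by a modulus) are assumed Criteo-style examples. The names `neg2zero`, `log1p_dense`, `hex_to_int_mod`, `mini_pipe`, and `Board` are hypothetical and do not come from the paper.

```python
import math
from typing import Callable, Dict, Iterable, Iterator, List

# A stage maps one record (feature group -> values) to a new record.
Stage = Callable[[dict], dict]


def neg2zero(rec: dict) -> dict:
    """Hypothetical op: clamp negative dense features to zero."""
    return {**rec, "dense": [max(v, 0) for v in rec["dense"]]}


def log1p_dense(rec: dict) -> dict:
    """Hypothetical op: apply log(x + 1) to every dense feature."""
    return {**rec, "dense": [math.log1p(v) for v in rec["dense"]]}


def hex_to_int_mod(vocab_size: int) -> Stage:
    """Hypothetical op: parse hex categorical IDs and fold them into a
    fixed-size table; an empty string (missing value) maps to 0."""
    def stage(rec: dict) -> dict:
        return {**rec, "sparse": [int(h, 16) % vocab_size if h else 0
                                  for h in rec["sparse"]]}
    return stage


def mini_pipe(stages: List[Stage], records: Iterable[dict]) -> Iterator[dict]:
    """Stream records through the stages one at a time, with no
    intermediate buffering of the whole dataset."""
    for rec in records:
        for stage in stages:
            rec = stage(rec)
        yield rec


class Board:
    """Toy model of one accelerator hosting several pipelines (tenants)."""

    def __init__(self) -> None:
        self.pipelines: Dict[str, List[Stage]] = {}

    def configure(self, name: str, stages: List[Stage]) -> None:
        # Replacing a tenant's stage list models runtime reconfiguration.
        self.pipelines[name] = list(stages)

    def run(self, name: str, records: Iterable[dict]) -> Iterator[dict]:
        return mini_pipe(self.pipelines[name], records)


if __name__ == "__main__":
    board = Board()
    board.configure("tenant_a", [neg2zero, log1p_dense, hex_to_int_mod(100)])
    board.configure("tenant_b", [neg2zero, log1p_dense])
    rows = [{"dense": [-3, 0, 3], "sparse": ["1f", "", "ff"]}]
    print(list(board.run("tenant_a", rows)))
    print(list(board.run("tenant_b", rows)))
```

Because each stage consumes and emits a single record, stages can in principle run concurrently on consecutive records, which is the property a hardware streaming pipeline exploits. The generator here only mirrors that ordering in software. It does not model parallelism, throughput, or energy.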

