BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Search and Recommendation Models on Commodity CPU Hardware

Nicholas Meisburger, Vihan Lakshman, Benito Geordie, Joshua Engels, David Torres Ramos, Pratik Pranav, Benjamin Coleman, Benjamin Meisburger, Shubh Gupta, Yashwanth Adunukota, Tharun Medini, Anshumali Shrivastava

From arXiv, 6 pages, 5 tables, 3 figures. CIKM 2023 (Applied Research Track)

Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, the process of training massive models consisting of hundreds of millions to billions of parameters requires the extensive use of specialized hardware accelerators, such as GPUs, which are only accessible to a limited number of institutions with considerable financial resources. Moreover, there is often an alarming carbon footprint associated with training and deploying these models. In this paper, we take a step towards addressing these challenges by introducing BOLT, a sparse deep learning library for training large-scale search and recommendation models on standard CPU hardware. BOLT provides a flexible, high-level API for constructing models that will be familiar to users of existing popular DL frameworks. By automatically tuning specialized hyperparameters, BOLT also abstracts away the algorithmic details of sparse network training. We evaluate BOLT on a number of information retrieval tasks including product recommendations, text classification, graph neural networks, and personalization. We find that our proposed system achieves competitive performance with state-of-the-art techniques at a fraction of the cost and energy consumption and an order-of-magnitude faster inference time. BOLT has also been successfully deployed by multiple businesses to address critical problems, and we highlight one customer case study in the field of e-commerce.

