Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, training massive models with hundreds of millions to billions of parameters requires extensive use of specialized hardware accelerators, such as GPUs, which are accessible only to a limited number of institutions with considerable financial resources. Moreover, training and deploying these models often carries an alarming carbon footprint. In this paper, we take a step towards addressing these challenges by introducing BOLT, a sparse deep learning library for training large-scale search and recommendation models on standard CPU hardware. BOLT provides a flexible, high-level API for constructing models that will be familiar to users of existing popular DL frameworks. By automatically tuning specialized hyperparameters, BOLT also abstracts away the algorithmic details of sparse network training. We evaluate BOLT on a number of information retrieval tasks, including product recommendations, text classification, graph neural networks, and personalization. We find that our proposed system achieves performance competitive with state-of-the-art techniques at a fraction of the cost and energy consumption, with an order-of-magnitude faster inference time. BOLT has also been successfully deployed by multiple businesses to address critical problems, and we highlight one customer case study in the field of e-commerce.