Unified Low-rank Compression Framework for Click-through Rate Prediction

Deep Click-Through Rate (CTR) prediction models play an important role in modern industrial recommendation scenarios. However, high memory overhead and computational costs limit their deployment in resource-constrained environments. Low-rank approximation is an effective method for computer vision and natural language processing models, but its application in compressing CTR prediction models has been less explored. Due to the limited memory and computing resources, compression of CTR prediction models often confronts three fundamental challenges, i.e., (1). How to reduce the model sizes to adapt to edge devices? (2). How to speed up CTR prediction model inference? (3). How to retain the capabilities of original models after compression? Previous low-rank compression research mostly uses tensor decomposition, which can achieve a high parameter compression ratio, but brings in AUC degradation and additional computing overhead. To address these challenges, we propose a unified low-rank decomposition framework for compressing CTR prediction models. We find that even with the most classic matrix decomposition SVD method, our framework can achieve better performance than the original model. To further improve the effectiveness of our framework, we locally compress the output features instead of compressing the model weights. Our unified low-rank compression framework can be applied to embedding tables and MLP layers in various CTR prediction models. Extensive experiments on two academic datasets and one real industrial benchmark demonstrate that, with 3-5x model size reduction, our compressed models can achieve both faster inference and higher AUC than the uncompressed original models. Our code is at https://github.com/yuhao318/Atomic_Feature_Mimicking.

翻译：深度点击率预测模型在现代工业推荐场景中扮演着重要角色。然而，高内存开销和计算成本限制了其在资源受限环境中的部署。低秩近似是计算机视觉和自然语言处理模型的有效方法，但其在压缩点击率预测模型中的应用尚未得到充分探索。由于内存和计算资源有限，点击率预测模型的压缩通常面临三个基本挑战，即：(1) 如何减小模型规模以适应边缘设备？(2) 如何加速点击率预测模型的推理？(3) 如何在压缩后保留原始模型的能力？以往的低秩压缩研究主要采用张量分解，虽然可以实现较高的参数压缩比，但会导致AUC下降和额外计算开销。为解决这些挑战，我们提出了一种统一低秩分解框架用于压缩点击率预测模型。我们发现，即使使用最经典的矩阵分解SVD方法，我们的框架也能实现比原始模型更优的性能。为进一步提升框架的有效性，我们采用局部压缩输出特征的方式，而非压缩模型权重。我们的统一低秩压缩框架可应用于各类点击率预测模型中的嵌入表和MLP层。在两个学术数据集和一个真实工业基准上的大量实验表明，在模型规模缩减3-5倍的情况下，我们的压缩模型不仅推理速度更快，而且AUC高于未压缩的原始模型。我们的代码位于 https://github.com/yuhao318/Atomic_Feature_Mimicking。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日