Embedding techniques have become essential components of large databases in the deep learning era. By encoding discrete entities, such as words, items, or graph nodes, into continuous vector spaces, embeddings enable more efficient storage, retrieval, and processing in large databases. In the domain of recommender systems in particular, millions of categorical features are encoded as unique embedding vectors, which facilitates the modeling of similarities and interactions among features. However, such a large number of embedding vectors can incur significant storage overhead. In this paper, we aim to compress the embedding table through quantization techniques. Given that features vary in importance, we seek to identify an appropriate precision for each feature to balance model accuracy and memory usage. To this end, we propose a novel embedding compression method, termed Mixed-Precision Embeddings (MPE). Specifically, to reduce the size of the search space, we first group features by frequency and then search for a precision for each feature group. MPE further learns a probability distribution over precision levels for each feature group, which is used to identify the most suitable precision via a specially designed sampling strategy. Extensive experiments on three public datasets demonstrate that MPE significantly outperforms existing embedding compression methods. Remarkably, MPE achieves about 200x compression on the Criteo dataset without compromising prediction accuracy.
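The abstract describes three mechanisms: grouping features by frequency, keeping learnable logits over candidate bit widths per group, and sampling a precision per group to quantize that group's embedding rows. The sketch below illustrates this idea under stated assumptions; it is not MPE's actual implementation, and all helper names (`group_by_frequency`, `quantize_uniform`, `sample_precisions`) and the choice of uniform symmetric quantization are illustrative.

```python
# Hypothetical sketch of the mixed-precision embedding idea: features are
# bucketed by frequency, each bucket holds logits over candidate bit widths,
# and a sampled precision quantizes that bucket's embedding rows.
import numpy as np

rng = np.random.default_rng(0)

def group_by_frequency(freqs, n_groups):
    """Assign each feature to one of n_groups buckets by descending frequency."""
    order = np.argsort(-freqs)                    # most frequent first
    groups = np.empty_like(order)
    for g, idx in enumerate(np.array_split(order, n_groups)):
        groups[idx] = g
    return groups

def quantize_uniform(x, bits):
    """Uniform symmetric quantization of x to the given bit width."""
    if bits >= 32:
        return x                                  # full precision: keep as-is
    levels = 2 ** (bits - 1) - 1
    max_abs = np.abs(x).max()
    scale = max_abs / levels if max_abs > 0 else 1.0
    return np.round(x / scale) * scale

def sample_precisions(logits, temperature=1.0):
    """Sample one bit-width index per group from softmax(logits)."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(p), p=p) for p in probs])

# Toy setup: 1000 features, 8-dim embeddings, candidate precisions {2, 4, 8, 32}.
n_features, dim, n_groups = 1000, 8, 4
bit_choices = np.array([2, 4, 8, 32])
emb = rng.normal(size=(n_features, dim)).astype(np.float32)
freqs = rng.zipf(2.0, size=n_features)            # skewed counts, as in real logs
groups = group_by_frequency(freqs, n_groups)
logits = np.zeros((n_groups, len(bit_choices)))   # learned during training in MPE

chosen = sample_precisions(logits)
quantized = emb.copy()
for g in range(n_groups):
    rows = groups == g
    quantized[rows] = quantize_uniform(emb[rows], bit_choices[chosen[g]])
```

In the paper's formulation the per-group logits are trained jointly with the model so that frequent (important) groups tend toward higher precision; here they are left uniform purely to show the sampling path.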