Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, introducing hundreds of billions of parameters for extremely high-cardinality features. This bottleneck has led to substantial progress in alternative embedding algorithms. Many of these methods, however, make the assumption that each feature uses an independent embedding table. This work introduces a simple yet highly effective framework, Feature Multiplexing, where one single representation space is used across many different categorical features. Our theoretical and empirical analysis reveals that multiplexed embeddings can be decomposed into components from each constituent feature, allowing models to distinguish between features. We show that multiplexed representations lead to Pareto-optimal parameter-accuracy tradeoffs for three public benchmark datasets. Further, we propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. Unified embedding gives significant improvements in offline and online metrics compared to highly competitive baselines across five web-scale search, ads, and recommender systems, where it serves billions of users across the world in industry-leading products.
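To make the idea of a single representation space shared across categorical features concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hashing-based lookup in which each feature salts the hash with its own name; the abstract does not specify the lookup mechanism, so this is an illustrative choice. The class name `MultiplexedEmbedding`, the method `row_index`, the table size, and the feature names are all hypothetical.

```python
import hashlib
import random


class MultiplexedEmbedding:
    """One shared embedding table looked up by several categorical features.

    Illustrative only: the feature-salted hash is an assumption of this
    sketch, not a detail stated in the abstract.
    """

    def __init__(self, num_rows, dim, feature_names, seed=0):
        rng = random.Random(seed)
        # A single table of shape (num_rows, dim) serves every feature.
        self.table = [
            [rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_rows)
        ]
        self.num_rows = num_rows
        self.features = list(feature_names)

    def row_index(self, feature, value):
        # Salt the hash with the feature name so the same raw value in two
        # different features usually lands on different rows.
        key = f"{feature}\x00{value}".encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.num_rows

    def lookup(self, example):
        # Concatenate one embedding per feature, all read from the same table.
        vec = []
        for feature in self.features:
            vec.extend(self.table[self.row_index(feature, str(example[feature]))])
        return vec


emb = MultiplexedEmbedding(
    num_rows=1000, dim=4, feature_names=["user_id", "query", "country"]
)
vec = emb.lookup({"user_id": "u42", "query": "running shoes", "country": "CA"})
print(len(vec))  # 12: three features times four dimensions
```

In this sketch the parameter count is fixed by `num_rows * dim` regardless of how many features or distinct values are added. Independent per-feature tables would instead grow with each feature's vocabulary size.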