Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems

Learning feature interactions is the critical backbone of building recommender systems. In web-scale applications, learning feature interactions is extremely challenging because the input feature space is sparse and large; meanwhile, manually crafting effective feature interactions is infeasible because the solution space is exponential. We propose to leverage a Transformer-based architecture with attention layers to automatically capture feature interactions. Transformer architectures have seen great success in many domains, such as natural language processing and computer vision. However, industry has not widely adopted the Transformer architecture for feature interaction modeling. We aim to close this gap. We identify two key challenges in applying the vanilla Transformer architecture to web-scale recommender systems: (1) the Transformer architecture fails to capture heterogeneous feature interactions in the self-attention layer; (2) the serving latency of the Transformer architecture might be too high for deployment in web-scale recommender systems. We first propose a heterogeneous self-attention layer, a simple yet effective modification to the self-attention layer in the Transformer, to take into account the heterogeneity of feature interactions. We then introduce \textsc{Hiformer} (\textbf{H}eterogeneous \textbf{I}nteraction Trans\textbf{former}) to further improve model expressiveness. With low-rank approximation and model pruning, \textsc{Hiformer} enjoys fast inference for online deployment. Extensive offline experiment results corroborate the effectiveness and efficiency of the \textsc{Hiformer} model. We have successfully deployed the \textsc{Hiformer} model to a real-world, large-scale App ranking model at Google Play, with significant improvement in key engagement metrics (up to +2.66\%).

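The abstract does not spell out the equations of the heterogeneous self-attention layer, so the following is only a minimal sketch under one stated assumption: that "heterogeneous" means each feature gets its own query and key projections instead of sharing a single pair across all features. The function names, tensor shapes, and the parameter-count helper for a low-rank factorization are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_attention_scores(X, W_q, W_k):
    """Vanilla self-attention scores: one W_q and one W_k for all features.

    X:   (L, d)   one d-dimensional embedding per feature
    W_q: (d, d_k) shared query projection
    W_k: (d, d_k) shared key projection
    """
    Q = X @ W_q
    K = X @ W_k
    return softmax(Q @ K.T / np.sqrt(W_q.shape[1]))


def per_feature_attention_scores(X, W_q, W_k):
    """ASSUMED heterogeneous variant: feature i uses its own W_q[i] and W_k[i].

    X:   (L, d)
    W_q: (L, d, d_k) one query projection per feature
    W_k: (L, d, d_k) one key projection per feature
    """
    Q = np.einsum("ld,ldk->lk", X, W_q)  # row i is X[i] @ W_q[i]
    K = np.einsum("ld,ldk->lk", X, W_k)  # row i is X[i] @ W_k[i]
    return softmax(Q @ K.T / np.sqrt(W_q.shape[2]))


def low_rank_param_count(d_in, d_out, rank):
    """Parameters in a rank-`rank` factorization U (d_in x rank) @ V (rank x d_out)."""
    return d_in * rank + rank * d_out
```

In this sketch, the per-feature variant reduces to the shared one when every feature is given the same projection matrices, which makes the extra parameters the only difference between the two. The helper shows why a low-rank factorization can cut serving cost: a dense 512 x 512 projection has 262,144 weights, while a rank-16 factorization of the same shape has 16,384.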