Modern recommender systems employ sequential modules such as self-attention to learn dynamic user interests. However, these methods are less effective at capturing collaborative and transitional signals within user interaction sequences, for two reasons. First, the self-attention architecture uses the embedding of a single item as the attention query, which makes collaborative signals difficult to capture. Second, these methods typically follow an auto-regressive framework, which cannot learn global item transition patterns. To overcome these limitations, we propose Multi-Query Self-Attention with Transition-Aware Embedding Distillation (MQSA-TED). To address the first limitation, we propose an $L$-query self-attention module that employs flexible window sizes for attention queries to capture collaborative signals, and we introduce a multi-query self-attention method that balances the bias-variance trade-off in modeling user preferences by combining long- and short-query self-attentions. To address the second, we develop a transition-aware embedding distillation module that distills global item-to-item transition patterns into item embeddings; this enables the model to memorize and leverage transitional signals and serves as a calibrator for collaborative signals. Experimental results on four real-world datasets demonstrate the effectiveness of the proposed modules.
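As a rough illustration of the $L$-query idea, the sketch below builds each attention query from a window of the last $L$ item embeddings instead of a single item, then blends a long-query and a short-query attention output. Several choices here are assumptions not stated in the abstract: mean pooling over the window, the absence of learned query/key/value projections, the fixed blending weight `alpha`, and the function names `l_query_attention` and `multi_query_attention`. The actual MQSA-TED formulation may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l_query_attention(E, L):
    """Causal self-attention with windowed queries.

    E: (n, d) array of item embeddings for one user sequence.
    L: query window size. The query at position t is the mean of the
       embeddings at positions max(0, t-L+1)..t (mean pooling is an
       assumption made for this sketch).
    """
    n, d = E.shape
    Q = np.stack([E[max(0, t - L + 1): t + 1].mean(axis=0) for t in range(n)])
    scores = Q @ E.T / np.sqrt(d)
    # Mask future positions so position t only attends to items 0..t.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1) @ E


def multi_query_attention(E, L, alpha=0.5):
    """Blend a long-query (window L) and a short-query (window 1) output.

    alpha is a hypothetical fixed mixing weight used only for illustration.
    """
    long_out = l_query_attention(E, L)
    short_out = l_query_attention(E, 1)
    return alpha * long_out + (1.0 - alpha) * short_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E = rng.normal(size=(5, 4))  # 5 interactions, embedding dimension 4
    out = multi_query_attention(E, L=3, alpha=0.5)
    print(out.shape)
```

With `L = 1` the windowed query reduces to the usual single-item query, so the short-query branch behaves like standard causal self-attention without projections. A larger `L` averages over more recent items, which smooths the query at the cost of blurring the most recent interaction; this is one way to read the bias-variance trade-off that the abstract mentions.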