Modern recommender systems use sequential modules such as self-attention to learn dynamic user interests. However, these methods are less effective at capturing collaborative and transitional signals within user interaction sequences. First, the self-attention architecture uses the embedding of a single item as the attention query, which makes it inherently difficult to capture collaborative signals. Second, these methods typically follow an auto-regressive framework, which cannot learn global item transition patterns. To overcome these limitations, we propose Multi-Query Self-Attention with Transition-Aware Embedding Distillation (MQSA-TED), which has two components. The first is an $L$-query self-attention module that employs flexible window sizes for attention queries to capture collaborative signals; building on it, a multi-query self-attention method combines long- and short-query self-attentions to balance the bias-variance trade-off in modeling user preferences. The second is a transition-aware embedding distillation module that distills global item-to-item transition patterns into item embeddings, which enables the model to memorize and leverage transitional signals and serves as a calibrator for collaborative signals. Experimental results on four real-world datasets show that the proposed method outperforms state-of-the-art sequential recommendation methods.
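To make the idea of an $L$-query concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that the query at step $t$ is the mean of the $L$ most recent item embeddings, that keys and values are the raw item embeddings with no learned projections, and that long- and short-query outputs are mixed with a fixed weight `alpha`. The function names, the mean pooling, and `alpha` are illustrative assumptions.

```python
import math

def window_queries(E, L):
    """Query at step t = mean of the embeddings at steps max(0, t-L+1)..t (assumed pooling)."""
    queries = []
    for t in range(len(E)):
        window = E[max(0, t - L + 1): t + 1]
        queries.append([sum(col) / len(window) for col in zip(*window)])
    return queries

def l_query_attention(E, L):
    """Causal scaled dot-product attention with windowed queries; keys and values are E."""
    d = len(E[0])
    out = []
    for t, q in enumerate(window_queries(E, L)):
        # Causal mask: the query at step t only sees steps 0..t.
        scores = [sum(a * b for a, b in zip(q, E[j])) / math.sqrt(d)
                  for j in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * E[j][k] for j, w in enumerate(weights)) / z
                    for k in range(d)])
    return out

def multi_query_attention(E, long_L, alpha=0.5):
    """Mix a long-query output (window long_L) with a short-query output (window 1)."""
    short = l_query_attention(E, 1)
    long = l_query_attention(E, long_L)
    return [[alpha * a + (1 - alpha) * b for a, b in zip(lr, sr)]
            for lr, sr in zip(long, short)]
```

With `L = 1` the query reduces to the single-item query of standard self-attention. A larger `L` averages over more items, which gives a smoother but less position-specific query; this is one way to read the bias-variance trade-off mentioned above.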
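The transition-aware embedding distillation can be illustrated in the same spirit. The sketch below is one plausible reading, not the loss used in this work. It assumes that global transition patterns are row-normalised counts of consecutive item pairs over all training sequences, and that distillation minimises the cross-entropy between each transition row and a softmax over embedding dot products. Both helper names and the choice of loss are hypothetical.

```python
import math
from collections import defaultdict

def transition_probs(sequences):
    """Row-normalised counts of consecutive item pairs across all sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs

def distillation_loss(emb, probs):
    """Mean cross-entropy between transition rows and a softmax over embedding dot products.

    emb maps item id -> embedding (list of floats); this loss is an assumed form.
    """
    items = sorted(emb)
    loss = 0.0
    for a, row in probs.items():
        logits = {b: sum(x * y for x, y in zip(emb[a], emb[b])) for b in items}
        m = max(logits.values())
        # log of the softmax normaliser, computed stably
        log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        loss -= sum(p * (logits[b] - log_z) for b, p in row.items())
    return loss / len(probs)
```

Because the transition statistics are computed once over the whole corpus, a term of this kind exposes the embeddings to item-to-item patterns that a purely auto-regressive objective only sees one sequence at a time.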