Sequential self-attention models usually rely on additive positional embeddings, which inject positional information into item representations at the input. In the absence of positional signals, the attention block is permutation-equivariant over sequence positions and thus has no intrinsic notion of temporal order beyond causal masking. We argue that additive positional embeddings make the attention mechanism only superficially sensitive to sequence order: positional information is entangled with item embedding semantics, propagates weakly in deep architectures, and limits the ability to capture rich sequential patterns. To address these limitations, we introduce a kernelized self-attention mechanism, where a learnable positional kernel operates purely in the position space, disentangled from semantic similarity, and directly modulates attention weights. When applied per attention block, this kernel enables adaptive multi-scale sequential modeling. Experiments on standard next-item prediction benchmarks show that our positional kernel attention consistently improves over strong competing baselines.
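The mechanism described above can be illustrated with a minimal NumPy sketch. The abstract does not specify the kernel's functional form or how it is combined with the attention weights, so everything concrete here is an assumption made for illustration: a Toeplitz kernel with one learnable logit per relative lag, a softplus to keep it positive, and multiplicative modulation of the softmax weights followed by row renormalization. The names `positional_kernel`, `kernel_attention`, and `lag_logits` are hypothetical and not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries set to -inf become exactly 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def positional_kernel(lag_logits):
    """Causal kernel K[i, j] = softplus(lag_logits[i - j]) for j <= i, else 0.

    The kernel depends only on positions (the lag i - j), never on item content.
    ASSUMPTION: this Toeplitz/softplus form is illustrative, not the paper's.
    """
    n = lag_logits.shape[0]
    idx = np.arange(n)
    lag = idx[:, None] - idx[None, :]            # lag[i, j] = i - j
    weights = np.logaddexp(0.0, lag_logits)      # softplus, strictly positive
    return np.where(lag >= 0, weights[np.clip(lag, 0, n - 1)], 0.0)


def kernel_attention(x, w_q, w_k, w_v, lag_logits):
    """Single-head causal attention whose weights are modulated by the kernel.

    No positional embedding is added to x: order enters only through the kernel
    and the causal mask. ASSUMPTION: multiplicative modulation + renormalization.
    """
    n = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])      # semantic similarity only
    causal = np.tril(np.ones((n, n), dtype=bool))
    attn = softmax(np.where(causal, scores, -np.inf), axis=-1)
    mod = attn * positional_kernel(lag_logits)   # position-space modulation
    mod = mod / mod.sum(axis=-1, keepdims=True)  # rows sum to 1 again
    return mod @ v, mod


# Toy usage: 6 items with 4-dimensional embeddings and random projections.
rng = np.random.default_rng(0)
n, d = 6, 4
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
lag_logits = rng.normal(size=n)                  # one learnable value per lag
out, attn = kernel_attention(x, w_q, w_k, w_v, lag_logits)
```

In this sketch, giving each attention block its own `lag_logits` vector would let different blocks weight different lag ranges, which is one possible reading of the per-block, multi-scale behaviour mentioned above. A constant `lag_logits` makes the kernel uniform over the causal region, so the renormalized weights reduce to plain causal attention.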