To capture user preferences, transformer models have been widely applied to model sequential user behavior data. The core of the transformer architecture lies in the self-attention mechanism, which computes pairwise attention scores over a sequence. Because self-attention is permutation-equivariant, positional encoding is used to enhance the attention between token representations. In this setting, pairwise attention scores are derived from both semantic difference and positional difference. However, prior studies often model these two kinds of difference in different ways, which potentially limits the expressive capacity of sequence modeling. To address this issue, this paper proposes a novel transformer variant with complex-vector attention, named EulerFormer, which provides a unified theoretical framework for formulating both semantic difference and positional difference. EulerFormer involves two key technical improvements. First, it employs a new transformation function that efficiently maps sequence tokens into polar-form complex vectors via Euler's formula, enabling the unified modeling of semantic and positional information in a complex rotation form. Second, it develops a differential rotation mechanism, in which the semantic rotation angles are controlled by an adaptation function, enabling the adaptive integration of semantic and positional information according to the semantic context. Furthermore, a phase contrastive learning task is proposed to improve the isotropy of contextual representations in EulerFormer. Our theoretical framework possesses a high degree of completeness and generality; it is more robust to semantic variations and, in principle, has superior theoretical properties. Extensive experiments on four public datasets demonstrate the effectiveness and efficiency of our approach.
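The core idea above, that tokens become polar-form complex vectors via Euler's formula so that positional difference enters the attention score as an extra phase rotation, can be illustrated with a minimal NumPy sketch. This is only an assumption-laden toy: the function names and the fixed rotation frequency `freq` are illustrative, and the actual EulerFormer additionally learns adaptive semantic rotation angles via an adaptation function, which is omitted here.

```python
import numpy as np

def to_polar(x):
    # Pair adjacent dimensions into complex numbers z = x_even + i * x_odd,
    # then express them in polar form r * e^{i*theta} (Euler's formula).
    z = x[..., 0::2] + 1j * x[..., 1::2]
    return np.abs(z), np.angle(z)

def rotary_score(q, k, pos_q, pos_k, freq=0.1):
    # Attention logit between a query at position pos_q and a key at pos_k.
    # Each token's phase is rotated by pos * freq, so the score
    # sum(r_q * r_k * cos(semantic phase difference + positional phase
    # difference)) folds both kinds of difference into one rotation form,
    # and depends on position only through (pos_q - pos_k).
    r_q, theta_q = to_polar(q)
    r_k, theta_k = to_polar(k)
    phase = (theta_q + pos_q * freq) - (theta_k + pos_k * freq)
    return float(np.sum(r_q * r_k * np.cos(phase)))
```

When the two positions coincide, the positional phases cancel and the score reduces to the ordinary real dot product of the two token vectors; shifting both positions by the same offset leaves the score unchanged, which is the relative-position property the unified rotation form provides.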