EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention

To capture user preferences, transformer models have been widely applied to sequential user behavior data. The core of the transformer architecture is the self-attention mechanism, which computes pairwise attention scores over a sequence. Because self-attention is permutation-equivariant, positional encoding is used to enhance the attention between token representations. In this setting, the pairwise attention scores are derived from both semantic difference and positional difference. However, prior studies often model the two kinds of difference in different ways, which potentially limits the expressive capacity of sequence modeling. To address this issue, this paper proposes a novel transformer variant with complex vector attention, named EulerFormer, which provides a unified theoretical framework for formulating both semantic difference and positional difference. EulerFormer involves two key technical improvements. First, it employs a new transformation function that efficiently transforms sequence tokens into polar-form complex vectors using Euler's formula, enabling the unified modeling of semantic and positional information in a complex rotation form. Second, it develops a differential rotation mechanism in which the semantic rotation angles are controlled by an adaptation function, enabling the adaptive integration of semantic and positional information according to the semantic context. Furthermore, a phase contrastive learning task is proposed to mitigate the anisotropy of contextual representations in EulerFormer. Our theoretical framework is highly complete and general; in principle, it is more robust to semantic variations and has superior theoretical properties. Extensive experiments on four public datasets demonstrate the effectiveness and efficiency of our approach.
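The idea of expressing semantic and positional difference as one rotation can be illustrated with a small sketch. This is not the paper's implementation. It assumes that a real token vector is split into two halves used as real and imaginary parts, that position is injected as a RoPE-style angle `position * freq`, and that the attention score is the real part of the complex inner product. The names `to_polar`, `rotate`, `attention_score`, and `freqs` are hypothetical, and the learned adaptation function of the differential rotation mechanism is deliberately left out.

```python
import cmath
import math


def to_polar(vec):
    """Pair the two halves of a real vector as (real, imag) parts and
    return a list of (modulus, phase) tuples, one per complex component.
    The half-split pairing is an assumption made for this sketch."""
    half = len(vec) // 2
    return [
        (abs(complex(re, im)), cmath.phase(complex(re, im)))
        for re, im in zip(vec[:half], vec[half:])
    ]


def rotate(polar, position, freqs):
    """Add a position-dependent angle to every phase.
    `freqs` is a hypothetical list of per-component frequencies."""
    return [(r, theta + position * f) for (r, theta), f in zip(polar, freqs)]


def attention_score(q_polar, k_polar):
    """Real part of the complex inner product of q with conj(k).
    Each term depends only on the moduli and the phase difference."""
    return sum(
        rq * rk * math.cos(tq - tk)
        for (rq, tq), (rk, tk) in zip(q_polar, k_polar)
    )
```

With no rotation applied, `attention_score(to_polar(q), to_polar(k))` reduces to the ordinary dot product of `q` and `k`, so the phase difference carries the semantic difference. After `rotate`, each phase difference becomes the semantic angle difference plus `(m - n) * freq` for positions `m` and `n`, so both kinds of difference appear as a single rotation angle and the score depends only on the relative position.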
