Generative Recommender Systems (GR) increasingly model user behavior as a sequence generation task by interleaving item and action tokens. While effective, this formulation introduces significant structural and computational inefficiencies: it doubles sequence length, incurs quadratic attention overhead, and relies on implicit attention to recover the causal relationship between an item and its associated action. Furthermore, interleaving heterogeneous tokens forces the Transformer to disentangle semantically incompatible signals, increasing attention noise and reducing representation efficiency.

In this work, we propose a principled reformulation of generative recommendation that aligns sequence modeling with the underlying causal structure and with attention theory. We demonstrate that current interleaving mechanisms act as inefficient proxies for similarity-weighted action pooling. To address this, we introduce two novel architectures that eliminate interleaved dependencies and halve sequence length: Attention-based Late Fusion for Actions (AttnLFA) and Attention-based Mixed Value Pooling (AttnMVP). These models explicitly encode the $i_n \rightarrow a_n$ causal dependency while preserving the expressive power of Transformer-based sequence modeling.

We evaluate our framework on large-scale product recommendation data from a major social network. Experimental results show that AttnLFA and AttnMVP consistently outperform interleaved baselines, improving evaluation loss by 0.29% and 0.80%, respectively, with significant gains in Normalized Entropy (NE). Crucially, these performance gains are accompanied by training time reductions of 23% and 12%, respectively. Our findings suggest that explicitly modeling item-action causality provides a superior design paradigm for scalable and efficient generative ranking.
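To make the structural contrast concrete, the sketch below compares the interleaved baseline against single-head approximations of the two proposed designs. This is a minimal illustration under assumed details: the abstract does not specify the fusion operators, so the concatenation in the AttnLFA sketch, the additive value mixing in the AttnMVP sketch, and all function names are hypothetical; causal masking and multi-head structure are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def interleaved_attention(items, actions, w_qkv):
    """Baseline GR formulation: interleave item and action tokens,
    doubling the sequence from N to 2N before self-attention."""
    x = torch.stack([items, actions], dim=1).reshape(-1, items.size(-1))  # (2N, d)
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)
    attn = F.softmax(q @ k.T / k.size(-1) ** 0.5, dim=-1)                 # (2N, 2N)
    return attn @ v                                                        # quadratic in 2N

def late_fusion_attention(items, actions, w_qkv, w_fuse):
    """AttnLFA-style sketch (assumed form): attend over item tokens only
    (length N), then fuse each position's action embedding after attention."""
    q, k, v = (items @ w_qkv).chunk(3, dim=-1)
    pooled = F.softmax(q @ k.T / k.size(-1) ** 0.5, dim=-1) @ v           # (N, d)
    return torch.cat([pooled, actions], dim=-1) @ w_fuse                  # explicit i_n -> a_n link

def mixed_value_attention(items, actions, w_qk, w_v):
    """AttnMVP-style sketch (assumed form): keep queries/keys item-only but
    mix the action signal into the values, realizing similarity-weighted
    action pooling directly instead of recovering it through interleaving."""
    q, k = (items @ w_qk).chunk(2, dim=-1)
    v = (items + actions) @ w_v                                            # additive mixing (assumed)
    attn = F.softmax(q @ k.T / k.size(-1) ** 0.5, dim=-1)                  # (N, N)
    return attn @ v

# Toy shapes: N = 4 item/action pairs, d = 8 dimensions.
N, d = 4, 8
items, actions = torch.randn(N, d), torch.randn(N, d)
print(interleaved_attention(items, actions, torch.randn(d, 3 * d)).shape)                         # (8, 8)
print(late_fusion_attention(items, actions, torch.randn(d, 3 * d), torch.randn(2 * d, d)).shape)  # (4, 8)
print(mixed_value_attention(items, actions, torch.randn(d, 2 * d), torch.randn(d, d)).shape)      # (4, 8)
```

The key structural point survives these simplifications: both variants attend over N item tokens rather than 2N interleaved tokens, which is the source of the claimed 50% reduction in sequence length.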