Generative Recommenders (GRs), exemplified by the Hierarchical Sequential Transduction Unit (HSTU), have emerged as a powerful paradigm for modeling long user interaction sequences. However, we observe that their "flat-sequence" assumption overlooks the rich, intrinsic structure of user behavior. This leads to two key limitations: a failure to capture the temporal hierarchy of session-based engagement; and computational inefficiency, because dense attention attends to every historical item. In semantically sparse histories, this dense attention also introduces significant noise that obscures true preference signals and degrades the quality of the learned representations. To address these limitations, we propose HPGR (Hierarchical and Preference-aware Generative Recommender), a framework built on two synergistic stages that inject these crucial structural priors into the model. First, a structure-aware pre-training stage employs a session-based Masked Item Modeling (MIM) objective to learn a hierarchically informed, semantically rich item representation space. Second, a preference-aware fine-tuning stage uses these representations to implement a Preference-Guided Sparse Attention mechanism, which dynamically restricts computation to the most relevant historical items, improving both efficiency and the signal-to-noise ratio. Experiments on a large-scale proprietary industrial dataset from APPGallery, together with an online A/B test, show that HPGR outperforms multiple strong baselines, including HSTU and MTGR, achieving state-of-the-art performance.
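The abstract does not specify how Preference-Guided Sparse Attention scores relevance or how many items it keeps. As a minimal sketch of one plausible reading, for a single query position: assume each historical item has an embedding from the pre-training stage, score it against a preference vector by dot product, keep only the top-`k` items, and run standard scaled dot-product attention over that subset. The function name, the hypothetical `pref` vector (e.g. the candidate item's embedding), the dot-product relevance score, and the fixed budget `k` are all illustrative assumptions, not HPGR's published design.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def preference_guided_sparse_attention(
    query: Vector,
    keys: Sequence[Vector],
    values: Sequence[Vector],
    item_embs: Sequence[Vector],  # assumed: item embeddings from the pre-training stage
    pref: Vector,  # hypothetical preference vector, e.g. the candidate item's embedding
    k: int,  # hypothetical budget: number of history items to keep
) -> Tuple[List[float], List[int]]:
    """Attend from one query position over only the top-k most relevant history items.

    Illustrative sketch only; the relevance score and selection rule are assumptions.
    """
    if k < 1 or not item_embs:
        raise ValueError("need k >= 1 and a non-empty history")
    # 1. Score each history item against the preference vector (assumed: dot product).
    relevance = [dot(pref, e) for e in item_embs]
    # 2. Keep the k highest-scoring items (the stable sort breaks ties by position),
    #    then restore chronological order for the selected subsequence.
    keep = sorted(range(len(item_embs)), key=lambda i: relevance[i], reverse=True)[:k]
    keep.sort()
    # 3. Standard scaled dot-product attention, restricted to the kept items.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in keep])
    out = [sum(w * values[i][j] for w, i in zip(weights, keep)) for j in range(len(values[0]))]
    return out, keep


if __name__ == "__main__":
    embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    keys = [[1.0, 0.0]] * 4
    vals = [[2.0, 0.0], [0.0, 1.0], [0.0, 4.0], [4.0, 0.0]]
    out, kept = preference_guided_sparse_attention(
        [1.0, 0.0], keys, vals, embs, pref=[1.0, 0.0], k=2
    )
    print(kept, out)  # prints [0, 2] [1.0, 2.0]
```

In this sketch the softmax and weighted sum touch only `k` of the `n` history items, while the relevance pass still scans all `n`; any efficiency gain therefore depends on that scan being cheaper than full attention, for example by reusing cached pre-trained embeddings. Re-sorting the kept indices preserves their chronological order, which would matter if positional or temporal encodings were applied to the selected subsequence.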