R2LED: Equipping Retrieval and Refinement in Lifelong User Modeling with Semantic IDs for CTR Prediction

Lifelong user modeling, which leverages users' long-term behavior sequences for CTR prediction, has been widely applied in personalized services. Existing methods generally adopt a two-stage "retrieval-refinement" strategy to balance effectiveness and efficiency. However, they still suffer from (i) noisy retrieval caused by skewed data distributions and (ii) a lack of semantic understanding in refinement. Semantic enhancement, e.g., LLM-based modeling or semantic embeddings, offers potential solutions to these two challenges, but such approaches incur impractical inference costs or provide insufficient representation granularity. Leveraging the multi-granularity and lightweight merits of semantic identity (SID), we propose a novel paradigm that equips retrieval and refinement in Lifelong User Modeling with SEmantic IDs (R2LED) to address these issues. First, we introduce Multi-route Mixed Retrieval for the retrieval stage. On the one hand, it captures users' interests at various granularities through several parallel recall routes. On the other hand, it uses a mixed retrieval mechanism to efficiently retrieve candidates from both collaborative and semantic views, reducing noise. Then, for refinement, we design Bi-level Fusion Refinement, which comprises a target-aware cross-attention for route-level fusion and a gate mechanism for SID-level fusion. This design bridges the gap between the semantic and collaborative spaces and fully exploits the merits of SID. Comprehensive experimental results on two public datasets demonstrate the superiority of our method in both performance and efficiency. To facilitate reproduction, we have released the code at https://github.com/abananbao/R2LED.
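As a rough illustration of the retrieval and route-level fusion ideas described in the abstract, the following minimal sketch assumes that each item carries a semantic ID given as a tuple of codes ordered from coarse to fine, together with a collaborative embedding. Each recall route keeps the history items that share an SID prefix of a given length with the target (one route per granularity), and the survivors are ranked by collaborative similarity to the target. The function names, the prefix-matching rule, the dot-product scoring, and the softmax attention are all illustrative assumptions; they are not taken from the released R2LED code.

```python
import numpy as np

def shared_prefix_len(sid_a, sid_b):
    """Number of leading SID codes two items share (0 if none)."""
    n = 0
    for a, b in zip(sid_a, sid_b):
        if a != b:
            break
        n += 1
    return n

def multi_route_retrieve(history_sids, history_embs, target_sid, target_emb, top_k=2):
    """Return {prefix_level: [history indices]}, one recall route per SID granularity.

    Semantic view (assumed rule): a history item enters route `level` only if
    it shares at least `level` leading SID codes with the target.
    Collaborative view (assumed rule): matched items are ranked by dot product
    with the target embedding, and the top_k are kept.
    """
    collab_scores = history_embs @ target_emb
    routes = {}
    for level in range(1, len(target_sid) + 1):
        matched = [i for i, sid in enumerate(history_sids)
                   if shared_prefix_len(sid, target_sid) >= level]
        matched.sort(key=lambda i: -collab_scores[i])
        routes[level] = matched[:top_k]
    return routes

def route_level_fusion(route_reprs, target_emb):
    """Target-aware attention over route representations (softmax-weighted sum)."""
    logits = route_reprs @ target_emb
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ route_reprs

# Toy usage: four history items with 3-level SIDs and 2-d collaborative embeddings.
history_sids = [(1, 2, 3), (1, 2, 4), (1, 5, 6), (7, 8, 9)]
history_embs = np.array([[1.0, 0.0], [0.5, 0.5], [0.9, 0.1], [0.0, 1.0]])
target_sid, target_emb = (1, 2, 3), np.array([1.0, 0.0])

routes = multi_route_retrieve(history_sids, history_embs, target_sid, target_emb)
print(routes)  # {1: [0, 2], 2: [0, 1], 3: [0]}

# One representation per route (mean of retrieved embeddings), then fuse.
route_reprs = np.stack([history_embs[idx].mean(axis=0) for idx in routes.values()])
fused = route_level_fusion(route_reprs, target_emb)
```

In this toy example, the coarsest route (prefix length 1) admits three history items and keeps the two with the highest collaborative score, while the finest route (prefix length 3) admits only the exact SID match. The SID-level gate mentioned in the abstract is omitted here because the abstract does not specify its form.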
