Generative recommendation models cast recommendation as sequence generation, which avoids the fragmented objectives of traditional cascade architectures. Existing approaches nevertheless have two limitations: their semantic representations are flat and lack the hierarchical structure needed for multi-step reasoning, and their chain-of-thought (CoT) is constructed externally, which requires expensive annotations and leaves the reasoning disconnected from the generation objective. We propose HoloRec, an endogenous chain-of-thought recommendation mechanism that unifies representation, reasoning, and generation. HoloRec builds a hierarchical semantic encoding matrix through multi-granularity nested residual quantization, optimized with a holistic reconstruction loss. It supports two inference modes: a non-thinking mode that uses lightweight multi-granularity supervised alignment for fast prediction, and a thinking mode that uses an interleaved reasoning scheme to generate CoT steps on the fly, embedding reasoning directly into the generation process without external data. Experiments on multiple public recommendation datasets show that HoloRec consistently outperforms the baselines, with especially pronounced gains in sparse scenarios, and that the thinking mode is more accurate than the non-thinking mode at only modest additional inference cost.
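To make the idea of a hierarchical semantic code concrete, the following is a minimal sketch of plain greedy residual quantization, the general technique that the abstract's "multi-granularity nested residual quantization" builds on. It is not HoloRec's actual method: the abstract does not specify the nested multi-granularity design or the holistic reconstruction loss, so neither is modeled here. The function name `residual_quantize`, the toy `CODEBOOKS`, and the example embedding are all hypothetical and chosen only for illustration.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Greedy residual quantization of a single vector (illustrative only).

    codebooks: list of arrays, each of shape (K, d); level 0 is the coarsest.
    Returns (codes, reconstruction), where codes holds one index per level.
    """
    residual = np.asarray(x, dtype=float).copy()
    reconstruction = np.zeros_like(residual)
    codes = []
    for codebook in codebooks:
        # Pick the codeword nearest to the part of x not yet explained.
        distances = np.linalg.norm(codebook - residual, axis=1)
        index = int(np.argmin(distances))
        codes.append(index)
        reconstruction += codebook[index]
        residual -= codebook[index]
    return codes, reconstruction


# Hypothetical toy codebooks: a coarse level followed by a finer level.
CODEBOOKS = [
    np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]]),
    np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]),
]

item_embedding = np.array([4.9, 4.2])  # hypothetical item embedding
codes, approx = residual_quantize(item_embedding, CODEBOOKS)
print(codes)  # [1, 1]
```

In this sketch each item receives one index per level, so items that share a leading index fall into the same coarse cluster and differ only in their finer residuals. That coarse-to-fine ordering is what makes such a code hierarchical; how HoloRec nests granularities and trains the codebooks is left to the full paper.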