Generative Recommendation has emerged as a transformative paradigm that reformulates recommendation as an end-to-end autoregressive sequence generation task. Despite its promise, existing preference optimization methods typically rely on binary outcome correctness and suffer from a systemic limitation we term uncertainty blindness: they ignore the model's intrinsic generation confidence, overlook variation in sample learning difficulty, and provide no explicit expression of confidence, leading directly to unstable training dynamics and unquantifiable decision risk. In this paper, we propose Uncertainty-aware Generative Recommendation (UGR), a unified framework that treats uncertainty as a critical signal for adaptive optimization. UGR synergizes three mechanisms: (1) an uncertainty-weighted reward that penalizes confident errors; (2) difficulty-aware optimization dynamics that prevent premature convergence; and (3) explicit confidence alignment that equips the model with confidence expression capabilities. Extensive experiments demonstrate that UGR not only yields superior recommendation performance but also fundamentally stabilizes training, preventing the performance degradation often observed in standard methods. Furthermore, the learned confidence enables reliable downstream risk-aware applications.
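To make the idea of an uncertainty-weighted reward concrete, the sketch below shows one plausible instantiation: a sequence-level confidence derived from token log-probabilities scales the penalty for incorrect generations, so a confident error is punished more than an uncertain one. All function names and the specific formulation are illustrative assumptions, not the paper's actual reward.

```python
import math

def sequence_confidence(token_logprobs):
    """Geometric-mean likelihood of a generated item sequence
    (one simple proxy for the model's generation confidence)."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def uncertainty_weighted_reward(token_logprobs, correct, penalty=1.0):
    """Hypothetical reward: +1 for a correct recommendation;
    an incorrect one is penalized in proportion to how confident
    the model was when generating it."""
    conf = sequence_confidence(token_logprobs)
    if correct:
        return 1.0
    return -penalty * conf  # confident errors receive larger negative reward

# A wrong prediction made with high confidence is penalized more
# than a wrong prediction made with low confidence.
r_confident_wrong = uncertainty_weighted_reward([-0.05, -0.1, -0.02], correct=False)
r_unsure_wrong = uncertainty_weighted_reward([-2.0, -1.5, -2.5], correct=False)
print(r_confident_wrong < r_unsure_wrong)  # prints True
```

This is a minimal sketch under the stated assumptions; the paper's actual mechanism may weight rewards differently or combine uncertainty with difficulty-aware terms.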