We introduce the Noise-Tilted Reverse Kernel (NTRK), a reward-guided diffusion sampler that injects reward gradients through the noise term, leaving the pretrained reverse kernel unchanged and requiring only a single sample per step. Reward-guided sampling at inference time has greatly expanded the versatility of pretrained diffusion models. Yet existing methods face a trade-off. Gradient-based guidance shifts the reverse mean, steering generation but pushing intermediate states outside the region that the model was trained on and degrading quality. Search-based methods preserve quality but gain no gradient signal. No prior method achieves both. NTRK resolves this by keeping the reverse mean fixed and biasing the noise term toward high reward. We introduce a whitening operator, the central mechanism behind NTRK, that makes the reward gradient safe to inject as noise without losing its guiding signal. Across various reward alignment tasks, NTRK outperforms recent state-of-the-art baselines without losing sample quality. Remarkably, on aesthetic generation, NTRK surpasses the reward of the best baseline at 500 NFEs using only 25 NFEs, a 20$\times$ reduction in compute.
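The contrast the abstract draws, shifting the reverse mean versus biasing the noise term, can be illustrated with a toy reverse step of the form `x_prev = mean + sigma * eps`. The sketch below is only an illustration under stated assumptions, not the NTRK algorithm: the abstract does not define the whitening operator, so `rescale_to_gaussian_norm` is a hypothetical stand-in that rescales a vector to norm `sqrt(d)` (the typical norm of a standard Gaussian sample in `d` dimensions). The toy reward, the tilt strength `lam`, and all function names are likewise assumptions made for this example.

```python
import numpy as np


def toy_reward_grad(x, target):
    # Gradient of the toy reward r(x) = -0.5 * ||x - target||^2 (an assumption).
    return target - x


def rescale_to_gaussian_norm(v):
    # Hypothetical stand-in for the whitening operator: rescale v so its norm
    # is sqrt(d), the typical norm of a sample from N(0, I) in d dimensions.
    d = v.size
    return v * (np.sqrt(d) / (np.linalg.norm(v) + 1e-12))


def mean_shift_step(mean, sigma, grad, scale, rng):
    # Gradient-based guidance: the reverse mean itself is moved by the gradient.
    z = rng.standard_normal(mean.shape)
    return mean + scale * grad + sigma * z


def noise_tilt_step(mean, sigma, grad, lam, rng):
    # Noise tilting: the reverse mean is left untouched; only the noise
    # direction is biased toward the reward gradient, then rescaled so its
    # magnitude matches that of ordinary Gaussian noise.
    z = rng.standard_normal(mean.shape)
    eps = rescale_to_gaussian_norm(z + lam * rescale_to_gaussian_norm(grad))
    return mean + sigma * eps, eps


rng = np.random.default_rng(0)
mean = np.zeros(64)                      # pretend output of the pretrained model
grad = toy_reward_grad(mean, np.ones(64))
x_prev, eps = noise_tilt_step(mean, sigma=0.5, grad=grad, lam=1.0, rng=rng)
```

In this toy setting the displacement `x_prev - mean` always has length `sigma * sqrt(d)`, as it would approximately have for untilted noise, while `eps` is positively correlated with the reward gradient. Whether the actual whitening operator behaves this way is not stated in the abstract.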