Inference-time scaling offers a versatile paradigm for aligning visual generative models with downstream objectives without parameter updates. However, existing approaches that optimize the high-dimensional initial noise suffer from severe inefficiency, as many search directions exert negligible influence on the final generation. We show that this inefficiency is closely related to a spectral bias in generative dynamics: model sensitivity to initial perturbations diminishes rapidly as frequency increases. Building on this insight, we propose Spectral Evolution Search (SES), a plug-and-play framework for initial noise optimization that executes gradient-free evolutionary search within a low-frequency subspace. Theoretically, we derive the Spectral Scaling Prediction from perturbation propagation dynamics, which explains the systematic differences in the impact of perturbations across frequencies. Extensive experiments demonstrate that SES significantly advances the Pareto frontier of generation quality versus computational cost, consistently outperforming strong baselines under equivalent budgets.
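To make the idea of gradient-free evolutionary search within a low-frequency subspace concrete, the following is a minimal sketch and not the authors' implementation. It assumes an orthonormal 2D DCT as the spectral basis, a fixed k x k block of lowest-frequency coefficients as the search subspace, a simple elite-averaging evolution strategy, and a toy reward that stands in for "generate an image from the noise, then score it". The names `dct_matrix`, `low_freq_search`, and `toy_reward`, along with all hyperparameters, are hypothetical and chosen only for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)
    return d

def low_freq_search(reward_fn, shape=(32, 32), k=4, pop=16, elites=4,
                    iters=20, sigma=0.5, seed=0):
    """Search only the k x k lowest-frequency DCT coefficients of the noise.

    Assumption: the elite-averaging update below is a generic evolution
    strategy, not necessarily the one used by SES.
    """
    rng = np.random.default_rng(seed)
    h, w = shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    # Spectrum of the base noise; an orthonormal transform of i.i.d.
    # Gaussian coefficients is again i.i.d. Gaussian in pixel space.
    coeffs = rng.standard_normal(shape)

    def to_noise(low):
        c = coeffs.copy()
        c[:k, :k] = low           # overwrite only the low-frequency block
        return dh.T @ c @ dw      # inverse 2D DCT

    mean = coeffs[:k, :k].copy()
    best_low, best_r = mean.copy(), reward_fn(to_noise(mean))
    for _ in range(iters):
        # Sample candidates around the current mean in the subspace.
        cands = mean + sigma * rng.standard_normal((pop, k, k))
        rewards = np.array([reward_fn(to_noise(c)) for c in cands])
        order = np.argsort(rewards)[::-1]
        # Move the mean to the average of the highest-reward candidates.
        mean = cands[order[:elites]].mean(axis=0)
        if rewards[order[0]] > best_r:
            best_r, best_low = rewards[order[0]], cands[order[0]].copy()
    return to_noise(best_low), best_r

def toy_reward(z):
    # Hypothetical stand-in for a generator followed by a reward model.
    return -float((z.mean() - 0.1) ** 2)

noise, score = low_freq_search(toy_reward)
```

With these defaults the search space has 16 dimensions instead of 1024, and every high-frequency coefficient of the base noise is left untouched. In a real pipeline, `reward_fn` would run the generative model from the candidate noise and score the output, so each call would dominate the compute budget.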