Diffusion models draw the initial latent from an isotropic Gaussian prior, under which all directions are equally likely. In practice, however, changing only the random seed can sharply alter image quality and prompt faithfulness. We explain this by distinguishing the isotropic prior from the semantic structure induced by the sampling map: the prior is direction-agnostic, but the mapping from latent noise to semantics has both semantic-invariant and semantic-sensitive directions, so different seeds can lead to very different semantic outcomes. Motivated by this view, we propose a training-free inference procedure that (i) suppresses seed-specific, semantically irrelevant variation via distribution-preserving semantic erasure, (ii) reinforces prompt-relevant semantic directions through timestep-aggregated horizontal injection, and (iii) applies a simple spherical retraction to keep the latent near the prior's typical set. Across multiple backbones and benchmarks, our method consistently improves prompt alignment and generation quality over standard sampling.
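To illustrate step (iii): for a d-dimensional standard Gaussian, the norm of a sample concentrates around √d, so the "typical set" is approximately a thin spherical shell of that radius. A minimal sketch of a spherical retraction under that assumption is below; the function name, the interpolation weight `alpha`, and the exact retraction rule are illustrative, not the paper's actual procedure.

```python
import numpy as np

def spherical_retraction(z, alpha=1.0):
    """Pull a latent vector toward the typical-set shell of N(0, I_d).

    alpha=0 leaves z unchanged; alpha=1 rescales z onto the sqrt(d) shell.
    (Hypothetical helper; the paper's actual retraction may differ.)
    """
    d = z.size
    target_radius = np.sqrt(d)          # Gaussian norms concentrate near sqrt(d)
    norm = np.linalg.norm(z)
    # Interpolate between the current norm and the typical-set radius,
    # changing only the magnitude of z, never its direction.
    new_norm = (1.0 - alpha) * norm + alpha * target_radius
    return z * (new_norm / norm)
```

Because only the magnitude is adjusted, any semantic edits applied along specific latent directions are preserved; the retraction just prevents the edited latent from drifting into low-probability regions of the prior.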