Text-to-image diffusion models, such as Stable Diffusion, can produce high-quality and diverse images but often fail to achieve compositional alignment, particularly when prompts describe complex object relationships, attributes, or spatial arrangements. Recent inference-time approaches address this without fine-tuning the model, by optimizing or exploring the initial noise under the guidance of reward functions that score text-image alignment. While promising, each strategy has intrinsic limitations when used alone: optimization can stall due to poor initialization or unfavorable search trajectories, whereas exploration may require a prohibitively large number of samples to locate a satisfactory output. Our analysis further shows that neither single reward metrics nor ad-hoc combinations reliably capture all aspects of compositionality, leading to weak or inconsistent guidance. To overcome these challenges, we present Category-Aware Reward-based Initial Noise Optimization and Exploration (CARINOX), a unified framework that combines noise optimization and exploration with a principled reward selection procedure grounded in correlation with human judgments. Evaluations on two complementary benchmarks covering diverse compositional challenges show that CARINOX raises average alignment scores by 16% on T2I-CompBench++ and by 11% on the HRS benchmark. It consistently outperforms state-of-the-art optimization- and exploration-based methods across all major categories while preserving image quality and diversity. The project page is available at https://amirkasaei.com/carinox/.
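As a hedged illustration of why combining the two strategies can help, the sketch below shows one simple way to pair them. It explores by sampling several Gaussian initial noises and keeping the highest-scoring one, then refines that candidate by gradient ascent on the reward. This is not CARINOX's actual procedure. The names `toy_reward`, `toy_reward_grad`, and `explore_then_optimize`, the quadratic reward, and all hyperparameters are hypothetical stand-ins. In a real system, the reward would score the image that the diffusion model generates from the noise.

```python
import random


def toy_reward(z, target):
    # Hypothetical stand-in for a text-image alignment reward:
    # higher is better, maximized when the "noise" vector z equals target.
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def toy_reward_grad(z, target):
    # Analytic gradient of toy_reward with respect to z.
    return [-2.0 * (zi - ti) for zi, ti in zip(z, target)]


def explore_then_optimize(target, dim, n_candidates=8, n_steps=50, lr=0.05, seed=0):
    """Pick the best of several random initial noises, then refine it by gradient ascent.

    Illustrative sketch only; all parameters are hypothetical.
    Returns (best_explored_noise, optimized_noise).
    """
    rng = random.Random(seed)
    # Exploration: sample candidate initial noises from a standard Gaussian.
    candidates = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_candidates)]
    best = max(candidates, key=lambda z: toy_reward(z, target))
    # Optimization: start gradient ascent from the best explored candidate
    # instead of from a single random draw.
    z = list(best)
    for _ in range(n_steps):
        grad = toy_reward_grad(z, target)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return best, z


if __name__ == "__main__":
    target = [0.5, -1.0, 2.0]  # hypothetical "well-aligned" noise
    best, z = explore_then_optimize(target, dim=3)
    print("explored reward:", toy_reward(best, target))
    print("optimized reward:", toy_reward(z, target))
```

In this toy setting, exploration gives the optimizer a better starting point, which targets the poor-initialization failure mode. The gradient steps then reduce the number of samples that exploration alone would need to reach a high reward. The quadratic reward has a single optimum, so the sketch cannot reproduce optimization stalling in local optima. It only shows how the two stages fit together.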