Multi-objective alignment for text-to-image generation is commonly implemented via static linear scalarization, but fixed weights often fail under heterogeneous rewards, leading to optimization imbalance where models overfit high-variance, high-responsiveness objectives (e.g., OCR) while under-optimizing perceptual goals. We identify two mechanistic causes: variance hijacking, where reward dispersion induces implicit reweighting that dominates the normalized training signal, and gradient conflicts, where competing objectives produce opposing update directions and trigger seesaw-like oscillations. We propose APEX (Adaptive Priority-based Efficient X-objective Alignment), which stabilizes heterogeneous rewards with Dual-Stage Adaptive Normalization and dynamically schedules objectives via P^3 Adaptive Priorities that combine learning potential, conflict penalty, and progress need. On Stable Diffusion 3.5, APEX achieves improved Pareto trade-offs across four heterogeneous objectives, with balanced gains of +1.31 PickScore, +0.35 DeQA, and +0.53 Aesthetics while maintaining competitive OCR accuracy, mitigating the instability of multi-objective alignment.
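To make the two mechanisms concrete, below is a minimal illustrative sketch of the kind of per-objective reward normalization and priority scheduling the abstract describes. It is not the paper's released implementation: the function names, the softmax combination, and the specific formulas for learning potential, conflict penalty, and progress need are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only. The helper names and formulas below are assumptions,
# not APEX's actual implementation.

def normalize_rewards(rewards, running_mean, running_std, eps=1e-8):
    """Standardize each objective's reward with running statistics so that
    high-variance objectives (e.g., OCR) cannot hijack the shared signal."""
    return (rewards - running_mean) / (running_std + eps)

def priority_weights(learning_potential, conflict_penalty, progress_need,
                     temperature=1.0):
    """Combine three per-objective terms into a softmax priority:
    higher learning potential and progress need raise priority,
    stronger gradient conflict lowers it."""
    score = (learning_potential - conflict_penalty + progress_need) / temperature
    exp = np.exp(score - score.max())
    return exp / exp.sum()

# Toy example with four objectives: [OCR, PickScore, DeQA, Aesthetics]
lp = np.array([0.2, 0.6, 0.5, 0.4])   # estimated remaining headroom
cp = np.array([0.5, 0.1, 0.1, 0.2])   # gradient-conflict penalty
pn = np.array([0.1, 0.4, 0.3, 0.3])   # distance from per-objective target
print(priority_weights(lp, cp, pn))   # e.g., down-weights OCR vs. perceptual goals
```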