APEX: Learning Adaptive Priorities for Multi-Objective Alignment in Vision-Language Generation

Multi-objective alignment for text-to-image generation is commonly implemented via static linear scalarization, but fixed weights often fail when rewards are heterogeneous. The result is optimization imbalance: models overfit high-variance, highly responsive objectives (e.g., OCR) while under-optimizing perceptual goals. We identify two mechanistic causes. The first is variance hijacking, where differences in reward dispersion induce an implicit reweighting that lets the most dispersed objectives dominate the normalized training signal. The second is gradient conflicts, where competing objectives produce opposing update directions and trigger seesaw-like oscillations. We propose APEX (Adaptive Priority-based Efficient X-objective Alignment), which stabilizes heterogeneous rewards with Dual-Stage Adaptive Normalization and dynamically schedules objectives via P^3 Adaptive Priorities, which combine learning potential, conflict penalty, and progress need. On Stable Diffusion 3.5, APEX achieves improved Pareto trade-offs across four heterogeneous objectives. It delivers balanced gains of +1.31 in PickScore, +0.35 in DeQA, and +0.53 in Aesthetics while maintaining competitive OCR accuracy, thereby mitigating the instability of multi-objective alignment.
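
The variance-hijacking argument can be made concrete with a small numerical sketch. It assumes GRPO-style group normalization of a statically weighted reward sum; the abstract does not specify the policy-gradient estimator, so this is an assumption. The reward spreads and objective names below are made up for illustration. The per-objective z-scoring is a simple stand-in remedy, not APEX's Dual-Stage Adaptive Normalization, whose details are not given here. The hypothetical `conflicting` helper uses the common cosine-similarity test for opposing gradients and is not the paper's conflict penalty.

```python
import numpy as np

EPS = 1e-8


def zscore(x):
    """Normalize a 1-D array of rewards within one sampling group."""
    return (x - x.mean()) / (x.std() + EPS)


def implicit_weight_shares(rewards, weights, per_objective_norm=False):
    """Share of the scalarized reward's variance contributed by each objective.

    rewards: (G, K) array, G samples in one group, K objectives.
    weights: (K,) static scalarization weights.
    Assumes objectives are roughly uncorrelated, so per-term variances add up.
    Normalizing the summed reward afterwards divides every term by the same
    group std, so these shares are what survives into the advantage.
    """
    if per_objective_norm:
        # Illustrative stand-in only: z-score each objective before weighting.
        rewards = np.stack(
            [zscore(rewards[:, k]) for k in range(rewards.shape[1])], axis=1
        )
    term_var = (weights ** 2) * rewards.var(axis=0)  # Var(w_k * r_k) per objective
    return term_var / term_var.sum()


def conflicting(grad_a, grad_b):
    """Hypothetical helper: flag a conflict when two gradients point apart (cosine < 0)."""
    cos = grad_a @ grad_b / (np.linalg.norm(grad_a) * np.linalg.norm(grad_b) + EPS)
    return bool(cos < 0), float(cos)


# Made-up reward spreads: OCR is far more dispersed than the perceptual scores.
names = ["ocr", "pickscore", "deqa", "aesthetics"]
stds = np.array([0.30, 0.02, 0.05, 0.10])
rng = np.random.default_rng(0)
rewards = rng.normal(loc=0.5, scale=stds, size=(256, 4))
weights = np.full(4, 0.25)  # equal static weights

raw = implicit_weight_shares(rewards, weights)
balanced = implicit_weight_shares(rewards, weights, per_objective_norm=True)
for name, r, b in zip(names, raw, balanced):
    print(f"{name:>10}: raw share {r:.2f}, per-objective-normalized share {b:.2f}")

print(conflicting(np.array([1.0, 0.2]), np.array([-0.8, 0.3])))
```

Because the final group normalization divides every weighted term by the same standard deviation, it cannot undo the imbalance: under static weights, the effective weight of objective k scales with w_k times the spread of r_k. With equal weights, the OCR-like objective therefore contributes most of the signal. Normalizing each objective first restores the intended equal shares.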
