Text-to-image generation powers content creation across design, media, and data augmentation. Post-training of text-to-image generative models is a promising path to better matching human preferences, improving factuality, and enhancing aesthetics. We introduce SOLACE (Adaptive Rewarding by self-Confidence), a post-training framework that replaces external reward supervision with an internal self-confidence signal, obtained by evaluating how accurately the model recovers injected noise under self-denoising probes. SOLACE converts this intrinsic signal into scalar rewards, enabling fully unsupervised optimization without additional datasets, annotators, or reward models. Empirically, by reinforcing high-confidence generations, SOLACE delivers consistent gains over the baseline in compositional generation, text rendering, and text-image alignment. We also find that integrating SOLACE with external rewards yields complementary improvements while alleviating reward hacking.
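The self-confidence signal described above can be illustrated with a minimal sketch. The abstract does not specify SOLACE's exact formulation, so everything below is a hypothetical illustration: we inject Gaussian noise into a generated sample at a chosen noise level, ask the model's denoiser to recover that noise, and score confidence as the negative recovery error, averaged over a few probes to give a scalar reward. The function names (`self_confidence_reward`, `make_oracle`) and the DDPM-style noising schedule are assumptions, not the paper's notation.

```python
import numpy as np

def self_confidence_reward(denoiser, x0, alpha_bar, rng, n_probes=4):
    """Hypothetical self-denoising probe (not SOLACE's exact formula):
    inject noise into a generated sample x0, ask the model to recover it,
    and return the negative mean recovery error as a scalar reward."""
    errors = []
    for _ in range(n_probes):
        eps = rng.standard_normal(x0.shape)            # injected noise
        # DDPM-style forward noising at level alpha_bar (an assumption here)
        x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
        eps_hat = denoiser(x_t, alpha_bar)             # model's noise estimate
        errors.append(np.mean((eps - eps_hat) ** 2))   # noise-recovery error
    return -float(np.mean(errors))                     # high confidence -> high reward

# Toy "perfect" denoiser that inverts the noising given the clean sample.
def make_oracle(x0):
    return lambda x_t, a: (x_t - np.sqrt(a) * x0) / np.sqrt(1.0 - a)

x0 = np.random.default_rng(0).standard_normal((8, 8))
oracle = make_oracle(x0)
uninformed = lambda x_t, a: np.zeros_like(x_t)         # baseline: predicts no noise

r_good = self_confidence_reward(oracle, x0, 0.5, np.random.default_rng(1))
r_bad = self_confidence_reward(uninformed, x0, 0.5, np.random.default_rng(1))
```

Because the oracle recovers the injected noise exactly, its reward is higher (less negative) than the uninformed baseline's; a post-training loop would then reinforce generations that earn high rewards under such probes.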