Recent advances in text-to-image (T2I) generation have greatly improved visual quality, yet producing images that look authentic relative to real-world photography remains challenging. This is partly due to biases in existing evaluation paradigms: human ratings and preference-trained metrics often favor vivid images with exaggerated saturation and contrast, so generations are frequently too vivid to look real even when the prompt asks for a realistic style. To address this issue, we present the Color Fidelity Dataset (CFD) and the Color Fidelity Metric (CFM) for objective evaluation of color fidelity in realistic-style generations. CFD contains over 1.3M real and synthetic images with ordered levels of color realism, while CFM employs a multimodal encoder to learn perceptual color fidelity. In addition, we propose Color Fidelity Refinement (CFR), a training-free method that adaptively modulates the spatial-temporal guidance scale during generation, thereby enhancing color authenticity. Together, CFD supports CFM for assessment, and the attention learned by CFM in turn guides CFR to refine T2I fidelity, forming a progressive framework for assessing and improving color fidelity in realistic-style T2I generation. The dataset and code are available at https://github.com/ZhengyaoFang/CFM.
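The abstract does not spell out how CFR modulates the guidance scale, so the following is only a minimal sketch of what a spatially and temporally varying scale could look like, assuming "guidance scale" refers to classifier-free guidance. The function names, the linear temporal ramp, the default scale values, and the choice to lower the scale where attention is high are all illustrative assumptions, not the authors' method.

```python
def guidance_scale(attention, step, num_steps, base_scale=7.5, min_scale=2.0):
    """Hypothetical guidance scale for one pixel at one denoising step.

    attention: value in [0, 1]; higher means the pixel is flagged more strongly
    (assumption: flagged pixels should receive weaker guidance).
    """
    # Temporal ramp (assumed linear): 0.0 at the first step, 1.0 at the last.
    ramp = step / max(num_steps - 1, 1)
    # Clamp attention so the scale always stays within [min_scale, base_scale].
    a = min(max(attention, 0.0), 1.0)
    return base_scale - (base_scale - min_scale) * ramp * a


def guided_noise(eps_uncond, eps_cond, attention_map, step, num_steps):
    """Classifier-free guidance with a per-pixel scale.

    All three inputs are 2D lists of floats with the same shape (one channel).
    """
    out = []
    for row_u, row_c, row_a in zip(eps_uncond, eps_cond, attention_map):
        out.append([
            # Standard CFG combination, but with a scale that varies per pixel and per step.
            u + guidance_scale(a, step, num_steps) * (c - u)
            for u, c, a in zip(row_u, row_c, row_a)
        ])
    return out
```

With a constant attention map of zeros, this reduces to ordinary classifier-free guidance with a single global scale, which makes the per-pixel variant easy to compare against a standard baseline.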