While text-to-image (T2I) models have advanced considerably, their ability to associate colors with implicit concepts remains underexplored. To address this gap, we introduce ColorConceptBench, a new human-annotated benchmark that systematically evaluates color-concept associations through the lens of probabilistic color distributions. ColorConceptBench moves beyond explicit color names or codes, probing how models translate 1,281 implicit color concepts on the basis of 6,369 human annotations. Our evaluation of seven leading T2I models reveals that current models lack sensitivity to abstract semantics and, crucially, that this limitation appears resistant to standard interventions (e.g., scaling and guidance). This demonstrates that achieving human-like color semantics requires more than larger models: it demands a fundamental shift in how models learn and represent implicit meaning.
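To make the "probabilistic color distributions" framing concrete, the sketch below compares a model's color distribution for a concept against a human-annotated one using Jensen-Shannon divergence. This is an illustrative assumption, not the benchmark's published metric: the bin layout, the concept "envy", and both distributions are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-mass bins of p contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, symmetric, bounded in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical 4-bin color distributions for the concept "envy"
# (bins: red, green, blue, yellow); values are illustrative, not from the benchmark.
human = [0.05, 0.70, 0.10, 0.15]   # annotators strongly favor green
model = [0.30, 0.25, 0.25, 0.20]   # a near-uniform model distribution

print(round(js_divergence(human, model), 3))
```

A lower divergence would indicate a model whose generated colors track human color-concept associations more closely; a distribution-level score like this rewards matching the full spread of plausible colors, not just the single most frequent one.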