Text-to-image (T2I) models are increasingly used in impactful real-world applications. As such, there is a growing need to audit these models to ensure that they generate desirable, task-appropriate images. However, systematically inspecting the associations between prompts and generated content in a human-understandable way remains challenging. To address this, we propose Concept2Concept, a framework in which we characterize conditional distributions of vision-language models using interpretable concepts and metrics defined in terms of those concepts. This characterization allows us to use our framework to audit both models and prompt datasets. To demonstrate, we investigate several case studies of conditional distributions of prompts, such as user-defined distributions and empirical, real-world distributions. Lastly, we implement Concept2Concept as an open-source interactive visualization tool to facilitate use by non-technical end users. A demo is available at https://tinyurl.com/Concept2ConceptDemo.