Evaluation of generative models has received little attention despite the surge of generative architectures. Most recent models are evaluated with dated metrics that suffer from robustness issues and cannot assess further aspects of visual quality, such as compositionality and the logic of synthesis. At the same time, the explainability of generative models remains a limited, though important, research direction, and several current attempts require access to the inner workings of the model. Contrary to prior literature, we view generative models as a black box, and we propose a framework for the evaluation and explanation of synthesized results based on concepts instead of pixels. Our framework exploits knowledge-based counterfactual edits that indicate which objects or attributes should be inserted into, removed from, or replaced in generated images to approach their ground truth conditioning. Moreover, global explanations produced by accumulating local edits can also reveal which concepts a model cannot generate at all. Applying our framework to various models designed for the challenging tasks of Story Visualization and Scene Synthesis verifies the power of our approach in the model-agnostic setting.
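To make the idea of concept-level edits concrete, the following is a minimal sketch and not the framework itself. It assumes each image has already been reduced to a set of concept labels, and the function names `local_edits` and `global_explanation` are hypothetical. It covers only insertions and removals; the knowledge-based replacement step, which would pair a removed concept with a semantically close inserted one, is omitted.

```python
from collections import Counter

def local_edits(generated, ground_truth):
    """Concept edits that turn the generated set into the ground truth set.

    Assumption: both arguments are iterables of concept labels (strings).
    """
    generated, ground_truth = set(generated), set(ground_truth)
    return {
        "insert": sorted(ground_truth - generated),  # concepts the model missed
        "remove": sorted(generated - ground_truth),  # concepts it should not have drawn
    }

def global_explanation(pairs):
    """Accumulate local edits over many (generated, ground_truth) pairs."""
    totals = {"insert": Counter(), "remove": Counter()}
    for generated, ground_truth in pairs:
        for kind, concepts in local_edits(generated, ground_truth).items():
            totals[kind].update(concepts)
    return totals

# Toy data, invented purely for illustration.
pairs = [
    ({"dog", "grass"}, {"dog", "grass", "ball"}),
    ({"cat", "sofa", "lamp"}, {"cat", "sofa", "ball"}),
]
summary = global_explanation(pairs)
print(summary["insert"].most_common())  # [('ball', 2)]
```

In this toy run, `ball` is the concept most often missing from the generated sets, which is the kind of signal a global explanation is meant to surface.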