Creativity is an indispensable part of human cognition and an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental to communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, metaphorical comprehension of images remains relatively unexplored. Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual metaphor. Because no existing dataset facilitates the evaluation of these tasks, we also collect high-quality and rich metaphor annotations (abstract objects, concepts, and relationships, along with their corresponding object boxes). We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations, highlighting strengths and weaknesses of current approaches in visual metaphor Classification, Localization, Understanding (retrieval, question answering, captioning) and gEneration (text-to-image synthesis) tasks. We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities.