Retrieving charts from a large corpus is a fundamental task that can benefit numerous applications, such as visualization recommendation. The retrieved results are expected to conform to both explicit visual attributes (e.g., chart type, colormap) and implicit user intents (e.g., design style, context information) that vary across application scenarios. However, existing example-based chart retrieval methods are built upon entangled, low-level visual features that are hard to interpret, while definition-based ones are constrained to pre-defined attributes that are hard to extend. In this work, we propose a new framework, namely WYTIWYR (What-You-Think-Is-What-You-Retrieve), that integrates user intents into the chart retrieval process. The framework consists of two stages: first, the Annotation stage disentangles the visual attributes within the bitmap query chart; second, the Retrieval stage embeds the user's intent, expressed as a customized text prompt, together with the query chart to recall the targeted retrieval results. We develop a prototype WYTIWYR system that leverages a contrastive language-image pre-training (CLIP) model to achieve zero-shot classification, and test the prototype on a large corpus of charts crawled from the Internet. Quantitative experiments, case studies, and qualitative interviews are conducted. The results demonstrate the usability and effectiveness of our proposed framework.
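The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that charts, attribute labels, and text prompts have already been mapped into a shared embedding space (for example by a CLIP-style model), and it replaces those embeddings with small hand-written vectors so the example runs on its own. The function names `zero_shot_label` and `retrieve`, the `weight` parameter, and the weighted-sum fusion of prompt and query embeddings are all assumptions made for illustration.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_label(chart_emb, label_embs, labels):
    """Annotation-style step: pick the label whose embedding has the
    highest cosine similarity to the chart embedding."""
    sims = normalize(label_embs) @ normalize(chart_emb)
    return labels[int(np.argmax(sims))]


def retrieve(query_emb, prompt_emb, corpus_embs, weight=0.5, top_k=3):
    """Retrieval-style step: fuse the query-chart and text-prompt
    embeddings, then rank corpus charts by cosine similarity.

    The weighted sum is an assumed fusion rule, chosen for simplicity.
    """
    fused = (1 - weight) * normalize(query_emb) + weight * normalize(prompt_emb)
    scores = normalize(corpus_embs) @ normalize(fused)
    return np.argsort(-scores)[:top_k].tolist()


# Toy stand-ins for embeddings that a CLIP-style encoder would produce.
labels = ["bar chart", "line chart", "pie chart"]
label_embs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
chart_emb = [0.9, 0.1, 0.2]

corpus_embs = [[1, 0, 0], [0.7, 0.7, 0], [0, 1, 0], [0, 0, 1]]
query_emb = [1, 0, 0]   # the query chart
prompt_emb = [0, 1, 0]  # the user's text prompt

predicted = zero_shot_label(chart_emb, label_embs, labels)
with_intent = retrieve(query_emb, prompt_emb, corpus_embs, weight=0.5, top_k=1)
chart_only = retrieve(query_emb, prompt_emb, corpus_embs, weight=0.0, top_k=2)
```

With `weight=0.0` the ranking depends on the query chart alone; raising `weight` pulls the results toward the text prompt, which is one simple way to let a stated intent steer an example-based search.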