Large language models are able to generate code for visualisations in response to user requests. This is a useful application, and an appealing one for NLP research because plots of data provide grounding for language. However, there are relatively few benchmarks, and it is unknown whether those that exist are representative of what people do in practice. This paper aims to answer that question through an empirical study comparing benchmark datasets with code from public repositories. Our findings reveal a substantial gap: evaluations do not test the same distribution of chart types, attributes, and numbers of actions that appear in practice. The only representative dataset requires modification to become an end-to-end, practical benchmark. This shows that new, more representative benchmarks are needed to support the development of systems that truly address users' visualisation needs. These observations will guide future data creation, highlighting which features hold genuine significance for users.