Data visualization has emerged as an effective tool for extracting insights from massive datasets. Because data visualization programming languages are difficult to use, automatic generation of data visualizations from natural language (Text-to-Vis) is becoming increasingly popular. Despite the substantial research effort on English Text-to-Vis, no studies have yet addressed data visualization generation from questions in Chinese. Motivated by this gap, we propose a Chinese Text-to-Vis dataset in this paper and present our first attempt to tackle this problem. Our model integrates multilingual BERT as the encoder, boosts cross-lingual ability, and infuses $n$-gram information into word representation learning. Our experimental results show that the dataset is challenging and deserves further research.