Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy. Furthermore, recreating existing figures that are not stored in formats preserving semantic information is equally complex. To tackle this problem, we introduce DeTikZify, a novel multimodal language model that automatically synthesizes scientific figures as semantics-preserving TikZ graphics programs based on sketches and existing figures. To achieve this, we create three new datasets: DaTikZv2, the largest TikZ dataset to date, containing over 360k human-created TikZ graphics; SketchFig, a dataset that pairs hand-drawn sketches with their corresponding scientific figures; and MetaFig, a collection of diverse scientific figures and associated metadata. We train DeTikZify on MetaFig and DaTikZv2, along with synthetically generated sketches learned from SketchFig. We also introduce an MCTS-based inference algorithm that enables DeTikZify to iteratively refine its outputs without the need for additional training. Through both automatic and human evaluation, we demonstrate that DeTikZify outperforms the commercial models Claude 3 and GPT-4V in synthesizing TikZ programs, with the MCTS algorithm effectively boosting its performance. We make our code, models, and datasets publicly available.
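To make the MCTS-based inference idea concrete, the following is a minimal, self-contained sketch of Monte Carlo Tree Search over candidate programs. It is not the paper's implementation: the `expand` callback stands in for sampling continuations or edits from the model, and the `score` callback stands in for the reward signal (e.g., image similarity after compiling a candidate TikZ program); both are hypothetical placeholders supplied by the caller.

```python
import math
import random


class Node:
    """A node in the search tree holding one candidate (partial) program."""

    def __init__(self, program, parent=None):
        self.program = program  # here just a string standing in for TikZ code
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0


def uct(node, c=1.4):
    # Upper Confidence bound for Trees: trade off exploitation vs. exploration.
    if node.visits == 0:
        return float("inf")  # always try unvisited children first
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits
    )


def mcts_refine(root_program, expand, score, iterations=50, seed=0):
    """Iteratively refine a candidate program with MCTS.

    expand(program) -> list of successor programs (placeholder for model
    sampling); score(program) -> reward in [0, 1] (placeholder for
    compile-and-compare feedback). Returns (best reward, best program).
    """
    random.seed(seed)
    root = Node(root_program)
    best = (score(root_program), root_program)
    for _ in range(iterations):
        # Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # Expansion: attach model-proposed successors to the leaf.
        for cand in expand(node.program):
            node.children.append(Node(cand, parent=node))
        # Simulation: evaluate one child (or the leaf itself if terminal).
        leaf = random.choice(node.children) if node.children else node
        reward = score(leaf.program)
        best = max(best, (reward, leaf.program))
        # Backpropagation: update visit counts and values up to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return best
```

Because the search only needs a scoring function, not gradients, this kind of refinement loop can improve outputs at inference time without any additional training, which is the property the abstract highlights.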