How can we best encode structured data into sequential form for use in large language models (LLMs)? In this work, we introduce a parameter-efficient method to explicitly represent structured data for LLMs. Our method, GraphToken, learns an encoding function to extend prompts with explicit structured information. Unlike other work that focuses on limited domains (e.g., knowledge graph representation), our work is the first effort focused on the general encoding of structured data to be used for various reasoning tasks. We show that explicitly representing the graph structure yields significant improvements on graph reasoning tasks. Specifically, we see across-the-board improvements of up to 73 percentage points on node-, edge-, and graph-level tasks from the GraphQA benchmark.
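To make the idea of "extending a prompt with explicit structured information" concrete, the following is a minimal sketch and not the GraphToken implementation. The abstract does not specify the encoder architecture or the training procedure, so everything here is an assumption made for illustration: the node features (a constant and the node degree), the single round of mean aggregation, the mean pooling, the random projection weights (which would be learned in practice), and all function names are hypothetical.

```python
import random

def graph_features(num_nodes, edges):
    """Return a toy feature vector [1.0, degree] for each node (assumed features)."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return [[1.0, float(d)] for d in degree]

def message_pass(features, edges):
    """One round of mean aggregation over each node and its neighbors."""
    neighbors = [[i] for i in range(len(features))]  # each node includes itself
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(features[0])
    return [
        [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbors
    ]

def mean_pool(features):
    """Average node vectors into a single graph-level vector."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]

def project(vector, weights):
    """Map the graph vector to soft tokens; weights[t][e] is a row of length len(vector)."""
    return [
        [sum(w * x for w, x in zip(row, vector)) for row in token_weights]
        for token_weights in weights
    ]

def extend_prompt(graph_tokens, prompt_embeddings):
    """Prepend the graph tokens to the prompt's token embeddings."""
    return graph_tokens + prompt_embeddings

# Hypothetical sizes: 2 graph tokens, embedding width 4, feature width 2.
random.seed(0)
num_tokens, embed_dim, feat_dim = 2, 4, 2
weights = [
    [[random.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(embed_dim)]
    for _ in range(num_tokens)
]  # stand-in for learned parameters

edges = [(0, 1), (1, 2), (0, 2)]  # a triangle
pooled = mean_pool(message_pass(graph_features(3, edges), edges))
graph_tokens = project(pooled, weights)

# Stand-in for the embeddings of a 3-token text prompt.
prompt_embeddings = [[0.0] * embed_dim for _ in range(3)]
extended = extend_prompt(graph_tokens, prompt_embeddings)
print(len(extended), "embedding vectors of width", len(extended[0]))
```

In a setup of this kind, the extended sequence would be passed to the LLM in place of the plain prompt embeddings, and only the encoder parameters (here `weights`) would be trained. That is one plausible reading of "parameter-efficient", not a detail stated in the abstract.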