Can Large Language Models Generalize Procedures Across Representations?

Large language models (LLMs) are trained and tested extensively on symbolic representations such as code and graphs, yet real-world user tasks are often specified in natural language. To what extent can LLMs generalize across these representations? Here, we approach this question by studying isomorphic tasks involving procedures represented in code, graphs, and natural language (e.g., scheduling steps in planning). We find that training LLMs with popular post-training methods on graph or code data alone does not reliably generalize to the corresponding natural language tasks, while training solely on natural language can lead to inefficient performance gains. To address this gap, we propose a two-stage reinforcement learning curriculum that trains first on symbolic data and then on natural language data. The curriculum substantially improves model performance across model families and tasks. Remarkably, a 1.5B Qwen model trained with our method can closely match zero-shot GPT-4o in naturalistic planning. Finally, our analysis suggests that successful cross-representation generalization can be interpreted as a form of generative analogy, which our curriculum effectively encourages. The dataset and code used in this paper can be found at https://github.com/fangru-lin/procedure_generalization_llm.
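To make the notion of isomorphic tasks concrete, the following is a minimal illustrative sketch, not the paper's actual data format or evaluation code. It assumes a hypothetical toy procedure (`STEPS`) and hypothetical helper names (`as_graph`, `as_natural_language`, `min_total_time`); the same scheduling problem is rendered once as a graph edge list and once as a natural language description, and both share one ground-truth answer.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical procedure (an assumption for illustration only):
# step -> (duration in minutes, prerequisite steps)
STEPS = {
    "boil water": (10, []),
    "chop vegetables": (5, []),
    "cook pasta": (8, ["boil water"]),
    "make sauce": (7, ["chop vegetables"]),
    "combine": (2, ["cook pasta", "make sauce"]),
}


def as_graph(steps):
    """Graph view: a list of (prerequisite, step) edges."""
    return [(pre, step) for step, (_, pres) in steps.items() for pre in pres]


def as_natural_language(steps):
    """Natural language view of the same procedure."""
    sentences = []
    for step, (minutes, pres) in steps.items():
        when = "after " + " and ".join(pres) if pres else "at any time"
        sentences.append(f"'{step}' takes {minutes} minutes and can start {when}.")
    return " ".join(sentences)


def min_total_time(steps):
    """Shortest completion time, assuming independent steps can run in parallel."""
    # TopologicalSorter expects a mapping node -> predecessors.
    order = TopologicalSorter(
        {step: pres for step, (_, pres) in steps.items()}
    ).static_order()
    finish = {}
    for step in order:
        minutes, pres = steps[step]
        # A step starts once its slowest prerequisite has finished.
        finish[step] = minutes + max((finish[p] for p in pres), default=0)
    return max(finish.values())


if __name__ == "__main__":
    print(as_graph(STEPS))
    print(as_natural_language(STEPS))
    print(min_total_time(STEPS))
```

Because both renderings are derived from the same underlying structure, the answer to "what is the shortest time to finish?" is identical whichever representation a model is shown, which is the sense in which the tasks are isomorphic.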

