In embodied artificial intelligence, the reasoning capabilities of Large Language Models (LLMs) play a pivotal role. Although effective methods exist, such as program-of-thought prompting, which uses a programming language to tackle complex reasoning tasks, the specific impact of code data on reasoning capabilities remains under-explored. To address this gap, we propose the complexity-impacted reasoning score (CIRS), which combines structural and logical attributes to measure the correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode structural information, and we calculate logical complexity from the difficulty and the cyclomatic complexity. Through empirical analysis, we find that LLMs cannot learn or understand code data at every level of complexity; an optimal level of complexity is critical for improving reasoning abilities through program-aided prompting. We then design an auto-synthesizing and stratifying algorithm, and apply it to instruction generation for mathematical reasoning tasks and to code data filtering for code generation tasks. Extensive results demonstrate the effectiveness of the proposed approach. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
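To make the two ingredients named above concrete, the following is a minimal sketch, not the CIRS implementation itself. It assumes Python source as input and uses the standard `ast` module to extract simple structural features (node count, number of distinct node types, tree depth) and a McCabe-style cyclomatic complexity. The function names, the choice of features, and the set of node types treated as decision points are all assumptions made for illustration; the abstract does not specify how the difficulty term is computed or how the parts are combined into a single score, so both are left out.

```python
import ast

# Node types counted as decision points. This set is an assumption for the
# sketch; other tools draw the line slightly differently.
_DECISION_NODES = (
    ast.If, ast.For, ast.While, ast.IfExp,
    ast.ExceptHandler, ast.comprehension,
)


def tree_depth(node: ast.AST) -> int:
    """Depth of the AST rooted at `node` (a node with no children has depth 1)."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return 1
    return 1 + max(tree_depth(child) for child in children)


def structural_features(source: str) -> dict:
    """Hypothetical structural features read off the abstract syntax tree."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    return {
        "node_count": len(nodes),
        "node_types": len({type(n).__name__ for n in nodes}),
        "depth": tree_depth(tree),
    }


def cyclomatic_complexity(source: str) -> int:
    """McCabe-style count: 1 plus the number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra branches, one per operator.
            decisions += len(node.values) - 1
    return 1 + decisions


if __name__ == "__main__":
    simple = "x = 1 + 2\n"
    branching = (
        "for i in range(3):\n"
        "    if i > 1 and i < 5:\n"
        "        print(i)\n"
    )
    for name, src in [("simple", simple), ("branching", branching)]:
        print(name, structural_features(src), cyclomatic_complexity(src))
```

Straight-line code yields the minimum cyclomatic complexity of 1, while each loop, branch, and boolean operator raises it; a stratifying step could then bucket samples by such measurements.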