The development of large language models (LLMs) has brought progress in reasoning, though studies have largely considered either English or simple reasoning tasks. To address this limitation, we introduce a multilingual structured reasoning and explanation dataset, termed xSTREET, that covers four tasks across six languages. xSTREET exposes a gap in base LLM performance between English and non-English reasoning tasks. We then propose two methods to remedy this gap, building on the insight that LLMs trained on code are better reasoners. First, at training time, we augment a code dataset with multilingual comments using machine translation while keeping program code as-is. Second, at inference time, we bridge the gap between training and inference by employing a prompt structure that incorporates step-by-step code primitives to derive new facts and find a solution. Our methods show improved multilingual performance on xSTREET, most notably on the scientific commonsense reasoning subtask. Furthermore, the models show no regression on non-reasoning tasks, demonstrating that our techniques maintain general-purpose abilities.
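The training-time augmentation described above (translate the comments, keep the program code as-is) can be illustrated with a minimal sketch. It assumes the code samples are Python and uses the standard `tokenize` module to locate comments. The names `translate_comments`, `toy_translate`, and `TOY_MT` are hypothetical and introduced only for illustration; the dictionary lookup stands in for a real machine translation system, and none of this is taken from the authors' pipeline.

```python
import io
import tokenize
from typing import Callable


def translate_comments(source: str, translate: Callable[[str], str]) -> str:
    """Return `source` with each `#` comment passed through `translate`.

    Only COMMENT tokens are rewritten; code and string literals are untouched.
    """
    lines = source.splitlines(keepends=True)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type != tokenize.COMMENT:
            continue
        row, col = tok.start          # 1-based row, 0-based column
        _, end_col = tok.end
        text = tok.string[1:].strip() # drop the leading '#'
        line = lines[row - 1]
        lines[row - 1] = line[:col] + "# " + translate(text) + line[end_col:]
    return "".join(lines)


# Hypothetical stand-in for a machine translation system (English -> French).
TOY_MT = {"add two numbers": "additionner deux nombres"}


def toy_translate(text: str) -> str:
    # Unknown phrases are returned unchanged.
    return TOY_MT.get(text, text)


snippet = (
    "def add(a, b):\n"
    "    # add two numbers\n"
    "    label = '# add two numbers'\n"
    "    return a + b\n"
)
augmented = translate_comments(snippet, toy_translate)
print(augmented)
```

Working at the token level rather than with a regular expression means that a `#` inside a string literal, as in the `label` line, is left alone, which matches the stated goal of changing comments only.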