Developments in large language models (LLMs) have shown progress in reasoning, though studies have been limited to English or simple reasoning tasks. We thus introduce a multilingual structured reasoning and explanation dataset, termed xSTREET, that covers four tasks across six languages. xSTREET exposes a gap in base LLM performance between English and non-English reasoning tasks. We then propose two methods to remedy this gap, building on the insight that LLMs trained on code are better reasoners. First, at training time, we augment a code dataset with multilingual comments using machine translation while keeping program code as-is. Second, at inference time, we bridge the gap between training and inference by employing a prompt structure that incorporates step-by-step code primitives to derive new facts and find a solution. Our methods show improved multilingual performance on xSTREET, most notably on the scientific commonsense reasoning subtask. Furthermore, the models show no regression on non-reasoning tasks, thus demonstrating that our techniques maintain general-purpose abilities.
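The training-time augmentation described above (translating comments while keeping program code as-is) can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: `translate` is a hypothetical stand-in for a real machine-translation system (here a toy lookup), and the `#`-based comment detection is a simplifying assumption.

```python
def translate(text, target_lang):
    # Hypothetical stand-in for an MT system; a toy lookup for illustration only.
    toy_mt = {("compute the sum", "es"): "calcula la suma"}
    return toy_mt.get((text, target_lang), text)

def augment_comments(source, target_lang):
    """Translate '#' comments into target_lang, leaving program code untouched."""
    out_lines = []
    for line in source.splitlines():
        code, sep, comment = line.partition("#")
        if sep:
            # Only the natural-language comment is translated; code is kept as-is.
            out_lines.append(code + "# " + translate(comment.strip(), target_lang))
        else:
            out_lines.append(line)
    return "\n".join(out_lines)

snippet = "total = a + b  # compute the sum"
print(augment_comments(snippet, "es"))
# → total = a + b  # calcula la suma
```

A real pipeline would call an MT model in place of the toy lookup and use a language-aware parser to locate comments, but the principle is the same: the multilingual signal is added only to the natural-language channel, never to the code itself.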