The meaning of a complex phrase in natural language is composed from the meanings of its individual components. The task of compositional generalization evaluates a model's ability to understand new combinations of known components. Previous studies trained smaller, task-specific models, which generalized poorly. While large language models (LLMs) exhibit impressive generalization abilities on many tasks through in-context learning (ICL), their potential for compositional generalization remains unexplored. In this paper, we first empirically investigate prevailing ICL methods on compositional generalization. We find that they struggle with complex compositional questions because errors accumulate over long reasoning chains and because tool-making requires intricate logic. We therefore propose a human-guided tool manipulation framework (HTM) that generates tools for sub-questions and integrates multiple tools. Our method makes tool creation and usage more effective with minimal human effort. Experiments show that our method achieves state-of-the-art performance on two compositional generalization benchmarks and outperforms existing methods on the most challenging test split by 70%.