Compositional generalization, the ability to understand unseen combinations of seen primitives, is an essential reasoning capability of human intelligence. The AI community mainly studies this capability by fine-tuning neural networks on large numbers of training samples, while it remains unclear whether and how in-context learning, the prevailing few-shot paradigm based on large language models, exhibits compositional generalization. In this paper, we present CoFe, a test suite for investigating in-context compositional generalization. We find that compositional generalization performance is easily affected by the selection of in-context examples, which raises the research question of what key factors make good in-context examples for compositional generalization. We study three potential factors: similarity, diversity, and complexity. Our systematic experiments indicate that in-context examples should be structurally similar to the test case, diverse from each other, and individually simple. Furthermore, we observe two strong limitations: in-context compositional generalization on fictional words is much weaker than on commonly used ones; and it remains critical that the in-context examples cover the required linguistic structures, even though the backbone model has been pre-trained on a large corpus. We hope our analysis will facilitate the understanding and utilization of the in-context learning paradigm.
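To make the three factors concrete, the following is a minimal sketch of how one might pick in-context examples that are structurally similar to the test case, diverse from each other, and individually simple. It is not the procedure used in the paper: the representation of structure as a set of feature labels, the use of Jaccard overlap, the greedy loop, the weights, and the names `jaccard` and `select_examples` are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: (example text, set of structural feature labels).
Example = Tuple[str, FrozenSet[str]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap between two feature sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_examples(
    test_features: FrozenSet[str],
    pool: List[Example],
    k: int,
    diversity_weight: float = 0.5,   # assumed weight, not from the paper
    complexity_weight: float = 0.1,  # assumed weight, not from the paper
) -> List[Example]:
    """Greedily pick up to k examples from the pool.

    Each candidate is rewarded for similarity to the test case, penalized
    for overlap with already selected examples (diversity), and penalized
    for the number of features it carries (a crude proxy for complexity).
    """
    selected: List[Example] = []
    remaining = list(pool)

    def score(candidate: Example) -> float:
        features = candidate[1]
        similarity = jaccard(features, test_features)
        redundancy = max(
            (jaccard(features, chosen[1]) for chosen in selected),
            default=0.0,
        )
        return (
            similarity
            - diversity_weight * redundancy
            - complexity_weight * len(features)
        )

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a pool containing two structurally identical examples, the redundancy penalty steers the second pick toward an example that covers a different part of the test structure, and the complexity penalty keeps a long example covering many unrelated features from being chosen. The weights would need tuning for any real task.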