Large language models show an emergent ability to learn a new task from a small number of input-output demonstrations. However, recent work shows that in-context learners largely rely on their pre-trained knowledge, such as the sentiment of the labels, instead of finding new associations in the input. Moreover, the commonly used few-shot evaluation setting, which selects in-context demonstrations at random, cannot disentangle models' ability to learn a new skill from demonstrations: most randomly selected demonstrations do not present relations informative for prediction beyond exposing the distribution of the new task. To assess models' in-context learning ability independently of their memory, we introduce a Conceptual few-shot learning method that selects demonstrations sharing a possibly informative concept with the predicted sample. We extract a set of such concepts from annotated explanations and measure how much models can benefit from having these concepts presented in few-shot demonstrations. We find that smaller models are more sensitive to the presented concepts. While for each assessed concept some models are able to benefit from concept-presenting demonstrations, none of the assessed in-context learners benefits consistently from all presented reasoning concepts, leaving in-context concept learning an open challenge.
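The contrast between random and concept-sharing demonstration selection can be illustrated with a minimal sketch. It assumes that each pool example already carries a set of concept labels extracted from its annotated explanation (the extraction step itself is not shown), and that "benefit" is measured as the accuracy difference between concept-sharing and random prompts for the same test samples. The data layout and all function names (`select_random`, `select_conceptual`, `build_prompt`, `concept_benefit`) are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Any, Dict, List, Optional, Sequence, Set

# Hypothetical layout: each example holds an input, a gold label, and the set of
# reasoning concepts assumed to have been extracted from its annotated explanation.
Example = Dict[str, Any]


def select_random(pool: Sequence[Example], k: int, rng: random.Random) -> List[Example]:
    """Baseline: k demonstrations drawn uniformly at random from the pool."""
    return rng.sample(list(pool), k)


def select_conceptual(
    pool: Sequence[Example], target_concepts: Set[str], k: int, rng: random.Random
) -> Optional[List[Example]]:
    """Pick k demonstrations sharing at least one concept with the predicted sample.

    Returns None when fewer than k candidates share a concept, so the caller can
    skip the sample rather than silently padding with unrelated demonstrations.
    """
    candidates = [ex for ex in pool if set(ex["concepts"]) & target_concepts]
    if len(candidates) < k:
        return None
    return rng.sample(candidates, k)


def build_prompt(demos: Sequence[Example], query: str) -> str:
    """Concatenate demonstrations and the query into a plain few-shot prompt."""
    blocks = [f"Input: {d['input']}\nOutput: {d['label']}" for d in demos]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of predictions marked correct."""
    return sum(correct) / len(correct)


def concept_benefit(conceptual_correct: Sequence[bool], random_correct: Sequence[bool]) -> float:
    """Assumed metric: accuracy gain of concept-sharing prompts over random prompts."""
    return accuracy(conceptual_correct) - accuracy(random_correct)
```

Scoring a model on prompts built with `select_conceptual` and with `select_random` for the same test samples, then grouping the per-sample results by concept, gives a per-concept benefit estimate under these assumptions. Returning `None` instead of falling back to random demonstrations keeps the two conditions cleanly separated.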