In-context learning (ICL) is increasingly appealing as a route to general intelligence because it is sample-efficient and does not depend on hand-designed optimization procedures. Biological neural systems generalize by primarily inheriting the ability to learn and then refining their memory, acquiring diverse skills and knowledge over extensive lifelong experience. This observation motivates the concept of general-purpose in-context learning (GPICL). Compared with standard ICL, GPICL addresses a broader range of tasks, extends over longer learning horizons, and starts from a lower zero-shot baseline. We introduce two lightweight but insightful benchmarks designed specifically to train and evaluate GPICL capabilities. Each benchmark comprises a vast number of tasks with high variance across tasks and minimal transferable knowledge between them, and supports lifelong in-context learning through continuous generation and interaction. These properties pose significant challenges for models that rely on context or interaction to improve their proficiency, including language models, decision models, and world models. Our experiments indicate that parameter scale alone may not be crucial for ICL or GPICL, suggesting alternative directions such as scaling up contexts and memory states.