In-context learning (ICL) is increasingly appealing to the AI community because of its flexibility, generality, sample efficiency, and freedom from hand-crafted optimization techniques. Enhancing the generality and capability of ICL further is desirable, which gives rise to the concept of general-purpose in-context learning (GPICL). We aim to extend ICL to a broader range of tasks with a longer learning horizon and greater potential for improvement, albeit with relatively limited zero-shot generalization. To this end, we introduce two lightweight but insightful benchmarks designed specifically to train and evaluate GPICL capabilities. Each benchmark comprises a vast number of tasks with high variance across tasks and minimal transferable knowledge among them. The tasks are designed to support lifelong in-context learning through continuous generation and interaction. These features pose significant challenges for models that rely on context or interaction to improve their proficiency, including language models, decision models, and world models. Our experiments suggest that parameter scale alone may not be crucial for ICL or GPICL, pointing to alternative approaches such as scaling up contexts and memory states.