In-context learning (ICL) capabilities are becoming increasingly appealing for building general intelligence. Taking this concept one step further, we draw a parallel to humans and many animals, who primarily inherit learning capabilities but refine their memory and acquire diverse skills and knowledge through extensive lifelong experience. This parallel inspires our approach to general-purpose in-context learning (GPICL). This paper introduces two lightweight but insightful benchmarks specifically crafted to train and evaluate GPICL functionalities. Each benchmark encompasses a wide range of diverse tasks characterized by generation and interaction, minimal transferable knowledge, and long-term dependency. These features pose significant challenges for models that rely primarily on context or interactions to improve their proficiency. We hope that these benchmarks will not only advance research in GPICL but also contribute significantly to the broader field of general intelligence.