In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are costly to store, difficult to evaluate for quality and balance, and raise privacy and ethical concerns due to the inclusion of sensitive information. Motivated by these limitations and risks, we propose an alternative training strategy that leverages a collection of multiple small-scale, domain-specific datasets. We empirically demonstrate that the increased quality and diversity of such data improve the generalization of in-context learners beyond their training domain, while achieving performance comparable to that of models trained on a single large-scale dataset. We investigate this paradigm by using meta-learning to train an in-context learner on the Meta-Album collection under several settings. First, we evaluate performance in a controlled setting where the test domain is completely excluded from training. Second, we examine the robustness of these models to forgetting in a continual scenario where the data is accessible only for a limited time. Finally, we explore the more challenging unsupervised scenario. Our findings demonstrate that transformers still generalize for in-context prediction when trained on a curated dataset collection, while offering advantages in modularity and replaceability.