Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

Few-shot learning is valuable in many real-world applications, but learning a model that generalizes without overfitting to the few labeled datapoints is challenging. In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in hopes of improving generalization. Previous works have proposed automated methods for mixing auxiliary and target data, but these methods typically scale linearly (or worse) with the number of auxiliary datasets, limiting their practicality. We relate FLAD to the explore-exploit dilemma that is central to the multi-armed bandit setting and derive algorithms whose computational complexity is independent of the number of auxiliary datasets, allowing us to scale to 100x more auxiliary datasets than prior methods. We propose two algorithms, EXP3-FLAD and UCB1-FLAD, and compare them with prior FLAD methods that either only explore or only exploit, finding that the combination of exploration and exploitation is crucial. Through extensive experimentation, we find that our methods outperform all pre-existing FLAD methods by 4% and lead to the first 3-billion-parameter language models that outperform the 175-billion-parameter GPT-3. Overall, our work suggests that discovering better, more efficient mixing strategies for FLAD may provide a viable path toward substantially improving generalization in few-shot learning.
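The bandit framing above can be illustrated with a small sketch. Each auxiliary dataset is treated as an arm, and at every training step one dataset is sampled to train on. The selectors below are textbook EXP3 and UCB1. They are not the authors' EXP3-FLAD or UCB1-FLAD implementations.

Several pieces are assumptions made for illustration only:

- The reward is a cosine similarity between an auxiliary-batch gradient and a target-task gradient, rescaled to [0, 1].
- The gradients are synthetic vectors produced by a hypothetical `toy_aux_gradient` helper, not taken from a real model.
- The `relevance` values are made up.

Each step computes a reward for only the one sampled dataset. This is why the per-step training cost in such a scheme does not grow with the number of auxiliary datasets. The bandit bookkeeping itself is a cheap pass over a few scalars per dataset.

```python
import math
import random


class EXP3Selector:
    """Textbook EXP3 over K arms (arms = auxiliary datasets); rewards must lie in [0, 1]."""

    def __init__(self, num_arms: int, gamma: float = 0.1, seed: int = 0) -> None:
        self.k = num_arms
        self.gamma = gamma
        self.log_weights = [0.0] * num_arms  # log-space avoids overflow on long runs
        self.rng = random.Random(seed)

    def probabilities(self) -> list[float]:
        # Softmax of the log-weights, mixed with a uniform exploration term.
        m = max(self.log_weights)
        exps = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(exps)
        return [(1 - self.gamma) * e / total + self.gamma / self.k for e in exps]

    def select(self) -> int:
        return self.rng.choices(range(self.k), weights=self.probabilities(), k=1)[0]

    def update(self, arm: int, reward: float) -> None:
        # Importance-weighted reward estimate; only the sampled arm is updated.
        p = self.probabilities()[arm]
        self.log_weights[arm] += self.gamma * (reward / p) / self.k


class UCB1Selector:
    """Textbook UCB1: pick the arm with the highest mean reward plus confidence bonus."""

    def __init__(self, num_arms: int) -> None:
        self.k = num_arms
        self.counts = [0] * num_arms
        self.means = [0.0] * num_arms
        self.t = 0

    def select(self) -> int:
        for arm, n in enumerate(self.counts):
            if n == 0:  # try every arm once before using the bonus
                return arm
        return max(
            range(self.k),
            key=lambda a: self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm: int, reward: float) -> None:
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]  # running mean


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def toy_aux_gradient(arm, target, relevance, rng):
    # Hypothetical stand-in for a real auxiliary-batch gradient: the target
    # direction scaled by a per-dataset "relevance" plus Gaussian noise.
    return [relevance[arm] * t + rng.gauss(0.0, 1.0) for t in target]


def run(selector, steps: int = 500, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(16)]  # synthetic target-task gradient
    relevance = [0.0, 0.5, 2.0, -1.0]  # hypothetical: dataset 2 is most aligned
    counts = [0] * len(relevance)
    for _ in range(steps):
        arm = selector.select()
        # Only the sampled dataset's gradient is computed this step, so the
        # per-step cost does not grow with the number of auxiliary datasets.
        g_aux = toy_aux_gradient(arm, target, relevance, rng)
        reward = (cosine(g_aux, target) + 1.0) / 2.0  # rescale [-1, 1] -> [0, 1]
        selector.update(arm, reward)
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    # Dataset 2 (most aligned) should typically receive the most pulls.
    print("EXP3 pulls per dataset:", run(EXP3Selector(4, gamma=0.1, seed=1)))
    print("UCB1 pulls per dataset:", run(UCB1Selector(4)))
```

The two selectors trade off exploration and exploitation differently:

- **EXP3** mixes a uniform term (`gamma / K`) into an exponentially weighted distribution, so every dataset keeps a nonzero chance of being sampled.
- **UCB1** deterministically picks the dataset with the highest optimistic estimate. That estimate is the mean reward plus a bonus that shrinks as the dataset is sampled more often.

In both cases, rarely sampled datasets are explored while datasets that have paid off so far are exploited.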
