Few-shot learning is valuable in many real-world applications, but learning a generalizable model without overfitting to the few labeled datapoints is challenging. In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in the hope of improving generalization. Previous works have proposed automated methods for mixing auxiliary and target data, but the cost of these methods typically scales linearly (or worse) with the number of auxiliary datasets, which limits their practicality. We relate FLAD to the explore-exploit dilemma that is central to the multi-armed bandit setting and derive algorithms whose computational complexity is independent of the number of auxiliary datasets, allowing us to scale to 100x more auxiliary datasets than prior methods. We propose two algorithms, EXP3-FLAD and UCB1-FLAD, and compare them with prior FLAD methods that either explore or exploit, finding that the combination of exploration and exploitation is crucial. Through extensive experimentation, we find that our methods outperform all existing FLAD methods by 4% and yield the first 3 billion parameter language models that outperform the 175 billion parameter GPT-3. Overall, our work suggests that the discovery of better, more efficient mixing strategies for FLAD may provide a viable path towards substantially improving generalization in few-shot learning.
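To make the bandit framing concrete, the following is a minimal sketch of the textbook UCB1 and EXP3 strategies, with each auxiliary dataset treated as one arm. It is not the EXP3-FLAD or UCB1-FLAD algorithm itself: the abstract does not specify how rewards are computed, so the reward here is a simulated Bernoulli draw, and the names `UCB1Sampler`, `EXP3Sampler`, `simulate`, and `reward_probs` are hypothetical and introduced only for illustration.

```python
import math
import random


class UCB1Sampler:
    """Textbook UCB1: pick the arm with the highest optimistic estimate."""

    def __init__(self, num_arms):
        self.counts = [0] * num_arms    # times each arm was played
        self.means = [0.0] * num_arms   # running mean reward per arm

    def select(self):
        # Play every arm once before using confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            m + math.sqrt(2.0 * math.log(total) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class EXP3Sampler:
    """Textbook EXP3: sample arms from exponentially weighted probabilities."""

    def __init__(self, num_arms, gamma=0.1, rng=None):
        self.gamma = gamma              # exploration rate in (0, 1]
        self.weights = [1.0] * num_arms
        self.rng = rng or random.Random()

    def probabilities(self):
        total = sum(self.weights)
        k = len(self.weights)
        # Mix the weight distribution with uniform exploration.
        return [(1.0 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, arm, reward):
        # Importance-weighted reward estimate; reward is assumed to lie in [0, 1].
        estimate = reward / self.probabilities()[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / len(self.weights))
        # Rescale so weights stay bounded; the probabilities are unchanged.
        largest = max(self.weights)
        self.weights = [w / largest for w in self.weights]


def simulate(sampler, reward_probs, steps, rng):
    """Run a sampler against simulated Bernoulli rewards (a stand-in signal)."""
    pulls = [0] * len(reward_probs)
    for _ in range(steps):
        arm = sampler.select()          # choose one auxiliary dataset
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        sampler.update(arm, reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    rng = random.Random(0)
    reward_probs = [0.1, 0.2, 0.8, 0.3]  # hypothetical usefulness of each dataset
    print("UCB1 pulls:", simulate(UCB1Sampler(4), reward_probs, 5000, rng))
    print("EXP3 pulls:", simulate(EXP3Sampler(4, rng=rng), reward_probs, 5000, rng))
```

In this sketch, each step draws from a single chosen dataset and updates only that arm's statistics, so both strategies keep trying less-played arms (exploration) while concentrating most pulls on the arm that has paid off so far (exploitation).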