Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model

Training deep networks requires various design decisions regarding, for instance, their architecture, data augmentation, or optimization. In this work, we find that these training variations result in networks learning unique feature sets from the data. Using public model libraries comprising thousands of models trained on canonical datasets like ImageNet, we observe that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other -- independent of overall performance. Given any arbitrary pairing of pretrained models and no external rankings (such as separate test sets, e.g. due to data privacy), we investigate whether it is possible to transfer such "complementary" knowledge from one model to another without performance degradation -- a task made particularly difficult because additional knowledge can be contained in stronger, equiperformant, or weaker models. Yet facilitating robust transfer in scenarios agnostic to pretrained model pairings would unlock auxiliary gains and knowledge fusion from any model repository without restrictions on model and problem specifics -- including from weaker, lower-performance models. This work therefore provides an initial, in-depth exploration of the viability of such general-purpose knowledge transfer. Across large-scale experiments, we first reveal the shortcomings of standard knowledge distillation techniques, and then propose a much more general extension through data partitioning for successful transfer between nearly all pretrained models, which we show can also be done unsupervised. Finally, we assess both the scalability of model-agnostic knowledge transfer and the impact of fundamental model properties on its success.
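To make the notion of "complementary" knowledge and label-free data partitioning more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that complementary knowledge can be approximated as the share of samples one model classifies correctly while the other does not, and that an unsupervised partition can be formed by comparing each model's maximum predicted probability per sample. Both function names and the tie-breaking rule are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def complementary_fraction(
    labels: Sequence[int],
    preds_a: Sequence[int],
    preds_b: Sequence[int],
) -> float:
    """Share of samples that model A gets right while model B gets wrong.

    Assumption: this is one plausible proxy for "complementary knowledge";
    the abstract itself does not define the measure.
    """
    if not (len(labels) == len(preds_a) == len(preds_b)):
        raise ValueError("all inputs must have the same length")
    if len(labels) == 0:
        return 0.0
    hits = sum(
        1 for y, a, b in zip(labels, preds_a, preds_b) if a == y and b != y
    )
    return hits / len(labels)


def partition_by_confidence(
    probs_teacher: Sequence[Sequence[float]],
    probs_student: Sequence[Sequence[float]],
) -> Tuple[List[int], List[int]]:
    """Split sample indices by which model is more confident (no labels used).

    Assumption: confidence is the maximum class probability. Ties go to the
    student, so its existing behaviour is kept unless the teacher is
    strictly more confident.
    """
    if len(probs_teacher) != len(probs_student):
        raise ValueError("both models must score the same samples")
    teacher_idx: List[int] = []
    student_idx: List[int] = []
    for i, (p_t, p_s) in enumerate(zip(probs_teacher, probs_student)):
        if max(p_t) > max(p_s):
            teacher_idx.append(i)
        else:
            student_idx.append(i)
    return teacher_idx, student_idx


# Toy data: both models reach 50% accuracy, yet each is right where the
# other is wrong on one sample.
labels = [0, 1, 2, 1]
preds_a = [0, 1, 0, 0]
preds_b = [1, 1, 2, 0]
a_over_b = complementary_fraction(labels, preds_a, preds_b)
b_over_a = complementary_fraction(labels, preds_b, preds_a)

probs_teacher = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
probs_student = [[0.7, 0.3], [0.8, 0.2], [0.5, 0.5]]
teacher_idx, student_idx = partition_by_confidence(probs_teacher, probs_student)
```

In the toy data, the two models have equal accuracy but a non-zero complementary fraction in both directions, which mirrors the observation that complementary knowledge can exist independent of overall performance. In a transfer setup, samples in the teacher partition would be distilled from the teacher, while the remaining samples would be used to preserve the student's initial predictions; that training loop is left out here.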

