I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

Commonsense capabilities of pre-trained language models dramatically improve with scale, leading many to believe that scale is the only winning recipe. But is it? Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms? The key intellectual challenge is to design a learning algorithm that achieves a competitive level of commonsense acquisition without relying on the benefits of scale. In particular, we study generative models of commonsense knowledge, focusing on the task of generating generics: statements of commonsense facts about everyday concepts, e.g., "birds can fly". We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West et al. but breaks the dependence on the extreme-scale teacher model with two innovations: (1) a novel adaptation of NeuroLogic Decoding to enhance the generation quality of weak, off-the-shelf language models, and (2) self-imitation learning to iteratively learn from the model's own enhanced commonsense acquisition capabilities. Empirical results suggest that scale is not the only way, as novel algorithms can be a promising alternative. Moreover, our study leads to a new corpus of generics, Gen-A-tomic, that is the largest and highest quality available to date.
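To make the two ingredients named in the abstract more concrete, here is a toy sketch of a generate, filter, and re-train loop. It is not the authors' implementation, and everything in it is an assumption made for illustration: the "model" is just a weighted table of candidate statements, lexical constraints are checked after sampling as a crude stand-in for constrained decoding, `toy_critic` is a hypothetical quality filter that the abstract does not describe, and "fine-tuning" is reduced to up-weighting the statements that survive filtering.

```python
import random


def satisfies_constraints(statement, must_include, must_exclude):
    """Lexical check applied after sampling (assumed stand-in for constrained decoding)."""
    words = statement.lower().split()
    has_required = all(w in words for w in must_include)
    has_banned = any(w in words for w in must_exclude)
    return has_required and not has_banned


def toy_critic(statement, trusted):
    """Hypothetical quality filter: scores 1.0 for statements in a trusted set."""
    return 1.0 if statement in trusted else 0.0


def sample(weights, k, rng):
    """Draw k statements from the toy model's current distribution."""
    statements = list(weights)
    probs = [weights[s] for s in statements]
    return rng.choices(statements, weights=probs, k=k)


def self_imitation_round(weights, must_include, must_exclude, trusted, rng,
                         k=50, boost=2.0):
    """One round: generate, filter, then imitate the model's own kept outputs."""
    candidates = sample(weights, k, rng)
    kept = {
        s for s in candidates
        if satisfies_constraints(s, must_include, must_exclude)
        and toy_critic(s, trusted) >= 0.5
    }
    # "Fine-tuning" here is only a multiplicative up-weighting of kept statements.
    updated = {s: w * (boost if s in kept else 1.0) for s, w in weights.items()}
    total = sum(updated.values())
    return {s: w / total for s, w in updated.items()}, kept


def run(rounds=3, seed=0):
    """Run a few rounds on a four-statement toy vocabulary."""
    rng = random.Random(seed)
    weights = {
        "birds can fly": 0.25,
        "birds can not fly": 0.25,
        "birds have feathers": 0.25,
        "rocks can fly": 0.25,
    }
    trusted = {"birds can fly", "birds have feathers"}
    for _ in range(rounds):
        weights, _kept = self_imitation_round(
            weights, ["birds"], ["not"], trusted, rng
        )
    return weights


if __name__ == "__main__":
    for statement, weight in sorted(run().items(), key=lambda kv: -kv[1]):
        print(f"{weight:.3f}  {statement}")
```

The point of the sketch is only the shape of the loop: each round trains on outputs the model produced itself, so probability mass drifts toward statements that pass the filters without any larger teacher model being consulted.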

