Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding

Current natural language understanding (NLU) models have been continuously scaling up, both in terms of model size and input context, introducing more hidden and input neurons. While this generally improves performance on average, the extra neurons do not yield a consistent improvement for all instances. This is because some hidden neurons are redundant, and the noise mixed in input neurons tends to distract the model. Previous work mainly focuses on extrinsically reducing low-utility neurons by additional post- or pre-processing, such as network pruning and context selection, to avoid this problem. Beyond that, can we make the model reduce redundant parameters and suppress input noise by intrinsically enhancing the utility of each neuron? If a model can efficiently utilize neurons, no matter which neurons are ablated (disabled), the ablated submodel should perform no better than the original full model. Based on such a comparison principle between models, we propose a cross-model comparative loss for a broad range of tasks. Comparative loss is essentially a ranking loss on top of the task-specific losses of the full and ablated models, with the expectation that the task-specific loss of the full model is minimal. We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks based on 5 widely used pretrained language models and find it particularly superior for models with few parameters or long input.

翻译：当前自然语言理解（NLU）模型在模型规模和输入上下文方面持续扩展，引入了更多隐藏神经元和输入神经元。尽管这通常能提升平均性能，但额外神经元并未对所有实例带来一致改进。这是因为部分隐藏神经元存在冗余，而输入神经元中混合的噪声易干扰模型。以往研究主要通过额外后处理或预处理（如网络剪枝和上下文选择）从外部减少低效用神经元，以规避该问题。然而，我们能否让模型通过内在增强每个神经元的效用，自主减少冗余参数并抑制输入噪声？若模型能高效利用神经元，则无论哪些神经元被消融（禁用），消融后的子模型性能都应不优于原始完整模型。基于这一模型间比较原则，我们提出了一种适用于广泛任务的跨模型对比损失。对比损失本质上是一种排序损失，叠加在完整模型与消融模型的任务特定损失之上，预期完整模型的任务特定损失最小。我们通过在基于5个广泛使用的预训练语言模型的3个不同NLU任务中的14个数据集上的大量实验，验证了对比损失的普适有效性，并发现其对参数较少或输入较长的模型尤为优越。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日