Neural networks are trained primarily based on their inputs and outputs, without regard for their internal mechanisms. These neglected mechanisms determine properties that are critical for safety, like (i) transparency; (ii) the absence of sensitive information or harmful capabilities; and (iii) reliable generalization of goals beyond the training distribution. To address this shortcoming, we introduce gradient routing, a training method that isolates capabilities to specific subregions of a neural network. Gradient routing applies data-dependent, weighted masks to gradients during backpropagation. These masks are supplied by the user in order to configure which parameters are updated by which data points. We show that gradient routing can be used to (1) learn representations which are partitioned in an interpretable way; (2) enable robust unlearning via ablation of a pre-specified network subregion; and (3) achieve scalable oversight of a reinforcement learner by localizing modules responsible for different behaviors. Throughout, we find that gradient routing localizes capabilities even when applied to a limited, ad-hoc subset of the data. We conclude that the approach holds promise for challenging, real-world applications where quality data are scarce.
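The core mechanism can be illustrated with a minimal sketch: a data-dependent mask zeroes the gradient for all but a chosen parameter subregion, so each data point updates only the region it is routed to. This is an illustrative toy (a linear model with two weight columns as "subregions"), not the paper's implementation; the names `step` and `route` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Columns 0 and 1 of W play the role of two network "subregions".
W = rng.normal(size=(4, 2)) * 0.1

def step(x, y, route, lr=0.1):
    """One SGD step on pred = sum(x @ W), with the gradient routed
    (via a binary mask) into column `route` only."""
    global W
    pred = (x @ W).sum()
    grad_pred = pred - y                          # d(0.5*(pred - y)^2)/d pred
    grad_W = np.outer(x, np.ones(2)) * grad_pred  # full, unmasked gradient
    mask = np.zeros(2)
    mask[route] = 1.0                             # data-dependent weighted mask
    W -= lr * grad_W * mask                       # masked update: one column moves

x = rng.normal(size=4)
before = W.copy()
step(x, y=1.0, route=0)
# Column 0 receives the update; column 1 is left untouched,
# so the capability learned from this datapoint is localized to column 0.
```

Ablating the unrouted subregion afterwards (e.g. zeroing column 0) then removes the behavior learned from the routed data, which is the lever the abstract's unlearning and oversight results rely on.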