Attention mechanisms form a core component of several successful deep learning architectures and are based on one key idea: "The output depends only on a small (but unknown) segment of the input." In several practical applications, such as image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the output of an intermediate module that encodes the segment of the input responsible for the output is often used as a way to peek into the "reasoning" of the network. We make this notion more precise for a variant of the classification problem, which we term selective dependence classification (SDC), when used with attention model architectures. In this setting, we demonstrate various error modes in which an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate or mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity, and demonstrate that these algorithms help improve interpretability.