Neural networks for NLP are becoming increasingly complex and widespread, and there is growing concern about whether these models can be used responsibly. Explaining models helps address safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that humans can understand. Additionally, post-hoc methods provide explanations after a model is learned and are generally model-agnostic. This survey categorizes how recent post-hoc interpretability methods communicate explanations to humans, discusses each method in depth, and examines how each is validated, since validation is a common concern.