The proliferation of deep neural networks across domains has increased the need for interpretable models, especially in scenarios where fairness and trust are as important as model performance. Much independent work is being carried out to: i) analyze what linguistic and non-linguistic knowledge these models learn, and ii) highlight the salient parts of the input. We present NxPlain, a web application that explains a model's prediction using latent concepts. NxPlain discovers the latent concepts learned in a deep NLP model, provides an interpretation of the knowledge learned in the model, and explains its predictions based on the concepts it uses. The application lets users browse the latent concepts in an intuitive order, so they can efficiently scan the most salient concepts through a global corpus-level view and a local sentence-level view. Our tool is useful for debugging, uncovering model bias, and highlighting spurious correlations in a model. A hosted demo is available at https://nxplain.qcri.org.