As Transformers are increasingly relied upon to solve complex NLP problems, there is a growing need for their decisions to be interpretable by humans. Although several explainable AI (XAI) techniques for interpreting the outputs of transformer-based models have been proposed, using and comparing them is still not easy. We introduce ferret, a Python library that simplifies the use and comparison of XAI methods on transformer-based classifiers. With ferret, users can visualize and compare explanations of transformer-based model outputs, produced by state-of-the-art XAI methods, on any free text or on existing XAI corpora. Users can also evaluate these explanations with ad-hoc XAI metrics to select the most faithful and plausible ones. To align with the now-established practice of sharing and using transformer-based models through Hugging Face, ferret interfaces directly with its Python library. In this paper, we showcase ferret by benchmarking XAI methods applied to transformers for sentiment analysis and hate speech detection. We show that specific methods provide consistently better explanations and are preferable in the context of transformer models.
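To make the idea of comparing explanations by faithfulness concrete, the following is a minimal sketch and not ferret's API. It assumes a hypothetical toy bag-of-words classifier (`WEIGHTS`, `positive_prob`) in place of a transformer, a leave-one-out attribution as the XAI method, and a comprehensiveness-style score as the faithfulness metric. All names are illustrative assumptions.

```python
import math

# Hypothetical toy lexicon standing in for a trained transformer classifier.
WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5, "awful": -2.0}


def positive_prob(tokens):
    """Toy sentiment model: sigmoid of the summed word weights."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def leave_one_out(tokens):
    """Attribution of a token = drop in P(positive) when that token is removed."""
    base = positive_prob(tokens)
    return [
        base - positive_prob(tokens[:i] + tokens[i + 1:])
        for i in range(len(tokens))
    ]


def comprehensiveness(tokens, attributions, k=1):
    """Drop in P(positive) after removing the k highest-attributed tokens.

    A larger drop suggests the explanation points at tokens the model relies on.
    """
    ranked = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    top = set(ranked[:k])
    reduced = [t for i, t in enumerate(tokens) if i not in top]
    return positive_prob(tokens) - positive_prob(reduced)


tokens = "the film was great but slightly boring".split()
loo = leave_one_out(tokens)
uniform = [0.0] * len(tokens)  # uninformative baseline explanation

print("leave-one-out:", comprehensiveness(tokens, loo))
print("uniform baseline:", comprehensiveness(tokens, uniform))
```

In this toy setting the leave-one-out explanation scores higher than the uninformative baseline, which is the kind of comparison a faithfulness metric is meant to support when choosing among explanation methods.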