Understanding the internal reasoning behind the predictions of machine learning systems is increasingly vital, given their growing adoption and acceptance. While prior approaches such as LIME generate algorithmic explanations by attributing importance to input features for individual examples, recent research indicates that practitioners prefer language explanations that describe sub-groups of examples. In this paper, we introduce MaNtLE, a model-agnostic natural language explainer that analyzes multiple classifier predictions and generates faithful natural language explanations of classifier rationale for structured classification tasks. MaNtLE learns to generate faithful explanations through multi-task training on thousands of synthetic classification tasks. Simulated user studies indicate that, on average, MaNtLE-generated explanations are at least 11% more faithful than LIME and Anchors explanations across three tasks. Human evaluations demonstrate that users predict model behavior better with explanations from MaNtLE than with those from other techniques.