The ubiquity of complex machine learning models has raised the importance of model-agnostic explanation algorithms. These methods create artificial instances by slightly perturbing real instances and capture the resulting shifts in model decisions. However, such methods rely on initial data and only explain the decisions made for those instances. To tackle these problems, we propose Therapy, the first global and model-agnostic explanation method adapted to text that requires no input dataset. Therapy generates texts following the distribution learned by a classifier through cooperative generation. Because it does not rely on initial samples, it can generate explanations even when data is absent (e.g., for confidentiality reasons). Moreover, unlike existing methods that combine multiple local explanations into a global one, Therapy offers a global overview of the model's behavior over the input space. Our experiments show that, despite using no input data to generate samples, Therapy provides insightful information about the features used by the classifier that is competitive with that from methods relying on input samples, and that it outperforms them when input samples are not specific to the studied model.
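To make the idea of cooperative generation concrete, the following is a minimal toy sketch, not the implementation used by Therapy. It assumes that cooperative generation can be approximated by reweighting a language model's next-token probabilities with the classifier's probability for a target class. The vocabulary, the keyword weights, the uniform stand-in language model, and all function names (`classifier_prob`, `lm_prob`, `cooperative_sample`, `class_word_counts`) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical toy vocabulary and classifier weights (illustrative assumptions only).
VOCAB = ["the", "movie", "was", "great", "awful", "boring", "fun"]
WEIGHTS = {"great": 2.0, "fun": 1.5, "awful": -2.0, "boring": -1.5}


def classifier_prob(tokens, target):
    """Toy black-box classifier: P(target | tokens) via a sigmoid over keyword weights."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    p_positive = 1.0 / (1.0 + math.exp(-score))
    return p_positive if target == "positive" else 1.0 - p_positive


def lm_prob(prefix, token):
    """Toy language model: uniform next-token distribution (stand-in for a real LM)."""
    return 1.0 / len(VOCAB)


def cooperative_sample(target, length, rng):
    """Sample a text token by token, reweighting LM probabilities by the classifier."""
    tokens = []
    for _ in range(length):
        # Unnormalized score of each candidate: LM probability times classifier probability.
        scores = [lm_prob(tokens, t) * classifier_prob(tokens + [t], target) for t in VOCAB]
        tokens.append(rng.choices(VOCAB, weights=scores, k=1)[0])
    return tokens


def class_word_counts(target, n_texts=200, length=5, seed=0):
    """Generate texts for one class and count words, as a crude global view of the classifier."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_texts):
        counts.update(cooperative_sample(target, length, rng))
    return counts
```

In this sketch no input dataset is used: words that the toy classifier associates with a class simply become more frequent in the texts generated for that class, which gives a rough picture of the features it relies on. For example, `class_word_counts("positive").most_common(3)` lists the words most often generated for the positive class.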