Although deep language representations have become the dominant form of language featurization in recent years, in many settings it is important to understand a model's decision-making process. This necessitates not only an interpretable model but also interpretable features. In particular, language must be featurized in a way that is interpretable while still characterizing the original text well. We present SenteCon, a method for introducing human interpretability in deep language representations. Given a passage of text, SenteCon encodes the text as a layer of interpretable categories in which each dimension corresponds to the relevance of a specific category. Our empirical evaluations indicate that encoding language with SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks. Moreover, we find that SenteCon outperforms existing interpretable language representations with respect to both its downstream performance and its agreement with human characterizations of the text.
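To make the shape of such an encoding concrete, the following is a minimal sketch of a representation in which each dimension holds the relevance of one named category. It is not the SenteCon method: it substitutes simple lexicon word counts for whatever SenteCon uses to score relevance, and the category names, word lists, and the `encode` function are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical categories and word lists, invented for this illustration only.
CATEGORIES = {
    "positive_emotion": {"happy", "great", "love", "delighted"},
    "negative_emotion": {"sad", "awful", "hate", "angry"},
    "food": {"pizza", "dinner", "meal", "restaurant"},
}


def encode(text):
    """Map a passage to {category: relevance}.

    Relevance here is the fraction of tokens found in the category's
    word list. This is a lexicon-count stand-in (an assumption), not
    SenteCon's scoring.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        # An empty passage has zero relevance to every category.
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: sum(token in words for token in tokens) / len(tokens)
        for name, words in CATEGORIES.items()
    }


scores = encode("I love this restaurant, the pizza was great")
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Because every dimension is tied to a named category, a downstream model trained on `scores` can be inspected in terms of those categories rather than opaque embedding coordinates.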