Text classifiers built on Pre-trained Language Models (PLMs) have achieved remarkable progress in various tasks, including sentiment analysis, natural language inference, and question answering. However, these classifiers can produce uncertain predictions, which undermines their reliability in practical applications. Much effort has been devoted to designing probes to understand what PLMs capture, but few studies have examined the factors that influence the predictive uncertainty of PLM-based classifiers. In this paper, we propose a novel framework, called CUE, which aims to interpret the uncertainty inherent in the predictions of PLM-based models. In particular, we first map PLM-encoded representations to a latent space via a variational auto-encoder. We then generate text representations by perturbing the latent space, which causes fluctuations in predictive uncertainty. By comparing the predictive uncertainty of the perturbed and the original text representations, we identify the latent dimensions responsible for uncertainty and then trace them back to the input features that contribute to it. Extensive experiments on four benchmark datasets covering linguistic acceptability classification, emotion classification, and natural language inference show the feasibility of the proposed framework. Our source code is available at: https://github.com/lijiazheng99/CUE.
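The perturb-and-compare step described above can be illustrated with a minimal sketch. It is not the implementation in the CUE repository. It assumes random linear maps as stand-ins for the trained variational auto-encoder and the classifier head, uses softmax entropy as the uncertainty measure, shifts one latent dimension at a time by a fixed amount, and takes the unperturbed reconstruction as the baseline. The names `rank_latent_dims`, `encode`, `decode`, `classify`, and `delta` are hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predictive_entropy(probs):
    # Shannon entropy of the predicted class distribution (assumed uncertainty measure).
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)


def rank_latent_dims(h, encode, decode, classify, delta=1.0):
    """Shift each latent dimension by `delta` and record the entropy change.

    Returns (dimension indices sorted by absolute change, per-dimension change).
    The baseline is the unperturbed reconstruction, an assumption of this sketch.
    """
    z = encode(h)
    base = predictive_entropy(classify(decode(z)))
    changes = np.zeros(z.shape[-1])
    for d in range(z.shape[-1]):
        z_pert = z.copy()
        z_pert[d] += delta  # perturb a single latent dimension
        changes[d] = predictive_entropy(classify(decode(z_pert))) - base
    order = np.argsort(-np.abs(changes))
    return order, changes


# Toy stand-ins (hypothetical): random linear encoder, decoder, and classifier head.
rng = np.random.default_rng(0)
hidden_dim, latent_dim, num_classes = 8, 4, 3
W_enc = rng.normal(size=(hidden_dim, latent_dim))
W_dec = rng.normal(size=(latent_dim, hidden_dim))
W_clf = rng.normal(size=(hidden_dim, num_classes))


def encode(h):
    return h @ W_enc


def decode(z):
    return z @ W_dec


def classify(h):
    return softmax(h @ W_clf)


h = rng.normal(size=hidden_dim)  # stands in for a PLM-encoded text representation
order, changes = rank_latent_dims(h, encode, decode, classify)
print("latent dimensions ranked by |entropy change|:", order)
print("entropy change per dimension:", changes)
```

In this sketch, the dimensions at the front of `order` are the ones whose perturbation moves the predictive entropy the most. Tracing those dimensions back to input features, as the abstract describes, would require the actual trained model and is not covered here.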