Self-supervised representation learning has proved to be a valuable component of out-of-distribution (OoD) detection when only the texts of in-distribution (ID) examples are available. Existing approaches either train a language model from scratch or fine-tune a pre-trained language model on ID examples, and then use the perplexity output by the language model as the OoD score. In this paper, we analyze the complementary characteristics of these two OoD detection methods and propose a multi-level knowledge distillation approach that integrates their strengths while mitigating their limitations. Specifically, we use a fine-tuned model as the teacher to teach a randomly initialized student model on the ID examples. In addition to prediction-layer distillation, we present a similarity-based intermediate-layer distillation method to thoroughly explore the representation space of the teacher model. In this way, the learned student can better represent the ID data manifold while gaining a stronger ability to map OoD examples outside the ID data manifold, owing to the regularization inherited from pre-training. Moreover, the student model sees only ID examples during parameter learning, which further promotes more distinguishable features for OoD detection. We conduct extensive experiments on multiple benchmark datasets (CLINC150, SST, ROSTD, 20 NewsGroups, and AG News), showing that the proposed method yields new state-of-the-art performance. We also explore its application as an AIGC detector to distinguish between answers generated by ChatGPT and those written by human experts. We observe that our model exceeds human evaluators in the pair-expert task on the Human ChatGPT Comparison Corpus.
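The two ingredients described above, perplexity as an OoD score and a two-level distillation objective, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the exact loss, so the KL-divergence prediction term, the cosine-similarity-matrix intermediate term, the weight `alpha`, and all function names are assumptions chosen for illustration. Plain Python lists stand in for model outputs.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities (higher = more OoD-like)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def softmax(logits):
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def similarity_matrix(hidden):
    """Pairwise cosine similarities between the token vectors of one layer."""
    return [[cosine(hi, hj) for hj in hidden] for hi in hidden]


def similarity_loss(teacher_hidden, student_hidden):
    """Mean squared difference between teacher and student similarity matrices.

    Assumption: comparing token-to-token similarity structure rather than raw
    hidden states, so teacher and student widths may differ.
    """
    sim_t = similarity_matrix(teacher_hidden)
    sim_s = similarity_matrix(student_hidden)
    n = len(sim_t)
    total = sum((sim_t[i][j] - sim_s[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


def distillation_loss(teacher_logits, student_logits,
                      teacher_hidden, student_hidden, alpha=1.0):
    """Prediction-layer KL term plus alpha times the intermediate-layer term.

    `alpha` is a hypothetical weighting, not a value taken from the paper.
    """
    kl_terms = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    prediction_term = sum(kl_terms) / len(kl_terms)
    return prediction_term + alpha * similarity_loss(teacher_hidden, student_hidden)
```

At inference time, only the student is needed in this sketch: each input text is scored with `perplexity` over the student's token log-probabilities, and texts whose score exceeds a threshold chosen on held-out ID data are flagged as OoD.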