From Implicit to Explicit: Enhancing Self-Recognition in Large Language Models

Large language models (LLMs) have been shown to possess a degree of self-recognition, the ability to identify whether a given text was generated by themselves. Prior work has demonstrated that this capability is reliably expressed under the pair presentation paradigm (PPP), where the model is presented with two texts and asked to choose which one it authored. However, performance deteriorates sharply under the individual presentation paradigm (IPP), where the model is given a single text and asked to judge its authorship. Although this phenomenon has been observed, its underlying causes have not been systematically analyzed. In this paper, we first investigate the cause of this failure and attribute it to implicit self-recognition (ISR). ISR describes the gap between internal representations and output behavior in LLMs: under the IPP scenario, the model encodes self-recognition information in its feature space, yet its outputs still fail to identify self-generated texts reliably. To mitigate ISR in LLMs, we propose cognitive surgery (CoSur), a novel framework comprising four main modules: representation extraction, subspace construction, authorship discrimination, and cognitive editing. Experimental results demonstrate that our proposed method improves the self-recognition performance of three different LLMs in the IPP scenario, achieving average accuracies of 99.00%, 97.69%, and 97.13%, respectively.
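The abstract names the four CoSur modules but does not say how each is implemented. The toy sketch below is only an illustration of the general idea, not the authors' method: it assumes synthetic Gaussian vectors in place of real LLM hidden states, a one-dimensional difference-of-means direction as the "subspace", a midpoint threshold as the discriminator, and an additive shift as the "edit". All function names and parameters are hypothetical.

```python
import numpy as np

def make_toy_representations(n_per_class=400, dim=32, shift=2.0, seed=0):
    """Synthetic stand-ins for hidden states (assumption: NOT real LLM activations)."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    # "Self" and "other" texts differ only by a shift along one hidden direction.
    self_reps = rng.normal(size=(n_per_class, dim)) + shift * direction
    other_reps = rng.normal(size=(n_per_class, dim)) - shift * direction
    return self_reps, other_reps

def build_direction(self_reps, other_reps):
    """Unit difference-of-means vector: an assumed one-dimensional 'subspace'."""
    diff = self_reps.mean(axis=0) - other_reps.mean(axis=0)
    return diff / np.linalg.norm(diff)

def discriminate(reps, direction, threshold):
    """Predict 'self' (True) when the projection exceeds the threshold."""
    return reps @ direction > threshold

def edit(reps, direction, is_self, strength=2.0):
    """Shift each vector along +/- direction according to its predicted label."""
    signs = np.where(is_self, 1.0, -1.0)[:, None]
    return reps + strength * signs * direction

# 1. "Representation extraction": here, just the synthetic vectors.
self_reps, other_reps = make_toy_representations()
half = len(self_reps) // 2
self_train, self_test = self_reps[:half], self_reps[half:]
other_train, other_test = other_reps[:half], other_reps[half:]

# 2. "Subspace construction" from the training split.
direction = build_direction(self_train, other_train)
threshold = 0.5 * ((self_train @ direction).mean() + (other_train @ direction).mean())

# 3. "Authorship discrimination" on held-out vectors.
test_reps = np.vstack([self_test, other_test])
test_labels = np.concatenate([np.ones(half, dtype=bool), np.zeros(half, dtype=bool)])
predicted = discriminate(test_reps, direction, threshold)
probe_accuracy = (predicted == test_labels).mean()

# 4. "Cognitive editing": push each vector further from the decision boundary.
edited = edit(test_reps, direction, predicted)
margin_before = np.abs(test_reps @ direction - threshold).mean()
margin_after = np.abs(edited @ direction - threshold).mean()

print(f"probe accuracy: {probe_accuracy:.3f}")
print(f"mean margin before/after edit: {margin_before:.2f} / {margin_after:.2f}")
```

In this toy setting the authorship signal is linearly decodable from the vectors by construction, which mirrors the ISR claim that the information is present in the feature space even when the output behavior does not reflect it. Whether real hidden states behave this way, and how CoSur actually builds its subspace and performs the edit, is not stated in the abstract.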
