Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended, high-quality texts that are non-trivial to distinguish from human-written texts. We refer to such LLM-generated texts as deepfake texts. There are currently over 72K text generation models in the Hugging Face model repository. As such, users with malicious intent can easily use these open-source LLMs to generate harmful texts and dis/misinformation at scale. To mitigate this problem, a computational method is needed to determine whether a given text is a deepfake text, i.e., a Turing Test (TT). In this work, we investigate the more general version of the problem, known as Authorship Attribution (AA), in a multi-class setting: not only determining whether a given text is a deepfake text, but also pinpointing which LLM is its author. We propose TopFormer, which improves existing AA solutions by adding a Topological Data Analysis (TDA) layer to a Transformer-based model so that it captures more linguistic patterns in deepfake texts. The TDA layer takes the reshaped $pooled\_output$ of the backbone as input and extracts TDA features from it; we show that this benefits performance on imbalanced and multi-style datasets. The Transformer-based backbone captures contextual representations (i.e., semantic and syntactic linguistic features), while TDA captures the shape and structure of the data (i.e., linguistic structures). Finally, TopFormer outperforms all baselines on all three datasets, achieving up to a 7\% increase in Macro F1 score.
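To make the idea of extracting TDA features from a reshaped $pooled\_output$ more concrete, the following is a minimal sketch and not the TopFormer implementation. It assumes a 768-dimensional pooled vector reshaped into a 24 x 32 point cloud (both sizes are illustrative assumptions), and it computes only 0-dimensional persistent homology of a Vietoris-Rips filtration, using the standard fact that the finite death times of connected components equal the edge weights of the minimum spanning tree. The function names `h0_death_times` and `tda_features`, and the choice of summary statistics, are hypothetical.

```python
import math
import random


def h0_death_times(points):
    """Finite death times of 0-dimensional persistence classes.

    In a Vietoris-Rips filtration every point is born at scale 0, and a
    connected component dies when it merges into another one. Those merge
    scales are the edge weights of the minimum spanning tree, found here
    with Kruskal's algorithm and a union-find structure.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(weight)
    # n - 1 finite deaths; the last surviving component never dies.
    return deaths


def tda_features(pooled_output, rows, cols):
    """Reshape a flat vector into a rows x cols point cloud and summarize
    its H0 persistence diagram (the summary statistics are an assumption)."""
    if len(pooled_output) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    cloud = [pooled_output[r * cols:(r + 1) * cols] for r in range(rows)]
    deaths = h0_death_times(cloud)
    return [sum(deaths), max(deaths), sum(deaths) / len(deaths)]


# Random stand-in for a backbone's pooled_output (assumed size: 768).
random.seed(0)
pooled_output = [random.gauss(0.0, 1.0) for _ in range(768)]
features = tda_features(pooled_output, rows=24, cols=32)
print(features)
```

In a full model, a feature vector of this kind would be passed to the classification head; a dedicated TDA library would also provide higher-dimensional homology, which this sketch omits.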