Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended, high-quality texts that are non-trivial to distinguish from human-written texts. We refer to such LLM-generated texts as deepfake texts. There are currently over 72K text generation models in the Hugging Face model repository. As such, users with malicious intent can easily use these open-sourced LLMs to generate harmful texts and dis/misinformation at scale. To mitigate this problem, a computational method that determines whether a given text is a deepfake text is desired--i.e., a Turing Test (TT). In this work, we investigate a more general version of the problem, known as Authorship Attribution (AA), in a multi-class setting--i.e., not only determining whether a given text is a deepfake text but also pinpointing which LLM is the author. We propose TopFormer, which improves existing AA solutions by adding a Topological Data Analysis (TDA) layer to a Transformer-based model, thereby capturing additional linguistic patterns in deepfake texts. We show the benefits of the TDA layer on imbalanced and multi-style datasets by extracting TDA features from the reshaped $pooled\_output$ of our backbone and using them as input. The Transformer-based backbone captures contextual representations (i.e., semantic and syntactic linguistic features), while TDA captures the shape and structure of the data (i.e., linguistic structures). Finally, TopFormer outperforms all baselines on all three datasets, achieving up to a 7\% increase in Macro F1 score. Our code and datasets are available at: https://github.com/AdaUchendu/topformer
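To make the architecture described above concrete, below is a minimal sketch of a TDA-augmented Transformer classifier. It assumes a RoBERTa backbone, the `ripser` library for persistent homology, and simple persistence-diagram statistics concatenated with the pooled representation; the actual TopFormer layer, TDA featurization, and class count may differ from this illustration.

```python
# Hedged sketch: Transformer pooled_output + TDA summary features for multi-class AA.
# Assumptions (illustrative, not from the paper): RoBERTa backbone, ripser for
# persistent homology, and per-diagram statistics as the TDA feature vector.
import numpy as np
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer
from ripser import ripser


def tda_features(pooled: torch.Tensor, points: int = 24) -> torch.Tensor:
    """Reshape each pooled_output vector into a small point cloud and
    summarize its H0/H1 persistence diagrams with simple statistics."""
    feats = []
    for vec in pooled.detach().cpu().numpy():
        cloud = vec.reshape(points, -1)          # e.g., 768-dim -> 24 x 32 point cloud
        dgms = ripser(cloud, maxdim=1)["dgms"]   # persistence diagrams for H0 and H1
        stats = []
        for dgm in dgms:
            finite = dgm[np.isfinite(dgm[:, 1])]               # drop the infinite bar
            life = finite[:, 1] - finite[:, 0] if len(finite) else np.zeros(1)
            stats += [life.sum(), life.max(), float(len(finite))]
        feats.append(stats)
    return torch.tensor(feats, dtype=pooled.dtype, device=pooled.device)


class TopFormerSketch(nn.Module):
    """Classifier over [pooled_output ; TDA features] (hypothetical layer sizes)."""
    def __init__(self, backbone: str = "roberta-base", num_authors: int = 11):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden + 6, num_authors)  # 6 = 3 stats x 2 diagrams

    def forward(self, input_ids, attention_mask):
        pooled = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).pooler_output
        topo = tda_features(pooled)
        return self.classifier(torch.cat([pooled, topo], dim=-1))


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("roberta-base")
    model = TopFormerSketch()
    batch = tok(["Example deepfake-text candidate."], return_tensors="pt")
    logits = model(batch["input_ids"], batch["attention_mask"])
    print(logits.shape)  # (1, num_authors)
```

The design point this sketch illustrates is the complementarity claimed in the abstract: the Transformer contributes contextual (semantic/syntactic) features via $pooled\_output$, while the reshaped point cloud's persistence statistics contribute shape/structure features, and the classifier consumes their concatenation.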