Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended, high-quality texts that are non-trivial to distinguish from human-written texts. We refer to such LLM-generated texts as \emph{deepfake texts}. The Hugging Face model repository currently hosts over 11K text-generation models. Consequently, users with malicious intent can easily use these open-source LLMs to generate harmful texts and misinformation at scale. To mitigate this problem, a computational method is needed to determine whether a given text is a deepfake text, a task known as the Turing Test (TT). In this work, we investigate the more general version of the problem, \emph{Authorship Attribution (AA)}, in a multi-class setting: the goal is not only to determine whether a given text is a deepfake text, but also to pinpoint which LLM authored it. We propose \textbf{TopRoBERTa}, which improves on existing AA solutions by adding a Topological Data Analysis (TDA) layer to the RoBERTa model in order to capture more linguistic patterns in deepfake texts. The TDA layer takes the reshaped $pooled\_output$ of RoBERTa as input and extracts TDA features from it; we show that this layer is beneficial when dealing with noisy, imbalanced, and heterogeneous datasets. RoBERTa captures contextual representations (i.e., semantic and syntactic linguistic features), while TDA captures the shape and structure of the data (i.e., linguistic structures). Finally, \textbf{TopRoBERTa} outperforms the vanilla RoBERTa on two of the three datasets, achieving an increase of up to 7\% in Macro F1 score.
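To make the idea of extracting TDA features from a reshaped pooled vector concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 768-dimensional pooled vector (the hidden size of RoBERTa-base), an illustrative reshape into 24 points in 32 dimensions, and 0-dimensional persistent homology of a Vietoris-Rips filtration as the topological summary; the abstract specifies none of these choices. The helper names `reshape_vector` and `h0_death_times` are hypothetical, and a random vector stands in for a real $pooled\_output$. The sketch relies on the standard fact that the finite $H_0$ death times of a Vietoris-Rips filtration equal the edge lengths of a Euclidean minimum spanning tree.

```python
import math
import random


def reshape_vector(vec, rows, cols):
    """Reshape a flat vector into `rows` points of dimension `cols`."""
    if len(vec) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    return [vec[r * cols:(r + 1) * cols] for r in range(rows)]


def h0_death_times(points):
    """Finite H0 death times of the Vietoris-Rips filtration on `points`.

    Every connected component is born at scale 0 and dies when it merges
    with another one; those merge scales are exactly the edge lengths of
    a minimum spanning tree, computed here with Prim's algorithm.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from the tree to each point
    best[0] = 0.0
    deaths = []
    for step in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the first point starts the tree and adds no edge
            deaths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(deaths)


# Stand-in for RoBERTa's pooled output (assumption: 768 dimensions).
random.seed(0)
pooled = [random.gauss(0.0, 1.0) for _ in range(768)]

# Assumption: treat the vector as a cloud of 24 points in 32 dimensions.
cloud = reshape_vector(pooled, 24, 32)
features = h0_death_times(cloud)
print(len(features))  # 23 finite death times for 24 points
```

In a full model, a feature vector of this kind could be passed to the classification head, possibly alongside the original pooled vector; how TopRoBERTa actually combines the two is not stated in the abstract.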