Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods often struggle with content-style entanglement, where models learn spurious correlations between authors' writing styles and topics, leading to poor generalization across domains. To address this challenge, we propose the Explainable Authorship Variational Autoencoder (EAVAE), a novel framework that explicitly disentangles style from content through architectural separation-by-design. EAVAE first pretrains style encoders using supervised contrastive learning on diverse authorship data, then finetunes with a Variational Autoencoder (VAE) architecture that uses separate encoders for style and content representations. Disentanglement is enforced through a novel discriminator that not only distinguishes whether pairs of style/content representations belong to the same or different authors/content sources, but also generates natural language explanations for its decisions, simultaneously mitigating confounding information and enhancing interpretability. Extensive experiments demonstrate the effectiveness of EAVAE. On authorship attribution, we achieve state-of-the-art performance on various datasets, including Amazon Reviews, PAN21, and HRS. For AI-generated text detection, EAVAE excels in few-shot learning on the M4 dataset. Code and data repositories are available online at https://github.com/hieum98/avae and https://huggingface.co/collections/Hieuman/document-level-authorship-datasets.
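The abstract does not give the exact pretraining objective. As an illustration only, the sketch below assumes the widely used supervised contrastive (SupCon) formulation, in which texts by the same author are pulled together and texts by different authors are pushed apart in embedding space. The function name `supcon_loss`, the `temperature` default, and the toy vectors are hypothetical and are not taken from the EAVAE code base.

```python
import math


def supcon_loss(embeddings, authors, temperature=0.1):
    """Supervised contrastive loss over a batch (illustrative, pure Python).

    embeddings: list of equal-length float vectors (style embeddings).
    authors:    list of author labels, one per embedding.
    """
    # L2-normalize so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples written by the same author as anchor i.
        positives = [j for j in range(n) if j != i and authors[j] == authors[i]]
        if not positives:
            continue  # an anchor without a same-author partner is skipped

        # Temperature-scaled similarity of anchor i to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))

        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1

    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters in 2-D.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(toy, ["a", "a", "b", "b"])     # labels match the clusters
misaligned = supcon_loss(toy, ["a", "b", "a", "b"])  # labels cut across clusters
```

On this toy batch the loss is lower when the author labels agree with the geometric clusters than when they cut across them, which is the behaviour a style encoder would be trained toward under this assumed objective.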

