Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods often struggle with content-style entanglement, where models learn spurious correlations between authors' writing styles and topics, leading to poor generalization across domains. To address this challenge, we propose the Explainable Authorship Variational Autoencoder (EAVAE), a novel framework that explicitly disentangles style from content through architectural separation by design. EAVAE first pretrains style encoders using supervised contrastive learning on diverse authorship data, then finetunes with a Variational Autoencoder (VAE) architecture that uses separate encoders for style and content representations. Disentanglement is enforced through a novel discriminator that not only distinguishes whether pairs of style/content representations belong to the same or different authors/content sources, but also generates a natural language explanation for its decision, simultaneously mitigating confounding information and enhancing interpretability. Extensive experiments demonstrate the effectiveness of EAVAE. On authorship attribution, we achieve state-of-the-art performance on various datasets, including Amazon Reviews, PAN21, and HRS. For AI-generated text detection, EAVAE excels in few-shot learning on the M4 dataset. Code and data repositories are available online at https://github.com/hieum98/avae and https://huggingface.co/collections/Hieuman/document-level-authorship-datasets.
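To make the pretraining stage more concrete, the following is a minimal sketch of a supervised contrastive objective over style embeddings, in which documents by the same author are treated as positives. The abstract does not give the exact loss, so this assumes the commonly used supervised contrastive formulation; the function names, the temperature value, and the toy embeddings are illustrative assumptions and are not taken from the EAVAE code.

```python
import math
from typing import Hashable, List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    author_ids: Sequence[Hashable],
    temperature: float = 0.1,  # assumed value, not from the paper
) -> float:
    """Average supervised contrastive loss over a batch of style embeddings.

    For each anchor document, every other document by the same author is a
    positive, and all other documents in the batch form the denominator.
    """
    if len(embeddings) != len(author_ids):
        raise ValueError("embeddings and author_ids must have the same length")

    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    per_anchor_losses = []

    for i in range(n):
        positives = [
            j for j in range(n) if j != i and author_ids[j] == author_ids[i]
        ]
        if not positives:
            continue  # an author with a single document contributes no term

        # Temperature-scaled similarities to every other document in the batch.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }

        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits.values())
        log_denominator = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )

        loss_i = -sum(logits[p] - log_denominator for p in positives) / len(positives)
        per_anchor_losses.append(loss_i)

    if not per_anchor_losses:
        raise ValueError("batch needs at least one author with two documents")
    return sum(per_anchor_losses) / len(per_anchor_losses)


# Toy batch: the first two embeddings are close together, as are the last two.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(toy_embeddings, ["a", "a", "b", "b"])
loss_misaligned = supervised_contrastive_loss(toy_embeddings, ["a", "b", "a", "b"])
```

In this toy batch the loss is lower when the author labels agree with the geometry of the embeddings than when they cut across it. Minimizing such a loss therefore pulls same-author documents together, which is the behavior a style encoder would need before the VAE finetuning stage.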