The ability to accurately identify authorship is crucial for verifying content authenticity and mitigating misinformation. Large Language Models (LLMs) have demonstrated exceptional reasoning and problem-solving capabilities. However, their potential in authorship analysis, which encompasses authorship verification and attribution, remains underexplored. This paper conducts a comprehensive evaluation of LLMs on these critical tasks. Traditional studies have relied on hand-crafted stylistic features, whereas state-of-the-art approaches leverage text embeddings from pre-trained language models. These methods typically require fine-tuning on labeled data, often suffer performance degradation in cross-domain applications, and provide limited explainability. This work addresses three research questions: (1) Can LLMs perform zero-shot, end-to-end authorship verification effectively? (2) Can LLMs accurately attribute authorship among multiple candidate authors (e.g., 10 or 20)? (3) How can LLMs provide explainability in authorship analysis, particularly through linguistic features? Moreover, we investigate integrating explicit linguistic features to guide LLMs' reasoning. Our extensive assessment demonstrates LLMs' proficiency in both tasks without domain-specific fine-tuning and offers insight into their decision-making through a detailed analysis of linguistic features. This establishes a new benchmark for future research on LLM-based authorship analysis. The code and data are available at https://github.com/baixianghuang/authorship-llm.