Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers

The use of copyrighted books for training AI has sparked lawsuits from authors concerned about AI generating derivative content. Yet whether these models can produce high-quality literary text emulating authors' voices remains unclear. We conducted a preregistered study comparing MFA-trained writers with three frontier models (ChatGPT, Claude, Gemini) writing up to 450-word excerpts emulating 50 award-winning authors' styles. In blind pairwise evaluations by 28 MFA-trained readers and 516 college-educated general readers, AI text from in-context prompting was strongly disfavored by MFA readers for stylistic fidelity (OR=0.16) and quality (OR=0.13), while general readers showed no fidelity preference (OR=1.06) but favored AI for quality (OR=1.82). Fine-tuning ChatGPT on authors' complete works reversed these results: MFA readers favored AI for fidelity (OR=8.16) and quality (OR=1.87), with general readers showing even stronger preference (fidelity OR=16.65; quality OR=5.42). Both groups preferred fine-tuned AI, but the writer-type × reader-type interaction remained significant (p=0.021 for fidelity; p<10^-4 for quality), indicating general readers favored AI by a wider margin. Effects are robust under cluster-robust inference and generalize across authors in heterogeneity analyses. Fine-tuned outputs were rarely flagged as AI-generated (3% vs. 97% for prompting) by leading detectors. Mediation analysis shows fine-tuning eliminates detectable AI quirks that penalize in-context outputs, altering the nexus between detectability and preference. While not accounting for effort to transform AI output into publishable prose, the median fine-tuning cost of $81 per author represents a 99.7% reduction versus typical writer compensation. Author-specific fine-tuning enables non-verbatim AI writing preferred over expert human writing, providing evidence relevant to copyright's fourth fair-use factor.
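The cost comparison in the abstract can be checked with simple arithmetic. The sketch below is not from the paper: it only inverts the two reported figures ($81 median cost, 99.7% reduction) to see what baseline writer compensation they imply. The function names are hypothetical, and the assumption that 99.7% was rounded to one decimal place is ours.

```python
def implied_baseline(cost: float, reduction: float) -> float:
    """Baseline B such that cost = B * (1 - reduction).

    `reduction` is a fraction (0.997 for 99.7%).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return cost / (1.0 - reduction)


def implied_baseline_range(cost: float, reduction_pct: float,
                           decimals: int = 1) -> tuple[float, float]:
    """Range of baselines consistent with a percentage rounded to `decimals`.

    Assumption (ours, not the paper's): the reported percentage was rounded
    to the nearest 10**-decimals, so the true value lies within half a step.
    """
    half_step = 0.5 * 10 ** (-decimals)
    low = implied_baseline(cost, (reduction_pct - half_step) / 100.0)
    high = implied_baseline(cost, (reduction_pct + half_step) / 100.0)
    return low, high


if __name__ == "__main__":
    cost_usd = 81.0        # median fine-tuning cost per author (from the abstract)
    reduction_pct = 99.7   # reported reduction (from the abstract)

    point = implied_baseline(cost_usd, reduction_pct / 100.0)
    low, high = implied_baseline_range(cost_usd, reduction_pct)
    print(f"Implied baseline: about ${point:,.0f}")
    print(f"Consistent range under rounding: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the implied baseline is roughly $27,000 per author, with anything from about $23,000 to $32,000 consistent with the rounded percentage. The abstract itself does not state the compensation figure, so this is a consistency check rather than a reported value.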

