The use of copyrighted books to train AI has sparked lawsuits from authors concerned that AI will generate derivative content. Yet whether these models can produce high-quality literary text that emulates authors' voices remains unclear. We conducted a preregistered study comparing MFA-trained writers with three frontier models (ChatGPT, Claude, Gemini), all writing excerpts of up to 450 words that emulate the styles of 50 award-winning authors. In blind pairwise evaluations by 28 MFA-trained readers and 516 college-educated general readers, AI text from in-context prompting was strongly disfavored by MFA readers for both stylistic fidelity (OR=0.16) and quality (OR=0.13), while general readers showed no fidelity preference (OR=1.06) but favored AI for quality (OR=1.82). Fine-tuning ChatGPT on authors' complete works reversed these results: MFA readers favored AI for fidelity (OR=8.16) and quality (OR=1.87), and general readers showed an even stronger preference (fidelity OR=16.65; quality OR=5.42). Both groups preferred fine-tuned AI, but the writer-type × reader-type interaction remained significant (p=0.021 for fidelity; p<10^-4 for quality), indicating that general readers favored AI by a wider margin. These effects are robust to cluster-robust inference and generalize across authors in heterogeneity analyses. Leading AI detectors rarely flagged fine-tuned outputs as AI-generated (3%, vs. 97% for in-context prompting). Mediation analysis shows that fine-tuning eliminates the detectable AI quirks that penalize in-context outputs, altering the relationship between detectability and preference. Although this estimate does not account for the effort needed to turn AI output into publishable prose, the median fine-tuning cost of $81 per author represents a 99.7% reduction relative to typical writer compensation. Author-specific fine-tuning thus enables non-verbatim AI writing that readers prefer to expert human writing, providing evidence relevant to the fourth fair-use factor in copyright law.