Large Language Models (LLMs) are increasingly used to generate and edit scientific abstracts, yet their integration into academic writing raises questions about trust, quality, and disclosure. Despite growing adoption, little is known about how readers perceive LLM-generated summaries and how these perceptions influence evaluations of scientific work. This paper presents a mixed-methods survey experiment investigating whether readers with ML expertise can distinguish between human- and LLM-generated abstracts, how actual and perceived LLM involvement affects judgments of quality and trustworthiness, and what orientations readers adopt toward AI-assisted writing. Our findings show that participants struggle to reliably identify LLM-generated content, yet their beliefs about LLM involvement significantly shape their evaluations. Notably, abstracts edited by LLMs are rated more favorably than those written solely by humans or LLMs. We also identify three distinct reader orientations toward LLM-assisted writing, offering insights into evolving norms and informing policy around disclosure and acceptable use in scientific communication.