Bad writing hinders the publication of science. The role of artificial intelligence (AI) in generating and editing scientific texts remains unsettled. Abstracts serve as the critical gateway to scientific manuscripts, often shaping readers' interest. We examine how individuals revise AI-generated abstracts compared to human-authored abstracts when incentivized to communicate scientific content. Using 869 keystroke-level edit logs with 240k total edits, we construct behavioral labels and measure linguistic properties of edit bursts to investigate edit trajectories. AI abstracts exhibit higher sentence-level agency, whereas human-authored abstracts outperform them in global coherence, even after edits. Experts engage in stigmatic behavior, switching their strategy from predominantly restructuring to substitution when the AI source is disclosed. Language models (LMs) improve edit outcomes through a mix of local and global features, but still struggle with global coherence. Both humans and LMs often target the weakest sections of abstracts but fail to improve stronger areas. Our large-scale, process-oriented evaluation highlights the perks and pitfalls of both human and LM editing processes as machine-generated texts emerge in scientific communication.