Large language models (LLMs) have led to a surge in collaborative writing with model assistance. As different users incorporate suggestions from the same model, there is a risk of decreased diversity in the produced content, potentially limiting diverse perspectives in public discourse. In this work, we measure the impact of co-writing on diversity via a controlled experiment in which users write argumentative essays in three setups -- using a base LLM (GPT3), using a feedback-tuned LLM (InstructGPT), and writing without model help. We develop a set of diversity metrics and find that writing with InstructGPT (but not GPT3) results in a statistically significant reduction in diversity. Specifically, it increases the similarity between the writings of different authors and reduces overall lexical and content diversity. We further find that this effect is mainly attributable to InstructGPT contributing less diverse text to the co-written essays; the user-contributed text remains unaffected by model collaboration. This suggests that the recent improvement in generation quality from adapting models to human feedback might come at the cost of more homogeneous and less diverse content.
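The abstract's exact diversity metrics are not specified here, but the two quantities it describes -- cross-author similarity and lexical diversity -- can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's implementation: it measures homogenization as mean pairwise cosine similarity between bag-of-words essay vectors, and lexical diversity as a corpus-level type-token ratio.

```python
# Illustrative sketch only: plausible stand-ins for the kinds of metrics
# the abstract describes, not the paper's actual metric definitions.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mean_pairwise_similarity(essays: list[str]) -> float:
    # Homogenization: average similarity over all essay pairs.
    # Higher values mean different authors' texts look more alike.
    vecs = [Counter(e.lower().split()) for e in essays]
    sims = [cosine(vecs[i], vecs[j])
            for i in range(len(vecs))
            for j in range(i + 1, len(vecs))]
    return sum(sims) / len(sims) if sims else 0.0

def type_token_ratio(essays: list[str]) -> float:
    # Lexical diversity: unique tokens / total tokens across the corpus.
    tokens = [t for e in essays for t in e.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

Under these definitions, the abstract's finding would correspond to InstructGPT-assisted corpora showing a higher mean pairwise similarity and a lower type-token ratio than solo-written or GPT3-assisted corpora.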