Large language models (LLMs) have led to a surge in collaborative writing with model assistance. As different users incorporate suggestions from the same model, there is a risk of decreased diversity in the produced content, potentially limiting diverse perspectives in public discourse. In this work, we measure the impact of co-writing on diversity via a controlled experiment in which users write argumentative essays in three setups: using a base LLM (GPT3), using a feedback-tuned LLM (InstructGPT), and writing without model help. We develop a set of diversity metrics and find that writing with InstructGPT (but not GPT3) results in a statistically significant reduction in diversity. Specifically, it increases the similarity between the writings of different authors and reduces the overall lexical and content diversity. We additionally find that this effect is mainly attributable to InstructGPT contributing less diverse text to co-written essays. In contrast, the user-contributed text remains unaffected by model collaboration. This suggests that the recent improvement in generation quality from adapting models to human feedback might come at the cost of more homogeneous content.
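The abstract names two kinds of measurement (similarity between the writings of different authors, and overall lexical diversity) without defining them. The following is a minimal illustrative sketch of what such measurements could look like, not the metrics used in this work. The function names, the whitespace tokenizer, the use of Jaccard overlap, and the type-token ratio are all assumptions introduced here for illustration.

```python
from itertools import combinations


def ngrams(text, n=1):
    """Lowercase, split on whitespace, and return the list of n-gram tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard(a, b, n=1):
    """Jaccard overlap between the n-gram sets of two texts (assumed similarity measure)."""
    set_a, set_b = set(ngrams(a, n)), set(ngrams(b, n))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def homogenization(essays, n=1):
    """Mean pairwise similarity across essays; higher means more alike."""
    pairs = list(combinations(essays, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b, n) for a, b in pairs) / len(pairs)


def type_token_ratio(essays, n=1):
    """Unique n-grams divided by total n-grams over the whole corpus;
    lower means less lexical diversity."""
    all_ngrams = [g for essay in essays for g in ngrams(essay, n)]
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)
```

Under these assumptions, one would compute both scores separately for the essays written in each of the three setups and compare them: a higher `homogenization` score or a lower `type_token_ratio` in one setup would indicate less diverse writing in that setup.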