Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow

Large language models like ChatGPT efficiently provide users with information about various topics, presenting a potential substitute for searching the web and asking people for help online. But since users interact privately with the model, these models may drastically reduce the amount of publicly available human-generated data and knowledge resources. This substitution can present a significant problem in securing training data for future models. In this work, we investigate how the release of ChatGPT changed human-generated open data on the web by analyzing the activity on Stack Overflow, the leading online Q&A platform for computer programming. We find that activity on Stack Overflow decreased significantly relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar forums for mathematics, where ChatGPT is less capable. A difference-in-differences model estimates a 16% decrease in weekly posts on Stack Overflow. This effect increases in magnitude over time and is larger for posts related to the most widely used programming languages. Posts made after the release of ChatGPT receive voting scores similar to those made before, suggesting that ChatGPT is not merely displacing duplicate or low-quality content. These results suggest that more users are adopting large language models to answer questions and that these models are better substitutes for Stack Overflow for languages for which they have more training data. Using models like ChatGPT may be more efficient for solving certain programming problems, but their widespread adoption and the resulting shift away from public exchange on the web will limit the open data people and models can learn from in the future.
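
To make the difference-in-differences idea concrete, the following is a minimal sketch of the simplest two-group, two-period version of the estimator, applied to log weekly post counts. It is an illustration under stated assumptions, not the model estimated in the paper, which may include additional controls or fixed effects. The function names (`did_estimate`, `log_points_to_percent`) are hypothetical, and the weekly counts are made-up toy numbers, not data from Stack Overflow or any other platform.

```python
import math


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences on mean log weekly post counts.

    Returns (change in treated group) minus (change in control group),
    measured in log points.
    """
    def mean_log(counts):
        return sum(math.log(c) for c in counts) / len(counts)

    treated_change = mean_log(treated_post) - mean_log(treated_pre)
    control_change = mean_log(control_post) - mean_log(control_pre)
    return treated_change - control_change


def log_points_to_percent(delta):
    """Convert a log-point effect into a percentage change."""
    return (math.exp(delta) - 1.0) * 100.0


# Toy weekly post counts (invented for illustration only).
treated_pre = [1000, 1000, 1000]   # treated platform, before the release
treated_post = [720, 720, 720]     # treated platform, after the release
control_pre = [500, 500, 500]      # comparison platform, before
control_post = [450, 450, 450]     # comparison platform, after

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
percent = log_points_to_percent(effect)
print(f"Estimated relative change in weekly posts: {percent:.1f}%")
```

In this toy example the treated platform falls by 28% and the comparison platform by 10%, so the estimator attributes the remaining relative decline of 20% to the treatment. The estimate is only interpretable causally if both groups would have followed parallel trends in the absence of the treatment.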
