Human-created works are critical data inputs for artificial intelligence (AI). Strategic behavior can play a major role in shaping AI training datasets, whether by limiting access to existing works, by deciding which types of new works to create, or by deciding whether to create new works at all. We examine how creators change their behavior when their works become training data for AI. Specifically, we focus on contributors to Unsplash, a popular stock image platform with about 6 million high-quality photos and illustrations. In the summer of 2020, Unsplash launched an AI research program by releasing a dataset of 25,000 images for commercial use. We study contributors' reactions by comparing contributors whose works were included in this dataset with contributors whose works were not. Our results suggest that treated contributors left the platform at a higher-than-usual rate and substantially slowed their rate of new uploads. Professional and more successful photographers react more strongly than amateurs and less successful photographers. We also show that affected users changed the variety and novelty of their contributions to the platform, with long-run implications for the stock of works potentially available for AI training. Taken together, our findings highlight the trade-off between protecting the interests of rightsholders and promoting innovation at the technological frontier. We discuss implications for copyright and AI policy.