Strategic Behavior and AI Training Data

Human-created works are critical data inputs to artificial intelligence (AI). Strategic behavior can play a major role in shaping AI training datasets, whether by limiting access to existing works, by deciding which types of new works to create, or by deciding whether to create new works at all. We examine how creators change their behavior when their works become training data for AI. Specifically, we focus on contributors to Unsplash, a popular stock image platform with about 6 million high-quality photos and illustrations. In the summer of 2020, Unsplash launched an AI research program by releasing a dataset of 25,000 images for commercial use. We study contributors' reactions by comparing contributors whose works were included in this dataset with contributors whose works were not. Our results suggest that treated contributors left the platform at a higher-than-usual rate and substantially slowed their rate of new uploads. Professional and more successful photographers react more strongly than amateurs and less successful photographers. We also show that affected users changed the variety and novelty of their contributions to the platform, with long-run implications for the stock of works potentially available for AI training. Taken together, our findings highlight the trade-off between the interests of rightsholders and the promotion of innovation at the technological frontier. We discuss implications for copyright and AI policy.

