Generative artificial intelligence (AI) systems are trained on large data corpora to generate new text, images, videos, and other media. There is growing concern that such systems may infringe the copyrights of those who contribute training data. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners in proportion to their contributions to AI-generated content. Contributions are quantified by leveraging the probabilistic nature of modern generative AI models together with techniques from cooperative game theory in economics. The framework enables a platform on which AI developers gain access to high-quality training data, improving model performance, while copyright owners receive fair compensation that encourages them to keep supplying relevant data for generative model training. Experiments demonstrate that our framework identifies the data sources most relevant to artwork generation and distributes revenue among copyright owners fairly and interpretably.
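The abstract does not say which cooperative-game technique is used. The sketch below assumes the Shapley value, a standard way to attribute a shared payoff to players, and shows how it could turn a coalition utility into revenue shares. The owner names, the `TOY_UTILITY` numbers, and the helpers `shapley_values` and `split_revenue` are all hypothetical. In a real system the utility of a coalition might measure how much that coalition's data raises the model's likelihood of the generated output; that choice is also an assumption, not the authors' stated method.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List

# Hypothetical utility v(S) for each coalition S of copyright owners.
# The numbers are invented for illustration only.
TOY_UTILITY: Dict[FrozenSet[str], float] = {
    frozenset(): 0.0,
    frozenset({"A"}): 4.0,
    frozenset({"B"}): 2.0,
    frozenset({"C"}): 0.0,
    frozenset({"A", "B"}): 6.0,
    frozenset({"A", "C"}): 5.0,
    frozenset({"B", "C"}): 2.0,
    frozenset({"A", "B", "C"}): 7.0,
}


def toy_utility(coalition: FrozenSet[str]) -> float:
    """Look up the (hypothetical) value of a coalition."""
    return TOY_UTILITY[frozenset(coalition)]


def shapley_values(
    players: List[str], utility: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: the weighted average marginal contribution
    of each player over all coalitions of the other players.
    Cost grows exponentially in len(players)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(len(others) + 1):
            # Weight of a size-k coalition that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                s = frozenset(members)
                phi[p] += weight * (utility(s | {p}) - utility(s))
    return phi


def split_revenue(revenue: float, phi: Dict[str, float]) -> Dict[str, float]:
    """Split revenue in proportion to Shapley values.
    Assumption: negative contributions are clipped to zero."""
    positive = {p: max(v, 0.0) for p, v in phi.items()}
    total = sum(positive.values())
    if total == 0.0:
        return {p: 0.0 for p in phi}
    return {p: revenue * v / total for p, v in positive.items()}


if __name__ == "__main__":
    phi = shapley_values(["A", "B", "C"], toy_utility)
    print(phi)  # approximately A: 4.5, B: 2.0, C: 0.5
    print(split_revenue(100.0, phi))
```

With these toy numbers the Shapley values sum to v({A, B, C}) = 7 (the efficiency property), so splitting 100 units of revenue proportionally gives A about 64.3, B about 28.6, and C about 7.1. Exact enumeration is only practical for a handful of owners; larger settings would typically need a sampling-based approximation.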
翻译:生成式人工智能系统通过训练大规模数据集生成新的文本、图像、视频及其他媒体内容。此类系统可能侵犯训练数据贡献者版权权益的问题日益引发关注。为应对生成式AI的版权挑战,我们提出一个框架,根据版权所有者对AI生成内容创作的贡献比例进行补偿。该贡献度量通过利用现代生成式AI模型的概率特性,并结合经济学中合作博弈理论的技术进行定量确定。该框架构建了一个平台,使AI开发者能够获取高质量训练数据以提升模型性能,同时版权所有者获得公平补偿,从而激励其为生成式模型训练持续提供相关数据。实验表明,我们的框架能够成功识别艺术作品生成中最相关的数据来源,确保在版权所有者之间实现公平且可解释的收益分配。