Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content. The metric for contributions is quantitatively determined by leveraging the probabilistic nature of modern generative AI models and using techniques from cooperative game theory in economics. This framework enables a platform where AI developers benefit from access to high-quality training data, thus improving model performance. Meanwhile, copyright owners receive fair compensation, driving the continued provision of relevant data for generative model training. Experiments demonstrate that our framework successfully identifies the most relevant data sources used in artwork generation, ensuring a fair and interpretable distribution of revenues among copyright owners.
翻译:生成式人工智能系统通过大规模数据集训练,能够生成新的文本、图像、视频等多媒体内容。此类系统可能侵犯训练数据贡献者版权的问题日益引发关注。为解决生成式人工智能的版权挑战,我们提出一个按版权所有者对AI生成内容的贡献比例进行补偿的框架。该贡献指标通过利用现代生成式AI模型的概率特性,结合经济学中合作博弈论技术进行定量确定。该框架构建了一个平台:AI开发者通过获取高质量训练数据提升模型性能,同时版权所有者获得公平报酬,从而持续为生成模型训练提供相关数据。实验表明,我们的框架能成功识别艺术创作中最相关的数据源,确保版权所有者之间实现公平且可解释的收入分配。