Generative artificial intelligence (AI) systems are trained on large data corpora to generate new text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of the people who contributed their training data. To address these copyright challenges, we propose a framework that compensates copyright owners in proportion to their contributions to the creation of AI-generated content. Contributions are quantified by exploiting the probabilistic nature of modern generative AI models together with techniques from cooperative game theory in economics. The framework supports a platform on which AI developers gain access to high-quality training data, improving model performance, while copyright owners receive fair compensation that encourages them to keep supplying relevant data for generative model training. Experiments on artwork generation show that the framework identifies the most relevant data sources and distributes revenue among copyright owners in a fair and interpretable way.
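The abstract does not spell out the utility function or the attribution algorithm. As a minimal sketch, assume that the value of a set of copyright owners is the log-likelihood that a model trained only on their data assigns to one generated artwork. Under that assumption, the Shapley value from cooperative game theory can be computed exactly for a handful of owners and used to split revenue proportionally. The names `SUPPORT`, `BASE_LIKELIHOOD`, `utility`, `shapley_values`, and `split_revenue`, and all numeric values, are hypothetical stand-ins and not the authors' implementation. A real system would replace `utility` with scores obtained from actual generative models.

```python
import math
from itertools import combinations

# Hypothetical toy data: likelihood mass each owner's data contributes to one
# generated artwork, plus a baseline with no copyrighted data. Assumed values.
BASE_LIKELIHOOD = 0.01
SUPPORT = {"owner_A": 0.50, "owner_B": 0.30, "owner_C": 0.05}


def utility(coalition):
    """Toy utility: log-likelihood of the generated content under a model
    trained only on the sources in `coalition` (a stand-in for a real model)."""
    return math.log(BASE_LIKELIHOOD + sum(SUPPORT[o] for o in coalition))


def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating every coalition of the other players.
    Cost grows exponentially with len(players), so this only suits small games."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other players
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi


def split_revenue(revenue, phi):
    """Divide revenue in proportion to Shapley values, clipping negatives to zero."""
    positive = {p: max(v, 0.0) for p, v in phi.items()}
    total = sum(positive.values())
    return {p: revenue * v / total for p, v in positive.items()}


if __name__ == "__main__":
    owners = list(SUPPORT)
    phi = shapley_values(owners, utility)
    payouts = split_revenue(100.0, phi)
    for owner in owners:
        print(f"{owner}: shapley={phi[owner]:.4f}, payout={payouts[owner]:.2f}")
```

Because Shapley values satisfy efficiency, they sum to the total gain `utility(all owners) - utility(no owners)`. Normalizing them therefore yields a complete, proportional split of the revenue. Clipping negative values to zero is an added design choice for this sketch, since a proportional payout is undefined for negative contributions.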