Generative art based on diffusion models has achieved remarkable performance in image generation and text-to-image tasks. However, the increasing demand for training data in generative art raises significant concerns about copyright infringement, as models can produce images highly similar to copyrighted works. Existing solutions attempt to mitigate this by perturbing diffusion models to reduce the likelihood of generating such images, but this often compromises model performance. Another line of work economically compensates data holders for their contributions, yet fails to adequately account for copyright loss. Our approach begins by introducing a novel copyright metric grounded in copyright law and court precedents on infringement. We then employ the TRAK method to estimate the contribution of each data holder. To accommodate the continuous data-collection process, we divide training into multiple rounds. Finally, we design a hierarchical budget allocation method based on reinforcement learning that determines both the budget for each round and each data holder's remuneration, according to that holder's contribution and copyright loss in the round. Extensive experiments on three datasets show that our method outperforms all eight baselines, demonstrating its effectiveness in optimizing budget distribution in a copyright-aware manner. To the best of our knowledge, this is the first technical work that incentivizes contributors and protects their copyrights by compensating them.