Credit attribution is crucial across various fields. In academic research, proper citation acknowledges prior work and establishes original contributions. Similarly, in generative models, such as those trained on existing artworks or music, it is important that any generated content influenced by these works appropriately credits the original creators. We study credit attribution by machine learning algorithms. We propose new definitions, relaxations of Differential Privacy, that weaken the stability guarantees for a designated subset of $k$ datapoints. These $k$ datapoints may be used non-stably with permission from their owners, potentially in exchange for compensation, while the remaining datapoints are guaranteed to have no significant influence on the algorithm's output. Our framework extends well-studied notions of stability, including Differential Privacy ($k = 0$), differentially private learning with public data (where the $k$ public datapoints are fixed in advance), and stable sample compression (where the $k$ datapoints are selected adaptively by the algorithm). We examine the expressive power of these stability notions within the PAC learning framework, provide a comprehensive characterization of learnability for algorithms satisfying these definitions, and propose directions and questions for future research.
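For concreteness, recall the standard $(\varepsilon, \delta)$-Differential Privacy guarantee that these definitions relax: for every pair of neighboring datasets $S, S'$ (differing in a single datapoint) and every event $E$,
\[
\Pr[A(S) \in E] \;\le\; e^{\varepsilon}\,\Pr[A(S') \in E] + \delta.
\]
One natural way the relaxation could be phrased, offered here only as an illustrative sketch rather than the formal definition from the text, is to require this inequality only when the datapoint in which $S$ and $S'$ differ lies outside a designated index set $I$ with $|I| \le k$; taking $I$ fixed in advance recovers the public-data setting, and letting the algorithm choose $I$ adaptively recovers the stable-sample-compression setting.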