Several companies have deployed watermark-based detection to identify AI-generated content. However, attribution (the ability to trace a given piece of AI-generated content back to the user of a generative AI (GenAI) service who created it) remains largely unexplored despite its growing importance. In this work, we aim to bridge this gap by conducting the first systematic study of watermark-based, user-level attribution of AI-generated content. Our key idea is to assign a unique watermark to each user of the GenAI service and embed this watermark into the AI-generated content created by that user. Attribution is then performed by identifying the user whose watermark best matches the one extracted from the given content. This approach, however, faces a key challenge: how should watermarks be selected for users to maximize attribution performance? To address this challenge, we first derive, through rigorous probabilistic analysis, lower bounds on detection and attribution performance for any given set of user watermarks. We then select watermarks for users to maximize these lower bounds, thereby optimizing detection and attribution performance. Our theoretical and empirical results show that watermark-based attribution inherits both the accuracy and (non-)robustness properties of the underlying watermark. Specifically, attribution remains highly accurate when the watermarked AI-generated content is not post-processed, is subjected to common post-processing such as JPEG compression, or is subjected to black-box adversarial post-processing with a limited query budget.
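The attribution step described in the abstract (find the user whose watermark best matches the extracted one) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: watermarks are modeled as fixed-length bitstrings, "best match" is measured by bitwise accuracy, content is treated as not attributable when the best score falls below a hypothetical threshold `tau`, and user watermarks are drawn uniformly at random rather than selected by maximizing the derived lower bounds. The names `bitwise_accuracy`, `attribute`, and `flip_bits` are likewise hypothetical.

```python
import random


def bitwise_accuracy(a, b):
    """Fraction of positions at which two equal-length bit lists agree."""
    if len(a) != len(b):
        raise ValueError("watermarks must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def attribute(extracted, user_watermarks, tau=0.9):
    """Return (user_id, score) for the best-matching user watermark.

    If the best score is below the assumed threshold `tau`, the content
    is treated as not attributable and (None, score) is returned.
    """
    best_user, best_score = None, -1.0
    for user_id, watermark in user_watermarks.items():
        score = bitwise_accuracy(extracted, watermark)
        if score > best_score:
            best_user, best_score = user_id, score
    if best_score < tau:
        return None, best_score
    return best_user, best_score


def flip_bits(bits, n_flips, rng):
    """Simulate post-processing noise by flipping n_flips distinct bits."""
    noisy = list(bits)
    for i in rng.sample(range(len(noisy)), n_flips):
        noisy[i] ^= 1
    return noisy


# Demo: 100 users with random 64-bit watermarks (a placeholder for the
# paper's watermark-selection step, which the abstract does not detail).
rng = random.Random(0)
N_BITS = 64
user_watermarks = {
    f"user{i}": [rng.randint(0, 1) for _ in range(N_BITS)] for i in range(100)
}

# Pretend the watermark extracted from some content is user7's watermark
# with 5 of its 64 bits corrupted by post-processing.
extracted = flip_bits(user_watermarks["user7"], 5, rng)
print(attribute(extracted, user_watermarks))
```

In this toy setting, how far apart the user watermarks are determines how many bit flips attribution can tolerate, which gives some intuition for why the choice of watermarks affects attribution performance.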