Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.

翻译：水印技术作为大语言模型溯源的关键机制，现有密钥方案将检测与注入紧密耦合，需获取密钥或依赖服务方特定方案检测器方可验证。这种依赖性为实际治理造成根本障碍——独立审计若无法兼顾模型安全性或突破服务方不透明的声明便无法实现。为解决这一困境，我们提出TTP-Detect，一种面向非侵入式第三方水印验证的开创性黑盒框架。通过解耦检测与注入，TTP-Detect将验证重新定义为相对假设检验问题。该方法采用代理模型放大与水印相关信号，并设计一组互补的相对度量评估查询文本与水印分布的拟合程度。在代表性水印方案、数据集和模型上的大量实验表明，TTP-Detect在检测性能与对抗多种攻击的鲁棒性方面均表现卓越。

相关内容

黑盒

关注 1

在科学，计算和工程学中，黑盒是一种设备，系统或对象，可以根据其输入和输出（或传输特性）对其进行查看，而无需对其内部工作有任何了解。它的实现是“不透明的”（黑色）。几乎任何事物都可以被称为黑盒：晶体管，引擎，算法，人脑，机构或政府。为了使用典型的“黑匣子方法”来分析建模为开放系统的事物，仅考虑刺激/响应的行为，以推断（未知）盒子。该黑匣子系统的通常表示形式是在该方框中居中的数据流程图。黑盒的对立面是一个内部组件或逻辑可用于检查的系统，通常将其称为白盒（有时也称为“透明盒”或“玻璃盒”）。

【AAAI2026】URaG：面向高效长文档理解的多模态大语言模型统一检索与生成框架

专知会员服务

15+阅读 · 2025年11月14日

面向 AI 生成图像的安全与鲁棒水印：全面综述

专知会员服务

14+阅读 · 2025年10月6日

142页DeepSeek-R1 思维链技术：让我们一起<思考>大语言模型（LLM）的推理能力

专知会员服务

48+阅读 · 2025年4月12日

如何提升大模型通用推理能力？DeepSeek最新论文《CODEI/O：通过代码输入输出预测凝练推理模式》

专知会员服务

42+阅读 · 2025年2月16日