Watermarking is a critical mechanism for LLM provenance, but existing secret-key schemes tightly couple detection with injection: verification requires either the secret key or a provider-side, scheme-specific detector. This dependency is a fundamental barrier to real-world governance, because independent auditing is impossible without either compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It uses a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess how closely the query text aligns with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets, and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
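The abstract does not specify TTP-Detect's measurements or decision rule. Purely as an illustration of what a *relative* test can look like, the sketch below assumes that each measurement yields one scalar score for the query text, and that reference scores are available from known watermarked and known non-watermarked samples. It fits a Gaussian to each reference set, computes the query's log-likelihood ratio (watermarked vs. non-watermarked) per measurement, and sums these ratios. The function names, the Gaussian fit, the independence assumption behind the sum, and the zero threshold are all hypothetical choices. None of them is the paper's method.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of the normal distribution N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def relative_llr(query: float, wm_ref: Sequence[float], clean_ref: Sequence[float]) -> float:
    """Hypothetical per-measurement statistic: log-likelihood of the query score
    under a Gaussian fit to watermarked references minus that under a Gaussian
    fit to non-watermarked references (positive favours 'watermarked')."""
    sw, sc = stdev(wm_ref), stdev(clean_ref)  # each needs >= 2 reference scores
    if sw == 0 or sc == 0:
        raise ValueError("reference scores must not be constant")
    return gaussian_logpdf(query, mean(wm_ref), sw) - gaussian_logpdf(query, mean(clean_ref), sc)


def relative_detect(
    query_scores: Sequence[float],
    wm_refs: Sequence[Sequence[float]],
    clean_refs: Sequence[Sequence[float]],
    threshold: float = 0.0,
) -> Tuple[bool, float]:
    """Combine several measurements by summing their log-likelihood ratios
    (this treats the measurements as independent, an assumption) and flag
    the text as watermarked when the total exceeds `threshold`."""
    if not (len(query_scores) == len(wm_refs) == len(clean_refs)):
        raise ValueError("need one pair of reference sets per measurement")
    total = sum(
        relative_llr(q, w, c) for q, w, c in zip(query_scores, wm_refs, clean_refs)
    )
    return total > threshold, total


# Usage with two toy measurements whose reference sets are made up for illustration.
wm = [0.8, 0.9, 1.0, 1.1, 1.2]
clean = [-0.2, -0.1, 0.0, 0.1, 0.2]
print(relative_detect([0.95, 0.9], [wm, wm], [clean, clean]))
```

In practice the threshold would be calibrated on held-out non-watermarked text to reach a target false-positive rate rather than fixed at zero. Correlated measurements would also call for a joint model instead of a plain sum.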