Multilingual watermarking aims to make large language model (LLM) outputs traceable across languages, yet current methods still fall short. Despite claims of cross-lingual robustness, they have been evaluated only on high-resource languages. We show that existing multilingual watermarking methods are not truly multilingual: they do not remain robust under translation attacks in medium- and low-resource languages. We trace this failure to semantic clustering, which breaks down when the tokenizer vocabulary contains too few full-word tokens for a given language. To address this, we introduce STEAM, a detection method that uses Bayesian optimisation to search among 133 candidate languages for the back-translation that best recovers the watermark strength. STEAM is compatible with any watermarking method, robust across tokenizers and languages, non-invasive, and easily extended to new languages. With average gains of +0.23 AUC and +37% TPR@1%, STEAM offers a scalable approach toward fairer watermarking across diverse languages.
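To make the detection idea concrete, the following is a minimal sketch of back-translation-based detection, not the STEAM implementation. It rests on several assumptions that do not come from the text above: watermark strength is measured by a toy green-list z-score, `back_translate` is a hypothetical callable supplied by the user, the original text is included as a baseline, and a plain exhaustive loop stands in for the Bayesian optimisation over candidate languages.

```python
import hashlib
import math
from typing import Callable, Iterable, Optional, Tuple

GAMMA = 0.5  # assumed fraction of "green" tokens; illustrative only


def is_green(prev_token: str, token: str, gamma: float = GAMMA) -> bool:
    """Toy green-list test: hash the (previous, current) token pair."""
    digest = hashlib.sha256(f"{prev_token}\x1f{token}".encode("utf-8")).digest()
    return digest[0] / 256.0 < gamma


def z_score(text: str, gamma: float = GAMMA) -> float:
    """One-proportion z-statistic on the number of green tokens.

    Whitespace splitting is a stand-in for a real tokenizer.
    """
    tokens = text.split()
    n = len(tokens) - 1  # number of scored (previous, current) pairs
    if n < 1:
        return 0.0
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))


def best_back_translation(
    text: str,
    candidate_langs: Iterable[str],
    back_translate: Callable[[str, str], str],  # hypothetical: (text, lang) -> text
    score: Callable[[str], float] = z_score,
) -> Tuple[Optional[str], float]:
    """Return the candidate language whose back-translation scores highest.

    Exhaustive search is used here in place of Bayesian optimisation.
    A result of (None, s) means no back-translation beat the original text.
    """
    best_lang, best_score = None, score(text)
    for lang in candidate_langs:
        candidate_score = score(back_translate(text, lang))
        if candidate_score > best_score:
            best_lang, best_score = lang, candidate_score
    return best_lang, best_score
```

Under these assumptions, a suspect text would be flagged when the returned score exceeds a detection threshold. With many candidate languages, each call to `back_translate` is expensive, which is where a guided search such as Bayesian optimisation would replace the exhaustive loop.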