Model watermarking uses internal representations to protect the ownership of large language models (LLMs). However, these representations are inevitably distorted by realistic model modifications such as fine-tuning, quantization, and knowledge distillation, which makes reliable watermark extraction difficult. Despite extensive research on model-side watermarking, existing methods remain insufficiently robust to parameter-level perturbations. To address this gap, we propose \texttt{\textbf{Functional Subspace Watermarking (FSW)}}, a framework that anchors ownership signals in a low-dimensional functional backbone. Specifically, we solve a generalized eigenvalue problem to extract a stable functional subspace for watermark injection, and we introduce an adaptive spectral truncation strategy to balance robustness against model utility. A vector consistency constraint further ensures that watermark injection does not compromise the model's original semantic performance. Extensive experiments across multiple LLM architectures and datasets show that FSW achieves superior detection accuracy and statistical verifiability under multiple model attacks, with robustness exceeding that of existing state-of-the-art (SOTA) methods.
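The subspace-extraction step described above can be illustrated with a minimal sketch. The abstract does not specify which matrices enter the generalized eigenvalue problem or how the truncation adapts, so everything here is an assumption made for illustration: `S_signal` stands for a hypothetical covariance of activations that matter for the model's function, `S_noise` for a hypothetical covariance of the perturbations induced by modifications such as fine-tuning or quantization, and a fixed cumulative-eigenvalue threshold (`energy`) stands in for the adaptive spectral truncation. The function name `functional_subspace` is likewise hypothetical.

```python
import numpy as np

def functional_subspace(S_signal, S_noise, energy=0.9, ridge=1e-6):
    """Solve S_signal v = lam * (S_noise + ridge*I) v and keep leading directions.

    All names and the truncation rule are illustrative assumptions,
    not the procedure used in the paper.
    """
    d = S_signal.shape[0]
    # A small ridge keeps the right-hand matrix positive definite.
    L = np.linalg.cholesky(S_noise + ridge * np.eye(d))
    # Whitening: C = L^{-1} S_signal L^{-T} turns the generalized problem
    # into an ordinary symmetric eigenproblem.
    half = np.linalg.solve(L, S_signal)
    C = np.linalg.solve(L, half.T).T
    C = 0.5 * (C + C.T)  # remove round-off asymmetry
    evals, W = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]  # largest signal-to-noise ratio first
    evals, W = evals[order], W[:, order]
    V = np.linalg.solve(L.T, W)  # map back: v = L^{-T} w
    # Stand-in truncation: smallest k whose eigenvalue mass reaches `energy`.
    mass = np.clip(evals, 0.0, None)
    cum = np.cumsum(mass) / mass.sum()
    k = min(int(np.searchsorted(cum, energy)) + 1, d)
    return evals[:k], V[:, :k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8)) * np.linspace(3.0, 0.5, 8)  # synthetic "signal"
    N = rng.normal(size=(500, 8))                             # synthetic "noise"
    lam, V = functional_subspace(X.T @ X / 500, N.T @ N / 500)
    print("kept directions:", V.shape[1])
```

Directions with large generalized eigenvalues are those where the assumed signal covariance dominates the assumed perturbation covariance, which is one plausible reading of a "stable functional subspace"; the actual criterion used by FSW may differ.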