Watermarking has emerged as an effective solution for copyright protection of synthetic data. However, applying watermarking techniques to synthetic tabular data is challenging, because tabular data can easily lose its watermark through shuffling or deletion operations. The major challenge is to provide traceability, that is, the ability to track multiple users of the watermarked tabular data, while maintaining high data utility and robustness (resistance to attacks). To address this, we design TableMark, a multi-bit watermarking scheme that encodes watermarks into synthetic tabular data, ensuring superior traceability and robustness while maintaining high utility. We formulate the watermark encoding process as a constrained optimization problem, allowing the data owner to effectively trade off robustness and utility. Additionally, we propose effective optimization mechanisms for solving this problem that enhance data utility. Experimental results on four widely used real-world datasets show that TableMark effectively traces a large number of users, is resilient to attacks, and preserves high utility. Moreover, TableMark significantly outperforms state-of-the-art tabular watermarking schemes.