We present a novel technique that modulates the appearance frequencies of a few tokens within a dataset to encode an invisible watermark, which can be used to protect ownership rights over the data. We develop both optimal and fast heuristic algorithms for creating and verifying such watermarks. We also demonstrate the robustness of our technique against various attacks and derive analytical bounds on the false positive probability, that is, the probability of detecting a watermark in a dataset that does not carry it. Our technique applies to both single-dimensional and multidimensional datasets, is independent of token type, allows fine control of the introduced distortion, and supports a variety of use cases that involve buying and selling data in contemporary data marketplaces.
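To make the idea of frequency modulation concrete, the following is a minimal illustrative sketch and not the algorithm developed in this work. It assumes a hypothetical scheme in which a secret key derives a small modulus for each chosen token pair, and a few extra token occurrences are added so that the difference of the two token counts becomes divisible by that modulus. The names `pair_modulus`, `embed`, and `verify`, the parameter `z`, and the way pairs are chosen are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def pair_modulus(secret, a, b, z):
    """Derive a pair-specific modulus in [2, z] from a keyed hash (assumed scheme)."""
    first, second = sorted((a, b))  # make the result independent of pair order
    digest = hashlib.sha256(f"{secret}|{first}|{second}".encode()).digest()
    return 2 + int.from_bytes(digest[:8], "big") % (z - 1)


def embed(tokens, pairs, secret, z=16):
    """Return a copy of tokens with extra occurrences added so that, for every
    pair, the count difference is divisible by the pair's secret modulus.
    Pairs are assumed to be disjoint. New occurrences are simply appended;
    a realistic scheme would spread them through the data."""
    counts = Counter(tokens)
    marked = list(tokens)
    for a, b in pairs:
        m = pair_modulus(secret, a, b, z)
        hi, lo = (a, b) if counts[a] >= counts[b] else (b, a)
        extra = (counts[hi] - counts[lo]) % m  # at most m - 1 additions
        marked.extend([lo] * extra)
        counts[lo] += extra
    return marked


def verify(tokens, pairs, secret, z=16):
    """Return the fraction of pairs whose count difference is divisible by
    the pair's secret modulus."""
    counts = Counter(tokens)
    hits = sum(
        1
        for a, b in pairs
        if abs(counts[a] - counts[b]) % pair_modulus(secret, a, b, z) == 0
    )
    return hits / len(pairs)


data = ["a"] * 50 + ["b"] * 37 + ["c"] * 20 + ["d"] * 11
pairs = [("a", "b"), ("c", "d")]
marked = embed(data, pairs, secret="owner-key")
print(verify(marked, pairs, secret="owner-key"))
print(len(marked) - len(data), "occurrences added")
```

In this toy scheme the distortion is bounded by `z - 1` added occurrences per pair, which hints at how a parameter can trade distortion against detectability. An unmarked dataset can still satisfy the divisibility condition for some pairs by chance, which is why a bound on the false positive probability matters.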