TimeWak: Temporal Chained-Hashing Watermark for Time Series Data

Synthetic time series generated by diffusion models make it possible to share privacy-sensitive datasets, such as patients' functional MRI records. Key criteria for synthetic data are high data utility and traceability, so that the data source can be verified. Recent watermarking methods embed the watermark in homogeneous latent spaces, but state-of-the-art time series generators operate directly in data space, which makes latent-based watermarking incompatible with them. This raises the challenge of watermarking directly in data space while handling feature heterogeneity and temporal dependencies. We propose TimeWak, the first watermarking algorithm for multivariate time series diffusion models. To handle temporal dependence and spatial heterogeneity, TimeWak embeds a temporal chained-hashing watermark directly within the temporal-feature data space. Its other distinctive feature is $\epsilon$-exact inversion, which addresses the non-uniform distribution of reconstruction error across features that arises when the diffusion process is inverted to detect watermarks. We derive the error bound for inverting multivariate time series while preserving robust watermark detectability. We extensively evaluate TimeWak's impact on synthetic data quality, its watermark detectability, and its robustness under various post-editing attacks, against five datasets and several baselines at different temporal lengths. Our results show that TimeWak improves the context-FID score by 61.96% and the correlational scores by 8.44% relative to the strongest state-of-the-art baseline, while remaining consistently detectable.
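The abstract does not specify how the chained hash is constructed or how detection is scored. Purely as an illustration of the general idea, the following sketch assumes a sign-based watermark in the initial Gaussian noise of a time series diffusion sampler: the sign pattern for time step t is seeded from a SHA-256 hash chain, so each step's pattern depends on the previous step's digest. Detection compares the signs of recovered noise against the keyed pattern. The helper names (`chained_bits`, `embed`, `detect`) and the `threshold=0.7` value are hypothetical. The additive noise used in the usage example only stands in for imperfect (e.g. $\epsilon$-bounded) inversion error. This is not TimeWak's actual algorithm.

```python
import hashlib

import numpy as np


def chained_bits(key: bytes, length: int, n_features: int) -> np.ndarray:
    """Hypothetical helper: build a (length, n_features) matrix of +/-1 signs.

    Row t is seeded by a digest that is the hash of row t-1's digest,
    so the pattern forms a hash chain along the time axis.
    """
    signs = np.empty((length, n_features), dtype=np.int8)
    digest = hashlib.sha256(key).digest()
    for t in range(length):
        digest = hashlib.sha256(digest).digest()  # chain link depends on previous step
        rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
        signs[t] = rng.choice(np.array([-1, 1], dtype=np.int8), size=n_features)
    return signs


def embed(key: bytes, length: int, n_features: int,
          rng: np.random.Generator) -> np.ndarray:
    """Assumed embedding: keyed signs times half-normal magnitudes.

    With uniformly random signs, each entry is still marginally N(0, 1).
    """
    signs = chained_bits(key, length, n_features)
    return signs * np.abs(rng.standard_normal((length, n_features)))


def detect(z_hat: np.ndarray, key: bytes, threshold: float = 0.7):
    """Score recovered noise by the fraction of signs matching the keyed pattern.

    Without the watermark the expected agreement is about 0.5.
    """
    signs = chained_bits(key, *z_hat.shape)
    agreement = float(np.mean(np.sign(z_hat) == signs))
    return agreement, agreement >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = embed(b"secret", 24, 6, rng)
    # Additive noise as a stand-in for reconstruction error from inversion.
    z_hat = z + 0.3 * rng.standard_normal(z.shape)
    print(detect(z_hat, b"secret"))     # high agreement, flagged as watermarked
    print(detect(z_hat, b"wrong-key"))  # agreement near 0.5
```

A per-step hash chain means a verifier needs only the key to regenerate every step's pattern. Any structure TimeWak adds for feature heterogeneity is not modeled here.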
