Privacy concerns have become increasingly critical in modern AI and data science applications, where sensitive information is collected, analyzed, and shared across diverse domains such as healthcare, finance, and mobility. While prior research has focused on protecting privacy in a single data release, many real-world systems operate under sequential or continuous data publishing, where the same or related data are released over time. Such sequential disclosures introduce new vulnerabilities, as temporal correlations across releases may enable adversaries to infer sensitive information that remains hidden in any individual release. In this paper, we investigate whether an attacker can compromise privacy in sequential data releases by exploiting dependencies between consecutive publications, even when each individual release satisfies standard privacy guarantees. To this end, we propose a novel attack model that captures these sequential dependencies by integrating a Hidden Markov Model with a reinforcement learning-based bi-directional inference mechanism. This enables the attacker to leverage both earlier and later observations in the sequence to infer private information. We instantiate our framework in the context of trajectory data, demonstrating how an adversary can recover sensitive locations from sequential mobility datasets. Extensive experiments on the Geolife, Porto Taxi, and SynMob datasets show that our model consistently outperforms baseline approaches that treat each release independently. The results reveal a fundamental privacy risk inherent to sequential data publishing: individually protected releases can collectively leak sensitive information when analyzed temporally. These findings underscore the need for new privacy-preserving frameworks that explicitly model temporal dependencies, such as time-aware differential privacy or sequential data obfuscation strategies.
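The bi-directional inference idea can be illustrated with the classic forward-backward algorithm on a toy HMM, where hidden states play the role of sensitive locations and observations play the role of individually obfuscated releases. This is a minimal sketch only: the transition, emission, and prior values below are hypothetical and not taken from the paper, and it omits the reinforcement learning component entirely.

```python
import numpy as np

# Toy HMM: 2 hidden states (e.g., coarse sensitive locations) and
# 2 observation symbols (e.g., obfuscated published values).
# All probabilities here are illustrative assumptions.
A = np.array([[0.7, 0.3],    # state transition matrix P(s_t | s_{t-1})
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # emission matrix P(o_t | s_t)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

obs = [0, 1, 0, 0]           # a short sequence of noisy releases
T, N = len(obs), len(pi)

# Forward pass: evidence from earlier releases.
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: evidence from later releases.
beta = np.zeros((T, N))
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# Posterior over each hidden state combines both directions,
# so a single release's protection can be undermined by its neighbors.
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma)
```

The key point is that `gamma[t]` conditions on the entire observation sequence, not just release `t`, which is precisely why per-release privacy guarantees can erode under temporal correlation.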