Privacy concerns have become increasingly critical in modern AI and data science applications, where sensitive information is collected, analyzed, and shared across diverse domains such as healthcare, finance, and mobility. While prior research has focused on protecting privacy in a single data release, many real-world systems operate under sequential or continuous data publishing, where the same or related data are released over time. Such sequential disclosures introduce new vulnerabilities, as temporal correlations across releases may enable adversaries to infer sensitive information that remains hidden in any individual release. In this paper, we investigate whether an attacker can compromise privacy in sequential data releases by exploiting dependencies between consecutive publications, even when each individual release satisfies standard privacy guarantees. To this end, we propose a novel attack model that captures these sequential dependencies by integrating a Hidden Markov Model with a reinforcement learning-based bi-directional inference mechanism. This enables the attacker to leverage both earlier and later observations in the sequence to infer private information. We instantiate our framework in the context of trajectory data, demonstrating how an adversary can recover sensitive locations from sequential mobility datasets. Extensive experiments on the Geolife, Porto Taxi, and SynMob datasets show that our model consistently outperforms baseline approaches that treat each release independently. The results reveal a fundamental privacy risk inherent to sequential data publishing: individually protected releases can collectively leak sensitive information when analyzed temporally. These findings underscore the need for new privacy-preserving frameworks that explicitly model temporal dependencies, such as time-aware differential privacy or sequential data obfuscation strategies.
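The bi-directional inference idea can be illustrated with the classic forward-backward algorithm on a toy HMM, where hidden states play the role of sensitive locations and observations play the role of individually obfuscated releases. This is a minimal sketch only: the transition, emission, and prior values below are hypothetical and not taken from the paper, and it omits the reinforcement learning component entirely.

```python
import numpy as np

# Toy HMM: 2 hidden states (e.g., coarse sensitive locations) and
# 2 observation symbols (e.g., obfuscated published values).
# All probabilities here are illustrative assumptions.
A = np.array([[0.7, 0.3],    # state transition matrix P(s_t | s_{t-1})
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # emission matrix P(o_t | s_t)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

obs = [0, 1, 0, 0]           # a short sequence of noisy releases
T, N = len(obs), len(pi)

# Forward pass: evidence from earlier releases.
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: evidence from later releases.
beta = np.zeros((T, N))
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# Posterior over each hidden state combines both directions,
# so a single release's protection can be undermined by its neighbors.
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma)
```

The key point is that `gamma[t]` conditions on the entire observation sequence, not just release `t`, which is precisely why per-release privacy guarantees can erode under temporal correlation.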