Once analysed, location trajectories can provide valuable insights for a wide range of applications. However, such data is also highly sensitive: if mismanaged, it can reveal an individual's identity, home address, or political affiliations. Preserving the privacy of this data is therefore a priority. One commonly used mitigation is aggregation. Previous work by Xu et al. shows that trajectories can still be recovered from anonymised and aggregated datasets. However, that study lacks implementation details, which obscures the mechanics of the attack. Additionally, the attack was evaluated on commercial, non-public datasets, rendering the results and subsequent claims unverifiable. This study reimplements the trajectory recovery attack from scratch and evaluates it on two open-source datasets, detailing the preprocessing steps and implementation. The results confirm that privacy leakage persists despite common anonymisation and aggregation methods, but also indicate that the original accuracy claims may have been overly optimistic. We release all code as open source so that the results are fully reproducible and therefore verifiable. Moreover, we propose a stronger attack by designing a series of enhancements to the baseline attack. These enhancements improve accuracy by up to 16%, providing an improved benchmark for future research on trajectory recovery methods. Our improvements also enable online execution of the attack, allowing partial attacks on larger datasets previously considered unprocessable and thereby extending the reach of the privacy leakage. The findings emphasise the importance of using strong privacy-preserving mechanisms when releasing aggregated mobility data, rather than relying solely on aggregation as a means of anonymisation.