Smartwatches enable the efficient collection of health data that can be used for research and comprehensive analysis to improve individuals' health. As the collection and analysis of such data become pervasive, ensuring privacy when handling it is a critical concern alongside analytical capability. Because health data contains sensitive information, it must be handled responsibly and is therefore often anonymized. However, the data itself can also be exploited to reveal information and break anonymity. We propose a novel similarity-based re-identification attack on time-series health data and thereby unveil a significant vulnerability. Even when privacy measures remove identifying information, our attack demonstrates that a short sample of data from several sensors of a target individual can be enough to identify them within a database of other samples, solely based on sensor-level similarities. In our example scenario, where data owners leverage health data from smartwatches, we correctly link the target data in two out of three cases. User privacy is thus inherently threatened by the data itself, even after personal information has been removed.
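To make the idea of linking by sensor-level similarity concrete, the following is a minimal sketch and not the attack evaluated in this work. The abstract does not state the similarity measure, so the sliding-window mean squared difference used here is an assumption, as are the function names `min_window_distance` and `rank_candidates`, the sensor names, and all values, which are made up for illustration.

```python
from math import inf


def min_window_distance(snippet, series):
    """Smallest mean squared difference between the snippet and any
    equally long contiguous window of the series (assumed measure)."""
    n = len(snippet)
    if n == 0 or len(series) < n:
        return inf  # the snippet cannot be aligned with this series
    best = inf
    for start in range(len(series) - n + 1):
        window = series[start:start + n]
        dist = sum((a - b) ** 2 for a, b in zip(snippet, window)) / n
        best = min(best, dist)
    return best


def rank_candidates(target, database):
    """Order pseudonymous record IDs from most to least similar to the
    target, summing the per-sensor distances."""
    scores = {}
    for record_id, sensors in database.items():
        total = 0.0
        for sensor, snippet in target.items():
            total += min_window_distance(snippet, sensors.get(sensor, []))
        scores[record_id] = total
    return sorted(scores, key=scores.get)


# Hypothetical "anonymized" database: record IDs carry no personal information.
database = {
    "record_a": {"heart_rate": [62, 64, 63, 70, 85, 90, 72],
                 "steps": [0, 0, 5, 40, 80, 75, 10]},
    "record_b": {"heart_rate": [75, 76, 78, 77, 79, 80, 78],
                 "steps": [10, 12, 9, 11, 10, 13, 12]},
    "record_c": {"heart_rate": [55, 56, 58, 57, 60, 59, 58],
                 "steps": [0, 0, 0, 2, 1, 0, 0]},
}

# Short multi-sensor sample known to belong to the target individual.
target = {"heart_rate": [71, 84, 91], "steps": [42, 78, 77]}

ranking = rank_candidates(target, database)
print(ranking[0])  # prints "record_a"
```

In this toy setting the target sample is linked to `record_a` purely from the shape and level of the sensor readings, which illustrates why removing names and identifiers alone may not prevent re-identification.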