Libertas: Privacy-Preserving Collective Computation for Decentralised Personal Data Stores

Data and data processing have become indispensable to our society. Insights drawn from collective data make invaluable contributions to scientific and societal research and to business. However, worries about privacy and data misuse are growing. This has prompted the emergence of decentralised personal data stores (PDS) such as Solid, which give individuals more control over their personal data. Existing PDS frameworks nevertheless struggle to ensure data privacy when performing collective computations over data from multiple users. While Secure Multi-Party Computation (MPC) protects input secrecy during the computation without relying on any single party, issues emerge when MPC is applied directly in the context of PDS, particularly because of key factors such as autonomy and decentralisation. In this work, we discuss the essence of this issue, identify a potential solution, and introduce a modular architecture, Libertas, that integrates MPC with PDS such as Solid without requiring protocol-level changes. We introduce a paradigm shift from an 'omniscient' view to an individual-based, user-centric view of trust and security, and discuss the threat model of Libertas. We evaluate Libertas on two realistic use cases for collaborative data processing, covering both technical feasibility and empirical benchmarks; the use cases highlight its effectiveness in empowering gig workers and in generating differentially private synthetic data. The experimental results underscore Libertas' linear scalability and provide insights into compute optimisations, thereby advancing the state of the art in privacy-preserving data processing. By offering practical solutions for maintaining both individual autonomy and privacy in collaborative data processing environments, Libertas contributes to the ongoing discourse on privacy protection in data-driven decision-making.
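To make concrete the claim that MPC protects input secrecy without relying on any single party, the following is a minimal sketch of additive secret sharing, one common MPC building block. It is an illustration only and is not taken from Libertas: the function names (`share`, `reconstruct`, `secure_sum`), the modulus `PRIME`, and the choice of three computation parties are all assumptions made for this example.

```python
import secrets

# Field modulus for the shares (an assumption for this sketch).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; all of them are needed to recover the value."""
    return sum(shares) % PRIME


def secure_sum(inputs, n_parties=3):
    """Sum the data owners' inputs without any party seeing a raw input."""
    # Each data owner splits its input and sends one share to each party.
    per_party = [[] for _ in range(n_parties)]
    for value in inputs:
        for party_shares, s in zip(per_party, share(value, n_parties)):
            party_shares.append(s)
    # Each party adds up the shares it holds, locally.
    partial_sums = [sum(p) % PRIME for p in per_party]
    # Only the combined partial sums reveal the aggregate result.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    # Hypothetical inputs from three users.
    print(secure_sum([120, 95, 210]))
```

In this sketch, any subset of fewer than all computation parties holds only uniformly random values, so no single party learns an individual input; only the aggregate is revealed once the partial sums are combined.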
