Data de-identification makes it possible to glean insights from data while preserving user privacy. Trusted Execution Environments (TEEs) allow de-identification applications to run in the cloud without requiring users to trust the third-party application provider. In this paper, we present \textit{SPIDEr - Secure Pipeline for Information De-Identification with End-to-End Encryption}, our implementation of an end-to-end encrypted data de-identification pipeline. SPIDEr supports classical anonymisation techniques such as suppression, pseudonymisation, generalisation, and aggregation, as well as techniques that offer a formal privacy guarantee, such as k-anonymisation and differential privacy. To improve scalability and performance on constrained TEE hardware, SPIDEr processes data in batches for differential privacy computations. We present our design of the control flows for end-to-end secure execution of de-identification operations within a TEE. As part of the control flow for running SPIDEr within the TEE, we perform attestation, a process that verifies that the software binaries were properly instantiated on a known, trusted platform.
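To make the idea of batched differential privacy concrete, the following is a minimal sketch of a differentially private sum computed over fixed-size batches, so that only one batch needs to be held in memory at a time. It is not SPIDEr's implementation: the function names (`batches`, `clipped_sum`, `laplace_noise`, `dp_sum_batched`), the clipping bounds, and the choice of the Laplace mechanism are all assumptions made for illustration. The sketch assumes each record contributes one value, that values are clamped to a known range, and that neighbouring datasets differ by adding or removing one record.

```python
import random
from typing import Iterable, Iterator, Sequence


def batches(values: Sequence[float], size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most `size` records (hypothetical helper)."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def clipped_sum(batch: Iterable[float], lower: float, upper: float) -> float:
    """Sum one batch after clamping every value into [lower, upper]."""
    return sum(min(max(v, lower), upper) for v in batch)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_sum_batched(
    values: Sequence[float],
    lower: float,
    upper: float,
    epsilon: float,
    batch_size: int,
    rng: random.Random,
) -> float:
    """Return an epsilon-DP estimate of the clipped sum, computed batch by batch."""
    if epsilon <= 0 or batch_size <= 0 or lower > upper:
        raise ValueError("need epsilon > 0, batch_size > 0 and lower <= upper")

    # Accumulate the exact clipped sum one batch at a time.
    total = 0.0
    for batch in batches(values, batch_size):
        total += clipped_sum(batch, lower, upper)

    # Adding or removing one clamped record changes the sum by at most this much.
    sensitivity = max(abs(lower), abs(upper))

    # Noise is added once, after all batches, so the privacy cost does not
    # grow with the number of batches.
    return total + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    data = [1.0, 20.0, -3.0, 4.0, 5.0]
    print(dp_sum_batched(data, 0.0, 10.0, epsilon=1.0, batch_size=2,
                         rng=random.Random()))
```

Adding the noise once to the accumulated total, rather than once per batch, keeps the result independent of the batch size for a fixed random draw. A production mechanism would also need a cryptographically secure noise source and protection against floating-point artefacts, which this sketch does not attempt.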