Privacy-preserving instance encoding aims to encode raw data as feature vectors without revealing the privacy-sensitive information the data contains. When designed properly, these encodings can be used for downstream ML applications such as training and inference with limited privacy risk. However, the vast majority of existing instance encoding schemes are based on heuristics, and their privacy-preserving properties are validated only empirically against a limited set of attacks. In this paper, we propose a theoretically principled measure for the privacy of instance encoding based on Fisher information. We show that our privacy measure is intuitive, easily applicable, and can be used to bound the invertibility of encodings both theoretically and empirically.
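To make the idea of bounding invertibility with Fisher information concrete, the following is a minimal sketch and not the paper's own method or definition, which the abstract does not spell out. It assumes a hypothetical linear encoder with additive Gaussian noise, e = Wx + N(0, sigma^2 I), for which the Fisher information matrix with respect to x is W^T W / sigma^2. It then applies the standard Cramér-Rao bound: any unbiased reconstruction of x has per-dimension mean squared error of at least Tr(I^{-1})/d, which in turn is at least d/Tr(I). The function names, the trace normalization, and all parameter values are illustrative assumptions.

```python
import numpy as np


def fisher_information_linear_gaussian(W, sigma):
    """Fisher information of e = W x + N(0, sigma^2 I) with respect to x.

    Assumption: a linear encoder with isotropic Gaussian noise, chosen only
    because its Fisher information has the closed form W^T W / sigma^2.
    """
    return W.T @ W / sigma**2


def trace_bound(fim):
    """Per-dimension MSE lower bound d / Tr(I) for unbiased estimators.

    Follows from the Cramer-Rao bound Tr(I^{-1}) / d together with the
    arithmetic-harmonic mean inequality on the eigenvalues of I.
    """
    d = fim.shape[0]
    return d / np.trace(fim)


def cramer_rao_bound(fim):
    """Per-dimension Cramer-Rao bound Tr(I^{-1}) / d for unbiased estimators."""
    d = fim.shape[0]
    return np.trace(np.linalg.inv(fim)) / d


# Hypothetical sizes and noise level, for illustration only.
rng = np.random.default_rng(0)
d, k, sigma = 4, 8, 0.5
W = rng.normal(size=(k, d))  # encoder weights
x = rng.normal(size=d)       # the private input

fim = fisher_information_linear_gaussian(W, sigma)
loose = trace_bound(fim)
tight = cramer_rao_bound(fim)

# Simulate an attacker that inverts the encoding by least squares,
# which is an unbiased estimator of x in this linear-Gaussian setting.
trials = 20000
noise = rng.normal(scale=sigma, size=(trials, k))
encodings = x @ W.T + noise                   # one noisy encoding per row
x_hat = encodings @ np.linalg.pinv(W).T       # least-squares reconstructions
empirical_mse = np.mean(np.sum((x_hat - x) ** 2, axis=1)) / d

print("trace-based lower bound:", loose)
print("Cramer-Rao lower bound: ", tight)
print("empirical attacker MSE: ", empirical_mse)
```

In this setting, increasing `sigma` shrinks the Fisher information and raises both lower bounds, which matches the intuition that noisier encodings are harder to invert. The bounds apply only to unbiased attackers; an attacker with prior knowledge of x could do better.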