Privacy preservation is a fundamental requirement in many high-stakes domains such as medicine and finance, where sensitive personal data must be analyzed without compromising individual confidentiality. At the same time, these applications often involve datasets with missing values due to non-response, data corruption, or deliberate anonymization. Missing data is traditionally viewed as a limitation because it reduces the information available to analysts and can degrade model performance. In this work, we take an alternative perspective and study missing data from a privacy preservation standpoint. Intuitively, when features are missing, less information is revealed about individuals, suggesting that missingness could inherently enhance privacy. We formalize this intuition by analyzing missing data as a privacy amplification mechanism within the framework of differential privacy. We show, for the first time, that incomplete data can yield privacy amplification for differentially private algorithms.