Privacy preservation is a fundamental requirement in many high-stakes domains such as medicine and finance, where sensitive personal data must be analyzed without compromising individual confidentiality. At the same time, these applications often involve datasets with missing values due to non-response, data corruption, or deliberate anonymization. Missing data is traditionally viewed as a limitation because it reduces the information available to analysts and can degrade model performance. In this work, we take an alternative perspective and study missing data from a privacy preservation standpoint. Intuitively, when features are missing, less information is revealed about individuals, suggesting that missingness could inherently enhance privacy. We formalize this intuition by analyzing missing data as a privacy amplification mechanism within the framework of differential privacy. We show, for the first time, that incomplete data can yield privacy amplification for differentially private algorithms.