Modern machine learning systems increasingly rely on sensitive data, creating significant privacy, security, and regulatory risks. Existing privacy-preserving machine learning (ppML) techniques, such as Differential Privacy (DP) and Homomorphic Encryption (HE), address these risks only at the cost of degraded performance, increased complexity, or prohibitive computational overhead. This paper introduces Informationally Compressive Anonymization (ICA) and the VEIL architecture, a privacy-preserving ML framework that achieves strong privacy guarantees through architectural and mathematical design rather than noise injection or cryptography. ICA embeds a supervised, multi-objective encoder within a trusted Source Environment to transform raw inputs into low-dimensional, task-aligned latent representations, ensuring that only irreversibly anonymized vectors are exported to untrusted Training and Inference Environments. Using topological and information-theoretic arguments, the paper rigorously proves that these encodings are structurally non-invertible: inversion is logically impossible even under idealized attacker assumptions, and in realistic deployments the attacker's conditional entropy over the original data diverges, driving the probability of reconstruction to zero. Unlike prior autoencoder-based ppML approaches, ICA preserves predictive utility by aligning representation learning with downstream supervised objectives, enabling low-latency, high-performance ML without gradient clipping, noise budgets, or encryption at inference time. The VEIL architecture enforces strict trust boundaries, supports scalable multi-region deployment, and naturally aligns with privacy-by-design regulatory frameworks, establishing a new foundation for enterprise ML that is secure, performant, and safe by construction, even in the face of post-quantum threats.
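The structural non-invertibility claim rests on a dimensionality argument: any encoder that compresses a higher-dimensional input into a lower-dimensional latent vector is many-to-one, so the exported encoding cannot uniquely determine the original record. The following minimal sketch illustrates this with a fixed linear map standing in for ICA's learned encoder (the map `W` and the dimensions are illustrative assumptions, not part of the paper's construction):

```python
import numpy as np

# Illustrative stand-in for ICA's encoder: a fixed linear map from a 4-D
# input space to a 2-D latent space. Any such compressive map has a
# nontrivial null space, so infinitely many inputs share each encoding.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4))  # encoder: R^4 -> R^2

x = rng.standard_normal(4)  # a "raw" record in the Source Environment
z = W @ x                   # the exported latent representation

# Rows of Vt beyond rank(W) span null(W): adding any null-space vector
# to x leaves the encoding z unchanged, so an attacker who sees only z
# cannot distinguish among these preimages.
null_basis = np.linalg.svd(W)[2][2:]
x_alt = x + 3.0 * null_basis[0]  # a different input with the same encoding

assert not np.allclose(x, x_alt)      # the inputs differ...
assert np.allclose(W @ x, W @ x_alt)  # ...but their encodings coincide
```

A trained nonlinear encoder is not linear, of course, but the same pigeonhole reasoning applies whenever the latent dimension is strictly smaller than the input dimension; the paper's topological and information-theoretic arguments make this precise.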