Privacy-preserving machine learning deployments in sensitive deep learning applications, from medical imaging to autonomous systems, increasingly require combining multiple techniques. Yet practitioners lack systematic guidance for assessing the synergistic and non-additive interactions of these hybrid configurations, relying instead on isolated per-technique analyses that miss critical system-level interactions. We introduce PrivacyBench, a benchmarking framework that reveals striking failures in privacy technique combinations with severe deployment implications. Through systematic evaluation of ResNet18 and ViT models on medical datasets, we find that combining federated learning (FL) with differential privacy (DP) exhibits severe convergence failure, with accuracy dropping from 98% to 13% while compute costs and energy consumption increase substantially. In contrast, FL combined with secure multi-party computation (SMPC) maintains near-baseline performance with modest overhead. Our framework provides the first systematic platform for evaluating privacy-utility-cost trade-offs through automated YAML configuration, resource monitoring, and reproducible experimental protocols. PrivacyBench enables practitioners to identify problematic technique interactions before deployment, moving privacy-preserving computer vision from ad-hoc evaluation toward principled systems design. These findings demonstrate that privacy techniques cannot be composed arbitrarily, and they provide critical guidance for robust deployment in resource-constrained environments.
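To make the "automated YAML configuration" concrete, a PrivacyBench experiment might be declared as in the sketch below. All key names and values here (`experiment`, `privacy.techniques`, `epsilon`, and so on) are illustrative assumptions for exposition, not the framework's actual schema.

```yaml
# Hypothetical PrivacyBench experiment configuration (illustrative only;
# key names are assumptions, not the framework's documented schema).
experiment: fl_dp_resnet18
model: resnet18              # or: vit
dataset: medical_imaging     # medical dataset used for evaluation
privacy:
  techniques: [federated_learning, differential_privacy]
  federated_learning:
    clients: 10
    rounds: 50
  differential_privacy:
    epsilon: 1.0             # privacy budget
    max_grad_norm: 1.0       # per-sample gradient clipping
monitoring:
  track: [accuracy, energy, wall_clock_time]
seed: 42                     # fixed seed for reproducible protocols
```

A declarative configuration like this is what makes such benchmarks reproducible: the same file fully determines the model, privacy stack, and resource metrics to record, so a hybrid combination (e.g. FL + DP versus FL + SMPC) can be swapped by editing one list.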