Certified Robustness to Data Poisoning in Gradient-Based Training

Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. However, provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge and develop the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data. In particular, our framework certifies robustness against untargeted and targeted poisoning as well as backdoor attacks for both input and label manipulations. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.
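To make the idea of bounding all reachable parameters concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: a linear regression model, full-batch gradient descent, and a label-poisoning adversary that may shift every training label by at most `eps`. It uses plain interval arithmetic, one of the simplest sound over-approximations, in place of the more general convex relaxations the abstract refers to. The function names `certified_gd_bounds` and `train_gd` and all hyperparameters are hypothetical.

```python
import numpy as np

def certified_gd_bounds(X, y, eps, lr=0.1, steps=20):
    """Elementwise bounds [lo, hi] on the parameters reached by gradient
    descent when each label may be shifted by at most eps (assumed threat model)."""
    d = X.shape[1]
    lo, hi = np.zeros(d), np.zeros(d)      # same initialization as train_gd
    absX = np.abs(X)
    for _ in range(steps):
        center, radius = (lo + hi) / 2, (hi - lo) / 2
        pred_c = X @ center                # prediction at the box center
        pred_r = absX @ radius             # prediction spread over the box
        # Interval for the residual (prediction - label) of every sample.
        r_lo = pred_c - pred_r - (y + eps)
        r_hi = pred_c + pred_r - (y - eps)
        # Interval for each per-sample gradient r * x, coordinate by coordinate.
        a, b = r_lo[:, None] * X, r_hi[:, None] * X
        g_lo = np.minimum(a, b).mean(axis=0)
        g_hi = np.maximum(a, b).mean(axis=0)
        # Interval version of the update theta <- theta - lr * grad.
        lo, hi = lo - lr * g_hi, hi - lr * g_lo
    return lo, hi

def train_gd(X, y, lr=0.1, steps=20):
    """Ordinary gradient descent on the mean of 0.5 * squared error."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta = theta - lr * ((X @ theta - y)[:, None] * X).mean(axis=0)
    return theta

# Usage: any training run on labels perturbed within eps should land in the box.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
y = X @ np.array([1.0, -2.0])
lo, hi = certified_gd_bounds(X, y, eps=0.1)
theta = train_gd(X, y + rng.uniform(-0.1, 0.1, size=y.shape))
print("lower:", lo, "reached:", theta, "upper:", hi)
```

In this sketch the box never shrinks from one step to the next, because each update adds a nonnegative amount to its width, so the bounds loosen as training proceeds. Given the final box, a worst-case prediction or error can be bounded by applying the same interval reasoning at test time.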

