As deep learning models become widely deployed as components within larger production systems, their individual shortcomings can create system-level vulnerabilities with real-world impact. This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system, presenting a case study of Gmail's pipeline, where file-type identification relies on an ML model. Gmail's malware detection pipeline uses a machine learning model to route each potential malware sample to a specialized malware classifier, improving both accuracy and performance. This model, called Magika, has been open sourced. By crafting adversarial examples that fool Magika, we can cause the production malware service to route malware to an unsuitable malware detector, increasing the chance of evading detection. Specifically, by changing just 13 bytes of a malware sample, we successfully evade Magika in 90% of cases, allowing us to send malware files over Gmail. We then turn to defenses and develop an approach that mitigates the severity of such attacks: against our defended production model, a highly resourced adversary must modify 50 bytes to achieve just a 20% attack success rate. We implemented this defense and, through a collaboration with Google engineers, it has already been deployed in production for the Gmail classifier.
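To make the threat model concrete, the core idea of a byte-level evasion attack can be sketched as a black-box search: modify a small number of bytes in a file until the file-type classifier's prediction flips, so the file is routed to the wrong downstream scanner. The sketch below is a deliberately simplified illustration, not the paper's actual attack (which targets the real Magika model); the magic-byte "classifier" and the greedy single-byte search are toy assumptions introduced here for illustration only.

```python
def toy_filetype_classifier(data: bytes) -> str:
    # Toy stand-in for a learned file-type model (NOT Magika):
    # identifies files purely by their leading "magic" bytes.
    if data.startswith(b"MZ"):
        return "exe"
    if data.startswith(b"%PDF"):
        return "pdf"
    return "txt"


def greedy_byte_attack(data: bytes, classifier, max_edits: int = 13):
    """Greedily try single-byte substitutions until the predicted
    file type changes. Returns (modified_bytes, edited_positions),
    or (data, None) if no label flip was found within the budget.

    A real attack on a neural classifier would need a coordinated
    multi-byte search (e.g. gradient-guided); this greedy loop is
    only meant to illustrate the objective.
    """
    original = classifier(data)
    buf = bytearray(data)
    edited = []
    for pos in range(len(buf)):
        if len(edited) >= max_edits:
            break
        saved = buf[pos]
        for candidate in range(256):
            buf[pos] = candidate
            if classifier(bytes(buf)) != original:
                edited.append(pos)
                return bytes(buf), edited
        buf[pos] = saved  # no flip at this position; revert
    return data, None
```

For the toy classifier, corrupting the first magic byte of an `MZ`-prefixed payload is enough to change the predicted type from `exe` to `txt`, after which the sample would be routed to a scanner unsuited to executables.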