Sentry: Authenticating Machine Learning Artifacts on the Fly

Machine learning systems increasingly rely on open-source artifacts such as datasets and models that are created or hosted by other parties. This reliance on external datasets and pre-trained models exposes a system to supply chain attacks, in which an artifact is poisoned before it is delivered to the end user. Such attacks are possible because existing machine learning systems lack any authenticity verification. Incorporating cryptographic solutions such as hashing and signing can mitigate the risk of supply chain attacks. However, existing frameworks for integrity verification based on cryptographic techniques can incur significant overhead when applied to state-of-the-art machine learning artifacts due to their scale, and are not compatible with GPU platforms. In this paper, we develop Sentry, a novel GPU-based framework that verifies the authenticity of machine learning artifacts by implementing cryptographic signing and verification for datasets and models. Sentry ties developer identities to signatures and performs authentication on the fly as artifacts are loaded into GPU memory, making it compatible with GPU data movement solutions such as NVIDIA GPUDirect that bypass the CPU. Sentry incorporates GPU acceleration of cryptographic hash constructions such as Merkle trees and lattice hashing, implementing memory optimizations and resource partitioning schemes for high throughput. Our evaluations show that Sentry is a practical solution for bringing authenticity to machine learning systems, achieving orders of magnitude speedup over a CPU-based baseline.
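The abstract names the Merkle tree as one of the hash constructions Sentry accelerates. As a rough, CPU-only illustration of that idea (not Sentry's implementation), the sketch below hashes an artifact in fixed-size chunks and folds the chunk hashes into a single root. The function names `merkle_root` and `verify_artifact`, the choice of SHA-256, the chunk size, and the `0x00`/`0x01` domain-separation prefixes are all assumptions made for the example. In a full scheme the root would then be signed with the developer's key; that step is omitted here.

```python
import hashlib
import hmac


def _h(data: bytes) -> bytes:
    """SHA-256 digest (an assumed hash choice, not taken from the paper)."""
    return hashlib.sha256(data).digest()


def merkle_root(artifact: bytes, chunk_size: int = 1 << 20) -> bytes:
    """Return a Merkle root over fixed-size chunks of `artifact`.

    Leaves are hashed with a 0x00 prefix and internal nodes with a 0x01
    prefix so that a leaf can never be confused with an internal node.
    """
    chunks = [artifact[i:i + chunk_size]
              for i in range(0, len(artifact), chunk_size)] or [b""]
    level = [_h(b"\x00" + chunk) for chunk in chunks]

    while len(level) > 1:
        next_level = []
        # Hash adjacent pairs into their parent node.
        for i in range(0, len(level) - 1, 2):
            next_level.append(_h(b"\x01" + level[i] + level[i + 1]))
        # An unpaired last node is promoted to the next level unchanged.
        if len(level) % 2:
            next_level.append(level[-1])
        level = next_level
    return level[0]


def verify_artifact(artifact: bytes, expected_root: bytes,
                    chunk_size: int = 1 << 20) -> bool:
    """Recompute the root and compare it in constant time."""
    return hmac.compare_digest(merkle_root(artifact, chunk_size), expected_root)


if __name__ == "__main__":
    weights = bytes(range(256)) * 64          # stand-in for a model file
    root = merkle_root(weights, chunk_size=4096)
    print(verify_artifact(weights, root, chunk_size=4096))
    tampered = weights[:-1] + b"\x00"         # flip the final byte
    print(verify_artifact(tampered, root, chunk_size=4096))
```

Because each leaf depends only on its own chunk, the leaf hashes can be computed independently of one another, which is the property that makes this construction a natural fit for parallel hardware.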

