NVBleed: Covert and Side-Channel Attacks on NVIDIA Multi-GPU Interconnect

Multi-GPU systems are becoming increasingly important in high-performance computing (HPC) and cloud infrastructure, providing acceleration for data-intensive applications, including machine learning workloads. These systems consist of multiple GPUs interconnected through high-speed links such as NVIDIA's NVLink. In this work, we explore whether the interconnect on such systems can serve as a novel source of leakage, enabling new forms of covert and side-channel attacks. Specifically, we reverse engineer the operation of NVLink and identify two primary sources of leakage: timing variations due to contention, and accessible performance counters that disclose communication patterns. The leakage is visible remotely and even across VM instances in the cloud, enabling potentially dangerous attacks. Building on these observations, we develop two types of covert-channel attacks across two GPUs, achieving a bandwidth of over 70 Kbps with an error rate of 4.78% for the contention channel. We also develop two end-to-end cross-GPU side-channel attacks: application fingerprinting (covering 18 high-performance computing and deep learning applications) and 3D graphics character identification within Blender, a multi-GPU rendering application. These attacks are highly effective, achieving F1 scores of up to 97.78% and 91.56%, respectively. Surprisingly, we find that the leakage also occurs across virtual machines on the Google Cloud Platform (GCP), and we demonstrate a side-channel attack on Blender in that setting, achieving F1 scores exceeding 88%. Finally, we explore potential defenses, such as managing access to counters and reducing the resolution of the clock, to mitigate the two sources of leakage.
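To make the contention-channel figures above more concrete, the following is a minimal sketch of how a receiver might turn latency samples into bits and how an error rate such as the one quoted could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names `decode_bits` and `bit_error_rate`, the midpoint threshold, the fixed `samples_per_bit` window, and the sample latencies are all hypothetical, and no real NVLink measurement is performed.

```python
from statistics import median


def decode_bits(latencies, samples_per_bit):
    """Decode bits from receiver-side latency samples (hypothetical scheme).

    Assumption: the sender creates contention to signal a 1 (high latency)
    and stays idle to signal a 0 (low latency).
    """
    if not latencies or samples_per_bit <= 0:
        raise ValueError("need latency samples and a positive window size")
    # Midpoint between the fastest and slowest sample serves as the threshold.
    threshold = (min(latencies) + max(latencies)) / 2
    bits = []
    # Walk over complete windows only; a trailing partial window is ignored.
    for start in range(0, len(latencies) - samples_per_bit + 1, samples_per_bit):
        window = latencies[start:start + samples_per_bit]
        # The median makes each bit decision robust to a single outlier sample.
        bits.append(1 if median(window) > threshold else 0)
    return bits


def bit_error_rate(sent, received):
    """Fraction of bit positions where the received bit differs from the sent bit."""
    if not sent or len(sent) != len(received):
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


# Made-up latency samples (arbitrary units), three samples per transmitted bit.
samples = [100, 102, 101, 300, 310, 305, 99, 100, 101, 298, 100, 301]
decoded = decode_bits(samples, samples_per_bit=3)
```

In this sketch the last window contains one low outlier (100) but is still decoded as a 1 because the median is used. A real receiver would also need synchronization with the sender, which is omitted here.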

