Backend enrichment is now widely deployed in sensitive domains such as product recommendation, healthcare, and finance. In these settings, models are trained on confidential data and retrieve private features whose values influence inference behavior while remaining hidden from the API caller. This paper presents the first hardware-level data-stealing attack on backend retrieval, showing that accelerator optimizations designed for performance can directly undermine data confidentiality and bypass state-of-the-art privacy defenses. Our attack, FEATUREBLEED, exploits zero-skipping in AI accelerators to infer private backend-retrieved features solely from end-to-end timing, without relying on power analysis, DVFS manipulation, or shared-cache side channels. We evaluate FEATUREBLEED on three datasets spanning medical and non-medical domains: Texas-100X (clinical records), OrganAMNIST (medical imaging), and Census-19 (socioeconomic data). We further evaluate it across three hardware backends (Intel AVX, Intel AMX, and NVIDIA A100) and three model architectures (DNNs, CNNs, and hybrid CNN-MLP pipelines), demonstrating that the leakage generalizes across CPU and GPU accelerators, data modalities, and application domains, with an adversarial advantage of up to 98.87 percentage points. Finally, we identify the root cause of the leakage as sparsity-driven zero-skipping in modern hardware and quantify the privacy-performance-power trade-off: disabling zero-skipping increases Intel AMX per-operation energy by up to 25 percent and incurs 100 percent performance overhead. We propose a padding-based defense that masks timing leakage by equalizing response times to the worst-case execution time, achieving protection with only 7.24 percent average performance overhead and no additional power cost.
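The padding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pad_to_budget`, its parameters, and the choice to pad in software at the serving layer are all assumptions made for illustration. The sketch runs a request handler and then waits out the remainder of a fixed time budget, so a caller measuring end-to-end latency sees roughly the same duration whether the data-dependent work finished early or late.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def pad_to_budget(
    handler: Callable[[], T],
    budget_s: float,
    clock: Callable[[], float] = time.perf_counter,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `handler`, then wait until `budget_s` seconds have elapsed.

    Hypothetical helper: `budget_s` is assumed to be an upper bound on the
    handler's worst-case execution time. `clock` and `sleep` are injectable
    so the behavior can be checked without real delays.
    """
    start = clock()
    result = handler()  # data-dependent work, e.g. inference with zero-skipping
    remaining = budget_s - (clock() - start)
    if remaining > 0:
        # Hide how early the handler finished by padding up to the budget.
        sleep(remaining)
    # If the handler overran the budget, nothing is padded and the overrun
    # stays observable, so the budget must truly cover the worst case.
    return result
```

A caller would wrap each inference request, for example `pad_to_budget(lambda: model(x), budget_s=0.010)`, where `model`, `x`, and the 10 ms budget are placeholders. The budget has to be measured on the deployed hardware. Software sleeps also carry scheduler jitter, so this sketch narrows the timing signal and does not guarantee perfectly constant latency.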