As edge devices gain stronger computing power, deploying high-performance DNN models on untrusted hardware has become a practical way to cut inference latency and protect user data privacy. Given the high cost of model training and the demands of user experience, balancing model privacy against low runtime overhead is critical. Trusted execution environments (TEEs) offer a viable defense, and prior work has proposed heterogeneous GPU-TEE inference frameworks that use parameter obfuscation to balance efficiency and confidentiality. However, recent studies show that partial obfuscation defenses are ineffective, while robust schemes incur unacceptable latency. To resolve these issues, we propose ConvShatter, a novel obfuscation scheme that achieves low latency and high accuracy while preserving model confidentiality and integrity. It leverages the linearity of convolution to decompose kernels into critical and common ones, to inject confounding decoys, and to permute channel and kernel orders. Before deployment, it performs kernel decomposition, decoy injection, and order obfuscation, storing a minimal set of recovery parameters securely in the TEE. During inference, the TEE reconstructs the outputs of the obfuscated convolutional layers. Extensive experiments show that ConvShatter substantially reduces latency overhead with strong security guarantees: it cuts overhead by 16% relative to GroupCover while maintaining accuracy on par with the original model.
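The decomposition and order obfuscation described above rest on two properties: convolution is linear in its kernel, so a kernel split into parts yields outputs that sum back to the original, and a permutation of kernel order can be undone with a small inverse-permutation table kept in the TEE. A minimal numpy sketch of these two properties, using a toy 1x1 convolution (all shapes, names, and the random split are illustrative, not ConvShatter's actual construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1x1 convolution: out[o, h, w] = sum_c W[o, c] * x[c, h, w]
C_in, C_out, H, Wd = 4, 6, 5, 5
x = rng.standard_normal((C_in, H, Wd))
W = rng.standard_normal((C_out, C_in))

def conv1x1(weight, inp):
    return np.einsum('oc,chw->ohw', weight, inp)

# Linearity: split the kernel into a "critical" part and a "common" part
# (here just a random split); the two partial outputs sum to the original.
W_critical = rng.standard_normal((C_out, C_in))
W_common = W - W_critical

y_full = conv1x1(W, x)
y_split = conv1x1(W_critical, x) + conv1x1(W_common, x)
assert np.allclose(y_full, y_split)

# Order obfuscation: the untrusted GPU computes on kernels in permuted
# order; the TEE stores only the inverse permutation as a recovery
# parameter and restores the original channel order at inference time.
perm = rng.permutation(C_out)
inv_perm = np.argsort(perm)
y_obf = conv1x1(W[perm], x)   # computed outside the TEE
y_rec = y_obf[inv_perm]       # TEE un-permutes the output channels
assert np.allclose(y_rec, y_full)
```

The recovery state per layer is only the permutation indices and the kernel-split bookkeeping, which is what keeps the TEE-side memory and compute footprint small.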