Deploying proprietary Deep Neural Networks (DNNs) on commodity edge devices demands hardware-backed Digital Rights Management (DRM) capable of withstanding both software-level and physical adversaries. In Unified Memory Architecture (UMA) systems, the host CPU and Neural Processing Unit (NPU) share physical DRAM, leaving plaintext model weights directly readable by a compromised OS kernel. Existing defenses fail in this constrained setting: trusted execution environments monopolize scarce memory with permanently reserved regions, while full-memory encryption operates at page granularity, forcing the system to fetch full 4 KB pages for sub-page tensor tiles and severely crippling bandwidth. We present Tessera, a reference architecture for inline, cache-line-granularity weight decryption on UMA edge accelerators. The design intercepts 64-byte AXI bursts, computing AES-256-CTR keystreams in parallel with DRAM fetches. Plaintext streams directly into isolated NPU SRAM, creating a transient memory footprint confined to the active tile and eliminating the need for permanent memory carve-outs. Measurements across three distinct SoC platforms demonstrate that this parallelization hides cryptographic latency behind standard DRAM fetch times, a condition that holds even under worst-case timing variations. Consequently, Tessera is projected to achieve 98.4\% of the theoretical memory bandwidth ceiling (a mere 1.6\% overhead). Across standard vision and language models, page-level memory encryption suffers up to a 32x bandwidth penalty, whereas Tessera maintains a 1x fetch footprint for all layer geometries. Finally, Tessera neutralizes major UMA-specific attack vectors -- including physical DRAM extraction, rogue DMA, and compute hijacking -- and formally prevents plaintext leakage even for sparse tensors.
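The key property behind the latency hiding is that a CTR-mode keystream depends only on the key and the counter, never on the ciphertext, so the keystream for a cache line can be computed while the DRAM fetch for that line is still in flight. The minimal sketch below illustrates this for one 64-byte burst; the address-derived counter layout and the all-zero key are illustrative assumptions, not the paper's actual key schedule.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

const lineSize = 64 // one 64-byte AXI burst / cache line

// keystreamForLine derives the AES-256-CTR keystream for the cache line
// at physical address addr. The keystream is a function of (key, addr)
// only, so it can be generated in parallel with the DRAM fetch.
// Counter layout (line address / 16 in the low 8 IV bytes) is an
// illustrative assumption.
func keystreamForLine(block cipher.Block, addr uint64) []byte {
	iv := make([]byte, aes.BlockSize)
	binary.BigEndian.PutUint64(iv[8:], addr/aes.BlockSize) // block counter
	ks := make([]byte, lineSize)
	cipher.NewCTR(block, iv).XORKeyStream(ks, ks) // keystream = CTR over zeros
	return ks
}

func main() {
	key := make([]byte, 32) // AES-256 key (all-zero for the sketch)
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}

	addr := uint64(0x1000)          // line-aligned physical address
	plain := make([]byte, lineSize) // one tile's worth of weight bytes
	for i := range plain {
		plain[i] = byte(i)
	}

	// DRAM holds ciphertext: plaintext XOR the line's keystream.
	ks := keystreamForLine(block, addr)
	ct := make([]byte, lineSize)
	for i := range ct {
		ct[i] = plain[i] ^ ks[i]
	}

	// Inline decrypt on fetch: the keystream was precomputed from the
	// address alone, so decryption is a single XOR as the burst arrives,
	// streaming plaintext into NPU SRAM.
	out := make([]byte, lineSize)
	for i := range out {
		out[i] = ct[i] ^ keystreamForLine(block, addr)[i]
	}
	fmt.Println("recovered:", bytes.Equal(out, plain))
}
```

Because each 64-byte line maps to its own counter value, lines can be decrypted independently and in any order, which is what keeps the footprint at exactly the fetched tile rather than a full page.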