Deploying proprietary Deep Neural Networks (DNNs) on commodity edge devices demands hardware-backed Digital Rights Management (DRM) capable of withstanding both software-level and physical adversaries. In Unified Memory Architecture (UMA) systems, the host CPU and Neural Processing Unit (NPU) share physical DRAM, leaving plaintext model weights directly readable by a compromised OS kernel. Existing defenses fail in this constrained setting: trusted execution environments monopolize scarce memory with permanently reserved regions, while full-memory encryption operates at page granularity, forcing the system to fetch full 4 KB pages for sub-page tensor tiles and severely crippling bandwidth. We present Tessera, a reference architecture for inline, cache-line-granularity weight decryption on UMA edge accelerators. The design intercepts 64-byte AXI bursts, computing AES-256-CTR keystreams in parallel with DRAM fetches. Plaintext streams directly into isolated NPU SRAM, creating a transient memory footprint confined to the active tile and eliminating the need for permanent memory carve-outs. Measurements across three distinct SoC platforms demonstrate that this parallelization hides cryptographic latency behind standard DRAM fetch times, a condition that holds even under worst-case timing variations. Consequently, Tessera is projected to achieve 98.4\% of the theoretical memory bandwidth ceiling (a mere 1.6\% overhead). Across standard vision and language models, page-level memory encryption suffers up to a 32x bandwidth penalty, whereas Tessera maintains a 1x fetch footprint for all layer geometries. Finally, Tessera neutralizes major UMA-specific attack vectors -- including physical DRAM extraction, rogue DMA, and compute hijacking -- and formally prevents plaintext leakage even for sparse tensors.
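The key property behind the latency hiding is that a CTR-mode keystream depends only on the key and the counter, never on the ciphertext, so the keystream for a cache line can be computed while the DRAM fetch for that line is still in flight. The minimal sketch below illustrates this for one 64-byte burst; the address-derived counter layout and the all-zero key are illustrative assumptions, not the paper's actual key schedule.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

const lineSize = 64 // one 64-byte AXI burst / cache line

// keystreamForLine derives the AES-256-CTR keystream for the cache line
// at physical address addr. The keystream is a function of (key, addr)
// only, so it can be generated in parallel with the DRAM fetch.
// Counter layout (line address / 16 in the low 8 IV bytes) is an
// illustrative assumption.
func keystreamForLine(block cipher.Block, addr uint64) []byte {
	iv := make([]byte, aes.BlockSize)
	binary.BigEndian.PutUint64(iv[8:], addr/aes.BlockSize) // block counter
	ks := make([]byte, lineSize)
	cipher.NewCTR(block, iv).XORKeyStream(ks, ks) // keystream = CTR over zeros
	return ks
}

func main() {
	key := make([]byte, 32) // AES-256 key (all-zero for the sketch)
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}

	addr := uint64(0x1000)          // line-aligned physical address
	plain := make([]byte, lineSize) // one tile's worth of weight bytes
	for i := range plain {
		plain[i] = byte(i)
	}

	// DRAM holds ciphertext: plaintext XOR the line's keystream.
	ks := keystreamForLine(block, addr)
	ct := make([]byte, lineSize)
	for i := range ct {
		ct[i] = plain[i] ^ ks[i]
	}

	// Inline decrypt on fetch: the keystream was precomputed from the
	// address alone, so decryption is a single XOR as the burst arrives,
	// streaming plaintext into NPU SRAM.
	out := make([]byte, lineSize)
	for i := range out {
		out[i] = ct[i] ^ keystreamForLine(block, addr)[i]
	}
	fmt.Println("recovered:", bytes.Equal(out, plain))
}
```

Because each 64-byte line maps to its own counter value, lines can be decrypted independently and in any order, which is what keeps the footprint at exactly the fetched tile rather than a full page.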