Although DeepSeek has achieved strong inference performance in recent years, it remains hard to deploy on energy-constrained edge devices. This paper presents the DeepSeek Processing Element (DSPE), an edge-oriented architecture that alleviates the model's heavy computational and energy demands. DSPE combines three techniques: the MerkleTree-based Incremental Pruning Scheme (MIPS) for secure redundant-vector reduction, the Multi-Stage Boothing Lookup Method (MBLM) for bit-flip-aware approximate multiplication, and the Dynamic Adaptive Posit Processing Mechanism (DAPPM), which defines a new DA-Posit format and its corresponding hardware multiplication architecture. Implemented in TSMC 28 nm CMOS, DSPE achieves an energy efficiency of 109.4 TFLOPS/W in comparisons with state-of-the-art designs and offers a scalable foundation for edge deployment.
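For readers unfamiliar with the arithmetic that a Booth-based multiplier builds on, the following is a minimal sketch of textbook radix-4 Booth recoding. It is not the MBLM scheme itself, whose staging, lookup tables, and approximation strategy are not described in this abstract; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical and chosen only for illustration. The sketch assumes a two's-complement multiplier that fits in the given bit width.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a signed multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first, such that
    sum(d * 4**i) equals the multiplier. Assumes the multiplier fits in
    `bits` bits of two's complement (illustrative assumption).
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width so the bits pair up
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        # Standard radix-4 Booth digit: -2*b[2i+1] + b[2i] + b[2i-1]
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Exact product formed by summing Booth-selected partial products."""
    total = 0
    for i, d in enumerate(booth_radix4_digits(multiplier, bits)):
        # Each digit selects 0, +/-1x, or +/-2x the multiplicand, weighted by 4**i.
        total += d * multiplicand * (4 ** i)
    return total
```

Radix-4 recoding halves the number of partial products relative to a bit-serial multiplier, and zero digits let a partial product be skipped entirely, which is the usual reason Booth recoding appears in energy-sensitive multiplier designs.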