SigDLA: A Deep Learning Accelerator Extension for Signal Processing

Deep learning and signal processing are closely coupled in many IoT scenarios, such as anomaly detection, where together they enable the intelligence of things. Many IoT processors rely on digital signal processors (DSPs) for signal processing and build their deep learning frameworks on top of them. However, deep learning is usually far more compute-intensive than signal processing, and its efficiency on DSPs is limited by the lack of native hardware support. We therefore take the opposite approach and enable signal processing on top of a classical deep learning accelerator (DLA). Observing that irregular data patterns, such as the butterfly operations in FFT, are the major barrier to deploying signal processing on DLAs, we propose a programmable data shuffling fabric inserted between the input buffer and the computing array of the DLA, so that irregular data is reorganized and the processing becomes regular. With online data shuffling, the proposed architecture, SigDLA, adapts to various signal processing tasks without affecting deep learning processing. We also build a reconfigurable computing array to suit the differing data width requirements of signal processing and deep learning. In our experiments, SigDLA achieves average speedups of 4.4$\times$, 1.4$\times$, and 1.52$\times$ and average energy reductions of 4.82$\times$, 3.27$\times$, and 2.15$\times$ compared with an embedded ARM processor with customized DSP instructions, a DSP processor, and an independent DSP-DLA architecture, respectively, at the cost of 17% more chip area than the original DLA.
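
To make the butterfly-shuffling idea concrete, the following is a minimal software sketch, not the SigDLA hardware design. It assumes a radix-2 decimation-in-time FFT and models each stage as three steps: gather the irregularly strided butterfly operands into dense streams (the "shuffle"), apply the same multiply-add to every element (the "regular" part a computing array could handle), and scatter the results back. The function names `bit_reverse_indices`, `stage_pairs`, `shuffled_fft`, and `naive_dft` are hypothetical and chosen only for illustration.

```python
import cmath


def bit_reverse_indices(n):
    """Return the bit-reversal permutation for a power-of-two length n."""
    bits = n.bit_length() - 1
    out = []
    for i in range(n):
        r = 0
        for k in range(bits):
            if (i >> k) & 1:
                r |= 1 << (bits - 1 - k)
        out.append(r)
    return out


def stage_pairs(n, half):
    """List (top, bottom, twiddle_index) for one stage with butterfly span `half`."""
    pairs = []
    for start in range(0, n, 2 * half):
        for k in range(half):
            pairs.append((start + k, start + k + half, k))
    return pairs


def shuffled_fft(x):
    """Radix-2 FFT where every stage is gather -> uniform compute -> scatter."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Initial shuffle: bit-reversal reordering of the input.
    data = [complex(x[i]) for i in bit_reverse_indices(n)]
    half = 1
    while half < n:
        pairs = stage_pairs(n, half)
        # Shuffle step: gather strided operands into dense, aligned streams.
        tops = [data[t] for t, _, _ in pairs]
        bots = [data[b] for _, b, _ in pairs]
        twiddles = [cmath.exp(-2j * cmath.pi * k / (2 * half)) for _, _, k in pairs]
        # Regular step: identical element-wise arithmetic on every lane.
        prods = [w * b for w, b in zip(twiddles, bots)]
        sums = [a + p for a, p in zip(tops, prods)]
        diffs = [a - p for a, p in zip(tops, prods)]
        # Inverse shuffle: scatter results back to their butterfly positions.
        for (t, b, _), s, d in zip(pairs, sums, diffs):
            data[t] = s
            data[b] = d
        half *= 2
    return data


def naive_dft(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

In this sketch the only stage-dependent part is the index list produced by `stage_pairs`; the arithmetic between the gather and the scatter is the same for every stage, which is the property that makes the computation regular once the data has been reordered.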
