Transformer-based speech enhancement models yield impressive results. However, their heterogeneous and complex structure restricts model compression potential, resulting in greater complexity and reduced hardware efficiency. Additionally, these models are not tailored for streaming and low-power applications. To address these challenges, this paper proposes a low-power streaming speech enhancement accelerator through model and hardware optimization. The proposed high-performance model is optimized for hardware execution by co-designing model compression with the target application, reducing model size by 93.9\% through the proposed domain-aware and streaming-aware pruning techniques. The required latency is further reduced with batch-normalization-based transformers. Additionally, we adopt softmax-free attention, complemented by an extra batch normalization, enabling a simpler hardware design. The tailored hardware accommodates these diverse computing patterns by decomposing them into element-wise multiply-accumulate (MAC) operations, executed on a 1-D processing array with configurable SRAM addressing, which minimizes hardware complexity and simplifies zero skipping. Implemented in the TSMC 40nm CMOS process, the final design requires only 207.8K gates and 53.75KB of SRAM, and consumes just 8.08 mW for real-time inference at a 62.5MHz clock frequency.
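To illustrate the softmax-free attention with an extra batch normalization mentioned above, the following is a minimal sketch, not the paper's exact formulation (which the abstract does not specify): it assumes a ReLU feature map in place of softmax and applies a `BatchNorm1d` to the attention output, so that the whole block reduces to plain multiply-accumulate operations.

```python
import torch
import torch.nn as nn

class SoftmaxFreeAttention(nn.Module):
    """Hypothetical single-head softmax-free attention sketch.

    Assumptions (not from the abstract): softmax is replaced with a
    ReLU-activated score matrix, and the extra batch normalization is a
    BatchNorm1d applied over the channel dimension of the output.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Extra BN standing in for softmax's normalization effect
        self.bn = nn.BatchNorm1d(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # ReLU scores instead of softmax: only MACs and a max(0, .)
        scores = torch.relu(q @ k.transpose(-2, -1)) / x.shape[-1] ** 0.5
        out = scores @ v
        # BatchNorm1d expects (batch, channels, seq_len)
        return self.bn(out.transpose(1, 2)).transpose(1, 2)

x = torch.randn(2, 10, 16)       # (batch, frames, features)
y = SoftmaxFreeAttention(16)(x)  # same shape as input
```

Removing the exponential and the row-wise division of softmax is what lets the computation be mapped onto a uniform MAC datapath; the BN is a per-channel affine scale at inference time, which folds into adjacent layers.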