Dynamically Reconfigurable Variable-precision Sparse-Dense Matrix Acceleration in Tensorflow Lite - 专知论文

会员服务 ·

0

动态可重构 · 可重构 · 稀疏 · 精度 · TensorFlow ·

2023 年 4 月 17 日

Dynamically Reconfigurable Variable-precision Sparse-Dense Matrix Acceleration in Tensorflow Lite

翻译：TensorFlow Lite中动态可重构变精度稀疏-稠密矩阵加速

Jose Nunez-Yanez,Andres Otero,Eduardo de la Torre

In this paper, we present a dynamically reconfigurable hardware accelerator called FADES (Fused Architecture for DEnse and Sparse matrices). The FADES design offers multiple configuration options that trade off parallelism and complexity using a dataflow model to create four stages that read, compute, scale and write results. FADES is mapped to the programmable logic (PL) and integrated with the TensorFlow Lite inference engine running on the processing system (PS) of a heterogeneous SoC device. The accelerator is used to compute the tensor operations, while the dynamically reconfigurable approach can be used to switch precision between int8 and float modes. This dynamic reconfiguration enables better performance by allowing more cores to be mapped to the resource-constrained device and lower power consumption compared with supporting both arithmetic precisions simultaneously. We compare the proposed hardware with a high-performance systolic architecture for dense matrices obtaining 25% better performance in dense mode with half the DSP blocks in the same technology. In sparse mode, we show that the core can outperform dense mode even at low sparsity levels, and a single-core achieves up to 20x acceleration over the software-optimized NEON RUY library.

翻译：本文提出了一种名为FADES（稠密与稀疏矩阵融合架构）的动态可重构硬件加速器。FADES设计提供多种配置选项，通过数据流模型在并行度和复杂度之间进行权衡，构建了四个阶段用于读取、计算、缩放和输出结果。FADES被映射到可编程逻辑（PL）中，并与运行于异构SoC器件处理系统（PS）上的TensorFlow Lite推理引擎集成。该加速器负责执行张量运算，而动态可重构方法可在int8与float模式间切换精度。与同时支持两种算术精度相比，这种动态重构通过将更多计算核映射到资源受限器件上实现更优性能，并降低功耗。我们将所提硬件与高性能脉动阵列架构进行对比，在相同工艺下，稠密模式性能提升25%，且DSP块使用量减半。在稀疏模式下，即使稀疏度较低，该计算核仍能超越稠密模式性能，单核相比软件优化的NEON RUY库实现最高20倍加速。

0

相关内容

动态可重构

动态可重构

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

专知会员服务

21+阅读 · 2022年3月18日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

如何加速深度神经网络计算效率？看NVIDIA-ISSCC2021教程，附Slides与视频

如何加速深度神经网络计算效率？看NVIDIA-ISSCC2021教程，附Slides与视频

专知会员服务

34+阅读 · 2021年3月25日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【SIGMOD2020-CMU】在内存中搜索树的顺序保持键压缩，Order-Preserving Key Compression for In-Memory Search Trees

【SIGMOD2020-CMU】在内存中搜索树的顺序保持键压缩，Order-Preserving Key Compression for In-Memory Search Trees

专知会员服务

15+阅读 · 2020年3月7日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

59+阅读 · 2020年1月25日

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

专知会员服务

61+阅读 · 2019年12月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

不再让CPU和总线拖后腿：Exafunction让GPU跑的更快！

不再让CPU和总线拖后腿：Exafunction让GPU跑的更快！

机器之心

0+阅读 · 2022年10月7日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

TensorFlow官方发布剪枝优化工具：参数减少80%，精度几乎不变

TensorFlow官方发布剪枝优化工具：参数减少80%，精度几乎不变

量子位

11+阅读 · 2019年5月15日

基于PyTorch/TorchText的自然语言处理库

基于PyTorch/TorchText的自然语言处理库

专知

28+阅读 · 2019年4月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

AI/ML/DNN硬件加速设计怎么入门？

AI/ML/DNN硬件加速设计怎么入门？

StarryHeavensAbove

11+阅读 · 2018年12月4日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

【推荐】用Python/OpenCV实现增强现实

【推荐】用Python/OpenCV实现增强现实

机器学习研究会

15+阅读 · 2017年11月16日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

基于MEMS工艺的高性能石英加速度计谐振结构特征研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于GPU的脉冲星宽带观测的相干消色散研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向雷达信号处理动态可重构处理器中阵列路由结构和数据缓存机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

CPU/GPGPU紧耦合异构多核系统共享Last Level Cache优化研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于MAP的低复杂度LDPC译码算法理论和方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于多核处理器的高性能深度数据包检测技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

矩阵分解的低延迟并行算法

国家自然科学基金

0+阅读 · 2009年12月31日

天体测量中的高并行图像处理方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

星载非共线TDI CCD成像数据几何处理算法研究

国家自然科学基金

0+阅读 · 2009年12月31日

Toward Foundation Models for Earth Monitoring: Generalizable Deep Learning Models for Natural Hazard Segmentation

Arxiv

0+阅读 · 2023年6月1日

Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity

Arxiv

0+阅读 · 2023年6月1日

TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

Arxiv

0+阅读 · 2023年5月31日

Integrated multi-operand optical neurons for scalable and hardware-efficient deep learning

Arxiv

0+阅读 · 2023年5月31日

DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator

Arxiv

0+阅读 · 2023年5月31日

DäRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation

Arxiv

0+阅读 · 2023年5月30日

Are We There Yet? Product Quantization and its Hardware Acceleration

Arxiv

0+阅读 · 2023年5月25日

Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators

Arxiv

0+阅读 · 2023年5月24日

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Arxiv

11+阅读 · 2023年3月5日

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Arxiv

11+阅读 · 2019年10月30日

VIP会员

文章信息

相关主题

动态可重构

最新内容

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

5+阅读 · 6月16日

多模态代码智能综述：从视觉输入到可执行代码系统

多模态代码智能综述：从视觉输入到可执行代码系统

专知会员服务

5+阅读 · 6月16日

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

专知会员服务

5+阅读 · 6月16日

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

专知会员服务

5+阅读 · 6月16日

《通用大语言模型：无人机指挥与控制接口》最新40页

《通用大语言模型：无人机指挥与控制接口》最新40页

专知会员服务

15+阅读 · 6月16日

《通过小型无人机系统将情报能力“作战化”》

《通过小型无人机系统将情报能力“作战化”》

专知会员服务

6+阅读 · 6月16日

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

专知会员服务

10+阅读 · 6月16日

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

专知会员服务

21+阅读 · 6月15日

消耗优势：美军的“精确规模化”概念

消耗优势：美军的“精确规模化”概念

专知会员服务

8+阅读 · 6月15日

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

专知会员服务

9+阅读 · 6月15日

《网络空间兵棋推演：挑战、局限性与混合路径》报告

《网络空间兵棋推演：挑战、局限性与混合路径》报告

专知会员服务

9+阅读 · 6月15日

《离线语言支持系统：面向空战战术决策》

《离线语言支持系统：面向空战战术决策》

专知会员服务

10+阅读 · 6月15日

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

专知会员服务

9+阅读 · 6月15日

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

专知会员服务

6+阅读 · 6月14日

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

专知会员服务

6+阅读 · 6月14日

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

专知会员服务

21+阅读 · 2022年3月18日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

如何加速深度神经网络计算效率？看NVIDIA-ISSCC2021教程，附Slides与视频

如何加速深度神经网络计算效率？看NVIDIA-ISSCC2021教程，附Slides与视频

专知会员服务

34+阅读 · 2021年3月25日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【SIGMOD2020-CMU】在内存中搜索树的顺序保持键压缩，Order-Preserving Key Compression for In-Memory Search Trees

【SIGMOD2020-CMU】在内存中搜索树的顺序保持键压缩，Order-Preserving Key Compression for In-Memory Search Trees

专知会员服务

15+阅读 · 2020年3月7日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

59+阅读 · 2020年1月25日

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

专知会员服务

61+阅读 · 2019年12月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

多模态代码智能综述：从视觉输入到可执行代码系统

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

相关资讯

不再让CPU和总线拖后腿：Exafunction让GPU跑的更快！

不再让CPU和总线拖后腿：Exafunction让GPU跑的更快！

机器之心

0+阅读 · 2022年10月7日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

TensorFlow官方发布剪枝优化工具：参数减少80%，精度几乎不变

TensorFlow官方发布剪枝优化工具：参数减少80%，精度几乎不变

量子位

11+阅读 · 2019年5月15日

基于PyTorch/TorchText的自然语言处理库

基于PyTorch/TorchText的自然语言处理库

专知

28+阅读 · 2019年4月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

AI/ML/DNN硬件加速设计怎么入门？

AI/ML/DNN硬件加速设计怎么入门？

StarryHeavensAbove

11+阅读 · 2018年12月4日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

【推荐】用Python/OpenCV实现增强现实

【推荐】用Python/OpenCV实现增强现实

机器学习研究会

15+阅读 · 2017年11月16日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

相关论文

Toward Foundation Models for Earth Monitoring: Generalizable Deep Learning Models for Natural Hazard Segmentation

Arxiv

0+阅读 · 2023年6月1日

Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity

Arxiv

0+阅读 · 2023年6月1日

TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

Arxiv

0+阅读 · 2023年5月31日

Integrated multi-operand optical neurons for scalable and hardware-efficient deep learning

Arxiv

0+阅读 · 2023年5月31日

DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator

Arxiv

0+阅读 · 2023年5月31日

DäRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation

Arxiv

0+阅读 · 2023年5月30日

Are We There Yet? Product Quantization and its Hardware Acceleration

Arxiv

0+阅读 · 2023年5月25日

Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators

Arxiv

0+阅读 · 2023年5月24日

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Arxiv

11+阅读 · 2023年3月5日

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Arxiv

11+阅读 · 2019年10月30日

相关基金

基于MEMS工艺的高性能石英加速度计谐振结构特征研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于GPU的脉冲星宽带观测的相干消色散研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向雷达信号处理动态可重构处理器中阵列路由结构和数据缓存机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

CPU/GPGPU紧耦合异构多核系统共享Last Level Cache优化研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于MAP的低复杂度LDPC译码算法理论和方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于多核处理器的高性能深度数据包检测技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

矩阵分解的低延迟并行算法

国家自然科学基金

0+阅读 · 2009年12月31日

天体测量中的高并行图像处理方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

星载非共线TDI CCD成像数据几何处理算法研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员