Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units

April 18, 2023

Mohammed E. Elbtity, Brendan Reidy, Md Hasibul Amin, Ramtin Zand

Tensor processing units (TPUs), specialized hardware accelerators for machine learning, deliver significant performance improvements when executing the convolutional layers of convolutional neural networks (CNNs). They struggle to maintain the same efficiency on fully connected (FC) layers, however, which leads to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, by contrast, have demonstrated notable speedups on FC layers. This paper introduces a heterogeneous, mixed-signal, mixed-precision architecture that integrates an IMAC unit with an edge TPU to improve mobile CNN performance. To exploit the strengths of TPUs for convolutional layers and of IMAC circuits for dense layers, we propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when models are deployed on the TPU-IMAC architecture. Simulations show that, across various CNN models, the TPU-IMAC configuration achieves up to a $2.59\times$ performance improvement and an $88\%$ memory reduction compared to conventional TPU architectures while maintaining comparable accuracy. The TPU-IMAC architecture shows potential for applications where energy efficiency and high performance are both essential, such as edge computing and real-time processing on mobile devices. The unified training algorithm and the integration of IMAC and TPU architectures underpin the potential impact of this research on the broader machine learning landscape.
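The layer split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Layer` type, the helper functions, the toy layer sizes, and the 8-bit and 2-bit weight widths are all hypothetical choices made for illustration, so the resulting figure is unrelated to the $88\%$ reported in the paper. The sketch routes convolutional layers to the TPU and dense layers to the IMAC unit, then compares weight storage for a uniform-precision baseline against a mixed-precision assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical layer descriptor used only for this illustration."""
    name: str
    kind: str          # "conv" or "fc"
    num_weights: int


def assign_unit(layer: Layer) -> str:
    """Route convolutional layers to the TPU and dense (FC) layers to the IMAC unit."""
    return "TPU" if layer.kind == "conv" else "IMAC"


def partition(layers):
    """Group layer names by the compute unit they are assigned to."""
    plan = {"TPU": [], "IMAC": []}
    for layer in layers:
        plan[assign_unit(layer)].append(layer.name)
    return plan


def weight_bits(layers, conv_bits: int, fc_bits: int) -> int:
    """Total weight storage in bits for the given per-layer-type precisions."""
    return sum(
        layer.num_weights * (conv_bits if layer.kind == "conv" else fc_bits)
        for layer in layers
    )


# Toy model with made-up sizes (assumption, not taken from the paper).
toy_model = [
    Layer("conv1", "conv", 1_000),
    Layer("conv2", "conv", 3_000),
    Layer("fc1", "fc", 12_000),
]

plan = partition(toy_model)

# Baseline: every layer stored at 8 bits (assumed TPU-only deployment).
baseline_bits = weight_bits(toy_model, conv_bits=8, fc_bits=8)

# Mixed precision: FC weights stored at an assumed 2 bits on the IMAC side.
mixed_bits = weight_bits(toy_model, conv_bits=8, fc_bits=2)

reduction = 1 - mixed_bits / baseline_bits

print(plan)
print(f"weight-memory reduction: {reduction:.1%}")
```

Because FC layers usually hold most of a CNN's weights, lowering their precision dominates the overall saving, which is the intuition behind pairing mixed-precision training with the IMAC unit.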

