Characterizing the Impact of NVFP4 Quantization for Low-Power Edge AI Deployment


Ovishake Sen, Venkata Nithin Kamineni, Daniel Lobo, Swarup Bhunia, Rickard Ewetz, Baibhab Chatterjee

From arXiv, 7 pages

Energy-efficient neural-network inference at the edge requires reducing arithmetic cost, memory traffic, computation energy, and storage overhead while maintaining acceptable accuracy. This paper presents an ablation-focused study of NVFP4 quantization for edge-efficient neural networks, with emphasis on the relationship between activation precision, weight precision, block-size scaling, retraining, and model accuracy. NVFP4 activations are represented using 4-bit FP4 data, an FP8 block scale, and an FP32 tensor scale, enabling ultra-low-precision inference while preserving activation dynamic range. A block-size ablation over six edge-efficient models shows that block size B = 16 provides a practical accuracy/storage trade-off, requiring only 4.5078 bits per input for N = 4096. A weight-precision ablation further shows that FP8 and FP16 weights provide only modest gains over FP4 weights under the same NVFP4 activation path, suggesting that activation quantization and scaling dominate much of the accuracy behavior. To isolate the benefit of the NVFP4 data type, this work compares conventional unscaled FP4 activation inference with NVFP4 activation inference, both with and without retraining. The results show that conventional FP4 inference collapses accuracy for most compact models, while NVFP4 without retraining already recovers substantial accuracy by restoring activation dynamic range through FP8 block scaling and FP32 tensor scaling. When combined with retraining, NVFP4 achieves the best accuracy across the evaluated models, demonstrating the effectiveness of scaling-aware FP4 (NVFP4) inference. These findings provide general design guidance for hardware-software co-design of low-power edge inference across a broad range of accelerator platforms, including GPUs, Tensor Cores, FPGAs, domain-specific AI accelerators, near-memory computing systems, and emerging edge-computing architectures.
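The storage figure and the dynamic-range argument in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the 4.5078 bits per input is the 4-bit payload plus one 8-bit scale amortized over B elements and one 32-bit scale amortized over N elements (4 + 8/B + 32/N), and that FP4 uses the E2M1 value grid. The function names are hypothetical, the block scale is kept in full precision rather than rounded to FP8, and the FP32 tensor scale is left out of the quantizer for brevity.

```python
import math

# Magnitudes representable by FP4 in the E2M1 layout (assumed here; sign handled separately).
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = 6.0


def bits_per_element(n, block_size, data_bits=4, block_scale_bits=8, tensor_scale_bits=32):
    """Average storage per element: payload + amortized block scale + amortized tensor scale."""
    return data_bits + block_scale_bits / block_size + tensor_scale_bits / n


def round_to_fp4(x):
    """Round x to the nearest grid magnitude, saturating at +/-6.

    Exact ties resolve to the smaller magnitude in this sketch.
    """
    mag = min(abs(x), FP4_MAX)
    nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def fake_quant_unscaled(values):
    """Quantize-dequantize with plain FP4 and no scaling."""
    return [round_to_fp4(v) for v in values]


def fake_quant_block_scaled(values, block_size=16):
    """Quantize-dequantize with one scale per block.

    Simplification: the scale stays in full precision (not FP8) and there is
    no separate tensor-level scale.
    """
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the largest magnitude in the block onto the largest FP4 value.
        scale = amax / FP4_MAX if amax > 0 else 1.0
        out.extend(round_to_fp4(v / scale) * scale for v in block)
    return out


if __name__ == "__main__":
    print(bits_per_element(4096, 16))  # 4.5078125
    small = [0.01, 0.02, -0.04, 0.04]
    print(fake_quant_unscaled(small))
    print(fake_quant_block_scaled(small))
```

With small-magnitude inputs such as those in the usage lines, the unscaled path rounds every value to zero, while the block-scaled path keeps them distinguishable. This is the kind of dynamic-range loss and recovery the abstract attributes to unscaled FP4 and NVFP4 respectively.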


