Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP - 专知论文

会员服务 ·

0

Performer · 优化器 · GPUs · Analysis · 同质 ·

Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP

翻译：暂无翻译

Ghazal Rahimi,Victor Lopez,Marc Clascà,Joan Vinyals Ylla Català,Jesus Labarta,Marta Garcia-Gasulla

The increasing adoption of heterogeneous platforms that combine CPUs with accelerators such as GPUs in high-performance computing (HPC) introduces new challenges for performance analysis and optimization. Traditional efficiency metrics, such as those proposed by the Performance Optimization and Productivity (POP) Center of Excellence, were designed primarily for homogeneous CPU-based systems and therefore, do not capture the complex interactions between host and device resources. In this work, we extend the POP efficiency framework to heterogeneous architectures by introducing a new hierarchy of metrics that separately evaluate host and device efficiency. On the host side, we quantify the effectiveness of hybrid execution and offloading operations. On the device side, we propose a multiplicative hierarchy analogous to the host hierarchy and define its Parallel Efficiency branch. Beyond their definition and formulation, we present the implementation of these metrics in the TALP module of the DLB library. TALP is a lightweight monitoring library that provides measurements both post mortem and at runtime, with outputs available in textual and machine-readable formats. We validate the proposed framework through synthetic benchmarks and three production HPC applications, demonstrating how the metrics expose inefficiencies in offloading, load balance, and orchestration. Results show that the extended TALP metrics provide actionable insights to guide developers in optimizing heterogeneous HPC codes.

翻译：暂无翻译

0

相关内容

Performer

《多机器人系统协作效能提升：基于模型与数据驱动的具身智能方法》339页

《多机器人系统协作效能提升：基于模型与数据驱动的具身智能方法》339页

专知会员服务

60+阅读 · 2025年4月6日

【Nature Machine Intelligence】大规模多智能体系统的高效强化学习

【Nature Machine Intelligence】大规模多智能体系统的高效强化学习

专知会员服务

45+阅读 · 2024年9月7日

【阿姆斯特丹博士论文】GPU图算法性能分析与预测，227页pdf

【阿姆斯特丹博士论文】GPU图算法性能分析与预测，227页pdf

专知会员服务

40+阅读 · 2023年4月10日

《用于计算机系统的人工智能增强设计空间探索的机器学习》哥伦比亚大学2022最新博士论文

《用于计算机系统的人工智能增强设计空间探索的机器学习》哥伦比亚大学2022最新博士论文

专知会员服务

16+阅读 · 2022年6月6日

FPGA加速系统开发工具设计:综述与实践

FPGA加速系统开发工具设计:综述与实践

专知会员服务

69+阅读 · 2020年6月24日

【斯坦福】机器学习优化简明导论， Introduction to Optimization for Machine Learning

【斯坦福】机器学习优化简明导论， Introduction to Optimization for Machine Learning

专知会员服务

93+阅读 · 2020年5月6日

【AAAI2020-清华大学】高效的异构协同过滤推荐（Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation），张敏，马少平等

【AAAI2020-清华大学】高效的异构协同过滤推荐（Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation），张敏，马少平等

专知会员服务

61+阅读 · 2019年11月22日

【中科院计算所】边缘计算与工具综述论文，A Survey on Edge Computing Systems and Tools

【中科院计算所】边缘计算与工具综述论文，A Survey on Edge Computing Systems and Tools

专知会员服务

96+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

【Nature Medicine】人工智能与医学结合的最新综述，附13页PDF

专知会员服务

107+阅读 · 2019年1月7日

初学者系列：Attentional Factorization Machines（AFM）详解

初学者系列：Attentional Factorization Machines（AFM）详解

专知

82+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

DeepMind综述深度强化学习中的快与慢，智能体应该像人一样学习

DeepMind综述深度强化学习中的快与慢，智能体应该像人一样学习

机器之心

20+阅读 · 2019年5月3日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

硬件加速神经网络综述

硬件加速神经网络综述

计算机研究与发展

26+阅读 · 2019年2月1日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

陈天奇团队推出开源AI芯片栈VTA，降低芯片设计门槛

陈天奇团队推出开源AI芯片栈VTA，降低芯片设计门槛

AI前线

15+阅读 · 2018年7月13日

IBM新论文|SamplePairing：针对图像处理领域的高效数据增强方式

IBM新论文|SamplePairing：针对图像处理领域的高效数据增强方式

极市平台

16+阅读 · 2018年1月20日

【下载】面向Open AI, TensorFlow, Keras的强化学习书籍《Reinforcement Learning》

【下载】面向Open AI, TensorFlow, Keras的强化学习书籍《Reinforcement Learning》

专知

27+阅读 · 2017年12月17日

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

12+阅读 · 2017年11月30日

电场调制增强型AlGaN/GaN HEMT关键技术研究

国家自然科学基金

0+阅读 · 2017年12月31日

面向数万处理器的有限元线性方程组与模态多级算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

大信号及宽带调制信号激励下AlGaN/GaN HEMT功率器件行为模型建模方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

嵌入式异构多核系统应用程序自动并行化过程关键技术研究

国家自然科学基金

1+阅读 · 2015年12月31日

面向高性能异构众核架构的大规模CFD并行算法与应用

国家自然科学基金

0+阅读 · 2015年12月31日

面向可重构多核处理器系统的分层次自适应优化机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向存储受限应用的GPU性能预测模型和通信优化关键技术研究

国家自然科学基金

2+阅读 · 2015年12月31日

带有通信量化和延时的多智能体系统一致性研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向大数据的高时效并行计算机系统结构与技术

国家自然科学基金

0+阅读 · 2014年12月31日

CPU和GPU混合体系结构上生物网络比对并行算法研究

国家自然科学基金

0+阅读 · 2014年12月31日

Coordinated Power Management on Heterogeneous Systems

Arxiv

0+阅读 · 4月23日

A Total Lagrangian Finite Element Framework for Multibody Dynamics: Part II -- GPU Implementation and Numerical Experiments

Arxiv

0+阅读 · 4月21日

EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems

Arxiv

0+阅读 · 4月19日

Dynamic Multi-Robot Task Allocation under Uncertainty and Communication Constraints: A Game-Theoretic Approach

Arxiv

0+阅读 · 4月13日

Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading

Arxiv

0+阅读 · 4月9日

Hardware-Level Governance of AI Compute: A Feasibility Taxonomy for Regulatory Compliance and Treaty Verification

Arxiv

0+阅读 · 4月6日

A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems

Arxiv

0+阅读 · 4月2日

Differentiable Initialization-Accelerated CPU-GPU Hybrid Combinatorial Scheduling

Arxiv

0+阅读 · 3月30日

Agentic Trust Coordination for Federated Learning through Adaptive Thresholding and Autonomous Decision Making in Sustainable and Resilient Industrial Networks

Arxiv

0+阅读 · 3月26日

Decentralized Task Scheduling in Distributed Systems: A Deep Reinforcement Learning Approach

Arxiv

0+阅读 · 3月25日

VIP会员

文章信息

相关主题

最新内容

DeepSeek 版Claude Code，免费小白安装教程来了！

DeepSeek 版Claude Code，免费小白安装教程来了！

专知会员服务

6+阅读 · 5月5日

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

专知会员服务

2+阅读 · 5月5日

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

专知会员服务

2+阅读 · 5月5日

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

专知会员服务

4+阅读 · 5月5日

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

专知会员服务

5+阅读 · 5月5日

《美空军条令出版物 2-0：情报（2026版）》

《美空军条令出版物 2-0：情报（2026版）》

专知会员服务

12+阅读 · 5月5日

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

专知会员服务

4+阅读 · 5月5日

帕兰提尔 Gotham：一个游戏规则改变器

帕兰提尔 Gotham：一个游戏规则改变器

专知会员服务

5+阅读 · 5月5日

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

专知会员服务

2+阅读 · 5月5日

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

专知会员服务

2+阅读 · 5月5日

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

专知会员服务

8+阅读 · 5月4日

【综述】机器人学习中的世界模型：全面综述

【综述】机器人学习中的世界模型：全面综述

专知会员服务

11+阅读 · 5月4日

伊朗的导弹-无人机行动及其对美国威慑的影响

伊朗的导弹-无人机行动及其对美国威慑的影响

专知会员服务

8+阅读 · 5月4日

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

专知会员服务

8+阅读 · 5月4日

战争贩子：2026年第一季度美国对中东潜在军售激增

战争贩子：2026年第一季度美国对中东潜在军售激增

专知会员服务

6+阅读 · 5月4日

相关VIP内容

《多机器人系统协作效能提升：基于模型与数据驱动的具身智能方法》339页

《多机器人系统协作效能提升：基于模型与数据驱动的具身智能方法》339页

专知会员服务

60+阅读 · 2025年4月6日

【Nature Machine Intelligence】大规模多智能体系统的高效强化学习

【Nature Machine Intelligence】大规模多智能体系统的高效强化学习

专知会员服务

45+阅读 · 2024年9月7日

【阿姆斯特丹博士论文】GPU图算法性能分析与预测，227页pdf

【阿姆斯特丹博士论文】GPU图算法性能分析与预测，227页pdf

专知会员服务

40+阅读 · 2023年4月10日

《用于计算机系统的人工智能增强设计空间探索的机器学习》哥伦比亚大学2022最新博士论文

《用于计算机系统的人工智能增强设计空间探索的机器学习》哥伦比亚大学2022最新博士论文

专知会员服务

16+阅读 · 2022年6月6日

FPGA加速系统开发工具设计:综述与实践

FPGA加速系统开发工具设计:综述与实践

专知会员服务

69+阅读 · 2020年6月24日

【斯坦福】机器学习优化简明导论， Introduction to Optimization for Machine Learning

【斯坦福】机器学习优化简明导论， Introduction to Optimization for Machine Learning

专知会员服务

93+阅读 · 2020年5月6日

【AAAI2020-清华大学】高效的异构协同过滤推荐（Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation），张敏，马少平等

【AAAI2020-清华大学】高效的异构协同过滤推荐（Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation），张敏，马少平等

专知会员服务

61+阅读 · 2019年11月22日

【中科院计算所】边缘计算与工具综述论文，A Survey on Edge Computing Systems and Tools

【中科院计算所】边缘计算与工具综述论文，A Survey on Edge Computing Systems and Tools

专知会员服务

96+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

【Nature Medicine】人工智能与医学结合的最新综述，附13页PDF

专知会员服务

107+阅读 · 2019年1月7日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

DeepSeek 版Claude Code，免费小白安装教程来了！

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

相关资讯

初学者系列：Attentional Factorization Machines（AFM）详解

初学者系列：Attentional Factorization Machines（AFM）详解

专知

82+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

DeepMind综述深度强化学习中的快与慢，智能体应该像人一样学习

DeepMind综述深度强化学习中的快与慢，智能体应该像人一样学习

机器之心

20+阅读 · 2019年5月3日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

硬件加速神经网络综述

硬件加速神经网络综述

计算机研究与发展

26+阅读 · 2019年2月1日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

陈天奇团队推出开源AI芯片栈VTA，降低芯片设计门槛

陈天奇团队推出开源AI芯片栈VTA，降低芯片设计门槛

AI前线

15+阅读 · 2018年7月13日

IBM新论文|SamplePairing：针对图像处理领域的高效数据增强方式

IBM新论文|SamplePairing：针对图像处理领域的高效数据增强方式

极市平台

16+阅读 · 2018年1月20日

【下载】面向Open AI, TensorFlow, Keras的强化学习书籍《Reinforcement Learning》

【下载】面向Open AI, TensorFlow, Keras的强化学习书籍《Reinforcement Learning》

专知

27+阅读 · 2017年12月17日

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

12+阅读 · 2017年11月30日

相关论文

Coordinated Power Management on Heterogeneous Systems

Arxiv

0+阅读 · 4月23日

A Total Lagrangian Finite Element Framework for Multibody Dynamics: Part II -- GPU Implementation and Numerical Experiments

Arxiv

0+阅读 · 4月21日

EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems

Arxiv

0+阅读 · 4月19日

Dynamic Multi-Robot Task Allocation under Uncertainty and Communication Constraints: A Game-Theoretic Approach

Arxiv

0+阅读 · 4月13日

Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading

Arxiv

0+阅读 · 4月9日

Hardware-Level Governance of AI Compute: A Feasibility Taxonomy for Regulatory Compliance and Treaty Verification

Arxiv

0+阅读 · 4月6日

A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems

Arxiv

0+阅读 · 4月2日

Differentiable Initialization-Accelerated CPU-GPU Hybrid Combinatorial Scheduling

Arxiv

0+阅读 · 3月30日

Agentic Trust Coordination for Federated Learning through Adaptive Thresholding and Autonomous Decision Making in Sustainable and Resilient Industrial Networks

Arxiv

0+阅读 · 3月26日

Decentralized Task Scheduling in Distributed Systems: A Deep Reinforcement Learning Approach

Arxiv

0+阅读 · 3月25日

相关基金

电场调制增强型AlGaN/GaN HEMT关键技术研究

国家自然科学基金

0+阅读 · 2017年12月31日

面向数万处理器的有限元线性方程组与模态多级算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

大信号及宽带调制信号激励下AlGaN/GaN HEMT功率器件行为模型建模方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

嵌入式异构多核系统应用程序自动并行化过程关键技术研究

国家自然科学基金

1+阅读 · 2015年12月31日

面向高性能异构众核架构的大规模CFD并行算法与应用

国家自然科学基金

0+阅读 · 2015年12月31日

面向可重构多核处理器系统的分层次自适应优化机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向存储受限应用的GPU性能预测模型和通信优化关键技术研究

国家自然科学基金

2+阅读 · 2015年12月31日

带有通信量化和延时的多智能体系统一致性研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向大数据的高时效并行计算机系统结构与技术

国家自然科学基金

0+阅读 · 2014年12月31日

CPU和GPU混合体系结构上生物网络比对并行算法研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员