CUTh-Solver: GPU-Accelerated Sparse Matrix Solver for High-Resolution Thermal Simulation of 3D ICs - 专知论文

会员服务 ·

0

稀疏 · 3D · Storage · ICS · 评论员 ·

CUTh-Solver: GPU-Accelerated Sparse Matrix Solver for High-Resolution Thermal Simulation of 3D ICs

翻译：暂无翻译

Chenghan Wang,Zhen Zhuang,Shui Jiang,Siyuan Liang,Xiaoman Yang,Kai Zhu,Darong Huang,Luis Costero,Rongmei Chen,Tsung-Wei Huang,David Atienza,Tsung-Yi Ho

Coarse-grained thermal simulation tends to underestimate localized thermal issues, potentially missing critical hotspots. Accurate analysis, therefore, demands fine-grained information, which dramatically increases grid resolution and thus computational workload. Fortunately, the coefficient matrices are often sparse with regular sparsity patterns, offering optimization opportunities. However, existing general-purpose matrix solvers on GPUs rarely exploit these domain-specific properties, thereby encountering bottlenecks in data storage, memory access, parallelism, computational efficiency, and hardware utilization. Therefore, we propose CUTh-Solver, a co-designed GPU-accelerated Preconditioned Conjugate Gradient (PCG)-based sparse solver framework for Symmetric Positive Definite (SPD) systems arising from high-resolution steady-state and transient 3D IC thermal simulation. For data storage, CUTh-Solver condenses the Diagonal (DIA) storage format to remove redundancy. To optimize the memory access, CUTh-Solver employs diagonal-wise SpMV to achieve coalesced memory access. We further observe a critical conflict between parallelism and preconditioning quality and thus adopt a high-parallelism preconditioning strategy. To improve computational efficiency and hardware utilization, we employ an adaptive fine-grained mixed-precision strategy that leverages diverse floating-point units to avoid resource contention, enhancing throughput without compromising numerical stability. Experimental results show that CUTh-Solver achieves up to 25.8x speedup over GPU-accelerated COMSOL Multiphysics 6.4 and over 3x speedup over NVIDIA's native general-purpose libraries (AmgX, cuSPARSE, cuDSS). Ablation studies validate the individual contribution of each optimization. The code is available at: https://github.com/Chenghan-Wang/CUTh-Solver

翻译：暂无翻译

0

相关内容

ICLR 2025 | 多模态大模型总"胡说八道"？「定位-修正」实现生成过程的幻觉抑制

ICLR 2025 | 多模态大模型总"胡说八道"？「定位-修正」实现生成过程的幻觉抑制

专知会员服务

17+阅读 · 2025年3月26日

最高9.0分！这16篇最高分ICLR2025论文必看！从生成模型到MOE等

最高9.0分！这16篇最高分ICLR2025论文必看！从生成模型到MOE等

专知会员服务

26+阅读 · 2024年11月19日

【CVPR 2022】深度安全多视图聚类:降低因视图增加而导致聚类性能下降的风险，Deep Safe Multi-view Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase

【CVPR 2022】深度安全多视图聚类:降低因视图增加而导致聚类性能下降的风险，Deep Safe Multi-view Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase

专知会员服务

10+阅读 · 2022年3月12日

【ICML2021】基于经典迭代算法的图神经网络

专知会员服务

30+阅读 · 2021年5月21日

干货！南京大学吴建鑫教授《模式识别》2021课程，附课件下载

干货！南京大学吴建鑫教授《模式识别》2021课程，附课件下载

专知会员服务

74+阅读 · 2021年4月14日

图像增强领域大突破！以1.66ms的速度处理4K图像，港理工提出图像自适应的3DLUT

专知会员服务

17+阅读 · 2020年9月25日

【何恺明团队新论文】PointRend:将图像分割视作渲染问题，性能显著提升！

【何恺明团队新论文】PointRend:将图像分割视作渲染问题，性能显著提升！

专知会员服务

28+阅读 · 2019年12月19日

【清华大学】自动微分蒙特卡洛，理论与应用，Automatic Differentiable Monte Carlo: Theory and Application (附pdf）

专知会员服务

28+阅读 · 2019年11月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【泡泡图灵智库】使用平面特征IMU-Kinect融合SLAM的退化情况检测与补偿

【泡泡图灵智库】使用平面特征IMU-Kinect融合SLAM的退化情况检测与补偿

泡泡机器人SLAM

13+阅读 · 2019年9月20日

【泡泡图灵智库】基于上采样预积分测量值的3D Lidar-IMU校准来矫正运动失真

【泡泡图灵智库】基于上采样预积分测量值的3D Lidar-IMU校准来矫正运动失真

泡泡机器人SLAM

11+阅读 · 2019年9月17日

初学者系列：Attentional Factorization Machines（AFM）详解

初学者系列：Attentional Factorization Machines（AFM）详解

专知

82+阅读 · 2019年9月16日

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

机器之心

15+阅读 · 2019年7月13日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

【泡泡点云时空】SpiderCNN：利用参数化卷积滤波进行点集深度学习（ECCV2018-13）

【泡泡点云时空】SpiderCNN：利用参数化卷积滤波进行点集深度学习（ECCV2018-13）

泡泡机器人SLAM

10+阅读 · 2018年11月8日

PyTorch 中使用深度学习（CNN和LSTM）的自动图像捕获

PyTorch 中使用深度学习（CNN和LSTM）的自动图像捕获

AI研习社

40+阅读 · 2018年9月21日

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

AI研习社

13+阅读 · 2018年8月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

高精度片上抖动测量关键技术及电路实现研究

国家自然科学基金

0+阅读 · 2015年12月31日

集成电路中电热耦合建模理论及高效数值方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

2D/3D视觉信息融合仿生SLAM关键问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

大信号及宽带调制信号激励下AlGaN/GaN HEMT功率器件行为模型建模方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

大变形高固溶Mg含量Al-Mg合金的纳、微米混晶组织形成及强塑性同时提高机制

国家自然科学基金

0+阅读 · 2015年12月31日

可压缩多介质流体的真正多维高保真算法

国家自然科学基金

0+阅读 · 2015年12月31日

使用GPU加速银道面尘埃辐射图像的高分辨率模拟与多参数反演

国家自然科学基金

0+阅读 · 2015年12月31日

超高速CMOS数模转换器关键技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

复杂腔体上电磁散射大波数问题非协调元逼近及加速技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

AlGaN/GaN MIS-HEMT器件在质子辐射下的退化机理，寿命预测模型与加固技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

Multi-Head Attention-Based Feature Extractor Integration with Soft Actor-Critic for Porosity Prediction and Process Parameter Optimization in Additive Manufacturing

Arxiv

0+阅读 · 6月18日

Robust Convex Model Predictive Control with collision avoidance guarantees for robot manipulators

Arxiv

0+阅读 · 6月18日

Parallelizing Large-Scale Tensor Network Contraction on Multiple GPUs

Arxiv

0+阅读 · 6月1日

A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

Arxiv

0+阅读 · 5月28日

One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration

Arxiv

0+阅读 · 5月20日

ParamSpMM: Adaptive and Efficient Sparse Matrix-Matrix Multiplication on GPUs for GNNs

Arxiv

0+阅读 · 5月15日

Co-Design Optimization for Data Center Cooling System via Digital Twin

Arxiv

0+阅读 · 5月15日

Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs

Arxiv

0+阅读 · 5月7日

FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers

Arxiv

0+阅读 · 5月5日

Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation

Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation

Arxiv

25+阅读 · 2019年10月30日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

3+阅读 · 6月22日

综述 | 3D场景图：开放挑战与未来方向

综述 | 3D场景图：开放挑战与未来方向

专知会员服务

4+阅读 · 6月22日

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

专知会员服务

5+阅读 · 6月22日

21世纪的无人机战争

21世纪的无人机战争

专知会员服务

4+阅读 · 6月22日

《伊朗与以色列-美国热战及其对数字技术的影响》

《伊朗与以色列-美国热战及其对数字技术的影响》

专知会员服务

4+阅读 · 6月22日

《量子技术的军事任务技术适配与利用》

《量子技术的军事任务技术适配与利用》

专知会员服务

4+阅读 · 6月22日

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

专知会员服务

4+阅读 · 6月22日

美国从乌克兰无人机战争中学习经验

美国从乌克兰无人机战争中学习经验

专知会员服务

7+阅读 · 6月21日

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

专知会员服务

5+阅读 · 6月21日

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

专知会员服务

8+阅读 · 6月21日

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

21+阅读 · 6月20日

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

5+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

8+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

7+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

9+阅读 · 6月18日

相关VIP内容

ICLR 2025 | 多模态大模型总"胡说八道"？「定位-修正」实现生成过程的幻觉抑制

ICLR 2025 | 多模态大模型总"胡说八道"？「定位-修正」实现生成过程的幻觉抑制

专知会员服务

17+阅读 · 2025年3月26日

最高9.0分！这16篇最高分ICLR2025论文必看！从生成模型到MOE等

最高9.0分！这16篇最高分ICLR2025论文必看！从生成模型到MOE等

专知会员服务

26+阅读 · 2024年11月19日

【CVPR 2022】深度安全多视图聚类:降低因视图增加而导致聚类性能下降的风险，Deep Safe Multi-view Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase

【CVPR 2022】深度安全多视图聚类:降低因视图增加而导致聚类性能下降的风险，Deep Safe Multi-view Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase

专知会员服务

10+阅读 · 2022年3月12日

【ICML2021】基于经典迭代算法的图神经网络

专知会员服务

30+阅读 · 2021年5月21日

干货！南京大学吴建鑫教授《模式识别》2021课程，附课件下载

干货！南京大学吴建鑫教授《模式识别》2021课程，附课件下载

专知会员服务

74+阅读 · 2021年4月14日

图像增强领域大突破！以1.66ms的速度处理4K图像，港理工提出图像自适应的3DLUT

专知会员服务

17+阅读 · 2020年9月25日

【何恺明团队新论文】PointRend:将图像分割视作渲染问题，性能显著提升！

【何恺明团队新论文】PointRend:将图像分割视作渲染问题，性能显著提升！

专知会员服务

28+阅读 · 2019年12月19日

【清华大学】自动微分蒙特卡洛，理论与应用，Automatic Differentiable Monte Carlo: Theory and Application (附pdf）

专知会员服务

28+阅读 · 2019年11月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 3D场景图：开放挑战与未来方向

21世纪的无人机战争

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

相关资讯

【泡泡图灵智库】使用平面特征IMU-Kinect融合SLAM的退化情况检测与补偿

【泡泡图灵智库】使用平面特征IMU-Kinect融合SLAM的退化情况检测与补偿

泡泡机器人SLAM

13+阅读 · 2019年9月20日

【泡泡图灵智库】基于上采样预积分测量值的3D Lidar-IMU校准来矫正运动失真

【泡泡图灵智库】基于上采样预积分测量值的3D Lidar-IMU校准来矫正运动失真

泡泡机器人SLAM

11+阅读 · 2019年9月17日

初学者系列：Attentional Factorization Machines（AFM）详解

初学者系列：Attentional Factorization Machines（AFM）详解

专知

82+阅读 · 2019年9月16日

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

机器之心

15+阅读 · 2019年7月13日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

【泡泡点云时空】SpiderCNN：利用参数化卷积滤波进行点集深度学习（ECCV2018-13）

【泡泡点云时空】SpiderCNN：利用参数化卷积滤波进行点集深度学习（ECCV2018-13）

泡泡机器人SLAM

10+阅读 · 2018年11月8日

PyTorch 中使用深度学习（CNN和LSTM）的自动图像捕获

PyTorch 中使用深度学习（CNN和LSTM）的自动图像捕获

AI研习社

40+阅读 · 2018年9月21日

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

AI研习社

13+阅读 · 2018年8月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

相关论文

Multi-Head Attention-Based Feature Extractor Integration with Soft Actor-Critic for Porosity Prediction and Process Parameter Optimization in Additive Manufacturing

Arxiv

0+阅读 · 6月18日

Robust Convex Model Predictive Control with collision avoidance guarantees for robot manipulators

Arxiv

0+阅读 · 6月18日

Parallelizing Large-Scale Tensor Network Contraction on Multiple GPUs

Arxiv

0+阅读 · 6月1日

A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

Arxiv

0+阅读 · 5月28日

One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration

Arxiv

0+阅读 · 5月20日

ParamSpMM: Adaptive and Efficient Sparse Matrix-Matrix Multiplication on GPUs for GNNs

Arxiv

0+阅读 · 5月15日

Co-Design Optimization for Data Center Cooling System via Digital Twin

Arxiv

0+阅读 · 5月15日

Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs

Arxiv

0+阅读 · 5月7日

FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers

Arxiv

0+阅读 · 5月5日

Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation

Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation

Arxiv

25+阅读 · 2019年10月30日

相关基金

高精度片上抖动测量关键技术及电路实现研究

国家自然科学基金

0+阅读 · 2015年12月31日

集成电路中电热耦合建模理论及高效数值方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

2D/3D视觉信息融合仿生SLAM关键问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

大信号及宽带调制信号激励下AlGaN/GaN HEMT功率器件行为模型建模方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

大变形高固溶Mg含量Al-Mg合金的纳、微米混晶组织形成及强塑性同时提高机制

国家自然科学基金

0+阅读 · 2015年12月31日

可压缩多介质流体的真正多维高保真算法

国家自然科学基金

0+阅读 · 2015年12月31日

使用GPU加速银道面尘埃辐射图像的高分辨率模拟与多参数反演

国家自然科学基金

0+阅读 · 2015年12月31日

超高速CMOS数模转换器关键技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

复杂腔体上电磁散射大波数问题非协调元逼近及加速技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

AlGaN/GaN MIS-HEMT器件在质子辐射下的退化机理，寿命预测模型与加固技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员