Clutch: High Performance Vector-Scalar Comparison using DRAM via Chunked Temporal Coding - 专知论文

会员服务 ·

0

Performer · 代码 · 向量化 · 标量 · 决策树 ·

Clutch: High Performance Vector-Scalar Comparison using DRAM via Chunked Temporal Coding

翻译：暂无翻译

Daichi Tokuda,Tatsuya Kubo,Ismail Emir Yuksel,Ataberk Olgun,Haocong Luo,Tomoya Nagatani,Geraldo F. Oliveira,Abdullah Giray Yağlıkçı,Mohammad Sadrosadati,Onur Mutlu,Shinya Takamaeda-Yamazaki

from arxiv, To appear at ICS 2026 (July 2026)

Vector-scalar comparison is a fundamental computation primitive that compares each element in a vector against a single scalar value. It is widely used in various data-intensive workloads from databases to machine learning. Due to its low computational intensity, its execution tends to be memory-bound, limiting the utilization of compute resources. Processing-using-DRAM (PuD) is an emerging computing paradigm that performs massively parallel bitwise operations directly inside DRAM arrays, alleviating off-chip data movement. Existing PuD-based approaches require many DRAM commands because the comparison's algorithmic complexity grows with operand bit-width in the bit-serial execution model. This command overhead becomes the dominant bottleneck, limiting application-level speedup. We propose Clutch, a data representation and comparison algorithm that accelerates vector-scalar comparisons in PuD systems with high efficiency and scalability. Clutch first uses temporal coding, encoding each vector value as a sequence of leading ones, which enables lookup-based comparison against a scalar by accessing the corresponding DRAM row. To avoid the prohibitive memory footprint of lookup tables at high precision, Clutch partitions operands into multiple multi-bit chunks, compares chunks independently using compact lookup tables, and merges the per-chunk results with a PuD-efficient procedure. By adjusting the number of chunks, Clutch provides a flexible tradeoff between throughput and memory usage. Across predicate evaluation and decision tree inference, Clutch improves end-to-end application throughput and energy efficiency by an average of 12x and 69x over highly optimized CPU and GPU execution, and by 2.9x and 3.0x over the state-of-the-art bit-serial PuD implementation. We also present the first mapping of decision tree inference to PuD execution, extending PuD to a new application domain.

翻译：暂无翻译

0

相关内容

Performer

PALANTIR GOTHAM平台：人工智能赋能作战

PALANTIR GOTHAM平台：人工智能赋能作战

专知会员服务

41+阅读 · 5月17日

【ICCV2025】CL-Splats：结合局部优化的高斯泼洒持续学习方法

【ICCV2025】CL-Splats：结合局部优化的高斯泼洒持续学习方法

专知会员服务

11+阅读 · 2025年6月27日

【ICML 2024】零阶优化器微调大模型，大幅降低内存

【ICML 2024】零阶优化器微调大模型，大幅降低内存

专知会员服务

32+阅读 · 2024年7月8日

【ICML2022】Branchformer:并行MLP-Attention架构，捕捉局部和全局上下文，用于语音识别和理解

【ICML2022】Branchformer:并行MLP-Attention架构，捕捉局部和全局上下文，用于语音识别和理解

专知会员服务

25+阅读 · 2022年7月8日

清华49页长文全方位分析参数高效微调方案Delta Tuning，揭秘大模型背后的机理

清华49页长文全方位分析参数高效微调方案Delta Tuning，揭秘大模型背后的机理

专知会员服务

50+阅读 · 2022年4月8日

替换Transformer！谷歌提出 Performer 模型，全面提升注意力机制！

替换Transformer！谷歌提出 Performer 模型，全面提升注意力机制！

专知会员服务

43+阅读 · 2020年10月29日

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

专知会员服务

36+阅读 · 2020年4月14日

【ICCV 2019 Tutorial】Holistic 3D Reconstruction: Learning to Reconstruct Holistic 3D Structures from Sensorial Data（整体3D重建：学习从感官数据重建整体3D结构），宾夕法尼亚州立大学 Zihan Zhou，西蒙弗雷泽大学计算机科学系 Yasutaka Furukawa，UCB 马毅

【ICCV 2019 Tutorial】Holistic 3D Reconstruction: Learning to Reconstruct Holistic 3D Structures from Sensorial Data（整体3D重建：学习从感官数据重建整体3D结构），宾夕法尼亚州立大学 Zihan Zhou，西蒙弗雷泽大学计算机科学系 Yasutaka Furukawa，UCB 马毅

专知会员服务

29+阅读 · 2019年10月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

PyTorch 深度剖析：如何使用模型并行技术（Model Parallel）

PyTorch 深度剖析：如何使用模型并行技术（Model Parallel）

极市平台

11+阅读 · 2021年11月18日

初学者系列：Attentional Factorization Machines（AFM）详解

初学者系列：Attentional Factorization Machines（AFM）详解

专知

82+阅读 · 2019年9月16日

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

机器之心

15+阅读 · 2019年7月13日

【泡泡点云时空】跟踪与三角测量中一种通过兴趣点网络进行多视图2D/3D刚性配准的方法

【泡泡点云时空】跟踪与三角测量中一种通过兴趣点网络进行多视图2D/3D刚性配准的方法

泡泡机器人SLAM

17+阅读 · 2019年7月8日

Perseus-BERT——业内性能极致优化的BERT训练方案

Perseus-BERT——业内性能极致优化的BERT训练方案

云栖社区

15+阅读 · 2019年2月20日

综述：Image Caption 任务之语句多样性

综述：Image Caption 任务之语句多样性

PaperWeekly

22+阅读 · 2018年11月30日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

手把手教 | 深度学习库PyTorch（附代码）

手把手教 | 深度学习库PyTorch（附代码）

数据派THU

27+阅读 · 2018年3月15日

YesOfCourse团队在Kaggle文本匹配竞赛中获得优异成绩

YesOfCourse团队在Kaggle文本匹配竞赛中获得优异成绩

中国科学院网络数据重点实验室

10+阅读 · 2017年6月15日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

异构众核处理器非对称片上互连网络研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向数万处理器的有限元线性方程组与模态多级算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

高速冲击破碎问题的Hamilton粒子重构单元方法

国家自然科学基金

0+阅读 · 2015年12月31日

具有重构特征的系统可靠性建模方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向高性能异构众核架构的大规模CFD并行算法与应用

国家自然科学基金

0+阅读 · 2015年12月31日

基于虚拟原型的信息物理融合系统高效可信构造研究

国家自然科学基金

8+阅读 · 2015年12月31日

面向视觉质量的高效立体视频编码资源分配优化研究

国家自然科学基金

0+阅读 · 2015年12月31日

可与MPSoC高度融合的片上自主测试-自主修复关键技术研究：针对自然、人为可靠性威胁

国家自然科学基金

0+阅读 · 2015年12月31日

面向大数据计算的高吞吐量众核处理器关键技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

高维复杂结构数据降维

国家自然科学基金

10+阅读 · 2014年12月31日

Task Vector Bases: A Unified and Scalable Framework for Compressed Task Arithmetic

Arxiv

0+阅读 · 6月23日

Collapsed Effective Operators for Higher-order Structures

Collapsed Effective Operators for Higher-order Structures

Arxiv

0+阅读 · 6月22日

BatchGen: An Architecture for Scalable and Efficient Batch Inference

Arxiv

0+阅读 · 6月19日

CLAR: Learning 3D Representations for Robotic Manipulation by Fusing Masked Reconstruction with Multi-Level Contrastive Alignment

Arxiv

0+阅读 · 6月19日

ClayBuddy: A Framework, Evaluation, & Mitigation of Coding Agent Failures

Arxiv

0+阅读 · 6月19日

Randomized Sketching is Robust to Low-Precision Rounding on GPUs

Arxiv

0+阅读 · 6月18日

SCAN: Enhance Time Series Anomaly Detection via Multi-Scale Neighborhood-Centered Clustering

Arxiv

0+阅读 · 6月17日

LiveStack: OS Support for Cluster-Scale Full-Stack Live Simulation

Arxiv

0+阅读 · 6月17日

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

Arxiv

0+阅读 · 6月16日

A skew polynomial framework for constructing division algebras and linear maximum rank distance codes

Arxiv

0+阅读 · 6月16日

VIP会员

文章信息

相关主题

最新内容

无人机自主控制与人工智能：系统性综述

无人机自主控制与人工智能：系统性综述

专知会员服务

10+阅读 · 今天7:25

巡飞弹与反无人机系统——现代战场的两大支柱

巡飞弹与反无人机系统——现代战场的两大支柱

专知会员服务

3+阅读 · 今天6:54

《打造“黄金舰队”》57页报告

《打造“黄金舰队”》57页报告

专知会员服务

3+阅读 · 今天6:52

《北约数字教官网络发展路径》128页报告

《北约数字教官网络发展路径》128页报告

专知会员服务

2+阅读 · 今天6:33

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

7+阅读 · 6月25日

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

6+阅读 · 6月25日

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

10+阅读 · 6月25日

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

8+阅读 · 6月25日

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

8+阅读 · 6月25日

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

8+阅读 · 6月25日

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

10+阅读 · 6月25日

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

9+阅读 · 6月25日

《国防领域敏感性分析白皮书》

《国防领域敏感性分析白皮书》

专知会员服务

9+阅读 · 6月25日

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

10+阅读 · 6月24日

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

10+阅读 · 6月24日

相关VIP内容

PALANTIR GOTHAM平台：人工智能赋能作战

PALANTIR GOTHAM平台：人工智能赋能作战

专知会员服务

41+阅读 · 5月17日

【ICCV2025】CL-Splats：结合局部优化的高斯泼洒持续学习方法

【ICCV2025】CL-Splats：结合局部优化的高斯泼洒持续学习方法

专知会员服务

11+阅读 · 2025年6月27日

【ICML 2024】零阶优化器微调大模型，大幅降低内存

【ICML 2024】零阶优化器微调大模型，大幅降低内存

专知会员服务

32+阅读 · 2024年7月8日

【ICML2022】Branchformer:并行MLP-Attention架构，捕捉局部和全局上下文，用于语音识别和理解

【ICML2022】Branchformer:并行MLP-Attention架构，捕捉局部和全局上下文，用于语音识别和理解

专知会员服务

25+阅读 · 2022年7月8日

清华49页长文全方位分析参数高效微调方案Delta Tuning，揭秘大模型背后的机理

清华49页长文全方位分析参数高效微调方案Delta Tuning，揭秘大模型背后的机理

专知会员服务

50+阅读 · 2022年4月8日

替换Transformer！谷歌提出 Performer 模型，全面提升注意力机制！

替换Transformer！谷歌提出 Performer 模型，全面提升注意力机制！

专知会员服务

43+阅读 · 2020年10月29日

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

专知会员服务

36+阅读 · 2020年4月14日

【ICCV 2019 Tutorial】Holistic 3D Reconstruction: Learning to Reconstruct Holistic 3D Structures from Sensorial Data（整体3D重建：学习从感官数据重建整体3D结构），宾夕法尼亚州立大学 Zihan Zhou，西蒙弗雷泽大学计算机科学系 Yasutaka Furukawa，UCB 马毅

【ICCV 2019 Tutorial】Holistic 3D Reconstruction: Learning to Reconstruct Holistic 3D Structures from Sensorial Data（整体3D重建：学习从感官数据重建整体3D结构），宾夕法尼亚州立大学 Zihan Zhou，西蒙弗雷泽大学计算机科学系 Yasutaka Furukawa，UCB 马毅

专知会员服务

29+阅读 · 2019年10月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

巡飞弹与反无人机系统——现代战场的两大支柱

《北约数字教官网络发展路径》128页报告

无人机自主控制与人工智能：系统性综述

《打造“黄金舰队”》57页报告

相关资讯

PyTorch 深度剖析：如何使用模型并行技术（Model Parallel）

PyTorch 深度剖析：如何使用模型并行技术（Model Parallel）

极市平台

11+阅读 · 2021年11月18日

初学者系列：Attentional Factorization Machines（AFM）详解

初学者系列：Attentional Factorization Machines（AFM）详解

专知

82+阅读 · 2019年9月16日

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

机器之心

15+阅读 · 2019年7月13日

【泡泡点云时空】跟踪与三角测量中一种通过兴趣点网络进行多视图2D/3D刚性配准的方法

【泡泡点云时空】跟踪与三角测量中一种通过兴趣点网络进行多视图2D/3D刚性配准的方法

泡泡机器人SLAM

17+阅读 · 2019年7月8日

Perseus-BERT——业内性能极致优化的BERT训练方案

Perseus-BERT——业内性能极致优化的BERT训练方案

云栖社区

15+阅读 · 2019年2月20日

综述：Image Caption 任务之语句多样性

综述：Image Caption 任务之语句多样性

PaperWeekly

22+阅读 · 2018年11月30日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

手把手教 | 深度学习库PyTorch（附代码）

手把手教 | 深度学习库PyTorch（附代码）

数据派THU

27+阅读 · 2018年3月15日

YesOfCourse团队在Kaggle文本匹配竞赛中获得优异成绩

YesOfCourse团队在Kaggle文本匹配竞赛中获得优异成绩

中国科学院网络数据重点实验室

10+阅读 · 2017年6月15日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

相关论文

Task Vector Bases: A Unified and Scalable Framework for Compressed Task Arithmetic

Arxiv

0+阅读 · 6月23日

Collapsed Effective Operators for Higher-order Structures

Collapsed Effective Operators for Higher-order Structures

Arxiv

0+阅读 · 6月22日

BatchGen: An Architecture for Scalable and Efficient Batch Inference

Arxiv

0+阅读 · 6月19日

CLAR: Learning 3D Representations for Robotic Manipulation by Fusing Masked Reconstruction with Multi-Level Contrastive Alignment

Arxiv

0+阅读 · 6月19日

ClayBuddy: A Framework, Evaluation, & Mitigation of Coding Agent Failures

Arxiv

0+阅读 · 6月19日

Randomized Sketching is Robust to Low-Precision Rounding on GPUs

Arxiv

0+阅读 · 6月18日

SCAN: Enhance Time Series Anomaly Detection via Multi-Scale Neighborhood-Centered Clustering

Arxiv

0+阅读 · 6月17日

LiveStack: OS Support for Cluster-Scale Full-Stack Live Simulation

Arxiv

0+阅读 · 6月17日

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

Arxiv

0+阅读 · 6月16日

A skew polynomial framework for constructing division algebras and linear maximum rank distance codes

Arxiv

0+阅读 · 6月16日

相关基金

异构众核处理器非对称片上互连网络研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向数万处理器的有限元线性方程组与模态多级算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

高速冲击破碎问题的Hamilton粒子重构单元方法

国家自然科学基金

0+阅读 · 2015年12月31日

具有重构特征的系统可靠性建模方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向高性能异构众核架构的大规模CFD并行算法与应用

国家自然科学基金

0+阅读 · 2015年12月31日

基于虚拟原型的信息物理融合系统高效可信构造研究

国家自然科学基金

8+阅读 · 2015年12月31日

面向视觉质量的高效立体视频编码资源分配优化研究

国家自然科学基金

0+阅读 · 2015年12月31日

可与MPSoC高度融合的片上自主测试-自主修复关键技术研究：针对自然、人为可靠性威胁

国家自然科学基金

0+阅读 · 2015年12月31日

面向大数据计算的高吞吐量众核处理器关键技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

高维复杂结构数据降维

国家自然科学基金

10+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员