ZeroDex: Zero-Shot Long-Horizon Dexterous Manipulation via Multi-View 3D-Grounded VLM Reasoning - 专知论文

会员服务 ·

0

3D · Ray · 讲稿 · 端到端 · MoDELS ·

ZeroDex: Zero-Shot Long-Horizon Dexterous Manipulation via Multi-View 3D-Grounded VLM Reasoning

翻译：暂无翻译

Jisoo Kim,Sangwon Baik,Taeksoo Kim,Sungjoo Kim,Junyoung Lee,Mingi Choi,Hanbyul Joo

We present ZeroDex, a zero-shot framework for long-horizon dexterous manipulation that grounds language instructions into executable 3D task plans from calibrated multi-view RGB images. Rather than training an end-to-end policy, our system uses a vision-language model (VLM) to produce reference-frame task grounding and primitive-level 2D keypoints, then lifts them into 3D via multi-view fusion. This lifting combines triangulation of view-wise VLM groundings with reference-view ray voting, which searches along a semantic camera ray for geometrically consistent candidates across neighboring views. The resulting 3D keypoints support both pick-and-place and tool-use: for tool-use, we retrieve an object-centric atomic action corresponding to the inferred skill category and align its stored 6D tool trajectory to the scene; for dexterous execution, we expand the lifted grasp keypoint into a task-conditioned grasp affordance region and generate feasible grasp-motion pairs with an arm-hand motion generator. Real-world experiments show improved 3D grounding accuracy and execution reliability over single-view RGB-D grounding and fine-tuned VLA baselines. We further demonstrate long-horizon manipulation through closed-loop status verification and replan, enabling zero-shot execution on unseen objects and tool-use tasks in novel scenes.

翻译：暂无翻译

0

相关内容

3D是英文“Three Dimensions”的简称，中文是指三维、三个维度、三个坐标，即有长、有宽、有高，换句话说，就是立体的，是相对于只有长和宽的平面（2D）而言。

NeurIPS 2025｜从层次化掩码的视角统一并增强 Graph Transformer

NeurIPS 2025｜从层次化掩码的视角统一并增强 Graph Transformer

专知会员服务

9+阅读 · 2025年11月13日

MiniMax震撼开源，突破传统Transformer架构，4560亿参数，支持400万长上下文

MiniMax震撼开源，突破传统Transformer架构，4560亿参数，支持400万长上下文

专知会员服务

21+阅读 · 2025年1月15日

[ICML2024] Spotlight|DAT：通过交互式注意力实现统一的多粒度文本检测

[ICML2024] Spotlight|DAT：通过交互式注意力实现统一的多粒度文本检测

专知会员服务

19+阅读 · 2024年6月26日

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

专知会员服务

10+阅读 · 2022年3月12日

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

专知会员服务

23+阅读 · 2020年4月21日

【DeepMind-牛津-CMU-CVPR2020】无监督词映射视觉基准，Visual Grounding in Video

【DeepMind-牛津-CMU-CVPR2020】无监督词映射视觉基准，Visual Grounding in Video

专知会员服务

12+阅读 · 2020年3月13日

【KDD2019|讲座推荐】零阶优化及其在数据挖掘和机器学习中对抗鲁棒性的应用研究进展：Recent Progress in Zeroth Order Optimization and Its Applications to Adversarial Robustness in Data Mining and Machine Learning

【KDD2019|讲座推荐】零阶优化及其在数据挖掘和机器学习中对抗鲁棒性的应用研究进展：Recent Progress in Zeroth Order Optimization and Its Applications to Adversarial Robustness in Data Mining and Machine Learning

专知会员服务

16+阅读 · 2019年12月6日

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

专知会员服务

13+阅读 · 2019年10月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

地平线提出AFDet：首个Anchor free、NMS free的3D目标检测算法

地平线提出AFDet：首个Anchor free、NMS free的3D目标检测算法

CVer

10+阅读 · 2020年6月27日

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

机器之心

15+阅读 · 2019年7月13日

这套1600赞的NLP课程已开放，面向实战，视频代码都有丨资源

这套1600赞的NLP课程已开放，面向实战，视频代码都有丨资源

量子位

15+阅读 · 2019年7月9日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Zero-Shot Learning相关资源大列表

Zero-Shot Learning相关资源大列表

专知

52+阅读 · 2019年1月1日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

DeepMind无监督表示学习重大突破：语音、图像、文本、强化学习全能冠军！

DeepMind无监督表示学习重大突破：语音、图像、文本、强化学习全能冠军！

新智元

12+阅读 · 2018年7月13日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【深度强化学习】深度强化学习揭秘

【深度强化学习】深度强化学习揭秘

产业智能官

21+阅读 · 2017年11月13日

【干货】神经机器翻译全流程解析，one-shot 和 zero-shot 学习成亮点

【干货】神经机器翻译全流程解析，one-shot 和 zero-shot 学习成亮点

新智元

10+阅读 · 2017年4月2日

广义双随机相位编码系统中以QR码为载体的信息加密及无损恢复

国家自然科学基金

0+阅读 · 2015年12月31日

不同加工层次和不同时空尺度下无意识加工之间的相互作用

国家自然科学基金

0+阅读 · 2015年12月31日

5G极化码译码算法理论与实现关键技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于多准则场景缩减的“零停机”设备状态预测与维护方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

多约束协同的彩色夜视影像亚像素超分辨率重建

国家自然科学基金

1+阅读 · 2015年12月31日

多纹理多深度的3D视频码率控制研究

国家自然科学基金

0+阅读 · 2015年12月31日

移动与可穿戴计算中Eyes-Free交互界面研究

国家自然科学基金

0+阅读 · 2014年12月31日

有限局部交换环与零化理想图

国家自然科学基金

0+阅读 · 2014年12月31日

千万自由度量级并行有限元模态和振动分析软件研发

国家自然科学基金

0+阅读 · 2014年12月31日

多域网络安全的异构策略语义形态与验证机制

国家自然科学基金

0+阅读 · 2014年12月31日

Agentic Collaborative Cognition for Zero-Shot 3D Understanding

Agentic Collaborative Cognition for Zero-Shot 3D Understanding

Arxiv

0+阅读 · 6月23日

ZeroGVC: Zero-Shot Generative Video Compression with Autoregressive Diffusion Priors

Arxiv

0+阅读 · 6月23日

Bridging Semantics and Kinematics: A Modular Framework for Zero-Shot Robotic Manipulation

Arxiv

0+阅读 · 6月22日

Cloak: Zero-Shot Cross-Embodiment Manipulation by Masking the End-Effector from the VLA

Arxiv

0+阅读 · 6月22日

Zero-VC: Zero-Lookahead Streaming Voice Conversion via Speaker Anonymization

Arxiv

0+阅读 · 6月18日

Zero-Shot Long-Horizon Dexterous Manipulation via Multi-View 3D-Grounded VLM Reasoning

Arxiv

0+阅读 · 6月17日

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

Arxiv

0+阅读 · 6月17日

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

Arxiv

0+阅读 · 6月16日

Counterfactual Zero-Shot and Open-Set Visual Recognition

Arxiv

12+阅读 · 2021年3月1日

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Arxiv

18+阅读 · 2018年4月8日

VIP会员

文章信息

相关主题

最新内容

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

2+阅读 · 今天11:43

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

2+阅读 · 今天11:41

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

5+阅读 · 今天6:30

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

5+阅读 · 今天6:18

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

6+阅读 · 今天6:08

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

6+阅读 · 今天5:54

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

7+阅读 · 今天5:22

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

7+阅读 · 今天5:15

《国防领域敏感性分析白皮书》

《国防领域敏感性分析白皮书》

专知会员服务

7+阅读 · 今天3:42

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

5+阅读 · 6月24日

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

7+阅读 · 6月24日

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

10+阅读 · 6月24日

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

9+阅读 · 6月24日

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

7+阅读 · 6月24日

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

9+阅读 · 6月24日

相关VIP内容

NeurIPS 2025｜从层次化掩码的视角统一并增强 Graph Transformer

NeurIPS 2025｜从层次化掩码的视角统一并增强 Graph Transformer

专知会员服务

9+阅读 · 2025年11月13日

MiniMax震撼开源，突破传统Transformer架构，4560亿参数，支持400万长上下文

MiniMax震撼开源，突破传统Transformer架构，4560亿参数，支持400万长上下文

专知会员服务

21+阅读 · 2025年1月15日

[ICML2024] Spotlight|DAT：通过交互式注意力实现统一的多粒度文本检测

[ICML2024] Spotlight|DAT：通过交互式注意力实现统一的多粒度文本检测

专知会员服务

19+阅读 · 2024年6月26日

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

专知会员服务

10+阅读 · 2022年3月12日

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

专知会员服务

23+阅读 · 2020年4月21日

【DeepMind-牛津-CMU-CVPR2020】无监督词映射视觉基准，Visual Grounding in Video

【DeepMind-牛津-CMU-CVPR2020】无监督词映射视觉基准，Visual Grounding in Video

专知会员服务

12+阅读 · 2020年3月13日

【KDD2019|讲座推荐】零阶优化及其在数据挖掘和机器学习中对抗鲁棒性的应用研究进展：Recent Progress in Zeroth Order Optimization and Its Applications to Adversarial Robustness in Data Mining and Machine Learning

【KDD2019|讲座推荐】零阶优化及其在数据挖掘和机器学习中对抗鲁棒性的应用研究进展：Recent Progress in Zeroth Order Optimization and Its Applications to Adversarial Robustness in Data Mining and Machine Learning

专知会员服务

16+阅读 · 2019年12月6日

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

专知会员服务

13+阅读 · 2019年10月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

网状网络及其在军事领域的运用

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

相关资讯

地平线提出AFDet：首个Anchor free、NMS free的3D目标检测算法

地平线提出AFDet：首个Anchor free、NMS free的3D目标检测算法

CVer

10+阅读 · 2020年6月27日

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

参数少一半，效果还更好，天津大学和微软提出Transformer压缩模型

机器之心

15+阅读 · 2019年7月13日

这套1600赞的NLP课程已开放，面向实战，视频代码都有丨资源

这套1600赞的NLP课程已开放，面向实战，视频代码都有丨资源

量子位

15+阅读 · 2019年7月9日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Zero-Shot Learning相关资源大列表

Zero-Shot Learning相关资源大列表

专知

52+阅读 · 2019年1月1日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

DeepMind无监督表示学习重大突破：语音、图像、文本、强化学习全能冠军！

DeepMind无监督表示学习重大突破：语音、图像、文本、强化学习全能冠军！

新智元

12+阅读 · 2018年7月13日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【深度强化学习】深度强化学习揭秘

【深度强化学习】深度强化学习揭秘

产业智能官

21+阅读 · 2017年11月13日

【干货】神经机器翻译全流程解析，one-shot 和 zero-shot 学习成亮点

【干货】神经机器翻译全流程解析，one-shot 和 zero-shot 学习成亮点

新智元

10+阅读 · 2017年4月2日

相关论文

Agentic Collaborative Cognition for Zero-Shot 3D Understanding

Agentic Collaborative Cognition for Zero-Shot 3D Understanding

Arxiv

0+阅读 · 6月23日

ZeroGVC: Zero-Shot Generative Video Compression with Autoregressive Diffusion Priors

Arxiv

0+阅读 · 6月23日

Bridging Semantics and Kinematics: A Modular Framework for Zero-Shot Robotic Manipulation

Arxiv

0+阅读 · 6月22日

Cloak: Zero-Shot Cross-Embodiment Manipulation by Masking the End-Effector from the VLA

Arxiv

0+阅读 · 6月22日

Zero-VC: Zero-Lookahead Streaming Voice Conversion via Speaker Anonymization

Arxiv

0+阅读 · 6月18日

Zero-Shot Long-Horizon Dexterous Manipulation via Multi-View 3D-Grounded VLM Reasoning

Arxiv

0+阅读 · 6月17日

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

Arxiv

0+阅读 · 6月17日

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

Arxiv

0+阅读 · 6月16日

Counterfactual Zero-Shot and Open-Set Visual Recognition

Arxiv

12+阅读 · 2021年3月1日

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Arxiv

18+阅读 · 2018年4月8日

相关基金

广义双随机相位编码系统中以QR码为载体的信息加密及无损恢复

国家自然科学基金

0+阅读 · 2015年12月31日

不同加工层次和不同时空尺度下无意识加工之间的相互作用

国家自然科学基金

0+阅读 · 2015年12月31日

5G极化码译码算法理论与实现关键技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于多准则场景缩减的“零停机”设备状态预测与维护方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

多约束协同的彩色夜视影像亚像素超分辨率重建

国家自然科学基金

1+阅读 · 2015年12月31日

多纹理多深度的3D视频码率控制研究

国家自然科学基金

0+阅读 · 2015年12月31日

移动与可穿戴计算中Eyes-Free交互界面研究

国家自然科学基金

0+阅读 · 2014年12月31日

有限局部交换环与零化理想图

国家自然科学基金

0+阅读 · 2014年12月31日

千万自由度量级并行有限元模态和振动分析软件研发

国家自然科学基金

0+阅读 · 2014年12月31日

多域网络安全的异构策略语义形态与验证机制

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员