The scale and diversity of demonstration data required for imitation learning are a significant challenge. We present EgoMimic, a full-stack framework that scales manipulation via human embodiment data, specifically egocentric human videos paired with 3D hand tracking. EgoMimic achieves this through: (1) a system to capture human embodiment data using the ergonomic Project Aria glasses, (2) a low-cost bimanual manipulator that minimizes the kinematic gap to human data, (3) cross-domain data alignment techniques, and (4) an imitation learning architecture that co-trains on human and robot data. Compared to prior works that only extract high-level intent from human videos, our approach treats human and robot data equally as embodied demonstration data and learns a unified policy from both sources. EgoMimic achieves significant improvements over state-of-the-art imitation learning methods on a diverse set of long-horizon, single-arm and bimanual manipulation tasks, and enables generalization to entirely new scenes. Finally, we show a favorable scaling trend for EgoMimic, where adding 1 hour of additional hand data is significantly more valuable than 1 hour of additional robot data. Videos and additional information can be found at https://egomimic.github.io/
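To make the co-training idea concrete, the sketch below shows one way a single policy could be optimized on mixed human and robot batches after cross-domain alignment maps both into a shared action space. This is a minimal illustration, not the authors' implementation: the network sizes, the 12-dimensional action space, the equal human/robot batch weighting, and the `cotrain_step` helper are all illustrative assumptions.

```python
# Minimal co-training sketch (assumptions, not the EgoMimic architecture):
# a shared visual backbone consumes egocentric RGB frames from either domain
# and is supervised with aligned action targets -- 3D hand trajectories for
# human data, end-effector poses for robot data -- in one loss.
import torch
import torch.nn as nn

ACTION_DIM = 12  # assumed shared action dimension after cross-domain alignment


class SharedPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder over egocentric RGB frames (assumed 3x224x224 input).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=4), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(256), nn.ReLU(),
        )
        # One action head serves both human and robot data.
        self.head = nn.Linear(256, ACTION_DIM)

    def forward(self, image):
        return self.head(self.encoder(image))


def cotrain_step(policy, optimizer, human_batch, robot_batch):
    """One gradient step on a mixed human+robot batch (equal weighting assumed)."""
    images = torch.cat([human_batch["image"], robot_batch["image"]])
    targets = torch.cat([human_batch["action"], robot_batch["action"]])
    loss = nn.functional.mse_loss(policy(images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, treating both data sources identically means the only domain-specific step is how actions are produced (hand tracking vs. robot proprioception); once aligned, the batches are simply concatenated for each update.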