EgoGen: An Egocentric Synthetic Data Generator

Understanding the world in first-person view is fundamental in Augmented Reality (AR). This immersive perspective brings dramatic visual changes and unique challenges compared to third-person views. Synthetic data has empowered third-person-view vision models, but its application to embodied egocentric perception tasks remains largely unexplored. A critical challenge lies in simulating natural human movements and behaviors that effectively steer the embodied cameras to capture a faithful egocentric representation of the 3D world. To address this challenge, we introduce EgoGen, a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks. At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment. Combined with collision-avoiding motion primitives and a two-stage reinforcement learning approach, our motion synthesis model offers a closed-loop solution where the embodied perception and movement of the virtual human are seamlessly coupled. Compared to previous works, our model eliminates the need for a pre-defined global path, and is directly applicable to dynamic environments. Combined with our easy-to-use and scalable data generation pipeline, we demonstrate EgoGen's efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views. EgoGen will be fully open-sourced, offering a practical solution for creating realistic egocentric training data and aiming to serve as a useful tool for egocentric computer vision research. Refer to our project page: https://ego-gen.github.io/.

