Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as those of elastic materials and fabrics, remains a major challenge for computer vision and robotics. We present EgoPhys, a framework that constructs deformable physical digital twins from egocentric RGB-only video using generalizable priors. EgoPhys addresses limitations of existing methods by distilling per-object inverse-physics solutions into a compact codebook, which lets it predict dense spring stiffness fields for unseen objects without per-spring test-time optimization and thereby generate controllable deformable digital twins from egocentric videos. Trained on diverse egocentric interactions, EgoPhys outperforms baselines in reconstruction, future prediction, and zero-shot generalization. To support training and evaluation, we curate an egocentric interaction dataset covering diverse deformable objects, scenes, and manipulation styles. We also deploy EgoPhys on a real xArm6 robot, demonstrating that a digital twin initialized from a single egocentric human play video can serve as an internal world representation that aids deformable-object planning. These results highlight egocentric RGB observations as a scalable path toward real-to-sim pipelines.
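To make the codebook idea in the abstract concrete, the following is a minimal sketch, not the EgoPhys implementation. It assumes each spring carries a small feature vector and that stiffness is read off the nearest codebook entry in a single pass, with no per-spring optimization. The names `CODEBOOK`, `lookup_stiffness`, `predict_stiffness_field`, and `spring_force`, the two-dimensional features, and all numeric values are hypothetical and chosen only for illustration; the abstract does not specify how the codebook is queried.

```python
import math

# Hypothetical codebook: each entry pairs a feature prototype with a stiffness.
# The values are illustrative only and are not taken from EgoPhys.
CODEBOOK = [
    ((0.0, 0.0), 50.0),    # assumed "soft" prototype
    ((1.0, 0.0), 500.0),   # assumed "medium" prototype
    ((0.0, 1.0), 5000.0),  # assumed "stiff" prototype
]


def lookup_stiffness(feature):
    """Return the stiffness of the nearest prototype (Euclidean distance)."""
    _, stiffness = min(CODEBOOK, key=lambda entry: math.dist(entry[0], feature))
    return stiffness


def predict_stiffness_field(spring_features):
    """Assign a stiffness to every spring in one pass over the features."""
    return [lookup_stiffness(f) for f in spring_features]


def spring_force(k, rest_length, p_a, p_b):
    """Hooke's-law force on endpoint a of a spring between points a and b."""
    length = math.dist(p_a, p_b)
    if length == 0.0:
        # Direction is undefined for coincident endpoints; apply no force.
        return tuple(0.0 for _ in p_a)
    scale = k * (length - rest_length) / length
    return tuple(scale * (b - a) for a, b in zip(p_a, p_b))


if __name__ == "__main__":
    # Three springs with made-up features, one near each prototype.
    field = predict_stiffness_field([(0.1, 0.1), (0.9, 0.2), (0.1, 0.8)])
    print(field)
    # A stretched spring pulls endpoint a toward endpoint b.
    print(spring_force(100.0, 1.0, (0.0, 0.0), (2.0, 0.0)))
```

A hard nearest-neighbor lookup is used here only because it is the simplest form of codebook query; a learned predictor that blends codebook entries would be an equally plausible reading of the abstract.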