Real-time tracking of previously unseen, highly dynamic objects in contact-rich scenes, such as dexterous in-hand manipulation, remains a major challenge. Purely vision-based approaches often fail under the heavy occlusion produced by frequent contact interactions and under the motion blur caused by abrupt impacts. We propose TwinTrack, a physics-aware perception system that uses contact physics cues to enable robust, real-time 6-DoF pose tracking of unknown dynamic objects in contact-rich scenes. At its core, TwinTrack integrates Real2Sim and Sim2Real. Real2Sim combines vision and contact physics to jointly estimate object geometry and physical properties: an initial reconstruction is obtained from vision and then refined by learning a geometry residual while simultaneously estimating physical parameters (e.g., mass, inertia, and friction) from contact dynamics consistency. Sim2Real achieves robust pose estimation by adaptively fusing a visual tracker with predictions from the updated contact dynamics. TwinTrack is implemented on a GPU-accelerated, customized MJX engine to ensure real-time performance. We evaluate our method on two contact-rich scenarios: an object falling with environmental contacts and multi-fingered in-hand manipulation. Results show that, compared to baselines, TwinTrack delivers significantly more robust and accurate tracking in these challenging settings while running in real time, at tracking speeds above 20 Hz. Project page: https://irislab.tech/TwinTrack-webpage/
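The abstract does not state how Sim2Real weights the visual tracker against the contact dynamics prediction. As an illustration only, the sketch below assumes a single scalar visual confidence in [0, 1] (for example, one that drops under occlusion or motion blur) and blends the two 6-DoF poses with it: positions are interpolated linearly and orientations by spherical interpolation of unit quaternions. The names `fuse_pose`, `slerp`, and `vis_conf` are hypothetical and are not taken from TwinTrack.

```python
import math


def slerp(q0, q1, t):
    """Spherically interpolate from unit quaternion q0 to q1, both (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip q1 to follow the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(dot, 1.0)
    theta = math.acos(dot)
    if theta < 1e-6:
        # The two orientations are nearly identical; nothing to interpolate.
        return tuple(q0)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


def fuse_pose(vis_pos, vis_quat, phys_pos, phys_quat, vis_conf):
    """Blend a visual pose with a physics-predicted pose.

    vis_conf is an assumed scalar weight on the visual estimate:
    1.0 trusts vision fully, 0.0 trusts the physics prediction fully.
    """
    a = min(max(vis_conf, 0.0), 1.0)
    pos = tuple(a * v + (1.0 - a) * p for v, p in zip(vis_pos, phys_pos))
    quat = slerp(phys_quat, vis_quat, a)
    return pos, quat


# Usage: vision is half-trusted, so the fused pose lies midway between the two.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
fused_pos, fused_quat = fuse_pose(
    (0.2, 0.0, 0.1), quarter_turn_z, (0.0, 0.0, 0.1), identity, vis_conf=0.5
)
```

With equal weighting, the fused orientation in this usage is a 45-degree rotation about the z-axis, halfway between the physics prediction (identity) and the visual estimate (90 degrees). A practical system would likely derive the weight from tracker uncertainty or contact state rather than fix it by hand.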