Building virtual models of the physical world has been a critical technique for robot navigation and planning in the real world. To foster manipulation of articulated objects in everyday life, this work explores building articulation models of indoor scenes through a robot's purposeful interactions in these scenes. Prior work on articulation reasoning primarily focuses on isolated objects from a limited set of categories. To extend to room-scale environments, the robot has to efficiently and effectively explore a large-scale 3D space, locate articulated objects, and infer their articulations. We introduce an interactive perception approach to this task. Our approach, named Ditto in the House, discovers possible articulated objects through affordance prediction, interacts with these objects to produce articulated motions, and infers the articulation properties from the visual observations before and after each interaction. It tightly couples affordance prediction and articulation inference to improve both tasks. We demonstrate the effectiveness of our approach in both simulation and real-world scenes. Code and additional results are available at https://ut-austin-rpl.github.io/HouseDitto/