Reconstructing real-world objects and estimating their movable joint structures are key capabilities in robotics. Previous research has predominantly relied on supervised approaches, which use extensively annotated datasets to model articulated objects within a limited set of categories. These approaches, however, fall short of covering the diversity of objects found in the real world. To address this, we propose SM$^3$, a self-supervised interaction perception method that uses multi-view RGB images captured before and after interaction to model articulated objects, identify their movable parts, and infer the parameters of their rotating joints. By constructing 3D geometry and textures from the captured 2D images, SM$^3$ jointly optimizes the movable parts and joint parameters during reconstruction, removing the need for annotations. We also introduce the MMArt dataset, an extension of PartNet-Mobility that provides multi-view, multi-modal data of articulated objects across diverse categories. Evaluations show that SM$^3$ outperforms existing baseline methods across a range of categories and objects, and its adaptability to real-world scenarios has also been validated.