Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point clouds, a single-modal approach that often neglects key texture and lighting details and assumes ideal conditions such as optimal viewpoints, which are unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characterization. It features a multi-modal fusion module that uses multi-scale RGB features to enhance point cloud features, coupled with reinforcement learning-based active sensing that autonomously optimizes observation viewpoints. In experiments on diverse articulated object instances from the PartNet-Mobility dataset, our method outperforms current state-of-the-art methods in joint parameter estimation accuracy. Through active sensing, MARS further reduces errors, demonstrating enhanced efficiency in handling suboptimal viewpoints. Furthermore, our method generalizes effectively to real-world articulated objects, enhancing robot interaction. Code is available at https://github.com/robhlzeng/MARS.
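To make the multi-modal fusion idea concrete, below is a minimal sketch of how multi-scale RGB features can enhance per-point features: each 3D point is projected into the image, RGB feature maps at several scales are bilinearly sampled at that pixel, and the samples are merged with the point feature by a small MLP. This is an illustrative assumption in PyTorch, not the authors' implementation; all names here (e.g. FusionSketch, img_feats) are hypothetical.

```python
# Hedged sketch: multi-scale RGB-to-point feature fusion.
# Assumes per-point features from a point backbone and a list of RGB
# feature maps at multiple scales; the actual MARS module may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionSketch(nn.Module):
    def __init__(self, point_dim=128, img_dims=(64, 128, 256), out_dim=128):
        super().__init__()
        # Small MLP that merges point features with the concatenated
        # multi-scale RGB features sampled for each point.
        self.mlp = nn.Sequential(
            nn.Linear(point_dim + sum(img_dims), out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, point_feats, uv, img_feats):
        # point_feats: (B, N, point_dim) per-point features
        # uv:          (B, N, 2) projected pixel coords, normalized to [-1, 1]
        # img_feats:   list of (B, C_s, H_s, W_s) RGB feature maps per scale
        grid = uv.unsqueeze(2)  # (B, N, 1, 2) sampling grid for grid_sample
        sampled = []
        for feat in img_feats:
            # Bilinear sampling reads the RGB feature under each projected point.
            s = F.grid_sample(feat, grid, mode="bilinear", align_corners=False)
            sampled.append(s.squeeze(-1).transpose(1, 2))  # (B, N, C_s)
        fused = torch.cat([point_feats] + sampled, dim=-1)
        return self.mlp(fused)  # (B, N, out_dim) enhanced point features


if __name__ == "__main__":
    B, N = 2, 1024
    point_feats = torch.randn(B, N, 128)
    uv = torch.rand(B, N, 2) * 2 - 1  # hypothetical projected coords
    img_feats = [torch.randn(B, c, 64 // (2 ** i), 64 // (2 ** i))
                 for i, c in enumerate((64, 128, 256))]
    out = FusionSketch()(point_feats, uv, img_feats)
    print(out.shape)  # torch.Size([2, 1024, 128])
```

The design choice sketched here, sampling image features at the projection of each point rather than pooling globally, keeps the fusion spatially aligned, which is one plausible way texture and lighting cues could sharpen joint parameter estimates.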