We propose a framework for active next-best-view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it can represent scenes in both a photorealistic and a geometrically accurate manner. However, in real-world, online robotic scenes where the number of views is limited by efficiency requirements, random view selection for 3DGS becomes impractical, as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first improve few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2), supplemented with a Pearson depth loss and a surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We validate our improvements on few-shot GS scenes and extend depth-based FisherRF to them, demonstrating both qualitative and quantitative improvements on challenging robot scenes. For more information, please see our project page at https://armlabstanford.github.io/next-best-sense.
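To give a sense of the depth supervision mentioned above: a Pearson depth loss, as commonly used in few-shot Gaussian Splatting work, penalizes low linear correlation between the rendered depth map and a reference depth map, which makes the supervision invariant to the scale and shift ambiguity of monocular depth estimates. Below is a minimal NumPy sketch under that assumption; the function name and interface are illustrative, not the authors' implementation.

```python
import numpy as np

def pearson_depth_loss(rendered: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Illustrative Pearson-correlation depth loss: 1 - corr(rendered, reference).

    The loss is 0 when the two depth maps are perfectly positively
    correlated (up to any affine scale/shift) and 2 when perfectly
    anti-correlated, so minimizing it aligns relative depth structure
    without requiring metrically calibrated reference depth.
    """
    r = rendered.ravel().astype(np.float64)
    t = reference.ravel().astype(np.float64)
    # Center both maps so scale/shift in the reference does not matter.
    r = r - r.mean()
    t = t - t.mean()
    corr = (r * t).sum() / (np.linalg.norm(r) * np.linalg.norm(t) + eps)
    return 1.0 - corr
```

For example, a rendered depth map that is an affine transform of the reference (`2 * depth + 3`) incurs near-zero loss, while an inverted depth map incurs a loss near 2. In practice, such a term would be computed per image (or per SAM2 segment, for the semantic alignment described above) and added to the photometric loss with a weighting coefficient.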