In this work, we propose a novel method for supervising 3D Gaussian Splatting (3DGS) scenes with optical tactile sensors. Optical tactile sensors have become widespread in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable for directly supervising a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, fusing many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two-stage process: coarse alignment with a depth camera followed by fine adjustment to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, a variance-weighted depth-supervised loss, for training the 3DGS scene model. Using the DenseTact optical tactile sensor and a RealSense RGB-D camera, we show that combining touch and vision in this manner yields quantitatively and qualitatively better results than vision or touch alone for few-view scene synthesis on opaque as well as reflective and transparent objects. Please see our project page at http://armlabstanford.github.io/touch-gs
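The exact form of the variance-weighted depth-supervised loss is not given in this abstract; the following is a minimal sketch assuming an inverse-variance weighting of per-pixel depth residuals, where $\hat{D}_i$ denotes the depth rendered from the 3DGS model, $D_i$ the fused depth map, and $\sigma_i^2$ the fused uncertainty at pixel $i$ over the pixel set $\mathcal{P}$ (these symbols and the specific weighting are assumptions for illustration, not taken from the paper):

% Hypothetical sketch: pixels with low fused uncertainty contribute more strongly.
\begin{equation}
  \mathcal{L}_{\text{depth}}
  = \frac{1}{|\mathcal{P}|} \sum_{i \in \mathcal{P}}
    \frac{1}{\sigma_i^{2} + \epsilon}
    \left\lVert \hat{D}_i - D_i \right\rVert_1,
  \qquad
  \mathcal{L} = \mathcal{L}_{\text{RGB}} + \lambda \, \mathcal{L}_{\text{depth}}
\end{equation}

Under this kind of weighting, regions where touch and monocular depth agree (low variance) dominate the depth term, while highly uncertain regions fall back on the photometric loss.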