In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors are widely used in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable for directly supervising a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two-stage process: coarse alignment with a depth camera, followed by fine adjustment to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Using this additional information, we propose a new loss function, the variance-weighted depth-supervised loss, for training the 3DGS scene model. We use the DenseTact optical tactile sensor and a RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in few-view scene synthesis on opaque as well as reflective and transparent objects. Please see our project page at http://armlabstanford.github.io/touch-gs
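To make the idea of fused depth with uncertainty and a variance-weighted loss concrete, the following is a minimal sketch and not the formulation used in this work. It assumes that the vision and touch depth estimates are fused per pixel by inverse-variance weighting, and that the loss is a squared depth error divided by the fused variance. The function names `fuse_depth` and `variance_weighted_depth_loss`, the `eps` parameter, and the flat-list image layout are all hypothetical choices made for illustration.

```python
def fuse_depth(d_vision, var_vision, d_touch, var_touch):
    """Fuse two per-pixel depth estimates by inverse-variance weighting.

    Assumption: this fusion rule is illustrative only; the abstract does not
    specify how the touch and vision estimates are combined.
    """
    fused_depth, fused_var = [], []
    for dv, vv, dt, vt in zip(d_vision, var_vision, d_touch, var_touch):
        w_v, w_t = 1.0 / vv, 1.0 / vt          # confidence = inverse variance
        fused_depth.append((w_v * dv + w_t * dt) / (w_v + w_t))
        fused_var.append(1.0 / (w_v + w_t))    # fused estimate is more certain
    return fused_depth, fused_var


def variance_weighted_depth_loss(rendered, fused_depth, fused_var, eps=1e-6):
    """Mean squared depth error, down-weighted where the target is uncertain.

    Assumption: a hypothetical form of a variance-weighted depth loss; the
    exact loss in the paper may differ.
    """
    total = 0.0
    for r, d, v in zip(rendered, fused_depth, fused_var):
        total += (r - d) ** 2 / (v + eps)      # eps guards against zero variance
    return total / len(rendered)


# Toy usage on a two-pixel "image" stored as flat lists.
depth, var = fuse_depth([1.0, 2.0], [1.0, 0.5], [3.0, 2.0], [1.0, 0.5])
loss = variance_weighted_depth_loss([2.0, 2.5], depth, var)
```

Under these assumptions, pixels with a confident target depth (small variance) dominate the loss, while pixels where neither touch nor vision is reliable contribute little.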