We present VERF, a pair of methods (VERF-PnP and VERF-Light) that provide runtime assurance on the correctness of a monocular camera's pose estimate without relying on direct depth measurements. Both methods leverage the ability of NeRF (Neural Radiance Fields) to render novel RGB views of a scene. As input, we require only the camera image whose pose is being estimated, the pose estimate to be monitored, and a NeRF model of the scene pictured by the camera. From these, we predict whether the pose estimate lies within a desired distance of the ground truth and attach a confidence level to that prediction. VERF-Light renders a view with NeRF at the estimated pose and estimates its relative offset to the sensor image, which can be recovered only up to scale. Because the scene scale is unknown, it also renders an auxiliary image and reasons over the consistency of the optical flows across the three images (the sensor image and the two renders). VERF-PnP takes a different approach: it renders a stereo pair of images with NeRF and applies the Perspective-n-Point (PnP) algorithm. We evaluate both methods on the LLFF dataset, on data from a Unitree A1 quadruped robot, and on data collected from Blue Origin's sub-orbital New Shepard rocket, demonstrating that the proposed pose monitoring methods are effective across a range of scene scales. We also show that monitoring can be completed in under half a second on an NVIDIA RTX 3090 GPU.
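Both methods end in a binary decision: is the monitored estimate close enough to the true pose? The sketch below shows one way such a check could look once a reference pose has been recovered from the NeRF renders (for instance, by PnP against the rendered stereo pair). It is an illustration under stated assumptions, not VERF's implementation: each pose is assumed to be a 3x3 rotation matrix plus a camera position in the NeRF scene frame, and separate translation and rotation tolerances are assumed. All function names and threshold values are hypothetical, and the abstract specifies neither the exact error metric nor how the confidence level is computed.

```python
import math
from typing import List, Sequence

Mat3 = List[List[float]]


def rot_z(deg: float) -> Mat3:
    """Rotation about the z-axis (helper for the usage example)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_angle_deg(r_est: Mat3, r_ref: Mat3) -> float:
    """Geodesic angle (degrees) of the relative rotation R_est^T R_ref."""
    # Only the trace of R_est^T R_ref is needed:
    # trace = sum_i sum_k r_est[k][i] * r_ref[k][i].
    trace = sum(r_est[k][i] * r_ref[k][i] for i in range(3) for k in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_theta))


def translation_error(t_est: Sequence[float], t_ref: Sequence[float]) -> float:
    """Euclidean distance between two camera positions (scene units)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_ref)))


def pose_within_tolerance(r_est: Mat3, t_est: Sequence[float],
                          r_ref: Mat3, t_ref: Sequence[float],
                          max_trans: float, max_rot_deg: float) -> bool:
    """Hypothetical monitor: accept only if both errors are within tolerance."""
    return (translation_error(t_est, t_ref) <= max_trans
            and rotation_angle_deg(r_est, r_ref) <= max_rot_deg)


if __name__ == "__main__":
    # Reference pose, e.g. as recovered from the NeRF renders.
    ref_r, ref_t = rot_z(0.0), [0.0, 0.0, 0.0]
    print(pose_within_tolerance(rot_z(2.0), [0.03, 0.0, 0.0], ref_r, ref_t, 0.05, 5.0))   # True
    print(pose_within_tolerance(rot_z(10.0), [0.03, 0.0, 0.0], ref_r, ref_t, 0.05, 5.0))  # False
```

Using the geodesic angle of R_est^T R_ref keeps the rotation check independent of how rotations are parameterized, and only the trace of that product is needed. The translation tolerance is expressed in the NeRF scene's units, so in a setting like this it would have to be chosen per scene.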
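The scale problem that motivates VERF-Light's auxiliary render can be seen with a simplified geometric sketch. Relative pose estimation between the sensor image and a single render recovers the direction toward the sensor camera but not its distance. If a second render is made from a known, different position and its direction toward the sensor camera is also available, the sensor camera center is the point where the two rays meet. The code below triangulates that point in closed form as the midpoint of the shortest segment between the two rays. This is our simplification for illustration, not the optical-flow consistency reasoning the paper uses, and all names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vec3:
    """Scale a vector to unit length (only its direction is kept)."""
    n = math.sqrt(_dot(v, v))
    return tuple(x / n for x in v)


def triangulate_center(p_render: Sequence[float], dir_render: Sequence[float],
                       p_aux: Sequence[float], dir_aux: Sequence[float],
                       eps: float = 1e-12) -> Vec3:
    """Least-squares intersection of rays p_render + s*dir_render and p_aux + u*dir_aux.

    p_render, p_aux: known camera centers of the two (hypothetical) NeRF renders.
    dir_render, dir_aux: up-to-scale directions from each render toward the sensor camera.
    """
    w0 = [a - b for a, b in zip(p_render, p_aux)]
    a = _dot(dir_render, dir_render)
    b = _dot(dir_render, dir_aux)
    c = _dot(dir_aux, dir_aux)
    d = _dot(dir_render, w0)
    e = _dot(dir_aux, w0)
    denom = a * c - b * b  # equals a*c*sin^2(angle between rays)
    if denom < eps * a * c:
        raise ValueError("rays are (nearly) parallel; scale is unobservable")
    # Parameters minimizing |w0 + s*dir_render - u*dir_aux|^2.
    s = (b * e - c * d) / denom
    u = (a * e - b * d) / denom
    q1 = [p + s * x for p, x in zip(p_render, dir_render)]
    q2 = [p + u * x for p, x in zip(p_aux, dir_aux)]
    return tuple((x + y) / 2.0 for x, y in zip(q1, q2))


if __name__ == "__main__":
    sensor = (1.0, 2.0, 0.0)  # unknown in practice; used here only to build directions
    render, aux = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    dir_r = normalize([s - r for s, r in zip(sensor, render)])
    dir_a = normalize([s - r for s, r in zip(sensor, aux)])
    print(triangulate_center(render, dir_r, aux, dir_a))  # approximately (1.0, 2.0, 0.0)
```

The `ValueError` branch marks the degenerate configuration: if the auxiliary camera lies on the line through the first render and the sensor camera, the two rays coincide and the distance remains unobservable, so in this sketch the auxiliary viewpoint must be placed off that line.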