We introduce a generalizable Neural Radiance Field approach for predicting 3D workspace occupancy from egocentric robot observations. Unlike prior methods that operate in camera-centric coordinates, our model constructs occupancy representations in a global workspace frame, making it directly applicable to robotic manipulation. The model integrates a flexible number of source views and generalizes to unseen object arrangements without scene-specific finetuning. We demonstrate the approach on a humanoid robot and evaluate the predicted geometry against 3D sensor ground truth. Trained on 40 real scenes, our model achieves a mean reconstruction error of 26 mm, including in occluded regions, validating its ability to infer complete 3D occupancy beyond what traditional stereo vision methods recover.