Global localisation from visual data is a challenging problem relevant to many robotics domains. Prior works have shown that neural networks can be trained to map images of an environment to the absolute camera pose within that environment, learning an implicit neural mapping in the process. In this work we evaluate the applicability of such an approach to real-world robotics scenarios, demonstrating that by constraining the problem to two dimensions and significantly increasing the quantity of training data, a compact model capable of real-time inference on embedded platforms can achieve a localisation accuracy of several centimetres. We deploy our trained model on board an unmanned ground vehicle (UGV) and demonstrate its effectiveness in a waypoint navigation task, in which it localises with a mean accuracy of 9 cm at a rate of 6 fps on the UGV's onboard CPU, 35 fps on an embedded GPU, or 220 fps on a desktop GPU. Alongside this work, we will release a novel localisation dataset comprising simulated and real environments, each with training samples numbering in the tens of thousands.