We present an approach for estimating a mobile robot's pose w.r.t. the allocentric coordinates of a network of static cameras using multi-view RGB images. The images are processed online and locally on smart edge sensors by deep neural networks that detect the robot and estimate 2D keypoints defined at distinctive positions of the 3D robot model. The robot keypoint detections are synchronized and fused on a central backend, where the robot's pose is estimated by minimizing multi-view reprojection errors. Because the pose is estimated from external cameras, the robot's localization can be initialized in an allocentric map from a completely unknown state (kidnapped robot problem) and robustly tracked over time. We conduct a series of experiments evaluating the accuracy and robustness of the camera-based pose estimation compared to the robot's internal navigation stack. They show that our camera-based method achieves pose errors below 3 cm and 1{\deg} and, since the robot is localized allocentrically, does not drift over time. With the robot's pose precisely estimated, its observations can be fused into the allocentric scene model. We show a real-world application in which observations from the mobile robot and the static smart edge sensors are fused to collaboratively build a 3D semantic map of a $\sim$240 m$^2$ indoor environment.
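To make the backend step concrete, here is a minimal sketch of multi-view reprojection-error minimization. It is not the authors' implementation. It assumes calibrated cameras given as 3×4 projection matrices, a planar robot pose (x, y, yaw) on a flat floor, and 2D keypoint detections per camera, with NaN rows marking keypoints a camera did not detect. The keypoint layout, camera placement, and all function names (`look_at_camera`, `estimate_pose`, etc.) are hypothetical. Plain Gauss-Newton with a finite-difference Jacobian stands in for whatever solver the actual method uses.

```python
import numpy as np

# Hypothetical sketch: planar robot pose (x, y, yaw) from multi-view 2D keypoints.


def rot_z(yaw):
    """Rotation about the world z-axis (assumes the robot moves on a flat floor)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def look_at_camera(K, center, target, up=(0.0, 0.0, 1.0)):
    """Build a 3x4 projection matrix K[R|t] for a camera at `center` looking at `target`."""
    center = np.asarray(center, dtype=float)
    z = np.asarray(target, dtype=float) - center
    z /= np.linalg.norm(z)          # optical axis
    x = np.cross(z, up)
    x /= np.linalg.norm(x)          # image right
    y = np.cross(z, x)              # image down
    R = np.stack([x, y, z])         # world -> camera rotation
    t = -R @ center
    return K @ np.hstack([R, t[:, None]])


def project(P, X):
    """Project Nx3 world points to Nx2 pixel coordinates."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]


def model_to_world(pose, kps_model):
    """Transform keypoints from the robot model frame into allocentric world coordinates."""
    x, y, yaw = pose
    return kps_model @ rot_z(yaw).T + np.array([x, y, 0.0])


def residuals(pose, cams, kps_model, detections):
    """Stacked reprojection errors over all cameras; NaN rows (undetected) are skipped."""
    X = model_to_world(pose, kps_model)
    res = []
    for P, det in zip(cams, detections):
        valid = ~np.isnan(det).any(axis=1)
        res.append((project(P, X[valid]) - det[valid]).ravel())
    return np.concatenate(res)


def estimate_pose(cams, kps_model, detections, init, iters=50, eps=1e-6):
    """Gauss-Newton on the reprojection error with a forward-difference Jacobian."""
    pose = np.asarray(init, dtype=float)
    for _ in range(iters):
        r = residuals(pose, cams, kps_model, detections)
        J = np.empty((r.size, 3))
        for k in range(3):
            d = np.zeros(3)
            d[k] = eps
            J[:, k] = (residuals(pose + d, cams, kps_model, detections) - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        pose = pose + step
        if np.linalg.norm(step) < 1e-12:
            break
    return pose


# Usage with synthetic data (hypothetical intrinsics, camera layout, and keypoints).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cams = [look_at_camera(K, c, (0.0, 0.0, 0.5)) for c in [(5, 0, 3), (0, 5, 3), (-5, -5, 3)]]
kps_model = np.array([[0.3, 0.2, 0.1], [0.3, -0.2, 0.1], [-0.3, 0.2, 0.1],
                      [-0.3, -0.2, 0.1], [0.0, 0.0, 0.8]])
true_pose = np.array([1.0, -0.5, 0.3])
detections = [project(P, model_to_world(true_pose, kps_model)) for P in cams]
detections[2][4] = np.nan  # top keypoint not detected by the third camera
est = estimate_pose(cams, kps_model, detections, init=(0.0, 0.0, 0.0))
print(est)
```

Starting from an arbitrary initial pose mirrors the kidnapped-robot setting, since no prior from the robot's own navigation stack is needed. If the keypoint detector also outputs confidence scores, they could weight the residuals, and a robust loss would limit the influence of outlier detections; both are omitted here for brevity.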