Reliable obstacle avoidance in industrial settings demands 3D scene understanding, but widely used 2D LiDAR sensors perceive only a single horizontal slice of the environment and miss critical obstacles above or below the scan plane. We present a teacher-student framework for vision-based mobile robot navigation that eliminates the need for LiDAR sensors at deployment. A teacher policy, trained with Proximal Policy Optimization (PPO) in NVIDIA Isaac Lab, learns robust navigation from privileged 2D LiDAR observations that account for the full robot footprint. The learned behavior is then distilled into a student policy that relies solely on depth maps predicted from four RGB cameras by a fine-tuned Depth Anything V2 model. The complete inference pipeline, comprising monocular depth estimation (MDE), policy execution, and motor control, runs entirely onboard an NVIDIA Jetson AGX Orin mounted on a DJI RoboMaster platform, with no external computation. In simulation, the student achieves success rates of 82% to 96.5%, consistently outperforming the standard 2D LiDAR teacher (50% to 89%). In real-world experiments, the MDE-based student also outperforms the 2D LiDAR teacher when navigating around obstacles with complex 3D geometry, such as overhanging structures and low-profile objects, that fall outside the single scan plane of a 2D LiDAR.
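The teacher-student distillation step can be illustrated with a minimal sketch. It assumes the simplest form of distillation, supervised regression of the student's outputs onto the teacher's actions; the abstract does not state the loss or the network architectures, so this is not a description of the actual method. All names, dimensions, and the linear policies below are hypothetical stand-ins: `teacher_obs` plays the role of the privileged 2D LiDAR observations and `student_obs` the role of features derived from predicted depth maps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical underlying robot/scene states (256 samples, 8 dimensions).
states = rng.normal(size=(256, 8))

# Assumption: teacher and student observe the same states through different
# "sensors", modeled here as two fixed random linear maps.
teacher_obs = states @ rng.normal(size=(8, 16))  # stand-in for privileged LiDAR
student_obs = states @ rng.normal(size=(8, 8))   # stand-in for depth features

# Assumption: a fixed linear teacher policy producing 2D velocity commands.
teacher_policy = rng.normal(size=(16, 2))
teacher_actions = teacher_obs @ teacher_policy

# Distillation as behavior cloning: find student weights W that minimize the
# mean squared error between student outputs and teacher actions.
W, *_ = np.linalg.lstsq(student_obs, teacher_actions, rcond=None)

student_actions = student_obs @ W
mse = float(np.mean((student_actions - teacher_actions) ** 2))
print(f"imitation MSE: {mse:.2e}")
```

In this toy setting the student can match the teacher almost exactly because both observations are invertible linear views of the same state. With real depth predictions and nonlinear policies, the regression would instead be carried out by gradient-based training of a neural network.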