Reliable obstacle avoidance in industrial settings demands 3D scene understanding, but widely used 2D LiDAR sensors perceive only a single horizontal slice of the environment, missing critical obstacles above or below the scan plane. We present a teacher-student framework for vision-based mobile robot navigation that eliminates the need for LiDAR sensors. A teacher policy trained via Proximal Policy Optimization (PPO) in NVIDIA Isaac Lab leverages privileged 2D LiDAR observations that account for the full robot footprint to learn robust navigation. The learned behavior is distilled into a student policy that relies solely on monocular depth maps predicted from four RGB cameras by a fine-tuned Depth Anything V2 model. The complete inference pipeline, comprising monocular depth estimation (MDE), policy execution, and motor control, runs entirely onboard an NVIDIA Jetson Orin AGX mounted on a DJI RoboMaster platform, with no external computation required. In simulation, the student achieves success rates of 82–96.5%, consistently outperforming the standard 2D LiDAR teacher (50–89%). In real-world experiments, the MDE-based student outperforms the 2D LiDAR teacher when navigating around obstacles with complex 3D geometry, such as overhanging structures and low-profile objects that fall outside the sensor's single scan plane.
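The abstract does not specify the distillation objective, so the following is a minimal PyTorch sketch of one plausible reading: a behavior-cloning update in which a depth-based student regresses toward the actions of the privileged-LiDAR teacher in the same states. All shapes, names (`StudentPolicy`, `distillation_step`), and the MSE action loss are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Assumed (not from the paper): 4 predicted depth maps stacked as channels,
# and a 2D action space (linear and angular velocity commands).
NUM_CAMERAS = 4
DEPTH_H, DEPTH_W = 64, 80
ACTION_DIM = 2


class StudentPolicy(nn.Module):
    """Hypothetical student: maps MDE depth maps to velocity commands."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(NUM_CAMERAS, 32, kernel_size=5, stride=2), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size with a dummy forward pass.
        with torch.no_grad():
            feat_dim = self.encoder(
                torch.zeros(1, NUM_CAMERAS, DEPTH_H, DEPTH_W)
            ).shape[1]
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ELU(), nn.Linear(128, ACTION_DIM)
        )

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(depth))


def distillation_step(student, optimizer, depth_obs, teacher_actions):
    """One behavior-cloning update: regress the student's action toward the
    action the privileged 2D-LiDAR teacher produced in the same state."""
    pred = student(depth_obs)
    loss = nn.functional.mse_loss(pred, teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, teacher-student distillation of this kind is often run DAgger-style, querying the teacher for action labels on states the student itself visits during rollouts; whether this work does so is not stated in the abstract.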