Exoskeletons for daily use by people with mobility impairments are under development, and they will require accurate and robust scene understanding systems. Current research has used vision to identify immediate terrain and geometric obstacles; however, these approaches are constrained to detections directly in front of the user and are limited to classifying a finite range of terrain types (e.g., stairs, ramps, and level ground). This paper presents Exosense, a vision-centric scene understanding system capable of generating rich, globally consistent elevation maps that incorporate both semantic and terrain traversability information. It features an elastic Atlas mapping framework associated with a visual SLAM pose graph, embedded with open-vocabulary room labels from a Vision-Language Model (VLM). The device's design includes a wide field-of-view (FoV) fisheye multi-camera system to mitigate the challenges introduced by the exoskeleton walking pattern. We demonstrate the system's robustness to the challenges of typical periodic walking gaits and its ability to construct accurate, semantically rich maps in indoor settings. Additionally, we showcase its potential for motion planning, providing a step towards safe navigation for exoskeletons.