Human-robot walking with prosthetic legs and exoskeletons, especially over complex terrains such as stairs, remains a significant challenge. Egocentric vision has the unique potential to detect the walking environment before physical interaction, which can improve transitions to and from stairs. This motivated us to create the StairNet initiative to support the development of new deep learning models for visual sensing and recognition of stairs, with an emphasis on lightweight and efficient neural networks for onboard real-time inference. In this study, we present an overview of the development of our large-scale dataset of over 515,000 manually labeled images, as well as the deep learning models (e.g., 2D and 3D CNN, hybrid CNN and LSTM, and ViT networks) and training methods (e.g., supervised learning with temporal data and semi-supervised learning with unlabeled images) that we developed using this dataset. We consistently achieved high classification accuracy (up to 98.8%) across the different designs, which offer trade-offs between model accuracy and size. When deployed on mobile devices with GPU and NPU accelerators, our deep learning models achieved inference times as low as 2.8 ms. We also deployed our models on custom-designed CPU-powered smart glasses. However, limitations in the embedded hardware yielded slower inference times of 1.5 seconds, presenting a trade-off between human-centered design and performance. Overall, we showed that StairNet can be an effective platform to develop and study new visual perception systems for human-robot locomotion, with applications in exoskeleton and prosthetic leg control.