Comprehensive perception of human beings is a prerequisite for ensuring the safety of human-robot interaction. Currently, the prevailing visual sensing approach typically involves a single static camera, resulting in a restricted and occluded field of view. In our work, we develop an active vision system that uses multiple cameras to dynamically capture multi-source RGB-D data. We propose an integrated human sensing strategy based on a hierarchically connected tree structure to fuse localized visual information. The tree model consists of nodes representing keypoints and edges representing keyparts, which remain interconnected to preserve structural constraints during multi-source fusion. Using RGB-D data and HRNet, the 3D positions of the keypoints are analytically estimated, and their presence is inferred through a sliding window of confidence scores. Subsequently, the point clouds of reliable keyparts are extracted by drawing occlusion-resistant masks, enabling fine registration between the point clouds and a cylindrical model in hierarchical order. Experimental results demonstrate that, compared with a single static camera, our method improves keypart recognition recall from 69.20% to 90.10%. Furthermore, by overcoming the challenges of localized and occluded perception, the obstacle avoidance capability of the robotic arm is effectively improved.
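The presence inference described above can be sketched in minimal form. This is an illustrative assumption of how a sliding window of per-frame confidence scores might gate keypoint reliability; the window size, threshold, and class name are hypothetical, not taken from the paper.

```python
from collections import deque

class KeypointPresence:
    """Hedged sketch: infer whether a keypoint is reliably present by
    averaging per-frame confidence scores (e.g., from HRNet heatmaps)
    over a sliding window of recent frames. The window size and
    threshold below are illustrative defaults, not the paper's values."""

    def __init__(self, window_size=10, threshold=0.5):
        # deque(maxlen=...) automatically discards the oldest score
        # once the window is full.
        self.scores = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, confidence):
        """Record the latest confidence score and return presence."""
        self.scores.append(confidence)
        return self.is_present()

    def is_present(self):
        # The keypoint is treated as present when the windowed mean
        # confidence meets the threshold; an empty window means absent.
        if not self.scores:
            return False
        return sum(self.scores) / len(self.scores) >= self.threshold
```

In a multi-camera setting, one such tracker per keypoint per camera lets the fusion stage discard keyparts whose endpoint keypoints are not consistently observed, before mask extraction and registration.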