Collaboration between humans and robots requires effective modes of communication to assign tasks to the robot and to coordinate activities. Because communication can use several modalities, a multi-modal approach can be more expressive than single-modality models alone. In this work, we propose a co-speech gesture model for assigning robot tasks in human-robot collaboration. Human gestures and speech, detected by computer vision and speech recognition respectively, are used to refer to objects in the scene and to apply robot actions to them. We present an experimental evaluation of the multi-modal co-speech model on a real-world industrial use case. The results demonstrate that multi-modal communication is easy to achieve and can benefit collaboration compared with single-modality tools.
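The abstract does not specify how gestures and speech are combined. As a minimal sketch only, assume the speech recognizer yields a transcript, the vision pipeline yields a 2-D hand position and pointing direction, and an object detector yields labelled object positions. A spoken action verb then selects the robot action, and a deictic word such as "that" is grounded by choosing the detected object closest to the pointing ray. `SceneObject`, `resolve_task`, the action vocabulary, and the 15° threshold are hypothetical illustrations, not the authors' model.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical vocabularies; the paper's actual command set is not given here.
ACTIONS = ("pick", "place", "inspect")
DEICTIC_WORDS = ("this", "that", "it", "there")


@dataclass
class SceneObject:
    name: str  # single-word label from an object detector (assumed)
    x: float   # position in the robot workspace frame (assumed 2-D for brevity)
    y: float


def pointing_angle(origin: Tuple[float, float], direction: Tuple[float, float],
                   obj: SceneObject) -> float:
    """Angle in radians between the pointing ray and the ray from the hand to obj."""
    vx, vy = obj.x - origin[0], obj.y - origin[1]
    norm = math.hypot(*direction) * math.hypot(vx, vy)
    if norm == 0.0:
        return math.pi  # degenerate case: treat the object as not pointed at
    cos_a = (direction[0] * vx + direction[1] * vy) / norm
    return math.acos(max(-1.0, min(1.0, cos_a)))  # clamp against rounding error


def resolve_task(utterance: str, hand: Tuple[float, float],
                 direction: Tuple[float, float], objects: Sequence[SceneObject],
                 max_angle: float = math.radians(15)) -> Optional[Tuple[str, SceneObject]]:
    """Fuse a transcript with a pointing gesture into an (action, object) task."""
    words = utterance.lower().replace(",", " ").split()
    action = next((w for w in words if w in ACTIONS), None)
    if action is None:
        return None  # the speech carries no recognized action
    # Speech alone suffices when exactly one object is named explicitly.
    named = [o for o in objects if o.name in words]
    if len(named) == 1:
        return action, named[0]
    # Otherwise a deictic word is grounded by the pointing gesture,
    # restricted to the named objects when the name is ambiguous.
    if any(w in DEICTIC_WORDS for w in words):
        candidates = named or list(objects)
        best = min(candidates, key=lambda o: pointing_angle(hand, direction, o),
                   default=None)
        if best is not None and pointing_angle(hand, direction, best) <= max_angle:
            return action, best
    return None
```

For example, with two detected bolts, "pick that bolt" is ambiguous from speech alone, so the pointing direction decides which bolt is meant; "inspect the gear" needs no gesture at all. This mirrors the abstract's claim that the two modalities together can express commands that neither conveys on its own.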