In this paper, we present a novel dataset of conversations between participants, captured with a VR headset inside a physics simulator (AI2-THOR). Our primary objective is to advance co-speech gesture generation by incorporating rich contextual information in referential settings. Participants engaged in a variety of conversational scenarios, all based on referential communication tasks. The dataset provides a rich set of multimodal recordings, including motion capture, speech, gaze, and scene graphs. By offering diverse and contextually rich data, this comprehensive dataset aims to support the understanding and development of gesture generation models in 3D scenes.