This paper proposes a neural network-based user simulator that provides a multimodal interactive environment for training Reinforcement Learning (RL) agents on collaborative tasks. The simulator is trained on the existing ELDERLY-AT-HOME corpus and handles several modalities: language, pointing gestures, and haptic-ostensive actions. The paper also presents a novel multimodal data augmentation approach to address the limited size of the dataset, a consequence of how expensive and time-consuming human demonstrations are to collect. Overall, the study highlights the potential of RL and multimodal user simulators for developing and improving domestic assistive robots.