Software architectures for conversational robots typically consist of multiple modules, each dedicated to a particular processing task or functionality. Some of these modules are responsible for deciding the next action the robot should perform in the current context. These actions may be physical movements, such as driving forward or grasping an object, but may also be communicative acts, such as asking the human user a question. In this position paper, we reflect on how these decision modules are organized in human-robot interaction (HRI) platforms. We discuss the relative benefits and limitations of modular vs. end-to-end architectures, and argue that, despite the growing popularity of end-to-end approaches, modular architectures remain preferable for conversational robots designed to execute complex tasks in collaboration with human users. We also show that most practical HRI architectures tend to be either robot-centric or dialogue-centric, depending on where developers choose to place the ``command center'' of their system. While these design choices may be justified in some application domains, they also limit the robot's ability to flexibly interleave physical movements and conversational behaviours. We contend that architectures placing ``action managers'' and ``interaction managers'' on an equal footing may offer the best path forward for future human-robot interaction systems.