Assistive robots have attracted significant attention due to their potential to enhance the quality of life for vulnerable individuals such as the elderly. The convergence of computer vision, large language models, and robotics has introduced the `visuolinguomotor' mode for assistive robots, where visuals and linguistics are incorporated into assistive robots to enable proactive and interactive assistance. This raises the question: \textit{In circumstances where visuals become unreliable or unavailable, can we rely solely on language to control robots, i.e., is the `linguomotor' mode viable for assistive robots?} This work takes the initial steps toward answering this question by: 1) evaluating the responses of assistive robots to language prompts of varying granularities; and 2) exploring the necessity and feasibility of controlling the robot on-the-fly. We have designed and conducted experiments on a Sawyer cobot to support our arguments. A Turtlebot case is designed to demonstrate how the solution adapts to scenarios where assistive robots need to maneuver to assist. Code will be released on GitHub soon to benefit the community.