Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. The result is fast, robust reward learning that runs in real time and copes with ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.
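The abstract does not spell out QuickLAP's update rule. As a rough illustration of what a closed-form fusion of the two modalities can look like, the sketch below assumes a linear reward r = θ·φ with an independent Gaussian belief per feature. It treats a physical correction's feature difference Δφ, plus an LLM-provided attention mask and preference shift, as noisy Gaussian observations of θ and combines them by precision weighting. All names (`fuse_feedback`, `min_trust`, the noise variances, the example features) and the choice to let the mask scale trust in the physical correction are hypothetical, chosen for illustration; this is not the paper's formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianBelief:
    """Independent (diagonal) Gaussian belief over linear reward weights theta."""
    mean: List[float]
    var: List[float]


def fuse_feedback(
    belief: GaussianBelief,
    delta_phi: List[float],  # phi(corrected traj) - phi(robot traj) from a physical correction
    mask: List[float],       # hypothetical LLM-derived attention per feature, in [0, 1]
    shift: List[float],      # hypothetical LLM-derived preference shift per feature
    step: float = 1.0,       # scales the physical correction into weight space
    var_phys: float = 1.0,   # assumed observation noise of a physical correction
    var_lang: float = 0.5,   # assumed observation noise of a language cue
    min_trust: float = 0.1,  # physical trust kept for features that language ignores
) -> GaussianBelief:
    """Closed-form, precision-weighted fusion of prior, physical, and language evidence.

    Each feature i is updated independently as a product of three Gaussians:
      prior         N(mu_i, v_i)
      physical obs  N(mu_i + step * delta_phi_i, var_phys / trust_i)
      language obs  N(mu_i + shift_i, var_lang / mask_i)  (no information if mask_i == 0)
    """
    new_mean, new_var = [], []
    for mu, v, d, m, s in zip(belief.mean, belief.var, delta_phi, mask, shift):
        # Language attention decides how much the physical correction says about this feature.
        trust = min_trust + (1.0 - min_trust) * m
        prec_prior = 1.0 / v
        prec_phys = trust / var_phys
        prec_lang = m / var_lang  # zero attention -> language adds zero precision here
        prec_post = prec_prior + prec_phys + prec_lang
        mean_post = (
            prec_prior * mu
            + prec_phys * (mu + step * d)
            + prec_lang * (mu + s)
        ) / prec_post
        new_mean.append(mean_post)
        new_var.append(1.0 / prec_post)
    return GaussianBelief(new_mean, new_var)


if __name__ == "__main__":
    # Hypothetical features: [distance_to_cones, speed, lane_centering].
    # The user nudges the car away from a cone (which also slightly changes speed
    # and lane position) and says "stay farther from the cones".
    prior = GaussianBelief(mean=[0.0, 0.0, 0.0], var=[1.0, 1.0, 1.0])
    post = fuse_feedback(
        prior,
        delta_phi=[0.8, -0.3, 0.2],
        mask=[1.0, 0.0, 0.0],
        shift=[1.0, 0.0, 0.0],
    )
    print(post.mean)  # approximately [0.7, -0.027, 0.018]
    print(post.var)   # approximately [0.25, 0.909, 0.909]
```

In this toy run, the incidental speed and lane changes in the correction barely move their weights because the utterance directs attention to cone distance, while the cone-distance weight moves decisively and its variance shrinks the most. Precision weighting keeps the update closed-form, which matches the real-time requirement stated above, but the specific observation models are assumptions.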