Foundation models are a promising path toward general-purpose and user-friendly robots. The prevalent approach involves training a generalist policy that, like a reinforcement learning policy, maps observations to actions. Although this approach has seen much success, several concerns arise when these systems are deployed and end users interact with them. In particular, the lack of modularity between tasks means that when model weights are updated (e.g., when a user provides feedback), behavior on other, unrelated tasks may be affected. This can negatively impact the system's interpretability and usability. We present an alternative approach to the design of robot foundation models, Diffusion for Policy Parameters (DPP), which generates stand-alone, task-specific policies. Since these policies are detached from the foundation model, they are updated only when the user chooses, whether through feedback or personalization, allowing users to become highly familiar with each policy. We demonstrate a proof-of-concept of DPP in simulation, then discuss its limitations and the future of interpretable foundation models.
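To make the core idea concrete, the sketch below illustrates (under assumptions, not as the paper's implementation) what "diffusion over policy parameters" could look like: a denoising network operates on a flattened parameter vector conditioned on a task embedding, and the sampled vector is unpacked into a small, stand-alone MLP policy that runs independently of the foundation model. All names (`ParamDenoiser`, `sample_policy_params`, the toy dimensions, and the simplified sampling loop) are illustrative and hypothetical.

```python
# Minimal sketch (NOT the paper's implementation) of the DPP idea:
# a diffusion model over flattened policy parameters, conditioned on a
# task embedding, whose samples are unpacked into a stand-alone MLP policy.
# All names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 8, 2, 32   # assumed toy policy sizes
TASK_DIM, T_STEPS = 16, 50            # assumed conditioning dim / diffusion steps

# Stand-alone task policy: a small MLP whose weights the diffusion model generates.
def make_policy():
    return nn.Sequential(nn.Linear(OBS_DIM, HIDDEN), nn.Tanh(),
                         nn.Linear(HIDDEN, ACT_DIM))

PARAM_DIM = sum(p.numel() for p in make_policy().parameters())

class ParamDenoiser(nn.Module):
    """Predicts the noise added to a flattened parameter vector (DDPM-style)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PARAM_DIM + TASK_DIM + 1, 256), nn.SiLU(),
            nn.Linear(256, PARAM_DIM))

    def forward(self, noisy_params, task_emb, t):
        t = t.float().unsqueeze(-1) / T_STEPS  # scalar timestep conditioning
        return self.net(torch.cat([noisy_params, task_emb, t], dim=-1))

@torch.no_grad()
def sample_policy_params(denoiser, task_emb, steps=T_STEPS):
    """Very simplified ancestral-sampling loop; real DDPM schedules differ."""
    x = torch.randn(1, PARAM_DIM)
    for t in reversed(range(steps)):
        eps = denoiser(x, task_emb, torch.tensor([t]))
        x = x - eps / steps  # crude denoising step, for illustration only
    return x.squeeze(0)

def load_into_policy(flat_params):
    """Unpack a sampled parameter vector into a detached, task-specific policy."""
    policy, offset = make_policy(), 0
    for p in policy.parameters():
        n = p.numel()
        p.data.copy_(flat_params[offset:offset + n].view_as(p))
        offset += n
    return policy

task_emb = torch.randn(1, TASK_DIM)  # stand-in for a language/task encoder output
policy = load_into_policy(sample_policy_params(ParamDenoiser(), task_emb))
action = policy(torch.randn(OBS_DIM))  # the policy now runs without the generator
```

The point of the sketch is the modularity argument from the abstract: once `load_into_policy` has produced a policy, the user can fine-tune or personalize that policy in isolation, and updates to the parameter-generating model leave previously issued policies untouched.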