To the Noise and Back: Diffusion for Shared Autonomy

Shared autonomy is an operational concept in which a user and an autonomous agent collaboratively control a robotic system. In many settings, it offers advantages over the extremes of full teleoperation and full autonomy. Traditional approaches to shared autonomy rely on knowledge of the environment dynamics, a discrete space of user goals that is known a priori, or knowledge of the user's policy; these assumptions are unrealistic in many domains. Recent works relax some of these assumptions by formulating shared autonomy with model-free deep reinforcement learning (RL). In particular, they no longer need knowledge of the goal space (e.g., that the goals are discrete or constrained) or of the environment dynamics. However, they require a task-specific reward function to train the policy, and specifying such a reward can be a difficult and brittle process. Moreover, these formulations inherently rely on human-in-the-loop training, which requires preparing a policy that mimics users' behavior. In this paper, we present a new approach to shared autonomy that modulates the forward and reverse diffusion processes of diffusion models. Our approach does not assume known environment dynamics or a known space of user goals and, in contrast to previous work, requires neither reward feedback nor access to the user's policy during training. Instead, our framework learns a distribution over a space of desired behaviors and then employs a diffusion model to translate the user's actions into a sample from this distribution. Crucially, we show that this process can be carried out in a manner that preserves the user's control authority. We evaluate our framework on a series of challenging continuous control tasks and analyze its ability to effectively correct user actions while maintaining their autonomy.
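
To make the idea of translating a user's action into a sample from the learned distribution more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard DDPM-style formulation in which the user's action is noised only part of the way through the forward process and then denoised back with the reverse process. The names `assist`, `predict_noise`, and the ratio `gamma`, as well as the step count and noise schedule, are hypothetical choices made for illustration. In place of a trained network, the sketch uses a toy Gaussian target distribution for which the optimal noise prediction is available in closed form.

```python
import numpy as np

T = 50                                # number of diffusion steps (assumed)
BETAS = np.linspace(1e-3, 0.3, T)     # linear noise schedule (assumed)
ALPHAS = 1.0 - BETAS
ALPHA_BARS = np.cumprod(ALPHAS)

# Toy stand-in for the learned distribution over desired actions:
# an isotropic Gaussian N(MU, SIGMA^2 I). A real system would train a network.
MU = np.array([1.0, -0.5])
SIGMA = 0.5


def predict_noise(x_t, t):
    """Closed-form E[eps | x_t] for the toy Gaussian target (t is 1-indexed)."""
    ab = ALPHA_BARS[t - 1]
    var_t = ab * SIGMA**2 + (1.0 - ab)          # marginal variance of x_t
    return np.sqrt(1.0 - ab) * (x_t - np.sqrt(ab) * MU) / var_t


def assist(user_action, gamma, rng):
    """Noise the user's action for k = round(gamma * T) forward steps,
    then run k DDPM reverse steps. gamma = 0 returns the action unchanged;
    gamma = 1 is (approximately) a fresh sample from the target distribution."""
    x = np.asarray(user_action, dtype=float)
    k = int(round(gamma * T))
    if k == 0:
        return x.copy()

    # Partial forward diffusion: jump directly to step k.
    ab_k = ALPHA_BARS[k - 1]
    x = np.sqrt(ab_k) * x + np.sqrt(1.0 - ab_k) * rng.standard_normal(x.shape)

    # Reverse diffusion from step k back to step 0 (ancestral sampling).
    for t in range(k, 0, -1):
        eps_hat = predict_noise(x, t)
        coef = BETAS[t - 1] / np.sqrt(1.0 - ALPHA_BARS[t - 1])
        mean = (x - coef * eps_hat) / np.sqrt(ALPHAS[t - 1])
        noise = rng.standard_normal(x.shape) if t > 1 else 0.0
        x = mean + np.sqrt(BETAS[t - 1]) * noise
    return x
```

In this sketch, calling `assist(action, gamma, np.random.default_rng(0))` with a small `gamma` keeps the output close to the user's input, while a larger `gamma` pulls it further toward the target distribution. This is one simple way to picture the trade-off between correcting the action and preserving the user's control authority.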
