To the Noise and Back: Diffusion for Shared Autonomy

Shared autonomy is an operational concept in which a user and an autonomous agent collaboratively control a robotic system. In many settings, it offers advantages over the extremes of full teleoperation and full autonomy. Traditional approaches to shared autonomy rely on knowledge of the environment dynamics, a discrete space of user goals that is known a priori, or knowledge of the user's policy, assumptions that are unrealistic in many domains. Recent works relax some of these assumptions by formulating shared autonomy with model-free deep reinforcement learning (RL). In particular, they no longer need knowledge of the goal space (e.g., that the goals are discrete or constrained) or of the environment dynamics. However, they still need a task-specific reward function to train the policy, and specifying such a reward can be a difficult and brittle process. Moreover, these formulations inherently rely on human-in-the-loop training, which requires preparing a policy that mimics users' behavior. In this paper, we present a new approach to shared autonomy that modulates the forward and reverse diffusion processes of a diffusion model. Our approach does not assume known environment dynamics or a known space of user goals, and in contrast to previous work, it requires neither reward feedback nor access to the user's policy during training. Instead, our framework learns a distribution over a space of desired behaviors. It then employs a diffusion model to translate the user's actions into a sample from this distribution. Crucially, we show that this process can be carried out in a manner that preserves the user's control authority. We evaluate our framework on a series of challenging continuous control tasks and analyze its ability to correct user actions effectively while maintaining the user's autonomy.

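The idea of translating a user's action into a sample from a learned distribution of desired behaviors, by partially running the forward diffusion process and then reversing it, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the knob `gamma` (the fraction of forward steps applied), the linear noise schedule, the DDPM-style update rule, and above all the use of a single Gaussian `N(mu, sigma^2 I)` as the "desired behavior" distribution, which lets a closed-form noise predictor (`gaussian_eps`) stand in for a trained denoising network.

```python
import numpy as np


def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.1):
    """Linear DDPM-style noise schedule (the values are illustrative assumptions)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative signal retention per step
    return betas, alphas, alpha_bars


def gaussian_eps(x, i, alpha_bars, mu, sigma):
    """Exact E[eps | x] at schedule index i when desired actions ~ N(mu, sigma^2 I).

    Hypothetical stand-in for a learned noise-prediction network.
    """
    ab = alpha_bars[i]
    var = ab * sigma**2 + (1.0 - ab)  # variance of the noised marginal
    return np.sqrt(1.0 - ab) * (x - np.sqrt(ab) * mu) / var


def assist(user_action, gamma, mu, sigma, schedule, rng):
    """Noise the user's action for k = round(gamma * K) steps, then denoise it.

    gamma = 0 returns the user's action unchanged; gamma = 1 discards almost
    all information about it and samples from the desired-action distribution.
    """
    betas, alphas, alpha_bars = schedule
    k = int(round(gamma * len(betas)))
    x = np.array(user_action, dtype=float)
    if k == 0:
        return x

    # Forward process: jump directly to step k in closed form.
    ab_k = alpha_bars[k - 1]
    x = np.sqrt(ab_k) * x + np.sqrt(1.0 - ab_k) * rng.standard_normal(x.shape)

    # Reverse process: ancestral sampling from step k back to step 0.
    for i in range(k - 1, -1, -1):
        eps_hat = gaussian_eps(x, i, alpha_bars, mu, sigma)
        x = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps_hat) / np.sqrt(alphas[i])
        if i > 0:  # no noise is added on the final step
            x = x + np.sqrt(betas[i]) * rng.standard_normal(x.shape)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    schedule = make_schedule()
    mu, sigma = np.array([0.5, 0.0]), 0.1  # toy "desired behavior" distribution
    user_action = np.array([1.5, 1.0])     # an off-distribution user command
    for gamma in (0.0, 0.05, 1.0):
        print(gamma, assist(user_action, gamma, mu, sigma, schedule, rng))
```

In this toy setting, a small `gamma` leaves the output close to the user's command, while a large `gamma` pulls it toward the desired-action distribution regardless of what the user asked for. That trade-off is one way to read the abstract's claim that the correction can be made while preserving the user's control authority, though the actual model, conditioning, and schedule used in the paper are not specified in the abstract.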