End-to-end Optimization of Belief and Policy Learning in Shared Autonomy Paradigms

Shared autonomy systems require principled methods for inferring user intent and determining appropriate assistance levels. This is a central challenge in human-robot interaction, where systems must complete tasks successfully while preserving user agency. Previous approaches either relied on static blending ratios or separated goal inference from assistance arbitration, leading to suboptimal performance in unstructured environments. We introduce BRACE (Bayesian Reinforcement Assistance with Context Encoding), a framework that jointly fine-tunes Bayesian intent inference and context-adaptive assistance through an architecture that allows end-to-end gradient flow between the two. Our pipeline conditions collaborative control policies on environmental context and on the complete goal probability distribution, rather than on a single most-likely goal. We provide analysis showing that (1) optimal assistance levels should decrease with goal uncertainty and increase with environmental constraint severity, and (2) integrating belief information into policy learning yields a quadratic expected regret advantage over sequential approaches. We validate our algorithm against state-of-the-art (SOTA) methods (IDA, DQN) using a three-part evaluation that progressively isolates distinct challenges of end-effector control: (1) core human-interaction dynamics in a 2D human-in-the-loop cursor task, (2) non-linear dynamics of a robotic arm, and (3) integrated manipulation under goal ambiguity and environmental constraints. Compared with SOTA methods, BRACE achieves 6.3% higher success rates and 41% higher path efficiency; compared with unassisted control, it improves success rate by 36.3% and path efficiency by 87%. Our results confirm that integrated optimization is most beneficial in complex, goal-ambiguous scenarios and that the approach generalizes across robotic domains requiring goal-directed assistance, advancing the SOTA for adaptive shared autonomy.
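As a rough illustration of the two ingredients named above (a belief over goals and an assistance level that depends on that belief and on context), the following sketch may help. It is not the BRACE implementation: the Boltzmann-style likelihood, the rationality parameter `beta`, the hand-written `assistance_level` heuristic, and all function names are assumptions made for this example. In BRACE the arbitration is learned end to end, so the heuristic only mirrors the qualitative claim that assistance should fall with goal uncertainty and rise with constraint severity.

```python
import numpy as np

def update_belief(belief, cursor, user_cmd, goals, beta=5.0):
    """One Bayesian update of the goal belief from a single user command.

    Assumed likelihood: commands pointing toward a goal are exponentially
    more likely under that goal (beta is a hypothetical rationality gain).
    """
    to_goal = goals - cursor
    to_goal = to_goal / (np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9)
    u = user_cmd / (np.linalg.norm(user_cmd) + 1e-9)
    log_post = np.log(belief + 1e-12) + beta * (to_goal @ u)
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    return post / post.sum()

def normalized_entropy(belief):
    """Shannon entropy of the belief, scaled to [0, 1] (needs >= 2 goals)."""
    h = -np.sum(belief * np.log(belief + 1e-12))
    return h / np.log(len(belief))

def assistance_level(belief, constraint_severity):
    """Illustrative stand-in for a learned arbitration policy.

    Decreases with goal uncertainty, increases with constraint severity
    (both assumed to lie in [0, 1]).
    """
    certainty = 1.0 - normalized_entropy(belief)
    gamma = certainty * (0.5 + 0.5 * constraint_severity)
    return float(np.clip(gamma, 0.0, 1.0))

def blend(user_cmd, assist_cmd, gamma):
    """Linear blend of the user's command and the assistive command."""
    return (1.0 - gamma) * user_cmd + gamma * assist_cmd

# Example: two candidate goals, user pushes toward the first one.
goals = np.array([[1.0, 0.0], [0.0, 1.0]])
belief = update_belief(np.array([0.5, 0.5]), np.zeros(2),
                       np.array([1.0, 0.0]), goals)
gamma = assistance_level(belief, constraint_severity=1.0)
command = blend(np.array([1.0, 0.0]), goals[0], gamma)
```

Passing the full `belief` vector to the arbitration step, rather than only its argmax, is the point the abstract stresses; a learned policy would replace `assistance_level` while taking the same inputs.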
