Hierarchical Planning and Policy Shaping Shared Autonomy for Articulated Robots

In this work, we propose a novel shared autonomy framework for operating articulated robots. We provide strategies for designing both the task-oriented hierarchical planning and the policy shaping algorithms needed for efficient human-robot interaction in context-aware operation of articulated robots. Our treatment of the interplay between the human and the autonomy, the two participating agents in the system, draws on ideas from multi-agent systems, game theory, and theory of mind to achieve a sliding level of autonomy. We formulate the sequential, hierarchical, human-in-the-loop decision-making process by extending Markov decision processes (MDPs) and the options framework to shared autonomy, and we use deep reinforcement learning (RL) techniques to train an uncertainty-aware shared autonomy policy. To fine-tune the formulation to an individual human, we use conditional variational autoencoders (VAEs) to encode embeddings of the human's internal state, beyond the designed values, from the history of system states, human actions, and the error of those actions with respect to a surrogate optimal model. We demonstrate the effectiveness of our formulation across different human skill levels and degrees of cooperativeness through a case study of a feller-buncher machine performing the challenging task of timber harvesting. Our framework provides a sliding level of autonomy from fully autonomous to fully manual, and is particularly successful in handling a noisy, non-cooperative human agent in the loop. The proposed framework advances the state of the art in shared autonomy for operating articulated robots and can also be applied to other domains where autonomous operation is the ultimate goal.
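To make the notion of a sliding level of autonomy concrete, the following is a minimal sketch that assumes the level can be represented as a scalar in [0, 1] and that the human and autonomous commands are combined by linear blending. Both assumptions are illustrative only: the framework described here learns its shared autonomy policy with deep RL, and the function name `blend_actions` and the parameter `autonomy_level` are hypothetical and do not come from the work itself.

```python
from typing import List, Sequence


def blend_actions(
    human_action: Sequence[float],
    autonomy_action: Sequence[float],
    autonomy_level: float,
) -> List[float]:
    """Linearly arbitrate between a human command and an autonomous command.

    autonomy_level = 0.0 reproduces the human command (fully manual);
    autonomy_level = 1.0 reproduces the autonomous command (fully autonomous).
    The linear rule is an assumption made for illustration, not the learned
    policy of the framework.
    """
    if not 0.0 <= autonomy_level <= 1.0:
        raise ValueError("autonomy_level must lie in [0, 1]")
    if len(human_action) != len(autonomy_action):
        raise ValueError("both commands must have the same dimension")
    # Weighted average of the two commands, one joint (dimension) at a time.
    return [
        (1.0 - autonomy_level) * h + autonomy_level * a
        for h, a in zip(human_action, autonomy_action)
    ]


if __name__ == "__main__":
    human = [1.0, 0.0]      # hypothetical joint-velocity command from the operator
    autonomy = [0.0, 1.0]   # hypothetical command from the autonomous planner
    for level in (0.0, 0.5, 1.0):
        print(level, blend_actions(human, autonomy, level))
```

In such a scheme, a noisy or non-cooperative operator would be handled by moving `autonomy_level` toward 1.0. How that level is actually chosen in the proposed framework is determined by the learned policy and is not captured by this sketch.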
