Autonomous assistance of people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies showed that assistive tasks can be formulated as multi-agent RL problems with two agents: a caregiver and a care-receiver. However, policies trained with multi-agent RL are often sensitive to the policies of the other agents. As a result, a trained caregiver's policy may not work for different care-receivers. To alleviate this issue, we propose a framework that learns a robust caregiver's policy by training it against diverse care-receiver responses. In our framework, diverse care-receiver responses are learned autonomously through trial and error. In addition, to make the caregiver's policy more robust, we propose a strategy for sampling a care-receiver's response in an adversarial manner during training. We evaluated the proposed method on tasks in Assistive Gym. We demonstrate that policies trained with a popular deep RL method are vulnerable to changes in the policies of other agents and that the proposed framework improves robustness against such changes.
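The abstract does not say how care-receiver responses are sampled adversarially, so the following is only an illustrative sketch of one possible scheme, not the authors' method. It assumes a finite set of learned care-receiver responses and a running estimate of the caregiver's return against each one, and it samples responses with a softmax over negated returns so that responses the caregiver handles poorly are drawn more often. The names `adversarial_sampling_probs`, `sample_response`, `caregiver_returns`, and `temperature`, as well as the numeric values, are hypothetical.

```python
import math
import random


def adversarial_sampling_probs(caregiver_returns, temperature=1.0):
    """Softmax over negated returns (an assumed scheme, not from the paper).

    A lower estimated caregiver return against a care-receiver response
    gives that response a higher sampling probability.
    """
    scaled = [-r / temperature for r in caregiver_returns]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_response(caregiver_returns, temperature=1.0, rng=random):
    """Draw the index of one care-receiver response to train against."""
    probs = adversarial_sampling_probs(caregiver_returns, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical running estimates of caregiver return against three responses.
returns = [12.0, 3.5, 8.0]
probs = adversarial_sampling_probs(returns, temperature=2.0)
idx = sample_response(returns, temperature=2.0)
```

In this sketch a large `temperature` approaches uniform sampling over responses, while a small one concentrates training on the response the caregiver currently handles worst.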