Agentic Skill Discovery

Language-conditioned robotic skills make it possible to apply the high-level reasoning of Large Language Models (LLMs) to low-level robotic control. A remaining challenge is acquiring a diverse set of fundamental skills. Existing approaches either manually decompose a complex task into atomic robotic actions in a top-down fashion, or bootstrap as many skill combinations as possible in a bottom-up fashion to cover a wider range of possible tasks. Both the decompositions and the combinations, however, require an initial skill library. For example, a "grasping" capability can never emerge from a skill library containing only diverse "pushing" skills. Existing skill discovery techniques based on reinforcement learning acquire skills through exhaustive exploration but often yield behaviors that are not meaningful. In this study, we introduce Agentic Skill Discovery (ASD), a novel framework for skill discovery that is driven entirely by LLMs. The framework begins with an LLM generating task proposals based on the provided scene description and the robot's configuration, with the aim of incrementally acquiring a new skill each time a task is completed. For each proposed task, a series of reinforcement learning processes is initiated, using reward and success-determination functions sampled by the LLM to develop the corresponding policy. The reliability and trustworthiness of the learned behaviors are further ensured by an independent vision-language model. We show that, starting from zero skills, the ASD skill library emerges and expands to include increasingly meaningful and reliable skills, enabling the robot to efficiently propose and complete more advanced tasks. The project page can be found at: https://agentic-skill-discovery.github.io.

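The loop described in the abstract (propose tasks, sample reward and success functions, train a policy, verify it independently, then grow the library) can be sketched as follows. This is a minimal illustration under stated assumptions, not the ASD implementation: every name here (`discover_skills`, `SkillLibrary`, the four callables, `samples_per_task`) is hypothetical, and the LLM, the reinforcement learning step, and the vision-language model are replaced by deterministic stubs so that the control flow can be run on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Skill:
    task: str    # natural-language task description
    policy: str  # placeholder standing in for a trained policy


@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)

    def names(self) -> List[str]:
        return [s.task for s in self.skills]


def discover_skills(
    propose_tasks: Callable[[str, List[str]], List[str]],
    sample_reward_and_success: Callable[[str], Tuple[Callable, Callable]],
    train_policy: Callable[[str, Callable, Callable], Optional[str]],
    verify_behavior: Callable[[str, str], bool],
    scene_description: str,
    rounds: int,
    samples_per_task: int = 2,
) -> SkillLibrary:
    """Hypothetical outer loop: all four callables are assumptions."""
    library = SkillLibrary()  # starts with zero skills
    for _ in range(rounds):
        # Proposals are conditioned on the scene and the skills known so far.
        for task in propose_tasks(scene_description, library.names()):
            if task in library.names():
                continue
            # Try several sampled reward/success function pairs per task.
            for _ in range(samples_per_task):
                reward_fn, success_fn = sample_reward_and_success(task)
                policy = train_policy(task, reward_fn, success_fn)
                # Only independently verified behaviors enter the library.
                if policy is not None and verify_behavior(task, policy):
                    library.skills.append(Skill(task, policy))
                    break
    return library


# --- Deterministic stubs standing in for the LLM, RL, and the VLM ---

def demo_propose(scene: str, known: List[str]) -> List[str]:
    candidates = ["reach cube", "push cube", "grasp cube"]
    return [t for t in candidates if t not in known]


def demo_sample(task: str) -> Tuple[Callable, Callable]:
    return (lambda state: 0.0), (lambda state: True)


def demo_train(task: str, reward_fn: Callable, success_fn: Callable) -> Optional[str]:
    return f"policy<{task}>"


def demo_verify(task: str, policy: str) -> bool:
    # Pretend the verifier rejects every "push cube" attempt.
    return task != "push cube"


demo_library = discover_skills(
    demo_propose, demo_sample, demo_train, demo_verify,
    scene_description="a table with one cube", rounds=2,
)
print(demo_library.names())
```

In this sketch a task whose behavior is never verified is simply proposed again in the next round; how rejected tasks are actually handled is not stated in the abstract, so that choice is an assumption of the example.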