In self-supervised reinforcement learning (RL), one of the key challenges is learning a diverse set of skills to prepare agents for unknown future tasks. Despite impressive advances, scalability and evaluation remain persistent issues. Regarding scalability, the search for meaningful skills can be obscured by high-dimensional feature spaces, where the relevant features may vary across downstream task domains. Regarding evaluation, measuring skill diversity typically requires a hard commitment to one specific notion of what it means for skills to be diverse. This can lead to inconsistencies in how skill diversity is understood, make results across different approaches hard to compare, and leave many forms of diversity unexplored. To address these issues, we adopt the Vendi Score, a measure of sample diversity that translates ideas from ecology to machine learning and allows the user to specify and evaluate any desired form of diversity through a similarity function. We demonstrate how this metric facilitates skill evaluation and introduce VendiRL, a unified framework for learning diversely diverse sets of skills. Given distinct similarity functions, VendiRL motivates distinct forms of diversity, which could support skill-diversity pretraining in new and richly interactive environments where optimising for various forms of diversity may be desirable.
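To make the role of the similarity function concrete, the following is a minimal sketch of the Vendi Score as it is commonly defined: the exponential of the Shannon entropy of the eigenvalues of K/n, where K is the n-by-n similarity matrix of the samples. The function names, the RBF kernel, and the idea of representing each skill by a feature vector (for example, a final state) are illustrative assumptions and not the VendiRL implementation.

```python
import numpy as np


def vendi_score(samples, similarity):
    """Vendi Score: exp of the Shannon entropy of the eigenvalues of K / n.

    Assumes `similarity` is symmetric, positive semi-definite, and returns 1
    for identical inputs, so that the eigenvalues of K / n sum to 1.
    """
    n = len(samples)
    # Pairwise similarity matrix K[i, j] = similarity(samples[i], samples[j]).
    K = np.array([[similarity(a, b) for b in samples] for a in samples], dtype=float)
    # eigvalsh is appropriate because K is symmetric.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop numerically zero (or slightly negative) eigenvalues: 0 * log(0) := 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


def rbf_similarity(a, b, bandwidth=1.0):
    """Hypothetical similarity choice: Gaussian (RBF) kernel on feature vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth**2)))


if __name__ == "__main__":
    # Illustrative "skill outcomes": 2-D final positions reached by four skills.
    collapsed = [[0.0, 0.0]] * 4
    spread = [[0.0, 0.0], [50.0, 0.0], [0.0, 50.0], [50.0, 50.0]]
    print(vendi_score(collapsed, rbf_similarity))  # close to 1: one effective skill
    print(vendi_score(spread, rbf_similarity))     # close to 4: four effective skills
```

Under these assumptions the score can be read as an effective number of distinct samples: it is 1 when all samples are identical under the chosen similarity and n when all n samples are mutually dissimilar. Swapping in a different similarity function changes which form of diversity the score rewards.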