We present an information-theoretic framework for learning fixed-dimensional embeddings of tasks in reinforcement learning. The key idea is that two tasks are similar if observing an agent's performance on one reduces our uncertainty about its performance on the other. We capture this intuition with an information-theoretic criterion that uses a diverse population of agents as an approximation of the space of agents to measure task similarity in sequential decision-making settings. Beyond a qualitative assessment, we demonstrate the effectiveness of techniques built on these task embeddings through quantitative comparisons against strong baselines in two application scenarios: predicting an agent's performance on a new task from its performance on a small quiz of tasks, and selecting tasks with desired characteristics from a given set of options.
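To make the similarity intuition concrete, the following minimal sketch estimates how much observing a population's outcomes on one task reduces uncertainty about its outcomes on another. It assumes, purely for illustration, that performance is a binary success indicator per agent and that similarity is measured by empirical mutual information; the abstract does not state the exact form of the criterion, so the function name `mutual_information` and the toy data are hypothetical and not the authors' method.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences.

    x[i] and y[i] are the outcomes of the same agent i on two different tasks.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of outcome pairs (x_i, y_i)
    count_x = Counter(x)        # marginal counts for the first task
    count_y = Counter(y)        # marginal counts for the second task
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p(a, b) / (p(a) * p(b)) written with counts to limit rounding error
        mi += p_ab * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical success indicators (1 = success) for a population of 8 agents.
task_a = [1, 1, 1, 1, 0, 0, 0, 0]
task_b = [1, 1, 1, 1, 0, 0, 0, 0]  # identical outcomes to task_a
task_c = [1, 0, 1, 0, 1, 0, 1, 0]  # statistically independent of task_a here

print(mutual_information(task_a, task_b))  # high: knowing one reveals the other
print(mutual_information(task_a, task_c))  # zero: knowing one reveals nothing
```

Under these assumptions, `task_a` and `task_b` would count as similar and `task_a` and `task_c` as dissimilar. A larger and more diverse agent population gives a better estimate of these quantities, which is the role the abstract assigns to the population as an approximation of the space of agents.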