AI agents are commonly trained on large datasets of demonstrations of human behavior. However, not all behaviors are equally safe or desirable. Desired characteristics for an AI agent can be expressed by assigning desirability scores, which we assume are assigned not to individual behaviors but to collective trajectories. For example, in a dataset of vehicle interactions, these scores might relate to the number of incidents that occurred. We first assess the effect of each individual agent's behavior on the collective desirability score, e.g., how likely an agent is to cause incidents. This allows us to selectively imitate agents with a positive effect, e.g., only agents that are unlikely to cause incidents. To enable this, we propose the concept of an agent's Exchange Value, which quantifies an individual agent's contribution to the collective desirability score. The Exchange Value is the expected change in desirability score when the agent is substituted for a randomly selected agent. We further propose methods for estimating Exchange Values from real-world datasets, enabling us to learn imitation policies that exhibit the desired behavior and outperform relevant baselines. The project website can be found at https://tinyurl.com/select-to-perfect.
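To make the definition of the Exchange Value concrete, the following is a minimal sketch under stated assumptions, not the estimation method of the paper. It assumes a small, fully known set of agents, groups sampled uniformly, and a hypothetical additive score (the negated total of made-up per-agent incident rates). Under those assumptions the expectation can be computed exactly by enumerating every group of the other agents and swapping the agent in for each member in turn. The names `exchange_values`, `toy_score`, and `INCIDENT_RATE` are illustrative only.

```python
from itertools import combinations
from statistics import mean


def exchange_values(agents, group_size, score):
    """Exact Exchange Value of every agent under uniform group sampling.

    For agent i, enumerate every group of `group_size` drawn from the other
    agents, swap i in for each member in turn, and average the resulting
    change in score, score(after) - score(before).
    """
    values = {}
    for i in agents:
        others = [a for a in agents if a != i]
        deltas = []
        for group in combinations(others, group_size):
            for pos in range(group_size):
                # Agent i takes the place of the member at position `pos`.
                swapped = group[:pos] + (i,) + group[pos + 1:]
                deltas.append(score(swapped) - score(group))
        values[i] = mean(deltas)
    return values


# Hypothetical toy data (an assumption): incidents caused per trajectory.
INCIDENT_RATE = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 5.0}


def toy_score(group):
    # Fewer incidents is more desirable, so the score is the negated total.
    return -sum(INCIDENT_RATE[a] for a in group)


ev = exchange_values(list(INCIDENT_RATE), group_size=2, score=toy_score)

# Selective imitation: keep only agents with a positive Exchange Value.
selected = [a for a, v in ev.items() if v > 0]
```

In this toy setting, agents that cause fewer incidents than the average of the others receive a positive Exchange Value and are the ones kept for imitation. Real datasets typically cover only a fraction of possible groups and may have non-additive scores, which is where dedicated estimation methods are needed.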