Ad hoc teamwork refers to the problem of enabling an agent to collaborate with teammates without prior coordination. Data-driven methods represent the state of the art in ad hoc teamwork. They use a large labeled dataset of prior observations to model the behavior of other agent types and to determine the ad hoc agent's behavior. These methods are computationally expensive, lack transparency, and make it difficult to adapt to previously unseen changes, e.g., in team composition. Our recent work introduced an architecture that determined an ad hoc agent's behavior based on non-monotonic logical reasoning with prior commonsense domain knowledge and predictive models of other agents' behavior that were learned from limited examples. In this paper, we substantially expand the architecture's capabilities to support: (a) online selection, adaptation, and learning of the models that predict the other agents' behavior; and (b) collaboration with teammates in the presence of partial observability and limited communication. We illustrate and experimentally evaluate the capabilities of our architecture in two simulated multiagent benchmark domains for ad hoc teamwork: Fort Attack and Half Field Offense. We show that the performance of our architecture is comparable to or better than that of state-of-the-art data-driven baselines in both simple and complex scenarios, particularly in the presence of limited training data, partial observability, and changes in team composition.