Responsible AI has risen to the forefront of the AI research community. As neural-network-based learning algorithms continue to permeate real-world applications, the field of Responsible AI has played a large role in ensuring that such systems maintain a high level of human compatibility. Despite this progress, the state of the art in Responsible AI has ignored one crucial point: human problems are multi-agent problems. Predominant approaches largely consider the performance of a single AI system in isolation, yet from driving in traffic to negotiating economic policy, human problem-solving involves interaction and the interplay of the actions and motives of multiple individuals. This dissertation develops the study of responsible emergent multi-agent behavior, illustrating how researchers and practitioners can better understand and shape multi-agent learning with respect to three pillars of Responsible AI: interpretability, fairness, and robustness. First, I investigate multi-agent interpretability, presenting novel techniques for understanding emergent multi-agent behavior at multiple levels of granularity. With respect to low-level interpretability, I examine the extent to which implicit communication emerges as an aid to coordination in multi-agent populations. I introduce a novel curriculum-driven method for learning high-performing policies in difficult, sparse-reward environments, and I show, through a measure of position-based social influence, that multi-agent teams that learn sophisticated coordination strategies exchange significantly more information through implicit signals than less coordinated agents do. Then, at a high level, I study concept-based interpretability in the context of multi-agent learning. I propose a novel method for learning intrinsically interpretable, concept-based policies and show that it enables...