The AI safety literature is full of examples of powerful AI agents that, in blindly pursuing a specific and usually narrow objective, end up causing unacceptable and even catastrophic collateral damage to others. In this paper, we consider the problem of social harms that can result from actions taken by learning and utility-maximising agents in a multi-agent environment. The problem of measuring social harms or impacts in such multi-agent settings, especially when the agents are artificial general intelligence (AGI) agents, was listed as an open problem in Everitt et al. (2018). We attempt a partial answer to that open problem in the form of market-based mechanisms to quantify and control the cost of such social harms. The proposed setup captures many well-studied special cases and is more general than existing formulations of multi-agent reinforcement learning with mechanism design in two ways: (i) the underlying environment is a history-based general reinforcement learning environment, as in AIXI; (ii) the reinforcement-learning agents participating in the environment can have different learning strategies and planning horizons. To demonstrate the practicality of the proposed setup, we survey some key classes of learning algorithms and present a few applications, including a discussion of the Paperclips problem and pollution control with a cap-and-trade system.