As AI systems are increasingly incorporated into domains where human behavior has set the norm, a challenge for AI governance and AI alignment research is to regulate their behavior in a way that is useful and constructive for society. One way to approach this challenge is to ask how we govern the human behavior that the models are emulating. To evaluate human behavior, the American legal system often uses the "reasonable person standard." The idea of "reasonable" behavior comes up in nearly every area of law: the legal system often judges the actions of parties by what a reasonable person would have done under similar circumstances. This paper argues that the reasonable person standard provides useful guidelines for the type of behavior we should develop, probe, and stress-test in models. Using illustrative cases, it explains how reasonableness is defined and used in key areas of the law, how the reasonable person standard could apply to AI behavior in each of these areas and contexts, and how our societal understanding of "reasonable" behavior provides useful technical goals for AI researchers.