We present what we call the Interpretation Problem, whereby any rule in symbolic form is open to infinitely many interpretations, including ones we might disapprove of, and we argue that any attempt to build morality into machines is subject to it. We show that the Interpretation Problem in Artificial Intelligence illustrates Wittgenstein's general claim that no rule can contain the criteria for its own application. We also show that the risks created by this problem escalate in proportion to the degree to which the machine is causally connected to the world, a relationship we call the Law of Interpretative Exposure. Using game theory, we attempt to define the structure of normative spaces and argue that any rule-following within a normative space is guided by values that are external to that space and that cannot themselves be represented as rules. In light of this, we categorise the mistakes an artificial moral agent could make into Mistakes of Intention and Instrumental Mistakes. We then propose ways of building morality into machines by getting them to interpret the rules we give in accordance with these external values: through explicit moral reasoning, the Show, not Tell paradigm, the adjustment of the agent's causal power and structure, and relational values. The ultimate aim is for the machine to develop a virtuous character and for the impact of the Interpretation Problem to be minimised.