We identify a fundamental incompatibility between the goals of accuracy, trust, and human-level reasoning in artificial intelligence (AI) systems, under strict mathematical definitions of these notions. We define accuracy of a system as the property that it never makes a false claim, given that it may abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property that an AI system always matches or exceeds human capability. Our core finding is that, under these formal definitions, an accurate and trusted AI system cannot be a human-level reasoning system: for any such system there are task instances that are easily and provably solvable by a human but not by the system. Our proofs draw parallels to Gödel's incompleteness theorems and Turing's proof of the undecidability of the halting problem, and can be regarded as interpretations of Gödel's and Turing's results. Key to our proofs is the formalization of the notion of trust, which allows us to separate the intrinsic property of a system (being accurate) from its epistemic status (being trusted).
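The flavor of the argument can be illustrated with a toy diagonal construction. The sketch below is not the paper's formal proof; it is a minimal illustration under stated assumptions. A system is modeled as a function returning `True`, `False`, or `None` (abstain), and the names `DiagonalTask`, `is_accurate_on`, and the three toy systems are all hypothetical. The task asserts "the system does not answer `True` on this task", so any system that commits to an answer makes a false claim, and an accurate system can only abstain.

```python
from typing import Callable, Optional

# Assumption: a system maps a task to True, False, or None (abstain).
System = Callable[["DiagonalTask"], Optional[bool]]


class DiagonalTask:
    """Hypothetical self-referential task built from a given system.

    The task's claim is: "the system does not answer True on this task".
    """

    def __init__(self, system: System) -> None:
        self.system = system

    def truth(self) -> bool:
        # The claim holds exactly when the system does not answer True.
        return self.system(self) is not True


def is_accurate_on(system: System, task: DiagonalTask) -> bool:
    """Accuracy in the abstract's sense: abstaining is allowed, false claims are not."""
    answer = system(task)
    if answer is None:
        return True  # abstaining never counts as a false claim
    return answer == task.truth()


# Three toy systems (illustrative only).
def always_true(task: DiagonalTask) -> Optional[bool]:
    return True


def always_false(task: DiagonalTask) -> Optional[bool]:
    return False


def abstainer(task: DiagonalTask) -> Optional[bool]:
    return None


results = {}
for system in (always_true, always_false, abstainer):
    task = DiagonalTask(system)
    results[system.__name__] = is_accurate_on(system, task)

# A human who trusts that `abstainer` is accurate can reason that it cannot
# answer True on its own diagonal task, so the task's claim must hold.
human_answer = True
task_for_abstainer = DiagonalTask(abstainer)
human_is_correct = human_answer == task_for_abstainer.truth()
```

In this toy setting only the abstaining system stays accurate on its own diagonal task, while a human who assumes the system is accurate can answer the task correctly. This mirrors, in a very simplified form, the separation between being accurate and being trusted that the abstract describes.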