While large-scale models such as LLMs and diffusion models have achieved practical success, public institutions have emphasized the importance of explainability in AI. Existing methods for explaining AI, however, are not designed to provide completely faithful explanations of the behavior of large-scale AI systems. Although a completely faithful and interpretable explanation of an AI system's behavior might be useful for AI governance, it has remained unknown whether providing such an explanation is theoretically possible. In this paper, we mathematically prove a fundamental quadrilemma in explaining AI: an AI system and its explanation cannot satisfy the following four conditions simultaneously: 1) the operating environment is complex, 2) the AI performs well, 3) the explanation is interpretable, and 4) the explanation is completely faithful. This quadrilemma suggests that, in most applications, where we can neither change the environment nor sacrifice good AI performance or an interpretable explanation, we should give up complete faithfulness and instead aim to explain only the parts of the behavior that matter for the application. Consequently, AI governance should be designed on the premise that the faithfulness of AI explanations is always incomplete.