We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path that guarantees safe, controlled AGI. We end with a list of challenge problems whose solutions would contribute to this positive outcome, and we invite readers to join in this work.