Solving the AI alignment problem requires clear, defensible values with which AI systems can be aligned. Currently, alignment targets remain underspecified and do not appear to rest on a philosophically robust foundation. We open the discussion of this problem by presenting five core, foundational values, drawn from moral philosophy and built on the requisites for human existence: survival, sustainable intergenerational existence, society, education, and truth. We show that these values not only provide a clearer direction for technical alignment work, but also serve as a framework for identifying the threats and opportunities that AI systems pose for obtaining and sustaining these values.