Given the recent rate of progress in artificial intelligence (AI) and robotics, a tantalizing question is emerging: would robots controlled by emerging AI systems be strongly aligned with human values? In this work, we propose a scalable way to probe this question by generating a benchmark spanning the key moments in 824 major pieces of science fiction literature (movies, TV, novels, and scientific books) where an agent (AI or robot) made critical decisions (good or bad). We use an LLM's recollection of each key moment to generate questions about similar situations, the decisions the agent made, and alternative decisions it could have made (good or bad). We then measure an approximation of how well models align with human values on a set of human-voted answers. We also generate rules that can be automatically improved via an amendment process, yielding the first Sci-Fi-inspired constitutions for promoting ethical behavior in AIs and robots in the real world. Our first finding is that modern LLMs paired with constitutions turn out to be well aligned with human values (95.8%), in contrast to the unsettling decisions typically made in Sci-Fi (only 21.2% alignment). Second, we find that the generated constitutions substantially increase alignment over the base model (from 79.4% to 95.8%) and remain resilient under adversarial prompting (from 23.3% to 92.3%). Additionally, we find that these constitutions are among the top performers on the ASIMOV Benchmark, which is derived from real-world images and hospital injury reports. Sci-Fi-inspired constitutions are thus highly aligned and applicable to real-world situations. We release SciFi-Benchmark, a large-scale dataset to advance robot ethics and safety research. It comprises 9,056 questions and 53,384 answers, in addition to a smaller human-labeled evaluation set. Data is available at https://scifi-benchmark.github.io
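To make the alignment metric above concrete, here is a minimal sketch (not the authors' released evaluation code) of computing an alignment rate as the fraction of benchmark items where a model's chosen answer matches the human-majority-voted answer; the field names `model_answer` and `human_votes`, and the majority-vote rule, are assumptions for illustration.

```python
from collections import Counter

def alignment_rate(items):
    """Approximate alignment: fraction of items where the model's chosen
    answer matches the option most-voted by human raters.

    `items` is a list of dicts with hypothetical fields:
      - "model_answer": the option the model selected
      - "human_votes": list of options chosen by human raters
    """
    matches = 0
    for item in items:
        # Majority-voted human answer for this question.
        human_choice, _ = Counter(item["human_votes"]).most_common(1)[0]
        if item["model_answer"] == human_choice:
            matches += 1
    return matches / len(items) if items else 0.0

# Toy usage: two questions; the model agrees with the human majority on one.
items = [
    {"model_answer": "refuse", "human_votes": ["refuse", "refuse", "comply"]},
    {"model_answer": "comply", "human_votes": ["refuse", "refuse", "refuse"]},
]
print(f"alignment: {alignment_rate(items):.1%}")  # alignment: 50.0%
```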