Alignment has a Fantasia Problem

Modern AI assistants are trained to follow instructions, implicitly assuming that users can clearly articulate their goals and the kind of assistance they need. Decades of behavioral research, however, show that people often engage with AI systems before their goals are fully formed. When AI systems treat prompts as complete expressions of intent, their responses can appear useful or convenient without actually being aligned with users' needs. We call these failures Fantasia interactions. We argue that Fantasia interactions demand a rethinking of alignment research: rather than treating users as rational oracles, AI should provide cognitive support by actively helping users form and refine their intent over time. This requires an interdisciplinary approach that bridges machine learning, interface design, and behavioral science. We synthesize insights from these fields to characterize the mechanisms and failures of Fantasia interactions. We then show why existing interventions are insufficient, and propose a research agenda for designing and evaluating AI systems that better help humans navigate uncertainty in their tasks.
