AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

Does AI understand human values? While this remains an open philosophical question, we take a pragmatic stance by introducing VAPT, the Value-Alignment Perception Toolkit, for studying how LLMs reflect people's values and how people judge those reflections. Twenty participants texted a chatbot over a month, then completed a 2-hour interview in which they used our toolkit to evaluate the AI's ability to extract (pull details regarding), embody (make decisions guided by), and explain (provide proof of) their values. Thirteen participants ultimately left our study convinced that AI can understand human values. We therefore warn about "weaponized empathy": a design pattern that may arise in interactions with value-aware yet welfare-misaligned conversational agents. VAPT offers a new way to evaluate value alignment in AI systems. We also offer design implications for evaluating and responsibly building AI systems with transparency and safeguards as AI capabilities grow more inscrutable, ubiquitous, and posthuman.

