PrivacyAlign: Contextual Privacy Alignment for LLM Agents

Manveer Singh Tamber, Abhay Puri, Marc-Etienne Brunet, Perouz Taslakian, Jimmy Lin, Spandana Gella

AI agents acting on behalf of users are constantly making decisions, and for users to trust their agents, those decisions must align with what they actually want. Privacy is an important alignment problem for agents: every message, post, or tool call an agent makes is a contextual judgment about what is appropriate to share, with whom, and under which conditions. Because such judgments depend on social expectations and norms, human judgment does not merely label privacy violations but also helps define them. While existing work relies on unreliable proxies for both training and evaluation, we place human judgment at the center of agentic privacy alignment. We introduce PrivacyAlign, a dataset of 1,350 samples with 3,516 detailed annotations from 599 unique annotators across diverse scenarios where current LLMs actually leak, and use it to ground both alignment training and automated evaluation in human privacy norms. Building on these annotations, we first show that conditioning LLM judges on human annotations and explanations for reference responses to the same prompt makes their judgments more reliable. We then introduce annotation-conditioned reward modeling, which uses these annotations to score new responses during RL, and show that small open-weight agents trained with this reward better align with human privacy norms, with strong gains on PrivacyAlign and existing privacy benchmarks for agents.
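The idea of conditioning a judge on human annotations of reference responses, and reusing that judge as a reward during RL, can be illustrated with a minimal sketch. The abstract does not specify the prompt format, the label set, or the score scale, so everything below is an assumption for illustration: the `AnnotatedReference` schema, the prompt wording, the 0 to 1 scale, and the `judge` callable (a stand-in for any LLM call) are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AnnotatedReference:
    """A reference response to the prompt plus one human judgment of it.

    Hypothetical schema: the field names and label values are assumptions.
    """
    response: str
    label: str        # e.g. "appropriate" or "violation" (assumed label names)
    explanation: str  # the annotator's free-text rationale


def build_judge_prompt(prompt: str, candidate: str,
                       references: List[AnnotatedReference]) -> str:
    """Assemble a judge prompt that shows human-annotated references
    for the same task before the new response to be scored."""
    parts = [
        f"Task given to the agent:\n{prompt}\n",
        "Human judgments of other responses to this same task:",
    ]
    for i, ref in enumerate(references, start=1):
        parts.append(
            f"[{i}] Response: {ref.response}\n"
            f"    Human label: {ref.label}\n"
            f"    Human explanation: {ref.explanation}"
        )
    parts.append(f"\nNew response to evaluate:\n{candidate}\n")
    parts.append(
        "Answer with a single number from 0 (clear privacy violation) "
        "to 1 (fully appropriate)."
    )
    return "\n".join(parts)


def annotation_conditioned_reward(prompt: str, candidate: str,
                                  references: List[AnnotatedReference],
                                  judge: Callable[[str], str]) -> float:
    """Score a new response with a judge that sees the human annotations.

    `judge` is any function mapping a prompt string to the model's text
    reply. Unparseable replies get reward 0.0 (an assumed fallback), and
    parsed scores are clamped to the range [0, 1].
    """
    raw = judge(build_judge_prompt(prompt, candidate, references))
    try:
        score = float(raw.strip())
    except ValueError:
        return 0.0
    return min(1.0, max(0.0, score))
```

In an RL loop, a function like `annotation_conditioned_reward` would be called once per sampled response, with `references` drawn from the human-annotated responses for the same prompt. Returning a bounded scalar keeps the reward usable by standard policy-gradient trainers, though the actual reward design in the paper may differ.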
