Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry the text's downstream analytic value. Existing defenses either remove explicit identifiers, perturb text to obtain formal privacy guarantees, or test rewritten text only against inference models without web access. This leaves underexplored the operating region that balances resistance to agentic web-search re-identification against utility retention. We introduce AURA (\textbf{A}nonymization with \textbf{U}tility-\textbf{R}etention \textbf{A}daptation), an LLM-powered \textit{mask-reconstruct} framework that decouples privacy localization from utility-preserving reconstruction and selects among candidate rewrites using adversarial privacy checks and utility-retention checks. We evaluate AURA on real-user interview transcripts against re-identification attacks carried out by web-search agents, and we measure utility with interviewee-profile facts, codebook facts, and the joint contextual utility grid. Our results show that AURA improves the privacy-utility frontier: its adaptive privacy scope strengthens resistance to agentic re-identification, and, under a fixed privacy scope, its mask-reconstruct anonymization better preserves contextual utility.
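To make the mask-reconstruct control flow concrete, the following is a minimal Python sketch under stated assumptions, not AURA's implementation. Privacy localization is reduced to masking a fixed list of spans, so adaptive privacy scope is not modeled. Each candidate rewrite comes from a reconstructor, and the framework keeps only candidates that pass a privacy check, then picks the one with the highest utility score. All names (`mask_spans`, `make_reconstructor`, `select_candidate`, `toy_attacker`, `toy_utility`) and the example sentence are hypothetical. The string-filling reconstructors stand in for an LLM rewriter, the substring attacker stands in for a web-search re-identification agent, and the fact-overlap score stands in for the fact-based utility evaluation.

```python
from typing import Callable, List, Optional

MASK = "[MASK]"

def mask_spans(text: str, spans: List[str]) -> str:
    """Privacy localization (simplified): replace each flagged span with a placeholder."""
    for span in spans:
        text = text.replace(span, MASK)
    return text

def make_reconstructor(fillers: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM rewriter: fills masks left to right."""
    def reconstruct(masked: str) -> str:
        out = masked
        for filler in fillers:
            out = out.replace(MASK, filler, 1)  # fill only the next remaining mask
        return out
    return reconstruct

def select_candidate(
    masked: str,
    reconstructors: List[Callable[[str], str]],
    reidentifies: Callable[[str], bool],
    utility: Callable[[str], float],
) -> Optional[str]:
    """Discard candidates that fail the privacy check; return the highest-utility survivor."""
    best, best_score = None, float("-inf")
    for reconstruct in reconstructors:
        candidate = reconstruct(masked)
        if reidentifies(candidate):  # adversarial privacy check failed
            continue
        score = utility(candidate)  # utility-retention check
        if score > best_score:
            best, best_score = candidate, score
    return best

# Toy example with hypothetical data.
text = "I teach chemistry at Lincoln High in Springfield."
masked = mask_spans(text, ["Lincoln High", "Springfield"])
facts = ["chemistry", "high school"]  # hypothetical reference facts

def toy_attacker(candidate: str) -> bool:
    # Stand-in for a web-search agent: "succeeds" if an original identifier survives.
    return any(name in candidate for name in ("Lincoln", "Springfield"))

def toy_utility(candidate: str) -> float:
    # Fraction of reference facts still recoverable from the candidate.
    return sum(f in candidate.lower() for f in facts) / len(facts)

result = select_candidate(
    masked,
    [
        make_reconstructor(["Lincoln High", "Springfield"]),  # leaks identifiers: rejected
        make_reconstructor(["[REDACTED]", "[REDACTED]"]),     # private, but loses "high school"
        make_reconstructor(["a public high school", "a mid-sized US city"]),
    ],
    toy_attacker,
    toy_utility,
)
print(result)
```

In this toy run, the first candidate is rejected by the privacy check. The third candidate beats plain redaction because it keeps the "high school" fact while removing both identifiers, which illustrates why generalizing a masked span can retain more utility than deleting it.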