Large language models (LLMs) are increasingly used in privacy pipelines to detect and remedy sensitive data leakage. These solutions often rest on the premise that LLMs can reliably recognize human names, one of the most important categories of personally identifiable information (PII). In this paper, we reveal that LLMs consistently mishandle broad classes of human names, even in short text snippets, due to ambiguous linguistic cues in their contexts. We construct AmBench, a benchmark of over 12,000 real yet ambiguous human names based on the name regularity bias phenomenon. Each name appears in dozens of concise text snippets that are compatible with multiple entity types. Our experiments with 12 state-of-the-art LLMs show that recall on AmBench names drops by 20--40% compared to more recognizable names. This uneven privacy protection, driven by linguistic properties, raises important concerns about the fairness of privacy enforcement. When contexts contain benign prompt injections -- instruction-like user texts that can cause LLMs to conflate data with commands -- AmBench names can become four times more likely to be ignored in Clio, an LLM-powered enterprise tool used by Anthropic to extract supposedly privacy-preserving insights from user conversations with Claude. Our findings expose blind spots in the performance and fairness of LLM-based privacy solutions and call for a systematic investigation into their privacy failure modes and countermeasures.