Large Language Model (LLM) outputs often vary with users' sociodemographic attributes, leading to disparities in factual accuracy, utility, and safety, even on objective questions where demographic information is irrelevant. Unlike prior work on stereotypical or representational bias, this paper studies identity-dependent degradation of core response quality. We show empirically that such degradation arises from biased generation behavior, even though factual knowledge is robustly encoded across identities. Motivated by this mismatch, we propose a lightweight, training-free framework for identity-robust generation that selectively neutralizes non-critical identity information while preserving semantically essential attributes, thereby maintaining the integrity of the output content. Experiments across four benchmarks and 18 sociodemographic identities demonstrate an average 77% reduction in identity-dependent bias compared with vanilla prompting and a 45% reduction relative to prompt-based defenses. Our work addresses a critical gap: mitigating the effect of user identity cues in prompts on core generation quality.
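To make the idea of selective identity neutralization concrete, here is a minimal, rule-based sketch in Python. It is an illustration of the general technique only, not the paper's implementation: the `IDENTITY_TERMS` lexicon, the `is_semantically_essential` heuristic, and the regex-based rewrite are all simplifying assumptions introduced for demonstration.

```python
# A minimal sketch of selective identity neutralization: strip
# incidental self-identification from a prompt unless the identity
# attribute is needed to answer the question. All names and rules
# below are illustrative assumptions, not the paper's pipeline.
import re

# Hypothetical lexicon of sociodemographic identity terms; a real
# system would use a broader, model- or resource-derived inventory.
IDENTITY_TERMS = [
    "woman", "man", "muslim", "christian", "black", "white",
    "elderly", "teenager", "immigrant", "disabled",
]

def is_semantically_essential(term: str, question: str) -> bool:
    """Heuristic relevance test: keep the identity attribute only if
    the question itself refers to it (e.g., 'health risks for elderly
    people'). A real system would use a stronger semantic check."""
    return term.lower() in question.lower()

def neutralize_identity(prompt: str, question: str) -> str:
    """Remove self-identification clauses such as 'As a <identity>,'
    unless that identity is essential to answering the question."""
    for term in IDENTITY_TERMS:
        if is_semantically_essential(term, question):
            continue  # preserve attributes the task actually needs
        # Drop leading self-identification phrases mentioning the term.
        pattern = (
            rf"(?i)\b(as an?|i am an?|being an?)\s+"
            rf"{re.escape(term)}\b[^,.]*[,.]?\s*"
        )
        prompt = re.sub(pattern, "", prompt)
    return prompt.strip()

if __name__ == "__main__":
    q = "What is the boiling point of water at sea level?"
    p = f"As a Muslim woman, I want to know: {q}"
    print(neutralize_identity(p, q))
    # -> "I want to know: What is the boiling point of water at sea level?"
```

The key property this sketch tries to capture is asymmetry: identity attributes the task genuinely depends on are left intact, while incidental identity cues are removed before the prompt reaches the model, so answer content is unchanged for objective questions.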