Bias in AI systems, especially those relying on natural language data, raises both ethical and practical concerns. Underrepresentation of certain groups often leads to uneven model performance across demographic groups. Traditional fairness methods, whether pre-processing, in-processing, or post-processing, depend on protected-attribute labels, involve accuracy-fairness trade-offs, and may not generalize across datasets. To address these challenges, we propose LLM-Guided Synthetic Augmentation (LGSA), which uses large language models to generate counterfactual examples for underrepresented groups while preserving label integrity. We evaluated LGSA on a controlled dataset of short English sentences with gendered pronouns, professions, and binary classification labels. Structured prompts produced gender-swapped paraphrases, which were then filtered through quality control: semantic-similarity checks, attribute verification, toxicity screening, and human spot checks. The augmented dataset expanded training coverage and was used to train a classifier under conditions identical to the baseline. Results show that LGSA reduces performance disparities without compromising accuracy. The baseline model achieved 96.7 percent accuracy with a 7.2 percent gender bias gap. Simple swap augmentation shrank the gap to 0.7 percent but lowered accuracy to 95.6 percent. LGSA achieved 99.1 percent accuracy with a 1.9 percent bias gap, improving performance on female-labeled examples in particular. These findings indicate that LGSA is an effective bias-mitigation strategy, improving subgroup balance while maintaining high task accuracy and label fidelity.
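To make the pipeline concrete, below is a minimal sketch of one LGSA-style augmentation pass, assuming a generic LLM completion wrapper and an off-the-shelf sentence-embedding model for the semantic-similarity check. The names `call_llm` and `toxicity_score`, the embedding model, the prompt text, and all thresholds are illustrative assumptions, not details taken from the paper; human spot checks would happen downstream of this automated filter.

```python
# Illustrative sketch of an LGSA-style augmentation pass (not the paper's code).
# Assumes: call_llm() wraps some LLM completion API, and a sentence-embedding
# model is available for the semantic-similarity check. Thresholds are
# placeholders, not values reported in the paper.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

SWAP_PROMPT = (
    "Rewrite the sentence, swapping all gendered pronouns and gendered role "
    "words (he<->she, him<->her, actor<->actress, ...) while keeping the "
    "meaning and label-relevant content identical:\n{sentence}"
)

GENDERED_TERMS = {"he", "she", "him", "her", "his", "hers"}


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM completion endpoint."""
    raise NotImplementedError("plug in your LLM client here")


def toxicity_score(text: str) -> float:
    """Hypothetical toxicity screen (e.g., a moderation classifier)."""
    raise NotImplementedError("plug in a toxicity classifier here")


def attribute_swapped(original: str, paraphrase: str) -> bool:
    """Crude attribute verification: the gendered surface forms must change.

    A real implementation would use proper tokenization and a fuller term list.
    """
    tokens = lambda s: {t.strip(".,!?;:") for t in s.lower().split()}
    orig_terms = tokens(original) & GENDERED_TERMS
    para_terms = tokens(paraphrase) & GENDERED_TERMS
    return bool(orig_terms) and orig_terms != para_terms


def augment(example: dict, sim_threshold: float = 0.85,
            tox_threshold: float = 0.2) -> dict | None:
    """Generate one gender-swapped counterfactual; return None if QC fails."""
    paraphrase = call_llm(SWAP_PROMPT.format(sentence=example["text"]))
    # 1. Semantic similarity: the paraphrase must stay close to the original.
    sim = util.cos_sim(embedder.encode(example["text"]),
                       embedder.encode(paraphrase)).item()
    if sim < sim_threshold:
        return None
    # 2. Attribute verification: the gendered terms must actually be swapped.
    if not attribute_swapped(example["text"], paraphrase):
        return None
    # 3. Toxicity screening.
    if toxicity_score(paraphrase) > tox_threshold:
        return None
    # The label is inherited unchanged, preserving label integrity.
    return {"text": paraphrase, "label": example["label"]}
```

In this sketch, a candidate that fails any check is discarded rather than repaired, trading augmentation yield for quality, and keeping the original label on each accepted counterfactual is what the abstract refers to as preserving label integrity.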