While various approaches to bias identification have recently been studied, little is known about how implicit language that does not explicitly convey a viewpoint affects bias amplification in large language models. To examine the severity of bias toward a given view, we evaluate performance on two downstream tasks that draw on implicit and explicit knowledge of social groups. First, we present a stress-test evaluation that probes a biased model in edge cases of excessive bias. Then, we evaluate how LLMs calibrate their language in response to both implicit and explicit opinions when aligned with conflicting viewpoints. Our findings reveal a discrepancy in LLM performance between identifying implicit and explicit opinions, with a general tendency to show bias toward explicit opinions of opposing stances. Moreover, the bias-aligned models generate more cautious responses, using uncertainty phrases, than the unaligned (zero-shot) base models. The direct, incautious responses of the unaligned models suggest that their decisiveness needs further refinement through the incorporation of uncertainty markers to enhance reliability, especially on highly subjective, socially nuanced topics.