While various approaches to bias identification have recently been studied, little is known about how implicit language, which does not explicitly convey a viewpoint, affects bias amplification in large language models. To examine the severity of bias toward a given view, we evaluate performance on two downstream tasks that draw on implicit and explicit knowledge of social groups. First, we present a stress-test evaluation that probes a biased model on edge cases involving excessive bias. Then, we evaluate how LLMs calibrate their language in response to implicit and explicit opinions when aligned with conflicting viewpoints. Our findings reveal a discrepancy in LLM performance when identifying implicit versus explicit opinions, with a general tendency to show bias toward explicitly stated opinions of opposing stances. Moreover, the bias-aligned models generate more cautious responses, using uncertainty phrases, than the unaligned (zero-shot) base models. The direct, incautious responses of the unaligned models suggest that their decisiveness needs further refinement through the incorporation of uncertainty markers to enhance reliability, especially on socially nuanced topics with high subjectivity.