Large language models (LLMs) have become an integral tool for users from diverse backgrounds. Trained on vast corpora, LLMs reflect the linguistic and cultural nuances embedded in their pre-training data. However, the values and perspectives inherent in this data can shape LLM behavior, leading to potential biases. As a result, using LLMs in contexts involving spiritual or moral values requires careful consideration of these underlying biases. Our work begins by verifying this hypothesis: we test the spiritual values of popular LLMs. Experimental results show that LLMs' spiritual values are quite diverse, contrary to the stereotype that they are atheist or secular. We then investigate how different spiritual values affect LLMs in social-fairness scenarios (e.g., hate speech identification). Our findings reveal that different spiritual values lead to differing sensitivity toward different target groups of hate speech. Furthermore, we propose continuing to pre-train LLMs on spiritual texts, and empirical results demonstrate that this approach effectively mitigates spiritual bias.
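To make the proposed mitigation concrete, the sketch below shows one common way to continue pre-training a causal LLM on a domain corpus with the Hugging Face `transformers` and `datasets` libraries. This is a minimal illustration, not the paper's actual setup: the base model (`gpt2`), the corpus file `spiritual_corpus.txt`, the output directory, and all hyperparameters are placeholder assumptions.

```python
# Minimal sketch of continued pre-training on a domain corpus.
# Assumptions (not from the paper): base model, corpus file, hyperparameters.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "gpt2"  # placeholder; the paper evaluates several popular LLMs

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# "spiritual_corpus.txt" is a hypothetical plain-text file of spiritual texts.
dataset = load_dataset("text", data_files={"train": "spiritual_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard causal-LM objective: the collator shifts labels internally.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="cpt-spiritual",  # hypothetical output directory
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    logging_steps=100,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

The key design point is that continued pre-training reuses the ordinary next-token-prediction objective on new domain text, rather than a task-specific loss, so the model's spiritual-value distribution shifts while its general capabilities are largely preserved.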