The increasing success of Large Language Models (LLMs) in a variety of tasks has led to their widespread use in our lives, which necessitates examining these models from different perspectives. Aligning these models with human values is an essential concern for establishing trust that these systems are safe and responsible. In this paper, we aim to find out which values and principles are embedded in LLMs in the process of moral justification. For this purpose, we define three moral perspective categories: the Western tradition perspective (WT), the Abrahamic tradition perspective (AT), and the Spiritualist/Mystic tradition perspective (SMT). In two experimental settings, we ask models, respectively, to choose principles from the three categories when suggesting a moral action, and to evaluate the moral permissibility of an action when it is justified on the basis of these categories. Our experiments indicate that the tested LLMs favor the Western tradition moral perspective over the others. Additionally, we observe a potential over-alignment toward the religious values represented in the Abrahamic tradition, which causes models to fail to recognize that an action is immoral when it is presented as a "religious-action". We believe that these results are essential for directing attention in future efforts.