Aligning language models with human values is crucial, especially as they become more integrated into everyday life. While models are often adapted to user preferences, it is equally important to ensure they align with moral norms and behaviours in real-world social situations. Despite significant progress in languages like English and Chinese, French has received little attention in this area, leaving a gap in our understanding of how LLMs handle moral reasoning in this language. To address this gap, we introduce Histoires Morales, a French dataset derived from Moral Stories, created through translation and subsequently refined with the assistance of native speakers to guarantee grammatical accuracy and adaptation to the French cultural context. We also rely on annotations of the moral values within the dataset to ensure their alignment with French norms. Histoires Morales covers a wide range of social situations, including differences in tipping practices, expressions of honesty in relationships, and responsibilities toward animals. To foster future research, we conduct preliminary experiments on the alignment of multilingual models with French and English data and on the robustness of this alignment. We find that while LLMs are generally aligned with human moral norms by default, they can easily be swayed through user-preference optimization on both moral and immoral data.