Moral sensitivity is fundamental to human moral competence, as it guides individuals in regulating everyday behavior. Although many approaches seek to align large language models (LLMs) with human moral values, making them morally sensitive has remained extremely challenging. In this paper, we take a step toward answering the question: how can we enhance moral sensitivity in LLMs? Specifically, we propose two pragmatic inference methods that enable LLMs to diagnose morally benign and hazardous inputs and to correct moral errors, thereby enhancing their moral sensitivity. A central strength of our pragmatic inference methods is their unified perspective: rather than modeling moral discourse across semantically diverse and complex surface forms, they offer a principled basis for designing pragmatic inference procedures grounded in their inferential loads. Empirical evidence demonstrates that our pragmatic methods enhance moral sensitivity in LLMs and achieve strong performance on representative morality-related benchmarks.