Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken for granted the problem of how LLMs might 'mean' anything at all. Without addressing this problem, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g., morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.