In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers. To establish a shared vocabulary around how abstract concepts of alignment are operationalised in empirical datasets, we propose a framework that demarcates: 1) which dimensions of model behaviour are considered important, and 2) how meanings and definitions are ascribed to these dimensions, and by whom. We situate existing empirical literature within this framework and provide guidance on deciding which paradigm to follow. Through this framework, we aim to foster a culture of transparency and critical evaluation, aiding the community in navigating the complexities of aligning LLMs with human populations.