Poor Alignment and Steerability of Large Language Models: Evidence from College Admission Essays

People increasingly use technologies built on large language models (LLMs) to write texts for formal communication, which raises two important questions at the intersection of technology and society: whom do LLMs write like (model alignment), and can LLMs be prompted to change whom they write like (model steerability)? We investigate these questions in the high-stakes context of undergraduate admissions at a selective university. We compare lexical and sentence-level variation in essays written by 30,000 applicants with that in two types of LLM-generated essays: one prompted with only the essay question given to the human applicants, and another prompted with additional demographic information about each applicant. We consistently find that both types of LLM-generated essays are linguistically distinct from human-authored essays, regardless of the specific model or analytical approach. Further, prompting with a specific sociodemographic identity is remarkably ineffective in aligning the model with the linguistic patterns observed in human writing from that identity group. This holds along the key dimensions of sex, race, first-generation status, and geographic location. The demographically prompted and unprompted synthetic texts were also more similar to each other than to the human-authored text, meaning that prompting did not alleviate homogenization. These issues of model alignment and steerability in current LLMs raise concerns about their use in high-stakes contexts.
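The abstract does not state which lexical measures were used, so the following is only a minimal sketch of the three-way comparison it describes (human essays, unprompted LLM essays, and demographically prompted LLM essays). It assumes pooled word-frequency vectors and cosine similarity as a stand-in measure, which is not necessarily the authors' method. The function names `word_freqs` and `cosine` are hypothetical, and the essays are invented toy strings, not data from the study.

```python
import math
import re
from collections import Counter


def word_freqs(texts):
    """Pool a list of essays into one relative word-frequency vector."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors (dicts)."""
    dot = sum(value * q.get(word, 0.0) for word, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


# Toy placeholder essays, invented for illustration only (assumption).
human = [
    "My grandmother taught me to fix radios in her garage.",
    "I grew up translating for my parents at the clinic.",
]
unprompted = [
    "This experience taught me resilience and the value of community.",
    "Through this journey I learned the value of perseverance.",
]
prompted = [
    "This experience taught me resilience and the value of my community.",
    "Through this journey I learned perseverance and the value of family.",
]

vectors = {
    "human": word_freqs(human),
    "unprompted": word_freqs(unprompted),
    "prompted": word_freqs(prompted),
}

# Pairwise similarities between the three corpora.
sims = {
    "human_vs_unprompted": cosine(vectors["human"], vectors["unprompted"]),
    "human_vs_prompted": cosine(vectors["human"], vectors["prompted"]),
    "unprompted_vs_prompted": cosine(vectors["unprompted"], vectors["prompted"]),
}

for pair, value in sims.items():
    print(f"{pair}: {value:.3f}")
```

In this toy setup, the homogenization pattern described in the abstract would show up as the `unprompted_vs_prompted` similarity exceeding both human-versus-synthetic similarities. A steerability check along one demographic dimension could reuse the same functions by restricting each corpus to a single group before pooling.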
