The ability to accurately align LLMs with population groups on subjective questions would have great value. In this work, we show that simple supervision can improve language model alignment with diverse population groups more consistently than prompting-based approaches, as measured across three datasets spanning a range of topics. Beyond evaluating average alignment, we also report how alignment varies across specific groups. These findings provide insight into the distributional alignment of LLMs with diverse populations. By evaluating many LLMs and prompting strategies, we provide a benchmark to stimulate future research.