How diverse are the outputs of large language models when diversity is desired? We examine the diversity of various models' responses to questions with multiple possible answers, comparing them with human responses. Our findings suggest that models' outputs are highly concentrated, reflecting a narrow, mainstream 'worldview', in comparison to humans, whose responses exhibit a much longer tail. We examine three ways to increase models' output diversity: 1) increasing generation randomness via temperature sampling; 2) prompting models to answer from diverse perspectives; 3) aggregating outputs from several models. A combination of these measures significantly increases models' output diversity, reaching that of humans. We discuss implications of these findings for AI policy that aims to preserve cultural diversity, an essential building block of a democratic social fabric.
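As a rough illustration of how the three measures named above could be combined, here is a minimal Python sketch. The `query_model` helper, the perspective prompts, the temperature value, and the model list are all hypothetical placeholders under my own assumptions, not the paper's actual protocol.

```python
from collections import Counter

# Hypothetical helper: sends `prompt` to the model named `model_name` and
# returns one sampled completion. Any real client (a hosted API, a local
# pipeline, ...) could be plugged in here.
def query_model(model_name: str, prompt: str, temperature: float = 1.0) -> str:
    raise NotImplementedError("plug in your preferred LLM client")

# Measure 2: illustrative perspective-shifting prompts (not the paper's wording).
PERSPECTIVES = [
    "Answer as yourself.",
    "Answer from the perspective of someone from a different cultural background.",
    "Answer from the perspective of someone holding an uncommon opinion.",
]

def diverse_answers(question: str, models: list[str], samples_per_call: int = 5) -> Counter:
    """Collect answers while applying the three diversity measures jointly:
    (1) high-temperature sampling, (2) perspective prompting, (3) multi-model aggregation."""
    answers = Counter()
    for model in models:                        # measure 3: aggregate across several models
        for persona in PERSPECTIVES:            # measure 2: prompt from diverse perspectives
            prompt = f"{persona}\n\nQuestion: {question}\nGive one short answer."
            for _ in range(samples_per_call):   # measure 1: repeated sampling at a high temperature
                answer = query_model(model, prompt, temperature=1.2)
                answers[answer.strip().lower()] += 1
    return answers
```

The resulting `Counter` of answer frequencies could then be compared against the distribution of human responses, e.g. by inspecting how heavy its tail is.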