Collecting diverse human data on subjective NLP topics is costly and challenging. As Large Language Models (LLMs) have developed human-like capabilities, collaborative data generation between humans and LLMs has become a recent trend, offering potentially scalable and efficient solutions. However, the extent to which LLMs can generate diverse perspectives on subjective topics remains unexplored. In this study, we investigate LLMs' capacity for generating diverse perspectives and rationales on subjective topics, such as social norms and argumentative texts. We formulate this problem as diversity extraction in LLMs and propose a criteria-based prompting technique to ground diverse opinions and measure perspective diversity from the generated criteria words. Our results show that measuring semantic diversity through sentence embeddings and distance metrics is not enough to measure perspective diversity. To see how far we can extract diverse perspectives from LLMs, which we call diversity coverage, we employ step-by-step recall prompting to generate more outputs from the model iteratively. Applying our prompting method to other tasks (hate speech labeling and story continuation), we find that LLMs are able to generate diverse opinions according to the degree of task subjectivity.
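To make the contrast between the two notions of diversity concrete, the following is a minimal sketch, not the metrics used in this study. It assumes semantic diversity is taken as the mean pairwise cosine distance between sentence embeddings, and perspective diversity as the share of distinct criteria words among all generated criteria words. The function names, the toy vectors, and the toy criteria words are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_diversity(embeddings):
    """Assumed proxy: mean pairwise cosine distance over all embedding pairs."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def criteria_diversity(criteria_lists):
    """Assumed proxy: distinct criteria words divided by total criteria words."""
    words = [w.strip().lower() for criteria in criteria_lists for w in criteria]
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Hypothetical stand-ins for sentence embeddings of three generated opinions.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

# Hypothetical criteria words attached to the same three opinions.
toy_criteria = [["fairness", "privacy"], ["Fairness", "safety"], ["tradition"]]

print(semantic_diversity(toy_embeddings))
print(criteria_diversity(toy_criteria))
```

In practice the toy vectors would be replaced by the output of a sentence encoder. The two scores can disagree: opinions that lie far apart in embedding space may rest on the same criteria, and opinions with similar wording may rest on different ones.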