Large language models often generate homogeneous outputs, but whether this is problematic depends on the task. For objective math tasks, responses may vary in problem-solving strategy but should arrive at the same verifiable answer. For creative writing tasks, by contrast, we often expect variation in key narrative components (e.g., plot and setting) beyond mere vocabulary diversity. Prior work on homogenization rarely conceptualizes diversity in a task-dependent way. We address this gap with four contributions: (1) a task taxonomy whose categories carry distinct notions of functional diversity, that is, whether a user would perceive two responses as meaningfully different for a given task; (2) a small user study validating that the taxonomy aligns with human perception of functional diversity; (3) a task-dependent sampling technique that increases diversity only where homogenization is undesired; and (4) evidence challenging the perceived diversity-quality trade-off, showing that it may stem from mis-conceptualizing both diversity and quality in a task-agnostic way.