Research on Large Language Models (LLMs) studies output variation across generation, reasoning, alignment, and representational analysis, often under the umbrella of "diversity." Yet the terminology remains fragmented, largely because the normative objectives underlying tasks are rarely made explicit. We introduce the Magic, Madness, Heaven, Sin framework, which models output variation along a homogeneity-heterogeneity axis, where the value assigned to a point on that axis depends on the task and its normative objective. We organize tasks into four normative contexts: epistemic (factuality), interactional (user utility), societal (representation), and safety (robustness). For each context, we examine the failure modes and the vocabulary through which variation is studied, such as hallucination, mode collapse, bias, and erasure. We apply the framework to analyze all pairwise cross-contextual interactions, revealing that optimizing for one objective, such as improving safety, can inadvertently harm demographic representation or creative diversity. We argue for context-aware evaluation of output variation, reframing it as a property shaped by task objectives rather than an intrinsic trait of a model.