When language models answer open-ended problems, they make hidden decisions that shape their outputs, leaving users with uncontextualized answers rather than a working map of the problem. Drawing on multiverse analysis from statistics, we build and evaluate the conceptual multiverse, an interactive system that represents conceptual decisions, such as how to frame a question or what to value, as a space that users can transparently inspect, change through direct intervention, and check against principled domain reasoning. For this structure to be worth navigating rather than misleading, it must be rigorous and checkable against domain reasoning norms. We therefore develop a general verification framework that enforces properties of good decision structures, such as unambiguity and completeness, calibrated by expert-level reasoning. Across three domains, the conceptual multiverse helped participants develop a working map of the problem: philosophy students rewrote essays with sharper framings and reversed theses, alignment annotators moved from surface preferences to reasoning about user intent and harm, and poets identified compositional patterns that clarified their taste.