As LLMs expand from assistance to decision support, a dangerous pattern emerges: fluent agreement without calibrated judgment. Low-friction assistants can become sycophantic, baking in implicit assumptions and pushing verification costs onto experts, while outcomes arrive too late to serve as reward signals. In deep-uncertainty decisions (where objectives are contested and reversals are costly), scaling fluent agreement amplifies poor commitments faster than it builds expertise. We argue that reliable human-AI partnership requires a shift from answer generation to collaborative premise governance over a knowledge substrate, negotiating only what is decision-critical. A discrepancy-driven control loop operates over this substrate: detecting conflicts, localizing misalignment via typed discrepancies (teleological, epistemic, procedural), and triggering bounded negotiation through decision slices. Commitment gating blocks action on uncommitted load-bearing premises unless overridden under logged risk; value-gated challenge allocates probing under interaction cost. Trust then attaches to auditable premises and evidence standards, not conversational fluency. We illustrate the approach with a tutoring scenario and propose falsifiable evaluation criteria.
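The commitment-gating step described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the `Premise`, `DiscrepancyType`, and `commitment_gate` names, fields, and override mechanism are all hypothetical choices made for exposition. It shows the core invariant: action is blocked while any load-bearing premise remains uncommitted, unless an explicit override is recorded together with its risk.

```python
from dataclasses import dataclass, field
from enum import Enum

class DiscrepancyType(Enum):
    """Hypothetical typing of the three discrepancy kinds named in the abstract."""
    TELEOLOGICAL = "teleological"  # misaligned goals or objectives
    EPISTEMIC = "epistemic"        # conflicting beliefs or evidence standards
    PROCEDURAL = "procedural"      # disagreement about method or process

@dataclass
class Premise:
    text: str
    load_bearing: bool       # the decision depends on this premise
    committed: bool = False  # both parties have explicitly signed off

@dataclass
class OverrideLog:
    """Audit trail for overrides taken under acknowledged risk."""
    entries: list = field(default_factory=list)

    def record(self, premise: Premise, risk_note: str) -> None:
        self.entries.append((premise.text, risk_note))

def commitment_gate(premises, override_risk=None, log=None) -> bool:
    """Return True iff action may proceed.

    Blocks while any load-bearing premise is uncommitted, unless an
    override is supplied, in which case the risk is logged per premise.
    """
    pending = [p for p in premises if p.load_bearing and not p.committed]
    if not pending:
        return True
    if override_risk is not None and log is not None:
        for p in pending:
            log.record(p, override_risk)
        return True
    return False
```

A usage pattern consistent with the abstract: the gate first refuses, an override is logged under stated risk, and once the premise is negotiated and committed the gate opens without any override.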