Large language models (LLMs) are highly vulnerable to input confirmation bias. When a prompt implies a preferred answer, models often reinforce that bias rather than explore alternatives. This phenomenon remains underexplored, yet it is already harmful in base models and poses an even greater risk in multi-agent debate, where echo chambers reinforce bias rather than correct it. We introduce Mixture of Latent Concept Experts (MoLaCE), a lightweight inference-time framework that addresses confirmation bias by mixing experts instantiated as different activation strengths over the latent concepts that shape model responses. Our key insight is that, due to the compositional nature of language, differently phrased prompts reweight latent concepts in prompt-specific ways that affect factual correctness, so no single fixed intervention can be applied universally across inputs. This design enables a single LLM to emulate the benefits of debate internally while remaining computationally efficient and scalable. MoLaCE can also be integrated into multi-agent debate frameworks to diversify perspectives and reduce correlated errors. We empirically show that MoLaCE consistently reduces confirmation bias, improves robustness, and matches or surpasses multi-agent debate while requiring only a fraction of the computation.
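To make the mixture mechanism concrete, the minimal Python sketch below illustrates one reading of "experts instantiated as different activation strengths over latent concepts": the same model evaluated under several steering strengths along a single concept direction, with the resulting next-token distributions mixed. Everything here is an illustrative assumption rather than the paper's actual interface: the toy hidden state and unembedding matrix, the single `concept_dir` vector, the fixed `alphas`, and the uniform mixture weights (the framework itself would choose prompt-specific weights).

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy stand-ins for a transformer's hidden state and unembedding matrix
# (assumed for illustration only; not the paper's setup).
rng = np.random.default_rng(0)
d_model, vocab = 16, 5
hidden = rng.normal(size=d_model)             # hidden state produced by the prompt
unembed = rng.normal(size=(vocab, d_model))   # maps hidden state to token logits

# Hypothetical latent-concept direction (e.g., estimated from contrastive prompts).
concept_dir = rng.normal(size=d_model)
concept_dir /= np.linalg.norm(concept_dir)

def expert_distribution(alpha):
    """One 'expert': the same model with the concept activated at strength alpha."""
    steered = hidden + alpha * concept_dir
    return softmax(unembed @ steered)

# Mix the experts' next-token distributions; uniform weights here for simplicity.
alphas = [-2.0, 0.0, 2.0]
weights = np.ones(len(alphas)) / len(alphas)
mixed = sum(w * expert_distribution(a) for w, a in zip(weights, alphas))
print("mixed next-token distribution:", np.round(mixed, 3))
```

Because each expert shares the model's weights and differs only in the activation strength applied at inference time, the mixture adds a few extra forward passes (or steered readouts) rather than the full cost of running multiple debating agents.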