We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical epistemic voting results, such as the \textit{Condorcet Jury Theorem} (CJT), assume fixed participation, real-world aggregation often benefits from allowing agents to say ``I don't know.'' We propose a probabilistic framework where agents engage in a \textit{calibration} phase, updating beliefs about their own fixed competence, before facing a final confidence gate that determines whether to vote or abstain. We derive a non-asymptotic lower bound on the group's success probability and prove that this \textit{selective participation} generalizes the asymptotic guarantees of the CJT to a sequential, confidence-gated setting. Empirically, we validate these bounds via Monte Carlo simulations. While our results are general, we discuss their potential application to AI safety, outlining how this framework can mitigate \textit{hallucinations} in collective LLM decision-making.