Current Large Language Model reasoning systems process queries independently, discarding valuable cross-instance signals such as shared reasoning patterns and consistency constraints. We introduce Batch-of-Thought (BoT), a training-free method that processes related queries jointly to enable cross-instance learning. By performing comparative analysis across batches, BoT identifies high-quality reasoning templates, detects errors through consistency checks, and amortizes computational costs. We instantiate BoT within a multi-agent reflection architecture (BoT-R), where a Reflector performs joint evaluation to unlock mutual information gain unavailable in isolated processing. Experiments across three model families and six benchmarks demonstrate that BoT-R consistently improves accuracy and confidence calibration while reducing inference costs by up to 61%. Our theoretical and experimental analysis reveals when and why batch-aware reasoning benefits LLM systems.