In this paper, we introduce DiversiGATE, a unified framework that consolidates diverse methodologies for LLM verification. The framework comprises two main components, Diversification and Aggregation, which together provide a holistic perspective on existing verification approaches such as Self-Consistency, Math Prompter, and WebGPT. Furthermore, we propose a novel 'SelfLearner' model that conforms to the DiversiGATE framework; it learns from its own outputs and refines its performance over time, leading to improved accuracy. To evaluate the effectiveness of SelfLearner, we conducted a rigorous series of experiments on synthetic data as well as on popular arithmetic reasoning benchmarks such as GSM8K. Our results demonstrate that our approach outperforms traditional LLMs, achieving a considerable improvement on the GSM8K benchmark, from 54.8% to 61.8%.
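As a rough illustration of the two components, the sketch below shows a Self-Consistency-style instance: Diversification draws several candidate answers for one question, and Aggregation takes a majority vote over them. This is a minimal sketch under stated assumptions, not the SelfLearner model or the paper's implementation; the names `diversify`, `aggregate`, and `make_stub_sampler` are hypothetical, and the stub sampler merely stands in for repeated LLM calls.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List


def diversify(sample_answer: Callable[[str], str], question: str, n: int) -> List[str]:
    """Diversification: draw n candidate answers for the same question."""
    return [sample_answer(question) for _ in range(n)]


def aggregate(candidates: List[str]) -> str:
    """Aggregation: return the most frequent candidate (majority vote)."""
    if not candidates:
        raise ValueError("need at least one candidate answer")
    return Counter(candidates).most_common(1)[0][0]


def make_stub_sampler(outputs: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: cycles through fixed outputs."""
    it = cycle(outputs)
    return lambda question: next(it)


if __name__ == "__main__":
    # Assumed toy outputs; a real system would sample these from a model.
    sampler = make_stub_sampler(["18", "18", "20", "18", "16"])
    candidates = diversify(sampler, "toy arithmetic question", n=5)
    print(aggregate(candidates))  # prints 18
```

Majority voting is only one possible aggregation rule; other approaches that fit the same two-step pattern could swap in a different `aggregate` function without changing the diversification step.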