In this paper, we introduce DiversiGATE, a unified framework that consolidates diverse methodologies for LLM verification. The framework comprises two main components, Diversification and Aggregation, which together provide a holistic perspective on existing verification approaches such as Self-Consistency, Math Prompter, and WebGPT. Furthermore, we propose a novel `SelfLearner' model that conforms to the DiversiGATE framework and can learn from its own outputs, refining its performance over time and improving accuracy. To evaluate the effectiveness of SelfLearner, we conducted a series of experiments on synthetic data as well as on popular arithmetic reasoning benchmarks such as GSM8K. Our results demonstrate that our approach outperforms traditional LLMs, improving from 54.8% to 61.8% on the GSM8K benchmark.
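As a rough illustration of the Diversification and Aggregation split, the sketch below casts a Self-Consistency-style procedure as two steps: collect several candidate answers to the same question, then take a majority vote. It is a minimal sketch under stated assumptions, not the paper's implementation. The names `diversify`, `aggregate`, and `make_stub_model` are hypothetical, and the stub simply replays a fixed list of answers in place of a sampled LLM.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List


def diversify(generate: Callable[[str], str], question: str, n: int) -> List[str]:
    """Diversification: collect n candidate answers for the same question."""
    return [generate(question) for _ in range(n)]


def aggregate(answers: List[str]) -> str:
    """Aggregation: majority vote over candidates (ties go to the earliest answer)."""
    if not answers:
        raise ValueError("need at least one candidate answer")
    return Counter(answers).most_common(1)[0][0]


def make_stub_model(outputs: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a sampled LLM: replays a fixed list of answers."""
    replay = cycle(outputs)
    return lambda question: next(replay)


if __name__ == "__main__":
    # Assumed toy candidates; a real setup would sample these from a model.
    model = make_stub_model(["18", "20", "18", "18", "16"])
    candidates = diversify(model, "toy arithmetic question", n=5)
    print(aggregate(candidates))
```

Under this view, other verification approaches would differ mainly in how candidates are produced and how they are combined, so either function could be swapped out independently.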