Self-consistency (SC), which leverages multiple samples from LLMs, shows significant gains on various reasoning tasks but struggles with free-form generation due to the difficulty of aggregating answers. Its variants, UCS and USC, rely on sample selection or voting mechanisms to improve output quality. However, these methods cannot fully utilize the nuanced consensus knowledge present within multiple candidate samples, often yielding suboptimal outputs. We propose Fine-Grained Self-Consistency (FSC) to address these limitations by extracting and integrating segment-level commonalities across candidate samples, enhancing the performance of LLMs on both open-ended and reasoning tasks. Building on this, we present two additional strategies: candidate filtering, which improves overall quality by identifying highly similar candidate sets, and merging, which reduces input token requirements by combining similar samples. We demonstrate the effectiveness of FSC through extensive experiments on a range of tasks, including summarization, code generation, and mathematical reasoning, using GPT-3.5-turbo and GPT-4. The results show significant improvements over baseline methods, highlighting the potential of FSC to optimize output quality by effectively synthesizing fine-grained consensus knowledge from multiple samples.