Selecting a single high-quality output from multiple stochastic generations remains a fundamental challenge for large language models (LLMs), particularly in open-ended tasks where no canonical answer exists. While Best-of-N and self-consistency methods show that aggregating multiple generations can improve performance, existing approaches typically rely on external evaluators, reward models, or exact string-match voting, which limits their applicability and efficiency. We propose Mode Extraction (ModeX), an evaluator-free Best-of-N selection framework that generalizes majority voting to open-ended text generation by identifying the modal output, that is, the output representing the dominant semantic consensus among the generated texts. ModeX constructs a similarity graph over candidate generations and recursively applies spectral clustering to select a representative centroid, without requiring additional inference or auxiliary models. We further instantiate this selection principle as ModeX-Lite, an improved version of ModeX that uses early pruning for efficiency. Across open-ended tasks, including text summarization, code generation, and mathematical reasoning, our approaches consistently outperform standard single- and multi-path baselines, providing a computationally efficient solution for robust open-ended text generation. Code is available at https://github.com/deeplearning-wisc/ModeX.
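To make the selection principle concrete, the following is a minimal sketch of mode extraction over a handful of candidates. It is not the released ModeX code, and several pieces are assumptions made for illustration: the token-set Jaccard similarity, the rule of keeping the larger side of each spectral bipartition, the `min_size` stopping threshold, and the hypothetical helper names `jaccard`, `cohesion`, `fiedler_split`, and `select_mode`. The bipartition uses the sign of the second eigenvector of the normalized adjacency matrix, approximated here by power iteration so that the example needs only the standard library.

```python
import math
import re


def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for the paper's measure)."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 1.0


def cohesion(W, group):
    """Mean pairwise similarity inside a group; used only to break size ties."""
    return sum(W[i][j] for i in group for j in group) / (len(group) ** 2)


def fiedler_split(W, idx, iters=300):
    """Bipartition idx by the sign of the second eigenvector of D^-1/2 W D^-1/2."""
    n = len(idx)
    sub = [[W[i][j] for j in idx] for i in idx]
    # Degrees are positive because every node has self-similarity 1.
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in sub]
    # The leading eigenvector is proportional to sqrt(degree); normalize it.
    top = [1.0 / s for s in inv_sqrt]
    norm = math.sqrt(sum(t * t for t in top))
    top = [t / norm for t in top]
    x = [math.sin(k + 1.0) for k in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        # Multiply by (A + I); the shift keeps all eigenvalues non-negative.
        y = [
            x[i] + inv_sqrt[i] * sum(sub[i][j] * inv_sqrt[j] * x[j] for j in range(n))
            for i in range(n)
        ]
        # Deflate the leading eigenvector so the iteration finds the second one.
        dot = sum(a * b for a, b in zip(y, top))
        y = [a - dot * b for a, b in zip(y, top)]
        scale = math.sqrt(sum(v * v for v in y))
        if scale < 1e-12:  # nothing left to split on
            return list(idx), []
        x = [v / scale for v in y]
    left = [idx[k] for k in range(n) if x[k] >= 0]
    right = [idx[k] for k in range(n) if x[k] < 0]
    return left, right


def select_mode(candidates, min_size=3):
    """Recursively keep the dominant cluster, then return its centroid."""
    if not candidates:
        raise ValueError("need at least one candidate")
    n = len(candidates)
    W = [[jaccard(candidates[i], candidates[j]) for j in range(n)] for i in range(n)]
    idx = list(range(n))
    while len(idx) > min_size:
        left, right = fiedler_split(W, idx)
        if not left or not right:
            break
        # Assumed rule: the larger side is the consensus; ties go to the tighter side.
        idx = max((left, right), key=lambda g: (len(g), cohesion(W, g)))
    # Centroid: the member most similar, in total, to the rest of its cluster.
    best = max(idx, key=lambda i: sum(W[i][j] for j in idx))
    return candidates[best]


candidates = [
    "the cat sat on the mat",
    "a cat sat on the mat",
    "the cat sat on a mat",
    "the cat is on the mat",
    "stocks fell sharply on monday",
    "stocks dropped sharply on monday",
]
best = select_mode(candidates)
print(best)
```

In this toy run the four paraphrases about the cat form the dominant cluster, so the selected output should come from that group rather than from the two outliers about stocks. A practical implementation would likely swap in a stronger similarity measure and a library eigensolver; the lexical similarity is used here only because it requires no auxiliary model.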