Inference-time compute (ITC) methods such as Best-of-N and Tree-of-Thoughts aim to produce output candidates that are both high-quality and diverse, but their reliance on high-temperature sampling often fails to achieve meaningful output diversity. Moreover, existing ITC methods offer limited control over how reasoning is performed, which in turn limits their explainability. We present STATe-of-Thoughts (STATe), an interpretable ITC method that searches over high-level reasoning patterns. STATe replaces stochastic sampling with discrete, interpretable textual interventions: a controller selects actions encoding high-level reasoning choices, a generator produces reasoning steps conditioned on those choices, and an evaluator scores candidates to guide the search. This structured approach yields three main advantages. First, action-guided textual interventions produce greater response diversity than temperature-based sampling. Second, in a case study on argument generation, STATe's explicit action sequences capture interpretable features that are highly predictive of output quality. Third, estimating the association between performance and action choices allows us to identify promising yet unexplored regions of the action space and steer generation directly toward them. Together, these results establish STATe as a practical framework for generating high-quality, diverse, and interpretable text. Our framework is available at https://github.com/zbambergerNLP/state-of-thoughts.
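The controller/generator/evaluator loop described above can be sketched as a beam search over discrete action sequences. This is a minimal illustrative sketch, not the authors' implementation: the action names, the generator stub, and the length-based evaluator are all placeholder assumptions standing in for LLM calls and a learned scorer.

```python
# Hypothetical sketch of a STATe-style search loop. Action names, the
# generator, and the evaluator are illustrative stand-ins, not the
# released implementation.

ACTIONS = [
    "give_example",
    "cite_evidence",
    "address_counterargument",
    "appeal_to_values",
]

def generate_step(text: str, action: str) -> str:
    # Stand-in for a generator LLM call conditioned on the chosen action.
    return f"{text} [{action}]"

def evaluate(candidate: str) -> float:
    # Stand-in evaluator; a real one would score argument quality.
    return float(len(candidate))

def state_search(prompt: str, depth: int = 3, beam: int = 2):
    """Beam search over discrete action sequences instead of
    high-temperature sampling. Returns (text, action_trace) pairs."""
    beams = [(prompt, [])]
    for _ in range(depth):
        expanded = []
        for text, trace in beams:
            # Controller: enumerate high-level reasoning choices.
            for action in ACTIONS:
                expanded.append((generate_step(text, action), trace + [action]))
        # Evaluator: score candidates to guide the search.
        expanded.sort(key=lambda c: evaluate(c[0]), reverse=True)
        beams = expanded[:beam]
    return beams

candidates = state_search("Claim: remote work improves productivity.")
```

Because each candidate carries its explicit action trace, the resulting outputs are interpretable by construction: diversity comes from varying the discrete actions, not from sampling temperature.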