The Power of Test-Time Training for Approximate Sampling

Efficiently sampling from a complex probability distribution is a fundamental problem that has become increasingly pertinent with the rise of generative AI, as sophisticated procedures for sampling from LLMs have been proposed to solve challenging reasoning problems. The efficacy of such sampling algorithms is limited, however, by the relationship between the LLM and the particular sampling task at hand, which has motivated the framework of test-time training (TTT). TTT updates a model's weights in response to partial generations and reward feedback received at inference time, thereby adapting the model to the particular problem. In this work, we propose a formalization of TTT as the problem of producing a sample from a given probability measure $μ^\star$ belonging to a known class ${F}$ of distributions, given an oracle $\hat μ$ that yields approximate density estimates for $μ^\star$. This is closely related to the problem of reducing sampling to approximate counting studied in the seminal works of Jerrum, Valiant & Vazirani (1986) and Jerrum & Sinclair (1989): when ${F}$ is the class of all distributions, our problem coincides exactly with that reduction. We first prove a quadratic lower bound on the query complexity of sampling from $μ^\star$ given query access to $\hat μ$ (for sufficiently large classes ${F}$), which implies that the random walk approach proposed by Jerrum & Sinclair (1989) and refined by Hayes & Sinclair (2010) is optimal. This answers an open question posed by Hayes & Sinclair. We then show that this lower bound can be circumvented if the size of ${F}$ is bounded appropriately. As we discuss, this latter result can be viewed as an abstraction of TTT, and it thus represents a starting point for developing a principled theoretical framework for TTT.
