Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if it is fast: selection and finetuning both happen per query, so each is a direct bottleneck. Existing methods force a tradeoff between speed and quality: fast retrieval often returns redundant sequences, while stronger diversity-aware selection adds prohibitive per-query cost. We introduce HullFT, a geometric approach to TTFT that addresses both bottlenecks. Given a query, HullFT first represents the query embedding as a sparse convex combination of a few training sequences, using efficient projection-free Frank-Wolfe optimization. This yields a support set that is inherently relevant and diverse. We then convert the fractional convex weights into an exact integer multiset for finetuning through a geometric integerization procedure. The resulting multiplicities naturally create repeated examples, which we exploit with Gradient Reuse to amortize forward-backward computation across repeated finetuning steps. Our experiments show that HullFT improves the quality-efficiency tradeoff over current state-of-the-art TTFT methods, achieving lower bits-per-byte at substantially lower total runtime.
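To make the selection step concrete, the following is a minimal sketch of the general idea rather than the HullFT implementation. It assumes training embeddings are stored as rows of a NumPy array `X` and that the query embedding is `q`. The function names `frank_wolfe_weights` and `largest_remainder_counts` are hypothetical. The rounding shown is plain largest-remainder rounding, used here only as a stand-in for the geometric integerization procedure, whose details the abstract does not give.

```python
import numpy as np


def frank_wolfe_weights(X, q, n_iters=50, tol=1e-10):
    """Approximately minimise 0.5 * ||X.T @ w - q||^2 over the probability simplex.

    Illustrative sketch only. Each iteration touches at most one new
    coordinate of w, so the support has at most n_iters + 1 entries.
    """
    n = X.shape[0]
    w = np.zeros(n)
    # Assumption: start from the nearest neighbour of the query.
    start = int(np.argmin(np.linalg.norm(X - q, axis=1)))
    w[start] = 1.0
    point = X[start].astype(float).copy()  # current convex combination X.T @ w
    for _ in range(n_iters):
        residual = point - q
        grad = X @ residual                 # gradient of the objective w.r.t. w
        i = int(np.argmin(grad))            # linear minimisation oracle: best vertex
        direction = X[i] - point
        denom = float(direction @ direction)
        if denom <= tol:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = float(np.clip(-(residual @ direction) / denom, 0.0, 1.0))
        if gamma <= 0.0:                    # Frank-Wolfe gap is non-positive: stop
            break
        w *= 1.0 - gamma
        w[i] += gamma
        point = point + gamma * direction
    return w


def largest_remainder_counts(w, budget):
    """Round simplex weights to non-negative integers that sum to `budget`.

    Stand-in rounding rule (an assumption), not the paper's procedure.
    """
    scaled = w * budget
    counts = np.floor(scaled).astype(int)
    leftover = budget - int(counts.sum())
    if leftover > 0:
        # Give the remaining units to the largest fractional parts.
        order = np.argsort(-(scaled - counts), kind="stable")
        counts[order[:leftover]] += 1
    return counts
```

Calling `largest_remainder_counts(frank_wolfe_weights(X, q), budget)` gives, for each training sequence, the number of finetuning steps it would receive under this sketch. Entries greater than one correspond to the repeated examples that the abstract says Gradient Reuse exploits.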