Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching

Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if it is fast: selection and finetuning both happen per query, making each a direct bottleneck. Existing methods force a tradeoff between speed and quality: fast retrieval often returns redundant sequences, while stronger diversity-aware selection adds prohibitive per-query cost. We introduce HullFT, a geometric approach to TTFT that addresses both bottlenecks. Given a query, HullFT first represents the query embedding as a sparse convex combination of a few training sequences, using efficient projection-free Frank-Wolfe optimization. This yields a support set that is inherently relevant and diverse. We then convert the fractional convex weights into an exact integer multiset for finetuning through a geometric integerization procedure. The resulting multiplicities naturally create repeated examples, which we exploit with Gradient Reuse to amortize forward-backward computation across repeated finetuning steps. Our experiments show that HullFT improves the quality-efficiency tradeoff over current state-of-the-art TTFT methods, achieving lower bits-per-byte at substantially lower total runtime.
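To make the selection step concrete, the following is a minimal sketch of the two ideas the abstract names: Frank-Wolfe over the probability simplex to reconstruct a query embedding from a few candidate embeddings, followed by rounding the fractional weights to integer multiplicities. It is an illustration under stated assumptions, not HullFT's implementation. The function names, the squared-error objective, the starting vertex, and the step count are all hypothetical, and ordinary largest-remainder rounding stands in for the paper's geometric integerization procedure, which the abstract does not specify. Gradient Reuse is not sketched for the same reason.

```python
import numpy as np


def frank_wolfe_simplex(X, q, num_steps=50):
    """Approximately minimize ||X.T @ w - q||^2 over the probability simplex.

    X: (n, d) candidate embeddings; q: (d,) query embedding.
    Returns (n,) nonnegative weights summing to 1. Each step adds at most
    one new vertex, so at most num_steps + 1 entries are nonzero.
    """
    w = np.zeros(X.shape[0])
    w[np.argmax(X @ q)] = 1.0              # assumed start: best-aligned vertex
    for _ in range(num_steps):
        residual = X.T @ w - q             # current reconstruction error
        grad = 2.0 * (X @ residual)        # gradient of the objective in w
        s = int(np.argmin(grad))           # linear minimization oracle: a vertex
        direction = -w.copy()
        direction[s] += 1.0                # move toward vertex e_s: e_s - w
        d_emb = X.T @ direction
        denom = d_emb @ d_emb
        if denom <= 1e-12:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = float(np.clip(-(residual @ d_emb) / denom, 0.0, 1.0))
        if gamma == 0.0:
            break
        w += gamma * direction             # convex update, no projection needed
    return w


def round_to_multiset(w, budget):
    """Largest-remainder rounding (a stand-in, not the paper's procedure).

    Turns simplex weights into nonnegative integer counts summing to budget.
    """
    scaled = w * budget
    counts = np.floor(scaled).astype(int)
    leftover = budget - int(counts.sum())
    if leftover > 0:
        # Give the remaining slots to the largest fractional parts.
        order = np.argsort(-(scaled - counts), kind="stable")
        counts[order[:leftover]] += 1
    return counts


# Usage on synthetic data: the query lies inside the hull of the candidates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
q = X[:5].mean(axis=0)
w = frank_wolfe_simplex(X, q)
counts = round_to_multiset(w, budget=20)
support = np.flatnonzero(counts)           # indices of sequences to finetune on
```

Here `counts[i]` would be the number of times candidate `i` appears in the finetuning multiset. Entries greater than one are the repeated examples that the abstract says Gradient Reuse exploits.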

