Fine-tuning has become the de facto approach for adapting large language models (LLMs) to downstream tasks, but the high training memory consumption inherited from LLMs makes this process inefficient. Among existing memory-efficient approaches, activation-related optimization has proven particularly effective, as activations consistently dominate overall memory consumption. Although prior work offers various activation optimization strategies, their data-agnostic nature ultimately results in ineffective and unstable fine-tuning. In this paper, we propose TokenSeek, a universal plug-in solution for various transformer-based models that performs instance-aware token seeking and ditching, achieving significant fine-tuning memory savings (e.g., requiring only 14.8% of the memory on Llama3.2 1B) with on-par or even better performance. Furthermore, our interpretable token-seeking process reveals the underlying reasons for its effectiveness, offering valuable insights for future research on token efficiency. Homepage: https://runjia.tech/iclr_tokenseek/