Instruction tuning is essential for aligning large language models (LLMs) to downstream tasks and commonly relies on large, diverse corpora. However, small, high-quality subsets, known as coresets, can deliver comparable or superior results, though curating them remains challenging. Existing methods often rely on coarse, sample-level signals like gradients, an approach that is computationally expensive and overlooks fine-grained features. To address this, we introduce TRIM (Token Relevance via Interpretable Multi-layer Attention), a forward-only, token-centric framework. Instead of using gradients, TRIM operates by matching underlying representational patterns identified via attention-based "fingerprints" from a handful of target samples. Such an approach makes TRIM highly efficient and uniquely sensitive to the structural features that define a task. Coresets selected by our method consistently outperform state-of-the-art baselines by up to 9% on downstream tasks and even surpass the performance of full-data fine-tuning in some settings. By avoiding expensive backward passes, TRIM achieves this at a fraction of the computational cost. These findings establish TRIM as a scalable and efficient alternative for building high-quality instruction-tuning datasets.