General-purpose Large Language Models (LLMs) are frequently fine-tuned via supervised fine-tuning (SFT) to improve performance in specific domains. Better results can be achieved by distilling the chain of thought of a larger model, at the cost of numerous expensive calls and substantially more data. We propose a novel blueprint for efficient fine-tuning that applies reasoning only to complex examples identified by entropy. Specifically, across three small open models ($\approx 3B$ parameters), we split the training data into complexity categories using single-token answer entropy (ROC AUC $0.73$), fine-tune the models via SFT and distillation, and show that our pipeline significantly outperforms the standard SFT approach ($0.58$ vs. $0.45$ average accuracy) and outperforms the distillation approach ($0.58$ vs. $0.56$ average accuracy) while using $81\%$ less data.
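The entropy-based split described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `probs_fn` (a stand-in for whatever returns the model's probability distribution over the single answer token) and the `threshold` value are hypothetical assumptions.

```python
import math

def answer_entropy(token_probs):
    """Shannon entropy (in nats) of the model's distribution over the
    single answer token; higher entropy signals a harder example."""
    return -sum(p * math.log(p) for p in token_probs if p > 0)

def split_by_complexity(examples, probs_fn, threshold=1.0):
    """Route each example by entropy: low-entropy examples go to plain SFT,
    high-entropy ones to chain-of-thought distillation.
    `probs_fn` and `threshold` are hypothetical stand-ins, not the paper's API."""
    simple, complex_ = [], []
    for ex in examples:
        ent = answer_entropy(probs_fn(ex))
        (complex_ if ent > threshold else simple).append(ex)
    return simple, complex_
```

Under this sketch, only the `complex_` subset would be sent to the larger teacher model for chain-of-thought distillation, which is how the pipeline cuts data usage.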