Domain-specific large language models (LLMs), typically developed by fine-tuning a pre-trained general-purpose LLM on specialized datasets, represent a significant advancement in applied AI. A common strategy in LLM fine-tuning is curriculum learning, which pre-orders training samples based on metrics such as difficulty to improve learning efficiency compared with random sampling. However, most existing methods for LLM fine-tuning rely on a static curriculum, designed prior to training, which cannot adapt to the model's evolving needs during fine-tuning. To address this, we propose EDCO, a novel framework based on two key concepts: inference entropy and dynamic curriculum orchestration. Inspired by recent findings that maintaining high answer entropy benefits long-term reasoning gains, EDCO prioritizes samples with high inference entropy in a continuously adapted curriculum. EDCO integrates three core components: an efficient entropy estimator that uses prefix tokens to approximate full-sequence entropy, an entropy-based curriculum generator that selects data points with the highest inference entropy, and an LLM trainer that optimizes the model on the selected curriculum. In comprehensive experiments across the communication, medicine, and law domains, EDCO outperforms traditional curriculum strategies when fine-tuning Qwen3-4B and Llama3.2-3B models under both supervised and reinforcement learning settings. Furthermore, the proposed efficient entropy estimation reduces computational time by 83.5% while maintaining high accuracy.
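As a rough illustration only (not the authors' implementation), the Python sketch below shows one plausible way to realize the two mechanisms named above: estimating a sample's inference entropy from a fixed number of prefix-token distributions rather than the full sequence, and selecting the highest-entropy samples for the next curriculum step. The function names, the prefix length, and the mean-per-token-entropy aggregation are all assumptions for the sake of the example.

```python
# Minimal sketch, assuming a HuggingFace-style causal LM whose forward pass
# returns `.logits` of shape (batch, seq_len, vocab). Names such as
# estimate_prefix_entropy, prefix_len, and select_curriculum are illustrative
# placeholders, not identifiers from the paper.
import torch
import torch.nn.functional as F


@torch.no_grad()
def estimate_prefix_entropy(model, input_ids, prefix_len=32):
    """Approximate full-sequence inference entropy from the first
    `prefix_len` token distributions (mean per-token entropy)."""
    logits = model(input_ids).logits[:, :prefix_len, :]   # (B, prefix_len, V)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    token_entropy = -(probs * log_probs).sum(dim=-1)       # (B, prefix_len)
    return token_entropy.mean(dim=-1)                      # (B,)


def select_curriculum(entropies, sample_indices, k):
    """Pick the k samples with the highest estimated inference entropy
    to form the next slice of the dynamically orchestrated curriculum."""
    topk = torch.topk(entropies, k)
    return [sample_indices[i] for i in topk.indices.tolist()]
```

In this sketch the entropy estimate would be recomputed periodically during training, so the curriculum adapts as the model's uncertainty over the remaining data shifts; how often and over how large a candidate pool this happens is left unspecified here.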