From Atoms to Chains: Divergence-Guided Reasoning Curriculum for Unlabeled LLM Domain Adaptation

Adapting Large Language Models (LLMs) to specialized domains without human-annotated data is a crucial yet formidable challenge. Widely adopted knowledge distillation methods often devolve into coarse-grained mimicry, where the student model inefficiently targets its own weaknesses and risks inheriting the teacher's reasoning flaws. This exposes a critical pedagogical dilemma: how to devise a reliable curriculum when the teacher itself is not an infallible expert. Our work resolves this by capitalizing on a key insight: while LLMs can be fallible in complex, holistic reasoning, they often answer focused, atomic sub-problems with high fidelity. Based on this, we propose the Divergence-Guided Reasoning Curriculum (DGRC), which constructs a learning path from atomic knowledge to reasoning chains by dynamically deriving two complementary curricula from disagreements in reasoning pathways. When a student and a teacher produce conflicting results, DGRC directs the teacher to perform a diagnostic analysis: the teacher analyzes both reasoning paths to formulate atomic queries that target the specific points of divergence, and then answers these queries itself to create high-confidence atomic question-answer pairs. These pairs serve a dual purpose: (1) they provide an atomic curriculum that rectifies the student's knowledge gaps, and (2) they act as factual criteria for filtering the teacher's original reasoning chains, yielding a verified chain-of-thought (CoT) curriculum that teaches the student how to integrate atomic knowledge into complete reasoning paths. Experiments in the medical and legal domains on student models of various sizes demonstrate the effectiveness of the DGRC framework. Notably, our method achieves a 7.76% relative improvement for the 1.5B student model in the medical domain over a strong unlabeled baseline.
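The divergence-guided procedure described above can be sketched for a single unlabeled question as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Reasoning`, `dgrc_step`, `diagnose`, `self_answer`, and `is_consistent` are hypothetical, the models are passed in as plain callables, and the teacher is assumed to produce one reasoning chain per question. The abstract does not say what happens when student and teacher agree, so the sketch simply derives nothing in that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Reasoning:
    chain: str   # free-text reasoning path
    answer: str  # final answer extracted from the chain


AtomicQA = Tuple[str, str]  # (atomic question, teacher's self-answer)


def dgrc_step(
    question: str,
    student: Callable[[str], Reasoning],
    teacher: Callable[[str], Reasoning],
    diagnose: Callable[[str, Reasoning, Reasoning], List[str]],
    self_answer: Callable[[str], str],
    is_consistent: Callable[[str, AtomicQA], bool],
) -> Tuple[List[AtomicQA], Optional[Reasoning]]:
    """Return (atomic curriculum items, verified teacher chain or None)."""
    s, t = student(question), teacher(question)
    if s.answer == t.answer:
        # Assumption: no divergence means no curriculum is derived here.
        return [], None
    # The teacher inspects both paths and writes atomic queries
    # aimed at the points where they diverge.
    queries = diagnose(question, s, t)
    # The teacher then answers each atomic query on its own.
    atomic = [(q, self_answer(q)) for q in queries]
    # The atomic pairs act as factual criteria: keep the teacher's chain
    # only if it is consistent with every pair (and at least one exists).
    keep = bool(atomic) and all(is_consistent(t.chain, qa) for qa in atomic)
    return atomic, (t if keep else None)


def toy_run() -> Tuple[List[AtomicQA], Optional[Reasoning]]:
    """Exercise dgrc_step with hard-coded stand-ins for the two models."""
    student = lambda q: Reasoning("17 * 3 = 41, so 50 is larger.", "50")
    teacher = lambda q: Reasoning("17 * 3 = 51, so 17 * 3 is larger.", "17 * 3")
    diagnose = lambda q, s, t: ["What is 17 * 3?"]
    self_answer = lambda q: "51"
    # Toy consistency check: the atomic answer must appear in the chain.
    is_consistent = lambda chain, qa: qa[1] in chain
    return dgrc_step(
        "Which is larger, 17 * 3 or 50?",
        student, teacher, diagnose, self_answer, is_consistent,
    )
```

In this toy run the two models disagree, so the single atomic pair would enter the atomic curriculum and the teacher's chain, which agrees with that pair, would enter the verified CoT curriculum. In practice each callable would wrap a prompted LLM call, and the consistency check would be far less literal than a substring match.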
