Domain-adaptive post-training of large language models (LLMs) has emerged as a promising approach for specialized domains such as medicine and finance. However, significant challenges remain in identifying optimal adaptation criteria and training strategies across varying data and model configurations. To address these challenges, we introduce FINDAP, a systematic and fine-grained investigation into domain-adaptive post-training of LLMs for the finance domain. Our approach consists of four key components: FinCap, which defines the core capabilities required for the target domain; FinRec, an effective training recipe that jointly optimizes continual pre-training and instruction-following, along with a novel preference data distillation method leveraging process signals from a generative reward model; FinTrain, a curated set of training datasets supporting FinRec; and FinEval, a comprehensive evaluation suite aligned with FinCap. The resulting model, Llama-Fin, achieves state-of-the-art performance across a wide range of financial tasks. Our analysis also highlights how each post-training stage contributes to distinct capabilities, uncovering specific challenges and effective solutions, and provides valuable insights for domain adaptation of LLMs.