Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for empowering large language model (LLM) applications. Because RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach named parameter ReaLlocation, which dynamically redistributes LLM parameters across the cluster and adapts parallelization strategies during training. Building upon this idea, we introduce ReaLHF, a pioneering system capable of automatically discovering and running efficient execution plans for RLHF training given the desired algorithmic and hardware configurations. ReaLHF formulates the execution plan for RLHF as an augmented dataflow graph. Based on this formulation, ReaLHF employs a tailored search algorithm with a lightweight cost estimator to discover an efficient execution plan. The runtime engine then deploys the selected plan by parallelizing computations and redistributing parameters. We evaluate ReaLHF on the LLaMA-2 models with up to $4\times70$ billion parameters and 128 GPUs. The experimental results show that ReaLHF achieves speedups of $2.0\times$ to $10.6\times$ over the baselines. Furthermore, the execution plans generated by ReaLHF achieve an average performance improvement of $26\%$ over heuristic approaches based on Megatron-LM. The source code of ReaLHF is publicly available at https://github.com/openpsi-project/ReaLHF.
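To make the idea of plan search with a cost estimator concrete, the following is a minimal toy sketch and not ReaLHF's actual algorithm or cost model. Everything in it is an assumption made for illustration: the three function calls, the candidate (data-parallel, tensor-parallel) strategies on 8 GPUs, the cost numbers, the memory constraint on training, and the flat reallocation penalty. It also swaps in exhaustive enumeration for the tailored search, and it sums call costs sequentially instead of modeling overlap in the dataflow graph.

```python
from itertools import product

# Hypothetical RLHF function calls: (call name, model, workload type).
CALLS = [
    ("actor_generate", "actor", "generate"),
    ("reward_inference", "reward", "inference"),
    ("actor_train", "actor", "train"),
]

# Candidate strategies as (data_parallel, tensor_parallel) degrees on 8 GPUs.
STRATEGIES = [(8, 1), (4, 2), (2, 4), (1, 8)]

# Made-up single-GPU cost per workload type, in arbitrary time units.
BASE_COST = {"generate": 10.0, "inference": 2.0, "train": 6.0}

# Made-up flat cost of redistributing a model's parameters between strategies.
REALLOC_COST = 0.1


def estimate_call(workload, strategy):
    """Toy cost estimate for one function call under one strategy."""
    dp, tp = strategy
    if workload == "train" and tp < 2:
        return float("inf")  # assumed memory constraint: training needs tp >= 2
    # Assumed 20% communication overhead per extra tensor-parallel degree.
    return BASE_COST[workload] / (dp * tp) * (1 + 0.2 * (tp - 1))


def estimate_plan(plan):
    """Sum call costs, charging a reallocation whenever a model switches strategy."""
    total = 0.0
    last_strategy = {}
    for (_, model, workload), strategy in zip(CALLS, plan):
        if model in last_strategy and last_strategy[model] != strategy:
            total += REALLOC_COST
        total += estimate_call(workload, strategy)
        last_strategy[model] = strategy
    return total


def search():
    """Exhaustively enumerate one strategy per call and keep the cheapest plan."""
    return min(product(STRATEGIES, repeat=len(CALLS)), key=estimate_plan)


# Best plan when each call may use its own strategy (with reallocation).
best = search()
# Best plan when all calls are forced to share a single strategy.
fixed = min(((s,) * len(CALLS) for s in STRATEGIES), key=estimate_plan)

for (name, _, _), strategy in zip(CALLS, best):
    print(f"{name}: dp={strategy[0]}, tp={strategy[1]}")
print("estimated cost with reallocation:", estimate_plan(best))
print("estimated cost with one fixed strategy:", estimate_plan(fixed))
```

Under these assumed numbers, the cheapest plan gives the actor different strategies for generation and training and pays the reallocation penalty in between, which is still cheaper than forcing every call to share one strategy. Exhaustive enumeration is only feasible here because the toy search space has 64 plans; a realistic space would need a smarter search.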