Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools for alignment can be challenging, especially for the largest and most capable LLMs, which often contain tens or hundreds of billions of parameters. We present NeMo-Aligner, a toolkit for model alignment that scales efficiently to hundreds of GPUs for training. NeMo-Aligner ships with highly optimized, scalable implementations of the major model-alignment paradigms: Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), SteerLM, and Self-Play Fine-Tuning (SPIN). Our toolkit also supports running most of these alignment techniques in a Parameter-Efficient Fine-Tuning (PEFT) setting. NeMo-Aligner is designed for extensibility, so new alignment techniques can be supported with minimal effort. It is open-sourced under the Apache 2.0 License, and we invite community contributions at https://github.com/NVIDIA/NeMo-Aligner.
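As background on one of the paradigms named above, the standard DPO objective can be sketched as follows. This is a minimal illustrative sketch of the published DPO loss, not NeMo-Aligner's actual implementation; the function name and the use of per-pair scalar sequence log-probabilities are our own simplification.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the policy (pi_*) and the frozen reference model
    (ref_*). beta scales the implicit reward; the loss is
    -log(sigmoid(beta * (log-ratio margin))).
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form
    return math.log1p(math.exp(-margin))
```

When the policy matches the reference model exactly, the margin is zero and the loss equals log(2); the loss decreases as the policy assigns relatively more probability to the chosen response than the reference does.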