ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning from Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.