By adding sequence parallelism to LLaMA-Factory, we built and open-sourced 360-LLaMA-Factory at https://github.com/Qihoo360/360-LLaMA-Factory. 360-LLaMA-Factory has been widely recognized and adopted. It has been used for models such as Light-R1 (arXiv:2503.10460), TinyR1 (arXiv:2503.04872), and Kaggle AIMO math models, and in the training frameworks of large companies. This technical report examines the different sequence parallel modes behind 360-LLaMA-Factory and discusses insights from our implementation.