Reinforcement learning has emerged as a promising paradigm for aligning diffusion and flow-matching models with human preferences, yet practitioners face fragmented codebases, model-specific implementations, and engineering complexity. We introduce Flow-Factory, a unified framework that decouples algorithms, models, and rewards through a modular, registry-based architecture. This design enables seamless integration of new algorithms and architectures, as demonstrated by our support for GRPO, DiffusionNFT, and AWM across Flux, Qwen-Image, and WAN video models. By minimizing implementation overhead, Flow-Factory enables researchers to rapidly prototype and scale future innovations. Flow-Factory also provides production-ready memory optimization, flexible multi-reward training, and seamless distributed training support. The codebase is available at https://github.com/X-GenGroup/Flow-Factory.
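The registry-based decoupling described above can be illustrated with a minimal sketch. This is a hypothetical example of the general pattern, not Flow-Factory's actual API; the names `ALGORITHMS`, `register_algorithm`, and `GRPOTrainer` are assumptions for illustration.

```python
# Minimal sketch of a registry pattern for decoupling algorithms from
# models and rewards. All names here are illustrative, not Flow-Factory's API.
ALGORITHMS = {}

def register_algorithm(name):
    """Decorator that records an algorithm class under a string key."""
    def wrapper(cls):
        ALGORITHMS[name] = cls
        return cls
    return wrapper

@register_algorithm("grpo")
class GRPOTrainer:
    def step(self, batch):
        # algorithm-specific update would go here
        pass

# A trainer class can then be resolved from a config string by name,
# without the model or reward code importing it directly.
trainer_cls = ALGORITHMS["grpo"]
trainer = trainer_cls()
```

Under this pattern, adding a new algorithm or model is a matter of registering a new class, which is how such frameworks keep components independently swappable.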