In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to a larger model size. With these improvements, Janus-Pro achieves significant advances in both multimodal understanding and text-to-image instruction following, while also improving the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.