We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm: we build a data synthesis pipeline based on Deepseek-R1 and use it to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, which achieve state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements: LeX-Lumina achieves a 79.81% PNED gain on CreateBench, and LeX-FLUX outperforms baselines in color (+3.18%), positional (+4.45%), and font accuracy (+3.81%). Our code, models, datasets, and demo are publicly available.
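To make the idea behind a pairwise normalized edit distance concrete, the following is a minimal Python sketch under stated assumptions. The abstract does not define PNED, so this is only one simple reading of the name, not the authors' metric: each target word from the prompt is paired with its closest word in the OCR output, the Levenshtein distance of each pair is normalized by the longer word's length, and the results are averaged. The actual metric may differ, for example by enforcing one-to-one matching or penalizing extra words. The function names `levenshtein`, `normalized_edit_distance`, and `pairwise_ned` are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so the value lies in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def pairwise_ned(target_words: Sequence[str], ocr_words: Sequence[str]) -> float:
    """ASSUMPTION: pair each target word with its closest OCR word and average.

    This is an illustrative simplification, not the paper's PNED definition.
    Lower is better; 0.0 means every target word was found exactly.
    """
    if not target_words:
        return 0.0
    if not ocr_words:
        return 1.0  # nothing was rendered, so every target word is fully wrong
    total = sum(
        min(normalized_edit_distance(t, o) for o in ocr_words)
        for t in target_words
    )
    return total / len(target_words)
```

For instance, `pairwise_ned(["OPEN", "DAILY"], ["DAILY", "OPEM"])` returns 0.125 under this reading: "DAILY" is matched exactly regardless of word order, while "OPEN" differs from "OPEM" in one of four characters.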