Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

We study post-training quantization (PTQ) of Ideogram 4.0, a 9.3B-parameter flow-matching diffusion transformer (DiT) that implements classifier-free guidance with two separate-weight copies of a single-stream backbone and is conditioned by a Qwen3-VL text encoder. Our target is the Ampere-generation RTX 3090 GPU, which lacks FP8 tensor cores. Because Ideogram 4.0 is trained on structured JSON captions, we evaluate every variant under schema-valid JSON prompts produced by an LLM expander built to Ideogram's published caption specification. We score the variants with a battery of metrics: HPSv2 (human preference), CLIP, and PickScore for standalone quality; PP-OCR exact-match and edit distance for text rendering; and PSNR/SSIM/LPIPS for fidelity to the output of the FP8 reference (the highest-precision public checkpoint). On a 300-prompt benchmark with paired bootstrap confidence intervals, an INT8 W8A8 recipe (per-channel weights, per-token dynamic activations, SmoothQuant, and bf16 protection of a small set of high-fragility layers) is statistically indistinguishable from FP8 on CLIP and PickScore (paired CIs include zero) and within ~0.004 on HPSv2. At its 8-bit size it is also the most faithful reproduction of the FP8 output (LPIPS 0.243 vs 0.277/0.306 for the half-size 4-bit baselines; the paired CI for the INT8-Q4_K gap excludes zero). A GGUF Q4_K quantization reaches the same standalone quality as the published NF4 baseline at the same on-disk size, making it the Pareto choice on the quality-memory frontier. We further show that under JSON prompts all four variants reach parity on standalone quality, so the variants separate on fidelity and text rendering rather than on aggregate image-quality scores, and that text legibility, which is near zero when the model is prompted with raw strings, reaches 55% OCR exact-match under the JSON captions the model expects. We release the INT8 W8A8 and GGUF Q4_K quantized weights on Hugging Face under a gated, non-commercial license.
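
To make the W8A8 recipe named above concrete, the following is a minimal pure-Python sketch of one linear layer with SmoothQuant-style channel rescaling, per-output-channel symmetric INT8 weights, and per-token dynamic INT8 activations. It is an illustration under stated assumptions, not the released implementation: the function names (`smooth_scales`, `quantize_rows`, `w8a8_linear`), the migration strength `alpha=0.5`, and the choice to estimate smoothing scales from the live batch rather than from an offline calibration set are all hypothetical. The bf16 protection of fragile layers is not modeled.

```python
import random


def smooth_scales(X, W, alpha=0.5):
    """SmoothQuant-style scale per input channel j:
    s_j = max|X[:, j]|**alpha / max|W[:, j]|**(1 - alpha).
    alpha=0.5 is an assumed default, not a value taken from the paper."""
    scales = []
    for j in range(len(W[0])):
        a_max = max(abs(row[j]) for row in X)
        w_max = max(abs(row[j]) for row in W)
        if a_max == 0.0 or w_max == 0.0:
            scales.append(1.0)
        else:
            scales.append(a_max ** alpha / w_max ** (1.0 - alpha))
    return scales


def quantize_rows(M):
    """Symmetric INT8 quantization with one scale per row.
    Rows of W are output channels; rows of X are tokens."""
    q_rows, scales = [], []
    for row in M:
        scale = max(abs(v) for v in row) / 127.0 or 1.0  # guard all-zero rows
        scales.append(scale)
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
    return q_rows, scales


def matmul_t(X, W):
    """Compute X @ W^T for X of shape [tokens, in] and W of shape [out, in]."""
    return [[sum(x * w for x, w in zip(xr, wr)) for wr in W] for xr in X]


def w8a8_linear(X, W, alpha=0.5):
    """Smooth, quantize both operands to INT8, multiply, then dequantize."""
    s = smooth_scales(X, W, alpha)
    # Move activation outliers into the weights: (X / s) @ (W * s)^T == X @ W^T.
    X_s = [[x / sj for x, sj in zip(row, s)] for row in X]
    W_s = [[w * sj for w, sj in zip(row, s)] for row in W]
    qx, sx = quantize_rows(X_s)  # per-token, computed at run time (dynamic)
    qw, sw = quantize_rows(W_s)  # per-output-channel, would be precomputed
    acc = matmul_t(qx, qw)       # integer accumulators
    return [[acc[t][o] * sx[t] * sw[o] for o in range(len(W))]
            for t in range(len(X))]


def rel_error(A, B):
    """Relative Frobenius-norm error of A with respect to B."""
    num = sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    den = sum(b ** 2 for rb in B for b in rb)
    return (num / den) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy activations with one outlier channel (index 3), the case smoothing targets.
    X = [[rng.gauss(0, 1) * (50.0 if j == 3 else 1.0) for j in range(16)]
         for _ in range(4)]
    W = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
    print("relative error:", rel_error(w8a8_linear(X, W), matmul_t(X, W)))
```

In a real deployment the division by `s` is usually folded into the preceding normalization or linear layer so that only the weight side carries the scale, and the integer matmul runs on INT8 tensor cores, which the RTX 3090 does provide.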
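
The abstract's significance statements rest on paired bootstrap confidence intervals over per-prompt scores. A minimal sketch of a percentile paired bootstrap is given below, assuming each variant is scored on the same prompts in the same order. The function name `paired_bootstrap_ci`, the resample count, and the 95% level are illustrative assumptions, and the paper may use a different bootstrap variant.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of per-prompt differences (a - b).

    Prompts are resampled with replacement and each prompt keeps its own
    (a, b) pair, which is what makes the bootstrap 'paired'.
    Returns (mean_difference, ci_low, ci_high).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = max(0, int((1.0 - level) / 2.0 * n_resamples))
    hi_idx = min(n_resamples - 1, int((1.0 + level) / 2.0 * n_resamples) - 1)
    return sum(diffs) / n, means[lo_idx], means[hi_idx]


def excludes_zero(ci_low, ci_high):
    """True when the interval lies strictly on one side of zero."""
    return ci_low > 0.0 or ci_high < 0.0
```

Under this reading, "paired CIs include zero" corresponds to `excludes_zero` returning `False` for the difference between two variants on a metric, and "the gap excludes zero" corresponds to it returning `True`.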