PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

Haojun Chen, Haoyang He, Chengming Xu, Qingdong He, Junwei Zhu, Yabiao Wang, Zhucun Xue, Xianfang Zeng, Zhennan Chen, Xiaobin Hu, Hao Zhao, Yong Liu, Jiangning Zhang, Dacheng Tao

From arXiv. Project page: https://haojunchen663.github.io/projects/PixVerve/

Text-to-Image (T2I) models have recently made notable progress at around 1K and 2K resolution. Driven by the desire for a better visual experience and by rapid advances in imaging technology, demand for Ultra-High-Resolution (UHR) image generation has grown significantly. However, UHR image generation remains highly challenging because high-resolution content is both scarce and complex. In this paper, we first introduce PixVerve-95K, a high-quality, open-source UHR T2I dataset curated with a carefully designed data pipeline. It contains 95K images across diverse scenarios, each with a minimum pixel count of 100M, together with seven-dimensional annotations. Building on this large-scale image-text dataset, we take a pioneering step toward extending various T2I foundation models to native 100MP generation using three training schemes. Finally, we propose the PixVerve-Bench benchmark, which combines conventional metrics with multimodal large language model-based assessments to establish a comprehensive evaluation protocol for UHR images covering visual quality and semantic alignment. Extensive experimental results on our benchmark, together with a constructive exploration of training strategies, provide valuable insights for future breakthroughs.
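
To give a sense of scale for the 100M-pixel minimum mentioned above, the following is a minimal sketch of a pixel-count check. It is not the authors' data pipeline: the function names are hypothetical, and it assumes that "100M" means 10^8 pixels and that pixel count is simply width times height.

```python
# Illustrative only: a pixel-count check, not the PixVerve curation pipeline.
# Assumption: "100M" in the abstract means 100,000,000 pixels (width * height).
MIN_PIXELS = 100_000_000


def megapixels(width: int, height: int) -> float:
    """Return the image size in megapixels (millions of pixels)."""
    return width * height / 1_000_000


def meets_uhr_threshold(width: int, height: int, min_pixels: int = MIN_PIXELS) -> bool:
    """Return True if the image has at least `min_pixels` pixels."""
    return width * height >= min_pixels


if __name__ == "__main__":
    # Hypothetical resolutions chosen for comparison.
    for w, h in [(1024, 1024), (2048, 2048), (8192, 8192), (10000, 10000)]:
        print(f"{w}x{h}: {megapixels(w, h):.2f} MP, passes: {meets_uhr_threshold(w, h)}")
```

Under these assumptions, a 1024x1024 image is about 1.05 MP and a 2048x2048 image about 4.19 MP, so a 100MP image carries roughly 24 to 95 times as many pixels as the 2K and 1K settings mentioned at the start of the abstract.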

