YaART: Yet Another ART Rendering Technology

Sergey Kastryulin, Artem Konev, Alexander Shishenya, Eugene Lyapustin, Artem Khurshudov, Alexander Tselousov, Nikita Vinokurov, Denis Kuznedelev, Alexander Markovich, Grigoriy Livshits, Alexey Kirillov, Anastasiia Tabisheva, Liubov Chubarova, Marina Kaminskaia, Alexander Ustyuzhanin, Artemii Shvetsov, Daniil Shlenskii, Valerii Startsev, Dmitrii Kornilov, Mikhail Romanov, Artem Babenko, Sergei Ovcharenko, Valentin Khrulkov

From arXiv. Prompts and additional information are available on the project page: https://ya.ru/ai/art/paper-yaart-v1

In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we focus especially on the choice of model size and training dataset size, aspects that had not previously been investigated systematically for text-to-image cascaded diffusion models. In particular, we comprehensively analyze how these choices affect both the efficiency of the training process and the quality of the generated images, two factors that are highly important in practice. Furthermore, we demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, establishing a more efficient scenario for training diffusion models. In terms of quality, users consistently prefer YaART over many existing state-of-the-art models.
