In this paper, we introduce CalliffusionV2, a novel system for generating natural-looking Chinese calligraphy with flexible multi-modal control. Unlike previous approaches, which rely on either image or text inputs alone and lack fine-grained control, our system uses images to guide generation at a fine-grained level and natural-language text to describe the desired features of the output. CalliffusionV2 can generate a broad range of characters and can quickly learn new styles through few-shot learning. It can also generate non-Chinese characters without prior training. Comprehensive tests confirm that the calligraphy our system produces is stylistically accurate and is recognizable to both neural network classifiers and human evaluators.