Text-to-image generation has attracted significant interest from researchers and practitioners in recent years because of its wide-ranging applications across industries. Despite progress in vision and language research, the existing literature remains relatively limited, particularly with regard to advancements and applications in this field. This paper explores a relevant research track within multimodal applications, covering text, vision, audio, and other modalities. In addition to the studies discussed in this paper, we are committed to continually updating the latest relevant papers, datasets, application projects, and corresponding information at https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image.