The success of modern machine learning, particularly in facial image translation networks, depends heavily on the availability of high-quality, paired, large-scale datasets. However, acquiring sufficient data is often challenging and costly. Inspired by the recent success of diffusion models in high-quality image synthesis and by advances in Large Language Models (LLMs), we propose a novel framework called LLM-assisted Paired Image Generation (LaPIG). The framework constructs comprehensive, high-quality paired visible and thermal images using captions generated by LLMs. Our method comprises three components: visible image synthesis with ArcFace embeddings, thermal image translation using Latent Diffusion Models (LDMs), and caption generation with LLMs. Our approach not only generates multi-view paired visible and thermal images to increase data diversity, but also produces high-quality paired data while preserving identity information. We evaluate our method against existing methods on public datasets, demonstrating the superiority of LaPIG.
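To make the three-stage structure concrete, the sketch below chains the stages in the order a single paired sample would flow through them: caption generation, identity-conditioned visible synthesis, then visible-to-thermal translation. This is a minimal illustrative sketch only; every name in it (generate_caption, synthesize_visible, translate_to_thermal, PairedSample, the dummy return values) is a hypothetical placeholder for the components named in the abstract, not the authors' actual code or API.

```python
# Structural sketch of the LaPIG pipeline, under the assumptions stated above.
from dataclasses import dataclass

@dataclass
class PairedSample:
    caption: str
    visible: object  # would hold the synthesized visible face image
    thermal: object  # would hold the paired thermal image of the same identity

def generate_caption(identity: str, view: str) -> str:
    # Stage 1 (hypothetical): an LLM produces a descriptive caption
    # for a given identity and viewpoint.
    return f"a {view}-view photo of person {identity}"

def synthesize_visible(caption: str, arcface_embedding: object) -> object:
    # Stage 2 (hypothetical): caption-conditioned visible image synthesis,
    # with an ArcFace identity embedding injected to preserve identity.
    return f"<visible image | {caption}>"

def translate_to_thermal(visible: object) -> object:
    # Stage 3 (hypothetical): a Latent Diffusion Model translates the
    # synthesized visible image into its thermal counterpart.
    return f"<thermal image of {visible}>"

def lapig(identity: str, arcface_embedding: object, views: list[str]) -> list[PairedSample]:
    # Chaining the stages over several viewpoints yields the multi-view
    # paired visible/thermal data described in the abstract.
    samples = []
    for view in views:
        caption = generate_caption(identity, view)
        visible = synthesize_visible(caption, arcface_embedding)
        samples.append(PairedSample(caption, visible, translate_to_thermal(visible)))
    return samples

# Example: generate three paired views for one (placeholder) identity.
pairs = lapig("ID-001", arcface_embedding=None, views=["frontal", "left", "right"])
```

The per-view loop reflects the multi-view claim in the abstract: each viewpoint yields its own caption and paired visible/thermal images, while the shared ArcFace embedding is what keeps all views tied to one identity.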