Recently, patch-wise contrastive learning has drawn attention in image translation by exploring the semantic correspondence between the input and output images. To further exploit the patch-wise topology for high-level semantic understanding, we employ a graph neural network (GNN) to capture topology-aware features. Specifically, we construct a graph from the patch-wise similarity computed by a pretrained encoder. Its adjacency matrix is shared between the input and output graphs to enforce consistent patch-wise relations. We then obtain node features from the GNN and strengthen the correspondence between nodes by maximizing their mutual information with a contrastive loss. To capture the hierarchical semantic structure, we further propose graph pooling. Experimental results show that the semantic encoding provided by the constructed graphs yields state-of-the-art image translation performance.
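The following is a minimal NumPy sketch of the pipeline the abstract outlines. It is not the authors' implementation and rests on several assumptions the abstract does not state:

- Patch features are given as an `(N, C)` array. Random stand-ins replace the pretrained encoder here.
- The graph is a symmetrised k-nearest-neighbour graph over cosine similarity.
- The GNN is a single GCN-style layer.
- The contrastive loss is InfoNCE with temperature 0.07.
- Graph pooling is swapped in as gPool-style top-k node selection.

All function names are hypothetical. The sketch shows how the shared topology works: one adjacency matrix, computed from the input patches, is reused for the output graph. The pooled node indices are reused in the same way, so node `i` in one graph always corresponds to node `i` in the other.

```python
import numpy as np


def build_shared_adjacency(feat_in: np.ndarray, k: int = 4) -> np.ndarray:
    """k-NN adjacency from cosine similarity of the INPUT patch features.
    Assumption: top-k sparsification; the abstract only says 'patch-wise similarity'."""
    f = feat_in / np.linalg.norm(feat_in, axis=1, keepdims=True)
    sim = f @ f.T                                  # (N, N) cosine similarity
    np.fill_diagonal(sim, -np.inf)                 # never pick a node as its own neighbour
    idx = np.argsort(-sim, axis=1)[:, :k]          # k most similar patches per row
    adj = np.zeros(sim.shape)
    adj[np.arange(sim.shape[0])[:, None], idx] = 1.0
    return np.maximum(adj, adj.T)                  # symmetrise


def gcn_layer(x: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN-style layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)


def patch_nce_loss(z_out: np.ndarray, z_in: np.ndarray, tau: float = 0.07) -> float:
    """InfoNCE: output node i should match input node i (positive) rather than
    any other input node (negatives)."""
    q = z_out / (np.linalg.norm(z_out, axis=1, keepdims=True) + 1e-8)
    k = z_in / (np.linalg.norm(z_in, axis=1, keepdims=True) + 1e-8)
    logits = q @ k.T / tau                         # diagonal entries are the positives
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def topk_pool(x, adj, p, keep=None, ratio=0.5):
    """gPool-style top-k pooling (a swapped-in stand-in for the paper's graph pooling).
    Pass `keep` from the input graph so both graphs retain the same node indices."""
    score = x @ p / np.linalg.norm(p)              # projection score per node
    if keep is None:
        n_keep = max(1, int(ratio * x.shape[0]))
        keep = np.sort(np.argsort(-score)[:n_keep])
    gate = 1.0 / (1.0 + np.exp(-score[keep]))      # sigmoid gating of kept nodes
    return x[keep] * gate[:, None], adj[np.ix_(keep, keep)], keep


rng = np.random.default_rng(0)
N, C, D = 16, 32, 8
feat_in = rng.normal(size=(N, C))                  # stand-in: encoder features of input patches
feat_out = feat_in + 0.1 * rng.normal(size=(N, C)) # stand-in: features of translated patches

adj = build_shared_adjacency(feat_in, k=4)         # built once, shared by both graphs
w = rng.normal(size=(C, D)) / np.sqrt(C)           # GNN weights shared by both graphs
h_in, h_out = gcn_layer(feat_in, adj, w), gcn_layer(feat_out, adj, w)
loss = patch_nce_loss(h_out, h_in)

# Second (coarser) level: pool with the input graph's indices, reuse them for the output.
p = rng.normal(size=D)
x2_in, adj2, keep = topk_pool(h_in, adj, p)
x2_out, _, _ = topk_pool(h_out, adj, p, keep=keep)
w2 = rng.normal(size=(D, D)) / np.sqrt(D)
loss2 = patch_nce_loss(gcn_layer(x2_out, adj2, w2), gcn_layer(x2_in, adj2, w2))
```

Building the adjacency only from the input and reusing it for the output is the design choice the abstract emphasises. Because both graphs use it, the GNN aggregates over the same neighbourhoods on each side. The contrastive loss then compares nodes that share identical topology.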