Representation learning aims to discover the salient features of a domain in a compact, descriptive form that identifies the unique characteristics of a given sample relative to its domain. Existing work on visual style representation has tried to explicitly disentangle style from content during training, but a complete separation of the two has yet to be achieved. This paper aims to learn a representation of visual artistic style that is more strongly disentangled from the semantic content depicted in an image. We use Neural Style Transfer (NST) to measure and drive the learning signal, and achieve state-of-the-art representation learning on explicit disentanglement metrics. We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics, encodes far less semantic information, and achieves state-of-the-art accuracy in downstream multimodal applications.