Understanding the identifiability of latent content and style variables from unaligned multi-domain data is essential for tasks such as domain translation and data generation. Existing works on content-style identification were often developed under somewhat stringent conditions, e.g., that all latent components are mutually independent and that the dimensions of the content and style variables are known. We introduce a new analytical framework via cross-domain \textit{latent distribution matching} (LDM), which establishes content-style identifiability under substantially more relaxed conditions. Specifically, we show that restrictive assumptions such as component-wise independence of the latent variables can be removed. Most notably, we prove that prior knowledge of the content and style dimensions is not necessary for ensuring identifiability, provided that sparsity constraints are properly imposed on the learned latent representations. Bypassing knowledge of the exact latent dimension has been a longstanding aspiration in unsupervised representation learning -- our analysis is the first to underpin its theoretical and practical viability. On the implementation side, we recast the LDM formulation into a regularized multi-domain GAN loss with coupled latent variables. We show that the reformulation is equivalent to LDM under mild conditions -- yet requires considerably fewer computational resources. Experiments corroborate our theoretical claims.
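To make the overall recipe concrete, the following is a minimal numerical sketch of the two ingredients named above: a cross-domain latent distribution-matching term and an L1 sparsity penalty on over-provisioned latent codes. The linear-encoder setup, the simple moment-matching proxy (standing in for the adversarial/GAN matching term), and all variable names are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy nonlinear encoder producing a latent code (illustrative)."""
    return np.tanh(x @ W)

def moment_match(z_a, z_b):
    """Proxy for cross-domain latent distribution matching:
    squared mismatch of first and second moments. A GAN critic
    would play this role in the actual reformulation."""
    return (np.sum((z_a.mean(axis=0) - z_b.mean(axis=0)) ** 2)
            + np.sum((np.cov(z_a.T) - np.cov(z_b.T)) ** 2))

def sparsity(z):
    """L1 penalty encouraging inactive latent dimensions, so the
    exact content/style dimensions need not be known a priori."""
    return np.abs(z).mean()

# Two unaligned domains sharing a 2-dim "content" signal plus
# domain-specific "style" components (synthetic toy data).
n, d, k = 200, 5, 8          # samples, data dim, over-provisioned latent dim
content_a = rng.normal(size=(n, 2))
content_b = rng.normal(size=(n, 2))    # unaligned: no paired samples
x_a = np.hstack([content_a, rng.normal(size=(n, d - 2))])
x_b = np.hstack([content_b, 0.5 * rng.normal(size=(n, d - 2))])

W_a = rng.normal(size=(d, k))
W_b = rng.normal(size=(d, k))
z_a, z_b = encode(x_a, W_a), encode(x_b, W_b)

lam = 0.1                    # sparsity weight (hypothetical value)
loss = moment_match(z_a, z_b) + lam * (sparsity(z_a) + sparsity(z_b))
print(float(loss))
```

In an actual training loop, the encoder parameters would be optimized to drive the matching term toward zero while the sparsity term prunes superfluous latent dimensions.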