Understanding the identifiability of latent content and style variables from unaligned multi-domain data is essential for tasks such as domain translation and data generation. Existing works on content-style identification were often developed under rather stringent conditions, e.g., that all latent components are mutually independent and that the dimensions of the content and style variables are known. We introduce a new analytical framework via cross-domain \textit{latent distribution matching} (LDM), which establishes content-style identifiability under substantially more relaxed conditions. Specifically, we show that restrictive assumptions such as component-wise independence of the latent variables can be removed. Most notably, we prove that prior knowledge of the content and style dimensions is not necessary for identifiability, provided that sparsity constraints are properly imposed on the learned latent representations. Bypassing knowledge of the exact latent dimension has been a longstanding aspiration in unsupervised representation learning -- our analysis is the first to underpin its theoretical and practical viability. On the implementation side, we recast the LDM formulation as a regularized multi-domain GAN loss with coupled latent variables. We show that this reformulation is equivalent to LDM under mild conditions -- yet requires considerably fewer computational resources. Experiments corroborate our theoretical claims.
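To make the LDM criterion concrete, the following is a minimal sketch under assumed notation (the encoders $f_k$, the content/style partition $(\hat{c}, \hat{s})$, the divergence $D$, and the penalty weight $\lambda$ are illustrative placeholders, not necessarily the paper's own symbols). Given unaligned observations $x^{(1)} \sim p_1$ and $x^{(2)} \sim p_2$, one may learn invertible encoders $f_1, f_2$ whose outputs are split as $f_k(x^{(k)}) = (\hat{c}^{(k)}, \hat{s}^{(k)})$ and matched in distribution on the content part:
\[
\min_{f_1, f_2} \; D\!\left( \mathrm{Law}\big(\hat{c}^{(1)}\big),\, \mathrm{Law}\big(\hat{c}^{(2)}\big) \right) \;+\; \lambda \sum_{k=1}^{2} \mathbb{E}\big[ \| f_k(x^{(k)}) \|_1 \big],
\]
where $D$ is a divergence between distributions and the sparsity penalty plays the role of the dimension-selection mechanism, removing the need to know the content and style dimensions a priori. In the GAN reformulation described above, such a divergence would be realized implicitly through domain discriminators operating on coupled latent variables, rather than computed explicitly.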