Nonlinear independent component analysis (ICA) aims to uncover the true latent sources from their observable nonlinear mixtures. Despite its significance, nonlinear ICA is known to be unidentifiable without additional assumptions. Recent work has proposed conditions on the connective structure from sources to observed variables, known as Structural Sparsity, that achieve identifiability in an unsupervised manner. However, this sparsity constraint may not hold for all sources in practice. Furthermore, the assumptions that the mixing process is bijective and that all sources are mutually independent, both inherited from the ICA setting, may also be violated in many real-world scenarios. To address these limitations and generalize nonlinear ICA, we propose a set of new identifiability results in the general settings of undercompleteness, partial sparsity and source dependence, and flexible grouping structures. Specifically, we prove identifiability when there are more observed variables than sources (the undercomplete case), and when certain sparsity and/or source-independence assumptions are not met for some changing sources. Moreover, we show that appropriate identifiability results can also be established under flexible grouping structures (e.g., when part of the sources can be divided into irreducible independent groups of various sizes). The theoretical claims are supported empirically on both synthetic and real-world datasets.
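To make "partial sparsity" concrete, the sketch below treats Structural Sparsity as a constraint on the support of the mixing Jacobian. This is our paraphrase of the prior-work condition and an assumption for this illustration: for each source s_k there should exist a set of observed variables whose dependency sets intersect exactly in {s_k}. It suffices to intersect the dependency sets of all observed variables that depend on s_k, since any valid set must be drawn from those variables and adding more of them can only shrink the intersection. The function name `sources_satisfying_structural_sparsity` and the toy support matrix are hypothetical. The matrix has more observed variables than sources, as in the undercomplete setting. This is a simple combinatorial check, not the authors' estimation procedure.

```python
import numpy as np


def sources_satisfying_structural_sparsity(support: np.ndarray) -> list[int]:
    """Return the indices of sources that pass the (paraphrased) sparsity check.

    support: boolean array of shape (m, n). support[i, k] is True when observed
    variable x_i depends on source s_k (a nonzero Jacobian entry). Allowing
    m > n covers the undercomplete case (more observations than sources).
    """
    support = np.asarray(support, dtype=bool)
    _, n = support.shape
    passing = []
    for k in range(n):
        rows = support[support[:, k]]  # observed variables that depend on s_k
        if rows.shape[0] == 0:
            continue  # s_k influences no observation, so it cannot be singled out
        # Sources shared by every observed variable that depends on s_k.
        common = np.logical_and.reduce(rows, axis=0)
        if common.sum() == 1:  # only s_k itself survives the intersection
            passing.append(k)
    return passing


# Hypothetical 4 x 3 support pattern: 4 observed variables, 3 sources.
support = np.array([
    [1, 0, 0],  # x0 depends on s0
    [1, 1, 0],  # x1 depends on s0, s1
    [0, 1, 1],  # x2 depends on s1, s2
    [0, 1, 1],  # x3 depends on s1, s2
])
print(sources_satisfying_structural_sparsity(support))
```

On this toy pattern, s0 and s1 pass, while s2 fails because every observation that depends on s2 also depends on s1. Sparsity therefore holds only for part of the sources, which is the kind of situation the partial-sparsity results are meant to cover.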