We study component recovery and mixing-matrix estimation from unlabeled finite mixtures whose observable distributions share the same latent components but have unknown mixing weights. The main identifying signal is marginal independence: each component is assumed to be independent on at least one coordinate pair, but no labels, clean component samples, or mixing weights are observed. We first prove a structural result for product components: under linear independence of the univariate marginals, any independent affine combination of the components must coincide with a single component. We then extend this principle to observable mixtures and show that, under full-rank and no-cancellation conditions, marginally independent affine combinations recover the corresponding latent components. When every component is independent on some coordinate pair, all components are identifiable, and the mixing matrix is recoverable under the stated completion conditions. Finally, we propose a Product-Marginal Maximum Mean Discrepancy (PM-MMD) estimator over affine combinations of the observable mixtures and prove uniform convergence and stability under approximate marginal independence. This framework also separates the empirical roles of the assumptions: irreducibility is, in general, not directly testable from the unlabeled mixtures alone, whereas marginal independence yields a candidate-level diagnostic through held-out PM-MMD. Controlled and flow-cytometry experiments show when marginal independence provides a useful recovery signal. In the reported multi-component comparisons, condition-aware representative selection stabilizes PM-MMD and improves recovery relative to clustering, factorization, and pairwise mixture-proportion baselines using the same unlabeled mixtures.
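To make the product-marginal idea concrete, the following is a minimal sketch of one way a PM-MMD-style criterion could be computed for a single coordinate pair. It is an illustration under stated assumptions, not the paper's estimator: the abstract does not specify the kernel, the bandwidth, or whether a biased or unbiased statistic is used, so the Gaussian product kernel, the fixed bandwidth, the biased (V-statistic) form, and the names `gaussian_gram`, `pm_mmd_sq`, and `sample_mixture` are all hypothetical choices made here. The sketch treats an affine combination of the observable mixtures as a signed empirical measure and measures the squared RKHS distance between that measure and the product of its two marginals.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix of a 1-D sample (assumed kernel choice)."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2.0 * bandwidth**2))


def pm_mmd_sq(samples, weights, bandwidth=1.0):
    """Biased squared MMD between an affine combination of empirical
    distributions and the product of its marginals, on one coordinate pair.

    samples: list of (n_j, 2) arrays, one per observable mixture.
    weights: affine coefficients (may be negative) that sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("affine weights must sum to 1")
    z = np.concatenate(samples, axis=0)
    # Per-point signed weight: a_j / n_j for every point drawn from mixture j.
    w = np.concatenate(
        [np.full(len(s), a / len(s)) for s, a in zip(samples, weights)]
    )
    K = gaussian_gram(z[:, 0], bandwidth)  # kernel on the first coordinate
    L = gaussian_gram(z[:, 1], bandwidth)  # kernel on the second coordinate
    Kw, Lw = K @ w, L @ w
    joint = w @ (K * L) @ w           # <joint embedding, joint embedding>
    cross = w @ (Kw * Lw)             # <joint embedding, product embedding>
    prod = (w @ Kw) * (w @ Lw)        # <product embedding, product embedding>
    return joint - 2.0 * cross + prod


def sample_mixture(rng, n, weight_a):
    """Toy two-component mixture (hypothetical data, for illustration only).

    Component A: independent standard normals on both coordinates.
    Component B: strongly dependent coordinates centered near (2, 2).
    """
    from_a = rng.random(n) < weight_a
    a = rng.normal(0.0, 1.0, size=(n, 2))
    u = rng.normal(2.0, 1.0, size=n)
    b = np.column_stack([u, u + 0.1 * rng.normal(size=n)])
    return np.where(from_a[:, None], a, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p1 = sample_mixture(rng, 800, 0.7)  # observable mixture 0.7 A + 0.3 B
    p2 = sample_mixture(rng, 800, 0.3)  # observable mixture 0.3 A + 0.7 B
    # 1.75 * P1 - 0.75 * P2 equals component A at the population level.
    print("recovering weights:", pm_mmd_sq([p1, p2], [1.75, -0.75]))
    print("raw mixture P1:    ", pm_mmd_sq([p1, p2], [1.0, 0.0]))
```

In this toy setting the mixing weights are known, so the recovering affine combination can be written down directly; in the unlabeled problem the weights would instead be searched over, with the criterion evaluated on held-out samples. Negative affine weights inflate the sampling noise of the statistic, which is one reason a small value should be read as a diagnostic rather than a proof of independence.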