Wild Bootstrap Inference for Non-Negative Matrix Factorization with Random Effects

Non-negative matrix factorization (NMF) is widely used for parts-based representations, yet formal inference for covariate effects is rarely available when the basis is learned under non-negativity. We introduce non-negative matrix factorization with random effects (NMF-RE), a mean-structure latent-variable model $Y=X(\Theta A+U)+\mathcal{E}$ that combines covariate-driven scores with unit-specific deviations. Random effects act as a working device for modeling heterogeneity and controlling complexity; we monitor their effective degrees of freedom and enforce a df-based cap to prevent near-saturated fits. Estimation alternates closed-form ridge (BLUP-like) updates for $U$ with multiplicative non-negative updates for $X$ and $\Theta$. For inference on $\Theta$, we condition on $(\widehat X,\widehat U)$ and obtain fast uncertainty quantification via asymptotic linearization, a one-step Newton update, and a multiplier (wild) bootstrap; this avoids repeated constrained re-optimization. Simulations include a targeted stress test showing that, without df control, the random-effects penalty can collapse and inference for $\Theta$ becomes degenerate, whereas the df-cap prevents this failure mode. The non-negativity constraint induces sparse, parts-based loadings -- a measurement-side variable selection -- while inference on $\Theta$ identifies which covariates affect which components, providing covariate-side selection. Longitudinal, psychometric, spatial-flow, and text examples further illustrate stable, interpretable covariate-effect inference.
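
To make the ridge (BLUP-like) update for $U$ and the df-based cap concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a penalized least-squares objective $\|Y - X(\Theta A + U)\|_F^2 + \lambda\|U\|_F^2$, with $Y$ of size $P \times N$, $X$ of size $P \times Q$, $\Theta$ of size $Q \times K$, $A$ of size $K \times N$, and $U$ of size $Q \times N$. It also assumes that the effective degrees of freedom are the trace of the ridge hat matrix summed over units. The function names, the bisection search, and the cap value are hypothetical choices made for illustration.

```python
import numpy as np

def ridge_update_U(Y, X, Theta, A, lam):
    """Closed-form minimizer over U of ||Y - X(Theta A + U)||_F^2 + lam * ||U||_F^2."""
    Q = X.shape[1]
    R = Y - X @ Theta @ A                      # residual after the covariate-driven part
    G = X.T @ X + lam * np.eye(Q)              # ridge-regularized Gram matrix (Q x Q)
    return np.linalg.solve(G, X.T @ R)         # Q x N matrix of unit-specific deviations

def effective_df(X, lam, n_units):
    """Trace of X (X'X + lam I)^{-1} X', multiplied by the number of units."""
    d2 = np.linalg.svd(X, compute_uv=False) ** 2
    return n_units * float(np.sum(d2 / (d2 + lam)))

def lambda_for_df_cap(X, n_units, df_cap, lam_min=1e-8, lam_max=1e8, iters=200):
    """Bisection on log(lam) for (approximately) the smallest lam with effective_df <= df_cap.

    effective_df is decreasing in lam; if even lam_max violates the cap,
    a value close to lam_max is returned.
    """
    if effective_df(X, lam_min, n_units) <= df_cap:
        return lam_min
    lo, hi = np.log(lam_min), np.log(lam_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if effective_df(X, np.exp(mid), n_units) > df_cap:
            lo = mid                           # still too flexible: increase the penalty
        else:
            hi = mid                           # cap satisfied: try a smaller penalty
    return float(np.exp(hi))

# Illustrative synthetic data (all sizes and values are made up for this sketch).
rng = np.random.default_rng(0)
P, Q, K, N = 12, 3, 2, 40
X = rng.random((P, Q))
Theta = rng.random((Q, K))
A = rng.random((K, N))
Y = X @ Theta @ A + 0.1 * rng.standard_normal((P, N))

df_cap = 0.5 * N * Q                           # hypothetical cap: half of the saturated df
lam = lambda_for_df_cap(X, N, df_cap)
U = ridge_update_U(Y, X, Theta, A, lam)
```

Under these assumptions, a penalty $\lambda$ near zero drives the effective degrees of freedom toward $NQ$, the near-saturated regime the abstract warns about; the cap keeps $\lambda$ bounded away from that limit.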

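
The inference step described in the abstract (conditioning on $(\widehat X,\widehat U)$, a one-step Newton update, and a multiplier bootstrap) can be illustrated with the hedged sketch below. It is not the authors' procedure: it assumes a squared-error loss, uses the unconstrained least-squares solution for $\Theta$ as a stand-in for the fitted value, and draws Rademacher multipliers at the unit (column) level. The function name `wild_bootstrap_theta` and all of its parameters are hypothetical.

```python
import numpy as np

def wild_bootstrap_theta(Y, X_hat, U_hat, A, n_boot=500, seed=0):
    """Multiplier (wild) bootstrap for Theta, conditional on X_hat and U_hat.

    Each replicate perturbs the per-unit scores with random signs and applies one
    Newton step from theta_hat; no constrained re-optimization is performed.
    """
    rng = np.random.default_rng(seed)
    R = Y - X_hat @ U_hat                          # remove the estimated random effects
    XtX_inv = np.linalg.inv(X_hat.T @ X_hat)       # Q x Q
    AAt_inv = np.linalg.inv(A @ A.T)               # K x K
    # Unconstrained least-squares fit of R on X_hat Theta A (assumption: ignores Theta >= 0).
    theta_hat = XtX_inv @ X_hat.T @ R @ A.T @ AAt_inv
    E = R - X_hat @ theta_hat @ A                  # P x N residuals
    XtE = X_hat.T @ E                              # column n holds X' e_n
    N = Y.shape[1]
    draws = np.empty((n_boot,) + theta_hat.shape)
    for b in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=N)        # Rademacher multipliers, one per unit
        score = (XtE * w) @ A.T                    # sum_n w_n X' e_n a_n'
        # One Newton step: inverse Hessian of the quadratic loss applied to the score.
        draws[b] = theta_hat + XtX_inv @ score @ AAt_inv
    se = draws.std(axis=0, ddof=1)                 # bootstrap standard errors
    return theta_hat, se, draws
```

A typical use would pass the fitted $\widehat X$ and $\widehat U$ together with the covariate matrix $A$, then form Wald-type statistics as `theta_hat / se` or read percentile intervals off `draws`. Because each replicate costs only a few small matrix products, the bootstrap avoids re-running the constrained factorization.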