Binary data factorization is common, but real-valued methods ignore discreteness and yield hard-to-interpret factors. Boolean Matrix Factorization (BooMF) instead decomposes a binary matrix into two low-rank binary factor matrices combined via logical AND and OR, expressing the data as a Boolean disjunction of interpretable patterns. In cancer genomics, unlike rotational or additive decompositions, BooMF can reveal coordinated feature changes that may drive tumor evolution. Most existing BooMF methods are heuristic and greedy, are sensitive to initialization and prone to local optima, and do not support principled model selection or uncertainty quantification. We introduce Bayesian Boolean Matrix Factorization (BBMF), a fully conjugate generative model with sparsity-inducing priors. It enforces Boolean constraints, yields interpretable latent factors with coherent uncertainty quantification, and admits Gibbs sampling with closed-form full conditionals. Because cancer evolution often involves widespread, near-simultaneous chromosome-number changes (e.g., whole-genome duplication followed by instability and selection), Boolean factorizations capture these patterns more naturally than additive models. Applied to arm-level copy-number alteration data in multiple myeloma, where entries indicate presence/absence of chromosomal-arm amplifications, BBMF finds a small set of interpretable bicliques linking patient subsets to recurrently co-altered chromosomal arms. These bicliques provide a compact, biologically meaningful summary of tumor heterogeneity and demonstrate BBMF's utility for uncovering discrete latent structure in complex binary data.
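To make the Boolean product concrete, the following minimal sketch reconstructs a binary matrix as an OR over latent patterns of the AND between row memberships and column memberships. The names `U`, `V`, `X`, and `boolean_product` are illustrative assumptions, not notation from the paper, and the toy matrices are invented for the example. The sketch shows only the deterministic Boolean product, not the BBMF generative model or its Gibbs sampler.

```python
import numpy as np

def boolean_product(U, V):
    """Boolean matrix product: X[i, j] = OR over k of (U[i, k] AND V[k, j])."""
    U = np.asarray(U, dtype=bool)
    V = np.asarray(V, dtype=bool)
    # AND every row membership with every pattern entry, then OR over the latent index k.
    return (U[:, :, None] & V[None, :, :]).any(axis=1)

# Hypothetical toy data: 4 samples, 5 features, 2 overlapping patterns.
U = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 0]])            # sample-to-pattern memberships
V = np.array([[1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0]])   # pattern-to-feature memberships

X = boolean_product(U, V)         # stays binary where patterns overlap
additive = U @ V                  # ordinary product counts overlaps instead

print(X.astype(int))
print(additive)
```

Sample 1 belongs to both patterns, which share feature 1. The Boolean product records that entry as present, whereas the ordinary matrix product yields a count of 2, which is no longer a valid binary observation. Each pattern `k` corresponds to a biclique: the samples with `U[:, k] = 1` paired with the features with `V[k, :] = 1`.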