Addressing the interpretability problem of nonnegative matrix factorization (NMF) on Boolean data, Boolean Matrix Factorization (BMF) uses Boolean algebra to decompose the input into low-rank Boolean factor matrices. These matrices are highly interpretable and very useful in practice, but they come at the high computational cost of solving an NP-hard combinatorial optimization problem. To reduce the computational burden, we propose to relax BMF continuously using a novel elastic-binary regularizer, from which we derive a proximal gradient algorithm. Through an extensive set of experiments, we demonstrate that our method works well in practice: On synthetic data, we show that it converges quickly, recovers the ground truth precisely, and estimates the simulated rank exactly. On real-world data, we improve upon the state of the art in recall, loss, and runtime, and a case study from the medical domain confirms that our results are easily interpretable and semantically meaningful.
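To make the difference between Boolean and ordinary matrix algebra concrete, the following minimal sketch computes both products for the same pair of binary factors. It illustrates only the standard Boolean matrix product, not the proposed relaxation or regularizer; the helper name `boolean_product` and the toy matrices `U` and `V` are assumptions chosen for illustration.

```python
import numpy as np


def boolean_product(U, V):
    """Boolean matrix product: (U o V)[i, j] = OR_k (U[i, k] AND V[k, j])."""
    # The integer product counts how many factors cover each cell;
    # any count above zero corresponds to a Boolean 1.
    return (U.astype(int) @ V.astype(int)) > 0


# Hypothetical rank-2 Boolean factors (illustrative values only).
U = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=bool)
V = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=bool)

boolean = boolean_product(U, V)           # entries stay in {False, True}
integer = U.astype(int) @ V.astype(int)   # ordinary algebra: 1 + 1 = 2

print(boolean.astype(int))
print(integer)
```

In the cell covered by both factors, the ordinary product yields 2, a value that cannot occur in Boolean data, while the Boolean product stays at 1. This is the kind of mismatch that makes factors computed under ordinary algebra harder to read as sets of co-occurring rows and columns.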