Boolean matrix factorization (BMF) is a fundamental tool for analyzing binary data and discovering latent information hidden in it. Formal Concept Analysis (FCA) provides essential insight into BMF and into the design of BMF algorithms. Building on FCA, the GreCon and GreCon2 algorithms provide high-quality factorizations, but at the cost of high memory consumption and long running times. In this paper, we introduce GreCon3, a substantial revision of these algorithms that significantly improves both computational efficiency and memory usage. These improvements are achieved with a novel space-efficient data structure that tracks unprocessed data. We further propose a novel strategy that initializes this data structure incrementally, which reduces memory consumption and omits data irrelevant to the remainder of the computation. Moreover, we show that the first factors can be discovered with less effort. Since the first factors tend to describe large portions of the data, this optimization, along with others, contributes significantly to the overall performance of the algorithm. An experimental evaluation shows that GreCon3 substantially outperforms its predecessor, GreCon2. The proposed algorithm thus advances the state of the art in FCA-based BMF and enables efficient factorization of datasets that were previously infeasible for the GreCon algorithm.
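To make the setting concrete, the following is a minimal sketch of concept-based greedy BMF on a tiny matrix. It is not GreCon3 and does not reproduce its data structure or its incremental initialization. It assumes only the general idea that factors are formal concepts (maximal all-ones rectangles) chosen greedily by how many still-uncovered 1s they cover. The function names `boolean_product`, `formal_concepts`, and `greedy_bmf` are hypothetical, and the brute-force concept enumeration is exponential in the number of columns, so it is suitable for illustration only.

```python
from itertools import combinations


def boolean_product(A, B):
    """Boolean matrix product: (A o B)[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    return [[int(any(A[i][k] and B[k][j] for k in range(len(B))))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def formal_concepts(I):
    """Enumerate all formal concepts (extent, intent) of I by brute force.

    Illustrative assumption: every attribute subset is closed explicitly,
    which is exponential in the number of columns.
    """
    n_rows, n_cols = len(I), len(I[0])
    concepts = set()
    for r in range(n_cols + 1):
        for attrs in combinations(range(n_cols), r):
            # Rows that have every attribute in attrs.
            extent = frozenset(i for i in range(n_rows)
                               if all(I[i][j] for j in attrs))
            # Attributes shared by every row in the extent.
            intent = frozenset(j for j in range(n_cols)
                               if all(I[i][j] for i in extent))
            concepts.add((extent, intent))
    # Sort for a deterministic tie-break in the greedy step.
    return sorted(concepts, key=lambda c: (sorted(c[0]), sorted(c[1])))


def greedy_bmf(I):
    """Greedily pick concepts covering the most uncovered 1s.

    Assumes I is non-empty and contains at least one 1.
    Returns (A, B) such that boolean_product(A, B) == I.
    """
    n_rows, n_cols = len(I), len(I[0])
    # The set of 1-entries not yet covered by any chosen factor.
    uncovered = {(i, j) for i in range(n_rows) for j in range(n_cols) if I[i][j]}
    concepts = formal_concepts(I)
    factors = []
    while uncovered:
        best = max(concepts,
                   key=lambda c: sum((i, j) in uncovered
                                     for i in c[0] for j in c[1]))
        factors.append(best)
        uncovered -= {(i, j) for i in best[0] for j in best[1]}
    # Column k of A is the extent of factor k; row k of B is its intent.
    A = [[int(i in extent) for extent, _ in factors] for i in range(n_rows)]
    B = [[int(j in intent) for j in range(n_cols)] for _, intent in factors]
    return A, B


I = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
A, B = greedy_bmf(I)
```

In this sketch the `uncovered` set plays the role of the bookkeeping for unprocessed data, and recomputing the coverage of every concept in each round is the naive cost that a more efficient algorithm would aim to avoid.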