We consider the densest submatrix problem, which seeks the fixed-size submatrix of a given binary matrix that contains the most nonzero entries. This problem is a natural generalization of fundamental problems in combinatorial optimization, e.g., the densest subgraph, maximum clique, and maximum edge biclique problems, and has wide application in the study of complex networks. Much recent research has focused on developing sufficient conditions for the exact solution of the densest submatrix problem via convex relaxation. The vast majority of these sufficient conditions establish identification of the densest submatrix within a graph containing exactly one large dense submatrix hidden by noise. The assumptions of these underlying models do not hold in real-world networks, where the data may correspond to a matrix containing many dense submatrices of varying sizes. We extend and generalize these results to the more realistic setting where the input matrix may contain \emph{many} large dense subgraphs. Specifically, we establish sufficient conditions under which we can expect to solve the densest submatrix problem in polynomial time for random input matrices sampled from a generalization of the stochastic block model. We also provide sufficient conditions for perfect recovery under a deterministic adversarial model. Numerical experiments involving randomly generated problem instances and real-world collaboration and communication networks are used to empirically verify the theoretical phase transitions to perfect recovery given by these sufficient conditions.
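To make the combinatorial objective concrete, here is a minimal brute-force sketch. It is not the convex-relaxation approach studied here. For each choice of m rows it keeps the n columns with the largest column sums restricted to those rows, which is optimal for that row set, so the search is exact but exponential in the number of rows. The function name `densest_submatrix_bruteforce` is hypothetical and is only meant for tiny inputs, e.g. to sanity-check the output of a relaxation on small instances.

```python
from itertools import combinations

import numpy as np


def densest_submatrix_bruteforce(A, m, n):
    """Hypothetical helper: exact densest m x n submatrix of a binary matrix A.

    Returns (density, rows, cols), where density is the number of nonzero
    entries of A restricted to the chosen rows and columns. Runs in time
    exponential in the number of rows, so it is for tiny matrices only.
    If m or n exceeds the matrix dimensions, returns (-1, None, None).
    """
    A = np.asarray(A)
    if n > A.shape[1]:
        return (-1, None, None)
    best = (-1, None, None)
    for rows in combinations(range(A.shape[0]), m):
        # Column sums of the submatrix formed by the chosen rows.
        col_sums = A[list(rows), :].sum(axis=0)
        # For a fixed row set, the best n columns are those with the largest sums.
        top = np.argsort(-col_sums, kind="stable")[:n]
        cols = tuple(sorted(int(j) for j in top))
        density = int(col_sums[list(cols)].sum())
        # Strict inequality keeps the first optimum found (lexicographic rows).
        if density > best[0]:
            best = (density, rows, cols)
    return best
```

For example, in the 4 x 4 matrix below the top-left 2 x 2 block is all ones, so it is the densest 2 x 2 submatrix:

```python
A = [[1, 1, 0, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 0]]
print(densest_submatrix_bruteforce(A, 2, 2))  # (4, (0, 1), (0, 1))
```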