This article proposes a biconvex modification to convex biclustering in order to improve its performance in high-dimensional settings. In contrast to heuristics that discard a subset of noisy features a priori, our method jointly learns and weights informative features while discovering biclusters. Moreover, the method is adaptive to the data and is accompanied by an efficient algorithm based on proximal alternating minimization, complete with detailed guidance on hyperparameter tuning and efficient solutions to the optimization subproblems. These contributions are theoretically grounded: we establish finite-sample bounds on the objective function under sub-Gaussian errors, and generalize these guarantees to cases where input affinities need not be uniform. Extensive simulation results show that our method consistently recovers the underlying biclusters while weighting and selecting features appropriately, outperforming peer methods. An application to a gene microarray dataset of lymphoma samples recovers biclusters matching an underlying classification, while giving additional interpretation to the mRNA samples via the column groupings and fitted weights.